Hello! I'm Yu-Chun
I'm a new Data Scientist ready to contribute to team success through skill and hard work. I graduated from UC Berkeley with an MEng in Electrical Engineering & Computer Science, and from UC San Diego with a B.S. in Data Science and a B.S. in Management Science. I'm skilled in Python programming, statistical theory, and machine learning model development and evaluation, and I'm also an effective communicator!
YU-CHUN CHEN
DATA SCIENTIST & ML ENGINEER
Phone:
714-307-5095
Email:
Location:
Berkeley, CA
GitHub:
EXPERIENCE
2018-2020
Instructional Assistant
University of California, San Diego
As a tutor for several Data Science classes, I supported my supervising instructors with course planning, ran group and individual tutoring sessions, and maintained student records. I guided students through their thought processes and helped them build problem-solving skills.
2019 Jul-Aug
Database Engineer Intern
China Construction Bank Fintech
I developed PySpark programs that analyze over 500,000 Alipay payment records daily, filter out fraudulent payments, and detect possible money laundering.
2018 Aug-Sep
Data Scientist Intern
Cornupay
I designed and completed a project from scratch after learning the company's services and operations. The project predicted the amount of money loaned by platform users with machine learning models, including linear regression and logistic regression.
EDUCATION
2020-2021
Master of Engineering
University of California, Berkeley
This spring, I graduated from Cal with an MEng degree in Electrical Engineering & Computer Science, specializing in Data Science and Systems. Courses I took include Machine Learning, Optimization Models in Engineering, and Deep Learning.
2016-2020
Bachelor of Science
University of California, San Diego
I graduated from my undergraduate institution with a double major in Data Science and Management Science. Courses I took for Data Science include AI Stat Approach, Recommender Sys & Web Mining, Data Visualization, and System for Scalable Analytics; courses I took for Management Science include Operations Research, Corporate Finance, and Decisions Under Uncertainty.
SKILLS
Python
Pandas ∙ NumPy ∙ Matplotlib ∙ Scikit-Learn ∙ Keras ∙ Requests ∙ BeautifulSoup ∙ PyTorch ∙ PySpark
Management Science
Operations Research ∙ Financial Markets/Institutions ∙ CAPM ∙ Investment Project Evaluation
Machine Learning
Feature Engineering ∙ Regression ∙ SVM ∙ Random Forest ∙ Bayes Theory ∙ Unsupervised ∙ Graph Embeddings
EXPERTISE
PROGRAMMING/TOOLS
I'm skilled in several programming languages, including Python, Java, JavaScript, and SQL, which are essential to a Data Scientist both for obtaining and cleaning data and for implementing machine learning models. I'm also familiar with widely used industry tools such as Jupyter and MATLAB.
DATA ADAPTABILITY
Data adaptability is one of my strengths: I can mine and work with various types of unstructured data, and I can apply different feature engineering techniques and machine learning models to suit different data types and purposes.
COMMUNICATION
Communicating results is a fundamental part of a Data Scientist's work. I can explain my understanding of data and models clearly in speech. I'm also skilled in data visualization tools that tell the story behind the data clearly to both technical and non-technical audiences.
PROJECTS
REDDIT HATEFUL POST CLASSIFICATION
Methods:
Graph Embeddings ∙ Heterogeneous Information Network
In this project, we investigated content from Reddit, a popular social network that carries rich information about posts and their authors. Our goal was to distinguish hateful posts from normal ones. Identifying hateful posts not only enables platforms to improve the user experience, but also helps maintain a positive online environment.
Using the relations among users, posts, and comments, we constructed a heterogeneous information network, from which we derived features through graph embedding techniques to classify whether a Reddit post is hateful.
BOOK RECOMMENDER
Methods:
TF-IDF ∙ Jaccard Similarity ∙ Logistic Regression
This book recommender uses users' reading histories to calculate similarities and make recommendations. It also predicts a book's genre from users' text reviews.
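The similarity step can be illustrated with a minimal sketch. It assumes Jaccard similarity is computed between users' sets of read books and that unread books are scored by the summed similarity of the users who read them; the `jaccard` and `recommend` helpers and the toy data are hypothetical, not the project's actual code.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user: str, histories: Dict[str, Set[str]], k: int = 3) -> List[str]:
    """Rank the user's unread books by the summed similarity of their readers."""
    read = histories[user]
    scores: Dict[str, float] = {}
    for other, books in histories.items():
        if other == user:
            continue
        sim = jaccard(read, books)
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for book in books - read:
            scores[book] = scores.get(book, 0.0) + sim
    # Sort by score (descending), then by title so ties are deterministic.
    ranked = sorted(scores, key=lambda b: (-scores[b], b))
    return ranked[:k]


# Hypothetical toy data, for illustration only.
histories = {
    "ann": {"Dune", "Emma"},
    "bob": {"Dune", "Emma", "Ulysses"},
    "cat": {"Beloved"},
}
print(recommend("ann", histories))
```

Here "bob" shares two of three books with "ann", so his remaining book is recommended, while "cat" has no overlap and is ignored.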
ENVIRONMENTAL INEQUALITY IN CALIFORNIA
Methods:
Random Forest Classifier ∙ ArcGIS
In this project, we explored social factors related to worse air quality in Southern California compared with Northern California. The focus was on using ArcGIS and its API to obtain and manipulate spatial features, which were then used to predict a county's air quality.
MALWARE CLASSIFICATION
Methods:
Graph Embeddings ∙ Heterogeneous Information Network
Considerable effort has gone into distinguishing harmful apps from benign ones. The HinDroid paper builds an information network and then looks for relationships among API calls that help identify malware. This project is an implementation and replication of the paper.
DOTA 2 GAME PREDICTION
Methods:
Feature Engineering ∙ Multilayer Perceptron ∙ Logistic Regression
The recent popularity of esports has inspired attempts to predict game results using machine learning techniques. Here, an evaluation of models and feature representations is presented for predicting the results of DotA 2, a well-known esports game. Our features aim to represent both individual hero selections and different combinations of heroes in a game.
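One way such features could be encoded is sketched below. It assumes a binary indicator per (team, hero) for individual picks and a binary indicator per unordered hero pair for combinations; the pool size `N_HEROES` and both helper functions are hypothetical and are not taken from the project.

```python
from itertools import combinations
from typing import List

N_HEROES = 5  # hypothetical pool size; the real game has far more heroes


def pick_features(radiant: List[int], dire: List[int]) -> List[int]:
    """One slot per (team, hero): 1 if that team picked the hero."""
    vec = [0] * (2 * N_HEROES)
    for hero in radiant:
        vec[hero] = 1
    for hero in dire:
        vec[N_HEROES + hero] = 1
    return vec


def pair_features(team: List[int]) -> List[int]:
    """One slot per unordered hero pair: 1 if both heroes are on the team."""
    index = {pair: i for i, pair in enumerate(combinations(range(N_HEROES), 2))}
    vec = [0] * len(index)
    for pair in combinations(sorted(team), 2):
        vec[index[pair]] = 1
    return vec


# Hypothetical draft: hero ids are arbitrary integers in [0, N_HEROES).
print(pick_features([0, 2], [1, 4]))
print(pair_features([0, 2]))
```

Vectors like these could be concatenated and passed to a classifier such as logistic regression or a multilayer perceptron.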
HOBBIES & INTERESTS
Cinema
Travel
Gaming
Food
Music