
Hello! I'm Yu-Chun

 

I'm an early-career Data Scientist ready to contribute to team success through skill and hard work. I graduated from UC Berkeley with an MEng in Electrical Engineering & Computer Science, and from UC San Diego with a B.S. in Data Science and a B.S. in Management Science. I'm skilled in Python programming, statistical theory, and machine learning model development and evaluation, and I'm also an effective communicator!


YU-CHUN CHEN

DATA SCIENTIST & ML ENGINEER

 

Phone:

714-307-5095

 

Email:

winniecyc327@gmail.com 

 

Location:

Berkeley, CA

GitHub:

https://github.com/yuc330

EXPERIENCE
2018-2020

Instructional Assistant

University of California, San Diego

As a tutor for several Data Science classes, I supported my supervising instructors with course planning, led group and individual tutoring sessions, and maintained student records. I guided students through their reasoning and helped them build problem-solving skills.

2019 Jul-Aug

Database Engineer Intern

China Construction Bank Fintech

I developed PySpark programs that analyzed more than 500,000 Alipay payment records daily, filtered out fraudulent payments, and flagged possible money laundering.

2018 Aug-Sep

Data Scientist Intern

Cornupay

I designed and completed a project from scratch after learning the company's services and operations. The project predicted the amount of money loaned by platform users using machine learning models, including linear regression and logistic regression.

EDUCATION
2020-2021

Master of Engineering

University of California, Berkeley

In spring 2021, I graduated from Cal with an MEng in Electrical Engineering & Computer Science, specializing in Data Science and Systems. My coursework included Machine Learning, Optimization Models in Engineering, and Deep Learning.

2016-2020

Bachelor of Science

University of California, San Diego

I completed my undergraduate studies with a double major in Data Science and Management Science. My Data Science coursework included AI: Statistical Approach, Recommender Systems & Web Mining, Data Visualization, and Systems for Scalable Analytics. My Management Science coursework included Operations Research, Corporate Finance, and Decisions Under Uncertainty.

SKILLS

Python

Pandas ∙ NumPy ∙ Matplotlib ∙ Scikit-Learn ∙ Keras ∙ Requests ∙ BeautifulSoup ∙ PyTorch ∙ PySpark

Cloud Computing

PySpark ∙ Dask ∙ Linux ∙ AWS

Management Science

Operations Research ∙ Financial Markets/Institutions ∙ CAPM ∙ Investment Project Evaluation

Machine Learning

Feature Engineering ∙ Regression ∙ SVM ∙ Random Forest ∙ Bayes Theory ∙ Unsupervised Learning ∙ Graph Embeddings

Data Visualization

HTML ∙ CSS ∙ JavaScript ∙ Ajax ∙ Tableau ∙ Infogram

Other Tools

SQL ∙ RStudio ∙ Stata ∙ Git ∙ ArcGIS ∙ Jupyter Notebook ∙ LaTeX

EXPERTISE
PROGRAMMING/TOOLS

I'm skilled in several programming languages, including Python, Java, JavaScript, and SQL. These are essential to a Data Scientist, both for obtaining and cleaning data and for implementing machine learning models. I'm also familiar with widely used tools such as Jupyter and MATLAB.

DATA ADAPTABILITY

Data adaptability is one of my strengths. It covers both the skills to mine and work with various types of unstructured data and the ability to apply different feature engineering techniques and machine learning models to suit different data types and purposes.

COMMUNICATION

 

Communicating results is a fundamental part of a Data Scientist's work. I can explain my understanding of data and models clearly in conversation. I'm also skilled with data visualization tools that tell the story behind the data to both technical and non-technical audiences.

PROJECTS

REDDIT HATEFUL POST CLASSIFICATION

Methods: Graph Embeddings ∙ Heterogeneous Information Network

In this project, we investigated content from Reddit, a popular social network that carries rich information about posts and their authors. Our goal was to distinguish hateful posts from normal ones. Identifying hateful posts not only enables platforms to improve the user experience but also helps maintain a positive online environment.

From the relations among users, posts, and comments, we constructed a heterogeneous information network. We then applied graph embedding techniques to the network and used the resulting embeddings as features to classify whether a Reddit post is hateful.

BOOK RECOMMENDER

Methods: TF-IDF ∙ Jaccard Similarity ∙ Logistic Regression

This book recommender program uses users' reading histories to calculate similarities and make recommendations. It also predicts a book's genre from users' text reviews.
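As a rough illustration of the similarity step, here is a minimal sketch of a Jaccard-based recommender. It is not the project's actual code: the function names, the scoring rule (summing the similarity of every other user who read a book), and the toy reading histories are all assumptions made for this example.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets: |a ∩ b| / |a ∪ b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(histories, user, k=2):
    """Rank books the user has not read by the summed similarity of their readers.

    Assumed scoring rule for illustration only.
    """
    read = histories[user]
    scores = defaultdict(float)
    for other, books in histories.items():
        if other == user:
            continue
        sim = jaccard(read, books)
        for book in books - read:  # only books the target user has not read
            scores[book] += sim
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, score in ranked[:k] if score > 0]


# Hypothetical toy data: user -> set of books read.
histories = {
    "ann": {"Dune", "Emma", "Ulysses"},
    "bob": {"Dune", "Emma", "Neuromancer"},
    "cat": {"Emma", "Persuasion"},
}
print(recommend(histories, "ann"))  # ['Neuromancer', 'Persuasion']
```

Here "bob" shares two of four distinct books with "ann" (similarity 0.5) and "cat" shares one of four (0.25), so bob's unread book ranks first.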

ENVIRONMENTAL INEQUALITY IN CALIFORNIA

Methods: Random Forest Classifier ∙ ArcGIS

In this project, we explored social factors related to the worse air quality in Southern California compared with Northern California. The focus was on using ArcGIS and its API to obtain and manipulate spatial features, which were then used to predict a county's air quality.

MALWARE CLASSIFICATION

Methods: Graph Embeddings ∙ Heterogeneous Information Network

Considerable effort has gone into distinguishing harmful apps from benign ones. The HinDroid paper constructs an information network and looks for relationships among API calls that help identify malware. This project implements and replicates the paper.

DOTA 2 GAME PREDICTION

Methods: Feature Engineering ∙ Multilayer Perceptron ∙ Logistic Regression

The recent popularity of esports has inspired attempts to predict game results using machine learning techniques. Here, we evaluate models and feature representations for predicting the results of DotA 2, a well-known esports game. Our features aim to represent both individual hero selections and different combinations of heroes in a game.
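One way such a feature representation could look is sketched below. This is an assumed encoding for illustration, not necessarily the one used in the project: heroes are identified by hypothetical integer IDs, individual picks are one-hot encoded per team, and each same-team hero pair gets one slot (+1 for Radiant, -1 for Dire).

```python
from itertools import combinations


def encode_draft(radiant, dire, n_heroes):
    """Encode a draft as individual-pick features followed by hero-pair features.

    Assumed layout (illustrative only):
      - 2 * n_heroes slots: one-hot picks for Radiant, then for Dire
      - one slot per unordered hero pair: +1 if both are on Radiant,
        -1 if both are on Dire, 0 otherwise
    """
    picks = [0] * (2 * n_heroes)
    for h in radiant:
        picks[h] = 1
    for h in dire:
        picks[n_heroes + h] = 1

    # Map each unordered pair (a, b) with a < b to a fixed position.
    pair_index = {p: i for i, p in enumerate(combinations(range(n_heroes), 2))}
    pairs = [0] * len(pair_index)
    for a, b in combinations(sorted(radiant), 2):
        pairs[pair_index[(a, b)]] = 1
    for a, b in combinations(sorted(dire), 2):
        pairs[pair_index[(a, b)]] = -1
    return picks + pairs


# Hypothetical toy draft: 4 heroes in the pool, 2 picks per team.
features = encode_draft(radiant=[0, 2], dire=[1, 3], n_heroes=4)
print(features)  # [1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, -1, 0]
```

A vector like this could be fed to a logistic regression or multilayer perceptron. Note that the pair block grows quadratically with the hero pool, so a real pool of over a hundred heroes would produce thousands of pair features.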

HOBBIES & INTERESTS

Cinema

Travel

Gaming

Food

Music
