"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem."
-- John Tukey

ABOUT ME

Hi, this is Bing!

I'm an aspiring data scientist with previous experience in materials science, startups, and the financial industry. I am passionate about wrestling with complex data, applying machine learning models to explore the stories behind big data, and solving business problems to make people's lives better.

I will soon graduate from the Master's of Business Statistics program at the University of Maryland, where I am a straight-A student and have developed strong programming and statistics skills for tackling business problems involving big data.

I would like to share the projects I have completed in the data science field, along with blog posts on my thoughts about data science. Feel free to contact me if you have any questions while exploring the projects and blog posts.

Click "MORE" for each project's details and source code on GitHub.

Featured Projects

Image generation and transformation with GAN

Developed two generative adversarial networks that generate digits and transform pictures from one domain to another using a conditional GAN.
(Keras, GAN, AWS, EC2, MNIST, C-GAN, Computer Vision, Convolutional Neural Networks)

Hospital Return Rate Prediction

As a member of the analytics team, I use patient information to predict whether a patient will return to the hospital within 30 days of being discharged.
With these return predictions, a hospital or emergency center can prepare in advance to meet additional room requirements and make recommendations to alleviate overcrowding.
(R, Healthcare Industry, Classification Prediction Modeling, Logistic Regression, Random Forest, Ensemble Methods)

Movie Industry Development Analysis

As an analytics team, we analyze the development of the movie industry over the past 100 years. Our dataset includes basic movie information and the relevant Rotten Tomatoes scores. Our goal is to give new movie producers valuable insights into how the movie industry has changed and developed over that period and what factors might affect a movie's quality, helping them succeed at the start of their careers.

(Jupyter Notebook, Python, NumPy, Pandas, Matplotlib, Seaborn, Visualization)


Customer Relationship Database Management System

A database management system for HVAC Mechanics LLC. The system allows the company to efficiently manage customer, supplier, service, transaction, and request information, improving efficiency and strengthening the company's management.

(SQL, Lucidchart, Microsoft SQL Server Management Studio, Database Management, Relational Database)

Data Science for Good: City of Los Angeles

The content, tone, and format of job bulletins can influence the quality of the applicant pool. Overly specific job requirements may discourage diversity. The Los Angeles Mayor’s Office wants to reimagine the city’s job bulletins by using text analysis to identify needed improvements.
The goal is to (1) identify language that can negatively bias the pool of applicants; (2) improve the diversity and quality of the applicant pool; and/or (3) make it easier to determine which promotions are available to employees in each job class.

(Jupyter Notebook, Text Analysis, Natural Language Processing, Regular Expression)

Stock Prediction in Supply Chain Industry

After analyzing the problem, I found that delivery estimate accuracy is underperforming because of mislabeled items. In general, the buyable attribute of each ASIN is updated manually by in-stock managers, which is time-consuming and inefficient. With the growth of big data, machine learning modeling techniques are widely used in the IT industry to improve business performance and work efficiency.

(R, Binary Classification Prediction, Supply Chain Industry, Random Forest)

Stay Connected