- The following projects were conducted exclusively in R. They ranged from exploring and visualizing data to applying data mining and machine learning algorithms and models to challenging real-world regression, classification and clustering problems.
- Developed classification models to maximize net profit and make mailings more cost-effective, including logistic regression, linear and quadratic discriminant analysis, K-nearest neighbors, regression tree, random forest and support vector machine
- Built seven regression models to predict donation amounts and minimize mean squared prediction error, including least squares, stepwise, best subset, ridge, lasso and principal components regression
- Compared multinomial logit regression, linear and quadratic discriminant analysis, and random forest (RF) to predict the manner in which participants performed the exercise, and identified RF as the optimal classification model with more than 95% accuracy
- Conducted principal components analysis to choose appropriate predictors
- Developed regression models to predict the response from ten predictor variables, including least squares, stepwise, best subset, ridge and lasso regression
- Identified interacting genes from protein-protein interaction data using classification methods, including logistic regression, linear and quadratic discriminant analysis, K-nearest neighbors and generalized additive models
- Practiced the whole consulting process: communication, data analysis and reporting
- Wrote an R function that reads all Excel sheets and puts them in a list
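
As an illustration of the last item, a function of that kind might look roughly like the sketch below. It is not the original code: it assumes the `readxl` package is installed, and the name `read_all_sheets` is hypothetical.

```r
# Sketch only: assumes the readxl package is available.
# read_all_sheets() is a hypothetical name, not the original function.
library(readxl)

read_all_sheets <- function(path) {
  # Names of every sheet in the workbook, in workbook order
  sheet_names <- excel_sheets(path)
  # Read each sheet into its own data frame (tibble)
  sheets <- lapply(sheet_names, function(s) read_excel(path, sheet = s))
  # Key the list elements by sheet name for easy lookup
  names(sheets) <- sheet_names
  sheets
}

# Usage with the example workbook bundled with readxl
example_path <- readxl_example("datasets.xlsx")
sheets <- read_all_sheets(example_path)
```

Naming the list elements after the sheets allows access such as `sheets[["iris"]]` rather than relying on sheet position.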