Subjects Review
Introduction to Statistical Learning
Midterm: Thursday, October 15th, :00-16:00, ADV2
Chp 1
Interpreting some important plots
– Histogram
– Box plot
– Linear function
– Quadratic function
Type of data
– Numeric values
– Nominal values
Chp 2
What is statistical learning? Why is it important?
What kinds of tasks does it include?
– Supervised vs. unsupervised learning
– Regression vs. classification
– Interpretability vs. flexibility
What are the differences between machine learning and statistical learning?
How to fit the model?
– Mean squared error (MSE)
k-nearest neighbors classification example
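For reference, the MSE and the k-nearest neighbors class estimate are usually written as below. The notation is an assumption, not taken from the slides: \hat{f} is the fitted model, (x_i, y_i) are the n observations, and N_0 is the set of the K training points closest to a test point x_0.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \hat{f}(x_i) \bigr)^2

\Pr(Y = j \mid X = x_0) \approx \frac{1}{K} \sum_{i \in N_0} I(y_i = j)
```

KNN assigns x_0 to the class j with the largest estimated probability.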
Chp 3
Hypothesis testing: H0 vs. H1 on linear regression coefficients and response
Linear model performance
– Residual standard error
– R² statistic
– F statistic
How to decide important variables
– Forward selection
– Backward selection
– Mixed selection
Collinearity problem
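The three performance measures can be summarized as follows. The symbols are assumed here rather than taken from the slides: n observations, p predictors, fitted values \hat{y}_i, and sample mean \bar{y}.

```latex
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2

\mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n - p - 1}}, \qquad
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, \qquad
F = \frac{(\mathrm{TSS} - \mathrm{RSS}) / p}{\mathrm{RSS} / (n - p - 1)}
```

The F statistic tests H0: all p slope coefficients are zero, against H1: at least one is nonzero.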
Chp 4
What are the limitations of linear regression?
How to extend from linear regression to logistic regression?
– From probability to logit
– From logit to regression
– Use the logistic function to map back to a probability
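The three steps can be sketched for a single predictor X, writing p(X) = Pr(Y = 1 | X). The coefficient names \beta_0 and \beta_1 are assumed notation.

```latex
% Probability to odds to logit
\frac{p(X)}{1 - p(X)} \in (0, \infty), \qquad
\log \frac{p(X)}{1 - p(X)} \in (-\infty, \infty)

% The logit is modeled as a linear regression
\log \frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X

% The logistic function maps the linear predictor back to a probability
p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
```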
Chp 5
Validation set
– Training and test set
Cross validation
– LOOCV
– k-fold
– Advantages and disadvantages
Bootstrap
Confusion matrix
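A minimal k-fold cross-validation sketch in base R. The built-in mtcars data and the model mpg ~ wt are stand-ins chosen for illustration, not course material.

```r
# k-fold cross-validation by hand (illustrative; mtcars and mpg ~ wt are assumptions)
set.seed(1)                                  # reproducible fold assignment
k <- 5
n <- nrow(mtcars)
folds <- sample(rep(1:k, length.out = n))    # random fold label for every row

fold_mse <- numeric(k)
for (j in 1:k) {
  train <- mtcars[folds != j, ]              # fit on the other k - 1 folds
  test  <- mtcars[folds == j, ]              # evaluate on the held-out fold
  fit   <- lm(mpg ~ wt, data = train)
  pred  <- predict(fit, newdata = test)
  fold_mse[j] <- mean((test$mpg - pred)^2)   # test MSE for this fold
}

cv_mse <- mean(fold_mse)                     # CV estimate: average of the fold MSEs
cv_mse
```

Setting k <- n leaves out one observation per fold, which gives LOOCV.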
R Programming
Load data
Summary
Correlation of variables
Plotting
Linear model
– Coefficient significance level
– Model quality
– Cross validation
Generalized linear model (family = binomial)
– Coefficient significance level
– Model quality
– Cross validation
– Contrast
– Confusion matrix
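A short sketch of this workflow in base R. It uses the built-in mtcars data as a stand-in; the variable choices, the 0.5 cutoff, and the names trans, lm_fit, glm_fit, and conf_mat are assumptions made for illustration.

```r
# Illustrative workflow on mtcars (a stand-in for the course data)
data(mtcars)                                  # load data
vars <- c("mpg", "wt", "hp")
summary(mtcars[, vars])                       # summary statistics
cor(mtcars[, vars])                           # correlation of variables
plot(mpg ~ wt, data = mtcars)                 # scatter plot

# Linear model: summary() reports coefficient p-values,
# residual standard error, R-squared and the F statistic
lm_fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(lm_fit)

# Logistic regression on a two-level factor response
mtcars$trans <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("automatic", "manual"))
contrasts(mtcars$trans)                       # shows which level is coded as 1
glm_fit <- glm(trans ~ wt, data = mtcars, family = binomial)
summary(glm_fit)

# Confusion matrix at an assumed 0.5 probability cutoff
prob <- predict(glm_fit, type = "response")   # estimated P(trans == "manual")
pred <- ifelse(prob > 0.5, "manual", "automatic")
conf_mat <- table(predicted = pred, actual = mtcars$trans)
conf_mat
accuracy <- mean(pred == mtcars$trans)        # training accuracy, not a test estimate
```

The accuracy here is computed on the training data, so it is optimistic; cross validation would give a fairer estimate.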