
Lecture 19. Review CS 109A/AC 209A/STAT 121A Data Science: Harvard University Fall 2016 Instructors: P. Protopapas, K. Rader, W. Pan

Announcements
- Lecture on Wednesday is on Experimental Design, plus more review
- Labs on Thursday and Friday will review material for midterm #2
- No quizzes
- Monday Nov. 28: guest lecture by Jeff Palmucci, Director of Machine Intelligence at TripAdvisor

Start with some data

Calculate the mean and covariance for each class, assuming Normal distributions.
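A minimal sketch of this estimation step (the function name `fit_gaussians` is ours, not from the lecture): for each class, compute the sample mean and sample covariance under the Normal assumption.

```python
import numpy as np

def fit_gaussians(X, y):
    """Per-class mean vector and covariance matrix, assuming each
    class is Normally distributed. X: (n, d) array, y: (n,) labels."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params
```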

Now forget the training data.

Original distributions’ mean and cov for each class.

We only care about the means and covariances.

Prediction: measure the distance from each class mean in units of the covariance (the Mahalanobis distance).
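A hedged sketch of this prediction rule: assign a point to the class whose mean is closest in Mahalanobis distance. A full QDA rule would also include the log-determinant of the covariance and the class priors; this sketch omits them, and `predict_class`/`params` are illustrative names matching the estimation sketch above.

```python
import numpy as np

def predict_class(x, params):
    """Assign x to the class whose mean is closest in Mahalanobis
    distance: d^2 = (x - mu)^T Sigma^{-1} (x - mu)."""
    best_c, best_d2 = None, np.inf
    for c, (mu, cov) in params.items():
        diff = x - mu
        d2 = diff @ np.linalg.solve(cov, diff)  # distance in units of cov
        if d2 < best_d2:
            best_c, best_d2 = c, d2
    return best_c
```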

[Figure: classification regions for the election example, labeled Trump and Clinton; one region has more blue points, the other more red.]

When the classes do not share the same covariance matrix Σ: QDA.
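A small sketch contrasting the two in scikit-learn (the toy data here is invented for illustration): LDA fits a single shared covariance, QDA one covariance per class.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(0)
# Two classes with deliberately different covariances
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], 200)
X1 = rng.multivariate_normal([2, 2], [[3.0, 1.0], [1.0, 2.0]], 200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

lda = LinearDiscriminantAnalysis().fit(X, y)     # one shared covariance
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # one covariance per class
```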

When do we need to regularize?
- Overfitting
- Collinearity
- Lasso/Ridge/random sampling/dropout (a Lasso/Ridge sketch follows below)
- Model selection: when, what, how?
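A minimal Lasso/Ridge sketch (toy data; the α values are arbitrary): both shrink the coefficients, and Lasso can additionally zero out predictors entirely.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Toy data: 20 predictors, only 3 of which actually matter
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=100)

for alpha in [0.01, 0.1, 1.0]:
    for model in (Lasso(alpha=alpha), Ridge(alpha=alpha)):
        score = cross_val_score(model, X, y, cv=5).mean()  # R^2 by default
        print(type(model).__name__, alpha, round(score, 3))
```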

Overfitting. How do we know we overfit?
- Test error is rising as a function of some tuning parameter that captures the flexibility of the model
- A model has overfit the training data when its test error is larger than its training error
- The model is too complex: how many parameters vs. how many training points
- The variance of the training or testing error is too large
- Trade-off between bias and variance

Test error is rising as a function of some tuning parameter that captures the flexibility of the model. This looks good, and it holds almost everywhere, but the problem with this definition is that it is not testable for a single model. That is, if I hand you a trained model f, can you tell me whether it is overfit? Do we have to sweep the hyperparameters (which may or may not be sweepable)?

A model has overfit the training data when its test error is larger than its training error. [Figure: training and test error vs. model flexibility, with regions labeled Underfit, Best model, and Overfit. Taken from: http://nlpers.blogspot.com/2015/09/overfitting.html]

A model has overfit the training data when its test error is larger than its training error. [Figure: the same error curves, annotated Underfit and Overfit. Taken from: http://nlpers.blogspot.com/2015/09/overfitting.html]

[Figure: distributions of the training and test errors, annotated Overfit and Underfit.] Problem: the distributions of the errors are estimates; their variances are overestimated for the test set and underestimated for the training set.

How not to overfit?

| Method | Why (source of overfitting) | How (to control it) |
| --- | --- | --- |
| Linear regression | Too many predictors | Shrinkage or subset selection |
| Linear regression (polynomial terms) | Degree of the polynomial | Shrinkage or subset selection |
| KNN | Small K | Control K |
| Logistic regression | Same as linear regression | Same as linear regression, just in the odds space |
| QDA | Different covariance matrices | Control how much common vs. individual covariance |
| Decision trees | Depth of the tree | Control max_depth and the maximum number of leaves via pruning |
| Boosting | Number of iterations (B) | Control B or the learning rate |

(Two of these controls are sketched below.)
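A sketch of the last two rows of the table (the hyperparameter values are ours, chosen only for illustration): limiting tree depth, and controlling the number of boosting iterations B and the learning rate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Decision tree: cap the depth so the tree cannot memorize the data
tree = DecisionTreeClassifier(max_depth=3)

# Boosting: control the number of iterations B and the learning rate
boost = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)

for name, model in [("tree", tree), ("boosting", boost)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```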

Model selection using AIC/BIC vs. regularization. We know that the more predictors, or the more terms in the polynomial, we have, the more likely we are to overfit. Subset selection using AIC/BIC simply removes terms in the hope that the overfitting goes away (which makes sense): there is no testing on the training set and no test for overfitting (no matter which definition of overfitting we use). Shrinkage methods, on the other hand, hope that by restricting the coefficients we can avoid overfitting. In the end: we ignore overfitting and just find the best model on the test set.
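A hedged sketch of AIC-based subset selection using statsmodels (toy data; exhaustive search over all predictor subsets is only feasible when the number of predictors is small):

```python
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X[:, 0] + 2 * X[:, 1] + rng.normal(size=100)

# Exhaustive search: keep the predictor subset with the lowest AIC
best_aic, best_subset = np.inf, None
for k in range(1, X.shape[1] + 1):
    for subset in itertools.combinations(range(X.shape[1]), k):
        fit = sm.OLS(y, sm.add_constant(X[:, list(subset)])).fit()
        if fit.aic < best_aic:
            best_aic, best_subset = fit.aic, subset
print(best_subset, best_aic)
```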

Imbalanced Data
- Subsample the majority class
- Oversample the minority class
- Re-weight sample points
- Use clustering to reduce the majority class
- Re-calibrate the classifier output
(A resampling sketch follows below.)
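A small resampling sketch using scikit-learn's `resample` utility (the toy arrays are ours): oversample the minority class with replacement, or subsample the majority class without.

```python
import numpy as np
from sklearn.utils import resample

# Toy imbalanced data: 900 majority points, 100 minority points
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = np.array([0] * 900 + [1] * 100)

X_maj, X_min = X[y == 0], X[y == 1]

# Oversample: draw minority points with replacement, up to the majority size
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

# Subsample: draw majority points without replacement, down to the minority size
X_maj_down = resample(X_maj, replace=False, n_samples=len(X_min), random_state=0)
```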

Imbalanced Data. [Figure: panels illustrating the problem, oversampling the minority class, and subsampling the majority class.]

Example: Random Forest with subsampling. [Figure: for each tree, subsample the training data, then train.]

Class weights. [Figure: SVM decision boundaries on imbalanced data, with and without class weighting; image: http://scikit-learn.org/stable/_images/sphx_glr_plot_separating_hyperplane_unbalanced_001.png]
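A sketch along the lines of that scikit-learn example (the blob parameters here are ours): fit an SVM with and without class weights on imbalanced data. `class_weight={1: 10}` makes mistakes on the rare class ten times as costly; `class_weight='balanced'` would instead weight inversely to class frequency.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Imbalanced blobs: 1000 points in the majority class, 100 in the minority
X, y = make_blobs(n_samples=[1000, 100], centers=[[0.0, 0.0], [2.0, 2.0]],
                  cluster_std=[1.5, 0.5], random_state=0)

plain = SVC(kernel="linear", C=1.0).fit(X, y)
# Penalize errors on the rare class (label 1) ten times more heavily
weighted = SVC(kernel="linear", C=1.0, class_weight={1: 10}).fit(X, y)
```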