CROSS-VALIDATION AND MODEL SELECTION
Many slides are from: Dr. Thomas Jensen (Expedia.com) and Prof. Olga Veksler (CS9840 - Learning and Computer Vision).


How to check if a model fit is good? The R² statistic has become the almost universally standard measure of model fit in linear models. What is R²? It is the proportion of the variance in the dependent variable that the model explains: one minus the ratio of the model's residual error to the total variance. Hence the lower the error, the higher the R² value.
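
In symbols (a standard formulation, with SS_res the residual sum of squares and SS_tot the total sum of squares):

R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

so a model whose residual error is small relative to the total variance has R² close to 1.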

OVERFITTING Modeling techniques tend to overfit the data. Multiple regression: every time you add a variable to the regression, the model's R² goes up. Naïve interpretation: every additional predictive variable helps to explain yet more of the target's variance. But that can't be true! Left to its own devices, multiple regression will fit too many patterns; this is one reason why modeling requires subject-matter expertise.
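
A minimal sketch of this effect (the data and names here are illustrative assumptions, not from the slides): fitting ordinary least squares to pure noise still drives the training R² upward with every added column.

```python
# Training R^2 never decreases as predictors are added - even pure-noise ones.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100
y = rng.normal(size=n)            # target unrelated to any predictor
X = rng.normal(size=(n, 10))      # ten pure-noise predictors

for k in range(1, 11):
    model = LinearRegression().fit(X[:, :k], y)
    print(f"{k} predictors: training R^2 = {model.score(X[:, :k], y):.3f}")
```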

OVERFITTING Error on the dataset used to fit the model can be misleading: it doesn't predict future performance. Too much complexity can diminish the model's accuracy on future data; this is sometimes called the Bias-Variance Tradeoff.

OVERFITTING What are the consequences of overfitting? "Overfitted models will have high R² values, but will perform poorly in predicting out-of-sample cases."

WHY DO WE NEED CROSS-VALIDATION? R-squared, also known as the coefficient of determination, is a popular measure of quality of fit in regression. However, it does not offer any significant insight into how well our regression model can predict future values. When a multiple linear regression (MLR) equation is to be used for prediction purposes, it is useful to obtain empirical evidence of its generalizability, i.e. its capacity to make accurate predictions for new samples of data. This process is sometimes referred to as "validating" the regression equation.

One way to address this issue is to literally obtain a new sample of observations. That is, after the MLR equation is developed from the original sample, the investigator conducts a new study, replicating the original one as closely as possible, and uses the new data to assess the predictive validity of the MLR equation. This procedure is usually viewed as impractical because of the requirement to conduct a new study to obtain validation data, as well as the difficulty in truly replicating the original study. An alternative, more practical procedure is cross-validation.

CROSS-VALIDATION In cross-validation the original sample is split into two parts. One part is called the training (or derivation) sample, and the other part is called the validation (or validation + testing) sample. 1) What portion of the sample should be in each part? If the sample size is very large, it is often best to split the sample in half. For smaller samples, it is more conventional to split the sample such that 2/3 of the observations are in the derivation sample and 1/3 are in the validation sample.
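
A minimal sketch of the 2/3 derivation / 1/3 validation split (the synthetic data and variable names are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                    # illustrative feature matrix
y = X @ rng.normal(size=5) + rng.normal(size=300)

# 2/3 derivation sample, 1/3 validation sample, assigned at random
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=1/3, random_state=42)
```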

CROSS-VALIDATION 2) How should the sample be split? The most common approach is to divide the sample randomly, thus theoretically eliminating any systematic differences. One alternative is to define matched pairs of subjects in the original sample and to assign one member of each pair to the derivation sample and the other to the validation sample. Modeling of the data uses one part only; the model selected for this part is then used to predict the values in the other part of the data. A valid model should show good predictive accuracy. One thing that R-squared offers no protection against is overfitting. Cross-validation, on the other hand, by allowing us to have cases in our testing set that differ from the cases in our training set, inherently offers protection against overfitting.

CROSS-VALIDATION – THE IDEAL PROCEDURE 1. Divide the data into three sets: training, validation, and test sets. 2. Find the optimal model on the training set, using the validation set to choose among candidate models. 3. See how well the chosen model can predict the test set. 4. The test error gives an unbiased estimate of the predictive power of the model.
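
A sketch of this three-way split (the 60/20/20 proportions and synthetic data are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ rng.normal(size=5) + rng.normal(size=300)

# first carve off 40%, then split it half-and-half into validation and test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)
```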

TRAINING/TEST DATA SPLIT We talked about splitting data into training/test sets: training data is used to fit parameters; test data is used to assess how the classifier generalizes to new data. What if the classifier has "non-tunable" parameters? A parameter is "non-tunable" if tuning (or training) it on the training data leads to overfitting.

TRAINING/TEST DATA SPLIT What about test error? It seems appropriate: degree 2 is the best model according to the test error. Except, what do we report as the test error now? Test error should be computed on data that was not used for training at all, but here the "test" data was used for training, i.e. for choosing the model.

VALIDATION DATA The same question arises when choosing among several classifiers: our polynomial degree example can be looked at as choosing among 3 classifiers (degree 1, 2, or 3). Solution: split the labeled data into three parts.

TRAINING/VALIDATION

Training/Validation/Test Data: the validation data is used to choose among the models, and d = 2 is chosen; the test error (1.3) is then computed for d = 2 on the test data.
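
A sketch of this degree-selection procedure (the synthetic data and the 50/20/20 split are illustrative assumptions): fit polynomials of degree 1-3 on the training set, pick the degree with the lowest validation error, then report that one model's error on the test set.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=90)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=1.0, size=90)  # truly quadratic

x_tr, y_tr = x[:50], y[:50]      # training set: fit each candidate degree
x_va, y_va = x[50:70], y[50:70]  # validation set: choose the degree
x_te, y_te = x[70:], y[70:]      # test set: report the final error

def mse(coeffs, xs, ys):
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

best_d = min(range(1, 4), key=lambda d: mse(np.polyfit(x_tr, y_tr, d), x_va, y_va))
coeffs = np.polyfit(x_tr, y_tr, best_d)
print(f"chosen degree d = {best_d}, test MSE = {mse(coeffs, x_te, y_te):.2f}")
```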

LOOCV (Leave-one-out Cross Validation) For k = 1 to R: 1. Let (x_k, y_k) be the k-th example. 2. Temporarily remove (x_k, y_k) from the dataset. 3. Train on the remaining R-1 examples. 4. Note the prediction error on (x_k, y_k). When all R points have been held out once, report the mean error.
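
A minimal LOOCV sketch (the linear model and synthetic data are illustrative assumptions): each of the R examples is held out once, the model is trained on the remaining R-1, and the held-out squared errors are averaged.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=30)

errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])   # train on R-1 points
    errors.append((model.predict(X[test_idx])[0] - y[test_idx][0]) ** 2)

print(f"LOOCV mean squared error: {np.mean(errors):.3f}")
```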

LOOCV for Quadratic Regression

LOOCV for Join The Dots

Which kind of Cross Validation?

K-FOLD CROSS VALIDATION Since data are often scarce, there might not be enough to set aside for a validation sample. To work around this issue, k-fold CV works as follows (see the sketch below): 1. Split the sample into k subsets of equal size. 2. For each fold, estimate a model on all the subsets except one. 3. Use the left-out subset to test the model, by calculating a CV metric of choice. 4. Average the CV metric across subsets to get the CV error. This has the advantage of using all the data for estimating the model; however, finding a good value for k can be tricky.
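
A sketch of the four steps with k = 5 (the model, data, and choice of RMSE as the CV metric are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=100)

fold_scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])   # steps 1-2
    resid = model.predict(X[test_idx]) - y[test_idx]
    fold_scores.append(np.sqrt(np.mean(resid ** 2)))             # step 3: RMSE per fold

print(f"5-fold CV RMSE: {np.mean(fold_scores):.3f}")             # step 4: average
```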

K-fold Cross Validation Example 1. Split the data into 5 samples. 2. Fit a model to the training samples and use the held-out test sample to calculate a CV metric. 3. Repeat the process for the next sample, until all samples have been used to either train or test the model.
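
The same 5-fold procedure can be run as a one-liner with scikit-learn (the scorer name "neg_root_mean_squared_error" assumes scikit-learn >= 0.22; the data is illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=100)

scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring="neg_root_mean_squared_error")
print(f"5-fold CV RMSE: {-scores.mean():.3f}")
```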

Improving cross-validation Even better: repeated cross-validation. Example: 10-fold cross-validation is repeated 10 times and the results are averaged (to reduce the variance of the estimate).
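
A sketch of repeated cross-validation: 10-fold CV repeated 10 times, with all 100 fold scores averaged (the model and data are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=200)

cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv,
                         scoring="neg_root_mean_squared_error")
print(f"10x10-fold CV RMSE: {-scores.mean():.3f} (+/- {scores.std():.3f})")
```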

Cross Validation - Metrics How do we determine whether one model is predicting better than another?

Cross Validation Metrics
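
For reference, standard definitions of the two metrics reported below, with y_i the observed and \hat{y}_i the predicted value over n held-out cases:

\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2},
\qquad
\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|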

Best Practice for Reporting Model Fit 1. Use cross-validation to find the best model. 2. Report the RMSE and MAPE statistics from the cross-validation procedure. 3. Report the R² from the model as you normally would. The added cross-validation information will allow one to evaluate not only how much variance can be explained by the model, but also the predictive accuracy of the model. Good models should have high predictive AND explanatory power!
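
A sketch of this reporting recipe (the scorer name "neg_mean_absolute_percentage_error" assumes scikit-learn >= 0.24; the model and data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# intercept of 5.0 keeps y away from zero so MAPE stays well-behaved
y = 5.0 + X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=200)

cv_results = cross_validate(
    LinearRegression(), X, y, cv=10,
    scoring=("neg_root_mean_squared_error", "neg_mean_absolute_percentage_error"))

print(f"CV RMSE: {-cv_results['test_neg_root_mean_squared_error'].mean():.3f}")
print(f"CV MAPE: {-cv_results['test_neg_mean_absolute_percentage_error'].mean():.3%}")
print(f"in-sample R^2: {LinearRegression().fit(X, y).score(X, y):.3f}")
```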