Where does the error come from?


Disclaimer: this slide deck is adapted from Dr. Hung-yi Lee's machine learning course, http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17.html, and from http://cs229.stanford.edu/materials/ML-advice.pdf

Bias and variability: arrow shooting as an example. (Figure 3.14, Introduction to the Practice of Statistics, Sixth Edition, © 2009 W.H. Freeman and Company.)

Review: the average error on testing data can be separated into error due to "bias" and error due to "variance". We need to know where the error comes from, because a more complex model does not always lead to better performance on testing data.

Estimator: the ground truth is a function $\hat{f}$, with $\hat{y} = \hat{f}(x)$; in the Pokémon example, only Niantic knows $\hat{f}$. From training data we find $f^*$, which is an estimator of $\hat{f}$. The gap between $f^*$ and $\hat{f}$ is the error, and it comes from bias and variance.

Bias and variance of an estimator: estimating the mean. Let $x$ be a variable with mean $\mu$ and variance $\sigma^2$. Sample $N$ points $x^1, x^2, \dots, x^N$ and estimate $\mu$ by the sample mean $m = \frac{1}{N}\sum_n x^n$. Any particular $m \neq \mu$ (in the figure, the red point is the ground truth $\mu$ and $m_1, \dots, m_6$ are sample means from different draws), but the estimator is unbiased: $E[m] = E\!\left[\frac{1}{N}\sum_n x^n\right] = \frac{1}{N}\sum_n E[x^n] = \mu$.

Bias and variance of an estimator: variance of the sample mean. The estimator $m$ is unbiased, but how far a single $m$ lands from $\mu$ depends on the number of samples: $\mathrm{Var}[m] = \frac{\sigma^2}{N}$. With smaller $N$ the sample means scatter widely around $\mu$; with larger $N$ they concentrate tightly around it.
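
A minimal numpy sketch (my illustration, not from the slides; the normal distribution and the constants are arbitrary assumptions) that checks both facts by simulation, i.e. that $E[m] = \mu$ for every $N$ while $\mathrm{Var}[m] = \sigma^2/N$ shrinks as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0  # assumed true mean and standard deviation

def sample_means(n, trials=100_000):
    """Draw `trials` datasets of size n each; return their sample means."""
    x = rng.normal(mu, sigma, size=(trials, n))
    return x.mean(axis=1)

for n in (5, 50, 500):
    m = sample_means(n)
    # m.mean() stays near mu for every n (unbiased), while m.var()
    # tracks sigma**2 / n and shrinks as n grows.
    print(f"N={n}: E[m]~{m.mean():.3f}, Var[m]~{m.var():.4f}, "
          f"sigma^2/N={sigma**2 / n:.4f}")
```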

Bias and variance of an estimator: estimating the variance. Sample $N$ points $x^1, x^2, \dots, x^N$, compute $m = \frac{1}{N}\sum_n x^n$, and estimate $\sigma^2$ by $s = \frac{1}{N}\sum_n (x^n - m)^2$. This is a biased estimator: $E[s] = \frac{N-1}{N}\sigma^2 \neq \sigma^2$, so on average $s$ underestimates $\sigma^2$; increasing $N$ shrinks the bias.
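
Another small simulation sketch (again an illustration under assumed constants, not from the slides) confirming $E[s] = \frac{N-1}{N}\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0  # assumed true standard deviation

for n in (5, 50, 500):
    x = rng.normal(0.0, sigma, size=(100_000, n))
    s = x.var(axis=1, ddof=0)  # divide by N, like the slide's s
    # The empirical mean of s matches (N-1)/N * sigma^2, not sigma^2;
    # the bias fades as n grows.
    print(f"N={n}: E[s]~{s.mean():.4f}, "
          f"(N-1)/N*sigma^2={(n - 1) / n * sigma**2:.4f}")
```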

If we could repeat the whole experiment many times, each run would give a different $f^*$. Let $\bar{f} = E[f^*]$ denote their average. The bias is the gap between $\bar{f}$ and the truth $\hat{f}$; the variance is the spread of the individual $f^*$ around $\bar{f}$.
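
This picture corresponds to the standard bias-variance decomposition of the expected squared error. A sketch of the derivation (the irreducible noise term, which the slide does not draw, is omitted):

```latex
\begin{align*}
E\big[(f^* - \hat{f})^2\big]
  &= E\big[(f^* - \bar{f} + \bar{f} - \hat{f})^2\big] \\
  &= \underbrace{E\big[(f^* - \bar{f})^2\big]}_{\text{variance}}
   + \underbrace{(\bar{f} - \hat{f})^2}_{\text{bias}^2}
   + 2\,(\bar{f} - \hat{f})\,\underbrace{E\big[f^* - \bar{f}\big]}_{=\,0}.
\end{align*}
```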

Parallel universes: in every universe (universe 1, universe 2, universe 3, ...), we collect (catch) 10 Pokémon as training data and use them to find $f^*$.

In different universes we use the same model, $y = b + w \cdot x_{cp}$, but with different input data we obtain different $f^*$: the regression line fitted in universe 123 differs from the one fitted in universe 345.

$f^*$ in 100 universes: re-sample the training data 100 times and fit three different polynomial regression models: $y = b + w \cdot x_{cp}$; $y = b + w_1 x_{cp} + w_2 (x_{cp})^2$; and $y = b + w_1 x_{cp} + w_2 (x_{cp})^2 + w_3 (x_{cp})^3 + w_4 (x_{cp})^4 + w_5 (x_{cp})^5$.

Variance: the simple model $y = b + w \cdot x_{cp}$ (left) has larger bias but small variance across universes; the degree-5 model (right) has smaller bias but large variance. A simpler model is less influenced by the sampled data. Consider the extreme case $f(x) = 5$: it ignores the data entirely, so its variance is exactly zero.

Bias: with $\bar{f} = E[f^*]$, the question is whether the average of all the $f^*$ is close to $\hat{f}$. We don't actually know $\hat{f}$, so assume some ground truth for illustration: a model whose $\bar{f}$ lies far from it has large bias; one whose $\bar{f}$ lies close has small bias.

Black curve: the true function $\hat{f}$. Red curves: 5000 fitted $f^*$. Blue curve: the average of the 5000 $f^*$, i.e. $\bar{f}$. Shown for polynomial models of degree 1, 3, and 5.
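
A compact sketch of this 5000-universes experiment (illustrative only: since only Niantic knows the real $\hat{f}$, the sine curve and all constants here are assumptions). Low degrees give large bias and small variance; higher degrees reverse that:

```python
import numpy as np

rng = np.random.default_rng(2)
f_hat = lambda x: np.sin(2 * np.pi * x)  # stand-in for the unknown truth
x_grid = np.linspace(0, 1, 200)

def fit_universe(degree, n=10):
    """One 'universe': sample n noisy points, fit a polynomial of the given degree."""
    x = rng.uniform(0, 1, n)
    y = f_hat(x) + rng.normal(0, 0.3, n)
    return np.polyval(np.polyfit(x, y, degree), x_grid)

for degree in (1, 3, 5):
    fits = np.stack([fit_universe(degree) for _ in range(5000)])  # red curves
    f_bar = fits.mean(axis=0)                                     # blue curve
    bias2 = np.mean((f_bar - f_hat(x_grid)) ** 2)  # distance to black curve
    var = fits.var(axis=0).mean()                  # spread of the red curves
    print(f"degree={degree}: bias^2~{bias2:.3f}, variance~{var:.3f}")
```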

Bias: the simple model $y = b + w \cdot x_{cp}$ has large bias; the more complex model $y = b + w_1 x_{cp} + w_2 (x_{cp})^2$ has smaller bias. The function space gets bigger as the model gets more complicated, so it is more likely to contain, or come close to, the target $\hat{f}$.

Bias vs. variance: moving from a simple model to a complex one, the error from bias falls (large bias to small bias) while the error from variance rises (small variance to large variance). The observed error is their sum: when bias dominates, the model is underfitting; when variance dominates, it is overfitting. This is why we need to know where the error comes from.

What to do with large bias? Diagnosis: if your model cannot even fit the training examples, you have large bias (underfitting); if you can fit the training data but the error on testing data is large, you probably have large variance (overfitting). For large bias, redesign your model: add more features as input, or use a more complex model.

What to do with large variance? Collect more data: very effective, but not always practical (compare the fits with 10 examples vs. 100 examples); when real data are scarce, you can generate more, e.g. by simulation. Alternatively, use regularization, which smooths the fitted functions (no regularization, then some regularization, then more regularization) but may increase bias.
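
A rough sketch of the regularization effect (my illustration under the same assumed synthetic setup as above, with ridge regression as one concrete regularizer): as the penalty grows, the curves fitted on different resampled datasets vary less.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
f_hat = lambda x: np.sin(2 * np.pi * x)  # assumed true function
x_grid = np.linspace(0, 1, 100).reshape(-1, 1)

for alpha in (1e-8, 0.01, 1.0):  # no / some / more regularization
    fits = []
    for _ in range(200):  # 200 resampled training sets of 10 points each
        x = rng.uniform(0, 1, (10, 1))
        y = f_hat(x).ravel() + rng.normal(0, 0.3, 10)
        model = make_pipeline(PolynomialFeatures(5), Ridge(alpha=alpha))
        fits.append(model.fit(x, y).predict(x_grid))
    variance = np.stack(fits).var(axis=0).mean()
    print(f"alpha={alpha}: variance~{variance:.3f}")  # drops as alpha grows
```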

Model selection: there is usually a trade-off between bias and variance, so select the model that balances the two kinds of error to minimize total error. What you should NOT do: choose the model by its error on a testing set you hold. If Models 1, 2, and 3 score Err = 0.9, 0.7, and 0.5 on your own testing set, picking Model 3 does not guarantee Err = 0.5 on the real testing set (which is not in hand); there the error will typically be > 0.5.

Kaggle competition: the testing set is split into a public part and a private part. The public testing set is available during the competition for the leaderboard baseline; the private testing set is only used after the submission deadline. What will happen? If you pick Model 3 because Err = 0.5 on the public set and declare "I beat the baseline!", the private set typically answers "No, you don't": there the error will be > 0.5. See http://www.chioka.in/how-to-select-your-final-models-in-a-kaggle-competitio/

Cross validation: split the training set into a smaller training set and a validation set, and select the model by validation error (the validation set method). Using the results on the public testing data to tune your model is not recommended: it only makes your public-set score better than your private-set score. With a proper validation set, if Models 1, 2, and 3 score Err = 0.9, 0.7, and 0.5, you pick Model 3; its error on the public and private testing sets will still typically be > 0.5, but the selection no longer peeks at the testing data.

N-fold cross validation (e.g. N = 3): split the training set into three parts; in each round, train every candidate model on two parts and validate on the remaining one.

Round 1 (validate on part 3): Model 1 Err = 0.2, Model 2 Err = 0.4, Model 3 Err = 0.4
Round 2 (validate on part 2): Model 1 Err = 0.4, Model 2 Err = 0.5, Model 3 Err = 0.5
Round 3 (validate on part 1): Model 1 Err = 0.3, Model 2 Err = 0.6, Model 3 Err = 0.3
Average: Model 1 Err = 0.3, Model 2 Err = 0.5, Model 3 Err = 0.4

Select Model 1 by its average validation error, then retrain it on the full training set and evaluate on the public and private testing sets.
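
A short sketch of 3-fold model selection with scikit-learn (an illustration on made-up synthetic data, not the slide's numbers; the candidate models are polynomial fits of different degrees):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, (30, 1))  # toy training set
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, 30)

cv = KFold(n_splits=3, shuffle=True, random_state=0)
for degree in (1, 3, 5):  # three candidate models
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Mean squared error averaged over the 3 validation folds;
    # select the model with the lowest average, then retrain on all data.
    mse = -cross_val_score(model, x, y, cv=cv,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree={degree}: avg validation MSE~{mse:.3f}")
```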

Reference: C. M. Bishop, Pattern Recognition and Machine Learning, Section 3.2.