1
Where does the error come from?
Disclaimer: This PPT is adapted from slides by Dr. Hung-yi Lee.
2
Bias and variability: Arrow shooting as an example
Figure 3.14 Introduction to the Practice of Statistics, Sixth Edition © 2009 W.H. Freeman and Company
3
Review: Average Error on Testing Data
A more complex model does not always lead to better performance on testing data. As model complexity grows, the average error on testing data decomposes into error due to "bias" and error due to "variance". We have to know where the error comes from.
4
Estimator
$\hat{y} = \hat{f}(x)$, where $\hat{f}$ is the ground truth; only Niantic knows $\hat{f}$. From training data, we find $f^*$: $f^*$ is an estimate of $\hat{f}$. The distance between $f^*$ and $\hat{f}$ comes from bias and variance.
5
Bias and Variance of Estimator
Estimate the mean of a variable $x$: assume the mean of $x$ is $\mu$ and the variance of $x$ is $\sigma^2$. Estimator of the mean $\mu$: sample $N$ points $x^1, x^2, \ldots, x^N$ and compute $m = \frac{1}{N}\sum_n x^n \neq \mu$. Individual estimates $m_1, \ldots, m_6$ scatter around the ground truth $\mu$ (the red point), but the estimator is unbiased: $E[m] = E\!\left[\frac{1}{N}\sum_n x^n\right] = \frac{1}{N}\sum_n E[x^n] = \mu$.
6
Bias and Variance of Estimator
Estimate the mean of a variable $x$: assume the mean of $x$ is $\mu$ and the variance of $x$ is $\sigma^2$. The estimator $m = \frac{1}{N}\sum_n x^n$ is unbiased, but its variance depends on the number of samples: $\mathrm{Var}[m] = \frac{\sigma^2}{N}$. With a larger $N$ the sample means concentrate tightly around $\mu$; with a smaller $N$ they spread out.
7
Bias and Variance of Estimator
Estimate the variance $\sigma^2$ of a variable $x$ (mean $\mu$, variance $\sigma^2$). Estimator of the variance: sample $N$ points $x^1, x^2, \ldots, x^N$, compute $m = \frac{1}{N}\sum_n x^n$, then $s = \frac{1}{N}\sum_n (x^n - m)^2$. The estimates $s_1, \ldots, s_6$ scatter around $\sigma^2$, but $s$ is a biased estimator: $E[s] = \frac{N-1}{N}\sigma^2 \neq \sigma^2$. Increasing $N$ shrinks the bias.
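A quick way to see both facts is a small simulation. Below is a minimal NumPy sketch (my illustration, not part of the original slides): it checks that the sample mean is unbiased with variance $\sigma^2/N$, and that the $\frac{1}{N}$ sample variance is biased by the factor $\frac{N-1}{N}$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, trials = 5.0, 2.0, 10, 100_000

# Each row is one experiment: N samples drawn from a distribution
# with mean mu and variance sigma^2.
x = rng.normal(mu, sigma, size=(trials, N))

m = x.mean(axis=1)                        # sample means, one per experiment
s = ((x - m[:, None]) ** 2).mean(axis=1)  # 1/N sample variances

print(m.mean())                           # ~= mu: E[m] = mu (unbiased)
print(m.var(), sigma**2 / N)              # ~= sigma^2 / N: Var[m] shrinks with N
print(s.mean(), (N - 1) / N * sigma**2)   # E[s] = (N-1)/N * sigma^2 (biased)
```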
8
$E[f^*] = \bar{f}$: Bias and Variance
If we could run the experiment several times, each run would give a different $f^*$. $\hat{f}$ is the truth; $\bar{f} = E[f^*]$ is the mean of the estimates. Bias is the distance from $\bar{f}$ to $\hat{f}$; variance is the spread of the $f^*$ around $\bar{f}$.
9
Parallel Universes
In all the universes (Universe 1, Universe 2, Universe 3, ...), we collect (catch) 10 Pokémon as training data to find $f^*$.
10
Parallel Universes
In different universes we use the same model, y = b + w ∙ xcp, but obtain different $f^*$: with different input data (e.g., Universe 123 vs. Universe 345), the fitted regression lines differ.
11
$f^*$ in 100 Universes
100 re-samples of the training data, fit with three different polynomial regression models:
y = b + w ∙ xcp
y = b + w1 ∙ xcp + w2 ∙ (xcp)^2
y = b + w1 ∙ xcp + w2 ∙ (xcp)^2 + w3 ∙ (xcp)^3 + w4 ∙ (xcp)^4 + w5 ∙ (xcp)^5
12
A simpler model is less influenced by the sampled data
Variance. Across the re-samples, the simple model y = b + w ∙ xcp shows small variance, while the degree-5 model y = b + w1 ∙ xcp + ... + w5 ∙ (xcp)^5 shows large variance: a simpler model is less influenced by the sampled data. Consider the extreme case f(x) = 5, which ignores the data entirely; its variance is exactly zero, at the cost of large bias.
13
Bias: $E[f^*] = \bar{f}$. If we average all the $f^*$, is the result close to $\hat{f}$?
We don't really know the true $\hat{f}$, so assume some function is $\hat{f}$ for illustration. A model whose $\bar{f}$ lands far from $\hat{f}$ has large bias; one whose $\bar{f}$ lands close has small bias.
14
Degree = 1, 3, 5. Black curve: the true function $\hat{f}$; red curves: 5000 fitted $f^*$; blue curve: the average of the $f^*$, i.e. $\bar{f}$. At degree 1 the blue curve sits far from the black one (large bias); at degree 5 it tracks the black curve closely (small bias), but the red curves spread widely (large variance).
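The picture is easy to reproduce numerically. The sketch below is my own construction (the true function, noise level, and sample size are assumptions, since the slides' Pokémon data is not available): it refits polynomials of degree 1, 3, and 5 across 5000 simulated "universes" and reports the squared bias of the average curve and the variance of the individual fits.

```python
import numpy as np

rng = np.random.default_rng(1)

def f_true(x):
    # Assumed ground-truth function; the slides' real f-hat is unknown.
    return np.sin(2 * np.pi * x)

x_grid = np.linspace(-1, 1, 50)
universes, n_points, noise = 5000, 10, 0.3

preds = {deg: [] for deg in (1, 3, 5)}
for _ in range(universes):
    x = rng.uniform(-1, 1, n_points)
    y = f_true(x) + rng.normal(0, noise, n_points)
    for deg in preds:
        coef = np.polyfit(x, y, deg)                # least-squares fit in this universe
        preds[deg].append(np.polyval(coef, x_grid))

for deg, p in preds.items():
    p = np.asarray(p)
    bias2 = np.mean((p.mean(axis=0) - f_true(x_grid)) ** 2)  # squared bias of the "blue curve"
    var = p.var(axis=0).mean()                                # spread of the "red curves"
    print(f"degree {deg}: bias^2 = {bias2:.3f}, variance = {var:.3f}")
```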
15
Bias
A simple model such as y = b + w ∙ xcp has large bias; a more complex model such as y = b + w1 ∙ xcp + w2 ∙ (xcp)^2 has smaller bias. The function space grows as the model gets more complicated: a larger space is more likely to contain the target $\hat{f}$, so the average fit can land close to it, whereas a small space may not contain $\hat{f}$ at all.
16
Bias v.s. Variance
Moving from a simple model to a complex model, the error from bias falls (large bias → small bias) while the error from variance rises (small variance → large variance). The observed error is the sum of the two: it is dominated by bias on the simple side (underfitting) and by variance on the complex side (overfitting). We have to know where the error comes from.
17
What to do with large bias?
Diagnosis:
If your model cannot even fit the training examples, then you have large bias (underfitting).
If you can fit the training data but see large error on the testing data, then you probably have large variance (overfitting).
For large bias, redesign your model: add more features as input, or use a more complex model.
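This diagnosis amounts to comparing training error with testing error. A minimal sketch on assumed synthetic data (my illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)

x_tr, y_tr = make_data(15)    # small training set
x_te, y_te = make_data(500)   # large held-out set standing in for testing data

for deg in (1, 5, 9):
    coef = np.polyfit(x_tr, y_tr, deg)
    tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    te = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
    # Large training error                          -> large bias (underfitting).
    # Small training error, much larger test error  -> large variance (overfitting).
    print(f"degree {deg}: train MSE = {tr:.3f}, test MSE = {te:.3f}")
```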
18
What to do with large variance?
Collect more data: very effective, but not always practical (compare the scattered fits from 10 examples with the stable fits from 100 examples); if you cannot collect more, you can create more data yourself, e.g., by simulation. Regularization: it smooths the fitted functions, going from no regularization to some regularization to more regularization, but may increase bias.
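As a hedged illustration of the regularization knob (scikit-learn's Ridge is my choice here; the slides do not name a library), increasing the penalty strength shrinks the weights, which makes the fitted function smoother and less sensitive to the 10 training points:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, (10, 1))                  # only 10 training examples
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.2, 10)

# alpha controls the regularization strength:
# ~0 = no regularization, 0.1 = some, 10 = more.
for alpha in (1e-6, 0.1, 10.0):
    model = make_pipeline(PolynomialFeatures(degree=5), Ridge(alpha=alpha))
    model.fit(x, y)
    w = model.named_steps["ridge"].coef_
    print(f"alpha={alpha:g}: max |weight| = {np.abs(w).max():.2f}")
```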
19
Model Selection
There is usually a trade-off between bias and variance: select the model that balances the two kinds of error to minimize the total error. What you should NOT do: train Models 1, 2, 3 on the training set, pick the one with the lowest error on your testing set (Model 3, Err = 0.5, versus 0.7 and 0.9), and expect the same performance on the real testing set, which is not in hand; there its error will generally be > 0.5.
20
Kaggle competition
In a Kaggle competition, the testing set is split into a public part and a private part: the public testing set is available for scoring against the baseline during the competition, while the private testing set is only used after the submission deadline. What will happen if you pick Model 3 because its public error (0.5) beats Models 1 and 2 (0.9, 0.7)? You may think "I beat the baseline!", but on the private set the error will typically be > 0.5. No, you don't.
21
Cross Validation
Split the training set into a training part and a validation set. Train Models 1, 2, 3 on the training part and compare their validation errors (0.9, 0.7, 0.5); pick the best one (Model 3), then retrain it on the full training set. Its errors on the public and private testing sets will both be > 0.5, but they will agree with each other. Using the results of the public testing data to tune your model is not recommended: you are only making the public set look better than the private set.
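A minimal sketch of the validation-set method, again on assumed synthetic data with polynomial degree standing in for "Model 1, 2, 3":

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 100)

# Hold out the last 20 points as a validation set.
x_train, y_train = x[:80], y[:80]
x_val, y_val = x[80:], y[80:]

# Compare candidate models on the validation set only.
val_err = {}
for deg in (1, 3, 5):
    coef = np.polyfit(x_train, y_train, deg)
    val_err[deg] = np.mean((np.polyval(coef, x_val) - y_val) ** 2)

best = min(val_err, key=val_err.get)
print(val_err, "-> pick degree", best)

# Retrain the chosen model on the full training set before testing.
final_coef = np.polyfit(x, y, best)
```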
22
N-fold Cross Validation (e.g., N = 3)
Split the training set into three folds; each model is trained on two folds and validated on the remaining one, and the three validation errors are averaged:

Split              Model 1   Model 2   Model 3
Train/Train/Val    0.2       0.4       0.4
Train/Val/Train    0.4       0.5       0.5
Val/Train/Train    0.3       0.6       0.3
Average            0.3       0.5       0.4

Model selection by N-fold CV: Model 1 has the lowest average validation error, so select it and retrain on the full training set before evaluating on the public and private testing sets.
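The procedure in the table, as a short sketch (synthetic data and candidate degrees are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 90)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 90)

N = 3                                          # number of folds
folds = np.array_split(rng.permutation(90), N)

for deg in (1, 3, 5):                          # candidate models
    errs = []
    for i in range(N):
        val = folds[i]                         # this fold validates...
        train = np.concatenate([folds[j] for j in range(N) if j != i])  # ...the rest train
        coef = np.polyfit(x[train], y[train], deg)
        errs.append(np.mean((np.polyval(coef, x[val]) - y[val]) ** 2))
    print(f"degree {deg}: average validation error = {np.mean(errs):.3f}")
```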
23
Reference: Bishop, Chapter 3.2