Overfitting
Study Guide for ES205, Yu-Chi Ho and Jonathan T. Lee, Jan. 9, 2001
Outline: Simple Example, Fitting, Cause of Overfitting, Resolution
Simple Example. Overfitting: the fitted model is too complex.
Overfitting is the phenomenon in which the model used to fit the data is more complex than the true model that generated them. Here is a simple example. Suppose the three x's in the figure are noisy observations from a linear model, represented by the straight line. If we fit the three points with a quadratic model, we get a perfect fit, represented by the curve; but the curve is tracking the noise rather than the underlying line.
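The example above can be reproduced numerically. The sketch below uses NumPy and an assumed true line y = 2x + 1 (neither appears in the slides): a quadratic has three parameters, so it passes exactly through three data points, while the straight-line fit leaves a residual.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed true model: the line y = 2x + 1; three noisy observations (the x's).
x = np.array([0.0, 1.0, 2.0])
y = 2 * x + 1 + rng.normal(scale=0.5, size=3)

# Degree-2 fit: three parameters for three points -> interpolates exactly.
quad = np.polyfit(x, y, deg=2)
residual_quad = np.max(np.abs(np.polyval(quad, x) - y))

# Degree-1 fit: two parameters for three points -> nonzero residual.
lin = np.polyfit(x, y, deg=1)
residual_lin = np.max(np.abs(np.polyval(lin, x) - y))
```

The quadratic's residual is zero to numerical precision, which is exactly the "perfect fit" the slide describes; the line's residual reflects the noise it refuses to chase.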
Simple Example (cont.). Overfitting leads to poor predictive power.
A consequence of overfitting is that the overfitted model has very poor predictive power. For example, the new observation, marked by the "x" in the box, lies far from the prediction of the overfitted quadratic model, whereas the difference between the linear model and the new point is much smaller.
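The loss of predictive power can be quantified by averaging prediction errors at a new point over many noise draws. This is a sketch with assumed values not taken from the slides (true line y = 2x + 1, noise scale 0.5, new point at x = 4): the interpolating quadratic's average error at the new point is much larger than the straight line's.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([0.0, 1.0, 2.0])   # training inputs
x_new, sigma, trials = 4.0, 0.5, 2000

def true_model(t):
    return 2 * t + 1            # assumed true linear model

err_quad = err_lin = 0.0
for _ in range(trials):
    # Fresh noisy observations of the three training points.
    y = true_model(x) + rng.normal(scale=sigma, size=3)
    # Absolute prediction error of each fit at the unseen point.
    err_quad += abs(np.polyval(np.polyfit(x, y, 2), x_new) - true_model(x_new))
    err_lin += abs(np.polyval(np.polyfit(x, y, 1), x_new) - true_model(x_new))

err_quad /= trials
err_lin /= trials
```

The quadratic fits the training points perfectly on every trial, yet predicts the new point far worse on average, because its extra parameter amplifies the noise when extrapolating.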
Fitting. A common problem is to fit a model to existing data: regression, training neural networks, approximation, data mining, learning.
All of the items listed above can be thought of as fitting a model to existing data, and all of them can potentially run into the problem of overfitting.
Cause of Overfitting. Noise in the system: greater variability in data. Complex model: many parameters, more degrees of freedom, greater variability.
Most of the data we collect or observe contain noise. Noise creates greater variability in the observed data compared to the true underlying data, so in that sense the noisy data are "complex." A more complex model has more degrees of freedom, so fitting it to the "complex" data produces a smaller error than fitting the simple model to the same data. In other words, the perceived gain in accuracy from the complex model comes from capturing the variability caused by noise. Hence an overfitted model has little predictive power, since it is predicting the wrong thing: noise. That is why a more parsimonious model is often more robust (the parsimony principle). On the other hand, underfitting is also a concern: if the model is not complex enough to capture the dynamics of the data, we again have very little predictive power.
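The mechanism described here, that extra degrees of freedom always reduce the fitting error even when the added flexibility only chases noise, can be illustrated with a short sketch (again assuming NumPy and a made-up true line, neither from the slides): training error is nonincreasing in polynomial degree, and reaches numerical zero once the parameter count equals the number of data points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Six noisy samples from an assumed true line y = 2x + 1.
x = np.arange(6.0)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)

# Training RMSE for polynomial degrees 1..5 (2..6 parameters).
train_rmse = []
for deg in range(1, 6):
    coeffs = np.polyfit(x, y, deg)
    resid = np.polyval(coeffs, x) - y
    train_rmse.append(float(np.sqrt(np.mean(resid ** 2))))
```

Each added degree can only lower the residual (nested least-squares models), and the degree-5 fit interpolates all six points exactly; the apparent "improvement" past degree 1 is the model absorbing noise, which is precisely why training error alone cannot detect overfitting.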
Resolution. More data points: average out the noise. Multiple models.
As a rule of thumb, use ten times as many data points as there are model parameters to be estimated. For example, if you are fitting a quadratic model, there are three parameters to be determined, so use at least 30 data points to estimate the parameters of the best-fitting quadratic. Another resolution is to fit the data to multiple models and compare them.
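The ten-points-per-parameter rule of thumb is simple enough to encode as a small helper; `min_samples` is a hypothetical name introduced here for illustration, not something from the slides.

```python
def min_samples(num_params: int, factor: int = 10) -> int:
    """Rule-of-thumb minimum sample size: `factor` data points per
    free model parameter (the slides suggest factor = 10)."""
    if num_params < 1:
        raise ValueError("num_params must be a positive integer")
    return factor * num_params

# A quadratic a*x^2 + b*x + c has three free parameters,
# so the rule of thumb calls for at least 30 data points.
needed = min_samples(3)
```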