Published by William Shelton. Modified over 9 years ago.
1
Residuals Recall that the least-squares regression line makes the vertical distances from the points to the line as small as possible (specifically, it minimizes the sum of their squares). Because those vertical distances represent “leftover” variation in the response after fitting the regression line, these distances are called residuals.
2
Or in other words, a residual is the signed vertical distance from a point to the LSRL: positive when the point lies above the line, negative when it lies below.
3
Calculating a Residual One subject's NEA rose by 135 calories and he gained 2.7 kg of fat. The predicted gain ŷ for 135 calories comes from substituting x = 135 into the regression equation. The residual for this subject is therefore: residual = observed − predicted = y − ŷ = 2.7 − ŷ kg.
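The observed-minus-predicted calculation can be sketched in a few lines of Python. The intercept and slope below are hypothetical placeholders, not the regression equation from the NEA study; only the 135 calories and the 2.7 kg come from the slide.

```python
def residual(observed, predicted):
    """Residual = observed y minus predicted y-hat."""
    return observed - predicted


def predict(x, intercept=4.0, slope=-0.005):
    """Predicted response from a straight line.

    The default intercept and slope are HYPOTHETICAL values chosen
    for illustration; they are not the NEA regression coefficients.
    """
    return intercept + slope * x


nea_change = 135      # calories (from the slide)
observed_gain = 2.7   # kg of fat (from the slide)

predicted_gain = predict(nea_change)
subject_residual = residual(observed_gain, predicted_gain)

# A negative residual means the point lies below the line.
print(predicted_gain, subject_residual)
```

With the real regression coefficients substituted for the placeholders, the same two function calls would reproduce the slide's worked example.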
4
Fat Gain & NEA (yet again!) Here are the residuals for all 16 data values from the NEA experiment: Although residuals can be calculated from any model that is fitted to the data, the residuals from the least-squares line have a special property: the sum of the least-squares residuals is always zero. (Try adding the numbers above: they add up to zero!)
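The zero-sum property is easy to check numerically. The sketch below fits a least-squares line to a small made-up data set (not the NEA data) and adds up the residuals; the helper name `fit_lsrl` is an assumption introduced here for illustration.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Illustrative data, NOT the 16 NEA observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_lsrl(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# The sum is zero up to floating-point rounding.
total = sum(residuals)
print(abs(total) < 1e-9)
```

Residuals that have been rounded for display, as on a slide, may add up to a value slightly different from zero.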
5
The line y = 0 corresponds to the regression line, and also marks the mean of our residuals (their sum is zero, so their mean is zero). The residual plot magnifies the deviations from the line to make patterns easier to see.
6
Residual Plots What to look for when examining a residual plot: 1. Residual plots should have no pattern.
7
Residual Plots What to look for when examining a residual plot: A curved pattern shows that the relationship may not be linear. Increasing spread about the line as x increases indicates the prediction will be less accurate for larger x values. Similarly, decreasing spread indicates the prediction will be less accurate for smaller x values.
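To see what a curved pattern looks like in numbers, here is a minimal sketch that fits a straight line to deliberately curved, made-up data (y = x²) and lists the residuals in order of x. The helper name `fit_lsrl` is an assumption introduced for this example.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data that follow a curve, not a line.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

intercept, slope = fit_lsrl(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Residuals are positive at both ends and negative in the middle:
# the U shape that signals a nonlinear relationship.
for x, r in zip(xs, residuals):
    print(x, round(r, 2))
```

Plotting these residuals against x would show the U shape directly; a good linear fit would instead scatter the residuals around zero with no visible structure.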
8
Residual Plots What to look for when examining a residual plot: 1. The residual plot should show no pattern. 2. The residuals should be relatively small in size.
9
The role of r² in regression A residual plot is a graphical tool for evaluating how well a linear model fits the data. Look at the residual plot first to see if a linear model is a good fit. If the linear model is a good fit, then there is also a numerical quantity that tells us how well the LSRL does at predicting values of the response variable y. It is r², the coefficient of determination.
10
The role of r² in regression r² is actually the correlation squared, but there's more to the story... The idea of r² is this: how much better is the least-squares line at predicting responses y than if we just used the mean of y as the prediction for every x?
11
The role of r² in regression Is the LSRL better at predicting the data values than the mean? r² tells us how much better. Here's the line that represents the y mean of our data. Here's our LSRL.
12
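Note: Remember we defined the variance back when we talked about standard deviation. r² compares the variation of the y values about their mean (the SST part of the equation) with the variation of the residuals about the LSRL (the SSE part of the equation). Here's the formula: r² = (SST − SSE) / SST = 1 − SSE / SST, where SST = Σ(y − ȳ)² and SSE = Σ(y − ŷ)².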
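The comparison between the mean and the LSRL can be worked through numerically. This sketch uses made-up data (not the NEA data) to compute SST, SSE and r², and then checks that the result matches the squared correlation; all variable names are illustrative.

```python
import math

# Illustrative data, NOT the NEA observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# Least-squares line of y on x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# SST: squared error when every prediction is the mean of y.
sst = syy
# SSE: squared error when predictions come from the LSRL.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r_squared = (sst - sse) / sst

# Correlation computed directly, for comparison.
r = sxy / math.sqrt(sxx * syy)

print(round(r_squared, 4), round(r ** 2, 4))
```

The two printed values agree because, for a least-squares line with an intercept, (SST − SSE) / SST is algebraically equal to the square of the correlation.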
13
For example, if r² = 0.606 (as it does in the NEA example), then about 61% of the variation in fat gain among the individual subjects is explained by the straight-line relationship between fat gain and NEA. The other 39% is individual variation among subjects that is not explained by the linear relationship.
14
When you report a regression, give r² as a measure of how successful the regression was in explaining the response. When you see a correlation, square it to get a better feel for the strength of the linear relationship.
15
Review Facts About Least-Squares Regression The distinction between explanatory and response variables is essential in regression. In the regression setting you must know clearly which variable is explanatory!
16
Review Facts About Least-Squares Regression There is a close connection between correlation and the slope of the LSRL. The slope is b = r · (s_y / s_x), where s_x and s_y are the standard deviations of x and y. This equation says that along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y.
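The link between the slope and the correlation can be verified on any data set. The sketch below uses made-up data and compares the slope obtained from the least-squares formula with r times the ratio of the standard deviations; the variable names are illustrative.

```python
import math
import statistics

# Illustrative data, NOT the NEA observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# Slope straight from the least-squares formula.
slope_ls = sxy / sxx

# Slope rebuilt from the correlation and the sample standard deviations.
r = sxy / math.sqrt(sxx * syy)
s_x = statistics.stdev(xs)
s_y = statistics.stdev(ys)
slope_from_r = r * s_y / s_x

print(math.isclose(slope_ls, slope_from_r))
```

Because r is never larger than 1 in absolute value, the predicted change in y, measured in standard deviations, is never larger than the change in x measured the same way.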
17
Review Facts About Least-Squares Regression The least-squares regression line of y on x always passes through the point (mean of x values, mean of y values).
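A quick numerical check of this fact, as a sketch on made-up data: the line is fitted from raw sums (the normal-equations form), and the prediction at the mean of x is then compared with the mean of y. All names are illustrative.

```python
# Illustrative data, NOT the NEA observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_xx = sum(x * x for x in xs)

# Least-squares slope and intercept from the normal equations.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

x_bar = sum_x / n
y_bar = sum_y / n

# Evaluate the fitted line at the mean of x.
predicted_at_x_bar = intercept + slope * x_bar

print(abs(predicted_at_x_bar - y_bar) < 1e-9)
```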
18
The correlation r describes the strength of a straight-line relationship. The square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.