M23- Residuals & Minitab. Department of ISM, University of Alabama. Residuals: A continuation of regression analysis
Lesson Objectives: Continue to build on regression analysis. Learn how residual plots help identify problems with the analysis.
Example 1: Sample of n = 5 students, Y = Weight in pounds, X = Height in inches. [Data table: Case, X, Y.] Prediction equation: $\widehat{Wt}$ = - … Ht. r-square = ? Std. error = ? To be found later. continued …
Example 1, continued. [Scatterplot: WEIGHT (Y) vs. HEIGHT (X) with the fitted line $\hat{Y}$ = - … X.] Residuals = distance from each point to the line, measured parallel to the Y-axis.
Calculation: For the i-th case, residual = observed value - estimated mean; that is, $e_i = y_i - \hat{y}_i$.
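The residual calculation can be sketched in a few lines of Python. This is a minimal illustration, not the Example 1 computation: the five height/weight pairs are hypothetical, and the helper names `fit_least_squares` and `residuals` are made up for this sketch.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Residual for each case: observed value minus fitted value."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data (NOT the Example 1 sample): heights in inches, weights in pounds.
heights = [64, 66, 68, 70, 72]
weights = [130, 145, 150, 165, 185]

a, b = fit_least_squares(heights, weights)
e = residuals(heights, weights, a, b)
for x, y, e_i in zip(heights, weights, e):
    print(f"X={x}  Y={y}  fitted={a + b * x:.2f}  residual={e_i:+.2f}")
```

A positive residual means the observed weight lies above the line; a negative residual means it lies below.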
Example 1, continued. Compute the fitted value and residual for the 4th person in the sample; i.e., X = 72 inches, Y = 207 lbs. Fitted value = $\hat{y}_4$ = ( ) = _________. Residual = $e_4 = y_4 - \hat{y}_4$ = __________.
Residual Plots: Scatterplot of residuals vs. the predicted means of Y, $\hat{Y}$; or vs. an X-variable.
Example 1, continued. [Scatterplot: WEIGHT vs. HEIGHT with the fitted line $\hat{Y}$ = - … X; the residual $e_4$ is marked.] Residuals = distance from each point to the line, measured parallel to the Y-axis.
Example 1, continued. Residual Plot [Residuals vs. HEIGHT]: $e_4$ is the residual for the 4th case. The regression line from the previous plot is rotated to horizontal.
Residual Plot: Scatterplot of residuals versus the predicted means of Y, $\hat{Y}$; or an X-variable; or Time. Expect random dispersion around a horizontal line at zero. Problems occur if there are: unusual patterns; unusual cases.
Residuals versus X: Good random pattern. [Plot: Residuals (horizontal line at 0) vs. X, or time.]
Residuals versus X: Outliers? [Plot: Residuals (horizontal line at 0) vs. X, or time.] Next step: ________ to determine if a recording error has occurred.
Residuals versus X: Nonlinear relationship. [Plot: Residuals (horizontal line at 0) vs. X, or time.] Next step: Add a “quadratic term,” or use “______.”
Residuals versus X: Variance is increasing. [Plot: Residuals vs. X, or time.] Next step: Stabilize variance by using “________.”
Residual Plots help identify: Unusual patterns: possible curvature in the data; variances that are not constant as X changes. Unusual cases: outliers; high leverage cases; influential cases.
Three properties of Residuals, illustrated with some computations.
Property 1. Y = Weight, X = Height; $\hat{Y}$ = - … X. [Table: X, Y, $\hat{Y}$, $e = Y - \hat{Y}$.] Find the sum of the residuals. Any small nonzero total is round-off error.
Properties of Least Squares Line: 1. Residuals always sum to zero: $\sum e_i = 0$.
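The slides state this property without deriving it. A short sketch of why it holds, assuming the fitted line has the form $\hat{y}_i = a + b x_i$ with an intercept $a$ chosen by least squares (the symbols $a$, $b$, and SSE$(a, b)$ are notation for this sketch): setting the derivative of the sum of squared residuals with respect to the intercept to zero forces the residuals to sum to zero.

```latex
\begin{align*}
\text{SSE}(a, b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial\,\text{SSE}}{\partial a} &= -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0 \\
\Longrightarrow \quad \sum_{i=1}^{n} e_i &= \sum_{i=1}^{n} (y_i - \hat{y}_i) = 0
\end{align*}
```

Dividing the middle equation by $-2n$ gives $\bar{y} = a + b\bar{x}$, so the same condition also places the point of means on the line.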
Property 2. Y = Weight, X = Height; $\hat{Y}$ = - … X. [Table: X, Y, $\hat{Y}$, $e = Y - \hat{Y}$, $e^2$.] Find the sum of squares of the residuals.
Properties of Least Squares Line: 1. Residuals always sum to zero. 2. This “least squares” line produces a smaller “Sum of squared residuals” than any other straight line can: $\sum e_i^2$ = SSE = … < “SSE for any other line”.
Property 3. [Scatterplot: WEIGHT vs. HEIGHT with the fitted line; marked point $\bar{X}$ = 68.4, $\bar{Y}$ = 159.]
Properties of Least Squares Line: 1. Residuals always sum to zero. 2. This “least squares” line produces a smaller “Sum of squared residuals” than any other straight line can. 3. The line always passes through the point $(\bar{x}, \bar{y})$.
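The three properties can be checked numerically. The sketch below uses hypothetical data (not the Example 1 sample) and made-up helper names; it compares the least squares line against a handful of arbitrarily perturbed lines, which illustrates Property 2 but does not prove it.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope*x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: heights in inches, weights in pounds.
xs = [64, 66, 68, 70, 72]
ys = [130, 145, 150, 165, 185]
a, b = fit_least_squares(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Property 1: residuals sum to zero (up to round-off error).
resid_sum = sum(y - (a + b * x) for x, y in zip(xs, ys))

# Property 2: any other line has a larger sum of squared residuals.
best_sse = sse(xs, ys, a, b)
shifts = [(1, 0), (-1, 0), (0, 0.1), (0, -0.1), (5, -0.05)]
other_sses = [sse(xs, ys, a + da, b + db) for da, db in shifts]

# Property 3: the line passes through the point of means.
passes_through_means = abs((a + b * x_bar) - y_bar) < 1e-9

print(f"sum of residuals = {resid_sum:.10f}")
print(f"least squares SSE = {best_sse:.2f}; other lines: {other_sses}")
print(f"passes through (x-bar, y-bar): {passes_through_means}")
```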
Illustration of unusual cases: Outliers, Leverage, Influential.
Outlier. [Plot: Y vs. X.] The “unusual point” does not follow the pattern. It is near the X-mean; the entire line is pulled toward it.
Outlier. [Plot: Y vs. X.] The “unusual point” does not follow the pattern. The line is pulled down and twisted slightly.
High leverage. [Plot: Y vs. X.] The “unusual point” is far from the X-mean, but still follows the pattern.
Leverage & outlier, influential. [Plot: Y vs. X.] The “unusual point” is far from the X-mean, but does not follow the pattern. The line really twists!
Definitions: High Leverage Case: An extreme X value relative to the other X values. Outlier: An unusual y-value relative to the pattern of the other cases. Usually has a large residual.
Definitions, continued: Influential Case: A case that has an unusually large effect on the slope of the least squares line.
Definitions, continued. Conclusion: High leverage: potentially influential. High leverage & Outlier: influential!!
Why do we care about identifying unusual cases? The least squares regression line is not resistant to unusual cases.
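The lack of resistance can be seen by refitting the line after adding one unusual point. The sketch below uses hypothetical data that follow the pattern y = 1 + 2x exactly, and adds three made-up unusual points, one for each situation illustrated earlier; the helper name is also made up for this sketch.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical base data lying exactly on y = 1 + 2x.
base_x = [1, 2, 3, 4, 5]
base_y = [3, 5, 7, 9, 11]

# One hypothetical unusual point per situation.
unusual_points = {
    "outlier near the X-mean": (3, 15),
    "high leverage, follows pattern": (15, 31),
    "high leverage and outlier": (15, 5),
}

base_intercept, base_slope = fit_least_squares(base_x, base_y)
print(f"base fit: intercept={base_intercept:.2f}, slope={base_slope:.2f}")

fits = {}
for label, (x_new, y_new) in unusual_points.items():
    fits[label] = fit_least_squares(base_x + [x_new], base_y + [y_new])
    intercept, slope = fits[label]
    print(f"{label}: intercept={intercept:.2f}, slope={slope:.2f}")
```

In this sketch the outlier near the X-mean shifts the whole line upward without changing its slope, the high leverage point that follows the pattern changes nothing, and the high leverage outlier changes the slope drastically.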
Regression Analysis in Minitab
Lesson Objectives: Learn two ways to use Minitab to run a regression analysis. Learn how to read output from Minitab.
Example 3, continued … Can height be predicted using shoe size? Step 1? DTDP
Example 3, continued … Can height be predicted using shoe size? Scatterplot (Graph > Plot …), with “jitter” added in the X-direction; points are marked Female and Male. The scatter for each subpopulation is about the same; i.e., there is “constant variance.”
Example 3, continued … Method 1: Stat > Regression > Regression … Fits Y = a + bX.
Example 3, continued … Can height be predicted using shoe size? Copied from the “Session Window”:
Regression Analysis: Height versus Shoe Size
The regression equation is
Height = … Shoe Size
Predictor    Coef   SE Coef   T   P
Constant     …
Shoe Siz     …
S = …   R-Sq = 79.1%   R-Sq(adj) = 79.0%
Analysis of Variance
Source       DF   SS   MS   F   P
Regression   …
Error        …
Total        …
Example 3, continued … Reading the Minitab output (Regression Analysis: Height versus Shoe Size): The “Coef” column gives the least squares estimated coefficients. Total “Degrees of Freedom” = Number of cases - 1.
Example 3, continued … Reading the Minitab output (Regression Analysis: Height versus Shoe Size): R-Sq = SSR / TSS, reported in the output as R-Sq = 79.1%.
Example 3, continued … Reading the Minitab output (Regression Analysis: Height versus Shoe Size): S = $\sqrt{\text{MSE}}$ (slide annotation: “MSE = 3.8”) is the Standard Error of Regression, a measure of variation around the regression line. In the Analysis of Variance table, the Error row gives the sum of squared residuals (SS column) and the Mean Squared Error, MSE (MS column).
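The two summary measures can be computed directly from the residuals. The sketch below uses hypothetical data, not the height and shoe size sample, and a made-up function name; it assumes the standard simple regression convention that the Error degrees of freedom are n - 2.

```python
import math


def regression_summary(xs, ys):
    """Return R-Sq, MSE, and S for a simple least squares regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - y_bar) ** 2 for y in ys)   # total sum of squares
    ssr = tss - sse                           # regression sum of squares
    mse = sse / (n - 2)                       # Error DF = n - 2 (assumed convention)
    return {"r_sq": ssr / tss, "mse": mse, "s": math.sqrt(mse)}


# Hypothetical data, for illustration only.
xs = [64, 66, 68, 70, 72]
ys = [130, 145, 150, 165, 185]
summary = regression_summary(xs, ys)
print(f"R-Sq = {100 * summary['r_sq']:.1f}%")
print(f"MSE = {summary['mse']:.2f}, S = {summary['s']:.3f}")
```

S is in the same units as Y, which is why it is the more readable of the two measures of spread around the line.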
Example 3, continued … Can height be predicted using shoe size? [Plot; no “jitter” added.] Are there any problems visible in this plot? ___________
Example 3, continued … Can height be predicted using shoe size? Least squares regression equation: Height = … Shoe Size. r-square = 79.1%, Std. error = … inches. These are the two summary measures that should always be given with the equation.
Example 3, continued … Can height be predicted using shoe size? Method 2: Stat > Regression > Fitted Line Plot … Fits Y = a + bX. This program gives a scatterplot with the regression line superimposed on it.
Example 3, continued … Can height be predicted using shoe size? [Fitted line plot.] The fit looks …
Example 3, continued … Can height be predicted using shoe size? In the Minitab output (Regression Analysis: Height versus Shoe Size), what information do the T, P, and F values provide?
How do you determine if the X-variable is a useful predictor? 1. Use the “t-statistic” or the F-stat. “t” measures how many standard errors the estimated coefficient is from “zero.” “F” = $t^2$ for simple regression.
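The relationship between “t” and “F” can be checked numerically. The sketch below uses hypothetical data and a made-up function name; it assumes the standard simple regression formulas, namely that the standard error of the slope is $\sqrt{\text{MSE} / \sum (x_i - \bar{x})^2}$ and that F = MSR / MSE with one regression degree of freedom.

```python
import math


def slope_t_and_f(xs, ys):
    """Return (t, F) for the slope of a simple least squares regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - y_bar) ** 2 for y in ys)
    mse = sse / (n - 2)                # Error DF = n - 2
    se_slope = math.sqrt(mse / sxx)    # standard error of the slope
    t = slope / se_slope               # standard errors between the slope and zero
    f = (tss - sse) / mse              # MSR / MSE, with regression DF = 1
    return t, f


# Hypothetical data, for illustration only.
xs = [64, 66, 68, 70, 72]
ys = [130, 145, 150, 165, 185]
t, f = slope_t_and_f(xs, ys)
print(f"t = {t:.3f}, t squared = {t * t:.3f}, F = {f:.3f}")
```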
How do you determine if the X-variable is a useful predictor? 2. A “P-value” is associated with “t” and “F”. The further “t” and “F” are from zero, in either direction, the smaller the corresponding P-value will be. P-value: the probability, if the true coefficient were ZERO, of getting a “t” (or “F”) at least this far from zero. A small P-value means the data are hard to reconcile with a true coefficient of zero.
How do you determine if the X-variable is a useful predictor? 3. If the P-value IS SMALL (typically “< 0.10”), then conclude: 1. It is unlikely that the true coefficient is really zero, and therefore, 2. The X variable IS a useful predictor for the Y variable. Keep the variable! If the P-value is NOT SMALL (i.e., “> 0.10”), then conclude: 1. For all practical purposes the true coefficient MAY BE ZERO; therefore 2. The X variable IS NOT a useful predictor of the Y variable. Don’t use it.
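The decision rule can be written as a small function. This is only a sketch: the function name and the default cutoff argument are made up for illustration, and treating a P-value of exactly 0.10 as “not small” is an assumption, since the slides do not say which way that boundary case goes.

```python
def is_useful_predictor(p_value, cutoff=0.10):
    """Apply the rule: a small P-value means keep the X variable."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    # Assumption: a P-value exactly equal to the cutoff counts as "not small".
    return p_value < cutoff


# Hypothetical P-values, for illustration only.
for p in (0.001, 0.04, 0.35):
    verdict = "keep the variable" if is_useful_predictor(p) else "don't use it"
    print(f"P = {p}: {verdict}")
```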
Example 3, continued … Can height be predicted using shoe size? Reading the Minitab output (Regression Analysis: Height versus Shoe Size): “t” measures how many standard errors the estimated coefficient is from “zero.” P-value: the probability, if the true coefficient were “zero,” of getting a “t” at least this far from zero. Could “shoe size” have a true coefficient that is actually “zero”? The P-value for Shoe Size IS SMALL (< 0.10). Conclusion: The “shoe size” coefficient is NOT zero! “Shoe size” IS a useful predictor of the mean of “height”.
The logic just explained is statistical inference. This will be covered in more detail during the last three weeks of the course.