Chapter 5: Residuals, Residual Plots, and Influential Points
Residuals
  Vertical distance between a data point and the LSRL
  Residual = observed y - predicted y (y - y-hat)
  The sum of the residuals is always zero
Residual Plot
  Scatterplot of the (x, residual) points
  Purpose: to see whether x and y have a linear relationship
  No pattern in the residual plot indicates a linear relationship
  [Figure: two example residual plots, labeled "Linear" and "Not Linear"]
Residuals (calculator steps)
  Get the regression equation on the home screen (Stat, Calc, 8)
  Go to Stat, Edit and highlight "L3"
  Press List (2nd Stat) and choose 7: RESID
  The bottom of the screen says "L3 = LRESID"
  Hit Enter
Residual Plot (calculator steps)
  With the residuals in L3, make a scatterplot with Xlist: L1 and Ylist: L3
One measure of the success of knee surgery is post-surgical range of motion for the knee joint following a knee dislocation.
  Age   Range of Motion
  35    154
  24    142
  40    137
  31    133
  28    122
  25    126
  26    135
  16    135
  14    108
  20    120
  21    127
  30    122
Is there a linear relationship between age & range of motion? Sketch a residual plot.
  [Figure: residual plot, residuals vs. x]
No pattern in the residual plot, so there is a linear relationship between age and range of motion.
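The residuals behind that plot can also be computed by hand instead of on the calculator. The following is a minimal sketch in plain Python using the data from the slide; the helper name `fit_lsrl` is an assumption made for illustration and is not part of the original material.

```python
# Knee surgery data from the slide: age (x) and range of motion (y).
age = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
rom = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]


def fit_lsrl(x, y):
    """Return (intercept, slope) of the least-squares regression line.

    Hypothetical helper written for this example.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_lsrl(age, rom)

# Residual = observed y - predicted y (y-hat).
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(age, rom)]

# The (x, residual) pairs are the points of the residual plot.
for xi, ri in sorted(zip(age, residuals)):
    print(f"age {xi:2d}  residual {ri:7.2f}")

print(f"y-hat = {intercept:.2f} + {slope:.4f} * age")
print(f"sum of residuals = {sum(residuals):.6f}")
```

Plotting the printed pairs with age on the horizontal axis gives the residual plot; the last line illustrates that the residuals sum to zero (up to floating-point rounding).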
For the same data, sketch a residual plot using the y-hat values on the horizontal axis.
  [Figure: residual plot, residuals vs. y-hat]
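Residual plots show the same pattern whether the residuals are plotted against x or against y-hat. (If the slope of the LSRL is negative, the plot against y-hat is mirrored left to right.)
  [Figure: the two residual plots side by side, residuals vs. x and residuals vs. y-hat]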
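One way to see why the two plots match: y-hat is a linear function of x, so when the slope is positive, sorting the points by x or by y-hat puts the residuals in the same left-to-right order. The sketch below checks this for the knee surgery data; the variable names are assumptions chosen for the example.

```python
# Knee surgery data from the slides.
age = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
rom = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]

# Fit the LSRL with the usual least-squares formulas.
n = len(age)
x_bar = sum(age) / n
y_bar = sum(rom) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, rom))
sxx = sum((x - x_bar) ** 2 for x in age)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

y_hat = [intercept + slope * x for x in age]
residuals = [y - yh for y, yh in zip(rom, y_hat)]

# Read the residuals left to right on each plot.
by_x = [r for _, r in sorted(zip(age, residuals))]
by_y_hat = [r for _, r in sorted(zip(y_hat, residuals))]

# With a positive slope, both orderings are identical.
same_order = by_x == by_y_hat
print("slope is positive:", slope > 0)
print("same left-to-right order:", same_order)
```

Only the scale of the horizontal axis differs between the two plots, so any pattern (or lack of one) shows up in both.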
Coefficient of Determination (r²)
  Proportion of the variation in y that can be explained by the linear relationship between x & y
  Remains the same no matter which variable is called x
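The claim that r² does not depend on which variable is called x can be checked directly. This is a small sketch using the age and range-of-motion data from this chapter's knee surgery example; `r_squared` is a hypothetical helper that computes r² from the sums of squares and cross-products.

```python
# Age and range-of-motion data from the knee surgery example.
age = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
rom = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]


def r_squared(x, y):
    """Coefficient of determination for a simple linear regression.

    Hypothetical helper: r^2 = Sxy^2 / (Sxx * Syy).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


r2_xy = r_squared(age, rom)  # age as x, range of motion as y
r2_yx = r_squared(rom, age)  # roles swapped

print(f"r^2 with age as x:             {r2_xy:.4f}")
print(f"r^2 with range of motion as x: {r2_yx:.4f}")
```

The formula treats x and y symmetrically, which is why the two values agree even though the two regression lines themselves are different.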
Let’s examine r².
  Suppose we want to predict a y-value, but we don't have an x-value. Best guess: the mean of the y’s we have (y-bar).
  Sum of the squared residuals (errors) using the mean of y: SSE(y-bar) = 1564.917
Now suppose we want to predict a y-value and we DO have an x-value: we can use y-hat on the LSRL for that x-value.
  Sum of the squared residuals (errors) using the LSRL: SSE(y-hat) = 1085.735
SSE(y-bar) = 1564.917 (mean of y)     SSE(y-hat) = 1085.735 (LSRL)
How much does the sum of the squared errors go down when we go from the "mean model" to the "regression model"?
  r² = (SSE(y-bar) - SSE(y-hat)) / SSE(y-bar) = (1564.917 - 1085.735) / 1564.917 ≈ 0.306
This is r²: the proportion of the variation in the y-values that is explained by the x-values.
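The two sums of squared errors and the resulting r² can be reproduced from the raw data. The sketch below is a plain-Python illustration; the variable names `sse_mean` and `sse_lsrl` are assumptions standing in for the two SSE values on the slide.

```python
# Knee surgery data from the slides.
age = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
rom = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(rom) / n

# Least-squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, rom))
sxx = sum((x - x_bar) ** 2 for x in age)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# "Mean model": predict every y with the mean of y.
sse_mean = sum((y - y_bar) ** 2 for y in rom)

# "Regression model": predict each y with y-hat from the LSRL.
sse_lsrl = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(age, rom))

# Proportional reduction in squared error.
r2 = (sse_mean - sse_lsrl) / sse_mean

print(f"SSE using the mean of y: {sse_mean:.3f}")
print(f"SSE using the LSRL:      {sse_lsrl:.3f}")
print(f"r^2:                     {r2:.3f}")
```

The three printed values correspond to the 1564.917, 1085.735, and roughly 30.6% quoted on the slides.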
How well does age predict the range of motion after knee surgery?
  Approximately 30.6% of the variation in range of motion after knee surgery can be explained by the LSRL of age and range of motion.
Interpretation of r²
  Approximately r² (written as a percent) of the variation in y can be explained by the LSRL of x & y.
Computer-generated regression analysis of knee surgery data:
  Predictor   Coef     Stdev    T      P
  Constant    107.58   11.12    9.67   0.000
  Age         0.8710   0.4146   2.10   0.062
  s = 10.42   R-sq = 30.6%   R-sq(adj) = 23.7%
NEVER use adjusted r²!
What is the equation of the LSRL? Find the slope & y-intercept.
What are the correlation coefficient and the coefficient of determination?
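As a rough illustration of how the pieces of that output fit together, the sketch below reads the intercept from the Constant row, the slope from the Age row, and recovers r from R-sq. The numbers are copied from the output above; the `predict` helper is hypothetical and only there to show the equation in use.

```python
import math

# Values copied from the regression output.
intercept = 107.58  # "Constant" row, Coef column (y-intercept)
slope = 0.8710      # "Age" row, Coef column (slope)
r_sq = 0.306        # R-sq = 30.6%, the coefficient of determination


def predict(age):
    """Hypothetical helper: y-hat from the LSRL for a given age."""
    return intercept + slope * age


# r is the square root of r^2, taking the same sign as the slope.
r = math.copysign(math.sqrt(r_sq), slope)

print(f"y-hat = {intercept} + {slope} * age")
print(f"r^2 = {r_sq}")
print(f"r   = {r:.3f}")
```

The sign step matters because the output only reports R-sq; the direction of the relationship has to come from the sign of the slope.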
Outlier
  In regression, an outlier is a point with a large residual
Influential Point
  Influences where the LSRL is located
  If removed, the slope of the LSRL changes significantly
One factor in the development of tennis elbow is the impact-induced vibration of the racket and arm at ball contact.
  Racket   Resonance (Hz)   Acceleration (m/sec/sec)
  1        105              36.0
  2        106              35.0
  3        110              34.5
  4        111              36.8
  5        112              37.0
  6        113              34.0
  7        113              34.2
  8        114              33.8
  9        114              35.0
  10       119              35.0
  11       120              33.6
  12       121              34.2
  13       126              36.2
  14       189              30.0
Sketch a scatterplot of these data. Calculate the LSRL & correlation coefficient. Does there appear to be an influential point? If so, remove it and calculate the new LSRL & correlation coefficient.
(189, 30) could be influential. Remove & recalculate LSRL.
(189, 30) was influential since it moved the LSRL
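The before-and-after comparison can be sketched in code by fitting the line twice, once with all 14 rackets and once without (189, 30). This is an illustrative plain-Python version; `lsrl_and_r` is a hypothetical helper, not something from the original slides.

```python
import math

# Racket data from the slide: resonance (Hz) and acceleration (m/sec/sec).
resonance = [105, 106, 110, 111, 112, 113, 113, 114, 114, 119, 120, 121, 126, 189]
acceleration = [36.0, 35.0, 34.5, 36.8, 37.0, 34.0, 34.2, 33.8, 35.0, 35.0,
                33.6, 34.2, 36.2, 30.0]


def lsrl_and_r(x, y):
    """Hypothetical helper: return (intercept, slope, r) for the LSRL of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / math.sqrt(sxx * syy)


# Fit with every racket, then again without the last point, (189, 30).
full = lsrl_and_r(resonance, acceleration)
trimmed = lsrl_and_r(resonance[:-1], acceleration[:-1])

for label, (a, b, r) in (("all 14 rackets", full), ("without (189, 30)", trimmed)):
    print(f"{label:18s} y-hat = {a:.2f} + ({b:.4f})x   r = {r:.3f}")
```

Comparing the two printed lines shows the slope flattening and r moving much closer to zero once the point is dropped, which is the behavior that marks (189, 30) as influential.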
Which of these measures are resistant?
  LSRL
  Correlation coefficient (r)
  Coefficient of determination (r²)
NONE: all are affected by outliers