Residuals, Influential Points, and Outliers
Objective: To develop an understanding of the impact of unusual features in the relationship between two quantitative variables.
Residual = Observed y – Predicted y, for a given value of x. Residuals are used to find the best LSRL (least-squares regression line, or line of fit): the LSRL is the line that makes the sum of the squared residuals as small as possible.
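The residual definition can be sketched in a few lines of Python. This is only an illustration: the data values are made up, and the helper names fit_lsrl and residuals are hypothetical, not taken from any textbook or calculator.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line: predicted y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def residuals(xs, ys, a, b):
    """Residual = observed y - predicted y, one value per data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_lsrl(xs, ys)
res = residuals(xs, ys, a, b)
sse = sum(r ** 2 for r in res)  # the quantity the LSRL makes as small as possible

print("intercept a =", round(a, 4), " slope b =", round(b, 4))
print("residuals:", [round(r, 4) for r in res])
print("sum of squared residuals:", round(sse, 4))
```

Changing a or b by hand and recomputing the sum of squared residuals should always give a larger value than the one the LSRL produces.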
Residual Plot: We use this to decide whether or not the original data actually follow a linear pattern. Random scatter around zero, with no visible pattern, suggests that a linear model is appropriate.
Bad Residual Plots: curved patterns; increasing or decreasing spread in the scatter.
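A curved pattern can be seen without drawing the plot by looking at the signs of the residuals in order of x. The sketch below fits a line to made-up data that actually follow a curve (y = x squared); the helper name fit_lsrl is hypothetical.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line: predicted y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Made-up data that are curved, not linear.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b = fit_lsrl(xs, ys)
res = [y - (a + b * x) for x, y in zip(xs, ys)]

# One sign per point, in order of increasing x.
signs = "".join("+" if r > 0 else "-" for r in res)
print("residuals:", [round(r, 3) for r in res])
print("signs in order of x:", signs)
```

Residuals that are positive at both ends and negative in the middle form the curved pattern that signals a poor linear fit. Random scatter would show no such run of signs.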
Properties of Residual Plots: Always make your y-axis the set of residuals. You may use either the x-values or the predicted y-values for your x-axis (though Minitab will use x-values as a default). In either case your graph should show the same pattern. On your graphing calculator, RESID appears in the LIST menu after you have run LinReg(a + bx). Be sure to rerun LinReg(a + bx) for each new set of data.
Additional Items that can Influence the LSRL: outliers, influential points, leverage.
Outliers: An outlier in the y-direction will create a large residual, and a large residual pulls the LSRL toward that point. Notice, however, that the regression line is not changed drastically by an outlier in the y-direction.
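A small numerical sketch of this idea, using made-up data and a hypothetical helper fit_lsrl: one point is moved far off the line in the y-direction while its x-value stays at the mean of the x-values. This is the extreme case, chosen so the effect is easy to see.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line: predicted y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Made-up data: perfectly linear, then one y-value is changed.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys_clean = [2 * x for x in xs]
ys_outlier = list(ys_clean)
ys_outlier[4] = 25  # the point at x = 5 (the mean of the x-values) moves from 10 to 25

a_clean, b_clean = fit_lsrl(xs, ys_clean)
a_out, b_out = fit_lsrl(xs, ys_outlier)

res_out = [y - (a_out + b_out * x) for x, y in zip(xs, ys_outlier)]
# Index of the point with the largest residual in absolute value.
worst = max(range(len(xs)), key=lambda i: abs(res_out[i]))

print("without outlier: a =", round(a_clean, 3), " b =", round(b_clean, 3))
print("with outlier:    a =", round(a_out, 3), " b =", round(b_out, 3))
print("largest residual is at x =", xs[worst])
```

In this example the outlier has by far the largest residual, yet the slope is unchanged; the line only shifts upward slightly. An outlier whose x-value is not exactly at the mean would change the slope a little as well.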
Leverage: A point has high leverage when its x-value is far from the mean of the x-values.
Influential Point: An observed value is said to be influential if removing it from the data set would significantly change the LSRL. Most texts will only use outliers with leverage in the x-direction as influential points (in the y-direction they are simply called outliers).
Note: Though it is tempting, we cannot simply remove outliers or influential points from our data set. The best thing to do is create an LSRL for the data with this point and then without this point. Once you compare these two lines of fit, you will often learn a great deal about the data that you are trying to model.
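The with-and-without comparison can be sketched as follows. The data are made up, with one point placed far from the others in the x-direction, and the helper names fit_lsrl and compare_with_and_without are hypothetical.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line: predicted y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def compare_with_and_without(xs, ys, index):
    """Fit the LSRL twice: once on all points, once with point `index` left out."""
    with_fit = fit_lsrl(xs, ys)
    xs_without = xs[:index] + xs[index + 1:]
    ys_without = ys[:index] + ys[index + 1:]
    without_fit = fit_lsrl(xs_without, ys_without)
    return with_fit, without_fit


# Made-up data: the last point has an x-value far from the mean of the others.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 10]

(a_with, b_with), (a_without, b_without) = compare_with_and_without(xs, ys, 5)

print("with the point:    a =", round(a_with, 3), " b =", round(b_with, 3))
print("without the point: a =", round(a_without, 3), " b =", round(b_without, 3))
print("change in slope:", round(b_with - b_without, 3))
```

A large change in slope or intercept between the two fits suggests the point is influential. The comparison is meant to inform a decision about the point, not to justify deleting it.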
2000 Presidential Election
Resource: http://arts.bev.net/roperldavid/politics/fl2000.htm