1 Reg12M G Multiple Regression Week 12 (Monday)
Quality Control and Critical Evaluation of Regression Results
An example
Identifying Residuals
Leverage: $X(X'X)^{-1}X'$
Residuals (Discrepancies)
2 Quality Control and Critical Evaluation of Regression Results
Multiple regression programs can hide details of the data
» Regression output provides only summary information
» When several variables are considered, bivariate plots may not reveal problems
Regression diagnostics can help with quality control
» Which subjects are not well fit: discrepancies (residuals)
» Which subjects are affecting the results: influence points
3 An example
Suppose X is a measure of socioeconomic (SES) advantage, such as years of education; W is an indicator of social disadvantage, such as immigrant status; and Y is the number of stressful life events.
Nothing in these data necessarily jumps out as being amiss.
4 Example, Continued
Here are the regression results: [table not reproduced]
Here is the plot of the fit: [figure not reproduced]
5 Identifying Residuals
Outlying residuals should be examined. They may stand out clearly when they lie near the center of the X distribution.
When the observation with the outlying residual in the plot is removed, we get the following results: [table not reproduced]
An important question is whether it is proper to delete an outlying point.
6 Sometimes Data Errors Don’t Stand Out
If an erroneous Y value is paired with an extreme X value, it may not show up as a large residual.
» The OLS fit will be pulled toward the bad point
We can describe how extreme a point is in the X space by its leverage
An observation has high leverage if it has a relatively large value of $h_i$, the $i$th diagonal element of the matrix $X(X'X)^{-1}X'$
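To make leverage concrete, here is a minimal sketch in Python with NumPy (an assumption on our part; the slides use SPSS). The function name `leverage` and the education values are hypothetical. It uses the reduced QR decomposition $X = QR$, for which $X(X'X)^{-1}X' = QQ'$, so each $h_i$ is the squared length of row $i$ of $Q$ and the full $n \times n$ matrix never has to be formed.

```python
import numpy as np

def leverage(X):
    """Return h_i, the diagonal of the hat matrix X(X'X)^{-1}X'.

    X is an n-by-p design matrix that already contains the intercept column.
    With the reduced QR decomposition X = QR, the hat matrix equals QQ',
    so h_i is the squared length of row i of Q; the n-by-n matrix is never formed.
    """
    Q, _ = np.linalg.qr(X)            # default reduced mode: Q is n-by-p
    return np.sum(Q ** 2, axis=1)

# Hypothetical years of education; the last value is far from the rest.
x = np.array([10.0, 11.0, 12.0, 12.0, 13.0, 14.0, 20.0])
X = np.column_stack([np.ones_like(x), x])   # intercept + one predictor
h = leverage(X)

print(np.round(h, 3))   # the last observation has by far the largest h_i
print(h.sum())          # the h_i sum to p, the number of columns of X (2 here)
```

With a single predictor the same values reduce to $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$, which shows directly that leverage grows with distance from the mean of X.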
7 Leverage: $X(X'X)^{-1}X'$
$X(X'X)^{-1}X'$ is an $n \times n$ matrix that transforms $Y$ into the fitted values $\hat{Y}$
» $\hat{Y} = XB = X[(X'X)^{-1}X'Y] = [X(X'X)^{-1}X']Y$
It is a square, symmetric matrix with a special property: its square is itself (it is idempotent)!
» $[X(X'X)^{-1}X'][X(X'X)^{-1}X'] = X(X'X)^{-1}X'$
This so-called hat matrix can be thought of as a camera that projects an image of the data $Y$ onto the regression plane.
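The properties on this slide can be checked numerically. The sketch below (Python with NumPy, assumed; the data are simulated and purely illustrative) builds the hat matrix exactly as written on the slide, then confirms that it is symmetric, that it is idempotent, and that multiplying it by $Y$ gives the same fitted values as $XB$.

```python
import numpy as np

rng = np.random.default_rng(0)          # simulated data, for illustration only
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T                   # the n-by-n hat matrix
B = XtX_inv @ X.T @ y                   # B = (X'X)^{-1} X'Y
y_hat = X @ B                           # fitted values XB

print(np.allclose(H, H.T))              # symmetric
print(np.allclose(H @ H, H))            # idempotent: HH = H
print(np.allclose(H @ y, y_hat))        # HY reproduces the fitted values
```

The explicit inverse mirrors the slide's formula; for real analyses a least-squares or QR solver is numerically safer than inverting $X'X$.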
8 Standardized Residuals
Assume that fitted values of Y have been obtained
» $\hat{Y} = XB$
Residuals are calculated
» $E = Y - \hat{Y}$
SPSS computes the standardized residual as the ratio of the unstandardized residual to the square root of the MSE
» The square root of the MSE is $S_E$
» The SPSS standardized residual is $E_i / S_E$
» Values greater than 3 in absolute value are suspect
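A minimal sketch of the standardized residual, assuming Python with NumPy rather than SPSS. The function name and the data (a straight line with one planted error near the center of X) are hypothetical. $X$ includes the intercept column, so with $k$ predictors it has $p = k + 1$ columns and the MSE is $SSE/(n-k-1)$.

```python
import numpy as np

def standardized_residuals(X, y):
    """SPSS-style standardized residuals E_i / S_E.

    X is n-by-p and includes the intercept column, so p = k + 1 and the
    MSE is SSE / (n - k - 1).
    """
    n, p = X.shape
    B, *_ = np.linalg.lstsq(X, y, rcond=None)
    E = y - X @ B                        # unstandardized residuals
    s_e = np.sqrt(E @ E / (n - p))       # square root of the MSE
    return E / s_e

# Hypothetical data with one bad Y value near the center of the X distribution.
rng = np.random.default_rng(1)
x = np.arange(1.0, 21.0)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=x.size)
y[10] += 10.0                            # planted error
X = np.column_stack([np.ones_like(x), x])

z = standardized_residuals(X, y)
print(np.flatnonzero(np.abs(z) > 3))     # indices flagged as suspect
```

Because the bad point sits near the middle of X, its leverage is small and the fit cannot bend toward it, so it shows up as a large residual, as on the "Identifying Residuals" slide.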
9 Variance of Regression Estimates
Let $Y$ be an $n \times 1$ vector and $X$ be an $n \times q$ matrix of predictors: $Y = XB + e$
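Under the usual OLS error assumptions $E(e) = 0$ and $\operatorname{Var}(e) = \sigma^2 I$ (assumed here; the slide does not state them), and reading the model as $Y = X\beta + e$ with population coefficients $\beta$ so that $B = (X'X)^{-1}X'Y$ is the estimate as on the leverage slide, the variances follow in a few lines. With $H = X(X'X)^{-1}X'$, the last line is what motivates scaling each residual by its own standard error:

```latex
\begin{aligned}
\operatorname{Var}(B) &= (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1} = \sigma^2 (X'X)^{-1} \\
E &= Y - \hat{Y} = (I - H)Y = (I - H)e
  \quad \text{since } (I - H)X = 0 \\
\operatorname{Var}(E) &= (I - H)(\sigma^2 I)(I - H)' = \sigma^2 (I - H)
  \quad \text{since } I - H \text{ is symmetric and idempotent} \\
\operatorname{Var}(E_i) &= \sigma^2 (1 - h_i)
\end{aligned}
```

So a high-leverage observation has a residual with smaller variance: the fit is pulled toward it.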
10 Studentized Residuals
The variance of a residual depends on the leverage of the observation: the larger $h_i$, the smaller the variance of $E_i$
Instead of scaling every residual by the same $S_E$, compare each residual to its own standard error
» Called the studentized residual
» Cohen et al. call this the internally studentized residual
The SPSS studentized residual is $E_i / [S_E (1-h_i)^{1/2}]$
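A sketch of the internally studentized residual, again assuming Python with NumPy; the function name and the data (one extreme X value) are hypothetical. Leverages come from the reduced QR decomposition, as for the diagonal of $X(X'X)^{-1}X'$.

```python
import numpy as np

def internally_studentized(X, y):
    """E_i / [S_E (1 - h_i)^{1/2}]: each residual over its own estimated SE.

    X is n-by-p and includes the intercept column (p = k + 1).
    """
    n, p = X.shape
    B, *_ = np.linalg.lstsq(X, y, rcond=None)
    E = y - X @ B
    s_e = np.sqrt(E @ E / (n - p))                # S_E on n - k - 1 df
    h = np.sum(np.linalg.qr(X)[0] ** 2, axis=1)   # leverages h_i
    return E / (s_e * np.sqrt(1 - h))

# Hypothetical data: the last X value is extreme, so its h_i is large.
rng = np.random.default_rng(2)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
y = 2.0 + x + rng.normal(size=x.size)
X = np.column_stack([np.ones_like(x), x])

r = internally_studentized(X, y)
print(np.round(r, 2))
```

Since $0 < 1 - h_i \le 1$, each studentized residual is at least as large in absolute value as the corresponding $E_i / S_E$, with the biggest adjustment at high-leverage points. The formula breaks down if some $h_i = 1$.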
11 Externally Studentized Residuals
If the $i$th residual reveals a mistake, then $S_E$ on $(n-k-1)$ df will be too big
» Externally studentized residuals adjust for this
» The regression is re-estimated dropping the $i$th observation, and a new $S_{E(i)}$ is computed on $(n-k-2)$ df
» The discrepancy of the point is computed by comparing $Y_i$ to the fitted $\hat{Y}_i$ from the model that excludes the point in question
» The externally studentized residual is $E_i / [S_{E(i)} (1-h_i)^{1/2}]$
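The deletion procedure can be written out literally: refit the regression $n$ times, each time without one observation. This is a sketch assuming Python with NumPy; the helper names and the planted-error data are hypothetical. It uses $E_i$ and $h_i$ from the full fit and $S_{E(i)}$ from the fit without observation $i$, matching the formula on the slide.

```python
import numpy as np

def ols_residuals(X, y):
    """OLS coefficients B and residuals Y - XB."""
    B, *_ = np.linalg.lstsq(X, y, rcond=None)
    return B, y - X @ B

def externally_studentized(X, y):
    """E_i / [S_E(i) (1 - h_i)^{1/2}], computed by literally refitting n times.

    X is n-by-p and includes the intercept column (p = k + 1).
    """
    n, p = X.shape
    _, E = ols_residuals(X, y)
    h = np.sum(np.linalg.qr(X)[0] ** 2, axis=1)      # leverages from the full fit
    t = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # drop observation i
        _, E_del = ols_residuals(X[keep], y[keep])
        s_e_i = np.sqrt(E_del @ E_del / (n - 1 - p))  # S_E(i) on n - k - 2 df
        t[i] = E[i] / (s_e_i * np.sqrt(1 - h[i]))
    return t

# Hypothetical data with one planted error near the center of X.
rng = np.random.default_rng(1)
x = np.arange(1.0, 21.0)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=x.size)
y[10] += 10.0
X = np.column_stack([np.ones_like(x), x])

t = externally_studentized(X, y)
print(np.round(t, 2))
```

The loop is for clarity only. The same values can be obtained from a single fit, because the deleted residual $Y_i - \hat{Y}_{i(i)}$ equals $E_i/(1-h_i)$ and $S_{E(i)}$ can be computed from $S_E$, $E_i$, and $h_i$.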
12 Example Sorted by Discrepancy Measures
Both the studentizing and deleting operations tend to increase the size of the standardized residuals
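Writing $r_i = E_i / [S_E (1-h_i)^{1/2}]$ for the internally studentized residual and $t_i$ for the externally studentized one, with $k$ predictors plus an intercept, two standard OLS identities make "tend to" precise (assuming $h_i < 1$):

```latex
\begin{aligned}
|r_i| &= \frac{|E_i|}{S_E\sqrt{1-h_i}} \;\ge\; \frac{|E_i|}{S_E}
  \quad \text{because } 0 < 1 - h_i \le 1 \\
(n-k-2)\,S_{E(i)}^2 &= (n-k-1)\,S_E^2 - \frac{E_i^2}{1-h_i} \\
t_i &= \frac{E_i}{S_{E(i)}\sqrt{1-h_i}}
     = r_i \sqrt{\frac{n-k-2}{n-k-1-r_i^2}},
  \qquad |t_i| > |r_i| \iff r_i^2 > 1
\end{aligned}
```

So studentizing never shrinks a residual, and deleting enlarges exactly those with $|r_i| > 1$, which includes every residual large enough to be of concern.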
13 Discrepancies and Quality Control of Data
When carrying out data analysis, it is important to make sure the data are clean
» Initially we look at distributions and scatterplots
» Out-of-range values are especially important
Discrepancy analysis is a second-order cleaning step
» Some discrepancies may be clear errors
» Other discrepancies may reveal special populations or circumstances
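The two cleaning steps can be sketched as two small screening functions, assuming Python with NumPy. The function names, the 0 to 25 year range for education, and the data are hypothetical. The cutoff of 3 follows the rule of thumb from the standardized-residual slide, applied here to externally studentized residuals computed with the single-fit shortcut rather than by refitting.

```python
import numpy as np

def out_of_range(values, low, high):
    """First-order check: indices of values outside [low, high]."""
    values = np.asarray(values, dtype=float)
    return np.flatnonzero((values < low) | (values > high))

def discrepant(X, y, cutoff=3.0):
    """Second-order check: indices whose externally studentized residual
    exceeds `cutoff` in absolute value. X includes the intercept column."""
    n, p = X.shape
    B, *_ = np.linalg.lstsq(X, y, rcond=None)
    E = y - X @ B
    h = np.sum(np.linalg.qr(X)[0] ** 2, axis=1)
    r = E / (np.sqrt(E @ E / (n - p)) * np.sqrt(1 - h))   # internally studentized
    t = r * np.sqrt((n - p - 1) / (n - p - r ** 2))       # externally studentized
    return np.flatnonzero(np.abs(t) > cutoff)

# Hypothetical education values with a valid range of 0 to 25 years.
educ = [12, 16, 12, 99, 14]
print(out_of_range(educ, 0, 25))          # flags index 3, the impossible value 99

# Hypothetical regression data with one planted error.
rng = np.random.default_rng(1)
x = np.arange(1.0, 21.0)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=x.size)
y[10] += 10.0
X = np.column_stack([np.ones_like(x), x])
print(discrepant(X, y))
```

A flagged case is a prompt to look at the record, not an automatic deletion: it may be a clear error, or it may belong to a special population.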