1
Multiple Linear Regression
Regression Diagnostics
2
Find Scores That
Contribute to violation of assumptions.
Are suspect because they are far removed from the centroid (multidimensional mean).
Have undue influence on the solution.
3
Outliers Among the Predictors
Leverage, h_i, or Hat Diagonal
The larger this statistic, the greater the distance between the data point and the centroid in p-dimensional space.
Investigate cases with h_i greater than 2p/N.
p is the number of parameters in the model, including the intercept.
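As a sketch of where this kind of cutoff comes from (the symbols X, H, and x_i are notation assumed here, not taken from the slides): with X the N × p design matrix, including the column of ones for the intercept, the leverages are the diagonal elements of the hat matrix. They sum to p, so the mean leverage is p/N, and a cutoff of twice the mean is 2p/N, the form used in the worked example 2(3)/11.

```latex
H = X (X^{\top} X)^{-1} X^{\top}, \qquad
h_i = H_{ii} = x_i^{\top} (X^{\top} X)^{-1} x_i, \qquad
\sum_{i=1}^{N} h_i = p, \qquad
\bar{h} = \frac{p}{N}
```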
4
Distance from the Regression Surface
Standardized Residual (aka Studentized Residual): the difference between actual Y and predicted Y, divided by an appropriate standard error.
Rstudent (aka Studentized Deleted Residual): the same, except that for each case the regression surface is the one obtained when that individual case is removed.
Investigate if the absolute value is greater than 2.
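Written out, the two residuals differ only in which error estimate goes in the denominator. The notation below is assumed for illustration: e_i is the raw residual, h_i the leverage, s the root mean square error from the full fit, and s_(i) the same quantity from the fit with case i removed.

```latex
e_i = y_i - \hat{y}_i, \qquad
r_i = \frac{e_i}{s \sqrt{1 - h_i}} \quad \text{(Studentized residual)}, \qquad
t_i = \frac{e_i}{s_{(i)} \sqrt{1 - h_i}} \quad \text{(Rstudent)}
```

Because a badly fit case inflates s but not s_(i), t_i grows faster than r_i as the case moves away from the surface.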
5
Influence on the Solution
Cook's D: how much would the regression surface change if this case were removed? Investigate cases with D > 1.
Dfbetas: how much would one parameter (slope or intercept) change if this case were removed? Investigate cases with absolute values > 2.
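In the usual textbook form, both statistics can be written in terms of quantities already introduced. The notation is assumed here rather than taken from the slides: r_i is the Studentized residual, h_i the leverage, p the number of parameters, b_j the j-th coefficient from the full fit, and a subscript (i) marks the fit with case i removed.

```latex
D_i = \frac{r_i^{2}}{p} \cdot \frac{h_i}{1 - h_i}, \qquad
\mathrm{DFBETAS}_{ij} = \frac{b_j - b_{j(i)}}{s_{(i)} \sqrt{\left[(X^{\top} X)^{-1}\right]_{jj}}}
```

Cook's D is large only when a case has both a sizable residual and sizable leverage, which is why it is read alongside the other two diagnostics.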
6
SAS Code
data regdiag;
  input SpermCount Together LastEjac @@;
  SR_LastEjac = sqrt(1+LastEjac);
cards;
*<data here>;
proc univariate plot;
  var SpermCount -- SR_LastEjac;
proc reg;
  model SpermCount = Together SR_LastEjac / influence r;
run;
*<nonsignificant results>;
data culled;
  set regdiag;
  if SpermCount < 700;
proc reg;
  model SpermCount = Together SR_LastEjac / influence r;
  *<Significant results>;
  title 'One Outlier Culled';
run;
7
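Simple Example
Y = sperm count
X1 = % time recently spent with mate
X2 = time since last ejaculation

Output Statistics
Obs   Student Residual   Cook's D   RStudent   Hat Diag H   DFBETAS (surviving values)
5      1.012             0.426      1.0139     0.5551       0.0288
8     -0.183             0.006      (lost)     0.3605       0.1083, 0.0437
9     -1.240             0.098      (lost)     0.1600       0.0999
10    -1.270             0.261      (lost)     0.3270       0.4657
11     2.643             1.183      6.9409     0.3369       1.6194, 1.0137
Entries marked (lost) did not survive extraction of the slide. The surviving DFBETAS values are listed in their original order, without assigning them to the Intercept, Together, or SR_LastEjac column.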
8
Leverage
Investigate cases with values greater than 2(3)/11 = .55.
Case 5 is above this cutoff. It is a univariate outlier on the LastEjac variable. Further investigation indicates the case is valid, so we retain it.
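The leverage screen can be sketched in a few lines of Python. This is an illustration only, not part of the original SAS run: the hat values are the five printed in the Output Statistics table, N = 11 and p = 3 come from the slide's own arithmetic, and the function names are made up for the example.

```python
# Hat-diagonal values for the five cases printed in the Output Statistics table.
hat_diag = {5: 0.5551, 8: 0.3605, 9: 0.1600, 10: 0.3270, 11: 0.3369}

n_cases = 11   # N, from the slide's arithmetic 2(3)/11
n_params = 3   # p: intercept + Together + SR_LastEjac


def leverage_cutoff(p, n):
    """Rule-of-thumb cutoff: twice the mean leverage, 2p/N."""
    return 2 * p / n


def flag_high_leverage(h_by_case, p, n):
    """Return the case numbers whose leverage exceeds 2p/N, in sorted order."""
    cutoff = leverage_cutoff(p, n)
    return sorted(case for case, h in h_by_case.items() if h > cutoff)


cutoff = leverage_cutoff(n_params, n_cases)
flagged = flag_high_leverage(hat_diag, n_params, n_cases)
print(f"cutoff = {cutoff:.4f}, flagged cases = {flagged}")
```

Only case 5 exceeds the cutoff among the printed cases. The other six cases are not shown in the table, so they are not screened here.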
9
Residuals
Case 11 has large residuals and should be investigated.
Notice that Rstudent is much larger than the standardized residual. This indicates that removing this case has a large effect on the solution.

Output Statistics
Obs   Student Residual   Cook's D   RStudent   Hat Diag H   DFBETAS (surviving values)
11     2.643             1.183      6.9409     0.3369       1.6194, 1.0137
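The gap between the two residuals can be reproduced from the printed values with a standard identity, t = r * sqrt((N - p - 1) / (N - p - r^2)), which converts the Studentized residual r into Rstudent t without refitting the model. The sketch below assumes N = 11 and p = 3, as in the leverage slide; the function name is hypothetical.

```python
import math


def rstudent_from_student(r, n, p):
    """Convert a Studentized residual r into the deleted (Rstudent) version.

    Uses t = r * sqrt((n - p - 1) / (n - p - r**2)), where n is the number
    of cases and p counts the intercept. Requires r**2 < n - p.
    """
    return r * math.sqrt((n - p - 1) / (n - p - r ** 2))


n_cases, n_params = 11, 3

# Studentized residuals for cases 5 and 11, as printed in the table.
student = {5: 1.012, 11: 2.643}
rstudent = {case: rstudent_from_student(r, n_cases, n_params)
            for case, r in student.items()}

for case in sorted(rstudent):
    print(f"case {case}: Student = {student[case]:.3f}, "
          f"RStudent = {rstudent[case]:.3f}")
```

For case 5 the two values are nearly equal, while for case 11 the denominator N - p - r^2 is close to zero and Rstudent is several times the Studentized residual. Both computed values agree with the table up to rounding of the printed inputs.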
10
Influence
Case 11 has a high value of Cook's D.
It has a high Dfbeta for the time since last ejaculation predictor, even after I transformed that variable to reduce skewness. Upon investigation, it was found that this subject did not follow the instructions for gathering the data. His scores were deleted.
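As a check on the Cook's D column, the usual formula D = (r^2 / p) * h / (1 - h) can be applied to the Studentized residuals and hat values printed in the Output Statistics table. This is an illustrative Python sketch, not the SAS computation itself; p = 3 is taken from the leverage slide, and the function name is an assumption.

```python
def cooks_d(student_resid, leverage, p):
    """Cook's distance from a Studentized residual and a leverage value."""
    return (student_resid ** 2 / p) * (leverage / (1 - leverage))


n_params = 3  # intercept + Together + SR_LastEjac

# (Studentized residual, hat diagonal) for the cases printed in the table.
printed = {
    5: (1.012, 0.5551),
    8: (-0.183, 0.3605),
    9: (-1.240, 0.1600),
    10: (-1.270, 0.3270),
    11: (2.643, 0.3369),
}

cook = {case: cooks_d(r, h, n_params) for case, (r, h) in printed.items()}

# Apply the slide's rule: investigate cases with D > 1.
influential = sorted(case for case, d in cook.items() if d > 1)

for case in sorted(cook):
    print(f"case {case}: D = {cook[case]:.3f}")
print("D > 1:", influential)
```

Case 5 has the highest leverage but a modest residual, so its D stays well below 1. Case 11 combines a large residual with moderate leverage and is the only printed case above the cutoff.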
11
Plots of Residuals
These can also be useful, but it takes some practice to get good at detecting problems from such plots.
Plot the residual versus predicted Y.
12
Heteroscedasticity
13
Try Squaring One Predictor
14
Residuals not Normal and Variance not Constant