Published by Rosalyn Owen. Modified over 8 years ago.
1
Anareg Week 11: Regression diagnostics
2
Regression Diagnostics
– Partial regression plots
– Studentized deleted residuals
– Hat matrix diagonals
– DFFITS, Cook's D, DFBETAS
– Variance inflation factor
– Tolerance
3
NKNW Example
– NKNW p 389, section 11.1
– Y is the amount of life insurance
– X1 is average annual income
– X2 is a risk aversion score
– n = 18 managers
4
Manager i   Income X_i1   Risk X_i2   Life Insurance Y_i  |  Manager i   Income X_i1   Risk X_i2   Life Insurance Y_i
 1          66.290         7          240                 |  10          37.408        5            55
 2          40.964         5           73                 |  11          54.376        2           130
 3          72.996        10          311                 |  12          46.186        7           112
 4          45.010         6           91                 |  13          46.130        4            91
 5          57.204         4          162                 |  14          30.366        3            14
 6          26.852         5           11                 |  15          39.060        5            63
 7          38.122         4           54                 |  16          79.380        1           316
 8          35.840         6           53                 |  17          52.766        8           154
 9          75.796         9          326                 |  18          55.916        6           164
5
Partial regression plots
– Also called added variable plots or adjusted variable plots
– One plot for each X_i
6
Partial regression plots (2)
Consider X1
– Use the other X's to predict Y
– Use the other X's to predict X1
– Plot the residuals from the first regression vs the residuals from the second regression
7
Partial regression plots (3)
These plots can detect
– Nonlinear relationships
– Heterogeneous variances
– Outliers
8
Output
Source    DF   F Value   Pr > F
Model      2    542.33   <.0001
Error     15
C Total   17

Root MSE   12.66267
R-Square   0.9864
9
Output (2)
Var      Par Est   St Err        t   Pr > |t|
Int      -205.72       11   -18.06     <.0001
income     6.288     0.20    30.80     <.0001
risk       4.738      1.3     3.44     0.0037
10
Plot the residuals vs each independent variable
From the regression of Y on X1 and X2, we plot the residuals against each independent variable. The plot of the residuals against X1 (income) indicates a curvilinear effect, so we check further by looking at the partial regression plot.
11
Plot the residuals vs Risk
12
Plot the residuals vs income
13
The partial regression plots
To generate the partial regression plot for X1:
– Regress Y and X1 each on X2
– Get the residuals from each regression, namely e(Y|X2) and e(X1|X2)
– Plot e(Y|X2) against e(X1|X2)
Do the same for X2: regress Y and X2 each on X1, then plot e(Y|X1) against e(X2|X1).
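The residuals behind the partial regression plot for X1 can be sketched in Python with NumPy. This is only an illustration, not the SAS code used for the slides: the data array is transcribed from the slide 4 table, and the helper name residuals is made up for this example. The least-squares slope of e(Y|X2) on e(X1|X2) should agree with the income coefficient from the full regression, which is why the plot shows the effect of X1 adjusted for X2.

```python
import numpy as np

# Life insurance data as read from the slide 4 table
# (columns: income X1, risk aversion X2, life insurance Y).
data = np.array([
    [66.290, 7, 240], [40.964, 5, 73], [72.996, 10, 311],
    [45.010, 6, 91], [57.204, 4, 162], [26.852, 5, 11],
    [38.122, 4, 54], [35.840, 6, 53], [75.796, 9, 326],
    [37.408, 5, 55], [54.376, 2, 130], [46.186, 7, 112],
    [46.130, 4, 91], [30.366, 3, 14], [39.060, 5, 63],
    [79.380, 1, 316], [52.766, 8, 154], [55.916, 6, 164],
])
x1, x2, y = data[:, 0], data[:, 1], data[:, 2]


def residuals(response, *predictors):
    """Residuals from a least-squares fit of response on an intercept plus predictors.

    Hypothetical helper written for this sketch.
    """
    X = np.column_stack([np.ones(len(response)), *predictors])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return response - X @ coef


e_y_x2 = residuals(y, x2)    # e(Y | X2)
e_x1_x2 = residuals(x1, x2)  # e(X1 | X2)

# Slope of the line through the origin in the partial regression plot.
slope = (e_x1_x2 @ e_y_x2) / (e_x1_x2 @ e_x1_x2)

# Coefficients from the full regression of Y on X1 and X2, for comparison.
X_full = np.column_stack([np.ones(len(y)), x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

print("partial regression slope for income:", slope)
print("income coefficient in full model:   ", b_full[1])
```

Plotting e_y_x2 against e_x1_x2 with any plotting library gives the partial regression plot for income; swapping the roles of x1 and x2 gives the one for risk.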
14
The partial regression plots (2)
15
The partial regression plots(3)
16
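Residuals
There are several versions
– Residuals: e_i = Y_i – Ŷ_i
– Studentized residuals: e_i / √MSE (NKNW call this scaled form the semistudentized residual)
– Deleted residuals: d_i = Y_i – Ŷ_i(i) = e_i / (1 – h_ii), where h_ii is the leverage
– Studentized deleted residuals: d_i* = d_i / s(d_i), where
  s²(d_i) = MSE_(i) (1 + X_i′ (X_(i)′ X_(i))⁻¹ X_i)
or equivalently
  d_i* = e_i / √(MSE_(i) (1 – h_ii))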
17
Residuals (2)
We use the notation (i) to indicate that case i has been deleted from the computations
– X_(i) is the X matrix with case i deleted
– MSE_(i) is the MSE with case i deleted
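As a hedged illustration of the (i) notation, the sketch below computes the studentized deleted residuals twice: once by actually deleting each case and refitting, using s²(d_i) = MSE_(i) (1 + X_i′ (X_(i)′ X_(i))⁻¹ X_i), and once with the standard one-fit shortcut d_i* = e_i √((n – p – 1) / (SSE (1 – h_ii) – e_i²)). The data are transcribed from the slide 4 table, and the variable names are chosen for this example only.

```python
import numpy as np

# Life insurance data as read from the slide 4 table
# (columns: income X1, risk aversion X2, life insurance Y).
data = np.array([
    [66.290, 7, 240], [40.964, 5, 73], [72.996, 10, 311],
    [45.010, 6, 91], [57.204, 4, 162], [26.852, 5, 11],
    [38.122, 4, 54], [35.840, 6, 53], [75.796, 9, 326],
    [37.408, 5, 55], [54.376, 2, 130], [46.186, 7, 112],
    [46.130, 4, 91], [30.366, 3, 14], [39.060, 5, 63],
    [79.380, 1, 316], [52.766, 8, 154], [55.916, 6, 164],
])
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
y = data[:, 2]
n, p = X.shape

# Full-data fit: hat matrix, leverages, residuals, SSE.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
e = y - H @ y
sse = e @ e

# Shortcut: no refitting needed.
t_short = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e**2))

# Brute force: delete case i, refit, and standardize the deleted residual.
t_brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    X_i, y_i = X[keep], y[keep]            # X_(i) and the matching responses
    b_i, *_ = np.linalg.lstsq(X_i, y_i, rcond=None)
    r = y_i - X_i @ b_i
    mse_i = r @ r / (n - 1 - p)            # MSE_(i)
    d_i = y[i] - X[i] @ b_i                # deleted residual
    s2_d = mse_i * (1 + X[i] @ np.linalg.inv(X_i.T @ X_i) @ X[i])
    t_brute[i] = d_i / np.sqrt(s2_d)

print(np.round(t_short, 3))
```

The two vectors should agree to rounding error, so in practice only the single full-data fit is needed.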
18
Residuals (3)
When we examine the residuals we are looking for
– Outliers
– Non-normal error distributions
– Influential observations
19
Hat matrix diagonals
– h_ii is a measure of how much Y_i contributes to its own fitted value Ŷ_i
– For example, Ŷ_1 = h_11 Y_1 + h_12 Y_2 + h_13 Y_3 + …
– h_ii is sometimes called the leverage of the i-th observation
20
Hat matrix diagonals (2)
– 0 ≤ h_ii ≤ 1
– Σ h_ii = p, the number of regression parameters (including the intercept)
– We would like h_ii to be small
– The average value is p/n
– Values far above this average point to cases that should be examined carefully
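A minimal NumPy sketch of these properties, assuming the data as transcribed from the slide 4 table: it builds H = X (X′X)⁻¹ X′, takes the diagonal, and flags cases with leverage above 2p/n. The 2p/n cutoff is a common rule of thumb and is an addition here, not something stated on the slides. Case numbers follow the rows as transcribed, which may not match the observation order in the SAS listing.

```python
import numpy as np

# Life insurance data as read from the slide 4 table
# (columns: income X1, risk aversion X2, life insurance Y).
data = np.array([
    [66.290, 7, 240], [40.964, 5, 73], [72.996, 10, 311],
    [45.010, 6, 91], [57.204, 4, 162], [26.852, 5, 11],
    [38.122, 4, 54], [35.840, 6, 53], [75.796, 9, 326],
    [37.408, 5, 55], [54.376, 2, 130], [46.186, 7, 112],
    [46.130, 4, 91], [30.366, 3, 14], [39.060, 5, 63],
    [79.380, 1, 316], [52.766, 8, 154], [55.916, 6, 164],
])
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
y = data[:, 2]
n, p = X.shape

# Hat matrix: fitted values are y_hat = H @ y, so row i of H holds the
# weights h_i1, h_i2, ... that each Y_j receives in the fitted value for case i.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)  # leverages h_ii

print("sum of leverages:", h.sum(), "(should equal p =", p, ")")
print("average leverage p/n:", p / n)

# Assumed rule of thumb: flag cases with leverage above twice the average.
flagged = np.flatnonzero(h > 2 * p / n) + 1  # 1-based row numbers
print("rows with h_ii > 2p/n:", flagged)
```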
21
Hat diagonals
Obs   Hat Diag H
1     0.0693
2     0.1006
3     0.1890
4     0.1316
5     0.0756
22
DFFITS
– A measure of the influence of case i on Ŷ_i
– It is a standardized version of the difference between Ŷ_i computed with and without case i
– It is closely related to h_ii
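The link to h_ii can be made concrete with a short NumPy sketch, again using the data as transcribed from the slide 4 table. It computes DFFITS by refitting without case i, standardizing with √(MSE_(i) h_ii), and compares the result with the standard shortcut DFFITS_i = d_i* √(h_ii / (1 – h_ii)), where d_i* is the studentized deleted residual. Variable names are illustrative.

```python
import numpy as np

# Life insurance data as read from the slide 4 table
# (columns: income X1, risk aversion X2, life insurance Y).
data = np.array([
    [66.290, 7, 240], [40.964, 5, 73], [72.996, 10, 311],
    [45.010, 6, 91], [57.204, 4, 162], [26.852, 5, 11],
    [38.122, 4, 54], [35.840, 6, 53], [75.796, 9, 326],
    [37.408, 5, 55], [54.376, 2, 130], [46.186, 7, 112],
    [46.130, 4, 91], [30.366, 3, 14], [39.060, 5, 63],
    [79.380, 1, 316], [52.766, 8, 154], [55.916, 6, 164],
])
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
y = data[:, 2]
n, p = X.shape

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
y_hat = H @ y
e = y - y_hat
sse = e @ e

# Brute force: change in the fitted value for case i when case i is deleted,
# divided by sqrt(MSE_(i) * h_ii).
dffits_brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    r = y[keep] - X[keep] @ b_i
    mse_i = r @ r / (n - 1 - p)
    dffits_brute[i] = (y_hat[i] - X[i] @ b_i) / np.sqrt(mse_i * h[i])

# Shortcut from the full fit only.
t = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e**2))  # studentized deleted residuals
dffits_short = t * np.sqrt(h / (1 - h))

print(np.round(dffits_short, 3))
```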
23
Cook's Distance
– A measure of the influence of case i on all of the Ŷ_i's
– It is a standardized version of the sum of squares of the differences between the predicted values computed with and without case i
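A sketch of that definition in NumPy, assuming the data as transcribed from the slide 4 table: the sum of squared changes in all n fitted values when case i is deleted is divided by p · MSE, and the result is checked against the standard one-fit form D_i = e_i² h_ii / (p · MSE · (1 – h_ii)²). Names are illustrative.

```python
import numpy as np

# Life insurance data as read from the slide 4 table
# (columns: income X1, risk aversion X2, life insurance Y).
data = np.array([
    [66.290, 7, 240], [40.964, 5, 73], [72.996, 10, 311],
    [45.010, 6, 91], [57.204, 4, 162], [26.852, 5, 11],
    [38.122, 4, 54], [35.840, 6, 53], [75.796, 9, 326],
    [37.408, 5, 55], [54.376, 2, 130], [46.186, 7, 112],
    [46.130, 4, 91], [30.366, 3, 14], [39.060, 5, 63],
    [79.380, 1, 316], [52.766, 8, 154], [55.916, 6, 164],
])
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
y = data[:, 2]
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
h = np.diag(X @ XtX_inv @ X.T)
y_hat = X @ (XtX_inv @ X.T @ y)
e = y - y_hat
mse = e @ e / (n - p)

# Definition: squared change in ALL fitted values when case i is deleted.
cooks_brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    diff = y_hat - X @ b_i
    cooks_brute[i] = diff @ diff / (p * mse)

# Equivalent form that needs only the full fit.
cooks_short = e**2 * h / (p * mse * (1 - h) ** 2)

print(np.round(cooks_short, 3))
```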
24
DFBETAS
– A measure of the influence of case i on each of the regression coefficients
– It is a standardized version of the difference between the regression coefficient computed with and without case i
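A minimal NumPy sketch, assuming the data as transcribed from the slide 4 table. Each coefficient change b_k – b_k(i) is divided by √(MSE_(i) c_kk), where c_kk is the k-th diagonal element of (X′X)⁻¹; this is the usual standardization and is stated here as an assumption, since the slide does not give the formula. The raw changes are also compared with the closed form (X′X)⁻¹ X_i e_i / (1 – h_ii).

```python
import numpy as np

# Life insurance data as read from the slide 4 table
# (columns: income X1, risk aversion X2, life insurance Y).
data = np.array([
    [66.290, 7, 240], [40.964, 5, 73], [72.996, 10, 311],
    [45.010, 6, 91], [57.204, 4, 162], [26.852, 5, 11],
    [38.122, 4, 54], [35.840, 6, 53], [75.796, 9, 326],
    [37.408, 5, 55], [54.376, 2, 130], [46.186, 7, 112],
    [46.130, 4, 91], [30.366, 3, 14], [39.060, 5, 63],
    [79.380, 1, 316], [52.766, 8, 154], [55.916, 6, 164],
])
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
y = data[:, 2]
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
h = np.diag(X @ XtX_inv @ X.T)
c = np.diag(XtX_inv)  # c_kk, one per coefficient

delta_brute = np.empty((n, p))   # b - b_(i), by refitting
delta_closed = np.empty((n, p))  # same quantity from the full fit
dfbetas = np.empty((n, p))       # rows: cases, columns: intercept, income, risk
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    r = y[keep] - X[keep] @ b_i
    mse_i = r @ r / (n - 1 - p)
    delta_brute[i] = b - b_i
    delta_closed[i] = (XtX_inv @ X[i]) * e[i] / (1 - h[i])
    dfbetas[i] = delta_brute[i] / np.sqrt(mse_i * c)

print(np.round(dfbetas, 3))
```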
25
Variance Inflation Factor
– The VIF measures how much the variance of an estimated regression coefficient is inflated by correlation among the explanatory variables
– We calculate it for each explanatory variable: VIF_k = 1 / (1 – R²_k)
– One suggested rule is that a value of 10 or more indicates excessive multicollinearity
26
Tolerance
– TOL = 1 – R²_k, where R²_k is the squared multiple correlation obtained in a regression where all other explanatory variables are used to predict X_k
– TOL = 1/VIF
– Described in comment on p 411
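Tolerance and VIF can be sketched directly from this definition. The NumPy example below, using the predictors as transcribed from the slide 4 table, regresses each X_k on the other explanatory variable to get R²_k, then forms TOL = 1 – R²_k and VIF = 1/TOL. As a cross-check it also reads the VIFs off the diagonal of the inverse correlation matrix of the predictors, a standard identity. With only two predictors, both tolerances equal 1 minus the squared correlation between income and risk, so they come out the same.

```python
import numpy as np

# Life insurance data as read from the slide 4 table
# (columns: income X1, risk aversion X2, life insurance Y).
data = np.array([
    [66.290, 7, 240], [40.964, 5, 73], [72.996, 10, 311],
    [45.010, 6, 91], [57.204, 4, 162], [26.852, 5, 11],
    [38.122, 4, 54], [35.840, 6, 53], [75.796, 9, 326],
    [37.408, 5, 55], [54.376, 2, 130], [46.186, 7, 112],
    [46.130, 4, 91], [30.366, 3, 14], [39.060, 5, 63],
    [79.380, 1, 316], [52.766, 8, 154], [55.916, 6, 164],
])
Xp = data[:, :2]  # explanatory variables only: income, risk
n, k = Xp.shape

tol = np.empty(k)
for j in range(k):
    # Regress X_j on an intercept plus all the other explanatory variables.
    others = np.delete(Xp, j, axis=1)
    A = np.column_stack([np.ones(n), others])
    coef, *_ = np.linalg.lstsq(A, Xp[:, j], rcond=None)
    resid = Xp[:, j] - A @ coef
    centered = Xp[:, j] - Xp[:, j].mean()
    r2_j = 1 - (resid @ resid) / (centered @ centered)
    tol[j] = 1 - r2_j

vif = 1 / tol

# Cross-check: VIFs are the diagonal of the inverse correlation matrix.
vif_corr = np.diag(np.linalg.inv(np.corrcoef(Xp, rowvar=False)))

for name, t_j, v_j in zip(["income", "risk"], tol, vif):
    print(f"{name:8s} tolerance = {t_j:.5f}   VIF = {v_j:.5f}")
```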
27
Output (Tolerance)
Variable    Tolerance
Intercept   .
income      0.93524
risk        0.93524
28
Last slide Read NKNW Chapter 11