Lab 9 – Regression Diagnostics


November 12, 2015

2 Are the residuals normally distributed?
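Open the data set, drop any variables left over from earlier runs, and fit the model:
use ops2004.dta, clear
capture drop envhat res
regress env_con educat inc com3 hlthprob epht3, beta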

3 Are the residuals normally distributed?
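predict res, residuals
(this command creates a new variable holding each observation's residual from the most recent regression)
summarize res, detail
(the detailed summary includes the skewness and kurtosis of the residuals; for a normal distribution these are 0 and 3)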
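To make those two numbers concrete, here is a small Python sketch (not Stata) that computes skewness and kurtosis from central moments with 1/n denominators. We assume these are the same definitions summarize, detail uses; the residual values are hypothetical.

```python
def skewness_kurtosis(values):
    """Moment-based skewness m3/m2**1.5 and kurtosis m4/m2**2 (normal: 0 and 3)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with 1/n denominators.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Hypothetical symmetric residuals: skewness is 0.
skew, kurt = skewness_kurtosis([-1.0, -0.5, 0.0, 0.5, 1.0])
print(round(skew, 3), round(kurt, 3))

# Hypothetical residuals with one large positive value: positive skewness.
print(skewness_kurtosis([0.0, 0.0, 0.0, 0.0, 10.0]))
```

Values far from 0 (skewness) or 3 (kurtosis) are what the formal test on the next slide checks.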

4 Are the residuals normally distributed?
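sktest res
(this command tests normality based on skewness and kurtosis) Keep in mind that the larger the sample, the more likely the test is to reject normality, even for departures too small to matter. At the same time, a large sample means that the sampling distribution of the coefficient estimates approaches normality, so moderate non-normality of the residuals is less of a concern.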
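sktest itself uses, to our understanding, the D'Agostino, Belanger and D'Agostino test with an adjustment by Royston, which is more involved. As a simpler stand-in (a different test, swapped in only for illustration), the Jarque-Bera statistic combines the same two quantities and shows the sample-size effect directly: for a fixed skewness and kurtosis it grows in proportion to n. The skewness and kurtosis values below are hypothetical.

```python
import math

def jarque_bera(n, skew, kurt):
    """JB = n/6 * (S**2 + (K - 3)**2 / 4); approx. chi-squared(2) under normality."""
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # The chi-squared(2) upper-tail probability has the closed form exp(-x/2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# The same mild departure from normality (skewness 0.2, kurtosis 3.4) at two sample sizes.
for n in (100, 10_000):
    jb, p = jarque_bera(n, 0.2, 3.4)
    print(n, round(jb, 2), p)
```

With 100 cases the departure is not significant; with 10,000 cases the identical shape is rejected at any conventional level.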

5 Are the residuals normally distributed?
rvfplot
(plots the residuals against the fitted values; ideally the points scatter evenly around zero with no visible pattern)
Alternative way: Statistics -> Linear models and related -> Regression diagnostics -> Residual-versus-fitted plot

6 Are the residuals normally distributed?
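regress env_con educat inc com3 hlthprob epht3, beta
predict envhat
preserve
set seed 111
sample 100, count
twoway (scatter env_con envhat) (lfit env_con envhat)
restore
(predict without options stores the fitted values. preserve saves a copy of the data; sample 100, count keeps a random 100 observations so the scatterplot stays readable, and set seed makes that draw reproducible. restore brings back the full data set.)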

7 Are the residuals normally distributed?
Scattergram: predicted versus actual values of the dependent variable
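In the full estimation sample, the lfit line of actual on predicted values has intercept 0 and slope 1 by construction, because OLS residuals average zero and are uncorrelated with the fitted values. With the random 100-case subsample this holds only approximately, so the plot is mainly useful for judging how widely cases scatter around the line. A Python sketch with a single hypothetical predictor:

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope of ys regressed on xs."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - slope * xbar, slope

# Hypothetical predictor and outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.5, 11.7]

a, b = fit_line(x, y)
yhat = [a + b * xi for xi in x]   # fitted values, as predict would store them
a2, b2 = fit_line(yhat, y)        # the line lfit draws through (yhat, y)
print(round(a2, 6), round(b2, 6))
```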

8 What to do when distribution of residuals is problematic?
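Option 1: robust standard errors
regress env_con educat inc com3 hlthprob epht3, vce(robust)
This option estimates the variance-covariance matrix of the coefficient estimates with the Huber-White (sandwich) estimator, which does not rely on the assumption that the errors are identically distributed with constant variance.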

9 What to do when distribution of residuals is problematic?
The coefficients are identical; only the standard errors and t-values (and hence p-values and confidence intervals) differ.
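For a one-predictor model both kinds of standard error can be written out directly. The Python sketch below (hypothetical data, one predictor only) computes the classical OLS standard error of the slope and a robust one using the HC1 sandwich formula, which we understand to be the small-sample-adjusted version regress uses with vce(robust). The slope is the same either way; only its standard error changes.

```python
import math

def slope_with_ses(xs, ys):
    """OLS slope with classical and HC1 (robust) standard errors, one predictor."""
    n, k = len(xs), 2                 # k = parameters: intercept and slope
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    e = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Classical: one common error variance s^2 for every case.
    s2 = sum(r * r for r in e) / (n - k)
    se_classical = math.sqrt(s2 / sxx)
    # Sandwich: each case contributes its own squared residual, scaled by n/(n-k).
    meat = sum((x - xbar) ** 2 * r * r for x, r in zip(xs, e))
    se_robust = math.sqrt(n / (n - k) * meat / sxx ** 2)
    return b, se_classical, se_robust

# Hypothetical data whose noise grows with x (heteroskedastic errors).
x = [float(i) for i in range(1, 11)]
y = [3.0 + 2.0 * xi + (-1) ** i * 0.2 * xi for i, xi in enumerate(x)]
b, se_c, se_r = slope_with_ses(x, y)
print(round(b, 4), round(se_c, 4), round(se_r, 4))
```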

10 What to do when distribution of residuals is problematic?
Option 2: bootstrap estimation of standard errors
regress env_con educat inc com3 hlthprob epht3, vce(bootstrap, reps(1000))
Stata draws 1,000 random samples with replacement, each the same size as the estimation sample, and fits the regression to each one. The standard deviation of each coefficient across these replications is then used as its standard error. (Run set seed first if you want the results to be reproducible.)

11 What to do when distribution of residuals is problematic?
Again, the coefficients are unchanged; only the standard errors and test statistics differ.
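The resampling procedure described for option 2 can be sketched in Python for the slope of a one-predictor model: resample whole cases with replacement, refit, and take the standard deviation of the slopes. The data are hypothetical, and seed 111 only mirrors the earlier set seed 111; Python's random generator is not Stata's, so the numbers will not match Stata output.

```python
import math
import random

def slope(xs, ys):
    """OLS slope of ys on xs, or None if all xs are equal."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        return None
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx

def bootstrap_se(xs, ys, reps=1000, seed=111):
    """Standard deviation of the slope across `reps` case resamples."""
    rng = random.Random(seed)
    n = len(xs)
    draws = []
    for _ in range(reps):
        # Resample whole cases (x, y pairs) with replacement, same size as the data.
        idx = [rng.randrange(n) for _ in range(n)]
        b = slope([xs[i] for i in idx], [ys[i] for i in idx])
        if b is not None:  # skip the (very unlikely) resample with only one x value
            draws.append(b)
    mean = sum(draws) / len(draws)
    return math.sqrt(sum((b - mean) ** 2 for b in draws) / (len(draws) - 1))

# Hypothetical data.
x = [float(i) for i in range(1, 21)]
y = [2.0 * xi + (-1) ** i * 0.1 * xi for i, xi in enumerate(x)]
print(round(slope(x, y), 4), round(bootstrap_se(x, y), 4))
```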

12 Outliers
Outliers are cases that the model predicts poorly, that is, cases with large residuals.
regress env_con educat inc com3 hlthprob epht3, beta
predict yhat
predict residual, residuals
predict rstandard, rstandard
list respnum env_con yhat residual rstandard if abs(rstandard) > 2.58 & rstandard < .
(2.58 is the z score for a two-tailed test at the 0.01 level of significance; rstandard < . excludes missing values, which Stata treats as larger than any number.)
Alternative way: Statistics -> Postestimation -> Predictions -> Predictions and …
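A Python sketch of the same screening for a one-predictor model: compute standardized residuals e_i / (s * sqrt(1 - h_i)), where h_i is the case's leverage, and flag those beyond the two-tailed 0.01 cutoff (about 2.58). The data are hypothetical: an exact line with one case shifted upward.

```python
import math
from statistics import NormalDist

def standardized_residuals(xs, ys):
    """e_i / (s * sqrt(1 - h_i)) for a one-predictor OLS fit with intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    e = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r * r for r in e) / (n - 2))
    # Leverage of each case in a one-predictor regression.
    h = [1.0 / n + (x - xbar) ** 2 / sxx for x in xs]
    return [r / (s * math.sqrt(1.0 - hi)) for r, hi in zip(e, h)]

# Two-tailed 0.01 level: the 99.5th percentile of the standard normal.
cutoff = NormalDist().inv_cdf(0.995)

# Hypothetical data on an exact line, except case 9 shifted up by 15.
x = [float(i) for i in range(1, 21)]
y = [2.0 * xi for xi in x]
y[9] += 15.0

r = standardized_residuals(x, y)
flagged = [i for i, ri in enumerate(r) if abs(ri) > cutoff]
print(round(cutoff, 4), flagged)
```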

13 Outliers

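14 Influential cases
dfbeta
Alternative way: Statistics -> Linear models and related -> Regression diagnostics -> DFBETAs
This command creates the DFBETAs as new variables (_dfbeta_1, _dfbeta_2, and so on, one for each independent variable). A case is considered influential on a coefficient when the absolute value of its DFBETA exceeds 2/√N, where N is the number of observations (3769 here).
Next:
list respnum rstandard _dfbeta_1 if abs(_dfbeta_1) > 2/sqrt(3769) & _dfbeta_1 < .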
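The DFBETA idea can be made concrete by brute force: refit the model without each case and measure how far the coefficient moves. The Python sketch below does this for the slope of a one-predictor model with hypothetical data, scaling the change by the slope's standard error computed with the residual SD from the fit that omits the case (the Belsley, Kuh and Welsch scaled DFBETA; we assume this corresponds to what Stata's dfbeta reports).

```python
import math

def ols(xs, ys):
    """Intercept, slope, residual SD s, and Sxx for a one-predictor OLS fit."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(sse / (n - 2)), sxx

def dfbetas_slope(xs, ys):
    """Scaled change in the slope when each case is deleted in turn."""
    _, b, _, sxx = ols(xs, ys)
    out = []
    for i in range(len(xs)):
        # Refit without case i.
        _, b_i, s_i, _ = ols(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # Assumed scaling: s from the reduced fit times sqrt((X'X)^-1) for the slope.
        out.append((b - b_i) / (s_i / math.sqrt(sxx)))
    return out

x = [float(i) for i in range(1, 21)]
y = [2.0 * xi + (-1) ** i * 0.5 for i, xi in enumerate(x)]
y[19] += 10.0                    # hypothetical high-leverage, poorly fitted case

d = dfbetas_slope(x, y)
cutoff = 2 / math.sqrt(len(x))   # 2/sqrt(N); with N = 3769 this is about 0.033
print([i for i, v in enumerate(d) if abs(v) > cutoff])
```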

15 Collinearity and multicollinearity
If two or more independent variables are highly correlated, we cannot tell which of them is having an effect on the dependent variable.
regress env_con educat inc com3 hlthprob epht3, beta
estat vif

16 Collinearity and multicollinearity
Variance inflation factor (VIF): if the value is greater than 10 for any variable, or if the average VIF is substantially greater than 1, there might be a problem. These results do not show any problems. 1/VIF (the tolerance) equals 1 - R², where R² comes from regressing that variable (taken as the dependent variable) on the other independent variables. If 1/VIF is less than 0.10 (equivalently, VIF is greater than 10), there might be a problem.
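With exactly two independent variables, the auxiliary R² in the definition above is simply the squared correlation between them, so VIF and tolerance can be computed by hand. A Python sketch with hypothetical values (with more predictors, the auxiliary regression on all the other predictors is needed instead):

```python
def vif_two_predictors(x1, x2):
    """VIF and tolerance (1/VIF) for x1 when the only other predictor is x2."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    # With one other predictor, the auxiliary R-squared is the squared correlation.
    r_squared = s12 ** 2 / (s11 * s22)
    tolerance = 1.0 - r_squared
    return 1.0 / tolerance, tolerance

print(vif_two_predictors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # correlated predictors
print(vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1]))      # uncorrelated predictors
```

Uncorrelated predictors give the minimum VIF of 1; the VIF rises without bound as the tolerance approaches 0.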

