Published by Kerry Mosley. Modified over 9 years ago.
Week 5, Slide #1: Adjusted R², Residuals, and Review
Adjusted R²
Residual Analysis
Stata Regression Output revisited
–The Overall Model
–Analyzing Residuals
Review for Exam 2
Week 5, Slide #2: Exercise Review
–Use the caschool.dta dataset
–Run a model in Stata using Average Income (avginc) to predict Average Test Scores (testscr)
–Examine the univariate distributions of both variables and the residuals
Walk through the entire interpretation
Build a Stata do-file as you go
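A do-file for this exercise might begin along the following lines. This is a minimal sketch, not the instructor's file: it assumes caschool.dta sits in the working directory, and the log file name and graph names are hypothetical.

```stata
* Week 5 exercise: opening section of a do-file (sketch)
capture log close
log using week5_exercise.log, replace text   // hypothetical log file name

use caschool.dta, clear                      // assumes the file is in the working directory

* Univariate distributions of the outcome and the predictor
summarize testscr avginc, detail
histogram testscr, normal name(h_testscr, replace)
histogram avginc, normal name(h_avginc, replace)

log close
```

The regression and the residual checks would be appended to the same file as the exercise proceeds.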
Week 5, Slide #3: Exercise Review, continued
Week 5, Slide #4: Exercise Review, continued
Week 5, Slide #5: Adjusted R²: An Alternative “Goodness of Fit” Measure
Recall that R² is calculated as: R² = ESS / TSS, where ESS is the “explained sum of squares” and TSS is the “total sum of squares”
Hypothetically, as K approaches n, R² approaches one (why?)
–“degrees of freedom”
Adjusted R² compensates for that tendency
Week 5, Slide #6: Calculating Adjusted R²
The bigger the sample size (n), the smaller the adjustment
The more complex the model (the bigger K is), the larger the adjustment
The bigger R² is, the smaller the adjustment
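The formula on this slide did not survive extraction. A common textbook form is sketched below, on the assumption that K counts the slope coefficients and excludes the intercept; if K is meant to include the intercept, the denominator becomes n - K instead. The second line is the size of the adjustment, from which the three statements above can be read off directly.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - K - 1}
\qquad\Longrightarrow\qquad
R^2 - \bar{R}^2 = (1 - R^2)\,\frac{K}{n - K - 1}
```

The adjustment shrinks as n grows, grows as K grows, and shrinks as R² approaches one.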
Week 5, Slide #7: Residual Analysis: Troubleshooting
Conceptual use of residuals
–e, or what the model can’t explain
Visual Diagnostics
–Ideal: a “Sneeze plot”
–Diagnostics using Residual Plots:
  Checking for heteroscedasticity
  Checking for non-linearity
  Checking for outliers
Saving and Analyzing Residuals in Stata
Week 5, Slide #8: Review: Assumptions Necessary for Estimating Linear Models
1. Errors have identical distributions
   Zero mean, same variance, across the range of X
2. Errors are independent of X and other ε_i
3. Errors are normally distributed
[Figure: errors plotted against X, centered on ε_i = 0]
Week 5, Slide #9: The Ideal: Sneeze Splatter
[Figure: residuals (e) plotted against Predicted Y]
Problems: It is possible to “over-interpret” residual plots; it is also possible to miss patterns when there are large numbers of observations
Week 5, Slide #10: Heteroscedasticity
[Figure: residuals (e) plotted against Predicted Y]
Problem: The error variance is not constant, so the estimated standard errors are biased and hypothesis tests are invalid
Week 5, Slide #11: Non-Linearity
[Figure: residuals (e) plotted against Predicted Y]
Problem: Biased estimated coefficients, inefficient model
Week 5, Slide #12: Checking for Outliers
[Figure: residuals (e) plotted against Predicted Y in two panels: “Residuals for model using all data”, with “Possible Outliers” marked, and “Residuals for model with outliers deleted”]
Problem: Under-specified model; measurement error
Week 5, Slide #13: Stata Regression Model: Regressing “testscr” on “avginc”
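The regression output shown on this slide was an image and is not reproduced here. The commands below are a sketch of how the model could be run and how the two fit measures could be pulled from Stata's stored results. The hand calculation assumes the adjustment 1 - (1 - R²)(n - 1)/(n - K - 1), with K taken from e(df_m).

```stata
use caschool.dta, clear          // assumes the file is in the working directory
regress testscr avginc

* Stored results left behind by regress
display "R-squared          = " e(r2)
display "Adjusted R-squared = " e(r2_a)

* Hand calculation (assumed formula; e(df_m) is the number of predictors K)
display "By hand            = " 1 - (1 - e(r2)) * (e(N) - 1) / (e(N) - e(df_m) - 1)
```

The last two displayed values should agree, which is a quick check on the formula.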
Week 5, Slide #14: Regression Plot (again)
Week 5, Slide #15: Residual Plot
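The plot itself is missing from this transcript. One way to save the residuals and draw comparable plots is sketched below. The variable names yhat and e are chosen to match the listing on the next slide, and the exact graph the slide showed is not known.

```stata
use caschool.dta, clear          // assumes the file is in the working directory
quietly regress testscr avginc

predict yhat, xb                 // fitted (predicted) values
predict e, residuals             // residuals: what the model can't explain

histogram e, normal              // univariate distribution of the residuals
rvfplot, yline(0)                // residuals versus fitted values
```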
Week 5, Slide #16: Examination of Residuals
gsort -e (largest residuals first, as in the listing below; “gsort e” sorts smallest first)
. list observat testscr avginc yhat e in 1/5

     +---------------------------------------------------+
     | observat   testscr   avginc       yhat          e |
     |---------------------------------------------------|
  1. |      393     683.4   13.567   650.8699   32.53016 |
  2. |      386     681.6   14.177   652.0157    29.5842 |
  3. |      419     672.2    9.952   644.0789   28.12111 |
  4. |      366     675.7   11.834   647.6143   28.08568 |
  5. |      371    676.95   12.934   649.6807   27.26921 |
     +---------------------------------------------------+

Use the case ID number to find the relevant observation in the data set
Week 5, Slide #17: Residuals v. Predicted Values
Using an “ocular test,” non-linearity seems probable, but heteroscedasticity is not obvious here. But should we trust our eyeballs?
Week 5, Slide #18: Formal Test for Non-linearity: Omitted Variables
Tests whether adding 2nd, 3rd and 4th powers of X will improve the fit of the model:
Y = b₀ + b₁X + b₂X² + b₃X³ + b₄X⁴ + e
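The test output on this slide is not in the transcript. The description matches Stata's Ramsey RESET test, which is run with estat ovtest after regress; that this is the command the slide used is an assumption. By default the test adds powers of the fitted values, and the rhs option adds powers of the right-hand-side variables instead. With a single predictor the two span the same polynomial terms.

```stata
use caschool.dta, clear          // assumes the file is in the working directory
quietly regress testscr avginc

estat ovtest                     // RESET using powers of the fitted values
estat ovtest, rhs                // RESET using powers of avginc itself
```

A small p-value rejects the null hypothesis of no omitted variables, which here points toward a non-linear relationship.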
Week 5, Slide #19: Formal Tests for Heteroscedasticity
Tests to see whether the squared standardized residuals are linearly related to the predicted value of Y:
std(e²) = b₀ + b₁(Predicted Y)
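The description corresponds to the Breusch-Pagan / Cook-Weisberg test, which Stata runs with estat hettest after regress. The slide's own output is not available, so treating this as the command it used is an assumption. By default the test uses the fitted values of the dependent variable.

```stata
use caschool.dta, clear          // assumes the file is in the working directory
quietly regress testscr avginc

estat hettest                    // H0: constant error variance
```

A small p-value rejects constant variance, which is evidence of heteroscedasticity.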
Week 5, Slide #20: Case-wise Influence Analysis
The Leverage versus Squared Residual Plot
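The plot is missing from the transcript. A leverage-versus-squared-residual plot can be drawn with lvr2plot after regress, as in the sketch below. Labeling points with observat, the case ID from the earlier residual listing, is an added convenience and may not match the original figure.

```stata
use caschool.dta, clear          // assumes the file is in the working directory
quietly regress testscr avginc

lvr2plot, mlabel(observat)       // leverage vs. normalized residual squared, labeled by case ID
```

Cases far to the right have large residuals, cases high up have high leverage, and cases that are both deserve a closer look.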
Week 5, Slide #21: What to Do?
Nonlinearity
–Polynomial regression: try X and X²
–Variable transformation: logged variables
–Use non-OLS regression (curve fitting)
Heteroscedasticity
–Re-specify model
  Omitted variables?
–Use non-OLS regression (WLS)
–Use robust standard errors
Influential and Deviant Cases
–Evaluate the cases
–Run with controls (multivariate model)
–Omit cases (last option)
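A few of these remedies can be sketched for the running example as follows. The new variable names (avginc2, ln_avginc) are hypothetical, and which remedy suits these data is a judgment call the slide leaves open.

```stata
use caschool.dta, clear          // assumes the file is in the working directory

* Nonlinearity: polynomial regression with X and X-squared
generate avginc2 = avginc^2
regress testscr avginc avginc2

* Nonlinearity: logged predictor
generate ln_avginc = ln(avginc)
regress testscr ln_avginc

* Heteroscedasticity: same model, robust standard errors
regress testscr avginc, vce(robust)
```

Robust standard errors leave the coefficients unchanged and alter only the standard errors and the tests built on them.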
Week 5, Slide #22: Next Week
Review regression diagnostics
Introduction to Matrix Algebra
Review for Exam