VARIABLE MISSPECIFICATION III: CONSEQUENCES FOR DIAGNOSTICS

. reg LGEARN S WEIGHT85
[Stata output table: F(2, 537); coefficient rows for S, WEIGHT85, _cons; numerical values not preserved]

Here is a regression of the logarithm of hourly earnings on years of schooling and weight in pounds. The weight coefficient implies that an extra pound leads to a 0.24% increase in earnings, so four extra pounds lead to roughly a 1% increase. Can you really believe this?
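The arithmetic behind the "four pounds for 1%" reading can be written out as a short sketch. It assumes a WEIGHT85 coefficient of about 0.0024, which is the value implied by the 0.24% figure quoted above; the symbols \beta_1, \beta_S and \beta_W are introduced here for illustration only.

```latex
\[
\log(\mathit{EARN}) = \beta_1 + \beta_S\, S + \beta_W\, \mathit{WEIGHT85} + u,
\qquad
\frac{\Delta \mathit{EARN}}{\mathit{EARN}} \approx \beta_W\, \Delta \mathit{WEIGHT85}.
\]
\[
\beta_W \approx 0.0024:\qquad
1\ \text{lb} \;\Rightarrow\; 0.24\%,
\qquad
4\ \text{lb} \;\Rightarrow\; 4 \times 0.24\% = 0.96\% \approx 1\%.
\]
\[
\text{Exact proportional change for 4 lb: } e^{4 \times 0.0024} - 1 \approx 0.0096 .
\]
```

The approximation and the exact figure agree to two decimal places because the coefficient is small.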
Perhaps not, but the t statistic for WEIGHT85 is very highly significant. What is going on?
Older people tend to have more work experience, which increases their earnings. They also tend to weigh more. This could be an explanation: weight may be acting as a proxy for the omitted work experience.
. reg LGEARN S EXP WEIGHT85
[Stata output table: F(3, 536); coefficient rows for S, EXP, WEIGHT85, _cons; numerical values not preserved]

Here we have controlled for work experience. The weight coefficient is smaller, but still almost significant at the 1% level. Can you think of any other variable that might be correlated with both earnings and weight?
. reg LGEARN S EXP MALE WEIGHT85
[Stata output table: F(4, 535); coefficient rows for S, EXP, MALE, WEIGHT85, _cons; numerical values not preserved]

The MALE dummy is such a variable. When it is included, the weight effect disappears.
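The same pattern can be reproduced with simulated data. The do-file below is a minimal sketch, not the data set used in the slides: the lowercase variable names, the seed, and every parameter value are assumptions chosen for illustration. Weight is built to depend on sex but to have no effect of its own on earnings.

```stata
* Simulated illustration only: all numbers below are assumptions,
* not estimates from the data set used in the slides.
clear
set seed 12345
set obs 540

* MALE dummy: roughly half the sample
generate male = runiform() < 0.5

* Years of schooling, unrelated to sex or weight in this toy setup
generate s = 12 + round(rnormal(0, 2))

* Weight depends on sex but has no effect of its own on earnings
generate weight = 150 + 30*male + rnormal(0, 25)

* Log earnings depend on schooling and sex only
generate lgearn = 1 + 0.1*s + 0.3*male + rnormal(0, 0.5)

* Misspecified model: male is omitted, so weight picks up part of its effect
regress lgearn s weight

* male included: the true weight coefficient is zero by construction
regress lgearn s male weight
```

In the first regression the weight coefficient tends to be positive with a large t statistic; in the second it tends to be close to zero. Any single draw can differ, so it is worth rerunning with several seeds.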
The point of this example is that model misspecification, whether variable misspecification or any other kind, will in general invalidate the regression diagnostics, and as a consequence the diagnostics may lead you to the wrong conclusions.
In the original model, we had two kinds of variable misspecification. We omitted EXP and MALE, and we included the irrelevant variable WEIGHT85.
Including an irrelevant variable is one of the few types of misspecification that does not invalidate the regression diagnostics. However, omitting relevant variables certainly does. This is why the t statistic in the original specification misled us.
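The mechanism can be sketched with the standard omitted-variable bias result. The two-regressor setup below is a simplification assumed for illustration (it ignores S and EXP): X_2 stands for WEIGHT85 and X_3 for MALE.

```latex
\[
\text{True model: } Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u,
\qquad
\text{fitted model: } \hat{Y} = b_1 + b_2 X_2 .
\]
\[
E(b_2) = \beta_2 + \beta_3\,
\frac{\sum_i (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}
     {\sum_i (X_{2i} - \bar{X}_2)^2} .
\]
```

If weight is truly irrelevant, then \beta_2 = 0. With \beta_3 > 0 and weight positively correlated with MALE in the sample, E(b_2) is positive even though weight has no effect. The standard error and t statistic are computed as if the fitted model were correct, so they cannot be relied on either.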
Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 6.3 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics or the University of London International Programmes distance learning course EC2020 Elements of Econometrics.