Microeconometric Modeling
William Greene, Stern School of Business, New York University, New York NY USA
1.1 Descriptive Statistics and Linear Regression
Linear Regression Model
Data Description: Basic Statistics, Tables, Histogram, Box Plot, Kernel Density Estimator
Linear Model: Specification & Estimation, Nonlinearities, Interactions, Inference and Testing (Wald, F, LM), Prediction and Model Fit
Endogeneity: 2SLS, Control Function, Hausman Test
Cornwell and Rupert Panel Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155.
Application: Is there a relationship between (log) Wage and Education? Regression: Least squares
A First Look at the Data
Descriptive Statistics: Basic Measures of Location and Dispersion
Graphical Devices: Box Plots, Histogram, Kernel Density Estimator
Histogram for LWAGE
Box Plots: Shows trend in median log wage
From Jones and Schurer (2011) Stylized Box Plot
Kernel Density Estimator
The kernel density estimator is a histogram (of sorts).
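The "histogram of sorts" idea can be sketched in a few lines. This is a minimal illustration, assuming a Gaussian kernel and Silverman's rule-of-thumb bandwidth (neither is specified on the slide). The data are simulated stand-ins for LWAGE, not the Cornwell and Rupert sample, and the names `kde_gaussian` and `lwage_sim` are hypothetical.

```python
import numpy as np

def kde_gaussian(x, grid, bandwidth=None):
    """Kernel density estimate of the sample x, evaluated at each grid point."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if bandwidth is None:
        # Silverman's rule of thumb (an assumption; other bandwidth choices are common)
        bandwidth = 1.06 * x.std(ddof=1) * n ** (-1 / 5)
    # Scaled distance of every grid point from every observation
    u = (grid[:, None] - x[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    # Average the kernel weights: a histogram with smooth, overlapping "bins"
    return kernel.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
lwage_sim = rng.normal(loc=6.7, scale=0.45, size=500)  # simulated, hypothetical values
grid = np.linspace(lwage_sim.min() - 1.5, lwage_sim.max() + 1.5, 400)
density = kde_gaussian(lwage_sim, grid)

# The estimated density should integrate to roughly 1 over the grid
area = float(np.sum(density) * (grid[1] - grid[0]))
```

Where a histogram counts observations in fixed bins, the estimator above weights every observation by its distance from the evaluation point, so the bandwidth plays the role of the bin width.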
From Jones and Schurer (2011)
Objective: Impact of Education on (log) Wage
Specification: What is the right model to use to analyze this association?
Estimation
Inference
Analysis
Simple Linear Regression LWAGE = 5.8388 + 0.0652*ED
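As a worked illustration of reading the fitted line, the sketch below evaluates it at two education levels. Only the two coefficients come from the slide; the education levels 12 and 16 and the function name `predicted_lwage` are arbitrary choices made for the example.

```python
import math

B0, B1 = 5.8388, 0.0652  # intercept and ED slope reported on the slide

def predicted_lwage(ed):
    """Fitted log wage at a given number of years of education."""
    return B0 + B1 * ed

# Predicted log-wage gap between 16 and 12 years of education: 4 * 0.0652
gap = predicted_lwage(16) - predicted_lwage(12)

# Because the dependent variable is a log, the slope is approximately a
# proportional effect; the exact percentage change per year is 100*(exp(b) - 1)
pct_per_year = 100 * (math.exp(B1) - 1)
```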
Multiple Regression
Nonlinear Specification: Quadratic Effect of Experience
A Model Relating (Log)Wage to Gender and Education & Experience
Partial Effects
Coefficients do not tell the story.
Education: .05654
Experience: .04045 - 2*.00068*Exp
FEM: -.38922
Effect of Experience = .04045 - 2*.00068*Exp. Positive up to about 30 years of experience (the effect crosses zero at .04045/(2*.00068) ≈ 29.7), negative after.
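The sign change can be checked directly from the two reported coefficients. This is a small sketch; the function name `experience_effect` is hypothetical, and the coefficient on EXPSQ is taken to be -.00068, as implied by the expression on the slide.

```python
B_EXP, B_EXPSQ = 0.04045, -0.00068  # coefficients on EXP and EXPSQ from the slide

def experience_effect(exp_years):
    """Partial effect of one more year of experience on expected log wage."""
    return B_EXP + 2 * B_EXPSQ * exp_years

# The effect is zero where B_EXP + 2*B_EXPSQ*Exp = 0
turning_point = -B_EXP / (2 * B_EXPSQ)

effect_at_10 = experience_effect(10)  # positive: early in the career
effect_at_40 = experience_effect(40)  # negative: past the turning point
```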
Interaction Effect Gender Difference in Partial Effects
Partial Effect of a Year of Education
∂E[logWage]/∂ED = β_ED + β_ED*FEM × FEM, where β_ED*FEM is the coefficient on the interaction term ED*FEM.
Note, the effect is positive. Effect is larger for women.
The Gender Effect Varies by Years of Education
Hypothesis Tests About Coefficients
Nested Models:
Model A: A theory about the world (Alternative)
Model 0: A restriction on model A (Null)
Model 0 is contained in (nested in) Model A.
Hypothesis (for now):
Null: Restriction on β: Rβ - q = 0
Alternative: Not the null
Approaches:
Fitting Criterion: Does R² decrease significantly under the null?
Wald: Is Rb - q, computed from the estimates of the alternative model, close to 0?
LM: Does the null model appear to be inadequate?
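The Wald and fitting-criterion approaches can be sketched on simulated data. This is an illustration under stated assumptions, not the computation behind the slides: the data are artificial, the covariance matrix is the conventional s²(X'X)⁻¹, and the names `wald_statistic`, `V`, and `F` are choices made for the example.

```python
import numpy as np

def wald_statistic(b, V, R, q):
    """W = (Rb - q)' [R V R']^(-1) (Rb - q), with V the estimated covariance of b."""
    d = R @ b - q
    return float(d @ np.linalg.solve(R @ V @ R.T, d))

rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)  # third coefficient is truly 0

# Model A (alternative): unrestricted least squares
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
K = X.shape[1]
V = (e @ e / (n - K)) * np.linalg.inv(X.T @ X)

# Null: the third coefficient equals zero, written as Rb - q = 0
R = np.array([[0.0, 0.0, 1.0]])
q = np.array([0.0])
J = R.shape[0]
W = wald_statistic(b, V, R, q)
t_ratio = b[2] / np.sqrt(V[2, 2])

# Model 0 (null): drop the third column and compare fit through R-squared
X0 = X[:, :2]
e0 = y - X0 @ np.linalg.solve(X0.T @ X0, X0.T @ y)
tss = np.sum((y - y.mean()) ** 2)
r2_a, r2_0 = 1 - e @ e / tss, 1 - e0 @ e0 / tss
F = ((r2_a - r2_0) / J) / ((1 - r2_a) / (n - K))
```

With a single linear restriction and this covariance estimator, the Wald statistic, the squared t ratio, and the F statistic built from the loss of fit coincide, so the two approaches give the same answer in this case.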
Model Fit: Predict the outcome and assess how well the predictions match the actual values. R² = squared correlation between the actual and predicted values.
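The equality between R² and the squared correlation of actual and predicted values can be verified numerically. This is a sketch on simulated data (not the wage data), assuming least squares with a constant term, which is the case in which the two measures coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # constant plus one regressor
y = X @ np.array([2.0, 0.7]) + rng.normal(size=n)       # simulated outcome

b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b

# R-squared from sums of squares
r2_from_ss = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
# R-squared as the squared correlation between actual and predicted values
r2_from_corr = np.corrcoef(y, y_hat)[0, 1] ** 2
```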
Endogeneity
y = Xβ + ε
Definition: E[ε|x] ≠ 0
Why would exogeneity fail? The most common reasons:
Omitted variables
Unobserved heterogeneity (equivalent to omitted variables)
Measurement error on the RHS (equivalent to omitted variables)
Endogenous sampling and attrition
Instrumental Variable Estimation
One “problem” variable, the “last” one:
yi = β1x1i + β2x2i + … + βKxKi + εi
E[εi|xKi] ≠ 0. (0 for all others)
There exists a variable zi such that
[RELEVANCE] E[xKi | x1i, x2i, …, xK-1,i, zi] = g(x1i, x2i, …, xK-1,i, zi). In the presence of the other variables, zi “explains” xKi.
[EXOGENEITY] E[εi | x1i, x2i, …, xK-1,i, zi] = 0. In the presence of the other variables, zi and εi are uncorrelated.
A projection interpretation: In the projection xKi = θ1x1i + θ2x2i + … + θK-1xK-1,i + θKzi, θK ≠ 0.
The Effect of Education on LWAGE
An Exogenous Influence
Instrumental Variables
Structure:
LWAGE (ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION)
ED (MS, FEM)
Reduced Form: LWAGE[ ED (MS, FEM), EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION ]
Two Stage Least Squares Strategy
Reduced Form: LWAGE[ ED (MS, FEM, X), EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION ]
Strategy:
(1) Purge ED of the influence of everything but MS, FEM (and the other variables). Predict ED using all exogenous information in the sample (X and Z).
(2) Regress LWAGE on this prediction of ED and everything else.
Standard errors must be adjusted for the predicted ED.
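The two steps can be sketched with plain least squares. This is an illustration on simulated data, not the wage equation: `x` plays the role of ED, `z` the role of an instrument, `w` the role of the other exogenous variables, and all names and parameter values are assumptions made for the example.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(3)
n = 5000
w = rng.normal(size=n)   # exogenous regressor
z = rng.normal(size=n)   # instrument, excluded from the outcome equation
u = rng.normal(size=n)   # unobservable that drives both x and y
x = 0.5 * w + 1.0 * z + u + rng.normal(size=n)               # endogenous regressor
y = 1.0 + 0.5 * x + 0.3 * w + 2.0 * u + rng.normal(size=n)   # true coefficient on x: 0.5

const = np.ones(n)
X = np.column_stack([const, x, w])   # regressors in the outcome equation
Z = np.column_stack([const, z, w])   # all exogenous information

# Step (1): predict x using all exogenous variables
x_hat = Z @ ols(Z, x)
# Step (2): regress y on the prediction and everything else
X_hat = np.column_stack([const, x_hat, w])
b_2sls = ols(X_hat, y)
b_ols = ols(X, y)   # for comparison: inconsistent because x is correlated with u

# Standard error adjustment: residuals use the actual x, not the prediction
e = y - X @ b_2sls
s2 = e @ e / (n - X.shape[1])
se_2sls = np.sqrt(np.diag(s2 * np.linalg.inv(X_hat.T @ X_hat)))
```

Running the second step through an ordinary regression routine would compute the residuals with `x_hat` in place of `x`, which is why the standard errors from a naive second stage need the adjustment shown in the last lines.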
OLS
The extreme result for the coefficient on ED is probably due to the fact that the instruments, MS and FEM, are dummy variables. There is not enough variation in these variables.
Source of Endogeneity
LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + ε
ED = f(MS, FEM, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + u
Remove the Endogeneity by Using a Control Function
LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + u + ε
Problem: ED is correlated with (u + ε), so it is endogenous.
Strategy: Estimate u. Add the estimate of u to the equation. ED is uncorrelated with ε when u is in the equation.
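The strategy can be sketched on simulated data, assuming a linear model in which the first-stage residual serves as the estimate of u. The variable names and parameter values are hypothetical; `x` stands in for ED and `z` for the instruments.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(4)
n = 2000
const = np.ones(n)
w = rng.normal(size=n)   # exogenous regressor
z = rng.normal(size=n)   # instrument
u = rng.normal(size=n)   # source of the endogeneity
x = 0.5 * w + z + u
y = 1.0 + 0.5 * x + 0.3 * w + 2.0 * u + rng.normal(size=n)

# Auxiliary regression of x on all exogenous variables; keep the residual
Z = np.column_stack([const, z, w])
x_hat = Z @ ols(Z, x)
u_hat = x - x_hat        # estimate of u

# Control function regression: the original equation with u_hat added
b_cf = ols(np.column_stack([const, x, w, u_hat]), y)
# Two stage least squares, for comparison
b_2sls = ols(np.column_stack([const, x_hat, w]), y)
```

In this linear setting the coefficients on the constant, `x`, and `w` from the control function regression match the 2SLS coefficients, because `u_hat` is orthogonal to every column used in the second stage.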
Auxiliary Regression for ED to Obtain Residuals
OLS with Residual (Control Function) Added 2SLS
A Warning About Control Functions
The sum of squares is not computed correctly because U is in the regression. This is a general result: control function estimators usually require a fix to the estimated covariance matrix for the estimator.
Endogeneity Test: Wu
Considerable complication in Hausman test.
Simplification: Wu test. Regress y on X and the prediction X̂ estimated for the endogenous part of X. Then use an ordinary Wald test on the added variable. This is a variable addition test.
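A variable addition test of this kind can be sketched as follows. The data are simulated, the helper `ols_with_se` is hypothetical, and the conventional least squares standard errors are assumed for the t ratio on the added variable.

```python
import numpy as np

def ols_with_se(X, y):
    """Least squares coefficients and conventional standard errors."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / (X.shape[0] - X.shape[1])
    return b, np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

rng = np.random.default_rng(5)
n = 2000
const = np.ones(n)
w = rng.normal(size=n)   # exogenous regressor
z = rng.normal(size=n)   # instrument
u = rng.normal(size=n)   # makes x endogenous in the y equation
x = 0.5 * w + z + u
y = 1.0 + 0.5 * x + 0.3 * w + 2.0 * u + rng.normal(size=n)

# (1) Regress the suspect variable on the exogenous variables
Z = np.column_stack([const, z, w])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
u_hat = x - x_hat

# (2) Add either the residual or the prediction to the original regression
b_u, se_u = ols_with_se(np.column_stack([const, w, x, u_hat]), y)
b_h, se_h = ols_with_se(np.column_stack([const, w, x, x_hat]), y)

t_u = b_u[3] / se_u[3]   # t ratio on the added residual
t_h = b_h[3] / se_h[3]   # t ratio on the added prediction
```

The two versions are the same test: the t ratios differ only in sign, and the coefficient on `x` in the second regression equals the sum of the coefficients on `x` and `u_hat` in the first. A large t ratio in absolute value is evidence against exogeneity.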
Wu Test
A Regression Based Endogeneity Test
Testing Endogeneity of WKS
(1) Regress WKS on 1, EXP, EXPSQ, OCC, SOUTH, SMSA, MS. U = residual, WKSHAT = prediction.
(2) Regress LWAGE on 1, EXP, EXPSQ, OCC, SOUTH, SMSA, WKS, and U or WKSHAT.
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant   -9.97734299      .75652186      -13.188   .0000
 EXP          .01833440      .00259373        7.069   .0000   19.8537815
 EXPSQ       -.799491D-04    .603484D-04     -1.325   .1852   514.405042
 OCC         -.28885529      .01222533      -23.628   .0000   .51116447
 SOUTH       -.26279891      .01439561      -18.255   .0000   .29027611
 SMSA         .03616514      .01369743        2.640   .0083   .65378151
With U added:
 WKS          .35314170      .01638709       21.550   .0000   46.8115246
 U           -.34960141      .01642842      -21.280   .0000   -.341879D-14
With WKSHAT added:
 WKS          .00354028      .00116459        3.040   .0024   46.8115246
 WKSHAT       .34960141      .01642842       21.280   .0000   46.8115246