Instrumental Variables
Anant Nyshadham
Instrumental Variables

What is a natural experiment?
- “situations where the forces of nature or government policy have conspired to produce an environment somewhat akin to a randomized experiment” (Angrist and Krueger 2001, p. 73)
- Natural experiments can provide a useful source of exogenous variation in problematic regressors
  - But they require detailed institutional knowledge
Instrumental Variables and Natural Experiments

Some natural experiments in economics
- Existing policy differences, or changes that affect some jurisdictions (or groups) but not others
  - Minimum wage rate
  - Excise taxes on consumer goods
  - Unemployment insurance, workers’ compensation
- Unexpected “shocks” to the local economy
  - Coal prices and the Middle East oil embargo (1973)
  - Agricultural production and adverse weather events
Instrumental Variables and Natural Experiments

Some potential pitfalls
- Not all policy differences/changes are exogenous
  - Political factors and past realizations of the response variable can affect existing policies or policy changes
- Generalizability of causal effect estimates
  - Results may not generalize beyond the units under study
- Heterogeneity in causal effects
  - Results may be sensitive to the natural experiment chosen in a specific study (local average treatment effect, LATE)
Instrumental Variables and Natural Experiments

Some natural experiments used as IVs that are of interest to development economists
- Acemoglu, Johnson & Robinson (2001): settler mortality
- Paxson (1992): rainfall
- Schultz & Tansel (1997): healthcare prices
True Model

- Suppose the true model is: Y = a + bX + cV + e
  - a, b, and c are parameters to be estimated; e is the error term
- We do not observe V, so we can only estimate: Y = a + bX + e
- How do we recover the true b, rather than the biased estimate of b we get when V is left in the error term?
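The size of the problem can be written down directly. The following is a sketch of the standard omitted-variable-bias result in the slide's notation, assuming additionally that Cov(X, e) = 0 (an assumption not stated on the slide); b_OLS denotes the slope from the regression that omits V.

```latex
\begin{aligned}
\operatorname{plim}\, b_{OLS}
  &= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
   = \frac{\operatorname{Cov}(X,\; a + bX + cV + e)}{\operatorname{Var}(X)} \\
  &= b + c\,\frac{\operatorname{Cov}(X, V)}{\operatorname{Var}(X)}
\end{aligned}
```

Under these assumptions the short regression recovers b only if c = 0 or Cov(X, V) = 0; otherwise the estimate is off by the second term.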
Methods

- Estimating equation: Y = a + bX + η, where η = cV + e
- Differencing / fixed effects (FE)
  - Find groups with common V (an assumption) but variation in X
  - Difference out V to remove it from the error term
- Instrumental variable
  - Find an instrument Z: X = j + kZ + i
  - Predict the portion of X that is not correlated with V
  - Use this portion in the original estimating equation
IV Criteria and Assumptions

- Stage 1: X = j + kZ + i; fitted values X’ = j’ + k’Z
- Stage 2: Y = a + bX’ + η; recovers the true b
- Criteria for Z
  - Z must sufficiently predict X: k >> 0 or k << 0
    - Testable using the estimate of k from the first stage
  - Z must affect Y only through X
    - Cov(Z,η) = 0, i.e., Cov(Z,V) = 0 and Cov(Z,e) = 0
    - Z does not belong in the original estimating equation
    - This is an assumption and is untestable
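The two stages can be illustrated with a small simulation. This is a minimal sketch on made-up data, not the author's code: the parameter values (true b = 2, c = 3, first-stage slope 1), the sample size, and the helper names `mean`, `cov`, and `slope_intercept` are all assumptions chosen for illustration. Z is drawn independently of V and e, so both criteria hold by construction.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, w):
    """Sample covariance of two equal-length lists."""
    mu, mw = mean(u), mean(w)
    return sum((p - mu) * (q - mw) for p, q in zip(u, w)) / (len(u) - 1)

def slope_intercept(x, y):
    """Bivariate OLS of y on x: returns (slope, intercept)."""
    k = cov(x, y) / cov(x, x)
    return k, mean(y) - k * mean(x)

rng = random.Random(0)          # fixed seed so the sketch is reproducible
n = 20000
TRUE_B, C = 2.0, 3.0            # hypothetical values of b and c

Z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
V = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
X = [0.5 + 1.0 * z + 1.0 * v + rng.gauss(0, 1) for z, v in zip(Z, V)]
Y = [1.0 + TRUE_B * x + C * v + rng.gauss(0, 1) for x, v in zip(X, V)]

# Naive regression of Y on X: V sits in the error term and biases the slope.
b_ols, _ = slope_intercept(X, Y)

# Stage 1: regress X on Z and keep the fitted values X'.
k_hat, j_hat = slope_intercept(Z, X)
X_fit = [j_hat + k_hat * z for z in Z]

# Stage 2: regress Y on the fitted values X'.
b_iv, _ = slope_intercept(X_fit, Y)

print(f"OLS slope: {b_ols:.3f}   2SLS slope: {b_iv:.3f}   true b: {TRUE_B}")
```

In this setup the OLS slope should sit well above the true b while the 2SLS slope should be close to it. Running the second stage by hand like this gives the right point estimate but not the right standard errors, which is one reason to use a dedicated routine such as Stata's `ivreg` in practice.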
An IV example: Angrist and Krueger (1991), Q.J.E.

- Returns to education (Y = wages)
  - Problem of omitted “ability bias”
- Years of schooling vary by quarter of birth (Q.O.B.)
  - Compulsory schooling laws, age-at-entry rules
  - Someone born in Q1 is a little older and will be able to drop out sooner than someone born in Q4
- Q.O.B. can be treated as a useful source of exogenous variation in schooling
Angrist and Krueger (1991), Q.J.E.

- People born in Q1 do obtain less schooling
  - But pay close attention to the scale of the y-axis
  - Mean difference between Q1 and Q4 is only 0.124 years, or 1.5 months
- So a large N is needed, since the R² between X and Z will be very small
  - A&K had over 300k observations for the 1930-39 cohort

[Figure: years of schooling by quarter of birth. Source: Angrist and Krueger (1991), Figure I]
Angrist and Krueger (1991), Q.J.E.

- Final 2SLS model interacted QOB with year of birth (30) and state of birth (150)
  - OLS: b = .0628 (s.e. = .0003)
  - 2SLS: b = .0811 (s.e. = .0109)
- Least squares estimate does not appear to be badly biased by omitted variables
- But a replication effort identified some instructive pitfalls in this analysis
Bound, Jaeger, and Baker (1995), J.A.S.A.

Potential problems with QOB as an IV
- Correlation between QOB and schooling is weak
  - Small Cov(X,Z) introduces finite-sample bias, which is exacerbated by the inclusion of many IVs
- QOB may not be exogenous (it may be correlated with unobservable determinants of wages, e.g., family income)
- QOB may not satisfy the exclusion restriction (e.g., age relative to peers changes social dynamics, competition, leadership skill, etc.)
Bound, Jaeger, and Baker (1995), J.A.S.A.

- Even if the instrument is “good,” IV can make matters far worse than LS
  - Weak correlation between the IV and the endogenous regressor can produce severe finite-sample bias
  - And really large samples won’t help, especially if there is even weak correlation between the IV and the error
- First-stage diagnostics provide a sense of how good an IV is in a given setting
  - F-test and partial R² on the IVs
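The large-sample part of this argument can be sketched for the single-instrument case, using η for the structural error as in the earlier slides. The correlation symbols ρ are introduced here for illustration and are not from the slides.

```latex
\operatorname{plim}\, b_{IV} - b = \frac{\operatorname{Cov}(Z, \eta)}{\operatorname{Cov}(Z, X)},
\qquad
\operatorname{plim}\, b_{LS} - b = \frac{\operatorname{Cov}(X, \eta)}{\operatorname{Var}(X)},
\qquad
\frac{\operatorname{plim}\, b_{IV} - b}{\operatorname{plim}\, b_{LS} - b}
  = \frac{\rho_{Z\eta} / \rho_{X\eta}}{\rho_{XZ}}
```

When the first-stage correlation ρ_XZ is close to zero, even a very small ρ_Zη is magnified, so the IV estimate can be more inconsistent than least squares no matter how large the sample is.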
Useful Diagnostic Tools for IV Models

- Tests of instrument relevance
  - Weak IVs → large variance of b_IV as well as potentially severe finite-sample bias
- Tests of instrument exogeneity
  - Endogenous IVs → inconsistency of b_IV that makes it no better (and probably worse) than b_LS
- Durbin-Wu-Hausman test
  - Tests endogeneity of the problem regressor(s)
Tests of Instrument Relevance

- Diagnostics based on the F-test for the joint significance of the IVs
  - Nelson and Startz (1990); Staiger and Stock (1997)
  - Bound, Jaeger, and Baker (1995)
- Partial R-square for the IVs
  - Shea (1997)
- There is a growing econometric literature on the “weak instrument” problem
Tests of Instrument Exogeneity

- Model must be overidentified, i.e., more IVs than endogenous X’s
- H0: All IVs are uncorrelated with the structural error
- Overidentification test:
  1. Estimate the structural model by IV
  2. Regress the IV residuals on all exogenous variables
  3. Compute N·R² and compare to chi-square
     - df = # IVs – # endogenous X’s
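The three steps above can be sketched on simulated data. This is an illustrative sketch under stated assumptions, not the routine behind the software output shown later: the data-generating process, the helper names (`solve`, `ols`, `sargan_nr2`, `simulate`), and the size of the direct effect of the invalid instrument are all hypothetical. It handles one endogenous regressor, no other exogenous covariates, and a chi-square p-value for df = 1 only.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(columns, y):
    """OLS of y on a constant plus the given columns: (coefficients, fitted values)."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return beta, [sum(bi * ri for bi, ri in zip(beta, r)) for r in rows]

def sargan_nr2(y, x, instruments):
    """Return (2SLS slope, N*R^2 statistic, df) for one endogenous regressor x."""
    n = len(y)
    # Step 1: 2SLS; residuals use the actual x, not the first-stage fitted values.
    _, x_fit = ols(instruments, x)
    (a, b), _ = ols([x_fit], y)
    resid = [yi - a - b * xi for yi, xi in zip(y, x)]
    # Step 2: regress the IV residuals on all exogenous variables (here, the IVs).
    _, fit = ols(instruments, resid)
    rbar = sum(resid) / n
    r2 = 1.0 - sum((r - f) ** 2 for r, f in zip(resid, fit)) / sum((r - rbar) ** 2 for r in resid)
    # Step 3: N*R^2, with df = # IVs - # endogenous X's.
    return b, max(n * r2, 0.0), len(instruments) - 1

def simulate(direct_effect, n=5000, seed=1):
    """Hypothetical data; direct_effect != 0 lets z2 enter y directly (invalid IV)."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    x = [p + q + w + rng.gauss(0, 1) for p, q, w in zip(z1, z2, v)]
    y = [2.0 * xi + 3.0 * w + direct_effect * q + rng.gauss(0, 1)
         for xi, w, q in zip(x, v, z2)]
    return y, x, [z1, z2]

b_ok, stat_ok, df_ok = sargan_nr2(*simulate(direct_effect=0.0))
b_bad, stat_bad, df_bad = sargan_nr2(*simulate(direct_effect=2.0))
p_ok = math.erfc(math.sqrt(stat_ok / 2.0))    # chi-square(1) upper tail
p_bad = math.erfc(math.sqrt(stat_bad / 2.0))
print(f"valid IVs:   b = {b_ok:.3f}, N*R2 = {stat_ok:.2f}, p = {p_ok:.4f}")
print(f"invalid IV:  b = {b_bad:.3f}, N*R2 = {stat_bad:.2f}, p = {p_bad:.4f}")
```

With both instruments valid, the statistic should behave like a chi-square(1) draw; once z2 is allowed to affect y directly, it should become very large and the 2SLS slope should drift away from the true value of 2. Note that the test can only detect disagreement among instruments, which is why it requires an overidentified model.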
Application: Adolescent Work and Delinquent Behavior

- Prior research shows a positive correlation between teenage work and delinquency
  - Reasons to suspect serious endogeneity bias
- 2nd wave of the NLSY97 (N = 8,368)
  - Y = 1 if committed delinquent act (31.9%)
  - X = 1 if worked in a formal job (52.6%)
  - Z1 = 1 if child labor law allows 40+ hours (14.2%)
  - Z2 = 1 if no child labor restriction in place (39.6%)
Regression Model Ignoring Endogeneity

    . reg pcrime work if nomiss==1 & wave==2

          Source |       SS       df       MS              Number of obs =    8368
    -------------+------------------------------           F(  1,  8366) =    6.33
           Model |  1.37395379     1  1.37395379           Prob > F      =  0.0119
        Residual |  1815.97786  8366  .217066443           R-squared     =  0.0008
    -------------+------------------------------           Adj R-squared =  0.0006
           Total |  1817.35182  8367  .217204711           Root MSE      =   .4659

    ------------------------------------------------------------------------------
          pcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            work |   .0256633   .0102005     2.52   0.012     .0056677    .0456588
           _cons |   .3053242   .0074009    41.26   0.000     .2908167    .3198318
    ------------------------------------------------------------------------------

- Teenage workers are significantly more delinquent
- Modest effect, but consistent with prior research
First-Stage Model

    . reg work law40 nolaw if nomiss==1 & wave==2

          Source |       SS       df       MS              Number of obs =    8368
    -------------+------------------------------           F(  2,  8365) =  626.64
           Model |  271.829722     2  135.914861           Prob > F      =  0.0000
        Residual |  1814.33364  8365  .216895832           R-squared     =  0.1303
    -------------+------------------------------           Adj R-squared =  0.1301
           Total |  2086.16336  8367  .249332301           Root MSE      =  .46572

    ------------------------------------------------------------------------------
            work |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           law40 |   .0688902   .0154383     4.46   0.000     .0386274     .099153
           nolaw |   .3818684   .0110273    34.63   0.000     .3602521    .4034847
           _cons |   .3655636   .0074883    48.82   0.000     .3508847    .3802425
    ------------------------------------------------------------------------------

- State child labor laws affect the probability of work
- This is a really strong first stage (F, R²)
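The first-stage F statistic can be cross-checked from the numbers printed in the output. The snippet below is a small arithmetic sketch using only values copied from the table above; the variable names are chosen here for illustration.

```python
# Values copied from the first-stage regression output above.
ms_model = 135.914861      # model mean square
ms_resid = 0.216895832     # residual mean square
r2 = 0.1303                # R-squared (as printed, so rounded)
n, q = 8368, 2             # observations and number of instruments

# F as the ratio of mean squares.
f_from_ms = ms_model / ms_resid

# Equivalent form based on R-squared: (R2 / q) / ((1 - R2) / (n - q - 1)).
f_from_r2 = (r2 / q) / ((1.0 - r2) / (n - q - 1))

print(f"F from mean squares: {f_from_ms:.2f}")
print(f"F from R-squared:    {f_from_r2:.2f}")
```

Both versions should land close to the reported 626.64, with the R²-based one differing slightly because R² is printed to four decimals. Either way the statistic is far above the rule-of-thumb value of 10 commonly associated with Staiger and Stock (1997).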
Two-Stage Least Squares Model

    . ivreg pcrime (work = law40 nolaw) if nomiss==1 & wave==2

    Instrumental variables (2SLS) regression

          Source |       SS       df       MS              Number of obs =    8368
    -------------+------------------------------           F(  1,  8366) =    6.86
           Model | -19.5287923     1 -19.5287923           Prob > F      =  0.0088
        Residual |  1836.88061  8366  .219564978           R-squared     =       .
    -------------+------------------------------           Adj R-squared =       .
           Total |  1817.35182  8367  .217204711           Root MSE      =  .46858

    ------------------------------------------------------------------------------
          pcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            work |  -.0744352   .0284206    -2.62   0.009    -.1301466   -.0187238
           _cons |   .3580171   .0158135    22.64   0.000     .3270187    .3890155
    ------------------------------------------------------------------------------
    Instrumented:  work
    Instruments:   law40 nolaw
What Do the Models Suggest Thus Far?

- Completely different conclusions!
- OLS = Teenage work is criminogenic (b = +.026)
  - Delinquency risk increases by 8.5 percent (base = .305)
- 2SLS = Teenage work is prophylactic (b = –.074)
  - Delinquency risk decreases by 20.7 percent (base = .358)
- Which model should we believe?
  - We still have some additional diagnostic work to do to evaluate the 2SLS model
  - Overidentification test
Overidentification Test from the Software

    Tests of overidentifying restrictions:
    Sargan N*R-sq test    0.509   Chi-sq(1)   P-value = 0.4757
    Basmann test          0.508   Chi-sq(1)   P-value = 0.4758

- The IVs jointly pass the exogeneity requirement
- Notice that -overid- provides a global test, whereas the regression-based approach allows you to test the IVs jointly as well as individually
So Where Do We Stand with the Work-Delinquency Question?

- Are child labor laws correlated with work?
  - YES = first-stage F is large
- Are child labor laws good IVs?
  - YES = overidentification test is not rejected
- Is teenage work endogenous?
  - YES = Hausman test is rejected
- Prior research findings that teenage work is criminogenic are selection artifacts
Now…What Happens if I Throw in a Potentially Bogus Instrument?

- Now there are three instrumental variables
  - Z1 = 1 if child labor law allows 40+ hours (14.2%)
  - Z2 = 1 if no child labor restriction in place (39.6%)
  - Z3 = 1 if high unemployment rate in county (20.1%)
- It is a little more difficult to tell a convincing story that the unemployment rate is related to delinquency only through work experience
- But let’s see what happens
First-Stage Model

    . reg work law40 nolaw highun if nomiss==1 & wave==2

          Source |       SS       df       MS              Number of obs =    8368
    -------------+------------------------------           F(  3,  8364) =  427.28
           Model |  277.229696     3  92.4098987           Prob > F      =  0.0000
        Residual |  1808.93366  8364  .216276144           R-squared     =  0.1329
    -------------+------------------------------           Adj R-squared =  0.1326
           Total |  2086.16336  8367  .249332301           Root MSE      =  .46505

    ------------------------------------------------------------------------------
            work |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           law40 |   .0636421   .0154519     4.12   0.000     .0333525    .0939317
           nolaw |   .3775975   .0110447    34.19   0.000     .3559472    .3992479
          highun |  -.0636009   .0127283    -5.00   0.000    -.0885517   -.0386502
           _cons |   .3808061   .0080759    47.15   0.000     .3649754    .3966368
    ------------------------------------------------------------------------------

- So far so good, and consistent with expectation
Two-Stage Least Squares Model

    . ivreg pcrime (work = law40 nolaw highun) if nomiss==1 & wave==2

    Instrumental variables (2SLS) regression

          Source |       SS       df       MS              Number of obs =    8368
    -------------+------------------------------           F(  1,  8366) =    5.47
           Model | -16.0635514     1 -16.0635514           Prob > F      =  0.0194
        Residual |  1833.41537  8366  .219150773           R-squared     =       .
    -------------+------------------------------           Adj R-squared =       .
           Total |  1817.35182  8367  .217204711           Root MSE      =  .46814

    ------------------------------------------------------------------------------
          pcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            work |  -.0657624   .0281159    -2.34   0.019    -.1208765   -.0106483
           _cons |   .3534516   .0156602    22.57   0.000     .3227537    .3841496
    ------------------------------------------------------------------------------
    Instrumented:  work
    Instruments:   law40 nolaw highun
Post-Hoc Diagnostics

    Tests of overidentifying restrictions:
    Sargan N*R-sq test    5.301   Chi-sq(2)   P-value = 0.0706
    Basmann test          5.301   Chi-sq(2)   P-value = 0.0706

    . ivendog
    Tests of endogeneity of: work
    H0: Regressor is exogenous
        Wu-Hausman F test:               12.32811   F(1,8365)   P-value = 0.00045
        Durbin-Wu-Hausman chi-sq test:   12.31438   Chi-sq(1)   P-value = 0.00045

- Overidentification gives cause for concern
  - The p-value shouldn’t be anywhere near 0.05
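The Sargan p-values reported for the two specifications can be approximately reproduced from the printed statistics. The sketch below uses the closed-form upper-tail probabilities of the chi-square distribution for 1 and 2 degrees of freedom; the function name `chi2_upper_tail` is hypothetical, and the inputs are the rounded statistics 0.509 (two IVs) and 5.301 (three IVs) from the software output.

```python
import math

def chi2_upper_tail(stat, df):
    """P(chi-square(df) > stat), using closed forms available for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

p_two_ivs = chi2_upper_tail(0.509, 1)     # law40, nolaw
p_three_ivs = chi2_upper_tail(5.301, 2)   # law40, nolaw, highun

print(f"two IVs:   p = {p_two_ivs:.4f}")
print(f"three IVs: p = {p_three_ivs:.4f}")
```

The results should agree with the reported 0.4757 and 0.0706 up to rounding of the input statistics. Adding the unemployment instrument moves the p-value from comfortably large to the neighborhood of conventional rejection thresholds.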
Conclusion from Diagnostic Tests

- 2SLS “work effect” is similar
  - Without unemployment, b = –.074 (s.e. = .028)
  - With unemployment, b = –.066 (s.e. = .028)
- But the second model is invalidated because the unemployment rate is not exogenous
  - It affects criminality through other channels
  - We need to control for all other indirect pathways, or…
  - It should not be used as an IV at all
Closing Comments about Instrumental Variables Studies

- In general, a lagged value of the endogenous regressor is not a good instrument
  - The traditional structural equation model uses lagged values of X and Y as instruments to break the simultaneity between the current values of X and Y

[Diagram: X1, X2, Y1, Y2]

- These models impose the awfully strong assumption that lagged values of X and Y affect the outcomes only through current values
Rules for Good Practice with Instrumental Variables Models

- IV models can be very informative, but it’s your job to convince your audience
- Show the first-stage model diagnostics
  - Even the most clever IV might not be sufficiently strongly related to X to be a useful source of identification
- Report test(s) of overidentifying restrictions
  - An invalid IV is often worse than no IV at all
- Report the LS endogeneity (DWH) test
Rules for Good Practice with Instrumental Variables Models

- Most importantly, TELL A STORY about why a particular IV is a “good instrument”
- Something to consider when thinking about whether a particular IV is “good”:
  - Does the IV, for all intents and purposes, randomize the endogenous regressor?