Review Section on Instrumental Variables Economics 1018 Abby Williamson and Hongyi Li October 11, 2006
Agenda
Administrative Issues
Instrumental Variables (IV) Review
–Why would you need an IV? (Abby)
–How does IV work? (Hongyi)
–An IV Example (Hongyi)
–Considering the Validity of IV (Abby)
Questions
Administrative Issues
Teaching Fellow contact information and office hours are on the board
TF responsibilities
–No weekly sections
–1 or 2 concept review sections (including one in November on working with our dataset in Stata)
–Answering questions on Stata: we’ll organize an online forum or listserv for that purpose
What is the IV method for?
Big Picture:
–This class is interested in the effect of CULTURE → ECON. OUTCOMES / ATTITUDES
–Typical OLS regression equation:
Y_i = β_0 + β_1 X_1 + β_n X_n + u_i
Econ. Devp. = β_0 + β_1 (Culture) + β_n X_n + u_i
Econ. Outcome = Constant + β_1 (Measure of Culture) + β_n (Controls) + Error Term
–Often, with measures of culture, we have an endogeneity problem, so we cannot estimate the effect of X on Y using simple OLS regression.
Internal Validity Problems
The independent variables are correlated with the error term. Three types are relevant here:
–Errors-in-variables
–Omitted variable bias
These two are usually solved by adding the omitted variable or correcting the measurement error, but what if no additional data are available?
–Simultaneous causality (endogeneity)
When X → Y AND Y → X, simple OLS picks up both effects and produces a biased estimate of the causal effect.
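The simultaneous-causality problem can be illustrated with a small simulation. This is a minimal sketch on made-up data, not an analysis from the course: the coefficients `beta` (effect of X on Y) and `gamma` (feedback from Y to X), the sample size, and the helper `ols_slope` are all assumptions chosen for illustration.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
beta, gamma = 0.5, 0.4  # hypothetical values: X -> Y effect and Y -> X feedback
n = 20000

xs, ys = [], []
for _ in range(n):
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    # Solve the two equations y = beta*x + u and x = gamma*y + v jointly.
    x = (gamma * u + v) / (1 - gamma * beta)
    y = beta * x + u
    xs.append(x)
    ys.append(y)

slope = ols_slope(xs, ys)
print(f"true effect of X on Y: {beta}")
print(f"simple OLS estimate:   {slope:.3f}")
```

Because X depends on Y, X is correlated with the error term u, so the OLS slope settles well above the true effect of 0.5 even with a large sample.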
What is the IV Technique?
When you have an endogeneity problem, you want to separate out the part of the independent variable that is correlated with the error term. Once that part is removed, you can get a consistent causal estimate of the effect of the “uncorrelated portion” of the independent variable on the dependent variable of interest.
IV: basic idea
Consider the following regression model:
y_i = β_0 + β_1 X_i + e_i
Variation in the endogenous regressor X_i has two parts:
–the part that is uncorrelated with the error (“good” variation)
–the part that is correlated with the error (“bad” variation)
The basic idea behind instrumental variables regression is to isolate the “good” variation and disregard the “bad” variation.
Slide developed by Shanna Rose, 2005
IV: conditions for a valid instrument
The first step is to identify a valid instrument.
A variable Z_i is a valid instrument for the endogenous regressor X_i if it satisfies two conditions:
1. Relevance: corr(Z_i, X_i) ≠ 0
2. Exogeneity: corr(Z_i, e_i) = 0
Slide developed by Shanna Rose, 2005
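A short derivation shows why these two conditions are enough to recover β_1 in the single-instrument case. It is a sketch for the model y_i = β_0 + β_1 X_i + e_i, written in terms of covariances, which are zero exactly when the corresponding correlations are zero.

```latex
\begin{align*}
\operatorname{cov}(Z_i, y_i)
  &= \operatorname{cov}(Z_i,\ \beta_0 + \beta_1 X_i + e_i) \\
  &= \beta_1 \operatorname{cov}(Z_i, X_i) + \operatorname{cov}(Z_i, e_i) \\
  &= \beta_1 \operatorname{cov}(Z_i, X_i)
  && \text{(exogeneity: } \operatorname{cov}(Z_i, e_i) = 0\text{)} \\
\Longrightarrow\quad
\beta_1 &= \frac{\operatorname{cov}(Z_i, y_i)}{\operatorname{cov}(Z_i, X_i)}
  && \text{(relevance: } \operatorname{cov}(Z_i, X_i) \neq 0\text{)}
\end{align*}
```

Exogeneity removes the error term from the numerator, and relevance guarantees that the denominator is not zero.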
IV: two-stage least squares
The most common IV method is two-stage least squares (2SLS).
Stage 1: Decompose X_i into the component that can be predicted by Z_i and the problematic component
X_i = π_0 + π_1 Z_i + v_i
Stage 2: Use the predicted value of X_i from the first-stage regression to estimate its effect on y_i
y_i = β_0 + β_1 X-hat_i + e_i
Note: software packages like Stata perform the two stages in a single regression, producing the correct standard errors.
Slide developed by Shanna Rose
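The two stages can be run by hand on simulated data to see the mechanics. This is a minimal sketch, assuming one instrument and one endogenous regressor; the data-generating values (a true slope of 2, a first-stage coefficient of 0.8), the seed, and the helper `ols` are illustrative assumptions, not anything from the course dataset.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(42)
n = 20000
true_b0, true_b1 = 1.0, 2.0  # hypothetical structural parameters

z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: independent of the error e
e = [rng.gauss(0, 1) for _ in range(n)]  # structural error term
# X is endogenous because it contains the error e as well as the instrument z.
x = [0.8 * zi + ei + rng.gauss(0, 1) for zi, ei in zip(z, e)]
y = [true_b0 + true_b1 * xi + ei for xi, ei in zip(x, e)]

# Stage 1: regress X on Z and keep the fitted values X-hat.
pi0, pi1 = ols(z, x)
x_hat = [pi0 + pi1 * zi for zi in z]

# Stage 2: regress y on X-hat.
b0_2sls, b1_2sls = ols(x_hat, y)

# Simple OLS of y on X, for comparison.
_, b1_ols = ols(x, y)

print(f"true slope: {true_b1}")
print(f"OLS slope:  {b1_ols:.3f}")
print(f"2SLS slope: {b1_2sls:.3f}")
```

The second-stage slope lands close to the true value while OLS overshoots. The standard errors from a hand-run second stage would be wrong, which is why a packaged IV routine is preferable in practice.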
AN IV EXAMPLE: Hongyi’s example presented here in section …
Evaluating Instruments
Two conditions:
–Instrument Relevance: the IV is correlated with the problematic independent variable: corr(Z_i, X_i) ≠ 0
–Instrument Exogeneity: the IV is NOT correlated with the error term: corr(Z_i, e_i) = 0
Evaluating Instruments
# POLICE → CRIME (Steven Levitt 1997)
–Simple OLS gives a positive result: increase the number of police, increase crime
–Why?
Instrument: Was there a mayoral election in the year the measurements were taken?
–IV regression gives the expected negative result: increase the number of police, decrease crime
–Why is this a good instrument?
IV: example
Two-stage least squares:
Stage 1: Decompose police hires into the component that can be predicted by the electoral cycle and the problematic component
police_i = π_0 + π_1 election_i + v_i
Stage 2: Use the predicted value of police_i from the first-stage regression to estimate its effect on crime_i
crime_i = β_0 + β_1 police-hat_i + e_i
Finding: an increased police force reduces violent crime (but has little effect on property crime)
Slide developed by Shanna Rose, 2005
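With a yes/no instrument such as an election-year indicator, the single-instrument IV estimate reduces to a ratio of differences in group means (the Wald estimator). The sketch below uses entirely simulated, hypothetical numbers, not Levitt's data: the true effect of police on crime is set to -1, and an unobserved factor `u` raises both crime and police hiring, which is what pushes the OLS slope positive.

```python
import random

def mean(values):
    return sum(values) / len(values)

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(7)
n = 20000
true_effect = -1.0  # hypothetical: one more unit of police lowers crime by 1

election, police, crime = [], [], []
for _ in range(n):
    elec = 1 if rng.random() < 0.5 else 0  # instrument: election-year indicator
    u = rng.gauss(0, 1)                    # unobserved factor behind crime AND hiring
    p = 10 + 2 * elec + 2 * u + rng.gauss(0, 1)
    c = 50 + true_effect * p + 4 * u + rng.gauss(0, 1)
    election.append(elec)
    police.append(p)
    crime.append(c)

# Simple OLS mixes the causal effect with the confounding through u.
ols = ols_slope(police, crime)

# Wald estimator: difference in mean crime over difference in mean police,
# comparing election observations with non-election observations.
c1 = mean([c for c, el in zip(crime, election) if el == 1])
c0 = mean([c for c, el in zip(crime, election) if el == 0])
p1 = mean([p for p, el in zip(police, election) if el == 1])
p0 = mean([p for p, el in zip(police, election) if el == 0])
wald = (c1 - c0) / (p1 - p0)

print(f"OLS slope:    {ols:.3f}")
print(f"Wald/IV est.: {wald:.3f}")
```

In this simulated setup OLS gives the wrong sign, while the election-based estimate is close to the assumed true effect, mirroring the pattern described on the slide.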
Evaluating Instruments
AGE OF MARRIAGE → EDUCATIONAL OUTCOMES (Erica Field 2004)
–Why is using simple OLS problematic here?
Instrument: Age of menarche (woman’s first period)
–Why is this a good instrument?
IV: number of instruments
There must be at least as many instruments as endogenous regressors.
Let
k = number of endogenous regressors
m = number of instruments
The regression coefficients are
–exactly identified if m = k (OK)
–overidentified if m > k (OK)
–underidentified if m < k (not OK)
Slide developed by Shanna Rose, 2005
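The counting rule above is simple enough to write down directly. The function name `identification_status` is a hypothetical helper introduced only for illustration.

```python
def identification_status(num_instruments, num_endogenous):
    """Classify an IV setup by comparing m (instruments) with k (endogenous regressors)."""
    m, k = num_instruments, num_endogenous
    if m < k:
        return "underidentified"      # not OK: too few instruments
    if m == k:
        return "exactly identified"   # OK
    return "overidentified"           # OK, and exogeneity becomes partly testable

for m, k in [(1, 1), (2, 1), (1, 2)]:
    print(f"m={m}, k={k}: {identification_status(m, k)}")
```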
IV: testing instrument relevance
How do we know if our instruments are valid?
Recall our first condition for a valid instrument:
1. Relevance: corr(Z_i, X_i) ≠ 0
Stock and Watson’s rule of thumb: the first-stage F-statistic testing the hypothesis that the coefficients on the instruments are jointly zero should be at least 10 (for a single endogenous regressor).
A small F-statistic means the instruments are “weak” (they explain little of the variation in X) and the estimator is biased.
Slide developed by Shanna Rose, 2005
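The first-stage F-statistic for a single instrument can be computed by comparing the first-stage regression with a constant-only regression. This is a sketch using the homoskedasticity-only formula on simulated data; the first-stage coefficients (0.5 for the "strong" case, 0.02 for the "weak" case) and the helper `first_stage_f` are assumptions for illustration.

```python
import random

def first_stage_f(z, x):
    """Homoskedasticity-only F-statistic for H0: the coefficient on z is zero
    in the first-stage regression of x on a constant and z."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    slope = szx / szz
    intercept = mx - slope * mz
    # Restricted model: constant only. Unrestricted model: constant plus z.
    ssr_restricted = sum((b - mx) ** 2 for b in x)
    ssr_unrestricted = sum((b - intercept - slope * a) ** 2 for a, b in zip(z, x))
    # One restriction, n - 2 residual degrees of freedom.
    return (ssr_restricted - ssr_unrestricted) / (ssr_unrestricted / (n - 2))

rng = random.Random(3)
n = 1000
z = [rng.gauss(0, 1) for _ in range(n)]
x_strong = [0.5 * zi + rng.gauss(0, 1) for zi in z]   # z explains a fair share of x
x_weak = [0.02 * zi + rng.gauss(0, 1) for zi in z]    # z explains almost nothing

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f"strong instrument: F = {f_strong:.1f}")
print(f"weak instrument:   F = {f_weak:.1f}")
```

Comparing each F with the rule-of-thumb threshold of 10 gives a quick relevance check; with several instruments the same restricted-versus-unrestricted comparison applies, with one restriction per instrument.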
IV: testing instrument exogeneity
Recall our second condition for a valid instrument:
2. Exogeneity: corr(Z_i, e_i) = 0
If you have the same number of instruments and endogenous regressors, it is impossible to test for instrument exogeneity.
But if you have more instruments than endogenous regressors:
Overidentifying restrictions test: regress the residuals from the 2SLS regression on the instruments (and any exogenous control variables) and test whether the coefficients on the instruments are all zero.
Slide developed by Shanna Rose, 2005
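The overidentifying restrictions test can be sketched for one endogenous regressor and two instruments. This is an illustration on simulated data, assuming homoskedastic errors and no extra controls, with the statistic taken as J = m·F and compared against a chi-squared distribution with m − k = 1 degree of freedom (5% critical value about 3.84). All helper names and data-generating values are hypothetical.

```python
import random

def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - s) / m[i][i]
    return coef

def ols(rows, y):
    """OLS coefficients of y on the regressors in rows (each row starts with 1.0)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(rows, coef):
    return [sum(c * v for c, v in zip(coef, r)) for r in rows]

def j_statistic(y, x, z1, z2):
    """J = m*F for one endogenous regressor x and m = 2 instruments."""
    n, m = len(y), 2
    zrows = [[1.0, a, b] for a, b in zip(z1, z2)]
    x_hat = predict(zrows, ols(zrows, x))                 # first stage
    b0, b1 = ols([[1.0, xh] for xh in x_hat], y)          # second stage
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]   # built with actual x
    fitted = predict(zrows, ols(zrows, resid))            # residuals on instruments
    ssr_u = sum((r - f) ** 2 for r, f in zip(resid, fitted))
    mean_r = sum(resid) / n
    ssr_r = sum((r - mean_r) ** 2 for r in resid)
    f_stat = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))
    return m * f_stat

def simulate(n, direct_effect, seed):
    """direct_effect != 0 lets z2 affect y directly, violating exogeneity."""
    rng = random.Random(seed)
    y, x, z1, z2 = [], [], [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        xi = 0.7 * a + 0.7 * b + u + v
        y.append(1 + 2 * xi + u + direct_effect * b)
        x.append(xi)
        z1.append(a)
        z2.append(b)
    return y, x, z1, z2

j_valid = j_statistic(*simulate(5000, 0.0, seed=1))
j_invalid = j_statistic(*simulate(5000, 1.0, seed=1))
print(f"both instruments exogenous: J = {j_valid:.2f}")
print(f"z2 not exogenous:           J = {j_invalid:.2f}")
```

A J above the critical value is evidence that at least one instrument is correlated with the error. A small J does not prove exogeneity, since the test can only detect disagreement among the instruments.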
IV: drawbacks of this method
–It can be difficult to find an instrument that is both relevant (not weak) and exogenous
–Assessment of instrument exogeneity can be highly subjective when the coefficients are exactly identified
–IV can be difficult to explain to those who are unfamiliar with it
Slide developed by Shanna Rose