Identification: Instrumental Variables

Ziyodullo Parpiev, PhD
Delivered at Summer School 2017, Tashkent, Uzbekistan, June 16, 2017

Why do we need IV? Internal Validity Problems
Independent variables are correlated with the error term. Three types are relevant here:
- Errors-in-variables: error in the measurement of one of the variables. Example: systematically incorrect answers on a survey, or a systematic data-coding problem. Ideally you would simply correct the error, but if you cannot, you might be able to use an IV.
- Omitted variable bias (OVB): a variable that determines Y and is correlated with X is excluded from the regression equation. Usually solved by including the omitted variable, but when that is impossible (no additional data), you can use an IV.
- Simultaneous causality (endogeneity): X → Y AND Y → X. Simple OLS picks up both effects and produces a biased estimate of the causal effect. Example:
  $Y_i = B_0 + B_1 X_i + U_i$
  $X_i = A_0 + A_1 Y_i + V_i$
  If $U_i$ is negative, this decreases $Y_i$ in the first equation, which in turn affects $X_i$ through the second equation. If $A_1$ is positive, a low $Y_i$ leads to a low $X_i$. So if $A_1$ is positive, $X_i$ and $U_i$ are correlated.
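To see the simultaneous-causality bias numerically, here is a minimal sketch (standard library only) that simulates the two-equation system above and runs simple OLS of Y on X. The parameter values (B1 = -0.5, A1 = 0.8, standard normal U and V) and the helper names are illustrative assumptions, not taken from the slides. Because A1 > 0, X is positively correlated with U, so the OLS slope is pulled upward; with these values it is pulled far enough to flip sign.

```python
import random


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, b0=1.0, b1=-0.5, a0=0.0, a1=0.8, seed=0):
    """Draw (X, Y, U) from Y = b0 + b1*X + U and X = a0 + a1*Y + V (illustrative values)."""
    rng = random.Random(seed)
    xs, ys, us = [], [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        # Reduced form: substitute the Y equation into the X equation and solve for X.
        x = (a0 + a1 * b0 + a1 * u + v) / (1 - a1 * b1)
        y = b0 + b1 * x + u
        xs.append(x)
        ys.append(y)
        us.append(u)
    return xs, ys, us


xs, ys, us = simulate()
ols_b1 = cov(xs, ys) / cov(xs, xs)  # simple OLS slope of Y on X
corr_xu = cov(xs, us) / (cov(xs, xs) * cov(us, us)) ** 0.5
print(f"true B1 = -0.5, OLS estimate = {ols_b1:.3f}, corr(X, U) = {corr_xu:.3f}")
```

The same mechanism drives the police/crime example later in the deck: a policy response of X to Y contaminates the OLS slope.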

What is the IV Technique?
When you have an endogeneity problem, you want to separate out the part of the independent variable that is correlated with the error term. Once that part is separated out, you can get a consistent causal estimate (IV estimators are consistent, though not unbiased in finite samples) of the effect of the “uncorrelated portion” of the independent variable on the dependent variable of interest. The following slides present in more detail how the technique works.

IV: basic idea
Consider the following regression model:
$y_i = \beta_0 + \beta_1 X_i + e_i$
Variation in the endogenous regressor $X_i$ has two parts:
- the part that is uncorrelated with the error (“good” variation)
- the part that is correlated with the error (“bad” variation)
The basic idea behind instrumental variables regression is to isolate the “good” variation and disregard the “bad” variation.

IV: conditions for a valid instrument
The first step is to identify a valid instrument. A variable $Z_i$ is a valid instrument for the endogenous regressor $X_i$ if it satisfies two conditions:
1. Relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
2. Exogeneity: $\mathrm{corr}(Z_i, e_i) = 0$
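A minimal sketch of what the two conditions mean, on simulated data with hypothetical variable names. The error term is observable here only because we generated it; in real data it is not, which is why exogeneity cannot simply be checked this way. Z is constructed to be a valid instrument, and W is a deliberately invalid candidate that loads on the error.

```python
import random


def corr(a, b):
    """Sample correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5


rng = random.Random(7)
n = 10000
z = [rng.gauss(0, 1) for _ in range(n)]  # candidate instrument, independent of e
e = [rng.gauss(0, 1) for _ in range(n)]  # structural error (unobservable in practice)
# First-stage error correlated with e (corr = 0.7), which makes X endogenous.
v = [0.7 * ei + rng.gauss(0, 0.51 ** 0.5) for ei in e]
x = [zi + vi for zi, vi in zip(z, v)]
w = [zi + 0.5 * ei for zi, ei in zip(z, e)]  # hypothetical invalid instrument: loads on e

print(f"Z: corr(Z, X) = {corr(z, x):.3f}  corr(Z, e) = {corr(z, e):.3f}")
print(f"W: corr(W, X) = {corr(w, x):.3f}  corr(W, e) = {corr(w, e):.3f}")
```

W passes the relevance condition just as Z does; only the (normally unobservable) correlation with e separates them.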

IV: two-stage least squares
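The most common IV method is two-stage least squares (2SLS).
Stage 1: Decompose $X_i$ into the component that can be predicted by $Z_i$ and the problematic component:
$X_i = \pi_0 + \pi_1 Z_i + v_i$
Stage 2: Use the predicted value of $X_i$ from the first-stage regression to estimate its effect on $y_i$:
$y_i = \beta_0 + \beta_1 \hat{X}_i + u_i$
Note: software packages like Stata perform the two stages in a single command, producing the correct standard errors. Running the second stage by hand with OLS gives the right coefficients but the wrong standard errors, because its residuals are computed with $\hat{X}_i$ instead of $X_i$.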
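The two stages can be sketched by hand for the single-instrument case. The data-generating process (true slope 2, first-stage error correlated with the structural error) is an assumption for illustration. With one instrument, the 2SLS slope reduces algebraically to cov(Z, y) / cov(Z, X), and the sketch computes both to show they agree.

```python
import random


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


rng = random.Random(3)
n = 10000
z = [rng.gauss(0, 1) for _ in range(n)]
e = [rng.gauss(0, 1) for _ in range(n)]
v = [0.7 * ei + rng.gauss(0, 0.51 ** 0.5) for ei in e]
x = [zi + vi for zi, vi in zip(z, v)]               # endogenous: corr(x, e) != 0
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]   # true beta1 = 2

# Stage 1: regress X on Z and keep the fitted values.
pi1 = cov(z, x) / cov(z, z)
pi0 = mean(x) - pi1 * mean(z)
x_hat = [pi0 + pi1 * zi for zi in z]

# Stage 2: regress y on the fitted values.
beta1_2sls = cov(x_hat, y) / cov(x_hat, x_hat)
beta0_2sls = mean(y) - beta1_2sls * mean(x_hat)

beta1_ols = cov(x, y) / cov(x, x)        # biased upward here
beta1_iv_ratio = cov(z, y) / cov(z, x)   # closed form with one instrument
print(f"OLS: {beta1_ols:.3f}  2SLS: {beta1_2sls:.3f}  IV ratio: {beta1_iv_ratio:.3f}")
```

Only point estimates are computed here. Correct standard errors would need the residuals y − β0 − β1·X (using the actual X), not the residuals from the stage-2 OLS fit.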

Z as an instrument for X


Evaluating Instruments
Two conditions:
- Instrument relevance: the instrument is correlated with the problematic independent variable, $\mathrm{corr}(Z_i, X_i) \neq 0$
- Instrument exogeneity: the instrument is NOT correlated with the error term, $\mathrm{corr}(Z_i, e_i) = 0$

Number of police → crime (Steven Levitt 1997)
Simple OLS gives a positive result: increase the number of police, and crime increases. Why? The problem with simple OLS is that there is a policy response to crime (hire more police), which creates a reverse-causality effect.
Instrument: was there a mayoral election in the year the measurements were taken?
IV regression gives the expected negative result: increase the number of police, and crime decreases.
Why is this a good instrument?
- Mayors hire more police in election years, so the instrument is correlated with the independent variable (relevance).
- Whether there is a mayoral election does NOT affect the level of crime, other than through police hiring (exogeneity).

IV: example
Two-stage least squares:
Stage 1: Decompose police hires into the component that can be predicted by the electoral cycle and the problematic component:
$police_i = \pi_0 + \pi_1 election_i + v_i$
Stage 2: Use the predicted value of $police_i$ from the first-stage regression to estimate its effect on $crime_i$:
$crime_i = \beta_0 + \beta_1 \widehat{police}_i + u_i$
Finding: an increased police force reduces violent crime (but has little effect on property crime).
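A stylized sketch of the police/crime design on synthetic city-year data. Every number (election frequency, hiring response, a true police effect of -0.5) is a hypothetical assumption; this is not Levitt's data and does not reproduce his estimates. With a binary instrument such as an election indicator, 2SLS reduces to the Wald estimator: the difference in mean crime between election and non-election years divided by the difference in mean police.

```python
import random


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def group_mean(values, instrument, flag):
    """Mean of `values` over observations where the binary instrument equals `flag`."""
    return mean([v for v, g in zip(values, instrument) if g == flag])


rng = random.Random(1997)
n = 20000
election, police, crime = [], [], []
for _ in range(n):
    elec = 1 if rng.random() < 0.25 else 0  # hypothetical: election in 1 of 4 city-years
    shock = rng.gauss(0, 1)                 # unobserved crime shock (part of the error)
    # Hiring responds to elections AND to the crime shock (reverse causality).
    pol = 10 + 2 * elec + 0.8 * shock + rng.gauss(0, 1)
    cri = 50 - 0.5 * pol + 5 * shock        # hypothetical true effect of police: -0.5
    election.append(elec)
    police.append(pol)
    crime.append(cri)

beta_ols = cov(police, crime) / cov(police, police)
beta_wald = (group_mean(crime, election, 1) - group_mean(crime, election, 0)) / (
    group_mean(police, election, 1) - group_mean(police, election, 0)
)
beta_2sls = cov(election, crime) / cov(election, police)  # one-instrument 2SLS
print(f"OLS: {beta_ols:.3f}  Wald: {beta_wald:.3f}  2SLS: {beta_2sls:.3f}")
```

In this toy setup OLS has the "wrong" positive sign, while the election-based estimate recovers a negative effect, mirroring the pattern described on the slides.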

IV: number of instruments
There must be at least as many instruments as endogenous regressors. Let
k = number of endogenous regressors
m = number of instruments
The regression coefficients are:
- exactly identified if m = k (OK)
- overidentified if m > k (OK)
- underidentified if m < k (not OK)
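The counting rule can be written as a small helper. The function names are hypothetical. The count m − k is the number of overidentifying restrictions, which is also the degrees of freedom of the overidentification test when m > k.

```python
def identification_status(k, m):
    """Classify an IV model by k endogenous regressors and m instruments."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 endogenous regressors and m >= 0 instruments")
    if m == k:
        return "exactly identified"
    if m > k:
        return "overidentified"
    return "underidentified"


def overidentifying_restrictions(k, m):
    """Number of overidentifying restrictions, m - k; only defined when m >= k."""
    if m < k:
        raise ValueError("underidentified: fewer instruments than endogenous regressors")
    return m - k


for k, m in [(1, 1), (1, 3), (2, 1)]:
    print(k, m, identification_status(k, m))
```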

IV: testing instrument relevance
How do we know if our instruments are relevant? Recall our first condition for a valid instrument:
1. Relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
Stock and Watson’s rule of thumb: the first-stage F-statistic testing the hypothesis that the coefficients on the instruments are jointly zero should be at least 10 (for a single endogenous regressor).
A small F-statistic means the instruments are “weak” (they explain little of the variation in X). With weak instruments, the 2SLS estimator is biased toward OLS, and its usual standard errors and confidence intervals are unreliable.
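A sketch of the first-stage F-statistic for the simplest case: one endogenous regressor, one instrument, an intercept, and the homoskedasticity-only formula. With a single instrument this F equals the squared t-statistic on the instrument's coefficient. The two simulated first stages (slopes 1.0 and 0.05) and the function name are assumptions chosen to produce a strong and a weak instrument.

```python
import random


def first_stage_f(z, x):
    """Homoskedasticity-only F for H0: slope on z is 0 in x = pi0 + pi1*z + v."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    sxx = sum((b - mx) ** 2 for b in x)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    r2 = szx ** 2 / (szz * sxx)  # R^2 of a simple regression = squared correlation
    q = 1                        # number of instruments being tested
    return (r2 / q) / ((1 - r2) / (n - q - 1))


rng = random.Random(42)
n = 1000
z = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
x_strong = [1.0 * a + b for a, b in zip(z, v)]
x_weak = [0.05 * a + b for a, b in zip(z, v)]

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f"strong instrument F = {f_strong:.1f}, weak instrument F = {f_weak:.1f}")
```

With several instruments or heteroskedastic errors, a proper joint (and possibly robust) F-test from a regression package would replace this one-variable shortcut.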

IV: testing instrument exogeneity
Recall our second condition for a valid instrument:
2. Exogeneity: $\mathrm{corr}(Z_i, e_i) = 0$
If you have the same number of instruments and endogenous regressors, it is impossible to test for instrument exogeneity.
But if you have more instruments than regressors, you can use the overidentifying restrictions test: regress the residuals from the 2SLS regression on the instruments (and any exogenous control variables) and test whether the coefficients on the instruments are all zero. Under the null that all instruments are exogenous, the J-statistic (m times the homoskedasticity-only F-statistic from that regression) is approximately chi-squared with m − k degrees of freedom.
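A sketch of the overidentifying-restrictions test for one endogenous regressor and two instruments (m = 2, k = 1), in plain Python with a small normal-equations solver. It uses Sargan's nR² form (n times the R² from regressing the 2SLS residuals on all instruments), a large-sample equivalent of the F-based procedure described above, assuming homoskedastic errors. The simulated designs and helper names are assumptions: in one, both instruments are valid; in the other, z2 also enters y directly, violating exogeneity. Under the null the statistic is roughly chi-squared with 1 degree of freedom (5% critical value about 3.84).

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS via the normal equations (X'X) b = X'y; rows already include a 1 for the intercept."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(rows, b):
    return [sum(bi * xi for bi, xi in zip(b, r)) for r in rows]


def r_squared(y, fitted):
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def sargan_j(z1, z2, x, y):
    """2SLS with instruments (z1, z2) for one endogenous x, then Sargan's n*R^2 statistic."""
    zrows = [[1.0, a, b] for a, b in zip(z1, z2)]
    xhat = predict(zrows, ols(zrows, x))                 # stage 1
    b0, b1 = ols([[1.0, xh] for xh in xhat], y)          # stage 2
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]  # 2SLS residuals use the actual x
    aux_fit = predict(zrows, ols(zrows, resid))          # regress residuals on all instruments
    return len(y) * r_squared(resid, aux_fit)            # approx chi-squared(m - k) if valid


def simulate(n, direct_effect_of_z2, seed):
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    v = [0.5 * ui + rng.gauss(0, 0.75 ** 0.5) for ui in u]  # corr(u, v) = 0.5: x endogenous
    x = [a + b + vi for a, b, vi in zip(z1, z2, v)]
    y = [1.0 + 2.0 * xi + direct_effect_of_z2 * b + ui for xi, b, ui in zip(x, z2, u)]
    return z1, z2, x, y


j_valid = sargan_j(*simulate(2000, 0.0, seed=1))
j_invalid = sargan_j(*simulate(2000, 1.0, seed=1))
print(f"J (both instruments valid) = {j_valid:.2f}, J (z2 invalid) = {j_invalid:.2f}")
```

The test only detects that the instruments disagree with each other; if all instruments were invalid in the same direction, it could fail to reject.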

IV: drawbacks of this method
- It can be difficult to find an instrument that is both relevant (not weak) and exogenous.
- Assessment of instrument exogeneity can be highly subjective when the coefficients are exactly identified.
- IV can be difficult to explain to those who are unfamiliar with it.

Closing Comments about Instrumental Variables Studies
In general, a lagged value of the endogenous regressor is not a good instrument.
The traditional structural equation model uses lagged values of X and Y (X1, Y1) as instruments to break the simultaneity between the current values of X and Y (X2, Y2).
These models impose the very strong assumption that lagged values of X and Y affect the outcomes only through the current values.

Closing Comments about Instrumental Variables Studies
- Good IV models are generally interesting in their own right and should not be treated as “tack-on” analyses.
- Practice varies widely across disciplines:
  - Some researchers write papers about their discovery and application of a “clever” IV for some problem.
  - Other researchers “tack on” IV models at the end of their analysis, often poorly, as a way to convince readers that their results are robust.

Rules for Good Practice with Instrumental Variables Models
IV models can be very informative, but it’s your job to convince your audience.
- Show the first-stage model diagnostics. Even the most clever IV might not be sufficiently strongly related to X to be a useful source of identification.
- Report test(s) of overidentifying restrictions. An invalid IV is often worse than no IV at all.
- Report an endogeneity test comparing OLS and IV, such as the Durbin-Wu-Hausman (DWH) test.
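One common way to run the DWH test is the regression-based (control-function) version, sketched here for one endogenous regressor and one instrument under homoskedastic errors: regress X on Z, add the first-stage residual to the OLS regression of y on X, and t-test its coefficient. A large |t| rejects the hypothesis that X is exogenous. The helper names and simulated data are assumptions for illustration.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_with_se(rows, y):
    """OLS coefficients and homoskedasticity-only standard errors; rows include the intercept."""
    n, p = len(y), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)
    resid = [yi - sum(bj * rj for bj, rj in zip(b, r)) for r, yi in zip(rows, y)]
    s2 = sum(ei * ei for ei in resid) / (n - p)
    # Diagonal of (X'X)^{-1}: solve against each unit vector.
    se = []
    for j in range(p):
        unit = [1.0 if i == j else 0.0 for i in range(p)]
        se.append((s2 * solve(xtx, unit)[j]) ** 0.5)
    return b, se


def dwh_t(z, x, y):
    """Control-function DWH test: t-statistic on the first-stage residual."""
    (pi0, pi1), _ = ols_with_se([[1.0, zi] for zi in z], x)
    vhat = [xi - pi0 - pi1 * zi for xi, zi in zip(x, z)]
    b, se = ols_with_se([[1.0, xi, vi] for xi, vi in zip(x, vhat)], y)
    return b[2] / se[2]


def simulate(n, rho, seed):
    """x = z + v, y = 1 + 2x + u, with corr(u, v) = rho (rho = 0 means x is exogenous)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi, ui = rng.gauss(0, 1), rng.gauss(0, 1)
        vi = rho * ui + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        xi = zi + vi
        z.append(zi)
        x.append(xi)
        y.append(1.0 + 2.0 * xi + ui)
    return z, x, y


t_endog = dwh_t(*simulate(2000, rho=0.5, seed=11))
t_exog = dwh_t(*simulate(2000, rho=0.0, seed=11))
print(f"DWH t (endogenous x) = {t_endog:.2f}, DWH t (exogenous x) = {t_exog:.2f}")
```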

Rules for Good Practice with Instrumental Variables Models
Most importantly, TELL A STORY about why a particular IV is a “good instrument”.
Something to consider when thinking about whether a particular IV is “good”: does the IV, for all intents and purposes, randomize the endogenous regressor?
