Graduate Methods Master Class


Econometric Approaches to Causal Inference: Difference-in-Differences and Instrumental Variables
Graduate Methods Master Class
Department of Government, Harvard University
February 25, 2005

Overview: diff-in-diffs and IV

Method: difference-in-differences
- Data: randomized experiment or natural experiment
- Problem: we cannot observe the counterfactual (what if the treatment group had not received treatment)

Method: instrumental variables
- Data: observational data
- Problem: OVB (omitted variable bias), selection bias, simultaneous causality

Diff-in-diffs: basic idea
- Suppose we randomly assign treatment to some units (or nature assigns treatment "as if" by random assignment)
- To estimate the treatment effect, we could just compare the treated units before and after treatment
- However, we might pick up the effects of other factors that changed around the time of treatment
- Therefore, we use a control group to "difference out" these confounding factors and isolate the treatment effect

Diff-in-diffs: without regression
One approach is simply to take the mean value of each group's outcome before and after treatment

          Treatment group    Control group
Before    $T_B$              $C_B$
After     $T_A$              $C_A$

and then calculate the "difference-in-differences" of the means:
$\text{Treatment effect} = (T_A - T_B) - (C_A - C_B)$
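The calculation from group means can be sketched in a few lines of Python. The outcome values and variable names below are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from any study.

```python
from statistics import mean

# Hypothetical outcomes for each group and period (illustrative numbers only).
treat_before = [20, 22, 24]
treat_after = [23, 25, 27]
control_before = [30, 31, 32]
control_after = [29, 30, 31]

# Mean outcome in each of the four cells of the 2x2 table.
t_b, t_a = mean(treat_before), mean(treat_after)
c_b, c_a = mean(control_before), mean(control_after)

# Change in the treatment group minus change in the control group.
effect = (t_a - t_b) - (c_a - c_b)
print("difference-in-differences estimate:", effect)
```

Here the treatment group rises by 3 while the control group falls by 1, so the control group's change is subtracted out and the estimate is 4.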

Diff-in-diffs: with regression
We can get the same result in a regression framework (which allows us to add regression controls, if needed):

$y_i = \beta_0 + \beta_1 \mathrm{treat}_i + \beta_2 \mathrm{after}_i + \beta_3 (\mathrm{treat}_i \times \mathrm{after}_i) + e_i$

where
- treat = 1 if in treatment group, = 0 if in control group
- after = 1 if after treatment, = 0 if before treatment

The coefficient on the interaction term ($\beta_3$) gives us the difference-in-differences estimate of the treatment effect

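Diff-in-diffs: with regression
To see this, plug zeros and ones into the regression equation:

$y_i = \beta_0 + \beta_1 \mathrm{treat}_i + \beta_2 \mathrm{after}_i + \beta_3 (\mathrm{treat}_i \times \mathrm{after}_i) + e_i$

              Treatment group                            Control group          Difference
Before        $\beta_0 + \beta_1$                        $\beta_0$              $\beta_1$
After         $\beta_0 + \beta_1 + \beta_2 + \beta_3$    $\beta_0 + \beta_2$    $\beta_1 + \beta_3$
Difference    $\beta_2 + \beta_3$                        $\beta_2$              $\beta_3$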
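A minimal sketch can confirm that the interaction coefficient reproduces the difference-in-differences of the cell means. The data are hypothetical, and the small `ols` helper (a plain normal-equations solver) is written here only to keep the example free of external libraries; in practice a statistics package would be used.

```python
def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * w for v, w in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical observations: (treat, after, outcome).
data = [
    (0, 0, 10.0), (0, 0, 12.0),
    (0, 1, 13.0), (0, 1, 15.0),
    (1, 0, 8.0), (1, 0, 10.0),
    (1, 1, 15.0), (1, 1, 17.0),
]

# Regressors: constant, treat, after, treat*after.
X = [[1.0, t, a, t * a] for t, a, _ in data]
y = [v for _, _, v in data]
b0, b1, b2, b3 = ols(X, y)


def cell_mean(t, a):
    """Mean outcome for one (treat, after) cell."""
    vals = [v for ti, ai, v in data if (ti, ai) == (t, a)]
    return sum(vals) / len(vals)


did_from_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print("interaction coefficient:", round(b3, 6))
print("diff-in-diffs of means: ", did_from_means)
```

Because the model is saturated (one coefficient per cell of the 2x2 table), the fitted values equal the cell means exactly, which is why the two numbers agree.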

Diff-in-diffs: example
Card and Krueger (1994): What is the effect of increasing the minimum wage on employment at fast food restaurants?
- Confounding factor: national recession
- Treatment group = NJ
- Control group = PA
- Before = Feb 92
- After = Nov 92

$\mathrm{FTE}_i = \beta_0 + \beta_1 \mathrm{NJ}_i + \beta_2 \mathrm{Nov92}_i + \beta_3 (\mathrm{NJ}_i \times \mathrm{Nov92}_i) + e_i$

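Diff-in-diffs: example

$\mathrm{FTE}_i = \beta_0 + \beta_1 \mathrm{NJ}_i + \beta_2 \mathrm{Nov92}_i + \beta_3 (\mathrm{NJ}_i \times \mathrm{Nov92}_i) + e_i$

Estimates: $\beta_0 = 23.33$, $\beta_1 = -2.89$, $\beta_2 = -2.16$, $\beta_3 = 2.75$

[Figure: mean FTE employment over time. Control group (PA) falls from 23.33 to 21.17; treatment group (NJ) rises from 20.44 to 21.03.]

Treatment effect of minimum wage increase = +2.75 FTE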
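As a check on the reported estimate, the difference-in-differences can be worked by hand from the four group means on the slide. This assumes the figure's values are read correctly as NJ going from 20.44 to 21.03 and PA going from 23.33 to 21.17.

```latex
\begin{aligned}
\text{NJ change} &= 21.03 - 20.44 = 0.59 \\
\text{PA change} &= 21.17 - 23.33 = -2.16 \\
\text{Treatment effect} &= 0.59 - (-2.16) = 2.75
\end{aligned}
```

Most of the estimated effect comes from the decline in the control group rather than from growth in the treatment group.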

Diff-in-diff-in-diffs
- A difference-in-difference-in-differences (DDD) model allows us to study the effect of treatment on different groups
- If we are concerned that our estimated treatment effect might be spurious, a common robustness test is to introduce a comparison group that should not be affected by the treatment
- For example, if we want to know how welfare reform has affected labor force participation, we can use a DD model that takes advantage of policy variation across states, and then use a DDD model to study how the policy has affected single versus married women
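One common way to write a DDD regression, sketched here as an assumption rather than the specification used in any particular study, extends the diff-in-diffs equation with a group indicator and all of its interactions. The name $\mathrm{group}_i$ is hypothetical (for instance, 1 for single women and 0 for married women).

```latex
\begin{aligned}
y_i = {} & \beta_0 + \beta_1 \mathrm{treat}_i + \beta_2 \mathrm{after}_i + \beta_3 \mathrm{group}_i \\
& + \beta_4 (\mathrm{treat}_i \times \mathrm{after}_i)
  + \beta_5 (\mathrm{treat}_i \times \mathrm{group}_i)
  + \beta_6 (\mathrm{after}_i \times \mathrm{group}_i) \\
& + \beta_7 (\mathrm{treat}_i \times \mathrm{after}_i \times \mathrm{group}_i) + e_i
\end{aligned}
```

In this form the coefficient on the triple interaction ($\beta_7$) is the DDD estimate: the diff-in-diffs for the group with $\mathrm{group}_i = 1$ minus the diff-in-diffs for the group with $\mathrm{group}_i = 0$.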

Diff-in-diffs: drawbacks
- Diff-in-diff estimation is only appropriate if treatment is random. In the social sciences, however, this method is usually applied to data from natural experiments, raising questions about whether treatment is truly random
- Also, diff-in-diffs typically use several years of serially correlated data but ignore the resulting inconsistency of standard errors (see Bertrand, Duflo, and Mullainathan 2004)

IV: basic idea
- Suppose we want to estimate a treatment effect using observational data
- The OLS estimator is biased and inconsistent (due to correlation between regressor and error term) if there is
  - omitted variable bias
  - selection bias
  - simultaneous causality
- If a direct solution (e.g. including the omitted variable) is not available, instrumental variables regression offers an alternative way to obtain a consistent estimator

IV: basic idea
Consider the following regression model:

$y_i = \beta_0 + \beta_1 X_i + e_i$

Variation in the endogenous regressor $X_i$ has two parts
- the part that is uncorrelated with the error ("good" variation)
- the part that is correlated with the error ("bad" variation)

The basic idea behind instrumental variables regression is to isolate the "good" variation and disregard the "bad" variation

IV: conditions for a valid instrument
The first step is to identify a valid instrument
A variable $Z_i$ is a valid instrument for the endogenous regressor $X_i$ if it satisfies two conditions:
1. Relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$
2. Exogeneity: $\mathrm{corr}(Z_i, e_i) = 0$
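To see why both conditions matter, a short derivation for the single-instrument case (a standard result, written with the symbols of the model $y_i = \beta_0 + \beta_1 X_i + e_i$) takes the covariance of $Z_i$ with both sides of the regression:

```latex
\begin{aligned}
\operatorname{cov}(Z_i, y_i) &= \beta_1 \operatorname{cov}(Z_i, X_i) + \operatorname{cov}(Z_i, e_i) \\
\beta_1 &= \frac{\operatorname{cov}(Z_i, y_i)}{\operatorname{cov}(Z_i, X_i)}
\quad \text{if } \operatorname{cov}(Z_i, e_i) = 0 \text{ and } \operatorname{cov}(Z_i, X_i) \neq 0
\end{aligned}
```

Exogeneity removes the second term on the right-hand side, and relevance keeps the denominator away from zero.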

IV: two-stage least squares
The most common IV method is two-stage least squares (2SLS)
- Stage 1: Decompose $X_i$ into the component that can be predicted by $Z_i$ and the problematic component

  $X_i = \pi_0 + \pi_1 Z_i + v_i$

- Stage 2: Use the predicted value of $X_i$ from the first-stage regression to estimate its effect on $y_i$

  $y_i = \beta_0 + \beta_1 \hat{X}_i + \varepsilon_i$

Note: software packages like Stata perform the two stages in a single regression, producing the correct standard errors
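The two stages can be sketched by hand on simulated data. Everything below is an illustrative assumption: the data-generating process, the true effect of 2, the unobserved confounder `u`, and the names `pi0`, `pi1`, and `simple_ols`. The sketch only recovers the point estimate; as the note above says, standard errors from a manual second stage are not correct and should come from a dedicated 2SLS routine.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance (dividing by n; the scaling cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope


rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
# x depends on the instrument and on the confounder, so it is endogenous.
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
# True effect of x on y is 2; the confounder also moves y directly.
y = [1 + 2 * xi + 3 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Naive OLS of y on x picks up the confounder.
_, beta_ols = simple_ols(x, y)

# Stage 1: regress x on z and keep the fitted values.
pi0, pi1 = simple_ols(z, x)
x_hat = [pi0 + pi1 * zi for zi in z]

# Stage 2: regress y on the fitted values.
_, beta_2sls = simple_ols(x_hat, y)

print("OLS estimate: ", round(beta_ols, 3))
print("2SLS estimate:", round(beta_2sls, 3))
```

In this setup the OLS slope is pulled well above the true value of 2 because `x` and the error share the confounder, while the second-stage slope stays close to 2 because only the instrument-driven variation in `x` is used.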

IV: example
Levitt (1997): what is the effect of increasing the police force on the crime rate?
- This is a classic case of simultaneous causality (high crime areas tend to need large police forces), resulting in an incorrectly signed (positive) OLS coefficient
- To address this problem, Levitt uses the timing of mayoral and gubernatorial elections as an instrumental variable
- Is this instrument valid?
  - Relevance: police force increases in election years
  - Exogeneity: election cycles are pre-determined

IV: example
Two-stage least squares:
- Stage 1: Decompose police hires into the component that can be predicted by the electoral cycle and the problematic component

  $\mathrm{police}_i = \pi_0 + \pi_1 \mathrm{election}_i + v_i$

- Stage 2: Use the predicted value of $\mathrm{police}_i$ from the first-stage regression to estimate its effect on $\mathrm{crime}_i$

  $\mathrm{crime}_i = \beta_0 + \beta_1 \widehat{\mathrm{police}}_i + \varepsilon_i$

Finding: an increased police force reduces violent crime (but has little effect on property crime)

IV: number of instruments
There must be at least as many instruments as endogenous regressors
Let
- k = number of endogenous regressors
- m = number of instruments

The regression coefficients are
- exactly identified if m = k (OK)
- overidentified if m > k (OK)
- underidentified if m < k (not OK)

IV: testing instrument relevance
How do we know if our instruments are valid?
Recall our first condition for a valid instrument:
1. Relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$

- Stock and Watson's rule of thumb: the first-stage F-statistic testing the hypothesis that the coefficients on the instruments are jointly zero should be at least 10 (for a single endogenous regressor)
- A small F-statistic means the instruments are "weak" (they explain little of the variation in X) and the estimator is biased
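For a single instrument and no control variables, the first-stage F-statistic can be computed directly by comparing the residual sum of squares with and without the instrument. The sketch below uses simulated data; the function name `first_stage_f` and the first-stage coefficients 1.0 ("strong") and 0.02 ("weak") are hypothetical choices for illustration.

```python
import random


def first_stage_f(z, x):
    """F-statistic for H0: the coefficient on z is zero in a regression of x on (1, z)."""
    n = len(x)
    z_bar, x_bar = sum(z) / n, sum(x) / n
    slope = (sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
             / sum((zi - z_bar) ** 2 for zi in z))
    intercept = x_bar - slope * z_bar
    # Restricted model: constant only. Unrestricted model: constant and z.
    ssr_restricted = sum((xi - x_bar) ** 2 for xi in x)
    ssr_unrestricted = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    # One restriction in the numerator; n - 2 residual degrees of freedom.
    return (ssr_restricted - ssr_unrestricted) / (ssr_unrestricted / (n - 2))


rng = random.Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]

x_strong = [1.0 * zi + vi for zi, vi in zip(z, noise)]   # instrument moves x a lot
x_weak = [0.02 * zi + vi for zi, vi in zip(z, noise)]    # instrument barely moves x

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print("first-stage F, strong instrument:", round(f_strong, 1))
print("first-stage F, weak instrument:  ", round(f_weak, 1))
```

With several instruments or with exogenous controls, the same comparison applies, but the restricted model keeps the controls and the numerator is divided by the number of instruments.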

IV: testing instrument exogeneity
Recall our second condition for a valid instrument:
2. Exogeneity: $\mathrm{corr}(Z_i, e_i) = 0$

- If you have the same number of instruments and endogenous regressors, it is impossible to test for instrument exogeneity
- But if you have more instruments than endogenous regressors, you can run an overidentifying restrictions test: regress the residuals from the 2SLS regression on the instruments (and any exogenous control variables) and test whether the coefficients on the instruments are all zero
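The procedure can be sketched with two instruments and one endogenous regressor. This version uses the n times R-squared (Sargan) form of the statistic, which assumes homoskedastic errors and is one of several equivalent ways to carry out the test; it is not necessarily the form a given textbook or package reports. The simulated data, the `direct_effect` parameter, and all function names are hypothetical.

```python
import random


def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * w for v, w in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def fitted(rows, beta):
    return [sum(b * v for b, v in zip(beta, r)) for r in rows]


def overid_stat(y, x, z1, z2):
    """n * R^2 from regressing the 2SLS residuals on a constant and both instruments."""
    n = len(y)
    inst = [[1.0, a, b] for a, b in zip(z1, z2)]
    x_hat = fitted(inst, ols(inst, x))                      # first stage
    b0, b1 = ols([[1.0, xh] for xh in x_hat], y)            # second stage
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]     # residuals use actual x
    r_hat = fitted(inst, ols(inst, resid))
    r_bar = sum(resid) / n
    explained = sum((rh - r_bar) ** 2 for rh in r_hat)
    total = sum((r - r_bar) ** 2 for r in resid)
    return n * explained / total


def simulate(direct_effect, n=2000, seed=0):
    """direct_effect != 0 lets z2 enter y directly, violating exogeneity."""
    rng = random.Random(seed)
    y, x, z1, z2 = [], [], [], []
    for _ in range(n):
        a, b, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        xi = a + b + u + rng.gauss(0, 1)
        z1.append(a)
        z2.append(b)
        x.append(xi)
        y.append(1 + 2 * xi + 3 * u + direct_effect * b + rng.gauss(0, 1))
    return y, x, z1, z2


j_valid = overid_stat(*simulate(0.0))
j_invalid = overid_stat(*simulate(2.0))
print("statistic, both instruments exogenous:", round(j_valid, 2))
print("statistic, z2 enters y directly:      ", round(j_invalid, 2))
```

With two instruments and one endogenous regressor there is one overidentifying restriction, so under the null the statistic is compared with a chi-squared distribution with one degree of freedom (5% critical value 3.84). A large value signals that at least one instrument is not exogenous, but the test cannot say which one.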

IV: drawbacks
- It can be difficult to find an instrument that is both relevant (not weak) and exogenous
- Assessment of instrument exogeneity can be highly subjective when the coefficients are exactly identified
- IV can be difficult to explain to those who are unfamiliar with it

Sources
- Stock and Watson, Introduction to Econometrics
- Bertrand, Duflo, and Mullainathan, "How Much Should We Trust Differences-in-Differences Estimates?" Quarterly Journal of Economics, February 2004
- Card and Krueger, "Minimum Wages and Employment: A Case Study of the Fast Food Industry in New Jersey and Pennsylvania," American Economic Review, September 1994
- Angrist and Krueger, "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments," Journal of Economic Perspectives, Fall 2001
- Levitt, "Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime," American Economic Review, June 1997