Difference-in-Differences


Difference-in-Differences Week 9

Introduction The fixed-effects strategy requires panel data: repeated observations on the same individuals. Often, the regressor of interest varies at a more aggregate or group level. Ex: state policies that change over time but are constant across workers within states. The source of omitted-variable bias (OVB) must then be unobserved variables at the state-year level.

Introduction (ii) In some cases, group-level omitted variables can be captured by group-level fixed effects. This approach leads to the DiD identification strategy. DiD is a version of fixed-effects estimation using aggregate data.

Example: Minimum wage and employment Do minimum wage laws hurt employment among low-wage workers? Card & Krueger (1994). They collect data on wages for fast-food workers, before and after a minimum wage increase in New Jersey, but also in Pennsylvania, where the minimum wage didn’t change.

The setup The panel dimension: Wages observed in February and November of 1992. The quasi-experiment: NJ increased the minimum wage in April 1992, from $4.25 to $5.05. The “treatment”: the new minimum wage. The “treated group”: workers in NJ. The “control group”: Your pick! They use Pennsylvania, where the MW did not change in 1992.

The basic idea of DiD An extension of the “differences estimator” that we used in experiments. Compare the post-intervention means of the outcome variable for treatment versus control group. Or, simple OLS without controls. Now, this won’t work since treatment was not assigned at random. Treated and control units may be systematically different! But: DiD allows us to control for time-invariant differences between treated and control units!

The basic idea (ii) DiD is a version of fixed effects estimation using aggregate data. The main assumption: employment is determined by time-invariant state factors and time effects that are common across states.

The equation $Y_{ist} = \alpha_s + \lambda_t + \delta D_{st} + \varepsilon_{ist}$. $D_{st}$ is a dummy for high-minimum-wage states and periods. The time dummy captures the average change in employment in PA. The state dummy captures the pre-treatment difference in employment across states. $\delta$ captures the change in employment in NJ, on top of the change in PA: the DiD estimate of the causal effect of interest.

Card & Krueger (1994) Employment means (standard errors in parentheses):

                        NJ             PA             NJ-PA
Employment before       20.44 (0.51)   23.33 (1.35)   -2.89 (1.44)
Employment after        21.03 (0.52)   21.17 (0.94)   -0.14 (1.07)
Change in employment     0.59 (0.54)   -2.16 (1.25)    2.76 (1.36)

Estimation The DiD coefficient is easily calculated using the sample means: $\beta_{DiD} = \Delta Y_t - \Delta Y_c$, where the subscripts t and c here denote the treated and control groups. We “control” for: Fixed (time-invariant) factors that vary across states, AND Common factors that vary over time. We do NOT control for: Other things changing differentially across states!
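The calculation from sample means can be sketched in a few lines of Python. The numbers are the rounded means from the Card & Krueger table in these slides; the function name `did_from_means` is just a label for this illustration. Working from the rounded means gives about 2.75, while the table reports 2.76, presumably because the published figure was computed from unrounded means.

```python
# Rounded sample means from the Card & Krueger (1994) table in these slides.
means = {
    "NJ": {"before": 20.44, "after": 21.03},  # treated group
    "PA": {"before": 23.33, "after": 21.17},  # control group
}

def did_from_means(means, treated, control):
    """Change in the treated mean minus change in the control mean."""
    d_treated = means[treated]["after"] - means[treated]["before"]
    d_control = means[control]["after"] - means[control]["before"]
    return d_treated - d_control

did = did_from_means(means, "NJ", "PA")
print(round(did, 2))
```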

Main assumption Key identifying assumption: Employment trends would be the same in both states, in the absence of the treatment. Treatment induces a deviation from the common trend. The states can differ in time-invariant dimensions. Captured by the state fixed-effects. The common trends assumption can be investigated using data on multiple periods.

i) Common trends assumption We can compare the evolution of Y over time in control and treated groups, pre-treatment. If they are not parallel, the control group may not provide a very good “counterfactual” for the treated group in the absence of the treatment. We have to find a better control group!
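A rough way to eyeball the pre-treatment comparison is to difference each group's mean outcome over time and look at the gap between the two series of changes. The sketch below uses made-up group means for four pre-treatment periods; it is an informal check under those assumptions, not a formal test.

```python
import numpy as np

# Hypothetical group means of Y for four pre-treatment periods (made-up numbers).
treated_pre = np.array([20.1, 20.4, 20.8, 21.1])
control_pre = np.array([23.0, 23.3, 23.6, 24.0])

def pretrend_gap(treated, control):
    """Period-to-period change in the treated mean minus the change in the control mean."""
    return np.diff(treated) - np.diff(control)

gaps = pretrend_gap(treated_pre, control_pre)
print(gaps)  # values near zero are consistent with parallel pre-trends
```

The levels differ by about three units throughout, which is fine: only the changes need to line up.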

ii) Regression DiD We can use regression to get the DiD estimator. A panel data regression with group and time fixed-effects. $Y_{ist} = \alpha + \gamma NJ_s + \lambda d_t + \delta (NJ_s \cdot d_t) + \varepsilon_{ist}$ Ex: Our main variable of interest is the indicator for the increase in the minimum wage. The interaction of the treatment indicator and a “post-treatment” dummy. It’s easy to add additional states or periods. Adding the corresponding dummies (fixed effects).

Regression DiD (ii) The regression includes “main effects” for state and year and an interaction for observations from NJ in November. Again, d would be our DiD coefficient of interest. This is a convenient formulation. It makes it easy to include additional states or pre-treatment periods. Just include a dummy for each state and period!
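As a minimal sketch of the two-state, two-period regression, the code below simulates data (all numbers, including the assumed effect `true_delta`, are made up) and fits the regression by least squares with NumPy. Because the model is saturated in the four state-period cells, the interaction coefficient coincides with the difference in differences of the cell means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (made-up numbers): two states, two periods.
n = 400
nj = rng.integers(0, 2, size=n)     # 1 = treated state (NJ), 0 = control (PA)
post = rng.integers(0, 2, size=n)   # 1 = post-treatment period
true_delta = 2.0                    # assumed treatment effect for the simulation
y = 23.0 - 3.0 * nj - 2.0 * post + true_delta * nj * post + rng.normal(0, 1, n)

# Design matrix: constant, state dummy, period dummy, interaction.
X = np.column_stack([np.ones(n), nj, post, nj * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
delta_hat = coef[3]                 # coefficient on the interaction term

def cell_mean(s, t):
    """Mean outcome for state s in period t."""
    return y[(nj == s) & (post == t)].mean()

# The same estimate computed directly from the four cell means.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(delta_hat, did_means)
```

With more states or periods, the design matrix would gain one dummy column per extra state and period.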

iii) Treatment intensity The “treatment” does not have to be binary. We can allow for differing treatment intensity across units and time. By interacting the “post” dummy with a non-binary treatment variable. Example: Interact minimum wage increases with the fraction of workers likely to be affected in a state. Measured by the pre-treatment proportion of low-wage workers. Card (1992).
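One way to write the treatment-intensity version, as a sketch in the notation of the earlier slides. The symbol $FA_s$, standing for the pre-treatment fraction of low-wage (affected) workers in state s, is a label assumed here, and $d_t$ is the post-treatment dummy:

```latex
Y_{ist} = \alpha_s + \lambda_t + \delta \, (FA_s \cdot d_t) + \varepsilon_{ist}
```

The binary design is the special case where $FA_s$ takes only the values 0 and 1.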

iv) Individual-level controls Regression DiD also allows for the inclusion of additional covariates. Both at the group (s) and individual level (i). Only the group-level controls are likely to be a source of OVB. But individual-level controls can increase precision. However, combining micro data with group-level regressors makes inference complicated. How do we adjust standard errors?

v) Leads and lags When we have multiple periods of data, we can test that the effect follows the cause, and not vice versa. Suppose the treatment happens at different points in different states. We can include dummies for both leads and lags. We would expect that the leads are not significant. This also allows us to check the pattern of lagged effects.
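A sketch of a specification with m lags and q leads, assuming $D_{st}$ marks the period in which state s adopts the treatment (m, q, and the indexing of the coefficients are choices made for this illustration):

```latex
Y_{ist} = \alpha_s + \lambda_t
        + \sum_{\tau=0}^{m} \delta_{-\tau} D_{s,t-\tau}
        + \sum_{\tau=1}^{q} \delta_{+\tau} D_{s,t+\tau}
        + \varepsilon_{ist}
```

The first sum collects the post-treatment (lagged) effects; the second collects the leads, which should be close to zero if the effect follows the cause.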

vi) State-specific trends An alternative check is to include state-specific time trends. Linear, or even quadratic, cubic… This allows T and C groups to follow different trends in a limited way. We would like the effects to be unaffected by these trends. We need multiple pre-treatment periods!
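With linear trends, one possible way to write this check is as follows, where $\alpha_{0s}$ is a state-specific intercept and $\alpha_{1s}$ a state-specific slope on the time index t (both symbols are labels assumed for this sketch):

```latex
Y_{ist} = \alpha_{0s} + \alpha_{1s} \, t + \lambda_t + \delta D_{st} + \varepsilon_{ist}
```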

vii) Groups don’t have to be states! Instead of state, s can denote demographic groups. Age groups, marital status, etc. Some of which are affected and some not. Instead of time, t can be cohort or other characteristics. But DiD designs always set up a treatment-control comparison.

viii) Composition effects What if the composition of T and C groups changes as a result of the treatment? Ex: Divorce law affects marriage rates (and divorce rates!). Ex2: Welfare programs for single mothers affect fertility or marriage (or migration!). You would need to check for composition effects carefully!

ix) Higher-order differences Triple differences. Example: Extension of Medicaid coverage. In some states and years, and only for children of certain ages. Outcome: mother’s participation and earnings. Fixed effects for: state-year, age-year, and age-state. More convincing results than exploiting state and time alone.
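A sketch of the triple-differences regression for this example, with a indexing the child's age group and the three sets of pairwise fixed effects written as $\gamma_{st}$, $\lambda_{at}$ and $\theta_{as}$ (symbol names assumed for this illustration):

```latex
Y_{iast} = \gamma_{st} + \lambda_{at} + \theta_{as} + \delta D_{ast} + \varepsilon_{iast}
```

Here $D_{ast}$ equals one for age groups covered in state s at time t, so $\delta$ is identified only from variation at the state-age-time level.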

x) Standard errors Two problems: 1. When combining individual and aggregate variables, the “Moulton problem”. Errors correlated within group. Only a problem when estimating DiD from individual-level data.

2. With panel data, serial correlation. Errors correlated over time within unit. The longer the panel, the worse. Both of these problems imply that our usual standard errors, which assume independent errors, will be incorrect. The usual OLS variance formula for $\beta$ is not correct!

Fixes “Cluster” standard errors at the unit (state) level. Problem: we need a large number of units! What if we have few clusters? Base inference on a t-distribution with G-K degrees of freedom, rather than the standard normal. “Block bootstrap”. Not clear, research is under way, no consensus. My advice: try to get more clusters! Hard to publish a paper with only one law change and two groups these days!
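To make the clustering fix concrete, here is a minimal NumPy sketch of the cluster-robust (sandwich) variance estimator, compared with the conventional OLS formula on simulated data. The helper name `ols_cluster_se` and all numbers are made up for illustration, and no small-sample correction is applied, which is a simplification.

```python
import numpy as np

def ols_cluster_se(X, y, groups):
    """OLS coefficients with cluster-robust (sandwich) standard errors.

    No small-sample correction is applied; that is a simplification.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]   # sum of x_i * u_i within cluster g
        meat += np.outer(score, score)
    V = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Simulated example (made-up numbers): 40 states, 25 workers each,
# with a state-level shock that correlates errors within state.
rng = np.random.default_rng(1)
G, n_g = 40, 25
groups = np.repeat(np.arange(G), n_g)
d = np.repeat(rng.integers(0, 2, size=G), n_g).astype(float)  # state-level regressor
state_shock = np.repeat(rng.normal(0, 1, size=G), n_g)
y = 1.0 + 0.5 * d + state_shock + rng.normal(0, 1, size=G * n_g)
X = np.column_stack([np.ones(G * n_g), d])

beta, se_cluster = ols_cluster_se(X, y, groups)

# Conventional OLS standard errors, which assume independent errors.
resid = y - X @ beta
sigma2 = resid @ resid / (len(y) - X.shape[1])
se_ols = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
print(se_cluster, se_ols)
```

In a setup like this one, where the regressor varies only at the state level and errors share a state component, the conventional standard error on the group-level regressor tends to be much too small.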

“The question of how best to approach the serial correlation problem is currently under study, and a consensus has not yet emerged” (Mostly Harmless)