Differences-in-Differences

Slides:



Advertisements
Similar presentations
RESEARCH DESIGN.
Advertisements

Class 18 – Thursday, Nov. 11 Omitted Variables Bias
1 Difference in Difference Models Bill Evans Spring 2008.
Random Assignment Experiments
Presented by Malte Lierl (Yale University).  How do we measure program impact when random assignment is not possible ?  e.g. universal take-up  non-excludable.
The counterfactual logic for public policy evaluation Alberto Martini hard at first, natural later 1.
HSRP 734: Advanced Statistical Methods July 24, 2008.
Differences-in-Differences
Multiple Regression Fenster Today we start on the last part of the course: multivariate analysis. Up to now we have been concerned with testing the significance.
Impact Evaluation Methods. Randomized Trials Regression Discontinuity Matching Difference in Differences.
Difference in difference models
Differences-in- Differences November 10, 2009 Erick Gong Thanks to Null & Miguel.
Economics Prof. Buckles1 Time Series Data y t =  0 +  1 x t  k x tk + u t 1. Basic Analysis.
1 Qualitative Independent Variables Sometimes called Dummy Variables.
Clustered or Multilevel Data
Stat 217 – Day 25 Regression. Last Time - ANOVA When?  Comparing 2 or means (one categorical and one quantitative variable) Research question  Null.
1 Difference in Difference Models Bill Evans Spring 2008.
1Prof. Dr. Rainer Stachuletz Time Series Data y t =  0 +  1 x t  k x tk + u t 1. Basic Analysis.
Stat 112: Lecture 9 Notes Homework 3: Due next Thursday
Multiple Regression 2 Sociology 5811 Lecture 23 Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.
AADAPT Workshop Latin America Brasilia, November 16-20, 2009 Non-Experimental Methods Florence Kondylis.
Jon Curwin and Roger Slater, QUANTITATIVE METHODS: A SHORT COURSE ISBN © Cengage Chapter 2: Basic Sums.
Fractions, Decimals, and Percent By: Jack M 1 _____ %
University Prep Math Patricia van Donkelaar Course Outcomes By the end of the course,
Statistics and Quantitative Analysis U4320 Segment 12: Extension of Multiple Regression Analysis Prof. Sharyn O’Halloran.
Multiple Regression 3 Sociology 5811 Lecture 24 Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.
Application 3: Estimating the Effect of Education on Earnings Methods of Economic Investigation Lecture 9 1.
Latent Growth Modeling Byrne Chapter 11. Latent Growth Modeling Measuring change over repeated time measurements – Gives you more information than a repeated.
Applying impact evaluation tools A hypothetical fertilizer project.
Non-experimental methods Markus Goldstein The World Bank DECRG & AFTPM.
Instrumental Variables: Introduction Methods of Economic Investigation Lecture 14.
AADAPT Workshop Latin America Brasilia, November 16-20, 2009 Laura Chioda.
Differences-in- Differences. Identifying Assumption Whatever happened to the control group over time is what would have happened to the treatment group.
Economics 20 - Prof. Anderson1 Time Series Data y t =  0 +  1 x t  k x tk + u t 1. Basic Analysis.
Designs for Experiments with More Than One Factor When the experimenter is interested in the effect of multiple factors on a response a factorial design.
Experimental Evaluations Methods of Economic Investigation Lecture 4.
Differences-in-Differences
Chapter 15 Panel Data Models.
Quasi Experimental Methods I
The greatest mathematical tool of all!!
QM222 Class 13 Section D1 Omitted variable bias (Chapter 13.)
Econometrics ITFD Week 8.
business analytics II ▌panel data models
Quasi Experimental Methods I
Identification: Difference-in-Difference estimator
Experimental Research Designs
Difference-in-Differences
Economics 172 Issues in African Economic Development
QM222 Class 18 Omitted Variable Bias
Making the Most out of Discontinuities
Quasi-Experimental Methods
Chapter Eight: Quantitative Methods
Economics 172 Issues in African Economic Development
Differences-in-Differences
Methods of Economic Investigation Lecture 12
Scatter Plots of Data with Various Correlation Coefficients
Cross Sectional Designs
Economics 20 - Prof. Anderson
Impact Evaluation Methods
Impact Evaluation Toolbox
Ch. 13. Pooled Cross Sections Across Time: Simple Panel Data.
Impact Evaluation Methods: Difference in difference & Matching
Sampling and Power Slides by Jishnu Das.
LONG MULTIPLICATION is just multiplying two numbers.
Improving Overlap Farrokh Alemi, Ph.D.
The Productivity Effects of Privatization Longitudinal Estimates using Comprehensive Manufacturing Firm Data from Hungary, Romania, Russia, and Ukraine.
Positive analysis in public finance
Chapter 9 Dummy Variables Undergraduated Econometrics Page 1
Simultaneous Equations
Ch. 13. Pooled Cross Sections Across Time: Simple Panel Data.
Presentation transcript:

Differences-in-Differences ECON 3386 Anant Nyshadham

Math Review (1) Yi = a + bTi + cXi + ei for those of you looking at these slides later, here’s what we just wrote down: (1) Yi = a + bTi + cXi + ei (2) E(Yi | Ti=1) – E(Yi | Ti=0) = [a + b + cE(Xi | Ti=1) + E(ei | Ti=1)] – [a + 0 + cE(Xi | Ti=0) + E(ei | Ti=0)] = b + c [E(Xi | Ti=1) – E(Xi | Ti=0)] True effect “Omitted variable/selection bias” term

What if we had data from before the program? What if we estimated this equation using data from before the program? (1) Yi = a + bTi + cXi + ei Specifically, what would our estimate of b be?

What if we had data from before the program? What if we estimated this equation using data from before the program? (1) Yi = a + bTi + cXi + ei (2) E(Yi0| Ti1=1) – E(Yi0| Ti1=0) = [a + 0 + cE(Xi0 | Ti1=1) + E(ei0| Ti1=1)] – [a + 0 + cE(Xi0| Ti1=0) + E(ei0| Ti1=0)] = c [E(Xi | Ti=1) – E(Xi | Ti=0)] “Omitted variable/selection bias” term ALL THAT’S LEFT IS THE PROBLEMATIC TERM – HOW COULD THIS BE HELPFUL TO US?

Differences-in-Differences (just what it sounds like) Use two periods of data add second subscript to denote time = {E(Yi1 | Ti1=1) – E(Yi1 | Ti1=0)} (difference btwn T&C, post) – {E(Yi0 | Ti1=1) – E(Yi0 | Ti1=0)} – (difference btwn T&C, pre) = b + c [E(Xi1 | Ti1=1) – E(Xi1 | Ti1=0)] – c [E(Xi0 | Ti1=1) – E(Xi0 | Ti1=0)]

Differences-in-Differences (just what it sounds like) Use two periods of data add second subscript to denote time = {E(Yi1 | Ti1=1) – E(Yi1 | Ti1=0)} (difference btwn T&C, post) – {E(Yi0 | Ti1=1) – E(Yi0 | Ti1=0)} – (difference btwn T&C, pre) = b + c [E(Xi1 | Ti1=1) – E(Xi1 | Ti1=0)] – c [E(Xi0 | Ti1=1) – E(Xi0 | Ti1=0)] = b YAY! Assume differences between X don’t change over time.

Differences-in-Differences, Graphically Treatment Control Pre Post

Differences-in-Differences, Graphically Effect of program using only pre- & post- data from T group (ignoring general time trend). Pre Post

Differences-in-Differences, Graphically Effect of program using only T & C comparison from post-intervention (ignoring pre-existing differences between T & C groups). Pre Post

Differences-in-Differences, Graphically Pre Post

Differences-in-Differences, Graphically Effect of program difference-in-difference (taking into account pre-existing differences between T & C and general time trend). Pre Post

Identifying Assumption Whatever happened to the control group over time is what would have happened to the treatment group in the absence of the program. Effect of program difference-in-difference (taking into account pre-existing differences between T & C and general time trend). Pre Post

Uses of Diff-in-Diff Simple two-period, two-group comparison very useful in combination with other methods Randomization Regression Discontinuity Matching (propensity score) Can also do much more complicated “cohort” analysis, comparing many groups over many time periods

The (Simple) Regression Yi,t = a + bTreati,t+ cPosti,t + d(Treati,tPosti,t )+ ei,t Treati,t is a binary indicator (“turns on” from 0 to 1) for being in the treatment group Posti,t is a binary indicator for the period after treatment and Treati,tPosti,t is the interaction (product) Interpretation of a, b, c, d is “holding all else constant”

Putting Graph & Regression Together Yi,t = a + bTreati,t+ cPosti,t + d(Treati,tPosti,t )+ ei,t d is the causal effect of treatment a + b + c + d a + b a + c a Pre Post

Putting Graph & Regression Together Yi,t = a + bTreati,t+ cPosti,t + d(Treati,tPosti,t )+ ei,t a + b + c + d Single Diff 2= (a+b+c+d)-(a+c) = (b+d) a + b Single Diff 1= (a+b)-(a)=b a + c a Pre Post

Putting Graph & Regression Together Yi,t = a + bTreati,t+ cPosti,t + d(Treati,tPosti,t )+ ei,t Diff-in-Diff=(Single Diff 2-Single Diff 1)=(b+d)-b=d a + b + c + d Single Diff 2 = (a+b+c+d)-(a+c) = (b+d) a + b Single Diff 1= (a+b)-(a)=b a + c a Pre Post

Cohort Analysis When you’ve got richer data, it’s not as easy to draw the picture or write the equations cross-section (lots of individuals at one point in time) time-series (one individual over lots of time) repeated cross-section (lots of individuals over several times)  panel (lots of individuals, multiple times for each)  Basically, control for each time period and each “group” (fixed effects) – the coefficient on the treatment dummy is the effect you’re trying to estimate

DiD Data Requirements Either repeated cross-section or panel Treatment can’t happen for everyone at the same time If you believe the identifying assumption, then you can analyze policies ex post Let’s us tackle really big questions that we’re unlikely to be able to randomize

Malaria Eradication in the Americas (Bleakley 2007) Question: What is the effect of malaria on economic development? Data: Malaria Eradication in United States South (1920’s) Brazil, Colombia, Mexico (1950’s) Diff-in-Diff: Use birth cohorts (old people vs. young people) & (regions with lots of malaria vs. little malaria) Idea: Young Cohort X Region w/malaria Result: This group higher income & literacy

What’s the intuition Areas with high pre-treatment malaria will most benefit from malaria eradication Young people living in these areas will benefit most (older people might have partial immunity) Comparison Group: young people living in low pre-treatment malaria areas (malaria eradication will have little effect here)

Robustness Checks If possible, use data on multiple pre-program periods to show that difference between treated & control is stable Not necessary for trends to be parallel, just to know function for each If possible, use data on multiple post-program periods to show that unusual difference between treated & control occurs only concurrent with program Alternatively, use data on multiple indicators to show that response to program is only manifest for those we expect it to be (e.g. the diff-in-diff estimate of the impact of ITN distribution on diarrhea should be zero)

Effect of 2ndary School Construction in Tanzania Villages “Treatment Villages” got 2ndary schools “Control Villages” didn’t Who benefits from 2ndary schools? Young People benefit Older people out of school shouldn’t benefit Effect: (Young People X Treatment Villages)