Difference-in-Differences 1
Preliminaries
Office hours: Fridays 4-5pm, 32Lif, 3.01
I will post slides from class on my website.
Diff-in-Diff
The principal tool in non-experimental applied microeconomics over the last twenty years.
It takes an idea from the experimental literature (control groups) and applies it in non-experimental settings.
The suitability of the control group is key, because the control group provides the counterfactual.
Counterfactuals
Describe, as if to a policymaker with no background in econometrics, what a counterfactual is and why it is important for establishing the impact of a particular program.
Counterfactual
Suppose you are interested in assessing the effect of reducing class sizes on children’s final exam grades. You have test scores from students in classes where the class size was reduced starting in the prior year and from students in classes where the size remained the same.
Under what conditions would the difference in the average test scores across these two groups be a valid estimate of the effect of reducing class sizes?
Explicitly state the counterfactual you need and how it relates to the comparison group you actually have (i.e., students in classes where the size remained the same).
Recap: Rubin Causal Model
We would like to know the effect of the ‘treatment’ on the treatment group:
$E(Y_i^T \mid T) - E(Y_i^C \mid T)$
What do these terms mean? Do we observe (the sample analogue of) both of these objects?
Recap: Rubin Causal Model
We would like to know the effect of the ‘treatment’ on the treatment group:
$E(Y_i^T \mid T) - E(Y_i^C \mid T)$
What do these terms mean?
We don’t observe (the sample analogue of) $E(Y_i^C \mid T)$.
Instead we often estimate $E(Y_i^T \mid T) - E(Y_i^C \mid C)$.
What is the problem?
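Selection Bias
Selection bias occurs when $E(Y_i^C \mid T) \neq E(Y_i^C \mid C)$.
Note that
$\big(E(Y_i^T \mid T) - E(Y_i^C \mid T)\big) - \big(E(Y_i^T \mid T) - E(Y_i^C \mid C)\big) = E(Y_i^C \mid C) - E(Y_i^C \mid T)$,
which is nonzero exactly when $E(Y_i^C \mid T) \neq E(Y_i^C \mid C)$.
Examples
What do we do with Diff-in-Diff?
We estimate $E(\Delta Y_i^T \mid T) - E(\Delta Y_i^C \mid C)$.
So the estimate is biased if $E(\Delta Y_i^C \mid T) \neq E(\Delta Y_i^C \mid C)$.
What do we call this assumption?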
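To make the two bias conditions concrete, here is a minimal simulation. Every number in it (group sizes, the 2.0 head start for the treated group, the common trend of 1.0, the true effect of 3.0) is a made-up assumption, not data from the course. Treated units have higher untreated outcomes, so the naive post-period comparison is contaminated by selection bias. Both groups share the same trend, however, so the change-based (diff-in-diff) comparison recovers the true effect.

```python
import numpy as np

# Hypothetical simulation: all parameters below are illustrative assumptions.
rng = np.random.default_rng(0)
n = 100_000
treated = rng.random(n) < 0.5            # group indicator (T vs C)

# Untreated potential outcome before treatment: treated units start 2.0
# higher on average, so E(Y^C | T) != E(Y^C | C) (selection bias).
y0_before = 10.0 + 2.0 * treated + rng.normal(0, 1, n)
# Common trend of +1.0 for everyone, so parallel trends holds by construction.
y0_after = y0_before + 1.0 + rng.normal(0, 1, n)

true_effect = 3.0
y_after = y0_after + true_effect * treated   # observed outcome after treatment
y_before = y0_before                         # nobody is treated beforehand

# Naive comparison: E(Y^T | T) - E(Y^C | C) in the post period
naive = y_after[treated].mean() - y_after[~treated].mean()

# Diff-in-diff: E(dY | T) - E(dY | C)
dy = y_after - y_before
did = dy[treated].mean() - dy[~treated].mean()

print(f"naive: {naive:.2f}, DiD: {did:.2f}, truth: {true_effect}")
```

The naive estimate lands near 5 (true effect plus the 2.0 selection bias), and the DiD estimate lands near 3. If you give the treated group its own trend in `y0_after`, DiD becomes biased by exactly that trend gap.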
2. Productivity of Cocoa Farmers
a) Time-series estimate? Any good? Over- or under-estimate? What is the identification assumption?
b) Cross-section estimate? Any good? Over- or under-estimate? What is the identification assumption?
c) DiD estimate? Any good? What problems has it solved? What is the identification assumption? Do we believe it?
Stata Part
We are trying to find the effect of the announcement of an incinerator on house prices.
We have two years: 1978 and 1981.
The treated group is houses within three miles of the incinerator; the control group is houses more than three miles away.
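Treatment Effect in 1981

reg lrprice nearinc if year==1981, robust

Linear regression                               Number of obs     =        142
                                                F(1, 140)         =
                                                Prob > F          =
                                                R-squared         =
                                                Root MSE          =

------------------------------------------------------------------------------
             |               Robust
     lrprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     nearinc |
       _cons |
------------------------------------------------------------------------------

What is the estimated treatment effect? What is the identification assumption? What do we learn?
What are the means of house price in 1981 for the treatment and control groups?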
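Treatment Effect in 1978

reg lrprice nearinc if year==1978, robust

Linear regression                               Number of obs     =        179
                                                F(1, 177)         =
                                                Prob > F          =
                                                R-squared         =
                                                Root MSE          =

------------------------------------------------------------------------------
             |               Robust
     lrprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     nearinc |
       _cons |
------------------------------------------------------------------------------

What is the estimated ‘treatment’ effect? What does this have to do with our identification assumption? What do we learn?
What are the means of house price in 1978 for the treatment and control groups?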
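A useful fact for reading these single-year tables: in an OLS regression on a constant and one 0/1 dummy, `_cons` equals the control-group mean, and `_cons + nearinc` equals the treated-group mean. The `robust` option changes only the standard errors, not the coefficients. The sketch below checks this with numpy on synthetic log prices. The data-generating numbers are hypothetical, not the incinerator sample.

```python
import numpy as np

# Synthetic data standing in for one year of the housing sample (hypothetical).
rng = np.random.default_rng(1)
nearinc = rng.integers(0, 2, size=200)          # 1 = within three miles
lrprice = 11.0 - 0.2 * nearinc + rng.normal(0, 0.3, size=200)

# OLS of lrprice on a constant and nearinc, mirroring `reg lrprice nearinc`
X = np.column_stack([np.ones_like(lrprice), nearinc])
(cons, b_nearinc), *_ = np.linalg.lstsq(X, lrprice, rcond=None)

# _cons is the control-group mean; _cons + nearinc is the treated-group mean
assert np.isclose(cons, lrprice[nearinc == 0].mean())
assert np.isclose(cons + b_nearinc, lrprice[nearinc == 1].mean())
```

You can therefore read the group means directly off each table, without rerunning anything.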
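Diff-in-Diff

Linear regression                               Number of obs     =        321
                                                F(3, 317)         =
                                                Prob > F          =
                                                R-squared         =
                                                Root MSE          =

------------------------------------------------------------------------------
             |               Robust
     lrprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     nearinc |
         y81 |
 y81_nearinc |
       _cons |
------------------------------------------------------------------------------

How would we write down the estimating equation? Which is the variable of interest? What is the estimated ‘treatment’ effect?
What is our identification assumption? What do we learn? How do the estimated coefficients relate to the previous tables?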
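With the regressors in this table, the estimating equation is lrprice = b0 + b1·nearinc + b2·y81 + b3·y81_nearinc + u. The model has one parameter per group-by-year cell, so OLS reproduces the four cell means exactly. As a result, b3 equals the difference-in-differences of the means, b1 equals the 1978 `nearinc` gap, and b1 + b3 equals the 1981 `nearinc` gap. The sketch below checks this on synthetic data. All numbers in it are hypothetical.

```python
import numpy as np

# Synthetic two-year sample (hypothetical numbers, not the incinerator data).
rng = np.random.default_rng(2)
n = 400
nearinc = rng.integers(0, 2, size=n)
y81 = rng.integers(0, 2, size=n)
y81_nearinc = y81 * nearinc
lrprice = (11.0 - 0.3 * nearinc + 0.4 * y81 - 0.1 * y81_nearinc
           + rng.normal(0, 0.3, size=n))

# lrprice = b0 + b1*nearinc + b2*y81 + b3*y81_nearinc + u
X = np.column_stack([np.ones(n), nearinc, y81, y81_nearinc])
b0, b1, b2, b3 = np.linalg.lstsq(X, lrprice, rcond=None)[0]


def cell_mean(near, year81):
    """Mean of lrprice in one of the four group-by-year cells."""
    mask = (nearinc == near) & (y81 == year81)
    return lrprice[mask].mean()


did = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))
assert np.isclose(b3, did)                                     # interaction = DiD
assert np.isclose(b1, cell_mean(1, 0) - cell_mean(0, 0))       # 1978 gap
assert np.isclose(b1 + b3, cell_mean(1, 1) - cell_mean(0, 1))  # 1981 gap
```

The same identities link this table to the two single-year regressions: its `nearinc` coefficient matches the 1978 table, and `nearinc + y81_nearinc` matches the 1981 table.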
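Diff-in-Diff plus Controls

Linear regression                               Number of obs     =        321
                                                F(11, 309)        =
                                                Prob > F          =
                                                R-squared         =
                                                Root MSE          =

------------------------------------------------------------------------------
             |               Robust
     lrprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     nearinc |
         y81 |
 y81_nearinc |
         age |
       agesq |
      lintst |
       lland |
       larea |
       rooms |
       baths |
        lcbd |
       _cons |
------------------------------------------------------------------------------

Which is the variable of interest? What is the estimated ‘treatment’ effect? How does it change compared with the diff-in-diff without controls?
What is our identification assumption? How has it changed?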
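One reason the interaction coefficient can move when controls are added is a change in composition. If the houses sold near the site in 1981 differ systematically from the others (for example, they are older), the plain DiD attributes that difference to the treatment. Adding the characteristic as a control changes the assumption to parallel trends conditional on it. The sketch below illustrates this with one hypothetical control standing in for `age`. The composition shift and all coefficients are invented for illustration and are not estimates from the housing data.

```python
import numpy as np


def ols(y, *regressors):
    """OLS coefficients, intercept first (point estimates only)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Hypothetical data: all parameters are illustrative, not estimates.
rng = np.random.default_rng(3)
n = 2000
nearinc = rng.integers(0, 2, size=n)
y81 = rng.integers(0, 2, size=n)
y81_nearinc = y81 * nearinc

# Assumed composition change: houses sold near the site in 1981 are older.
age = 20 + 15 * y81_nearinc + rng.normal(0, 5, size=n)
lrprice = (11.0 - 0.3 * nearinc + 0.4 * y81 - 0.1 * y81_nearinc
           - 0.01 * age + rng.normal(0, 0.1, size=n))

# Index 3 is the y81_nearinc coefficient (after intercept, nearinc, y81)
did_plain = ols(lrprice, nearinc, y81, y81_nearinc)[3]
did_ctrl = ols(lrprice, nearinc, y81, y81_nearinc, age)[3]
print(f"without age: {did_plain:.3f}, with age: {did_ctrl:.3f}")
```

Without the control, the interaction absorbs the extra 15 years of age (about -0.25 in total). With the control, it returns to roughly the assumed -0.10. Whether real controls move the estimate depends on whether such composition changes are present in the data.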
What are the policy implications?