Difference-in-Differences Week 9
Introduction The fixed-effects strategy requires panel data: repeated observations on the same individuals. Often, the regressor of interest varies only at a more aggregate or group level. Ex: state policies that change over time but are constant across workers within states. The source of OVB must then be unobserved variables at the state-year level.
Introduction (ii) In some cases, group-level omitted variables can be captured by group-level fixed effects. This approach leads to the DiD identification strategy. DiD is a version of fixed-effects estimation using aggregate data.
Example: Minimum wage and employment Do minimum wage laws hurt employment among low-wage workers? Card & Krueger (1994). They collect data on employment and wages for fast-food workers, before and after a minimum wage increase in New Jersey, but also in Pennsylvania, where the minimum wage didn’t change.
The setup The panel dimension: wages observed in February and November of 1992. The quasi-experiment: NJ increased the minimum wage in April 1992, from $4.25 to $5.05. The “treatment”: the new minimum wage. The “treated group”: workers in NJ. The “control group”: Your pick! They use Pennsylvania, where the MW did not change in 1992.
The basic idea of DiD An extension of the “differences estimator” that we used in experiments. Compare the post-intervention means of the outcome variable for treatment versus control group. Or, simple OLS without controls. Now, this won’t work since treatment was not assigned at random. Treated and control units may be systematically different! But: DiD allows us to control for time-invariant differences between treated and control units!
The basic idea (ii) DiD is a version of fixed effects estimation using aggregate data. The main assumption: employment is determined by time-invariant state factors and time effects that are common across states.
The equation $Y_{ist} = \alpha_s + \lambda_t + \delta D_{st} + \varepsilon_{ist}$ Here $D_{st}$ is a dummy for high-minimum-wage states and periods. The time effect $\lambda_t$ captures the average change in employment in PA. The state effect $\alpha_s$ captures the pre-treatment difference in employment across states. The coefficient $\delta$ captures the change in employment in NJ, on top of the change in PA: it is the DiD estimate of the causal effect of interest.
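Why the coefficient on the treatment dummy is the DiD effect can be seen cell by cell. The sketch below assumes the error term has mean zero in every state-period cell, writes $\alpha_s$ for the state effect, $\lambda_t$ for the time effect and $\delta$ for the coefficient on $D_{st}$, and uses Feb and Nov as labels for the two survey waves.

```latex
\begin{align*}
E[Y_{ist} \mid s=\text{NJ}, t=\text{Nov}] - E[Y_{ist} \mid s=\text{NJ}, t=\text{Feb}]
  &= \lambda_{\text{Nov}} - \lambda_{\text{Feb}} + \delta \\
E[Y_{ist} \mid s=\text{PA}, t=\text{Nov}] - E[Y_{ist} \mid s=\text{PA}, t=\text{Feb}]
  &= \lambda_{\text{Nov}} - \lambda_{\text{Feb}} \\
\text{(NJ change)} - \text{(PA change)} &= \delta
\end{align*}
```

The state effects drop out within each state, and the common time effect drops out when the two changes are subtracted.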
Card & Krueger (1994)
                        NJ              PA              NJ-PA
Employment before       20.44 (0.51)    23.33 (1.35)    -2.89 (1.44)
Employment after        21.03 (0.52)    21.17 (0.94)    -0.14 (1.07)
Change in employment     0.59 (0.54)    -2.16 (1.25)     2.76 (1.36)
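Estimation The DiD coefficient is easily calculated using the sample means: $\hat{\beta}_{DiD} = \Delta \bar{Y}_T - \Delta \bar{Y}_C$, the change in the mean outcome for the treated group (T) minus the change for the control group (C). We “control” for: Fixed (time-invariant) factors that vary across states, AND Common factors that vary over time. We do NOT control for: Other things changing differentially across states!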
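As a quick check, the calculation can be reproduced from the four cell means in the Card & Krueger table above. This is only a sketch: the function name `did_from_means` and the dictionary layout are illustrative choices, not part of the original study. With the rounded means the result is 2.75, whereas the table reports 2.76, presumably because the published figure was computed from unrounded means.

```python
# Cell means of employment, copied from the Card & Krueger table above.
means = {
    ("NJ", "before"): 20.44,
    ("NJ", "after"): 21.03,
    ("PA", "before"): 23.33,
    ("PA", "after"): 21.17,
}


def did_from_means(m, treated="NJ", control="PA"):
    """Change in the treated group's mean minus change in the control group's mean."""
    change_treated = m[(treated, "after")] - m[(treated, "before")]
    change_control = m[(control, "after")] - m[(control, "before")]
    return change_treated - change_control


did = did_from_means(means)
print(round(did, 2))
```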
Main assumption Key identifying assumption: Employment trends would be the same in both states, in the absence of the treatment. Treatment induces a deviation from the common trend. The states can differ in time-invariant dimensions. Captured by the state fixed-effects. The common trends assumption can be investigated using data on multiple periods.
i) Common trends assumption We can compare the evolution of Y over time in control and treated groups, pre-treatment. If they are not parallel, the control group may not provide a very good “counterfactual” for the treated group in the absence of the treatment. We have to find a better control group!
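A simple numerical version of this check is to track the treated-minus-control gap in the pre-treatment periods: under parallel trends the gap stays constant. The sketch below uses made-up group means and hypothetical helper names (`pre_period_gaps`, `max_gap_drift`); in practice one would usually plot both series as well.

```python
# Hypothetical group means of the outcome by period (illustrative numbers only).
# Periods 0-2 are pre-treatment; treatment starts in period 3.
treated = [10.0, 11.0, 12.0, 15.0]
control = [8.0, 9.0, 10.0, 11.0]
first_post_period = 3


def pre_period_gaps(treated_means, control_means, first_post):
    """Treated-minus-control gap in each pre-treatment period."""
    pairs = zip(treated_means[:first_post], control_means[:first_post])
    return [t - c for t, c in pairs]


def max_gap_drift(gaps):
    """Largest change in the gap between consecutive pre-periods (0 means parallel)."""
    return max(abs(b - a) for a, b in zip(gaps, gaps[1:]))


gaps = pre_period_gaps(treated, control, first_post_period)
drift = max_gap_drift(gaps)
print(gaps, drift)
```

A drift far from zero, relative to the size of the estimated effect, would suggest the control group is a poor counterfactual.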
ii) Regression DiD We can use regression to get the DiD estimator: a panel data regression with group and time fixed effects. $Y_{ist} = \alpha + \gamma NJ_s + \lambda d_t + \delta (NJ_s \cdot d_t) + \varepsilon_{ist}$ Ex: Our main variable of interest is the indicator for the increase in the minimum wage: the interaction of the treatment indicator $NJ_s$ and a “post-treatment” dummy $d_t$. It’s easy to add additional states or periods, by adding the corresponding dummies (fixed effects).
Regression DiD (ii) The regression includes “main effects” for state and year and an interaction for observations from NJ in November. Again, d would be our DiD coefficient of interest. This is a convenient formulation. It makes it easy to include additional states or pre-treatment periods. Just include a dummy for each state and period!
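To illustrate that the interaction coefficient reproduces the difference in mean changes, here is a minimal sketch that runs OLS on a constant, an NJ dummy, a post-period dummy and their interaction. The data are invented, and the small solver (`solve`, `ols`) is a hand-rolled stand-in for a regression package, written with the standard library only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical store-level data: (nj dummy, post dummy, employment). Illustrative only.
data = [
    (1, 0, 20.0), (1, 0, 21.0), (1, 1, 21.0), (1, 1, 22.0),
    (0, 0, 23.0), (0, 0, 24.0), (0, 1, 21.0), (0, 1, 22.0),
]
X = [[1.0, nj, post, nj * post] for nj, post, _ in data]
y = [emp for _, _, emp in data]

# Coefficients on: constant, NJ dummy, post dummy, NJ x post interaction.
alpha, gamma, lam, delta = ols(X, y)
print(round(delta, 6))
```

In this made-up sample the NJ mean rises by 1 and the PA mean falls by 2, so the interaction coefficient comes out as 3, the same number the comparison of means would give.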
iii) Treatment intensity The “treatment” does not have to be binary. We can allow for differing treatment intensity across units and time. By interacting the “post” dummy with a non-binary treatment variable. Example: Interact minimum wage increases with the fraction of workers likely to be affected in a state. Measured by the pre-treatment proportion of low-wage workers. Card (1992).
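One way to write such a specification, as a sketch: let $FA_s$ (assumed notation) be the pre-treatment fraction of workers in state $s$ likely to be affected, and $d_t$ a post-treatment dummy. State and time effects enter as in the basic DiD equation.

```latex
Y_{ist} = \alpha_s + \lambda_t + \delta \, (FA_s \cdot d_t) + \varepsilon_{ist}
```

Here $\delta$ measures how much more the outcome changes, after the increase, in states with a larger affected fraction.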
iv) Individual-level controls Regression DiD also allows for the inclusion of additional covariates. Both at the group (s) and individual level (i). Only the group-level controls are likely to be a source of OVB. But individual-level controls can increase precision. However, combining micro data with group-level regressors makes inference complicated. How do we adjust standard errors?
v) Leads and lags When we have multiple periods of data, we can test that the effect follows the cause, and not vice versa. Suppose the treatment happens at different points in time in different states. We can include dummies for both leads and lags. We would expect that the leads are not significant. This also allows us to check the pattern of lagged effects.
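The lead and lag dummies can be built from each state's adoption date. The sketch below assumes annual data; the function name `event_time_dummies`, the adoption years and the choice of two leads and two lags are all illustrative.

```python
def event_time_dummies(year, adoption_year, n_leads=2, n_lags=2):
    """Return lead/lag indicators for one state-year.

    'lead_k' is 1 if the treatment starts k years after `year` (k >= 1);
    'lag_k' is 1 if it started k years before `year` (lag_0 is the adoption year).
    """
    rel = year - adoption_year  # event time: negative before adoption
    dummies = {f"lead_{k}": int(rel == -k) for k in range(n_leads, 0, -1)}
    dummies.update({f"lag_{k}": int(rel == k) for k in range(0, n_lags + 1)})
    return dummies


# Hypothetical adoption years for two states.
adoption = {"A": 1992, "B": 1994}
row_a = event_time_dummies(1993, adoption["A"])  # one year after adoption
row_b = event_time_dummies(1993, adoption["B"])  # one year before adoption
print(row_a)
print(row_b)
```

These indicators would then enter the regression alongside the state and year dummies. In applied work the last lag is often defined as "n_lags or more years after adoption"; this sketch does not do that.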
vi) State-specific trends An alternative check is to include state-specific time trends. Linear, or even quadratic, cubic… This allows T and C groups to follow different trends in a limited way. We would like the effects to be unaffected by these trends. We need multiple pre-treatment periods!
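With a linear trend, the specification could be written as follows. The notation is an assumption: $\alpha_{0s}$ is a state-specific intercept, $\alpha_{1s}$ a state-specific slope on the time index $t$, and $D_{st}$ the treatment dummy.

```latex
Y_{ist} = \alpha_{0s} + \alpha_{1s} \, t + \lambda_t + \delta D_{st} + \varepsilon_{ist}
```

With only two periods the state-specific slope cannot be told apart from the treatment effect, which is why several pre-treatment periods are needed.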
vii) Groups don’t have to be states! Instead of state, s can denote demographic groups. Age groups, marital status, etc. Some of which are affected and some not. Instead of time, t can be cohort or other characteristics. But DiD designs always set up a treatment-control comparison.
viii) Composition effects What if the composition of T and C groups changes as a result of the treatment? Ex: Divorce law affects marriage rates (and divorce rates!). Ex2: Welfare programs for single mothers affect fertility or marriage (or migration!). You would need to check for composition effects carefully!
ix) Higher-order differences Triple differences. Example: Extension of Medicaid coverage, in some states and years, and only for children of certain ages. Outcome: mother’s participation and earnings. Fixed effects for: state-year, age-year, and age-state. More convincing results than exploiting state and time alone.
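In terms of cell means, a triple difference is the DiD for the eligible group minus the DiD for the ineligible group. The sketch below uses invented cell means and hypothetical labels ("eligible" and "ineligible" child ages, "treated" and "control" states); it is not based on actual Medicaid data.

```python
# Hypothetical cell means of the outcome (illustrative numbers only),
# keyed by (state group, child age group, period).
cell = {
    ("treated", "eligible", "before"): 10.0,
    ("treated", "eligible", "after"): 14.0,
    ("control", "eligible", "before"): 9.0,
    ("control", "eligible", "after"): 10.0,
    ("treated", "ineligible", "before"): 12.0,
    ("treated", "ineligible", "after"): 13.5,
    ("control", "ineligible", "before"): 11.0,
    ("control", "ineligible", "after"): 12.0,
}


def did(m, age):
    """Treated-vs-control difference in before/after changes, within one age group."""
    change_t = m[("treated", age, "after")] - m[("treated", age, "before")]
    change_c = m[("control", age, "after")] - m[("control", age, "before")]
    return change_t - change_c


# Triple difference: DiD among eligible ages minus DiD among ineligible ages.
ddd = did(cell, "eligible") - did(cell, "ineligible")
print(ddd)
```

The ineligible-age DiD (0.5 here) absorbs state-specific shocks that hit all families in treated states, which is what makes the comparison more convincing.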
x) Standard errors Two problems: 1. When combining individual and aggregate variables, the “Moulton problem”. Errors correlated within group. Only a problem when estimating DiD from individual-level data.
2. With panel data, serial correlation. Errors correlated over time within unit. The longer the panel, the worse. Both of these problems imply that our usual standard errors, which assume independent errors, will be incorrect. The usual OLS variance formula for $\hat{\beta}$ is not correct!
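For the first problem, the size of the bias can be gauged in a textbook special case: equal group sizes n and a regressor that does not vary within groups. There, the true variance exceeds the conventional OLS variance by the factor 1 + (n - 1)ρ, where ρ is the intraclass correlation of the errors. The numbers below are illustrative assumptions, not estimates.

```python
import math


def moulton_factor(group_size, intraclass_corr):
    """Ratio of true to conventional OLS variance: 1 + (n - 1) * rho.

    Assumes equal group sizes and a regressor fixed within groups.
    """
    return 1.0 + (group_size - 1) * intraclass_corr


# Illustrative values: 100 workers per state, modest within-state error correlation.
factor = moulton_factor(group_size=100, intraclass_corr=0.05)
se_inflation = math.sqrt(factor)  # how much conventional standard errors are understated
print(factor, round(se_inflation, 2))
```

Even a small within-group correlation can matter a lot when groups are large.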
Fixes “Cluster” standard errors at the unit (state) level. Problem: we need a large number of units! What if we have few clusters? Options: base inference on a t-distribution with G-K degrees of freedom (G clusters, K regressors), rather than the standard normal, or use a “block bootstrap”. Which is best is not clear: research is under way, no consensus. My advice: try to get more clusters! Hard to publish a paper with only one law change and two groups these days!
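The idea behind the block bootstrap is to resample whole states, keeping each state's time series intact, and recompute the estimate on every resample. The following is a minimal sketch on an invented six-state, two-period panel. The names `did_estimate` and `block_bootstrap_se` are hypothetical, and with this few clusters the resulting standard error should itself be treated with caution.

```python
import random
import statistics

# Hypothetical panel: state -> (treated flag, [pre outcome, post outcome]).
panel = {
    "A": (1, [10.0, 13.0]), "B": (1, [12.0, 14.5]), "C": (1, [9.0, 12.5]),
    "D": (0, [11.0, 11.5]), "E": (0, [8.0, 9.0]), "F": (0, [10.0, 10.5]),
}


def did_estimate(states):
    """Mean change among treated states minus mean change among control states."""
    treated = [panel[s][1][1] - panel[s][1][0] for s in states if panel[s][0] == 1]
    control = [panel[s][1][1] - panel[s][1][0] for s in states if panel[s][0] == 0]
    if not treated or not control:
        return None  # resample contains only one group; estimate undefined
    return statistics.mean(treated) - statistics.mean(control)


def block_bootstrap_se(n_draws=500, seed=0):
    """Standard deviation of the DiD estimate across resamples of whole states."""
    rng = random.Random(seed)
    names = list(panel)
    estimates = []
    while len(estimates) < n_draws:
        draw = [rng.choice(names) for _ in names]  # sample states with replacement
        est = did_estimate(draw)
        if est is not None:
            estimates.append(est)
    return statistics.stdev(estimates)


point = did_estimate(list(panel))
se = block_bootstrap_se()
print(round(point, 3), round(se, 3))
```

Dropping resamples that contain only treated or only control states is a simplification made here to keep the sketch short.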
“The question of how best to approach the serial correlation problem is currently under study, and a consensus has not yet emerged” (Mostly Harmless)