Statistical Analysis of the Regression-Discontinuity Design
Analysis Requirements
- Pre-post
- Two-group
- Treatment-control (dummy-coded)

C O X O
C O O
Assumptions in the Analysis
- The cutoff criterion is followed perfectly.
- The pre-post distribution is a polynomial or can be transformed to one.
- The comparison group has sufficient variance on the pretest.
- The pretest distribution is continuous.
- The program is implemented uniformly.
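Two of these assumptions, perfect adherence to the cutoff and enough pretest variance in each group, can be screened directly from the data. The sketch below is a hypothetical helper (`check_rd_assumptions` is not from the source). It assumes that units scoring at or below the cutoff receive the program; pass `treated_below=False` for a design that treats the high scorers.

```python
import numpy as np

def check_rd_assumptions(pre, treated, cutoff, treated_below=True):
    """Hypothetical diagnostic for two RD assumptions.

    Assumes units at or below the cutoff get the program when treated_below=True.
    Returns the number of units assigned against the cutoff rule and the
    pretest variance within each group.
    """
    pre = np.asarray(pre, dtype=float)
    treated = np.asarray(treated, dtype=int) == 1
    rule = pre <= cutoff if treated_below else pre >= cutoff
    # Any disagreement means the cutoff criterion was not perfectly followed.
    misassigned = int(np.sum(rule != treated))
    # Sample variance (ddof=1) of the pretest within each group.
    return {
        "misassigned": misassigned,
        "var_treated": float(np.var(pre[treated], ddof=1)),
        "var_control": float(np.var(pre[~treated], ddof=1)),
    }

report = check_rd_assumptions([40, 45, 50, 55, 60], [1, 1, 1, 0, 0], cutoff=50)
print(report)
```

A nonzero `misassigned` count signals a fuzzy rather than a sharp cutoff; a very small control-group variance warns that the comparison line will be poorly estimated.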
The Curvilinearity Problem
(Figure: posteff plotted against pre.)
If the true pre-post relationship is not linear and we fit parallel straight lines as the model, the result will be biased.
Even if the lines aren't parallel (an interaction effect), the result will still be biased.
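The bias can be reproduced with a small noise-free simulation. In this sketch (hypothetical data, not from the slides) the true pre-post function is quadratic and the program has no effect at all, yet both straight-line models report a nonzero effect at the cutoff; only the model that includes the squared term recovers zero. The cutoff of 30 and the rule that units at or below it are treated are assumptions chosen for illustration.

```python
import numpy as np

def fit_ols(columns, y):
    """Ordinary least squares with an intercept; returns [b0, b1, b2, ...]."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical noise-free data: a curved pre-post relationship and NO program effect.
pre = np.linspace(0, 100, 201)
cutoff = 30.0
x = pre - cutoff                     # pretest centered at the cutoff
z = (pre <= cutoff).astype(float)    # assumption: units at/below the cutoff are treated
post = 0.01 * pre**2                 # true function is quadratic; true effect is 0

b_parallel = fit_ols([x, z], post)          # parallel straight lines
b_interact = fit_ols([x, z, x * z], post)   # straight lines with different slopes
b_quad = fit_ols([x, z, x**2], post)        # includes the term the true function has

# Column 2 is the treatment dummy z in every model.
for label, b in [("parallel", b_parallel), ("interaction", b_interact), ("quadratic", b_quad)]:
    print(f"{label:12s} estimated effect = {b[2]:.3f}")
```

Because the data contain no noise, any nonzero estimate here comes purely from the mismatch between the fitted form and the true function.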
Model Specification
- If you specify the model exactly, there is no bias.
- If you overspecify the model (add more terms than needed), the result is unbiased but inefficient.
- If you underspecify the model (omit one or more necessary terms), the result is biased.
Model Specification
For instance, if the true function is
$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$
and we fit
$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i$,
our model is exactly specified and we obtain an unbiased and efficient estimate.
Model Specification
On the other hand, if the true function is
$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$
and we fit
$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + e_i$,
our model is overspecified: we included an unnecessary term, and we obtain an unbiased but inefficient estimate.
Model Specification
And finally, if the true function is
$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + \beta_4 X_i^2$
and we fit
$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i$,
our model is underspecified: we excluded some necessary terms, and we obtain a biased estimate.
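The three cases can be compared with a Monte Carlo sketch. The data below are hypothetical: a cutoff of 40 on a pretest uniform over 0 to 100, units at or below the cutoff treated, a true effect of 10 at the cutoff, and noise with SD 5. For the overspecified case the sketch fits the full quadratic model rather than only the extra interaction term, which makes the loss of efficiency easier to see; the underspecified case uses a true function containing $X_i Z_i$ and $X_i^2$ terms that the fitted model omits.

```python
import numpy as np

def effect_estimate(x, z, y, terms):
    """Fit y on an intercept plus the named terms; return the coefficient on z."""
    available = {"x": x, "z": z, "xz": x * z, "x2": x**2, "x2z": x**2 * z}
    X = np.column_stack([np.ones_like(x)] + [available[t] for t in terms])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1 + terms.index("z")]

def simulate(terms, b3=0.0, b4=0.0, reps=1000, n=200, cutoff=40.0, seed=0):
    """Monte Carlo mean and SD of the estimated effect (true effect at the cutoff = 10)."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        pre = rng.uniform(0, 100, n)
        x = pre - cutoff
        z = (pre <= cutoff).astype(float)  # assumption: at/below the cutoff is treated
        y = 50 + 0.5 * x + 10 * z + b3 * x * z + b4 * x**2 + rng.normal(0, 5, n)
        estimates[r] = effect_estimate(x, z, y, terms)
    return estimates.mean(), estimates.std(ddof=1)

exact = simulate(["x", "z"])                        # true: linear, fit: linear
over = simulate(["x", "z", "xz", "x2", "x2z"])      # true: linear, fit: full quadratic
under = simulate(["x", "z"], b3=0.1, b4=0.005)      # true: has X*Z and X^2, fit: linear
for label, (mean, sd) in [("exact", exact), ("over", over), ("under", under)]:
    print(f"{label:6s} mean = {mean:6.2f}  sd = {sd:5.2f}")
```

The exact and overspecified fits should both center near 10, with the overspecified one spread more widely; the underspecified fit should center away from 10.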
Overall Strategy
- The best option is to specify the true function exactly.
- We would prefer to err by overspecifying our model, because that only leads to inefficiency.
- Therefore, start with a likely overspecified model and reduce it.
Steps in the Analysis
1. Transform the pretest by subtracting the cutoff.
2. Examine the relationship visually.
3. Specify higher-order terms and interactions.
4. Estimate the initial model.
5. Refine the model by eliminating unneeded higher-order terms.
Transform the Pretest
- Do this because we want to estimate the jump at the cutoff.
- When we subtract the cutoff from X, the transformed pretest equals 0 at the cutoff, so the cutoff becomes the intercept.

$\tilde{X}_i = X_i - X_c$

where $X_c$ is the cutoff score.
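Why the transformation matters is easiest to see once the model has an interaction term: the coefficient on $Z_i$ is then the gap between the two regression lines at the point where the pretest variable equals zero. The sketch below uses hypothetical noise-free data with a jump of 5 at a cutoff of 50 and a steeper slope for the treated group (units at or below the cutoff, an assumption). With the centered pretest the $Z_i$ coefficient is the jump at the cutoff; with the raw pretest it is the gap at a pretest of 0, far from the cutoff.

```python
import numpy as np

def center_at_cutoff(pre, cutoff):
    """Transformed pretest: X~ = X - Xc, which is 0 exactly at the cutoff."""
    return np.asarray(pre, dtype=float) - cutoff

def z_coefficient(x, z, y):
    """OLS of y on [1, x, z, x*z]; returns the coefficient on z."""
    X = np.column_stack([np.ones_like(x), x, z, x * z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[2]

# Hypothetical noise-free data: jump of 5 at the cutoff, steeper slope for the treated.
pre = np.linspace(0, 100, 101)
cutoff = 50.0
z = (pre <= cutoff).astype(float)
post = 20 + 0.4 * pre + 5 * z + 0.2 * (pre - cutoff) * z

print(round(z_coefficient(center_at_cutoff(pre, cutoff), z, post), 6))  # gap at the cutoff
print(round(z_coefficient(pre, z, post), 6))                             # gap at pre = 0
```

Here the raw-pretest fit reports a gap with the opposite sign, even though both fits describe the same two lines.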
Examine the Relationship Visually
Count the number of flexion points (bends) across both groups. Here, there are no bends, so we can assume a linear relationship.
Specify the Initial Model
- The rule of thumb is to include polynomial terms up to order (number of flexion points) + 2.
- Here there were no flexion points, so we specify up to order 0 + 2 = 2 (i.e., to the quadratic).
The RD Analysis Model

$y_i = \beta_0 + \beta_1 \tilde{X}_i + \beta_2 Z_i + \beta_3 \tilde{X}_i Z_i + \beta_4 \tilde{X}_i^2 + \beta_5 \tilde{X}_i^2 Z_i + e_i$

where:
- $y_i$ = outcome score for the $i$th unit
- $\beta_0$ = coefficient for the intercept
- $\beta_1$ = linear pretest coefficient
- $\beta_2$ = mean difference for treatment
- $\beta_3$ = linear interaction
- $\beta_4$ = quadratic pretest coefficient
- $\beta_5$ = quadratic interaction
- $\tilde{X}_i$ = transformed pretest
- $Z_i$ = dummy variable for treatment (0 = control, 1 = treatment)
- $e_i$ = residual for the $i$th unit
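Steps 3 and 4 (specify the polynomial order from the flexion-point rule, then build the model) can be sketched as a design-matrix builder. `rd_design` is a hypothetical helper; it assumes `x` is already the transformed pretest $\tilde{X}_i$ and orders the columns as in the model above, adding a power term and its interaction with $Z_i$ for each order up to (flexion points) + 2.

```python
import numpy as np

def rd_design(x, z, n_bends):
    """Hypothetical helper: design matrix for the RD model.

    x is the transformed pretest (pretest minus cutoff) and z the 0/1 treatment
    dummy. Polynomial order follows the rule of thumb (flexion points + 2);
    columns follow the model's ordering: 1, X~, Z, X~Z, X~^2, X~^2 Z, ...
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    order = n_bends + 2
    cols = [np.ones_like(x), x, z, x * z]
    names = ["const", "x", "z", "x*z"]
    for k in range(2, order + 1):
        cols += [x**k, x**k * z]
        names += [f"x^{k}", f"x^{k}*z"]
    return np.column_stack(cols), names

# Usage with hypothetical noise-free data built from known coefficients.
x = np.linspace(-5, 5, 21)
z = (x <= 0).astype(float)
y = 1 + 2 * x + 3 * z + 0.5 * x * z + 0.1 * x**2 - 0.2 * x**2 * z
X, names = rd_design(x, z, n_bends=0)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(names, np.round(beta, 6))))
```

With no flexion points the builder yields exactly the six terms $\beta_0$ through $\beta_5$, and the fit recovers the coefficients used to generate the data.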
Data to Analyze
Initial (Full) Model
Regression of posteff on precut, group, linint, quad, and quadint:
Predictor    Coef    Stdev    t-ratio    p
Constant
precut
group
linint
quad
quadint
R-sq = 47.7%    R-sq(adj) = 47.1%
Without Quadratic
Regression of posteff on precut, group, and linint:
Predictor    Coef    Stdev    t-ratio    p
Constant
precut
group
linint
R-sq = 47.5%    R-sq(adj) = 47.2%
Final Model
Regression of posteff on precut and group:
Predictor    Coef    Stdev    t-ratio    p
Constant
precut
group
R-sq = 47.5%    R-sq(adj) = 47.3%
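The reduction sequence above (full model, then without the quadratic terms, then the final model) can be sketched in Python. The mapping of column names to model terms (precut = $\tilde{X}_i$, group = $Z_i$, linint = $\tilde{X}_i Z_i$, quad = $\tilde{X}_i^2$, quadint = $\tilde{X}_i^2 Z_i$) is inferred from the variable names, and the data are simulated (cutoff of 50, units at or below it treated, truly linear with an effect of 10), so the numbers will not match the slides. In the slides' output, R-sq falls only from 47.7% to 47.5% as terms are dropped while R-sq(adj) rises from 47.1% to 47.3%, which is consistent with the dropped terms contributing little.

```python
import numpy as np

def ols_summary(X, y):
    """OLS fit returning coefficients, t-ratios, R-sq and adjusted R-sq."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    se = np.sqrt(sse / (n - p) * np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
    return beta, beta / se, r2, r2_adj

# Hypothetical data (not the slides' data): truly linear, parallel lines, effect = 10.
rng = np.random.default_rng(1)
n = 500
pre = rng.normal(50, 10, n)
precut = pre - 50                          # assumed cutoff of 50
group = (pre <= 50).astype(float)          # assumption: at/below the cutoff is treated
posteff = 30 + 0.8 * precut + 10 * group + rng.normal(0, 8, n)

terms = {"precut": precut, "group": group, "linint": precut * group,
         "quad": precut**2, "quadint": precut**2 * group}
models = {"full": ["precut", "group", "linint", "quad", "quadint"],
          "without quadratic": ["precut", "group", "linint"],
          "final": ["precut", "group"]}

results = {}
for label, names in models.items():
    X = np.column_stack([np.ones(n)] + [terms[t] for t in names])
    beta, t, r2, r2_adj = ols_summary(X, posteff)
    results[label] = (beta[2], r2, r2_adj)   # column 2 is always "group"
    print(f"{label:18s} group = {beta[2]:6.2f} (t = {t[2]:5.2f})  "
          f"R-sq = {100 * r2:.1f}%  R-sq(adj) = {100 * r2_adj:.1f}%")
```

Because the models are nested, R-sq can only stay the same or fall as terms are removed; the t-ratios of the highest-order terms and the change in R-sq(adj) are what guide whether a term can be dropped.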
Final Fitted Model