Regression in Practice: Observational studies with controls for pretests work better than you think

Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random to nonrandom assignment. Journal of the American Statistical Association, 103(484), 1334-1344.
Goal: quantify how much bias is removed by statistical control using pretests in a given setting
Sample: volunteer undergraduates
Outcome: math and vocabulary tests
Treatment: basic didactic, showing transparencies defining math concepts

Guarino, C. M., Reckase, M. D., & Wooldridge, J. M. (2014). Can value-added measures of teacher performance be trusted? Retrieved from http://www.econstor.eu/bitstream/10419/62407/1/717898571.pdf
Advocates dynamic regression, which is essentially AR(1): control for the prior score

OLS regression with pretests removes 84% to 94% of bias relative to the RCT!
Propensity score stratification is not quite as good
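A "percent of bias removed" figure can be read as how much closer the adjusted estimate sits to the experimental benchmark than the unadjusted one. The following is a minimal sketch on simulated data, not a reanalysis of any study cited here. Everything in it is an assumption made for illustration: the true effect of 0.5 stands in for the RCT benchmark, students self-select on an unobserved ability, and the pretest is a noisy measure of that ability, so the pretest removes most of the bias but not all of it.

```python
import math
import random

TRUE_EFFECT = 0.5  # hypothetical; plays the role of the RCT benchmark


def mean(values):
    return sum(values) / len(values)


def ols_treatment_coef(y, t, x):
    """Coefficient on t from OLS of y on (1, t, x).

    Uses the Frisch-Waugh result: residualize y and t on x,
    then regress the y residuals on the t residuals.
    """
    mx, my, mt = mean(x), mean(y), mean(t)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope_y = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    slope_t = sum((xi - mx) * (ti - mt) for xi, ti in zip(x, t)) / sxx
    res_y = [yi - my - slope_y * (xi - mx) for xi, yi in zip(x, y)]
    res_t = [ti - mt - slope_t * (xi - mx) for xi, ti in zip(x, t)]
    return sum(a * b for a, b in zip(res_t, res_y)) / sum(b * b for b in res_t)


def simulate(n=5000, seed=1):
    """Return (naive estimate, pretest-adjusted estimate, percent of bias removed)."""
    rng = random.Random(seed)
    y, t, x = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)               # unobserved confounder
        pretest = ability + rng.gauss(0, 0.5)   # noisy measure of ability
        p_treat = 1 / (1 + math.exp(-1.5 * ability))  # self-selection on ability
        treated = 1 if rng.random() < p_treat else 0
        outcome = TRUE_EFFECT * treated + ability + rng.gauss(0, 1)
        y.append(outcome)
        t.append(treated)
        x.append(pretest)

    # Unadjusted comparison: difference in mean outcomes between groups.
    naive = (mean([yi for yi, ti in zip(y, t) if ti == 1])
             - mean([yi for yi, ti in zip(y, t) if ti == 0]))
    # Adjusted comparison: OLS with the pretest as a control.
    adjusted = ols_treatment_coef(y, t, x)

    naive_bias = abs(naive - TRUE_EFFECT)
    adjusted_bias = abs(adjusted - TRUE_EFFECT)
    pct_removed = 100 * (1 - adjusted_bias / naive_bias)
    return naive, adjusted, pct_removed


naive, adjusted, pct_removed = simulate()
print(f"naive={naive:.2f} adjusted={adjusted:.2f} bias removed={pct_removed:.0f}%")
```

Making the pretest noisier (a larger standard deviation in the `pretest` line) lowers the share of bias removed in this sketch, which is the concern raised by the work on reliable covariate measurement.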
Supporting findings

Berk, R. (2005). Randomized experiments as the bronze standard. Journal of Experimental Criminology, 1, 417–433.
Berk, R., Barnes, G., Ahlman, L., & Kurtz, E. (2010). When second best is good enough: A comparison between a true experiment and a regression discontinuity quasi-experiment. Journal of Experimental Criminology, 6, 191–208.
  "the results from the two approaches are effectively identical" (page 191)
Bifulco, R. (2012). Can nonexperimental estimates replicate estimates based on random assignment in evaluations of school choice? A within-study comparison. Journal of Policy Analysis and Management, 31(3), 729-751.
  Reports 64% to 96% of bias removed with pretest
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates. American Economic Review, 104(9), 2593–2632.
Concato, J., Shah, N., & Horwitz, R. I. (2000). Randomized, controlled trials, observational studies, and the hierarchy of research designs. New England Journal of Medicine, 342(25), 1887-1889.
Cook, T. D., Steiner, P., & Pohl, S. (2010). Assessing how bias reduction is influenced by covariate choice, unreliability and data analytic mode: An analysis of different kinds of within-study comparisons in different substantive domains. Multivariate Behavioral Research, 44(6), 828-847.
Cook, T. D., Shadish, W. R., & Wong, V. C. (2008). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. Journal of Policy Analysis and Management, 27(4), 724–750.
Kane, T., & Staiger, D. (2008). Estimating teacher impacts on student achievement: An experimental evaluation. NBER Working Paper 14607.
Pohl, S., Steiner, P. M., Eisermann, J., Soellner, R., & Cook, T. D. (2009). Unbiased causal inference from an observational study: Results of a within-study comparison. Educational Evaluation and Policy Analysis, 31(4), 463-479.
Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random to nonrandom assignment. Journal of the American Statistical Association, 103(484), 1334-1344.
Steiner, P. M., Cook, T. D., & Shadish, W. R. (in press). On the importance of reliable covariate measurement in selection bias adjustments using propensity scores. Journal of Educational and Behavioral Statistics.
Steiner, P. M., Cook, T. D., Shadish, W. R., & Clark, M. H. (2010). The importance of covariate selection in controlling for selection bias in observational studies. Psychological Methods, 15(3), 250-267.
  More than just pretest
Cook, T. D., Steiner, P., & Pohl, S. (2010). Assessing how bias reduction is influenced by covariate choice, unreliability and data analytic mode: An analysis of different kinds of within-study comparisons in different substantive domains. Multivariate Behavioral Research, 44(6), 828-847.
Reviews in recent studies

Need: rich covariates, local comparisons (within districts), multiple time points for pretests, multiple content areas for pretests

Chaplin, D., Mamun, A., Protik, A., Schurrer, J., Vohra, D., Bos, K., Burak, H., Meyer, L., Dumitrescu, A., Ksoll, K., & Cook, T. (2016). Grid electricity expansion in Tanzania by MCC: Findings from a rigorous impact evaluation. Working Paper. Princeton, NJ: Mathematica Policy Research.
Hallberg, K., Cook, T. D., Steiner, P., & Clark, M. H. (in press[b]). The role of pretests in education observational studies: Evidence from an empirical within study comparison. Prevention Science.
Hallberg, K., Wong, V., & Cook, T. D. (in press[a]). Evaluating methods for selecting school level comparisons in quasi-experimental designs: Results from a within-study comparison. Journal of Public Policy and Management.
St. Clair, T., Hallberg, K., & Cook, T. D. (in press). The validity and precision of the comparative interrupted time-series design: Three within-study comparisons. Journal of Educational and Behavioral Statistics, 1076998616636854.
Wong, V. C., Valentine, J., & Miller-Bain, K. (in press). Empirical performance of covariates in education observational studies. Journal of Research on Educational Effectiveness.
Criticisms of propensity scores

No better than the covariates that go into them: no control for unobservables (Heckman, 2005; Morgan & Harding, 2006, page 40; Rosenbaum, 2002, page 297; Shadish et al., 2002, page 164)
  How could they be better than the covariates? Propensity = f(covariates).
Ambivalent about the quality of the propensity model
  Group overlap must be substantial
  The propensity model should not fit too well! A very good fit implies confounding of covariates and treatment
  A fit that is not good enough implies a poorly understood treatment mechanism, and so poor control
Short-term biases (2 years) are substantially smaller than medium-term (3 to 5 year) biases; the value of comparison groups may deteriorate

Heckman, J. (2005). The scientific model of causality. Sociological Methodology, 35, 1-99.
Morgan, S. L., & Harding, D. J. (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods and Research, 35, 3-60.
Rosenbaum, P. (2002). Observational Studies. New York: Springer.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston, MA: Houghton Mifflin.
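The points "Propensity = f(covariates)" and "the propensity model should not fit too well" can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method taken from the cited authors: one observed covariate, a logistic propensity model fitted by plain gradient ascent, and a hypothetical overlap measure (the share of units with an estimated propensity between 0.1 and 0.9). The function names, the selection strengths of 0.5 and 4.0, and the 0.1 to 0.9 window are all illustrative choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, t, steps=400, lr=1.0):
    """Fit P(t=1 | x) = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, ti in zip(x, t):
            err = ti - sigmoid(a + b * xi)
            grad_a += err
            grad_b += err * xi
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def overlap_share(selection_strength, n=1000, seed=7, lo=0.1, hi=0.9):
    """Share of units whose estimated propensity falls inside [lo, hi]."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    # Treatment take-up depends on the covariate; a larger strength means stronger selection.
    t = [1 if rng.random() < sigmoid(selection_strength * xi) else 0 for xi in x]
    a, b = fit_logistic(x, t)
    # The propensity score is a deterministic function of the observed covariate only.
    scores = [sigmoid(a + b * xi) for xi in x]
    return sum(lo <= s <= hi for s in scores) / n


weak = overlap_share(0.5)    # covariate predicts treatment only weakly
strong = overlap_share(4.0)  # covariate predicts treatment very well
print(f"overlap with weak selection:   {weak:.2f}")
print(f"overlap with strong selection: {strong:.2f}")
```

When the covariate predicts treatment very well, many estimated scores pile up near 0 or 1 and fewer units have comparable counterparts in the other group. In both cases the score uses nothing beyond the observed covariate, so it cannot correct for unobservables.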
Page 40: Morgan, S. L., & Harding, D. J. (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods and Research, 35, 3-60.
Page 101: Jenkins, J. M., Farkas, G., Duncan, G. J., Burchinal, M., & Vandell, D. L. (2016). Head Start at ages 3 and 4 versus Head Start followed by state pre-K: Which is more effective? Educational Evaluation and Policy Analysis, 38(1), 88-112.