Regression in Practice: Observational studies with controls for pretests work better than you think. Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008).

Similar presentations
Thomas D. Cook Northwestern University
Mywish K. Maredia Michigan State University
Reliability for Teachers Kansas State Department of Education ASSESSMENT LITERACY PROJECT Reliability = Consistency.
Sources of bias in experiments and quasi-experiments. Sean F. Reardon, Stanford University, 11 December 2006.
Introduction to Propensity Score Weighting Weimiao Fan 10/10/
Regression Discontinuity Design William Shadish University of California, Merced.
Regression Discontinuity Design Thanks to Sandi Cleveland and Marc Shure (class of 2011) for some of these slides.
Threats to Conclusion Validity. Low statistical power Low statistical power Violated assumptions of statistical tests Violated assumptions of statistical.
Sessions 22 & 23: (Second Best) Alternatives to Randomized Designs Mark W. Lipsey Vanderbilt University IES/NCER Summer Research Training Institute, 2008.
Impact Evaluation: The case of Bogotá’s concession schools Felipe Barrera-Osorio World Bank 1 October 2010.
Using Covariates in Experiments: Design and Analysis STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical.
The World Bank Human Development Network Spanish Impact Evaluation Fund.
Advanced Statistics for Interventional Cardiologists.
Evidence-based Methods for improving Evidence-based Policy Thomas D. Cook, Northwestern University and Mathematica Policy Research, Inc. Funded by NSF.
Quasi Experimental Methods I Nethra Palaniswamy Development Strategy and Governance International Food Policy Research Institute.
S-005 Intervention research: True experiments and quasi- experiments.
Article Review Cara Carty 09-Mar-06. “Confounding by indication in non-experimental evaluation of vaccine effectiveness: the example of prevention of.
Estimating Causal Effects from Large Data Sets Using Propensity Scores Hal V. Barron, MD TICR 5/06.
Ch. 2 Tools of Positive Economics. Theoretical Tools of Public Finance theoretical tools The set of tools designed to understand the mechanics behind.
A Randomized Experiment Comparing Random to Nonrandom Assignment William R Shadish University of California, Merced and M.H. Clark Southern Illinois University,
WWC Standards for Regression Discontinuity Study Designs June 2010 Presentation to the IES Research Conference John Deke ● Jill Constantine.
Can Mental Health Services Reduce Juvenile Justice Involvement? Non-Experimental Evidence E. Michael Foster School of Public Health, University of North.
Quasi Experimental and single case experimental designs
Using School Choice Lotteries to Test Measures of School Effectiveness David Deming Harvard University and NBER.
REBECCA M. RYAN, PH.D. GEORGETOWN UNIVERSITY ANNA D. JOHNSON, M.P.A. TEACHERS COLLEGE, COLUMBIA UNIVERSITY ANNUAL MEETING OF THE CHILD CARE POLICY RESEARCH.
Developing an evaluation of professional development Webinar #2: Going deeper into planning the design 1.
Teacher effectiveness. Kane, Rockoff and Staiger (2007)
Teacher Quality/Effectiveness: Defining, Developing, and Assessing Policies and Practices Part III: Setting Policies around Teacher Quality/Effectiveness.
Rerandomization to Improve Covariate Balance in Randomized Experiments Kari Lock Harvard Statistics Advisor: Don Rubin 4/28/11.
Using Prior Scores to Evaluate Bias in Value-Added Models Raj Chetty, Stanford University and NBER John N. Friedman, Brown University and NBER Jonah Rockoff,
(ARM 2004) INNOVATIVE STATISTICAL APPROACHES IN HSR: BAYESIAN, MULTIPLE INFORMANTS, & PROPENSITY SCORES Thomas R. Belin, UCLA.
Standards of Evidence for Prevention Programs Brian R. Flay, D.Phil. Distinguished Professor Public Health and Psychology University of Illinois at Chicago.
Issues in Selecting Covariates for Propensity Score Adjustment William R Shadish University of California, Merced.
David M. Murray, Ph.D. Associate Director for Prevention Director, Office of Disease Prevention Multilevel Intervention Research Methodology September.
Research and Evaluation Methodology Program College of Education A comparison of methods for imputation of missing covariate data prior to propensity score.
Chapter 6 Selecting a Design. Research Design The overall approach to the study that details all the major components describing how the research will.
Issues in Evaluating Educational Research
Chapter 11: Quasi-Experimental and Single Case Experimental Designs
Lurking inferential monsters
Module 5 Inference with Panel Data
Ch. 2 Tools of Positive Economics
Lezione di approfondimento su RDD (in inglese)
An Empirical Test of the Regression Discontinuity Design
William R. Shadish University of California, Merced
Stephen W. Raudenbush University of Chicago December 11, 2006
AERA workshop April 4, 2014 (AERA on-line video – cost is $95)
DUET.
Chapter 7 The Hierarchy of Evidence
Dr. Robert H. Meyer Research Professor and Director
Meta-Analysis: Synthesizing evidence
Impact evaluation: The quantitative methods with applications
Sessions 22 & 23: (Second Best) Alternatives to Randomized Designs
Welcome to on-line audience, ask questions with microphone
Impact Evaluation Methods
Causal Inference Counterfactuals False Counterfactuals
Research in Psychology
Week 2 Outline Me Barbara Maddie
Discussion of “Strategies for Studying Educational Effectiveness” Mark Dynarski Society for Research on Educational.
The Nonexperimental and Quasi-Experimental Strategies
Impact Evaluation Methods: Difference in difference & Matching
Explanation of slide: Logos, to show while the audience arrive.
Analysing RWE for HTA: Challenges, methods and critique
Non-Experimental designs: Correlational & Quasi-experimental designs
Positive analysis in public finance
Types of Designs: R: Random Assignment of subjects to groups
Alternative Scenarios and Related Techniques
Enhancing Causal Inference in Observational Studies
The Use of Test Scores in Secondary Analysis
Presentation transcript:

Regression in Practice: Observational studies with controls for pretests work better than you think.
Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random to nonrandom assignment. Journal of the American Statistical Association, 103(484), 1334-1344.
Goal: quantify how much bias is removed by statistical control using pretests in a given setting.
Sample: volunteer undergraduates. Outcomes: math and vocabulary tests. Treatment: basic didactic instruction, showing transparencies defining math concepts.
Key result: OLS regression with pretests removes 84% to 94% of the bias relative to the randomized benchmark. Propensity score stratification is not quite as good. Bifulco (2012; full citation under Supporting findings) similarly reports 64% to 96% of bias removed with a pretest.
Guarino, C. M., Reckase, M. D., & Wooldridge, J. M. (2014). Can value-added measures of teacher performance be trusted? Retrieved from http://www.econstor.eu/bitstream/10419/62407/1/717898571.pdf. Advocates dynamic regression, which essentially controls for the prior outcome in an OLS framework.
Supporting references (Berk, 2005; Berk et al., 2010; Concato et al., 2000; Cook et al., 2008; Kane & Staiger, 2008; Pohl et al., 2009; Steiner et al., 2010, in press) are listed in full under Supporting findings below.
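A minimal simulation sketch (not the authors' code; variable names, effect sizes, and noise levels are invented) of why the pretest adjustment works: selection into treatment depends on an unobserved ability that also drives the pretest, so the naive mean difference is biased, while OLS that controls for the noisy pretest recovers most of the true effect.

```python
# Hypothetical illustration, not code from Shadish, Clark & Steiner (2008).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, true_effect = 5000, 2.0

ability = rng.normal(0, 1, n)                # unobserved driver of selection
pretest = ability + rng.normal(0, 0.5, n)    # noisy measure of that ability
treat = (ability + rng.normal(0, 1, n) > 0).astype(float)  # self-selection
posttest = true_effect * treat + 1.5 * ability + rng.normal(0, 1, n)

# Naive comparison: confounded by selection on ability
naive = posttest[treat == 1].mean() - posttest[treat == 0].mean()

# OLS with the pretest as a covariate
X = sm.add_constant(np.column_stack([treat, pretest]))
adjusted = sm.OLS(posttest, X).fit().params[1]  # coefficient on treat

print(f"true effect:       {true_effect:.2f}")
print(f"naive difference:  {naive:.2f}")
print(f"OLS with pretest:  {adjusted:.2f}")
print(f"share of bias removed: "
      f"{1 - abs(adjusted - true_effect) / abs(naive - true_effect):.0%}")
```

Because the pretest measures the selection-relevant trait with error, the adjustment removes most but not all of the bias, which is the pattern behind the 84% to 94% figure above.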

Supporting findings
Berk, R. (2005). Randomized experiments as the bronze standard. Journal of Experimental Criminology, 1, 417-433.
Berk, R., Barnes, G., Ahlman, L., & Kurtz, E. (2010). When second best is good enough: A comparison between a true experiment and a regression discontinuity quasi-experiment. Journal of Experimental Criminology, 6, 191-208. "The results from the two approaches are effectively identical" (p. 191).
Bifulco, R. (2012). Can nonexperimental estimates replicate estimates based on random assignment in evaluations of school choice? A within-study comparison. Journal of Policy Analysis and Management, 31(3), 729-751. Reports 64% to 96% of bias reduced with a pretest.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates. American Economic Review, 104(9), 2593-2632.
Concato, J., Shah, N., & Horwitz, R. I. (2000). Randomized, controlled trials, observational studies, and the hierarchy of research designs. New England Journal of Medicine, 342(25), 1887-1889.
Cook, T. D., Steiner, P. M., & Pohl, S. (2010). Assessing how bias reduction is influenced by covariate choice, unreliability and data analytic mode: An analysis of different kinds of within-study comparisons in different substantive domains. Multivariate Behavioral Research, 44(6), 828-847.
Cook, T. D., Shadish, W. R., & Wong, V. C. (2008). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. Journal of Policy Analysis and Management, 27(4), 724-750.
Kane, T., & Staiger, D. (2008). Estimating teacher impacts on student achievement: An experimental evaluation. NBER Working Paper 14607.
Pohl, S., Steiner, P. M., Eisermann, J., Soellner, R., & Cook, T. D. (2009). Unbiased causal inference from an observational study: Results of a within-study comparison. Educational Evaluation and Policy Analysis, 31(4), 463-479.
Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random to nonrandom assignment. Journal of the American Statistical Association, 103(484), 1334-1344.
Steiner, P. M., Cook, T. D., & Shadish, W. R. (in press). On the importance of reliable covariate measurement in selection bias adjustments using propensity scores. Journal of Educational and Behavioral Statistics.
Steiner, P. M., Cook, T. D., Shadish, W. R., & Clark, M. H. (2010). The importance of covariate selection in controlling for selection bias in observational studies. Psychological Methods, 15(3), 250-267. Finds that more than just the pretest matters.

Cook, T. D., Steiner, P. M., & Pohl, S. (2010). Assessing how bias reduction is influenced by covariate choice, unreliability and data analytic mode: An analysis of different kinds of within-study comparisons in different substantive domains. Multivariate Behavioral Research, 44(6), 828-847.
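A hedged sketch of the unreliability point in this paper (simulated data; numbers are illustrative, not from the paper): the same OLS adjustment removes less bias as the pretest is measured with more error.

```python
# Illustrative only: bias removal shrinks as pretest reliability falls.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, true_effect = 20000, 1.0
ability = rng.normal(0, 1, n)
treat = (ability + rng.normal(0, 1, n) > 0).astype(float)
post = true_effect * treat + 1.5 * ability + rng.normal(0, 1, n)
naive_bias = post[treat == 1].mean() - post[treat == 0].mean() - true_effect

for noise_sd in (0.1, 0.5, 1.0):             # increasing measurement error
    pretest = ability + rng.normal(0, noise_sd, n)
    X = sm.add_constant(np.column_stack([treat, pretest]))
    est = sm.OLS(post, X).fit().params[1]    # coefficient on treat
    print(f"pretest noise sd={noise_sd}: "
          f"bias removed = {1 - abs(est - true_effect) / naive_bias:.0%}")
```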

Reviews in recent studies
Need: rich covariates, local comparisons (within districts), multiple time points for pretests, and multiple content areas for pretests (see the specification sketch after the references below).
Chaplin, D., Mamun, A., Protik, A., Schurrer, J., Vohra, D., Bos, K., Burak, H., Meyer, L., Dumitrescu, A., Ksoll, K., & Cook, T. (2016). Grid electricity expansion in Tanzania by MCC: Findings from a rigorous impact evaluation. Working paper. Princeton, NJ: Mathematica Policy Research.
Hallberg, K., Cook, T. D., Steiner, P., & Clark, M. H. (in press[b]). The role of pretests in education observational studies: Evidence from an empirical within study comparison. Prevention Science.
Hallberg, K., Wong, V., & Cook, T. D. (in press[a]). Evaluating methods for selecting school level comparisons in quasi-experimental designs: Results from a within-study comparison. Journal of Public Policy and Management.
St. Clair, T., Hallberg, K., & Cook, T. D. (in press). The validity and precision of the comparative interrupted time-series design: Three within-study comparisons. Journal of Educational and Behavioral Statistics, 1076998616636854.
Wong, V. C., Valentine, J., & Miller-Bain, K. (in press). Empirical performance of covariates in education observational studies. Journal of Research on Educational Effectiveness.
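The sketch below shows, on simulated data with invented column names, the kind of specification these reviews point toward: two pretest waves, a pretest from a second content area, and district fixed effects to keep comparisons local.

```python
# Hypothetical specification sketch; columns and coefficients are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
district = rng.integers(0, 20, n)
ability = rng.normal(district * 0.05, 1)     # districts differ on average
df = pd.DataFrame({
    "district": district,
    "pre_math_t1": ability + rng.normal(0, 0.6, n),        # pretest wave 1
    "pre_math_t2": ability + rng.normal(0, 0.6, n),        # pretest wave 2
    "pre_read_t1": 0.8 * ability + rng.normal(0, 0.6, n),  # second subject
    "treat": (ability + rng.normal(0, 1, n) > 0).astype(int),
})
df["post_math"] = 1.0 * df["treat"] + 1.5 * ability + rng.normal(0, 1, n)

# Rich covariates plus district fixed effects (local comparisons)
m = smf.ols(
    "post_math ~ treat + pre_math_t1 + pre_math_t2 + pre_read_t1 + C(district)",
    data=df,
).fit()
print(m.params["treat"])   # close to the simulated effect of 1.0
```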

Criticisms of propensity scores
A propensity score is no better than the covariates that go into it, and it provides no control for unobservables (Heckman, 2005; Morgan & Harding, 2006, p. 40; Rosenbaum, 2002, p. 297; Shadish, Cook, & Campbell, 2002, p. 164). How could it be better? Propensity = f(covariates).
Ambivalence about the quality of the propensity model: group overlap must be substantial, yet the propensity model should not fit too well. Fitting too well implies confounding of covariates with treatment; not fitting well enough implies a poorly understood treatment mechanism and hence poor control. (A small sketch of these checks follows the references below.)
Short-term biases (2 years) are substantially smaller than medium-term (3 to 5 year) biases; the value of comparison groups may deteriorate over time.
Heckman, J. (2005). The scientific model of causality. Sociological Methodology, 35, 1-99.
Morgan, S. L., & Harding, D. J. (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods and Research, 35, 3-60.
Rosenbaum, P. (2002). Observational Studies. New York: Springer.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.
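To make the overlap and fit points concrete, here is a small sketch on simulated data (names and thresholds invented, not from the cited papers): estimate the propensity score with logistic regression, check common support, and read a near-perfect fit (AUC close to 1) as a warning sign rather than a success.

```python
# Illustrative diagnostics for a propensity score model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 4000
X = rng.normal(0, 1, (n, 3))                 # observed covariates
treat = (X @ np.array([0.8, 0.5, 0.3]) + rng.normal(0, 1, n) > 0).astype(int)

ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]

# Overlap: treated and control distributions should share support
print("control ps range:", ps[treat == 0].min(), ps[treat == 0].max())
print("treated ps range:", ps[treat == 1].min(), ps[treat == 1].max())

# Fit: AUC near 0.5 suggests poorly understood selection;
# AUC near 1.0 suggests covariates confounded with treatment (no overlap)
print("AUC:", roc_auc_score(treat, ps))
```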

Page 40: Morgan, S. L., & Harding, D. J. (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods and Research, 35, 3-60.
Page 101: Jenkins, J. M., Farkas, G., Duncan, G. J., Burchinal, M., & Vandell, D. L. (2016). Head Start at ages 3 and 4 versus Head Start followed by state pre-K: Which is more effective? Educational Evaluation and Policy Analysis, 38(1), 88-112.