Regression in Practice: Observational studies with controls for pretests work better than you think
Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008).

Presentation transcript:

Regression in Practice: Observational studies with controls for pretests work better than you think

Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random to nonrandom assignment. Journal of the American Statistical Association, 103(484), 1334-1344.

Goal: quantify how much bias is removed by statistical control using pretests in a given setting.
Sample: volunteer undergraduates.
Outcomes: math and vocabulary tests.
Treatment: a basic didactic session using transparencies to define math concepts.

Related within-study comparisons:

Berk, R. (2005). Randomized experiments as the bronze standard. Journal of Experimental Criminology, 1, 417-433.

Berk, R., Barnes, G., Ahlman, L., & Kurtz, E. (2010). When second best is good enough: A comparison between a true experiment and a regression discontinuity quasi-experiment. Journal of Experimental Criminology, 6, 191-208. "The results from the two approaches are effectively identical" (p. 191).

Pohl, S., Steiner, P. M., Eisermann, J., Soellner, R., & Cook, T. D. (2009). Unbiased causal inference from an observational study: Results of a within-study comparison. Educational Evaluation and Policy Analysis, 31(4), 463-479.

Concato, J., Shah, N., & Horwitz, R. I. (2000). Randomized, controlled trials, observational studies, and the hierarchy of research designs. New England Journal of Medicine, 342(25), 1887-1892.

Cook, T. D., Shadish, W. R., & Wong, V. C. (2008). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. Journal of Policy Analysis and Management, 27(4), 724-750.

Steiner, P. M., Cook, T. D., & Shadish, W. R. (in press). On the importance of reliable covariate measurement in selection bias adjustments using propensity scores. Journal of Educational and Behavioral Statistics.

Steiner, P. M., Cook, T. D., Shadish, W. R., & Clark, M. H. (2010). The importance of covariate selection in controlling for selection bias in observational studies. Psychological Methods, 15(3), 250-267. Shows it takes more than just a pretest.

Kane, T., & Staiger, D. (2008). Estimating teacher impacts on student achievement: An experimental evaluation. NBER Working Paper 14607.

Bifulco, R. (2012). Can nonexperimental estimates replicate estimates based on random assignment in evaluations of school choice? A within-study comparison. Journal of Policy Analysis and Management, 31(3), 729-751. Reports a 64% to 96% reduction in bias with a pretest.

Guarino, C. M., Reckase, M. D., & Wooldridge, J. M. (2014). Can value-added measures of teacher performance be trusted? Retrieved from http://www.econstor.eu/bitstream/10419/62407/1/717898571.pdf. Advocates dynamic OLS regression, essentially an AR(1) specification that controls for the prior year's score.

Result: OLS regression with pretests removes 84% to 94% of the bias relative to the RCT benchmark. Propensity score adjustment by strata was not quite as good.
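To make the within-study-comparison logic concrete, here is a minimal simulation sketch. It is not from any of the papers above; the data-generating process, variable names, and effect sizes are illustrative assumptions. It builds a setting where self-selection depends on a pretest, then compares the naive difference in means, the pretest-adjusted OLS estimate, and a randomized benchmark, reporting the percent of bias removed.

```python
# Minimal within-study-comparison sketch (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
true_effect = 2.0                      # assumed treatment effect

pretest = rng.normal(50, 10, n)        # pretest drives both selection and outcome
# Nonrandom assignment: higher-pretest students are more likely to self-select
# into treatment.
p_select = 1 / (1 + np.exp(-(pretest - 50) / 5))
t_obs = rng.binomial(1, p_select)
# Randomized benchmark on the same population.
t_rct = rng.binomial(1, 0.5, n)

def outcome(t):
    return 10 + 0.8 * pretest + true_effect * t + rng.normal(0, 5, n)

y_obs, y_rct = outcome(t_obs), outcome(t_rct)

# RCT benchmark: simple difference in means.
tau_rct = y_rct[t_rct == 1].mean() - y_rct[t_rct == 0].mean()

# Observational study, unadjusted: confounded by the pretest.
tau_naive = y_obs[t_obs == 1].mean() - y_obs[t_obs == 0].mean()

# Observational study: OLS with the pretest as a covariate.
X = np.column_stack([np.ones(n), t_obs, pretest])
tau_ols = np.linalg.lstsq(X, y_obs, rcond=None)[0][1]

bias_removed = 1 - abs(tau_ols - tau_rct) / abs(tau_naive - tau_rct)
print(f"RCT: {tau_rct:.2f}  naive: {tau_naive:.2f}  OLS+pretest: {tau_ols:.2f}")
print(f"Percent of bias removed by the pretest: {100 * bias_removed:.0f}%")
```

In this idealized setup the pretest is the only confounder, so adjustment removes essentially all of the bias; the 84% to 94% figures above come from real settings where the pretest is an imperfect proxy for the selection process.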

Supporting findings

Berk, R. (2005). Randomized experiments as the bronze standard. Journal of Experimental Criminology, 1, 417-433.

Berk, R., Barnes, G., Ahlman, L., & Kurtz, E. (2010). When second best is good enough: A comparison between a true experiment and a regression discontinuity quasi-experiment. Journal of Experimental Criminology, 6, 191-208. "The results from the two approaches are effectively identical" (p. 191).

Bifulco, R. (2012). Can nonexperimental estimates replicate estimates based on random assignment in evaluations of school choice? A within-study comparison. Journal of Policy Analysis and Management, 31(3), 729-751.

Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates. American Economic Review, 104(9), 2593-2632.

Concato, J., Shah, N., & Horwitz, R. I. (2000). Randomized, controlled trials, observational studies, and the hierarchy of research designs. New England Journal of Medicine, 342(25), 1887-1892.

Cook, T. D., Steiner, P. M., & Pohl, S. (2010). Assessing how bias reduction is influenced by covariate choice, unreliability and data analytic mode: An analysis of different kinds of within-study comparisons in different substantive domains. Multivariate Behavioral Research, 44(6), 828-847.

Cook, T. D., Shadish, W. R., & Wong, V. C. (2008). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. Journal of Policy Analysis and Management, 27(4), 724-750.

Dong, N., & Lipsey, M. W. (2017). Can propensity score analysis approximate randomized experiments using pretest and demographic information in pre-K intervention research? Evaluation Review, 0193841X17749824. (Generally close using PSA, with some differences in estimated effects.)

Kane, T., & Staiger, D. (2008). Estimating teacher impacts on student achievement: An experimental evaluation. NBER Working Paper 14607.

Pohl, S., Steiner, P. M., Eisermann, J., Soellner, R., & Cook, T. D. (2009). Unbiased causal inference from an observational study: Results of a within-study comparison. Educational Evaluation and Policy Analysis, 31(4), 463-479.

Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random to nonrandom assignment. Journal of the American Statistical Association, 103(484), 1334-1344.

Steiner, P. M., Cook, T. D., & Shadish, W. R. (in press). On the importance of reliable covariate measurement in selection bias adjustments using propensity scores. Journal of Educational and Behavioral Statistics.

Steiner, P. M., Cook, T. D., Shadish, W. R., & Clark, M. H. (2010). The importance of covariate selection in controlling for selection bias in observational studies. Psychological Methods, 15(3), 250-267.

Wong, V. C., Valentine, J. C., & Miller-Bains, K. (2017). Empirical performance of covariates in education observational studies. Journal of Research on Educational Effectiveness, 10(1), 207-236.

Across these studies, pretests account for 80% to 95% of the bias relative to the RCT benchmark.

Wong, V. C., Valentine, J. C., & Miller-Bains, K. (2017). Empirical performance of covariates in education observational studies. Journal of Research on Educational Effectiveness, 10(1), 207-236. (See pp. 218-223.)

Still, 17% of the bias remained for math and 28% for reading. Here, inclusion of a single pretest measure resulted in 3% remaining bias for math and 7% for reading. Thus, using the pretest measure alone, the remaining bias in the observational study ranged from 11% to 29% for the vocabulary outcome; for math it was more than double, ranging from 61% to 71% (only a proxy pretest was available in that case).
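For clarity, "remaining bias" here is the adjusted estimate's distance from the experimental benchmark, expressed as a share of the original, unadjusted bias. A minimal sketch of that bookkeeping follows; the function name and example numbers are illustrative, not taken from the paper.

```python
def percent_remaining_bias(tau_adjusted, tau_unadjusted, tau_benchmark):
    """Share of the original selection bias left after covariate adjustment."""
    return 100 * abs(tau_adjusted - tau_benchmark) / abs(tau_unadjusted - tau_benchmark)

# Illustrative numbers only: benchmark effect 0.20, naive estimate 0.50,
# pretest-adjusted estimate 0.25 -> about 17% of the original bias remains.
print(percent_remaining_bias(0.25, 0.50, 0.20))  # ~16.7
```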

Theory-based covariates also helped, as did using multiple pretests.

Wong, V. C., Valentine, J. C., & Miller-Bains, K. (2017). Empirical performance of covariates in education observational studies. Journal of Research on Educational Effectiveness, 10(1), 207-236. (See pp. 218-223.)

Cook, T. D., Steiner, P. M., & Pohl, S. (2010). Assessing how bias reduction is influenced by covariate choice, unreliability and data analytic mode: An analysis of different kinds of within-study comparisons in different substantive domains. Multivariate Behavioral Research, 44(6), 828-847.

Reviews in recent studies

Needed: rich covariates; local comparisons (within districts); multiple time points for pretests; multiple content areas for pretests.

Chaplin, D., Mamun, A., Protik, A., Schurrer, J., Vohra, D., Bos, K., Burak, H., Meyer, L., Dumitrescu, A., Ksoll, K., & Cook, T. (2016). Grid electricity expansion in Tanzania by MCC: Findings from a rigorous impact evaluation. Working paper. Princeton, NJ: Mathematica Policy Research.

Hallberg, K., Cook, T. D., Steiner, P., & Clark, M. H. (in press [b]). The role of pretests in education observational studies: Evidence from an empirical within-study comparison. Prevention Science.

Hallberg, K., Wong, V., & Cook, T. D. (in press [a]). Evaluating methods for selecting school-level comparisons in quasi-experimental designs: Results from a within-study comparison. Journal of Public Policy and Management.

St. Clair, T., Hallberg, K., & Cook, T. D. (in press). The validity and precision of the comparative interrupted time-series design: Three within-study comparisons. Journal of Educational and Behavioral Statistics, 1076998616636854.

Wong, V. C., Valentine, J. C., & Miller-Bains, K. (in press). Empirical performance of covariates in education observational studies. Journal of Research on Educational Effectiveness.

Criticisms of propensity scores

A propensity score is no better than the covariates that go into it, and it offers no control for unobservables (Heckman, 2005; Morgan & Harding, 2006, p. 40; Rosenbaum, 2002, p. 297; Shadish et al., 2002, p. 164). How could it be better than the covariates? Propensity = f(covariates).

The literature is also ambivalent about the quality of the propensity model itself: group overlap must be substantial, yet the propensity model should not fit too well. A model that fits too well implies the covariates are confounded with treatment (little overlap); a model that fits poorly implies a poorly understood treatment mechanism, and hence poor control.

Short-term biases (2 years) are substantially smaller than medium-term (3 to 5 year) biases; the value of comparison groups may deteriorate over time.

Heckman, J. (2005). The scientific model of causality. Sociological Methodology, 35, 1-99.

Morgan, S. L., & Harding, D. J. (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods and Research, 35, 3-60.

Rosenbaum, P. (2002). Observational Studies. New York: Springer.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.
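To ground the overlap and model-fit concerns, here is a minimal sketch on simulated data; the variable names and the AUC threshold are illustrative assumptions, not from the cited papers. It estimates a propensity score with logistic regression, checks common support, and flags near-perfect fit as a warning sign rather than a virtue. Note the score is built only from observed covariates, so unobservables remain uncontrolled.

```python
# Propensity score overlap / fit diagnostics (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=(n, 3))                      # observed covariates only
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # selection depends on X[:, 0]

model = LogisticRegression().fit(X, t)
e = model.predict_proba(X)[:, 1]                 # estimated propensity scores

# 1) Overlap: treated and control propensity distributions should share support.
lo = max(e[t == 1].min(), e[t == 0].min())
hi = min(e[t == 1].max(), e[t == 0].max())
on_support = (e >= lo) & (e <= hi)
print(f"Common support: [{lo:.2f}, {hi:.2f}], "
      f"{100 * on_support.mean():.1f}% of units inside")

# 2) Fit: an AUC near 1.0 means covariates nearly determine treatment --
#    little overlap, so "adjustment" becomes extrapolation, not control.
auc = roc_auc_score(t, e)
print(f"Propensity model AUC: {auc:.2f}"
      + ("  (warning: near-deterministic selection)" if auc > 0.9 else ""))
```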

Page 40: Morgan, S. L., & Harding, D. J. (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods and Research, 35, 3-60.

Page 101: Jenkins, J. M., Farkas, G., Duncan, G. J., Burchinal, M., & Vandell, D. L. (2016). Head Start at ages 3 and 4 versus Head Start followed by state pre-K: Which is more effective? Educational Evaluation and Policy Analysis, 38(1), 88-112.