Lurking inferential monsters


Lurking inferential monsters? Exploring selection bias in school-based interventions
Ben Weidmann, benweidmann@g.harvard.edu

A strategy for empirically assessing bias: within-study comparisons
When we have experimental data, we can test for selection bias by doing a 'within-study comparison'. This idea is not new [LaLonde 1986]. The design compares three groups:
1. Experimental treatment group (participates in the program)
2. Experimental control group
3. Observational control group
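A minimal sketch of the comparison logic (my own illustration; the group means and sizes are made up): the experimental contrast (group 1 vs group 2) provides an unbiased benchmark, so the gap between the non-experimental contrast (group 1 vs group 3) and that benchmark estimates the selection bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up outcome data for the three groups (effect-size units)
y_treat = rng.normal(0.20, 1.0, size=500)      # 1. experimental treatment
y_exp_ctrl = rng.normal(0.00, 1.0, size=500)   # 2. experimental control
y_obs_ctrl = rng.normal(-0.10, 1.0, size=500)  # 3. observational control

experimental = y_treat.mean() - y_exp_ctrl.mean()      # unbiased benchmark
non_experimental = y_treat.mean() - y_obs_ctrl.mean()  # observational estimate

# The gap between the two estimates is the estimated selection bias
bias_hat = non_experimental - experimental
print(f"estimated selection bias: {bias_hat:.3f}")
```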

Some comments on existing within-study comparisons
The within-study comparison literature has generally:
- Looked at single evaluations [rather than systematically examining a large set, in a particular context]
- Said little about schools [some examples exist, e.g. Bifulco (2012) on charter schools]
There is reason to think that school evaluations might be a good context for 'selection on observables': in education evaluations, non-experimental estimates are probably better than in the canonical job-training literature [Cook, Shadish and Wong (2008)].

Research questions and priors
1. What is the distribution of selection bias across a range of school-based interventions in the UK?
Prior: on average, the estimated bias will be between 0.05 and 0.1 s.d. (in effect size units)
2. Are some types of interventions more prone to selection bias than others?
Prior: selection bias will be smaller for interventions focussed on older children, and on maths (as opposed to literacy)
3. What non-experimental methods are best at recovering experimental estimates?
Priors: Mahalanobis distance (rather than just using the propensity score); a preference for matches from within the same LA; sub-classification (rather than nearest neighbour). A matching sketch follows this list.
4. [RELATED WORK: How different are the EEF trial results if they're reweighted to represent a more general population of students?]
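As context for question 3, a minimal sketch (my own illustration, with made-up school-level covariates) of nearest-neighbour matching on Mahalanobis distance:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)

# Made-up covariates: 20 treated schools, 200 potential comparison schools
X_treated = rng.normal(size=(20, 3))
X_pool = rng.normal(size=(200, 3))

# Mahalanobis distance weights covariates by the inverse pooled covariance
VI = np.linalg.inv(np.cov(np.vstack([X_treated, X_pool]), rowvar=False))
d = cdist(X_treated, X_pool, metric="mahalanobis", VI=VI)

# Nearest-neighbour matching with replacement: each treated school gets
# the comparison school at the smallest Mahalanobis distance
matches = d.argmin(axis=1)
```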

Research questions and limitations
1. What is the distribution of selection bias across a range of school-based interventions in the UK?
Limitation: we're examining a specific selection mechanism (that may not apply to other contexts)
2. Are some types of interventions more prone to selection bias than others?
Limitation: we have relatively few cases (~15), so quantitative analysis will be highly uncertain
3. What non-experimental methods are best at recovering experimental estimates?
Limitation: there are lots of methods that we won't be testing (e.g. coarsened exact matching)

What results might look like? 3 stylised possibilities (for research question 1)
[Figure: three stylised distributions of estimated bias (effect size, d) on an axis running from -0.5 to 0.5: small; mainly positive (or negative); big]

Questions and comments welcome!

SIMULATION STUDY

Motivation
The UK is hoping to set up a Data Service. The goal of the service would be to provide 'impact estimates' for programs that are already operating in schools:
- Organisations contact the Data Service and provide a list of the schools in which they're operating
- The Data Service then performs an observational study, using matching
- The resulting estimate is fed back to schools and the organisation (and/or used to decide which programs will get a fancy, expensive RCT)

Problem
What if this exciting new program is only operating in one school? Would we be comfortable providing a 'Data Service Approved' impact estimate?
Two costs:
1. Providing the estimate takes time and money. It's not worth doing if the estimate is going to be too noisy
2. Although we'll provide information on uncertainty, consumers of research (e.g. teachers, journalists, policy makers) might not take it into account
So how big should our sample be? Power calculations! [Taking into account the fact that our observational study will have bias]

Goal of my simulation
- Provide a tool to help decide how big sample sizes need to be to justify providing an official estimate of 'impact'
- Illustrate the power and Type S error (sign error) rates for different, realistic scenarios
Power: for a given effect size, the probability that we correctly reject the null hypothesis $H_0$ of no treatment effect
Type S error (sign error): the true effect is negative but we confidently conclude that the effect is positive (or vice versa)

Overview of the data generating process
$Y_i(0) = \beta_1 X_i + U_i$
$Y_i(1) = Y_i(0) + \Delta$
- $Y$ is the outcome [e.g. standardised reading score at age 11]
- $X$ is a predictor [e.g. standardised reading score at age 7]; $X \sim N(0,1)$
- $U$ is unobserved characteristics (including error); $U \sim N(0, \sigma_U^2)$; $X$ and $U$ are independent
- $\Delta$ is the treatment effect
- $Z$ is a treatment indicator $\in \{0,1\}$, with $P(Z_i = 1) = \Phi(\alpha_1 U_i)$
- The parameter $\alpha_1$ determines the extent of 'bias', defined as $b = E[U \mid Z=1] - E[U \mid Z=0]$
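A minimal Python sketch of this DGP (the parameter values are illustrative, not the ones used in the talk):

```python
import numpy as np
from scipy.stats import norm

def simulate(n, beta1, sigma_u, delta, alpha1, rng):
    """Draw one sample from the DGP above."""
    X = rng.normal(0.0, 1.0, n)          # observed predictor, X ~ N(0, 1)
    U = rng.normal(0.0, sigma_u, n)      # unobserved characteristics
    Z = (rng.random(n) < norm.cdf(alpha1 * U)).astype(float)  # P(Z=1) = Phi(alpha1 * U)
    Y = beta1 * X + U + delta * Z        # Y(1) = Y(0) + Delta
    return X, Z, Y

rng = np.random.default_rng(42)
X, Z, Y = simulate(n=2000, beta1=0.7, sigma_u=0.7, delta=0.1, alpha1=0.5, rng=rng)
print(f"share treated: {Z.mean():.2f}")
```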

Factors for the simulation
Inputs into the simulation:
- Sample size
- R²: in a regression of Y(0) ~ X
- Bias
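A sketch (my own illustration; parameter values are made up) of how one cell of the simulation could be evaluated: draw data from the DGP above, regress Y on Z adjusting for X, test the coefficient on Z, and tabulate power and Type S error rates across replications. Note that $\beta_1$ and $\sigma_U$ together pin down the R² of Y(0) ~ X, and $\alpha_1$ pins down the bias.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

def power_and_type_s(n, beta1, sigma_u, delta, alpha1, reps=2000, level=0.05):
    rng = np.random.default_rng(0)
    rejections = sign_errors = 0
    for _ in range(reps):
        # Draw one sample from the DGP (inlined from the sketch above)
        X = rng.normal(0.0, 1.0, n)
        U = rng.normal(0.0, sigma_u, n)
        Z = (rng.random(n) < norm.cdf(alpha1 * U)).astype(float)
        Y = beta1 * X + U + delta * Z
        # Non-experimental analysis: regress Y on Z, adjusting for X
        fit = sm.OLS(Y, sm.add_constant(np.column_stack([Z, X]))).fit()
        if fit.pvalues[1] < level:                  # H0 of no effect rejected
            rejections += 1
            if np.sign(fit.params[1]) != np.sign(delta):
                sign_errors += 1                    # significant, wrong sign
    power = rejections / reps
    type_s = sign_errors / max(rejections, 1)  # sign errors among rejections
    return power, type_s

print(power_and_type_s(n=200, beta1=0.7, sigma_u=0.7, delta=0.1, alpha1=0.5))
```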

Results (Power)

Results (Type S error)

Conclusions
When bias has the opposite sign to the true effect, it either:
- reduces power, or
- increases the chance that you'll make a Type S error
When bias and the true effect have the same sign, it helps in terms of power and avoiding Type S errors [although you might make a bad mistake about magnitude].
As a general takeaway: if the expected bias is similar in magnitude to the expected effect size, you're toast [regardless of sample size] unless you have strongly predictive covariates.