School of Education Archive Analysis: On EEF Trials Adetayo Kasim, ZhiMin Xiao, and Steve Higgins.

Presentation transcript:

School of Education Archive Analysis: On EEF Trials Adetayo Kasim, ZhiMin Xiao, and Steve Higgins

School of Education EEF Conference

Outline
Introduction
Design Methods
– Simple Randomised Trial (SRT)
– Multi-Site Trial (MST)
– Cluster Randomised Trial (CRT)
Estimation Framework
– Frequentist versus Bayesian Approach
Discussion

Introduction

This presentation is based on the data released by the FFT in December. We present here 15 effect sizes from 11 randomised trials, which involve three design specifications:

Design   # Trials   # Interv.
SRT      7          10
MST      2          3
CRT      2          2

The goal of this presentation is to facilitate discussion around analysis of educational trials using different analytical models.

Simple Randomised Trials

These trials randomised children using simple randomisation without acknowledging the nested structure of pupils within schools. We analysed the data to investigate if:
– there is any difference between Cohen’s d and Hedges’ g.
– simple randomisation results in zero correlation within schools, i.e., an Intra-Cluster Correlation (ICC) of zero.
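The first question above can be illustrated with a minimal sketch. It assumes Cohen's d is computed with the pooled sample standard deviation and Hedges' g with the common approximate small-sample correction 1 - 3/(4df - 1); the slides do not state which formulas were used, and the scores below are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(treat, control):
    """Standardised mean difference using the pooled sample SD."""
    n1, n2 = len(treat), len(control)
    pooled_var = ((n1 - 1) * variance(treat) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treat) - mean(control)) / math.sqrt(pooled_var)


def hedges_g(treat, control):
    """Cohen's d multiplied by the approximate small-sample correction."""
    df = len(treat) + len(control) - 2
    correction = 1 - 3 / (4 * df - 1)
    return correction * cohens_d(treat, control)


# Hypothetical pupil scores, for illustration only.
treat = [12.0, 15.0, 11.0, 14.0, 13.0]
control = [10.0, 12.0, 9.0, 13.0, 11.0]

d = cohens_d(treat, control)
g = hedges_g(treat, control)
print(f"Cohen's d = {d:.3f}, Hedges' g = {g:.3f}")
```

Under these assumptions the correction factor approaches 1 as the number of pupils grows, so the two measures differ noticeably only in small trials.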

Simple Randomised Trials

Multi-Site Trials

These trials performed randomisation within schools in order to account for differences in effect between schools and the nested structure of pupils within schools. We analysed the data to investigate:
– if randomisation within schools removes ICC.
– fixed versus random effects using multilevel modelling (MLM).
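To make the ICC question concrete, here is a rough sketch of one common way to estimate it: the one-way ANOVA estimator. This is an assumption for illustration; the slides do not say how the ICC was estimated, and the function name and school scores are hypothetical.

```python
from statistics import mean


def anova_icc(groups):
    """One-way ANOVA estimator of the ICC.

    `groups` is a list of per-school lists of pupil scores.
    The estimate can be negative when schools differ less than chance predicts.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = sum(n * (mean(g) - grand_mean) ** 2 for n, g in zip(sizes, groups))
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average school size; equals the common size when schools are balanced.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical scores from three schools, for illustration only.
schools = [[10.0, 12.0, 11.0], [14.0, 15.0, 16.0], [9.0, 8.0, 10.0]]
icc = anova_icc(schools)
print(f"ICC = {icc:.2f}")
```

An estimate near zero would be consistent with no clustering of outcomes within schools; a clearly positive value suggests the nested structure still matters.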

Multi-Site Trials

                   No Interaction                           With Interaction
Interv.            Fixed (95% CI)       MLM (95% CI)        Fixed   MLM (95% CI)
1        Within    0.32 (0.07, 0.70)    0.31 (0.10, 0.66)   -       0.31 (0.11, 0.84)
         Total     -                    0.28 (0.08, 0.51)   -       0.28 (0.07, 0.50)
         ICC

         Within    0.40 (0.18, 0.75)    0.40 (0.21, 0.74)   -       0.41 (0.26, 1.00)
         Total     -                    0.32 (0.13, 0.51)           0.32 ( )
         ICC

         Within    0.08 (-0.07, 0.24)   0.08 (-0.08, 0.24)  -       0.08 (-0.10, 0.25)
         Total     -                    0.08 (-0.08, 0.24)  -       0.07 (-0.10, 0.23)
         ICC

Multi-Site Trials

One advantage of the fixed effect model is that it does not require a minimum number of schools per treatment arm. However, it relies on a strong assumption of no treatment-by-school interaction, an assumption we cannot verify because most studies are not sufficiently powered to detect such interactions.

Multi-Site Trials

The multilevel model is more robust than the fixed effect model because treatment-by-school interaction is specified as random effects. It always yields a single effect size estimate per outcome. However, MLM may be unsuitable for studies with a small number of clusters. For Gaussian data, a minimum of five clusters per treatment arm has been recommended for MLM.

Cluster Randomised Trials

Randomisation to treatment is implemented at school level. All pupils in the same school receive the same intervention. We used different sources of variability to calculate the probabilities of observing a certain effect size given the data we happened to observe.

Cluster Randomised Trials

Cluster Randomised Trials

Using total variability is the most conservative approach and the least likely to result in false positives, compared with using within or between variability.

Within variance is sometimes preferred to ensure comparability across studies. However, it could also lead to false positives if there is substantial between-school variability.

Between variability is very prone to false positives. It should be used with caution!
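A small numerical sketch may help show why the total-variance version is the most conservative. It assumes the three options correspond to dividing the same mean difference by the within-school, between-school, or total standard deviation; the slides do not give the formulas, and the function name and all numbers are hypothetical.

```python
import math


def effect_sizes(mean_diff, var_between, var_within):
    """Standardise one mean difference by three different sources of variability."""
    return {
        "within": mean_diff / math.sqrt(var_within),
        "between": mean_diff / math.sqrt(var_between),
        "total": mean_diff / math.sqrt(var_between + var_within),
    }


# Hypothetical variance components giving an ICC of 4 / (4 + 16) = 0.2.
es = effect_sizes(mean_diff=2.0, var_between=4.0, var_within=16.0)
for source, value in es.items():
    print(f"{source:>7}: {value:.3f}")
```

Because the total variance is the sum of the two components, the total-variance effect size can never exceed the other two. When the ICC is small, the between-school standard deviation is small, which inflates the between-variance version the most.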

Frequentist versus Bayesian Methods

There is a general concern about inference based on non-random samples, because the validity of standard errors is then in doubt. This is perhaps one reason why many choose Bayesian inference over classical frequentist methods. We compared results from three approaches, namely, Bayesian, frequentist with non-parametric bootstrapping, and classical frequentist with standard errors.
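The non-parametric bootstrap approach can be sketched as follows. This is only an illustration under stated assumptions: pupils are resampled with replacement within each arm (ignoring any school clustering), the statistic is a pooled-SD standardised mean difference, and the interval is taken from the 2.5% and 97.5% quantiles of the resampled statistics. The slides do not describe the actual resampling scheme, and the data below are hypothetical.

```python
import math
import random
from statistics import mean, variance


def std_mean_diff(treat, control):
    """Standardised mean difference using the pooled sample SD."""
    n1, n2 = len(treat), len(control)
    pooled = ((n1 - 1) * variance(treat) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treat) - mean(control)) / math.sqrt(pooled)


def bootstrap_ci(treat, control, n_boot=2000, level=0.95, seed=1):
    """Quantile (percentile) bootstrap interval for the effect size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stats = []
    for _ in range(n_boot):
        # Resample pupils with replacement, separately within each arm.
        t = rng.choices(treat, k=len(treat))
        c = rng.choices(control, k=len(control))
        stats.append(std_mean_diff(t, c))
    stats.sort()
    cut = int((1 - level) / 2 * n_boot)  # number of resamples trimmed from each tail
    return stats[cut], stats[n_boot - 1 - cut]


# Hypothetical pupil scores, for illustration only.
treat = [14.0, 17.0, 13.0, 16.0, 15.0, 18.0, 12.0, 19.0, 20.0, 11.0]
control = [9.0, 12.0, 8.0, 11.0, 10.0, 13.0, 7.0, 14.0, 15.0, 6.0]

lower, upper = bootstrap_ci(treat, control)
print(f"effect size = {std_mean_diff(treat, control):.2f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```

For multi-site or cluster randomised data, resampling whole schools rather than individual pupils would be a natural alternative; which level to resample is a design choice this sketch does not settle.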

Frequentist versus Bayesian Methods

Effect sizes from MST and CRT using Bayesian and frequentist methods

Interv.   Bayesian (95% CRI)   Bootstrap (95% QCI)   Freq. SE (95% CI)
          (-0.00, 0.14)        0.07 (0.00, 0.14)     0.07 (-0.13, 0.28)
          (0.02, 0.53)         0.28 (0.08, 0.51)     0.28 (-0.01, 0.57)
          (0.10, 0.54)         0.32 (0.13, 0.51)     0.32 (-0.00, 0.64)
          (0.16, 1.22)         0.69 (0.42, 0.92)     0.67 (0.07, 1.28)
          (-0.08, 0.25)        0.08 (-0.08, 0.24)    0.08 (-0.11, 0.27)

Discussion

– Should simple randomisation be used in educational trials?
– Fixed or random effect model for multi-site trials?
– Should a bootstrapped confidence interval or a Bayesian credible interval be used to quantify uncertainty in effect size estimation?
– Total or within variance for effect size calculation?

Thank You

Adetayo Kasim, ZhiMin Xiao, and Steve Higgins