Small-Sample Methods for Cluster-Robust Inference in School-Based Experiments James E. Pustejovsky UT Austin Educational Psychology Department Quantitative.


Small-Sample Methods for Cluster-Robust Inference in School-Based Experiments
James E. Pustejovsky, UT Austin, Educational Psychology Department, Quantitative Methods Program, pusto@austin.utexas.edu
Elizabeth Tipton, Teachers College, Columbia University, Dept. of Human Development, tipton@tc.columbia.edu
March 2, 2017, Society for Research on Educational Effectiveness, Washington, DC

In brief…
Analysis of social experiments often requires handling dependencies among outcomes using:
- Multi-level modeling
- Regression with cluster-robust variance estimation (CRVE)
Conventional CRVE behaves poorly when the number of clusters is small, and "small" depends on the model.
McCaffrey, Bell, & Botts (2001; see also Bell & McCaffrey, 2002) proposed the bias-reduced linearization (BRL) variance estimator and a Satterthwaite t-test.
Our work (Pustejovsky & Tipton, 2017) extends BRL with:
- a version that works in panel models with fixed effects
- an F-test for multi-parameter hypothesis tests
- a software implementation in R and Stata (clubSandwich package)
Impact estimation in social experiments often requires analytic methods that can handle dependency among outcomes from the same context. For example, in a school-randomized trial, outcomes measured on students from the same school need to be treated as dependent. Similarly, even in an individually randomized trial with multivariate outcomes measured on each student, joint models need to allow for correlation among the outcomes. One main approach is to use a multi-level model (hierarchical linear model) that includes error terms for lower-level and higher-level units. Another is to use ordinary regression analysis paired with cluster-robust variance estimation. This talk will focus on the latter technique. Clustering the standard errors at an appropriate level allows the analyst to account for dependency among units within clusters without having to worry about the exact form of the error structure. It's therefore quite an attractive technique, but it has the drawback that it is only asymptotically valid: it is guaranteed to give the right answer only as the number of clusters grows to infinity. In small samples, conventional CRVE can actually behave rather poorly, and as I hope to demonstrate today, what counts as small can depend on the model.
Dan McCaffrey and colleagues proposed a fix for this issue about 15 years ago, called bias-reduced linearization (BRL). BRL works by correcting the cluster-robust variance estimator so that it is approximately unbiased even in small samples. For t-tests, they also introduced a Satterthwaite approximation with degrees of freedom estimated from the data. Our work extends BRL in three ways. First, we generalize it to models that include complex fixed effects, where the original version is undefined. Second, we introduce a small-sample F-test, which uses a generalization of the Satterthwaite approximation to allow for testing of multiple-parameter hypotheses, not just t-tests for single coefficients. And third, we've built software, called the clubSandwich package, that implements the methods for R and Stata. So today I'd like to give a brief overview of bias-reduced linearization, and then look at some specific examples to illustrate that these corrections do matter in practice.

Model
Main impacts model:
More generally:
- Models with multiple treatment indicators
- Treatment-by-covariate interactions
In matrix form:
Let's consider an estimating equation for the main impact of a treatment versus a control condition. Here we have an outcome Y for unit j in cluster i, as a function of an intercept, a treatment indicator, and then potentially some vector of further covariates. More generally, we might have an estimating equation that includes multiple treatment indicators (if the experiment has more than just one treatment and one control condition), or we might have a model with treatment-by-covariate interactions, to examine whether, say, effects are moderated by the urbanicity of the school. I mention this because in such models, we need ways to test joint hypotheses about more than one coefficient at a time. In what follows, I'll use a generic form of the model, written in matrix terms as a vector Y_i of outcomes for all units in cluster i, explained by a set of covariates X_i. The error term is allowed to be dependent within cluster, in some unspecified way.
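The equations on this slide did not survive in the transcript. A reconstruction consistent with the narration might look like the following sketch. The symbols β_0, β_1, T_ij, x_ij, γ, e_ij and Σ_i are assumed notation, not recovered from the slide.

```latex
% Main impacts model for unit j in cluster i (assumed notation):
% intercept, treatment indicator, and a vector of further covariates
Y_{ij} = \beta_0 + \beta_1 T_{ij} + \mathbf{x}_{ij}'\boldsymbol{\gamma} + e_{ij}

% Generic matrix form for cluster i = 1, \dots, n;
% the within-cluster error covariance is left unspecified
\mathbf{Y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{e}_i,
\qquad \operatorname{Var}(\mathbf{e}_i \mid \mathbf{X}_i) = \boldsymbol{\Sigma}_i
```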

Estimation
Estimate β by weighted least squares:
Standard CRVE:
Conventional to use n − 1 degrees of freedom for t-tests, where n is the number of clusters.
In such a model, the betas would typically be estimated by ordinary or weighted least squares, where the weights might be estimated from the error structure. The standard cluster-robust variance estimator then has this form. For purposes of testing, it's conventional to use t-tests and F-tests with n − 1 degrees of freedom, as a sort of ad hoc small-sample correction.
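The estimator and variance formulas are missing from the transcript. Under the generic model described above, the standard textbook forms would be as sketched below. The weight matrices W_i, the shorthand M, and the residual notation are assumptions introduced here for illustration.

```latex
% Weighted least squares estimator, with assumed shorthand M
\mathbf{M} = \Bigl(\sum_{i=1}^{n} \mathbf{X}_i' \mathbf{W}_i \mathbf{X}_i\Bigr)^{-1},
\qquad
\hat{\boldsymbol{\beta}} = \mathbf{M} \sum_{i=1}^{n} \mathbf{X}_i' \mathbf{W}_i \mathbf{Y}_i

% Conventional cluster-robust (sandwich) variance estimator,
% built from the residuals \hat{e}_i = Y_i - X_i \hat\beta
\mathbf{V}^{CR} = \mathbf{M}
\Bigl(\sum_{i=1}^{n} \mathbf{X}_i' \mathbf{W}_i \hat{\mathbf{e}}_i \hat{\mathbf{e}}_i' \mathbf{W}_i \mathbf{X}_i\Bigr)
\mathbf{M}
```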

Bias-reduced linearization
Corrects V^CR based on a working model for the error covariance structure:
with adjustment matrices A_1, …, A_n chosen to satisfy
Degrees of freedom corrections for hypothesis tests:
- Satterthwaite d.f. for t-tests (Bell & McCaffrey, 2002)
- Approximate Hotelling's T² d.f. for F-test (Tipton & Pustejovsky, 2015; Pustejovsky & Tipton, 2017)
Bias-reduced linearization works by correcting the conventional robust variance estimator by inserting these adjustment matrices into the middle. The adjustments are chosen so that the variance estimator is unbiased under a working model for the error structure. That's kind of funny because the whole point of RVE is to avoid assumptions about the error term, and now I'm saying that you have to model the error term to calculate the variance estimator. But it turns out that the working model doesn't actually have to be right. Even if it is mis-specified, the correction still reduces the bias of the estimator.
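The two formulas on this slide were lost. Following the narration (adjustment matrices inserted "into the middle" of the sandwich, chosen for unbiasedness under a working model), a sketch of their likely form is given below. The symbols W_i, M, and the working covariance Φ_i are assumed notation.

```latex
% BRL estimator: the conventional sandwich with adjustment matrices A_i
% applied to the residuals of each cluster
\mathbf{V}^{BRL} = \mathbf{M}
\Bigl(\sum_{i=1}^{n} \mathbf{X}_i' \mathbf{W}_i \mathbf{A}_i
      \hat{\mathbf{e}}_i \hat{\mathbf{e}}_i' \mathbf{A}_i' \mathbf{W}_i \mathbf{X}_i\Bigr)
\mathbf{M}

% Criterion for A_1, \dots, A_n: exact unbiasedness when the working model holds
\operatorname{E}\bigl(\mathbf{V}^{BRL}\bigr) = \operatorname{Var}\bigl(\hat{\boldsymbol{\beta}}\bigr)
\quad \text{when } \operatorname{Var}(\mathbf{e}_i) = \boldsymbol{\Phi}_i
\text{ for } i = 1, \dots, n
```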

Approximate Hotelling Test
We propose a generalization of the Satterthwaite approximation to the multi-dimensional case:
- Approximate the distribution of V^BRL using a Wishart distribution with degrees of freedom η.
- Estimate η by matching the mean and total variance of V^BRL.
Second contribution. The original BRL method was limited to testing single-coefficient hypotheses with t-tests. We propose a test for multiple-constraint hypotheses that is a direct generalization of the Satterthwaite approximation. To give the flavor, the test involves approximating the distribution of the variance estimator by a Wishart distribution with some degrees of freedom η. We get η by matching the mean and total variance of the estimator. Now if V^BRL is Wishart, then the Wald test statistic follows a Hotelling's T-squared distribution, which is a multiple of an F distribution. So we use the estimated degrees of freedom to correct the test statistic itself and then to approximate the reference distribution as well.
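The last step of the notes, turning a Hotelling's T-squared statistic into an F statistic, can be sketched as below. It relies on the standard fact that if Q follows a T-squared distribution with dimension q and degrees of freedom η, then (η − q + 1) / (η q) · Q follows an F(q, η − q + 1) distribution. The function name and arguments are hypothetical, and estimating η itself (the moment-matching step) is not shown.

```python
def aht_f_statistic(wald_q, q, eta):
    """Rescale a Wald statistic into an approximate-Hotelling F statistic.

    wald_q : Wald statistic computed with the BRL variance estimator
    q      : number of constraints being tested
    eta    : estimated Wishart degrees of freedom (assumed already computed)

    Returns (F, numerator df, denominator df).
    """
    df_denom = eta - q + 1
    if df_denom <= 0:
        raise ValueError("eta must exceed q - 1 for the F approximation")
    # Standard Hotelling T-squared to F conversion
    f_stat = df_denom / (eta * q) * wald_q
    return f_stat, q, df_denom


# With a single constraint (q = 1) the statistic is unchanged and the
# denominator df equals eta, i.e. the squared Satterthwaite t-test.
print(aht_f_statistic(5.0, 1, 18.0))
```

For q > 1 the rescaling shrinks the statistic and the denominator degrees of freedom fall below η, so the correction acts on both the test statistic and the reference distribution.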

Effects of Tribes Learning Communities (Hanson et al., 2011)
- Social-Emotional Learning curriculum.
- Classroom-level randomization to TRIBES or BAU control.
- 10 participating schools in Grades 1-2.
- Original analysis used HLM with classroom-level random effects and school fixed effects.

Effects of Tribes Learning Communities (Hanson et al., 2011)
- OLS estimation (seemingly unrelated regressions)
- Cluster SEs by school
Impact estimates (Est.) are in effect-size (ES) units; n.r. = value not recovered in the transcript.
                                       Conventional CRVE     Bias-Reduced Linearization
Outcome                       Est.     SE      df   p        SE      df     p
Aggressive behavior (T)       0.329    0.156   9    .065     0.173   7.0    .098
Rule-breaking (T)             0.312    0.157   9    .078     n.r.    n.r.   .114
Interpersonal strength (P)    0.209    0.079   9    .026     0.085   7.5    .041
Intrapersonal strength (P)    0.231    0.077   9    .015     0.081   7.4    .023
Joint test of outcomes:
- Conventional: F(4, 9) = 6.82, p = .008
- Bias-reduced linearization: F(4, 4.3) = 3.70, p = .109
Note that I used a slightly different analytic model and clustered SEs by school rather than classroom. The BRL SEs are 5-10% larger.

Angrist & Lavy (2009)
- Cluster-randomized trial in 40 high schools in Israel.
- Tested effects of monetary incentives on post-secondary matriculation exam (Bagrut) completion rates.
- Longitudinal data, difference-in-differences specification.
- Focus on effects for higher-achieving girls.
n.r. = value not recovered in the transcript.
Hypothesis                              Test            F       df      p-value
Treatment effect (q = 1)                Standard        5.746   34.00   .022
                                        Satterthwaite   5.169   18.13   .035
Moderation by school sector (q = 2)     Standard        3.186   n.r.    .054
                                        AHT             1.665   7.84    .250
To give you a sense of how this test works in practice, we re-analyzed data from a cluster-randomized trial reported by Angrist and Lavy. The study involved longitudinal data collection and so the original authors used a difference-in-differences specification to estimate the treatment effect. Looking at the test for the average treatment effect, the standard method gives a p-value of .022, whereas the Satterthwaite correction has a smaller test statistic (because of the correction to the variance estimator) and has about half the degrees of freedom. This leads to a somewhat larger p-value of .035. Now look at the test for whether the treatment effects are constant across

Further considerations
Magnitude of SE adjustment and degrees of freedom depend on:
- Weighting
- Cluster sizes
- Balance
- Covariate distribution
Given these complexities, we recommend applying small-sample adjustment by default when using CRVE.

Software
R package clubSandwich
- Available on the Comprehensive R Archive Network (v0.2.1)
- Development version at https://github.com/jepusto/clubSandwich
- Works with a wide variety of models (lm, lme, plm)
Stata package clubSandwich
- Available on GitHub: https://github.com/jepusto/clubSandwich-Stata
- Wraps reg and areg

Future directions
Performance comparisons versus other small-sample corrections:
- Cluster-wild bootstrap (Cameron, Gelbach, & Miller, 2008; MacKinnon & Webb, 2016).
- Randomization tests (Canay, Romano, & Shaikh, 2014).
- Other degrees-of-freedom corrections from the GEE literature (e.g., Fay & Graubard, 2001; Wang & Long, 2011).
- Robust score (LM) tests.
Extensions:
- Instrumental variables (2-stage least squares)
- GEE models
- Multi-way clustering (Cameron, Gelbach, & Miller, 2011)

References
Angrist, J. D., & Lavy, V. (2009). The effects of high stakes high school achievement awards: Evidence from a randomized trial. American Economic Review, 99(4), 1384–1414.
Bell, R. M., & McCaffrey, D. F. (2002). Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology, 28(2), 169–181.
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics, 90(3), 414–427.
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011). Robust inference with multiway clustering. Journal of Business & Economic Statistics, 29(2), 238–249.
Canay, I. A., Romano, J. P., & Shaikh, A. M. (2014). Randomization tests under an approximate symmetry assumption. Working paper.
Fay, M. P., & Graubard, B. I. (2001). Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics, 57, 1198–1206.
Hanson, T. L., Izu, J. A., Petrosino, A., Delong-Cotty, B., & Zheng, H. (2012). Outcome Evaluation of Tribes Learning Communities in California, 2007-2010. ICPSR32821-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2012-12-20.
Imbens, G. W., & Kolesár, M. (2016). Robust standard errors in small samples: Some practical advice. Review of Economics and Statistics, forthcoming.
James-Burdumy, S. (2016). Randomized Experiment of Playworks Analytic Files for 2010-2011 and 2011-2012 Cohorts in Six United States Cities. ICPSR35638-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2016-09-20.
Lee, D. S., & Card, D. (2008). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655–674.
MacKinnon, J. G., & Webb, M. D. (2016). Wild bootstrap inference for wildly different cluster sizes. Journal of Applied Econometrics, forthcoming.
McCaffrey, D. F., Bell, R. M., & Botts, C. H. (2001). Generalizations of biased reduced linearization. In Proceedings of the Annual Meeting of the American Statistical Association.
Pustejovsky, J. E., & Tipton, E. (2017). Small sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business and Economic Statistics. In press.
Tipton, E., & Pustejovsky, J. E. (2015). Small-sample adjustments for tests of moderators and model fit using robust variance estimation in meta-regression. Journal of Educational and Behavioral Statistics, 40(6), 604–634.
Wang, M., & Long, Q. (2011). Modified robust variance estimator for generalized estimating equations with improved small-sample performance. Statistics in Medicine, 30(11), 1278–1291.

Simulation results: Block-randomized trials Note: q is the dimension of the hypothesis test. Source: Pustejovsky & Tipton (2017).

Simulation results: Cluster-randomized trials Note: q is the dimension of the hypothesis test. Source: Pustejovsky & Tipton (2017).

Block-randomized/multi-site trials
Model with block fixed effects:
Overall impact estimate:
where δ_1, …, δ_n are treatment effect estimates from each block.
Conventional CRVE (clustering by block):

Block-randomized/multi-site trials (cont.)
BRL correction:
Satterthwaite df:
Satterthwaite df = n − 1 if the w_j are equal (otherwise df < n − 1).
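The formulas for the block-randomized case are missing from the transcript, so the following is only a sketch of how the quantities named on these two slides could be computed. It assumes that the overall impact estimate is a weighted average of block-specific estimates with weights w_j, that the conventional estimator carries no finite-sample multiplier, and that the working model makes the variance of each block estimate proportional to 1 / w_j. The function and variable names are hypothetical.

```python
def block_trial_variances(deltas, weights):
    """Weighted impact estimate with conventional and BRL-style variances.

    deltas  : block-specific treatment effect estimates (delta_1, ..., delta_n)
    weights : block weights w_1, ..., w_n (assumed proportional to precision)
    """
    if len(deltas) != len(weights) or len(deltas) < 2:
        raise ValueError("need matching inputs for at least two blocks")
    total_w = sum(weights)
    # Overall impact estimate: weighted average of the block estimates
    estimate = sum(w * d for w, d in zip(weights, deltas)) / total_w
    # Conventional CRVE, clustering by block, no small-sample multiplier
    v_conventional = sum(
        (w * (d - estimate)) ** 2 for w, d in zip(weights, deltas)
    ) / total_w ** 2
    # BRL-style correction: inflate each block's term by 1 / (1 - w_j / W)
    v_brl = sum(
        (w * (d - estimate)) ** 2 / (1.0 - w / total_w)
        for w, d in zip(weights, deltas)
    ) / total_w ** 2
    return estimate, v_conventional, v_brl


est, v_cr, v_brl = block_trial_variances([0.1, 0.3, 0.2, 0.4], [1, 1, 1, 1])
print(est, v_cr, v_brl)
```

With equal weights the corrected variance reduces to the usual sample variance of the block estimates divided by n, which is consistent with the slide's remark that the Satterthwaite df is n − 1 in that case.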

Cluster-randomized trials
Model (without covariates):
Overall impact estimate:
where μ_1^T, …, μ_{n_T}^T and μ_1^C, …, μ_{n_C}^C are cluster-specific mean estimates.

Cluster-randomized trials (cont.)
Conventional CRVE:
BRL correction:
If the w_i are approximately equal (cf. Imbens & Kolesár, 2016):
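The formulas on these slides were not preserved. For the equal-weight case the slide alludes to, a minimal sketch is given below. It assumes the impact estimate is the difference between the average treatment-cluster mean and the average control-cluster mean, that the conventional estimator has no finite-sample multiplier, and that the working model treats cluster means as independent with a common variance. Under those assumptions the corrected variance is the familiar unequal-variance two-sample variance and the Satterthwaite df depends only on the number of clusters per arm. All names are hypothetical.

```python
def cluster_trial_brl(mu_t, mu_c):
    """Difference in cluster means with conventional and BRL-style variances.

    mu_t, mu_c : cluster-specific mean estimates for treatment and control
    Returns (estimate, conventional variance, BRL variance, Satterthwaite df).
    """
    n_t, n_c = len(mu_t), len(mu_c)
    if n_t < 2 or n_c < 2:
        raise ValueError("need at least two clusters per arm")
    mean_t = sum(mu_t) / n_t
    mean_c = sum(mu_c) / n_c
    # Sums of squared deviations of cluster means around their arm average
    ss_t = sum((m - mean_t) ** 2 for m in mu_t)
    ss_c = sum((m - mean_c) ** 2 for m in mu_c)
    estimate = mean_t - mean_c
    # Conventional CRVE: squared residuals divided by n_arm squared
    v_conventional = ss_t / n_t ** 2 + ss_c / n_c ** 2
    # BRL-style correction: each residual scaled by 1 / sqrt(1 - 1 / n_arm)
    v_brl = ss_t / (n_t * (n_t - 1)) + ss_c / (n_c * (n_c - 1))
    # Satterthwaite df under the equal-variance working model
    df = (1 / n_t + 1 / n_c) ** 2 / (
        1 / (n_t ** 2 * (n_t - 1)) + 1 / (n_c ** 2 * (n_c - 1))
    )
    return estimate, v_conventional, v_brl, df


print(cluster_trial_brl([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0]))
```

In a balanced design this df equals n − 2, and it drops as the split between arms becomes more uneven, which is one way to see why "small" depends on more than the total number of clusters.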

Effects of Playworks on school climate, student social skills and behavior (James-Burdumy et al., 2013)
- Structured physical activity and recess coaching program.
- 29 participating schools, grouped in 9 blocks.
- School-level block randomization to Playworks or BAU control: 17 treatment schools, 12 control schools.
- OLS estimation, including block fixed effects.
- Cluster SEs by school.

Effects of Playworks on school climate, student social skills and behavior (James-Burdumy et al., 2013)
Impact estimates (Est.) are in effect-size (ES) units; n.r. = value not recovered in the transcript.
                                                              Conventional CRVE      Bias-Reduced Linearization
Outcome                                              Est.     SE      df   p         SE      df     p
Teacher support for organized play                   0.591    0.138   28   <.001     0.172   12.0   .005
Staff support for organized play                     0.324    0.130   28   .019      0.156   12.2   .059
Student bullying/exclusion                           -1.014   0.187   28   n.r.      0.253   11.9   .002
Difficult transitioning to learning after recess     -0.840   0.112   28   n.r.      0.143   11.8   n.r.
SEs are 20-35% larger.
Joint test of outcomes:
- Conventional: F(4, 28) = 23.5, p < .001
- Bias-reduced linearization: F(4, 9.0) = 10.6, p = .002