Rerandomization to Improve Covariate Balance in Randomized Experiments Kari Lock Harvard Statistics Advisor: Don Rubin 4/28/11
Randomized experiments are the “gold standard” for estimating causal effects WHY? 1.They yield unbiased estimates 2.They eliminate confounding factors The Gold Standard
RRRR RRRR RRRR RRRR RRRR RRRR RRRR RRRR RRR RRRRR RRRRR RRRRR RRR RRRRR RRRRR RRRRR Randomize RRRRRRRR
12 Females, 8 Males8 Females, 12 Males5 Females, 15 Males15 Females, 5 Males Suppose you get a “bad” randomization What would you do??? Can you rerandomize??? When? How? Covariate Balance - Gender
Rubin: What if, in a randomized experiment, the chosen randomized allocation exhibited substantial imbalance on a prognostically important baseline covariate? Cochran: Why didn't you block on that variable? Rubin: Well, there were many baseline covariates, and the correct blocking wasn't obvious; and I was lazy at that time. Cochran: This is a question that I once asked Fisher, and his reply was unequivocal: Fisher (recreated via Cochran): Of course, if the experiment had not been started, I would rerandomize. The Origin of the Idea
The more covariates, the more likely at least one covariate will be imbalanced across treatment groups With just 10 independent covariates, the probability of a significant difference (at level α =.05) for at least one covariate is 1 – (1-.05) 10 = 40%! Covariate imbalance is not limited to rare “unlucky” randomizations Covariate Imbalance
We all know that randomized experiments are the “gold standard” for estimating causal effects WHY? 1.They yield unbiased estimates 2.They eliminate confounding factors … on average! For any particular experiment, covariate imbalance is possible (and likely!), and conditional bias exists The Gold Standard
Randomize subjects to treatment and control Collect covariate data Specify criteria determining when a randomization is unacceptable; based on covariate balance (Re)randomize subjects to treatment and control Check covariate balance 1) 2) Conduct experiment unacceptable acceptable Analyze results (with a randomization test) 3) 4) RERANDOMIZATION
Theorem: If the treatment groups are exchangeable for each randomization and for the rerandomization criteria, then rerandomization yields an unbiased estimate of the average treatment effect. For exchangeability: Equal sized treatment groups Rerandomization criteria that is objective and does not favor a specific group Unbiased
Unbiased Options for unequal sized treatment groups: Rerandomize into multiple equally sized groups, then combine groups after the rerandomization Discard the extra units to form equal sized groups Rerandomize to lower MSE, with perhaps slight bias Do not rerandomize
Randomization Test: Simulate randomizations to see what the statistic would look like just by random chance, if the null hypothesis were true Rerandomization Test: A randomization test, but for each simulated randomization, follow the same rerandomization criteria used in the experiment As long as the simulated randomizations are done using the same randomization scheme used in the experiment, this will give accurate p-values Rerandomization Test
t-test: Too conservative Significant results can be trusted Regression: Regression including the covariates that were balanced on using rerandomization more accurately estimates the true precision of the estimated treatment effect Assumptions are less dangerous after rerandomization because groups should be well balanced Alternatives for Analysis
We use Mahalanobis Distance, M, to represent multivariate distance between group means: Choose a and rerandomize when M > a Criteria for Acceptable Balance
Distribution of M RERANDOMIZE Acceptable Randomizations p a = Probability of accepting a randomization
Choosing the acceptance probability is a tradeoff between a desire for better balance and computational time The number of randomizations needed to get one successful one is Geometric(p a ), so the expected number needed is 1/p a Computational time must be considered in advance, since many simulated acceptable randomizations are needed to conduct the randomization test Choosing p a
The obvious choice may be to set limits on the acceptable balance for each covariate individually This destroys the joint distribution of the covariates Another Measure of Balance
Since M follows a known distribution, easy to specify the proportion of accepted randomizations M is affinely invariant (unaffected by affine transformations of the covariates) Correlations between covariates are maintained The balance improvement for each covariate is the same (and known)… … and is the same for any linear combination of the covariates Rerandomization Based on M
Theorem: If n T = n C, the covariate means are normally distributed, and rerandomization occurs when M > a, then Covariates After Rerandomization
Percent Reduction in Variance For each covariate x j, the ratio of variances is: and the percent reduction in variance is
Shadish Data n = 445 undergraduate students Randomized Experiment 235 Observational Study 210 Vocab Training 116 Math Training 119 Vocab Training 131 Math Training 79 randomization students choose Shadish, M. R., Clark, M. H., Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random and nonrandom assignments. JASA. 103(484):
Shadish Data k = 10 covariates
Shadish Data To generate 1000 rerandomizations: p a = 0.1:11.6 seconds p a = 0.01:1.8 minutes p a = 0.001:18.3 minutes p a = : 3.4 hours
Shadish Data To generate 1000 rerandomizations: p a = 0.1:11.6 seconds p a = 0.01:1.8 minutes p a = 0.001:18.3 minutes p a = : 3.4 hours
Shadish Data Rerandomize when M > 1.48 k = 10 covariates p a = Percent Reduction in Variance: 100(1 – v a ) = 88%
Shadish Data
Theorem: If n T = n C, the covariate and outcome means are normally distributed, the treatment effect is additive, and rerandomization occurs when M > a, then and the percent reduction in variance for the estimated treatment effect is where R 2 is the squared canonical correlation. Estimated Treatment Effect After Rerandomization
vava Outcome Variance Reduction
Shadish Data n = 445 undergraduate students Randomized Experiment 235 Observational Study 210 Vocab Training 116 Math Training 119 Vocab Training 131 Math Training 79 randomization students choose Outcomes: score on a vocabulary test, score on a mathematics test
Shadish Data Vocabulary: R 2 = 0.11 Percent Reduction in Variance = 88(0.11) = 9.68% Equivalent to increasing the sample size by 1/(1-0.28) = 1.39 Mathematics: R 2 = 0.32 Percent Reduction in Variance = 88(0.32) = 28.16%
Shadish Data
Since M follows a known distribution, easy to specify the proportion of accepted randomizations M is affinely invariant (unaffected by affine transformations of the covariates) Correlations between covariates are maintained The balance improvement for each covariate is the same… … and is the same for any linear combination of the covariates Rerandomization Based on M Is that good??? Since M follows a known distribution, easy to specify the proportion of accepted randomizations M is affinely invariant (unaffected by affine transformations of the covariates) Correlations between covariates are maintained The balance improvement for each covariate is the same… … and is the same for any linear combination of the covariates
In practice, covariates are often of varying importance We can place covariates into tiers, with the top tier including the most important covariates, the second tier less important, etc. Criteria for acceptable balance can be less stringent for each successive tier Tiers of Covariates
1)Check balance for covariates in tier 1 calculate M 1 using the covariates in tier 1, rerandomize if M 1 > a 1 2) For each successive tier, regress each covariate on all covariates in previous tiers, and check balance for the residuals of that tier calculate M t using the residuals of tier t, rerandomize if M t > a t a t denotes the acceptance threshold for tier t Tiers of Covariates
Theorem: If the n T = n C and the covariate means are normally distributed, and rerandomization occurs if any M t > a t, then the ratio of variances for covariate x j is Tiers of Covariates where R t 2 denotes the squared canonical correlation between x j and all covariates in tiers 1 through t.
Empirical vs Theoretical Values
Rerandomization is easily extended to multiple treatment groups To measure balance, one of the MANOVA test statistics (such as Wilks’ ) can be used, measuring the ratio of within group variability to between group variability for multiple covariates This is equivalent to rerandomizing based on Mahalanobis distance if used on only two groups Multiple Treatment Groups
This is not meant to replace blocking, and blocking and rerandomization can be used together Blocking is great for balancing a few very important covariates, but is not easily used for many covariates “Block what you can, rerandomize what you can’t” Blocking
The theoretical results given depend on M ~ k 2 If the covariate means are not normally distributed, that is fine! The theoretical results given are used nowhere in analysis. Rerandomization does NOT depend on asymptotics. The threshold a corresponding to p a, and the percent reduction in variance for the covariates can be estimated via simulation Small Samples or Non-Normality
Rerandomization improves covariate balance between the treatment groups, giving the researcher more faith that an observed effect is really due to the treatment If the covariates are correlated with the outcome, rerandomization also increases precision in estimating the treatment effect, giving the researcher more power to detect a significant result Conclusion Thank you!!!