Rerandomization in Randomized Experiments STAT 320 Duke University Kari Lock Morgan.

Slides:



Advertisements
Similar presentations
Introduction to Hypothesis Testing
Advertisements

COMPUTER INTENSIVE AND RE-RANDOMIZATION TESTS IN CLINICAL TRIALS Thomas Hammerstrom, Ph.D. USFDA, Division of Biometrics The opinions expressed are those.
On Comparing Classifiers : Pitfalls to Avoid and Recommended Approach
Chapter 7 Hypothesis Testing
6. Statistical Inference: Example: Anorexia study Weight measured before and after period of treatment y i = weight at end – weight at beginning For n=17.
Inference: Fishers Exact p-values STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke.
Review bootstrap and permutation
Properties of Least Squares Regression Coefficients
“Students” t-test.
Chapter 16 Inferential Statistics
Hypothesis Testing. To define a statistical Test we 1.Choose a statistic (called the test statistic) 2.Divide the range of possible values for the test.
Objective: To test claims about inferences for two proportions, under specific conditions Chapter 22.
Hypothesis Testing: Intervals and Tests
CmpE 104 SOFTWARE STATISTICAL TOOLS & METHODS MEASURING & ESTIMATING SOFTWARE SIZE AND RESOURCE & SCHEDULE ESTIMATING.
Inference: Neyman’s Repeated Sampling STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science.
Confidence Intervals for Proportions
Topic 6: Introduction to Hypothesis Testing
Comparing Two Population Means The Two-Sample T-Test and T-Interval.
Copyright © 2010 Pearson Education, Inc. Chapter 19 Confidence Intervals for Proportions.
Inferences About Process Quality
Section 4.4 Creating Randomization Distributions.
Chapter 9 Hypothesis Testing.
BCOR 1020 Business Statistics Lecture 20 – April 3, 2008.
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Chapter 19: Confidence Intervals for Proportions
Using Covariates in Experiments: Design and Analysis STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical.
Separate multivariate observations
AP Statistics Section 13.1 A. Which of two popular drugs, Lipitor or Pravachol, helps lower bad cholesterol more? 4000 people with heart disease were.
Introduction to Linear Regression and Correlation Analysis
Estimation and Hypothesis Testing. The Investment Decision What would you like to know? What will be the return on my investment? Not possible PDF for.
Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University ECON 4550 Econometrics Memorial University of Newfoundland.
Ch 10 Comparing Two Proportions Target Goal: I can determine the significance of a two sample proportion. 10.1b h.w: pg 623: 15, 17, 21, 23.
The paired sample experiment The paired t test. Frequently one is interested in comparing the effects of two treatments (drugs, etc…) on a response variable.
Fundamentals of Data Analysis Lecture 4 Testing of statistical hypotheses.
F OUNDATIONS OF S TATISTICAL I NFERENCE. D EFINITIONS Statistical inference is the process of reaching conclusions about characteristics of an entire.
Topics: Statistics & Experimental Design The Human Visual System Color Science Light Sources: Radiometry/Photometry Geometric Optics Tone-transfer Function.
MS 305 Recitation 11 Output Analysis I
Education Research 250:205 Writing Chapter 3. Objectives Subjects Instrumentation Procedures Experimental Design Statistical Analysis  Displaying data.
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
Maximum Likelihood Estimator of Proportion Let {s 1,s 2,…,s n } be a set of independent outcomes from a Bernoulli experiment with unknown probability.
AP Statistics Section 13.1 A. Which of two popular drugs, Lipitor or Pravachol, helps lower bad cholesterol more? 4000 people with heart disease were.
Statistical Hypotheses & Hypothesis Testing. Statistical Hypotheses There are two types of statistical hypotheses. Null Hypothesis The null hypothesis,
EMIS 7300 SYSTEMS ANALYSIS METHODS FALL 2005 Dr. John Lipp Copyright © Dr. John Lipp.
Introducing Inference with Bootstrapping and Randomization Kari Lock Morgan Department of Statistical Science, Duke University with.
1 Chapter 9 Hypothesis Testing. 2 Chapter Outline  Developing Null and Alternative Hypothesis  Type I and Type II Errors  Population Mean: Known 
Educational Research Chapter 13 Inferential Statistics Gay, Mills, and Airasian 10 th Edition.
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 8 Hypothesis Testing.
Copyright © 2010 Pearson Education, Inc. Chapter 19 Confidence Intervals for Proportions.
Matching STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University.
© Copyright McGraw-Hill 2004
Statistical Inference Statistical inference is concerned with the use of sample data to make inferences about unknown population parameters. For example,
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 19 Confidence Intervals for Proportions.
Chapter 13 Understanding research results: statistical inference.
Hypothesis Testing. Suppose we believe the average systolic blood pressure of healthy adults is normally distributed with mean μ = 120 and variance σ.
Rerandomization to Improve Covariate Balance in Randomized Experiments Kari Lock Harvard Statistics Advisor: Don Rubin 4/28/11.
Rerandomization in Randomized Experiments Kari Lock and Don Rubin Harvard University JSM 2010.
Chapter 7: Hypothesis Testing. Learning Objectives Describe the process of hypothesis testing Correctly state hypotheses Distinguish between one-tailed.
Statistics 19 Confidence Intervals for Proportions.
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
Methods of Presenting and Interpreting Information Class 9.
Confidence Intervals for Proportions
Inference and Tests of Hypotheses
Confidence Intervals for Proportions
CHAPTER 12 More About Regression
Fundamentals of regression analysis
Chapter 9 Hypothesis Testing.
Comparing Populations
CHAPTER 12 More About Regression
Rerandomization to Improve Baseline Balance in Educational Experiments
CHAPTER 12 More About Regression
Presentation transcript:

Rerandomization in Randomized Experiments STAT 320 Duke University Kari Lock Morgan

When interpreting p-value: chance of getting results as extreme IF NO DIFFERENCE can’t accept null even if insignificant Interpretation of a confidence interval Can’t ever compute estimand (need theory to show unbiased) Standard error: standard deviation of statistics More possible allocations does not necessarily mean higher SE (think of increasing sample size: more allocations but lower SE) Comments on Homework 2

Randomized experiments are the “gold standard” for estimating causal effects WHY? 1.They yield unbiased estimates 2.They balance covariates across treatment groups The Gold Standard

RRRR RRRR RRRR RRRR RRRR RRRR RRRR RRRR RRR RRRRR RRRRR RRRRR RRR RRRRR RRRRR RRRRR Randomize RRRRRRRR

12 Females, 8 Males8 Females, 12 Males5 Females, 15 Males15 Females, 5 Males Covariate Balance - Gender What if you get a “bad” randomization? What would you do??? Can you rerandomize? When? How?

Rubin: What if, in a randomized experiment, the chosen randomized allocation exhibited substantial imbalance on a prognostically important baseline covariate? Cochran: Why didn't you block on that variable? Rubin: Well, there were many baseline covariates, and the correct blocking wasn't obvious; and I was lazy at that time. Cochran: This is a question that I once asked Fisher, and his reply was unequivocal: Fisher (recreated via Cochran): Of course, if the experiment had not been started, I would rerandomize. The Origin of the Idea

In 2007, Dr. Ellen Langer tested her hypothesis that “mind-set matters” with a randomized experiment She recruited 84 maids working at 7 different hotels, and randomly assigned half to a treatment group and half to control The “treatment” was simply informing the maids that their work satisfies the Surgeon General’s recommendations for an active lifestyle Crum, A.J. and Langer, E.J. (2007). “Mind-Set Matters: Exercise and the Placebo Effect,” Psychological Science, 18: Mind-Set Matters

p-value =.01p-value =.001 Mind-Set Matters

The more covariates, the more likely at least one covariate will be imbalanced across treatment groups With just 10 independent covariates, the probability of a significant difference (α =.05) for at least one covariate is 1 – (1-.05) 10 = 40%! Covariate imbalance is not limited to rare “unlucky” randomizations Covariate Imbalance

Randomize 20 cards to two treatment groups Any differences? Covariate Imbalance

Randomized experiments are the “gold standard” for estimating causal effects WHY? 1.They yield unbiased estimates 2.They eliminate confounding factors … on average! For any particular experiment, covariate imbalance is possible (and likely!), and conditional bias exists The Gold Standard

Randomize subjects to treated and control Collect covariate data Specify criteria determining when a randomization is unacceptable; based on covariate balance (Re)randomize subjects to treated and control Check covariate balance 1) 2) Conduct experiment unacceptable acceptable Analyze results (with a randomization test) 3) 4) RERANDOMIZATION

Let x be the covariate matrix Let W be the vector of treatment assignments The criterion determining whether a randomization, W, is acceptable should be some function of x and W Rerandomization Criterion

Let y i (W i ) denote the i th unit’s potential outcome under treatment group W i Potential Outcomes

If the treated and control groups are the same size, and if the criteria for rerandomization treats the treated and control groups equally, then Intuition: For every randomization, W, thrown away, there is an exact opposite randomization, 1-W, that is also thrown away Unbiased

If treated and control groups are not the same size, then rerandomization may not yield an unbiased estimate. Example: Suppose you have one covariate x, and the criteria for rerandomization is Units with more extreme x values will be more likely to be in the larger treatment group. Unbiased

Randomization Test: Simulate randomizations to see what the statistic would look like just by random chance, if the null hypothesis were true Rerandomization Test: A randomization test, but for each simulated randomization, follow the same rerandomization criteria used in the experiment As long as the simulated randomizations are done using the same randomization scheme used in the experiment, this will give accurate p-values Rerandomization Test

t-test: Too conservative Significant results can be trusted Regression: Regression including the covariates that were balanced on using rerandomization more accurately estimates the true precision of the estimated treatment effect Assumptions are less dangerous after rerandomization because groups should be well balanced Alternatives for Analysis

If you have more than one covariate that you think may be associated with the outcome, what would you use as criteria for an acceptable randomization? Criteria for Acceptable Balance

The obvious choice may be to set limits on the acceptable balance for each covariate individually This destroys the joint distribution of the covariates One Measure of Balance

We use Mahalanobis Distance, M, to represent multivariate distance between group means: Choose a and rerandomize when M > a Criteria for Acceptable Balance

Distribution of M RERANDOMIZE Acceptable Randomizations p a = Probability of accepting a randomization

Choosing the acceptance probability is a tradeoff between a desire for better balance and computational time The number of randomizations needed to get one successful one is Geometric(p a ), so the expected number needed is 1/p a Computational time must be considered in advance, since many simulated acceptable randomizations are needed to conduct the randomization test Choosing p a

Since M follows a known distribution, easy to specify the proportion of accepted randomizations M is affinely invariant (unaffected by affine transformations of the covariates) Correlations between covariates are maintained The balance improvement for each covariate is the same (and known)… … and is the same for any linear combination of the covariates Rerandomization Based on M

Mind-Set Matters Difference in Means = -8.3

1000 Randomizations

Theorem: If n T = n C, the covariate means are normally distributed, and rerandomization occurs when M > a, then and Covariates After Rerandomization

Define the percent reduction in variance for each covariate to be the percentage by which rerandomization reduces the randomization variance for the difference in means: Percent Reduction in Variance For rerandomization when M ≤ a, the percent reduction in variance for each covariate is 1– v a

Percent Reduction in Variance What if we increase p a ?

Theorem: If n T = n C, the covariate means are normally distributed, and rerandomization occurs when M > a, then and the percent reduction in variance for the outcome difference in means is where R 2 is the coefficient of determination (squared canonical correlation). Estimated Treatment Effect

vava Outcome Variance Reduction

Outcome: Weight Change R 2 =.1 Percent Reduction in Variance = (1 – v a )R 2 = (1 –.12 )(.1) = 8.8% Outcome: Change in Diastolic Blood Pressure R 2 =.64 Percent Reduction in Variance = (1 – v a )R 2 = (1 –.12 )(.64) = 56% Equivalent to increasing the sample size by 1/(1 -.56) = 2.27

Rerandomization improves covariate balance between the treatment groups, giving the researcher more faith that an observed effect is really due to the treatment If the covariates are correlated with the outcome, rerandomization also increases precision in estimating the treatment effect, giving the researcher more power to detect a significant result Conclusion