Problems with the Design and Implementation of Randomized Experiments By Larry V. Hedges Northwestern University Presented at the 2009 IES Research Conference.



Hard Answers to Easy Questions
By Larry V. Hedges, Northwestern University
Presented at the 2009 IES Research Conference

Easy Question
Isn’t it OK if I just match (schools) on some variable before randomizing? (You know lots of people do it.)
This is a simple question, but answering it requires serious thinking about design and analysis.

What Does this Question Mean?
Generally, adding matching or blocking variables means adding another (blocking) factor to the design. The exact consequences depend on the design you started with:
- Individually randomized (completely randomized design)
- Cluster randomized (hierarchical design)
- Multicenter or matched (randomized blocks design)

Individually Randomized (Completely Randomized) Design
In this case you are adding a blocking factor crossed with treatment (p blocks). In other words, the design becomes a (generalized) randomized block design.

        Blocks
        1     2     …     p
T
C

Individually Randomized (Completely Randomized) Design
How does this impact the analysis? Think about a balanced design with 2n students per block and p blocks, and the ANOVA partitioning of sums of squares and degrees of freedom.
Original partitioning:
$SS_{Total} = SS_T + SS_{WT}$
$df_{Total} = df_T + df_{WT}$
$2pn - 1 = 1 + (2pn - 2)$
Original test statistic: $F = SS_T / (SS_{WT}/df_{WT})$

Individually Randomized (Completely Randomized) Design
New partitioning:
$SS_{Total} = SS_T + SS_B + SS_{B \times T} + SS_{WC}$
$df_{Total} = df_T + df_B + df_{B \times T} + df_{WC}$
$2pn - 1 = 1 + (p - 1) + (p - 1) + 2p(n - 1)$
New test statistic? Either $F = SS_T / (SS_{WC}/df_{WC})$ or $F = SS_T / (SS_{B \times T}/df_{B \times T})$. It depends on the inference model.

Individually Randomized (Completely Randomized) Design
Original design:
$SS = SS_T + SS_{WT}$
$df = df_T + df_{WT}$
$2pn - 1 = 1 + (2pn - 2)$
Blocked design:
$SS = SS_T + (SS_B + SS_{B \times T} + SS_{WC})$
$df = df_T + (df_B + df_{B \times T} + df_{WC})$
$2pn - 1 = 1 + (p - 1) + (p - 1) + 2p(n - 1)$
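The degrees-of-freedom bookkeeping above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function names and the example values of p and n are illustrative assumptions.

```python
def df_original(p: int, n: int) -> dict:
    """Degrees of freedom for the unblocked design with 2pn individuals."""
    return {"T": 1, "WT": 2 * p * n - 2}


def df_blocked(p: int, n: int) -> dict:
    """Degrees of freedom after adding p blocks crossed with treatment."""
    return {"T": 1, "B": p - 1, "BxT": p - 1, "WC": 2 * p * (n - 1)}


# Hypothetical design: 6 blocks with 20 students per arm in each block.
p, n = 6, 20
total = 2 * p * n - 1

# Both partitions must account for every degree of freedom in the data.
assert sum(df_original(p, n).values()) == total
assert sum(df_blocked(p, n).values()) == total

print(df_original(p, n))
print(df_blocked(p, n))
```

Blocking does not change the total; it splits the within-treatment degrees of freedom into block, block-by-treatment, and within-cell pieces.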

Inference Models
I will mention two inference models:
- Conditional inference model
- Unconditional inference model
These inference models determine the type of inference (generalization) you wish to make. The inference model chosen has implications for the statistical analysis procedure chosen. The inference model determines the natural random effects.

Inference Models: Conditional Inference Model
Generalization is to the blocks actually in the experiment (or those just like them). Blocks in the experiment are the universe (population). Generalization to other blocks depends on extra-statistical considerations (which blocks are just like them? How do you know?). Generalization obviously cannot be model free.

Inference Models: Unconditional Inference Model
Generalization is to a universe (of blocks) including blocks not in the experiment. Blocks in the experiment are a sample of blocks in the universe (population). If blocks in the experiment can be considered a representative sample, inference to the population of blocks is by sampling theory. If blocks are not a probability sample, generalization gets tricky (what is the universe? How do you know?).

Inference Models
You can think of the inference model as linked to the sampling model for blocks. If the blocks observed are a (random) sample of blocks, then they are a source of random variation. If the blocks observed are the entire universe of relevant blocks, then they are not a source of random variation. The statistical analysis can be chosen independently of the inference model, but if it doesn’t include all sources of random variation, inferences will be compromised.

Inference Models and Statistical Analyses: Individually Randomized Design
Blocks are fixed effects under the conditional inference model. In this case the correct test statistic is $F_C = SS_T / (SS_{WC}/df_{WC})$, and the F-distribution has 1 and 2p(n – 1) df.
Block effects are random under the unconditional inference model. In this case the correct test statistic is $F_U = SS_T / (SS_{B \times T}/df_{B \times T})$, and the F-distribution has 1 and (p – 1) df.
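To make the two test statistics concrete, here is a minimal sketch that computes both from a balanced data set using only the Python standard library. The function name, the data layout, and the toy numbers are illustrative assumptions, not taken from the slides.

```python
from statistics import fmean


def blocked_anova_f(y):
    """Return (F_C, F_U) for balanced data.

    y[i][j] is the list of n outcomes for arm i (0 = T, 1 = C) in block j.
    """
    arms, p, n = len(y), len(y[0]), len(y[0][0])
    cell = [[fmean(obs) for obs in row] for row in y]
    arm = [fmean(row) for row in cell]
    block = [fmean([cell[i][j] for i in range(arms)]) for j in range(p)]
    grand = fmean(arm)

    # Sums of squares for treatment, block-by-treatment, and within cells.
    ss_t = p * n * sum((a - grand) ** 2 for a in arm)
    ss_bxt = n * sum((cell[i][j] - arm[i] - block[j] + grand) ** 2
                     for i in range(arms) for j in range(p))
    ss_wc = sum((v - cell[i][j]) ** 2
                for i in range(arms) for j in range(p) for v in y[i][j])

    df_bxt = (arms - 1) * (p - 1)   # p - 1 with two arms
    df_wc = arms * p * (n - 1)      # 2p(n - 1) with two arms

    f_c = ss_t / (ss_wc / df_wc)    # conditional: blocks fixed
    f_u = ss_t / (ss_bxt / df_bxt)  # unconditional: blocks random
    return f_c, f_u


# Toy data: 2 arms, 3 blocks, 2 individuals per cell (made-up values).
toy = [
    [[5, 7], [6, 8], [9, 11]],   # treatment
    [[4, 6], [5, 7], [6, 8]],    # control
]
f_c, f_u = blocked_anova_f(toy)
print(f_c, f_u)
```

The numerator is the same in both cases; only the error term changes, so the two inference models can reach different conclusions from identical data.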

Inference Models and Statistical Analyses: Individually Randomized Design
You can see that the error term in the test has (a lot) more df under the fixed effects model: 2p(n – 1) versus (p – 1). What you can’t see is that (if there is a treatment effect) the average value of the F-statistic is typically also larger under the fixed effects model. It is bigger by a factor proportional to [expression not preserved in this transcript], where $\omega = \sigma_{B \times T}^2 / \sigma_B^2$ is a treatment heterogeneity parameter and $\rho$ is the intraclass correlation.

Possible Statistical Analyses: Individually Randomized Design
Possible statistical analyses:
1. Ignore the blocking
2. Include blocks as fixed effects
3. Include blocks as random effects
Consequences depend on whether you want to make a conditional or unconditional inference.

Making Unconditional Inferences: Individually Randomized Design
Possible statistical analyses:
1. Ignore the blocking. Bad idea: will inflate significance levels of tests for treatment effects substantially.
2. Include blocks as fixed effects. Bad idea: will inflate significance levels of tests for treatment effects substantially.
3. Include blocks as random effects. Correct significance levels (but less power than the conditional analysis).
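A small simulation can illustrate the difference between analyses 2 and 3 under the unconditional model. This is a rough sketch under stated assumptions: the design size (5 blocks, 10 individuals per arm per block), the variance values, and the hard-coded approximate 5% critical values are all illustrative choices, not values from the slides. The average treatment effect is zero, but each block has a random block-by-treatment deviation.

```python
import random
from statistics import fmean

P, N = 5, 10          # blocks; individuals per arm within each block (assumed)
CRIT_FIXED = 3.947    # approximate 5% critical value of F(1, 2P(N-1)) = F(1, 90)
CRIT_RANDOM = 7.709   # approximate 5% critical value of F(1, P-1) = F(1, 4)


def f_statistics(rng, sd_bxt):
    """Simulate one experiment with no average effect; return (F_C, F_U)."""
    block_effect = [rng.gauss(0, 1) for _ in range(P)]
    y = []
    for _arm in range(2):
        row = []
        for j in range(P):
            dev = rng.gauss(0, sd_bxt)  # random block-by-treatment deviation
            row.append([block_effect[j] + dev + rng.gauss(0, 1)
                        for _ in range(N)])
        y.append(row)

    cell = [[fmean(obs) for obs in row] for row in y]
    arm = [fmean(row) for row in cell]
    block = [(cell[0][j] + cell[1][j]) / 2 for j in range(P)]
    grand = fmean(arm)

    ss_t = P * N * sum((a - grand) ** 2 for a in arm)
    ss_bxt = N * sum((cell[i][j] - arm[i] - block[j] + grand) ** 2
                     for i in range(2) for j in range(P))
    ss_wc = sum((v - cell[i][j]) ** 2
                for i in range(2) for j in range(P) for v in y[i][j])

    f_c = ss_t / (ss_wc / (2 * P * (N - 1)))   # blocks treated as fixed
    f_u = ss_t / (ss_bxt / (P - 1))            # blocks treated as random
    return f_c, f_u


def rejection_rates(sd_bxt=0.5, reps=2000, seed=1):
    """Share of simulated null experiments rejected at the nominal 5% level."""
    rng = random.Random(seed)
    results = [f_statistics(rng, sd_bxt) for _ in range(reps)]
    rate_fixed = sum(f_c > CRIT_FIXED for f_c, _ in results) / reps
    rate_random = sum(f_u > CRIT_RANDOM for _, f_u in results) / reps
    return rate_fixed, rate_random


rate_fixed, rate_random = rejection_rates()
print(f"blocks fixed:  {rate_fixed:.3f}")
print(f"blocks random: {rate_random:.3f}")
```

With these settings the fixed effects test should reject far more often than its nominal 5% level, while the random effects test should stay close to it, which is the pattern the list above describes.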

Making Conditional Inferences: Individually Randomized Design
Possible statistical analyses:
1. Ignore the blocking. Bad idea: may deflate actual significance levels of tests for treatment effects substantially (unless ρ = 0).
2. Include blocks as fixed effects. Correct significance levels and a more powerful test than for the unconditional analysis.
3. Include blocks as random effects. Bad idea: may deflate significance levels and reduce power.

Cluster Randomized (Hierarchical) Design
The issues about blocking in the cluster randomized design are the same as in the individually randomized design. The inference model will determine the most appropriate statistical analysis. Examining the properties of the statistical analysis may also reveal the weakness of the design for a given inference purpose. For example, a small number of blocks may provide only very uncertain inference to a universe of blocks based on sampling arguments.

Cluster Randomized (Hierarchical) Design
In this case you are adding a blocking factor crossed with treatment (p blocks), but clusters are still nested within treatments. Here $C_{ij}$ is the j-th cluster in the i-th block. Note that there are m clusters in each treatment per block.

        Block 1                              …    Block p
T       C_11, …, C_1m                        …    C_p1, …, C_pm
C       C_1(m+1), …, C_1(2m)                 …    C_p(m+1), …, C_p(2m)

Cluster Randomized (Hierarchical) Design
How does this impact the analysis? Think about a balanced design with 2mn students per block and p blocks, and the ANOVA partitioning of sums of squares and degrees of freedom.
Original partitioning:
$SS_{Total} = SS_T + SS_C + SS_{WC:T}$
$df_{Total} = df_T + df_C + df_{WC:T}$
$2mn - 1 = 1 + 2(m - 1) + 2m(n - 1)$
Original test statistic: $F = SS_T / (SS_C/df_C)$

Cluster Randomized (Hierarchical) Design
New partitioning:
$SS_{Total} = SS_T + SS_B + SS_{B \times T} + SS_{C:B \times T} + SS_{WC}$
$df_{Total} = df_T + df_B + df_{B \times T} + df_{C:B \times T} + df_{WC}$
$2mpn - 1 = 1 + (p - 1) + (p - 1) + 2p(m - 1) + 2pm(n - 1)$
New test statistic? Either $F = SS_T / (SS_{WC}/df_{WC})$ or $F = SS_T / (SS_{C:B \times T}/df_{C:B \times T})$.
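As a quick check on the blocked cluster-randomized partition, the sketch below sums the component degrees of freedom and compares them with 2mpn – 1. The function name and the example design sizes are illustrative assumptions.

```python
def df_blocked_cluster_design(p: int, m: int, n: int) -> dict:
    """Degrees of freedom with p blocks, m clusters per arm per block,
    and n individuals per cluster."""
    return {
        "T": 1,
        "B": p - 1,
        "BxT": p - 1,
        "C:BxT": 2 * p * (m - 1),   # clusters nested within block-by-treatment cells
        "WC": 2 * p * m * (n - 1),  # individuals within clusters
    }


# Hypothetical design: 4 blocks, 3 clusters per arm per block, 10 per cluster.
p, m, n = 4, 3, 10
parts = df_blocked_cluster_design(p, m, n)
assert sum(parts.values()) == 2 * m * p * n - 1
print(parts)
```

Note how few degrees of freedom sit at the block-by-treatment and cluster levels compared with the within-cluster level; the choice of error term therefore matters a great deal.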

Inference Models and Statistical Analyses: Cluster Randomized Design
Blocks are fixed under the conditional inference model, but clusters are typically random. In this case the correct test statistic is $F_C = SS_T / (SS_{C:B \times T}/df_{C:B \times T})$, and the F-distribution has 1 and 2p(m – 1) df.
Blocks are random under the unconditional inference model, and clusters are typically random as well. In this case there is no exact ANOVA test if there are block-by-treatment interactions, but a conservative test uses the test statistic $F_U = SS_T / (SS_B/df_B)$, and the F-distribution has 1 and (p – 1) df. (Large sample tests, e.g., based on HLM, are available.)

Inference Models and Statistical Analyses: Cluster Randomized Design
You can see that the error term has more df under the fixed effects model. If there is a treatment effect, the average value of the F-statistic is also larger under the fixed effects model. It is bigger by a factor proportional to [expression not preserved in this transcript], where $\omega_B = \sigma_{B \times T}^2 / \sigma_B^2$ is a treatment heterogeneity parameter and $\rho_B$ and $\rho_C$ are the block and cluster level intraclass correlations, respectively.

Possible Statistical Analyses: Cluster Randomized Design
Possible statistical analyses:
1. Ignore the blocking
2. Include blocks as fixed effects
3. Include blocks as random effects
Consequences depend on whether you want to make a conditional or unconditional inference.

Making Unconditional Inferences: Cluster Randomized Design
Possible statistical analyses:
1. Ignore the blocking. Bad idea: will inflate significance levels of tests for treatment effects substantially.
2. Include blocks as fixed effects. Bad idea: will inflate significance levels of tests for treatment effects substantially.
3. Include blocks as random effects. Correct significance levels but less power than the conditional analysis.

Making Conditional Inferences: Cluster Randomized Design
Possible statistical analyses:
1. Ignore the blocking. Bad idea: may deflate actual significance levels of tests for treatment effects substantially.
2. Include blocks as fixed effects. Correct significance levels and a more powerful test than for the unconditional analysis.
3. Include blocks as random effects. Not such a bad idea: significance levels unaffected.

Multi-center (Randomized Blocks) Design
The issues about blocking in the multicenter (randomized blocks) design are the same as in the cluster randomized design. The inference model will determine the most appropriate statistical analysis. Examining the properties of the statistical analysis may also reveal the weakness of the design for a given inference purpose. For example, a small number of blocks may provide only very uncertain inference to a universe of blocks based on sampling arguments.

Multi-center (Randomized Blocks) Design
In this case you are adding a blocking factor crossed with treatment (p blocks) and clusters, but clusters are still nested within blocks. Here $C_{ij}$ is the j-th cluster in the i-th block. Note that there are m clusters in each treatment per block and n individuals in each treatment in each cluster.

        Block 1                  …    Block p
        C_11   …   C_1m          …    C_p1   …   C_pm
T       …                        …    …
C       …                        …    …

Multi-center (Randomized Blocks) Design
How does this impact the analysis? Think about a balanced design with 2mn students per block, p blocks, and n individuals per cell, and the ANOVA partitioning of sums of squares and degrees of freedom.
Original partitioning:
$SS_{Total} = SS_T + SS_C + SS_{T \times C} + SS_{WC}$
$df_{Total} = df_T + df_C + df_{T \times C} + df_{WC}$
$2pmn - 1 = 1 + (pm - 1) + (pm - 1) + 2pm(n - 1)$
Original test statistic: $F = SS_T / (SS_{T \times C}/df_{T \times C})$

Multi-center (Randomized Blocks) Design
New partitioning:
$SS_{Total} = SS_T + SS_B + SS_{C:B} + SS_{B \times T} + SS_{C:B \times T} + SS_{WC}$
$df_{Total} = df_T + df_B + df_{C:B} + df_{B \times T} + df_{C:B \times T} + df_{WC}$
$2mpn - 1 = 1 + (p - 1) + p(m - 1) + (p - 1) + p(m - 1) + 2pm(n - 1)$
New test statistic? Either $F = SS_T / (SS_{WC}/df_{WC})$ or $F = SS_T / (SS_{B \times T}/df_{B \times T})$.
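The partition for the blocked multicenter design can be checked the same way. This sketch takes the degrees of freedom for clusters-by-treatment within blocks to be p(m – 1), the value that matches the error df of the conditional F test for this design and that makes the components sum to 2mpn – 1. The function name and the example sizes are illustrative assumptions.

```python
def df_blocked_multicenter_design(p: int, m: int, n: int) -> dict:
    """Degrees of freedom with p blocks, m clusters per block, and
    n individuals per arm within each cluster."""
    return {
        "T": 1,
        "B": p - 1,
        "C:B": p * (m - 1),         # clusters nested within blocks
        "BxT": p - 1,
        "C:BxT": p * (m - 1),       # cluster-by-treatment within blocks
        "WC": 2 * p * m * (n - 1),  # individuals within cells
    }


# Hypothetical design: 4 blocks, 3 clusters per block, 10 per arm per cluster.
p, m, n = 4, 3, 10
parts = df_blocked_multicenter_design(p, m, n)
assert sum(parts.values()) == 2 * m * p * n - 1
print(parts)
```

Using 2p(m – 1) for the cluster-by-treatment term instead would overshoot the total by p(m – 1), which is a handy way to catch a slip in the bookkeeping.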

Inference Models and Statistical Analyses: Randomized Blocks Design
Blocks are fixed under the conditional inference model, but clusters are typically random. In this case the correct test statistic is $F_C = SS_T / (SS_{C:B \times T}/df_{C:B \times T})$, and the F-distribution has 1 and p(m – 1) df.
Blocks are random under the unconditional inference model, and clusters are typically random as well. In this case the correct test statistic is $F_U = SS_T / (SS_{B \times T}/df_{B \times T})$, and the F-distribution has 1 and (p – 1) df.

Inference Models and Statistical Analyses: Randomized Blocks Design
You can see that the error term has more df under the fixed effects model. If there is a treatment effect, the average value of the F-statistic is also larger under the fixed effects model. It is bigger by a factor proportional to [expression not preserved in this transcript], where $\omega_B = \sigma_{B \times T}^2 / \sigma_B^2$ and $\omega_C = \sigma_{C \times T}^2 / \sigma_C^2$ are treatment heterogeneity parameters and $\rho_B$ and $\rho_C$ are the block and cluster level intraclass correlations, respectively.

Possible Statistical Analyses: Randomized Blocks Design
Possible statistical analyses:
1. Ignore the blocking
2. Include blocks as fixed effects
3. Include blocks as random effects
Consequences depend on whether you want to make a conditional or unconditional inference.

Making Unconditional Inferences: Randomized Blocks Design
Possible statistical analyses:
1. Ignore the blocking. Bad idea: will inflate significance levels of tests for treatment effects substantially.
2. Include blocks as fixed effects. Bad idea: will inflate significance levels of tests for treatment effects substantially.
3. Include blocks as random effects. Correct significance levels but less power than the conditional analysis.

Making Conditional Inferences: Randomized Blocks Design
Possible statistical analyses:
1. Ignore the blocking. Bad idea: may deflate actual significance levels of tests for treatment effects substantially.
2. Include blocks as fixed effects. Correct significance levels and a more powerful test than for the unconditional analysis.
3. Include blocks as random effects. Bad idea: may deflate significance levels and reduce power.

Another Easy Question
There was some attrition from my study after assignment. Does that cause a serious problem?
This is another simple question, but the answer is far from simple. One answer can be framed using concepts of experimental design.

Post Assignment Attrition
A different question has a simple answer: does that (attrition) cause a problem in principle? The simple answer to that question is YES! Randomized experiments with attrition no longer give model free, unbiased estimates of the causal effect of treatment. Whether the bias is serious or not depends on the model that generates the missing data.

Post Assignment Attrition
The design is changed by adding a crossed factor corresponding to missingness, like this:

        Observed    Missing
T
C

Now we can see a problem with estimating the treatment effect from only the observed part of the design: the observed treatment effect is only part of the total treatment effect.

Post Assignment Attrition
Suppose that the means are given by the μ’s and the proportions are given by the π’s, with $\pi_T$ and $\pi_C$ the proportions missing in each group:

        Observed                      Missing
        Proportion     Mean           Proportion     Mean
T       1 – π_T        μ_TO           π_T            μ_TM
C       1 – π_C        μ_CO           π_C            μ_CM

Post Assignment Attrition
The treatment effect on all individuals randomized is
$\delta = [(1 - \pi_T)\mu_{TO} + \pi_T \mu_{TM}] - [(1 - \pi_C)\mu_{CO} + \pi_C \mu_{CM}]$
When the proportion of dropouts is equal in T and C, so that $\pi_T = \pi_C = \pi$, the mean of the treatment effect on all individuals randomized is
$\delta = (1 - \pi)(\mu_{TO} - \mu_{CO}) + \pi(\mu_{TM} - \mu_{CM})$

Post Assignment Attrition
Rewriting this, we see that the average treatment effect for individuals assigned to treatment is
$\delta = (1 - \pi)\delta_O + \pi\delta_M$
where $\delta_O = \mu_{TO} - \mu_{CO}$ is the treatment effect among the individuals that are observed, $\delta_M = \mu_{TM} - \mu_{CM}$ is the treatment effect among the individuals that are not observed, and $\delta$ is the treatment effect among all individuals assigned. Thus bounds on $\delta_M$ imply bounds on $\delta$.
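The equal-attrition case can be sketched in a few lines. This assumes the overall effect is the proportion-weighted average of the observed and missing effects, δ = (1 – π)δ_O + πδ_M, which follows from weighting the two groups by their shares. The function names and the numbers are illustrative assumptions, not results from the talk.

```python
def overall_effect(pi: float, delta_obs: float, delta_miss: float) -> float:
    """Effect among all assigned when a share pi is missing in both arms."""
    return (1 - pi) * delta_obs + pi * delta_miss


def effect_bounds(pi: float, delta_obs: float,
                  miss_low: float, miss_high: float) -> tuple:
    """Bounds on the overall effect implied by bounds on the missing effect."""
    return (overall_effect(pi, delta_obs, miss_low),
            overall_effect(pi, delta_obs, miss_high))


# Hypothetical study: 20% attrition in each arm, observed effect of 0.30,
# and an assumed range of -0.10 to 0.30 for the effect among dropouts.
low, high = effect_bounds(pi=0.2, delta_obs=0.30, miss_low=-0.10, miss_high=0.30)
print(low, high)
```

The width of the interval is π times the width of the assumed range for the missing effect, so low attrition keeps the bounds tight even under pessimistic assumptions.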

Post Assignment Attrition
No estimate of the treatment effect is possible without an estimate of the treatment effect among the missing individuals. One possibility is to model (assume) that we know something about the treatment effect in the missing individuals. We can assume a range of values to get bounds on the possible treatment effect.

Post Assignment Attrition
When the attrition rate is not the same in the treatment groups ($\pi_T \ne \pi_C$), the analysis is trickier. One idea is to convince ourselves that the treatment effect for those who drop out is the same as for those who do not.

        Observed mean    Missing mean
T       90               33
C       67               10
T–C     23               23

Post Assignment Attrition
This does not assure that attrition has not altered the treatment effect. We have to know both $\mu_{TM}$ and $\mu_{CM}$ to identify the treatment effect; knowing $\delta_M = \mu_{TM} - \mu_{CM}$ is not enough.
[Table: n and mean for T, C, and T–C in the observed, missing, and total groups. The cell values are not preserved in this transcript; the surviving T–C entries are 23 and –23.]
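A short calculation shows how this can happen. It reads the means in the table as 90 and 33 for treatment (observed, missing) and 67 and 10 for control. The attrition rates below are hypothetical values chosen for illustration; the slides' own sample sizes are not available here.

```python
def overall_mean(pi_missing: float, mean_observed: float,
                 mean_missing: float) -> float:
    """Mean over everyone assigned, weighting by observed and missing shares."""
    return (1 - pi_missing) * mean_observed + pi_missing * mean_missing


mu_to, mu_tm = 90, 33    # treatment: observed and missing means
mu_co, mu_cm = 67, 10    # control: observed and missing means
pi_t, pi_c = 0.9, 0.1    # hypothetical attrition rates (assumption)

effect_observed = mu_to - mu_co   # effect among those observed
effect_missing = mu_tm - mu_cm    # effect among those missing
effect_total = (overall_mean(pi_t, mu_to, mu_tm)
                - overall_mean(pi_c, mu_co, mu_cm))

print(effect_observed, effect_missing, round(effect_total, 1))
```

Both subgroups show the same positive effect, yet the effect among all assigned is negative because the treatment group is dominated by its low-scoring dropouts. Equal subgroup effects therefore say nothing about the overall effect when attrition rates differ.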

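Post Assignment Attrition
Suppose that $B^L_{TM}$ and $B^L_{CM}$ are lower bounds on the means for missing individuals in the treatment and control groups, and $B^U_{TM}$ and $B^U_{CM}$ are the upper bounds. Then the lower and upper bounds on the treatment effect are
Lower: $[(1 - \pi_T)\mu_{TO} + \pi_T B^L_{TM}] - [(1 - \pi_C)\mu_{CO} + \pi_C B^U_{CM}]$
Upper: $[(1 - \pi_T)\mu_{TO} + \pi_T B^U_{TM}] - [(1 - \pi_C)\mu_{CO} + \pi_C B^L_{CM}]$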
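The bounding idea can be sketched as follows. The lower bound pairs the lowest plausible mean for missing treatment individuals with the highest plausible mean for missing controls, and the upper bound does the reverse. The function name, the attrition rates, and the 0 to 100 outcome range are illustrative assumptions.

```python
def treatment_effect_bounds(pi_t, pi_c, mu_to, mu_co, b_tm, b_cm):
    """Lower and upper bounds on the effect among all individuals assigned.

    pi_t, pi_c: proportions missing in treatment and control.
    mu_to, mu_co: observed means in treatment and control.
    b_tm, b_cm: (lower, upper) bounds on the missing-group means.
    """
    lower = (((1 - pi_t) * mu_to + pi_t * b_tm[0])
             - ((1 - pi_c) * mu_co + pi_c * b_cm[1]))
    upper = (((1 - pi_t) * mu_to + pi_t * b_tm[1])
             - ((1 - pi_c) * mu_co + pi_c * b_cm[0]))
    return lower, upper


# Hypothetical case: 20% and 10% attrition, outcome scale bounded by 0 and 100.
lower, upper = treatment_effect_bounds(
    pi_t=0.2, pi_c=0.1, mu_to=90, mu_co=67, b_tm=(0, 100), b_cm=(0, 100)
)
print(lower, upper)
```

These bounds use only the logical range of the outcome, so they are wide; narrower assumed ranges for the missing means give tighter bounds at the cost of stronger assumptions.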

Post Assignment Attrition
Note that none of the results on attrition involve sampling or estimation error. Results get more complex if we take this into account, but the basic ideas are those given here.

Conclusions
Many simple questions arise in connection with field experiments. The answers to these questions often require thinking through complex aspects of:
- the design
- the inference model
- assumptions about missing data
No correct answers are possible without recognizing these complexities.