Recent Developments in the Design of Cluster Randomized Trials
Society of Clinical Trials & International Clinical Trials Methodology Conference, May 9th 2017
Elizabeth L. Turner, Assistant Professor, Department of Biostatistics & Bioinformatics, Duke University; Director, Research Design and Analysis Core, Duke Global Health Institute (DGHI)
Joint work with: Fan Li, Duke University; John Gallis, Duke University; Melanie Prague, INRIA, Bordeaux & Harvard, USA; David Murray, Office of Disease Prevention and Office of the Director, NIH
Funding: National Institutes of Health grants R01 HD075875, R37 AI51164, R01 AI110478, K01 MH104310
Cluster Randomized Trials (CRTs) in a Nutshell
This is a talk on "developments," but it is important to start with the basics, especially for those who are new to the field.
Motivating Example: Health and Literacy Intervention Cluster Randomized Controlled Trial (CRT)
Hypothesis: school-based screening and treating children for malaria will lead to reduced prevalence of malaria
(Diagram: malaria screening and treatment → malaria)
Motivating Example: Health and Literacy Intervention Cluster Randomized Controlled Trial (CRT)
(Diagram: age, bed-net use, geographic location, etc. could confound the effect of malaria screening and treatment on malaria)
Motivating Example: Health and Literacy Intervention Cluster Randomized Controlled Trial (CRT)
(Diagram: randomization removes the influence of age, bed-net use, geographic location, etc. on receipt of the intervention)
Motivating Example: Health and Literacy Intervention Cluster Randomized Controlled Trial (CRT)
Randomization of schools (clusters); outcomes measured on children (individuals nested in schools)
Repeated measurements on children (time nested in individuals): baseline, 12 months, 24 months
Why Cluster Randomization?
Intervention at the group level: school-based screening and treatment.
Intervention manipulates the environment: water pump in a village.
Logistical and practical reasons: ease of implementation.
For infectious diseases: to account for herd immunity and to exploit network structures.
Cluster randomization has lower statistical efficiency (bigger SE), so why use it?
Note: blinding is often not possible in CRTs.
Parallel-Arm CRT: Baseline → Follow-up
May only have follow-up measurements and/or more follow-up time points.
In all examples, consider 20 individuals at each time point: 4 clusters of 5 individuals each. Other variants exist for each design; could be cohort or cross-sectional.
Alternatives all involve randomization and some form of clustering that must be appropriately accounted for in both the design and analysis.
Network-Randomized CRT: Baseline → Follow-up
Like a regular parallel-arm CRT, with clusters defined by a network of individuals around an index case. Expected to be a cohort. Specific examples: snowball CRT and ring trial.
Network-Randomized CRT: Baseline → Follow-up
Example: ring trial of Ebola vaccine (cf. Ellenberg plenary talk yesterday)
Outline and Objectives
Objectives
Highlight recent developments via four design challenges (focus: parallel-arm CRT):
Clustering
Change in individuals vs. change in population
Baseline imbalance: covariates and cluster size
Sample size and power
Describe alternative clustered designs that use randomization.
What I've tried to do with this talk is make it accessible to people who know very little about cluster RCTs, but hopefully also bring some ideas and thoughts to those who are more experienced, possibly even more than me. The goal is to give an introduction and to highlight some key challenges in the design and analysis of CRTs, informed by my personal experience. It won't be an exhaustive list, and I am sure some of you have even more experience than me and can contribute others. I'll aim to speak for 45-50 minutes and am happy to take questions along the way; I can tune what I cover towards the end depending on how things have gone. Please don't hesitate to ask for clarification.
1. Clustering: its nature, what it is, and its implications for design and for analysis.
2. Baseline balance (or imbalance): CRTs, especially in health care systems, might have very few clusters, which increases the probability of chance imbalance.
3. Ethical and acceptability issues: in some cases it might not be possible to obtain individual consent. Is this acceptable to the community and/or the individuals? Also, many CRTs evaluate effectiveness, aiming to determine whether an intervention works in real life. It might be difficult to convince communities to participate if they think there is only a 50% chance of getting what they believe will be an effective intervention. Instead, it might be more acceptable to run some kind of crossover study so that all clusters eventually receive the intervention; an increasingly common example is the stepped wedge design.
4. Cohort vs. cross-sectional design: individually randomized trials are cohort by definition (I think), whereas a CRT might do something different. For example, in an ongoing study in Kenya, we want to know the population-level health effects of a drug subsidy program to improve targeting of anti-malarials at the community level, so we need to understand the prevalence of malaria in a region and the impact of the intervention across the population of those who have a febrile illness. It wouldn't make sense to recruit a cohort at baseline and follow only that cohort over time; we want to capture outcomes for those who need to be treated, i.e., those with malaria, and these could be different people at different points in time.
5. Selection bias and blinding: I won't say much about this, but it is related to baseline balance. It can be difficult to blind communities and individuals to the intervention; possible solutions include pseudo-cluster randomization.
6. Other challenges include implementation and community buy-in, loss to follow-up, measurement bias, and measurement error.
References – See Article Online first at American Journal of Public Health
1. Parallel-Arm CRT
Design Challenge 1 Clustering
Baseline Clustering: Malaria Prevalence by School Additional challenge: structural missingness Halliday (2012), Tropical Medicine & International Health, 17(5): 532-549
Measure of Clustering: ICC (⍴)
Intra-cluster correlation coefficient (ICC) (Eldridge 2009)
Most commonly used measure of clustering for CRTs
Range: 0-1; typically < 0.2 in CRTs
CONSORT: report the ICC in published trials
An estimate of clustering is needed for the sample size calculation; many articles have now published ICC estimates. Our study: 0.01 for malaria prevalence.
Complete Clustering: ⍴ = 1
10 clusters (e.g. 10 schools) of 5 children each: 4 clusters with 100% malaria prevalence, 6 clusters with 0% prevalence.
More than 1 child per school gives no more information than 1 child per school, since every child in a given school has the same outcome.
No Clustering: ⍴ = 0
20% prevalence of malaria in each school; no structure by school, more like a random sample of children.
Some Clustering: 0 < ⍴ < 1
A more typical situation: e.g. prevalence ranges from 0% to 80% across schools.
Clustering in CRTs
Participants in the same cluster are more similar to each other than to participants in other clusters.
Implication: reduced effective sample size. For 50 children in 10 schools, the effective sample size is between 10 and 50.
A major challenge in both design and analysis.
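The effective sample size above can be made concrete with the usual design-effect formula. A minimal sketch (the function name is mine, not from the talk), using the slide's numbers of 50 children in 10 schools of 5:

```python
# Effective sample size of a CRT: n_eff = n / DE, where the design
# effect is DE = 1 + (m - 1) * rho for m individuals per cluster.
def effective_sample_size(n_total, cluster_size, icc):
    design_effect = 1 + (cluster_size - 1) * icc
    return n_total / design_effect

# 50 children in 10 schools of 5 children each:
print(effective_sample_size(50, 5, 0.0))   # rho = 0: 50.0 (no clustering penalty)
print(effective_sample_size(50, 5, 0.01))  # rho = 0.01 (as in our study): ~48.1
print(effective_sample_size(50, 5, 1.0))   # rho = 1: 10.0 (one informative child per school)
```

The two extremes reproduce the "10 to 50" range on the slide: with complete clustering only the 10 schools contribute information, with no clustering all 50 children do.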
Alternative Measure of Clustering: CV
The ICC (⍴) and the coefficient of variation of cluster-level outcomes (CV, k) are related measures. For a binary outcome with prevalence π, the relationship is ⍴ = (kπ)² / (π(1−π)).
In HALI: k = 0.2 and π = 0.2, so the between-cluster variance is (0.2×0.2)² and the overall (binomial) variance is 0.2×(1−0.2) = 0.2×0.8. Therefore ICC = (0.2×0.2)² / (0.2×0.8) = 0.2³/0.8 = 0.01.
* Alternative ICC definitions: Eldridge et al., International Statistical Review, 2009.
Other Developments in Clustering
Sample size calculations: many use the ICC; Hayes & Moulton (2009) focus more on the CV; Donner & Klar (2000) note the two agree for rates.
Imprecision in the clustering measure can under-power a CRT; several articles address imprecision in ICC estimates.
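The ICC-from-CV relationship for a binary outcome can be checked in a few lines. A sketch using the HALI values from the previous slide (the function name is mine):

```python
# For a binary outcome with prevalence pi and between-cluster
# coefficient of variation k, one common relationship is
#   rho = (k * pi)**2 / (pi * (1 - pi)),
# i.e. between-cluster variance over total (binomial) variance.
def icc_from_cv(k, pi):
    between_var = (k * pi) ** 2   # (CV * mean)^2 = between-cluster variance
    total_var = pi * (1 - pi)     # binomial (overall) variance
    return between_var / total_var

# HALI trial values: k = 0.2, prevalence pi = 0.2
print(icc_from_cv(0.2, 0.2))  # ~0.01, matching the slide's calculation
```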
Design Challenge 1: Clustering Solution Design for it (inflate sample size) & Analyze data accounting for it
Design Challenge 2 Change in individuals vs. change in population
Cohort vs. cross-sectional design
Ongoing CRT (a): intervention to improve targeting of anti-malarials; interested in population-level impact; repeated cross-sectional surveys.
Motivating example (b): school-based malaria screening and treatment; interested in ability to clear recurrent infections; closed cohort of children.
Ongoing CRT in South Africa (c): home-based HIV testing; interested in linkage to care; open cohort.
(a) Laktabai, BMJ Open, 2017; (b) Halliday, PLoS Medicine, 2015; (c) Iwuji, PLoS Medicine, 2016
Design Challenge 2: Change in individuals vs. change in population
Solution: match the design to the research question, power accordingly, and analyze appropriately.
Design Challenge 3 Baseline Imbalance My impression is that this might not be thought about enough
Design Challenge 3A Baseline Cluster Size Imbalance
Baseline imbalance in cluster size
Definition: clusters have different numbers of individuals.
Baseline Cluster Size Imbalance in CRTs
Implications for efficiency.
Account for it in the design (power): sample size adjustments based on cluster size characteristics and relative efficiency. If you do not account for it in the design, you have lower power [40-43].
Account for it in the analysis: if you do not account for it in the analysis, the Type I error is inflated [44].
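One common sample size adjustment for unequal cluster sizes (e.g. Eldridge et al. 2006, used here as an assumed illustration rather than the talk's specific method) replaces the cluster size m in the usual design effect by a term involving the coefficient of variation of the cluster sizes:

```python
import statistics

# Design effect with unequal cluster sizes: replace m in 1 + (m-1)*rho
# by (cv^2 + 1) * m_bar, where m_bar is the mean cluster size and cv is
# the coefficient of variation of the cluster sizes (Eldridge et al. 2006).
def design_effect_unequal(cluster_sizes, icc):
    m_bar = statistics.mean(cluster_sizes)
    cv = statistics.pstdev(cluster_sizes) / m_bar
    return 1 + ((cv**2 + 1) * m_bar - 1) * icc

sizes_equal = [5, 5, 5, 5]    # balanced: reduces to 1 + (m-1)*rho
sizes_unequal = [2, 4, 6, 8]  # same mean size (5), but varying
print(design_effect_unequal(sizes_equal, 0.05))    # 1.2
print(design_effect_unequal(sizes_unequal, 0.05))  # 1.25, a larger penalty
```

The example cluster sizes and ICC are hypothetical; the point is that, for a fixed mean size, more variable cluster sizes inflate the design effect and hence the required sample size.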
Design Challenge 3A Baseline Cluster Size Imbalance Solution Design with equal cluster sizes or Accommodate in sample size calculations & analysis
Design Challenge 3B Baseline Covariate Imbalance
Motivating Example: Health and Literacy Intervention Cluster Randomized Controlled Trial (CRT)
By chance, age, bed-net use, geographic location, etc. may be imbalanced between arms even after randomization of schools.
Baseline Covariate Imbalance Threat to internal validity of trial Next talk - design strategies (Fan Li) Matching Stratification Covariate-constrained randomization
Design Challenge 3B Baseline Covariate Imbalance Solution Use matching, stratification or constrained randomization & Analyze accounting for the design strategy chosen
Summary Design challenges and solutions
Design challenges and solutions
1. Clustering. Design solution: account for it in the sample size. Analytic solution: account for clustering.
2. Cohort vs. cross-sectional design. Design solution: match to the research question. Analytic solution: match to the design.
3A. Baseline imbalance in cluster size. Design solution: avoid it or account for it. Analytic solution: account for it.
3B. Baseline imbalance in covariates. Design solution: pair matching, stratification, or constrained randomization. Analytic solution: restrict the inference space (e.g. paired t-test).
Each choice has implications for sample size and power.
Sample Size and Power for CRTs
CRT Sample Size and Power
Ignoring a positive ICC inflates the Type I error.
Two penalties in cluster randomization (Cornfield 1978): extra variation and limited degrees of freedom (df).
Account for both in the sample size, e.g. variance inflated by the ICC and a t-test with appropriate df.
The sample size of an adequately powered individually randomized trial must be inflated by the design effect: 1 + (m−1)⍴.
Many developments in both theory and implementation in software.
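The design-effect inflation above can be sketched as a back-of-envelope calculation. The function name and the example numbers (400 participants, clusters of 20, ⍴ = 0.05) are hypothetical illustrations, and the second penalty (t rather than z quantiles, with df based on the number of clusters) is noted but not implemented:

```python
import math

# Sketch: inflate an individually randomized trial's sample size by the
# design effect DE = 1 + (m - 1) * rho.  The second penalty (limited df)
# additionally requires t rather than z quantiles, with df based on the
# number of clusters; that refinement is omitted here for brevity.
def crt_sample_size(n_individual, cluster_size, icc):
    design_effect = 1 + (cluster_size - 1) * icc
    n_total = n_individual * design_effect
    clusters_per_arm = math.ceil(n_total / (2 * cluster_size))
    return n_total, clusters_per_arm

# e.g. an RCT needing 400 participants, clusters of size 20, rho = 0.05:
n, k = crt_sample_size(400, 20, 0.05)
print(n, k)  # 780.0 participants in total, 20 clusters per arm
```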
CRT Sample Size and Power
Many recent developments: fixed number of clusters; repeated measurements over time; additional structural hierarchies of clustering; varying (i.e. imbalanced) cluster sizes.
Two recent reviews: Rutterford, Int J Epi, 2015; Gao, Contemp Clin Trials, 2015. Recent book: Moerbeek & Teerenstra.
Many developments in both theory and implementation in software.
What if the parallel-arm CRT design does not meet research needs?
2. Alternatives to the Parallel-Arm CRT
Parallel-Arm CRT: Baseline → Follow-up
May only have follow-up measurements and/or more follow-up time points.
In all examples, consider 20 individuals at each time point: 4 clusters of 5 individuals each. Other variants exist for each design; could be cohort or cross-sectional.
Alternatives all involve randomization and some form of clustering that must be appropriately accounted for in both the design and analysis.
Crossover CRT: Time 1 → Time 2
Randomized: control then intervention, or intervention then control.
Stepped Wedge CRT: Baseline → Follow-up, with control periods and intervention periods.
Some features: the incomplete stepped wedge design has the same number of measurements as the parallel design but adds the challenge of a possible time effect that could confound the intervention-outcome relationship. There is at least some within-cluster and between-cluster information at each time point, but time certainly needs to be accounted for in the analysis. The complete stepped wedge design has many more measurements and raises the question of how to apportion the post one-year intervention period.
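The stepped wedge layout, in which clusters cross from control to intervention at staggered steps and never cross back, can be written down as a cluster-by-period schedule. A minimal illustrative sketch (function name mine):

```python
# Illustrative stepped-wedge schedule: entry [i][t] is 1 if cluster i
# receives the intervention in period t, 0 if it is still in control.
# Cluster i switches at period i + 1, so period 0 is an all-control baseline.
def stepped_wedge_schedule(n_clusters, n_periods):
    return [[1 if t > i else 0 for t in range(n_periods)]
            for i in range(n_clusters)]

for row in stepped_wedge_schedule(4, 5):
    print(row)
# [0, 1, 1, 1, 1]
# [0, 0, 1, 1, 1]
# [0, 0, 0, 1, 1]
# [0, 0, 0, 0, 1]
```

The staircase pattern makes the confounding concern visible: later periods have more intervention clusters, so any secular time trend is partially aligned with treatment and must be modeled in the analysis.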
Pseudo-Cluster Randomized Trial: Baseline → Follow-up
Randomized: predominantly intervention vs. predominantly control.
Individually Randomized Group Treatment Trial: Baseline → Follow-up
Individual randomization.
Individually Randomized Group Treatment Trial: Baseline → Follow-up
Clustering arises for some individuals due to the group intervention.
Rationale for Alternative Clustered Designs (with approximate numbers of published studies):
Crossover CRT: gain power; ~90 (Arnup 2016)
Stepped wedge: logistical/ethical; ~50 (Hemming 2017)
Pseudo-cluster: selection bias; <10 (Turner 2017)
IRGT trial: no clusters a priori; >32 (Pals 2008)
Summary
Objectives
Highlight recent developments via four design challenges (focus: parallel-arm CRT): clustering; change in individuals vs. change in population; baseline imbalance (covariates and cluster size); sample size and power.
Describe alternative clustered designs that use randomization.
Thank you Any questions? References listed “Online First” in American Journal of Public Health