Recent Developments in the Design of Cluster Randomized Trials. Society for Clinical Trials & International Clinical Trials Methodology Conference, May 2017.

Presentation transcript:

Recent Developments in the Design of Cluster Randomized Trials
Society for Clinical Trials & International Clinical Trials Methodology Conference, May 9th 2017
Elizabeth L. Turner, Assistant Professor, Department of Biostatistics & Bioinformatics, Duke University; Director, Research Design and Analysis Core, Duke Global Health Institute (DGHI)
Joint work with: Fan Li, Duke University; John Gallis, Duke University; Melanie Prague, INRIA Bordeaux & Harvard, USA; David Murray, Office of Disease Prevention and Office of the Director, NIH
Funding: National Institutes of Health grants R01 HD075875, R37 AI51164, R01 AI110478, K01 MH104310

Motivating Example; Cluster Randomized Trials (CRTs) in a Nutshell. This is a talk on "developments", but it is important to start with the basics, especially for those who are new to the field.

Motivating Example: the Health and Literacy Intervention Cluster Randomized Controlled Trial (CRT). Hypothesis: school-based screening and treatment of children for malaria will lead to reduced prevalence of malaria.

Diagram: malaria screening and treatment → malaria, with potential confounders such as age, bed-net use, and geographic location. Randomization breaks the link between the intervention and these confounders.

Design: randomization of schools (clusters), with outcomes measured on children (individuals nested within schools) and repeated measurements on children (time nested within individuals) at baseline, 12 months, and 24 months.

Why Cluster Randomization? The intervention may operate at the group level (school-based screening and treatment), the intervention may manipulate the environment (a water pump in a village), or there may be logistical and practical reasons (ease of implementation). For infectious diseases, cluster randomization can account for herd immunity and exploit network structures. Cluster randomization has lower statistical efficiency (bigger standard errors), so these are the reasons to use it despite that cost. Note: blinding is often not possible in a CRT.

Parallel-Arm CRT: measurements at baseline and follow-up. Some trials may have only follow-up measurements and/or more follow-up time points. In all examples, consider 20 individuals at each time point, with 5 individuals in each of 4 clusters. Other variants exist for each design, and each could be cohort or cross-sectional. The alternatives presented later all involve randomization and some form of clustering that must be appropriately accounted for in both the design and the analysis.

Network-Randomized CRT: like a regular parallel-arm CRT, but with clusters defined by the network of individuals around an index case; measurements at baseline and follow-up. Expected to be a cohort. Specific examples: the snowball CRT and the ring trial.

Network-Randomized CRT example: the ring trial of an Ebola vaccine (cf. the Ellenberg plenary talk yesterday).

Outline and Objectives

Objectives: highlight recent developments via four design challenges, with a focus on the parallel-arm CRT (clustering; change in individuals vs. change in the population; baseline imbalance in covariates and cluster size; sample size and power), and describe alternative clustered designs that use randomization.
Speaker notes: What I've tried to do with this talk is to make it accessible to people who know very little about cluster RCTs, but hopefully also to bring some ideas and thoughts to those who are more experienced, possibly even more than me. The goal is to give an introduction and to highlight some key challenges in the design and analysis of CRTs, informed by my personal experience. It won't be an exhaustive list, and I am sure some of you have even more experience than me and can contribute others. I'll aim to speak for 45-50 minutes and am happy to take questions along the way; I can tune what I cover towards the end depending on how things have gone. Please don't hesitate to ask for clarification.
1. Clustering: its nature, what it is, and its implications for design and for analysis.
2. Baseline balance (or imbalance): CRTs, especially in health care systems, might have very few clusters, which increases the probability of chance imbalance.
3. Ethical and acceptability issues: in some cases it might not be possible to obtain individual consent; is this acceptable to the community and/or the individuals? Also, many CRTs evaluate effectiveness, that is, whether an intervention works in real life. It might be difficult to convince communities to participate if they think there is only a 50% chance of getting what they believe is an effective intervention; instead, it might be more acceptable to run some kind of crossover study so that all clusters eventually receive the intervention. An increasingly common example is the stepped wedge design.
4. Cohort vs. cross-sectional design: individually randomized trials are cohorts by definition (I think), whereas one might do something different with a CRT. For example, in an ongoing study in Kenya we want to know the population-level health effects of a drug subsidy program intended to improve targeting of anti-malarials at the community level, so we need to understand the prevalence of malaria in a region and the impact of the intervention across the population of those with a febrile illness. It would not make sense to recruit a cohort at baseline and follow only that cohort over time; we want to capture outcomes for those who need to be treated, i.e., those with malaria, and this could be different people at different points in time.
5. Selection bias and blinding: I won't say much about this, but it is related to baseline balance; it can be difficult to blind communities and individuals to the intervention. Pseudo-cluster randomization is one possible solution.
6. Other challenges include implementation, community buy-in, loss to follow-up, measurement bias, and measurement error.

References: see the article, online first at the American Journal of Public Health.

1. Parallel-Arm CRT

Design Challenge 1: Clustering

Baseline Clustering: Malaria Prevalence by School (figure). Additional challenge: structural missingness. Halliday (2012), Tropical Medicine & International Health, 17(5): 532-549.

Measure of Clustering: the intra-cluster correlation coefficient, ICC (ρ) (Eldridge 2009). It is the most commonly used measure of clustering for CRTs; its range is 0-1, and it is typically < 0.2 in CRTs. CONSORT recommends reporting it in published trials. An estimate of clustering is needed for sample size calculations, and many articles have now published such estimates. In our study, the ICC is 0.01 for malaria prevalence.
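As a concrete illustration (not part of the original slides), the ICC for a binary outcome can be estimated from cluster-level data with the standard one-way ANOVA estimator; the sketch below assumes hypothetical data with one cluster identifier and one 0/1 malaria indicator per child.

```python
# A minimal, hypothetical sketch (not from the talk): one-way ANOVA estimator of
# the ICC for a binary outcome, given a cluster id and a 0/1 outcome per child.
import numpy as np

def anova_icc(cluster_ids, y):
    cluster_ids = np.asarray(cluster_ids)
    y = np.asarray(y, dtype=float)
    labels = np.unique(cluster_ids)
    k = len(labels)                                        # number of clusters
    sizes = np.array([np.sum(cluster_ids == c) for c in labels])
    means = np.array([y[cluster_ids == c].mean() for c in labels])
    n_total, grand_mean = sizes.sum(), y.mean()

    # Between- and within-cluster mean squares
    msb = np.sum(sizes * (means - grand_mean) ** 2) / (k - 1)
    msw = sum(np.sum((y[cluster_ids == c] - m) ** 2)
              for c, m in zip(labels, means)) / (n_total - k)

    # Adjusted mean cluster size (equals the common size when sizes are equal)
    m0 = (n_total - np.sum(sizes ** 2) / n_total) / (k - 1)
    return (msb - msw) / (msb + (m0 - 1) * msw)

# Hypothetical example: 10 schools of 5 children each
rng = np.random.default_rng(1)
school = np.repeat(np.arange(10), 5)
prevalence_by_school = rng.beta(2, 8, size=10)             # school prevalences near 20%
malaria = rng.binomial(1, prevalence_by_school[school])
print(round(anova_icc(school, malaria), 3))
```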

Complete Clustering: ρ = 1. 10 clusters (e.g., 10 schools) of 5 children each: 4 clusters with 100% malaria prevalence and 6 clusters with no malaria, i.e., 0% prevalence. Sampling more than 1 child per school gives no more information than 1 child per school, since every child in a given school has the same outcome.

No Clustering: ρ = 0. 20% prevalence of malaria in each school; there is no structure by school, so the data look more like a simple random sample of children.

Some Clustering: 0 < ρ < 1. A more typical situation: e.g., school-level prevalence ranges from 0% to 80%.
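To make these three scenarios concrete, the following hypothetical sketch (not from the talk) simulates school-level prevalences using a beta-binomial data-generating model, one common way to induce a target ICC ρ for binary outcomes.

```python
# A minimal, hypothetical sketch: simulate school-level malaria prevalences
# under different amounts of clustering. With cluster prevalences drawn from a
# beta distribution with mean pi, the implied within-school ICC equals rho.
import numpy as np

def simulate_school_prevalences(rho, pi=0.2, n_schools=10, children_per_school=5, seed=0):
    rng = np.random.default_rng(seed)
    if rho <= 0:                       # no clustering: every school shares prevalence pi
        p = np.full(n_schools, pi)
    elif rho >= 1:                     # complete clustering: each school is all-or-nothing
        p = rng.binomial(1, pi, size=n_schools).astype(float)
    else:                              # intermediate clustering via a beta distribution
        a = pi * (1 - rho) / rho
        b = (1 - pi) * (1 - rho) / rho
        p = rng.beta(a, b, size=n_schools)
    cases = rng.binomial(children_per_school, p)    # malaria cases per school
    return cases / children_per_school              # observed school-level prevalences

for rho in (0.0, 0.2, 1.0):
    print(rho, simulate_school_prevalences(rho))
```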

Clustering in CRTs: participants in the same cluster are more similar to each other than to participants in other clusters. Implication: a reduced effective sample size; 50 children in 10 schools gives an effective sample size somewhere between 10 and 50. Clustering is a major challenge in both design and analysis.
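The usual approximation behind the 10-50 range, shown in the small sketch below (not code from the talk), divides the total sample size by the design effect 1 + (m - 1)ρ for clusters of equal size m.

```python
# Effective sample size under the standard equal-cluster-size approximation:
#   ESS = N / (1 + (m - 1) * rho)
def effective_sample_size(n_clusters, cluster_size, rho):
    n_total = n_clusters * cluster_size
    design_effect = 1 + (cluster_size - 1) * rho
    return n_total / design_effect

# 50 children in 10 schools of 5: ESS runs from 50 (rho = 0) down to 10 (rho = 1)
for rho in (0.0, 0.01, 0.1, 1.0):
    print(rho, round(effective_sample_size(10, 5, rho), 1))
```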

Alternative Measure of Clustering: the coefficient of variation of cluster-level outcomes (CV, k). The ICC (ρ) and the CV are related, with different forms of the relationship for continuous and binary* outcomes. In HALI, for the binary malaria outcome with prevalence 0.2 and k = 0.2, the overall (binomial) variance is assumed to be 0.2 × (1 − 0.2) = 0.2 × 0.8, so ICC = (0.2 × 0.2)² / (0.2 × 0.8) = 0.2³ / 0.8 = 0.01. * Alternative ICC definitions: Eldridge et al., International Statistical Review, 2009.

Other Developments in Clustering. Sample size calculations: many use the ICC; Hayes & Moulton (2009) focus more on the CV; Donner & Klar (2000) agree for rates. Imprecision in the clustering measure can under-power a CRT; several articles address imprecision in the ICC.

Design Challenge 1: Clustering. Solution: design for it (inflate the sample size) and analyze the data accounting for it.

Design Challenge 2: Change in individuals vs. change in population

Cohort vs. cross-sectional design. Ongoing CRT (a): an intervention to improve targeting of anti-malarials; interest is in population-level impact; repeated cross-sectional surveys. Motivating example (b): school-based malaria screening and treatment; interest is in the ability to clear recurrent infections; closed cohort of children. Ongoing CRT in South Africa (c): home-based HIV testing; interest is in linkage to care; open cohort. (a) Laktabai, BMJ Open, 2017; (b) Halliday, PLoS Medicine, 2015; (c) Iwuji, PLoS Medicine, 2016.

Design Challenge 2: Change in individuals vs. change in population. Solution: match the design to the research question, power accordingly, and analyze appropriately.

Design Challenge 3: Baseline Imbalance. My impression is that this might not be thought about enough.

Design Challenge 3A: Baseline Cluster Size Imbalance

Baseline imbalance in cluster size. Definition: clusters have different sizes.

Baseline Cluster Size Imbalance in CRTs: implications for efficiency. Account for it in the design → power: make sample size adjustments based on cluster size characteristics and relative efficiency; if you do not account for it in the design, you have lower power. Account for it in the analysis → Type I error: if you do not account for it in the analysis, the Type I error is inflated.
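One commonly cited approximation for the design effect under unequal cluster sizes uses the mean cluster size and the coefficient of variation of the sizes; the sketch below uses that approximation (treated here as an assumption, not a formula given in the talk) with hypothetical cluster sizes.

```python
# A sketch of a commonly used approximation (an assumption here) for the design
# effect when cluster sizes vary:
#   DE ~= 1 + ((cv^2 + 1) * m_bar - 1) * rho
# where m_bar is the mean cluster size and cv the coefficient of variation of
# the cluster sizes. With equal sizes (cv = 0) it reduces to 1 + (m - 1) * rho.
import numpy as np

def design_effect_unequal(cluster_sizes, rho):
    sizes = np.asarray(cluster_sizes, dtype=float)
    m_bar = sizes.mean()
    cv = sizes.std() / m_bar
    return 1 + ((cv ** 2 + 1) * m_bar - 1) * rho

equal = [5] * 10
unequal = [2, 3, 3, 4, 5, 5, 6, 7, 7, 8]               # hypothetical, same total size
print(round(design_effect_unequal(equal, 0.01), 4))    # 1.04, the equal-size design effect
print(round(design_effect_unequal(unequal, 0.01), 4))  # slightly larger
```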

Design Challenge 3A: Baseline Cluster Size Imbalance. Solution: design with equal cluster sizes, or accommodate unequal sizes in the sample size calculations and the analysis.

Design Challenge 3B: Baseline Covariate Imbalance

Motivating Example (Health and Literacy Intervention CRT): even with randomization, the arms may by chance be imbalanced on covariates such as age, bed-net use, and geographic location.

Baseline Covariate Imbalance: a threat to the internal validity of the trial. Design strategies (covered in the next talk, by Fan Li): matching, stratification, and covariate-constrained randomization.

Design Challenge 3B: Baseline Covariate Imbalance. Solution: use matching, stratification, or constrained randomization, and analyze accounting for the design strategy chosen.
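For readers unfamiliar with covariate-constrained randomization, the following hypothetical sketch (not the talk's or Fan Li's algorithm) shows the basic idea: enumerate candidate allocations of clusters to arms, keep only the allocations that are well balanced on cluster-level covariates, and then randomly select one from that constrained set.

```python
# A hypothetical sketch of covariate-constrained randomization: enumerate
# candidate allocations of clusters to two arms, keep the allocations best
# balanced on cluster-level covariates, then pick one at random.
import itertools
import random
import numpy as np

def constrained_randomization(covariates, n_treated, keep_fraction=0.1, seed=2017):
    covariates = np.asarray(covariates, dtype=float)     # one row per cluster
    n_clusters = covariates.shape[0]
    sd = covariates.std(axis=0)
    allocations, scores = [], []
    for treated in itertools.combinations(range(n_clusters), n_treated):
        arm = np.zeros(n_clusters, dtype=bool)
        arm[list(treated)] = True
        # Balance score: sum of squared standardized differences in covariate means
        diff = (covariates[arm].mean(axis=0) - covariates[~arm].mean(axis=0)) / sd
        allocations.append(treated)
        scores.append(np.sum(diff ** 2))
    cutoff = np.quantile(scores, keep_fraction)          # keep the best-balanced set
    candidates = [a for a, s in zip(allocations, scores) if s <= cutoff]
    return random.Random(seed).choice(candidates)        # clusters assigned to intervention

# Hypothetical cluster-level covariates: baseline prevalence and school size
covs = [[0.10, 40], [0.15, 55], [0.22, 60], [0.05, 35],
        [0.30, 80], [0.18, 45], [0.25, 70], [0.12, 50]]
print(constrained_randomization(covs, n_treated=4))
```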

Summary: Design challenges and solutions

Design challenges and solutions (design solution; analytic solution):
1. Clustering: account for it in the sample size; account for clustering in the analysis.
2. Cohort vs. cross-sectional design: match the design to the research question; match the analysis to the design.
3A. Baseline imbalance in cluster size: avoid it or account for it; account for it in the analysis.
3B. Baseline imbalance in covariates: pair matching, stratification, or constrained randomization; restrict the inference space accordingly, e.g., a paired t-test.
Each of these choices has implications for sample size and power.

Sample Size and Power for CRTs

CRT Sample Size and Power. Ignoring a positive ICC → inflated Type I error. There are two penalties in cluster randomization (Cornfield 1978): extra variation and limited degrees of freedom (df). Account for both in the sample size, e.g., variance inflated by the ICC and a t-test with the appropriate df; simply inflating an individually randomized (RCT) sample size by the design effect 1 + (m - 1)ρ is inadequate. There have been many developments, both in theory and in software implementation.
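As a rough illustration of the first penalty only (a standard textbook-style calculation with hypothetical inputs, not code from the talk), the sketch below inflates an individually randomized sample size by the design effect; with few clusters, a t-based correction for the limited df would increase the requirement further.

```python
# A minimal sketch: per-arm CRT sample size for a continuous outcome, obtained
# by inflating the usual individually randomized formula by the design effect
# 1 + (m - 1) * rho. Hypothetical inputs; ignores the df penalty noted above.
from math import ceil
from statistics import NormalDist

def crt_sample_size_per_arm(delta, sd, m, rho, alpha=0.05, power=0.8):
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_individual = 2 * (z * sd / delta) ** 2         # per arm, ignoring clustering
    design_effect = 1 + (m - 1) * rho
    n_clustered = n_individual * design_effect       # per arm, with clustering
    return ceil(n_clustered), ceil(n_clustered / m)  # individuals and clusters per arm

# Detect a 0.25 SD difference with 5 children per school and rho = 0.01
print(crt_sample_size_per_arm(delta=0.25, sd=1.0, m=5, rho=0.01))
```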

CRT Sample Size and Power: many recent developments, including methods for a fixed number of clusters, repeated measurements over time, additional structural hierarchies of clustering, and varying (i.e., imbalanced) cluster sizes. Two recent reviews: Rutterford, International Journal of Epidemiology, 2015; Gao, Contemporary Clinical Trials, 2015. Recent book: Moerbeek & Teerenstra.

What if the parallel-arm CRT design does not meet research needs?

2. Alternatives to the Parallel-Arm CRT

Parallel-Arm CRT (recap): measurements at baseline and follow-up, with randomization at the cluster level. The alternatives that follow all involve randomization and some form of clustering that must be appropriately accounted for in both the design and the analysis.

Crossover CRT: two time periods; clusters are randomized either to control then intervention, or to intervention then control.

Stepped Wedge (SW) CRT: control periods and intervention periods between baseline and follow-up, with clusters crossing from control to intervention in a staggered fashion. Some features: an incomplete SW design has the same number of measurements as the parallel design but the added challenge of a possible time effect that could confound the relationship; there is at least some within-cluster and between-cluster information at each time point, but time certainly needs to be accounted for in the analysis. A complete SW design has many more measurements and raises the question of how to apportion the post one-year intervention period.
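To show the shape of the design, here is a small hypothetical sketch (dimensions chosen for illustration, not taken from the talk) of a standard stepped-wedge treatment schedule in which each cluster starts under control and crosses to the intervention at a randomly ordered step.

```python
# A hypothetical sketch of a stepped-wedge schedule: rows are clusters, columns
# are time periods; 0 = control period, 1 = intervention period. Each cluster
# crosses over once, at a randomly assigned step, and stays exposed afterwards.
import numpy as np

def stepped_wedge_schedule(n_clusters=4, n_steps=4, seed=42):
    n_periods = n_steps + 1                 # a baseline period plus one period per step
    schedule = np.zeros((n_clusters, n_periods), dtype=int)
    crossover_order = np.random.default_rng(seed).permutation(n_clusters)
    for rank, cluster in enumerate(crossover_order):
        schedule[cluster, rank + 1:] = 1    # this cluster switches on at step rank + 1
    return schedule

print(stepped_wedge_schedule())
```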

Pseudo-Cluster Randomized Trial: clusters are randomized to be predominantly intervention or predominantly control (individuals within each cluster are then randomized with the corresponding unequal allocation); measurements at baseline and follow-up.

Individually Randomized Group Treatment (IRGT) Trial: randomization at the individual level, with measurements at baseline and follow-up.

Individually Randomized Group Treatment Trial (continued): clustering arises for some individuals because of the group-delivered intervention.

Rationale for Alternative Clustered Designs (design; rationale; number of published studies):
Crossover CRT; gain power; ~90 (Arnup 2016).
Stepped wedge; logistical/ethical; ~50 (Hemming 2017).
Pseudo-cluster; selection bias; <10 (Turner 2017).
IRGT trial; no clusters a priori; >32 (Pals 2008).

Summary

Objectives (recap): highlight recent developments via four design challenges, with a focus on the parallel-arm CRT (clustering; change in individuals vs. change in the population; baseline imbalance in covariates and cluster size; sample size and power), and describe alternative clustered designs that use randomization.

Thank you. Any questions? References are listed in the article, "Online First" at the American Journal of Public Health.