Sample size and analytical issues for cluster trials David Torgerson Director, York Trials Unit

Background For any trial we want the sample to be large enough that, if there were a ‘true’ difference between the groups, this difference would be statistically significant. A Type II error occurs when we wrongly conclude there is no difference when there actually is one.

Sample size calculations “Most hand calculations diabolically strain human limits, even for the easiest formula,..” (Schulz & Grimes, Lancet 2005)

Sample size formulae Usually need a computer to calculate. However, a simple approximation for a two-armed randomised trial with a 1:1 allocation ratio and a continuous outcome (e.g., blood pressure) is as follows, where d = effect size (difference/standard deviation): Total sample size = 32 / d^2 (80% power, 5% significance).

Example We want to investigate a treatment for back pain. The measure is the Roland and Morris back pain scale with a standard deviation of 4. If we want to detect a 2 point difference, how many do we need? 2/4 = 0.5 = effect size (d). 0.5 x 0.5 = 0.25; 32/0.25 = 128 in total for 80% power, 5% significance (use 42 instead of 32 for 90% power). NB using computer software the answer = 126.
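The back pain example can be checked with a few lines of Python. This is a minimal sketch of the rule of thumb only (total N of roughly 32/d^2 for 80% power, 42/d^2 for 90% power, two arms, 1:1 allocation, 5% significance); the function name `total_sample_size` is made up for illustration and is not from the slides.

```python
import math

# Multipliers for the hand-calculator approximation (two arms, 1:1, 5% significance).
MULTIPLIERS = {0.80: 32, 0.90: 42}


def total_sample_size(effect_size, power=0.80):
    """Approximate TOTAL sample size for a continuous outcome.

    effect_size is the difference divided by the standard deviation.
    """
    raw = MULTIPLIERS[power] / effect_size ** 2
    # round() guards against floating-point noise before rounding up.
    return math.ceil(round(raw, 6))


d = 2 / 4  # 2-point difference, standard deviation of 4
print(total_sample_size(d))              # 80% power
print(total_sample_size(d, power=0.90))  # 90% power
```

As the slides note, the approximation slightly overshoots what dedicated software reports (126 in this example).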

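Binary variables For a dichotomous variable (cured or not cured) the following is useful, where d = difference between the two proportions and a = average of the two proportions: Total sample size = 32 / (d^2 / (a – a^2)).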

Example Breast feeding rates are only 50% and we have an educational intervention that we think will increase this to 60%; how many do we need? d^2 = 0.1 x 0.1 = 0.01; a = (0.5 + 0.6)/2 = 0.55; a^2 = 0.55 x 0.55 = 0.3025; 32/(0.01/(0.55 – 0.3025)) = 32/0.040 = 792. Need 792 to have 80% power to show a 10 percentage point difference in breast feeding rates if it were present (use 42 instead of 32 for 90% power). NB using computer software the answer is 774.
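The breast feeding arithmetic can be reproduced as follows. This is a sketch under the assumption that the approximation is total N = 32 / (d^2 / (a - a^2)), with d the difference between the two proportions and a their average, which is what the worked numbers imply; the function name is hypothetical.

```python
import math


def total_sample_size_binary(p_control, p_intervention, multiplier=32):
    """Approximate TOTAL sample size for a binary outcome (two arms, 1:1).

    multiplier: 32 for 80% power, 42 for 90% power (5% significance).
    """
    d = p_intervention - p_control            # difference in proportions
    a = (p_control + p_intervention) / 2      # average proportion
    effect_squared = d ** 2 / (a - a ** 2)    # squared standardised effect
    # round() guards against floating-point noise before rounding up.
    return math.ceil(round(multiplier / effect_squared, 6))


print(total_sample_size_binary(0.50, 0.60))                 # 80% power
print(total_sample_size_binary(0.50, 0.60, multiplier=42))  # 90% power
```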

Approximations The formulae slightly overestimate the true sample size needed. But they can be done on a hand calculator and you can impress the statisticians. What about cluster trials?

Cluster Sample Size Usual sample size estimates assume independence of observations. When people are members of the same cluster (e.g., classroom, GP surgery) they are more alike than we would expect at random. The degree of this relatedness is measured by the intra-cluster correlation coefficient (ICC).

ICC The ICC needs to be incorporated into the sample size calculations. The formula is as follows: Design effect = 1 + (m – 1) x ICC. The design effect is the factor by which the sample size needs to be inflated; m is the number of people in each cluster.

Sample size example. Let’s assume that for an individually randomised trial we need 128 people to detect an effect size of 0.5 with 80% power (2p = 0.05). Now assume we have 24 groups with 7 members. The ICC is 0.05, which is quite high. 1 + (7 – 1) x 0.05 = 1.3, so we need to increase the sample size by 30%. Therefore, we will need 166 participants.

What happens if cluster gets bigger? If our cluster size is twice as big (14), things begin to get really interesting: 1 + (14 – 1) x 0.05 = 1.65. What about 30? 1 + (30 – 1) x 0.05 = 2.45 (i.e., 314 participants). Say we randomise a larger cluster, such as a school (n = 500): 1 + (500 – 1) x 0.05 = 25.95 (i.e., 3,322 participants).
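The effect of cluster size can be tabulated directly from the design effect formula, 1 + (m - 1) x ICC. The sketch below assumes the individually randomised requirement of 128 and the ICC of 0.05 used in the slides; the function names are illustrative only.

```python
def design_effect(cluster_size, icc):
    """Design effect = 1 + (m - 1) x ICC."""
    return 1 + (cluster_size - 1) * icc


def inflated_sample_size(n_individual, cluster_size, icc):
    """Individually randomised sample size multiplied by the design effect."""
    return round(n_individual * design_effect(cluster_size, icc))


N_INDIVIDUAL = 128  # needed for effect size 0.5, 80% power
ICC = 0.05

for m in (7, 14, 30, 500):
    deff = design_effect(m, ICC)
    total = inflated_sample_size(N_INDIVIDUAL, m, ICC)
    print(f"cluster size {m:>3}: design effect {deff:.2f}, participants {total}")
```

With the ICC held constant, the required sample grows roughly in proportion to cluster size, which is why many small clusters are preferable to a few large ones.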

ICC size ICCs can be large for some things. ICCs for educational outcomes, for example, are often around 0.4 to 0.5. A class-based RCT with n = 30 and an ICC of 0.4 would need 1,612 participants, or 54 classes with n = 30 in each class.
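The class-based example can be turned into a count of clusters. This sketch assumes the same starting point of 128 participants for an individually randomised trial; `clusters_needed` is a hypothetical helper, not something from the slides.

```python
import math


def clusters_needed(n_individual, cluster_size, icc):
    """Return (number of whole clusters, unrounded total participants)."""
    deff = 1 + (cluster_size - 1) * icc      # design effect
    total = n_individual * deff              # inflated sample size
    # Round up: a cluster cannot be partly recruited.
    n_clusters = math.ceil(round(total / cluster_size, 6))
    return n_clusters, total


n_classes, total = clusters_needed(128, cluster_size=30, icc=0.4)
print(n_classes, round(total, 1))
```

The unrounded total is about 1,612.8, which matches the slide's figure, and rounding up to whole classes gives 54.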

What makes the ICC large? If the treatment is applied to the health care provider (e.g., guidelines), this will increase ICCs for patients. If the cluster relates to the outcome variable (e.g., smoking cessation and schools). If members of the cluster are expected to influence each other (e.g., households).

Reviews of Cluster Trials
Authors | Source | Years | Clustering allowed for in sample size | Clustering allowed for in analysis
Donner et al. (1990) | 16 non-therapeutic intervention trials | 1979 – 1989 | <20% | <50%
Simpson et al. (1995) | 21 trials from American Journal of Public Health and Preventive Medicine | 1990 – [missing] | [missing] | 57%
Isaakidis and Ioannidis (2003) | 51 trials in Sub-Saharan Africa | 1973 – 2001 (half post 1995) | 20% | 37%
Puffer et al. (2003) | 36 trials in British Medical Journal, Lancet, and New England Journal of Medicine | 1997 – [missing] | [missing] | 92%
Eldridge et al. (Clinical Trials 2004) | 152 trials in primary health care | [missing] | [missing] | 59%

Sample Size Problems Cluster trials demand larger sample sizes.

Conditional ICC The key ICC is the conditional ICC, but usually we only have access to estimates of the unconditional ICC. If we know, and can measure, characteristics that cause the ICC, we can adjust for them and lower the ICC. Cook claims that using covariates allows a school-based RCT to reduce the number of schools from about 50 to around 22.

Summary of sample size The KEY thing is the size of the cluster. It is nearly always better to get lots of small clusters than a few large ones (e.g., a trial with small hospital wards, GP practices or classrooms will, ceteris paribus, be better than one with large clusters). BUT if the ICC is tiny, clustering may not affect the sample size too much.

Cluster Trials: Should I do one? If possible, avoid like the plague. BUT although they are difficult to do properly, they WILL give more robust answers than other methods (e.g., observational data) when done properly. Is it possible to avoid doing them and do an individually randomised trial?

Contamination An important justification for their use is SUPPOSED ‘contamination’ between participants allocated to the intervention and people allocated to the control.

Spurious Contamination? Trial proposal to cluster randomise practices for a breast feeding study: new mothers might talk to each other! Trial for reducing cardiac risk factors: patients, again, might talk to each other. Trial for removing allergens from homes of asthmatic children.

Contamination Contamination occurs when some of the control patients receive the novel intervention. It is a problem because it reduces the effect size, which increases the risk of a Type II error (concluding there is no effect when there actually is).

Patient level contamination In a trial of counselling adults to reduce their risk of cardiovascular disease, general practices were randomised to avoid contamination of control participants by intervention patients. Steptoe. BMJ 1999;319:943.

Accepting Contamination We should accept some contamination and deal with it through individual randomisation and by boosting the sample size, rather than going for cluster randomisation. Torgerson BMJ 2001;322:355.

Counselling Trial Steptoe et al. wanted to detect a 9% reduction in smoking prevalence with a health promotion intervention. They needed 2000 participants (rather than 1282) because of clustering. If they had randomised 2000 individuals, the trial would have been able to detect a 7% reduction, allowing for 20% CONTAMINATION. Steptoe. BMJ 1999;319:943.
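The trade-off in the counselling example can be sketched numerically. The code below rests on two assumptions that are not spelled out in the slides: a contamination proportion c shrinks the observable difference to (1 - c) times the true difference, and the required sample size scales with 1 / difference^2. Under those assumptions the quoted figures are roughly reproduced; the function names are illustrative.

```python
def diluted_effect(effect, contamination):
    """Observable difference if a proportion of controls receive the intervention."""
    return effect * (1 - contamination)


def contamination_inflation(contamination):
    """Factor by which an individually randomised sample must grow
    to keep the same power for the diluted difference."""
    return 1 / (1 - contamination) ** 2


effect = 9.0          # percentage point reduction targeted
contamination = 0.20  # assumed proportion of contaminated controls
n_uncontaminated = 1282

print(diluted_effect(effect, contamination))                            # about 7
print(round(n_uncontaminated * contamination_inflation(contamination)))  # about 2000
```

On these assumptions, 20% contamination costs about the same sample size inflation as the clustering did in this trial.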

Comparison of Sample Sizes NB: Assuming an ICC of 0.02.

Misplaced contamination The ONLY health study I’m aware of to date that directly compared an individually randomised study with a cluster design showed no evidence of contamination. In an RCT of nurse-led cardiovascular risk factor screening, some ‘intervention’ clusters had participants allocated to no treatment. NO contamination was observed.

What about dilution bias? If, in the presence of contamination, we use individual allocation we might observe a difference that is statistically significant but is not clinically or economically significant. Dilution has biased the estimate towards the null (no difference).

Dealing with contamination Sometimes there may be substantial contamination, and this will dilute the treatment effects. It may, however, still be best to individually randomise if you can measure contamination.

Per-protocol analysis? We cannot adjust for contamination using either per-protocol or on treatment analysis: these popular analytical methods are plainly wrong as they violate the random allocation.

CACE analysis: a solution? If we can measure contamination we can use a statistical approach known as Complier Average Causal Effect (CACE) analysis.

Assumptions of CACE Assumption 1 – if the control group had been offered treatment the same proportion would comply with treatment – this must be true as random allocation ensures that it is. Assumption 2 – merely being offered treatment has no effect on outcomes.

Example CRC screening In an RCT of bowel cancer screening only 53% of people invited for screening attended. The ITT relative risk = 0.85, BUT what happened to those who were screened? The per protocol RR was 0.62. THIS IS WRONG. What is the true estimate?

Randomisation
Intervention group (n = 75,253)
  Observed adherers n = 40,214 (53%): outcome = 138 = 0.34%
  Observed non-adherers n = 35,039 (47%): outcome = 222 = 0.63%
Control group (n = 74,998)
  Potential adherers n = 40,078 (53%): unobserved outcome = 199 = 0.50%
  Potential non-adherers n = 34,920 (47%): unobserved outcome = 221 = 0.63%

True differences For ITT, i.e., the policy of offering screening to the whole community, the RR = 0.85, that is a 15% reduction in CRC deaths. For those who accepted screening the RR was 0.68, a 32% reduction in deaths, NOT a 38% reduction.
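The logic of the CACE estimate can be sketched from the counts in the randomisation diagram. This is an illustration, not the original analysis: it assumes the control group would have had the same adherence proportion and the same non-adherer death rate as the intervention group, and it takes the control group's total of 420 deaths as the sum of the two unobserved figures (199 + 221). The function names are hypothetical.

```python
def itt_relative_risk(int_events, int_n, ctrl_events, ctrl_n):
    """Intention-to-treat relative risk: everyone analysed as randomised."""
    return (int_events / int_n) / (ctrl_events / ctrl_n)


def cace_relative_risk(adh_n, adh_events, nonadh_n, nonadh_events,
                       ctrl_n, ctrl_events):
    """Relative risk among adherers versus 'potential adherers' in the control arm."""
    adherence = adh_n / (adh_n + nonadh_n)       # proportion who adhere
    ctrl_adh_n = ctrl_n * adherence              # potential adherers in control
    ctrl_nonadh_n = ctrl_n - ctrl_adh_n          # potential non-adherers in control
    nonadh_rate = nonadh_events / nonadh_n       # rate assumed equal in both arms
    ctrl_adh_events = ctrl_events - ctrl_nonadh_n * nonadh_rate
    return (adh_events / adh_n) / (ctrl_adh_events / ctrl_adh_n)


itt = itt_relative_risk(138 + 222, 75253, 420, 74998)
cace = cace_relative_risk(40214, 138, 35039, 222, 74998, 420)
print(f"ITT RR  = {itt:.2f}")
print(f"CACE RR = {cace:.2f}")
```

With unrounded counts the CACE estimate comes out at about 0.69, close to the 0.68 obtained on the slide from the rounded percentages (0.34/0.50), and in either case less extreme than the per protocol figure.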

Individuals are best Using CACE we can get the best of both worlds: retain individual randomisation and get unbiased estimates.

Sample size simulation CACE analysis generally produces wider confidence intervals as there are two sources of variance. Therefore, it is possible that cluster allocation may actually have a lower standard error in some circumstances. To assess whether this is true we undertook a simulation exercise.

Sample size: trade-off between cluster and individual allocation [Figure not reproduced; labels: Cluster size; Contamination (%); Cluster trial, ICC = 0.04; Individual RCT with CACE; Contamination effect.] NB 80% power to detect an effect size of 0.2. Source: Hewitt PhD thesis.

Sample size CACE performs better than cluster allocation in a range of sample size scenarios. Because of the difficulties of doing a cluster trial, an individual trial design with CACE analysis might be best.

Limitations The assumption that being offered treatment has no effect is a weakness as some may appear not to comply but actually access some of the treatment.

Still need to do a cluster trial? If a cluster trial is to be undertaken it is important, once the trial has been completed, that it is analysed correctly and that the effect of the clustering is accounted for. This has been known since 1940, when Lindquist advocated that educational trials should use the class as the natural unit of allocation.

What did Lindquist propose? Each class should be treated both as the unit of allocation and the unit of analysis. Put simply, a trial with 20 classes of 30 children is NOT a trial of 600 children; it is a trial of 20 classes. The simplest approach is to calculate the mean score of each cluster and do a t-test comparing the means of the two groups.
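A cluster-level analysis of this kind can be sketched with the Python standard library alone. The data below are invented purely for illustration (three classes per arm), and the function name is hypothetical; the calculation is an ordinary equal-variance two-sample t statistic applied to class means rather than to individual scores.

```python
import math
from statistics import mean, variance


def cluster_level_t_test(arm_a, arm_b):
    """Two-sample t statistic (equal variances) on cluster means.

    arm_a, arm_b: lists of clusters, each cluster a list of individual scores.
    Returns (t statistic, degrees of freedom).
    """
    means_a = [mean(cluster) for cluster in arm_a]   # one value per cluster
    means_b = [mean(cluster) for cluster in arm_b]
    n_a, n_b = len(means_a), len(means_b)            # n is the number of CLUSTERS
    pooled_var = ((n_a - 1) * variance(means_a)
                  + (n_b - 1) * variance(means_b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(means_a) - mean(means_b)) / std_err
    return t_stat, n_a + n_b - 2


# Hypothetical scores, grouped by class.
intervention = [[4, 6], [5, 7, 9], [8, 8]]
control = [[3, 5], [4, 4], [6, 6, 6]]

t_stat, df = cluster_level_t_test(intervention, control)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The degrees of freedom depend on the number of classes, not the number of students, which is the point of the slide.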

Example A randomised trial of 28 adult literacy classes sought to ascertain whether or not paying participants an incentive to attend would improve adherence. 14 classes were randomised for students to get an incentive; 14 were controls. Students were paid £5 per class attended. There were 150 students in total; the ICC was [value missing]. See Martin Bland’s website for a worked example: http://www-users.york.ac.uk/~mb55/

Two-sample t test with equal variances [Stata output for Group X versus Group Y with students as the unit of analysis; the numeric values are not preserved in this transcript. Ho: diff = 0; degrees of freedom = 150.]

Wrong This analysis is wrong: it treats all of the students as independent individuals and ignores the clustering of outcomes within classes. Let us try Lindquist’s approach to the analysis.

Two-sample t test with equal variances [Stata output for group 1 versus group 2 with class means as the unit of analysis; the numeric values are not preserved in this transcript. Ho: diff = 0; degrees of freedom = 26.]

T-test method This is correct in the sense that it takes clustering into account. However, it does not take into account chance differences in cluster size or powerful predictors of outcome. We have information on cluster size and pre-test literacy score that we can use to improve the precision of our estimate (i.e., reduce the width of the confidence intervals). We can use summary statistics in a regression approach.

[Stata regression output on cluster-level summary statistics: outcome = sessions; predictors = group, midscl and a constant (_cons); F(2, 25). The numeric values are not preserved in this transcript.]

Other methods There are other, more complex statistical methods that may yield slightly different results. However, simple methods are approximately correct and easier to do.

Summary Cluster trials need larger sample sizes than individually randomised studies. Clustering needs to be taken into account both in the sample size and the analysis. There are simple methods that can do this.