Challenges of statistical analysis in surgical trials

Professor Catherine Hewitt, Senior statistician (catherine.hewitt@york.ac.uk)
BOA Orthopaedic Surgery Research Centre

Non-compliance/crossovers

Perfect trial participants
- Everyone asked is willing to take part.
- Everyone recruited is randomised and receives their allocated intervention.
- Everyone randomised completes their allocated intervention and provides outcome data.

What happens in practice?
- Some people are not willing to take part.
- Everyone recruited is randomised, and some receive their allocated intervention.
- Some participants complete their allocated intervention, and some provide outcome data.

Problems during analysis
Even in well-designed and well-conducted RCTs, non-compliance is almost inevitable, and it creates problems at the analysis stage. How should we deal with participants who do not comply with their allocated treatment in the analysis?

Non-compliance
[Diagram: randomisation, follow-up, analysis; how non-compliers should be handled at analysis is left open (?)]

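Intention to treat (ITT)
[Diagram: randomisation, follow-up, analysis]
All participants are analysed in the arm to which they were randomised, whether or not they received or completed the allocated intervention.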

Advantages of ITT
Why use ITT?
- It maintains baseline comparability.
- It is completely objective.
- It mirrors what would happen in practice.
- ITT answers an important (pragmatic) question, e.g. “What would be the reduction in colorectal cancer (CRC) mortality if we offered population screening?”

Disadvantage of ITT
In the presence of non-compliance, ITT does not answer the question: what would be the impact if people attended screening? For example, using ITT we can only say: “On average, screening would reduce your risk of CRC mortality by 20%, whether or not you take up the offer of screening.”

How do researchers answer the question: what would be the impact if people attended screening?

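Per protocol (PP)
[Diagram: randomisation, follow-up, analysis]
Only participants who completed their allocated intervention are analysed, in the arm to which they were randomised.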

Drawbacks of PP
Loss of statistical power, due to the reduction in sample size.

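On treatment (OT)
[Diagram: randomisation, follow-up, analysis]
Participants are analysed according to the treatment they actually received, regardless of the arm to which they were randomised.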

Example: On treatment

Main problems
For both the PP and OT approaches to be valid, the participants who do not comply have to be a random sample of ALL participants who were offered the treatment. This is rarely true in research involving humans. Participants select themselves into groups, which:
- breaks the initial randomisation;
- violates the basis for statistical inference;
- produces groups with different characteristics.
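
This minimal simulation sketch (not from the original slides) shows why self-selection matters. Every number in it is a hypothetical assumption:
- The intervention has no effect at all.
- Half of participants are "healthy" (2% risk of death) and half are "frail" (8% risk).
- Healthy participants are more likely to comply (80% vs 30%).

ITT should therefore come out close to zero, while PP and OT suggest a spurious benefit.

```python
import random


def simulate_trial(n_per_arm=200_000, seed=2024):
    """Simulate a null trial with self-selected compliance.

    Returns (ITT, PP, OT) risk differences, each computed as the
    intervention/treated group's death rate minus the comparison group's.
    """
    rng = random.Random(seed)

    def participant():
        # Hypothetical population: half healthy (2% risk), half frail (8% risk)
        healthy = rng.random() < 0.5
        risk = 0.02 if healthy else 0.08
        # Hypothetical compliance: healthy people are more likely to comply
        complies = rng.random() < (0.8 if healthy else 0.3)
        died = rng.random() < risk  # outcome ignores treatment: no true effect
        return complies, died

    # In the control arm 'complies' is latent: it is never observed
    control = [participant() for _ in range(n_per_arm)]
    intervention = [participant() for _ in range(n_per_arm)]

    def rate(group):
        return sum(died for _, died in group) / len(group)

    compliers = [p for p in intervention if p[0]]
    non_compliers = [p for p in intervention if not p[0]]

    itt = rate(intervention) - rate(control)             # as randomised
    pp = rate(compliers) - rate(control)                  # compliers vs whole control arm
    ot = rate(compliers) - rate(control + non_compliers)  # treated vs everyone untreated
    return itt, pp, ot


itt, pp, ot = simulate_trial()
print(f"ITT {itt:+.4f}  PP {pp:+.4f}  OT {ot:+.4f}")
```

Under these assumptions the expected values are roughly -0.014 for PP and -0.019 for OT, even though the true effect is zero. The apparent benefit comes entirely from who chooses to comply.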

What’s the alternative to using PP or OT? The complier average causal effect (CACE) approach.

CACE
[Diagram: randomisation, follow-up, analysis]

CACE
General problem: we need to identify the compliance status of the control group, i.e. who would have complied had they been offered the treatment. If this is possible, then we compare the compliant group in the intervention arm with the potential compliant group in the control arm.

CACE – assumptions
We need some measure of the compliance status in the intervention arm. Then:
1. If the control group had been offered the treatment, the same proportion would not have complied. This should be true because of randomisation.
2. Being offered the treatment has no effect on the outcomes of those who do not take it up. This needs more thought!

Example: Screening trial
Trial of faecal-occult-blood screening for the prevention of colorectal cancer: 53% of the people invited for screening attended. In the publication:
- ITT was used to measure the average benefit of screening in invitees;
- PP was used to attempt to measure the average benefit of screening in attendees.

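CACE – what is observed?
(ER % = event (death) rate, %)

| Status | Intervention (n = 75,253): deaths ÷ n | ER % | Control (n = 74,998): deaths ÷ n | ER % |
|---|---|---|---|---|
| Compliers (53%) | 138 ÷ 40,214 | 0.34 | ? | ? |
| Non-compliers (47%) | 222 ÷ 35,039 | 0.63 | ? | ? |
| Total | 360 ÷ 75,253 | 0.48 | 420 ÷ 74,998 | 0.56 |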

CACE – assumption 1
Assuming the same proportion of non-compliers in the control arm: 47% × 74,998 = 35,249 control non-compliers.

CACE – assumption 2
Assuming the offer of treatment has no effect on outcome, control non-compliers die at the same rate as intervention non-compliers (0.63%): 35,249 × 0.63% = 222 deaths.

CACE – what is estimated?
The remaining control participants are the potential compliers: 420 − 222 = 198 deaths among 74,998 − 35,249 = 39,749 participants, so 198 ÷ 39,749 = 0.50%.

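CACE – complete table
(ER % = event (death) rate, %)

| Status | Intervention (n = 75,253): deaths ÷ n | ER % | Control (n = 74,998): deaths ÷ n | ER % |
|---|---|---|---|---|
| Compliers (53%) | 138 ÷ 40,214 | 0.34 | 198 ÷ 39,749 | 0.50 |
| Non-compliers (47%) | 222 ÷ 35,039 | 0.63 | 222 ÷ 35,249 | 0.63 |
| Total | 360 ÷ 75,253 | 0.48 | 420 ÷ 74,998 | 0.56 |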
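
The CACE steps can be sketched as a small function. The three steps are: the same proportion of non-compliers in the control arm, the same death rate among those non-compliers, then subtraction from the control totals.

The function name `control_complier_rate` is hypothetical. The first call uses the slides' rounded inputs (47% non-compliers, 0.63% death rate). The second call repeats the calculation with unrounded proportions taken directly from the intervention-arm counts, which shows that the rounding does not change the conclusion.

```python
def control_complier_rate(control_deaths, control_n, noncomply_prop, noncomplier_rate):
    """Estimate deaths, size and death rate of the control arm's potential compliers.

    Assumption 1: the control arm has the same proportion of non-compliers.
    Assumption 2: the offer has no effect on non-compliers, so control
    non-compliers die at the intervention non-compliers' rate.
    """
    n_noncompliers = noncomply_prop * control_n              # assumption 1
    deaths_noncompliers = n_noncompliers * noncomplier_rate  # assumption 2
    deaths_compliers = control_deaths - deaths_noncompliers  # what is left over
    n_compliers = control_n - n_noncompliers
    return deaths_compliers, n_compliers, deaths_compliers / n_compliers


# Rounded inputs, as on the slides: 47% non-compliers, 0.63% death rate
deaths_c, n_c, rate_c = control_complier_rate(420, 74_998, 0.47, 0.0063)
print(f"{deaths_c:.0f} / {n_c:.0f} = {100 * rate_c:.2f}%")  # 198 / 39749 = 0.50%

# Unrounded inputs taken directly from the intervention arm counts
exact = control_complier_rate(420, 74_998, 35_039 / 75_253, 222 / 35_039)
rr_cace = (138 / 40_214) / exact[2]
print(f"CACE: RR = {rr_cace:.2f}, reduction = {100 * (1 - rr_cace):.0f}%")  # RR = 0.69, reduction = 31%
```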

Example: CACE results
Compare the compliers in the intervention arm with the potential compliers in the control arm: (138 ÷ 40,214) ÷ (198 ÷ 39,749) ≈ 0.69, that is, a 31% reduction in CRC mortality.
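
The table arithmetic is an instance of a general identity. This worked derivation is not on the original slides. It uses the following symbols:
- r_I and r_C: the overall death rates in the intervention and control arms;
- q: the proportion of non-compliers;
- r_I^c and r_I^n: the death rates among intervention compliers and non-compliers;
- r_C^c: the death rate among the control arm's potential compliers.

Under the two CACE assumptions:

```latex
r_C^{c} = \frac{r_C - q\, r_I^{n}}{1 - q},
\qquad
r_I = (1 - q)\, r_I^{c} + q\, r_I^{n}
\quad\Longrightarrow\quad
\text{CACE} = r_I^{c} - r_C^{c} = \frac{r_I - r_C}{1 - q}
```

So the CACE risk difference is the ITT risk difference divided by the proportion who comply, which is the usual instrumental-variable form of the estimator. With the trial's counts, (0.478% − 0.560%) ÷ 0.534 ≈ −0.153 percentage points. This is close to the difference between the complier death rates in the complete table (0.34% vs 0.50%, which are rounded).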

Example: conclusions
- Offering screening to the whole population: ITT shows a 15% reduction in CRC deaths. We know this is conservative as an estimate of the benefit for those who attend.
- For those who accepted screening: CACE shows a 31% reduction, NOT a 39% reduction as suggested by PP, NOR a 71% reduction as suggested by OT.
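
The ITT and PP percentages can be reproduced from the observed counts of the screening trial: 138 deaths among 40,214 attenders, 360 among 75,253 invitees, and 420 among 74,998 controls.

This sketch compares death rates as relative risks. Using relative risks is an assumption about how the reductions were calculated, but it does reproduce the 15% and 39% figures.

```python
def rate(deaths, n):
    """Death rate (proportion) in a group."""
    return deaths / n


# Observed counts from the screening trial
intervention_compliers = (138, 40_214)
intervention_total = (360, 75_253)
control_total = (420, 74_998)

# ITT: everyone as randomised; PP: attenders vs the whole control arm
rr_itt = rate(*intervention_total) / rate(*control_total)
rr_pp = rate(*intervention_compliers) / rate(*control_total)

print(f"ITT: RR = {rr_itt:.2f}, reduction = {100 * (1 - rr_itt):.0f}%")  # RR = 0.85, reduction = 15%
print(f"PP:  RR = {rr_pp:.2f}, reduction = {100 * (1 - rr_pp):.0f}%")    # RR = 0.61, reduction = 39%
```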

GROUPING IN INDIVIDUALLY RANDOMISED TRIALS

Grouping in iRCTs
Methods for the design and analysis of trials in which participants are allocated to treatment in clusters are now well established. Clustering also happens in individually randomised controlled trials (iRCTs) when the intervention is delivered by individual operators, such as surgeons. These operators form a hidden sample whose effect is usually ignored.
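
One common way to quantify an operator effect is the intracluster correlation coefficient (ICC): the share of outcome variance that lies between operators rather than within them. This sketch is an illustration, not a method from the slides. It uses the one-way ANOVA estimator for equal-sized clusters.

The data are simulated and entirely hypothetical:
- 100 surgeons with 20 patients each;
- a surgeon-level SD of 1 and a patient-level SD of 3.

The true ICC is therefore 1 / (1 + 9) = 0.10.

```python
import random
import statistics


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC; assumes every cluster has the same size."""
    k = len(clusters)
    m = len(clusters[0])
    means = [statistics.fmean(c) for c in clusters]
    grand = statistics.fmean(means)  # equal cluster sizes, so mean of means = grand mean
    msb = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    msw = sum((x - mu) ** 2 for c, mu in zip(clusters, means) for x in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


# Hypothetical data: 100 surgeons x 20 patients each
rng = random.Random(7)
clusters = []
for _ in range(100):
    surgeon_effect = rng.gauss(0, 1)  # between-surgeon SD = 1
    clusters.append([50 + surgeon_effect + rng.gauss(0, 3) for _ in range(20)])  # within SD = 3

print(f"estimated ICC = {anova_icc(clusters):.3f}")  # should be near 0.10
```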

Grouping in iRCTs
Source: Katherine J Lee and Simon G Thompson, BMJ 2005;330:142-144. © 2005 by British Medical Journal Publishing Group.

Grouping in iRCTs
In a review of 42 trials randomising individuals, published in the BMJ during 2002, they found that:
- 38/42 (90%) had some form of clustering;
- 17/42 (40%) had clustering by health professional imposed by the design;
- 6/38 (16%) mentioned clustering as a potential issue;
- 4 of these 6 allowed for clustering in the analysis;
- 3 did not recognise multiple sources of clustering.

Grouping in iRCTs
There is a new awareness of potential operator effects. This includes adjusting the sample size using an intracluster correlation coefficient (ICC), as if for a cluster randomised trial. Groupings can occur in all trial arms or in only one of them. There are different ways to analyse these trials, depending on the number of groupings and whether they occur in all trial arms.
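
The sample size adjustment above is usually based on the design effect used for cluster randomised trials, 1 + (m − 1) × ICC, where m is the number of patients per operator. This is a minimal sketch that assumes equal cluster sizes and clustering in every arm. The figures are hypothetical: 300 patients needed when clustering is ignored, 21 patients per surgeon, and an ICC of 0.05.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def inflated_sample_size(n_unclustered, cluster_size, icc):
    """Inflate a sample size that was calculated as if there were no clustering."""
    return math.ceil(n_unclustered * design_effect(cluster_size, icc))


# Hypothetical: 300 patients needed ignoring surgeons, 21 patients per surgeon, ICC = 0.05
print(design_effect(21, 0.05))              # 2.0
print(inflated_sample_size(300, 21, 0.05))  # 600
```

Even a small ICC doubles the required sample size here, because each surgeon treats many patients. The sketch does not handle unequal cluster sizes or grouping in only one arm, which need further adjustment.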

Example 1: Grouping in iRCTs
- Orthopaedic surgeons were eligible to take part if they performed knee replacements.
- Surgeons differed in which comparisons they would contribute to.
- 116 surgeons in 34 UK centres took part.
- Surgeon was included as a random effect in the analysis to account for potential surgeon effects.

Grouping in one trial arm
Suppose there is clustering in one trial arm but not the other. How do we analyse that? One simple option is to allocate every participant to a surgeon, irrespective of whether they are in the intervention group or not. Participants in the no-surgery group do not know about this, nor does the surgeon. Each surgeon then has their own cluster of controls and surgery patients.

Grouping in one trial arm
For each surgeon, we then have a mean outcome score for surgery patients and a mean for controls. Because we are aggregating by surgeon, we could do a paired t test on these summary statistics. We could also adjust for baseline scores using analysis of covariance with surgeon as a random effect. More simply (but less powerfully), we could compare the mean changes in score from baseline for surgery patients and controls in a paired t analysis.
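
This sketches the surgeon-level paired t test using made-up patient records: five hypothetical surgeons, each with two surgery patients and two controls. The code aggregates to one surgery-minus-control difference per surgeon. It then computes the paired t statistic with k − 1 degrees of freedom, where k is the number of surgeons.

A P value would then come from the t distribution. For example, `scipy.stats.ttest_rel` applied to the per-surgeon means gives the same statistic.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical patient-level records: (surgeon, arm, outcome score)
records = [
    ("A", "surgery", 29), ("A", "surgery", 31), ("A", "control", 26), ("A", "control", 28),
    ("B", "surgery", 33), ("B", "surgery", 35), ("B", "control", 29), ("B", "control", 31),
    ("C", "surgery", 27), ("C", "surgery", 29), ("C", "control", 26), ("C", "control", 28),
    ("D", "surgery", 35), ("D", "surgery", 37), ("D", "control", 30), ("D", "control", 32),
    ("E", "surgery", 31), ("E", "surgery", 33), ("E", "control", 29), ("E", "control", 31),
]


def surgeon_paired_t(records):
    """Paired t statistic on per-surgeon mean differences (surgery minus control)."""
    by_surgeon = defaultdict(lambda: {"surgery": [], "control": []})
    for surgeon, arm, score in records:
        by_surgeon[surgeon][arm].append(score)
    # One summary difference per surgeon: the surgeon is the unit of analysis
    diffs = [statistics.fmean(s["surgery"]) - statistics.fmean(s["control"])
             for s in by_surgeon.values()]
    k = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(k))
    return diffs, t, k - 1


diffs, t, df = surgeon_paired_t(records)
print(diffs)                      # [3.0, 4.0, 1.0, 5.0, 2.0]
print(f"t = {t:.2f} on {df} df")  # t = 4.24 on 4 df
```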

Example 2: Grouping in iRCTs

Grouping in iRCTs
The problem with this approach is that more complex clustering cannot be accounted for, and covariates need to be aggregated to the “surgeon” level. Other options are available to account for more complex scenarios.

Example 3: Grouping in iRCTs
To quantify the impact of grouping, case manager identifiers were included as a random effect in the linear mixed model, nested within treatment arm. Participants in usual care were each coded as their own case manager. The covariance structure was estimated separately for each treatment arm to account for differences in the variability of the random effect.

Example 3: Grouping in iRCTs
Group differences remained significant. Accounting for clustering by case manager slightly reduced the size of the treatment effect.

Subgroup analyses

Subgroup analyses
In a trial we might want to know whether the observed treatment difference is the same across different groups, e.g. young and old, or those with a baseline preference for one of the treatments. This can be viewed as examining the heterogeneity of an observed treatment effect across subsets of individuals. The statistical term for heterogeneity of this type is interaction.

Subgroup analyses
While it may be of interest to look for an interaction, this is not always wise:
- There are numerous subgroups that could be compared. They can be split by socio-demographic or clinical characteristics, and there are many ways to create groups from a single variable (e.g. age).
- Examining many subgroups is likely to find some spurious significant interactions.
- We cannot tell whether a specific interaction is real or spurious.
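
A one-line calculation shows why many subgroup comparisons mislead. Assume k independent interaction tests, each at the 5% level, and no real interactions. Independence is a simplifying assumption, since real subgroup tests are often correlated. The chance of at least one spurious "significant" result is then 1 − 0.95^k, which is about 40% for 10 tests.

```python
def prob_any_spurious(k_tests, alpha=0.05):
    """P(at least one false-positive interaction) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k_tests


for k in (1, 5, 10, 20):
    print(f"{k:>2} subgroup tests: {prob_any_spurious(k):.2f}")
# prints 0.05, 0.23, 0.40 and 0.64 respectively
```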

Subgroup analyses
By contrast, when there is a specific prior suspicion of an interaction, it is perfectly reasonable to examine it. Results of tests for interactions are likely to be convincing only if they were specified at the start of the study. In any study that presents subgroup analyses, it is important to specify when and why the subgroups were chosen. More recently, the expected direction of effect should be specified too.

Subgroup analyses
Sometimes authors carry out significance tests separately within the different subgroups. They then conclude that the difference exists only, or mainly, in the subgroups where a significant difference was found. This is incorrect!

Subgroup analyses
A statement such as P = 0.57 does not mean there is no difference, merely that we have found no evidence of a difference. A P value is a composite: it depends on the size of an effect, but also on how precisely the effect has been estimated. Differences in P values between subgroups can arise because of differences in effect sizes, differences in standard errors, or a combination of the two. The correct approach is to include an interaction in the analysis.
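
This sketch illustrates the point about composite P values, using hypothetical subgroup estimates:
- young: difference 4.0, SE 1.5, so P ≈ 0.008;
- old: difference 3.0, SE 2.5, so P ≈ 0.23.

The young subgroup is "significant" and the old one is not. Yet a simple test of interaction shows no evidence that the effects differ (P ≈ 0.73). That test compares the two estimates directly, using a normal approximation and assuming the subgroup estimates are independent.

In practice the interaction would usually be fitted as a term in the regression model. This two-estimate comparison is a simplified stand-in.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical subgroup results: (treatment difference, standard error)
subgroups = {"young": (4.0, 1.5), "old": (3.0, 2.5)}

for name, (est, se) in subgroups.items():
    print(f"{name}: difference {est}, P = {two_sided_p(est / se):.3f}")

# Interaction: test whether the two treatment differences differ from each other
(est_y, se_y), (est_o, se_o) = subgroups["young"], subgroups["old"]
z_interaction = (est_y - est_o) / math.sqrt(se_y ** 2 + se_o ** 2)
p_interaction = two_sided_p(z_interaction)
print(f"interaction: P = {p_interaction:.2f}")
```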

Example: ProFHER
Baseline treatment preference by randomised group, n (%):

| Baseline preference | Surgery (n = 125) | Not surgery (n = 125) |
|---|---|---|
| Surgery | 41 (32.8) | 31 (24.8) |
| Not surgery | 32 (25.6) | 28 (22.4) |
| No preference | 52 (41.6) | 63 (50.4) |
| Missing | 0 (0.0) | 3 (2.4) |

- Oxford Shoulder Score (OSS) results improved for all groups over time.
- Patients who expressed a preference for surgery generally reported the lowest shoulder functioning.
- Patients who expressed a preference for no surgery reported the highest shoulder functioning scores.

Example: ProFHER
The model including baseline preference reduced the magnitude of the treatment effect, from an overall group difference of 0.75 to 0.50 points. This reflects the additional variability in OSS explained by patient preference. The interaction was not statistically significant (F = 0.29, p = .751).

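Example: ProFHER
[Figure: OSS by baseline preference group: preference for surgery (N=72), preference for no surgery (N=60), no preference (N=115)]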

Quick summary
- If you are interested in estimating treatment effectiveness among those who comply, consider using the CACE approach.
- Consider whether there may be hidden clustering in your trial, and make appropriate plans to account for it in the analysis and possibly in the sample size calculation.
- If you want to explore whether treatment effects vary across subgroups, define the subgroups in advance, keep their number low, pre-specify the direction of effect, and use an interaction in the analysis.

Any questions?