Multiple Endpoint Testing in Clinical Trials – Some Issues & Considerations Mohammad Huque, Ph.D. Division of Biometrics III/Office of Biostatistics/OPaSS/CDER/FDA.

Presentation transcript:

Multiple Endpoint Testing in Clinical Trials – Some Issues & Considerations Mohammad Huque, Ph.D. Division of Biometrics III/Office of Biostatistics/OPaSS/CDER/FDA 2005 Industry/FDA Workshop, Washington, DC 9/16/2018

Disclaimer: The views expressed here are those of the presenter and not necessarily those of the FDA.

Sources of Multiplicity in Clinical Trials: multiple endpoints; multiple comparisons; interim analysis; subgroup analysis; selection of covariates in an analysis model; others. The sources of multiplicity in a clinical trial can range from several to many, and multiple endpoints are only one of them. This presentation, however, focuses on multiple endpoints.

OUTLINE
Type I error concept and type I error control when testing for multiple endpoints. Complexities?
Multiple endpoints are often triaged into primary, secondary and other types of endpoints. Reasons for doing so, and how are these endpoints tested?
Sequential testing of endpoints: no alpha adjustment is needed. Issues and fixes?
Some trials require that 2 or more endpoints must show effects for clinical evidence. Reasons for doing so, and consequences?
Composite endpoints. Underlying concepts and complexities?
Type I error and its control when testing for multiple endpoints are not straightforward. We will see what the complexities are and how they are handled. A clinical trial generally includes many endpoints. These multiple endpoints are often triaged into primary, secondary and exploratory endpoints. I will share with you the reasons for doing so and how these endpoints are tested. An attractive area of multiple endpoint testing is the sequential testing of endpoints, because in this approach no adjustment of the type I error is needed. But there are issues. We will see what the issues are and what the fixes are. Some trials require that 2 or more endpoints must show effects for clinical evidence. I will share with you the reasons for doing so and the consequences.

Trial has a single endpoint to test: type I and type II errors
Conduct a test for claiming that a new treatment is beneficial.
                             Concludes treatment     Concludes treatment
                             not beneficial          beneficial
Truly not beneficial (H0)    Correct decision        Type I error
Truly beneficial (Ha)        Type II error           Correct decision
α = probability of the type I error; β = probability of the type II error (power = 1 − β).
When the trial has a single endpoint to test, the concept of the type I error and its control is straightforward. If the treatment is truly not beneficial but the statistical test concludes that there is a beneficial effect of treatment, this is a type I error. The probability of the type I error is calculated from the sampling distribution of the test statistic under the null hypothesis.

Trial has multiple endpoints to test. Consider a two-arm superiority trial of a test treatment versus a control. Endpoints: y1, y2, …, yK. Multiple null hypotheses: F = {H01, H02, …, H0K}, where H0j: δj = 0 versus Haj: δj ≠ 0, j = 1, …, K.

Trial has multiple endpoints to test. Two scenarios: (A) all null hypotheses in the family F are true; (B) some null hypotheses may be true and some may be false, but their true states are unknown.

Testing under scenario (A). Suppose scenario (A) holds and the trial has 3 endpoints y1, y2, and y3. A test procedure can commit a type I error in multiple ways: (-, -, +), (-, +, -), (+, -, -), (-, +, +), (+, -, +), (+, +, -), (+, +, +). These are chance events that arise from the multiplicity of tests when in fact there is no treatment benefit for any of the endpoints. α0 = Pr{at least one of these chance events | test procedure, H0}, where H0 = ∩H0j.

Testing under scenario (A). α0 is called the global alpha (or overall alpha). It is also called the familywise type I error rate (FWER) under H0, where H0 = ∩H0j is the global null hypothesis. A test procedure for testing H0 is called a global test procedure.
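To make the size of α0 concrete, here is a minimal sketch of how the global alpha grows with the number of endpoints. It assumes the K test statistics are independent and that each endpoint is tested at an unadjusted level of 0.05; both are illustrative assumptions, not something stated in the slides.

```python
def fwer_independent(alpha: float, k: int) -> float:
    """Probability of at least one false rejection among k independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


# Illustrative values only: unadjusted per-endpoint level of 0.05.
for k in (1, 2, 3, 5, 10):
    print(k, round(fwer_independent(0.05, k), 4))
```

With positively correlated endpoints the inflation is smaller than under independence, but it does not disappear.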

Global test procedures: useful for non-specific global claims, but the result is difficult to interpret, and the type I error rate can remain inflated for specific claims. Examples: Simes test, O’Brien’s OLS/GLS tests, Hotelling’s T² test (Sankoh et al., DIA Jr., 1999).
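As an illustration of a global test, the Simes test rejects the global null hypothesis when at least one ordered p-value p(i) satisfies p(i) ≤ iα/K. The sketch below uses made-up p-values; the function name is hypothetical. The test is known to be valid under independence and under certain forms of positive dependence.

```python
def simes_global_test(p_values, alpha=0.05):
    """Return True if the Simes test rejects the global null hypothesis."""
    k = len(p_values)
    ordered = sorted(p_values)
    # Reject if the i-th smallest p-value is at or below i * alpha / k for any i.
    return any(p <= (i * alpha) / k for i, p in enumerate(ordered, start=1))


# Made-up p-values for three endpoints.
print(simes_global_test([0.012, 0.03, 0.20]))
print(simes_global_test([0.04, 0.03, 0.20]))
```

A rejection here says only that some endpoint shows an effect. It does not say which one, which is why such tests support only non-specific global claims.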

Testing under scenario (B). Some of the null hypotheses in F = {H01, H02, …, H0K} may be true and some may be false, but it is not known which are which. Question: is there a treatment effect specifically for the endpoint y1? To answer this question, the null hypothesis is not a single hypothesis like the global null hypothesis. Rather, it is a class of null hypothesis configurations in which there is no treatment effect for y1, combined with all possible scenarios for treatment effects on the remaining endpoints y2, …, yK.

Testing under scenario (B). Consider 3 endpoints y1, y2, and y3. Question: is there a treatment effect specifically for the endpoint y1? The null hypothesis configurations F1 for testing for a treatment effect specifically for y1 are: F1 = {(δ1 = 0, δ2 = 0, δ3 = 0), (δ1 = 0, δ2 = 0, δ3 ≠ 0), (δ1 = 0, δ2 ≠ 0, δ3 = 0), (δ1 = 0, δ2 ≠ 0, δ3 ≠ 0)}.
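The family F1 can be enumerated mechanically, which makes it easy to see that an endpoint-specific claim among K endpoints has 2^(K−1) null configurations to cover. A minimal sketch follows; the function name and the True/False encoding are illustrative choices.

```python
from itertools import product


def null_configurations_for(endpoint: int, k: int):
    """All effect patterns over k endpoints in which the given endpoint
    (1-based) has no treatment effect.
    True means delta != 0 (effect), False means delta = 0 (no effect)."""
    return [cfg for cfg in product([False, True], repeat=k)
            if not cfg[endpoint - 1]]


configs = null_configurations_for(endpoint=1, k=3)
for cfg in configs:
    print(", ".join(f"delta{j} {'!=' if eff else '='} 0"
                    for j, eff in enumerate(cfg, start=1)))
```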

Control of FWER (two types). Weak control: controls FWER only under the global null configuration. Strong control: controls FWER under all null configurations; this specificity property is useful for making specific claims. Examples of methods: Bonferroni, Holm, Hochberg*, closed statistical tests, and other methods. *With some caveats.
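Three of the named methods can be sketched in a few lines each. The p-values below are made up for illustration. Hochberg's step-up procedure relies on a dependence assumption (independence or certain positive dependence), which is presumably the caveat flagged on the slide.

```python
def bonferroni(p, alpha=0.05):
    """Reject H0j when p_j <= alpha / K."""
    k = len(p)
    return [pj <= alpha / k for pj in p]


def holm(p, alpha=0.05):
    """Step-down: walk from the smallest p-value up, stop at the first failure."""
    k = len(p)
    order = sorted(range(k), key=lambda i: p[i])
    reject = [False] * k
    for rank, i in enumerate(order):          # rank 0 is the smallest p-value
        if p[i] <= alpha / (k - rank):
            reject[i] = True
        else:
            break
    return reject


def hochberg(p, alpha=0.05):
    """Step-up: walk from the largest p-value down; at the first success,
    reject that hypothesis and every hypothesis with a smaller p-value."""
    k = len(p)
    order = sorted(range(k), key=lambda i: p[i], reverse=True)
    reject = [False] * k
    for rank, i in enumerate(order):          # rank 0 is the largest p-value
        if p[i] <= alpha / (rank + 1):
            for j in order[rank:]:
                reject[j] = True
            break
    return reject


p_values = [0.030, 0.040, 0.012]              # made-up example
print(bonferroni(p_values), holm(p_values), hochberg(p_values))
```

In this example Bonferroni and Holm reject only the third hypothesis, while Hochberg rejects all three because the largest p-value is already below 0.05.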

Triaging of multiple endpoints into meaningful families by trial objectives. Two important families, primary endpoints and secondary endpoints, are 1) prospectively defined and 2) FWE controlled. Exploratory endpoints are usually not prospectively defined.
Adding more endpoints to a clinical trial should not be an issue. Adding new endpoints to a trial is less expensive than doing a new trial with the new endpoint, and additional endpoints may add to the understanding of the drug and the disease process. However, in managing multiplicity and controlling the type I error rate, all these multiple endpoints must be triaged into meaningful families according to trial objectives. Primary endpoints are the primary focus of the trial. Their results determine the main benefits of the clinical trial’s intervention. Secondary endpoints by themselves are generally not sufficient for characterizing treatment benefit. They are generally tested for statistical significance, for extended indication and labeling, after the primary objectives of the trial are met.

Statistical methods: prospective alpha allocation schemes (PAAS), Moyé (2000). Spend alpha1 for the primary endpoints and the remaining alpha for the secondary endpoints; FWER is controlled.

Statistical methods: parallel gatekeeping strategies for clinical trials, Dmitrienko-Offen-Westfall (SM 2003), Chen-Luo-Capizzi (SM 2005). These allow testing of secondary endpoints when at least one of the primary endpoints exhibits a statistically significant result. These methods control FWER for both the primary and secondary endpoints in the strong sense.

Sequential testing of multiple endpoints. A fixed-sequence approach allows testing of each of the K null hypotheses at the same significance level α without any adjustment, as long as the null hypotheses are hierarchically ordered and are tested in a pre-defined sequential order. Hierarchical ordering of null hypotheses can be achieved, for example, by their clinical relevance.

Sequential testing of multiple endpoints. For this fixed-sequence approach, however, there are two caveats: the testing sequence must be pre-specified, and no further testing is allowed once the sequence breaks. Problem: the sequence breaks and the next p-value is extreme (e.g., p1 = 0.50, p2 = 0.001).
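The fixed-sequence rule is short enough to state as code. The sketch below assumes the p-values are supplied in the pre-specified testing order; the function name is illustrative.

```python
def fixed_sequence(p_in_order, alpha=0.05):
    """Test hypotheses in their pre-specified order, each at level alpha.
    Testing stops at the first non-significant result."""
    reject = [False] * len(p_in_order)
    for i, p in enumerate(p_in_order):
        if p > alpha:
            break                  # sequence breaks: nothing further is tested
        reject[i] = True
    return reject


print(fixed_sequence([0.001, 0.03, 0.20, 0.01]))
print(fixed_sequence([0.50, 0.001]))   # the problem case from the slide
```

In the second call the very small p2 cannot be claimed, because the sequence already broke at p1 = 0.50.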

A flexible fixed-sequence approach. Test H(01) at level α1. If H(01) is rejected, test H(02) at level α. If H(01) is not rejected, test H(02) at level γ. For example, with α1 = 0.04 and α = 0.05: γ = 0.0104 when ρ = 0, and γ = 0.0214 when ρ = 0.8.
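The ρ = 0 value on the slide is consistent with one simple reading: under the global null with independent test statistics, the chance of any false rejection is α1 + (1 − α1)γ, and setting this equal to α gives γ = (α − α1)/(1 − α1). The sketch below uses that reading, which is an assumption rather than a derivation shown in the slides. The ρ = 0.8 value requires a bivariate normal calculation and is not attempted here.

```python
def gamma_independent(alpha: float, alpha1: float) -> float:
    """Level for H(02) when H(01) is not rejected, assuming independent tests
    and solving alpha1 + (1 - alpha1) * gamma = alpha (illustrative reading)."""
    return (alpha - alpha1) / (1.0 - alpha1)


def flexible_fixed_sequence(p1, p2, alpha=0.05, alpha1=0.04, gamma=None):
    """Return (reject H(01), reject H(02)) under the flexible rule."""
    if gamma is None:
        gamma = gamma_independent(alpha, alpha1)
    reject1 = p1 <= alpha1
    # H(02) gets the full alpha after a rejection of H(01), else the smaller gamma.
    reject2 = p2 <= (alpha if reject1 else gamma)
    return reject1, reject2


print(round(gamma_independent(0.05, 0.04), 4))
print(flexible_fixed_sequence(0.50, 0.001))
```

Unlike the plain fixed-sequence rule, the case p1 = 0.50, p2 = 0.001 now leads to a rejection of H(02).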

Example: flexible fixed-sequence method

Some trials require that 2 or more endpoints must show effects. Examples: Alzheimer trials (win on the ADAS-Cognitive Subscale and win on the Clinician’s Interview-Based Impression of Change); many other examples (PhRMA draft paper). Main reason: clinical expectations of the desired clinical benefit (a concept beyond statistics).

Adjustments in the type I error rate: some winning criteria require adjustments and some don’t. Adjustment by Šidák’s method, accounting for correlation. Note: which method to use depends on the clinical decision rule set in advance.
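For reference, the standard Šidák per-endpoint level is α' = 1 − (1 − α)^(1/K), which is exact for independent tests. The sketch below shows only this basic form; the correlation-adjusted version mentioned on the slide is not recoverable from the transcript and is not reproduced. An adjustment of this kind is typically needed when the rule is to win on at least one endpoint, while a rule requiring a win on every endpoint typically allows each test to be run at the full α.

```python
def sidak_alpha(alpha: float, k: int) -> float:
    """Per-endpoint significance level so that k independent tests have a
    familywise error rate of alpha under the global null."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


for k in (2, 3, 4):
    print(k, round(sidak_alpha(0.05, k), 5))
```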

Power Comparison Case of K=2 endpoints:

Loss in power when a win is required on all endpoints (K = number of endpoints)

Sample Size Increase (1) When Win in All K Endpoints Compared to Single Endpoint Case
Alpha = 0.025 (1-sided), Power = 0.90
Correlation    K = 2    K = 3    K = 4    (% increase)
0.0            22.8     35.9     45.0
0.3            21.1     33.1     41.2
0.4            20.2     31.7     39.7
0.5            19.1     29.8     37.3
0.6            17.7     27.5     34.4
0.7            15.9     24.6     30.7
0.8            13.5     20.8     25.8
0.9            10.0     15.3     18.9
(1) Calculations use the multivariate normal distribution of the test statistics comparing active treatment versus placebo for a 2-arm trial, assuming the same delta/sigma for all K endpoints.
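The zero-correlation row can be checked with a short calculation. With independent endpoints and the same delta/sigma for each, every endpoint needs power 0.90^(1/K) for the joint power to be 0.90, and sample size scales with (z_alpha + z_power)². The sketch below covers the independent case only. The correlated rows need a multivariate normal calculation and are not attempted.

```python
from statistics import NormalDist


def sample_size_increase_pct(k: int, alpha_one_sided: float = 0.025,
                             power: float = 0.90) -> float:
    """Percent increase in sample size needed to win on all k independent
    endpoints with overall power `power`, relative to a single endpoint."""
    z = NormalDist().inv_cdf
    z_alpha = z(1.0 - alpha_one_sided)
    per_endpoint_power = power ** (1.0 / k)   # joint power = product of powers
    ratio = ((z_alpha + z(per_endpoint_power)) / (z_alpha + z(power))) ** 2
    return 100.0 * (ratio - 1.0)


for k in (2, 3, 4):
    print(k, round(sample_size_increase_pct(k), 1))
```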

Composite Endpoints: two types. The first type is a total score or index based on a rating scale, e.g., HAMD totals in depression trials, ACR20/ACR70 in rheumatoid arthritis trials. Issues: validity and reliability.

Composite Endpoints: another type. The composite endpoint is defined in terms of the time to the first “event”, where the event is one of several possible event types. LIFE study: composite of cardiovascular death, stroke and myocardial infarction events.

Composite Endpoint Issues: LIFE study. The composite endpoint was significantly positive. However, analysis of the first events by individual components and sub-composite endpoints indicates that the overall composite result was mainly due to a reduction in fatal and non-fatal stroke. Issues: how to interpret composite endpoint results? How to characterize benefits in terms of the component endpoints?

Extent of multiplicity adjustments between endpoints. [Figure: 2×2 grid. Axes: correlation between endpoints (low to high) and homogeneity of treatment effects across endpoints (low to high). Cell labels: small adjustments; practically no adjustments; large adjustments; good case for combining endpoints.]

Concluding Remarks
For endpoint-specific claims, strong control of the type I error is needed.
Parallel gatekeeping strategies can be used for the primary and secondary endpoint claims.
A flexible sequential test procedure can be used to gain power.
There is a scientific basis when a reasonable clinical decision rule asks for statistically significant efficacy results in more than 1 endpoint, but the loss of power is an issue.
When 4 or more endpoints are included as primary (e.g., arthritis trials) and homogeneity of treatment effects across endpoints is expected, a composite or responder endpoint approach will be effective.