Design and analysis of clinical trials MULTIPLE COMPARISONS.


Contents
- Error Rates
- Methods for Constructing MTPs
  - Union-Intersection (At Least One) Method
  - Intersection-Union (All or None) Method
  - Closure Method
- Common p-Value Based MTPs
  - Holm's Procedure
  - Simes' Test
  - Hochberg's Procedure
- MTPs for a priori Ordered Hypotheses
  - Fixed Sequence Procedure
  - Fallback Procedure
- Examples

The multiplicity problem: what is the issue?

When performing many independent tests, we should expect at least one significant result even when no true difference exists. For k independent tests, each at level 0.05:

P(at least one false positive result) = 1 - P(zero false positive results) = 1 - (1 - 0.05)^k

Doing a lot of tests will give us significant results just by chance. We want methods that control this risk (the error rate). The same problem arises when considering many confidence intervals simultaneously.
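The formula above is easy to tabulate; a minimal sketch (the function name `familywise_error` is ours, chosen for illustration):

```python
def familywise_error(k, alpha=0.05):
    """P(at least one false positive) = 1 - P(zero false positives)
    for k independent tests, each at significance level alpha."""
    return 1 - (1 - alpha) ** k

# The risk grows quickly with the number of tests:
for k in (1, 2, 5, 10, 20, 100):
    print(k, round(familywise_error(k), 3))
```

With 10 tests the chance of at least one spurious significance is already about 40%, and with 100 tests it is nearly certain.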

Sources of multiplicity in clinical trials

Regulatory requirements

EMEA/CPMP's (2002) Points to Consider on Multiplicity Issues:

From Section 2.5: "As a general rule it can be stated that control of the family-wise type-I error in the strong sense (i.e. application of closed test procedures) is a minimal prerequisite for confirmatory claims."

From Section 7: "It is therefore necessary that the statistical procedures planned to deal with, or to avoid, multiplicity are fully detailed in the study protocol or in the statistical analysis plan to allow an assessment of their suitability and appropriateness. Additional claims on statistically significant and clinically relevant findings based on secondary variables or on subgroups are possible only after the primary objective of the clinical trial has been achieved, and if the respective questions were pre-specified and were part of an appropriately planned statistical analysis strategy."

General rules

A. Multiple treatments
- Arrange the treatment comparisons in order of importance
- Decide which comparisons should belong to the confirmatory analysis
- Decide a way to control the error of false significances for these comparisons

General rules

B. Multiple variables
- Find out which variables are needed to answer the primary objective of the study
- Look for possibilities to combine the variables, e.g. composite endpoints or global measures (QoL, AUC etc.)
- Decide a way to control the error of false significances for these variables

General rules

C. Multiple time points
- Find out which time points are the most relevant for the treatment comparison
- If no single time point is most important, look for possibilities to combine the time points, e.g. an average over time (AUC etc.)
- Decide a way to control the error of false significances if more than one time point is important

General rules

D. Interim analyses
- Decide if the study should stop for safety and/or efficacy reasons
- Decide the number of interim analyses
- To control the error of a false significance (stopping the study), decide how to spend the total significance level across the interim and final analyses

General rules

E. Subgroup analyses
- Subgroup analyses are usually not part of a confirmatory analysis
- Restrict the number of subgroup analyses
- Use only subgroups of sufficient size
- All post-hoc subgroup analyses are considered exploratory

What's multiplicity got to do with me?

"I (am a Bayesian so I) do not agree with the principles behind adjustment"
- OK, but regulatory authorities will (may) take a different view

"I work in oncology, where we generally use all patients and have one treatment comparison, one primary endpoint (time to event) and a small number of secondary endpoints"
- There are still multiplicity issues around the secondary endpoints
- It is not always this simple: there may be two populations (e.g. all patients and a biomarker-positive group), or more than one treatment comparison (e.g. experimental vs. control and experimental + control vs. control)

What's multiplicity got to do with me?

"I work in early phase trials. Phase II is used for internal decision making, so we do not have to take account of multiplicity (trials would become too big if we did)"
- Agreed, but the issues of multiplicity still apply: AstraZeneca carries forward any increase in the risk of a false positive finding, and as long as this increase is understood it may be acceptable

Methods based on p-values

Bonferroni
- N different null hypotheses H_1, ..., H_N
- Calculate the corresponding p-values p_1, ..., p_N
- Reject H_k if and only if p_k < α/N
- Variation: the limits may be unequal, as long as they sum to α
- Conservative

Bonferroni's inequality

Let A_i be the event that H_0i is rejected when it is true, i.e. P(A_i) = P(reject H_0i when it is true). The probability of rejecting at least one hypothesis falsely satisfies

P(A_1 ∪ ... ∪ A_N) ≤ P(A_1) + ... + P(A_N),

so if each individual test is performed at level α/N, the family-wise error rate is at most N · (α/N) = α.

Example of Bonferroni correction

Suppose we have N = 3 t-tests and target α(T) = 0.05. The Bonferroni-corrected significance level is α(T)/N = 0.05/3 ≈ 0.0167. The unadjusted p-values are p1 = 0.001, p2 = 0.013, and p3 (above the corrected level).
- p1 = 0.001 < 0.0167, so reject H_01
- p2 = 0.013 < 0.0167, so reject H_02
- p3 > 0.0167, so do not reject H_03
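The example can be checked in a few lines of Python. Note that the slide does not give the value of p3, so 0.020 below is a hypothetical stand-in chosen to lie above the corrected cutoff:

```python
alpha_T = 0.05
p_values = [0.001, 0.013, 0.020]   # p3 = 0.020 is a hypothetical illustration value
threshold = alpha_T / len(p_values)  # Bonferroni-corrected level: 0.05/3

# Per-hypothesis decisions: reject H_0k iff p_k < alpha_T / N
rejected = [p < threshold for p in p_values]
print(rejected)  # [True, True, False]
```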


Holm
- N different null hypotheses H_01, ..., H_0N
- Calculate the corresponding p-values p_1, ..., p_N
- Order the p-values from smallest to largest, p(1) ≤ ... ≤ p(N)
- Starting with the smallest p-value, reject H(j) as long as p(j) < α/(N - j + 1)

Example of Holm's test

Suppose we have N = 3 t-tests and target α(T) = 0.05. The unadjusted p-values are p1 = 0.001, p2 = 0.013, and p3 (above 0.05). For the j-th smallest p-value, calculate α(j) = α(T)/(N - j + 1).

For test j = 1: α(1) = α(T)/(N - j + 1) = 0.05/(3 - 1 + 1) = 0.05/3 ≈ 0.0167.

For test j = 1, the observed p1 = 0.001 is less than α(1) ≈ 0.0167, so we reject the null hypothesis.

For test j = 2: α(2) = α(T)/(N - j + 1) = 0.05/(3 - 2 + 1) = 0.05/2 = 0.025. The observed p2 = 0.013 is less than α(2) = 0.025, so we reject the null hypothesis.

For test j = 3: α(3) = α(T)/(N - j + 1) = 0.05/(3 - 3 + 1) = 0.05. The observed p3 is greater than α(3) = 0.05, so we do not reject the null hypothesis.
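Holm's step-down procedure can be sketched as follows. This is a minimal illustration; since the slides do not give the value of p3, 0.060 is used as a hypothetical value larger than 0.05:

```python
def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: working from the smallest p-value upward,
    reject while p(j) < alpha/(N - j + 1); stop at the first failure."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * n
    for step, idx in enumerate(order):          # step = 0, 1, ..., n-1
        if p_values[idx] < alpha / (n - step):  # alpha/(N - j + 1) with j = step + 1
            reject[idx] = True
        else:
            break                               # accept this and all larger p-values
    return reject

# The example above: p1 = 0.001, p2 = 0.013, and a hypothetical p3 = 0.060
print(holm([0.001, 0.013, 0.060]))  # [True, True, False]
```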

Simes


Hochberg
- N different null hypotheses H_1, ..., H_N
- Calculate the corresponding p-values p_1, ..., p_N
- Order the p-values from smallest to largest, p(1) ≤ ... ≤ p(N)
- Start with the largest p-value. If p(N) < α, stop and declare all comparisons significant at level α (i.e. reject H(1), ..., H(N)). Otherwise accept H(N) and go to the next step.
- If p(N-1) < α/2, stop and declare H(1), ..., H(N-1) significant. Otherwise accept H(N-1) and go to the next step.
- In general, at step k: if p(N-k+1) < α/k, stop and declare H(1), ..., H(N-k+1) significant. Otherwise accept H(N-k+1) and continue.
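The step-up procedure described above can be sketched in Python (a minimal illustration, not a production implementation):

```python
def hochberg(p_values, alpha=0.05):
    """Hochberg's step-up procedure: find the largest j such that the j-th
    smallest p-value satisfies p(j) < alpha/(N - j + 1), then reject the
    hypotheses with the j smallest p-values."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * n
    for j in range(n, 0, -1):                     # start from the largest p-value
        if p_values[order[j - 1]] < alpha / (n - j + 1):
            for idx in order[:j]:                 # reject H(1), ..., H(j)
                reject[idx] = True
            break
    return reject

# Five simultaneous tests at overall level 0.05:
print(hochberg([0.009, 0.011, 0.012, 0.134, 0.512]))
```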

Example

Assume we performed N = 5 tests of hypothesis simultaneously and want the results at the 0.05 level. The p-values obtained were:

p(1) = 0.009
p(2) = 0.011
p(3) = 0.012
p(4) = 0.134
p(5) = 0.512

Bonferroni: 0.05/5 = 0.01. Since only p(1) is less than 0.01, we reject H(1) but accept the remaining hypotheses.

Holm: p(1), p(2) and p(3) are less than 0.05/5 = 0.01, 0.05/4 = 0.0125 and 0.05/3 ≈ 0.0167 respectively, so we reject the corresponding hypotheses H(1), H(2) and H(3). But p(4) = 0.134 > 0.05/2 = 0.025, so we stop and accept H(4) and H(5).

Hochberg: 0.512 is not less than 0.05, so we accept H(5). 0.134 is not less than 0.05/2 = 0.025, so we accept H(4). 0.012 is less than 0.05/3 ≈ 0.0167, so we reject H(1), H(2) and H(3).
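The three sets of conclusions above can be checked with a short self-contained script implementing each rule directly at overall level 0.05:

```python
p = [0.009, 0.011, 0.012, 0.134, 0.512]  # the ordered p-values from the example
alpha, n = 0.05, len(p)

# Bonferroni: compare every p-value with alpha/N
bonf = [pj < alpha / n for pj in p]

# Holm (step-down): reject while p(j) < alpha/(N - j + 1), then stop
holm = [False] * n
for j, pj in enumerate(p):
    if pj < alpha / (n - j):
        holm[j] = True
    else:
        break

# Hochberg (step-up): from the largest p-value, find the first j with
# p(j) < alpha/(N - j + 1) and reject H(1), ..., H(j)
hoch = [False] * n
for j in range(n, 0, -1):
    if p[j - 1] < alpha / (n - j + 1):
        hoch[:j] = [True] * j
        break

print(bonf)  # [True, False, False, False, False]
print(holm)  # [True, True, True, False, False]
print(hoch)  # [True, True, True, False, False]
```

As in the slide: Bonferroni rejects only H(1), while Holm and Hochberg both reject H(1), H(2) and H(3).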

Family-wise error rates

Example 1

Example 2

Methods for constructing multiple testing procedures


Fixed sequence

Fallback

Summary

Example 1


Example 2

Example: two tests. In a heart failure study, two tests were to be performed in the confirmatory analysis: one testing a treatment effect on death, and one testing symptomatic relief using a questionnaire (Quest).

Example (cont.):

Bonferroni: Set the significance level to 0.025 for death and 0.025 for Quest.

Holm: Calculate the two p-values. If the smaller is < 0.025, conclude an effect from that test. If then the larger p-value is < 0.05, conclude an effect from that test as well.

Hochberg: Calculate the two p-values. If the larger is < 0.05, conclude an effect for both variables. If not, and the smaller is < 0.025, conclude an effect from that test.

Example (cont.):

Closed test procedure: Choose one of the variables to test first (this must be pre-specified). Calculate the two p-values. If the p-value for the first variable is ≥ 0.05, then none of the variables are significant. If it is < 0.05, conclude an effect for the first variable and test the second variable, also at the 0.05 level.
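The two-endpoint procedure above can be sketched as a small gatekeeping function (the name `hierarchical_test` and the example p-values are ours, for illustration):

```python
def hierarchical_test(p_first, p_second, alpha=0.05):
    """Closed (fixed-sequence) testing of two pre-ordered endpoints:
    the second endpoint is tested, at the full level, only if the
    pre-specified first endpoint succeeds."""
    if p_first >= alpha:
        return {"first": False, "second": False}  # gate fails: nothing significant
    return {"first": True, "second": p_second < alpha}

print(hierarchical_test(0.030, 0.040))  # both endpoints significant
print(hierarchical_test(0.060, 0.001))  # neither: the first test failed
```

Note the trade-off this illustrates: each test is run at the full 0.05 level, but a promising second endpoint cannot be claimed if the first fails.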

Drug project example: Crestor (rosuvastatin)

Multiplicity issue: four drugs with multiple doses. The study was an open-label study that was planned to be post-NDA. STELLAR was a 15-arm parallel-group study comparing doses of rosuvastatin to doses of other statins: rosuva 10, 20, 40, 80 mg versus atorva 10, 20, 40, 80 mg versus prava 10, 20, 40 mg versus simva 10, 20, 40, 80 mg. The primary variable was percent change from baseline in LDL-C.

A commercial request was to compare rosuvastatin to the other statins dose-to-dose. To address this objective, 25 pairwise comparisons of interest were specified, and a Bonferroni correction was used to account for the multiple comparisons. The sample size was estimated taking the Bonferroni correction into account; it was a large study, with about n = 150 per arm. The choice of the conservative Bonferroni correction was influenced by the fact that a competitor had received a warning letter from the FDA for dose-to-dose promotion from a study that was not designed to do dose-to-dose comparisons.

There was no discussion with the FDA about correction for multiplicity in STELLAR. The results are considered robust, and they appear in the Crestor label.

References
1. Jones PH et al. Comparison of the efficacy and safety of rosuvastatin versus atorvastatin, simvastatin, and pravastatin across doses (STELLAR trial). Am J Cardiol 2003;92.
2. McKenney JM et al. Comparison of the efficacy of rosuvastatin versus atorvastatin, simvastatin, and pravastatin in achieving lipid goals: results from the STELLAR trial. Current Medical Research and Opinion 2003;19(8).