
1 Inference and Operational Conduct Issues with Sample Size Adjustment Based On Interim Observed Effect Size H.M. James Hung (DB1/OB/OPaSS/CDER/FDA) Lu Cui (Aventis Pharmaceuticals) Sue-Jane Wang (DB2/OB/OPaSS/CDER/FDA) John Lawrence (DB1/OB/OPaSS/CDER/FDA) Presented in Annual Symposium of New Jersey Chapter of ASA, Piscataway, NJ, June 4, 2002

2 Disclaimer The views expressed in this presentation are not those of the U.S. Food and Drug Administration, nor of Aventis Pharmaceuticals. Dr. Lu Cui was one of the primary investigators of this research during his tenure at FDA.

3 Acknowledgments The research was supported by FDA/CDER RSR Funds #96-010A and #99/…. Thanks are due to Dr. Lu Cui for sharing some of his slides.

4 Selected References in Adaptive Design/Interim Analysis Bauer & Köhne (1994, Biometrics); Bauer & Röhmel (1995, Stat. in Med.); Lan & Trost (1997, ASA Proceedings); Fisher (1998, Stat. in Med.); Posch & Bauer (1999, Biometrical J.); Kieser, Bauer & Lehmacher (1999, Biometrical J.); Lehmacher & Wassmer (1999, Biometrics); Müller & Schäfer (2001, Biometrics); Berry (2002, ASA Biopharmaceutical Report); Brannath, Posch & Bauer (2002, JASA); … etc.

5 The materials of this presentation are selected from the main results of our RSR research work. Cui, Hung, Wang (1997 ASA; 1999 Biometrics) Lawrence & Hung (2002 ENAR talk)

6 Background Sample size (or the amount of statistical information) is one of the design specifications vital to the success of Phase II/III (confirmatory?) clinical trials. It relates directly and closely to the true effect size (treatment difference normalized by a measure of variability) of the targeted response variable.

7 Background Common recommendation: make an "educated guess" about the effect size and plan the sample size to detect this effect size (or a range of plausible effect sizes) with sufficient power [e.g., > 90%; Hung et al (1997, Biometrics)]. This is always good because the fixed-information design 1) provides statistics that have important, well-understood statistical properties, and 2) avoids data-driven adjustments that may induce biases (statistical or operational) and make the results uninterpretable.

8 Biases → use internal data to: change the selection of patients; tune up endpoints; drop/select centers; change the patient mixture; do more data dredging or torturing to adjust the analysis toward the desired conclusion; eliminate potential dropouts; change the design to make treatment-related problems go away; tune up any design element to make the treatments easily differentiated; adjust or sample to a foregone conclusion; …

9 Background But … the effect size depends on a primary parameter (e.g., mean treatment difference) and nuisance parameters (e.g., standard deviation, background event rate). The effect size for detection may need to be clinically significant or meaningful (sometimes the minimum clinically meaningful effect) → a benefit/risk assessment (subjective) that might not be doable when designing the trial, and on which consensus is hard to reach.

10 Background But … the effect size may depend on patient mixtures → potentially heterogeneous effects in subpopulations. For a hard clinical outcome endpoint, an "educated guess" about the effect size is difficult; e.g., a composite event endpoint requires an "educated" guess of where the potential signal lies and what the noise may be.

11 Background But … the effect size for detection may depend on $$ → benefit/risk/cost considerations, etc. Practical considerations → the effect size for detection can be a moving target that changes as background circumstances change, and the maximum amount of statistical information one can commit to may also change.

12 Background Experiences: clinical trial designs and inferences are often oversimplified, and too many restrictions are imposed on the designs. If a trial fails, it is difficult to know whether the treatment does not have an important effect or the study was underpowered to detect it.

13 Background Lan (2001, FDA/OB Mini-Symposium): if we knew the values of the design elements (e.g., effect size) a priori, there would be No Need, and it would Not be Ethical, to conduct a confirmatory trial. Bauer et al (2002, Method Inform Med): "… It does not make sense to apply the uniformly most powerful test in an unchanged design even if we have convincing evidence that this 'best' test in the preplanned design may be severely underpowered …"

14 Need to enhance flexibility in the traditional clinical trial design/analysis strategy, because practical considerations may change and are often unpredictable at the design stage.

15 Emerging Strategy
Mid-course modification of design specifications:
- adjust sample size
- change the tested hypothesis from superiority to non-inferiority, or vice versa
- change from one pre-specified primary endpoint to another pre-specified endpoint
- change the test method
- drop a treatment arm
- … etc.

16 Impact of Design Modification Based on Interim Observed Data
- Type I error rate may greatly exceed the acceptable level
- Statistical power may be compromised
- The traditional estimate may be severely biased

17 Sample Size Re-estimation
The literature on sample size re-estimation is abundant. Increasing sample size (or the amount of statistical information) based on nuisance parameters, without breaking the blind:
- has little effect on the type I error
- may preserve the intended power level
- needs little or mild statistical adjustment (e.g., of the estimate, CI)
Wittes & Brittain (1990), Gould (1992), Gould & Shih (1992), Shih (1992, 1993, 1995), Birkett & Day (1994), Jennison & Turnbull (1999, book), … etc.
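The blinded route above can be sketched in a few lines of Python (a minimal illustration, not from the slides; the function name and the SD values are my assumptions): the per-group N is simply recomputed with the blinded pooled SD in place of the design value, without ever opening the treatment codes.

```python
# Sketch of blinded sample size re-estimation from a nuisance parameter:
# recompute per-group N for a one-sided two-sample z-test using the
# pooled (blinded) SD observed mid-trial.  Function name and SD values
# are illustrative assumptions, not from the presentation.
from math import ceil
from statistics import NormalDist

def blinded_reestimate(pooled_sd, delta_raw, alpha=0.025, power=0.90):
    """Per-group N = 2 * ((z_alpha + z_beta) * sigma / delta)^2, rounded up."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)
    return ceil(2 * (z * pooled_sd / delta_raw) ** 2)

# Planned with SD 1.0 and raw difference 0.46 -> N = 100 per group;
# if the blinded interim SD comes out at 1.2, N grows accordingly.
print(blinded_reestimate(1.0, 0.46))  # 100
print(blinded_reestimate(1.2, 0.46))
```

Because only the pooled variability is used, this kind of adjustment leaves the type I error essentially untouched, which is the point of the slide.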

18 Sample Size Re-estimation But … Lan (1997, ASA talk), Liu (2000, ICSA talk): e.g., knowing the components of the variance can lead to estimation of the treatment difference; hence sample size re-estimation based on the variance might affect the type I error, depending on how it is processed (e.g., by obtaining the total and within-group sums of squares, TSS & WSS).

19 Sample Size Re-estimation Increasing sample size (or the amount of statistical information) based on the internal data path may substantially inflate the type I error, bias the estimate, and invalidate the CI. A crude estimate of the maximum amount of inflation is obtainable, at least by simulation.

20 Sample Size Re-estimation Question: at an interim time of a trial, if the observed treatment difference is far smaller than expected, we may wish to increase the sample size. What adjustments are then needed to perform valid statistical testing?

21 Selected References Bauer & Köhne (1994, Biometrics); Proschan & Hunsberger (1995, Biometrics); Lan & Trost (1997, ASA Proceedings); Cui, Hung & Wang (1997, ASA Proceedings; 1999, Biometrics); Fisher (1998, Stat. in Med.); Shen & Fisher (1999, Biometrics); Lehmacher & Wassmer (1999, Biometrics); Müller & Schäfer (2001, Biometrics); Liu & Chi (2001, Biometrics); Lan (2001, FDA/CDER/OB mini-symposium); Lan (2002, FDA/ASA workshop); Brannath, Posch & Bauer (2002, JASA); Lawrence & Hung (2002, ENAR)

22 Sample Size Re-estimation
Experimental (T) with N subjects; Control (C) with N subjects, followed from baseline.
Test H0: δ = 0 vs. H1: δ > 0, where δ = μT − μC and σ = 1.
To detect δ = Δ at significance level α and power 1 − β:
N (per group) = 2(z_α + z_β)²/Δ²
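The per-group formula above can be checked numerically (a minimal sketch; the helper name is mine, and the one-sided α = 0.025 is inferred from the 1.96 cutoff used later in the talk):

```python
# Sanity check of N = 2 * (z_alpha + z_beta)^2 / Delta^2 with sigma = 1.
# Helper name is mine; alpha = 0.025 (one-sided) is an assumption
# consistent with the 1.96 critical value used on later slides.
from math import ceil
from statistics import NormalDist

def per_group_n(delta, alpha=0.025, power=0.90):
    """Per-group sample size for a one-sided two-sample z-test, sigma = 1."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha)
    z_b = nd.inv_cdf(power)
    return 2 * (z_a + z_b) ** 2 / delta ** 2

print(ceil(per_group_n(0.46)))  # 100, matching the worked example that follows
```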

23 Sample Size Re-estimation (non-sequential trial) Plan to enroll N = 100 subjects/group to detect δ = 0.46 at α = 0.025 and power 90%. After 40 subjects per group contribute data, the estimate of δ leads to δ* = 0.37. Re-estimate the total sample size: M = 150/group.

24 Sample Size Re-estimation (non-sequential trial) At the end of the trial (M = 150), compute the CHW adaptive test [Cui, Hung & Wang, 1999]: U = (40/100)^{1/2} Z_0.40 + (60/100)^{1/2} W_0.60, where W_0.60 is the normalized test statistic for the additional 110 subjects per group enrolled after the interim time t = 0.4. Lan (2001, FDA/CDER/OB Mini-symposium; 2002, FDA/ASA workshop)

25 Sample Size Re-estimation (non-sequential trial) U is standard normal under H0. If U > 1.96, then conclude δ > 0; significance level = 0.025. U is more powerful than the original test Z without the increase in N.

26 Sample Size Re-estimation (non-sequential trial) Estimation & CI for δ [Lawrence & Hung (2002, ENAR talk)]: construct a consistent estimator and a valid CI for δ; the CHW test is the Z-ratio of the consistent estimator.

27 Sample Size Re-estimation (group sequential trial)
Experimental (T) with N subjects; Control (C) with N subjects, followed from baseline.
Interim analyses at 20% of information (IA-1, N/5 per group) and 40% (IA-2, 2N/5 per group); final analysis at 100% (N per group).
Test H0: δ = 0 vs. H1: δ > 0, where δ = μT − μC.

28 Sample Size Re-estimation (group sequential trial) N is planned to detect δ = Δ at level α and with power 1 − β. At interim time s, estimate δ_s; if the new target satisfies 0 < δ* < Δ (say, chosen based on conditional power) → increase the sample size from N to M, approximately M = N(Δ/δ*)². The total information changes from 1 to τ = M/N. Compute b = (τ − s)/(1 − s).

29 Sample Size Re-estimation (group sequential trial) For an interim analysis at time t, when M_t subjects contribute information, compute N_t = (M_t − Ns)/b + Ns and t = N_t/N. Adapt the traditional repeated significance test:
Traditional: Z_t = Z_s (Ns/M_t)^{1/2} + W_{t−s} (1 − Ns/M_t)^{1/2}
New: U_t = Z_s (Ns/N_t)^{1/2} + W_{t−s} (1 − Ns/N_t)^{1/2}
Cui, Hung, Wang (1999, Biometrics)

30 Sample Size Re-estimation (group sequential trial) {U_t} with N possibly changed to M and {Z_t} without the change of N have identical distributions. Find the critical value C_t at time t based on the initially selected alpha-spending function. Reject H0 if U_t > C_t; otherwise, the trial continues. Cui, Hung, Wang (1999, Biometrics)
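The bookkeeping on slides 28–29 can be sketched as follows (helper names are mine); the printed numbers reproduce the worked example on slides 35–36:

```python
# Bookkeeping for the group-sequential CHW adaptive test: map the
# observed per-group count M_t back to the original information scale
# (N_t, t), then combine the interim and incremental statistics with
# the pre-specified weights.  Helper names are illustrative.
from math import sqrt

def adaptive_time(N, s, M, M_t):
    """Return (N_t, t): N_t = (M_t - N*s)/b + N*s, t = N_t/N."""
    tau = M / N                    # total information changes from 1 to tau
    b = (tau - s) / (1 - s)
    N_t = (M_t - N * s) / b + N * s
    return N_t, N_t / N

def chw_statistic(z_s, w_incr, N, s, N_t):
    """U_t = Z_s*(Ns/N_t)^{1/2} + W_{t-s}*(1 - Ns/N_t)^{1/2}."""
    frac = N * s / N_t
    return z_s * sqrt(frac) + w_incr * sqrt(1 - frac)

# Worked example: N = 100, interim at s = 0.5, M raised to 150,
# next analysis when M_t = 100 subjects/group have data.
print(adaptive_time(N=100, s=0.5, M=150, M_t=100))  # (75.0, 0.75)
```

U_t is then compared with the boundary value C_t read off the originally chosen spending function at t, exactly as for the unmodified trial.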

31 [Figure: Empirical type I error rate of the adaptive test (Gaussian data; N increased by up to 4×; O'Brien–Fleming boundary)]

32 [Figure: Empirical power of the adaptive test (Gaussian data; N increased by up to 4×; O'Brien–Fleming boundary)]

33 [Figure: Empirical type I error rate of the adaptive test (Binomial data, πC = 0.20; N increased by up to 4×; O'Brien–Fleming boundary)]

34 [Figure: Empirical power of the adaptive test (1 − β = 0.60 without N increase; Binomial data, πC = 0.20; N increased by up to 4×; O'Brien–Fleming boundary)]

35 Sample Size Re-estimation (Example: group sequential trial) Plan to have N = 100 subjects/group to detect δ = 0.46 at α = 0.025 and power 90%. After 50 subjects per group contribute data (s = 0.5), the estimate suggests detecting δ* = 0.46/√2. Re-estimate the sample size: M = 150/group, so τ = 1.5 and b = (1.5 − 0.5)/(1 − 0.5) = 2.

36 Sample Size Re-estimation (Example: group sequential trial) Suppose an interim analysis will be done when an additional 50 subjects/group contribute data (M_t = 100). Then N_t = (100 − 50)/2 + 50 = 75 and t = 0.75. Suppose the O'Brien–Fleming alpha-spending function was originally chosen for the interim analyses. Then the critical value for the adaptive test at t = 0.75 is C_0.75 = 2.36.

37 Sample Size Re-estimation (Example: group sequential trial) The adaptive test at M_0.75 = 100 is U_0.75 = Z_0.50 (2/3)^{1/2} + W_0.25 (1/3)^{1/2}, where W_0.25 is the normalized test performed on the additional 50 subjects per group. If U_0.75 > 2.36, then stop the trial and conclude that the experimental treatment is superior to control.

38 Sample Size Re-estimation (Example: group sequential trial) If the trial continues to the end, the final adaptive test (i.e., at M_1 = 150) is U_1 = Z_0.50 (1/2)^{1/2} + W_0.50 (1/2)^{1/2}, where W_0.50 is the normalized test performed on the additional 100 subjects per group. If U_1 > 2.01, then conclude that the experimental treatment is superior to control.

39 Sample Size Re-estimation The CHW adaptive test attains the type I error rate at the targeted level, yields a large power increase (relative to no re-estimation), and is very easy to implement. A consistent estimator and a confidence interval compatible with the CHW adaptive test are readily available. All the above discussions are based on asymptotic (i.e., "sufficiently large" sample size) theory. Cui, Hung, Wang (1999, Biometrics); Lawrence & Hung (2002, ENAR talk)

40 Sample Size Re-estimation The CHW adaptive test reduces to the conventional test if the sample size is not changed; so do the compatible consistent estimator and confidence interval. Cui, Hung, Wang (1999, Biometrics); Lawrence & Hung (2002, ENAR talk)

41 Sample Size Re-estimation The CHW adaptive test can also be viewed as a combination of p-values from the incremental group data [Lehmacher & Wassmer (1999); Brannath, Posch & Bauer (2002)].

42 Sample Size Re-estimation Sample size re-estimation criterion: after obtaining the observed δ_s at time s, one could recalculate the sample size M such that the conditional power CP(δ*) = Pr{CHW rejects H0 at the end | δ_s, δ = δ*} = 1 − β for the new intended δ*. Then the power of CHW for detecting δ* is at least 1 − β. Lawrence (2002, personal communication)
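For the two-stage (non-sequential) version of the CHW test, the conditional-power criterion can be sketched as follows; the helper name, the final cutoff c = 1.96, and the illustrative interim values are my assumptions, not from the slides:

```python
# Sketch of the conditional-power re-estimation rule for the two-stage
# CHW test U = sqrt(t1)*Z_1 + sqrt(1 - t1)*W, final cutoff c: given the
# observed interim Z_1, choose the second-stage per-group size m2 so that
# Pr{U > c | Z_1, delta = delta_star} = 1 - beta.  Names and the
# illustrative inputs are assumptions for this sketch.
from math import ceil
from statistics import NormalDist

def second_stage_n(z1, t1, delta_star, c=1.96, beta=0.10):
    """Per-group second-stage size making conditional power 1 - beta."""
    nd = NormalDist()
    # U > c requires the incremental statistic W to exceed this threshold:
    w_cut = (c - t1 ** 0.5 * z1) / (1 - t1) ** 0.5
    z_b = nd.inv_cdf(1 - beta)
    # Under delta = delta_star, W ~ N(delta_star * sqrt(m2/2), 1), so
    # solve delta_star * sqrt(m2/2) = w_cut + z_b for m2:
    return ceil(2 * ((w_cut + z_b) / delta_star) ** 2)

# Illustration with assumed interim values (not from the slides):
print(second_stage_n(z1=1.0, t1=0.4, delta_star=0.37))
```

Because the pre-specified weights √t1 and √(1 − t1) are untouched, this choice of m2 changes the power but not the level of the test.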

43 Sample Size Re-estimation Sample size re-estimation criterion: it is better to look for a more stable signal via examination of the sample path over time, to reduce the chance of being misled by possible aberrations in the early data.

44 Operational Conduct Issues Sample size re-estimation based on unblinded data opens the door to operational biases. During an adaptive change, unblind only the data that must be unblinded, in order to limit operational bias. Standard Operating Procedures (SOPs) must be in place in the protocol, and trial conduct must comply with the SOPs.

45 Operational Conduct Issues Sample size re-estimation based on unblinded data opens the door to multiple analyses that may lead to more protocol amendments and other changes of design elements. These changes or amendments (potentially driven by the current data) may introduce problems in the interpretation of the results. Attention should be given to this potential hazard of the design.

46 Operational Conduct Issues But … most of the operational conduct issues related to sample size re-estimation based on unblinded data are also encountered in traditional designs with interim analyses.

47 Operational Conduct Issues Recommend that sample size modification be done, if needed, by an independent third party that has no conflict of interest. Whatever takes place after a sample size (or any design) modification needs to be documented fully.

48 Other Issues Estimation following a data-driven sample size change is an important issue, particularly because the effect estimate may be used to plan future superiority trials or active-control non-inferiority trials. Careful consideration of the benefit and risk of such a sample size modification is needed in practical applications.

49 Other Issues Bauer et al (2002, Method Inform Med): "… Clearly in such designs more logistics must be put in to properly handle all problems of interim analyses including their consequences for the design. They need rigid rather than flexible planning modalities. …"

50 Summary
The conventional fixed-information design has made tremendous contributions to clinical science, and its statistics have good statistical properties. This design needs to permit sample size adjustment when it falls short (perhaps by surprise), as in many applications, e.g., in studying endpoints for which:
- prior data are poor or cannot support a reasonably good educated guess about the effect size
- a minimum clinically meaningful effect is not available

51 Summary
A conventional design properly adapted for necessary sample size adjustment can be very useful, with:
- proper planning to avoid any operational conduct change that may lead to bias
- proper adjustment of the statistical analysis method
The estimation issue needs attention and research.

52 Summary
An adaptive design with proper planning is very attractive, with caution:
- change sample size or randomization allocation
- change the study hypothesis (e.g., superiority vs. non-inferiority or equivalence)
- change the test method
- change the primary endpoint from one pre-specified endpoint to another pre-specified one
- drop futile or unsafe treatment arms
and more … → More work to do!