Use of Candidate Predictive Biomarkers in the Design of Phase III Clinical Trials
Richard Simon, D.Sc., Chief, Biometric Research Branch, National Cancer Institute

Predictive Biomarkers
Measured before treatment to identify who is likely or unlikely to benefit from a particular treatment.
Examples: ER, HER2, KRAS, EGFR.

Biomarker Validity
Analytical validity: the test measures what it is supposed to measure, and it is reproducible and robust.
Clinical validity (correlation): the marker correlates with a clinical characteristic or outcome.
Medical utility: the result is actionable and leads to patient benefit.

Developing a drug with a companion test increases the complexity and cost of development. However, it should improve the chance of success, and it has substantial benefits for patients and for the economics of health care. How can we do it in a way that provides the kind of reliable answers we expect from phase III trials?

When the Biology is Clear
1. Develop a completely specified classifier of the patients likely (or unlikely) to benefit from a new drug. The classifier is based on either a single gene/protein or a composite score.
2. Develop an analytically validated test.
3. Design a focused clinical trial to evaluate the effectiveness of the new treatment and how it relates to the test.

Targeted (Enrichment) Design
Using phase II data, develop a predictor of response to the new drug. Patients predicted to be responsive are randomized between the new drug and control. Patients predicted to be non-responsive are off study.

Evaluating the Efficiency of Targeted Design
Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10, 2004; correction and supplement 12:3229, 2006.
Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24, 2005.

The relative efficiency of the targeted design depends on two things. The first is the proportion of patients who test positive. The second is the effectiveness of the new drug (compared to control) for test-negative patients.
Suppose fewer than half of patients test positive and the drug has little or no benefit for test-negative patients. Then the targeted design requires dramatically fewer randomized patients than the standard design, in which the marker is not used.
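A rough sketch of why these two quantities matter. It assumes a simplified model, not the exact calculations of the papers cited above:
- the standard design sees only the prevalence-weighted average of the treatment effects;
- the outcome variance is the same in both designs;
- the required sample size scales as 1/effect².

The function name and its arguments are hypothetical.

```python
import math

def relative_efficiency(prop_positive, effect_pos, effect_neg=0.0):
    """Approximate ratio of randomized patients, standard design / targeted design.

    Simplified model (assumption): the untargeted trial estimates the
    prevalence-weighted average effect with the same outcome variance,
    and the required sample size scales as 1 / effect**2.
    """
    diluted = prop_positive * effect_pos + (1.0 - prop_positive) * effect_neg
    if diluted <= 0:
        return math.inf  # no average effect left for the standard design to detect
    return (effect_pos / diluted) ** 2

for p in (0.25, 1 / 3, 0.5, 0.75):
    print(f"{p:.2f} test positive: standard/targeted = {relative_efficiency(p, 1.0):.1f}")
```

With no benefit in test-negative patients the ratio is 1/p². For example, it is about 9 when a third of patients test positive. In return, the targeted design must screen roughly 1/p patients for each patient it randomizes.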

Comparing T vs C on Survival or DFS: Events Required for 90% Power at a 5% Two-Sided Significance Level
% Reduction in Hazard    Number of Events Required
25%                      509
30%                      332
35%                      227
40%                      162
45%                      118
50%                      88
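The table can be approximated with Schoenfeld's formula for a 1:1 randomized log-rank comparison: events = 4(z₁₋α/₂ + z₁₋β)² / (log HR)². The slide does not state which formula it used, so this is an assumed reconstruction. It matches the table exactly from 35% upward, and is one event lower at 25% and 30%. That gap likely reflects rounding or a slightly different approximation. The function name is hypothetical.

```python
from math import ceil, log
from statistics import NormalDist

def events_required(hazard_reduction, alpha=0.05, power=0.90):
    """Schoenfeld approximation: events needed for a 1:1 log-rank test (two-sided alpha)."""
    hr = 1.0 - hazard_reduction
    z = NormalDist().inv_cdf
    return ceil(4 * (z(1 - alpha / 2) + z(power)) ** 2 / log(hr) ** 2)

for r in (0.25, 0.30, 0.35, 0.40, 0.45, 0.50):
    print(f"{r:.0%}  {events_required(r)}")
```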

Hazard ratio 0.60 for test-positive patients (40% reduction in hazard).
Hazard ratio 1.0 for test-negative patients (0% reduction in hazard).
33% of patients test positive.
The hazard ratio for the unselected population is approximately $0.33 \times 0.60 + 0.67 \times 1.0 \approx 0.87$, i.e. a 13% reduction in hazard.

To have 90% power for detecting a 40% reduction in hazard within a biomarker-positive subset, 162 events are needed within the subset.
To have 90% power for detecting a 13% reduction in hazard overall, 2172 events are needed.
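A short sketch of the dilution arithmetic above. It uses the same linear, prevalence-weighted approximation of the overall hazard ratio, and assumes Schoenfeld's formula for event counts. That formula gives about 2168 events for HR 0.87, close to the slide's 2172. Either way, the overall comparison needs roughly 13 times as many events as the biomarker-positive subset. Names are illustrative.

```python
from math import ceil, log
from statistics import NormalDist

def events_for_hr(hr, alpha=0.05, power=0.90):
    """Schoenfeld approximation: events for a 1:1 log-rank test of hazard ratio hr."""
    z = NormalDist().inv_cdf
    return ceil(4 * (z(1 - alpha / 2) + z(power)) ** 2 / log(hr) ** 2)

prevalence = 0.33
hr_pos, hr_neg = 0.60, 1.0
# Prevalence-weighted hazard ratio (the linear approximation used above)
hr_overall = prevalence * hr_pos + (1 - prevalence) * hr_neg

subset_events = events_for_hr(hr_pos)
overall_events = events_for_hr(round(hr_overall, 2))  # 0.868 rounded to 0.87
print(round(hr_overall, 3), subset_events, overall_events,
      round(overall_events / subset_events, 1))
```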

Stratification Design
Develop a predictor of response to the new Rx. Patients predicted to be responsive to the new Rx and patients predicted to be non-responsive are each randomized between the new Rx and control.

Develop a prospective analysis plan for evaluating the treatment effect and how it relates to the biomarker.
The type I error should be protected for multiple comparisons.
The trial is sized for evaluating the treatment effect overall and in the subsets defined by the test.
"Stratifying" (balancing) the randomization is useful to ensure that all randomized patients have the test performed. It is not necessary for the validity of comparing treatments within marker-defined subsets.
Post-stratification provides more time for development of an analytically validated test. However, it risks the validity of the results if adequate specimens are not collected from essentially 100% of patients.

Fallback Analysis Plan
Compare the new drug to the control overall for all patients, ignoring the classifier.
If $p_{\text{overall}} \le 0.01$, claim effectiveness for the eligible population as a whole.
Otherwise, perform a single subset analysis evaluating the new drug in the classifier-positive patients.
If $p_{\text{subset}} \le 0.04$, claim effectiveness for the classifier-positive patients.
Because 0.01 + 0.04 = 0.05, the overall type I error is held at 5% by the Bonferroni inequality.
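A minimal sketch of the decision rule, with a Monte Carlo check of its type I error under the global null. For simplicity the check assumes the two p-values are independent and uniform. In a real trial the overall and subset statistics are positively correlated, but the Bonferroni bound of 0.05 holds regardless of dependence. The function name and return strings are hypothetical.

```python
import random

def fallback_decision(p_overall, p_subset, alpha_overall=0.01, alpha_subset=0.04):
    """Fallback analysis plan: overall test first, then one pre-specified subset test."""
    if p_overall <= alpha_overall:
        return "effective in eligible population"
    if p_subset <= alpha_subset:
        return "effective in classifier-positive patients"
    return "no claim"

# Global null, assuming independent uniform p-values (a simplification)
rng = random.Random(1)
trials = 200_000
false_claims = sum(
    fallback_decision(rng.random(), rng.random()) != "no claim" for _ in range(trials)
)
print(false_claims / trials)  # expected to be near 0.01 + 0.99 * 0.04 = 0.0496
```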

Sample size for Analysis Plan
To have 90% power for detecting a uniform 33% reduction in overall hazard at a 1% two-sided level requires 370 events.
If 33% of patients are positive, then when there are 370 total events there will be approximately 123 events in positive patients.
123 events provide 90% power for detecting a 45% reduction in hazard at a 4% two-sided significance level.

To detect a 40% reduction in hazard in an a priori defined subset with 90% power and a 5% significance level requires 162 events in the subset.
To detect a 40% reduction in hazard in an a priori defined subset with 90% power and a 4% two-sided significance level requires 171 events in the subset.
If the prevalence of the marker is 33%, the trial might be sized for 3 × 171 = 513 total events.
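The numbers on the last two slides can be checked approximately with Schoenfeld's event formula and the matching normal-approximation power formula. Both are assumed here, since the slides do not name their method. The sketch gives about 372 events where the slide says 370, and about 90% power for 123 positive-patient events. It also reproduces 171 and 513 exactly. The positive-patient share is approximated as one third, and the helper names are hypothetical.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

N = NormalDist()

def events_required(hazard_reduction, alpha, power=0.90):
    """Schoenfeld approximation: events for a 1:1 log-rank test at two-sided alpha."""
    theta = log(1.0 - hazard_reduction)
    return ceil(4 * (N.inv_cdf(1 - alpha / 2) + N.inv_cdf(power)) ** 2 / theta ** 2)

def power_given_events(events, hazard_reduction, alpha):
    """Approximate power of a two-sided level-alpha log-rank test with `events` events."""
    theta = abs(log(1.0 - hazard_reduction))
    return N.cdf(sqrt(events / 4) * theta - N.inv_cdf(1 - alpha / 2))

overall = events_required(0.33, alpha=0.01)          # slide: 370
positive_events = round(370 / 3)                     # slide: about 123
subset = events_required(0.40, alpha=0.04)           # slide: 171
print(overall, positive_events,
      round(power_given_events(positive_events, 0.45, 0.04), 3),
      subset, 3 * subset)                            # slide: 90% power; 171; 513
```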

R Simon. Using genomics in clinical trial design. Clinical Cancer Research 14, 2008.
R Simon. Designs and adaptive analysis plans for pivotal clinical trials of therapeutics and companion diagnostics. Expert Opinion on Medical Diagnostics 2:721-29, 2008.

Web Based Software for Planning Clinical Trials of Treatments with a Candidate Predictive Biomarker

The Biology is Often Not So Clear
Cancer biology is complex. It is not always possible to have the right single, completely defined predictive classifier identified and analytically validated by the time the pivotal trial of a new drug is ready to start accrual.

K Candidate Biomarkers Design
Based on the Adaptive Threshold Design: W Jiang, B Freidlin & R Simon, JNCI 99, 2007

K Candidate Biomarkers Design
We have identified K candidate binary classifiers $B_1, \ldots, B_K$ thought to be predictive of patients likely to benefit from T relative to C.
Eligibility is not restricted by the candidate markers.

Compare T vs C for all patients.
If the results are significant at level 0.01, claim broad effectiveness of T.
Otherwise, proceed as follows:
Compare T vs C for the subset of patients positive for marker 1; compute $p_1$.
Similarly, compare T vs C for the subsets of patients positive for marker 2 ($p_2$), marker 3 ($p_3$), …, marker K ($p_K$).
Compute $p^* = \min\{p_1, p_2, \ldots, p_K\}$.
Determine whether $p^*$ is statistically significant when adjusted for multiple testing. The adjustment uses permutation of the treatment labels, so that it accounts for the correlation among the tests.
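A minimal sketch of the subset-testing step: computing $p^*$ and its permutation-adjusted significance, in the spirit of the Westfall-Young min-p approach. It makes three assumptions:
- a binary response with a two-proportion z-test stands in for the survival comparison on the slides;
- the overall 0.01 test has already failed;
- all function names are hypothetical.

The adjusted value would then be compared with the pre-specified subset level.

```python
import random
from math import sqrt
from statistics import NormalDist

def subset_p(resp, treat, mask):
    """Two-sided two-proportion z-test p-value, T vs C, among patients with mask True.

    Stand-in for the survival comparison on the slides (assumption: binary response).
    """
    t = [r for r, a, m in zip(resp, treat, mask) if m and a]
    c = [r for r, a, m in zip(resp, treat, mask) if m and not a]
    if not t or not c:
        return 1.0
    pooled = (sum(t) + sum(c)) / (len(t) + len(c))
    se = sqrt(pooled * (1 - pooled) * (1 / len(t) + 1 / len(c)))
    if se == 0:
        return 1.0
    z = (sum(t) / len(t) - sum(c) / len(c)) / se
    return 2 * NormalDist().cdf(-abs(z))

def min_p(resp, treat, markers):
    """p* = minimum over the K candidate markers of the marker-positive subset p-values."""
    return min(subset_p(resp, treat, m) for m in markers)

def permutation_adjusted_min_p(resp, treat, markers, n_perm=1000, seed=0):
    """Permuting treatment labels keeps the correlation among overlapping subsets intact."""
    rng = random.Random(seed)
    observed = min_p(resp, treat, markers)
    labels = list(treat)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if min_p(resp, labels, markers) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical data: T improves response only for patients positive for marker 1 (index 0)
rng = random.Random(42)
n, K = 600, 4
markers = [[rng.random() < 0.3 for _ in range(n)] for _ in range(K)]
treat = [i % 2 == 0 for i in range(n)]
resp = [rng.random() < (0.75 if (treat[i] and markers[0][i]) else 0.20) for i in range(n)]
p_star, p_adj = permutation_adjusted_min_p(resp, treat, markers, n_perm=200)
print(p_star, p_adj)
```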

With K = 4 candidate markers, adjusting for multiplicity (4 independent tests) increases the required events from 171 to 224 in the subset and from 513 to 672 in total.

Designs When there are Many Candidate Markers and too Much Patient Heterogeneity for any Single Marker

Adaptive Signature Design
Boris Freidlin and Richard Simon, Clinical Cancer Research 11:7872-8, 2005

Biomarker Adaptive Signature Design
Randomized trial of T vs C.
A large number of candidate predictive biomarkers are available.
Eligibility is not restricted by any biomarker.
This approach can be used with any set of candidate markers.

End of Trial Analysis
Fallback analysis: compare T to C for all patients at significance level $\alpha_0$ (e.g., 0.01).
If the overall $H_0$ is rejected, claim effectiveness of T for eligible patients.
Otherwise, proceed as follows.

Use only a randomly selected subset of patients of pre-specified size (e.g., 1/3) as a training set. From it, develop a binary classifier M of whether a patient is likely to benefit from T relative to C.
The classifier may use multiple markers.
The classifier assigns patients to only two subsets: those predicted to benefit from T, and those for whom T is not predicted to be better than C.

Apply the classifier M to the patients in the validation set V, which consists of the patients in the full dataset D who were not in the training set.
Compare T vs C in the subset of V predicted to benefit from T, using a significance threshold of 0.04.
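A minimal sketch of the adaptive signature design analysis just described, under several assumptions not in the slides:
- a binary response and a two-proportion z-test replace the actual outcome and test;
- `train_classifier` is a hypothetical classifier. It keeps markers whose T-vs-C effect among marker-positive patients exceeds that among marker-negative patients by `delta`, and predicts benefit for anyone positive for a kept marker.
- A claim is made only when T is observed to be better.

The published design allows any pre-specified classifier-development algorithm.

```python
import random
from math import sqrt
from statistics import NormalDist

def arm_rates(resp, treat, idx):
    """Response rates and sizes (T, C) among patients idx; None if an arm is empty."""
    t = [resp[i] for i in idx if treat[i]]
    c = [resp[i] for i in idx if not treat[i]]
    if not t or not c:
        return None
    return sum(t) / len(t), sum(c) / len(c), len(t), len(c)

def rate_diff(resp, treat, idx):
    r = arm_rates(resp, treat, idx)
    return 0.0 if r is None else r[0] - r[1]

def benefit_p(resp, treat, idx):
    """Two-sided z-test p-value; returns 1.0 unless T is observed to be better than C."""
    r = arm_rates(resp, treat, idx)
    if r is None or r[0] <= r[1]:
        return 1.0
    p_t, p_c, n_t, n_c = r
    pooled = (p_t * n_t + p_c * n_c) / (n_t + n_c)
    se = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    return 2 * NormalDist().cdf(-(p_t - p_c) / se)

def train_classifier(resp, treat, markers, idx, delta=0.2):
    """Hypothetical classifier built only from the training patients idx."""
    chosen = [k for k, m in enumerate(markers)
              if rate_diff(resp, treat, [i for i in idx if m[i]])
              - rate_diff(resp, treat, [i for i in idx if not m[i]]) > delta]
    # Predict benefit from T if the patient is positive for any chosen marker
    return lambda i: any(markers[k][i] for k in chosen)

def adaptive_signature_design(resp, treat, markers, train_frac=1/3,
                              alpha_overall=0.01, alpha_subset=0.04, seed=0):
    everyone = list(range(len(resp)))
    # Fallback: overall comparison of T vs C at alpha_overall
    if benefit_p(resp, treat, everyone) <= alpha_overall:
        return "T effective for eligible population"
    # Random training subset of pre-specified size; the rest is the validation set V
    order = everyone[:]
    random.Random(seed).shuffle(order)
    n_train = int(train_frac * len(order))
    train, valid = order[:n_train], order[n_train:]
    predicts_benefit = train_classifier(resp, treat, markers, train)
    sensitive = [i for i in valid if predicts_benefit(i)]
    if sensitive and benefit_p(resp, treat, sensitive) <= alpha_subset:
        return "T effective for classifier-positive patients"
    return "no claim"

# Hypothetical data: marker index 0 flags the 30% sensitive patients
rng = random.Random(7)
n, K = 600, 5
markers = [[rng.random() < 0.3 for _ in range(n)] for _ in range(K)]
treat = [i % 2 == 0 for i in range(n)]
resp = [rng.random() < (0.70 if (treat[i] and markers[0][i]) else 0.25) for i in range(n)]
print(adaptive_signature_design(resp, treat, markers))
```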

This approach can also be used to identify the subset of patients who don't benefit from T in cases where T is superior to C overall at the 0.01 level.

Cross-Validated Adaptive Signature Design
Freidlin B, Jiang W, Simon R. Clinical Cancer Research 16(2), 2010

At the conclusion of the trial, randomly partition the patients into K approximately equally sized sets $P_1, \ldots, P_K$.
Let $D_{-i}$ denote the full dataset minus the data for patients in $P_i$.
Omit the patients in $P_1$.
Apply the defined algorithm to analyze the data in $D_{-1}$ to obtain a classifier $M_{-1}$.
Classify each patient j in $P_1$ using model $M_{-1}$.
Record the treatment recommendation, T or C.

Repeat the above for all K loops of the cross-validation.
All patients have now been classified once according to what their optimal treatment is predicted to be.

Let $S_T$ denote the set of patients for whom treatment T is predicted to be optimal.
Compare outcomes for patients in $S_T$ who actually received T to those in $S_T$ who actually received C.
Compute Kaplan-Meier curves for those receiving T and those receiving C.
Let $z_T$ be the standardized log-rank statistic.

Test of Significance for Effectiveness of T vs C
Compute the statistical significance of $z_T$ by randomly permuting the treatment labels and repeating the entire cross-validation procedure.
Do this 1000 or more times to generate the permutation null distribution of the treatment effect for the patients in each subset.
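A minimal sketch of the cross-validated procedure and its permutation test, under these assumptions:
- a binary response and a standardized difference in response rates stand in for the Kaplan-Meier curves and the log-rank $z_T$;
- `fit` is a hypothetical marker-selection classifier;
- the same fold partition is reused in every permutation.

Fewer than 1000 permutations are used here only to keep the example fast.

```python
import random
from math import sqrt

def arm_rates(resp, treat, idx):
    t = [resp[i] for i in idx if treat[i]]
    c = [resp[i] for i in idx if not treat[i]]
    if not t or not c:
        return None
    return sum(t) / len(t), sum(c) / len(c), len(t), len(c)

def z_stat(resp, treat, idx):
    """Standardized T-minus-C difference in response rates (stand-in for log-rank z_T)."""
    r = arm_rates(resp, treat, idx)
    if r is None:
        return 0.0
    p_t, p_c, n_t, n_c = r
    pooled = (p_t * n_t + p_c * n_c) / (n_t + n_c)
    se = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    return 0.0 if se == 0 else (p_t - p_c) / se

def fit(resp, treat, markers, idx, delta=0.2):
    """Hypothetical classifier: markers whose effect in positives exceeds negatives by delta."""
    def diff(sub):
        r = arm_rates(resp, treat, sub)
        return 0.0 if r is None else r[0] - r[1]
    chosen = [k for k, m in enumerate(markers)
              if diff([i for i in idx if m[i]]) - diff([i for i in idx if not m[i]]) > delta]
    return lambda i: any(markers[k][i] for k in chosen)

def cv_z(resp, treat, markers, folds):
    n = len(resp)
    s_t = []  # S_T: patients for whom T is predicted optimal
    for fold in folds:
        held_out = set(fold)
        model = fit(resp, treat, markers, [i for i in range(n) if i not in held_out])
        s_t.extend(i for i in fold if model(i))  # classified by a model not trained on them
    return z_stat(resp, treat, s_t)

def cv_asd(resp, treat, markers, k=5, n_perm=100, seed=0):
    rng = random.Random(seed)
    order = list(range(len(resp)))
    rng.shuffle(order)
    folds = [order[j::k] for j in range(k)]  # K roughly equal sets P_1, ..., P_K
    observed = cv_z(resp, treat, markers, folds)
    labels = list(treat)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute treatment labels, then redo the entire cross-validation
        if cv_z(resp, labels, markers, folds) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical data: marker index 0 flags the 30% sensitive patients
rng = random.Random(11)
n, n_markers = 400, 4
markers = [[rng.random() < 0.3 for _ in range(n)] for _ in range(n_markers)]
treat = [i % 2 == 0 for i in range(n)]
resp = [rng.random() < (0.70 if (treat[i] and markers[0][i]) else 0.25) for i in range(n)]
z_t, p_perm = cv_asd(resp, treat, markers)
print(round(z_t, 2), p_perm)
```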

Applying the analysis algorithm to the full RCT dataset D yields recommendations for how future patients should be treated.

The size of the T vs C treatment effect for the indicated population is (conservatively) estimated by the Kaplan-Meier survival curves of T and of C in $S_T$.

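Simulation scenario: 70% response to T in sensitive patients, 25% response to T otherwise, 25% response to C, and 30% of patients sensitive. The overall response rate to T is therefore 0.3 × 0.70 + 0.7 × 0.25 = 0.385, versus 0.25 for C.
ASD and CV-ASD are compared on: the overall 0.05 test; the overall 0.04 test; the sensitive-subset 0.01 test; and overall power.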

506 prostate cancer patients were randomly allocated to one of four arms. Placebo and 0.2 mg of diethylstilbestrol (DES) were combined as the control arm C; 1.0 mg DES and 5.0 mg DES were combined as T. The end point was overall survival (death from any cause).
Covariates:
Age: in years
Performance status (pf): not bed-ridden at all vs other
Tumor size (sz): size of the primary tumor (cm²)
Stage/grade index (sg): index combining tumor stage and histologic grade
Acid phosphatase (ap): serum prostatic acid phosphatase level

Figure 1: Overall analysis. The log-rank statistic is 2.9, and the corresponding p-value exceeds 0.05. The new treatment thus shows no benefit overall at the 0.05 level.

Figure 2: Cross-validated survival curves for patients predicted to benefit from the new treatment. Log-rank statistic = 10.0; permutation p-value = 0.002.

Figure 3: Survival curves for cases predicted not to benefit from the new treatment. The value of the log-rank statistic is 0.54.

Acknowledgements: Boris Freidlin, Yingdong Zhao, Wenyu Jiang, Aboubakar Maitournam