Steps on the Road to Predictive Medicine Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute

BRB Website: brb.nci.nih.gov
PowerPoint presentations
Reprints & Presentations
Reports
BRB-ArrayTools software
Web based Sample Size Planning
  –Clinical Trials using predictive biomarkers
  –Development of gene expression based predictive classifiers

Many cancer treatments benefit only a minority of patients to whom they are administered
  –Particularly true for molecularly targeted drugs
Being able to predict which patients are likely to benefit would
  –Save patients from unnecessary toxicity, and enhance their chance of receiving a drug that helps them
  –Help control medical costs
  –Improve the success rate of clinical drug development

Biomarkers
Prognostic
  –Measured before treatment to indicate long-term outcome for patients untreated or receiving standard treatment
Predictive
  –Measured before treatment to select good patient candidates for a particular treatment

Prognostic and Predictive Biomarkers in Oncology
Single gene or protein measurement
  –HER2 protein staining 2+ or 3+
  –HER2 amplification
  –KRAS mutation
Index or classifier that summarizes contributions of multiple genes/proteins
  –Empirically determined based on genome-wide correlation of gene expression with patient outcome after treatment

Prospective Co-Development of Drugs and Companion Diagnostics
1. Develop a completely specified genomic classifier of the patients likely to benefit from a new drug
2. Establish analytical validity of the test
3. Design a pivotal RCT evaluating the new treatment with sample size, eligibility, and analysis plan prospectively based on use of the completely specified classifier/test

Guiding Principle
The data used to develop the classifier must be distinct from the data used to test hypotheses about treatment effect in subsets determined by the classifier
  –Developmental studies can be exploratory
  –Studies on which treatment effectiveness claims are to be based should be definitive studies that test a treatment hypothesis in a patient population completely pre-specified by the classifier

New Drug Developmental Strategy I
Restrict entry to the phase III trial based on the binary predictive classifier, i.e. targeted design

Targeted design schema: Using phase II data, develop predictor of response to new drug. Patients predicted responsive are randomized to new drug or control. Patients predicted non-responsive are off study.

Applicability of Design I
Primarily for settings where the classifier is based on a single gene whose protein product is the target of the drug
  –e.g. Herceptin
With substantial biological basis for the classifier, it may be unacceptable ethically to expose classifier negative patients to the new drug
Strong biological rationale or phase II data on unselected patients needed for approval of test

Evaluating the Efficiency of Strategy (I)
Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10: , 2004; Correction and supplement 12:3229, 2006
Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24: , 2005

Web Based Software for Comparing Sample Size Requirements

Developmental Strategy (II) schema: Develop predictor of response to new Rx. Patients predicted responsive to new Rx are randomized to new Rx or control. Patients predicted non-responsive to new Rx are also randomized to new Rx or control.

Developmental Strategy (II)
Do not use the test to restrict eligibility, but to structure a prospective analysis plan
Having a prospective analysis plan is essential
“Stratifying” (balancing) the randomization is useful to ensure that all randomized patients have tissue available but is not a substitute for a prospective analysis plan
The purpose of the study is to evaluate the new treatment overall and for the pre-defined subsets; not to modify or refine the classifier
The purpose is not to demonstrate that repeating the classifier development process on independent data results in the same classifier

R Simon. Using genomics in clinical trial design, Clinical Cancer Research 14: , 2008
R Simon. Designs and adaptive analysis plans for pivotal clinical trials of therapeutics and companion diagnostics, Expert Opinion on Medical Diagnostics 2:721-29, 2008

Analysis Plan A
Compare the new drug to the control for classifier positive patients
  –If p+ > 0.05 make no claim of effectiveness
  –If p+ ≤ 0.05 claim effectiveness for the classifier positive patients, and compare new drug to control for classifier negative patients using 0.05 threshold of significance
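
The sequential rule of Analysis Plan A can be written as a small decision function. This is a minimal sketch, not the author's software: the function name and the returned labels are hypothetical, and the two p-values are assumed to come from separately computed two-sided tests in the classifier positive and classifier negative patients.

```python
def analysis_plan_a(p_positive, p_negative, alpha=0.05):
    """Return the list of subsets for which effectiveness is claimed.

    The classifier negative comparison is only examined after the
    classifier positive comparison is significant.
    """
    claims = []
    if p_positive <= alpha:
        claims.append("classifier-positive")
        if p_negative <= alpha:
            claims.append("classifier-negative")
    return claims


print(analysis_plan_a(0.20, 0.001))  # no claim: positive subset not significant
print(analysis_plan_a(0.01, 0.30))   # claim for classifier positive patients only
print(analysis_plan_a(0.01, 0.02))   # claim for both subsets
```

Because the negative subset is tested only after the positive subset is significant, any claim at all requires p+ ≤ 0.05.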

Sample size for Analysis Plan A
88 events in classifier + patients needed to detect 50% reduction in hazard at 5% two-sided significance level with 90% power
If 25% of patients are positive, then when there are 88 events in positive patients there will be about 264 events in negative patients
  –264 events provide 90% power for detecting 33% reduction in hazard at 5% two-sided significance level
  –Sequential futility monitoring may enable early cessation of accrual of classifier negative patients
    Not much earlier with time-to-event endpoint
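
The event counts quoted above are consistent with the usual approximation for a two-sided log-rank test with equal allocation, D = 4 (z_{1-α/2} + z_{power})² / (ln HR)². The sketch below assumes that formula (the slides do not state which formula was used) and takes a 33% reduction as hazard ratio 0.67; `events_needed` is a hypothetical helper name.

```python
from math import ceil, log
from statistics import NormalDist


def events_needed(hazard_ratio, alpha=0.05, power=0.90):
    """Approximate events for a two-sided log-rank test, 1:1 randomization."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return 4 * (z_alpha + z_power) ** 2 / log(hazard_ratio) ** 2


d_positive = events_needed(0.50)  # 50% hazard reduction, classifier positive patients
d_negative = events_needed(0.67)  # 33% hazard reduction
print(ceil(d_positive), ceil(d_negative))
```

Under this approximation the first value rounds up to 88 events. The second comes out just under the 264 events (three times 88) that the slide expects in classifier negative patients when 25% of patients are positive and event rates are similar in both groups.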

Study-wise false positivity rate is limited to 5% with analysis plan A
It is not necessary or appropriate to require that the treatment vs control difference be significant overall before doing the analysis within subsets

Analysis Plan B
Compare the new drug to the control overall for all patients ignoring the classifier
  –If p_overall ≤ 0.03 claim effectiveness for the eligible population as a whole
Otherwise perform a single subset analysis evaluating the new drug in the classifier + patients
  –If p_subset ≤ 0.02 claim effectiveness for the classifier + patients
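
Analysis Plan B splits the 5% significance level between the overall test (0.03) and the single fallback subset test (0.02). A minimal sketch of that rule follows; the function name and return strings are hypothetical.

```python
def analysis_plan_b(p_overall, p_subset, alpha_overall=0.03, alpha_subset=0.02):
    """Overall test first; the subset test is used only if the overall test fails."""
    if p_overall <= alpha_overall:
        return "effective for the eligible population as a whole"
    if p_subset <= alpha_subset:
        return "effective for classifier-positive patients"
    return "no claim"


print(analysis_plan_b(0.02, 0.50))
print(analysis_plan_b(0.04, 0.01))
print(analysis_plan_b(0.04, 0.03))
```

Because the two thresholds add up to 0.05, the chance of any false claim under the global null hypothesis is bounded by 5%.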

This analysis strategy is designed to not penalize sponsors for having developed a classifier
It provides sponsors with an incentive to develop genomic classifiers

Sample size for Analysis Plan B
To have 90% power for detecting uniform 33% reduction in overall hazard at 3% two-sided level requires 297 events (instead of 263 for similar power at 5% level)
If 25% of patients are positive, then when there are 297 total events there will be approximately 75 events in positive patients
  –75 events provide 75% power for detecting 50% reduction in hazard at 2% two-sided significance level
  –By delaying evaluation in test positive patients, 80% power is achieved with 84 events and 90% power with 109 events
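
The power figures above can be checked with the normal approximation for the log-rank test, power ≈ Φ(√D · |ln HR| / 2 − z_{1-α/2}). This sketch assumes that approximation with equal allocation and a 33% reduction taken as hazard ratio 0.67; `logrank_power` is a hypothetical helper name.

```python
from math import log, sqrt
from statistics import NormalDist


def logrank_power(events, hazard_ratio, alpha):
    """Approximate power of a two-sided log-rank test with 1:1 randomization."""
    z = NormalDist()
    return z.cdf(sqrt(events) * abs(log(hazard_ratio)) / 2 - z.inv_cdf(1 - alpha / 2))


# Subset test at the 2% level, 50% hazard reduction
for d in (75, 84, 109):
    print(d, round(logrank_power(d, 0.50, 0.02), 2))

# Overall test at the 3% level, uniform 33% hazard reduction
print(297, round(logrank_power(297, 0.67, 0.03), 2))
```

Under these assumptions the approximation gives roughly 75%, 80% and 90% power for 75, 84 and 109 events, and roughly 90% power for 297 events overall.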

Analysis Plan C
Test for interaction between treatment effect in test positive patients and treatment effect in test negative patients
If interaction is significant at level α_int then compare treatments separately for test positive patients and test negative patients
Otherwise, compare treatments overall
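
Analysis Plan C uses the interaction test as a gate that decides which comparisons are carried out. A minimal sketch, with a hypothetical function name and the default gate level α_int = 0.10 taken from the simulation slide:

```python
def analysis_plan_c(p_interaction, alpha_int=0.10):
    """Return the list of treatment comparisons to perform."""
    if p_interaction <= alpha_int:
        # Significant interaction: analyze the two classifier groups separately
        return ["test-positive patients", "test-negative patients"]
    # No significant interaction: a single overall comparison
    return ["all patients"]


print(analysis_plan_c(0.03))
print(analysis_plan_c(0.40))
```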

Sample Size Planning for Analysis Plan C
88 events in classifier + patients needed to detect 50% reduction in hazard at 5% two-sided significance level with 90% power
If test is predictive but not prognostic, and if 25% of patients are positive, then when there are 88 events in positive patients there will be about 264 events in negative patients
  –264 events provide 90% power for detecting 33% reduction in hazard at 5% two-sided significance level

Simulation Results for Analysis Plan C
Using α_int = 0.10, the interaction test has power 93.7% when there is a 50% reduction in hazard in test positive patients and no treatment effect in test negative patients
A significant interaction and significant treatment effect in test positive patients is obtained in 88% of cases under the above conditions
If the treatment reduces hazard by 33% uniformly, the interaction test is negative and the overall test is significant in 87% of cases

Biomarker Adaptive Threshold Design
Wenyu Jiang, Boris Freidlin & Richard Simon
JNCI 99: , 2007

Biomarker Adaptive Threshold Design
Randomized trial of T vs C
Have identified a univariate biomarker index B thought to be predictive of patients likely to benefit from T relative to C
Eligibility not restricted by biomarker
No threshold for biomarker determined
Biomarker value scaled to range (0,1)
Time-to-event data

Procedure A
Compare T vs C for all patients
  –If results are significant at level 0.04 claim broad effectiveness of T
  –Otherwise proceed as follows

Procedure A
Test T vs C restricted to patients with biomarker B > b
  –Let S(b) be log likelihood ratio statistic
Repeat for all values of b
Let S* = max{S(b)}
Compute null distribution of S* by permuting treatment labels
If the data value of S* is significant at 0.01 level, then claim effectiveness of T for a patient subset
Compute point and interval estimates of the threshold b
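
The two stages of Procedure A can be sketched in a few lines. This is an illustration under stated assumptions, not the published implementation: the slides use a log likelihood ratio statistic for time-to-event data, whereas this sketch swaps in a squared standardized difference in binary response rates so that it stays self-contained. All function names, the toy data, and the minimum of 5 patients per arm are hypothetical.

```python
import random


def subset_stat(treat, resp, keep):
    """Stand-in for S(b): squared standardized response-rate difference in kept patients."""
    n = [0, 0]  # patients per arm (index 0 = C, 1 = T)
    r = [0, 0]  # responders per arm
    for t, y, k in zip(treat, resp, keep):
        if k:
            n[t] += 1
            r[t] += y
    if min(n) < 5:
        return 0.0  # too few patients in an arm to compare
    pooled = (r[0] + r[1]) / (n[0] + n[1])
    var = pooled * (1 - pooled) * (1 / n[0] + 1 / n[1])
    if var == 0:
        return 0.0
    return (r[1] / n[1] - r[0] / n[0]) ** 2 / var


def max_stat(treat, resp, marker, cutoffs):
    """S* = maximum over cutoffs b of S(b), each computed in patients with B > b."""
    return max(subset_stat(treat, resp, [m > b for m in marker]) for b in cutoffs)


def stage2_p_value(treat, resp, marker, cutoffs, n_perm=200, seed=0):
    """Permutation p-value for S*, obtained by shuffling treatment labels."""
    rng = random.Random(seed)
    observed = max_stat(treat, resp, marker, cutoffs)
    labels = list(treat)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max_stat(labels, resp, marker, cutoffs) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def procedure_a(p_overall, treat, resp, marker, cutoffs):
    if p_overall <= 0.04:
        return "claim broad effectiveness of T"
    if stage2_p_value(treat, resp, marker, cutoffs) <= 0.01:
        return "claim effectiveness of T for a patient subset"
    return "no claim"


# Toy data: T helps only patients whose biomarker exceeds 0.5
n = 80
marker = [(i + 0.5) / n for i in range(n)]
treat = [i % 2 for i in range(n)]
resp = [1 if (t == 1 and m > 0.5) or i % 4 < 2 else 0
        for i, (t, m) in enumerate(zip(treat, marker))]
cutoffs = [i / 10 for i in range(10)]
p_stage2 = stage2_p_value(treat, resp, marker, cutoffs)
print(p_stage2)
```

Recomputing the maximum inside every permutation is what accounts for the search over cutoffs; comparing the observed S* with a single-cutoff reference distribution would overstate significance.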

Procedure B
S(b) = log likelihood ratio statistic for treatment effect in subset of patients with B ≥ b
S* = max{S(0)+R, max{S(b)}}
Compute null distribution of S* by permuting treatment labels
If the data value of S* is significant at 0.05 level, then reject null hypothesis that T is ineffective
Compute point and interval estimates of the threshold b
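
Procedure B folds the overall test into the maximum by giving the all-patient statistic S(0) a bonus R. The sketch below only shows how the combined statistic is formed from already computed S(b) values; the function name, the input numbers, and the value of R are placeholders chosen for illustration, since the transcript does not state R.

```python
def procedure_b_statistic(s_by_cutoff, bonus_r):
    """Combine S(b) values into max{S(0) + R, max S(b)}.

    s_by_cutoff maps each cutoff b to S(b); cutoff 0 is the all-patient analysis.
    """
    return max(s_by_cutoff[0] + bonus_r, max(s_by_cutoff.values()))


# Illustrative numbers only
s_values = {0: 3.0, 0.25: 3.8, 0.5: 4.5, 0.75: 2.1}
print(procedure_b_statistic(s_values, bonus_r=2.0))  # overall term wins: 5.0
print(procedure_b_statistic(s_values, bonus_r=1.0))  # best subset wins: 4.5
```

The same function would be applied to each permuted data set to build the null distribution against which the observed value is compared at the 0.05 level.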

Estimation of Threshold

Prostate Cancer Data
Columns: Covariate, # patients with measured covariate, Overall test p value, Procedure A stage 2 p value, Procedure B p value
Rows: AP, SG

Prostate Cancer Data
Covariate | # patients with measured covariate | Estimated threshold | 95% CI | 80% CI
AP | 505 | 36 | (9,170) | (25,108)
SG | 494 | 11 | (10,13) | (11,11)

Sample Size Planning (A)
Standard broad eligibility trial is sized for 80% power to detect reduction in hazard D at significance level 5%
Biomarker adaptive threshold design is sized for 80% power to detect same reduction in hazard D at significance level 4% for overall analysis

Estimated Power of Broad Eligibility Design (n=386 events) vs Adaptive Design A (n=412 events)
80% power for 30% hazard reduction
Model | Broad Eligibility Design | Biomarker Adaptive Threshold A
40% reduction in 50% of patients (22% overall reduction) | |
% reduction in 25% of patients (20% overall reduction) | |
% reduction in 10% of patients (14% overall reduction) | .35 | .93

Sample Size Planning (B)
Estimate power of procedure B relative to standard broad eligibility trial based on Table 1 for the row corresponding to the expected proportion of sensitive patients and the target hazard ratio for sensitive patients
  –e.g. proportion sensitive = 25% and hazard ratio = .4 gives RE = .429/.641 = .67
When B has power 80%, overall test has power of about 80% × .67 = 53%
Use formula B.2 to determine the approximate number of events needed for overall test to have power 53% for detecting hazard ratio .4 limited to 25% of patients

Events Needed to Detect a Given Hazard Ratio With Proportional Hazards

Events (D’) Needed for Overall Test to Detect a Hazard Ratio Limited to a Fraction of Patients

Example Sample Size Planning for Procedure B
Design a trial to detect hazard ratio 0.4 (60% reduction) limited to 25% of patients
  –Relative efficiency from Table 1: .429/.641 = .67
When procedure B has power 80%, standard test has power of about 80% × .67 = 53%
Formula B.2 gives D’=230 events to have 53% power for overall test and thus approximately 80% power for B
Overall test needs D=472 events for 80% power for detecting the diluted treatment effect

Adaptive Signature Design
Boris Freidlin and Richard Simon
Clinical Cancer Research 11:7872-8, 2005

Adaptive Signature Design
End of Trial Analysis
Compare T to C for all patients at significance level α_overall
  –If overall H0 is rejected, then claim effectiveness of T for eligible patients
  –Otherwise

Otherwise:
  –Using only the first half of patients accrued during the trial, develop a binary classifier that predicts the subset of patients most likely to benefit from the new treatment T compared to control C
  –Compare T to C for patients accrued in second stage who are predicted responsive to T based on classifier
    Perform test at significance level 0.05 − α_overall
    If H0 is rejected, claim effectiveness of T for subset defined by classifier
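
The end-of-trial flow of the adaptive signature design can be outlined as follows. This is a skeleton under stated assumptions: classifier development and the subset test are passed in as hypothetical callables (`develop_classifier`, `subset_p_value`), and the 0.04/0.01 split of the significance level follows the simulation tables in this presentation.

```python
def adaptive_signature_analysis(p_overall, stage1, stage2, develop_classifier,
                                subset_p_value, alpha_overall=0.04, alpha_subset=0.01):
    """Overall test first; otherwise train on stage 1 and test in stage 2 sensitive patients."""
    if p_overall <= alpha_overall:
        return "T effective for all eligible patients"
    classifier = develop_classifier(stage1)              # uses first-half patients only
    sensitive = [pt for pt in stage2 if classifier(pt)]  # second-half patients predicted responsive
    if sensitive and subset_p_value(sensitive) <= alpha_subset:
        return "T effective for classifier-defined subset"
    return "no claim"


# Toy stand-ins, for illustration only
patients = [{"id": i, "x": i / 10} for i in range(10)]
stage1, stage2 = patients[:5], patients[5:]
toy_develop = lambda train: (lambda pt: pt["x"] > 0.6)
toy_subset_test = lambda subset: 0.004

print(adaptive_signature_analysis(0.20, stage1, stage2, toy_develop, toy_subset_test))
print(adaptive_signature_analysis(0.01, stage1, stage2, toy_develop, toy_subset_test))
```

Keeping classifier development confined to stage 1 and the subset test confined to stage 2 is what preserves the separation between development data and testing data described under the guiding principle.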

True Model

Classifier Development
Using data from stage 1 patients, fit all single gene logistic models (j=1,…,M)
Select genes with interaction significant at a pre-specified level

Classification of Stage 2 Patients For i’th stage 2 patient, selected gene j votes to classify patient as preferentially sensitive to T if

Classification of Stage 2 Patients
Classify i’th stage 2 patient as differentially sensitive to T relative to C if at least G selected genes vote for differential sensitivity of that patient
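
The vote-counting step can be sketched as below. The per-gene voting criterion is not recoverable from this transcript, so it is passed in as a hypothetical callable `gene_votes`; the threshold used in the toy example is purely illustrative.

```python
def classify_stage2_patient(expression, selected_genes, gene_votes, g):
    """Return True if at least g of the selected genes vote for differential sensitivity.

    expression     : mapping gene index -> expression level for one patient
    selected_genes : indices of genes retained at the classifier development step
    gene_votes     : callable (gene index, expression level) -> bool (assumed rule)
    """
    votes = sum(1 for j in selected_genes if gene_votes(j, expression[j]))
    return votes >= g


# Toy example with an illustrative voting rule
patient = {0: 1.4, 1: 0.2, 2: 2.1, 3: 1.7}
selected = [0, 2, 3]
toy_rule = lambda j, level: level > 1.0

print(classify_stage2_patient(patient, selected, toy_rule, g=3))  # True: three votes
print(classify_stage2_patient(patient, selected, toy_rule, g=4))  # False: only three votes
```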

Simulation Parameters
Gene expression levels of sensitivity genes multivariate normal (MVN)
  –mean m, variance v1 and correlation r in sensitive patients
  –mean 0, variance v2 and correlation r in non-sensitive patients
Gene expression levels of other genes MVN with mean 0, variance v0 and correlation r in all patients

Treatment-expression interaction parameters same for all sensitivity genes
Interaction parameter value scaled (depending on K) so that log odds ratio of treatment effect is 5 for hypothetical patient with sensitivity gene expression levels at their expected values
  –i.e. m × (interaction parameter) × K = 5
Intercept scaled for control response rate of 25%
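
The two scaling rules above amount to simple arithmetic. The sketch assumes that K is the number of sensitivity genes and that, for a control patient, the logistic model reduces to the intercept alone; both are assumptions, and the function names are hypothetical.

```python
from math import exp, log


def interaction_parameter(m, k, target_log_odds=5.0):
    """Common interaction parameter so that m * parameter * K equals the target log odds ratio."""
    return target_log_odds / (m * k)


def intercept_for_response_rate(rate):
    """Logit of the desired control response rate."""
    return log(rate / (1 - rate))


param = interaction_parameter(m=1.0, k=10)
intercept = intercept_for_response_rate(0.25)
print(param)                      # 0.5 when m = 1 and K = 10
print(1 / (1 + exp(-intercept)))  # recovers the 25% control response rate
```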

Treatment effect restricted to subset. 10% of patients sensitive, 10 sensitivity genes, 10,000 genes, 400 patients.
Test | Power (%)
Overall .05 level test | 46.7
Overall .04 level test | 43.1
Sensitive subset .01 level test (performed only when overall .04 level test is negative) | 42.2
Overall adaptive signature design | 85.3

Treatment effect restricted to subset. 25% of patients sensitive, 10 sensitivity genes, 10,000 genes, 400 patients.
Test | Power (%)
Overall .05 level test | 99.0
Overall .04 level test | 98.9
Sensitive subset .01 level test (performed only when overall .04 level test is negative) | 99.7
Overall adaptive signature design | 99.9

Overall treatment effect, no subset effect. 10% of patients sensitive, 10 sensitivity genes, 10,000 genes, 400 patients.
Test | Power (%)
Overall .05 level test | 74.2
Overall .04 level test | 70.9
Sensitive subset .01 level test | 1.0
Overall adaptive signature design | 70.9

Stronger treatment effect for sensitive subset. 10% of patients sensitive, 10 sensitivity genes, 10,000 genes, 400 patients.
Test | Power (%)
Overall .05 level test | 97.0
Overall .04 level test | 96.0
Sensitive subset .01 level test | 45.6
Overall adaptive signature design | 97.2

Empirical Power (RR for Control Patients 25%)
Columns: Response Rate in Sensitive Subset, Overall .05, Overall .04, Subset .01, Overall Adaptive
Rows: 98%, then four further response rates

Cross-Validated Adaptive Signature Design
Wenyu Jiang, Boris Freidlin, Richard Simon

Cross-Validated Adaptive Signature Design
End of Trial Analysis
Compare T to C for all patients at significance level α_overall
  –If overall H0 is rejected, then claim effectiveness of T for eligible patients
  –Otherwise

Otherwise:
  –Partition the full data set into K parts
  –Form a training set by omitting one of the K parts. The omitted part is the test set
    Using the training set, develop a predictive classifier of the subset of patients who benefit preferentially from the new treatment T compared to control C using the methods developed for the ASD
    Classify the patients in the test set as either sensitive or not sensitive to T relative to C
  –Repeat this procedure K times, leaving out a different part each time
    After this is completed, all patients in the full dataset are classified as sensitive or insensitive
  –Compare T to C for sensitive patients by computing a test statistic S, e.g. the difference in response proportions
  –Generate the null distribution of S by permuting the treatment labels and repeating the entire K-fold cross-validation procedure
  –Perform test at significance level 0.05 − α_overall
  –If H0 is rejected, claim effectiveness of T for subset defined by classifier
    The sensitive subset is determined by developing a classifier using the full dataset
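
The cross-validated subset test can be sketched as follows. It is a minimal illustration, not the authors' code: `toy_develop_classifier` is a hypothetical single-marker stand-in for the ASD classifier, folds are assigned by patient index, and the toy data are constructed so that T helps only patients with marker above 0.6.

```python
import random


def response_difference(patients, flags):
    """S: response proportion on T minus response proportion on C among flagged patients."""
    t = [p["resp"] for p, f in zip(patients, flags) if f and p["arm"] == "T"]
    c = [p["resp"] for p, f in zip(patients, flags) if f and p["arm"] == "C"]
    if not t or not c:
        return 0.0
    return sum(t) / len(t) - sum(c) / len(c)


def cv_classify(patients, develop_classifier, k):
    """Flag every patient as sensitive or not using a classifier trained without that patient's fold."""
    flags = [False] * len(patients)
    for fold in range(k):
        train = [p for i, p in enumerate(patients) if i % k != fold]
        classifier = develop_classifier(train)
        for i, p in enumerate(patients):
            if i % k == fold:
                flags[i] = classifier(p)
    return flags


def cv_asd_p_value(patients, develop_classifier, k=5, n_perm=100, seed=0):
    """Permutation p-value; the whole K-fold procedure is repeated for each permutation."""
    rng = random.Random(seed)
    observed = response_difference(patients, cv_classify(patients, develop_classifier, k))
    arms = [p["arm"] for p in patients]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(arms)
        permuted = [dict(p, arm=a) for p, a in zip(patients, arms)]
        s = response_difference(permuted, cv_classify(permuted, develop_classifier, k))
        if s >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def toy_develop_classifier(train):
    """Hypothetical stand-in: choose the marker cutoff with the largest T-minus-C difference."""
    best_cut, best_diff = 0.2, float("-inf")
    for cut in (0.2, 0.4, 0.6, 0.8):
        diff = response_difference(train, [p["x"] > cut for p in train])
        if diff > best_diff:
            best_cut, best_diff = cut, diff
    return lambda p: p["x"] > best_cut


n = 120
patients = []
for i in range(n):
    x = (i + 0.5) / n
    arm = "T" if i % 2 else "C"
    resp = 1 if (arm == "T" and x > 0.6) or i % 8 < 2 else 0
    patients.append({"x": x, "arm": arm, "resp": resp})

p_value = cv_asd_p_value(patients, toy_develop_classifier)
print(p_value)
```

Repeating classifier development inside each permutation is the costly part, but it is what makes the reference distribution reflect the data-driven choice of the sensitive subset.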

80% Response to T in Sensitive Patients, 25% Response to T or C Otherwise, 10% Patients Sensitive
Columns: ASD, CV-ASD
Rows: Overall 0.05 Test, Overall 0.04 Test, Sensitive Subset 0.01 Test, Overall Power

70% Response to T in Sensitive Patients, 25% Response to T or C Otherwise, 20% Patients Sensitive
Columns: ASD, CV-ASD
Rows: Overall 0.05 Test, Overall 0.04 Test, Sensitive Subset 0.01 Test, Overall Power

70% Response to T in Sensitive Patients, 25% Response to T or C Otherwise, 30% Patients Sensitive
Columns: ASD, CV-ASD
Rows: Overall 0.05 Test, Overall 0.04 Test, Sensitive Subset 0.01 Test, Overall Power

35% Response to T, 25% Response to C, No Subset Effect
Columns: ASD, CV-ASD
Rows: Overall 0.05 Test, Overall 0.04 Test, Sensitive Subset 0.01 Test, Overall Power

25% Response to T, 25% Response to C, No Subset Effect
Columns: ASD, CV-ASD
Rows: Overall 0.05 Test, Overall 0.04 Test, Sensitive Subset 0.01 Test, Overall Power