Design & Analysis of Phase III Trials for Predictive Oncology Richard Simon Chief, Biometric Research Branch National Cancer Institute
How can therapeutics development be successful if tumors contain dozens, hundreds or thousands of mutations, with substantial intra-tumor heterogeneity? How should we modify our paradigms for clinical development in light of inter- and intra-tumor genomic heterogeneity? Closing comments on translational research
Long History of Multi-hit Models of Oncogenesis Armitage & Doll, Knudson, Moolgavkar, Loeb, Tomlinson & Bodmer, Simon & Zheng, Nowak & Michor, others
Data Age-incidence curves of human tumors by primary site Steps of oncogenesis in model systems Sequencing of human tumors
My Synthesis of the Models A small number (e.g. 2-4) of rate-limiting events, occurring at approximately normal mammalian mutation rates (10^-9 per base pair per cell division), establish a tumor of 10^7 or more cells. Models based on 2-4 events occurring at normal mammalian mutation rates account for the observed age-incidence curves of carcinomas at the many primary sites where they have been evaluated. The initiated tumor then accumulates additional mutations, some of which may be important to the tumor phenotype, but which are not rate-limiting to the development of an invasive, metastatic tumor –i.e. the initial mutations put in place a process which inevitably leads, over time, to cancers containing numerous additional mutations
Even at normal mutation rates, by the time that there are 10^9 clonogenic tumor cells, every possible base mutation will occur in some cell with each round of cell division. Mutator phenotypes can accelerate the process and are presumably important in some cases, since genes with key functions for ensuring DNA and chromosome fidelity are commonly mutated
Mutational complexity of the tumor at diagnosis is influenced by tumor age (number of generations of replication) –“Old tumors” are more mutationally complex. Treatment effectiveness depends on mutational age at time of treatment –“High growth fraction” tumors like pediatric ALL, DLBCL, Burkitt’s lymphoma and germ cell tumors are relatively young tumors
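The arithmetic behind this slide can be sketched as follows. This is a rough illustration only: it assumes each cell divides once per round, that cells mutate independently, and that "a mutation at a base" is treated as a single event. The function names are hypothetical.

```python
import math


def expected_new_mutations_per_site(n_cells: float, mutation_rate: float = 1e-9) -> float:
    """Expected number of cells that newly mutate at one given base in one round of division."""
    return n_cells * mutation_rate


def prob_site_mutated(n_cells: float, mutation_rate: float = 1e-9) -> float:
    """Probability that at least one cell mutates at a given base in one round.

    Assumes independent cells: 1 - (1 - rate) ** n_cells, computed stably.
    """
    return -math.expm1(n_cells * math.log1p(-mutation_rate))


if __name__ == "__main__":
    for n_cells in (1e7, 1e9, 1e11):
        print(n_cells,
              expected_new_mutations_per_site(n_cells),
              prob_site_mutated(n_cells))
```

Under these assumptions, a tumor of 10^9 cells is expected to produce about one new mutant cell per base position per round of division, so over a few rounds essentially every position is hit somewhere in the tumor.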
Success, Where Possible, Likely Requires Inhibiting pathways deregulated by early oncogenic mutations Using combinations of molecularly targeted drugs Treating early –Before mutational meltdown Treating the right tumors with the right drugs
House of Cards Model P. Workman “The tumor requires each of the initial oncogenic mutations to power up malignancy; remove any one of the molecular batteries and the cancer cell collapses like a house of cards.”
Oncogene Addiction Model B. Weinstein “Subsequent mutations are viable only in the context of the initial oncogenic mutations. The initial mutations lead to the ‘hard-wiring’ of mission critical oncogenic pathways and the loss of alternative or redundant signal transduction pathways.”
Barn Door Model The initial oncogenic mutations facilitate the acquisition of numerous additional mutations. Once the additional mutations occur, the protein products of the initial oncogenic mutations are no longer key molecular targets because alternative pathways to expansion and invasion have been activated.
Phase II Trials Find or evaluate predictive biomarkers for identifying patients whose tumors are sensitive to the regimen –Patients' tumors should be molecularly characterized –If it works at all, it’s likely to work only in some patients but may work very well for them Combinations of molecularly targeted agents Screening multiple combinations of targeted agents in molecularly defined subsets of patients
Phase III Trials Transition from a culture of broad eligibility phase III trials followed by exploratory subset analysis to targeted phase III trials or trials incorporating focused prospectively defined subset analysis in the primary analysis plan
Roadmap for Co-Development of New Drugs with Companion Diagnostics
1. Develop during phase II a completely specified genomic classifier of the patients likely to benefit from a new drug (single gene/protein or composite gene expression classifier)
2. Develop an analytically validated assay (reproducible and robust) for the classifier
3. Use the completely specified classifier to design and analyze a phase III clinical trial to evaluate effectiveness of the new treatment with a pre-defined analysis plan
Targeted (Enrichment) Design Restrict entry to the phase III trial based on the binary predictive classifier
Using phase II data, develop predictor of response to new drug. Patient predicted responsive: randomize to new drug vs. control. Patient predicted non-responsive: off study.
Applicability of Targeted Design Primarily for settings where the drug effect is specific, the biology of the target is well understood, and an accurate assay is available Advantage of design is that the target population is clear and trial clearly must be sized for the test+ patients With a strong biological basis for the test and a drug with potentially serious toxicity, it may be unacceptable to expose test negative patients to the drug Analytical validation, biological rationale and phase II data provide basis for regulatory approval of the test, if needed
Relative efficiency of targeted design depends on –proportion of patients test positive –effectiveness of new drug (compared to control) for test negative patients When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients
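A minimal sketch of the efficiency comparison follows. It is not taken from the slides: it assumes a normal approximation with equal outcome variance in both designs and a perfectly accurate assay, so that the required sample size is proportional to 1/(treatment effect)^2 and the effect in an unselected population is the prevalence-weighted mixture of the test+ and test- effects. Function and parameter names are hypothetical.

```python
def randomized_ratio(prop_positive: float, effect_positive: float, effect_negative: float) -> float:
    """Approximate ratio (untargeted / targeted) of patients that must be randomized.

    Assumption: sample size scales as 1 / effect**2, and the overall effect in
    the untargeted trial is a mixture of the test+ and test- effects.
    """
    overall_effect = prop_positive * effect_positive + (1.0 - prop_positive) * effect_negative
    if overall_effect <= 0:
        raise ValueError("overall effect must be positive")
    return (effect_positive / overall_effect) ** 2


def screened_ratio(prop_positive: float, effect_positive: float, effect_negative: float) -> float:
    """Ratio of untargeted randomized patients to patients *screened* for the targeted trial.

    The targeted trial must screen n_targeted / prop_positive patients.
    """
    return prop_positive * randomized_ratio(prop_positive, effect_positive, effect_negative)


if __name__ == "__main__":
    # 25% test positive, no benefit for test-negative patients.
    print(randomized_ratio(0.25, 1.0, 0.0))
    # 50% test positive, test-negative patients get half the benefit.
    print(randomized_ratio(0.50, 1.0, 0.5))
```

Under these assumptions, with 25% of patients test positive and no benefit in test negatives, the untargeted design needs about 16 times as many randomized patients. The advantage shrinks quickly as the benefit in test-negative patients grows.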
Biomarker Stratified Design Develop predictor of response to new Rx. Predicted responsive to new Rx: randomize to new Rx vs. control. Predicted non-responsive to new Rx: randomize to new Rx vs. control.
Biomarker Stratified Design Do not use the diagnostic to restrict eligibility, but to structure a prospective analysis plan Having a prospective analysis plan for how the test will be used in the analysis and having the trial appropriately sized are essential “Stratifying” (balancing) the randomization ensures that all randomized patients have tissue available but is not a substitute for a prospective analysis plan –Delaying assay performance provides additional time for assay development but inhibits early termination of accrual of assay negative patients The purpose of the study is to evaluate the new treatment overall and for the pre-defined subsets
R Simon. Using genomics in clinical trial design, Clinical Cancer Research 14: , 2008
Fallback Analysis Plan (Limited confidence in test) Compare the new drug to the control overall for all patients, ignoring the classifier. –If p_overall ≤ 0.03, claim effectiveness for the eligible population as a whole. Otherwise perform a single subset analysis evaluating the new drug in the classifier + patients –If p_subset ≤ 0.02, claim effectiveness for the classifier + patients.
Analysis Plan with K Binary Classifiers Test T vs C restricted to patients positive for B_k for k=1,…,K –Let p_k be the p value for treatment effect in patients positive for B_k (k=1,…,K) Let p* = min{p_1, …, p_K} and let k* be the index attaining the minimum. Compute the null distribution of p* by permuting treatment labels. If the data value of p* is significant at the 0.02 level, then claim effectiveness of T for patients positive for B_k*
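The decision rule on this slide can be sketched as a small function. It assumes the two thresholds are "at most" comparisons that split a total type I error of 0.05 into 0.03 + 0.02; the function name and return strings are hypothetical.

```python
def fallback_analysis(p_overall: float, p_subset: float,
                      alpha_overall: float = 0.03, alpha_subset: float = 0.02) -> str:
    """Apply the two-step fallback rule and return the claim that can be made."""
    # Step 1: overall comparison, ignoring the classifier.
    if p_overall <= alpha_overall:
        return "effective in all eligible patients"
    # Step 2: a single pre-specified subset analysis in classifier-positive patients.
    if p_subset <= alpha_subset:
        return "effective in classifier-positive patients"
    return "no claim"


if __name__ == "__main__":
    print(fallback_analysis(p_overall=0.01, p_subset=0.50))
    print(fallback_analysis(p_overall=0.20, p_subset=0.01))
    print(fallback_analysis(p_overall=0.04, p_subset=0.03))
```

The subset test is only consulted when the overall test fails, which is what keeps the combined false-positive rate at or below the sum of the two thresholds.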
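A minimal sketch of the minimum-p permutation procedure, assuming a binary response endpoint and using a two-proportion z-test (normal approximation) for each marker-positive subset. The choice of test and all function names are illustrative assumptions, not part of the slides.

```python
import math
import numpy as np


def two_sided_p(outcome: np.ndarray, treated: np.ndarray) -> float:
    """Two-proportion z-test p value for T vs C (normal approximation)."""
    n_t, n_c = treated.sum(), (~treated).sum()
    if n_t == 0 or n_c == 0:
        return 1.0
    pooled = outcome.mean()
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    if se == 0:
        return 1.0
    z = (outcome[treated].mean() - outcome[~treated].mean()) / se
    return math.erfc(abs(z) / math.sqrt(2))


def min_p(outcome, treated, markers):
    """Return (p*, k*) over the K marker-positive subsets; markers is an (n, K) bool array."""
    pvals = [two_sided_p(outcome[markers[:, k]], treated[markers[:, k]])
             for k in range(markers.shape[1])]
    k_star = int(np.argmin(pvals))
    return pvals[k_star], k_star


def min_p_permutation_test(outcome, treated, markers, n_perm=1000, seed=0):
    """Permutation p value for p*, which accounts for having searched over K markers."""
    rng = np.random.default_rng(seed)
    observed, k_star = min_p(outcome, treated, markers)
    hits = 0
    for _ in range(n_perm):
        permuted = rng.permutation(treated)  # shuffle treatment labels only
        p_star, _ = min_p(outcome, permuted, markers)
        if p_star <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1), k_star
```

The returned permutation p value is what would be compared with the 0.02 threshold; `k_star` identifies the marker whose positive subset the claim would refer to.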
Marker Strategy Design Randomize between (1) performing the test and employing the test-determined treatment and (2) standard of care treatment
Marker Strategy Design Randomize between (1) performing the test and employing the test-determined rx and (2) a second randomization between T and C
Phase III RCT of new regimen T vs control C Multiple candidate predictive biomarkers or whole genome expression profiling Prospectively specified classifier development algorithm
Partition the patients into K (e.g. 5 or 10) groups V_1, V_2, …, V_K. Form a training set by omitting one of the K parts: T_1 = {1,2,…,N} − V_1. The omitted part V_1 is the validation set –Using the training set, apply the prospectively defined classifier development algorithm to develop a model that classifies patients (based on their measured covariates and biomarkers) as either Sensitive: likely to benefit from T more than control C, or Not Sensitive: not likely to benefit from T more than C –Using this model, classify the patients in the validation set
Repeat this procedure K times, leaving out a different part each time –After this is completed, all patients in the full dataset are classified as sensitive or insensitive –All patients have been classified using a classifier developed on a training set that did not include them
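The K-fold procedure can be sketched as below. The classifier development algorithm is passed in as a callback so that it only ever sees training indices. The callback interface and the median-threshold toy algorithm are hypothetical stand-ins, not the method used in the trials discussed.

```python
import numpy as np


def cross_validated_labels(n_patients, k_folds, develop_classifier, seed=0):
    """Classify every patient with a model developed without that patient.

    develop_classifier(train_idx) must return a function classify(idx) -> bool array
    (True = predicted sensitive).
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_patients), k_folds)
    sensitive = np.zeros(n_patients, dtype=bool)
    for held_out in folds:
        train_idx = np.setdiff1d(np.arange(n_patients), held_out)
        classify = develop_classifier(train_idx)   # sees training patients only
        sensitive[held_out] = classify(held_out)   # applied to the omitted part
    return sensitive


if __name__ == "__main__":
    marker = np.arange(20, dtype=float)  # toy biomarker values

    def toy_algorithm(train_idx):
        # Toy rule: call a patient sensitive if the marker exceeds the training median.
        threshold = np.median(marker[train_idx])
        return lambda idx: marker[idx] > threshold

    print(cross_validated_labels(20, 5, toy_algorithm))
```

Any feature selection or tuning belongs inside the callback; otherwise information from the omitted patients leaks into the classifier that labels them.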
Identify the “sensitive” subset, i.e. those predicted as likely to benefit more from T than from C. Also identify the remaining “insensitive” subset. Sensitive subset analysis –Compare outcomes of patients who received T to outcomes of patients who received C. Compute Kaplan-Meier curves of T vs C and log-rank test statistic L_S. Insensitive subset analysis –Compare outcomes of patients who received T to outcomes of patients who received C. Compute Kaplan-Meier curves of T vs C and log-rank test statistic L_IS
Generate the null distributions of L_S and L_IS by permuting the treatment labels and repeating the entire K-fold cross-validation. If significant, claim effectiveness of T for the subset defined by the classifier
Two-Treatment Classifier Development Algorithm for Binary Endpoint Develop models in the training set of the probability of “success” for a patient based on the covariate vector x –Separate models for treatment group T and control group C –P(success | x, T) and P(success | x, C) Many kinds of model development algorithms can be used. If P(success | x, T) − P(success | x, C) > delta –Classify a patient in the validation set with covariate vector x as likely to benefit more from T than C –Otherwise, classify the patient as not likely to benefit more from T
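A minimal sketch of the log-rank statistic used in the subset analyses, written with NumPy only. It computes the standardized statistic, not the Kaplan-Meier curves, and the sign convention (positive when the T arm has more events than expected) is a choice made here. Names are hypothetical.

```python
import math
import numpy as np


def log_rank_z(time: np.ndarray, event: np.ndarray, treated: np.ndarray) -> float:
    """Standardized log-rank statistic for T vs C.

    time: follow-up times; event: True if the event was observed (False = censored);
    treated: True for arm T. Positive values mean more events on T than expected.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in np.unique(time[event]):          # distinct event times
        at_risk = time >= t
        n = at_risk.sum()
        n_t = (at_risk & treated).sum()
        died = (time == t) & event
        d = died.sum()
        d_t = (died & treated).sum()
        observed_minus_expected += d_t - d * n_t / n
        if n > 1:                             # hypergeometric variance term
            variance += d * (n_t / n) * (1 - n_t / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected / math.sqrt(variance)
```

To obtain the statistic for the sensitive subset, the same function would be applied to the rows of patients classified as sensitive, for example `log_rank_z(time[sensitive], event[sensitive], treated[sensitive])`, and likewise for the insensitive subset.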
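The permutation step can be sketched with the whole analysis wrapped in a callback, so that classifier development, cross-validation and the subset test are all repeated for every permuted label vector. The callback interface is an assumption made for illustration, and the demo statistic is a deliberately trivial stand-in for the full pipeline.

```python
import numpy as np


def permutation_p_value(pipeline_statistic, treatment, n_perm=500, seed=0):
    """Permutation p value for a statistic produced by a full analysis pipeline.

    pipeline_statistic(treatment_labels) must re-run everything that depends on
    the treatment labels (classifier development inside K-fold cross-validation,
    then the subset comparison) and return a number where larger means stronger
    evidence of a treatment effect.
    """
    rng = np.random.default_rng(seed)
    observed = pipeline_statistic(treatment)
    exceed = 0
    for _ in range(n_perm):
        permuted = rng.permutation(treatment)      # shuffle labels, keep outcomes fixed
        if pipeline_statistic(permuted) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    treatment = np.array([True] * 20 + [False] * 20)
    outcome = treatment.astype(float)              # toy data with a perfect treatment effect

    def stand_in_pipeline(labels):
        # Stand-in only: a real pipeline would redo the cross-validated
        # classification and return the subset log-rank statistic.
        return abs(outcome[labels].mean() - outcome[~labels].mean())

    print(permutation_p_value(stand_in_pipeline, treatment, n_perm=200))
```

Re-running the classifier development inside each permutation is what makes the null distribution valid; permuting labels only at the final test would ignore the data-driven selection of the subset.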
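Since many model development algorithms can be used, the sketch below picks the simplest possible one as an illustration: a single binary marker with smoothed response rates estimated separately in each arm. The marker, the add-one smoothing and the default delta are all assumptions made for this example.

```python
import numpy as np


def fit_success_rates(marker: np.ndarray, success: np.ndarray) -> dict:
    """Estimate P(success | marker level) within one arm, with add-one smoothing."""
    rates = {}
    for level in (0, 1):
        in_level = marker == level
        rates[level] = (success[in_level].sum() + 1.0) / (in_level.sum() + 2.0)
    return rates


def develop_classifier(marker, success, treated, delta=0.1):
    """Fit separate models for arm T and arm C on training data; return a classify function."""
    rates_t = fit_success_rates(marker[treated], success[treated])
    rates_c = fit_success_rates(marker[~treated], success[~treated])

    def classify(new_marker):
        # Sensitive if the predicted success probability on T exceeds that on C by more than delta.
        benefit = np.array([rates_t[int(m)] - rates_c[int(m)] for m in new_marker])
        return benefit > delta

    return classify


if __name__ == "__main__":
    marker = np.array([1] * 20 + [0] * 20)
    treated = np.array(([True] * 10 + [False] * 10) * 2)
    success = np.array([1] * 10 + [0] * 10 + [1, 0] * 5 + [1, 0] * 5)
    classify = develop_classifier(marker, success, treated)
    print(classify(np.array([1, 0])))
```

In this toy data the marker-positive patients respond only on T, so they are classified as sensitive, while marker-negative patients respond equally on both arms and are not.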
Simulation example: 70% response to T in sensitive patients; 25% response to T otherwise; 25% response to C; 20% of patients sensitive. [Table comparing ASD and CV-ASD. Rows: overall 0.05 test; overall 0.04 test; sensitive subset 0.01 test; overall power.]
Classifier for future use is determined by applying the classification development algorithm to the full dataset
Prediction Based Clinical Trials Using cross-validation we can perform prospective “subset analysis” as part of the primary analysis Using a prospectively defined model building algorithm we can internally validate the treatment comparison predictions of the model Using cross-validation we can evaluate new predictive tools –Based on predictive accuracy –With regard to their intended use which is informing therapeutic decision making
Final Comments on Translational Research
“Translational research” is in many cases a misnomer. Many basic research findings do not go far enough to be “translated” –They do not provide key druggable molecular targets, e.g. p53, Rb, APC
When the gap is relatively narrow, effective translation takes place, often by industry (large or small) Broad gaps are in many cases too difficult and high risk to bridge by industry or by investigator initiated research
Breakthrough Around the Corner
Bridging broad gaps may in some cases be accomplished by prioritization and resource mobilization –Penicillin development languished for over a decade until it was stimulated by targeted funding from the Rockefeller Foundation and a major project commitment by the US government with over 1000 chemists involved –The atomic bomb would not have been developed without a Manhattan Project. Major focused initiatives involving academic investigators, industry, and government may be needed for bridging key roadblocks to progress.
Acknowledgements Boris Freidlin Wenyu Jian Xinan Zhang