Steps on the Road to Predictive Medicine Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute


1 Steps on the Road to Predictive Medicine Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov

2 BRB Website brb.nci.nih.gov
PowerPoint presentations
Reprints & Technical Reports
BRB-ArrayTools software
Web-based Sample Size Planning
Clinical Trials using predictive biomarkers
Development of gene expression based predictive classifiers

3 Many cancer treatments benefit only a minority of patients to whom they are administered
  Particularly true for molecularly targeted drugs
Being able to predict which patients are likely to benefit would
  Save patients from unnecessary toxicity, and enhance their chance of receiving a drug that helps them
  Help control medical costs
  Improve the success rate of clinical drug development

4 “Hypertension is not one single entity, neither is schizophrenia. It is likely that we will find 10 if we are lucky, or 50, if we are not very lucky, different disorders masquerading under the umbrella of hypertension. I don’t see how once we have that knowledge, we are not going to use it to genotype individuals and try to tailor therapies, because if they are that different, then they’re likely fundamentally … different problems…”
George Poste

5 Biomarkers
Prognostic
  Measured before treatment to indicate long-term outcome for patients untreated or receiving standard treatment
Predictive
  Measured before treatment to select good patient candidates for a particular treatment

6 Prognostic and Predictive Biomarkers in Oncology
Single gene or protein measurement
  e.g. HER2 protein staining 2+ or 3+
  HER2 amplification
  KRAS mutation
Scalar index or classifier that summarizes contributions of multiple genes/proteins
  Empirically determined by genome-wide correlation of gene expression with patient outcome after treatment

7 Prognostic Factors in Oncology
Most prognostic factors are not used because they are not therapeutically relevant
Most prognostic factor studies do not have a clear medical objective
  They use a convenience sample of patients for whom tissue is available.
  Generally the patients are too heterogeneous to support therapeutically relevant conclusions

8 Prognostic Biomarkers Can be Therapeutically Relevant
<10% of node-negative ER+ breast cancer patients require or benefit from the cytotoxic chemotherapy that they receive
Oncotype DX
  21-gene RT-PCR assay for FFPE tissue

9 Predictive Biomarkers
In the past often studied as unfocused post-hoc subset analyses of RCTs.
  Numerous subsets examined
  Same data used to define subsets for analysis and for comparing treatments within subsets
  No control of type I error

10 Statisticians have taught physicians not to trust subset analysis unless the overall treatment effect is significant
  This was good advice for post-hoc data-dredging subset analysis
For many molecularly targeted cancer drugs being developed, the subset analysis will be an essential component of the primary analysis, and analysis of the subsets will not be contingent on demonstrating that the overall effect is significant

12 Prospective Co-Development of Drugs and Companion Diagnostics
1. Develop a completely specified genomic classifier of the patients likely to benefit from a new drug
2. Establish analytical validity of the classifier
3. Use the completely specified classifier to design and analyze a new clinical trial to evaluate effectiveness of the new treatment with a pre-defined analysis plan that preserves the overall type I error of the study.

13 Guiding Principle
The data used to develop the classifier must be distinct from the data used to test hypotheses about treatment effect in subsets determined by the classifier
  Developmental studies can be exploratory
  Studies on which treatment effectiveness claims are to be based should be definitive studies that test a treatment hypothesis in a patient population completely pre-specified by the classifier

14 New Drug Developmental Strategy I
Restrict entry to the phase III trial based on the binary predictive classifier, i.e. targeted design

15 Using phase II data, develop predictor of response to new drug
[Diagram: Develop Predictor of Response to New Drug. Patient Predicted Responsive: New Drug or Control. Patient Predicted Non-Responsive: Off Study.]

16 Applicability of Design I
Primarily for settings where the classifier is based on a single gene whose protein product is the target of the drug
  e.g. Herceptin
With substantial biological basis for the classifier, it may be unacceptable ethically to expose classifier negative patients to the new drug

17 Evaluating the Efficiency of Strategy (I)
Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006
Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005

18 Relative efficiency of targeted design depends on
  Proportion of patients who test positive
  Effectiveness of new drug (compared to control) for test negative patients
When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients
The targeted design may require fewer or more screened patients than the standard design
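
The dependence described on this slide can be sketched with a simple first-principles approximation. It is an assumption of this sketch, not a formula quoted from the slides, that the required sample size scales as one over the squared treatment effect and that outcome variances are equal in both designs. The function names and arguments are hypothetical.

```python
def randomized_ratio(frac_positive, effect_pos, effect_neg):
    """Approximate (standard design / targeted design) ratio of randomized patients.

    Assumes sample size is proportional to 1 / effect**2 and that the standard
    design sees the average effect over test-positive and test-negative patients.
    The average effect must be non-zero.
    """
    overall_effect = frac_positive * effect_pos + (1 - frac_positive) * effect_neg
    return (effect_pos / overall_effect) ** 2


def screened_ratio(frac_positive, effect_pos, effect_neg):
    """Same ratio counted in screened patients.

    The targeted design must screen about 1 / frac_positive patients for every
    patient it randomizes.
    """
    return frac_positive * randomized_ratio(frac_positive, effect_pos, effect_neg)


# 25% test positive, no benefit in test-negative patients
print(randomized_ratio(0.25, 1.0, 0.0), screened_ratio(0.25, 1.0, 0.0))
# 50% test positive, half the benefit in test-negative patients
print(randomized_ratio(0.50, 1.0, 0.5), screened_ratio(0.50, 1.0, 0.5))
```

A screened ratio below 1, as in the second call, corresponds to the case where the targeted design needs more screened patients than the standard design.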

19 Trastuzumab (Herceptin)
Metastatic breast cancer
234 randomized patients per arm
90% power for 13.5% improvement in 1-year survival over 67% baseline at 2-sided 0.05 level
If benefit were limited to the 25% test + patients, overall improvement in survival would have been 3.375%
  4025 patients/arm would have been required
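
The dilution arithmetic on this slide (0.25 × 13.5% = 3.375%) and its effect on sample size can be checked roughly as follows. This is a minimal sketch that assumes the simple unpooled normal-approximation formula for comparing two proportions; the slide does not say which formula produced 234 and 4025, so the results should only be expected to land in the same range, not to match exactly. The function name `n_per_arm` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_treated, alpha=0.05, power=0.90):
    """Patients per arm for a two-sided test of two proportions (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil(z ** 2 * variance / (p_treated - p_control) ** 2)


baseline = 0.67
full_benefit = 0.135                 # improvement among patients who benefit
diluted_benefit = 0.25 * full_benefit  # benefit averaged over all patients

print(n_per_arm(baseline, baseline + full_benefit))
print(n_per_arm(baseline, baseline + diluted_benefit))
```

Because the required sample size grows with the inverse square of the effect, cutting the effect to a quarter multiplies the per-arm sample size by roughly sixteen.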

20 Web Based Software for Comparing Sample Size Requirements
http://brb.nci.nih.gov

26 Developmental Strategy (II)
[Diagram: Develop Predictor of Response to New Rx. Predicted Responsive to New Rx: New Rx or Control. Predicted Non-responsive to New Rx: New Rx or Control.]

27 Developmental Strategy (II)
Do not use the test to restrict eligibility, but to structure a prospective analysis plan
Having a prospective analysis plan is essential
“Stratifying” (balancing) the randomization is useful to ensure that all randomized patients have tissue available but is not a substitute for a prospective analysis plan
The purpose of the study is to evaluate the new treatment overall and for the pre-defined subsets; not to modify or refine the classifier
The purpose is not to demonstrate that repeating the classifier development process on independent data results in the same classifier

28 Analysis Plan A
Compare the new drug to the control for classifier positive patients
  If p+ > 0.05 make no claim of effectiveness
  If p+ ≤ 0.05 claim effectiveness for the classifier positive patients and
    Compare new drug to control for classifier negative patients using 0.05 threshold of significance
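
The sequential rule of Analysis Plan A can be written as a small decision function. This is only an illustrative sketch: the function name, the argument names, and the returned labels are hypothetical, and the two p-values are assumed to come from pre-specified treatment comparisons within each classifier subset.

```python
def analysis_plan_a(p_positive, p_negative, alpha=0.05):
    """Return the list of populations for which effectiveness is claimed."""
    claims = []
    if p_positive > alpha:
        # No claim at all; the classifier-negative comparison is not tested.
        return claims
    claims.append("classifier positive")
    if p_negative <= alpha:
        claims.append("classifier negative")
    return claims


print(analysis_plan_a(0.01, 0.30))  # claim only for classifier positive patients
print(analysis_plan_a(0.20, 0.01))  # no claim: the first test gates the second
```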

29 Analysis Plan B (Limited confidence in test)
Compare the new drug to the control overall for all patients ignoring the classifier.
  If p overall ≤ 0.03 claim effectiveness for the eligible population as a whole
Otherwise perform a single subset analysis evaluating the new drug in the classifier + patients
  If p subset ≤ 0.02 claim effectiveness for the classifier + patients.
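
Analysis Plan B splits the 0.05 significance level between the overall test (0.03) and the single fallback subset test (0.02), so the chance of any false claim is at most 0.03 + 0.02 = 0.05. A minimal sketch of the rule follows; the function name and the returned strings are hypothetical.

```python
def analysis_plan_b(p_overall, p_subset, alpha_overall=0.03, alpha_subset=0.02):
    """Return the population for which effectiveness is claimed, or None."""
    if p_overall <= alpha_overall:
        return "eligible population as a whole"
    if p_subset <= alpha_subset:
        return "classifier + patients"
    return None


print(analysis_plan_b(0.02, 0.50))   # overall claim
print(analysis_plan_b(0.04, 0.01))   # fallback subset claim
print(analysis_plan_b(0.04, 0.03))   # no claim
```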

30 Analysis Plan C
Test for difference (interaction) between treatment effect in test positive patients and treatment effect in test negative patients
If interaction is significant at level α_int then compare treatments separately for test positive patients and test negative patients
Otherwise, compare treatments overall
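
One way to carry out the interaction step is a z-test on the difference between the two subset log hazard ratios. That choice of test is an assumption of this sketch, since the slide does not specify how the interaction is tested or what value α_int takes; the function names and inputs are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def interaction_p(log_hr_pos, se_pos, log_hr_neg, se_neg):
    """Two-sided p-value for a difference in log hazard ratios between subsets."""
    z = (log_hr_pos - log_hr_neg) / sqrt(se_pos ** 2 + se_neg ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def analysis_plan_c(log_hr_pos, se_pos, log_hr_neg, se_neg, alpha_int):
    """Choose which treatment comparisons to perform, following Plan C."""
    if interaction_p(log_hr_pos, se_pos, log_hr_neg, se_neg) <= alpha_int:
        return "compare treatments separately in test positive and test negative patients"
    return "compare treatments overall"


# Hypothetical estimates: strong effect in test positives, none in test negatives
print(analysis_plan_c(log(0.5), 0.21, 0.0, 0.12, alpha_int=0.10))
```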

31 Sample Size Planning for Analysis Plan C
88 events in test + patients needed to detect 50% reduction in hazard at 5% two-sided significance level with 90% power
If 25% of patients are positive, when there are 88 events in positive patients there will be about 264 events in negative patients
  264 events provide 90% power for detecting 33% reduction in hazard at 5% two-sided significance level
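
The event counts on this slide are consistent with the usual approximation for a log-rank comparison with 1:1 randomization, events = 4 (z_{α/2} + z_β)² / (ln HR)². The sketch below assumes that formula; the slide does not state which method was used, and the function names are hypothetical.

```python
from math import ceil, log, sqrt
from statistics import NormalDist


def events_needed(hazard_ratio, alpha=0.05, power=0.90):
    """Events required to detect the given hazard ratio (two-sided test, 1:1 randomization)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(4 * z ** 2 / log(hazard_ratio) ** 2)


def power_with_events(events, hazard_ratio, alpha=0.05):
    """Approximate power of the same test given a fixed number of events."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(sqrt(events) / 2 * abs(log(hazard_ratio)) - z_alpha)


print(events_needed(0.50))            # 50% reduction in hazard
print(power_with_events(264, 0.67))   # 33% reduction in hazard with 264 events
```

The factor of three between 88 and 264 events reflects the 25% / 75% split of patients, under the additional assumption that event rates are similar in the two subsets.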

35 Biomarker Adaptive Threshold Design Wenyu Jiang, Boris Freidlin & Richard Simon JNCI 99:1036-43, 2007

36 Biomarker Adaptive Threshold Design
Randomized phase III trial comparing new treatment E to control C
Survival or DFS endpoint

37 Biomarker Adaptive Threshold Design
Have identified a predictive index B thought to be predictive of patients likely to benefit from E relative to C
Eligibility not restricted by biomarker
No threshold for biomarker determined

38 Analysis Plan
S(b) = log likelihood ratio statistic for treatment versus control comparison in subset of patients with B ≥ b
Compute S(b) for all possible threshold values
Determine T = max{S(b)}
Compute null distribution of T by permuting treatment labels
  Permute the labels of which patients are in which treatment group
  Re-analyze to determine T for permuted data
  Repeat for 10,000 permutations
Compute point and bootstrap confidence interval estimates of the threshold b
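
The permutation steps above can be sketched in a few lines. To keep the example self-contained, it makes several substitutions that are assumptions of this sketch rather than part of the design: the subset statistic is a squared two-sample z-statistic on a continuous outcome instead of a log likelihood ratio for a survival endpoint, subsets with fewer than `min_per_arm` patients per arm are skipped, and the bootstrap confidence interval for the threshold is not shown. All function names are hypothetical.

```python
import random
from statistics import mean, variance


def subset_stat(outcome, treat, marker, b, min_per_arm=5):
    """Stand-in for S(b): treatment vs control among patients with marker >= b."""
    treated = [y for y, g, m in zip(outcome, treat, marker) if m >= b and g == 1]
    control = [y for y, g, m in zip(outcome, treat, marker) if m >= b and g == 0]
    if len(treated) < min_per_arm or len(control) < min_per_arm:
        return 0.0
    se2 = variance(treated) / len(treated) + variance(control) / len(control)
    return 0.0 if se2 == 0 else (mean(treated) - mean(control)) ** 2 / se2


def max_stat(outcome, treat, marker):
    """T = max over all candidate thresholds b of S(b)."""
    return max(subset_stat(outcome, treat, marker, b) for b in sorted(set(marker)))


def permutation_p(outcome, treat, marker, n_perm=10_000, seed=0):
    """Permutation p-value for T, shuffling treatment labels only."""
    rng = random.Random(seed)
    observed = max_stat(outcome, treat, marker)
    labels = list(treat)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max_stat(outcome, labels, marker) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because the maximum over thresholds is recomputed for every permutation, the resulting p-value accounts for the search over b instead of treating the best threshold as if it had been fixed in advance.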

39 DNA Microarray Technology
Powerful tool for understanding mechanisms and enabling predictive medicine
Challenges the ability of biomedical scientists to analyze data
Challenges statisticians with new problems for which existing analysis paradigms are often inapplicable
Excessive hype and skepticism

40 Good microarray studies have clear objectives, but not generally gene specific mechanistic hypotheses
Design and analysis methods should be tailored to study objectives

41 Class Prediction
Predict which tumors will respond to a particular treatment
Predict survival or relapse-free survival risk group

42 Class Prediction ≠ Class Comparison
Prediction is not Inference
The criteria for gene selection for class prediction and for class comparison are different
  For class comparison false discovery rate is important
  For class prediction, predictive accuracy is important
Most statistical methods were not developed for p>>n prediction problems

43 Evaluating a Classifier
“Prediction is difficult, especially the future.”
  Niels Bohr
But easier than “understanding”

44 Validating a Predictive Classifier
Goodness of fit is no evidence of prediction accuracy for independent data
Demonstrating statistical significance of prognostic factors is not the same as demonstrating predictive accuracy
Demonstrating stability of selected genes is not demonstrating predictive accuracy of a model for independent data

45 Types of Validation for Prognostic and Predictive Biomarkers
Analytical validation
  When there is a gold standard
    Sensitivity, specificity
  No gold standard
    Reproducibility and robustness
Clinical validation
  Does the biomarker predict what it’s supposed to predict for independent data
Clinical utility
  Does use of the biomarker result in patient benefit
    Depends on available treatments and practice standards

46 Internal Clinical Validation of a Predictive Classifier
Split sample validation
  Training set
    Used to select features, select model type, fit all parameters including cut-off thresholds and tuning parameters
  Test set
    Count errors for single completely pre-specified model
Cross-validation
  Omit one sample
  Build completely specified classifier from scratch in the training set of n-1 samples
  Classify the omitted sample
  Repeat
  Total number of classification errors

47 Cross-validation is only valid if the test set is not used in any way in the development of the model. Using the complete set of samples to select genes violates this assumption and invalidates cross-validation
The cross-validated estimate of misclassification error is an estimate of the prediction error for the model fit by applying the specified algorithm to the full dataset
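
The point about gene selection can be made concrete with a leave-one-out loop in which feature selection is repeated inside every fold, so the omitted sample never influences which genes are chosen. This is a minimal sketch, not the BRB-ArrayTools implementation: the nearest-centroid classifier, the ranking of genes by difference in class means, the 0/1 class labels, and all function names are assumptions made for illustration.

```python
from statistics import mean


def select_genes(X, y, k):
    """Rank genes by absolute difference in class means, using training data only."""
    def score(j):
        class0 = [row[j] for row, label in zip(X, y) if label == 0]
        class1 = [row[j] for row, label in zip(X, y) if label == 1]
        return abs(mean(class0) - mean(class1))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def fit_centroids(X, y, genes):
    """Class centroids restricted to the selected genes."""
    return {c: [mean(row[j] for row, label in zip(X, y) if label == c) for j in genes]
            for c in (0, 1)}


def predict(x, centroids, genes):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def distance(c):
        return sum((x[j] - m) ** 2 for j, m in zip(genes, centroids[c]))
    return min(centroids, key=distance)


def loocv_errors(X, y, k=5):
    """Total leave-one-out classification errors with selection inside the loop."""
    errors = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_train, y_train, k)      # omitted sample not used
        centroids = fit_centroids(X_train, y_train, genes)
        errors += predict(X[i], centroids, genes) != y[i]
    return errors
```

Moving the `select_genes` call outside the loop and applying it to all samples would reproduce the flaw the slide warns about: the error count would then be optimistically biased.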

49 Sample Size Planning References
K Dobbin, R Simon. Sample size determination in microarray experiments for class comparison and prognostic classification. Biostatistics 6:27, 2005
K Dobbin, R Simon. Sample size planning for developing classifiers using high dimensional DNA microarray data. Biostatistics 8:101, 2007
K Dobbin, Y Zhao, R Simon. How large a training set is needed to develop a classifier for microarray data? Clinical Cancer Res 14:108, 2008

50 Sample Size Planning for Classifier Development
The expected value (over training sets) of the probability of correct classification PCC(n) should be within γ of the maximum achievable PCC(∞)

51 [Figure] Sample size as a function of effect size (log base 2 fold-change between classes divided by standard deviation). Two different tolerances are shown. Each class is equally represented in the population. 22,000 genes on an array.

52 BRB-ArrayTools
Architect: R Simon
Developer: Emmes Corporation
Contains wide range of analysis tools that I have selected
Designed for use by biomedical scientists
Imports data from all gene expression and copy-number platforms
  Automated import of data from NCBI Gene Expression Omnibus
Highly computationally efficient
Extensive annotations for identified genes
Integrated analysis of expression data, copy number data, pathway data and other biological data

53 Predictive Classifiers in BRB-ArrayTools
Classifiers
  Diagonal linear discriminant
  Compound covariate
  Bayesian compound covariate
  Support vector machine with inner product kernel
  K-nearest neighbor
  Nearest centroid
  Shrunken centroid (PAM)
  Random forest
  Tree of binary classifiers for k-classes
Survival risk-group
  Supervised pc’s
  With clinical covariates
  Cross-validated K-M curves
Predict quantitative trait
  LARS, LASSO
Feature selection options
  Univariate t/F statistic
  Hierarchical random variance model
  Restricted by fold effect
  Univariate classification power
  Recursive feature elimination
  Top-scoring pairs
Validation methods
  Split-sample
  LOOCV
  Repeated k-fold CV
  .632+ bootstrap
Permutational statistical significance

54 [Figure] Cross-validated Kaplan-Meier curves for risk groups using 50th percentile cut-off. Panels: gene model, covariates model, combined model. Endpoint: distant event-free survival.

55 BRB-ArrayTools July 2008
8934 Registered users
68 Countries
616 Citations
19,628 hits/month to website
Registered users
  4655 in US
  898 at NIH
  387 at NCI
  2994 US EDU
  1161 US Gov (non NIH)
  4655 Non US

56 Countries With Most BRB ArrayTools Registered Users
Germany 292; France 289; Canada 287; UK 278; Italy 250; China 241; Netherlands 240; Taiwan 222; Korea 192; Japan 187
Spain 168; Australia 155; India 139; Belgium 103; New Zealand 63; Brazil 54; Singapore 53; Denmark 52; Sweden 50; Israel 45

57 Conclusions
New technology makes it increasingly feasible to identify which patients are most likely to benefit from a specified treatment
Predictive oncology is feasible based on genomic characterization of a patient’s tumor
Targeting treatment can provide
  Patient benefit
  Economic benefit for society
  Improved chance of success for new drug development
    Not necessarily simpler or less expensive development

58 Conclusions
Achieving the potential of new technology requires paradigm changes in focus and methods of “correlative science.”
Effective interdisciplinary research requires increased emphasis on cross education of laboratory, clinical and statistical/computational scientists

59 Acknowledgements
BRB Senior Staff
  Boris Freidlin
  Ed Korn
  Lisa McShane
  Joanna Shih
  George Wright
  Yingdong Zhao
Post-docs
  Kevin Dobbin
  Alain Dupuy
  Wenyu Jiang
  Aboubakar Maitournam
  Annette Molinaro
  Michael Radmacher
BRB-ArrayTools Development Team

