Development and Validation of Predictive Classifiers using Gene Expression Profiles Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute

BRB Website brb.nci.nih.gov
Powerpoint presentations and audio files
Reprints & technical reports
BRB-ArrayTools software
BRB-ArrayTools Data Archive
– 100+ published cancer gene expression datasets with clinical annotations
Sample size planning for clinical trials with predictive biomarkers

Types of Clinical Outcome
Survival or disease-free survival
Response to therapy

90 publications identified that met criteria
– Abstracted information for all 90
Performed detailed review of statistical analysis for the 42 papers published in 2004

Major Flaws Found in 40 Studies Published in 2004
Inadequate control of multiple comparisons in gene finding
– 9/23 studies had unclear or inadequate methods to deal with false positives
– 10,000 genes × 0.05 significance level = 500 false positives
Misleading report of prediction accuracy
– 12/28 reports based on incomplete cross-validation
Misleading use of cluster analysis
– 13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes
50% of studies contained one or more major flaws

Kinds of Biomarkers
Surrogate endpoint
– Pre & post rx, early measure of clinical outcome
Pharmacodynamic
– Pre & post rx, measures an effect of rx on disease
Prognostic
– Which patients need rx
Predictive
– Which patients are likely to benefit from a specific rx
Product characterization

Cardiac Arrhythmia Suppression Trial
Ventricular premature beats were proposed as a surrogate for survival
Antiarrhythmic drugs suppressed ventricular premature beats but killed patients at approximately 2.5 times the rate of placebo

Prognostic Biomarkers
Most prognostic factors are not used because they are not therapeutically relevant
Most prognostic factor studies are poorly designed
– They are not focused on a clear therapeutically relevant objective
– They use a convenience sample of patients for whom tissue is available; generally the patients are too heterogeneous to support therapeutically relevant conclusions
– They address statistical significance rather than predictive accuracy relative to standard prognostic factors

Pusztai et al., The Oncologist 8:252-8, 2003: articles on "prognostic markers" or "prognostic factors" in breast cancer published in the past 20 years
ASCO guidelines only recommend routine testing for ER, PR and HER-2 in breast cancer
"With the exception of ER or progesterone receptor expression and HER-2 gene amplification, there are no clinically useful molecular predictors of response to any form of anticancer therapy."

Prognostic and Predictive Classifiers
Most cancer treatments benefit only a minority of the patients to whom they are administered
– Particularly true for molecularly targeted drugs
Being able to predict which patients are likely to benefit would
– save patients from unnecessary toxicity and enhance their chance of receiving a drug that helps them
– help control medical costs
– improve the success rate of clinical drug development

Molecularly targeted drugs may benefit a relatively small population of patients with a given primary site/stage of disease
– Iressa
– Herceptin

Prognostic Biomarkers Can be Therapeutically Relevant
3-5% of node negative ER+ breast cancer patients require or benefit from systemic rx other than endocrine rx
Prognostic biomarker development should focus on a specific therapeutic decision context

B-14 Results—Relapse-Free Survival (Paik et al, SABCS 2003)
[Kaplan-Meier relapse-free survival curves for three risk groups of 338, 149, and 181 patients]

Key Features of OncotypeDx Development
Identification of an important therapeutic decision context
Prognostic marker development was based on patients with node negative ER positive breast cancer receiving tamoxifen as their only systemic treatment
– Use of patients in NSABP clinical trials
Staged development and validation
– Separation of data used for test development from data used for test validation
Development of a robust assay with rigorous analytical validation
– 21-gene RT-PCR assay for FFPE tissue
– Quality assurance by single reference laboratory operation

Predictive Biomarkers
Cancers of a primary site are often a heterogeneous grouping of diverse molecular diseases
The molecular diseases vary enormously in their responsiveness to a given treatment
It is feasible (but difficult) to develop prognostic markers that identify which patients need systemic treatment and which have tumors likely to respond to a given treatment
– e.g. breast cancer and ER/PR, Her2

[Slide diagram: tumor genomic characteristics (mutations, copy number changes, translocations, expression profile) informing treatment selection]

DNA Microarray Technology
Powerful tool for understanding mechanisms and enabling predictive medicine
Challenges the ability of biomedical scientists to use it effectively to produce biological knowledge or clinical utility
Challenges statisticians with new problems for which existing analysis paradigms are often inapplicable
Excessive hype and skepticism

Myth
That microarray investigations should be unstructured data-mining adventures without clear objectives

Good microarray studies have clear objectives, but not generally gene-specific mechanistic hypotheses
Design and analysis methods should be tailored to study objectives

Good Microarray Studies Have Clear Objectives
Class Comparison
– Find genes whose expression differs among predetermined classes
– Find genes whose expression varies over a time course in response to a defined stimulus
Class Prediction
– Prediction of predetermined class (phenotype) using information from the gene expression profile
– Survival risk group prediction
Class Discovery
– Discover clusters of specimens having similar expression profiles
– Discover clusters of genes having similar expression profiles

Class Comparison and Class Prediction
Not clustering problems
– Global similarity measures generally used for clustering arrays may not distinguish classes
– Clustering doesn't control multiplicity or distinguish data used for classifier development from data used for classifier evaluation
Supervised methods
Require multiple biological samples from each class

Levels of Replication
Technical replicates
– RNA sample divided into multiple aliquots and re-arrayed
Biological replicates
– Multiple subjects
– Replication of the tissue culture experiment

Biological conclusions generally require independent biological replicates. The power of statistical methods for microarray data depends on the number of biological replicates. Technical replicates are useful insurance to ensure that at least one good quality array of each specimen will be obtained.

Class Prediction
Predict which tumors will respond to a particular treatment
Predict which patients will relapse after a particular treatment

Microarray Platforms for Developing Predictive Classifiers
Single label arrays
– Affymetrix GeneChips
Dual label arrays using a common reference design
– Dye swaps are unnecessary

Common Reference Design
          Array 1   Array 2   Array 3   Array 4
RED       A1        A2        B1        B2
GREEN     R         R         R         R
Ai = ith specimen from class A
Bi = ith specimen from class B
R = aliquot from reference pool

The reference generally serves to control variation in the size of corresponding spots on different arrays and variation in sample distribution over the slide. The reference provides a relative measure of expression for a given gene in a given sample that is less variable than an absolute measure. The reference is not the object of comparison. The relative measure of expression will be compared among biologically independent samples from different classes.

Class Prediction
A set of genes is not a classifier
Testing whether analysis of independent data results in selection of the same set of genes is not an appropriate test of the predictive accuracy of a classifier

Components of Class Prediction
Feature (gene) selection
– Which genes will be included in the model
Select model type
– e.g. diagonal linear discriminant analysis, nearest neighbor, …
Fitting parameters (regression coefficients) for the model
– Selecting values of tuning parameters
Estimating prediction accuracy

Class Prediction ≠ Class Comparison
The criteria for gene selection for class prediction and for class comparison are different
– For class comparison, the false discovery rate is important
– For class prediction, predictive accuracy is important
Demonstrating statistical significance of prognostic factors is not the same as demonstrating predictive accuracy
Statisticians are used to inference, not prediction
Most statistical methods were not developed for p >> n prediction problems

Myth
Complex classification algorithms such as neural networks perform better than simpler methods for class prediction.

Simple Gene Selection
Select genes that are differentially expressed among the classes at a significance level α (e.g. 0.01)
– The α level is a tuning parameter
– For class comparison, the false discovery rate is important
– For class prediction, predictive accuracy is important
– For prediction it is usually more serious to exclude an informative variable than to include some noise variables
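As an illustration of this kind of univariate selection, here is a minimal Python sketch using scipy.stats.ttest_ind; the function name, the array shapes, and the default alpha are illustrative assumptions rather than part of the slides.

```python
import numpy as np
from scipy.stats import ttest_ind

def select_genes(X_class1, X_class2, alpha=0.01):
    """Select genes differentially expressed between two classes at
    significance level alpha, using two-sample t-tests per gene.

    X_class1, X_class2: arrays of shape (n_samples, n_genes) of
    log-expression (or log-ratio) values.
    Returns the indices of the selected genes and their t-statistics.
    """
    t_stats, p_values = ttest_ind(X_class1, X_class2, axis=0)
    selected = np.where(p_values < alpha)[0]
    return selected, t_stats[selected]
```

The tuning parameter alpha trades the inclusion of noise genes against the exclusion of informative genes, as noted above.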

[Table: optimal significance level cutoffs for gene selection, as a function of the standardized difference 2δ/σ and the number of arrays (n = 10, n = 30, …), for the scenario of 50 differentially expressed genes out of 22,000]

Complex Gene Selection
Search for a small subset of genes which together give the most accurate predictions
– Genetic algorithms
Little evidence that complex feature selection is useful in microarray problems
– Failure to compare to simpler methods
– Improper use of cross-validation

Linear Classifiers for Two Classes
l(x) = Σ_{i∈F} w_i x_i
where x is the vector of log-ratios (or log signals), F is the set of selected features, and w_i is the weight for feature i; classification is determined by whether l(x) is above or below a threshold

Fisher linear discriminant analysis
Diagonal linear discriminant analysis (DLDA): assumes features are uncorrelated
Compound covariate predictor (Radmacher)
Golub's weighted voting method
Support vector machines with inner product kernel
Perceptron

Fisher LDA
Weight vector w = S⁻¹(x̄⁽¹⁾ − x̄⁽²⁾), where S is the pooled sample covariance matrix of the selected features

The Compound Covariate Predictor (CCP)
Motivated by J. Tukey, Controlled Clinical Trials, 1993
A compound covariate is built from the basic covariates (log-ratios):
c_i = Σ_j t_j x_ij
where t_j is the two-sample t-statistic for gene j, x_ij is the log-expression measure of sample i for gene j, and the sum is over the selected genes
Threshold of classification: midpoint of the CCP means for the two classes
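A minimal sketch of a compound covariate predictor along these lines is given below; the class name, the 0/1 label coding, and the default significance level for gene selection are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

class CompoundCovariatePredictor:
    """Minimal compound covariate predictor: c_i = sum_j t_j * x_ij over
    genes selected at significance level alpha; the classification threshold
    is the midpoint of the class-specific CCP means."""

    def __init__(self, alpha=0.001):
        self.alpha = alpha

    def fit(self, X, y):
        # X: (n_samples, n_genes) log-expression matrix; y: 0/1 class labels
        t, p = ttest_ind(X[y == 0], X[y == 1], axis=0)
        self.genes_ = np.where(p < self.alpha)[0]
        self.t_ = t[self.genes_]
        ccp = X[:, self.genes_] @ self.t_
        self.threshold_ = (ccp[y == 0].mean() + ccp[y == 1].mean()) / 2.0
        # record which side of the threshold corresponds to class 0
        self.class0_above_ = ccp[y == 0].mean() > ccp[y == 1].mean()
        return self

    def predict(self, X):
        ccp = X[:, self.genes_] @ self.t_
        above = ccp > self.threshold_
        return np.where(above == self.class0_above_, 0, 1)
```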

Linear Classifiers for Two Classes
Compound covariate predictor: w_i = t_i (the two-sample t-statistic for gene i), instead of w_i = (x̄_i⁽¹⁾ − x̄_i⁽²⁾)/s_i² as for DLDA

Support Vector Machine
Finds the hyperplane that separates the two classes with maximum margin; with the inner product kernel it is a linear classifier

Perceptrons
Perceptrons are neural networks with no hidden layer and linear transfer functions between input and output
– Number of input nodes equals the number of genes selected
– Number of output nodes equals the number of classes minus 1
– Inputs may be major principal components of the genes, or of the informative genes
Perceptrons are linear classifiers

Other Simple Methods
Nearest neighbor classification
Nearest k-neighbors
Nearest centroid classification
Shrunken centroid classification

Nearest Neighbor Classifier
To classify a sample in the validation set as being in outcome class 1 or outcome class 2, determine which sample in the training set its gene expression profile is most similar to
– The similarity measure is based on genes selected as being univariately differentially expressed between the classes
– Correlation similarity or Euclidean distance is generally used
Classify the sample as being in the same class as its nearest neighbor in the training set
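A minimal sketch of such a nearest neighbor classifier, assuming two classes coded 0/1, an illustrative alpha, and correlation similarity restricted to genes selected from the training set only:

```python
import numpy as np
from scipy.stats import ttest_ind

def nearest_neighbor_predict(X_train, y_train, X_new, alpha=0.001):
    """1-nearest-neighbor classification using correlation similarity
    computed over genes selected (on the training set only) as
    univariately differentially expressed at level alpha."""
    _, p = ttest_ind(X_train[y_train == 0], X_train[y_train == 1], axis=0)
    genes = np.where(p < alpha)[0]
    train = X_train[:, genes]
    new = X_new[:, genes]
    preds = []
    for x in new:
        # Pearson correlation between the new profile and each training profile
        r = np.array([np.corrcoef(x, row)[0, 1] for row in train])
        preds.append(y_train[np.argmax(r)])
    return np.array(preds)
```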

When p >> n
It is always possible to find a set of features and a weight vector for which the classification error on the training set is zero
Why consider more complex models?

Artificial intelligence sells to journal reviewers and peers who cannot distinguish hype from substance when it comes to microarray data analysis. Comparative studies generally indicate that simpler methods work as well or better for microarray problems because they avoid overfitting the data.

Other Methods
Top-scoring pairs
CART
Random forest

Apparent Dimension Reduction Based Methods
Principal component regression
Supervised principal component regression
Partial least squares
Stepwise logistic regression

When There Are More Than 2 Classes
Nearest neighbor type methods
Decision tree of binary classifiers

Decision Tree of Binary Classifiers
Partition the set of classes {1, 2, …, K} into two disjoint subsets S1 and S2
– e.g. S1 = {1}, S2 = {2, 3, 4}
Develop a binary classifier for distinguishing the composite classes S1 and S2
Compute the cross-validated classification error for distinguishing S1 and S2
Repeat the above steps for all possible partitions in order to find the partition S1 and S2 for which the cross-validated classification error is minimized
If S1 and S2 are not singleton sets, repeat all of the above steps separately for the classes in S1 and S2 to optimally partition each of them
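A sketch of the partition search is given below; the base classifier (1-nearest neighbor), the use of leave-one-out cross-validation for the error estimate, and the function name are assumptions for illustration. Recursing on any non-singleton subset, as the slide describes, amounts to calling the same function again on the cases belonging to that subset.

```python
import numpy as np
from itertools import combinations
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def best_binary_partition(X, y, classes, make_clf=lambda: KNeighborsClassifier(1)):
    """Search all partitions of `classes` into two non-empty subsets S1, S2,
    fit a binary classifier for S1 vs S2, and return the partition with the
    lowest cross-validated (leave-one-out) classification error."""
    classes = list(classes)
    best = None
    for r in range(1, len(classes) // 2 + 1):
        for subset in combinations(classes, r):
            S1, S2 = set(subset), set(classes) - set(subset)
            # each even split appears twice; keep the copy containing classes[0]
            if len(S1) == len(S2) and classes[0] not in S1:
                continue
            yb = np.isin(y, list(S1)).astype(int)
            err = 1 - cross_val_score(make_clf(), X, yb, cv=LeaveOneOut()).mean()
            if best is None or err < best[0]:
                best = (err, S1, S2)
    return best  # (cv_error, S1, S2)
```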

Evaluating a Classifier
"Prediction is difficult, especially the future." – Niels Bohr

Validating a Predictive Classifier
Fit of a model to the same data used to develop it is no evidence of prediction accuracy for independent data
– Goodness of fit is not prediction accuracy
Demonstrating statistical significance of prognostic factors is not the same as demonstrating predictive accuracy
Demonstrating stability of selected genes is not demonstrating predictive accuracy of a model for independent data

Split-Sample Evaluation
Training set
– Used to select features, select model type, determine parameters and cut-off thresholds
Test set
– Withheld until a single model is fully specified using the training set
– Fully specified model is applied to the expression profiles in the test set to predict class labels
– Number of errors is counted
– Ideally the test set data are from different centers than the training data and assayed at a different time

Cross-Validated Prediction (Leave-One-Out Method)
[Diagram: matrix of log-expression ratios for the specimens, divided into training set and test set]
1. Full data set is divided into training and test sets (test set contains 1 specimen).
2. Prediction rule is built from scratch using the training set.
3. Rule is applied to the specimen in the test set for class prediction.
4. Process is repeated until each specimen has appeared once in the test set.

Leave-one-out Cross Validation
Omit sample 1
– Develop the multivariate classifier from scratch on the training set with sample 1 omitted
– Predict the class for sample 1 and record whether the prediction is correct

Leave-one-out Cross Validation
Repeat the analysis for training sets with each single sample omitted one at a time
e = number of misclassifications determined by cross-validation
Subdivide e for estimation of sensitivity and specificity

Cross validation is only valid if the test set is not used in any way in the development of the model. Using the complete set of samples to select genes violates this assumption and invalidates cross-validation.
With proper cross-validation, the model must be developed from scratch for each leave-one-out training set. This means that feature selection must be repeated for each leave-one-out training set.
– Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the analysis of DNA microarray data. Journal of the National Cancer Institute 95:14-18, 2003
The cross-validated estimate of misclassification error is an estimate of the prediction error for the model obtained by applying the specified algorithm to the full dataset
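A minimal sketch of a properly cross-validated LOOCV loop, with gene selection repeated inside every training fold; the choice of base classifier and the alpha threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

def loocv_error(X, y, alpha=0.001, make_clf=lambda: KNeighborsClassifier(1)):
    """Leave-one-out cross-validation in which gene selection is repeated
    from scratch inside every training fold, as required above."""
    errors = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        X_tr, y_tr = X[train_idx], y[train_idx]
        # feature selection on the training fold ONLY
        _, p = ttest_ind(X_tr[y_tr == 0], X_tr[y_tr == 1], axis=0)
        genes = np.where(p < alpha)[0]
        if genes.size == 0:              # fall back to the single best gene
            genes = np.array([np.argmin(p)])
        clf = make_clf().fit(X_tr[:, genes], y_tr)
        errors += int(clf.predict(X[test_idx][:, genes])[0] != y[test_idx][0])
    return errors / len(y)
```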

Prediction on Simulated Null Data
Generation of gene expression profiles
– 14 specimens (P_i is the expression profile for specimen i)
– Log-ratio measurements on 6000 genes
– P_i ~ MVN(0, I_6000)
– Can we distinguish between the first 7 specimens (Class 1) and the last 7 (Class 2)?
Prediction method
– Compound covariate prediction
– Compound covariate built from the log-ratios of the 10 most differentially expressed genes

Partial Cross-Validation of Random Data
Generate data for p features and n cases identically distributed in two classes
– No model should predict more accurately than the flip of a fair coin
Using all the data, select k << p features that appear most differentially expressed between the two classes
Cross-validate only the estimation of the model parameters, using the same k features for all LOOCV training sets
The cross-validated estimate of prediction error will be 0 over 99% of the time.
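A small simulation along the lines of this slide, contrasting partial cross-validation (genes chosen once on all the data) with full cross-validation (genes re-selected in every fold); the sample size, classifier, and random seed are arbitrary choices for illustration, and the qualitative result (partial CV error near zero versus full CV error near 50%) is the slide's point.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
n, p, k = 20, 6000, 10
X = rng.standard_normal((n, p))          # pure noise: no real class structure
y = np.array([0] * (n // 2) + [1] * (n // 2))

# --- partial (incorrect) cross-validation: genes chosen on ALL the data ---
t, _ = ttest_ind(X[y == 0], X[y == 1], axis=0)
top = np.argsort(np.abs(t))[-k:]
partial_err = 1 - cross_val_score(KNeighborsClassifier(1), X[:, top], y,
                                  cv=LeaveOneOut()).mean()

# --- full (correct) cross-validation: genes re-selected in every fold ---
errors = 0
for tr, te in LeaveOneOut().split(X):
    t, _ = ttest_ind(X[tr][y[tr] == 0], X[tr][y[tr] == 1], axis=0)
    top = np.argsort(np.abs(t))[-k:]
    clf = KNeighborsClassifier(1).fit(X[tr][:, top], y[tr])
    errors += int(clf.predict(X[te][:, top])[0] != y[te][0])
full_err = errors / n

print(f"partial CV error: {partial_err:.2f}  full CV error: {full_err:.2f}")
```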

Major Flaws Found in 40 Studies Published in 2004
Inadequate control of multiple comparisons in gene finding
– 9/23 studies had unclear or inadequate methods to deal with false positives
– 10,000 genes × 0.05 significance level = 500 false positives
Misleading report of prediction accuracy
– 12/28 reports based on incomplete cross-validation
Misleading use of cluster analysis
– 13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes
50% of studies contained one or more major flaws

Class Prediction
Cluster analysis is frequently used in publications for class prediction in a misleading way

Fallacy of Clustering Classes Based on Selected Genes
Even for arrays randomly distributed between classes, genes will be found that are "significantly" differentially expressed
With 10,000 genes measured, about 500 false positives will be differentially expressed with p < 0.05
Arrays in the two classes will necessarily cluster separately when using a distance measure based on genes selected to distinguish the classes
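A quick demonstration of this fallacy on random data; the sample size, seed, and clustering choices (average linkage, correlation distance) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ttest_ind
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
n, p = 20, 10000
X = rng.standard_normal((n, p))                  # random data, no real classes
y = np.array([0] * (n // 2) + [1] * (n // 2))    # arbitrary class labels

# Select "significant" genes using the class labels themselves
_, pvals = ttest_ind(X[y == 0], X[y == 1], axis=0)
selected = np.where(pvals < 0.05)[0]
print(f"{selected.size} 'significant' genes out of {p} (~500 expected by chance)")

# Cluster the arrays using only the selected genes
Z = linkage(X[:, selected], method="average", metric="correlation")
clusters = fcluster(Z, t=2, criterion="maxclust")
print("cluster assignments:", clusters)
print("class labels:       ", y + 1)   # the clusters largely mirror the labels
```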

Major Flaws Found in 40 Studies Published in 2004
Inadequate control of multiple comparisons in gene finding
– 9/23 studies had unclear or inadequate methods to deal with false positives
– 10,000 genes × 0.05 significance level = 500 false positives
Misleading report of prediction accuracy
– 12/28 reports based on incomplete cross-validation
Misleading use of cluster analysis
– 13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes
50% of studies contained one or more major flaws

Myth
Split-sample validation is superior to LOOCV or 10-fold CV for estimating prediction error

Simulated Data: 40 cases, 10 genes selected from 5000
[Table: estimate and standard deviation of prediction error for resubstitution, LOOCV, 10-fold CV, 5-fold CV, split-sample, and bootstrap estimators; the true error is .078]

Comparison of Internal Validation Methods (Molinaro, Pfeiffer & Simon)
For small sample sizes, LOOCV is much more accurate than split-sample validation
– Split-sample validation over-estimates prediction error
For small sample sizes, LOOCV is preferable to 10-fold or 5-fold cross-validation or repeated k-fold versions
For moderate sample sizes, 10-fold is preferable to LOOCV
Some claims for bootstrap resampling for estimating prediction error are not valid for p >> n problems

Simulated Data: 40 cases, 10 genes selected from 5000
[Table: estimate and standard deviation of prediction error for resubstitution, LOOCV, 10-fold CV, 5-fold CV, split-sample, and bootstrap estimators; the true error is .078]

Simulated Data: 40 cases
[Table: estimate and standard deviation of prediction error for 10-fold, repeated 10-fold, 5-fold, repeated 5-fold, split-sample, and repeated split-sample estimators, compared with the true error]

DLBCL Data
[Table: bias, standard deviation, and MSE of prediction error estimates for LOOCV, 10-fold CV, 5-fold CV, split-sample, and bootstrap methods]

Permutation Distribution of the Cross-Validated Misclassification Rate of a Multivariate Classifier
Randomly permute the class labels and repeat the entire cross-validation
Re-do for all (or 1000) random permutations of the class labels
The permutation p value is the fraction of random permutations that gave as few misclassifications as e in the real data
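This permutation test can be sketched with scikit-learn's permutation_test_score, provided the gene selection step sits inside the cross-validated pipeline so that it is redone for every fold and every permutation; the simulated data, the SelectKBest/1-nearest-neighbor pipeline, and k = 10 are assumptions for illustration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, permutation_test_score

# simulated stand-in for a (n_samples, n_genes) log-expression matrix and labels
rng = np.random.default_rng(2)
X = rng.standard_normal((20, 1000))
y = np.array([0] * 10 + [1] * 10)

# gene selection is inside the pipeline, so it is repeated within every CV fold
clf = Pipeline([("select", SelectKBest(f_classif, k=10)),
                ("knn", KNeighborsClassifier(1))])

score, perm_scores, p_value = permutation_test_score(
    clf, X, y, cv=LeaveOneOut(), n_permutations=1000, scoring="accuracy")

print(f"cross-validated accuracy: {score:.2f}")
print(f"permutation p value:      {p_value:.3f}")
```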

Gene-Expression Profiles in Hereditary Breast Cancer
cDNA microarrays, parallel gene expression analysis
Breast tumors studied:
– 7 BRCA1+ tumors
– 8 BRCA2+ tumors
– 7 sporadic tumors
Log-ratio measurements of 3226 genes for each tumor after initial data filtering
Research question: Can we distinguish BRCA1+ from BRCA1– cancers, and BRCA2+ from BRCA2– cancers, based solely on their gene expression profiles?

BRCA1 [figure: results for the BRCA1 classification]

BRCA2 [figure: results for the BRCA2 classification]

Classification of BRCA2 Germline Mutations
Classification Method                     LOOCV Prediction Error
Compound Covariate Predictor              14%
Fisher LDA                                36%
Diagonal LDA                              14%
1-Nearest Neighbor                         9%
3-Nearest Neighbor                        23%
Support Vector Machine (linear kernel)    18%
Classification Tree                       45%

Myth
Huge sample sizes are needed to develop effective predictive classifiers

Sample Size Planning References
K Dobbin, R Simon. Sample size determination in microarray experiments for class comparison and prognostic classification. Biostatistics 6:27, 2005
K Dobbin, R Simon. Sample size planning for developing classifiers using high dimensional DNA microarray data. Biostatistics 8:101, 2007
K Dobbin, Y Zhao, R Simon. How large a training set is needed to develop a classifier for microarray data? Clinical Cancer Res 14:108, 2008

Sample Size Planning for Classifier Development
The expected value (over training sets) of the probability of correct classification, PCC(n), should be within a specified tolerance of the maximum achievable PCC(∞)

Probability Model
Two classes
Log expression or log ratio MVN in each class with common covariance matrix
m differentially expressed genes, p − m noise genes
Expression of differentially expressed genes is independent of expression for noise genes
All differentially expressed genes have the same inter-class mean difference 2δ
Common variance for differentially expressed genes and for noise genes

Classifier
Feature selection based on univariate t-tests for differential expression at significance level α
Simple linear classifier with equal weights (except for sign) for all selected genes
Power for selecting each of the informative genes that are differentially expressed by mean difference 2δ is 1 − β(n)
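Under this model, the power 1 − β(n) for a single informative gene can be sketched with a normal approximation to the two-sample test (n/2 cases per class); this is only an approximation, and the exact calculations are in the Dobbin-Simon references above. The example effect size and alpha below are illustrative.

```python
import math
from scipy.stats import norm

def power_two_sample(n_total, delta, sigma, alpha):
    """Approximate power 1 - beta(n) for detecting one informative gene with
    inter-class mean difference 2*delta and standard deviation sigma, using a
    two-sided level-alpha test with n_total/2 cases per class."""
    z_alpha = norm.ppf(1 - alpha / 2)
    # standard error of the difference of class means is 2*sigma/sqrt(n_total)
    noncentrality = 2 * delta / (2 * sigma / math.sqrt(n_total))
    return norm.cdf(noncentrality - z_alpha)

# e.g. standardized difference 2*delta/sigma = 1 at alpha = 0.001
for n in (10, 30, 50):
    print(n, round(power_two_sample(n, delta=0.5, sigma=1.0, alpha=0.001), 3))
```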

For 2 classes of equal prevalence, let λ1 denote the largest eigenvalue of the covariance matrix of the informative genes. A lower bound on the expected PCC(n) then follows in terms of λ1, m, 2δ/σ, α and n (formula given in Dobbin & Simon, Biostatistics 2007).

[Figure: sample size as a function of effect size (log-base-2 fold change between classes divided by the standard deviation), shown for two different tolerances, with each class equally represented in the population]

BRB-ArrayTools Survival Risk Group Prediction
No need to transform data to good vs bad outcome; censored survival is directly analyzed
Gene selection based on significance in univariate Cox proportional hazards regression
Uses k principal components of the selected genes
Gene selection re-done for each resampled training set
Develop a k-variable Cox PH model for each leave-one-out training set

BRB-ArrayTools Survival Risk Group Prediction
Classify the left-out sample as above or below median risk based on a model not involving that sample
Repeat, leaving out one sample at a time, to obtain cross-validated risk group predictions for all cases
Compute Kaplan-Meier survival curves of the two predicted risk groups
Permutation analysis to evaluate statistical significance of the separation of the K-M curves

BRB-ArrayTools Survival Risk Group Prediction
Compare Kaplan-Meier curves for the gene expression based classifier to those for the standard clinical classifier
Develop a classifier using standard clinical staging plus genes that add to standard staging

Does an Expression Profile Classifier Predict More Accurately Than Standard Prognostic Variables?
Some publications fit a logistic model to the standard covariates and the cross-validated predictions of the expression profile classifier
This is valid only with split-sample analysis, because the cross-validated predictions are not independent

Does an Expression Profile Classifier Predict More Accurately Than Standard Prognostic Variables?
Not an issue of which variables are significant after adjusting for which others, or which are independent predictors
– Predictive accuracy and inference are different
The predictiveness of the expression profile classifier can be evaluated within levels of the classifier based on standard prognostic variables

Survival Risk Group Prediction
LOOCV loop:
– Create the training set by omitting the i-th case
– Develop a PH model for the training set
– Compute the predictive index for the i-th case using the PH model developed for the training set
– Compute the percentile of the predictive index for the i-th case among the predictive indices for cases in the training set

Survival Risk Group Prediction
Plot Kaplan-Meier survival curves for cases with cross-validated predictive index percentiles above 50% and for cases with cross-validated risk percentiles below 50%
– Or for however many risk groups and thresholds are desired
Compute the log-rank statistic comparing the cross-validated Kaplan-Meier curves

Survival Risk Group Prediction
Evaluate individual genes by fitting single-variable proportional hazards regression models to the log expression for each gene
Select genes based on a p-value threshold for the single-gene PH regressions
Compute the first k principal components of the selected genes
Fit a PH regression model with the k principal components as predictors; let b1, …, bk denote the estimated regression coefficients
To predict for a case with expression profile vector x, compute the k supervised principal components y1, …, yk and the predictive index b1·y1 + … + bk·yk
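A compressed sketch of this supervised principal components procedure is given below; it is not the BRB-ArrayTools implementation. It assumes the lifelines package (CoxPHFitter, in a recent version exposing params_ and a summary table with a 'p' column) and illustrative defaults for the p-value threshold and k. In the full procedure described on the surrounding slides, this entire pipeline is re-run within each leave-one-out training set.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter   # assumed survival-analysis dependency

def supervised_pc_risk_index(X, time, event, p_threshold=0.001, k=2):
    """Sketch of the supervised principal components procedure above:
    (1) single-gene Cox PH models, (2) keep genes with p < threshold,
    (3) first k principal components of the selected genes,
    (4) Cox PH model on the k components; the predictive index for a
    profile x is b1*y1 + ... + bk*yk."""
    n, p = X.shape
    # 1-2. univariate Cox PH regression per gene, select by p value
    selected = []
    for j in range(p):
        df = pd.DataFrame({"g": X[:, j], "time": time, "event": event})
        cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
        if cph.summary.loc["g", "p"] < p_threshold:
            selected.append(j)
    Xs = X[:, selected]
    # 3. first k principal components via SVD of the centered matrix
    mu = Xs.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xs - mu, full_matrices=False)
    components = Vt[:k]                      # gene loadings of the k PCs
    Y = (Xs - mu) @ components.T             # PC scores y1..yk for each case
    # 4. Cox PH model on the k components
    df = pd.DataFrame(Y, columns=[f"pc{i+1}" for i in range(k)])
    df["time"], df["event"] = time, event
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    b = cph.params_.values                   # b1..bk
    def predictive_index(x_new):
        y = (x_new[selected] - mu) @ components.T
        return float(y @ b)
    return predictive_index
```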

Survival Risk Group Prediction
Repeat the entire procedure for permutations of the survival times and censoring indicators to generate the null distribution of the log-rank statistic
– The usual chi-square null distribution is not valid because the cross-validated risk percentiles are correlated among cases
Evaluate the statistical significance of the association of survival and expression profiles by referring the log-rank statistic for the unpermuted data to the permutation null distribution

Outcome prediction in estrogen-receptor positive, chemotherapy and tamoxifen treated patients with locally advanced breast cancer
R. Simon, G. Bianchini, M. Zambetti, S. Govi, G. Mariani, M. L. Carcangiu, P. Valagussa, L. Gianni
National Cancer Institute, Bethesda, MD; Fondazione IRCCS - Istituto Tumori di Milano, Milan, Italy

PATIENTS AND METHODS - I
Fifty-seven patients with ER positive tumors enrolled in a neoadjuvant clinical trial for LABC were evaluated.
All patients had been treated with doxorubicin and paclitaxel q 3wk x 3, followed by weekly paclitaxel x 12 before surgery, then adjuvant intravenous CMF q 4wk x 4 and thereafter tamoxifen.
High-throughput qRT-PCR gene expression analysis in paraffin-embedded formalin-fixed core biopsies at diagnosis was performed by Genomic Health to quantify expression of 363 genes (plus 21 for Oncotype DX determination), as described previously (Gianni L, JCO 2005).
RS genes were excluded from analysis.

PATIENTS AND METHODS - II
Three models (prognostic indices) were developed to predict Distant Event Free Survival (DEFS):
– GENE MODEL: Using only expression data, genes were selected based on univariate Cox analysis p value under a specified threshold significance level.
– COVARIATES MODEL: Using RS (as a continuous variable), age and IBC status (covariates), a multivariate proportional hazards model was developed.
– COMBINED MODEL: Using a combination of these covariates and expression data, genes were selected which add to predicting survival over the predictive value provided by the covariates, under a specified threshold significance level.
Survival risk groups were constructed using the supervised principal component method implemented in BRB-ArrayTools (Bair E, Tibshirani R, PLoS Biology 2004).

PATIENTS AND METHODS - III
In order to evaluate the predictive value of each model, a complete leave-one-out cross-validation was used.
– For each i-th cross-validated training set (with one case removed), a prognostic index (PI) function was created. The PI for the omitted patient is ranked relative to the PIs for the i-th training set.
Because the PI is a continuous variable, cut-off percentiles have to be pre-specified for defining the risk groups. The omitted patient is placed into a risk group based on her percentile ranking.
The entire procedure was repeated using different cut-off percentiles (BRB-ArrayTools User's Manual v3.7).

PATIENTS AND METHODS - IV
Statistical significance was determined by repeating the entire cross-validation process for 1000 random permutations of the survival data.
– For the GENE MODEL, the p value tested the null hypothesis that there is no relation between the expression data and survival (by providing a null distribution of the log-rank statistic)
– For the COVARIATES MODEL, the p value was from the parametric log-rank test statistic between risk groups
– For the COMBINED MODEL, the p value addressed whether the expression data add significantly to risk prediction compared to the covariates

RESULTS
Patient characteristics at diagnosis
The median follow-up was 76 months (by the inverse Kaplan-Meier method)
Patient characteristics are summarized in Table 1.

[Figure: Overall Survival (OS) and Distant Event Free Survival (DEFS) curves for all patients]

Genes selected for the GENE MODEL and COMBINED MODEL
The significance level for gene selection used for the identified models was p =
All genes included in the COMBINED MODEL were also selected in the GENE MODEL.

[Figure: cross-validated Kaplan-Meier curves of distant event free survival for risk groups using the 50th percentile cut-off, shown for the GENE MODEL, COVARIATES MODEL, and COMBINED MODEL]

BRB-ArrayTools
Contains analysis tools that I have selected as valid and useful
Analysis wizard and multiple help screens for biomedical scientists
Imports data from all platforms and major databases
Automated import of data from the NCBI Gene Expression Omnibus

Predictive Classifiers in BRB-ArrayTools
Classifiers
– Diagonal linear discriminant
– Compound covariate
– Bayesian compound covariate
– Support vector machine with inner product kernel
– K-nearest neighbor
– Nearest centroid
– Shrunken centroid (PAM)
– Random forest
– Tree of binary classifiers for k classes
Survival risk-group
– Supervised principal components
Feature selection options
– Univariate t/F statistic
– Hierarchical variance option
– Restricted by fold effect
– Univariate classification power
– Recursive feature elimination
– Top-scoring pairs
Validation methods
– Split-sample
– LOOCV
– Repeated k-fold CV
– .632+ bootstrap

Selected Features of BRB-ArrayTools
Multivariate permutation tests for class comparison to control the number and proportion of false discoveries with a specified confidence level
– Permits blocking by another variable, pairing of data, averaging of technical replicates
SAM
– Fortran implementation 7X faster than R versions
Extensive annotation for identified genes
– Internal annotation of NetAffx, Source, Gene Ontology, Pathway information
– Links to annotations in genomic databases
Find genes correlated with a quantitative factor while controlling the number or proportion of false discoveries
Find genes correlated with censored survival while controlling the number or proportion of false discoveries
Analysis of variance

Selected Features of BRB-ArrayTools
Gene set enrichment analysis
– Gene Ontology groups, signaling pathways, transcription factor targets, micro-RNA putative targets
– Automatic data download from the Broad Institute
– KS & LS test statistics for the null hypothesis that the gene set is not enriched
– Efron/Tibshirani max-mean test
– Goeman's Global test of the null hypothesis that no genes in the set are differentially expressed
Class prediction
– Multiple classifiers
– Complete LOOCV, k-fold CV, repeated k-fold, .632 bootstrap
– Permutation significance of the cross-validated error rate

Selected Features of BRB-ArrayTools
Survival risk-group prediction
– Supervised principal components with and without clinical covariates
– Cross-validated Kaplan-Meier curves
– Permutation test of cross-validated KM curves
Clustering tools for class discovery with reproducibility statistics on clusters
– Internal access to Eisen's Cluster and TreeView
Visualization tools including a rotating 3D principal components plot exportable to PowerPoint with rotation controls
Extensible via R plug-in feature
Tutorials and datasets

BRB-ArrayTools
Extensive built-in gene annotation and linkage to gene annotation websites
Publicly available for non-commercial use
– brb.nci.nih.gov

Conclusions
New technology and biological knowledge make it increasingly feasible to identify which patients are most likely to benefit from a specified treatment
"Predictive medicine" is feasible based on genomic characterization of a patient's tumor
Targeting treatment can greatly improve the therapeutic ratio of benefit to adverse effects
– Treated patients benefit
– Economic benefit for society

Conclusions
Achieving the potential of new technology requires paradigm changes in the focus and methods of "correlative science"
Effective interdisciplinary research requires increased emphasis on cross-education of laboratory, clinical and statistical scientists

Acknowledgements
Kevin Dobbin, Alain Dupuy, Wenyu Jiang, Annette Molinaro, Michael Radmacher, Joanna Shih, Yingdong Zhao
BRB-ArrayTools Development Team