Class Prediction Based on Gene Expression Data Issues in the Design and Analysis of Microarray Experiments Michael D. Radmacher, Ph.D. Biometric Research.


1 Class Prediction Based on Gene Expression Data
Issues in the Design and Analysis of Microarray Experiments
Michael D. Radmacher, Ph.D.
Biometric Research Branch, National Cancer Institute

2 One Potential of Gene Expression Data
Specimens will be distinguishable by their gene expression profiles.
NCI Director’s Challenge: Toward a Molecular Classification of Tumors
- “This challenge is intended to lay the groundwork for changing the basis of tumor classification from morphological to molecular characteristics.”
- Purpose is “...to define comprehensive profiles of molecular alterations in tumors that can be used to identify subsets of patients.”
So one important goal is: classification.

3 What is meant by “Classification”? Two important and distinct answers:
Class Discovery
- Identification of previously unknown classes of specimens
- Use of “unsupervised” methods: hierarchical clustering, k-means clustering, self-organizing maps (SOMs), and others
- The prevalent approach in the literature for analyzing gene expression data
Class Prediction
- Assignment of specimens to known classes
- Use of “supervised” methods: logistic regression, CART (classification and regression trees), discriminant analysis, and others
- Class prediction is more powerful than class discovery for distinguishing specimens based on a priori defined classes.
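To make the distinction concrete, here is a minimal Python sketch on hypothetical toy data: `kmeans` groups specimens without ever seeing their labels (class discovery), while `nearest_centroid_fit` and `nearest_centroid_predict` learn from a priori labels (class prediction). All function names are hypothetical, and the nearest-centroid rule is only a simple stand-in for the supervised methods listed above; the slide does not name it.

```python
import math
import random


def dist(a, b):
    # Euclidean distance between two expression profiles
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(rows):
    # Gene-by-gene mean of a list of profiles
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(profiles, k, iters=20, seed=0):
    """Class discovery: split profiles into k groups WITHOUT using labels."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(profiles, k)]
    assign = [0] * len(profiles)
    for _ in range(iters):
        # Assign each specimen to its nearest center
        assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in profiles]
        # Move each center to the mean of its members (keep it if the cluster empties)
        for c in range(k):
            members = [p for p, a in zip(profiles, assign) if a == c]
            if members:
                centers[c] = centroid(members)
    return assign


def nearest_centroid_fit(profiles, labels):
    """Class prediction: one centroid per a priori known class."""
    return {lab: centroid([p for p, lbl in zip(profiles, labels) if lbl == lab])
            for lab in set(labels)}


def nearest_centroid_predict(model, profile):
    # Predict the class whose centroid is closest
    return min(model, key=lambda lab: dist(profile, model[lab]))


# Hypothetical toy data: 4 specimens x 4 genes (log-ratios)
profiles = [[2.0, 1.9, 0.1, 0.0], [2.1, 2.0, -0.1, 0.2],
            [0.0, 0.1, 2.0, 1.8], [-0.1, 0.2, 1.9, 2.1]]
labels = ["A", "A", "B", "B"]

print(kmeans(profiles, k=2))  # cluster ids are arbitrary; only the grouping matters
model = nearest_centroid_fit(profiles, labels)
print(nearest_centroid_predict(model, [1.8, 2.2, 0.0, 0.1]))  # prints A
```

The unsupervised result only says which specimens look alike; the supervised model answers a question posed in terms of the known classes.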

4 Example of Class Discovery: Distinct Types of Diffuse Large B-Cell Lymphoma
DLBCL is clinically heterogeneous.
Specimens were clustered based on their expression profiles of germinal-center (GC) B-cell associated genes.
Two subgroups were discovered:
- GC B-like DLBCL
- Activated B-like DLBCL
(Figures and information taken from Alizadeh et al., Nature 403:503-11, 2000.)
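The clustering step can be sketched as agglomerative hierarchical clustering: start with every specimen in its own cluster and repeatedly merge the closest pair. This sketch assumes average linkage with 1 minus Pearson correlation as the distance between specimens, a common choice for expression profiles; the slide does not state which linkage or distance Alizadeh et al. used, and the data and function names are hypothetical.

```python
import math


def pearson(a, b):
    # Pearson correlation between two expression profiles
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Specimen distance: 1 - Pearson correlation.
    Cluster distance: mean pairwise specimen distance (average linkage).
    """
    d = [[1.0 - pearson(p, q) for q in profiles] for p in profiles]
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dab = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                dab /= len(clusters[a]) * len(clusters[b])
                if best is None or dab < best[0]:
                    best = (dab, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: 4 specimens x 5 genes with two opposite expression patterns
profiles = [[2.0, 1.5, 1.8, -1.0, -1.2],
            [1.8, 1.7, 1.6, -0.8, -1.1],
            [-1.0, -1.2, -0.9, 1.9, 1.6],
            [-0.8, -1.1, -1.0, 1.7, 2.0]]
print(average_linkage(profiles, 2))  # prints [[0, 1], [2, 3]]
```

Stopping the merges at two clusters mirrors cutting the dendrogram into two subgroups; no class labels are used anywhere.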

6 Study of Gene Expression in Breast Tumors (NHGRI, J. Trent)
cDNA microarrays: parallel gene expression analysis of 6526 genes per tumor
- How similar are the gene expression profiles of biopsies from BRCA1-mutation-positive, BRCA2-mutation-positive, and sporadic breast cancer patients?
- Can we identify a set of genes that distinguishes the different tumor types?
Tumors studied: 7 BRCA1+, 8 BRCA2+, 7 sporadic
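The BRCA1 +/- and BRCA2 +/- comparisons in the slides that follow can be set up from the three tumor groups listed here. This sketch assumes that each "+/-" comparison pits one mutation group against all remaining tumors (for example, BRCA1-positive versus BRCA2-positive plus sporadic); that reading is an assumption, and `one_vs_rest` is a hypothetical helper.

```python
# Tumor groups and counts as listed on the slide: 7 BRCA1 + 8 BRCA2 + 7 sporadic
groups = ["BRCA1"] * 7 + ["BRCA2"] * 8 + ["Sporadic"] * 7
n_genes = 6526  # each tumor contributes one log-ratio per gene: a 22 x 6526 data matrix


def one_vs_rest(groups, positive):
    """Binary labels (assumed scheme): 1 for the group of interest, 0 for all others."""
    return [1 if g == positive else 0 for g in groups]


brca1_labels = one_vs_rest(groups, "BRCA1")
brca2_labels = one_vs_rest(groups, "BRCA2")
print(len(groups), sum(brca1_labels), len(groups) - sum(brca1_labels))  # prints 22 7 15
print(sum(brca2_labels), len(groups) - sum(brca2_labels))  # prints 8 14
```

Under this scheme each comparison is a small, unbalanced two-class problem (7 vs. 15 and 8 vs. 14), which is worth keeping in mind when reading error rates.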

7 BRCA1 +/- and BRCA2 +/- Classification: Results from Hierarchical Clustering
[Figure panels: BRCA1 Clustering; BRCA2 Clustering]

8 Class Prediction Paradigm
Begin with a data set that can be separated into known groups.
Choose a method of class prediction.
Perform class prediction on the data set using “leave-one-out” cross-validation:
- Leave one specimen out of the data set.
- Build the class predictor using the remaining data.
- Predict the class of the left-out specimen.
- Repeat so that a prediction is made for every specimen.
Use a permutation test to determine whether there is a significant difference in expression patterns between the groups:
- Permute the class labels among specimens.
- Perform class prediction on the permuted data.
- Repeat many times.
- Report the percentage of permuted data sets with an error rate equal to or less than that for the actual data set.
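A minimal sketch of this paradigm in Python, assuming a nearest-centroid rule as the class-prediction method purely for illustration (the slide leaves the method open): `loocv_errors` implements the leave-one-out loop and `permutation_p` the permutation test. Function names, the toy data, and the number of permutations are hypothetical.

```python
import math
import random


def fit_centroids(X, y):
    # Stand-in predictor (hypothetical choice): one mean profile per class
    model = {}
    for lab in set(y):
        rows = [x for x, lbl in zip(X, y) if lbl == lab]
        model[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict_centroids(model, x):
    return min(model, key=lambda lab: math.dist(x, model[lab]))


def loocv_errors(X, y, fit=fit_centroids, predict=predict_centroids):
    """Leave one specimen out, build the predictor on the rest, predict it."""
    errors = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # The predictor is rebuilt from the remaining specimens only
        model = fit(X_train, y_train)
        errors += int(predict(model, X[i]) != y[i])
    return errors


def permutation_p(X, y, n_perm=1000, seed=0):
    """Fraction of label permutations with LOOCV errors <= the observed count."""
    rng = random.Random(seed)
    observed = loocv_errors(X, y)
    hits = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # permute class labels among specimens
        if loocv_errors(X, y_perm) <= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical toy data: two well-separated groups of 4 specimens x 3 genes
X = [[2.0, 0.1, 1.9], [2.2, -0.1, 2.1], [1.9, 0.0, 2.0], [2.1, 0.2, 1.8],
     [0.0, 2.0, 0.1], [-0.2, 2.1, 0.0], [0.1, 1.9, -0.1], [0.0, 2.2, 0.2]]
y = ["A"] * 4 + ["B"] * 4
observed, p = permutation_p(X, y, n_perm=200)
print(observed, p)
```

Because `fit` is called inside the loop, any gene-selection step belongs inside `fit` as well, so the left-out specimen never influences the predictor that classifies it.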

9 The Compound Covariate Predictor (CCP)
We consider only genes that are differentially expressed between the two groups (using a two-sample t-test with a small significance level α).
The CCP
- Motivated by J. Tukey, Controlled Clinical Trials, 1993
- A simple approach that may serve better than complex multivariate analysis
- A compound covariate is built from the basic covariates (log-ratios): \( c_i = \sum_j t_j x_{ij} \), where \( t_j \) is the two-sample t-statistic for gene j, \( x_{ij} \) is the log-ratio measure of sample i for gene j, and the sum is over all differentially expressed genes.
- Threshold of classification: the midpoint of the CCP means for the two classes, \( (\bar{c}_1 + \bar{c}_2)/2 \).
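A standard-library Python sketch of the CCP as described above. To avoid computing t-distribution quantiles, genes are kept when |t| exceeds a user-supplied cutoff `t_cut`, standing in for the critical value that corresponds to the small α; that substitution, the function names, and the toy data are assumptions.

```python
import math


def t_stat(a, b):
    # Two-sample t-statistic with pooled variance (Student's t)
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss / (na + nb - 2)
    if sp2 == 0:
        return 0.0  # constant gene: treated as uninformative in this sketch
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))


def fit_ccp(X, y, t_cut):
    """Fit a compound covariate predictor.

    X: specimens x genes log-ratios; y: 0/1 class labels;
    t_cut: |t| cutoff used in place of a small significance level (assumption).
    """
    g1 = [x for x, lbl in zip(X, y) if lbl == 1]
    g0 = [x for x, lbl in zip(X, y) if lbl == 0]
    # Keep only "differentially expressed" genes, weighting each by its t-statistic
    weights = {}
    for j in range(len(X[0])):
        t = t_stat([x[j] for x in g1], [x[j] for x in g0])
        if abs(t) > t_cut:
            weights[j] = t

    def score(x):
        # Compound covariate c_i = sum over selected genes of t_j * x_ij
        return sum(t * x[j] for j, t in weights.items())

    m1 = sum(score(x) for x in g1) / len(g1)
    m0 = sum(score(x) for x in g0) / len(g0)
    return weights, m1, m0, (m1 + m0) / 2  # threshold = midpoint of class means


def predict_ccp(model, x):
    weights, m1, m0, threshold = model
    c = sum(t * x[j] for j, t in weights.items())
    # Class 1 if c falls on the same side of the threshold as the class-1 mean
    return 1 if (c > threshold) == (m1 > m0) else 0


# Hypothetical toy data: gene 0 up and gene 1 down in class 1, gene 2 is noise
X = [[2.0, -1.0, 0.3], [1.8, -1.2, -0.2], [2.2, -0.9, 0.1],
     [0.1, 0.9, 0.2], [-0.1, 1.1, -0.3], [0.0, 1.0, 0.0]]
y = [1, 1, 1, 0, 0, 0]
model = fit_ccp(X, y, t_cut=3.0)
print(sorted(model[0]))  # prints [0, 1]
print(predict_ccp(model, [1.9, -1.1, 0.0]), predict_ccp(model, [0.0, 1.0, 0.0]))  # prints 1 0
```

Used within the leave-one-out procedure of slide 8, `fit_ccp` would be re-run on each training set, so gene selection and the t weights are recomputed without the left-out specimen.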

10 BRCA1 +/- and BRCA2 +/- Classification: Results from Class Prediction with CCP

11 Sample Size Considerations for Accurate Class Prediction

12 Summary
- Class discovery and class prediction methods have distinct goals.
- When class information is known, class prediction is a more powerful method for detecting differences.
- BRCA1- and BRCA2-mutation-positive tumors have distinguishable gene expression patterns.
- The BRCA1 distinction is stronger than the BRCA2 distinction.
- Some biological insight concerning misclassified specimens.
- Not yet at the level of clinical classification.
- Sample size issues

13 Collaborators
NCI: Richard Simon
NHGRI: Mike Bittner, Yidong Chen, David Duggan, Ingrid Hedenfalk, Jeff Trent
