Class Prediction Based on Gene Expression Data Issues in the Design and Analysis of Microarray Experiments Michael D. Radmacher, Ph.D. Biometric Research.


Class Prediction Based on Gene Expression Data
Issues in the Design and Analysis of Microarray Experiments
Michael D. Radmacher, Ph.D.
Biometric Research Branch, National Cancer Institute

One Potential of Gene Expression Data
- Specimens will be distinguishable by their gene expression profiles.
- NCI Director’s Challenge: Toward a Molecular Classification of Tumors
  - “This challenge is intended to lay the groundwork for changing the basis of tumor classification from morphological to molecular characteristics.”
  - Purpose is “...to define comprehensive profiles of molecular alterations in tumors that can be used to identify subsets of patients.”
- So one important goal is: Classification

What is meant by “Classification”?
Two important and distinct answers:
- Class Discovery
  - Identification of previously unknown classes of specimens
  - Use of “unsupervised” methods: hierarchical clustering, k-means clustering, SOMs (self-organizing maps), others
  - Prevalent method used in the literature for analysis of gene expression data
- Class Prediction
  - Assignment of specimens into known classes
  - Use of “supervised” methods: logistic regression, CART, discriminant analysis, others
  - Class prediction is more powerful than class discovery for distinguishing specimens based on a priori defined classes.
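To make the "unsupervised" side of the distinction concrete, here is a minimal sketch of agglomerative hierarchical clustering with average linkage. It is an illustration only: the function name `average_linkage_clusters` and the choice of Euclidean distance are assumptions made for this example, not details taken from the slides. Note that the class labels never enter the computation, which is what separates class discovery from class prediction.

```python
def average_linkage_clusters(X, n_clusters):
    """Group the rows of X into n_clusters by average-linkage agglomeration.

    X is a list of expression profiles (lists of floats), one per specimen.
    Returns a list of clusters, each a sorted list of specimen indices.
    Euclidean distance is an assumption made for this sketch.
    """
    def dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(X[i], X[j])) ** 0.5

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return sum(dist(i, j) for i in c1 for j in c2) / (len(c1) * len(c2))

    # Start with every specimen in its own cluster.
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = [
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]
```

Calling `average_linkage_clusters(profiles, 2)` on a list of profiles returns two groups of specimen indices. Whether those groups line up with any known class is something the method itself cannot tell you.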

Example of Class Discovery: Distinct Types of Diffuse Large B-Cell Lymphoma
- DLBCL is clinically heterogeneous.
- Specimens were clustered based on their expression profiles of GC B-cell associated genes.
- Two subgroups were discovered:
  - GC B-like DLBCL
  - Activated B-like DLBCL
(Figures and information taken from Alizadeh et al., Nature 403:503-11, 2000)

Study of Gene Expression in Breast Tumors (NHGRI, J. Trent)
- cDNA microarrays: parallel gene expression analysis of 6526 genes per tumor
- How similar are the gene expression profiles of BRCA1 (+), BRCA2 (+), and sporadic breast cancer patient biopsies?
- Can we identify a set of genes that distinguish the different tumor types?
- Tumors studied: 7 BRCA1 (+), 8 BRCA2 (+), 7 sporadic

BRCA1 +/- and BRCA2 +/- Classification: Results from Hierarchical Clustering
[Figure panels: BRCA1 clustering; BRCA2 clustering]

Class Prediction Paradigm
1. Begin with a data set that can be separated into known groups.
2. Choose a method of class prediction.
3. Perform class prediction on the data set using “leave-one-out” cross-validation.
   - Leave one specimen out of the data set.
   - Build the class predictor using the remaining data.
   - Predict the class of the left-out specimen.
   - Repeat so that a prediction is made for every specimen.
4. Use a permutation test to determine whether there is a significant difference in expression patterns between the groups.
   - Permute class labels among specimens.
   - Perform class prediction on the permuted data.
   - Repeat many times.
   - Report the % of permuted sets with an error rate equivalent to or less than that for the actual data set.
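The paradigm above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names are hypothetical, and the nearest-centroid classifier is only a stand-in for whichever class prediction method is chosen in the second step. Any `fit(X, y)` function that returns a `predict(x)` function could be passed in its place.

```python
import random


def nearest_centroid_fit(X, y):
    """Stand-in classifier (an assumption, not from the slides).

    Returns a function that assigns a profile to the class whose mean
    training profile is closest in squared Euclidean distance.
    """
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(
            sorted(centroids),
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def loocv_errors(X, y, fit):
    """Count leave-one-out misclassifications."""
    errors = 0
    for i in range(len(X)):
        # Leave specimen i out and rebuild the predictor from scratch.
        predict = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(X[i]) != y[i]:
            errors += 1
    return errors


def permutation_p_value(X, y, fit, n_perm=200, seed=0):
    """Fraction of label permutations whose cross-validated error count
    is equal to or less than that of the actual labels."""
    rng = random.Random(seed)
    observed = loocv_errors(X, y, fit)
    as_good = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # permute class labels among specimens
        if loocv_errors(X, y_perm, fit) <= observed:
            as_good += 1
    return observed, as_good / n_perm
```

The key design point is that the predictor is rebuilt inside every cross-validation fold, so the left-out specimen never influences the model that classifies it. The number of permutations (`n_perm`) and the fixed seed are arbitrary choices for the sketch.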

The Compound Covariate Predictor (CCP)
- We consider only genes that are differentially expressed between the two groups (using a two-sample t-test with small α).
- The CCP
  - Motivated by J. Tukey, Controlled Clinical Trials, 1993
  - Simple approach that may serve better than complex multivariate analysis
  - A compound covariate is built from the basic covariates (log-ratios):
    $c_i = \sum_j t_j x_{ij}$
    - $t_j$ is the two-sample t-statistic for gene $j$.
    - $x_{ij}$ is the log-ratio measure of sample $i$ for gene $j$.
    - The sum is over all differentially expressed genes.
- Threshold of classification: midpoint of the CCP means for the two classes.
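A minimal sketch of a compound covariate predictor follows, under some loudly stated assumptions. The function names are hypothetical. Gene selection here uses a cutoff on |t| (`t_cutoff`) rather than a significance level α, because converting t to a p-value needs the t distribution, which the Python standard library does not provide. The t-statistic is the pooled-variance form, and every gene is assumed to have nonzero pooled variance.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance two-sample t-statistic (assumes nonzero variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def fit_ccp(X, y, class1, class2, t_cutoff):
    """Build a compound covariate predictor for two classes.

    X: list of log-ratio profiles, one list per specimen.
    y: list of class labels, each equal to class1 or class2.
    Returns (predict, weights, threshold), where weights maps each
    selected gene index j to its t-statistic t_j.
    """
    weights = {}
    for j in range(len(X[0])):
        a = [x[j] for x, lab in zip(X, y) if lab == class1]
        b = [x[j] for x, lab in zip(X, y) if lab == class2]
        t = t_statistic(a, b)
        if abs(t) >= t_cutoff:  # keep only differentially expressed genes
            weights[j] = t

    def score(x):
        # Compound covariate: sum of t_j * x_ij over the selected genes.
        return sum(t * x[j] for j, t in weights.items())

    m1 = mean(score(x) for x, lab in zip(X, y) if lab == class1)
    m2 = mean(score(x) for x, lab in zip(X, y) if lab == class2)
    threshold = (m1 + m2) / 2  # midpoint of the two class means

    def predict(x):
        s = score(x)
        # Being nearer to m1 than to m2 is the same as falling on
        # class1's side of the midpoint threshold.
        return class1 if abs(s - m1) <= abs(s - m2) else class2

    return predict, weights, threshold
```

Because gene selection happens inside `fit_ccp`, refitting it on each training set during cross-validation repeats the selection without the held-out specimen. If no gene passes the cutoff, every score is zero and the sketch defaults to `class1`, so a real analysis would need to handle that case explicitly.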

BRCA1 +/- and BRCA2 +/- Classification: Results from Class Prediction with CCP

Sample Size Considerations for Accurate Class Prediction

Summary
- Class discovery and prediction methods have distinct goals.
- When class information is known, class prediction is a more powerful method for detecting differences.
- BRCA1 and BRCA2 mutation-positive tumors have distinguishable gene expression patterns.
  - BRCA1 distinction is stronger than BRCA2.
  - Some biological insight concerning misclassified specimens.
  - Not at the level of clinical classification yet.
- Sample size issues

Collaborators
- NCI: Richard Simon
- NHGRI: Mike Bittner, Yidong Chen, David Duggan, Ingrid Hedenfalk, Jeff Trent