Gene expression profiling identifies molecular subtypes of gliomas


Gene expression profiling identifies molecular subtypes of gliomas Ruty Shai, Tao Shi, Thomas J Kremen, Steve Horvath, Linda M Liau, Timothy F Cloughesy, Paul S Mischel* and Stanley F Nelson Presented by Stephanie Tsung

Outline: Description of Data; Statistical Methods; Multidimensional Scaling Plot; Hierarchical Clustering; K-means Clustering; Gene Filtering/Selection; Predictor Comparison; Conclusion / Future Work

Background Brain tumors can be classified by tumor origin, cell type of origin, tumor site, etc. Tumor classification has been critical for treatment selection and outcome prediction; however, current classification methods are still far from perfect. As a new technology, DNA microarrays have been introduced for cancer classification on the basis of gene expression levels.

Background: Cancer Classification Cancer classification can be divided into two challenges: class discovery and class prediction.  Class discovery refers to defining previously unrecognized tumor subtypes.  Class prediction refers to the assignment of particular tumor samples to already-defined classes.

Objectives To test whether gene expression measurements can be used to classify different brain tumors; to determine sets of significant genes that distinguish brain tumors of different pathological types, grades, and survival times; to validate the selected informative genes for brain tumor classification and prediction.

Data and Pre-Processing Affymetrix HG-U95Av2 chips; 12,555 probe sets and 42 samples in total. Tumor types (#): N(7), O(3), D(18), A(2), AA(3), P(9). Data pre-processing: each tumor was examined by a neuropathologist and dissected into two portions, one for tissue diagnosis and one for RNA extraction. Normalization and model-based expression indices (MBEI) were computed in dChip.

Q. Are the global transcriptional signatures of the different pathologic subtypes of gliomas molecularly distinct? As a first step in the analysis, we asked whether the global transcriptional signatures of the different pathologic subtypes of gliomas were molecularly distinct.

Multidimensional Scaling Plot (MDS Plot) A dimension-reduction technique for uncovering the hidden structure of the data: D(N) -> D(2), i.e., from the 12,555-dimensional space to a low-dimensional Euclidean space. It represents observed similarities and dissimilarities between objects (e.g., correlation or Euclidean distance) as distances in the plot. R: cmd1 <- cmdscale(dist(dat1[,1:30]), k=2, eig=TRUE) We performed multidimensional scaling, an unsupervised method of data reduction, in which high-dimensional gene expression data are projected onto two viewable dimensions representing linear combinations of genes that provide the most variation in the data set.
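A minimal R sketch of this MDS step (the variable names dat and grp are illustrative, not from the paper):

d   <- dist(dat)                        # dat: samples x genes expression matrix; Euclidean distances between samples
mds <- cmdscale(d, k = 2, eig = TRUE)   # classical MDS projection onto 2 dimensions
plot(mds$points[, 1], mds$points[, 2],
     col = as.factor(grp),              # grp: hypothetical vector of tumor-type labels, used only for coloring
     xlab = "MDS dimension 1", ylab = "MDS dimension 2")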

MDS Plot Multidimensional scaling analysis of our samples based on expression of all 12,555 probe sets demonstrated that gliomas of different type and grade have distinctive global gene expression signatures. The glioblastomas, lower grade astrocytomas, and oligodendrogliomas were all separable from each other and from normal brain tissue. The multidimensional scaling data also indicate that primary glioblastomas, which arise as de novo grade IV tumors, are not molecularly distinct from secondary glioblastomas, which develop from lower grade gliomas; however, the secondary GBMs are more diverse than the primary GBMs. Figure 1. (a) Multidimensional scaling plot of all 42 tissue samples plotted in two-dimensional space using expression values from all 12,555 probe sets.

Hierarchical Clustering 1. Evaluate all pairwise distances between objects. 2. Find the pair with the shortest distance. 3. Construct a 'new object' as the average of the two objects. 4. Evaluate the distances from the 'new object' to all other objects, then go to step 2. R: h1 <- hclust(dist(x), method="average")
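The same steps in R, assuming x is a samples-by-genes matrix (a sketch; variable names are illustrative):

d  <- dist(x)                          # step 1: all pairwise distances between samples
h1 <- hclust(d, method = "average")    # steps 2-4: repeatedly merge the closest pair, average linkage
plot(h1)                               # dendrogram of the samples
grp <- cutree(h1, k = 4)               # optionally cut the dendrogram into a chosen number of groups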

Hierarchical Clustering Figure 1. (b) The same 42 tissue samples were grouped into hierarchical clusters (groups I-IV). Tissue samples are color-coded. Groups I & II: P = 0.00006 (Fisher's exact test); groups III & IV: P = 0.00001.

Fisher's Exact Test

Sample   With characteristic   Without characteristic   Total
1        A                     B                        A+B
2        C                     D                        C+D
Total    A+C                   B+D                      N

H0: the proportion of interest does not differ between the two groups. Fisher's exact test is an alternative to the chi-squared test for testing whether a proportion of interest differs between two groups; it makes no approximations, so it is suitable for small sample sizes and for 2x2 frequency tables with small expected frequencies, where the chi-square test is not appropriate. The test applies when members of two independent groups can fall into one of two mutually exclusive categories, and it asks whether the proportions falling into each category differ by group. The chi-square test of independence can also be used in such situations, but it is only an approximation, whereas Fisher's exact test returns exact one-tailed and two-tailed p-values. Fisher's exact test computes the probability, given the observed marginal frequencies, of obtaining exactly the frequencies observed and any configuration more extreme. By "more extreme," we mean any configuration (given the observed marginals) with a smaller probability of occurrence in the same direction (one-tailed) or in both directions (two-tailed). Example of a two-tailed probability: .326 + .007 + .093 + .163 + .019 = .608.
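In R the test on a 2x2 table is a one-liner; the counts below are made up purely for illustration:

tab <- matrix(c(10, 2,
                 1, 9),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("1", "2"),
                              characteristic = c("with", "without")))
fisher.test(tab)                       # exact p-value for the 2x2 table (two-sided by default)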

Q. Can we uncover these subtypes without prior knowledge? i.e. How many categories of gliomas are suggested by the gene expression data? Next, we asked if our data might be used to uncover molecular subtypes of gliomas without prior knowledge of their pathologic type or grade. That is, how many categories of glioma are suggested by the gene expression data?

K-means Clustering Finds a K-partition of the observations that minimizes the within-cluster sum of squares (WSS) in each cluster. The number of clusters, k, must be pre-specified; Tibshirani's prediction strength can be used to determine the optimal k. R: cl1 <- kmeans(x, 3)
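A short R sketch of the K-means step (x is the expression matrix; the nstart argument is an addition here because K-means depends on its random initialization):

set.seed(1)                                  # results depend on the random starting centers
cl1 <- kmeans(x, centers = 3, nstart = 25)   # 3 clusters, keep the best of 25 random starts
cl1$cluster                                  # cluster assignment (1, 2, or 3) for each sample
cl1$tot.withinss                             # total within-cluster sum of squares (WSS)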

Three groups were defined. Each tumor is assigned to one of three cluster groups by color: red is group 1, green is group 2, and black is group 3. These data indicate that there are three main molecular subsets of gliomas, which correspond to glioblastomas, astrocytomas, and oligodendrogliomas. Figure 2. Grouping of tumors. All tumor samples were plotted using multidimensional scaling with all 12,555 probe sets. We performed nonhierarchical K-means clustering (Kaufman and Rousseeuw, 1990).

Gene Filtering/Selection To find interesting genes that are differentially expressed in six two-group comparisons. The top 30 genes from each comparison were selected based on a t-test, yielding the 170 most differentially expressed genes (6 × 30, minus 10 redundant genes, = 170). A final gene list was then constructed by pooling the most differentially expressed genes from these individual comparisons, and redundant genes were eliminated.
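A hedged sketch of one such two-group comparison in R, assuming x is a genes-by-samples matrix and groupA/groupB are hypothetical column indices for the two groups being compared:

tstat <- apply(x, 1, function(g)
  t.test(g[groupA], g[groupB])$statistic)                          # per-gene two-sample t statistic
top30 <- rownames(x)[order(abs(tstat), decreasing = TRUE)[1:30]]   # 30 most differential genes
# Pooling the top-30 lists from all six comparisons and dropping duplicates
# would give the final list of 170 genes.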

Predictor Comparison Compare the performance of the predictors (e.g., gene voting). Leave-one-out cross-validation error rates were calculated. For a given method and sample size n, a classifier is generated using (n - 1) cases and tested on the single remaining case; this is repeated n times, each time designing a classifier by leaving one case out, so each case in the sample is used once as a test case while nearly all cases are used to design each classifier. The error rate is the number of errors on the single test cases divided by n. The leave-one-out error-rate estimator is almost unbiased for the true error rate of a classifier. Leave-one-out is an elegant and straightforward technique for estimating classifier error rates, but because it is computationally expensive it has often been reserved for problems with relatively small sample sizes.
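A generic leave-one-out loop in R; train_and_predict() stands in for whichever classifier is being evaluated and is hypothetical, not a function from the paper:

n    <- nrow(x)                        # x: samples x genes matrix; y: vector of class labels
pred <- character(n)
for (i in seq_len(n)) {
  # fit on the n - 1 remaining samples, then predict the single held-out sample
  pred[i] <- train_and_predict(x[-i, ], y[-i], x[i, , drop = FALSE])
}
loocv_error <- mean(pred != y)         # leave-one-out error rate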

Table 1.

Using the 170 filtered genes based on the t-test. Figure 3. Hierarchical clustering of seven normal white matter tissue samples and 26 glial tumor samples using the 170 filtered genes. We used dChip to perform hierarchical clustering of the samples using 1 - r, where r is Pearson's correlation coefficient, as the distance measure. Samples are coded by color. Gene expression values are represented as expression relative to the mean of all samples; red is relatively higher expression and green is relatively lower expression.
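An R approximation of that clustering step, assuming x170 holds the 170 filtered genes (rows) by samples (columns); this is a sketch, not dChip itself, and the linkage choice is only illustrative:

d_cor <- as.dist(1 - cor(x170))        # 1 - Pearson's r between samples (cor() works column-wise)
hc    <- hclust(d_cor, method = "average")
plot(hc)                               # sample dendrogram analogous to Figure 3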

Table 2.

Conclusion MDS plots and K-means clustering analysis found evidence for three clusters: glioblastomas, lower grade astrocytomas, and oligodendrogliomas (p < 0.00001). A relatively small number of genes can be used to distinguish between these molecular subtypes. The molecular subsets of gliomas could potentially be used for patient stratification and point to potential targets for treatment.

Future Directions Construct predictors using different gene selection methods. Validate the selected genes with new tumor samples. ……

K = 3 gave us the best prediction power (excluding the trivial K = 1).

Number of clusters (K):           1      2      3      4      5
Tibshirani prediction strength:   1.000  0.766  0.881  0.501  0.510
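For reference, the fpc package provides an implementation of Tibshirani's prediction strength (this is an assumption about available tooling, not necessarily how the numbers above were produced):

library(fpc)                           # assumed installed; provides prediction.strength()
set.seed(1)
ps <- prediction.strength(x, Gmin = 2, Gmax = 5, M = 50)   # K-means based resampling by default
ps                                     # prints mean prediction strength per K and a suggested K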

Statistical problems in response-based classification: identification of new or unknown classes (unsupervised learning); classification into known classes (supervised learning); identification of "best" predictor variables (variable selection), e.g., marker genes in microarray data (gene voting, hierarchical clustering).