Class Prediction and Discovery Using Gene Expression Data. Donna K. Slonim, Pablo Tamayo, Jill P. Mesirov, Todd R. Golub, Eric S. Lander. Presenter: 이인희

Introduction
- Gene expression: the process of transcribing a DNA sequence into RNA for protein production
- Gene expression level: the approximate number of copies of a gene's RNA in a cell
  - Correlates with the amount of the corresponding protein made
  - May provide additional information for improving cancer classification and diagnosis
- Class discovery: dividing samples into groups with similar behavior or properties
- Class prediction: given a set of known classes, determining the correct class for a new patient

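Definitions
- Data: expression levels of m genes measured in n samples
- Class vector c = (c_1, ..., c_n): c_i indicates which of the two classes sample i belongs to
- Gene expression vector g = (e_1, ..., e_n): e_i is the expression level of gene g in sample i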

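Method for Choosing Correlated Genes
- Metric for gene selection
  - A predictive gene's typical expression in one class must be quite different from its typical expression in the other.
  - The variation of its expression within each class must be as small as possible.
- Correlation metric:
  $P(g, c) = \dfrac{\mu_1(g) - \mu_2(g)}{\sigma_1(g) + \sigma_2(g)}$
  where $\mu_1(g), \sigma_1(g)$ and $\mu_2(g), \sigma_2(g)$ are the mean and standard deviation of the expression of g in class 1 and class 2. Large positive values mean g is more highly expressed in class 1; large negative values mean it is more highly expressed in class 2.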
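A minimal Python sketch of the correlation metric P(g, c) for a single gene. It assumes the class vector uses 1 for class 1 and 0 for class 2 and that σ is the population standard deviation; the slide specifies neither, and the function name `correlation_metric` is hypothetical.

```python
from statistics import mean, pstdev


def correlation_metric(expr, labels):
    """P(g, c) = (mu1 - mu2) / (sigma1 + sigma2) for a single gene g.

    expr:   expression levels of g across the n samples
    labels: class vector c, with 1 for class 1 and 0 for class 2 (assumed encoding)
    """
    class1 = [e for e, c in zip(expr, labels) if c == 1]
    class2 = [e for e, c in zip(expr, labels) if c == 0]
    spread = pstdev(class1) + pstdev(class2)  # population SD: an assumption
    if spread == 0:
        raise ValueError("gene has no within-class variation; P(g, c) is undefined")
    return (mean(class1) - mean(class2)) / spread


# A gene that is high in class 1 gets a positive score (about 2.449 here)
print(correlation_metric([5.0, 6.0, 7.0, 1.0, 2.0, 3.0], [1, 1, 1, 0, 0, 0]))
```

Swapping the class labels flips the sign of the score, which is what lets the metric separate "high in class 1" genes from "high in class 2" genes.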

Neighborhood analysis  Whether there are any genes likely to be predictors of given class distinction  Determine if the neighborhood around c holds more gene expression vectors than we’d expect to see by chance(around random permutation of c).

Choosing a prediction set S  Could simply choose the top k genes by the absolute value of P(g, c).  Choose the top k 1 genes(highly expressed in class 1) and the bottom k 2 genes(highly expressed in class 2). Optimal size of the prediction set  Tradeoff between additional information and robustness and amount of additional noise.  Variant |S| with constraint that k 1 and k 2 are roughly equal. This prediction method is not highly sensitive to the exact number of genes used.

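Prediction by Weighted Voting
- Each gene g in S casts a weighted vote for a new sample x from the test set:
  $V = \mathrm{weight}(g) \cdot (x_g - b_g)$, with $\mathrm{weight}(g) = P(g, c)$
  where x_g is the expression of g in x and b_g is the decision boundary, halfway between the two class means: $b_g = (\mu_1(g) + \mu_2(g))/2$. A positive vote counts for class 1, a negative vote for class 2.
- Tradeoff of reliability vs. utility: prediction strength (PS)
  $PS = \dfrac{V_{win} - V_{lose}}{V_{win} + V_{lose}}$
  where V_win and V_lose are the total votes for the winning and losing classes.
  - In this paper, the PS threshold was 0.3: a sample whose PS falls below 0.3 is reported as a "no call" instead of being assigned to a class.
  - The threshold trades the error rate against the "no call" rate.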
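A minimal weighted-voting sketch under stated assumptions: training labels are 1 (class 1) and 0 (class 2), σ is the population standard deviation, and the 0.3 PS threshold is the one quoted on the slide. The function names are hypothetical, and treating a sample with no net votes as a "no call" is a choice made here.

```python
from statistics import mean, pstdev


def train_voter(expr, labels):
    """Return (weight(g), b_g): weight = P(g, c), b_g = midpoint between the class means."""
    c1 = [e for e, c in zip(expr, labels) if c == 1]
    c2 = [e for e, c in zip(expr, labels) if c == 0]
    mu1, mu2 = mean(c1), mean(c2)
    weight = (mu1 - mu2) / (pstdev(c1) + pstdev(c2))
    return weight, (mu1 + mu2) / 2


def predict(voters, x, threshold=0.3):
    """Weighted vote over the genes in S; returns (label or None for "no call", PS).

    voters: list of (weight, boundary) pairs, one per gene in S
    x:      the new sample's expression levels for the same genes, in the same order
    """
    v1 = v2 = 0.0  # total votes for class 1 and class 2
    for (weight, boundary), x_g in zip(voters, x):
        v = weight * (x_g - boundary)
        if v > 0:
            v1 += v  # positive vote -> class 1
        else:
            v2 -= v  # negative vote -> class 2 (store its magnitude)
    win, lose = max(v1, v2), min(v1, v2)
    if win + lose == 0:
        return None, 0.0  # no gene cast a nonzero vote
    ps = (win - lose) / (win + lose)
    if ps < threshold:
        return None, ps  # "no call"
    return (1 if v1 > v2 else 0), ps


voters = [train_voter([5, 6, 7, 1, 2, 3], [1, 1, 1, 0, 0, 0]),   # high in class 1
          train_voter([1, 2, 3, 5, 6, 7], [1, 1, 1, 0, 0, 0])]   # high in class 2
print(predict(voters, [6.5, 1.5]))  # (1, 1.0)
```

Because a negative weight(g) flips the sign of the vote, genes that mark class 2 vote for class 2 when x_g is above their boundary, so both kinds of marker genes feed the same two vote totals.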

Application: Classifying Patient Samples
- Training set: 38 leukemia samples (11 AML, 27 ALL)
- Test set: 34 samples (14 AML, 20 ALL)
- Task: the ALL/AML distinction

- Neighborhood analysis: about 700 genes exceeded the 1% significance level in each direction
- Arbitrarily chose to use a 50-gene predictor
  - 36 correct predictions out of 38 training samples
  - 29 correct predictions out of 34 test samples

Application: Verifying Proposed Classes
- One needs to show that the class distinctions discovered are real and biologically interesting.
- Validate the clusters by testing their predictability:
  - If the clusters reflect true structure, the distinction should be predictable in additional samples.
  - Examine prediction strengths in cross-validation.
  - Test whether the distribution of prediction strengths for the given class distinction is significantly higher than we would expect for a random class distinction.
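The validation idea can be sketched as leave-one-out cross-validation followed by a comparison with random class distinctions. Several details here are assumptions rather than the paper's procedure: every gene votes (no prediction set S is selected), each held-out sample's PS is signed so that it is negative when the vote goes against its proposed class, the distribution is summarized by its mean, and random distinctions are permutations of the proposed labels. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def fit_voters(data, labels):
    """(weight, boundary) per gene: weight = P(g, c), boundary = midpoint of the class means."""
    voters = []
    for expr in data:
        c1 = [e for e, c in zip(expr, labels) if c == 1]
        c2 = [e for e, c in zip(expr, labels) if c == 0]
        mu1, mu2 = mean(c1), mean(c2)
        spread = pstdev(c1) + pstdev(c2)
        weight = (mu1 - mu2) / spread if spread > 0 else 0.0  # constant genes do not vote
        voters.append((weight, (mu1 + mu2) / 2))
    return voters


def loo_strengths(data, labels):
    """Leave-one-out PS for each sample, negative when the vote goes against its label."""
    n = len(labels)
    strengths = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        voters = fit_voters([[row[j] for j in keep] for row in data],
                            [labels[j] for j in keep])
        v1 = v2 = 0.0
        for (weight, boundary), row in zip(voters, data):
            v = weight * (row[i] - boundary)
            if v > 0:
                v1 += v
            else:
                v2 -= v
        own, other = (v1, v2) if labels[i] == 1 else (v2, v1)
        total = own + other
        strengths.append((own - other) / total if total > 0 else 0.0)
    return strengths


def class_validity(data, labels, n_random=200, seed=0):
    """Mean cross-validated PS of the proposed classes, and the fraction of random
    class distinctions (same class sizes) that score at least as high."""
    rng = random.Random(seed)
    observed = mean(loo_strengths(data, labels))
    hits = 0
    for _ in range(n_random):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if mean(loo_strengths(data, shuffled)) >= observed:
            hits += 1
    return observed, hits / n_random
```

If a discovered clustering is real, its held-out samples should be predicted with high strength, while most random relabelings of the same samples should not; a small returned fraction is the "significantly higher" outcome described above.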

Discussion and Conclusion
- Similar methods can be used to predict any trait or characteristic reflected at the transcriptional level.
- Future work
  - Cases where no single biological pathway is responsible for all the cases in either class
  - Cases with more than two classes