Classification with Gene Expression Data

Slides:



Advertisements
Similar presentations
COMP3740 CR32: Knowledge Management and Adaptive Systems
Advertisements

Publications Reviewed Searched Medline Hand screening of abstracts & papers Original study on human cancer patients Published in English before December.
Random Forest Predrag Radenković 3237/10
Ch. Eick: More on Machine Learning & Neural Networks Different Forms of Learning: –Learning agent receives feedback with respect to its actions (e.g. using.
CHAPTER 9: Decision Trees
CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.
Decision Tree Approach in Data Mining
Classification: Definition Given a collection of records (training set ) –Each record contains a set of attributes, one of the attributes is the class.
Correlation Aware Feature Selection Annalisa Barla Cesare Furlanello Giuseppe Jurman Stefano Merler Silvano Paoli Berlin – 8/10/2005.
Lecture outline Classification Decision-tree classification.
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Decision Tree Rong Jin. Determine Milage Per Gallon.
Classification Dr Eamonn Keogh Computer Science & Engineering Department University of California - Riverside Riverside,CA Who.
Induction of Decision Trees
Predictive sub-typing of subjects Retrospective and prospective studies Exploration of clinico-genomic data Identify relevant gene expression patterns.
Three kinds of learning
Classification.
Ordinal Decision Trees Qinghua Hu Harbin Institute of Technology
Guidelines on Statistical Analysis and Reporting of DNA Microarray Studies of Clinical Outcome Richard Simon, D.Sc. Chief, Biometric Research Branch National.
Copyright © 2004 by Jinyan Li and Limsoon Wong Rule-Based Data Mining Methods for Classification Problems in Biomedical Domains Jinyan Li Limsoon Wong.
Chapter 5 Data mining : A Closer Look.
DATA MINING : CLASSIFICATION. Classification : Definition  Classification is a supervised learning.  Uses training sets which has correct answers (class.
Whole Genome Expression Analysis
Gene Expression Profiling Illustrated Using BRB-ArrayTools.
INTRODUCTION TO MACHINE LEARNING. $1,000,000 Machine Learning  Learn models from data  Three main types of learning :  Supervised learning  Unsupervised.
Midterm Review Rao Vemuri 16 Oct Posing a Machine Learning Problem Experience Table – Each row is an instance – Each column is an attribute/feature.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
The Broad Institute of MIT and Harvard Classification / Prediction.
Special topics on text mining [ Part I: text classification ] Hugo Jair Escalante, Aurelio Lopez, Manuel Montes and Luis Villaseñor.
1 Critical Review of Published Microarray Studies for Cancer Outcome and Guidelines on Statistical Analysis and Reporting Authors: A. Dupuy and R.M. Simon.
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.3: Decision Trees Rodney Nielsen Many of.
Today Ensemble Methods. Recap of the course. Classifier Fusion
Ensemble Methods: Bagging and Boosting
CISC Machine Learning for Solving Systems Problems Presented by: Ashwani Rao Dept of Computer & Information Sciences University of Delaware Learning.
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Today’s Topics Learning Decision Trees (Chapter 18) –We’ll use d-trees to introduce/motivate many general issues in ML (eg, overfitting reduction) “Forests”
Data Mining and Decision Trees 1.Data Mining and Biological Information 2.Data Mining and Machine Learning Techniques 3.Decision trees and C5 4.Applications.
1 Decision Tree Learning Original slides by Raymond J. Mooney University of Texas at Austin.
Class 23, 2001 CBCl/AI MIT Bioinformatics Applications and Feature Selection for SVMs S. Mukherjee.
Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.
COT6930 Course Project. Outline Gene Selection Sequence Alignment.
Classification using Decision Trees 1.Data Mining and Information 2.Data Mining and Machine Learning Techniques 3.Decision trees and C5 4.Applications.
CSC 8520 Spring Paula Matuszek DecisionTreeFirstDraft Paula Matuszek Spring,
1 Classification: predicts categorical class labels (discrete or nominal) classifies data (constructs a model) based on the training set and the values.
DECISION TREES Asher Moody, CS 157B. Overview  Definition  Motivation  Algorithms  ID3  Example  Entropy  Information Gain  Applications  Conclusion.
Hierarchical clustering approaches for high-throughput data Colin Dewey BMI/CS 576 Fall 2015.
Introduction to Data Mining Clustering & Classification Reference: Tan et al: Introduction to data mining. Some slides are adopted from Tan et al.
1 Machine Learning Lecture 8: Ensemble Methods Moshe Koppel Slides adapted from Raymond J. Mooney and others.
Copyright © 2004 by Jinyan Li and Limsoon Wong Rule-Based Data Mining Methods for Classification Problems in Biomedical Domains Jinyan Li Limsoon Wong.
10. Decision Trees and Markov Chains for Gene Finding.
Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.
Machine Learning with Spark MLlib
Introduction to Machine Learning
Heping Zhang, Chang-Yung Yu, Burton Singer, Momian Xiong
Results for all features Results for the reduced set of features
Chapter 6 Classification and Prediction
Alan Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani
Data Science Algorithms: The Basic Methods
Molecular Classification of Cancer
Evaluating classifiers for disease gene discovery
Claudio Lottaz and Rainer Spang
Classification and Prediction
Machine Learning Week 1.
Hierarchical clustering approaches for high-throughput data
Statistical Learning Dong Liu Dept. EEIS, USTC.
A task of induction to find patterns
Derek Hoiem CS 598, Spring 2009 Jan 27, 2009
Evaluating Classifiers for Disease Gene Discovery
Claudio Lottaz and Rainer Spang
Advisor: Dr.vahidipour Zahra salimian Shaghayegh jalali Dec 2017
Presentation transcript:

Classification with Gene Expression Data BMI/CS 576 www.biostat.wisc.edu/bmi576/ Colin Dewey cdewey@biostat.wisc.edu Fall 2008

Classification of Gene Expression Profiles: The Learning Task given: expression profiles for a set of genes or experiments/individuals/time points (whatever columns represent) a class label for each expression profile do: learn a model that is able to accurately predict the class labels for other expression profiles

Classification of Gene Expression Profiles: The Classification Task given: expression profiles a classification model do: predict the associated class for each expression profile

Breast Cancer Outcomes Prediction Nevins et al., Lancet 2003, Human Molecular Genetics 2003 microarray and clinical data from 86 lymph-node positive breast cancer patients 12,625 genes measured using Affymetrix arrays goal is to distinguish between high risk (recurrence w/in 5 years) and low risk (recurrence-free for 5 years)

Calculating “Metagenes” the features used in their model are not mRNA measurements from individual genes instead they compute “metagenes”, which consist of linear combinations of gene measurements procedure ran k-means clustering (with k=500) on original microarray data set computed first principal component of each cluster each of these principal components becomes a metagene

A Decision Tree Classifier outcome of test at internal node above low risk/high risk cases in training set that reach this node smoothed probability estimate of high risk

Decision Tree Classifiers tree-based classifiers partition the data using axis-parallel splits

Inducing Tree-Based Classifiers there are many decision-tree learning methods two most common are C4.5 (Quinlan) CART (Breiman, Friedman, Olshen, Stone) Nevins et al. use their own method all DT learning methods have the same basic algorithm structure: recursively grow a tree top-down

Generic DT Induction Pseudocode MakeSubtree(set of instances I) if stopping criteria met make a leaf node N determine class label/probabilities for N else make an internal node N select best splitting criterion for N for each outcome k of the split Ik = subset of instances that have outcome k kth child of N = MakeSubtree(Ik) return subtree rooted at N

Breast Cancer Outcomes Prediction predictive accuracy estimated by cross-validation 85-90% correct predictions (low-risk, high-risk)

Multiple Myeloma vs. MGUS Classification Hardin et al., Statistical Applications in Genetics and Mol. Bio. ‘04 standard lab classification of MM and MGUS is quite accurate… but biologically uninformative MGUS = Monoclonal gammopathy of undetermined significance MM = multiple myeloma Can this classification be done using expression profiles? learned models might enable molecular diagnosis lend insight into disease progression ~12,625 genes measured on Affymetrix chips

Hardin et al. Empirical Evaluation applied six supervised machine learning algorithms to this data set logistic regression C5.0 decision tree induction method ensembles of voters naïve Bayes nearest shrunken centroid support vector machines

Empirical Evaluation:MM vs. Normal

Empirical Evaluation: MGUS vs. Normal

Empirical Evaluation: MM vs. MGUS

Comments Gene Expression Analysis we discussed two computational tasks classification: do this when you do know the categories of interest clustering: do this when you don’t know the categories of interest class discovery is an interesting task that falls between classification and clustering identify classes of profiles that don’t seem to fit into any of the modeled categories e.g. new subtypes of cancer, new types of toxic substances we’ve discussed methods in the context of microarray data, but they can be applied to a wide variety of high-throughput biological data