Discrimination Methods As Used In Gene Array Analysis.

Discrimination Methods: Outline
- Microarray Background
- Clustering and Classifiers
- Discrimination Methods:
  - Nearest Neighbor
  - Classification Trees
  - Maximum Likelihood Discrimination
  - Fisher Linear Discrimination
- Aggregating Classifiers
- Results
- Conclusions

Microarray Background Even today, relatively little is known about the function of most genes. Biologists produce experimental data to analyze in order to assign biological functions to genes. Their tool: the microarray.

Microarray Background The process:
- DNA samples are taken from the test subjects
- Samples are dyed with fluorescent colors and placed on the microarray, an array of DNA probes built for each experiment
- The DNA and cDNA hybridize
The result:
- Spots in the array are colored in shades from red to green, according to their expression levels in the particular experiment

Microarray Background Microarray data is translated into an n x p table, where p is the number of genes in the experiment and n is the number of samples. (example: a small table with rows Gene 1 to Gene 4 and columns Sample 1, Sample 2)
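As a concrete illustration (a minimal sketch with made-up numbers, not real microarray data), the n x p layout can be held as a NumPy array with one row per sample and one column per gene, plus a vector of class labels:

```python
import numpy as np

# Hypothetical expression matrix: n = 4 samples (rows) by p = 3 genes (columns).
# Real data sets have thousands of genes.
X = np.array([[2.1, 0.3, 1.7],
              [1.9, 0.4, 1.5],
              [0.2, 2.8, 0.1],
              [0.3, 3.1, 0.2]])
y = np.array([0, 0, 1, 1])   # class label (e.g. tumor type) for each of the 4 samples
print(X.shape)               # (4, 3) -> n x p
```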

Clustering What to do with all this data? Find clusters in the n x p data. This is easy in low dimensions, but much harder in our high-dimensional space. (figure: example of clusters in 3D)

Clustering Why clustering? To find patterns in our experiments, connect specific genes with specific results, and map genes.

Classifiers The tool: classifiers. A classifier is a function that splits the space into K disjoint sets. Two approaches:
- Supervised learning (discriminant analysis): K is known; a learning set is used to classify new samples; used to classify malignancies into known classes
- Unsupervised learning (cluster analysis): K is unknown; the data "organizes itself"; used for identification of new tumor classes
Feature selection is another use for classifiers, e.g. for identification of marker genes.

Classifiers We will discuss only supervised learning. Discrimination methods: Fisher linear discrimination, maximum likelihood discrimination, K nearest neighbors, classification trees, and aggregating classifiers.

Nearest Neighbor We use a predefined learning set that is already classified. New samples are classified into the same classes as the learning set. Each sample is classified by its K nearest neighbors, according to a distance metric (usually Euclidean distance). The classification is made by a majority vote.

Nearest Neighbor NN, example

Nearest Neighbor Cross-validation: a method for finding the best K to use. Test each value in {1,...,T} as K by running the algorithm T times on held-out data, and choose the K that gives the best results.
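A minimal sketch of this selection procedure using scikit-learn (X and y are assumed to be the n x p expression matrix and class labels from the earlier sketch; the candidate range of K values and the number of folds are illustrative):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def choose_k(X, y, candidate_ks=range(1, 11), folds=5):
    """Return the K whose K-nearest-neighbor classifier scores best under cross-validation."""
    scores = {}
    for k in candidate_ks:
        clf = KNeighborsClassifier(n_neighbors=k)          # Euclidean distance by default
        scores[k] = cross_val_score(clf, X, y, cv=folds).mean()
    return max(scores, key=scores.get)

# best_k = choose_k(X, y)   # X: n x p expression matrix, y: class labels
```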

Classification Trees Partitioning of the space into K classes, intuitively presented as a tree. Two aspects:
- Constructing the tree from the training set
- Using the tree to classify new samples
Two building approaches:
- Top-down
- Bottom-up

Classification Trees Bottom-up approach:
- Start with n clusters (one per sample)
- In each iteration, merge the two closest clusters, using a distance measure on clusters
- Stop when a certain criterion is met
Measures on clusters:
- minimum pairwise distance (single linkage)
- average pairwise distance (average linkage)
- maximum pairwise distance (complete linkage)
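The three cluster-distance measures correspond to the single, average, and complete linkage options of standard hierarchical-clustering routines. A small sketch using SciPy (the input X is assumed to be an n x p data matrix like the one above; the stopping rule used here, "stop at a fixed number of clusters", is one illustrative choice):

```python
from scipy.cluster.hierarchy import linkage, fcluster

def bottom_up_clusters(X, n_clusters=2, measure="average"):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    measure: 'single'   -> minimum pairwise distance
             'average'  -> average pairwise distance
             'complete' -> maximum pairwise distance
    """
    Z = linkage(X, method=measure)                 # full merge history (the tree)
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# labels = bottom_up_clusters(X, n_clusters=2)
```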

Classification Trees Bottom-up approach, example (diagram: dendrogram built by successively merging items c1-c6)

Classification Trees Top-down approach. In each iteration:
- Choose one attribute
- Divide the sample space according to this attribute
- Use each of the sub-groups just created as the sample space for the next iteration

Classification Trees Top-down approach, example (diagram: tree recursively splitting items c1-c6)

Classification Trees Three main aspects of tree construction:
- split selection rule: which attribute should we choose for splitting in each iteration?
- split stopping rule: when should we stop splitting?
- class assignment rule: which class will each leaf represent?
Many variants:
- CART (classification and regression trees)
- ID3 (iterative dichotomizer)
- C4.5 (Quinlan)

Classification Trees - CART
Structure: binary tree.
Splitting criterion: Gini index. For a node t and classes 1,...,K, Gini(t) = 1 - sum_j P(j|t)^2, where P(j|t) is the relative frequency of class j at node t. Choose the split that minimizes the (weighted) Gini index of the resulting child nodes.
Stopping criterion: keep the tree relatively balanced.
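A small sketch of the Gini computation and of scoring a candidate binary split (illustrative helper functions, not the full CART algorithm):

```python
import numpy as np

def gini(labels):
    """Gini index of a node: 1 - sum_j P(j|t)^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_score(labels_left, labels_right):
    """Weighted Gini index of a binary split; CART-style selection minimizes this."""
    n_l, n_r = len(labels_left), len(labels_right)
    n = n_l + n_r
    return (n_l / n) * gini(labels_left) + (n_r / n) * gini(labels_right)

# A pure split scores 0, a mixed one scores higher:
# split_score([0, 0, 0], [1, 1])     -> 0.0
# split_score([0, 1, 0], [1, 0, 1])  -> ~0.444
```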

Classification Trees Classifying new samples, example (diagram: a tree splitting on the attributes "left color" and "right color", with leaves c1-c6 labeled by colors such as blue, red, green, yellow, orange)

Classification Trees Overfitting: the bias-variance trade-off. The deeper the tree, the bigger its variance; the shallower the tree, the bigger its bias. Trees of balanced depth will give the best results.

Maximum Likelihood A probabilistic approach. Suppose a training set is given and we want to classify a sample x. Compute the probability of a class 'a' given x, denoted P(a|x), for each of the K classes, and assign x to the class with the highest resulting probability: class(x) = argmax_a P(a|x).

Maximum Likelihood Obstacle: P(a|x) is unknown. Solution: Bayes' rule, P(a|x) = P(x|a) P(a) / P(x). Usage:
- P(a) is fixed (the relative frequency of class a in the training set)
- P(x) is class-independent, so also fixed
- P(x|a) is what we still need to compute

Maximum Likelihood Remember that x is a vector of p gene values, x = (x1,...,xp). If the genes' class-conditional densities were independent, then P(x|a) = P(x1|a) * ... * P(xp|a), a product of per-gene terms. Independence hypothesis:
- makes computation feasible
- yields optimal classifiers when satisfied
- but is seldom satisfied in practice, as attributes (variables) are often correlated
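A minimal sketch of this independence-based (naive Bayes) classifier with Gaussian per-gene densities estimated from a training set. The Gaussian form and the small variance floor are assumptions of the sketch, not prescriptions from the slides (X, y as in the earlier sketch):

```python
import numpy as np

def fit_diag_gaussian(X, y):
    """Estimate per-class priors and per-gene means/variances (independence assumption)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),          # prior P(a)
                     Xc.mean(axis=0),           # per-gene means
                     Xc.var(axis=0) + 1e-9)     # per-gene variances (small floor for stability)
    return params

def predict(x, params):
    """Assign x to the class maximizing log P(a) + sum_j log P(x_j | a)."""
    def log_post(prior, mu, var):
        return np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return max(params, key=lambda c: log_post(*params[c]))

# params = fit_diag_gaussian(X, y); predict(X[0], params)
```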

Maximum Likelihood If the conditional densities of the classes are fully known, a learning set is not needed. If only the form of the conditional densities is known, we still have to estimate their parameters. Additional assumptions lead to some familiar special cases:
- multivariate normal class densities
- normal densities with diagonal covariance matrices
- normal densities with the same diagonal covariance matrix

Fisher Linear Discrimination Reduce the problem from multi-dimensional to one-dimensional:
- Let 'v' be a vector in our space
- Project the data onto the vector 'v'
- Estimate the scatter of the data as projected onto 'v'
- Use this 'v' to create a classifier

Fisher Linear Discrimination Suppose we are in a 2D space. Which of the three vectors shown is the optimal 'v'? (figure: three candidate projection directions)

Fisher Linear Discrimination The optimal vector maximizes the ratio of the between-group sum of squares to the within-group sum of squares of the projected data, J(v) = (v'Bv) / (v'Wv), where B and W denote the between-group and within-group scatter matrices.

Fisher Linear Discrimination Suppose the case of two classes:
- Means of the two classes' samples: m1 and m2
- Means of the projected samples: m~i = v'mi
- Scatter of the projected samples: s~i^2 = sum over x in class i of (v'x - m~i)^2
- Criterion function: J(v) = (m~1 - m~2)^2 / (s~1^2 + s~2^2)

Fisher Linear Discrimination The criterion function should be maximized. Writing J as a function of the vector 'v': J(v) = (v'Bv) / (v'Wv), where B = (m1 - m2)(m1 - m2)' is the between-class scatter matrix and W = S1 + S2 is the within-class scatter matrix.

Fisher Linear Discrimination The matrix version of the criterion works the same way for more than two classes. J(v) is maximized when v is a leading eigenvector of W^-1 B; in the two-class case this reduces to v proportional to W^-1 (m1 - m2).

Fisher Linear Discrimination Classification of a new observation 'x': let the class of 'x' be the class whose mean vector is closest to 'x' in terms of the discriminant variables. In other words, the class whose mean vector's projection onto 'v' is closest to the projection of 'x' onto 'v'.
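A minimal two-class sketch of these steps in NumPy (class labels are assumed to be 0 and 1; the pseudo-inverse is used because with few samples and many genes W is typically singular, which is why gene selection, below, matters):

```python
import numpy as np

def fisher_fit(X, y):
    """Two-class Fisher discriminant: v proportional to W^-1 (m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)   # within-class scatter
    v = np.linalg.pinv(W) @ (m1 - m0)                        # pseudo-inverse in case W is singular
    return v, m0, m1

def fisher_predict(x, v, m0, m1):
    """Assign x to the class whose projected mean is closest to the projection of x on v."""
    proj = x @ v
    return 0 if abs(proj - m0 @ v) < abs(proj - m1 @ v) else 1

# v, m0, m1 = fisher_fit(X, y); fisher_predict(X[0], v, m0, m1)
```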

Fisher Linear Discrimination Gene selection: most of the genes in the experiment will not be informative; reducing the number of genes reduces the error rate and makes computation easier. For example, select by the ratio of each gene's between-groups to within-groups sum of squares: for gene j, BSS(j)/WSS(j) = [sum over samples i and classes k of I(y_i = k)(xbar_kj - xbar_.j)^2] / [sum over samples i and classes k of I(y_i = k)(x_ij - xbar_kj)^2], where xbar_.j is the overall mean of gene j and xbar_kj its mean within class k; select the genes with the largest ratios.

Fisher Linear Discrimination Error reduction: the small number of samples makes estimation error more significant. Noise affects measurements of small values, so the WSS of a gene can come out too small in some cases; this makes the selection criterion of such a gene larger than its real importance to the discrimination. Solution: add a small minimal value to the WSS.
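A sketch of the BSS/WSS ranking with a small stabilizing constant added to the denominator (the value of the constant and the number of genes kept are illustrative):

```python
import numpy as np

def bss_wss_ratio(X, y, eps=1e-8):
    """Per-gene ratio of between-group to within-group sum of squares, with eps added to WSS."""
    overall = X.mean(axis=0)
    bss = np.zeros(X.shape[1])
    wss = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        bss += len(Xc) * (mc - overall) ** 2
        wss += ((Xc - mc) ** 2).sum(axis=0)
    return bss / (wss + eps)

def select_genes(X, y, n_genes=50):
    """Indices of the n_genes genes with the largest BSS/WSS ratio."""
    ratio = bss_wss_ratio(X, y)
    return np.argsort(ratio)[::-1][:n_genes]

# keep = select_genes(X, y, n_genes=2); X_reduced = X[:, keep]
```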

Aggregating Classifiers A concept for enhancing the performance of classification procedures. A classification procedure uses some prior knowledge (i.e., a training set) to obtain its classifier parameters. Let's aggregate classifiers built from multiple training sets into a stronger classifier.

Aggregating Classifiers Bagging (bootstrap aggregating) algorithm:
- Generate B training sets by sampling with replacement (bootstrapping) from the original training set
- Generate B classifiers C(1),...,C(B), one from each bootstrap set
- Let x be a new sample to be classified; the class of x is the majority class over the B classifiers
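A minimal sketch of bagging (the choice of decision trees as the base classifier and B = 25 are illustrative assumptions, not specified by the slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, B=25, random_state=0):
    """Fit B classifiers, each on a bootstrap sample of the training set."""
    rng = np.random.default_rng(random_state)
    classifiers = []
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))   # sample n indices with replacement
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(x, classifiers):
    """Majority vote of the B classifiers on a single new sample x."""
    votes = [clf.predict(x.reshape(1, -1))[0] for clf in classifiers]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]

# clfs = bagging_fit(X, y); bagging_predict(X[0], clfs)
```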

Aggregating Classifiers Boosting, example (diagram: the training set yields sets T1,...,Tb; Classifier 1,...,Classifier b are trained on them and combined into an aggregated classifier)

Aggregating Classifiers Weighted bagging algorithm:
- Generate B training sets by sampling with replacement from the original training set
- Save the data left out of each bootstrap sample as a test set T(1),...,T(B)
- Generate B classifiers C(1),...,C(B)
- Give each classifier C(i) a weight w(i) according to its accuracy on the test set T(i)
- Let x be a new sample to be classified; the class of x is the weighted majority class over the B classifiers C(1),...,C(B), with respect to the weights w(1),...,w(B)
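Extending the previous sketch, a weighted vote in which each classifier's weight is its accuracy on the held-out (out-of-bag) samples of its bootstrap set; the out-of-bag reading of "the test set T(i)" is an assumption of this sketch:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def weighted_bagging_fit(X, y, B=25, random_state=0):
    """Fit B classifiers on bootstrap samples; weight each by its out-of-bag accuracy."""
    rng = np.random.default_rng(random_state)
    classifiers, weights = [], []
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))
        oob = np.setdiff1d(np.arange(len(X)), idx)      # samples not drawn this round
        clf = DecisionTreeClassifier().fit(X[idx], y[idx])
        classifiers.append(clf)
        weights.append(clf.score(X[oob], y[oob]) if len(oob) else 1.0)
    return classifiers, np.array(weights)

def weighted_predict(x, classifiers, weights):
    """Weighted majority vote on a single new sample x."""
    votes = np.array([clf.predict(x.reshape(1, -1))[0] for clf in classifiers])
    classes = np.unique(votes)
    totals = [weights[votes == c].sum() for c in classes]
    return classes[np.argmax(totals)]
```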

Aggregating Classifiers Improved Boosting, example (diagram: as before, the training set yields T1,...,Tb and Classifier 1,...,Classifier b, but each classifier's vote passes through a weight function before aggregation)

Imputation of Missing Data Most classifiers need a value for every spot in the array in order to work properly. There are many methods of missing-data imputation. For example, nearest neighbor: each missing value gets the majority value of its K nearest neighbors.
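A sketch using scikit-learn's KNNImputer, shown as one readily available nearest-neighbor imputation (note: it fills a missing value with the average of the K nearest neighbors' values rather than a majority vote, so it is a close relative of the slide's method, not an exact match):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical expression matrix with missing spots encoded as NaN.
X_missing = np.array([[2.1, np.nan, 1.7],
                      [1.9, 0.4,    1.5],
                      [0.2, 2.8,    np.nan],
                      [0.3, 3.1,    0.2]])

imputer = KNNImputer(n_neighbors=2)        # neighbors found by distance on the observed values
X_filled = imputer.fit_transform(X_missing)
print(X_filled)
```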

Results Dudoit, Fridlyand and Speed (2002). Methods tested:
- Fisher linear discrimination
- Nearest neighbor
- CART classification tree
- Aggregating classifiers
Data sets:
- Leukemia - Golub et al. (1999): 72 samples, 3,571 genes, 3 classes (B-cell ALL, T-cell ALL, AML)
- Lymphoma - Alizadeh et al. (2000): 81 samples, 4,682 genes, 3 classes (B-CLL, FL, DLBCL)
- NCI 60 - Ross et al. (2000): 64 samples, 5,244 genes, 8 classes

Results - Leukemia data set

Results - Lymphoma data set

Results - NCI 60 data set

Conclusions "Diagonal" LDA: ignoring correlation between genes improved error rates. Unlike classification trees and nearest neighbors, LDA is unable to take gene interactions into account. Although nearest neighbor is a simple and intuitive classifier, its main limitation is that it gives very little insight into the mechanisms underlying the class distinctions.

Conclusions Classification trees are capable of handling and revealing interactions between variables. Variable selection: a crude criterion such as BSS/WSS may not identify the genes that discriminate between all the classes, and may not reveal interactions between genes. With larger training sets, we expect improvements in the performance of aggregated classifiers.