
Classification (Supervised Clustering) Naomi Altman Nov '06

Objective
Starting from a sample from known groups:
1) Select a set of genes that identify the groups.
2) Compute a function of the expression values that can be used to classify a new sample.
e.g. Normal and cancer prostate tissues from 24 patients:
1) a) Find the set of genes that may be involved in the disease process (differential expression analysis).
   b) Find a set of genes that mark the disease (possibly not all the genes involved).
2) Take a sample from a new patient - does this person have prostate cancer?

The Main Picture for Linear Discriminant Analysis
[Figure: two groups separated by a hyperplane, with the separating hyperplane and the linear discrimination direction marked.]
To classify a new point, see which side of the hyperplane it lies on.

The Main Picture for Support Vector Machines
[Figure: two groups separated by a hyperplane, with the separating hyperplane marked.]
To classify a new point, see which side of the hyperplane it lies on.

The Main Picture for Quadratic Discriminant Analysis
To classify a new point, see which side of the curved boundary (hypersurface) it lies on.

The Main Picture for Recursive Partitioning
To classify a new point, find the partition of the space it falls into and assign it that partition's class.

Linear and Quadratic Discriminant Analysis, Logistic Regression
Each sample belongs to group A or B. Linear and quadratic discriminant analysis are essentially regressions on a 0/1 indicator variable.
Suppose we have samples of sizes m from A and n from B. Sample ts (t = group, s = sample within group) has gene expression values Y_1ts, ..., Y_Gts.
For each group we can compute the mean expression value of each gene, Ȳ_t, the variance of each gene, s_it², and the covariances between genes, s_ijt. We can also compute the pooled variance and covariance of each gene, which is essentially the average over the 2 groups.
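
A minimal sketch of these summary statistics, assuming the expression values sit in a genes-by-samples NumPy array with one group label per sample (the array and variable names are illustrative, not from the slides):

import numpy as np

# Toy data: G genes (rows) x (m + n) samples (columns), with a group label per sample.
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 10))                 # 5 genes, 10 samples
labels = np.array(["A"] * 6 + ["B"] * 4)

means = {}
ss = np.zeros((Y.shape[0], Y.shape[0]))
dof = 0
for t in ("A", "B"):
    Yt = Y[:, labels == t]                   # genes x samples in group t
    means[t] = Yt.mean(axis=1)               # mean expression of each gene, Ybar_t
    ss += (Yt.shape[1] - 1) * np.cov(Yt)     # gene-by-gene sums of squares and cross-products
    dof += Yt.shape[1] - 1

S = ss / dof                                 # pooled variance-covariance matrix S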

Linear Discriminant Analysis
The linear discriminant function is L(Y) = (Ȳ_A − Ȳ_B)' S^(-1) Y, where S is the pooled variance matrix.
In the simplest case, we classify each sample depending on whether L(Y) is above (A) or below (B) the midpoint of the line, which is (Ȳ_A − Ȳ_B)' S^(-1) (Ȳ_A + Ȳ_B)/2.
If the 2 conditions are not equally likely, we may wish to weight so that we classify new samples proportionally to the expected percentages.
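
For two groups the rule can be written directly; a sketch, assuming the group mean vectors and the pooled matrix S have been computed as above (function and variable names are made up for illustration):

import numpy as np

def classify_two_groups(y_new, mean_A, mean_B, S):
    """Fisher's linear discriminant rule for two groups."""
    w = np.linalg.solve(S, mean_A - mean_B)        # discriminant direction S^(-1)(Ybar_A - Ybar_B)
    midpoint = w @ (mean_A + mean_B) / 2.0         # midpoint of the projected group means
    return "A" if w @ y_new > midpoint else "B"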

Linear Discriminant Analysis
This is extended to p groups by considering the discriminant score, which comes from another SVD and is similar to multivariate ANOVA.
1. Consider the covariance matrix of the sample means, weighted by the sample sizes (the between-group variances and covariances). Assemble these into the between-group variance matrix B.
2. Consider the pooled covariance matrix S (which in this context is often called W, for the within-group variance matrix).
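
A small sketch of assembling the between-group matrix from the group means, weighted by sample size (the normalization varies between texts; this is one common choice, and the names are illustrative):

import numpy as np

def between_group_matrix(group_means, group_sizes):
    """Weighted covariance of the group mean vectors about the overall mean."""
    M = np.column_stack(group_means)                # genes x groups
    n = np.asarray(group_sizes, dtype=float)
    overall = M @ n / n.sum()                       # overall (weighted) mean vector
    C = M - overall[:, None]                        # centered group means
    return (C * n) @ C.T / n.sum()                  # between-group variance matrix B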

Linear Discriminant Analysis
Now consider the SVD of S^(-1/2) B S^(-1/2). (It is symmetric, so the left and right eigenvectors are the same.)
The first eigenvector is the direction of greatest separation of the means, in terms of the axes of the ellipses defining the groups. The 2nd eigenvector is the direction of 2nd greatest separation that is orthogonal to the first, and so on.
The rank of B is p − 1, so there are only p − 1 non-zero eigenvalues.
Each sample is assigned to the group with the nearest mean in the eigenvector coordinates. This is equivalent to looking at the combinations of the pairwise discriminant functions and mapping every sample to the group with the nearest mean.
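
A sketch of the decomposition step, assuming B and S (positive definite) are already in hand; the symbols follow the slide, but the code itself is only illustrative:

import numpy as np

def discriminant_directions(B, S):
    """Eigen-decomposition of S^(-1/2) B S^(-1/2), mapped back to the original scale."""
    vals, vecs = np.linalg.eigh(S)                       # symmetric inverse square root of S
    S_inv_half = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    M = S_inv_half @ B @ S_inv_half                      # symmetric, so SVD = eigen-decomposition
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1]                      # directions of greatest separation first
    # at most p - 1 of these directions have non-zero eigenvalues
    return S_inv_half @ evecs[:, order], evals[order]

A new sample can then be projected onto these directions and assigned to the group with the nearest projected mean, as the slide describes.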

[Figures: the SVD/LDA projection and the resulting LDA classification regions.]
As in the 2-group case, you can weight the discriminant scores by the prior probability of group membership.

Quadratic Discriminant Analysis
Quadratic discriminant analysis is very similar to linear discriminant analysis, except that every group is allowed to have its own variance matrix, allowing the ellipses to have different orientations.

Logistic Regression
Let π_t be the probability of membership in group t. Use maximum likelihood to fit
log(π_t / (1 − π_t)) = β_0 + Σ_i β_i Y_its.
Classify a sample into group t if the predicted log(π_t / (1 − π_t)) is the maximum over all groups. Again, we can weight by prior probability.
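
In practice the fit is usually done by library code; a sketch using scikit-learn's logistic regression on a samples-by-genes matrix with simulated placeholder data:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))                 # 30 samples x 8 genes
y = rng.integers(0, 3, size=30)              # 3 groups

model = LogisticRegression(max_iter=1000).fit(X, y)   # maximum-likelihood fit
new_sample = rng.normal(size=(1, 8))
print(model.predict(new_sample))             # group with the largest predicted probability
print(model.predict_proba(new_sample))       # estimated membership probabilities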

Recursive Partitioning
[Figure: a classification tree, apparently for the iris data. Root split PL < 2.45 gives setosa (50 0 0); the remaining samples split on PW < 1.75 into versicolor (0 49 5) and virginica (0 1 45).]
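
A sketch of the same kind of tree using scikit-learn's recursive partitioning on the iris data; with these settings the first two splits typically match the ones shown on the slide:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(tree, feature_names=list(iris.feature_names)))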

Assessing Accuracy
Count the number of misclassifications of the training sample (optimistic).
Cross-validation: hold out a fraction of the data (the test data). "Train" using the remainder of the sample, with the same rule used for the complete data. Count the number of misclassifications of the test data. Repeat.
Holding out about 1/3 of the data as test data appears to work best.
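
A sketch of the repeated hold-out scheme, using roughly one third of the samples as test data on each split (placeholder data; LDA stands in for whatever rule is being assessed):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import ShuffleSplit

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 10))                # 60 samples x 10 genes
y = rng.integers(0, 2, size=60)              # 2 groups

splitter = ShuffleSplit(n_splits=20, test_size=1/3, random_state=0)
errors = []
for train_idx, test_idx in splitter.split(X):
    rule = LinearDiscriminantAnalysis().fit(X[train_idx], y[train_idx])
    errors.append(np.mean(rule.predict(X[test_idx]) != y[test_idx]))
print(np.mean(errors))                       # estimated misclassification rate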

But...
a) If the number of genes exceeds the number of samples, we always "overfit" - e.g. with logistic regression we can almost always achieve perfect classification.
b) S is a genes-by-genes matrix estimated from fewer samples than genes, so its rank is less than its dimension and it is not invertible (LDA); the same holds for the within-treatment variance matrices (QDA).
c) Most of the methods use all of the genes.
i.e. With microarray data, we will need to select a smaller set of genes to work with. For medical diagnostics we often want a very small set of markers.

Reducing the Number of Genes
1. With n samples, use the n − k most significantly differentially expressed genes (see the sketch below).
2. Cluster the genes and take the most significantly differentially expressed gene in each cluster.
3. Add variables to your discriminant function stepwise.
4. PAM - shrink the group centers toward the overall center, and then apply a robust QDA with moderated variance estimates (like SAM). The method ends up with the within-group centroid equal to the overall centroid for most genes, so the differences among groups rely only on the remaining genes, which are the only genes used in the QDA.
Problem (all methods): replicability is often lost when studies are repeated. e.g. we can tell the difference between ALL and AML in all studies, but different discriminant functions are required, maybe different genes.
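
A sketch of the simplest reduction (option 1 above): rank genes by a two-sample t-test and keep only the top few; the cutoff of 25 genes is arbitrary and purely illustrative:

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
Y = rng.normal(size=(1000, 20))              # 1000 genes x 20 samples
labels = np.array(["A"] * 10 + ["B"] * 10)

t, p = ttest_ind(Y[:, labels == "A"], Y[:, labels == "B"], axis=1)
top = np.argsort(p)[:25]                     # indices of the 25 smallest p-values
Y_reduced = Y[top, :]                        # pass only these genes to the classifier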