Clustering Petter Mostad

Clustering vs. class prediction

Class prediction:
- A learning set of objects with known classes
- Goal: put new objects into existing classes
- Also called: supervised learning, or classification

Clustering:
- No learning set, no given classes
- Goal: discover the “best” classes or groupings
- Also called: unsupervised learning, or class discovery

Overview

- General clustering theory: steps, methods, algorithms, issues...
- Clustering microarray data: recommendations for this kind of data
- Programs for clustering
- Some other visualization techniques

Issues in clustering

- Used to explore and visualize data, with few preconceptions
- Many subjective choices must be made, so a clustering output tends to be subjective
- It is difficult to get truly statistically “significant” conclusions
- Algorithms will always produce clusters, whether any exist in the data or not

Steps in clustering

1. Feature selection and extraction
2. Defining and computing similarities
3. Clustering or grouping objects
4. Assessing, presenting, and using the result

1. Feature selection and extraction

- Deciding which measurements matter for similarity
- Data reduction
- Filtering away objects
- Normalization of measurements

The data matrix

- Every row contains the measurements for one object
- Similarities are computed between all pairs of rows
- If the measurements are all of the same type, one can instead cluster the measurements (the columns)!

[Figure: data matrix with objects as rows and measurements as columns]
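As a minimal illustration of computing similarities between all pairs of rows, here is a sketch in Python; numpy and scipy are my choices here, not tools mentioned in the slides:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Toy data matrix: 4 objects (rows) x 3 measurements (columns).
X = np.array([[1.0, 2.0, 0.5],
              [1.1, 1.9, 0.4],
              [8.0, 0.1, 3.0],
              [7.9, 0.2, 3.1]])

# Distances between all pairs of rows, as a full 4 x 4 matrix.
D = squareform(pdist(X, metric="euclidean"))

# To cluster the measurements instead (if they are of the same type),
# transpose first: pdist(X.T)
```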

2. Defining and computing similarities

Similarity measures for continuous data vectors:
- Euclidean distance: d(x, y) = √(Σ_i (x_i − y_i)²)
- Minkowski distance: d(x, y) = (Σ_i |x_i − y_i|^p)^(1/p), which for p = 1 gives the Manhattan metric
- Mahalanobis distance: d(x, y) = √((x − y)ᵀ S⁻¹ (x − y)), where S is a covariance matrix
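A sketch of the three distance measures using scipy (the toy vectors and the sample used to estimate S are made up for illustration):

```python
import numpy as np
from scipy.spatial.distance import euclidean, minkowski, cityblock, mahalanobis

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 4.0])

d_euc = euclidean(x, y)        # sqrt of the sum of squared differences
d_man = cityblock(x, y)        # Manhattan metric = Minkowski with p = 1
d_min = minkowski(x, y, p=3)   # general Minkowski distance

# Mahalanobis needs the inverse of a covariance matrix S; here S is
# estimated from an arbitrary sample, purely for illustration.
sample = np.random.default_rng(0).normal(size=(50, 3))
S_inv = np.linalg.inv(np.cov(sample, rowvar=False))
d_mah = mahalanobis(x, y, S_inv)
```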

Centered and non-centered (absolute) Pearson correlation:
- centered: r(x, y) = Σ_i (x_i − x̄)(y_i − ȳ) / ( √(Σ_i (x_i − x̄)²) √(Σ_i (y_i − ȳ)²) ), where x̄ and ȳ are the means of x and y
- non-centered: r(x, y) = Σ_i x_i y_i / ( √(Σ_i x_i²) √(Σ_i y_i²) )
- the absolute value |r| can be used when anti-correlated profiles should also count as similar

Spearman rank correlation:
- Compute the ranking of the numbers in each vector
- Find the correlation between the rank vectors
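The same quantities in code, as a sketch with scipy.stats plus a hand-rolled non-centered version (variable names are my own):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.array([0.5, 1.2, 3.1, 2.0, 4.4])
y = np.array([0.4, 1.0, 2.8, 2.5, 4.0])

r_centered, _ = pearsonr(x, y)   # standard (centered) Pearson correlation
rho, _ = spearmanr(x, y)         # Pearson correlation of the rank vectors

# Non-centered version: the same formula, but without subtracting the means.
r_noncentered = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
```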

Geometrical view of clustering

- If the measurements are coordinates, objects become points in some space
- If the similarity measure is Euclidean distance, the goal is to group nearby points
- Note: when we have only 2 or 3 measurements per object, visual inspection can do better than most algorithms

Similarity measures for discrete data

- To compare two binary vectors, count the numbers a, b, c, d of 1-1, 1-0, 0-1, and 0-0 positions, respectively
- Different similarity measures can be constructed from these counts, for example the simple matching coefficient (a + d)/(a + b + c + d) or the Jaccard coefficient a/(a + b + c)
- Similarity of, for example, trees or other objects can also be defined in reasonable ways
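A sketch of the counts and the two coefficients named above (the vectors are made up):

```python
import numpy as np

u = np.array([1, 1, 0, 1, 0, 0, 1])
v = np.array([1, 0, 0, 1, 1, 0, 1])

a = np.sum((u == 1) & (v == 1))   # 1-1 positions
b = np.sum((u == 1) & (v == 0))   # 1-0 positions
c = np.sum((u == 0) & (v == 1))   # 0-1 positions
d = np.sum((u == 0) & (v == 0))   # 0-0 positions

simple_matching = (a + d) / (a + b + c + d)
jaccard = a / (a + b + c)         # ignores the joint absences d
```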

Similarities using contexts

- Mutual Neighbour Distance: MND(x, y) = NN(x, y) + NN(y, x), where NN(x, y) is the neighbour number of x with respect to y
- This is not a metric, but similarities do not need to be based on metrics
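A sketch of the mutual neighbour distance; the helper `neighbour_number` is my own reconstruction of the "neighbour number" in the slide, so treat the details with caution:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def neighbour_number(D, i, j):
    """Rank of point i among the neighbours of point j (1 = nearest)."""
    order = np.argsort(D[j])      # points sorted by distance from j
    order = order[order != j]     # a point is not its own neighbour
    return int(np.where(order == i)[0][0]) + 1

X = np.random.default_rng(1).normal(size=(6, 2))
D = squareform(pdist(X))

i, j = 0, 3
mnd = neighbour_number(D, i, j) + neighbour_number(D, j, i)
```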

3. Clustering or grouping

- Hierarchical clusterings
  - Divisive: starts with one big cluster and subdivides one cluster in each step
  - Agglomerative: starts with each object in a separate cluster; in each step, joins the two closest clusters
- Partitional clusterings
- Probabilistic or fuzzy clusterings

Hierarchical clustering

Agglomerative clustering depends on the type of linkage, i.e., how the distance between the merged cluster (UV) and an old cluster W is computed:
- d(UV, W) = min(d(U, W), d(V, W)) (single linkage)
- d(UV, W) = max(d(U, W), d(V, W)) (complete linkage)
- d(UV, W) = average over all distances between objects in (UV) and objects in W (average linkage, or UPGMA: Unweighted Pair Group Method with Arithmetic mean)

The output is a dendrogram.

A simplified version of average linkage (“average group linkage”) is often implemented; it may lead to inverted dendrograms!
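A sketch of agglomerative clustering with the three linkages using scipy (scipy is an assumption of mine; it is not among the programs listed later):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

X = np.random.default_rng(0).normal(size=(10, 4))
d = pdist(X)   # condensed matrix of pairwise distances

# Each method computes d(UV, W) as described above.
Z_single = linkage(d, method="single")
Z_complete = linkage(d, method="complete")
Z_average = linkage(d, method="average")   # UPGMA

dendrogram(Z_average)   # heights of horizontal lines = merge distances
plt.show()
```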

Dendrograms, visualizations

- The data matrix is often visualized using three colors, representing positive, negative, and zero values
- Hierarchical clustering results are often represented with a dendrogram; the similarity at which two clusters merge should correspond to the height of the corresponding horizontal line in the dendrogram!
- To display the dendrogram, the objects (rows or columns) need to be sorted; this can be done in two ways each time two clusters are merged

Ward’s hierarchical clustering

- Agglomerative
- Goal: minimize the “Error Sum of Squares” (ESS) at every step
- ESS = the sum, over all clusters, of the sum of squared distances from the objects to the cluster centroid
- When joining two clusters, find the pair whose merge results in the smallest increase in ESS
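A sketch of the ESS criterion and of how Ward compares candidate merges (the toy clusters and the helper function are my own):

```python
import numpy as np

def ess(clusters):
    """Sum over clusters of squared distances to the cluster centroid."""
    return sum(((c - c.mean(axis=0)) ** 2).sum() for c in clusters)

a = np.array([[0.0, 0.0], [0.2, 0.1]])
b = np.array([[0.1, 0.3]])
c = np.array([[5.0, 5.0]])

base = ess([a, b, c])
increase_ab = ess([np.vstack([a, b]), c]) - base   # small: a and b are close
increase_ac = ess([np.vstack([a, c]), b]) - base   # large: a and c are far apart
# Ward joins the pair with the smallest increase, here (a, b).
```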

Partitional clusterings

- The number of desired clusters is fixed at the start
- K-means clustering:
  - Partition into k initial clusters
  - Iteratively reassign points to the group with the closest centroid, and recompute the centroids
  - Repeat until stable
- The result may depend on the initial clusters
- May include a procedure for joining or splitting clusters according to size
- The choice of the number of clusters may not be obvious
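A bare-bones k-means loop as a sketch in numpy (all names are illustrative, and it assumes no cluster ever becomes empty):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initial clusters: k points drawn from the data serve as centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Reassign each point to the group with the closest centroid.
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        # Recompute centroids (assumes every cluster keeps at least one point).
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):   # stability reached
            break
        centroids = new
    return labels, centroids

labels, centroids = kmeans(np.random.default_rng(1).normal(size=(30, 2)), k=3)
```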

Probabilistic or fuzzy clustering

- The output is, for each object and each cluster, a probability or weight that the object belongs to the cluster
- Example: the observations are modelled as draws from a number of probability densities (often multivariate normal); the parameters are then estimated by maximum likelihood (for example using the EM algorithm)
- Example: a “fuzzy” version of k-means, where the weights for the objects are changed iteratively
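The mixture-of-normals example, sketched with scikit-learn (an assumption of mine); GaussianMixture fits the parameters with EM and returns per-object cluster probabilities:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),    # cluster around (0, 0)
               rng.normal(4, 1, size=(50, 2))])   # cluster around (4, 4)

gm = GaussianMixture(n_components=2, random_state=0).fit(X)  # EM under the hood
weights = gm.predict_proba(X)   # one probability per object and cluster
```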

Neural networks for clustering

- Neural networks are mathematical models built to resemble actual neural networks
- They consist of layers of nodes that send out “signals” based, possibly probabilistically, on their input signals
- Their best-known uses are classifications, i.e., with learning sets

Self-Organising Maps (SOM)

Clustering as optimization

- Given a similarity definition and a definition of what an “optimal” clustering is, it can often be a huge algorithmic challenge to find the optimum
- Example: subdivide many thousands of objects into 50 clusters, minimizing e.g. the sum of the squared distances to the centroids
- Algorithms for optimization are therefore central

Genetic algorithms

- Try to use “evolution” to obtain good solutions to a problem
- A number of solutions is kept at every step; these may mate or mutate to produce new solutions, and the “fittest” solutions are kept
- Can be seen as an optimization algorithm
- Designing ways of mating and mutating that yield an efficient algorithm is a great challenge

Simulated annealing

- A general optimization technique
- Iterative: at every step, nearby solutions are chosen with probabilities depending on their optimality (so even less optimal solutions may be chosen)
- As the algorithm proceeds and the “temperature” sinks, the probability of choosing less optimal solutions also sinks
- A good general way to avoid local optima
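A sketch of the acceptance step and cooling schedule (all constants are arbitrary choices for illustration):

```python
import math
import random

def accept(delta, T):
    """Always accept improvements; accept worse solutions with
    probability exp(-delta / T), which shrinks as T sinks."""
    return delta < 0 or random.random() < math.exp(-delta / T)

T = 10.0
while T > 0.01:
    # Propose a nearby solution, compute its cost change delta, then:
    # if accept(delta, T): current = proposal
    T *= 0.95   # cooling schedule: the temperature sinks each step
```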

4. Assessing and using the result

- Visualization and summarization of the clusters
- Note: you should always investigate how your results depend on the choices you made for the clustering!

Examples of applications of clustering

- Image analysis
- Speech recognition
- Data mining

Clustering microarray data

- In the data matrix, samples are columns and genes are rows
- What values should be clustered?
- What is a biologically relevant measure of similarity?
- One can cluster genes and/or samples

[Figure: data matrix with genes as rows and samples as columns]

Clustering microarray data

- Usually, use logged data
- The data should be on the same scale (which it usually is if you use data that has already been normalized)
- You may have to filter away genes that show too little variation over samples
- Use a distance measure appropriate for the question you want to focus on (Pearson correlation often works OK)
- Use an appropriate clustering algorithm (hierarchical average linkage usually works OK)
- If you draw conclusions from the clustering results, vary your clustering choices to see how stable those conclusions are
- Clustering works best as a tool to generate hypotheses and ideas, which may then be tested in other ways
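The recommendations above, strung together as a hedged sketch in Python with numpy/scipy (the expression matrix is simulated, and the variation cutoff 0.5 is an arbitrary illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Simulated genes x samples matrix of normalized intensities.
expr = np.random.default_rng(0).lognormal(size=(100, 6))
logged = np.log2(expr)                 # use logged data

# Filter away genes with too little variation over samples.
keep = logged.std(axis=1) > 0.5
filtered = logged[keep]

# Pearson-correlation distance (1 - r) and average linkage.
d = pdist(filtered, metric="correlation")
Z = linkage(d, method="average")
```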

Clustering tumor samples

Clustering to confirm or reject hypotheses?

- A clustering may appear to validate, or be validated by, a grouping derived from other data
- Caution: the many different ways to do a clustering may make it possible to tweak it to produce the clusters you want
- There is a huge and complex multiple-testing problem
- Note that small changes in the data can change the result dramatically
- If you insist on trying to get “significance”, use:
  - permutations of the data
  - resampling of the data (bootstrapping)

How to do clustering: programs

- HCE: a good program for clustering and visualization
  - Great visualization options
  - Adapted to microarray data
  - Can import similarity matrices
- The classic for microarray data: Cluster & TreeView (Eisen)
- R/BioConductor: the cluster package, the hclust function, the heatmap function, ...
- Many other programs/packages

Other visualization techniques: principal components

- The principal components can be viewed as the axes of a “better” coordinate system for the data
- “Better” in the sense that the data is maximally spread out along the first principal components
- The principal components correspond to eigenvectors of the covariance matrix of the data
- The eigenvalues represent the part of the total variance explained by each of the principal components
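A sketch of exactly this view of PCA in numpy: eigenvectors of the covariance matrix as new axes, eigenvalues as explained variance (the data is simulated):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(30, 5))
Xc = X - X.mean(axis=0)                  # center the data

cov = np.cov(Xc, rowvar=False)           # covariance matrix of the data
eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition (symmetric matrix)

order = np.argsort(eigvals)[::-1]        # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()      # share of total variance per component
scores = Xc @ eigvecs                    # data in the "better" coordinate system
```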

Principal component analysis of expression data

Other visualization techniques: multidimensional scaling

- Start with some points in a very high dimension
- Goal: display these points in a lower dimension, so that the distances between them are similar to the distances in the original dimension
- One may also try to preserve only the ranking of the pairwise distances
- This makes powerful visual inspection possible, in 2 or 3 dimensions
- Can sometimes give very convincing pictures separating samples in a predicted way
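A sketch with scikit-learn's MDS (scikit-learn is an assumption of mine); metric=False gives the variant that only tries to preserve the ranking of the pairwise distances:

```python
import numpy as np
from sklearn.manifold import MDS

X = np.random.default_rng(0).normal(size=(20, 50))   # points in high dimension

# Metric MDS: 2-D coordinates whose pairwise distances
# approximate the original high-dimensional ones.
coords = MDS(n_components=2, random_state=0).fit_transform(X)

# Non-metric MDS preserves only the ranking of the pairwise distances.
coords_rank = MDS(n_components=2, metric=False, random_state=0).fit_transform(X)
```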