Hierarchical Clustering

Slides:



Advertisements
Similar presentations
CS 478 – Tools for Machine Learning and Data Mining Clustering: Distance-based Approaches.
Advertisements

SEEM Tutorial 4 – Clustering. 2 What is Cluster Analysis?  Finding groups of objects such that the objects in a group will be similar (or.
Clustering.
Hierarchical Clustering
Cluster Analysis: Basic Concepts and Algorithms
1 CSE 980: Data Mining Lecture 16: Hierarchical Clustering.
Hierarchical Clustering. Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree-like diagram that.
Hierarchical Clustering, DBSCAN The EM Algorithm
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Agglomerative Hierarchical Clustering 1. Compute a distance matrix 2. Merge the two closest clusters 3. Update the distance matrix 4. Repeat Step 2 until.
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Clustering CS 685: Special Topics in Data Mining Spring 2008 Jinze Liu.
Chapter 3: Cluster Analysis
Introduction to Bioinformatics
IT 433 Data Warehousing and Data Mining Hierarchical Clustering Assist.Prof.Songül Albayrak Yıldız Technical University Computer Engineering Department.
2004/05/03 Clustering 1 Clustering (Part One) Ku-Yaw Chang Assistant Professor, Department of Computer Science and Information.
Clustering II.
UPGMA Algorithm.  Main idea: Group the taxa into clusters and repeatedly merge the closest two clusters until one cluster remains  Algorithm  Add a.
Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.
Image Segmentation Chapter 14, David A. Forsyth and Jean Ponce, “Computer Vision: A Modern Approach”.
Cluster Analysis.
Cluster Analysis.  What is Cluster Analysis?  Types of Data in Cluster Analysis  A Categorization of Major Clustering Methods  Partitioning Methods.
Introduction to Information Retrieval Introduction to Information Retrieval Lecture 17: Hierarchical Clustering 1.
Cluster Analysis CS240B Lecture notes based on those by © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004.
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Clustering Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential Chair of.
Revision (Part II) Ke Chen COMP24111 Machine Learning Revision slides are going to summarise all you have learnt from Part II, which should be helpful.
Ulf Schmitz, Pattern recognition - Clustering1 Bioinformatics Pattern recognition - Clustering Ulf Schmitz
Chapter 3: Cluster Analysis  3.1 Basic Concepts of Clustering  3.2 Partitioning Methods  3.3 Hierarchical Methods The Principle Agglomerative.
Clustering Unsupervised learning Generating “classes”
Cluster Analysis Part II. Learning Objectives Hierarchical Methods Density-Based Methods Grid-Based Methods Model-Based Clustering Methods Outlier Analysis.
1 Lecture 10 Clustering. 2 Preview Introduction Partitioning methods Hierarchical methods Model-based methods Density-based methods.
Lecture 11. Microarray and RNA-seq II
Clustering Supervised vs. Unsupervised Learning Examples of clustering in Web IR Characteristics of clustering Clustering algorithms Cluster Labeling 1.
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
Clustering Algorithms k-means Hierarchic Agglomerative Clustering (HAC) …. BIRCH Association Rule Hypergraph Partitioning (ARHP) Categorical clustering.
1 Motivation Web query is usually two or three words long. –Prone to ambiguity –Example “keyboard” –Input device of computer –Musical instruments How can.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Clustering COMP Research Seminar BCB 713 Module Spring 2011 Wei Wang.
CSE5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides.
BIRCH: Balanced Iterative Reducing and Clustering Using Hierarchies A hierarchical clustering method. It introduces two concepts : Clustering feature Clustering.
By Timofey Shulepov Clustering Algorithms. Clustering - main features  Clustering – a data mining technique  Def.: Classification of objects into sets.
K-Means Algorithm Each cluster is represented by the mean value of the objects in the cluster Input: set of objects (n), no of clusters (k) Output:
Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree like diagram that.
Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Information Retrieval and Organisation Chapter 17 Hierarchical Clustering Dell Zhang Birkbeck, University of London.
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
Hierarchical Clustering
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Cluster Analysis This lecture node is modified based on Lecture Notes for Chapter.
Hierarchical clustering approaches for high-throughput data Colin Dewey BMI/CS 576 Fall 2015.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining and Text Mining. The Standard Data Mining process.
COMP24111 Machine Learning K-means Clustering Ke Chen.
Clustering CSC 600: Data Mining Class 21.
Hierarchical Clustering
Ke Chen Reading: [7.3, EA], [9.1, CMB]
Data Clustering Michael J. Watts
CS 685: Special Topics in Data Mining Jinze Liu
Clustering.
Hierarchical and Ensemble Clustering
Revision (Part II) Ke Chen
Information Organization: Clustering
Ke Chen Reading: [7.3, EA], [9.1, CMB]
Revision (Part II) Ke Chen
DATA MINING Introductory and Advanced Topics Part II - Clustering
Hierarchical and Ensemble Clustering
Data Mining – Chapter 4 Cluster Analysis Part 2
Hierarchical Clustering
Hierarchical Clustering
SEEM4630 Tutorial 3 – Clustering.
Hierarchical Clustering
CS 685: Special Topics in Data Mining Jinze Liu
Presentation transcript:

Hierarchical Clustering Ke Chen COMP24111 Machine Learning

Outline Introduction Cluster Distance Measures Agglomerative Algorithm Example and Demo Relevant Issues Summary COMP24111 Machine Learning

Introduction Hierarchical Clustering Approach A typical clustering analysis approach via partitioning data set sequentially Construct nested partitions layer by layer via grouping objects into a tree of clusters (without the need to know the number of clusters in advance) Use (generalised) distance matrix as clustering criteria Agglomerative vs. Divisive Two sequential clustering strategies for constructing a tree of clusters Agglomerative: a bottom-up strategy Initially each data object is in its own (atomic) cluster Then merge these atomic clusters into larger and larger clusters Divisive: a top-down strategy Initially all objects are in one single cluster Then the cluster is subdivided into smaller and smaller clusters COMP24111 Machine Learning

Introduction Illustrative Example Agglomerative and divisive clustering on the data set {a, b, c, d ,e } Step 0 Step 1 Step 2 Step 3 Step 4 b d c e a a b d e c d e a b c d e Agglomerative Divisive Cluster distance Termination condition COMP24111 Machine Learning

Cluster Distance Measures single link (min) complete link (max) average Single link: smallest distance between an element in one cluster and an element in the other, i.e., d(Ci, Cj) = min{d(xip, xjq)} Complete link: largest distance between an element in one cluster and an element in the other, i.e., d(Ci, Cj) = max{d(xip, xjq)} Average: avg distance between elements in one cluster and elements in the other, i.e., d(Ci, Cj) = avg{d(xip, xjq)} Define D(C, C)=0, the distance between two exactly same clusters is ZERO! d(C, C)=0 COMP24111 Machine Learning

Cluster Distance Measures Example: Given a data set of five objects characterised by a single feature, assume that there are two clusters: C1: {a, b} and C2: {c, d, e}. 1. Calculate the distance matrix. 2. Calculate three cluster distances between C1 and C2. a b c d e Feature 1 2 4 5 6 Single link Complete link Average a b c d e 1 3 4 5 2 COMP24111 Machine Learning

Agglomerative Algorithm The Agglomerative algorithm is carried out in three steps: Convert all object features into a distance matrix Set each object as a cluster (thus if we have N objects, we will have N clusters at the beginning) Repeat until number of cluster is one (or known # of clusters) Merge two closest clusters Update “distance matrix” COMP24111 Machine Learning

Example Problem: clustering analysis with agglomerative algorithm data matrix Euclidean distance distance matrix COMP24111 Machine Learning

Example Merge two closest clusters (iteration 1) COMP24111 Machine Learning

Example Update distance matrix (iteration 1) COMP24111 Machine Learning

Example Merge two closest clusters (iteration 2) COMP24111 Machine Learning

Example Update distance matrix (iteration 2) COMP24111 Machine Learning

Example Merge two closest clusters/update distance matrix (iteration 3) COMP24111 Machine Learning

Example Merge two closest clusters/update distance matrix (iteration 4) COMP24111 Machine Learning

Example Final result (meeting termination condition) COMP24111 Machine Learning

Example Dendrogram tree representation lifetime object 2 3 4 5 6 object lifetime In the beginning we have 6 clusters: A, B, C, D, E and F We merge clusters D and F into cluster (D, F) at distance 0.50 We merge cluster A and cluster B into (A, B) at distance 0.71 We merge clusters E and (D, F) into ((D, F), E) at distance 1.00 We merge clusters ((D, F), E) and C into (((D, F), E), C) at distance 1.41 We merge clusters (((D, F), E), C) and (A, B) into ((((D, F), E), C), (A, B)) at distance 2.50 The last cluster contain all the objects, thus conclude the computation For a dendrogram tree, its horizontal axis indexes all objects in a given data set, while its vertical axis expresses the lifetime of all possible cluster formation. The lifetime of a cluster (individual cluster) in the dendrogram is defined as a distance interval from the moment that the cluster is created to the moment that it disappears by merging with other clusters. COMP24111 Machine Learning

Example Dendrogram tree representation: “clustering” USA states 1) The y-axis is a measure of closeness of either individual data points or clusters. 2) California and Arizona are equally distant from Florida because CA and AZ are in a cluster before either joins FL. 3) Hawaii does join rather late; at about 50. This means that the cluster it joins is closer together before HI joins. But not much closer. Note that the cluster it joins (the one all the way on the right) only forms at about 45. The fact that HI joins a cluster later than any other state simply means that (using whatever metric you selected) HI is not that close to any particular state. COMP24111 Machine Learning

Exercise Given a data set of five objects characterised by a single feature: Apply the agglomerative algorithm with single-link, complete-link and averaging cluster distance measures to produce three dendrogram trees, respectively. a b C d e Feature 1 2 4 5 6 a b c d e 1 3 4 5 2 COMP24111 Machine Learning

Demo Agglomerative Demo COMP24111 Machine Learning

Relevant Issues How to determine the number of clusters If the number of clusters known, termination condition is given! The K-cluster lifetime as the range of threshold value on the dendrogram tree that leads to the identification of K clusters Heuristic rule: cut a dendrogram tree with maximum K-cluster life time Emphasize the difference between individual cluster life time and K-cluster life time! COMP24111 Machine Learning

Summary Hierarchical algorithm is a sequential clustering algorithm Use distance matrix to construct a tree of clusters (dendrogram) Hierarchical representation without the need of knowing # of clusters (can set termination condition with known # of clusters) Major weakness of agglomerative clustering methods Can never undo what was done previously Sensitive to cluster distance measures and noise/outliers Less efficient: O (n2 ), where n is the number of total objects There are several variants to overcome its weaknesses BIRCH: scalable to a large data set ROCK: clustering categorical data CHAMELEON: hierarchical clustering using dynamic modelling Online tutorial: the hierarchical clustering functions in Matlab https://www.youtube.com/watch?v=aYzjenNNOcc COMP24111 Machine Learning