COMP5331: Clustering. Prepared by Raymond Wong. Some parts of these notes are borrowed from L.W. Chan's notes. Presented by Raymond Wong.


Clustering
Example: students' scores in two subjects.

Student   Computer   History
Raymond   …          …
Louis     90         45
Wyman     20         95
…         …          …

Cluster 1: e.g. high score in Computer and low score in History. Cluster 2: e.g. high score in History and low score in Computer. Problem: find all clusters.

Why Clustering? Clustering for utility: summarization and compression.

Why Clustering? Clustering for understanding. Applications: Biology – group different species; Psychology and Medicine – group medicines; Business – group different customers for marketing; Networks – group different types of traffic patterns; Software – group different programs for data analysis.

Clustering Methods
K-means Clustering: original k-means clustering, sequential k-means clustering, forgetful sequential k-means clustering.
Hierarchical Clustering Methods: agglomerative methods; divisive methods (polythetic approach and monothetic approach).

K-means Clustering
Suppose that we have n example feature vectors x_1, x_2, …, x_n, and we know that they fall into k compact clusters, k < n. Let m_i be the mean of the vectors in cluster i. We can say that x is in cluster i if the distance from x to m_i is the minimum of all the k distances.

Procedure for finding the k means
Make initial guesses for the means m_1, m_2, …, m_k.
Until there is no change in any mean:
  Assign each data point to the cluster whose mean is nearest.
  Recalculate the means: for i from 1 to k, replace m_i with the mean of all examples assigned to cluster i.
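A minimal sketch of this procedure in Python with NumPy (the sample data, k = 2, and the random initialization below are illustrative assumptions, not part of the original notes):

    import numpy as np

    def k_means(X, k, max_iter=100, seed=0):
        """Basic (batch) k-means. X is an (n, d) array of feature vectors."""
        rng = np.random.default_rng(seed)
        means = X[rng.choice(len(X), size=k, replace=False)]  # initial guesses for the means
        for _ in range(max_iter):
            # Assign each data point to the cluster whose mean is nearest
            dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Replace each mean with the mean of the examples assigned to it
            new_means = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                  else means[i] for i in range(k)])
            if np.allclose(new_means, means):  # no change in any mean: stop
                break
            means = new_means
        return means, labels

    # Illustrative 2-D data
    X = np.array([[1.0, 2.0], [1.5, 1.8], [0.5, 1.0], [8.0, 8.0], [9.0, 9.5]])
    centres, labels = k_means(X, k=2)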

Procedure for finding k-means, illustrated with k = 2 (figure): arbitrarily choose k means; assign each object to the most similar centre; update the cluster means; reassign objects; repeat until the assignments no longer change.

Initialization of k-means
The way to initialize the means was not specified. One popular way to start is to randomly choose k of the examples. The results produced depend on the initial values for the means, and it frequently happens that suboptimal partitions are found. The standard solution is to try a number of different starting points.

Disadvantages of k-means
With a bad initial guess, it can happen that no points are assigned to the cluster with the initial mean m_i. The value of k is not user-friendly: we usually do not know the number of clusters before we try to find them.


Sequential k-Means Clustering
Another way to modify the k-means procedure is to update the means one example at a time, rather than all at once. This is particularly attractive when we acquire the examples over a period of time, and we want to start clustering before we have seen all of the examples. Here is a modification of the k-means procedure that operates sequentially.

Make initial guesses for the means m_1, m_2, …, m_k.
Set the counts n_1, n_2, …, n_k to zero.
Until interrupted:
  Acquire the next example, x.
  If m_i is closest to x:
    Increment n_i.
    Replace m_i by m_i + (1/n_i)(x − m_i).
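A short sketch of this sequential update, assuming the examples arrive through an iterable stream and that initial means are supplied (both names are illustrative):

    import numpy as np

    def sequential_k_means(stream, initial_means):
        """Online k-means: update the means one example at a time."""
        means = np.array(initial_means, dtype=float)
        counts = np.zeros(len(means), dtype=int)                  # n_1, ..., n_k start at zero
        for x in stream:                                          # until interrupted
            x = np.asarray(x, dtype=float)
            i = int(np.linalg.norm(means - x, axis=1).argmin())   # closest mean m_i
            counts[i] += 1                                        # increment n_i
            means[i] += (x - means[i]) / counts[i]                # m_i <- m_i + (1/n_i)(x - m_i)
        return means, counts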


Forgetful Sequential k-means
This also suggests another alternative in which we replace the counts by a constant. In particular, suppose that a is a constant between 0 and 1, and consider the following variation:
Make initial guesses for the means m_1, m_2, …, m_k.
Until interrupted:
  Acquire the next example, x.
  If m_i is closest to x, replace m_i by m_i + a(x − m_i).

The result is called the "forgetful" sequential k-means procedure. It is not hard to show that m_i is a weighted average of the examples that were closest to m_i, where the weight decreases exponentially with the "age" of the example. That is, if m_0 is the initial value of the mean vector and x_j is the j-th of the n examples that were used to form m_i, then

m_n = (1 − a)^n m_0 + a Σ_{k=1}^{n} (1 − a)^{n−k} x_k.

Thus, the initial value m_0 is eventually forgotten, and recent examples receive more weight than ancient examples. This variation of k-means is particularly simple to implement, and it is attractive when the nature of the problem changes over time and the cluster centres "drift".
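A matching sketch of the forgetful variant; the step constant a and the input stream are again illustrative assumptions:

    import numpy as np

    def forgetful_sequential_k_means(stream, initial_means, a=0.1):
        """Sequential k-means with a constant step a in (0, 1); each mean becomes an
        exponentially weighted average of the examples that were closest to it."""
        means = np.array(initial_means, dtype=float)
        for x in stream:                                          # until interrupted
            x = np.asarray(x, dtype=float)
            i = int(np.linalg.norm(means - x, axis=1).argmin())   # closest mean m_i
            means[i] += a * (x - means[i])                        # m_i <- m_i + a(x - m_i)
        return means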


Hierarchical Clustering Methods
The partition of the data is not done in a single step. There are two varieties of hierarchical clustering algorithms:
Agglomerative – successive fusions of the data into groups.
Divisive – separates the data successively into finer groups.

Dendrogram
A hierarchic grouping can be represented by a two-dimensional diagram known as a dendrogram, a tree-like diagram in which the vertical axis records the distance at which groups are fused. (Figure: example dendrogram.)

Distance measures between clusters: Single Linkage, Complete Linkage, Group Average Linkage, Centroid Linkage, Median Linkage.

Single Linkage
Also known as the nearest-neighbour technique. The distance between groups is defined as that of the closest pair of data points, where only pairs consisting of one record from each group are considered. (Figure: clusters A and B.)

Worked example (figures): five objects, 1–5, are merged step by step under single linkage. Objects 1 and 2 are fused first, then 4 and 5, then 3 joins (4 5) to form (3 4 5), and finally (1 2) is fused with (3 4 5). The dendrogram records the distance at which each fusion occurs.
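For reference, an agglomerative single-linkage clustering and its dendrogram can be produced with SciPy; the five points below are made up for illustration and are not the objects from the example above:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    # Five illustrative 1-D observations standing in for objects 1..5
    X = np.array([[1.0], [1.6], [5.0], [8.0], [8.4]])

    Z = linkage(X, method='single')        # agglomerative clustering, single linkage
    dendrogram(Z, labels=[1, 2, 3, 4, 5])  # tree of nested clusters
    plt.ylabel('Distance')
    plt.show()

Changing method to 'complete', 'average', 'centroid', or 'median' gives the other linkage criteria discussed below.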


Complete Linkage
The distance between two clusters is given by the distance between their most distant members. (Figure: clusters A and B.)

Worked example (figures): the same five objects merged under complete linkage. The fusion sequence is again (1 2), then (4 5), then (3 4 5), and finally (1 2) with (3 4 5), but the fusions occur at larger distances in the dendrogram, since the most distant pair of members now defines the inter-cluster distance.


Group Average Clustering
The distance between two clusters is defined as the average of the distances between all pairs of records, one from each cluster. For example, with cluster A = {1, 2} and cluster B = {3, 4, 5}:

d_AB = (1/6)(d_13 + d_14 + d_15 + d_23 + d_24 + d_25).

(Figure: clusters A and B.)


Centroid Clustering
The distance between two clusters is defined as the distance between the mean vectors of the two clusters: d_AB = d_ab, where a is the mean vector of cluster A and b is the mean vector of cluster B. (Figure: clusters A and B with centroids a and b.)
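A small sketch comparing several of these criteria on two illustrative clusters (the data points and the use of Euclidean distance are assumptions for the example):

    import numpy as np

    def cluster_distances(A, B):
        """Inter-cluster distance under several linkage criteria.
        A and B are (n_a, d) and (n_b, d) arrays of records."""
        # All pairwise distances, one record from each cluster
        pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        return {
            'single': pairwise.min(),            # closest pair
            'complete': pairwise.max(),          # most distant pair
            'group_average': pairwise.mean(),    # average over all pairs
            'centroid': np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)),
        }

    A = np.array([[0.0, 0.0], [1.0, 0.0]])
    B = np.array([[4.0, 0.0], [5.0, 1.0], [6.0, 0.0]])
    print(cluster_distances(A, B))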


Median Clustering
Disadvantage of centroid clustering: when a large cluster is merged with a small one, the centroid of the combined cluster is close to that of the large one, i.e. the characteristic properties of the small cluster are lost. In median clustering, after two groups have been combined, the mid-point of the original two cluster centres is used as the centre of the newly combined group. (Figure: clusters A and B with centres a and b.)


Divisive Methods
In a divisive algorithm, we start with the assumption that all the data is part of one cluster. We then use a distance criterion to divide the cluster in two, and subdivide the clusters until a stopping criterion is achieved.
Polythetic – divide the data based on the values of all attributes.
Monothetic – divide the data on the basis of the possession of a single specified attribute.

Polythetic Approach

Worked example: seven objects with pairwise distances. Here D(i, G) denotes the average distance from object i to the objects in group G. We start with all objects in one group and peel off a splinter group A step by step.

Step 1. Compute each object's average distance to all the other objects:
D(1, *) = 26.0, D(2, *) = 22.5, D(3, *) = 20.7, D(4, *) = 17.3, D(5, *) = 18.5, D(6, *) = 22.2, D(7, *) = 25.5.
Object 1 has the largest average distance, so it starts the splinter group: A = {1}, B = {2, 3, 4, 5, 6, 7}.

Step 2. For each object i in B, compute its average distance to A and to the rest of B, and the difference Δ_i = D(i, B) − D(i, A); a positive Δ_i means i is closer to the splinter group:
D(2, A) = 10,  D(2, B) = 25.0,  Δ_2 = 15.0
D(3, A) = 7,   D(3, B) = 23.4,  Δ_3 = 16.4
D(4, A) = 30,  D(4, B) = 14.8,  Δ_4 = −15.2
D(5, A) = 29,  D(5, B) = 16.4,  Δ_5 = −12.6
D(6, A) = 38,  D(6, B) = 19.0,  Δ_6 = −19.0
D(7, A) = 42,  D(7, B) = 22.2,  Δ_7 = −19.8
Object 3 has the largest positive difference, so it moves to the splinter group: A = {1, 3}, B = {2, 4, 5, 6, 7}.

Step 3. Repeat with the new groups:
D(2, A) = 8.5,   D(2, B) = 29.5,  Δ_2 = 21.0
D(4, A) = 25.5,  D(4, B) = 13.2,  Δ_4 = −12.3
D(5, A) = 25.5,  D(5, B) = 15.0,  Δ_5 = −10.5
D(6, A) = 34.5,  D(6, B) = 16.0,  Δ_6 = −18.5
D(7, A) = 39.0,  D(7, B) = 18.6,  Δ_7 = −20.4
Object 2 has the only positive difference, so it moves: A = {1, 2, 3}, B = {4, 5, 6, 7}.

Step 4. Repeat again:
D(4, A) = 24.7,  D(4, B) = 10.0,  Δ_4 = −14.7
D(5, A) = 25.3,  D(5, B) = 11.7,  Δ_5 = −13.6
D(6, A) = 34.3,  D(6, B) = 10.0,  Δ_6 = −24.3
D(7, A) = 38.0,  D(7, B) = 13.0,  Δ_7 = −25.0
All differences are negative, so no further objects move. The first division is {1, 2, 3} versus {4, 5, 6, 7}; the process would then continue on each subgroup separately.
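A compact sketch of this splinter-group procedure, assuming the pairwise distances are given as a symmetric NumPy matrix D (the function name and setup are illustrative):

    import numpy as np

    def splinter_split(D):
        """One divisive (polythetic) split of objects 0..n-1 from a distance matrix D."""
        n = len(D)
        B = list(range(n))
        # Start the splinter group A with the object of largest average distance to the rest
        avg_all = [D[i][[j for j in B if j != i]].mean() for i in B]
        A = [B.pop(int(np.argmax(avg_all)))]
        while len(B) > 1:
            # Delta_i = average distance to the rest of B minus average distance to A
            deltas = [D[i][[j for j in B if j != i]].mean() - D[i][A].mean() for i in B]
            best = int(np.argmax(deltas))
            if deltas[best] <= 0:      # all differences negative: stop splitting
                break
            A.append(B.pop(best))
        return A, B

Applied recursively to each of the two resulting subgroups, this yields the divisive hierarchy.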


Monothetic Approach
It is usually used when the data consists of binary variables. (Example data: five records, 1–5, each with binary attributes A, B and C.)

For a pair of attributes, build a 2×2 contingency table. For attributes A and B: a = 3 (records with A = 1 and B = 1), b = 1 (A = 1, B = 0), c = 0 (A = 0, B = 1), d = 1 (A = 0, B = 0), with N = 5 records in total. The chi-square measure of association between the two attributes is

chi-square = N(ad − bc)² / ((a + b)(c + d)(a + c)(b + d)),

which for A and B gives 5 × (3·1 − 1·0)² / (4 · 1 · 3 · 2) = 1.875.

Doing the same for every pair of attributes:

Attribute pair   AB    AC    BC
a                3     1     2
b                1     2     1
c                0     2     2
d                1     0     0
N                5     5     5

Summing the chi-square measures of the pairs involving each attribute gives: attribute A, 4.09; attribute B, 2.70; attribute C, 3.05. Attribute A has the largest total, so we choose attribute A for dividing the data into two groups: {2, 3, 4} and {1, 5}.
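A brief sketch of this selection step, assuming the records are given as a binary matrix X with one column per attribute (the data below is illustrative and is not the table from the slides):

    import numpy as np

    def chi_square(x, y):
        """Chi-square association measure for two binary columns."""
        a = np.sum((x == 1) & (y == 1))
        b = np.sum((x == 1) & (y == 0))
        c = np.sum((x == 0) & (y == 1))
        d = np.sum((x == 0) & (y == 0))
        n = len(x)
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        return n * (a * d - b * c) ** 2 / denom if denom else 0.0

    def monothetic_split(X):
        """Choose the attribute with the largest total association and split on it."""
        n_attr = X.shape[1]
        totals = np.zeros(n_attr)
        for i in range(n_attr):
            for j in range(n_attr):
                if i != j:
                    totals[i] += chi_square(X[:, i], X[:, j])   # sum over pairs involving i
        best = int(np.argmax(totals))
        group1 = np.where(X[:, best] == 1)[0]   # records possessing the chosen attribute
        group2 = np.where(X[:, best] == 0)[0]   # records lacking it
        return best, group1, group2

    # Illustrative binary data: 5 records, attributes A, B, C
    X = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [1, 0, 1],
                  [0, 1, 1],
                  [1, 1, 0]])
    print(monothetic_split(X))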