Clustering Algorithms Sunida Ratanothayanon

What is Clustering?

Clustering Clustering is a classification technique that divides data into groups in a meaningful and useful way. It is an unsupervised classification method: no predefined labels are used.

Outline K-Means Algorithm Hierarchical Clustering Algorithm

K-Means Algorithm A partitional clustering algorithm that produces k clusters (the number k is specified by the user). Each cluster has a cluster center called the centroid. The algorithm iteratively groups data into the k clusters based on a distance function.

K-Means Algorithm The centroid is obtained as the mean of all data points in the cluster. Stop when the centers no longer change.
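The two steps above can be sketched in a few lines of Python (a minimal illustration, not taken from the slides; `kmeans` is an illustrative name, and the sketch assumes numeric tuples and that no cluster ever becomes empty):

```python
import math

def kmeans(points, centers, max_iter=100):
    """Iteratively assign points to the nearest center and recompute each
    center as the mean of its cluster, stopping when centers stop changing."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each center becomes the coordinate-wise mean of its members.
        new_centers = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            for cluster in clusters
        ]
        if new_centers == centers:  # no change of centers: stop
            return centers, clusters
        centers = new_centers
    return centers, clusters
```

On the five-point example worked through below, `kmeans(points, [(18, 22), (4, 2)])` converges after the assignments stop moving the centers, at approximately (19.7, 21) and (2.5, 2.5).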

A numerical example

K-Means example We have five data points with 2 attributes and want to group the data into 2 clusters (k = 2).

Data Point   x1   x2
1            22   21
2            19   20
3            18   22
4             1    3
5             4    2

K-Means example We can plot the five data points on a graph as follows.

K-Means example (1st iteration) Step 1: Choose the initial centers and define k. With k = 2, pick C1 = (18, 22) and C2 = (4, 2). Step 2: Compute the cluster centers; in the first iteration these are the chosen C1 and C2. Step 3: Find the Euclidean distance of each data point from each center and assign each data point to a cluster.

K-Means example (1st iteration) Step 3 (cont.): Distance table for all data points

Data Point   C1 (18, 22)   C2 (4, 2)
(22, 21)         4.12         26.17
(19, 20)         2.24         23.43
(18, 22)         0.00         24.41
(1, 3)          25.50          3.16
(4, 2)          24.41          0.00

Then we assign each data point to a cluster by comparing its distance to each center; every point goes to its closest cluster. Here (22, 21), (19, 20) and (18, 22) join cluster 1, while (1, 3) and (4, 2) join cluster 2.
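The distances in this table can be reproduced with Python's `math.dist`; the values below are the example's five points and the initial centers:

```python
import math

points = [(22, 21), (19, 20), (18, 22), (1, 3), (4, 2)]
c1, c2 = (18, 22), (4, 2)  # initial centers chosen in Step 1

for p in points:
    d1, d2 = math.dist(p, c1), math.dist(p, c2)
    # Assign the point to whichever center is closer.
    print(f"{p}: d(C1) = {d1:.2f}, d(C2) = {d2:.2f} -> cluster {1 if d1 <= d2 else 2}")
```

The first printed line is `(22, 21): d(C1) = 4.12, d(C2) = 26.17 -> cluster 1`.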

K-Means example (2nd iteration) Step 2: Compute the new cluster centers. Members of cluster 1 are (22, 21), (19, 20) and (18, 22); averaging these data points gives C1 = (19.7, 21). Members of cluster 2 are (1, 3) and (4, 2); averaging them gives C2 = (2.5, 2.5).
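The center update can be checked directly, since each new center is just the coordinate-wise mean of its cluster members (`centroid` is an illustrative helper name):

```python
cluster1 = [(22, 21), (19, 20), (18, 22)]
cluster2 = [(1, 3), (4, 2)]

def centroid(cluster):
    # Coordinate-wise mean of the cluster members.
    xs, ys = zip(*cluster)
    return (sum(xs) / len(cluster), sum(ys) / len(cluster))

print(tuple(round(v, 1) for v in centroid(cluster1)))  # (19.7, 21.0)
print(centroid(cluster2))                              # (2.5, 2.5)
```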

K-Means example (2nd iteration) Step 3: Find the Euclidean distance of each data point from the new centers and assign each point to its closest cluster. The distance table with the new centers:

Data Point   C1 (19.7, 21)   C2 (2.5, 2.5)
(22, 21)          2.30           26.88
(19, 20)          1.22           24.05
(18, 22)          1.97           24.91
(1, 3)           25.96            1.58
(4, 2)           24.65            1.58

The cluster memberships are unchanged, but the centers did change in this iteration, so repeat Steps 2 and 3 for the next iteration.

K-Means example (3rd iteration) Step 2: Compute the new cluster centers. Members of cluster 1 are still (22, 21), (19, 20) and (18, 22), so C1 = (19.7, 21). Members of cluster 2 are still (1, 3) and (4, 2), so C2 = (2.5, 2.5).

K-Means example (3rd iteration) Step 3: Find the Euclidean distance of each data point from the centers and assign each point to its closest cluster. The distance table is identical to the previous iteration:

Data Point   C1 (19.7, 21)   C2 (2.5, 2.5)
(22, 21)          2.30           26.88
(19, 20)          1.22           24.05
(18, 22)          1.97           24.91
(1, 3)           25.96            1.58
(4, 2)           24.65            1.58

The centers remain the same, so stop the algorithm.

Hierarchical Clustering Algorithm Produces a nested sequence of clusters organized like a tree. Clusters are allowed to have subclusters. The individual data points at the bottom of the tree are called "singleton clusters".

Hierarchical Clustering Algorithm Agglomerative method: the tree is built up from the bottom level by merging the nearest pair of clusters at each level to go one level up. This continues until all the data points are merged into a single cluster.
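The bottom-up procedure can be sketched as follows (a minimal average-linkage version, not from the slides; `agglomerate` is an illustrative name). Note it averages distances over all cross-cluster point pairs, whereas the worked example below averages previously computed cluster distances; for that example the two give the same merge order:

```python
import math
from itertools import combinations

def agglomerate(points):
    """points: dict mapping a name to a coordinate tuple.
    Returns the merged clusters in the order they are formed."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        def avg_dist(a, b):
            # Average linkage: mean distance over all cross-cluster pairs.
            pairs = [(p, q) for p in a for q in b]
            return sum(math.dist(points[p], points[q]) for p, q in pairs) / len(pairs)
        # Merge the closest pair of clusters, then repeat one level up.
        a, b = min(combinations(clusters, 2), key=lambda ab: avg_dist(*ab))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append(a | b)
    return merges
```

On the five 3-attribute points of the example that follows, this merges C with E first, then A with B, then D into A&B, and finally everything into one cluster.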

A numerical example

Hierarchical Clustering example We have five data points with 3 attributes.

Data Point   x1   x2   x3
A             9    3    7
B            10    2    9
C             1    9    4
D             6    5    5
E             1   10    3

Hierarchical Clustering example (1st iteration) Step 1: Calculate the Euclidean distance between each pair of data points. We obtain the following distance table:

Data Point      A       B       C       D       E
A (9, 3, 7)     0    2.45   10.44    4.12   11.36
B (10, 2, 9)    -       0   12.45    6.40   13.45
C (1, 9, 4)     -       -       0    6.48    1.41
D (6, 5, 5)     -       -       -       0    7.35
E (1, 10, 3)    -       -       -       -       0

Hierarchical Clustering example (1st iteration) Step 2: Form the tree. Consider the most similar pair of data points in the distance table: C and E, at distance 1.41, are the most similar. Merging them gives the first cluster, C&E. Repeat Steps 1 and 2 until all data points are merged into a single cluster.

Hierarchical Clustering example (2nd iteration) Step 1: Calculate the distances again, redrawing the table with C and E merged into the single entity C&E (mean point (1, 9.5, 3.5)). The distance from C&E to A is obtained from the previous table as the average of the C-to-A and E-to-A distances: avg(10.44, 11.36) = 10.9. The other entries are computed the same way:

Data Point      A       B       D     C&E
A (9, 3, 7)     0    2.45    4.12   10.90
B (10, 2, 9)    -       0    6.40   12.95
D (6, 5, 5)     -       -       0    6.91
C&E             -       -       -       0

Hierarchical Clustering example (2nd iteration) Step 2: Form the tree. From the new distance table, A and B are the most similar (distance 2.45). Merging them gives the second cluster, A&B. Repeat Steps 1 and 2 until all data points are merged into a single cluster.

Hierarchical Clustering example (3rd iteration) Step 1: From the previous table we obtain the distances for the new table, with the merged entities C&E and A&B:

Data Point    A&B      D     C&E
A&B             0   5.26   11.93
D (6, 5, 5)     -      0    6.91
C&E             -      -       0

Hierarchical Clustering example (3rd iteration) Step 2: Form the tree. From the new distance table, A&B and D are the most similar (distance 5.26). Merging them gives the new cluster A&B&D. Repeat Steps 1 and 2 until all data points are merged into a single cluster.

Hierarchical Clustering example (4th iteration) Step 1: From the previous table we obtain the distance from cluster A&B&D to C&E as the average of the A&B-to-C&E and D-to-C&E distances: avg(11.93, 6.91) = 9.4. The new table:

Data Point   A&B&D   C&E
A&B&D            0   9.4
C&E              -     0

Hierarchical Clustering example (4th iteration) Step 2: Form the tree. Only one pair remains, so no further recalculation is needed: merge all data points into the single cluster A&B&D&C&E and form the final tree. Stop the algorithm.
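The merged-cluster distances in the tables above can be verified numerically; as in the slides, each new cluster-to-cluster distance is the average of previously computed distances:

```python
import math

pts = {"A": (9, 3, 7), "B": (10, 2, 9), "C": (1, 9, 4),
       "D": (6, 5, 5), "E": (1, 10, 3)}

def d(p, q):
    return math.dist(pts[p], pts[q])

# 2nd iteration: distance from the merged cluster C&E to A
d_CE_A = (d("C", "A") + d("E", "A")) / 2
print(round(d_CE_A, 1))  # 10.9, as in the table

# 2nd/3rd iterations: distance from C&E to D
d_CE_D = (d("C", "D") + d("E", "D")) / 2
print(round(d_CE_D, 1))  # 6.9

# 4th iteration: distance from A&B&D to C&E, the average of the
# A&B-to-C&E and D-to-C&E cluster distances
d_AB_CE = (d("A", "C") + d("A", "E") + d("B", "C") + d("B", "E")) / 4
print(round((d_AB_CE + d_CE_D) / 2, 1))  # 9.4
```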

Conclusion Two major clustering algorithms. K-Means algorithm: iteratively groups data into k clusters based on a distance function; the number of clusters k is specified by the user. Hierarchical Clustering algorithm: produces a nested sequence of clusters organized like a tree; the tree is built up from the bottom level and the process continues until all the data points are merged into a single cluster.


Thank you