SEEM4630 2011-2012 Tutorial 4 – Clustering

2 What is Cluster Analysis?
- Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups.
- A good clustering method will produce high-quality clusters with:
  - high intra-class similarity: cohesive within clusters
  - low inter-class similarity: distinctive between clusters

3 Notion of a Cluster can be Ambiguous
How many clusters? [Figure: the same set of points grouped as Two Clusters, Four Clusters, and Six Clusters.]

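4 K-Means Clustering
- The number of clusters K is fixed in advance.
- Each point is assigned to the cluster with the nearest centroid, with closeness measured by Euclidean distance or another distance measure.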

5 K-Means Clustering: Example
- Given: the mean of cluster $K_i$ is $m_i = (t_{i1} + t_{i2} + \dots + t_{im})/m$, where $t_{i1}, \dots, t_{im}$ are the $m$ points currently in $K_i$.
  Data: {2, 4, 10, 12, 3, 20, 30, 11, 25}, K = 2
- Solution (the initial means are the first two data values):
  - m1 = 2, m2 = 4: K1 = {2, 3}, K2 = {4, 10, 12, 20, 30, 11, 25}
  - m1 = 2.5, m2 = 16: K1 = {2, 3, 4}, K2 = {10, 12, 20, 30, 11, 25}
  - m1 = 3, m2 = 18: K1 = {2, 3, 4, 10}, K2 = {12, 20, 30, 11, 25}
  - m1 = 4.75, m2 = 19.6: K1 = {2, 3, 4, 10, 11, 12}, K2 = {20, 30, 25}
  - m1 = 7, m2 = 25: K1 = {2, 3, 4, 10, 11, 12}, K2 = {20, 30, 25}
- The assignments no longer change, so the algorithm has converged.
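The iterations above can be reproduced with a short sketch of plain K-means on 1-D data, where Euclidean distance reduces to the absolute difference. The function name `kmeans_1d` and the rule that an empty cluster keeps its old mean are assumptions made for illustration. Ties are broken toward the lower-numbered cluster, which matches the slide's placement of 3 (equally far from 2 and 4) in K1.

```python
def kmeans_1d(data, initial_means, max_iter=100):
    """Plain K-means on 1-D data; returns (means, clusters, history)."""
    means = list(initial_means)
    history = []
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the nearest mean.
        # min() keeps the first index on a tie, so ties go to the lower-numbered cluster.
        clusters = [[] for _ in means]
        for x in data:
            nearest = min(range(len(means)), key=lambda i: abs(x - means[i]))
            clusters[nearest].append(x)
        history.append((list(means), [sorted(c) for c in clusters]))
        # Update step: m_i = (t_i1 + ... + t_im) / m; an empty cluster keeps its old mean.
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:  # means unchanged, so assignments cannot change: converged
            break
        means = new_means
    return means, [sorted(c) for c in clusters], history


data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
means, clusters, history = kmeans_1d(data, initial_means=[2, 4])
for step_means, step_clusters in history:
    print(step_means, step_clusters)
```

Each printed line pairs the means used in a step with the clusters they produce, so the five lines correspond to the five steps listed in the solution.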

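6 K-Means Clustering: Evaluation
- Evaluation: Sum of Squared Error (SSE)
  $$SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}(c_i, x)^2$$
  where $x$ is a data point in cluster $C_i$ and $c_i$ is the centroid of cluster $C_i$.
- Given several clusterings, choose the one with the smallest error.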
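As a sketch of how SSE ranks clusterings, the function below sums each point's squared distance to its cluster's centroid (in 1-D, the squared difference). The two clusterings compared are taken from the K-means example: an intermediate step and the converged result. The function name `sse` is illustrative.

```python
def sse(clusters):
    """Sum of squared errors: squared distance of each point to its cluster centroid."""
    total = 0.0
    for cluster in clusters:
        centroid = sum(cluster) / len(cluster)
        total += sum((x - centroid) ** 2 for x in cluster)
    return total


earlier = [[2, 3, 4], [10, 12, 20, 30, 11, 25]]  # second step of the example
final = [[2, 3, 4, 10, 11, 12], [20, 30, 25]]    # converged result
print(sse(earlier), sse(final))
```

The converged clustering has the smaller SSE (150 versus 348), so it is the one to keep.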

7 Limitations of K-means
- It is hard to determine a good K value and good initial K centroids.
- K-means has problems when the data contains outliers. Outliers can be handled better by hierarchical clustering and density-based clustering.
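One reason outliers hurt K-means is that each centroid is a mean, so a single distant point drags it away from the genuine members. The sketch below uses cluster K2 from the example; the value 200 is a hypothetical outlier, not part of the slide data.

```python
def centroid(cluster):
    """K-means centroid: the arithmetic mean of the cluster's points."""
    return sum(cluster) / len(cluster)


cluster = [20, 30, 25]          # cluster K2 from the K-means example
with_outlier = cluster + [200]  # 200 is a hypothetical outlier, not from the slides
print(centroid(cluster), centroid(with_outlier))
```

The centroid moves from 25 to 68.75, which is far from every genuine member of the cluster.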

8 Hierarchical Clustering
- Produces a set of nested clusters organized as a hierarchical tree.
- Can be visualized as a dendrogram: a tree-like diagram that records the sequence of merges or splits.

9 Strengths of Hierarchical Clustering
- No particular number of clusters has to be assumed: any desired number of clusters can be obtained by "cutting" the dendrogram at the proper level.
- Partition direction:
  - Agglomerative: start with single elements and aggregate them into clusters.
  - Divisive: start with the complete data set and divide it into partitions.
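Cutting a dendrogram amounts to replaying its recorded merges until the desired number of clusters remains. The sketch below assumes the merge order of the single-link example on slide 12 (I3 with I6, I2 with I5, then those two groups, then I4, then I1). The function name `cut_dendrogram` and the encoding of each merge as one representative item per side are hypothetical.

```python
def cut_dendrogram(items, merges, k):
    """Replay a recorded merge sequence until exactly k clusters remain.

    `merges` lists the merges in the order they happened; each merge is given
    as one representative item from each of the two clusters being joined.
    """
    clusters = [frozenset([item]) for item in items]
    for left, right in merges:
        if len(clusters) <= k:
            break  # stop replaying: the tree is cut at this level
        a = next(c for c in clusters if left in c)
        b = next(c for c in clusters if right in c)
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return clusters


items = ["I1", "I2", "I3", "I4", "I5", "I6"]
# Merge order read off the single-link example on slide 12.
merges = [("I3", "I6"), ("I2", "I5"), ("I2", "I3"), ("I2", "I4"), ("I1", "I2")]

for k in (3, 2):
    print(k, [sorted(c) for c in cut_dendrogram(items, merges, k)])
```

The same recorded tree yields three clusters or two without rerunning the clustering, which is the strength the slide describes.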

10 Agglomerative Hierarchical Clustering
- The basic algorithm is straightforward:
  1. Compute the proximity matrix.
  2. Let each data point be a cluster.
  3. Repeat
  4. Merge the two closest clusters.
  5. Update the proximity matrix.
  6. Until only a single cluster remains.
- The key operation is computing the proximity of two clusters; the variants of the algorithm differ in how they define the distance between clusters.
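The six steps can be sketched directly in Python. This is a minimal, deliberately naive version (roughly cubic time) under a few assumptions: the points are 1-D, so distance is the absolute difference; the proximity matrix is stored as a dictionary keyed by pairs of cluster ids; and the update in step 5 uses the standard single-link (min), complete-link (max), and size-weighted group-average rules. The function name `agglomerative` and its return format are illustrative, and the data is reused from the K-means example.

```python
from itertools import combinations

LINKAGES = ("single", "complete", "average")


def agglomerative(points, linkage="single"):
    """Agglomerative clustering of 1-D points; returns the merge history.

    Each history entry is (members_of_one_cluster, members_of_other_cluster,
    distance_at_which_they_merged).
    """
    if linkage not in LINKAGES:
        raise ValueError(f"linkage must be one of {LINKAGES}")
    # Steps 1-2: every point is its own cluster; the proximity matrix is stored
    # as {(id_i, id_j): distance} with id_i < id_j.
    clusters = {i: [p] for i, p in enumerate(points)}
    prox = {(i, j): abs(points[i] - points[j])
            for i, j in combinations(range(len(points)), 2)}
    next_id = len(points)
    history = []
    # Steps 3 and 6: repeat until only a single cluster remains.
    while len(clusters) > 1:
        # Step 4: merge the two closest clusters.
        (a, b), d = min(prox.items(), key=lambda item: item[1])
        members_a, members_b = clusters.pop(a), clusters.pop(b)
        history.append((sorted(members_a), sorted(members_b), d))
        del prox[(a, b)]
        # Step 5: replace the rows for a and b with one row for the merged cluster.
        n_a, n_b = len(members_a), len(members_b)
        for k in clusters:
            d_ka = prox.pop((min(k, a), max(k, a)))
            d_kb = prox.pop((min(k, b), max(k, b)))
            if linkage == "single":
                new_d = min(d_ka, d_kb)
            elif linkage == "complete":
                new_d = max(d_ka, d_kb)
            else:  # group average, weighted by cluster sizes
                new_d = (n_a * d_ka + n_b * d_kb) / (n_a + n_b)
            prox[(k, next_id)] = new_d  # k < next_id, so the key stays ordered
        clusters[next_id] = members_a + members_b
        next_id += 1
    return history


data = [2, 4, 10, 12, 3, 20, 30, 11, 25]  # data from the K-means example
for left, right, d in agglomerative(data, "single"):
    print(f"merge {left} + {right} at distance {d}")
```

Updating the matrix in place, rather than recomputing every cluster-to-cluster distance from the raw points, is what step 5 asks for. It works here because each of these three linkages can be computed from the two old rows alone.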

11 Hierarchical Clustering
- Define inter-cluster similarity:
  - Min
  - Max
  - Group Average
  - Distance between Centroids
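The four definitions can be compared on two small groups drawn from the K-means example data. The slide phrases them as similarities; this sketch works with distances, so Min means the distance of the closest (most similar) pair and Max the distance of the farthest pair. The function names are illustrative.

```python
from itertools import product


def min_distance(a, b):
    """Min (single link): distance of the closest pair across the two clusters."""
    return min(abs(x - y) for x, y in product(a, b))


def max_distance(a, b):
    """Max (complete link): distance of the farthest pair across the two clusters."""
    return max(abs(x - y) for x, y in product(a, b))


def group_average(a, b):
    """Group average: mean distance over all cross-cluster pairs."""
    return sum(abs(x - y) for x, y in product(a, b)) / (len(a) * len(b))


def centroid_distance(a, b):
    """Distance between the two cluster centroids (means)."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


a, b = [2, 3, 4], [10, 11, 12]
for measure in (min_distance, max_distance, group_average, centroid_distance):
    print(measure.__name__, measure(a, b))
```

The four measures give 6, 10, 8 and 8 for the same pair of clusters, so the choice of measure can change which clusters are merged first.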

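12 Hierarchical Clustering: Min or Single Link
Points I1 to I6, Euclidean distance. [Figure: the proximity matrix after each merge; the numeric entries are not recoverable.]
Merge sequence:
1. {I3} and {I6} merge into {I3, I6}.
2. {I2} and {I5} merge into {I2, I5}.
3. {I2, I5} and {I3, I6} merge into {I2, I5, I3, I6}.
4. {I2, I5, I3, I6} and {I4} merge into {I2, I5, I3, I6, I4}.
5. {I1} and {I2, I5, I3, I6, I4} are the last two clusters remaining.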
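On this slide, each new proximity matrix is derived from the previous one. For single link, that update can be written as follows, first in general and then for the slide's first merge. The notation $d(\cdot,\cdot)$ for the entries of the proximity matrix is assumed here.

```latex
d(C_i \cup C_j,\ C_k) = \min\big(d(C_i, C_k),\ d(C_j, C_k)\big)

% first merge on this slide: I3 and I6 join, so for each remaining point I_k
d(\{I_3, I_6\},\ I_k) = \min\big(d(I_3, I_k),\ d(I_6, I_k)\big), \qquad k \in \{1, 2, 4, 5\}
```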