Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.

Slides:



Advertisements
Similar presentations
Clustering II.
Advertisements

SEEM Tutorial 4 – Clustering. 2 What is Cluster Analysis?  Finding groups of objects such that the objects in a group will be similar (or.
Clustering (2). Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram –A tree like.
Hierarchical Clustering
Cluster Analysis: Basic Concepts and Algorithms
1 CSE 980: Data Mining Lecture 16: Hierarchical Clustering.
Hierarchical Clustering. Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree-like diagram that.
Data Mining Cluster Analysis Basics
Hierarchical Clustering, DBSCAN The EM Algorithm
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Agglomerative Hierarchical Clustering 1. Compute a distance matrix 2. Merge the two closest clusters 3. Update the distance matrix 4. Repeat Step 2 until.
Data Mining Cluster Analysis: Basic Concepts and Algorithms
More on Clustering Hierarchical Clustering to be discussed in Clustering Part2 DBSCAN will be used in programming project.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
Data Mining Cluster Analysis: Basic Concepts and Algorithms
unsupervised learning - clustering
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Clustering II.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Cluster Analysis: Advanced Concepts and Algorithms Figures for Chapter 9 Introduction.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Cluster Analysis: Basic Concepts and Algorithms Figures for Chapter 8 Introduction.
Cluster Analysis.
Cluster Analysis: Basic Concepts and Algorithms
What is Cluster Analysis?
Cluster Analysis CS240B Lecture notes based on those by © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004.
Data Mining Cluster Analysis: Basic Concepts and Algorithms
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Clustering Ram Akella Lecture 6 February 23, & 280I University of California Berkeley Silicon Valley Center/SC.
What is Cluster Analysis?
DATA MINING LECTURE 8 Clustering The k-means algorithm
ZHANGXI LIN TEXAS TECH UNIVERSITY Lecture Notes 10 CRM Segmentation - Introduction.
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Clustering Basic Concepts and Algorithms 2
Partitional and Hierarchical Based clustering Lecture 22 Based on Slides of Dr. Ikle & chapter 8 of Tan, Steinbach, Kumar.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Adapted from Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar.
CSE5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar 10/30/2007.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Minqi Zhou Minqi Zhou Introduction.
Ch. Eick: Introduction to Hierarchical Clustering and DBSCAN 1 Remaining Lectures in Advanced Clustering and Outlier Detection 2.Advanced Classification.
Computational Biology Clustering Parts taken from Introduction to Data Mining by Tan, Steinbach, Kumar Lecture Slides Week 9.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Minqi Zhou © Tan,Steinbach, Kumar.
Data Mining Cluster Analysis: Basic Concepts and Algorithms.
Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree like diagram that.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Clustering/Cluster Analysis. What is Cluster Analysis? l Finding groups of objects such that the objects in a group will be similar (or related) to one.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Cluster Analysis This lecture node is modified based on Lecture Notes for Chapter.
DATA MINING: CLUSTER ANALYSIS Instructor: Dr. Chun Yu School of Statistics Jiangxi University of Finance and Economics Fall 2015.
CSE4334/5334 Data Mining Clustering. What is Cluster Analysis? Finding groups of objects such that the objects in a group will be similar (or related)
Data Mining Cluster Analysis: Basic Concepts and Algorithms.
ΠΑΝΕΠΙΣΤΗΜΙΟ ΙΩΑΝΝΙΝΩΝ ΑΝΟΙΚΤΑ ΑΚΑΔΗΜΑΪΚΑ ΜΑΘΗΜΑΤΑ Εξόρυξη Δεδομένων Ομαδοποίηση (clustering) Διδάσκων: Επίκ. Καθ. Παναγιώτης Τσαπάρας.
Data Mining Classification and Clustering Techniques Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Computational Biology
Data Mining: Basic Cluster Analysis
Hierarchical Clustering
More on Clustering in COSC 4335
Clustering CSC 600: Data Mining Class 21.
Hierarchical Clustering
CSE 5243 Intro. to Data Mining
Data Mining Cluster Techniques: Basic
SEEM4630 Tutorial 3 – Clustering.
Hierarchical Clustering
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Presentation transcript:

Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups Inter-cluster distances are maximized Intra-cluster distances are minimized

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Types of Clusterings l A clustering is a set of clusters l Important distinction between hierarchical and partitional sets of clusters l Partitional Clustering –A division data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset l Hierarchical clustering –A set of nested clusters organized as a hierarchical tree

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Partitional Clustering Original Points A Partitional Clustering

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Hierarchical Clustering Traditional Hierarchical ClusteringTraditional Dendrogram

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Types of Clusters: Well-Separated l Well-Separated Clusters: –A cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster. 3 well-separated clusters

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Types of Clusters: Center-Based l Center-based – A cluster is a set of objects such that an object in a cluster is closer (more similar) to the “center” of a cluster, than to the center of any other cluster –The center of a cluster is often a centroid, the average of all the points in the cluster, or a medoid, the most “representative” point of a cluster 4 center-based clusters

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Clustering Algorithms l Hierarchical clustering l K-means l Density-based clustering

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Hierarchical Clustering l Produces a set of nested clusters organized as a hierarchical tree l Can be visualized as a dendrogram –A tree like diagram that records the sequences of merges or splits

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ DEMO

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Strengths of Hierarchical Clustering l Do not have to assume any particular number of clusters –Any desired number of clusters can be obtained by ‘cutting’ the dendogram at the proper level

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Hierarchical Clustering l Two main types of hierarchical clustering –Agglomerative:  Start with the points as individual clusters  At each step, merge the closest pair of clusters until only one cluster (or k clusters) left –Divisive:  Start with one, all-inclusive cluster  At each step, split a cluster until each cluster contains a point (or there are k clusters) l Traditional hierarchical algorithms use a similarity or distance matrix –Merge or split one cluster at a time

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Agglomerative Clustering Algorithm l More popular hierarchical clustering technique l Basic algorithm is straightforward 1.Compute the proximity matrix 2.Let each data point be a cluster 3.Repeat 4.Merge the two closest clusters 5.Update the proximity matrix 6.Until only a single cluster remains l Key operation is the computation of the proximity of two clusters –Different approaches to defining the distance between clusters distinguish the different algorithms

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Starting Situation l Start with clusters of individual points and a proximity matrix p1 p3 p5 p4 p2 p1p2p3p4p Proximity Matrix

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Intermediate Situation l After some merging steps, we have some clusters C1 C4 C2 C5 C3 C2C1 C3 C5 C4 C2 C3C4C5 Proximity Matrix

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Intermediate Situation l We want to merge the two closest clusters (C2 and C5) and update the proximity matrix. C1 C4 C2 C5 C3 C2C1 C3 C5 C4 C2 C3C4C5 Proximity Matrix

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ After Merging l The question is “How do we update the proximity matrix?” C1 C4 C2 U C5 C3 ? ? ? ? ? C2 U C5 C1 C3 C4 C2 U C5 C3C4 Proximity Matrix

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ How to Define Inter-Cluster Similarity p1 p3 p5 p4 p2 p1p2p3p4p Similarity? l MIN l MAX l Group Average l Distance Between Centroids Proximity Matrix

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ How to Define Inter-Cluster Similarity p1 p3 p5 p4 p2 p1p2p3p4p Proximity Matrix l MIN l MAX l Group Average l Distance Between Centroids

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ How to Define Inter-Cluster Similarity p1 p3 p5 p4 p2 p1p2p3p4p Proximity Matrix l MIN l MAX l Group Average l Distance Between Centroids

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ How to Define Inter-Cluster Similarity p1 p3 p5 p4 p2 p1p2p3p4p Proximity Matrix l MIN l MAX l Group Average l Distance Between Centroids

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ How to Define Inter-Cluster Similarity p1 p3 p5 p4 p2 p1p2p3p4p Proximity Matrix l MIN l MAX l Group Average l Distance Between Centroids 

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Hierarchical Clustering: Problems and Limitations l Once a decision is made to combine two clusters, it cannot be undone