Ch. Eick: Introduction to Hierarchical Clustering and DBSCAN. Remaining Lectures in 2009: 1. Advanced Clustering and Outlier Detection 2. Advanced Classification.

Presentation transcript:

Ch. Eick: Introduction to Hierarchical Clustering and DBSCAN
Remaining Lectures in 2009
1. Advanced Clustering and Outlier Detection
2. Advanced Classification and Prediction
3. Top Ten Data Mining Algorithms (short)
4. Course Summary (short)
5. Assignment 5 Student Presentations

Clustering Part2: Advanced Clustering and Outlier Detection
1. Hierarchical Clustering
2. More on Density-based Clustering: DENCLUE
3. [EM → Top10-DM-Alg]
4. Cluster Evaluation Measures
5. Outlier Detection

More on Clustering
1. Hierarchical Clustering: to be discussed in Nov.
2. DBSCAN: will be used in programming project

Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram
  – A tree-like diagram that records the sequences of merges or splits

Agglomerative Clustering Algorithm
• More popular hierarchical clustering technique
• Basic algorithm is straightforward
  1. Compute the proximity matrix
  2. Let each data point be a cluster
  3. Repeat
  4.   Merge the two closest clusters
  5.   Update the proximity matrix
  6. Until only a single cluster remains
• Key operation is the computation of the proximity of two clusters
  – Different approaches to defining the distance between clusters distinguish the different algorithms
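The basic algorithm above can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the function names `single_link` and `agglomerative` and the sample points are made up, MIN (single link) is assumed as the cluster proximity, and the proximity matrix of steps 1 and 5 is replaced by recomputing cluster distances in every round, which is simpler but slower.

```python
import math

def single_link(a, b):
    """MIN proximity: distance between the closest pair of points (assumed default)."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, cluster_distance=single_link):
    """Naive agglomerative clustering; returns the merges in the order they happen."""
    # Step 2: let each data point be its own cluster.
    clusters = [[p] for p in points]
    merges = []
    # Steps 3-6: repeat until only a single cluster remains.
    while len(clusters) > 1:
        # Step 4: find the two closest clusters. Distances are recomputed here
        # instead of maintaining and updating a proximity matrix (steps 1 and 5).
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs,
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
merges = agglomerative(points)
for left, right in merges:
    print(left, "+", right)
```

The list of merges is exactly the information a dendrogram draws; passing a different `cluster_distance` function changes which clusters count as closest.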

Starting Situation
• Start with clusters of individual points and a proximity matrix
[Figure: points p1 to p5 and their proximity matrix]

Intermediate Situation
• After some merging steps, we have some clusters
[Figure: clusters C1 to C5 and their proximity matrix]

Intermediate Situation
• We want to merge the two closest clusters (C2 and C5) and update the proximity matrix.
[Figure: clusters C1 to C5 and their proximity matrix]

After Merging
• The question is “How do we update the proximity matrix?”
[Figure: clusters C1, C3, C4 and C2 ∪ C5; the proximity-matrix entries for C2 ∪ C5 are marked “?”]

How to Define Inter-Cluster Similarity
• MIN
• MAX
• Group Average
• Distance Between Centroids
• Other methods driven by an objective function
  – Ward’s Method uses squared error
[Figure: points p1 to p5 and their proximity matrix, labelled “Similarity?”]
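To make the first four options concrete, the sketch below computes each of them for two small clusters. It assumes Euclidean distance as the point proximity and uses invented function names and sample points; Ward's Method is left out because it needs the squared-error objective.

```python
import math

def min_link(a, b):
    """MIN: distance between the closest pair of points across the two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def max_link(a, b):
    """MAX: distance between the farthest pair of points across the two clusters."""
    return max(math.dist(p, q) for p in a for q in b)

def group_average(a, b):
    """Group Average: mean of all pairwise distances across the two clusters."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def centroid_distance(a, b):
    """Distance Between Centroids."""
    return math.dist(centroid(a), centroid(b))

# Hypothetical clusters: two vertical pairs, 3 units apart.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
for measure in (min_link, max_link, group_average, centroid_distance):
    print(measure.__name__, round(measure(a, b), 4))
```

On this data MIN and the centroid distance agree, MAX picks the diagonal, and the group average falls between the two.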

Cluster Similarity: Group Average
• Proximity of two clusters is the average of pairwise proximity between points in the two clusters.
• Need to use average connectivity for scalability since total proximity favors large clusters
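A small numeric example may help show why total proximity favors large clusters. Everything here is hypothetical: the `similarity` function simply assigns a high similarity to one pair of points and a low similarity to all other pairs.

```python
def similarity(p, q):
    """Hypothetical similarity: points 0 and 1 are very similar, all other pairs weakly."""
    return 0.9 if {p, q} == {0, 1} else 0.2

def total_proximity(a, b):
    """Sum of pairwise similarities between the two clusters."""
    return sum(similarity(p, q) for p in a for q in b)

def average_proximity(a, b):
    """Group average: total proximity divided by the number of pairs."""
    return total_proximity(a, b) / (len(a) * len(b))

small_a, small_b = [0], [1]            # one strongly similar pair
large_c, large_d = [2, 3, 4], [5, 6, 7]  # nine weakly similar pairs

print(total_proximity(small_a, small_b), total_proximity(large_c, large_d))
print(average_proximity(small_a, small_b), average_proximity(large_c, large_d))
```

By total proximity the two large, weakly related clusters look closer, simply because they contribute nine pairs; dividing by the number of pairs reverses the ranking.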

Teaching of Clustering
Clustering Part1: Basics (September/October)
1. What is Clustering?
2. Partitioning/Representative-based Clustering
   • K-means
   • K-medoids
3. Density Based Clustering centering on DBSCAN
4. Region Discovery
5. Grid-based Clustering
6. Similarity Assessment
Clustering Part2: Advanced Topics (November)

DBSCAN ( )
• DBSCAN is a density-based algorithm.
  – Density = number of points within a specified radius (Eps)
  – Input parameters: MinPts and Eps
  – A point is a core point if it has more than a specified number of points (MinPts) within Eps
    • These are points that are at the interior of a cluster
  – A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point
  – A noise point is any point that is not a core point or a border point.
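The three point types can be computed directly from these definitions. The sketch below is illustrative only: the function names and sample points are invented, Euclidean distance is assumed, and the core test counts a point as its own neighbor and uses "at least MinPts", which is one common convention. The slide wording "more than MinPts" would change the comparison to `>`.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border' or 'noise'."""
    neighbors = [region_query(points, i, eps) for i in range(len(points))]
    # Assumption: core means at least min_pts points in the Eps-neighborhood.
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")   # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels

# Hypothetical data: a unit square, one point hanging off its edge, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.2, 1.0), (8.0, 8.0)]
labels = classify_points(points, eps=1.5, min_pts=4)
print(labels)
```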

DBSCAN: Core, Border, and Noise Points

DBSCAN Algorithm (simplified view for teaching)
1. Create a graph whose nodes are the points to be clustered
2. For each core point c, create an edge from c to every point p in the ε-neighborhood of c
3. Set N to the nodes of the graph
4. If N does not contain any core points, terminate
5. Pick a core point c in N
6. Let X be the set of nodes that can be reached from c by going forward
   1. Create a cluster containing X ∪ {c}
   2. N = N \ (X ∪ {c})
7. Continue with step 4
Remarks: points that are not assigned to any cluster are outliers; a more efficient implementation is obtained by performing steps 2 and 6 in parallel.
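A minimal Python sketch of this graph view follows. It is not the lecture's implementation: the function name `dbscan`, the sample points and the use of `None` for outliers are illustrative choices, Euclidean distance is assumed, and a core point is taken to have at least `min_pts` points (itself included) within `eps`.

```python
import math

def dbscan(points, eps, min_pts):
    """Simplified DBSCAN; returns one cluster id per point, None for outliers."""
    n = len(points)
    # Steps 1-2: nodes are point indices; only core points get outgoing edges.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    # Assumption: core means at least min_pts points in the neighborhood.
    edges = {i: neighbors[i] for i in range(n) if len(neighbors[i]) >= min_pts}
    labels = [None] * n          # Step 3: every node starts unassigned (in N)
    cluster_id = 0
    for c in edges:              # Steps 4-5: pick core points still in N
        if labels[c] is not None:
            continue
        # Step 6: collect everything reachable from c by following edges forward.
        labels[c] = cluster_id
        stack = [c]
        while stack:
            u = stack.pop()
            for v in edges.get(u, []):   # border points have no outgoing edges
                if labels[v] is None:
                    labels[v] = cluster_id
                    stack.append(v)
        cluster_id += 1          # Step 7: continue with the next unassigned core point
    return labels

# Hypothetical data: two dense squares, one border point, one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.2, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (5.0, 20.0)]
labels = dbscan(points, eps=1.5, min_pts=4)
print(labels)
```

Because edges leave only core points, a border point can join a cluster but never extends it, which matches the forward reachability in step 6.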

DBSCAN: Core, Border and Noise Points
[Figure panels: Original Points; Point types: core, border and noise (Eps = 10, MinPts = 4)]

When DBSCAN Works Well
• Resistant to noise
• Can handle clusters of different shapes and sizes
[Figure panels: Original Points; Clusters]

When DBSCAN Does NOT Work Well
Problems with:
• Varying densities
• High-dimensional data
[Figure panels: Original Points; (MinPts=4, Eps=9.75); (MinPts=4, Eps=9.12)]

Assignment 3 Dataset: Earthquake

Assignment 3 Dataset: Complex9
[Figure panels: K-Means in Weka; DBSCAN in Weka]
Dataset:

DBSCAN: Determining Eps and MinPts
• Idea is that for points in a cluster, their kth nearest neighbors are at roughly the same distance
• Noise points have the kth nearest neighbor at a farther distance
• So, plot the sorted distance of every point to its kth nearest neighbor
[Figure: sorted k-distance plot separating core points from non-core points; run DBSCAN for MinPts = 4 and ε = 5]
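The values behind such a plot can be computed as follows. This is a brute-force sketch with an invented function name and sample points, assuming Euclidean distance; plotting is left out, but any plotting library could chart the returned list.

```python
import math

def sorted_k_distances(points, k):
    """Distance from every point to its kth nearest neighbor, sorted ascending."""
    k_dists = []
    for i, p in enumerate(points):
        # Distances to all other points (the point itself is excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        k_dists.append(dists[k - 1])
    return sorted(k_dists)

# Hypothetical data: four points in a tight square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (9.0, 9.0)]
k_dists = sorted_k_distances(points, k=3)
print([round(d, 3) for d in k_dists])
```

Points inside the cluster share nearly the same k-distance, while the isolated point sits far above them; the jump in the sorted curve is the usual candidate for Eps, with MinPts set from k.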