Data Mining Cluster Analysis: Advanced Concepts and Algorithms

Slides:



Advertisements
Similar presentations
Clustering.
Advertisements

Clustering (2). Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram –A tree like.
Cluster Analysis: Basic Concepts and Algorithms
1 CSE 980: Data Mining Lecture 16: Hierarchical Clustering.
Hierarchical Clustering, DBSCAN The EM Algorithm
O(N 1.5 ) divide-and-conquer technique for Minimum Spanning Tree problem Step 1: Divide the graph into  N sub-graph by clustering. Step 2: Solve each.
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Anomaly Detection Lecture Notes for Chapter 10 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Midterm topics Chapter 2 Data Data preprocessing Measures of similarity/dissimilarity Chapter.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Agglomerative Hierarchical Clustering 1. Compute a distance matrix 2. Merge the two closest clusters 3. Update the distance matrix 4. Repeat Step 2 until.
2001/12/18CHAMELEON1 CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling Paper presentation in data mining class Presenter : 許明壽 ; 蘇建仲.
Clustering Prof. Navneet Goyal BITS, Pilani
More on Clustering Hierarchical Clustering to be discussed in Clustering Part2 DBSCAN will be used in programming project.
Author: Jie chen and Yousef Saad IEEE transactions of knowledge and data engineering.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
Lecture Notes for Chapter 2 Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Data Mining Cluster Analysis: Basic Concepts and Algorithms
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Cluster Analysis: Advanced Concepts and Algorithms Figures for Chapter 9 Introduction.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Cluster Analysis: Basic Concepts and Algorithms Figures for Chapter 8 Introduction.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Anomaly Detection Figures for Chapter 10 Introduction to Data Mining by Tan,
© University of Minnesota Data Mining CSCI 8980 (Fall 2002) 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center.
SCAN: A Structural Clustering Algorithm for Networks
Cluster Analysis.
Data Mining Anomaly Detection Lecture Notes for Chapter 10 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to.
Cluster Analysis: Basic Concepts and Algorithms
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Classification: Alternative Techniques Figures for Chapter 5 Introduction to.
Data Mining Cluster Analysis: Basic Concepts and Algorithms
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining: Exploring Data Figures for Chapter 3 Introduction to Data Mining by Tan, Steinbach,
Clustering An overview of clustering algorithms Dènis de Keijzer GIA 2004.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Association Analysis: Advanced Concepts Figures for Chapter 7 Introduction to.
Clustering Unsupervised learning Generating “classes”
CHAMELEON : A Hierarchical Clustering Algorithm Using Dynamic Modeling
Cluster Analysis Part II. Learning Objectives Hierarchical Methods Density-Based Methods Grid-Based Methods Model-Based Clustering Methods Outlier Analysis.
Clustering CIS 601 Fall 2004 Longin Jan Latecki Lecture slides taken/modified from: Jiawei Han ( ) Vipin.
Partitional and Hierarchical Based clustering Lecture 22 Based on Slides of Dr. Ikle & chapter 8 of Tan, Steinbach, Kumar.
1 CSE 980: Data Mining Lecture 17: Density-based and Other Clustering Algorithms.
Clustering Spatial Data Using Random Walk David Harel and Yehuda Koren KDD 2001.
Topic9: Density-based Clustering
CSE5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining Anomaly Detection Lecture Notes for Chapter 10 Introduction to Data Mining by.
Data Mining Anomaly Detection Lecture Notes for Chapter 10 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to.
Clustering.
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Anomaly Detection Lecture Notes for Chapter 10 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to.
Ch. Eick: Introduction to Hierarchical Clustering and DBSCAN 1 Remaining Lectures in Advanced Clustering and Outlier Detection 2.Advanced Classification.
Computational Biology Clustering Parts taken from Introduction to Data Mining by Tan, Steinbach, Kumar Lecture Slides Week 9.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach,
Cluster Analysis: Basic Concepts and Algorithms. What is Cluster Analysis?  Finding groups of objects such that the objects in a group will be similar.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Cluster Analysis This lecture node is modified based on Lecture Notes for Chapter.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining: Basic Cluster Analysis
More on Clustering in COSC 4335
CSE 4705 Artificial Intelligence
Clustering 28/03/2016 A diák alatti jegyzetszöveget írta: Balogh Tamás Péter.
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
Critical Issues with Respect to Clustering
Clustering 23/03/2016 A diák alatti jegyzetszöveget írta: Balogh Tamás Péter.
CSE572, CBS572: Data Mining by H. Liu
CSE572: Data Mining by H. Liu
Data Pre-processing Lecture Notes for Chapter 2
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Presentation transcript:

Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

Graph-Based Clustering Graph-Based clustering uses the proximity graph Start with the proximity matrix Consider each point as a node in a graph Each edge between two nodes has a weight which is the proximity between the two points Initially the proximity graph is fully connected MIN (single-link) and MAX (complete-link) can be viewed as starting with this graph In the simplest case, clusters are connected components in the graph.

Graph-Based Clustering: Sparsification The amount of data that needs to be processed is drastically reduced Sparsification can eliminate more than 99% of the entries in a proximity matrix The amount of time required to cluster the data is drastically reduced The size of the problems that can be handled is increased

Graph-Based Clustering: Sparsification … Clustering may work better Sparsification techniques keep the connections to the most similar (nearest) neighbors of a point while breaking the connections to less similar points. The nearest neighbors of a point tend to belong to the same class as the point itself. This reduces the impact of noise and outliers and sharpens the distinction between clusters. Sparsification facilitates the use of graph partitioning algorithms (or algorithms based on graph partitioning algorithms. Chameleon and Hypergraph-based Clustering

Sparsification in the Clustering Process

Shared Near Neighbor Approach SNN graph: the weight of an edge is the number of shared neighbors between vertices given that the vertices are connected i j i j 4

Creating the SNN Graph Sparse Graph Shared Near Neighbor Graph Link weights are similarities between neighboring points Shared Near Neighbor Graph Link weights are number of Shared Nearest Neighbors

SNN Clustering Algorithm Compute the similarity matrix This corresponds to a similarity graph with data points for nodes and edges whose weights are the similarities between data points Sparsify the similarity matrix by keeping only the k most similar neighbors This corresponds to only keeping the k strongest links of the similarity graph Construct the shared nearest neighbor graph from the sparsified similarity matrix. At this point, we could apply a similarity threshold and find the connected components to obtain the clusters (Jarvis-Patrick algorithm) Find the SNN density of each Point. Using a user specified parameters, Eps, find the number points that have an SNN similarity of Eps or greater to each point. This is the SNN density of the point

SNN Density a) All Points b) High SNN Density c) Medium SNN Density d) Low SNN Density

SNN Clustering Can Handle Differing Densities Original Points

SNN Clustering Can Handle Other Difficult Situations

Features and Limitations of SNN Clustering Complexity of SNN Clustering is high O( n * time to find numbers of neighbor within Eps) In worst case, this is O(n2) For lower dimensions, there are more efficient ways to find the nearest neighbors R* Tree k-d Trees