Chameleon: A hierarchical Clustering Algorithm Using Dynamic Modeling By George Karypis, Eui-Hong Han,Vipin Kumar and not by Prashant Thiruvengadachari.

Slides:



Advertisements
Similar presentations
SEEM Tutorial 4 – Clustering. 2 What is Cluster Analysis?  Finding groups of objects such that the objects in a group will be similar (or.
Advertisements

Clustering for web documents 1 박흠. Clustering for web documents 2 Contents Cluto Criterion Functions for Document Clustering* Experiments and Analysis.
Clustering.
Cluster Analysis: Basic Concepts and Algorithms
1 CSE 980: Data Mining Lecture 16: Hierarchical Clustering.
Hierarchical Clustering. Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree-like diagram that.
Hierarchical Clustering, DBSCAN The EM Algorithm
Learning Trajectory Patterns by Clustering: Comparative Evaluation Group D.
Clustering Categorical Data The Case of Quran Verses
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Midterm topics Chapter 2 Data Data preprocessing Measures of similarity/dissimilarity Chapter.
Data Mining Cluster Analysis: Basic Concepts and Algorithms
2001/12/18CHAMELEON1 CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling Paper presentation in data mining class Presenter : 許明壽 ; 蘇建仲.
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Clustering CS 685: Special Topics in Data Mining Spring 2008 Jinze Liu.
More on Clustering Hierarchical Clustering to be discussed in Clustering Part2 DBSCAN will be used in programming project.
Machine Learning and Data Mining Clustering
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
Nearest Neighbor. Predicting Bankruptcy Nearest Neighbor Remember all your data When someone asks a question –Find the nearest old data point –Return.
4. Clustering Methods Concepts Partitional (k-Means, k-Medoids)
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Image Segmentation Chapter 14, David A. Forsyth and Jean Ponce, “Computer Vision: A Modern Approach”.
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
© University of Minnesota Data Mining CSCI 8980 (Fall 2002) 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center.
Clustering Color/Intensity
Cluster Analysis.
Cluster Analysis: Basic Concepts and Algorithms
What is Cluster Analysis?
Cluster Analysis CS240B Lecture notes based on those by © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004.
On the Task Assignment Problem : Two New Efficient Heuristic Algorithms.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Clustering Ram Akella Lecture 6 February 23, & 280I University of California Berkeley Silicon Valley Center/SC.
Birch: An efficient data clustering method for very large databases
DATA MINING LECTURE 8 Clustering The k-means algorithm
Clustering Unsupervised learning Generating “classes”
CHAMELEON : A Hierarchical Clustering Algorithm Using Dynamic Modeling
Cluster Analysis Part II. Learning Objectives Hierarchical Methods Density-Based Methods Grid-Based Methods Model-Based Clustering Methods Outlier Analysis.
9/03Data Mining – Clustering G Dong (WSU) 1 4. Clustering Methods Concepts Partitional (k-Means, k-Medoids) Hierarchical (Agglomerative & Divisive, COBWEB)
1 CSE 980: Data Mining Lecture 17: Density-based and Other Clustering Algorithms.
CSE 185 Introduction to Computer Vision Pattern Recognition 2.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Clustering COMP Research Seminar BCB 713 Module Spring 2011 Wei Wang.
K-Means Algorithm Each cluster is represented by the mean value of the objects in the cluster Input: set of objects (n), no of clusters (k) Output:
CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling Author:George et al. Advisor:Dr. Hsu Graduate:ZenJohn Huang IDSL seminar 2001/10/23.
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
CS 8751 ML & KDDData Clustering1 Clustering Unsupervised learning Generating “classes” Distance/similarity measures Agglomerative methods Divisive methods.
Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING.
Ch. Eick: Introduction to Hierarchical Clustering and DBSCAN 1 Remaining Lectures in Advanced Clustering and Outlier Detection 2.Advanced Classification.
Clustering Instructor: Max Welling ICS 178 Machine Learning & Data Mining.
Computational Biology Clustering Parts taken from Introduction to Data Mining by Tan, Steinbach, Kumar Lecture Slides Week 9.
Machine Learning Queens College Lecture 7: Clustering.
Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Cluster Analysis This lecture node is modified based on Lecture Notes for Chapter.
Cluster Analysis Dr. Bernard Chen Assistant Professor Department of Computer Science University of Central Arkansas.
Spectral Clustering Shannon Quinn (with thanks to William Cohen of Carnegie Mellon University, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford.
DATA MINING: CLUSTER ANALYSIS Instructor: Dr. Chun Yu School of Statistics Jiangxi University of Finance and Economics Fall 2015.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Clustering [Idea only, Chapter 10.1, 10.2, 10.4].
Data Mining: Basic Cluster Analysis
Data Mining K-means Algorithm
CS 685: Special Topics in Data Mining Jinze Liu
Topic 3: Cluster Analysis
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
The University of Adelaide, School of Computer Science
DATA MINING Introductory and Advanced Topics Part II - Clustering
Clustering The process of grouping samples so that the samples are similar within each group.
SEEM4630 Tutorial 3 – Clustering.
Hierarchical Clustering
CS 685: Special Topics in Data Mining Jinze Liu
Presentation transcript:

Chameleon: A hierarchical Clustering Algorithm Using Dynamic Modeling By George Karypis, Eui-Hong Han,Vipin Kumar and not by Prashant Thiruvengadachari

Existing Algorithms K-means and PAM  Algorithm assigns K-representational points to the clusters and tries to form clusters based on the distance measure.

More algorithms Other algorithm include CURE, ROCK, CLARANS, etc. CURE takes into account distance between representatives ROCK takes into account inter-cluster aggregate connectivity.

Chameleon Two-phase approach Phase -I  Uses a graph partitioning algorithm to divide the data set into a set of individual clusters. Phase -II  uses an agglomerative hierarchical mining algorithm to merge the clusters.

So, basically..

Why not stop with Phase-I? We've got the clusters, haven't we ? Chameleon(Phase-II) takes into account Inter Connetivity Relative closeness  Hence, chameleon takes into account features intrinsic to a cluster.

Constructing a sparse graph Using KNN  Data points that are far away are completely avoided by the algorithm (reducing the noise in the dataset)‏  captures the concept of neighbourhood dynamically by taking into account the density of the region.

What do you do with the graph ? Partition the KNN graph such that the edge cut is minimized.  Reason: Since edge cut represents similarity between the points, less edge cut => less similarity. Multi-level graph partitioning algorithms to partition the graph – hMeTiS library.

Example:

Cluster Similarity Models cluster similarity based on the relative inter-connectivity and relative closeness of the clusters.

Relative Inter-Connectivity Ci and Cj  RIC= AbsoluteIC(Ci,Cj)‏ internal IC(Ci)+internal IC(Cj) / 2  where AbsoluteIC(Ci,Cj)= sum of weights of edges that connect Ci with Cj.  internalIC(Ci) = weighted sum of edges that partition the cluster into roughly equal parts.

Relative Closeness Absolute closeness normalized with respect to the internal closeness of the two clusters. Absolute closeness got by average similarity between the points in Ci that are connected to the points in Cj. average weight of the edges from C(i)->C(j).

Internal Closeness…. Internal closeness of the cluster got by average of the weights of the edges in the cluster.

So, which clusters do we merge? So far, we have got  Relative Inter-Connectivity measure.  Relative Closeness measure. Using them,

If the relative inter-connectivity measure relative closeness measure are same, choose inter-connectivity. You can also use,  RI (Ci, C j )≥T(RI) and RC(C i,C j ) ≥ T(RC)‏ Allows multiple clusters to merge at each level. Merging the clusters..

Good points about the paper : Nice description of the working of the system. Gives a note of existing algorithms and as to why chameleon is better. Not specific to a particular domain.

yucky and reasonably yucky parts.. Not much information given about the Phase- I part of the paper – graph properties ? Finding the complexity of the algorithm O(nm + n log n + m^2log m)‏ Different domains require different measures for connectivity and closeness,

Questions ?