Published by Nora Jefferson. Modified over 9 years ago.
1
An automatic shape independent clustering technique
Author: Sanghamitra Bandyopadhyay
Pattern Recognition, Vol. 37, 2004, pp. 33-45.
Advisor: Dr. Hsu
Graduate: Sheng-Hsuan Wang
Intelligent Database Systems Lab, Department of Information Management, National Yunlin University of Science and Technology (國立雲林科技大學)
2
Outline
Motivation
Objective
Introduction
Graph theoretical clustering based on relative neighborhood
The proposed clustering method
Experimental results
Conclusions
Personal opinion
Review
3
Motivation
Most clustering techniques share two problems.
The number of clusters must be predefined.
They cannot identify clusters of arbitrary shape.
4
Objective
This paper proposes a clustering technique that can automatically detect any number of well-separated clusters.
It is built on the relative neighborhood graph.
It uses iterative partitioning.
It is coupled with a post-processing step for merging small clusters.
5
Introduction
Clustering.
The data set $X$ has $n$ points $\{x_1, x_2, \ldots, x_n\}$ divided into $K$ clusters $\{C_1, C_2, \ldots, C_K\}$.
6
Introduction
Minimum spanning tree (MST).
The concept of inconsistent edges.
The extension of MST-based methods.
The concept of the relative neighborhood of a finite planar set.
In this paper, the concept of relative neighborhood is used to design a clustering algorithm, which is called CLUSTER.
[Figure: example graph with vertices A, B, C, D, E and edge weights 1, 2, 1, 3]
7
Graph theoretical clustering based on relative neighborhood
Relative neighborhood graph (RNG).
$X = \{x_1, x_2, \ldots, x_n\}$.
Two points $x_i$ and $x_j$ are said to be relative neighbors if $d(x_i, x_j) \le \max[d(x_i, x_k), d(x_j, x_k)]$ for all $k \ne i, j$.
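The relative neighbor test can be illustrated with a small brute-force sketch. It assumes the usual definition of relative neighbors (no third point is strictly closer to both endpoints than they are to each other) and Euclidean distance; the function name `relative_neighborhood_graph` is hypothetical. This version costs $O(n^3)$ and is not the construction whose complexity is quoted later in the slides.

```python
import math
from itertools import combinations


def relative_neighborhood_graph(points):
    """Return the RNG edges as (i, j, distance) tuples with i < j."""
    n = len(points)
    # Pairwise Euclidean distances (assumption: d is the Euclidean metric).
    dist = [[math.dist(p, q) for q in points] for p in points]
    edges = []
    for i, j in combinations(range(n), 2):
        d_ij = dist[i][j]
        # The pair is rejected if some third point k is strictly closer
        # to both i and j than i and j are to each other.
        blocked = any(
            max(dist[i][k], dist[j][k]) < d_ij
            for k in range(n)
            if k != i and k != j
        )
        if not blocked:
            edges.append((i, j, d_ij))
    return edges
```

For three collinear points at 0, 1 and 2 on a line, the middle point blocks the long pair, so only the two short edges remain.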
8
Graph theoretical clustering based on relative neighborhood
Clustering based on limited neighborhood sets.
The region of influence of two points $x_i$ and $x_j$ is defined in the RNG.
Ref. [15] introduces an additional parameter, which is called the relative edge consistency.
9
Graph theoretical clustering based on relative neighborhood
Determine the connected components of the graph.
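Finding connected components after some edges have been dropped can be sketched with a union-find structure. This is only an illustration: the function names are hypothetical, edges are assumed to be `(i, j, weight)` tuples, and the rule "keep an edge when its weight is at most the threshold" is an assumption rather than something the slides state.

```python
def connected_components(n, edges):
    """Group vertices 0..n-1 into components given (i, j, weight) edges."""
    parent = list(range(n))

    def find(v):
        # Follow parent links to the root, shortening the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j, _ in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


def threshold_components(n, edges, thresh):
    """Drop edges longer than thresh (assumed rule), then find components."""
    kept = [edge for edge in edges if edge[2] <= thresh]
    return connected_components(n, kept)
```

On a four-vertex chain with one long middle edge, a threshold below that edge's weight yields two components, and a threshold at or above it yields one.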
10
The proposed clustering method
The algorithm, called CLUSTER, is based on successive thresholding of the RNG until a termination criterion is attained.
11
The proposed clustering method
RNG $= (X, E)$, where the vertices of the graph are the points in $X$ and $E$ is the set of edges in the RNG.
Let the weight of an edge $e_{ij}$ be equal to $d(x_i, x_j)$.
Let $m$ be the cardinality of $E$.
12
CLUSTER Algorithm
13
Some characteristics of the CLUSTER Algorithm
Termination conditions:
Inter-cluster relative neighbors are close to each other (Max < 2 Min in CLUSTER).
An appropriate thresh (>= 2 Min) is not found.
|Component| = 1.
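The three stopping tests can be restated as a small check on one component. This is a sketch under assumptions, not the algorithm from the paper: the function name, the string labels, and the convention that `thresh=None` means "no threshold was found" are all hypothetical, and how CLUSTER actually searches for thresh is not shown on the slides.

```python
def termination_reason(num_points, edge_weights, thresh=None):
    """Name the stopping condition met by a component, or None to keep splitting."""
    # |Component| = 1: a single point has no edges to threshold.
    if num_points == 1 or not edge_weights:
        return "single point"
    min_w, max_w = min(edge_weights), max(edge_weights)
    # Max < 2 Min: all edges are of comparable length.
    if max_w < 2 * min_w:
        return "max below twice min"
    # No candidate threshold of at least 2 Min was supplied.
    if thresh is None or thresh < 2 * min_w:
        return "no appropriate threshold"
    return None
```

The single-point test is checked first only so that `min` and `max` are never called on an empty list.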
14
Some characteristics of the CLUSTER Algorithm
An overfragmented condition will not arise.
Hierarchical clusters.
The number of clusters is equal to the number of components formed on termination of the algorithm.
Merge threshold.
Check the size of a component.
Edge length.
Outliers or noise.
15
Some characteristics of the CLUSTER Algorithm
The complexity of CLUSTER is $O(m \log m)$, where $m$ is the number of edges in the RNG.
Reduction of the complexity: discretization of the values between Min and Max, with the step defined as a fraction of Max - Min.
The complexity of constructing the RNG is $O(n^2)$.
The overall complexity of the clustering algorithm is $O(n^2 + m \log m)$, which reduces to $O(n^2)$.
16
Experimental results
Eight data sets of different characteristics.
Parameters:
Merge condition: clusters whose size is below 5% of the size of the data set.
Discretization fraction = 0.001: the range [Min, Max] is discretized into 1000 intervals.
[symbol missing] = 3.
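The discretization of [Min, Max] can be made concrete with a short sketch. The function and parameter names are hypothetical; the only facts taken from the slides are that the step is a fraction of Max - Min and that a fraction of 0.001 gives 1000 intervals.

```python
def candidate_thresholds(min_w, max_w, fraction=0.001):
    """Return evenly spaced threshold values covering [min_w, max_w]."""
    step = fraction * (max_w - min_w)
    if step <= 0:
        # Degenerate range: every edge has the same weight.
        return [min_w]
    intervals = round(1 / fraction)
    # intervals + 1 boundary values delimit the intervals.
    return [min_w + k * step for k in range(intervals + 1)]
```

With `fraction=0.001` the list holds 1001 boundary values, which delimit the 1000 intervals quoted above. A caller that also wants to respect the thresh >= 2 Min condition could filter this list before searching it.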
17
Normal: 2-dimensional, 3 classes, Gaussian distributions, N = 300.
18
RC1
19
ADS1
20
ADS2
21
Encircle
22
Concentric & Concentric_noisy
23
Conclusions
CLUSTER is based on an iterative partitioning of the relative neighborhood graph.
The number of clusters need not be predefined.
The cluster shape can be convex or non-convex.
It is able to identify an appropriate threshold value.
A post-processing step merges small clusters.
Outliers in the data are not merged.
It is able to provide a hierarchy of clusters.
24
Personal opinion
Adjusting the threshold according to the current state of the clusters is a good idea.
25
Review
Graph theoretical clustering, e.g., MST.
Relative neighborhood graph (RNG).
CLUSTER.
Hierarchical iterative partitioning (top-down).
Based on RNG.