Published by Arthur Bernard Nicholson. Modified over 8 years ago.
1
Intelligent Database Systems Lab N.Y.U.S.T. I. M.
A Cluster Validity Measure With Outlier Detection for Support Vector Clustering
Presenter: Lin, Shu-Han
Authors: Jeen-Shing Wang, Jen-Chieh Chiang
IEEE Transactions on Systems, Man, and Cybernetics (2008)
2
Outline
Introduction of SVC
Motivation
Objectives
Methodology
Experiments
Conclusions
Comments
3
SVC
SVC is derived from support vector machines (SVMs). SVMs are a supervised classification technique offering fast convergence, good generalization performance, and robustness to noise. SVC is an unsupervised approach:
1. Data points are mapped to a high-dimensional feature space using a Gaussian kernel.
2. Find the smallest sphere that encloses the data.
3. Map the sphere back to data space, where it forms a set of contours.
4. The contours are treated as the cluster boundaries.
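Steps 3 and 4 are commonly implemented with the segment test of the original SVC algorithm (Ben-Hur et al.), which these slides do not show: two points belong to the same cluster if every sampled point on the segment between them maps inside the sphere, and clusters are the connected components of that adjacency graph. The sketch below assumes a caller-supplied `dist2(y)` (squared feature-space distance of y to the sphere center) and squared radius `r2`; `label_clusters` and `n_samples` are hypothetical names. A point that maps outside the sphere (a BSV) fails the test at its own endpoint, so it ends up as a singleton cluster.

```python
def label_clusters(X, dist2, r2, n_samples=10):
    """Label points by connected components of the segment-adjacency graph.

    x_i and x_j are adjacent when every sampled point y on the segment [x_i, x_j]
    satisfies dist2(y) <= r2, i.e. stays inside the sphere's contour.
    """
    n = len(X)

    def adjacent(a, b):
        for s in range(n_samples + 1):
            t = s / n_samples
            y = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
            if dist2(y) > r2 + 1e-12:  # small tolerance for round-off
                return False
        return True

    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over unlabeled points adjacent to this component
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and adjacent(X[u], X[v]):
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels
```

In a full SVC run, `dist2` would be the kernel expression for R²(x) and `r2` the squared sphere radius; any function with that meaning can be plugged in for experimentation.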
4
SVC - Sphere Analysis
To find the minimal enclosing sphere with a soft margin, a constrained minimization problem is set up and solved through its Lagrangian function.
[Figure: enclosing sphere with center a]
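The slide's equations did not survive extraction. For reference, the standard soft-margin SVC formulation (Ben-Hur et al.), which the slide appears to follow, is sketched below; the slack variables ξ_j and the Lagrange multipliers β_j and μ_j are taken from that formulation rather than recovered from the slide. The last line gives the stationarity conditions obtained by setting the derivatives of L to zero.

```latex
\min_{R,\,\mathbf{a},\,\boldsymbol{\xi}} \; R^2 + C\sum_{j=1}^{N}\xi_j
\quad\text{s.t.}\quad \|\Phi(\mathbf{x}_j)-\mathbf{a}\|^2 \le R^2+\xi_j,\qquad \xi_j \ge 0

L = R^2 - \sum_{j}\bigl(R^2+\xi_j-\|\Phi(\mathbf{x}_j)-\mathbf{a}\|^2\bigr)\beta_j
      - \sum_{j}\xi_j\mu_j + C\sum_{j}\xi_j,\qquad \beta_j \ge 0,\ \mu_j \ge 0

\frac{\partial L}{\partial R}=0 \Rightarrow \sum_j \beta_j = 1,\qquad
\frac{\partial L}{\partial \mathbf{a}}=0 \Rightarrow \mathbf{a}=\sum_j \beta_j\Phi(\mathbf{x}_j),\qquad
\frac{\partial L}{\partial \xi_j}=0 \Rightarrow \beta_j = C-\mu_j
```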
5
SVC - Sphere Analysis
6
SVC - Sphere Analysis
Karush-Kuhn-Tucker (KKT) complementarity conditions.
[Figure: bounded SV (BSV) = outlier]
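Under the standard SVC formulation (assumed here, since the slide's equations are missing), the KKT complementarity conditions are ξ_jμ_j = 0 and (R² + ξ_j − ‖Φ(x_j) − a‖²)β_j = 0. Together with β_j = C − μ_j, they sort the points by their multiplier: β_j = 0 lies inside the sphere, 0 < β_j < C lies on its surface (an SV), and β_j = C may lie outside it with ξ_j > 0 (a bounded SV, the outlier named on the slide). A minimal sketch; the function name and tolerance are hypothetical:

```python
def categorize_points(beta, C, tol=1e-8):
    """Label each point from its Lagrange multiplier beta_j via the KKT conditions.

    beta_j == 0      -> "interior": inside the sphere
    0 < beta_j < C   -> "SV": support vector on the sphere surface
    beta_j == C      -> "BSV": bounded support vector, treated as an outlier
    """
    labels = []
    for b in beta:
        if b <= tol:
            labels.append("interior")
        elif b >= C - tol:
            labels.append("BSV")
        else:
            labels.append("SV")
    return labels


print(categorize_points([0.0, 0.3, 0.5, 0.2], C=0.5))
# ['interior', 'SV', 'BSV', 'SV']
```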
7
SVC - Sphere Analysis
To find the minimal enclosing sphere with a soft margin, the Wolfe dual optimization problem is solved. The soft-margin constant C controls how many outliers are allowed to exist.
[Figure: enclosing sphere with center a]
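Substituting the stationarity conditions into the Lagrangian gives the Wolfe dual of the standard formulation: maximize W = Σ_j K(x_j, x_j)β_j − Σ_{i,j} β_iβ_j K(x_i, x_j) subject to 0 ≤ β_j ≤ C and Σ_j β_j = 1. The box constraint is where C enters: a feasible solution needs C ≥ 1/N, and with C = 1 the bound β_j ≤ C effectively never binds, so no bounded SVs (outliers) arise. The sketch below solves this dual with pairwise SMO-style updates, a common choice for this kind of quadratic program that is not necessarily the authors' solver; `solve_svc_dual`, `tol`, and `max_iter` are hypothetical names, and the pure-Python loops suit small examples only.

```python
import math


def gaussian_kernel(x, y, q):
    """Gaussian kernel K(x, y) = exp(-q * ||x - y||^2)."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve_svc_dual(X, q, C, tol=1e-6, max_iter=10000):
    """Maximize W(beta) = sum_j K_jj beta_j - sum_ij beta_i beta_j K_ij
    subject to 0 <= beta_j <= C and sum_j beta_j = 1 (SMO-style pair updates)."""
    n = len(X)
    if C * n < 1:
        raise ValueError("infeasible: C must be at least 1/N")
    K = [[gaussian_kernel(X[i], X[j], q) for j in range(n)] for i in range(n)]
    beta = [1.0 / n] * n  # feasible start, since 1/n <= C
    for _ in range(max_iter):
        # Gradient of f = -W (which we minimize): g_k = 2 (K beta)_k - K_kk
        g = [2 * sum(K[k][m] * beta[m] for m in range(n)) - K[k][k] for k in range(n)]
        up = [k for k in range(n) if beta[k] < C - 1e-12]   # may still increase
        down = [k for k in range(n) if beta[k] > 1e-12]     # may still decrease
        if not up or not down:
            break
        i = min(up, key=lambda k: g[k])
        j = max(down, key=lambda k: g[k])
        if g[j] - g[i] < tol:
            break  # KKT conditions hold within tolerance
        # Shift t units of mass from j to i: exact minimizer of the 1-D quadratic
        curv = 2 * (K[i][i] + K[j][j] - 2 * K[i][j])
        t = (g[j] - g[i]) / curv if curv > 1e-12 else math.inf
        t = min(t, C - beta[i], beta[j])  # keep both multipliers inside [0, C]
        beta[i] += t
        beta[j] -= t
    return beta
```

Each update moves mass between two multipliers, so Σ_j β_j = 1 is preserved throughout; for four corners of a unit square plus its center (q = 1, C = 1), the corners share the mass equally and the interior center point gets β ≈ 0.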
8
SVC - Sphere Analysis
The distance between the image of x and the sphere center a is computed through a Mercer kernel, here a Gaussian function. The Gaussian width parameter q controls the number of clusters and the smoothness/tightness of the cluster boundaries.
[Figure: enclosing sphere with center a]
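Expanding ‖Φ(x) − a‖² with a = Σ_j β_jΦ(x_j) gives the usual kernel expression R²(x) = K(x, x) − 2Σ_j β_j K(x_j, x) + Σ_{i,j} β_iβ_j K(x_i, x_j), here with the Gaussian kernel K(x, y) = exp(−q‖x − y‖²) of standard SVC; the sphere radius equals R(x) at any non-bounded SV. A sketch under those assumptions (function names are hypothetical):

```python
import math


def gaussian_kernel(x, y, q):
    """Gaussian kernel K(x, y) = exp(-q * ||x - y||^2)."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


def feature_space_dist2(x, X, beta, q):
    """R^2(x) = ||Phi(x) - a||^2 with center a = sum_j beta_j Phi(x_j)."""
    k_xx = gaussian_kernel(x, x, q)  # always 1 for the Gaussian kernel
    cross = sum(b * gaussian_kernel(xj, x, q) for b, xj in zip(beta, X))
    quad = sum(bi * bj * gaussian_kernel(xi, xj, q)
               for bi, xi in zip(beta, X) for bj, xj in zip(beta, X))
    return k_xx - 2 * cross + quad


def sphere_radius2(X, beta, C, q, tol=1e-8):
    """Squared radius: R^2(x_j) evaluated at any non-bounded SV (0 < beta_j < C)."""
    for xj, b in zip(X, beta):
        if tol < b < C - tol:
            return feature_space_dist2(xj, X, beta, q)
    raise ValueError("no non-bounded support vector found")
```

With two points 2 units apart and β = (0.5, 0.5), the midpoint maps outside the sphere at q = 1 (two clusters) but inside it at q = 0.1 (one cluster), which illustrates how a larger q yields more, tighter clusters.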
9
Motivation
Traditional cluster validity measures, such as the partition coefficient (PC) and separation measures, are based on fuzzy membership grades and cluster centroids. The SVC algorithm, however, generates arbitrarily shaped cluster boundaries and provides no fuzzy membership grades. Which clustering is better?
10
Objectives
Determine the optimal number of clusters by means of a cluster validity measure, an outlier-detection algorithm, and a cluster-merging mechanism.
[Figure panels: outlier detection; cluster merging]
11
Methodology - Overview
Cluster validity measure for the SVC algorithm; outlier detection; cluster-merging mechanism.
[Figure: SVC with C = 1, where no outliers are allowed]
12
Methodology – Cluster Validity Measure for the SVC Algorithm
Compactness (intra-cluster) and separation (inter-cluster) are combined into a ratio that serves as the cluster validity measure for SVC; the best clustering minimizes this ratio.
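The slide names the ingredients (intra-cluster compactness, inter-cluster separation, a ratio to be minimized) but not the formulas. The sketch below uses a hypothetical stand-in, not the authors' definition: compactness is the mean nearest-neighbour distance inside each cluster, and separation is the smallest distance between points of different clusters. Both avoid centroids, since SVC clusters can have arbitrary shapes. `validity_ratio` is a hypothetical name.

```python
import math
from itertools import combinations


def validity_ratio(X, labels):
    """Hypothetical compactness/separation ratio for a labelled clustering (smaller is better)."""
    clusters = {}
    for x, c in zip(X, labels):
        clusters.setdefault(c, []).append(x)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Compactness: mean distance from each point to its nearest neighbour in the same cluster
    # (singleton clusters have no such neighbour and are skipped).
    nn = []
    for pts in clusters.values():
        for i, p in enumerate(pts):
            others = [math.dist(p, o) for j, o in enumerate(pts) if j != i]
            if others:
                nn.append(min(others))
    compactness = sum(nn) / len(nn) if nn else 0.0
    # Separation: smallest distance between two points from different clusters.
    separation = min(math.dist(p, o)
                     for a, b in combinations(clusters.values(), 2)
                     for p in a for o in b)
    return compactness / separation if separation > 0 else math.inf
```

On four points forming two vertical pairs 10 units apart, the natural labelling scores 0.1, while pairing points across the gap scores 10, so minimizing the ratio prefers the natural clustering.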
13
Methodology – Outlier Detection
In SVC, outliers (BSVs) are the data points in boundary regions.
[Figure: SVC results for q = 1, 1.8, 2, and 4 with C = 0.02; a singleton cluster is marked]
14
Methodology – Outlier Detection
C: with C = 1 the resulting clusters are smooth, but not desirable.
BSVs (outliers): all outliers are SVs, and some outliers lie far away from the other data in their clusters.
SVs: too many SVs make the boundaries fit the data too tightly.
q: increasing q makes the clusters more compact.
Singleton clusters: an important criterion for detecting outliers.
15
Methodology – Outlier Detection
Outlier existence criterion: the number of singleton clusters cannot exceed a threshold γ (suggested γ = 2).
Desirable cluster criterion: the percentage of data points that are SVs cannot exceed a threshold (suggested 50%).
C is adjusted recursively until both criteria are satisfied.
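Both criteria reduce to simple counts once SVC has produced cluster labels and support vectors. A minimal sketch; the function names are hypothetical, and reading "cannot exceed" as a strict comparison (more than γ singletons means outliers exist) is an assumption:

```python
def outlier_criterion_holds(cluster_sizes, gamma=2):
    """True when outliers are deemed present: more than gamma singleton clusters.
    (The strict '>' comparison is an assumed reading of the criterion.)"""
    singletons = sum(1 for size in cluster_sizes if size == 1)
    return singletons > gamma


def sv_criterion_holds(n_sv, n_points, max_fraction=0.5):
    """True when support vectors make up less than max_fraction of the data."""
    return n_sv < max_fraction * n_points


# Three singletons exceed gamma = 2, so outliers exist and C would be decreased.
print(outlier_criterion_holds([40, 35, 1, 1, 1]))  # True
print(sv_criterion_holds(n_sv=30, n_points=78))    # True (30 < 39)
```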
16
Methodology – Cluster-Merging Mechanism
Similarity is measured by the overlapping degree between clusters, computed with a Gaussian function.
[Figure: P_C = 0 for cluster C; P_A > 0 for cluster A]
17
Methodology – Cluster-Merging Mechanism
1) Agglomerative outlier/noise identification: For all c_i … 0, merge cluster i and cluster j. Otherwise, discard cluster i. Set K ← K − 1.
2) Compatible-cluster combination (similarity): Sort the remaining K clusters by size in ascending order, so that c_K = max_i c_i. For each i = 1, ..., K: set x ← m_i; for each j = i + 1, ..., K, compute p_j(x); find l = arg max_{i+1 ≤ j ≤ K} p_j(x), where arg max_a denotes the value of a at which the expression that follows is maximized. If p_l(x) > 0, merge cluster i with cluster l, set K ← K − 1, and repeat 2) until no further combination is possible.
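Step 2 can be sketched as follows, taking m_i to be the mean of cluster i (an assumption). The slides do not define p_j(x) beyond "overlapping degree" computed with a Gaussian function, so the overlap function is passed in as a parameter; the example `truncated_gaussian_overlap` (a Gaussian of the distance to the cluster mean, cut to exactly 0 beyond two standard deviations) is a hypothetical stand-in chosen so that non-overlapping clusters score 0, as in the P_C = 0 case. Step 1 is omitted because its condition is incomplete in the source.

```python
import math


def cluster_mean(points):
    """Mean vector m_i of a cluster given as a list of coordinate tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def truncated_gaussian_overlap(x, cluster, width=2.0, min_sigma=1e-3):
    """Hypothetical p_j(x): Gaussian of the distance from x to the cluster mean,
    forced to 0 beyond `width` standard deviations (RMS spread of the cluster)."""
    m = cluster_mean(cluster)
    sigma = math.sqrt(sum(math.dist(pt, m) ** 2 for pt in cluster) / len(cluster))
    sigma = max(sigma, min_sigma)
    d = math.dist(x, m)
    return math.exp(-d * d / (2 * sigma * sigma)) if d <= width * sigma else 0.0


def merge_compatible(clusters, p):
    """Step 2: merge cluster i into the later (not smaller) cluster that best overlaps m_i."""
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:                          # repeat 2) until no further combination
        merged = False
        clusters.sort(key=len)             # ascending size, so the last one is the largest
        K = len(clusters)
        for i in range(K - 1):
            x = cluster_mean(clusters[i])  # x <- m_i
            best, l = max((p(x, clusters[j]), j) for j in range(i + 1, K))
            if best > 0:                   # p_l(x) > 0: merge cluster i with cluster l
                clusters[l].extend(clusters[i])
                del clusters[i]            # K <- K - 1
                merged = True
                break                      # re-sort and restart step 2
    return clusters


large = [(0.0, 0.0), (4.0, 0.0), (2.0, 2.0), (2.0, -2.0)]
small = [(3.0, 0.0), (3.2, 0.0)]
far = [(20.0, 0.0), (21.0, 0.0), (22.0, 0.0)]
result = merge_compatible([large, small, far], truncated_gaussian_overlap)
print(sorted(len(c) for c in result))  # [3, 6]
```

The small cluster sits inside the large one's spread and is absorbed, while the distant cluster scores exactly 0 and stays separate.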
18
Methodology – Summary
1) Initialize q to a small value, and set C = 1 and γ = 2.
2) Run the SVC algorithm and obtain the number of clusters.
3) If the number of clusters is less than 2, increase q and go to 2).
4) If the outlier-detection criterion holds, decrease C, keep q fixed, and go to 2). Otherwise, go to 5).
5) If the number of SVs is less than 50% of the data points, go to 6). Otherwise, decrease C and go to 2).
6) Compute the validity measure index V(m).
7) If the number of clusters is at most √N, increase q and go to 2). Otherwise, stop the SVC.
8) Use the cluster-merging mechanism to identify an ideal number of clusters, and output it.
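The control flow of steps 1 to 7 can be sketched as a loop. `run_svc(q, C)` (returning cluster labels and the number of SVs) and `validity(labels)` are hypothetical hooks standing in for SVC and the validity measure; the multiplicative step sizes for q and C, the iteration cap, and keeping the result with the smallest V are assumptions. Step 7 is read with √N as an upper bound on the cluster count: q keeps growing while the count stays at or below √N, and the search stops once it exceeds √N. Step 8 (merging) is left out.

```python
import math


def select_parameters(run_svc, validity, n_points, q0=0.1, q_step=1.5, c_step=0.5,
                      gamma=2, max_sv_fraction=0.5, max_rounds=200):
    """Search q and C following steps 1-7; return (V, q, C, labels) with the smallest V.

    run_svc(q, C) -> (labels, n_sv) and validity(labels) -> float are hypothetical hooks.
    """
    q, C = q0, 1.0                                   # step 1
    best = None
    for _ in range(max_rounds):                      # safety cap (assumption)
        labels, n_sv = run_svc(q, C)                 # step 2
        sizes = {}
        for lab in labels:
            sizes[lab] = sizes.get(lab, 0) + 1
        n_clusters = len(sizes)
        if n_clusters < 2:                           # step 3: too few clusters
            q *= q_step
            continue
        singletons = sum(1 for s in sizes.values() if s == 1)
        if singletons > gamma:                       # step 4: outliers present
            C *= c_step
            continue
        if n_sv >= max_sv_fraction * n_points:       # step 5: too many SVs
            C *= c_step
            continue
        V = validity(labels)                         # step 6
        if best is None or V < best[0]:
            best = (V, q, C, labels)
        if n_clusters <= math.sqrt(n_points):        # step 7, sqrt(N) as upper bound
            q *= q_step
        else:
            break
    return best
```

With a stub SVC whose cluster count is int(q) and whose SV count drops once C is halved, the search lowers C once, then scans q upward and keeps the three-cluster result when the stub validity is |k − 3|.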
19
Experiments - Benchmark and Artificial Examples
Bensaid data set
20
Experiments - Benchmark and Artificial Examples
Five-cluster data set, and five-cluster data set with noise
21
Experiments - Benchmark and Artificial Examples
Five-cluster data set with noise, after cluster merging
22
Experiments - Benchmark and Artificial Examples
Crescent data set
23
Experiments - IRIS Data Set
[Figure: misclassification]
24
Conclusions
This paper integrates into SVC a cluster validity measure, outlier detection, and a cluster-merging mechanism. The approach automatically determines suitable values for the kernel parameter and the soft-margin constant, producing compact, smooth, arbitrarily shaped cluster contours and increasing robustness to outliers and noise.
25
Comments
Advantage: provides a cluster validity index for a clustering method.
Drawback: …
Application: SVC