1
Locally Constrained Support Vector Clustering
Dragomir Yankov, Eamonn Keogh, Kin Fai Kan
Computer Science & Eng. Dept., University of California, Riverside
2
Outline
On the need of improving the Support Vector Clustering (SVC) algorithm
  Motivation
  Problem formulation
Locally constrained SVC
  An overview of SVC
  Applying factor analysis for local outlier detection
  Regularizing the decision function of SVC
Experimental evaluation
3
Motivation for improving SVC
SVC transforms the data into a high-dimensional feature space, where a decision function is computed.
The support vectors define contours in the original space representing higher-density regions.
The method is theoretically sound and useful for detecting non-convex formations.
[Figure: original data; detected clusters]
4
Motivation for improving SVC (cont)
Parametrizing SVC incorrectly may either disguise some objectively present clusters, or produce multiple unintuitive clusters.
Correct parametrization is especially hard in the presence of noise (frequently encountered when learning from embedded manifolds).
[Figure: large kernel widths merge the clusters; small kernel widths produce multiple unintuitive clusters]
5
Problem formulation
How can we make Support Vector Clustering:
  Less susceptible to noise in the data
  More resilient to imprecise parametrization
6
Locally constrained SVC – one class classification
Support Vector density estimation
[Equations: primal formulation; dual formulation]
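The slide's equations did not survive in this transcript. For orientation only, the block below gives the widely used ν-parameterized one-class SVM formulation; it is an assumption that the slide used this form, and the symbols (ν, l, ξ, ρ, Φ, K, α) are the standard ones rather than ones confirmed by the source.

```latex
\begin{aligned}
\text{Primal:}\quad
  & \min_{w,\,\xi,\,\rho}\ \tfrac{1}{2}\lVert w\rVert^{2}
    + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i - \rho \\
  & \text{s.t.}\quad \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,
    \quad \xi_i \ge 0,\quad i = 1,\dots,l \\[4pt]
\text{Dual:}\quad
  & \min_{\alpha}\ \tfrac{1}{2}\sum_{i,j=1}^{l}\alpha_i\alpha_j K(x_i, x_j) \\
  & \text{s.t.}\quad 0 \le \alpha_i \le \frac{1}{\nu l},
    \quad \sum_{i=1}^{l}\alpha_i = 1 \\[4pt]
\text{Decision function:}\quad
  & f(x) = \sum_{i=1}^{l}\alpha_i K(x_i, x) - \rho
\end{aligned}
```

Here l is the number of examples and ν in (0, 1] bounds the fraction of examples allowed to fall outside the contours. Points with α_i at the upper bound are the bounded support vectors.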
7
Locally constrained SVC – labeling the closed contours
Support Vector Clustering – decision function
Labeling the individual classes: build an affinity matrix and find the connected components
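The labeling step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a decision function that is non-negative inside the contours, a Gaussian kernel of the form exp(-q * ||x - y||^2), and the common heuristic of sampling points along the segment between two examples. The names `make_decision`, `same_cluster`, `label_clusters`, and `n_samples` are hypothetical.

```python
import math


def make_decision(support_vectors, alphas, rho, q):
    """Assumed form: f(x) = sum_i alpha_i * exp(-q * ||x - sv_i||^2) - rho."""
    def f(x):
        total = 0.0
        for sv, a in zip(support_vectors, alphas):
            sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, sv))
            total += a * math.exp(-q * sq_dist)
        return total - rho
    return f


def same_cluster(a, b, decision, n_samples=10):
    """True if every sampled point on the segment a-b lies inside a contour."""
    for k in range(n_samples + 1):
        t = k / n_samples
        p = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        if decision(p) < 0:
            return False
    return True


def label_clusters(points, decision, n_samples=10):
    """Connected components of the affinity graph, via union-find."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if same_cluster(points[i], points[j], decision, n_samples):
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy usage: two well-separated kernel bumps give two components.
decision = make_decision([(0.0, 0.0), (10.0, 0.0)], [0.5, 0.5], rho=0.1, q=1.0)
points = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0), (10.0, 0.5)]
labels = label_clusters(points, decision)
```

In this sketch a point that itself lies outside every contour fails the segment test at its own endpoint, so it ends up as a singleton component. The pairwise check is quadratic in the number of points, which is fine for illustration but not for large datasets.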
8
Locally constrained SVC – detecting local outliers
Factor analysis
Mixture of factor analyzers (MFA)
We can adapt MFA to pinpoint local outliers.
Points like P1 and P2 that deviate a lot from the FA are among the true outliers.
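One way to quantify how far a point deviates from a single factor analyzer is its Mahalanobis distance under the covariance the analyzer implies. The sketch below assumes the textbook factor-analysis covariance, C = Λ Λᵀ + diag(ψ); the loadings, noise variances, and function names are illustrative and not taken from the slides, and fitting the mixture itself is not shown.

```python
def fa_covariance(loadings, noise_var):
    """Covariance implied by a factor analyzer: C = L L^T + diag(psi).

    loadings: d rows of k factor loadings; noise_var: d noise variances.
    """
    d = len(loadings)
    cov = [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += noise_var[i]
    return cov


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T C^{-1} (x - mean)), without forming C^{-1}."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, diff)
    return sum(di * zi for di, zi in zip(diff, z)) ** 0.5


# Toy usage: one factor along the direction (1, 1) with small isotropic noise.
cov = fa_covariance([[1.0], [1.0]], [0.1, 0.1])
on_axis = mahalanobis([1.0, 1.0], [0.0, 0.0], cov)    # along the factor
off_axis = mahalanobis([1.0, -1.0], [0.0, 0.0], cov)  # across the factor
```

Both toy points are equally far from the mean in Euclidean terms, but the point lying across the factor direction gets a much larger Mahalanobis distance, which is the sense in which it deviates from the local model.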
9
Locally constrained SVC – regularizing the decision function
To compute the local deviation of each point we use its Mahalanobis distance with respect to the corresponding FA.
[Equations: new primal formulation (weighting the slack variables); new dual formulation]
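The new formulations themselves are missing from this transcript. As a hedged sketch of what weighting the slack variables can look like, the block below adds a per-point weight ω_i to the standard ν-one-class SVM. The weights, and the choice to make ω_i smaller for points with a larger Mahalanobis distance, are assumptions for illustration; the authors' exact formulas may differ.

```latex
\begin{aligned}
\text{Weighted primal:}\quad
  & \min_{w,\,\xi,\,\rho}\ \tfrac{1}{2}\lVert w\rVert^{2}
    + \frac{1}{\nu l}\sum_{i=1}^{l}\omega_i\,\xi_i - \rho \\
  & \text{s.t.}\quad \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,
    \quad \xi_i \ge 0,\quad i = 1,\dots,l \\[4pt]
\text{Weighted dual:}\quad
  & \min_{\alpha}\ \tfrac{1}{2}\sum_{i,j=1}^{l}\alpha_i\alpha_j K(x_i, x_j) \\
  & \text{s.t.}\quad 0 \le \alpha_i \le \frac{\omega_i}{\nu l},
    \quad \sum_{i=1}^{l}\alpha_i = 1
\end{aligned}
```

The only change in the dual is the per-point upper bound ω_i / (νl), which follows from setting the derivative of the Lagrangian with respect to ξ_i to zero. A small ω_i makes it cheap for point i to violate the margin, so local outliers tend to end up as bounded support vectors outside the contours. The problem is feasible only if the upper bounds sum to at least 1.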
10
Locally constrained SVC – discussion
Difference between SVC and LSVC: tuning the parameters cannot achieve the same result.
[Figure, panels SVC and LSVC] SVC tries to accommodate all outliers, building complex boundaries.
Left: SVC tries to accommodate all examples, building complex contours and incorrectly bridging the two concentric clusters.
Right: LSVC, the method proposed here, detects most outliers. The contours shrink towards the truly dense regions and the two main clusters are separated correctly.
[Figure, both panels SVC] Small kernel width detects the outliers but produces multiple unintuitive clusters.
Left: SVC for = 8 and = 0.4. Many outliers are now correctly identified, but the rest of the points are split into multiple uninformative clusters.
Right: SVC for = 9 and = 0.1. Increasing also cannot achieve the LSVC effect. The contours become very tight and complex and start splitting into multiple clusters.
11
Experimental evaluation – synthetic data
Gaussian with radial Gaussian distributions
LSVC: Good parameter values for LSVC are detected automatically. The right clusters are detected.
SVC: SVC is harder to parametrize. The detected clusters are incorrect.
12
Experimental evaluation – synthetic data
Swiss roll data with added Gaussian noise
LSVC: Most of the noise is identified as bounded SVs by LSVC. The correct clusters are detected.
SVC: SVC tends to merge the two large clusters. With supervision the clusters are eventually identified.
13
Experimental evaluation – face images
Frey face dataset
LSVC: LSVC discriminates the two objectively interesting manifolds embedding the data.
SVC: Even with supervision we could not find parameters that separate the two major manifolds with SVC.
14
Experimental evaluation – shape clustering
Arrowheads dataset
Some of the classes are similar. There are multiple elements bridging their shape manifolds.
LSVC achieves 73% accuracy vs 60% for SVC.
[Figure panels: LSVC; SVC]
15
Conclusion
The LSVC method combines both a global and a local view of the data.
  It computes a decision function that defines a global measure of density support.
  MFA complements this with a local view based on the individual analyzers.
The algorithm improves significantly on the stability of SVC in the presence of noise.
LSVC allows for easier automatic parameterization of one-class SVMs.
16
All datasets and the code for LSVC can be obtained by writing to the first author.
THANK YOU!