Max-margin Clustering: Detecting Margins from Projections of Points on Lines
Raghuraman Gopalan¹ and Jagan Sankaranarayanan²
¹Center for Automation Research, University of Maryland, College Park, MD, USA
²NEC Labs, Cupertino, CA, USA
E-mail: {raghuram,jagan}@umiacs.umd.edu

Problem Statement
Given an unlabelled set of points forming k clusters, find a grouping with the maximum separating margin among the clusters.
Prior work: (mostly) establishes feedback between different label proposals and runs a supervised classifier on them.
Goal: to understand the relation between data points and margin regions by analyzing projections of the data on lines.

Two-cluster Problem
Assumptions
  Linearly separable clusters (kernel trick for the non-linear case)
  No outliers in the data (the max margin exists only between clusters)
  Enforce global cluster balance
Proposition 1
  SI* exists ONLY on line segments in the margin region that are perpendicular to the separating hyperplane.
  Such line segments directly provide the cluster groupings.

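Proposition 1 says that a suitable line segment crossing the margin directly yields the two groups. The sketch below illustrates that idea only: it assumes the line (through hypothetical points `a` and `b`) is already given, and it uses the widest empty gap between consecutive projections as a simple stand-in for the SI* statistic, which this poster does not spell out. The function names are illustrative, not the authors'.

```python
import math
from typing import List, Sequence, Tuple


def project_positions(points: Sequence[Sequence[float]],
                      a: Sequence[float],
                      b: Sequence[float]) -> List[float]:
    """Signed position of each point's projection on the line through a and b.

    Positions are measured from a, in length units, along the direction a -> b.
    """
    direction = [bk - ak for ak, bk in zip(a, b)]
    length = math.sqrt(sum(c * c for c in direction))
    if length == 0.0:
        raise ValueError("a and b must be distinct points")
    unit = [c / length for c in direction]
    return [sum((pk - ak) * uk for pk, ak, uk in zip(p, a, unit)) for p in points]


def split_at_largest_gap(points: Sequence[Sequence[float]],
                         a: Sequence[float],
                         b: Sequence[float]) -> Tuple[List[int], float]:
    """Split points into two groups at the widest empty gap of their projections.

    Assumption (not from the poster): the widest gap marks the margin region.
    Returns (labels, gap_width).
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    t = project_positions(points, a, b)
    order = sorted(range(len(t)), key=t.__getitem__)
    gaps = [t[order[k + 1]] - t[order[k]] for k in range(len(order) - 1)]
    k_best = max(range(len(gaps)), key=gaps.__getitem__)
    # Cut halfway across the widest gap.
    cut = 0.5 * (t[order[k_best]] + t[order[k_best + 1]])
    labels = [0 if tk < cut else 1 for tk in t]
    return labels, gaps[k_best]


points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
labels, gap = split_at_largest_gap(points, a=(0, 0), b=(10, 0))
```

Here the line runs along the x-axis, which is perpendicular to a separating hyperplane between the two toy groups, so the cut falls inside the margin. On a poorly chosen line the widest gap need not correspond to a margin at all.
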
Multi-cluster Problem
SI* doesn’t exist
Location information of the projected points (SI) alone is insufficient to detect margins

The Role of Distance of Projection
Proposition 2: For line intervals in the margin region, perpendicular to the separating hyperplane,
Proposition 3: For line intervals inside a cluster of length more than Mm,
Proposition 4: An interval with SI having no projected points with distance of projection less than Dmin* can lie only outside a cluster; where
[Figure labels: γ1, γ2, γ3; CL1, CL2, CL3]
Defn: Dmin of a line interval is the minimum distance of projection of the points in that interval.
No outlier assumption: Max margin between points within a cluster

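The definition of Dmin can be made concrete with a short sketch. It assumes Euclidean geometry, a line given by two hypothetical points `a` and `b`, and an interval expressed as a range of positions `[t_lo, t_hi]` along that line; the function names are illustrative and do not come from the poster.

```python
import math
from typing import List, Optional, Sequence, Tuple


def projection_stats(points: Sequence[Sequence[float]],
                     a: Sequence[float],
                     b: Sequence[float]) -> List[Tuple[float, float]]:
    """For each point return (t, dist).

    t    : position of the projection along the line, measured from a toward b
    dist : distance of projection (perpendicular distance from point to line)
    """
    direction = [bk - ak for ak, bk in zip(a, b)]
    length = math.sqrt(sum(c * c for c in direction))
    if length == 0.0:
        raise ValueError("a and b must be distinct points")
    unit = [c / length for c in direction]
    stats = []
    for p in points:
        v = [pk - ak for pk, ak in zip(p, a)]
        t = sum(vk * uk for vk, uk in zip(v, unit))
        # Pythagoras: |v|^2 = t^2 + dist^2; clamp tiny negatives from rounding.
        dist = math.sqrt(max(sum(vk * vk for vk in v) - t * t, 0.0))
        stats.append((t, dist))
    return stats


def d_min(points: Sequence[Sequence[float]],
          a: Sequence[float],
          b: Sequence[float],
          t_lo: float,
          t_hi: float) -> Optional[float]:
    """Minimum distance of projection among points projecting into [t_lo, t_hi].

    Returns None when no point projects into the interval.
    """
    dists = [dist for t, dist in projection_stats(points, a, b) if t_lo <= t <= t_hi]
    return min(dists) if dists else None


points = [(1, 2), (2, 0.5), (3, 4), (8, 1)]
near = d_min(points, a=(0, 0), b=(10, 0), t_lo=0, t_hi=5)
far = d_min(points, a=(0, 0), b=(10, 0), t_lo=5, t_hi=10)
empty = d_min(points, a=(0, 0), b=(10, 0), t_lo=4, t_hi=6)
```

Returning `None` for an interval with no projected points keeps the empty case distinct from a genuine Dmin of zero.
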
A Pair-wise Similarity Measure for Clustering
f(xi,xj) = 1 iff xi = xj
f(xi,xj) << 1 iff xi and xj are from different clusters and Intij is perpendicular to their separating hyperplane

Max-margin Clustering Algorithm
1. Draw lines between all pairs of points.
2. Estimate the probability of the presence of margins between each pair of points xi and xj by computing f(xi,xj).
3. Perform global clustering using f between all point-pairs.

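The three steps above can be sketched end to end as follows. This is a rough illustration, not the authors' method: the poster does not give the form of f, so the code substitutes a hypothetical similarity `exp(-gap / sigma)`, where `gap` is the widest empty stretch among projections on the segment between the two points. The global clustering step is likewise replaced by a simple stand-in (thresholding f and taking connected components). The parameters `sigma`, `radius`, and `threshold` are assumptions introduced for the example.

```python
import math
from itertools import combinations
from typing import List, Sequence


def largest_gap_on_segment(points: Sequence[Sequence[float]], i: int, j: int,
                           radius: float = math.inf) -> float:
    """Widest empty stretch of the segment points[i] -> points[j].

    Only points whose distance of projection is at most `radius` are counted.
    """
    a, b = points[i], points[j]
    direction = [bk - ak for ak, bk in zip(a, b)]
    length = math.sqrt(sum(c * c for c in direction))
    if length == 0.0:
        return 0.0  # identical points: no gap at all
    unit = [c / length for c in direction]
    positions = [0.0, length]  # the two endpoints themselves
    for k, p in enumerate(points):
        if k in (i, j):
            continue
        v = [pk - ak for pk, ak in zip(p, a)]
        t = sum(vk * uk for vk, uk in zip(v, unit))
        dist = math.sqrt(max(sum(vk * vk for vk in v) - t * t, 0.0))
        if 0.0 <= t <= length and dist <= radius:
            positions.append(t)
    positions.sort()
    return max(q - p for p, q in zip(positions, positions[1:]))


def similarity(points: Sequence[Sequence[float]], i: int, j: int,
               sigma: float = 1.0, radius: float = math.inf) -> float:
    """Hypothetical stand-in for f: 1 for identical points, small across wide gaps."""
    return math.exp(-largest_gap_on_segment(points, i, j, radius) / sigma)


def cluster(points: Sequence[Sequence[float]], threshold: float = 0.1,
            sigma: float = 1.0, radius: float = math.inf) -> List[int]:
    """Link pairs with similarity above `threshold`; return component labels."""
    n = len(points)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Steps 1 and 2: visit every pair and score it.
    for i, j in combinations(range(n), 2):
        if similarity(points, i, j, sigma, radius) > threshold:
            parent[find(j)] = find(i)
    # Step 3 (stand-in): label connected components in order of first appearance.
    roots: dict = {}
    return [roots.setdefault(find(k), len(roots)) for k in range(n)]


pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 0), (11, 0), (10, 1), (11, 1)]
labels = cluster(pts)
```

The loop over all pairs projects every point once per pair, so the sketch costs O(n³) time and is meant for small illustrative inputs only.
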
Results
[Figure panels: 3D, 2D]

Summary
Clustering by detecting margin regions
Obtaining statistics of the location and distance of projection of points that are specific to line segments in margin regions (Prop. 1 to 4)
A pair-wise similarity measure to perform clustering, which avoids some of the optimization-related challenges prevalent in most existing methods