Tighter and Convex Maximum Margin Clustering
Yu-Feng Li (LAMDA, Nanjing University, China), Ivor W. Tsang (NTU, Singapore), James T. Kwok (HKUST, Hong Kong), Zhi-Hua Zhou (LAMDA, Nanjing University, China)
Summary Maximum Margin Clustering (MMC) [Xu et al., NIPS05] –Inspired by the success of the large margin criterion in SVMs –Achieves state-of-the-art performance in many clustering problems Problems with existing methods –SDP relaxation: global but not scalable –Local search: efficient but non-convex We propose LG-MMC, a convex method that also scales to large datasets via a label-generation strategy.
Outline Introduction The Proposed LG-MMC Method Experimental Results Conclusion
Maximum Margin Clustering [Xu et al., NIPS05] Perform clustering (i.e., determine the unknown label vector y) by simultaneously finding the maximum margin hyperplane in the data Setting –Given a set of unlabeled patterns Goal –Learn a decision function and a label vector –The objective combines a margin term and an error term, and the labels are subject to a balance constraint
Maximum Margin Clustering [Xu et al., NIPS05] The dual problem –Mixed integer program, intractable for large-scale datasets Key –Some kind of relaxation may be helpful
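For reference, one common way to write the MMC problem and its dual is sketched below. The notation ($x_i$, $\phi$, $w$, $\xi_i$, $C$, $\beta$, $\alpha$, $K$, $\mathcal{B}$, $\mathcal{A}$) is assumed here rather than taken from the slides, and the bias is assumed to be absorbed into the feature map so that the dual feasible set does not depend on $y$. The authors' exact formulation may differ.

```latex
% Primal: margin term + error term, with a balance constraint on y
\min_{y \in \mathcal{B}} \; \min_{w, \xi} \;
  \underbrace{\tfrac{1}{2}\|w\|^2}_{\text{margin}}
  + \underbrace{C \sum_{i=1}^{n} \xi_i}_{\text{error}}
\quad \text{s.t.} \quad y_i \, w^\top \phi(x_i) \ge 1 - \xi_i, \;\; \xi_i \ge 0,
\qquad
\mathcal{B} = \bigl\{ y \in \{\pm 1\}^n : -\beta \le \mathbf{1}^\top y \le \beta \bigr\}
\;\; \text{(balance constraint)}

% Dual of the inner SVM problem, with K_{ij} = \phi(x_i)^\top \phi(x_j)
\min_{y \in \mathcal{B}} \; \max_{\alpha \in \mathcal{A}} \;
  \mathbf{1}^\top \alpha - \tfrac{1}{2} \alpha^\top \bigl( K \circ y y^\top \bigr) \alpha,
\qquad
\mathcal{A} = \{ \alpha : 0 \le \alpha \le C \mathbf{1} \}
```

The outer minimization over the discrete set $\mathcal{B}$ is what makes the problem a mixed integer program.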
Related work MMC with SDP relaxation [Xu et al., NIPS05] –Convex, state-of-the-art performance –Expensive: worst-case O(n^6.5) Generalized MMC [Valizadegan & Jin, NIPS07] –A smaller SDP problem, which speeds up MMC by 100 times –Still expensive: cannot handle medium-sized datasets Some efficient algorithms [Zhang et al., ICML07][Zhao et al., SDM08] –Much more scalable than global methods –Non-convex: may get stuck in local minima Goal: investigate a convex method that is also scalable to large datasets
Intuition [Figure: data points with unknown labels (?); annotations: SVM, hard, efficient, combination] –Multiple label-kernel learning –yy': label-kernel
Flow Chart of LG-MMC LG-MMC: transforms the MMC problem into multiple label-kernel learning via minmax relaxation Cutting Plane Algorithm –Multiple label-kernel learning –Finding the most violated y LG-MMC achieves a tighter relaxation than the SDP relaxation [Xu et al., NIPS05]
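LG-MMC: Minmax relaxation of MMC problem –Consider interchanging the order of the minimization over y and the maximization over the SVM dual variables in the dual problem –According to the minmax theorem, the optimal objective of LG-MMC (max-min) is a lower bound on that of the MMC problem (min-max).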
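The effect of interchanging the two operators can be checked on a toy example. The sketch below uses a small made-up table of objective values, with rows standing for candidate labelings and columns for candidate dual variables. The real problem is continuous in the dual variables, so this only illustrates the direction of the inequality.

```python
def minmax(G):
    """Min over rows (labelings) of the max over columns (dual candidates)."""
    return min(max(row) for row in G)


def maxmin(G):
    """Max over columns (dual candidates) of the min over rows (labelings)."""
    n_cols = len(G[0])
    return max(min(row[j] for row in G) for j in range(n_cols))


# Toy objective values G[labeling][dual candidate]; the numbers are made up.
G = [[3.0, 1.0],
     [0.0, 2.0]]

original = minmax(G)  # operator order before the interchange
relaxed = maxmin(G)   # operator order after the interchange
print(relaxed, original)
```

On any such table the max-min value cannot exceed the min-max value, which is why the interchanged problem bounds the original one from below.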
LG-MMC: multiple label-kernel learning First, LG-MMC can be rewritten with one constraint per feasible labeling. For the inner optimization subproblem, introduce a dual variable for each constraint and form its Lagrangian.
LG-MMC: multiple label-kernel learning (cont.) Setting the derivative of the Lagrangian w.r.t. the auxiliary variable to zero forces the dual variables to sum to one, so they lie on the simplex. Replacing the inner subproblem with its dual gives a problem over a convex combination of label-kernels. Similar to single-label learning, this formulation can be regarded as multiple label-kernel learning.
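The derivation on these two slides can be sketched as follows. All symbols ($G$, $\alpha$, $\mathcal{A}$, $\mathcal{B}$, $\theta$, $\mu_t$, $\mathcal{M}$, $K$) are notation assumed here, not recovered from the slides, and $\mathcal{A}$ is assumed to be a convex compact set that does not depend on $y$.

```latex
\begin{aligned}
G(\alpha, y) &= \mathbf{1}^\top \alpha - \tfrac{1}{2} \alpha^\top (K \circ y y^\top) \alpha \\[4pt]
\max_{\alpha \in \mathcal{A}} \min_{y \in \mathcal{B}} G(\alpha, y)
  &= \max_{\alpha \in \mathcal{A}} \; \max_{\theta} \; \theta
     \quad \text{s.t.} \quad \theta \le G(\alpha, y_t) \;\; \forall y_t \in \mathcal{B} \\[4pt]
L(\theta, \mu) &= \theta + \sum_{t} \mu_t \bigl( G(\alpha, y_t) - \theta \bigr), \qquad \mu_t \ge 0 \\[4pt]
\frac{\partial L}{\partial \theta} = 0 \;&\Rightarrow\; \sum_t \mu_t = 1,
  \qquad \mathcal{M} = \Bigl\{ \mu : \mu \ge 0, \; \sum_t \mu_t = 1 \Bigr\} \\[4pt]
\max_{\alpha \in \mathcal{A}} \min_{\mu \in \mathcal{M}} \sum_t \mu_t \, G(\alpha, y_t)
  &= \min_{\mu \in \mathcal{M}} \max_{\alpha \in \mathcal{A}} \;
     \mathbf{1}^\top \alpha - \tfrac{1}{2} \alpha^\top
     \Bigl( \sum_t \mu_t \, K \circ y_t y_t^\top \Bigr) \alpha
\end{aligned}
```

The last step swaps max and min, which is valid here because the objective is concave in $\alpha$ and linear in $\mu$ over convex compact sets. The result has the form of multiple kernel learning with base kernels $K \circ y_t y_t^\top$, one per feasible labeling.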
Cutting Plane Algorithm Problem: exponential number of possible label assignments –The set of base kernels is also exponential in size –Direct multiple kernel learning (MKL) is computationally intractable Observation –Only a subset of these constraints is active at optimality –This motivates a cutting-plane method
Cutting Plane Algorithm 1. Initialize: find the most violated y and set the working set to {y, −y} (the working set is the current subset of constraints). 2. Run MKL for the subset of kernel matrices selected in the working set. 3. Find the most violated y and add it to the working set. 4. Repeat steps 2-3 until convergence. How are steps 2 and 3 carried out?
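The four steps above can be sketched as a generic loop. This is only a skeleton under stated assumptions: `find_most_violated` and `solve_mkl` are hypothetical callables supplied by the caller, and stopping when no new labeling is produced is an assumed convergence test, not necessarily the criterion used by the authors.

```python
def cutting_plane(find_most_violated, solve_mkl, max_iter=50):
    """Generic label-generation loop.

    find_most_violated(model) -> sequence of +1/-1 labels (model is None at start)
    solve_mkl(working_set)    -> any model object
    Both callables are hypothetical placeholders, not part of any library.
    """
    # Step 1: initialise the working set with a violated labeling and its negation.
    y = tuple(find_most_violated(None))
    working_set = [y, tuple(-v for v in y)]
    model = None
    for _ in range(max_iter):
        # Step 2: multiple kernel learning restricted to the working set.
        model = solve_mkl(working_set)
        # Step 3: look for a violated labeling under the current model.
        y = tuple(find_most_violated(model))
        # Assumed stopping rule: no new labeling was generated.
        if y in working_set:
            break
        working_set.append(y)
    return model, working_set
```

Passing the two subproblem solvers as arguments keeps the loop independent of how step 2 and step 3 are implemented.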
Cutting Plane Algorithm Step 2: Multiple Label-Kernel Learning –Suppose that the current working set is given –Each labeling in the working set defines a base kernel matrix with its own feature map SimpleMKL 1. Fix the kernel weights and solve the SVM's dual 2. Fix the SVM dual variables and use a gradient method to update the kernel weights 3. Iterate until convergence
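To make the base kernels concrete, here is a minimal sketch of how a label-kernel and a weighted combination of label-kernels could be built from a kernel matrix. The function names and the toy matrix are illustrative assumptions, and the SVM and weight-update steps of SimpleMKL are not shown.

```python
def label_kernel(K, y):
    """Element-wise product of K with the label-kernel y y^T."""
    n = len(y)
    return [[K[i][j] * y[i] * y[j] for j in range(n)] for i in range(n)]


def combined_kernel(K, labelings, mu):
    """Convex combination sum_t mu[t] * (K o y_t y_t^T) over the working set."""
    if len(labelings) != len(mu):
        raise ValueError("need one weight per labeling")
    if any(m < 0 for m in mu) or abs(sum(mu) - 1.0) > 1e-9:
        raise ValueError("weights must lie on the simplex")
    n = len(K)
    out = [[0.0] * n for _ in range(n)]
    for m, y in zip(mu, labelings):
        Ky = label_kernel(K, y)
        for i in range(n):
            for j in range(n):
                out[i][j] += m * Ky[i][j]
    return out


# Toy 2x2 kernel matrix; the values are made up.
K = [[2.0, 1.0], [1.0, 2.0]]
print(label_kernel(K, [1, -1]))
print(combined_kernel(K, [[1, -1], [1, 1]], [0.5, 0.5]))
```

Note that y and −y give the same label-kernel, since only the products y_i y_j enter.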
Cutting Plane Algorithm Step 3: Finding the most violated y –Problem: finding the most violated y is a concave QP –Observation: the cutting plane algorithm only requires the addition of a violated constraint at each iteration –Replace the L2 norm in this problem with the infinity-norm
Cutting Plane Algorithm Step 3: Finding the most violated y Each of the resulting subproblems is linear in y: –Sort the coefficients –Assign labels in sorted order subject to the balance constraint
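A sketch of the sorting idea, under assumptions: each subproblem is taken to be "maximise sum_i c_i y_i over y in {−1,+1}^n with |sum_i y_i| <= beta", and the infinity-norm surrogate is taken over explicit feature columns of a matrix X (i.e., a linear kernel). The names `best_balanced_labels`, `most_violated_y`, `alpha`, `X` and `beta` are illustrative, not from the slides.

```python
from math import ceil, floor


def best_balanced_labels(c, beta):
    """Maximise sum(c[i] * y[i]) over y in {-1,+1}^n with |sum(y)| <= beta."""
    n = len(c)
    # With p positive labels, sum(y) = 2p - n, so p must lie in [lo, hi].
    lo = max(0, ceil((n - beta) / 2))
    hi = min(n, floor((n + beta) / 2))
    if lo > hi:
        raise ValueError("no labeling satisfies the balance constraint")
    order = sorted(range(n), key=lambda i: c[i], reverse=True)
    # Unconstrained optimum labels every positive coefficient +1;
    # the objective is concave in p, so clamping p to [lo, hi] is optimal.
    p = min(max(sum(1 for v in c if v > 0), lo), hi)
    y = [-1] * n
    for i in order[:p]:
        y[i] = 1
    return y


def most_violated_y(alpha, X, beta):
    """Infinity-norm surrogate: try every feature column with both signs."""
    best_y, best_val = None, float("-inf")
    for j in range(len(X[0])):
        for sign in (1.0, -1.0):
            c = [sign * a * row[j] for a, row in zip(alpha, X)]
            y = best_balanced_labels(c, beta)
            val = sum(ci * yi for ci, yi in zip(c, y))
            if val > best_val:
                best_y, best_val = y, val
    return best_y, best_val
```

Each call costs one sort, so the search over labelings is replaced by a number of sorts proportional to the number of features.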
LG-MMC achieves tighter relaxation Consider the set of all feasible label matrices and two relaxations of it: its convex hull and the SDP relaxation
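LG-MMC achieves tighter relaxation (cont.) Viewing each problem as an optimization over label matrices, one can find that –Maximum margin clustering optimizes over the set of feasible label matrices –The LG-MMC problem optimizes over the convex hull of that set –The SDP-based MMC problem optimizes over the SDP relaxation of that set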
LG-MMC achieves tighter relaxation (cont.) The set used by LG-MMC is the convex hull of the set of feasible label matrices, i.e., the smallest convex set containing it. –LG-MMC gives the tightest convex relaxation. It can be shown that the SDP relaxation set is more relaxed than this convex hull. –SDP MMC is a looser relaxation than the proposed formulation.
Experiments Data sets –17 UCI datasets –MNIST dataset Implementation –Matlab 7.6 Evaluation –Misclassification error
Compared Methods k-means –One of the most mature baseline methods Normalized Cut [Shi & Malik, PAMI00] –The first spectral-based clustering method GMMC [Valizadegan & Jin, NIPS07] –One of the most efficient global methods for MMC IterSVR [Zhang et al., ICML07] –An efficient algorithm for MMC CPMMC [Zhao et al., SDM08] –Another state-of-the-art efficient method for MMC
Clustering Error
Win-tie-loss Global methods vs local methods –Global methods are better than local methods: win/tie/loss of global methods against local methods is 15/2/2. LG-MMC vs GMMC –LG-MMC is competitive with GMMC: win/tie/loss of LG-MMC against GMMC is 7/0/3.
Speed LG-MMC is about 10 times faster than GMMC. However, in general, local methods are faster than global methods.
Conclusion Main contribution –We propose a scalable and global optimization method for maximum margin clustering –To the best of our knowledge, this is the first use of a label-generation strategy for clustering, which might be useful in other domains Future work –We will extend the proposed approach to semi-supervised learning. Thank you