Slide 1: Tighter and Convex Maximum Margin Clustering
Yu-Feng Li (LAMDA, Nanjing University, China) (liyf@lamda.nju.edu.cn)
Ivor W. Tsang (NTU, Singapore) (IvorTsang@ntu.edu.sg)
James T. Kwok (HKUST, Hong Kong) (jamesk@cse.ust.hk)
Zhi-Hua Zhou (LAMDA, Nanjing University, China) (zhouzh@lamda.nju.edu.cn)
Slide 2: Summary
Maximum Margin Clustering (MMC) [Xu et al., NIPS05]
– inspired by the success of the large-margin criterion in SVMs
– achieves state-of-the-art performance on many clustering problems
The problem with existing methods:
– SDP relaxation: global, but not scalable
– local search: efficient, but non-convex
We propose a convex LG-MMC method that is also scalable to large datasets via a label-generation strategy.
Slide 3: Outline
Introduction
The Proposed LG-MMC Method
Experimental Results
Conclusion
Slide 4: Outline (current section: Introduction)
Slide 5: Maximum Margin Clustering [Xu et al., NIPS05]
Perform clustering (i.e., determine the unknown labels y) by simultaneously finding the maximum-margin hyperplane in the data.
Setting:
– given a set of unlabeled patterns x_1, ..., x_n
Goal:
– learn a decision function and a label vector y; a class-balance constraint prevents trivially assigning all points to one cluster, and margin errors are penalized as in the soft-margin SVM (see the reconstruction below)
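The formulation itself was an image in the original slide; the following LaTeX is a reconstruction of the standard MMC objective of Xu et al. (bias term omitted for simplicity), where C is the regularization parameter and the integer ℓ ≥ 0 bounds the class imbalance:

\[
\min_{y \in \{\pm1\}^n} \; \min_{w,\, \xi} \;\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^n \xi_i
\quad \text{s.t.} \quad y_i\, w^\top \phi(x_i) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \;\; \forall i, \qquad
-\ell \le \mathbf{1}^\top y \le \ell .
\]

The slack variables ξ_i are the margin errors, and the last constraint is the balance constraint.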
Slide 6: Maximum Margin Clustering [Xu et al., NIPS05]
The dual problem (reconstructed below) is a mixed integer program, intractable for large-scale datasets.
Key observation:
– some kind of relaxation may be helpful
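The dual was likewise an image; it can be reconstructed from the primal above. With y fixed, the inner problem is the standard SVM dual; minimizing over the discrete labels y makes the whole problem a mixed integer program. Here K is the kernel matrix, ∘ the elementwise product, A = {α : 0 ≤ α ≤ C1}, and B = {y ∈ {±1}^n : −ℓ ≤ 1'y ≤ ℓ}:

\[
\text{(MMC)} \qquad \min_{y \in \mathcal{B}} \; \max_{\alpha \in \mathcal{A}} \;\; \mathbf{1}^\top \alpha \;-\; \frac{1}{2}\, \alpha^\top \big( K \circ y y^\top \big)\, \alpha .
\]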
Slide 7: Related Work
MMC with SDP relaxation [Xu et al., NIPS05]
– convex, state-of-the-art performance
– expensive: worst-case complexity of O(n^6.5)
Generalized MMC (GMMC) [Valizadegan & Jin, NIPS07]
– a smaller SDP problem that speeds up MMC by a factor of about 100
– still expensive: cannot handle even medium-sized datasets
Efficient algorithms [Zhang et al., ICML07] [Zhao et al., SDM08]
– much more scalable than the global methods
– non-convex: may get stuck in local minima
Goal: a convex method that is also scalable to large datasets.
Slide 8: Outline (current section: The Proposed LG-MMC Method)
Slide 9: Intuition
[Figure: a set of unlabeled points ("?") and several candidate label assignments (+1/−1); each candidate labeling defines one SVM problem. Trying every labeling is hard; learning a combination of them is efficient.]
– multiple label-kernel learning
– yy' : the label-kernel
Slide 10: Flow Chart of LG-MMC
LG-MMC transforms the MMC problem into multiple label-kernel learning via a minmax relaxation.
Cutting-plane algorithm: alternate between
– multiple label-kernel learning on the current working set
– finding the most violated y
LG-MMC achieves a tighter relaxation than the SDP relaxation [Xu et al., NIPS05].
Slide 11: LG-MMC: Minmax Relaxation of the MMC Problem
Consider interchanging the order of the minimization over y and the maximization over α.
By the minimax inequality, the optimal objective of LG-MMC is upper-bounded by that of the original MMC problem, i.e., LG-MMC is a convex lower bound of MMC.
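Written out (reconstructed, using the notation of Slide 6), LG-MMC interchanges the two optimizations:

\[
\text{(LG-MMC)} \quad \max_{\alpha \in \mathcal{A}} \; \min_{y \in \mathcal{B}} \;\; \mathbf{1}^\top \alpha - \frac{1}{2}\, \alpha^\top \big( K \circ y y^\top \big)\, \alpha
\;\;\le\;\;
\min_{y \in \mathcal{B}} \; \max_{\alpha \in \mathcal{A}} \;\; \mathbf{1}^\top \alpha - \frac{1}{2}\, \alpha^\top \big( K \circ y y^\top \big)\, \alpha
\quad \text{(MMC)} ,
\]

where the inequality is exactly the minimax (weak duality) inequality.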
Slide 12: LG-MMC: Multiple Label-Kernel Learning
Firstly, LG-MMC can be rewritten in epigraph form: maximize θ over α ∈ A and θ, subject to θ ≤ 1'α − (1/2)α'(K ∘ y_t y_t')α for every y_t ∈ B.
For the inner optimization subproblem, let μ_t ≥ 0 be the dual variable for each constraint; its Lagrangian can then be obtained.
Slide 13: LG-MMC: Multiple Label-Kernel Learning (cont.)
Setting the derivative of the Lagrangian w.r.t. θ to zero, we have Σ_t μ_t = 1.
Let S = {μ : μ_t ≥ 0, Σ_t μ_t = 1} be the simplex.
Replacing the inner subproblem with its dual, one obtains the problem below.
By analogy with standard multiple kernel learning (where the single label vector is known), this formulation can be regarded as multiple label-kernel learning.
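The resulting problem (reconstructed from the derivation above) is:

\[
\max_{\alpha \in \mathcal{A}} \; \min_{\mu \in \mathcal{S}} \;\; \mathbf{1}^\top \alpha \;-\; \frac{1}{2}\, \alpha^\top \Big( \sum_{t:\, y_t \in \mathcal{B}} \mu_t \, K \circ y_t y_t^\top \Big)\, \alpha ,
\qquad
\mathcal{S} = \Big\{ \mu : \mu_t \ge 0, \; \sum_t \mu_t = 1 \Big\} .
\]

Each candidate labeling y_t contributes one base kernel K ∘ y_t y_t', and μ selects a convex combination of them, hence "multiple label-kernel learning".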
Slide 14: Cutting Plane Algorithm
Problem: there is an exponential number of possible label assignments.
– the set of base kernels is also exponential in size
– direct multiple kernel learning (MKL) is computationally intractable
Observation:
– only a subset of these constraints is active at optimality
– this suggests the cutting-plane method
Slide 15: Cutting Plane Algorithm
1. Initialize: find a violated y and set C = {y, −y} (C is the working set of constraints).
2. Run MKL for the subset of kernel matrices selected in C.
3. Find the most violated y and add it to C.
4. Repeat steps 2-3 until convergence.
A sketch of this loop is given below; how to carry out steps 2 and 3 is detailed on the next slides.
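A minimal Python sketch of the loop, assuming two helper callables that are placeholders for the steps above (not names from the authors' code): `solve_mkl` for step 2 (e.g., via SimpleMKL) and `find_violated_y` for step 3.

```python
import numpy as np

def lg_mmc_cutting_plane(K, C, ell, solve_mkl, find_violated_y,
                         tol=1e-3, max_iter=50):
    """Cutting-plane loop for LG-MMC (illustrative sketch).

    K               : (n, n) kernel matrix
    C, ell          : SVM regularization / class-balance parameters
    solve_mkl       : step 2, multiple label-kernel learning over the
                      working set; must return (alpha, mu, objective)
    find_violated_y : step 3, returns a violated labeling in {-1,+1}^n
    """
    n = K.shape[0]
    # Step 1: initialize the working set with one labeling and its negation
    y0 = find_violated_y(K, np.full(n, 1.0 / n), ell)  # uniform-alpha heuristic
    working_set = [y0, -y0]
    prev_obj = np.inf
    for _ in range(max_iter):
        # Step 2: MKL over the base kernels K * (y_t y_t'), y_t in working set
        base_kernels = [K * np.outer(y, y) for y in working_set]
        alpha, mu, obj = solve_mkl(base_kernels, C)
        # Step 3: generate a new label-kernel from a violated labeling
        working_set.append(find_violated_y(K, alpha, ell))
        # Step 4: stop once the MKL objective no longer improves
        if prev_obj - obj < tol:
            break
        prev_obj = obj
    return working_set, alpha, mu
```

Adding a constraint can only decrease the max-min objective, so the objective is non-increasing and the loop stops when the improvement falls below the tolerance.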
Slide 16: Cutting Plane Algorithm
Step 2: multiple label-kernel learning.
– suppose the current working set is C = {y_1, ..., y_T}
– each y_t contributes a base kernel matrix K ∘ y_t y_t' (its feature map is given below)
SimpleMKL:
1. Fix μ and solve the SVM dual for α.
2. Fix α and use a gradient step to update μ.
3. Iterate until convergence.
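The feature map was an image in the original slide, but it follows directly from the definition of the base kernels: if φ is the feature map of K, then the base kernel K ∘ y_t y_t' corresponds to

\[
\psi_t(x_i) = y_{t,i}\, \phi(x_i),
\qquad
\langle \psi_t(x_i), \psi_t(x_j) \rangle
= y_{t,i}\, y_{t,j}\, K_{ij}
= \big( K \circ y_t y_t^\top \big)_{ij} .
\]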
Slide 17: Cutting Plane Algorithm
Step 3: finding the most violated y.
Problem: this subproblem is a concave QP (maximizing a convex quadratic over the discrete label set), which is hard to solve exactly.
Observation:
– the cutting-plane algorithm only requires the addition of a violated constraint at each iteration, not necessarily the most violated one
– so the L2-norm in the objective can be replaced with the infinity-norm, which decomposes into easy subproblems (see below)
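Reconstructed from the slide's description (the formulas were images), and assuming a factorization K = V'V (e.g., Cholesky): with ⊙ the elementwise product, the most violated y solves

\[
\max_{y \in \mathcal{B}} \; \frac{1}{2}\, \alpha^\top \big( K \circ y y^\top \big)\, \alpha
= \max_{y \in \mathcal{B}} \; \frac{1}{2}\, \big\| V (\alpha \odot y) \big\|_2^2 ,
\]

and replacing the squared L2-norm by the infinity-norm gives

\[
\max_{y \in \mathcal{B}} \; \big\| V (\alpha \odot y) \big\|_\infty
= \max_{j} \; \max_{y \in \mathcal{B}} \; \big| v_j^\top (\alpha \odot y) \big| ,
\]

where v_j' is the j-th row of V: one easy subproblem per row (next slide).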
Slide 18: Cutting Plane Algorithm
Step 3: finding the most violated y (cont.)
Each of the resulting subproblems is of the form max over y of |c'y| for some fixed vector c, and is solved exactly by
– sorting the entries of c
– assigning labels subject to the balance constraint
A runnable sketch follows.
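A minimal runnable sketch of this sorting subroutine, under the assumptions of the previous slide (for the j-th subproblem, c_i = v_{j,i} α_i); the function name and interface are illustrative, not from the paper.

```python
import numpy as np

def max_abs_labeling(c, ell):
    """Maximize |c^T y| over y in {-1,+1}^n subject to |1^T y| <= ell.

    The feasible set is symmetric under y -> -y, so maximizing c^T y
    suffices.  For a fixed number p of +1 labels, c^T y is maximized by
    assigning +1 to the p largest entries of c; one sort plus a scan
    over the balance-feasible values of p therefore solves the problem.
    """
    c = np.asarray(c, dtype=float)
    n = len(c)
    order = np.argsort(-c)                       # indices by decreasing c_i
    prefix = np.concatenate(([0.0], np.cumsum(c[order])))
    total = prefix[-1]                           # sum of all c_i
    best_val, best_p = -np.inf, None
    for p in range(n + 1):                       # p = number of +1 labels
        if abs(2 * p - n) > ell:                 # balance: |1'y| = |2p - n|
            continue
        val = 2.0 * prefix[p] - total            # c'y with top-p entries at +1
        if val > best_val:
            best_val, best_p = val, p
    if best_p is None:
        raise ValueError("balance constraint is infeasible")
    y = -np.ones(n)
    y[order[:best_p]] = 1.0
    return y, abs(best_val)
```

Running it once per row of V and keeping the labeling with the largest value yields the violated y added in step 3.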
Slide 19: LG-MMC Achieves a Tighter Relaxation
Consider the set of all feasible label matrices yy' and two convex relaxations of it: its convex hull (used by LG-MMC) and the SDP feasible set.
Slide 20: LG-MMC Achieves a Tighter Relaxation (cont.)
Define three feasible sets of label matrices, M1, M2, and M3 (reconstructed below). One can find that:
– maximum margin clustering is the same as minimizing the common objective over M1
– the LG-MMC problem is the same as minimizing it over M2
– the SDP-based MMC problem is the same as minimizing it over M3
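The definitions were images in the original slide; a reconstruction consistent with the surrounding discussion is (M3 keeps only the convex constraints of the SDP relaxation of Xu et al.; their additional relaxed balance constraints are omitted here):

\[
\mathcal{M}_1 = \{\, y y^\top : y \in \mathcal{B} \,\}, \qquad
\mathcal{M}_2 = \operatorname{conv}(\mathcal{M}_1), \qquad
\mathcal{M}_3 = \{\, M \succeq 0 : \operatorname{diag}(M) = \mathbf{1} \,\} \supseteq \mathcal{M}_2 ,
\]

with each variant minimizing \(\max_{\alpha \in \mathcal{A}} \mathbf{1}^\top \alpha - \frac{1}{2} \alpha^\top (K \circ M)\, \alpha\) over its respective set of M. Since M1 ⊂ M2 ⊆ M3, the optimal objectives satisfy SDP-MMC ≤ LG-MMC ≤ MMC.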
Slide 21: LG-MMC Achieves a Tighter Relaxation (cont.)
M2 is the convex hull of M1, i.e., the smallest convex set containing M1.
– LG-MMC therefore gives the tightest convex relaxation of MMC.
It can be shown that M3 is more relaxed than M2.
– SDP-based MMC is thus a looser relaxation than the proposed formulation.
Slide 22: Outline (current section: Experimental Results)
Slide 23: Experiments
Datasets:
– 17 UCI datasets
– the MNIST dataset
Implementation: Matlab 7.6
Evaluation: misclassification error
Slide 24: Compared Methods
k-means
– one of the most mature baseline methods
Normalized Cut [Shi & Malik, PAMI00]
– a representative spectral clustering method
GMMC [Valizadegan & Jin, NIPS07]
– one of the most efficient global methods for MMC
IterSVR [Zhang et al., ICML07]
– an efficient algorithm for MMC
CPMMC [Zhao et al., SDM08]
– another state-of-the-art efficient method for MMC
Slide 25: Clustering Error
[Table of clustering errors on the UCI and MNIST datasets; the numbers are not recoverable from the transcript.]
Slide 26: Win/Tie/Loss Comparison
Global methods vs. local methods:
– the global methods are better than the local methods (win/tie/loss of the global methods: 15/2/2)
LG-MMC vs. GMMC:
– LG-MMC is competitive with GMMC (win/tie/loss of LG-MMC: 7/0/3)
Slide 27: Speed
LG-MMC is about 10 times faster than GMMC.
However, local methods are in general still faster than the global methods.
Slide 28: Outline (current section: Conclusion)
Slide 29: Conclusion
Main contributions:
– We propose a scalable and convex (global) optimization method for maximum margin clustering.
– To the best of our knowledge, this is the first use of a label-generation strategy for clustering; the strategy might also be useful in other domains.
Future work:
– We will extend the proposed approach to semi-supervised learning.
Thank you!