
1 http://lamda.nju.edu.cn Tighter and Convex Maximum Margin Clustering Yu-Feng Li (LAMDA, Nanjing University, China) (liyf@lamda.nju.edu.cn) Ivor W. Tsang (NTU, Singapore) (IvorTsang@ntu.edu.sg) James T. Kwok (HKUST, Hong Kong) (jamesk@cse.ust.hk) Zhi-Hua Zhou (LAMDA, Nanjing University, China) (zhouzh@lamda.nju.edu.cn)

2 http://lamda.nju.edu.cn Summary Maximum Margin Clustering (MMC) [Xu et al., NIPS'05] – inspired by the success of the large margin criterion in SVMs – achieves state-of-the-art performance on many clustering problems. The problem with existing methods – SDP relaxation: global but not scalable – local search: efficient but non-convex. We propose LG-MMC, a convex method that also scales to large datasets via a label-generation strategy.

3 http://lamda.nju.edu.cn Outline Introduction The Proposed LG-MMC Method Experimental Results Conclusion

4 http://lamda.nju.edu.cn Outline Introduction The Proposed LG-MMC Method Experimental Results Conclusion

5 http://lamda.nju.edu.cn Maximum Margin Clustering [Xu et al., NIPS'05] Perform clustering (i.e., determine the unknown labels y) by simultaneously finding the maximum margin hyperplane in the data. Setting – given a set of unlabeled patterns $\{x_i\}_{i=1}^n$. Goal – learn a decision function $f(x) = w^\top\varphi(x) + b$ and a label vector $y \in \{\pm 1\}^n$:
$\min_{y}\; \min_{w,b,\xi}\; \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^n \xi_i$
s.t. $y_i(w^\top\varphi(x_i) + b) \ge 1 - \xi_i,\; \xi_i \ge 0$ (margin and error),
$-\ell \le \sum_{i=1}^n y_i \le \ell$ (balance constraint, which rules out the trivial solution of assigning all patterns to one cluster).

6 http://lamda.nju.edu.cn Maximum Margin Clustering [Xu et al., NIPS'05] The dual problem:
$\min_{y \in \mathcal{B}}\; \max_{\alpha}\; \mathbf{1}^\top\alpha - \tfrac{1}{2}\alpha^\top (K \odot yy^\top)\alpha$  s.t. $0 \le \alpha \le C\mathbf{1}$,
where $\mathcal{B} = \{y \in \{\pm 1\}^n : -\ell \le \mathbf{1}^\top y \le \ell\}$ and $\odot$ is the elementwise product. This is a mixed integer program, intractable for large-scale datasets. Key – some kind of relaxation may be helpful.
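For a single fixed labeling y the inner maximization is an ordinary bias-free SVM dual over the box $0 \le \alpha \le C$, so it is easy; the intractability comes entirely from the outer search over exponentially many y. A minimal numpy sketch of evaluating the dual for one candidate y (projected gradient ascent is used only to keep the sketch self-contained; the paper does not prescribe this solver, and the function names are illustrative):

```python
import numpy as np

def solve_svm_dual(Q, C=1.0, n_steps=500):
    """max_a 1'a - 0.5 a'Qa  s.t. 0 <= a <= C (bias-free SVM dual),
    solved by projected gradient ascent -- a stand-in for a real QP solver."""
    a = np.zeros(Q.shape[0])
    lr = 1.0 / (np.linalg.norm(Q, 2) + 1e-12)        # step size from a Lipschitz bound
    for _ in range(n_steps):
        a = np.clip(a + lr * (1.0 - Q @ a), 0.0, C)  # ascent step, then box projection
    return a, a.sum() - 0.5 * a @ Q @ a

def mmc_objective_for_labeling(K, y, C=1.0):
    """Dual objective of MMC for one candidate labeling y (the easy part);
    MMC itself must minimize this over all balanced y (the hard part)."""
    _, obj = solve_svm_dual(K * np.outer(y, y), C)   # label-kernel K o yy'
    return obj
```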

7 http://lamda.nju.edu.cn Related work MMC with SDP relaxation [Xu et al., NIPS'05] – convex, state-of-the-art performance – expensive: worst-case O(n^6.5) time. Generalized MMC [Valizadegan & Jin, NIPS'07] – a smaller SDP problem which speeds up MMC by about 100 times – still expensive: cannot handle medium-sized datasets. Some efficient algorithms [Zhang et al., ICML'07] [Zhao et al., SDM'08] – much more scalable than the global methods – non-convex: may get stuck in local minima. Goal: to investigate a convex method that is also scalable to large datasets.

8 http://lamda.nju.edu.cn Outline Introduction The Proposed LG-MMC Method Experimental Results Conclusion

9 http://lamda.nju.edu.cn Intuition [Figure: an SVM over unknown labels (all "?") is hard; an SVM over any fixed labeling (all "±1") is efficient.] Idea: combine the candidate labelings – multiple label-kernel learning, where each $yy^\top$ acts as a label-kernel.

10 http://lamda.nju.edu.cn Flow Chart of LG-MMC LG-MMC: transform the MMC problem into multiple label-kernel learning via a minmax relaxation. Cutting-plane algorithm – multiple label-kernel learning – finding the most violated y. LG-MMC achieves a tighter relaxation than the SDP relaxation [Xu et al., NIPS'05].

11 http://lamda.nju.edu.cn LG-MMC: Minmax relaxation of the MMC problem – Consider interchanging the order of $\min_{y \in \mathcal{B}}$ and $\max_{\alpha \in \mathcal{A}}$, leading to:
$\max_{\alpha \in \mathcal{A}}\; \min_{y \in \mathcal{B}}\; \mathbf{1}^\top\alpha - \tfrac{1}{2}\alpha^\top (K \odot yy^\top)\alpha$, where $\mathcal{A} = \{\alpha : 0 \le \alpha \le C\mathbf{1}\}$.
– According to the minimax inequality ($\max\min \le \min\max$), the optimal objective of LG-MMC is a lower bound of that of the MMC problem.

12 http://lamda.nju.edu.cn LG-MMC: multiple label-kernel learning Firstly, LG-MMC can be rewritten as:
$\max_{\alpha \in \mathcal{A},\, \theta}\; \theta$  s.t. $\theta \le \mathbf{1}^\top\alpha - \tfrac{1}{2}\alpha^\top (K \odot y_t y_t^\top)\alpha$ for every $y_t \in \mathcal{B}$.
For the inner optimization subproblem, let $\mu_t \ge 0$ be the dual variable for each constraint. Its Lagrangian can be obtained as:
$L(\theta, \mu) = \theta + \sum_t \mu_t \big(\mathbf{1}^\top\alpha - \tfrac{1}{2}\alpha^\top (K \odot y_t y_t^\top)\alpha - \theta\big)$.

13 http://lamda.nju.edu.cn LG-MMC: multiple label-kernel learning (cont.) Setting its derivative w.r.t. $\theta$ to zero, we have $\sum_t \mu_t = 1$. Let $\mathcal{M} = \{\mu : \mu_t \ge 0, \sum_t \mu_t = 1\}$ be the simplex. Replacing the inner subproblem with its dual, one obtains:
$\max_{\alpha \in \mathcal{A}}\; \min_{\mu \in \mathcal{M}}\; \mathbf{1}^\top\alpha - \tfrac{1}{2}\alpha^\top \big(K \odot \textstyle\sum_t \mu_t y_t y_t^\top\big)\alpha$.
In analogy with multiple kernel learning on a single known label vector, the above formulation can be regarded as multiple label-kernel learning: each candidate labeling $y_t$ contributes a base kernel $K \odot y_t y_t^\top$.
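To make the label-kernel view concrete, here is a minimal numpy sketch of the combined kernel appearing above (the function name and the list-of-labelings representation are illustrative choices, not from the paper):

```python
import numpy as np

def combined_label_kernel(K, ys, mu):
    """Combined kernel K o (sum_t mu_t y_t y_t'): one base kernel per
    candidate labeling y_t in the working set, weighted by mu on the simplex."""
    M = sum(m * np.outer(y, y) for m, y in zip(mu, ys))  # sum_t mu_t y_t y_t'
    return K * M                                          # elementwise (Hadamard) product
```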

14 http://lamda.nju.edu.cn Cutting Plane Algorithm Problem: exponential number of possible label assignments – the set of base kernels is also exponential in size – direct multiple kernel learning (MKL) is computationally intractable. Observation – only a subset of these constraints is active at optimality – use the cutting-plane method.

15 http://lamda.nju.edu.cn Cutting Plane Algorithm 1. Initialize α. Find the most violated y and set 𝒴 = {y, −y} (𝒴 is the working set of constraints). 2. Run MKL for the subset of kernel matrices {K ⊙ yy⊤ : y ∈ 𝒴}. 3. Find the most violated y and set 𝒴 = 𝒴 ∪ {y}. 4. Repeat steps 2–3 until convergence. How are steps 2 and 3 done? See the sketch below and the next two slides.
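A skeletal numpy version of this loop, assuming two helpers that the next slides flesh out (run_mkl and find_most_violated_y are hypothetical names, and the objective-change stopping rule here is an assumption, not the paper's exact criterion):

```python
import numpy as np

def lg_mmc(K, C=1.0, ell=2, tol=1e-4, max_iter=50):
    """Cutting-plane loop for LG-MMC (sketch): grow a working set of
    candidate labelings, re-solving the label-kernel MKL problem each time."""
    n = K.shape[0]
    alpha = np.full(n, min(C, 1.0 / n))          # any feasible starting dual point
    y0 = find_most_violated_y(K, alpha, ell)     # step 3 helper (slide 18)
    ys = [y0, -y0]                               # initial working set {y, -y}
    prev_obj = -np.inf
    for _ in range(max_iter):
        alpha, mu, obj = run_mkl(K, ys, C)       # step 2 helper (slide 16)
        ys.append(find_most_violated_y(K, alpha, ell))
        if abs(obj - prev_obj) < tol:            # simple stopping rule (assumed)
            break
        prev_obj = obj
    return ys, mu, alpha
```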

16 http://lamda.nju.edu.cn Cutting Plane Algorithm Step 2: Multiple Label-Kernel Learning – Suppose that the current working set is 𝒴 = {y_1, …, y_T}. – The feature map for the base kernel matrix $K \odot y_t y_t^\top$ is $\psi_t(x_i) = y_{t,i}\,\varphi(x_i)$. Solve with SimpleMKL: 1. Fix μ and solve the SVM dual with the combined kernel $K \odot (\sum_t \mu_t y_t y_t^\top)$. 2. Fix α and use a gradient method to update μ on the simplex. 3. Iterate until convergence.
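A rough sketch of this alternation, reusing solve_svm_dual and combined_label_kernel from the earlier sketches. A plain projected-gradient step on μ stands in for SimpleMKL's reduced-gradient update, so treat the update rule and step size as assumptions:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def run_mkl(K, ys, C=1.0, n_outer=20, lr=0.1):
    """Alternating optimization over the working set of label-kernels (sketch)."""
    mu = np.full(len(ys), 1.0 / len(ys))              # start at the simplex center
    for _ in range(n_outer):
        # step 1: fix mu, solve the SVM dual with the combined kernel
        alpha, obj = solve_svm_dual(combined_label_kernel(K, ys, mu), C)
        # step 2: fix alpha; d(obj)/d(mu_t) = -0.5 * alpha'(K o y_t y_t')alpha
        grad = np.array([-0.5 * alpha @ (K * np.outer(y, y)) @ alpha for y in ys])
        mu = project_simplex(mu - lr * grad)          # descend: mu minimizes the objective
    return alpha, mu, obj
```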

17 http://lamda.nju.edu.cn Cutting Plane Algorithm Step 3: Finding the most violated y Find the most violated y: $\hat{y} = \arg\max_{y \in \mathcal{B}}\; \alpha^\top (K \odot yy^\top)\alpha = \arg\max_{y \in \mathcal{B}} \|\sum_i \alpha_i y_i \varphi(x_i)\|_2^2$. Problem: a concave QP over binary labels, intractable in general. Observation: – the cutting-plane algorithm only requires the addition of a violated constraint at each iteration, not necessarily the most violated one – replace the $\ell_2$-norm above with the $\ell_\infty$-norm.

18 http://lamda.nju.edu.cn Cutting Plane Algorithm Step 3: Finding the most violated y (cont.) With the $\ell_\infty$-norm, the problem decomposes into subproblems, each of the form $\max_{y \in \mathcal{B}} \pm\sum_i c_i y_i$ – a linear objective in y – solved exactly by sorting the $c_i$'s and flipping the cheapest labels to satisfy the balance constraint.
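A numpy sketch of the sorting subproblem and of one way to assemble candidate labelings from it. Taking $c = \alpha \odot K_{:,j}$ for each column j (i.e., coordinates of an empirical feature map) is my reading of the decomposition; treat that choice, and the helper names, as assumptions rather than the paper's exact recipe:

```python
import numpy as np

def max_linear_balanced(c, ell):
    """max_y sum_i c[i]*y[i] over y in {-1,+1}^n with |sum_i y[i]| <= ell:
    take y = sign(c), then flip the cheapest majority-side labels until balanced."""
    y = np.where(c >= 0, 1, -1)
    s = int(y.sum())
    if abs(s) > ell:
        side = 1 if s > 0 else -1
        idx = np.where(y == side)[0]
        idx = idx[np.argsort(np.abs(c[idx]))]        # cheapest flips first
        n_flips = int(np.ceil((abs(s) - ell) / 2))   # each flip moves the sum by 2
        y[idx[:n_flips]] = -side
    return y

def find_most_violated_y(K, alpha, ell):
    """Build one candidate per coordinate of the infinity-norm surrogate and
    keep the labeling with the largest true violation alpha'(K o yy')alpha."""
    best_y, best_val = None, -np.inf
    for j in range(K.shape[0]):
        c = alpha * K[:, j]                          # assumed form of c (see lead-in)
        for y in (max_linear_balanced(c, ell), max_linear_balanced(-c, ell)):
            val = alpha @ (K * np.outer(y, y)) @ alpha
            if val > best_val:
                best_y, best_val = y, val
    return best_y
```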

19 http://lamda.nju.edu.cn LG-MMC achieves tighter relaxation Consider the set of all feasible label matrices $\mathcal{M}_0 = \{yy^\top : y \in \mathcal{B}\}$ and two relaxations of it: the convex hull $\mathcal{M}_1 = \mathrm{conv}(\mathcal{M}_0)$ used by LG-MMC, and the larger convex set $\mathcal{M}_2 \supseteq \mathcal{M}_1$ used by the SDP relaxation [Xu et al., NIPS'05].

20 http://lamda.nju.edu.cn LG-MMC achieves tighter relaxation (cont.) Define $g(M) = \max_{\alpha \in \mathcal{A}} \mathbf{1}^\top\alpha - \tfrac{1}{2}\alpha^\top (K \odot M)\alpha$. One can find that – maximum margin clustering is the same as $\min_{M \in \mathcal{M}_0} g(M)$ – the LG-MMC problem is the same as $\min_{M \in \mathcal{M}_1} g(M)$ – the SDP-based MMC problem is the same as $\min_{M \in \mathcal{M}_2} g(M)$.

21 http://lamda.nju.edu.cn LG-MMC achieves tighter relaxation (cont.) $\mathcal{M}_1$ is the convex hull of $\mathcal{M}_0$, which is the smallest convex set containing $\mathcal{M}_0$. – LG-MMC gives the tightest convex relaxation. It can be shown that $\mathcal{M}_2$ is more relaxed than $\mathcal{M}_1$ (i.e., $\mathcal{M}_1 \subseteq \mathcal{M}_2$). – SDP MMC is a looser relaxation than the proposed formulation.
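Spelled out, the tightness argument is just monotonicity of minimization under set inclusion (a standard convex-analysis step, added here for clarity rather than quoted from the paper):

```latex
\mathcal{M}_0 \subseteq \mathcal{M}_1 \subseteq \mathcal{M}_2
\quad\Longrightarrow\quad
\min_{M \in \mathcal{M}_2} g(M) \;\le\; \min_{M \in \mathcal{M}_1} g(M) \;\le\; \min_{M \in \mathcal{M}_0} g(M).
```

The middle quantity (LG-MMC) is therefore sandwiched between the SDP optimum and the true MMC optimum, and since $\mathcal{M}_1$ is the smallest convex set containing $\mathcal{M}_0$, no convex relaxation of this form can be tighter.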

22 http://lamda.nju.edu.cn Outline Introduction The Proposed LG-MMC Method Experimental Results Conclusion

23 http://lamda.nju.edu.cn Experiments Data sets – 17 UCI datasets – MNIST dataset. Implementation – Matlab 7.6. Evaluation – misclassification error.

24 http://lamda.nju.edu.cn Compared Methods k-means – one of the most mature baseline methods. Normalized Cut [Shi & Malik, PAMI'00] – a pioneering spectral clustering method. GMMC [Valizadegan & Jin, NIPS'07] – one of the most efficient global methods for MMC. IterSVR [Zhang et al., ICML'07] – an efficient algorithm for MMC. CPMMC [Zhao et al., SDM'08] – another state-of-the-art efficient method for MMC.

25 http://lamda.nju.edu.cn Clustering Error

26 http://lamda.nju.edu.cn Win-tie-loss Global methods vs. local methods – the global methods are better: 15 wins / 2 ties / 2 losses. LG-MMC vs. GMMC – LG-MMC is competitive with GMMC: 7 wins / 0 ties / 3 losses.

27 http://lamda.nju.edu.cn Speed LG-MMC is about 10 times faster than GMMC. However, local methods are in general faster than global methods.

28 http://lamda.nju.edu.cn Outline Introduction The Proposed LG-MMC Method Experimental Results Conclusion

29 http://lamda.nju.edu.cn Conclusion Main contributions – We propose a scalable and global optimization method for maximum margin clustering. – To the best of our knowledge, this is the first use of a label-generation strategy for clustering; the strategy might also be useful in other domains. Future work – We will extend the proposed approach to semi-supervised learning. Thank you

