Tighter and Convex Maximum Margin Clustering Yu-Feng Li (LAMDA, Nanjing University, China) Ivor W. Tsang (NTU, Singapore) James T. Kwok (HKUST, Hong Kong) Zhi-Hua Zhou (LAMDA, Nanjing University, China)

Summary
Maximum Margin Clustering (MMC) [Xu et al., NIPS05]
– inspired by the success of the large margin criterion in SVMs
– state-of-the-art performance on many clustering problems
Problems with existing methods
– SDP relaxation: global but not scalable
– local search: efficient but non-convex
We propose LG-MMC, a convex method that also scales to large datasets via a label-generation strategy.

Outline Introduction The Proposed LG-MMC Method Experimental Results Conclusion

Maximum Margin Clustering [Xu et al., NIPS05]
Perform clustering (i.e., determine the unknown label vector y) by simultaneously finding the maximum margin hyperplane in the data.
Setting
– given: a set of unlabeled patterns x_1, ..., x_n
Goal
– learn a decision function f(x) = w'φ(x) + b and a label vector y ∈ {±1}^n, penalizing margin errors and subject to a balance constraint on y.
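The formulation itself appears to have been an image in the original deck and is missing from the transcript; the following LaTeX is a reconstruction along the lines of Xu et al.'s MMC, with the ξ_i the margin errors and ℓ the balance parameter:

```latex
\min_{\mathbf{y}\in\{\pm 1\}^n}\ \min_{\mathbf{w},b,\boldsymbol{\xi}}\ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad
y_i\big(\mathbf{w}^\top\varphi(\mathbf{x}_i)+b\big)\ \geq\ 1-\xi_i,\ \ \xi_i\geq 0
\ \ \text{(margin errors)},
\qquad
-\ell\ \leq\ \sum_{i=1}^{n} y_i\ \leq\ \ell
\ \ \text{(balance constraint)}.
```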

Maximum Margin Clustering [Xu et al., NIPS05]
The dual problem: minimizing the SVM dual objective over the discrete labels y yields a mixed-integer program, intractable for large-scale datasets.
Key
– some kind of relaxation may be helpful
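The dual on this slide is likewise lost from the transcript; a reconstruction consistent with the "label-kernel" yy' used later in the talk is (with 𝒜 the SVM dual feasible set and ℬ the set of balanced label assignments):

```latex
\min_{\mathbf{y}\in\mathcal{B}}\ \max_{\boldsymbol{\alpha}\in\mathcal{A}}\ \boldsymbol{\alpha}^\top\mathbf{1}\;-\;\frac{1}{2}\,\boldsymbol{\alpha}^\top\big(K\odot \mathbf{y}\mathbf{y}^\top\big)\boldsymbol{\alpha}.
```

Because the discrete y couples with the continuous α, this is a mixed-integer program.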

Related work
MMC with SDP relaxation [Xu et al., NIPS05]
– convex, state-of-the-art performance
– expensive: worst-case O(n^6.5) time
Generalized MMC (GMMC) [Valizadegan & Jin, NIPS07]
– a smaller SDP problem, speeding up MMC by about 100 times
– still expensive: cannot handle even medium-sized datasets
Efficient algorithms [Zhang et al., ICML07][Zhao et al., SDM08]
– much more scalable than the global methods
– non-convex: may get stuck in local minima
Goal: a convex method that is also scalable to large datasets.

Outline Introduction The Proposed LG-MMC Method Experimental Results Conclusion

Intuition
[Figure: unlabeled points with unknown (?) labels; running an SVM over every candidate labeling directly is hard]
Instead, consider an efficient combination of candidate labelings:
– multiple label-kernel learning
– yy' : the label-kernel
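To make "label-kernel" concrete, here is a small numpy sketch (illustrative only; the data, kernel, and candidate labelings are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))          # 6 toy points in 2-D

# Linear kernel on the data (any PSD kernel works here).
K = X @ X.T

# Two candidate label assignments y_t in {-1, +1}^n.
Y = [np.array([1, 1, 1, -1, -1, -1]),
     np.array([1, -1, 1, -1, 1, -1])]

# Each candidate labeling induces a base "label-kernel" K ∘ y y'.
base_kernels = [K * np.outer(y, y) for y in Y]

# A convex combination (mu on the simplex) of PSD matrices stays PSD,
# so it is itself a valid kernel -- this is what the MKL step optimizes over.
mu = np.array([0.7, 0.3])
K_combined = sum(m * Kt for m, Kt in zip(mu, base_kernels))
print(np.linalg.eigvalsh(K_combined) >= -1e-9)  # PSD check
```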

Flow chart of LG-MMC
LG-MMC: transform the MMC problem into multiple label-kernel learning via a minmax relaxation.
Cutting-plane algorithm
– multiple label-kernel learning
– finding the most violated y
LG-MMC achieves a tighter relaxation than the SDP relaxation [Xu et al., NIPS05].

LG-MMC: minmax relaxation of the MMC problem
– Consider interchanging the order of min_y and max_α, leading to LG-MMC.
– By the minimax inequality (maximin ≤ minimax), the optimal objective of LG-MMC lower-bounds that of the MMC problem.
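Written out (again reconstructing the slide's missing equations), with f(α, y) denoting the dual objective above:

```latex
\text{MMC:}\ \ \min_{\mathbf{y}\in\mathcal{B}}\max_{\boldsymbol{\alpha}\in\mathcal{A}} f(\boldsymbol{\alpha},\mathbf{y})
\qquad\leadsto\qquad
\text{LG-MMC:}\ \ \max_{\boldsymbol{\alpha}\in\mathcal{A}}\min_{\mathbf{y}\in\mathcal{B}} f(\boldsymbol{\alpha},\mathbf{y}),
\qquad
\max_{\boldsymbol{\alpha}}\min_{\mathbf{y}} f\ \leq\ \min_{\mathbf{y}}\max_{\boldsymbol{\alpha}} f.
```

Since f is concave in α for each fixed y, the pointwise minimum over y is concave, so LG-MMC is a convex problem.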

LG-MMC: multiple label-kernel learning
First, LG-MMC can be rewritten in epigraph form: maximize θ over α ∈ 𝒜 and θ, subject to one constraint θ ≤ f(α, y_t) for each feasible labeling y_t ∈ ℬ. For the inner optimization subproblem, let μ_t ≥ 0 be the dual variable for each constraint. Its Lagrangian can be obtained as L = θ + Σ_t μ_t (f(α, y_t) − θ).

LG-MMC: multiple label-kernel learning (cont.)
Setting its derivative w.r.t. θ to zero, we have Σ_t μ_t = 1. Let ℳ = {μ : Σ_t μ_t = 1, μ_t ≥ 0} be the simplex. Replacing the inner subproblem with its dual, one obtains an optimization over convex combinations of the label-kernels K ⊙ y_t y_t'. Similar to single (label-)kernel learning, the above formulation can be regarded as multiple label-kernel learning.
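Putting the pieces together (a reconstruction of the slide's missing equation), the resulting saddle-point problem optimizes a convex combination of label-kernels:

```latex
\max_{\boldsymbol{\alpha}\in\mathcal{A}}\ \min_{\boldsymbol{\mu}\in\mathcal{M}}\ \boldsymbol{\alpha}^\top\mathbf{1}\;-\;\frac{1}{2}\,\boldsymbol{\alpha}^\top\Big(\sum_{t}\mu_t\, K\odot \mathbf{y}_t\mathbf{y}_t^\top\Big)\boldsymbol{\alpha},
\qquad
\mathcal{M}=\Big\{\boldsymbol{\mu}:\ \sum_{t}\mu_t=1,\ \mu_t\geq 0\Big\},
```

i.e., multiple kernel learning where each candidate labeling y_t contributes one base kernel K ⊙ y_t y_t'.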

Cutting-plane algorithm
Problem: an exponential number of possible label assignments
– the set of base kernels is also exponential in size
– direct multiple kernel learning (MKL) over all of them is computationally intractable
Observation
– only a subset of these constraints is active at optimality
– this suggests a cutting-plane method

Cutting-plane algorithm
1. Initialize: find the most violated y and set 𝒴 = {y, −y} (𝒴 is the working subset of constraints).
2. Run MKL for the subset of kernel matrices selected by 𝒴.
3. Find the most violated y and set 𝒴 = 𝒴 ∪ {y}.
4. Repeat steps 2-3 until convergence.
How are steps 2 and 3 done? (next slides)
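A schematic of this loop in Python (a sketch, not the authors' implementation; `run_mkl` and `find_most_violated` are the step-2 and step-3 subroutines sketched after the next two slides):

```python
import numpy as np

def cutting_plane_lgmmc(K, C=1.0, max_iter=50, tol=1e-4):
    """Sketch of the LG-MMC outer loop.

    K : (n, n) PSD kernel matrix.
    Returns the working set of labelings and the MKL weights mu.
    """
    n = K.shape[0]
    alpha = np.full(n, C / 2.0)             # arbitrary feasible dual point
    y0 = find_most_violated(K, alpha)       # step 1: initialize working set
    working_set = [y0, -y0]

    mu, prev_obj = None, -np.inf
    for _ in range(max_iter):
        # Step 2: MKL restricted to base kernels {K * y y^T : y in working set}.
        alpha, mu, obj = run_mkl(K, working_set, C)
        # Step 3: add the most violated labeling under the current alpha.
        y_new = find_most_violated(K, alpha)
        if any(np.array_equal(y_new, y) for y in working_set):
            break                           # nothing new to add -> converged
        working_set.append(y_new)
        if abs(obj - prev_obj) < tol:
            break
        prev_obj = obj
    return working_set, mu
```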

Cutting-plane algorithm
Step 2: multiple label-kernel learning
– Suppose the current working set is 𝒴 = {y_1, ..., y_T}.
– The feature map for the base kernel matrix K ⊙ y_t y_t' is ψ_t(x_i) = y_{t,i} φ(x_i), i.e., each labeling re-signs the features.
SimpleMKL:
1. Fix μ and solve the SVM dual with the combined kernel Σ_t μ_t (K ⊙ y_t y_t').
2. Fix α and use a gradient method to update μ on the simplex.
3. Iterate until convergence.
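A minimal sketch of this alternation, assuming the bias term is dropped so the SVM dual has only box constraints (my simplification, solved here by projected gradient ascent; this is not the paper's SimpleMKL implementation):

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def solve_svm_dual(K, C, n_steps=200, lr=0.01):
    """max_a 1'a - 0.5 a'Ka  s.t. 0 <= a <= C (labels live inside K here),
    solved by projected gradient ascent."""
    a = np.full(K.shape[0], C / 2.0)
    for _ in range(n_steps):
        a = np.clip(a + lr * (1.0 - K @ a), 0.0, C)
    return a, a.sum() - 0.5 * a @ (K @ a)

def run_mkl(K, working_set, C, n_steps=20, lr=0.1):
    """Step 2: SimpleMKL-style alternation over the label-kernels."""
    base = [K * np.outer(y, y) for y in working_set]
    mu = np.full(len(base), 1.0 / len(base))
    for _ in range(n_steps):
        K_mu = sum(m * Kt for m, Kt in zip(mu, base))
        alpha, obj = solve_svm_dual(K_mu, C)          # 1. fix mu, solve SVM dual
        grad = np.array([-0.5 * alpha @ (Kt @ alpha)  # 2. fix alpha: gradient of
                         for Kt in base])             #    the objective w.r.t. mu
        mu = project_to_simplex(mu - lr * grad)       #    descent step on simplex
    return alpha, mu, obj
```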

Cutting-plane algorithm
Step 3: finding the most violated y
Find the most violated y: maximize α'(K ⊙ yy')α = ||Σ_i α_i y_i φ(x_i)||₂² over y ∈ ℬ.
Problem: equivalent to a concave QP over binary variables, hard in general.
Observation:
– the cutting-plane algorithm only requires the addition of a violated constraint at each iteration, not necessarily the most violated one
– so replace the L2 norm above with the infinity norm

Cutting-plane algorithm
Step 3: finding the most violated y (cont.)
With the infinity norm, the problem decomposes into subproblems, each of the form max_y |c'y| subject to the balance constraint, which can be solved exactly:
– sort the c_i's
– assign labels greedily while respecting the balance constraint
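A sketch of step 3 under this reading of the slide (my assumptions: decompose the squared L2 norm via rows of K^{1/2}, solve each directional subproblem max_y |c'y| by sorting under the balance constraint −ℓ ≤ Σ_i y_i ≤ ℓ, then keep the best candidate under the original objective):

```python
import numpy as np

def best_balanced_sign(c, ell):
    """max_y c'y over y in {-1,+1}^n with |sum(y)| <= ell, by sorting.

    With exactly k entries set to +1, c'y is maximized by making the k
    largest c_i positive; scan k over the range allowed by the balance
    constraint (|2k - n| <= ell).
    """
    n = len(c)
    order = np.argsort(-c)                      # indices, largest c first
    prefix = np.concatenate([[0.0], np.cumsum(c[order])])
    total = c.sum()
    best_val, best_y = -np.inf, None
    k_min = int(np.ceil((n - ell) / 2.0))
    k_max = int(np.floor((n + ell) / 2.0))
    for k in range(max(k_min, 0), min(k_max, n) + 1):
        val = 2.0 * prefix[k] - total           # c'y with top-k set to +1
        if val > best_val:
            y = -np.ones(n)
            y[order[:k]] = 1.0
            best_val, best_y = val, y
    return best_val, best_y

def find_most_violated(K, alpha, ell=2):
    """Approximate argmax_y (alpha*y)' K (alpha*y) via the infinity-norm
    surrogate: try each direction c_j = (row j of K^{1/2}) * alpha, and
    keep the labeling that scores best under the true objective."""
    w, V = np.linalg.eigh(K)                    # symmetric PSD square root
    S = V @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ V.T
    best_obj, best_y = -np.inf, None
    for c in S * alpha:                         # row j: c_i = S[j,i] * alpha_i
        for signed_c in (c, -c):                # |c'y| -> try both signs
            _, y = best_balanced_sign(signed_c, ell)
            ay = alpha * y
            obj = ay @ (K @ ay)                 # score by the true objective
            if obj > best_obj:
                best_obj, best_y = obj, y
    return best_y
```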

LG-MMC achieves a tighter relaxation
Consider the set of all feasible label matrices yy' and two relaxations of it.
[Figure: the discrete set of label matrices, its convex hull, and the larger SDP feasible set]

LG-MMC achieves a tighter relaxation (cont.)
Define ℳ₀ = {yy' : y ∈ ℬ}, ℳ₁ = conv(ℳ₀), and ℳ₂ as the SDP feasible set. One can find that
– maximum margin clustering is the same as optimizing over ℳ₀
– the LG-MMC problem is the same as optimizing over ℳ₁
– the SDP-based MMC problem is the same as optimizing over ℳ₂

LG-MMC achieves a tighter relaxation (cont.)
ℳ₁ is the convex hull of ℳ₀, i.e., the smallest convex set containing ℳ₀.
– LG-MMC therefore gives the tightest convex relaxation.
It can be shown that ℳ₂ is more relaxed than ℳ₁.
– SDP-based MMC is thus a looser relaxation than the proposed formulation.
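In symbols (my reconstruction; the exact constraints of the slide's SDP set are not preserved, so ℳ₂ below is written in the usual SDP-MMC form):

```latex
\mathcal{M}_0=\{\mathbf{y}\mathbf{y}^\top:\ \mathbf{y}\in\mathcal{B}\}
\ \subseteq\
\mathcal{M}_1=\operatorname{conv}\big(\mathcal{M}_0\big)
\ \subseteq\
\mathcal{M}_2=\{M\succeq 0:\ \operatorname{diag}(M)=\mathbf{1}\}.
```

For the common objective g(M) = max_{α∈𝒜} α'1 − ½ α'(K⊙M)α, minimizing over a larger feasible set can only decrease the optimum, so the ℳ₂ (SDP) bound is looser than the ℳ₁ (LG-MMC) bound.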

Outline Introduction The Proposed LG-MMC Method Experimental Results Conclusion

Experiments
Data sets: 17 UCI datasets and the MNIST dataset.
Implementation: Matlab 7.6.
Evaluation: misclassification error.

Compared methods
k-means
– one of the most mature baseline methods
Normalized Cut [Shi & Malik, PAMI00]
– a classical spectral clustering method
GMMC [Valizadegan & Jin, NIPS07]
– one of the most efficient global methods for MMC
IterSVR [Zhang et al., ICML07]
– an efficient local algorithm for MMC
CPMMC [Zhao et al., SDM08]
– another state-of-the-art efficient local method for MMC

Clustering error
[Table of clustering errors on the 17 UCI datasets and MNIST; not preserved in the transcript]

Win-tie-loss comparison
Global methods vs. local methods: 15 wins / 2 ties / 2 losses
– global methods are better than local methods
LG-MMC vs. GMMC: 7 wins / 0 ties / 3 losses
– LG-MMC is competitive with GMMC

Speed
LG-MMC is about 10 times faster than GMMC. In general, however, local methods are still faster than global methods.

Outline Introduction The Proposed LG-MMC Method Experimental Results Conclusion

Conclusion
Main contributions
– We propose a scalable and global optimization method for maximum margin clustering.
– To the best of our knowledge, this is the first use of a label-generation strategy for clustering; it may also be useful in other domains.
Future work
– We will extend the proposed approach to semi-supervised learning.
Thank you