Learning Sparse SVM for Feature Selection on Very High Dimensional Datasets. Presented by: Mingkui Tan, Li Wang, Ivor W. Tsang, School of Computer Engineering, Nanyang Technological University (www.ntu.edu.sg). ICML 2010, June 21-24, Haifa, Israel.

Presentation transcript:

Background

In many machine learning applications, sparsity with respect to the input features is highly desirable. In this work we propose a new Feature Generating Machine (FGM) to learn such feature sparsity:

- We propose a new sparse model and, via a mild convex relaxation, transform it into a multiple kernel learning (MKL) problem over exponentially many linear kernels.
- We solve the relaxed problem with a cutting plane algorithm that incorporates MKL learning.
- We prove that the proposed algorithm converges globally within a finite number of iterations.
- We show that FGM scales linearly in both the number of dimensions and the number of instances.
- Empirically, FGM shows excellent scalability for non-monotonic feature selection on large-scale, very high dimensional problems.

Model

Introduce a 0-1 vector d to control the status of the features ("1" means selected, "0" means not selected), so that each input x_i is masked as x_i \odot d. Suppose we want to select at most B features; we then obtain a new sparse model of the following form (shown here with the standard hinge loss):

  \min_{d \in D} \; \min_{w, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
  \quad \text{s.t.} \quad y_i \, w^\top (x_i \odot d) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, n,

where D = \{ d \in \{0,1\}^m : \sum_{j=1}^{m} d_j \le B \}. Through a mild convex relaxation, it can be converted into an MKL problem over exponentially many linear kernels:

  \min_{\mu \in M} \; \max_{\alpha \in A} \; -\frac{1}{2} (\alpha \odot y)^\top \Big( \sum_{t : d_t \in D} \mu_t K_t \Big) (\alpha \odot y) + \mathbf{1}^\top \alpha,

where K_t is the linear kernel defined on the features selected by d_t, A is the feasible set of the SVM dual variables, and M is the simplex over the |D| base kernels. In general, |D| grows exponentially with m, so the base kernels cannot be enumerated explicitly.

Methods

We propose a cutting plane algorithm to solve the MKL problem with exponentially many linear kernels [4]: after initializing the dual variables (step 1), we alternately solve the MKL subproblem restricted to the kernels generated so far (step 2) and perform a worst-case analysis to find the most violated d, which is added as a new base kernel (step 3), until no violated d remains. A hedged Python sketch of this loop is given at the end of this transcript.

Convergence Property

Theorem 1: Assume that the MKL subproblem in step 2 and the most-violated-d selection in step 3 can be solved exactly. Then FGM converges globally after a finite number of steps [1].

Theorem 2: FGM scales linearly in computation with respect to n and m; in other words, FGM has O(mn) time complexity [3].

Experimental Results on Huge Dimensional Problems

(1) Toy experiments: The toy features are shown in the first two panels of Fig. 1. For non-monotonic feature selection with B = 2, one needs to select f_1 and f_2 as the most informative features to obtain the best prediction accuracy. We then gradually increase the number of noise features. SVM(IDEAL) denotes the results obtained by using only f_1 and f_2.

Fig 1. Results on the synthetic dataset with a varying number of noise features.

From Fig. 1, FGM-B (FGM) shows the best performance in terms of prediction accuracy, sparsity, and training time.

(2) Large-scale real data experiments: We report results on news20.binary ( ×9996), Arxiv astro-ph (99757×62369), rcv1.binary (47236×20242), real-sim (20958×32309), URL0 ( ×16000), and URL1 ( ×20000); the numbers in brackets are (dimensions × instances). The comparison with SVM-RFE [2] shows the competitiveness of our method in terms of both prediction accuracy and training time.

[Figures: prediction accuracy and training time on news20.binary, Arxiv astro-ph, rcv1.binary, real-sim, URL0, and URL1.]

References

[1] Chen, J. and Ye, J. Training SVM with indefinite kernels. In ICML, 2008.
[2] Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. Gene selection for cancer classification using support vector machines. Machine Learning, 46:389–422, 2002.
[3] Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Keerthi, S. S., and Sundararajan, S. A dual coordinate descent method for large-scale linear SVM. In ICML, 2008.
[4] Li, Y.-F., Tsang, I. W., Kwok, J. T., and Zhou, Z.-H. Tighter and convex maximum margin clustering. In AISTATS, 2009.
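
The following is a minimal, self-contained Python sketch of the cutting-plane loop referenced in the Methods section. It is an illustration under simplifying assumptions, not the authors' implementation: the per-feature violation score c_j = (sum_i alpha_i y_i x_ij)^2 assumes linear kernels and labels in {-1, +1}, the MKL subproblem of step 2 is replaced by a plain linear SVM retrained on the union of selected features, and the names most_violated_d and fgm_cutting_plane are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def most_violated_d(X, y, alpha, B):
    # Step 3 (worst-case analysis): with linear kernels, the violation
    # contributed by feature j reduces to c_j = (sum_i alpha_i*y_i*x_ij)^2,
    # so the most violated d keeps the B features with the largest scores.
    c = (X.T @ (alpha * y)) ** 2          # per-feature scores, shape (m,)
    d = np.zeros(X.shape[1], dtype=bool)
    d[np.argsort(c)[-B:]] = True          # select the top-B features
    return d

def fgm_cutting_plane(X, y, B, C=1.0, max_iter=20):
    # Outer cutting-plane loop of FGM (simplified). The true step 2 solves
    # an MKL subproblem over the kernels added so far; here a plain linear
    # SVM on the union of selected features serves as a stand-in.
    n, m = X.shape
    alpha = np.full(n, 1.0 / n)           # crude uniform initial duals
    active = np.zeros(m, dtype=bool)      # union of features picked so far
    for _ in range(max_iter):
        d = most_violated_d(X, y, alpha, B)
        if not np.any(d & ~active):       # no new violated d: converged
            break
        active |= d
        clf = SVC(kernel="linear", C=C).fit(X[:, active], y)
        alpha = np.zeros(n)               # recover duals for the next pass
        alpha[clf.support_] = np.abs(clf.dual_coef_.ravel())
    return active                         # boolean mask of chosen features
```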
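A toy usage run, loosely mirroring the synthetic experiment above: f_1 and f_2 are noisy copies of the label and the remaining features are pure noise, so with B = 2 the sketch should recover the first two columns. The data generator below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_noise = 200, 100
y = rng.choice([-1.0, 1.0], size=n)
# f1, f2: informative (noisy copies of the label); the rest: pure noise
informative = np.column_stack([y + 0.3 * rng.standard_normal(n),
                               y + 0.3 * rng.standard_normal(n)])
X = np.hstack([informative, rng.standard_normal((n, num_noise))])

active = fgm_cutting_plane(X, y, B=2)     # from the sketch above
print(np.flatnonzero(active))             # ideally prints [0 1]
```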