A Parallel Mixture of SVMs for Very Large Scale Problems
Ronan Collobert, Samy Bengio, Yoshua Bengio
Prepared by: S.Y.C.
Neural Information Processing Systems, Vol. 17, MIT Press, 2004

S.Y.C.2 Abstract Support Vector Machines (SVMs) are currently the state-of-the-art models for many classification problems. However, it is hopeless to try to solve real-life problems with more than a few hundred thousand examples using classical SVMs.

S.Y.C.3 Abstract (Cont.) The present paper proposes a new mixture of SVMs that can be easily implemented in parallel and where each SVM is trained on a small subset of the whole dataset.

S.Y.C.4 Outline
– Introduction
– Introduction to SVM
– A New Conditional Mixture of SVMs
– Experiments
– Conclusion

S.Y.C.5 Introduction SVMs require solving a quadratic optimization problem that needs resources at least quadratic in the number of training examples; it is thus hopeless to try to solve problems with millions of examples using classical SVMs.

S.Y.C.6 Introduction (Cont.) We propose a mixture of several SVMs, each of them trained only on a part of the dataset.

S.Y.C.7 Introduction to SVM Training data $(x_i, y_i)$, $i = 1, \dots, l$, with $x_i \in \mathbb{R}^n$ and $y_i \in \{+1, -1\}$: $y_i = +1$ if $x_i$ belongs to class 1, and $y_i = -1$ if $x_i$ belongs to class 2.

S.Y.C.8 Introduction to SVM (Cont.) Making a decision boundary between the two classes.

S.Y.C.9 Introduction to SVM (Cont.) Large-margin Decision Boundary
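For reference, the slide's equation image did not survive the transcript; the large-margin decision boundary is the solution of the standard hard-margin SVM primal problem:

```latex
% Standard hard-margin SVM primal (a well-known formulation, restored here
% because the slide's equation image is missing from the transcript).
\min_{w,\,b}\; \frac{1}{2}\,\lVert w \rVert^{2}
\qquad \text{subject to} \qquad
y_i \left( w^{\top} x_i + b \right) \ge 1, \quad i = 1, \dots, l.
```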

S.Y.C.10 A New Conditional Mixture of SVMs The output of the mixture for an input vector $x$ is computed as $f(x) = h\left(\sum_{m=1}^{M} w_m(x)\, s_m(x)\right)$, where $s_m(x)$ is the output of the $m$-th expert SVM, $w_m(x)$ is the weight the gater assigns to expert $m$ for input $x$, and $h$ is a transfer function (e.g., the hyperbolic tangent). In the proposed model, the gater is trained to minimize the cost function $C = \sum_{i=1}^{N} \left[ f(x_i) - y_i \right]^2$.
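A minimal numpy sketch of this forward computation, assuming $h = \tanh$; the helper names (`experts`, `gater_weights`) are illustrative, not from the slides:

```python
import numpy as np

def mixture_output(x, experts, gater_weights):
    """Compute f(x) = h(sum_m w_m(x) * s_m(x)) with h = tanh.

    experts:       list of M callables, each returning the expert output s_m(x)
    gater_weights: callable returning the length-M gater weight vector w(x)
    """
    s = np.array([s_m(x) for s_m in experts])  # expert outputs s_m(x)
    w = gater_weights(x)                       # gater weights w_m(x)
    return np.tanh(w @ s)                      # transfer function h = tanh
```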

S.Y.C.11 A New Conditional Mixture of SVMs (Cont.) The training algorithm:
1. Divide the training set into M random subsets of size near N/M.
2. Train each expert separately over one of these subsets.
3. Keeping the experts fixed, train the gater to minimize the cost C on the whole training set.

S.Y.C.12 A New Conditional Mixture of SVMs (Cont.) –Reconstruct M subsets –If a termination criterion is not fullled (such as a given number of iterations or a validation error going up), goto step 2.

S.Y.C.13 Experiments A Toy Problem

S.Y.C.14 Experiments (Cont.) A Large-Scale Realistic Problem
– Kept a separate test set of 50,000 examples.
– Used a validation set of 10,000 examples to select the best mixture of SVMs.
– Trained our models on different training sets.
– The mixtures had from 10 to 50 expert SVMs with a Gaussian kernel, and the gater was an MLP with between 25 and 500 hidden units (these ranges are sketched as a grid below).
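For concreteness, the model-selection ranges quoted above could be written as a hyperparameter grid; this encoding is hypothetical, not from the paper:

```python
# Hypothetical grid spanning the ranges quoted on the slide.
param_grid = {
    "n_experts": [10, 20, 50],                 # expert SVMs with Gaussian (RBF) kernel
    "gater_hidden_units": [25, 50, 150, 500],  # MLP gater sizes
}
```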

S.Y.C.15 Experiments (Cont.) Comparison of performance

S.Y.C.16 Experiments (Cont.) Comparison of the validation error of different mixtures of SVMs with various numbers of hidden units and experts.

S.Y.C.17 Experiments (Cont.) Verification on Another Large-Scale Problem

S.Y.C.18 Experiments (Cont.)

S.Y.C.19 Conclusion In this paper we have presented a new algorithm to train a mixture of SVMs, where each expert is trained on only a small subset of the data. We conjecture that these gains persist on very large data sets, in which case the whole method is clearly sub-quadratic in training time with respect to the number of training examples.