PEGASOS Primal Estimated sub-GrAdient Solver for SVM


PEGASOS Primal Estimated sub-GrAdient Solver for SVM Ming TIAN 04-20-2012

References
[1] Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML, 807-814. Journal version: Mathematical Programming, Series B, 127(1):3-30, 2011.
[2] Wang, Z., Crammer, K., & Vucetic, S. (2010). Multi-Class Pegasos on a Budget. ICML.
[3] Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
[4] Crammer, K., Kandola, J., & Singer, Y. (2004). Online classification on a budget. NIPS, 16, 225-232.

Outline: Review of SVM optimization; The Pegasos algorithm; Multi-Class Pegasos on a Budget; Further work.

Outline: Review of SVM optimization; The Pegasos algorithm; Multi-Class Pegasos on a Budget; Further work.

Review of SVM optimization. The SVM training problem minimizes a regularized empirical loss:
    min_w  (λ/2)‖w‖² + (1/m) Σ_{(x,y)∈S} max{0, 1 − y⟨w, x⟩}.
The first term is the regularization term; the sum is the empirical (hinge) loss over the training set S of size m.
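This objective is easy to evaluate directly. A minimal numpy sketch (the function name and data layout are my own, not from the slides):

import numpy as np

def svm_objective(w, X, y, lam):
    # (lam/2) * ||w||^2 + average hinge loss over the sample
    margins = y * (X @ w)                      # y_i * <w, x_i> for every example
    hinge = np.maximum(0.0, 1.0 - margins)     # max{0, 1 - y_i <w, x_i>}
    return 0.5 * lam * np.dot(w, w) + hinge.mean()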

Review of SVM optimization

Review of SVM optimization. Dual-based methods: Interior Point methods (memory m², time m³ log log(1/ε)); decomposition methods (memory m, time super-linear in m). Online learning & stochastic gradient: memory O(1), time 1/ε² with a linear kernel; memory 1/ε², time 1/ε⁴ with a non-linear kernel. Typically, online learning algorithms do not converge to the optimal solution of the SVM. Better rates are known for finite-dimensional instances (Murata; Bottou).

Outline: Review of SVM optimization; The Pegasos algorithm; Multi-Class Pegasos on a Budget; Further work.

PEGASOS. On each iteration, pick a subset A_t ⊆ S, take a step of size η_t = 1/(λt) along a subgradient of the instantaneous objective (λ/2)‖w‖² + (1/|A_t|) Σ_{(x,y)∈A_t} max{0, 1 − y⟨w, x⟩}, and then project w onto the ball of radius 1/√λ. Choosing A_t = S recovers the deterministic subgradient method; choosing |A_t| = 1 gives a stochastic gradient step.
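A compact Python sketch of the mini-batch Pegasos loop described above (variable names and the data layout are mine, not the paper's pseudocode):

import numpy as np

def pegasos(X, y, lam, T, k=1, seed=0):
    # X: (m, d) data matrix; y: labels in {-1, +1}; k = |A_t|
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        idx = rng.choice(m, size=k, replace=False)   # draw A_t
        eta = 1.0 / (lam * t)                        # step size 1/(lambda * t)
        margins = y[idx] * (X[idx] @ w)
        viol = margins < 1.0                         # examples with non-zero hinge loss
        grad = lam * w - (y[idx][viol, None] * X[idx][viol]).sum(axis=0) / k
        w = w - eta * grad                           # subgradient step
        norm = np.linalg.norm(w)                     # projection onto ||w|| <= 1/sqrt(lam)
        if norm > 0:
            w = min(1.0, 1.0 / (np.sqrt(lam) * norm)) * w
    return w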

Run-Time of Pegasos. Choosing |A_t| = 1 and a linear kernel over Rⁿ, the run-time required for Pegasos to find an ε-accurate solution with probability 1 − δ is Õ(n/(λε)) (up to logarithmic factors in 1/δ). The run-time does not depend on the number of examples; it depends only on the "difficulty" of the problem (λ and ε).

Formal Properties. Definition: w is ε-accurate if f(w) ≤ min_w' f(w') + ε. Theorem 1: Pegasos finds an ε-accurate solution with probability at least 1 − δ after at most Õ(1/(δλε)) iterations. Theorem 2: Pegasos finds log(1/δ) candidate solutions such that, with probability at least 1 − δ, at least one of them is ε-accurate, after Õ(log(1/δ)/(λε)) iterations.

Proof Sketch. A second look at the update step: ignoring the projection, each iteration is a subgradient-descent step on the instantaneous objective, w_{t+1} ← w_t − η_t ∇_t with ∇_t ∈ ∂f(w_t; A_t) and η_t = 1/(λt).

Proof Sketch. Denote by w* the minimizer of f and by f(·; A_t) the instantaneous objective at round t. The logarithmic-regret bound for online convex programming (OCP) with strongly convex functions gives (1/T) Σ_t [f(w_t; A_t) − f(w*; A_t)] ≤ O(log T / (λT)). Taking expectation over the random draws of A_t turns this into a bound on the expected sub-optimality of a randomly chosen iterate w_r. Since f(w_r) − f(w*) ≥ 0, Markov's inequality gives that, with probability 1 − δ, the sub-optimality is at most 1/δ times this bound. Finally, amplify the confidence by repeating the procedure log(1/δ) times and keeping the best solution.
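Written out (constants suppressed; this is a reconstruction of the argument in [1], not a verbatim copy of the slide):

\[
\frac{1}{T}\sum_{t=1}^{T}\bigl(f(w_t;A_t)-f(w^\star;A_t)\bigr)\le\frac{c\,(1+\log T)}{\lambda T}
\quad\text{(log-regret for strongly convex OCP)},
\]
\[
\mathbb{E}\bigl[f(w_r)-f(w^\star)\bigr]\le\frac{c\,(1+\log T)}{\lambda T},
\qquad
\Pr\Bigl[f(w_r)-f(w^\star)\ge\frac{c\,(1+\log T)}{\delta\lambda T}\Bigr]\le\delta
\quad\text{(Markov)},
\]

where w_r is an iterate chosen uniformly at random from w_1, …, w_T.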

Proof Sketch

Proof Sketch. A function f is called λ-strongly convex if f(w) − (λ/2)‖w‖² is a convex function.
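Applied to the SVM objective this is a one-line check (added here for completeness):

\[
f(w)-\frac{\lambda}{2}\|w\|^{2}
=\frac{1}{m}\sum_{(x,y)\in S}\max\{0,\,1-y\langle w,x\rangle\},
\]

which is convex as an average of hinge losses, so the Pegasos objective f is λ-strongly convex; this is what makes the logarithmic-regret bound above applicable.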

Proof Sketch

Proof Sketch

Experiments. 3 datasets (provided by Joachims): Reuters CCAT (800K examples, 47k features); Physics ArXiv (62k examples, 100k features); Covertype (581k examples, 54 features). 4 competing algorithms: SVM-Light (Joachims); SVM-Perf (Joachims '06); Norma (Kivinen, Smola, Williamson '02); Zhang '04 (stochastic gradient descent).

Training Time (in seconds)

                 Pegasos   SVM-Perf   SVM-Light
  Reuters              2         77      20,075
  Covertype            6         85      25,514
  Astro-Physics        5         80

Compare to Norma (on Physics): objective value and test error.

Compare to Zhang '04 (on Physics): objective value. But tuning the parameter is more expensive than learning …

Effect of k = |A_t| when T is fixed: objective value.

Effect of k = |A_t| when kT is fixed: objective value.

The bias term. Option 1 (popular approach): increase the dimension of x by appending a constant feature; cons: we "pay" for b in the regularization term. Option 2: calculate subgradients w.r.t. w and w.r.t. b; cons: the convergence rate degrades to 1/ε². Option 3: fold b into the loss via a modified definition; cons: |A_t| needs to be large. Option 4: search for b in an outer loop; cons: each evaluation of the objective costs 1/ε².
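Option 1 is a one-liner in code; note that b then sits inside w and is therefore regularized too, which is exactly the "cons" above. A small sketch with names of my own choosing:

import numpy as np

def add_bias_feature(X, scale=1.0):
    # Append a constant column so that w = [w_orig, b/scale];
    # the bias is then learned as an ordinary weight, but it is
    # also penalized by the (lam/2) * ||w||^2 regularizer.
    const = np.full((X.shape[0], 1), scale)
    return np.hstack([X, const])

# usage: w = pegasos(add_bias_feature(X), y, lam, T); b = scale * w[-1]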

Outline: Review of SVM optimization; The Pegasos algorithm; Multi-Class Pegasos on a Budget; Further work.

Multi-class SVM (Crammer & Singer, 2001). Multi-class model: keep one weight vector w_i per class i ∈ {1, …, k} and predict ŷ(x) = argmax_i ⟨w_i, x⟩.

Multi-class SVM (Crammer & Singer, 2001). Multi-class SVM objective function:
    min_w  (λ/2)‖w‖² + (1/m) Σ_{(x,y)∈S} ℓ(w; (x, y)),
where w = (w_1, …, w_k) stacks the per-class weight vectors, and the multi-class hinge loss is defined as
    ℓ(w; (x, y)) = max{0, 1 + max_{r ≠ y} ⟨w_r, x⟩ − ⟨w_y, x⟩},
where r ranges over the incorrect labels.

Multi-class Pegasos. Use the instantaneous objective function
    f(w; (x_t, y_t)) = (λ/2)‖w‖² + ℓ(w; (x_t, y_t)).
Multi-class Pegasos works by iteratively executing a two-step update.
Step 1 (subgradient step): w_{t+1/2} = w_t − η_t ∇_t, where η_t = 1/(λt) and ∇_t is a subgradient of f(·; (x_t, y_t)) at w_t.

Multi-class Pegasos. If the loss on (x_t, y_t) is equal to zero, the step only shrinks the weights: w_{t+1/2} = (1 − 1/t) w_t. Otherwise, let r_t = argmax_{r ≠ y_t} ⟨w_r, x_t⟩ be the most violating label; then w_{t+1/2} = (1 − 1/t) w_t, with η_t x_t additionally added to the block w_{y_t} and subtracted from the block w_{r_t}. Step 2: project w_{t+1/2} onto the closed convex set {w : ‖w‖ ≤ 1/√λ}, i.e. w_{t+1} = min{1, (1/√λ)/‖w_{t+1/2}‖} · w_{t+1/2}.
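The two steps above, as one numpy iteration over a class-indexed weight matrix (identifiers are mine, not from [2]):

import numpy as np

def multiclass_pegasos_step(W, x, y, lam, t):
    # W: (k, d) matrix with one weight vector per class; x: (d,); y: true label index
    eta = 1.0 / (lam * t)
    scores = W @ x
    wrong = scores.copy()
    wrong[y] = -np.inf
    r = int(np.argmax(wrong))                      # most violating wrong label
    loss = max(0.0, 1.0 + scores[r] - scores[y])   # multi-class hinge loss
    W = (1.0 - 1.0 / t) * W                        # shrink step (eta * lam = 1/t)
    if loss > 0.0:
        W[y] = W[y] + eta * x
        W[r] = W[r] - eta * x
    norm = np.linalg.norm(W)                       # project onto ||W|| <= 1/sqrt(lam)
    if norm > 1.0 / np.sqrt(lam):
        W = W * (1.0 / (np.sqrt(lam) * norm))
    return W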

Budgeted Multi-Class Pegasos

Budget Maintenance Strategies. Budget maintenance through removal: the optimal removal always selects the oldest SV. Budget maintenance through projection: before removal, the SV is projected onto all the remaining SVs, which results in smaller weight degradation. Budget maintenance through merging: two SVs are merged into a newly created one; the total cost of finding the optimal merging for the n-th and m-th SV is O(1).
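For the removal strategy the maintenance step is trivial: once the support-vector set exceeds the budget B, discard the oldest entry. A minimal sketch using a deque (data structures and names are my own, not from [2]):

from collections import deque

def maintain_budget_by_removal(sv_set, budget):
    # sv_set: deque of (x, y, alpha) tuples kept in insertion order (oldest first).
    # If the budget is exceeded, discard the oldest support vector(s).
    while len(sv_set) > budget:
        sv_set.popleft()
    return sv_set

# usage:
# svs = deque()
# svs.append((x_t, y_t, alpha_t))   # after an update with non-zero loss
# svs = maintain_budget_by_removal(svs, budget=100)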

Experiments

Outline: Review of SVM optimization; The Pegasos algorithm; Multi-Class Pegasos on a Budget; Further work.

Further work. Distribution-aware Pegasos? Online structurally regularized SVM?

Thanks! Q&A