Support Vector Machines part 2 21 March 2013 Some slides from F. Bach and Z. Harchaoui
Motivation Max-margin classification Classification with kernels Left image: http://www.sussex.ac.uk/Users/christ/crs/ml/lec08a.html Right image: http://www.cs.helsinki.fi/group/smart/teaching/58308109/niissaloPrint.pdf
Primal and Dual formulations Maximize where Subject to these constraints:
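For reference, the soft-margin SVM primal and dual problems are commonly written as below. This is the standard textbook form, not necessarily the exact notation of the slide; the symbols $w$, $b$, $\xi_k$, $\alpha_k$, $C$ and $Q_{kl}$ are assumptions.

```latex
% Primal (d + 1 parameters: w and b, plus slack variables \xi_k)
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{k=1}^{n} \xi_k
\quad \text{subject to} \quad
y_k\,(w \cdot x_k + b) \ge 1 - \xi_k, \qquad \xi_k \ge 0 \quad \forall k

% Dual (n parameters: one \alpha_k per sample)
\max_{\alpha}\ \sum_{k=1}^{n} \alpha_k
  - \frac{1}{2} \sum_{k=1}^{n} \sum_{l=1}^{n} \alpha_k \alpha_l Q_{kl}
\quad \text{where} \quad Q_{kl} = y_k\, y_l\, (x_k \cdot x_l)

\text{subject to} \quad 0 \le \alpha_k \le C \quad \forall k,
\qquad \sum_{k=1}^{n} \alpha_k y_k = 0
```

In the dual the data appear only through the dot products $x_k \cdot x_l$, which is what allows a kernel $K(x_k, x_l)$ to be substituted for them.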
Primal vs Dual Formulations
Dual
- n (# samples) parameters
- Can efficiently handle very high-dimensional data
- No need for explicit features (kernel trick)
- Is not very efficient for very large data sets
- Need to store the support vectors
Primal
- d (data dimension) parameters
- Efficient when the number of samples is high (millions), using stochastic gradient descent
- Easy on memory (store only w and b)
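The memory trade-off can be illustrated with a minimal sketch of the two decision functions. With a linear kernel, the dual form (which stores the support vectors and their coefficients) and the primal form (which stores only w and b) give the same value. The function names and the toy values of `alphas` are illustrative assumptions, not taken from the slides.

```python
def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def dual_decision(x, support_vectors, alphas, labels, b, kernel=dot):
    """Dual form: f(x) = sum_k alpha_k * y_k * K(x_k, x) + b (needs the support vectors)."""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b

def primal_weights(support_vectors, alphas, labels):
    """For a linear kernel, w = sum_k alpha_k * y_k * x_k."""
    d = len(support_vectors[0])
    return [sum(a * y * sv[j] for sv, a, y in zip(support_vectors, alphas, labels))
            for j in range(d)]

def primal_decision(x, w, b):
    """Primal form: f(x) = w . x + b (needs only w and b)."""
    return dot(w, x) + b

# Toy values chosen for illustration only.
support_vectors = [[2.0, 2.0], [-2.0, -2.0]]
alphas = [0.0625, 0.0625]
labels = [1, -1]
b = 0.0

w = primal_weights(support_vectors, alphas, labels)
x_new = [1.0, 3.0]
print(primal_decision(x_new, w, b))
print(dual_decision(x_new, support_vectors, alphas, labels, b))
```

With a non-linear kernel there is in general no finite w to precompute, so only the dual form applies.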
SVM Kernel Functions
- $K(a,b) = (a \cdot b + 1)^d$ is an example of an SVM kernel function.
- Beyond polynomials, there are other very high-dimensional basis functions that can be made practical by finding the right kernel function.
- Radial-basis-style kernel function: $K(a,b) = \exp\left(-\frac{\|a-b\|^2}{2\sigma^2}\right)$
- Bandwidth $\sigma$: ranges from a linear classifier (large bandwidth) to nearest-neighbor (NN) methods (small bandwidth)
Copyright © 2001, 2003, Andrew W. Moore
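A minimal sketch of the two kernels named on this slide, assuming the Gaussian form of the radial-basis kernel with bandwidth `sigma`. The function and parameter names are illustrative.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def polynomial_kernel(a, b, degree=2):
    """K(a, b) = (a . b + 1)^degree"""
    return (dot(a, b) + 1.0) ** degree

def rbf_kernel(a, b, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / (2 * sigma^2)), assuming the Gaussian form."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

a, b = [1.0, 2.0], [2.0, 0.0]
print(polynomial_kernel(a, b, degree=2))  # (1*2 + 2*0 + 1)^2 = 9.0
print(rbf_kernel(a, b, sigma=0.1))        # tiny bandwidth: only near-identical points look similar
print(rbf_kernel(a, b, sigma=100.0))      # huge bandwidth: almost every pair looks similar
```

The last two lines show the bandwidth effect: a small `sigma` makes the kernel value vanish unless the points nearly coincide (nearest-neighbor-like behaviour), while a large `sigma` makes the kernel nearly constant over the data, giving a much smoother decision function.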
Kernel Tricks
- Replacing the dot product with a kernel function
- Not all functions are kernel functions
- A kernel needs to be decomposable: $K(a,b) = \phi(a) \cdot \phi(b)$
- Could $K(a,b) = (a-b)^3$ be a kernel function?
- Could $K(a,b) = (a-b)^4 - (a+b)^2$ be a kernel function?
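The two questions can be explored numerically. If K(a,b) = φ(a) · φ(b), then K must be symmetric, K(a,a) must be non-negative, and K(a,b)² ≤ K(a,a) K(b,b) (Cauchy-Schwarz). The sketch below tests these necessary conditions on a few scalar points. The helper name and the sample points are assumptions. Failing a check proves a function is not a kernel; passing all of them on a finite sample does not prove that it is one.

```python
def violates_kernel_conditions(k, points, tol=1e-9):
    """Return a reason string if k breaks a necessary kernel condition on these points, else None."""
    # K(a, a) = phi(a) . phi(a) = ||phi(a)||^2 must be non-negative.
    for a in points:
        if k(a, a) < -tol:
            return "negative diagonal"
    for a in points:
        for b in points:
            # A dot product is symmetric.
            if abs(k(a, b) - k(b, a)) > tol:
                return "not symmetric"
            # Cauchy-Schwarz: (phi(a) . phi(b))^2 <= ||phi(a)||^2 * ||phi(b)||^2.
            if k(a, b) ** 2 > k(a, a) * k(b, b) + tol:
                return "violates Cauchy-Schwarz"
    return None

points = [-1, 0, 1, 2]

cubic_diff = lambda a, b: (a - b) ** 3
quartic_minus_square = lambda a, b: (a - b) ** 4 - (a + b) ** 2
polynomial = lambda a, b: (a * b + 1) ** 2  # known valid kernel, for comparison

print(violates_kernel_conditions(cubic_diff, points))            # K(a,b) = -K(b,a)
print(violates_kernel_conditions(quartic_minus_square, points))  # K(a,a) = -4a^2 < 0
print(violates_kernel_conditions(polynomial, points))
```

Both candidate functions fail: (a-b)³ is antisymmetric, and (a-b)⁴ - (a+b)² gives K(a,a) = -4a², which is negative for any a ≠ 0.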
SVM for CSE 802 Project
SVM - FAQ Which formulation to use? How to set C? Which kernel to use? How to determine the kernel parameters? How to use SVM for multi-class problems (project)?
Multi-class / multi-label SVM
- Multi-class classification: each instance belongs to exactly one of the K classes
- Multi-label classification: each instance may belong to one or more of the K classes
Multi-class / multi-label SVM For multi-class only!!!
OvO vs OvR
- OvR (one-vs-rest): K classifiers with n instances each
- OvO (one-vs-one): K(K-1)/2 classifiers with O(2n/K) instances each (on average)
- OvR: data imbalance problem
- OvO: how to decide the winner
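The counts above can be made concrete with a small sketch. It assumes perfectly balanced classes (n/K instances per class). Majority voting, with ties broken by the smallest class label, is only one common way to pick the OvO winner and is an assumption here, not necessarily the rule used in the lecture. All function names are illustrative.

```python
from collections import Counter

def ovr_summary(n, K):
    """One-vs-rest: K classifiers, each trained on all n instances (balanced classes assumed)."""
    positives = n / K
    return {
        "classifiers": K,
        "instances_each": n,
        "neg_to_pos_ratio": (n - positives) / positives,  # equals K - 1: the imbalance problem
    }

def ovo_summary(n, K):
    """One-vs-one: K(K-1)/2 classifiers, each trained on the two classes involved."""
    return {
        "classifiers": K * (K - 1) // 2,
        "instances_each": 2 * n / K,
    }

def ovo_vote(pairwise_winners):
    """Majority vote over pairwise decisions; ties go to the smallest label (an assumption)."""
    votes = Counter(pairwise_winners.values())
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)

print(ovr_summary(1000, 10))
print(ovo_summary(1000, 10))
# Pairwise decisions for a 3-class problem: (class_i, class_j) -> predicted winner.
print(ovo_vote({(0, 1): 0, (0, 2): 2, (1, 2): 2}))
```

For n = 1000 and K = 10, OvR trains 10 classifiers on 1000 instances each with a 9:1 negative-to-positive ratio, while OvO trains 45 classifiers on about 200 instances each.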
Data imbalance problem
A small experiment
A small experiment
SVM for CSE 802 Project
SVM for CSE 802 Project I have many feature or kernel parameter (type, bandwidth, degree) options and I am indecisive. I have many feature or kernel parameter (type, bandwidth, degree) options and I want to use them all somehow.
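One common option for "using them all" is to combine several kernels into one. A non-negative weighted sum of valid kernels is again a valid kernel, so the combined function can be used wherever a single kernel would be. The sketch below uses uniform weights by default. It is not necessarily the approach the lecture had in mind, and all names are illustrative assumptions.

```python
import math

def linear(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def make_poly(degree):
    """Polynomial kernel (a . b + 1)^degree."""
    return lambda a, b: (linear(a, b) + 1.0) ** degree

def make_rbf(sigma):
    """Gaussian RBF kernel with bandwidth sigma."""
    return lambda a, b: math.exp(
        -sum((x - y) ** 2 for x, y in zip(a, b)) / (2.0 * sigma ** 2))

def combine_kernels(kernels, weights=None):
    """Non-negative weighted sum of kernels; uniform weights if none are given."""
    if weights is None:
        weights = [1.0 / len(kernels)] * len(kernels)
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep the sum a valid kernel")

    def combined(a, b):
        return sum(w * k(a, b) for w, k in zip(weights, kernels))

    return combined

# Mix a degree-2 polynomial kernel with an RBF kernel of bandwidth 1.
k = combine_kernels([make_poly(2), make_rbf(1.0)])
a = [1.0, 2.0]
print(k(a, a))  # 0.5 * (5 + 1)^2 + 0.5 * 1.0 = 18.5
```

For the "indecisive" case, the usual alternative is to pick one setting by cross-validation rather than combining them.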
Gradient descent
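A minimal sketch of batch (sub)gradient descent on the primal soft-margin objective, assuming the form 0.5·||w||² + C·Σ max(0, 1 - y_i(w·x_i + b)). The hinge loss is not differentiable at the margin, so a subgradient is used. The step size, iteration count, toy data and function names are illustrative assumptions.

```python
def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def svm_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return 0.5 * dot(w, w) + C * hinge

def svm_subgradient(w, b, X, y, C):
    """Subgradient of the objective with respect to (w, b), using all samples."""
    gw = list(w)  # gradient of 0.5 * ||w||^2 is w
    gb = 0.0
    for xi, yi in zip(X, y):
        if yi * (dot(w, xi) + b) < 1.0:  # only margin violators contribute
            for j in range(len(w)):
                gw[j] -= C * yi * xi[j]
            gb -= C * yi
    return gw, gb

def gradient_descent(X, y, C=1.0, lr=0.01, steps=500):
    """Full-batch subgradient descent starting from w = 0, b = 0."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        gw, gb = svm_subgradient(w, b, X, y, C)
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

# Toy linearly separable data (illustrative).
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]

w, b = gradient_descent(X, y)
print(w, b, svm_objective(w, b, X, y, C=1.0))
```

Each step touches every sample, so the cost per iteration grows linearly with n.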
Stochastic gradient descent
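A minimal sketch of the stochastic variant: each step uses one randomly chosen sample instead of the full data set. It assumes the per-sample objective (λ/2)·||w||² + max(0, 1 - y_i(w·x_i + b)), where λ plays the role of 1/(nC). The constant step size, the value of `lam`, the toy data and the function names are illustrative assumptions; decaying step sizes are also common.

```python
import random

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def sgd_svm(X, y, lam=0.01, lr=0.01, steps=2000, seed=0):
    """Stochastic subgradient descent on the regularized hinge loss, one sample per step."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        i = rng.randrange(len(X))           # pick one training sample at random
        xi, yi = X[i], y[i]
        violated = yi * (dot(w, xi) + b) < 1.0
        # The regularizer shrinks w on every step.
        w = [(1.0 - lr * lam) * wj for wj in w]
        if violated:
            # The hinge term contributes only when the sample is inside the margin.
            w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
            b += lr * yi
    return w, b

def predict(w, b, x):
    """Sign of the linear decision function."""
    return 1 if dot(w, x) + b > 0 else -1

# Toy linearly separable data (illustrative).
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]

w, b = sgd_svm(X, y)
print(w, b, [predict(w, b, xi) for xi in X])
```

The cost per step is independent of n and only w and b are kept in memory, which is why the primal with stochastic updates scales to millions of samples.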
SVM - FAQ Which formulation to use? How to set C? Which kernel to use? How to determine the kernel parameters? How to use SVM for multi-class problems (project)?