Support vector machines

Presentation transcript:

Support vector machines Usman Roshan

Separating hyperplanes For two sets of points there are many hyperplanes that separate them. Which one should we choose for classification? In other words, which one is most likely to produce the least error?

Theoretical foundation Margin error theorem (Theorem 7.3 in Learning with Kernels, Schölkopf and Smola, 2002)

Separating hyperplanes The best hyperplane is the one that maximizes the minimum distance of all training points to the plane. Its expected error is at most the fraction of misclassified training points plus a complexity term (Learning with Kernels, Schölkopf and Smola, 2002).

Margin of a plane We define the margin as the minimum distance of the training points to the plane (the distance to the closest point). The optimally separating plane is the one with the maximum margin.
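In symbols, a sketch of this definition (my notation, not verbatim from the slide; it assumes the plane separates the data and uses the same w and w_0 as the later slides):

\text{margin}(w, w_0) \;=\; \min_i \; \frac{y_i \,(w^\top x_i + w_0)}{\|w\|}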

Optimally separating hyperplane [Figure: the maximum-margin hyperplane, with normal vector w and a training point x labeled.]

Optimally separating hyperplane How do we find the optimally separating hyperplane? Recall the distance of a point to the plane defined earlier.

Hyperplane separators [Figure: a point x, its projection x_p onto the hyperplane, the distance r between them, and the normal vector w.]

Distance of a point to the separating plane And so the distance r of a point x to the plane is given by r = (w^T x + w_0)/||w||, or equivalently r = y(w^T x + w_0)/||w||, where y is -1 if the point is on the left side of the plane and +1 otherwise.
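A brief sketch of where this formula comes from, using the decomposition in the figure on the previous slide (x_p is the projection of x onto the plane, r its distance to the plane):

x = x_p + r\,\frac{w}{\|w\|}, \qquad w^\top x_p + w_0 = 0
\;\Rightarrow\; w^\top x + w_0 = r\,\|w\| \;\Rightarrow\; r = \frac{w^\top x + w_0}{\|w\|}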

Support vector machine: optimally separating hyperplane The distance of a point x (with label y) to the hyperplane is given by y(w^T x + w_0)/||w||. We want this to be at least some value for every training point. By rescaling w and w_0 we can obtain infinitely many equivalent solutions, so we fix the scale by requiring y_i(w^T x_i + w_0) >= 1 for all i. Minimizing ||w|| then maximizes the distance, which gives us the SVM optimization problem.

Support vector machine: optimally separating hyperplane SVM optimization criterion (primal form): minimize (1/2)||w||^2 subject to y_i(w^T x_i + w_0) >= 1 for all i. We can solve this with Lagrange multipliers, which tells us that w = sum_i alpha_i y_i x_i. The x_i for which alpha_i is non-zero are called support vectors.

SVM dual problem Let L be the Lagrangian: L(w, w_0, alpha) = (1/2)||w||^2 - sum_i alpha_i [ y_i(w^T x_i + w_0) - 1 ]. Setting dL/dw = 0 and dL/dw_0 = 0 gives us the dual form: maximize sum_i alpha_i - (1/2) sum_{i,j} alpha_i alpha_j y_i y_j x_i^T x_j subject to alpha_i >= 0 and sum_i alpha_i y_i = 0. A small numerical check of the resulting relation w = sum_i alpha_i y_i x_i follows.
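Here is that check as a short scikit-learn sketch (not one of the packages mentioned in the lecture; its dual_coef_ attribute stores alpha_i * y_i for each support vector, and the toy data below are made up):

import numpy as np
from sklearn.svm import SVC

# Two well-separated Gaussian clusters as toy training data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=-2, size=(10, 2)), rng.normal(loc=2, size=(10, 2))])
y = np.array([-1] * 10 + [1] * 10)

clf = SVC(kernel="linear", C=10.0).fit(X, y)

# dual_coef_[0, k] equals alpha_k * y_k for the k-th support vector,
# so w = sum_k alpha_k y_k x_k can be recovered from the support vectors alone.
w_from_dual = clf.dual_coef_[0] @ clf.support_vectors_
print(w_from_dual)     # should match the weight vector found by the solver
print(clf.coef_[0])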

Support vector machine: optimally separating hyperplane

Another look at the SVM objective Consider the objective sketched below. This is non-convex and harder to solve than the convex SVM objective. Roughly, it measures the total sum of distances of points to the plane: misclassified points are given a negative distance, whereas correctly classified ones contribute a distance of 0.
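One way to write an objective matching this description (my reconstruction, not taken verbatim from the slide): maximize the sum of signed distances, clipped at zero so that only misclassified points contribute,

\max_{w, w_0} \; \sum_i \min\!\left(0, \; \frac{y_i (w^\top x_i + w_0)}{\|w\|}\right)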

SVM objective The SVM objective is to minimize (1/2)||w||^2 subject to y_i(w^T x_i + w_0) >= 1. Compare this with the non-convex objective on the previous slide: the SVM objective can be viewed as a convex approximation of it, obtained by separating the numerator and the denominator. The SVM objective is also equivalent to minimizing the regularized hinge loss, written out below.
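A standard way to write the regularized hinge-loss form (the trade-off constant, here C, may be parameterized differently on the slide):

\min_{w, w_0} \; \frac{1}{2}\|w\|^2 + C \sum_i \max\bigl(0, \; 1 - y_i (w^\top x_i + w_0)\bigr)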

Inseparable case What if there is no separating hyperplane? For example, the XOR function. One solution: consider all hyperplanes and select the one with the minimal number of misclassified points. Unfortunately this problem is NP-complete (see the paper by Ben-David, Eiron, and Long on the course website). It is even NP-complete to approximate polynomially (Learning with Kernels, Schölkopf and Smola, and the paper on the website).

Inseparable case But if we measure error as the sum of the distances of misclassified points to the plane, then we can solve for a support vector machine in polynomial time. Roughly speaking, the margin error bound theorem still applies (Theorem 7.3, Schölkopf and Smola). Note that the total distance error can be considerably larger than the number of misclassified points.

0/1 loss vs. distance-based loss
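For reference, standard definitions of the two losses being compared (the slide presumably shows a plot of them; these formulas are my addition): the 0/1 loss counts each misclassification as 1, while the distance-based (hinge) loss grows linearly with how far a point lies on the wrong side of the margin,

\ell_{0/1}(x, y) = \mathbf{1}\bigl[\, y\,(w^\top x + w_0) \le 0 \,\bigr]
\qquad
\ell_{\text{hinge}}(x, y) = \max\bigl(0, \; 1 - y\,(w^\top x + w_0)\bigr)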

Optimally separating hyperplane with errors [Figure: a hyperplane with normal vector w; some points x lie on the wrong side or within the margin.]

Support vector machine: optimally separating hyperplane In practice we allow for error terms in case there is no separating hyperplane; see the soft-margin formulation below.
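A standard soft-margin formulation consistent with this slide (the slack variables xi_i and the penalty constant C are the usual names; the slide may denote them differently):

\min_{w, w_0, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i
\quad \text{subject to} \quad y_i (w^\top x_i + w_0) \ge 1 - \xi_i, \;\; \xi_i \ge 0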

SVM software There is plenty of SVM software out there. Two popular packages: SVM-light and LIBSVM.
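As a minimal usage sketch (scikit-learn's SVC is built on LIBSVM; the toy data here are made up for illustration):

import numpy as np
from sklearn.svm import SVC

# Two small clusters that a line can separate.
X = np.array([[0.0, 0.0], [0.5, 0.2], [2.0, 2.0], [2.5, 1.8]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("w:", clf.coef_[0])                  # normal vector of the learned hyperplane
print("w0:", clf.intercept_[0])            # offset
print("support vectors:", clf.support_vectors_)
print("prediction:", clf.predict([[1.9, 2.1]]))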

Kernels What if no separating hyperplane exists? Consider the XOR function. In a higher dimensional space we can find a separating hyperplane. (Example with SVM-light; a plain numpy sketch of the same idea follows.)
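The lecture demo uses SVM-light; as a stand-in, here is a small numpy sketch of the idea: the XOR points are not linearly separable in two dimensions, but adding the product feature x1*x2 as a third coordinate makes them separable (the particular plane below is my choice, not from the slide):

import numpy as np

# XOR data: no line in the (x1, x2) plane separates the two classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

# Map to 3D by appending the product feature x1*x2.
X3 = np.column_stack([X, X[:, 0] * X[:, 1]])

# In this space the plane x1 + x2 - 2*(x1*x2) = 0.5 separates the classes.
w, w0 = np.array([1.0, 1.0, -2.0]), -0.5
print(np.sign(X3 @ w + w0))   # [-1.  1.  1. -1.], matching y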

Kernels The solution to the SVM is obtained by applying the KKT conditions (a generalization of Lagrange multipliers). The problem to solve becomes the dual problem from the earlier slide, in which the training data enter only through the dot products x_i^T x_j.

Kernels The previous problem can in turn be solved with the KKT conditions. The dot product can be replaced by a matrix K(i,j) = x_i^T x_j, or more generally by any positive definite kernel matrix K.
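An illustrative sketch of using an explicit kernel matrix (again via scikit-learn's LIBSVM-backed SVC rather than the packages named earlier; the Gaussian/RBF kernel and the toy data are my choices):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sign(X[:, 0] * X[:, 1])            # XOR-like labels, not linearly separable
y[y == 0] = 1

def rbf_kernel(A, B, gamma=1.0):
    # K[i, j] = exp(-gamma * ||a_i - b_j||^2), a positive definite kernel
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

K = rbf_kernel(X, X)                       # Gram matrix over the training points
clf = SVC(kernel="precomputed", C=1.0).fit(K, y)

# At test time, supply kernel values between test points and training points.
K_test = rbf_kernel(X[:5], X)
print(clf.predict(K_test))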

Kernels With the kernel approach we can avoid explicit calculation of features in high dimensions. How do we find the best kernel? Multiple Kernel Learning (MKL) addresses this by learning K as a linear combination of base kernels.