Support Vector Machine


Support Vector Machine: A Brief Introduction

Maximal-Margin Classification (I)
- Consider a 2-class problem in R^d.
- As needed (and without loss of generality), relabel the classes to -1 and +1.
- Suppose we have a separating hyperplane. Its equation is: w.x + b = 0
- w is normal to the hyperplane.
- |b|/||w|| is the perpendicular distance from the hyperplane to the origin, where ||w|| is the Euclidean norm of w.
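A minimal numeric sketch of these quantities, assuming NumPy and using a made-up w and b purely for illustration: the norm of the normal vector, the distance from the hyperplane to the origin, and which side of the hyperplane a point falls on.

import numpy as np

# Hypothetical hyperplane w.x + b = 0 in R^2 (values chosen for illustration)
w = np.array([3.0, 4.0])
b = -5.0

norm_w = np.linalg.norm(w)           # ||w|| = 5.0 (Euclidean norm)
dist_to_origin = abs(b) / norm_w     # |b|/||w|| = 1.0

x = np.array([2.0, 1.0])
side = np.sign(w @ x + b)            # +1 or -1: which side of the hyperplane x lies on
print(norm_w, dist_to_origin, side)  # 5.0 1.0 1.0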

Maximal-Margin Classification (II)
- We can certainly choose w and b in such a way that:
  w.xi + b > 0 when yi = +1
  w.xi + b < 0 when yi = -1
- Rescaling w and b so that the points closest to the hyperplane satisfy |w.xi + b| = 1, we can rewrite the above as:
  w.xi + b ≥ +1 when yi = +1   (1)
  w.xi + b ≤ -1 when yi = -1   (2)

Maximal-Margin Classification (III)
- Consider the case when (1) is an equality: w.xi + b = +1 (call this hyperplane H+). Its normal is w, and its distance from the origin is |1-b|/||w||.
- Similarly for (2): w.xi + b = -1 (H-), with distance from the origin |-1-b|/||w||.
- We now have two hyperplanes, both parallel to the original one.

Maximal-Margin Classification (IV)

Maximal-Margin Classification (V)
- Note that the points lying on H- and H+ are sufficient to define H- and H+, and therefore sufficient to build a linear classifier.
- Define the margin as the distance between H- and H+.
- What would be a good choice for w and b? Maximize the margin.

Maximal-Margin Classification (VI)
- From the equations of H- and H+, the distance between the two parallel hyperplanes is:
  Margin = |(1-b) - (-1-b)| / ||w|| = 2/||w||
- So, we can maximize the margin by:
  Minimizing ||w||²
  Subject to: yi(w.xi + b) - 1 ≥ 0 for all i (see (1) and (2) above)
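To make the optimization concrete, here is a small sketch that solves this primal problem (minimize ||w||² subject to yi(w.xi + b) - 1 ≥ 0) on a toy, linearly separable 2-D dataset. It assumes NumPy and SciPy; the data and variable names are illustrative, and SLSQP is used only because it handles inequality constraints, not because it is how production SVM solvers work.

import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data: two points per class (illustrative only)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])

def objective(params):
    w = params[:2]
    return 0.5 * (w @ w)                # (1/2)||w||^2

def constraint(params, i):
    w, b = params[:2], params[2]
    return y[i] * (X[i] @ w + b) - 1.0  # must be >= 0

cons = [{'type': 'ineq', 'fun': constraint, 'args': (i,)} for i in range(len(y))]
x0 = np.array([1.0, 1.0, 0.0])          # feasible starting point for this toy data
res = minimize(objective, x0, method='SLSQP', constraints=cons)

w_opt, b_opt = res.x[:2], res.x[2]
print("w =", w_opt, " b =", b_opt, " margin =", 2 / np.linalg.norm(w_opt))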

Minimizing ||w||²
- Use Lagrange multipliers, one per constraint (i.e., one per training instance).
- For constraints of the form ci ≥ 0 (see above), the constraint equations are multiplied by positive Lagrange multipliers and subtracted from the objective function.
- Hence, we have the (primal) Lagrangian:
  LP = ½||w||² − Σi αi [yi(w.xi + b) − 1],  with αi ≥ 0

Maximizing LD
- It turns out, after some transformations beyond the scope of our discussion, that minimizing LP is equivalent to maximizing the following dual Lagrangian:
  LD = Σi αi − ½ Σi Σj αi αj yi yj <xi, xj>
  subject to: αi ≥ 0 and Σi αi yi = 0
  where <xi, xj> denotes the dot product.
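In practice the dual is solved by a dedicated QP/SMO solver rather than by hand. A sketch using scikit-learn's SVC (an assumption about tooling, not part of the slides): after fitting a linear-kernel SVM, the support vectors and the products αi·yi are exposed, and w = Σi αi yi xi can be recovered from them.

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])

# A large C approximates the hard-margin formulation above
clf = SVC(kernel='linear', C=1e6).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors
w_from_dual = clf.dual_coef_ @ clf.support_vectors_
print("support vectors:\n", clf.support_vectors_)
print("w from dual:", w_from_dual, " w from sklearn:", clf.coef_)
print("b:", clf.intercept_)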

SVM Learning (I)
- We could stop here and we would have a nice linear classification algorithm.
- SVM goes one step further: it assumes that problems that are not linearly separable in low dimensions may become linearly separable in higher dimensions (e.g., XOR).

SVM Learning (II)
- SVM thus:
  - Creates a non-linear mapping from the low-dimensional space to a higher-dimensional space
  - Uses maximal-margin (MM) learning in the new space
- Computation is efficient when "good" transformations are selected (typically, combinations of existing dimensions): the kernel trick.
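A brief sketch of this idea on the XOR problem mentioned above (assuming scikit-learn; the C and gamma values are arbitrary illustrative choices): a linear SVM cannot separate XOR, while an RBF-kernel SVM, which implicitly maps the data into a higher-dimensional space, separates it perfectly.

import numpy as np
from sklearn.svm import SVC

# XOR: not linearly separable in R^2
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, +1, +1, -1])

linear = SVC(kernel='linear', C=1.0).fit(X, y)
rbf = SVC(kernel='rbf', gamma=1.0, C=10.0).fit(X, y)

print("linear accuracy:", linear.score(X, y))  # <= 0.75: XOR is not linearly separable
print("rbf accuracy:   ", rbf.score(X, y))     # 1.0: separable in the implicit feature space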

Choosing a Transformation (I)
- Recall the formula for LD: note that it involves a dot product.
- Dot products are expensive to compute in high dimensions.
- What if we did not have to compute them there?

Choosing a Transformation (II)
- It turns out that it is possible to design transformations φ such that <φ(x), φ(y)> can be expressed in terms of <x, y>.
- Hence, one needs only compute dot products in the original, lower-dimensional space.
- Example: φ: R² → R³ where φ(x) = (x1², √2·x1x2, x2²); then <φ(x), φ(y)> = <x, y>².
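A tiny numerical check of this claim (NumPy assumed, vectors chosen arbitrarily): applying the explicit map φ and then taking a dot product gives the same number as squaring the dot product in the original space.

import numpy as np

def phi(v):
    # Explicit map R^2 -> R^3 from the slide: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

lhs = phi(x) @ phi(y)   # dot product in the 3-D feature space
rhs = (x @ y) ** 2      # kernel computed in the original 2-D space
print(lhs, rhs)         # both 121.0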

Choosing a Kernel
- One can start from a desired feature space and try to construct a kernel for it.
- More often, one starts from a reasonable kernel and may not analyze the feature space at all.
- Some kernels are a better fit for certain problems; domain knowledge can be helpful.
- Common kernels: polynomial, Gaussian (RBF), sigmoidal, application-specific.
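For reference, the common kernels listed above written as plain NumPy functions; the parameter names and default values (degree, c, sigma, a, r) are illustrative choices, not prescribed by the slides.

import numpy as np

def polynomial_kernel(x, y, degree=3, c=1.0):
    return (x @ y + c) ** degree

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(x, y, a=0.01, r=0.0):
    return np.tanh(a * (x @ y) + r)

Libraries such as scikit-learn accept these as built-in options (kernel='poly', 'rbf', 'sigmoid') or as a user-supplied callable, which is how an application-specific kernel would typically be plugged in.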

SVM Notes
- Excellent empirical and theoretical potential.
- Multi-class problems are not handled naturally.
- How to choose the kernel is the main learning parameter; each kernel also brings further parameters to set (degree of polynomials, variance of Gaussians, etc.).
- Speed and size, for both training and testing: how to handle very large training sets is not yet solved.
- Maximal margin can overfit due to noise, or the problem may not be linearly separable within a reasonable feature space.
- Soft margin is a common solution: it adds slack variables ξi ≥ 0 to the constraints, which in the dual constrains the multipliers to 0 ≤ αi ≤ C. The parameter C controls how much outliers are tolerated. How to pick C?
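The question of how to pick C (and the kernel parameters) is usually answered by cross-validation. A minimal sketch with scikit-learn's GridSearchCV on a synthetic dataset; the candidate grids are arbitrary illustrative values, not recommended settings.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data purely for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)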