Slide 1: Support Vector Machine (SVM) Classification
Greg Grudic

Slide 2: Today's Lecture Goals
Support Vector Machine (SVM) Classification: another algorithm for finding linear separating hyperplanes.
A good text on SVMs: Bernhard Schölkopf and Alex Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.

Slide 3: Support Vector Machine (SVM) Classification
Classification as a problem of finding optimal (canonical) linear hyperplanes.
Optimal linear separating hyperplanes can be found:
- in input space
- in kernel space, where the resulting decision boundary in input space can be nonlinear

Slide 4: Linear Separating Hyper-Planes
How many lines can separate these points? Which line should we use?

Slide 5: Initial Assumption: Linearly Separable Data

Slide 6: Linear Separating Hyper-Planes

Slide 7: Linear Separating Hyper-Planes
Given the training data, finding a separating hyperplane can be posed as a constraint satisfaction problem (CSP), written below in two equivalent forms. If the data are linearly separable, there are an infinite number of hyperplanes that satisfy this CSP.
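In standard notation, with training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, $\mathbf{x}_i \in \mathbb{R}^d$, $y_i \in \{-1, +1\}$, the CSP asks for parameters $(\mathbf{w}, b)$ such that
$$y_i(\mathbf{w}^\top \mathbf{x}_i + b) > 0, \quad i = 1, \dots, N,$$
or, equivalently (after rescaling $\mathbf{w}$ and $b$),
$$y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, N.$$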

Slide 8: The Margin of a Classifier
- Take any hyperplane (P0) that separates the data.
- Put a parallel hyperplane (P1) through the point in class +1 closest to P0.
- Put a second parallel hyperplane (P2) through the point in class -1 closest to P0.
- The margin (M) is the perpendicular distance between P1 and P2.

Slide 9: Calculating the Margin of a Classifier
P0: any separating hyperplane.
P1: parallel to P0, passing through the closest point in one class.
P2: parallel to P0, passing through the closest point in the opposite class.
Margin (M): the distance measured along a line perpendicular to P1 and P2.

Slide 10: SVM Constraints on the Model Parameters
The model parameters must be chosen so that the constraint holds with equality for points on P1 and for points on P2 (see below). For any P0, these constraints are always attainable. Given the above, the linear separating boundary lies halfway between P1 and P2. The resulting classifier is given below.
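In standard notation, the canonical constraints are
$$\mathbf{w}^\top \mathbf{x} + b = +1 \text{ for points on P1}, \qquad \mathbf{w}^\top \mathbf{x} + b = -1 \text{ for points on P2},$$
with $y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1$ for all training points. The separating boundary halfway between P1 and P2 is $\mathbf{w}^\top \mathbf{x} + b = 0$, and the resulting classifier is
$$\hat{y}(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b).$$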

Slide 11: Signed Distance to a Hyperplane
Remember the signed distance from a point to a hyperplane defined by $(\mathbf{w}, b)$, written below.
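For the hyperplane $\{\mathbf{x} : \mathbf{w}^\top \mathbf{x} + b = 0\}$, the signed distance from a point $\mathbf{x}$ is
$$d(\mathbf{x}) = \frac{\mathbf{w}^\top \mathbf{x} + b}{\lVert \mathbf{w} \rVert}.$$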

Slide 12: Calculating the Margin (1)

Slide 13: Calculating the Margin (2)
Start from the signed distance and take its absolute value to get the unsigned margin, as below.
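Under the canonical constraints, the closest points in each class satisfy $\mathbf{w}^\top \mathbf{x} + b = \pm 1$, so each lies at signed distance $\pm 1 / \lVert \mathbf{w} \rVert$ from the boundary. Taking absolute values and adding the two distances gives the margin
$$M = \frac{2}{\lVert \mathbf{w} \rVert}.$$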

Slide 14: Different P0's Have Different Margins
P0: any separating hyperplane. P1: parallel to P0, passing through the closest point in one class. P2: parallel to P0, passing through the closest point in the opposite class. Margin (M): distance measured along a line perpendicular to P1 and P2.

Slide 15: Different P0's Have Different Margins
(Same figure annotations as Slide 14, shown for a different choice of P0.)

Slide 16: Different P0's Have Different Margins
(Same figure annotations as Slide 14, shown for another choice of P0.)

Slide 17: How Do SVMs Choose the Optimal Separating Hyperplane (Boundary)?
Find the hyperplane that maximizes the margin! Margin (M): distance measured along a line perpendicular to P1 and P2.

Slide 18: SVM as a Constrained Optimization Problem
Given the training data, minimize the objective below subject to the margin constraints. The Lagrange function formulation is used to solve this minimization problem.
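In standard form, the hard-margin SVM primal is
$$\min_{\mathbf{w},\, b}\ \tfrac{1}{2} \lVert \mathbf{w} \rVert^2 \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, N,$$
which is equivalent to maximizing the margin $M = 2 / \lVert \mathbf{w} \rVert$.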

Slide 19: The Lagrange Function Formulation
For every constraint we introduce a Lagrange multiplier. The Lagrangian is then defined as below, where the primal variables are $(\mathbf{w}, b)$ and the dual variables are the multipliers. Goal: minimize the Lagrangian with respect to the primal variables and maximize it with respect to the dual variables.
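With a multiplier $\alpha_i \ge 0$ for each constraint $y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1$, the Lagrangian is
$$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \tfrac{1}{2} \lVert \mathbf{w} \rVert^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i(\mathbf{w}^\top \mathbf{x}_i + b) - 1 \right].$$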

Slide 20: Derivation of the Dual Problem
At the saddle point (an extremum with respect to the primal variables), the derivatives of the Lagrangian vanish. This gives the conditions below; substituting them back into the Lagrangian yields the dual problem.
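Setting the derivatives of $L$ with respect to $\mathbf{w}$ and $b$ to zero gives
$$\frac{\partial L}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0.$$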

Slide 21: The Dual Problem
Using the Lagrange function formulation, we obtain the dual problem: maximize the dual objective below, subject to the dual constraints.
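$$\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{N} \alpha_i - \tfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j\, \mathbf{x}_i^\top \mathbf{x}_j \quad \text{subject to} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{N} \alpha_i y_i = 0.$$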

Slide 22: Properties of the Dual Problem
- Solving the dual gives a solution to the original constrained optimization problem.
- For SVMs, the dual is a quadratic optimization problem, which has a globally optimal solution.
- The dual gives insight into the nonlinear formulation of SVMs.

Slide 23: Support Vector Expansion (1)
The weight vector can be written in terms of the optimal dual variables, so the classifier has two equivalent forms; the bias b is also computed from the optimal dual variables.

Slide 24: Support Vector Expansion (2)
Substituting the expansion of w into the decision function gives the equivalent form below.
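In standard notation, substituting $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ into the decision function gives the support vector expansion
$$\hat{y}(\mathbf{x}) = \operatorname{sign}\!\left( \mathbf{w}^\top \mathbf{x} + b \right) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i y_i\, \mathbf{x}_i^\top \mathbf{x} + b \right),$$
where only the training points with $\alpha_i > 0$ (the support vectors) contribute.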

Slide 25: What Are the Support Vectors?
The support vectors are the training points that lie on the margin hyperplanes P1 and P2 of the maximized margin.

Slide 26: Why Do We Want a Model with Only a Few SVs?
Leaving out an example that does not become a support vector gives the same solution!
Theorem (Vapnik and Chervonenkis, 1974): let #SV(N) be the number of support vectors obtained by training on N examples randomly drawn from P(X, Y), and let E denote expectation. Then the expected test error is bounded as below.
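In its usual statement, the bound reads
$$E[\text{test error}] \le \frac{E[\#\mathrm{SV}(N)]}{N},$$
where the expectation on the left is over SVMs trained on $N-1$ randomly drawn examples. The intuition is the leave-one-out argument above: removing a non-support-vector leaves the solution unchanged, so only support vectors can cause leave-one-out errors.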

Slide 27: What Happens When the Data Are Not Separable: Soft Margin SVM
Add a slack variable to each margin constraint, allowing some points to violate the margin.

Slide 28: Soft Margin SVM: Constrained Optimization Problem
Given the training data, minimize the penalized objective below subject to the relaxed margin constraints.
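In standard form, the soft-margin primal is
$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}}\ \tfrac{1}{2} \lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0,$$
where $C > 0$ trades off margin width against training errors.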

Slide 29: Dual Problem (Non-Separable Data)
Maximize the same dual objective as before, subject to box constraints on the dual variables, as below.
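$$\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{N} \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, \mathbf{x}_i^\top \mathbf{x}_j \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{N} \alpha_i y_i = 0.$$
The only change from the separable case is the upper bound $C$ on each $\alpha_i$.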

Slide 30: Same Decision Boundary

Slide 31: Mapping into a Nonlinear Space
Goal: the data are linearly separable (or almost separable) in the nonlinear feature space.

Slide 32: Nonlinear SVMs
KEY IDEA: both the decision boundary and the dual optimization formulation use the input vectors only through dot products!

Slide 33: Kernel Trick
Replace the inner product in input space with a kernel evaluation, as below; the same algorithms can then be used in the nonlinear kernel space!
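$$\mathbf{x}_i^\top \mathbf{x}_j \;\longrightarrow\; k(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle,$$
where $\phi$ maps the input into a (possibly very high dimensional) feature space and $k$ computes the inner product there without ever forming $\phi(\mathbf{x})$ explicitly.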

Slide 34: Nonlinear SVMs
The dual objective to maximize and the resulting decision boundary are as below, with the kernel in place of the inner product.
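$$\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{N} \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, k(\mathbf{x}_i, \mathbf{x}_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{N} \alpha_i y_i = 0,$$
$$\hat{y}(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i y_i\, k(\mathbf{x}_i, \mathbf{x}) + b \right).$$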

Slide 35: Need Mercer Kernels
The kernel must satisfy Mercer's condition so that it corresponds to an inner product in some feature space.

Slide 36: Gram (Kernel) Matrix
Given the training data, the Gram (kernel) matrix K has entries K_ij = k(x_i, x_j). Properties: N by N, symmetric, positive (semi-)definite, positive on the diagonal.

Slide 37: Commonly Used Mercer Kernels
- Polynomial
- Sigmoid
- Gaussian
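In their usual forms (parameter names are the conventional ones, not necessarily those on the slide):
$$k_{\mathrm{poly}}(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^\top \mathbf{x}' + c)^p, \qquad k_{\mathrm{sig}}(\mathbf{x}, \mathbf{x}') = \tanh(\kappa\, \mathbf{x}^\top \mathbf{x}' + \theta), \qquad k_{\mathrm{gauss}}(\mathbf{x}, \mathbf{x}') = \exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\sigma^2} \right).$$
(The sigmoid kernel satisfies Mercer's condition only for some choices of $\kappa$ and $\theta$.)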

Slide 38: Why These Kernels?
There are infinitely many kernels:
- The best kernel is data-set dependent.
- We can only know which kernels are good by trying them and estimating error rates on future data.
Definition: a universal approximator is a mapping that can model any surface (i.e., any many-to-one mapping) arbitrarily well.
Motivation for the most commonly used kernels:
- Polynomials (given enough terms) are universal approximators; however, polynomial kernels are not universal approximators because they cannot represent all polynomial interactions.
- Sigmoid functions (given enough training examples) are universal approximators.
- Gaussian kernels (given enough training examples) are universal approximators.
- These kernels have been shown to produce good models in practice.

Slide 39: Picking a Model (a Kernel for SVMs)?
How do you pick the kernel and its kernel parameters? These are called learning parameters or hyperparameters. Two approaches to choosing hyperparameters:
- Bayesian: hyperparameters are chosen to maximize the probability of correct classification on future data, based on prior biases.
- Frequentist: use the training data to learn the model parameters, and use validation data to pick the best hyperparameters (see the sketch below).
More on hyperparameter selection later.
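As a concrete illustration of the frequentist approach, the following sketch uses scikit-learn (whose SVC class wraps LIBSVM) to pick a kernel and its hyperparameters by cross-validated grid search. The data set and parameter grid are placeholders, not values from the lecture.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    # Placeholder data standing in for a real training set.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Hyperparameter grid: kernel choice, soft-margin C, and kernel parameters.
    param_grid = [
        {"kernel": ["rbf"], "C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]},
        {"kernel": ["poly"], "C": [0.1, 1, 10], "degree": [2, 3, 4]},
    ]

    # 5-fold cross-validation on the training data plays the role of validation data.
    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X_train, y_train)

    print("Best hyperparameters:", search.best_params_)
    print("Held-out accuracy:", search.best_estimator_.score(X_test, y_test))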

Slides 40-42: (figures only)

Slide 43: Some SVM Software
- LIBSVM
- SVM Light
- TinySVM
- WEKA (has many ML algorithm implementations in Java)
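For reference, a minimal sketch of training an SVM through scikit-learn's SVC (which is built on LIBSVM); the synthetic data, RBF kernel, and parameter values are illustrative choices, not ones prescribed by the lecture.

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    # Two well-separated blobs as a stand-in for a two-class training set.
    X, y = make_blobs(n_samples=200, centers=2, random_state=0)

    # Gaussian (RBF) kernel, soft-margin SVM; C and gamma are illustrative values.
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X, y)

    # Only the support vectors define the decision boundary.
    print("Support vectors per class:", clf.n_support_)
    print("Training accuracy:", clf.score(X, y))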

Slide 44: MNIST: An SVM Success Story
Handwritten digit benchmark:
- 60,000 training and 10,000 test examples
- Dimension d = 28 x 28 = 784

Slide 45: Results on Test Data
The SVM used a polynomial kernel of degree 9.

Slide 46: SVM (Kernel) Model Structure