Lecture 10: Support Vector Machines

CSE 881: Data Mining, Lecture 10: Support Vector Machines

Support Vector Machines: find a linear hyperplane (decision boundary) that separates the data.

Support Vector Machines: one possible solution.

Support Vector Machines: another possible solution.

Support Vector Machines: other possible solutions.

Support Vector Machines: which one is better, B1 or B2? How do you define "better"?

Support Vector Machines: find the hyperplane that maximizes the margin => B1 is better than B2.

SVM vs. Perceptron: the perceptron minimizes the least squares error (by gradient descent), while the SVM maximizes the margin.
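
To make the contrast concrete, here is a minimal Python sketch (my own toy data, assuming NumPy and scikit-learn are available): both classifiers separate the same data, but the perceptron stops at whichever separating hyperplane it finds, while the linear SVM returns the maximum-margin one.

import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.5, (20, 2)),     # class +1 cluster
               rng.normal([-2, -2], 0.5, (20, 2))])  # class -1 cluster
y = np.array([1] * 20 + [-1] * 20)

perc = Perceptron().fit(X, y)                # stops at any separating hyperplane
svm = SVC(kernel="linear", C=1e6).fit(X, y)  # very large C approximates a hard margin

print("perceptron w, b:", perc.coef_[0], perc.intercept_[0])
print("linear SVM w, b:", svm.coef_[0], svm.intercept_[0])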

Support Vector Machines: the decision boundary is the hyperplane w·x + b = 0, and the margin is the distance 2/||w|| between the two parallel hyperplanes w·x + b = 1 and w·x + b = −1.

Learning Linear SVM: we want to maximize the margin 2/||w||, which is equivalent to minimizing L(w) = ||w||²/2, subject to the constraints w·x_i + b ≥ 1 if y_i = 1 and w·x_i + b ≤ −1 if y_i = −1, or equivalently y_i(w·x_i + b) ≥ 1. This is a constrained optimization problem; solve it using the Lagrange multiplier method.
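
As a rough illustration of this constrained problem (not the lecture's own solver), the sketch below hands the primal, minimize ||w||²/2 subject to y_i(w·x_i + b) ≥ 1, to SciPy's general-purpose SLSQP optimizer on a tiny made-up data set; in practice a dedicated quadratic-programming solver or an SVM library would be used.

import numpy as np
from scipy.optimize import minimize

# Tiny made-up, linearly separable data set (two points per class).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(p):                      # p = (w1, w2, b)
    w = p[:2]
    return 0.5 * np.dot(w, w)          # ||w||^2 / 2

constraints = [                        # y_i * (w . x_i + b) - 1 >= 0 for every i
    {"type": "ineq", "fun": lambda p, i=i: y[i] * (np.dot(p[:2], X[i]) + p[2]) - 1.0}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b)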

Unconstrained optimization: min E(w1, w2) = w1² + w2². The point w1 = 0, w2 = 0 minimizes E(w1, w2); the minimum value is E(0, 0) = 0.
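
A few lines of Python are enough to see this (a sketch, not part of the original slides): gradient descent on E(w1, w2) = w1² + w2² drives both weights toward zero, where the minimum E(0, 0) = 0 is attained.

import numpy as np

w = np.array([3.0, -4.0])   # arbitrary starting point
lr = 0.1                    # step size
for _ in range(100):
    grad = 2.0 * w          # gradient of E: (2*w1, 2*w2)
    w = w - lr * grad
print(w)                    # very close to [0, 0], where E(0, 0) = 0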

Constrained optimization: min E(w1, w2) = w1² + w2² subject to 2w1 + w2 = 4 (an equality constraint). Dual problem: maximize over λ the minimum over (w1, w2) of the Lagrangian L(w1, w2, λ) = w1² + w2² + λ(4 − 2w1 − w2).
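
Working the same toy problem numerically (a sketch using SciPy, not from the slides): setting the Lagrangian's derivatives to zero gives w1 = 1.6, w2 = 0.8 with multiplier λ = 1.6, and a constrained solver returns the same point.

import numpy as np
from scipy.optimize import minimize

res = minimize(
    fun=lambda w: w[0] ** 2 + w[1] ** 2,                        # E(w1, w2)
    x0=np.zeros(2),
    constraints=[{"type": "eq", "fun": lambda w: 2 * w[0] + w[1] - 4}],
)
print(res.x)   # approximately [1.6, 0.8], matching the Lagrange-multiplier solution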

Learning Linear SVM, the Lagrange multiplier method: form the Lagrangian L = ||w||²/2 − Σ_i λ_i [y_i(w·x_i + b) − 1]. Taking derivatives with respect to w and b and setting them to zero gives w = Σ_i λ_i y_i x_i and Σ_i λ_i y_i = 0. Additional constraints: λ_i ≥ 0. Dual problem: maximize Σ_i λ_i − ½ Σ_i Σ_j λ_i λ_j y_i y_j (x_i·x_j) subject to λ_i ≥ 0 and Σ_i λ_i y_i = 0.
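
The dual can be solved directly for a tiny made-up data set; the sketch below (my own, using SciPy rather than a proper QP solver) maximizes Σ λ_i − ½ Σ λ_i λ_j y_i y_j x_i·x_j under λ_i ≥ 0 and Σ λ_i y_i = 0 by minimizing its negative.

import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
G = (y[:, None] * X) @ (y[:, None] * X).T    # G_ij = y_i y_j (x_i . x_j)

def neg_dual(a):                             # negate so a minimizer maximizes the dual
    return 0.5 * a @ G @ a - a.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(len(y)),
    bounds=[(0, None)] * len(y),                           # lambda_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum_i lambda_i y_i = 0
)
print("multipliers:", np.round(res.x, 4))    # nonzero entries mark the support vectors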

Learning Linear SVM, bigger picture: the learning algorithm needs to find w and b. To solve for w: w = Σ_i λ_i y_i x_i. But λ_i is zero for points that do not reside on the margin hyperplanes w·x + b = ±1. Data points whose λ_i are not zero are called support vectors.
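
With a library this is visible directly; the sketch below (assuming scikit-learn, with made-up points) fits a linear SVM, reads off the support vectors, and reconstructs w = Σ_i λ_i y_i x_i from the stored multipliers.

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [4.0, 1.0],
              [-2.0, -2.0], [-3.0, -1.0], [-1.0, -4.0]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)       # very large C ~ hard margin

# dual_coef_ holds lambda_i * y_i for the support vectors only.
w_from_multipliers = clf.dual_coef_ @ clf.support_vectors_
print("support vectors:\n", clf.support_vectors_)
print("w from multipliers:", w_from_multipliers)
print("w reported by sklearn:", clf.coef_)        # identical for the linear kernel
print("b:", clf.intercept_)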

Example of Linear SVM: support vectors.

Learning Linear SVM, bigger picture: the decision boundary depends only on the support vectors. If you have a data set with the same support vectors, the decision boundary will not change. How do you classify using the SVM once w and b are found? Given a test record x_i, take the sign of f(x_i) = w·x_i + b: predict class +1 if f(x_i) ≥ 0 and class −1 otherwise.
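
A minimal sketch of this classification rule (the weights and bias shown are hypothetical, not from the lecture's example):

import numpy as np

w = np.array([0.5, 0.5])    # hypothetical learned weights
b = -0.5                    # hypothetical learned bias

def classify(x):
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([3.0, 2.0])))     # +1: falls on the positive side
print(classify(np.array([-1.0, -2.0])))   # -1: falls on the negative side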

Support Vector Machines: what if the problem is not linearly separable?

Support Vector Machines: what if the problem is not linearly separable? Introduce slack variables ξ_i ≥ 0. Need to minimize L(w) = ||w||²/2 + C (Σ_i ξ_i)^k, subject to y_i(w·x_i + b) ≥ 1 − ξ_i. If k is 1 or 2, this leads to the same objective function as the linear SVM but with different constraints (see textbook).
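
In a library this trade-off appears as a penalty parameter; the sketch below (assuming scikit-learn, with synthetic overlapping classes) varies C, which weighs the slack-variable penalty, so a small C tolerates more margin violations while a large C penalizes them heavily.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([1, 1], 1.0, (50, 2)),      # overlapping clusters,
               rng.normal([-1, -1], 1.0, (50, 2))])   # so no perfect separator exists
y = np.array([1] * 50 + [-1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")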

Nonlinear Support Vector Machines: what if the decision boundary is not linear?

Nonlinear Support Vector Machines, the trick: transform the data into a higher-dimensional space with a mapping Φ. Decision boundary: w·Φ(x) + b = 0.
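
A small sketch of the idea (synthetic data, assuming NumPy and scikit-learn): points labeled by a circular boundary are not linearly separable in the original 2-D space, but after the hand-picked map Φ(x) = (x1², x2², √2·x1x2) a linear SVM separates them almost perfectly.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5, 1, -1)   # circular decision boundary

def phi(X):
    # Map (x1, x2) -> (x1^2, x2^2, sqrt(2)*x1*x2); the circle becomes a plane here.
    return np.column_stack([X[:, 0] ** 2, X[:, 1] ** 2,
                            np.sqrt(2) * X[:, 0] * X[:, 1]])

print("linear SVM, original space:", SVC(kernel="linear").fit(X, y).score(X, y))
print("linear SVM, mapped space  :", SVC(kernel="linear").fit(phi(X), y).score(phi(X), y))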

Learning Nonlinear SVM, the optimization problem: minimize ||w||²/2 subject to y_i(w·Φ(x_i) + b) ≥ 1, which leads to the same set of equations as before (but involving Φ(x) instead of x).

Learning Nonlinear SVM, issues: What type of mapping function Φ should be used? How do you do the computation in the high-dimensional space? Most computations involve the dot product Φ(x_i)·Φ(x_j). Curse of dimensionality?

Learning Nonlinear SVM, the kernel trick: Φ(x_i)·Φ(x_j) = K(x_i, x_j), where K(x_i, x_j) is a kernel function expressed in terms of the coordinates in the original space. Examples: the polynomial kernel K(x, y) = (x·y + 1)^p and the Gaussian (RBF) kernel K(x, y) = exp(−||x − y||² / (2σ²)).
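
The identity can be checked numerically; the sketch below uses the degree-2 homogeneous polynomial kernel K(x, z) = (x·z)² in 2-D, whose explicit map is Φ(x) = (x1², x2², √2·x1x2) (a standard textbook example, not necessarily the one shown on the slide).

import numpy as np

def K(x, z):
    return np.dot(x, z) ** 2                  # kernel evaluated in the original space

def phi(x):
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(K(x, z))                   # 1.0
print(np.dot(phi(x), phi(z)))    # the same value, up to floating-point rounding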

Example of Nonlinear SVM: an SVM with a polynomial kernel of degree 2.
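
Roughly reproducing this kind of example (synthetic circular data, assuming scikit-learn; the exact data from the slide is not available): a degree-2 polynomial kernel learns the circular boundary directly in the original 2-D space.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5, 1, -1)   # circular boundary

clf = SVC(kernel="poly", degree=2, coef0=1.0).fit(X, y)  # (gamma*x.z + 1)^2 kernel
print("training accuracy:", clf.score(X, y))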

Learning Nonlinear SVM, advantages of using a kernel: you don't have to know the mapping function Φ, and computing the dot product Φ(x_i)·Φ(x_j) in the original space avoids the curse of dimensionality. Not all functions can be kernels: you must make sure there is a corresponding Φ in some high-dimensional space; see Mercer's theorem (in the textbook).
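
Mercer's condition can be probed numerically: a valid kernel's Gram matrix is positive semi-definite, so all of its eigenvalues are nonnegative (up to rounding). The sketch below (my own choice of functions) contrasts a Gaussian kernel with a similarity measure that is not a Mercer kernel.

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 2))

def gram(kernel, X):
    return np.array([[kernel(a, b) for b in X] for a in X])

rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))   # Gaussian kernel: a valid kernel
bad = lambda a, b: -np.sum((a - b) ** 2)           # negative squared distance: not a kernel

for name, k in [("Gaussian kernel", rbf), ("negative squared distance", bad)]:
    eigvals = np.linalg.eigvalsh(gram(k, X))
    print(f"{name}: smallest Gram-matrix eigenvalue = {eigvals.min():.4f}")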