SVMs, cont’d Intro to Bayesian learning. Quadratic programming Problems of the form Minimize: Subject to: are called “quadratic programming” problems.

Slides:



Advertisements
Similar presentations
Introduction to Support Vector Machines (SVM)
Advertisements

CSC321: Introduction to Neural Networks and Machine Learning Lecture 24: Non-linear Support Vector Machines Geoffrey Hinton.
Lecture 9 Support Vector Machines
Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
SVM - Support Vector Machines A new classification method for both linear and nonlinear data It uses a nonlinear mapping to transform the original training.
Classification / Regression Support Vector Machines

Support Vector Machines Instructor Max Welling ICS273A UCIrvine.
CHAPTER 10: Linear Discrimination
Support Vector Machines and Kernels Adapted from slides by Tim Oates Cognition, Robotics, and Learning (CORAL) Lab University of Maryland Baltimore County.
Support Vector Machines
1 Lecture 5 Support Vector Machines Large-margin linear classifier Non-separable case The Kernel trick.
SVM—Support Vector Machines
Search Engines Information Retrieval in Practice All slides ©Addison Wesley, 2008.
SVMs Reprised. Administrivia I’m out of town Mar 1-3 May have guest lecturer May cancel class Will let you know more when I do...
Discriminative and generative methods for bags of features
Support Vector Machines
Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
Support Vector Machines (SVMs) Chapter 5 (Duda et al.)
University of Texas at Austin Machine Learning Group Department of Computer Sciences University of Texas at Austin Support Vector Machines.
1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class.
Margins, support vectors, and linear programming Thanks to Terran Lane and S. Dreiseitl.
SVMs Concluded; R1. Administrivia HW1 returned today and there was much rejoicing... μ=15.8 σ=8.8 Remember: class is curved You don’t have to run faster.
Support Vector Machines and Kernel Methods
Support Vector Machines
SVMs Finalized. Where we are Last time Support vector machines in grungy detail The SVM objective function and QP Today Last details on SVMs Putting it.
Lecture outline Support vector machines. Support Vector Machines Find a linear hyperplane (decision boundary) that will separate the data.
SVMs Reprised Reading: Bishop, Sec 4.1.1, 6.0, 6.1, 7.0, 7.1.
Support Vector Machines
Lecture 10: Support Vector Machines
Greg GrudicIntro AI1 Support Vector Machine (SVM) Classification Greg Grudic.
Support Vector Machines a.k.a, Whirlwind o’ Vector Algebra Sec. 6.3 SVM Tutorial by C. Burges (on class “resources” page)
Ch. Eick: Support Vector Machines: The Main Ideas Reading Material Support Vector Machines: 1.Textbook 2. First 3 columns of Smola/Schönkopf article on.
Support Vector Machines Piyush Kumar. Perceptrons revisited Class 1 : (+1) Class 2 : (-1) Is this unique?
Linear hyperplanes as classifiers Usman Roshan. Hyperplane separators.
Support Vector Machine & Image Classification Applications
Support Vector Machines Mei-Chen Yeh 04/20/2010. The Classification Problem Label instances, usually represented by feature vectors, into one of the predefined.
Support Vector Machine (SVM) Based on Nello Cristianini presentation
1 SUPPORT VECTOR MACHINES İsmail GÜNEŞ. 2 What is SVM? A new generation learning system. A new generation learning system. Based on recent advances in.
计算机学院 计算感知 Support Vector Machines. 2 University of Texas at Austin Machine Learning Group 计算感知 计算机学院 Perceptron Revisited: Linear Separators Binary classification.
Kernel Methods A B M Shawkat Ali 1 2 Data Mining ¤ DM or KDD (Knowledge Discovery in Databases) Extracting previously unknown, valid, and actionable.
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
Classifiers Given a feature representation for images, how do we learn a model for distinguishing features from different classes? Zebra Non-zebra Decision.
CISC667, F05, Lec22, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines I.
CS 478 – Tools for Machine Learning and Data Mining SVM.
Kernel Methods: Support Vector Machines Maximum Margin Classifiers and Support Vector Machines.
Linear hyperplanes as classifiers Usman Roshan. Hyperplane separators.
University of Texas at Austin Machine Learning Group Department of Computer Sciences University of Texas at Austin Support Vector Machines.
Support vector machine LING 572 Fei Xia Week 8: 2/23/2010 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A 1.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Support Vector Machines. Notation Assume a binary classification problem. –Instances are represented by vector x   n. –Training examples: x = (x 1,
Final Exam Review CS479/679 Pattern Recognition Dr. George Bebis 1.
Text Classification using Support Vector Machine Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata.
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
Greg GrudicIntro AI1 Support Vector Machine (SVM) Classification Greg Grudic.
Kernel Methods: Support Vector Machines Maximum Margin Classifiers and Support Vector Machines.
SVMs in a Nutshell.
Support Vector Machine: An Introduction. (C) by Yu Hen Hu 2 Linear Hyper-plane Classifier For x in the side of o : w T x + b  0; d = +1; For.
An Introduction of Support Vector Machine In part from of Jinwei Gu.
Introduction to Machine Learning Prof. Nir Ailon Lecture 5: Support Vector Machines (SVM)
Support Vector Machines Reading: Textbook, Chapter 5 Ben-Hur and Weston, A User’s Guide to Support Vector Machines (linked from class web page)
Support vector machines
PREDICT 422: Practical Machine Learning
Support Vector Machine
An Introduction to Support Vector Machines
Support Vector Machines Introduction to Data Mining, 2nd Edition by
Statistical Learning Dong Liu Dept. EEIS, USTC.
CSSE463: Image Recognition Day 14
SVMs for Document Ranking
Support Vector Machines 2
Presentation transcript:

SVMs, cont’d Intro to Bayesian learning

Quadratic programming Problems of the form Minimize: Subject to: are called “quadratic programming” problems There are off-the-shelf methods to solve them Actually solving this is way, way beyond the scope of this class Consider it a black box If a solution exists, it will be found & be unique Expensive, but not intractably so

Nonseparable data What if the data isn’t linearly separable? Project into higher dim space (we’ll get there) Allow some “slop” in the system Allow margins to be violated “a little” w

The new “slackful” QP The are “slack variables” Allow margins to be violated a little Still want to minimize margin violations, so add them to QP instance: Minimize: Subject to:

You promised nonlinearity! Where did the nonlinear transform go in all this? Another clever trick With a little algebra (& help from Lagrange multipliers), can rewrite our QP in the form: Maximize: Subject to:

Kernel functions So??? It’s still the same linear system Note, though, that appears in the system only as a dot product: Can replace with : The inner product is called a “kernel function”

Why are kernel fns cool? The cool trick is that many useful projections can be written as kernel functions in closed form I.e., can work with K() rather than If you know K(X i,X j ) for every (i,j) pair, then you can construct the maximum margin hyperplane between the projected data without ever explicitly doing the projection!

Example kernels Homogeneous degree- k polynomial: Inhomogeneous degree- k polynomial: Gaussian radial basis function: Sigmoidal (neural network):

Side note on kernels What precisely do kernel functions mean? Metric functions take two points and return a (generalized) distance between them What is the equivalent interpretation for kernels? Hint: think about what kernel function replaces in the max margin QP formulation

Side note on kernels Kernel functions are generalized inner products Essentially, give you the cosine of the angle between vectors Recall the law of cosines:

Side note on kernels Replace traditional dot product with “generalized inner product” and get:

Using the classifier Solution of the QP gives back a set of Data points for which are called “support vectors” Turns out that we can write w as

Using the classifier Solution of the QP gives back a set of Data points for which are called “support vectors” Turns out that we can write w as And our classification rule for query pt was:

Using the classifier Solution of the QP gives back a set of Data points for which are called “support vectors” Turns out that we can write w as And our classification rule for query pt was: So:

Using the classifier SVM images from lecture notes by S. Dreiseitl Support vectors

Putting it all together Original (low dimensional) data

Putting it all together Original data matrix Kernel function Kernel matrix

Putting it all together Kernel + orig labels Maximize Subject to: Quadratic Program instance

Putting it all together Support Vector weights Maximize Subject to: Quadratic Program instance QP Solver subroutine

Putting it all together Support Vector weights Hyperplane in

Putting it all together Support Vector weights Final classifier

Putting it all together Final classifier Nonlinear classifier in

Final notes on SVMs Note that only for which actually contribute to final classifier This is why they are called support vectors All the rest of the training data can be discarded Complexity of training (& ability to generalize) based only on amount of training data Not based on dimension of hyperplane space ( ) Good classification performance In practice, SVMs among the strongest classifiers we have Closely related to neural nets, boosting, etc.