SVMs Reprised. Administrivia I’m out of town Mar 1-3 May have guest lecturer May cancel class Will let you know more when I do...

Slides:



Advertisements
Similar presentations
Support Vector Machine
Advertisements

CSC321: Introduction to Neural Networks and Machine Learning Lecture 24: Non-linear Support Vector Machines Geoffrey Hinton.
Lecture 9 Support Vector Machines
Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
SVM - Support Vector Machines A new classification method for both linear and nonlinear data It uses a nonlinear mapping to transform the original training.
An Introduction of Support Vector Machine
Classification / Regression Support Vector Machines

Support Vector Machines Instructor Max Welling ICS273A UCIrvine.
CHAPTER 10: Linear Discrimination
Support Vector Machines and Kernels Adapted from slides by Tim Oates Cognition, Robotics, and Learning (CORAL) Lab University of Maryland Baltimore County.
Support Vector Machines
1 Lecture 5 Support Vector Machines Large-margin linear classifier Non-separable case The Kernel trick.
SVM—Support Vector Machines
Search Engines Information Retrieval in Practice All slides ©Addison Wesley, 2008.
LOGO Classification IV Lecturer: Dr. Bo Yuan
Support Vector Machines
Support Vector Machines (and Kernel Methods in general)
Support Vector Machines and Kernel Methods
Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
University of Texas at Austin Machine Learning Group Department of Computer Sciences University of Texas at Austin Support Vector Machines.
1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class.
Margins, support vectors, and linear programming Thanks to Terran Lane and S. Dreiseitl.
Support Vector Machines and Kernel Methods
Support Vector Machines
1 Computational Learning Theory and Kernel Methods Tianyi Jiang March 8, 2004.
SVMs Finalized. Where we are Last time Support vector machines in grungy detail The SVM objective function and QP Today Last details on SVMs Putting it.
Lecture outline Support vector machines. Support Vector Machines Find a linear hyperplane (decision boundary) that will separate the data.
SVMs Reprised Reading: Bishop, Sec 4.1.1, 6.0, 6.1, 7.0, 7.1.
Support Vector Machines
Lecture 10: Support Vector Machines
Greg GrudicIntro AI1 Support Vector Machine (SVM) Classification Greg Grudic.
Support Vector Machines a.k.a, Whirlwind o’ Vector Algebra Sec. 6.3 SVM Tutorial by C. Burges (on class “resources” page)
SVMs, cont’d Intro to Bayesian learning. Quadratic programming Problems of the form Minimize: Subject to: are called “quadratic programming” problems.
Linear hyperplanes as classifiers Usman Roshan. Hyperplane separators.
Support Vector Machine & Image Classification Applications
CS 8751 ML & KDDSupport Vector Machines1 Support Vector Machines (SVMs) Learning mechanism based on linear programming Chooses a separating plane based.
Support Vector Machines Mei-Chen Yeh 04/20/2010. The Classification Problem Label instances, usually represented by feature vectors, into one of the predefined.
1 SUPPORT VECTOR MACHINES İsmail GÜNEŞ. 2 What is SVM? A new generation learning system. A new generation learning system. Based on recent advances in.
计算机学院 计算感知 Support Vector Machines. 2 University of Texas at Austin Machine Learning Group 计算感知 计算机学院 Perceptron Revisited: Linear Separators Binary classification.
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
Classifiers Given a feature representation for images, how do we learn a model for distinguishing features from different classes? Zebra Non-zebra Decision.
Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.
START OF DAY 5 Reading: Chap. 8. Support Vector Machine.
CISC667, F05, Lec22, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines I.
CS 478 – Tools for Machine Learning and Data Mining SVM.
Kernel Methods: Support Vector Machines Maximum Margin Classifiers and Support Vector Machines.
An Introduction to Support Vector Machine (SVM)
University of Texas at Austin Machine Learning Group Department of Computer Sciences University of Texas at Austin Support Vector Machines.
1  The Problem: Consider a two class task with ω 1, ω 2   LINEAR CLASSIFIERS.
Support vector machine LING 572 Fei Xia Week 8: 2/23/2010 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A 1.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Support Vector Machines. Notation Assume a binary classification problem. –Instances are represented by vector x   n. –Training examples: x = (x 1,
Text Classification using Support Vector Machine Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata.
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
Greg GrudicIntro AI1 Support Vector Machine (SVM) Classification Greg Grudic.
Kernel Methods: Support Vector Machines Maximum Margin Classifiers and Support Vector Machines.
SVMs in a Nutshell.
Support Vector Machine: An Introduction. (C) by Yu Hen Hu 2 Linear Hyper-plane Classifier For x in the side of o : w T x + b  0; d = +1; For.
Lecture 14. Outline Support Vector Machine 1. Overview of SVM 2. Problem setting of linear separators 3. Soft Margin Method 4. Lagrange Multiplier Method.
An Introduction of Support Vector Machine In part from of Jinwei Gu.
Introduction to Machine Learning Prof. Nir Ailon Lecture 5: Support Vector Machines (SVM)
LECTURE 20: SUPPORT VECTOR MACHINES PT. 1 April 11, 2016 SDS 293 Machine Learning.
Support Vector Machines Reading: Textbook, Chapter 5 Ben-Hur and Weston, A User’s Guide to Support Vector Machines (linked from class web page)
PREDICT 422: Practical Machine Learning
Support Vector Machine
Support Vector Machines Introduction to Data Mining, 2nd Edition by
Support Vector Machines
Statistical Learning Dong Liu Dept. EEIS, USTC.
SVMs for Document Ranking
Presentation transcript:

SVMs Reprised

Administrivia I’m out of town Mar 1-3 May have guest lecturer May cancel class Will let you know more when I do...

Tides of history Last time Linear classification (briefly) Multi-classes from sets of hyperplanes SVMs: nonlinear data projections Today Vector geometry The SVM objective function Linear programming (briefly) Slack variables

Exercise Given a hyperplane defined by a weight vector What is the equation for points on the surface of the hyperplane? What are the equations for points on the two margins? Give an expression for the distance between a point and the hyperplane (and/or either margin) What is the role of w 0 ?

5 minutes of math... A dot product (inner product) is a projection of one vector onto another When the projection of x onto w is equal to w 0, then x falls exactly onto the w hyperplane w Hyperplane X

5 minutes of math... BTW, are we sure that hyperplane is perpendicular to w ? Why?

5 minutes of math... BTW, are we sure that hyperplane is perpendicular to w ? Why? Consider any two vectors, x 1 and x 2, falling exactly on the hyperplane, then: is perpendicular to any vector in the hyperplane is some vector in the hyperplane

5 minutes of math... Projections on one side of the line have dot products >0... w Hyperplane x

5 minutes of math... Projections on one side of the line have dot products > and on the other, <0 w Hyperplane x

5 minutes of math... What is the distance from any vector x to the hyperplane? w x r =?

5 minutes of math... What is the distance from any vector x to the hyperplane? Write x as a point on plane + offset from plane w Unit vector in direction of w

5 minutes of math... Now:

5 minutes of math... Theorem: The distance, r, from any point x to the hyperplane defined by w and is given by: Lemma: The distance from the origin to the hyperplane is given by: Also: r>0 for points on one side of the hyperplane; r<0 for points on the other

Back to SVMs & margins The margins are parallel to hyperplane, so are defined by same w, plus constant offsets w b b

Back to SVMs & margins The margins are parallel to hyperplane, so are defined by same w, plus constant offsets Want to ensure that all data points are “outside” the margins w b b

Maximizing the margin So now we have a learning criterion function: Pick w to maximize b s.t. all points still satisfy Note: w.l.o.g. can rescale w arbitrarily (why?)

Maximizing the margin So now we have a learning criterion function: Pick w to maximize b s.t. all points still satisfy Note: w.l.o.g. can rescale w arbitrarily (why?) So can formulate full problem as: Minimize: Subject to: But how do you do that? And how does this help?

Quadratic programming Problems of the form Minimize: Subject to: are called “quadratic programming” problems There are off-the-shelf methods to solve them Actually solving this is way, way beyond the scope of this class Consider it a black box If a solution exists, it will be found & be unique Expensive, but not intractably so

Nonseparable data What if the data isn’t linearly separable? Project into higher dim space (we’ll get there) Allow some “slop” in the system Allow margins to be violated “a little” w

The new “slackful” QP The are “slack variables” Allow margins to be violated a little Still want to minimize margin violations, so add them to QP instance: Minimize: Subject to:

You promised nonlinearity! Where did the nonlinear transform go in all this? Another clever trick With a little algebra (& help from Lagrange multipliers), can rewrite our QP in the form: Maximize: Subject to:

Kernel functions So??? It’s still the same linear system Note, though, that appears in the system only as a dot product:

Kernel functions So??? It’s still the same linear system Note, though, that appears in the system only as a dot product: Can replace with : The inner product is called a “kernel function”

Why are kernel fns cool? The cool trick is that many useful projections can be written as kernel functions in closed form I.e., can work with K() rather than If you know K(x i,x j ) for every (i,j) pair, then you can construct the maximum margin hyperplane between the projected data without ever explicitly doing the projection!

Example kernels Homogeneous degree- k polynomial: Inhomogeneous degree- k polynomial: Gaussian radial basis function: Sigmoidal (neural network):

Side note on kernels What precisely do kernel functions mean? Metric functions take two points and return a (generalized) distance between them What is the equivalent interpretation for kernels? Hint: think about what kernel function replaces in the max margin QP formulation

Side note on kernels Kernel functions are generalized inner products Essentially, give you the cosine of the angle between vectors Recall the law of cosines:

Side note on kernels Replace traditional dot product with “generalized inner product” and get:

Side note on kernels Replace traditional dot product with “generalized inner product” and get: Kernel (essentially) represents: Angle between vectors in the projected, high- dimensional space

Using the classifier Solution of the QP gives back a set of Data points for which are called “support vectors” Turns out that we can write w as

Using the classifier And our classification rule for query pt was: So:

Using the classifier SVM images from lecture notes by S. Dreiseitl Support vectors