Recitation 6: Kernel SVM


Recitation 6: Kernel SVM
SVM revision. The kernel trick. Reproducing kernels. Examples.
Main source: the F2009 10-701 course taught by Carlos Guestrin.

SVM Primal
Find the maximum-margin hyperplane (hard margin).
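The primal problem itself is not reproduced in this transcript; the standard hard-margin formulation the slide describes is:

\[
\min_{w,b} \ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1, \quad i = 1,\dots,m .
\]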

SVM Primal
Find the maximum-margin hyperplane (soft margin).
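Again the equation is not in the transcript; the standard soft-margin primal, with slack variables xi_i and penalty C, is:

\[
\min_{w,b,\xi} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \ \ \xi_i \ge 0 .
\]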

SVM Dual
Find the maximum-margin hyperplane: the dual of the hard-margin SVM.
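The dual referenced on the slide is the standard hard-margin dual:

\[
\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i \;-\; \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \ \ \sum_{i=1}^{m} \alpha_i y_i = 0 .
\]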

SVM Dual
Dual for the hard-margin SVM: substituting for w in terms of the alpha, notice that the points for which alpha_i = 0 do not contribute to w.
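The substitution the slide refers to is the standard stationarity condition of the Lagrangian (not reproduced in the transcript):

\[
w = \sum_{i=1}^{m} \alpha_i y_i x_i ,
\]

so any training point with alpha_i = 0 drops out of the sum.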

SVM Dual
Dual for the hard-margin SVM: the margin constraints are active exactly for the support vectors.
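This is the complementary-slackness condition from the KKT conditions:

\[
\alpha_i \left[ y_i (w^\top x_i + b) - 1 \right] = 0 ,
\]

so alpha_i > 0 only for points lying on the margin, i.e. the support vectors.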

SVM – Computing w
Dual for the hard-margin SVM: solve the dual to get the alpha_i, then recover w and b.
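The recovery step, not shown in the transcript but consistent with the earlier slides, is:

\[
w = \sum_{i=1}^{m} \alpha_i y_i x_i , \qquad
b = y_k - w^\top x_k \ \ \text{for any support vector } x_k .
\]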

SVM – Computing w
Dual for the soft-margin SVM: solve, get the alpha_i. The only difference from the separable case is the upper bound on alpha. The intuition: in the separable case, a violated constraint would drive the corresponding alpha to infinity; here, alpha is capped at C. If we were only interested in linear classifiers, we would basically be done.
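A sketch of the soft-margin dual, which differs from the hard-margin dual only in the box constraint on alpha:

\[
\max_{\alpha} \ \sum_{i} \alpha_i \;-\; \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \ \ \sum_{i} \alpha_i y_i = 0 .
\]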

SVM – the feature map
But what if the data is not linearly separable in the input space? We apply a feature map that transforms the inputs into a high-dimensional feature space where the data is linearly separable.
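As a concrete illustration (my example, not from the slides): a quadratic feature map for two-dimensional inputs,

\[
\varphi(x) = \left( x_1^2, \ \sqrt{2}\, x_1 x_2, \ x_2^2 \right),
\]

maps data that is separable by an ellipse in the input space to data that is separable by a hyperplane in the feature space.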

SVM – the feature map
We obtain a linear separator in the feature space. The problem: the feature map phi(x) can be very high-dimensional and expensive to compute.

Introducing the kernel
But we don't have to compute phi explicitly. The dual formulation no longer depends on w, only on dot products between training points. What we actually need is the dot product in feature space, phi(x)^T phi(x'). Let's call this a kernel: a two-variable function K(x, x') that can be written as a dot product of feature vectors.
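Continuing the quadratic feature-map example from above (again my illustration, not from the slides):

\[
K(x, x') = \varphi(x)^\top \varphi(x') = (x^\top x')^2 ,
\]

so the kernel can be evaluated directly in the input space without ever forming phi(x).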

Kernel SVM
This is the famous 'kernel trick': the dual formulation no longer depends on w, only on dot products, so we never compute the feature map explicitly. We learn using a closed-form kernel K, which evaluates very high-dimensional dot products in (roughly) constant time.
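The kernelized dual, obtained by replacing each dot product with the kernel, has the standard form:

\[
\max_{\alpha} \ \sum_{i} \alpha_i \;-\; \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j)
\quad \text{s.t.} \quad \alpha_i \ge 0, \ \ \sum_{i} \alpha_i y_i = 0 .
\]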

Kernel SVM – Run time
What happens when we need to classify some x0? Recall that w depends on alpha, and our classifier for x0 uses w. Shouldn't the fact that we apparently need w cause a problem?

Kernel SVM – Run time
It would, if we actually needed w. But who needs w when we've got dot products? Expanding w in terms of alpha turns the classifier into a sum of kernel evaluations against the support vectors.
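A sketch of the resulting decision rule, using the expansion of w from the earlier slide:

\[
f(x_0) = \operatorname{sign}\!\left( \sum_{i=1}^{m} \alpha_i y_i \, K(x_i, x_0) + b \right),
\]

where only the support vectors (alpha_i > 0) contribute to the sum.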

Kernel SVM Recap
Pick a kernel K. Solve the optimization to get alpha. Compute b using the support vectors. Classify new points using the kernel expansion above.
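As a quick illustration of the recap (mine, not from the original slides), here is a minimal scikit-learn sketch; the toy dataset, the RBF kernel choice, and the hyperparameters are all assumptions made for the example.

import numpy as np
from sklearn.svm import SVC

# Toy data: two concentric rings, not linearly separable in the input space.
rng = np.random.default_rng(0)
n = 200
radius = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(2.0, 3.0, n)])
angle = rng.uniform(0.0, 2 * np.pi, 2 * n)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
y = np.concatenate([-np.ones(n), np.ones(n)])

# Pick a kernel (RBF here) and solve the dual; SVC does both internally.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

# Prediction only uses kernel evaluations against the support vectors.
print("number of support vectors:", clf.support_vectors_.shape[0])
print("prediction at the origin:", clf.predict([[0.0, 0.0]]))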

Other uses of Kernels in ML
Logistic Regression: http://books.nips.cc/papers/files/nips14/AA13.pdf
Multiple Kernel Boosting: http://siam.omnibooksonline.com/2011datamining/data/papers/146.pdf
Trees and Kernels: http://users.cecs.anu.edu.au/~williams/papers/P175.pdf
Conditional Mean Embeddings: http://arxiv.org/abs/1205.4656

More on Kernels
Gram matrix: the matrix of inner products of a set of vectors x1 … xn in the inner product space defined by the kernel K.
Reproducing kernels: the point-evaluation functional for a Hilbert space of functions, characterized by the reproducing property.
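In standard notation (not reproduced from the slides), the two objects the slide names are:

\[
G_{ij} = K(x_i, x_j), \qquad
f(x) = \langle f, \ K(x, \cdot) \rangle_{\mathcal{H}} \ \ \text{for all } f \in \mathcal{H},
\]

where G is the Gram matrix and the second identity is the reproducing property of the Hilbert space H of functions.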

SVM Pop Quiz
What is the maximum number of support vectors for a linear classification problem? Hint: it's related to a concept you've recently studied. (The slide illustrates the 1-d and 2-d cases.)

SVM Pop Quiz
What is the worst-case number of support vectors for a [linear] classification problem? Hint: it's related to a concept you've recently studied.
A: it's the same as the VC dimension of the classifier. (The 2-d illustration on the slide shows a configuration of points that cannot all be support vectors.)

K-SVM Pop Quiz
Here is the result of training different kernels (linear, quadratic, polynomial, RBF) on this dataset. What happens when we translate the points up by a large constant T on the vertical axis?

K-SVM Pop Quiz
What happens when we translate the points up by a large constant T on the vertical axis?
For the quadratic and polynomial kernels, the boundary depends more on the y value, so the boundary becomes more arched.
For the RBF kernel, the value of the kernel is the same for each pair of points (it is translation-invariant), so the boundary keeps its position relative to the points.
For the linear kernel, the boundary keeps its position relative to the points: it is simply shifted up along with the data (by 10 units in the slide's example).

Thanks!
Reminder: the midterm is on Monday; feel free to ask questions about problems given in previous years.