Recap Finds the boundary with “maximum margin”

Presentation transcript:

Recap: Finds the boundary with “maximum margin”. Uses “slack variables” to deal with outliers. Uses “kernels”, and the “kernel trick”, to solve nonlinear problems.

SVM error function = hinge loss + 1/margin …where the hinge loss is summed over datapoints, and the second term is proportional to the inverse of the margin, 1/m.
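In symbols – a standard way of writing it, not necessarily the exact notation on the slide – the objective trades the hinge loss against the size of the margin (the margin is 1/‖w‖, so the ‖w‖² term grows as the margin shrinks):

```latex
\min_{\mathbf{w},\,t}\;
  \underbrace{\sum_{i=1}^{N} \max\!\bigl(0,\; 1 - y_i(\mathbf{w}^\top\mathbf{x}_i + t)\bigr)}_{\text{hinge loss, summed over datapoints}}
  \;+\;
  \underbrace{\lambda\,\lVert\mathbf{w}\rVert^{2}}_{\text{grows as the margin } 1/\lVert\mathbf{w}\rVert \text{ shrinks}}
```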

Slack variables (aka soft margins): a datapoint on the wrong side of the margin uses some “slack”, and pays a penalty for using it.
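Written with explicit slack variables (notation assumed, not copied from the slide): each datapoint i may violate the margin by ξᵢ, and C sets the penalty for doing so.

```latex
\min_{\mathbf{w},\,t,\,\boldsymbol{\xi}}\;
  \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} \;+\; C\sum_{i=1}^{N}\xi_i
\qquad \text{subject to} \qquad
  y_i(\mathbf{w}^\top\mathbf{x}_i + t) \ge 1 - \xi_i, \quad \xi_i \ge 0 .
```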

What about much more complicated data?
- project into high dimensional space, and solve with a linear model
- project back to the original space, and the linear boundary becomes non-linear
[figure: the same data shown in 2D and, after projection, in 3D]

The Kernel Trick

A slight rearrangement of our model – still equivalent though. Remember matrix notation: the weighted sum over features is a “dot product”.
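A sketch of the rearrangement being referred to, assuming the linear model from earlier in the course: the weighted sum over features is a dot product between the weight vector and the input.

```latex
f(\mathbf{x}) \;=\; t + \sum_{j} w_j\,x_j \;=\; \mathbf{w}^\top\mathbf{x} + t .
```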

Project into higher dimensional space… [figure: the data plotted on axes (x1, x2), then on axes (x1, x2, x3)] …our new feature space. BUT WHERE DO WE GET THIS FROM!?

The Representer Theorem (Kimeldorf and Wahba, 1971): for a linear model, the optimal parameter vector is always a linear combination of the training examples…
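In symbols (coefficient names αᵢ assumed): the optimal weight vector can be written as a weighted sum of the training examples.

```latex
\mathbf{w}^{\star} \;=\; \sum_{i=1}^{N} \alpha_i\,\mathbf{x}_i
\qquad \text{for some coefficients } \alpha_1,\dots,\alpha_N .
```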

The Kernel Trick, PART 1. Substitute this into our model: f(x) = Σᵢ αᵢ xᵢᵀx + t. Or, with our hypothetical high dimensional feature space: f(x) = Σᵢ αᵢ φ(xᵢ)ᵀφ(x) + t.

The Kernel Trick, PART 2. The product φ(xᵢ)ᵀφ(x) is just a scalar value. Wouldn’t it be nice if we didn’t have to think up the feature map φ at all, and could just skip straight to the scalar value we need directly…? If we had such a kernel function k(xᵢ, x), our model would look like this: f(x) = Σᵢ αᵢ k(xᵢ, x) + t.
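A minimal sketch of such a kernelized prediction (not the lecture’s code; the dual weights alpha, the offset t, and the kernel are assumed to come from training, which is not shown here):

```python
import numpy as np

def predict(x, train_X, alpha, t, kernel):
    """f(x) = sum_i alpha_i * k(x_i, x) + t."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, train_X)) + t

# Hypothetical values, just to show the shape of the computation:
linear_kernel = lambda a, b: float(np.dot(a, b))
train_X = np.array([[0.0, 1.0], [1.0, 0.0]])
alpha = np.array([0.5, -0.5])          # assumed dual weights, one per example
t = 0.1                                # assumed offset
print(predict(np.array([1.0, 1.0]), train_X, alpha, t, linear_kernel))
```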

Kernels. For example… the polynomial kernel. When d=2, its implicit feature space contains the products of pairs of the original features – but we never actually calculate it!
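A minimal numerical check of this claim, using the homogeneous degree-2 polynomial kernel k(x, z) = (xᵀz)² (the slide may use the (xᵀz + 1)ᵈ variant instead): for 2-D inputs the implicit feature space is just (x₁², x₂², √2·x₁x₂), and the kernel value matches an explicit dot product there.

```python
import numpy as np

def poly2_kernel(x, z):
    # degree-2 polynomial kernel, homogeneous form
    return float(np.dot(x, z)) ** 2

def phi(x):
    # the explicit (never needed in practice) degree-2 feature map for 2-D input
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(poly2_kernel(x, z))              # 1.0
print(float(np.dot(phi(x), phi(z))))   # 1.0 -- same value, no explicit projection needed
```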

- project into high dimensional space, and solve with a linear model
- project back to the original space, and the linear boundary becomes non-linear
[figure: the data in 2D and, after projection, in 3D]

Polynomial kernel, with d=2

The Polynomial Kernel

The RBF (Gaussian) Kernel
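The slide itself is a plot; for reference, a minimal sketch of the RBF (Gaussian) kernel, with the width parameter named gamma here (the slides may use a different symbol):

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # k(x, z) = exp(-gamma * ||x - z||^2); larger gamma = narrower bumps
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))

print(rbf_kernel(np.array([0.0, 0.0]), np.array([1.0, 1.0]), gamma=0.5))  # exp(-1) ~ 0.37
```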

Varying two things at once!
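Assuming the two things being varied here are the slack penalty C and the RBF width gamma, a sketch of how one would search over both at once in practice (using scikit-learn, which the lecture does not necessarily use):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy nonlinear dataset and a cross-validated search over C and gamma together.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
search = GridSearchCV(SVC(kernel="rbf"),
                      param_grid={"C": [0.1, 1, 10], "gamma": [0.1, 1, 10]},
                      cv=5)
search.fit(X, y)
print(search.best_params_)
```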

Summary of things…

SVMs versus Neural Networks
SVMs:
- Started from solid theory
- Theory led to many extensions (SVMs for text, images, graphs)
- Almost no parameter tuning
- Highly efficient to train
- Single optimum
- Highly resistant to overfitting
Neural Nets:
- Started from bio-inspired heuristics
- Ended up at theory equivalent to statistics ideas
- Good performance = lots of parameter tuning
- Computationally intensive to train
- Suffers from local optima
- Prone to overfitting

SVMs, done. Tonight… read chapter 4 while it’s still fresh. Remember, by next week – read chapter 5.

Examples of Parameters obeying the Representer Theorem
[figures: 2D plots of positive (p) and negative (n) training examples with a decision boundary]

We had before, for an x on the boundary: wᵀx + t = 0. And we just worked out: w = Σᵢ αᵢ xᵢ. Which gives us the expression for t: t = −Σᵢ αᵢ xᵢᵀx, for any x on the boundary. The w and t are both linear functions of the examples.
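A minimal numerical check of this (not from the slides): for a linear-kernel SVM in scikit-learn, the learned w is exactly the linear combination of support vectors given by the dual coefficients.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# w = sum_i alpha_i * x_i, using only the support vectors (all other alpha_i are zero)
w_from_examples = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w_from_examples, clf.coef_))  # True
```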