Lecture 18. Support Vector Machine (SVM)

Lecture 18. Support Vector Machine (SVM) CS 109A/AC 209A/STAT 121A Data Science: Harvard University Fall 2016 Instructors: P. Protopapas, K. Rader, W. Pan

Announcements Pavlos's office hours (Wed 4-5) will be covered by Hari. HW7: due tomorrow at midnight. HW8: Q2 is optional. In-class competition hosted on Kaggle; the top 20% will be awarded +1 point applicable to any HW. Milestone #4-5 deadline: Nov 28.

Quiz Code quiz: bernie2020

Elections What have we learned? Why were the predictions so far off? Modeling errors? Methodology? Data? People?

Support Vector Machine (SVM) The support vector machine (SVM) is a classification method from the 1990s that remains very popular. SVMs perform well in many different applications and are one of the best “out of the box” classifiers (like random forests or AdaBoost).

Support Vector Machine (SVM) 1. The maximal margin classifier: a simple and intuitive model, but of limited practical use since it requires that the classes be separable. 2. The support vector classifier: a more general and useful classifier, but it still allows only linear class boundaries. 3. The support vector machine: a further extension of the support vector classifier that accommodates non-linear class boundaries.

Hyperplane x: data point, y: label in {-1, 1}, w: weight vector. A hyperplane through the origin is the set of points satisfying w^T x = 0.

Hyperplane x: data point, y: label in {-1, 1}, w: weight vector, b: bias. With the bias term, the hyperplane is the set of points satisfying w^T x + b = 0.

Hyperplane The hyperplane w^T x + b = 0 splits the space into two half-spaces: points with w^T x + b > 0 lie on one side (label +1), and points with w^T x + b < 0 lie on the other (label -1).
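
To make the sign rule concrete, here is a minimal NumPy sketch (the weight vector, bias, and data points are made up for illustration) that classifies points by the sign of w^T x + b:

```python
import numpy as np

# Hypothetical hyperplane parameters (illustrative only)
w = np.array([2.0, -1.0])   # weight vector
b = 0.5                     # bias

# A few hypothetical 2-D data points, one per row
X = np.array([[1.0, 1.0],
              [-1.0, 2.0],
              [0.0, -3.0]])

# Signed score of each point; its sign is the predicted label in {-1, +1}
scores = X @ w + b          # [ 1.5, -3.5,  3.5]
labels = np.sign(scores)    # [ 1., -1.,  1.]
print(scores, labels)
```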

Maximal Margin Classifier We can choose many different w's, i.e. many separating hyperplanes.

Maximal Margin Classifier Given w, we measure the distances from the hyperplane to all points.

Maximal Margin Classifier The smallest such distance (the minimum) is the margin for this hyperplane.

Maximal Margin Classifier We do this for all possible separating hyperplanes.
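
A brute-force sketch of the idea in these slides (the candidate hyperplanes and toy data below are invented for illustration): for each candidate (w, b) that separates the data, compute each point's distance |w^T x + b| / ||w||, take the minimum as that hyperplane's margin, and keep the candidate with the largest margin.

```python
import numpy as np

# Toy separable data (hypothetical): two points per class
X = np.array([[2.0, 2.0], [3.0, 3.0],        # class +1
              [-2.0, -1.0], [-3.0, -2.0]])   # class -1
y = np.array([1, 1, -1, -1])

# A few hand-picked candidate hyperplanes (w, b), purely illustrative
candidates = [(np.array([1.0, 1.0]), 0.0),
              (np.array([1.0, 0.0]), 0.0),
              (np.array([1.0, 2.0]), -0.5)]

best = None
for w, b in candidates:
    scores = X @ w + b
    if not np.all(np.sign(scores) == y):
        continue                                          # does not separate the data
    margin = np.min(np.abs(scores)) / np.linalg.norm(w)   # smallest distance to the plane
    if best is None or margin > best[0]:
        best = (margin, w, b)

print("best margin:", best[0], "with w =", best[1], "and b =", best[2])
```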

Maximal Margin Classifier The maximal margin hyperplane is the separating hyperplane for which the margin is largest; that is, the hyperplane that has the farthest minimum distance to the training observations.

Support vectors In a sense, the maximal margin hyperplane represents the mid-line of the widest “slab” that we can insert between the two classes.

The maximal margin hyperplane only depends on the support vectors , but not on the other observations: a movement to any of the other observations would not affect the separating hyperplane, provided that the observation’s movement does not cause it to cross the boundary set by the margin. The fact that the maximal margin hyperplane depends directly on only a small subset of the observations is an important property that will arise later in this chapter when we discuss the support vector classifier and support vector machines.
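
A short scikit-learn sketch of this point (the toy data are invented for illustration): fitting a linear SVM with a very large C makes it behave approximately like a hard-margin, maximal margin classifier, and the fitted model exposes which training points are the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data (hypothetical)
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5],
              [-2.0, -1.0], [-3.0, -2.0], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# A linear SVM with a very large C approximates the maximal margin classifier
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("support vectors:\n", clf.support_vectors_)
print("w =", clf.coef_[0], " b =", clf.intercept_[0])

# Nudging a non-support vector (without crossing the margin) would leave the
# fitted hyperplane unchanged; moving a support vector would change it.
```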

Maximal Margin Classifier Equation for a hyperplane in p dimensions: w^T x + b = 0, with w, x in R^p. If w^T x + b > 0, the point lies on one side of the hyperplane; if w^T x + b < 0, it lies on the other side.

Maximal Margin Classifier If y in {-1, 1}, then a point is correctly classified when y_i (w^T x_i + b) > 0. The maximal margin classifier solves: maximize M over w, b, subject to ||w|| = 1 and y_i (w^T x_i + b) >= M for all i. Here M represents the margin, and the optimization finds the maximum M.
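
As a numerical check of this formulation, here is a hedged sketch (toy data invented for illustration) that fits a linear SVM and computes the geometric margin M it achieves, i.e. the smallest value of y_i (w^T x_i + b) / ||w|| over the training points:

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data (hypothetical)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])

# Very large C: approximately the maximal margin classifier
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Geometric margin M: smallest signed distance of a training point to the plane
M = np.min(y * (X @ w + b)) / np.linalg.norm(w)
print("geometric margin M =", M)
```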

Support Vectors The distance of a point from the hyperplane is a measure of confidence that the point is correctly classified.

Support Vectors What if there is no hyperplane that separates the two classes?

Support Vectors We consider a classifier based on a hyperplane that does not perfectly separate the two classes, in exchange for: greater robustness to individual observations, and better classification of most of the training observations. That is, we misclassify a few training observations in order to do a better job classifying the remaining observations. The support vector classifier, a.k.a. the soft margin classifier, does this.

Support Vector Classifier

Support Vector Classifier Simply speaking, instead of maximizing the minimum of the distances, we redefine the margin: we allow some points to be on the wrong side of the margin, or even on the wrong side of the hyperplane. A tuning parameter C controls how much violation we allow. If C = 0, we are back to the maximal margin classifier (when one exists). If C > 0, we become more tolerant of violations to the margin; C tells us how many points can violate the margin.
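
For reference, a hedged scikit-learn sketch (toy data invented for illustration). One caveat: scikit-learn's C parameter is a penalty that acts in the opposite direction of the budget C described here, so a large sklearn C permits few violations (approaching the maximal margin classifier) while a small sklearn C tolerates many.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, slightly overlapping two-class data (hypothetical)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(20, 2)),
               rng.normal(+1.0, 1.0, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"sklearn C = {C:>6}: number of support vectors = {len(clf.support_vectors_)}")
# Smaller sklearn C -> wider margin, more violations, more support vectors.
```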

Support Vector Classifier We determine C using cross-validation (as usual), and it is again a tradeoff between bias and variance. For very small C (no mistakes) we fit the training set very well but will not do well on the test set. If C is too large, there are many hyperplanes that can do the job.
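
A minimal cross-validation sketch with scikit-learn (the grid of C values and the toy data are illustrative; the direction-of-C caveat from the previous sketch applies here too):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Toy, slightly overlapping two-class data (hypothetical)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, size=(30, 2)),
               rng.normal(+1.0, 1.0, size=(30, 2))])
y = np.array([-1] * 30 + [1] * 30)

# 5-fold cross-validation over a grid of C values
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print("best C:", search.best_params_["C"])
print("cross-validated accuracy:", round(search.best_score_, 3))
```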

Support Vector Classifier ε1,...,εn are slack variables. If εi > 0 then the i-th observation is on the wrong side of the margin, and we say that the i-th observation has violated the margin. If εi > 1 then it is on the wrong side of the hyperplane.
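
To connect the slack variables to a fitted model, here is a hedged sketch in scikit-learn's scaling of the problem, where the slack of observation i can be recovered as ε_i = max(0, 1 - y_i f(x_i)) with f(x) the decision function (the data and the choice of C are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Toy, slightly overlapping two-class data (hypothetical)
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.0, 1.0, size=(25, 2)),
               rng.normal(+1.0, 1.0, size=(25, 2))])
y = np.array([-1] * 25 + [1] * 25)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Slack of each observation: eps_i = max(0, 1 - y_i * f(x_i)).
# eps_i > 0  -> the point violates the margin;
# eps_i > 1  -> the point is on the wrong side of the hyperplane.
eps = np.maximum(0.0, 1.0 - y * clf.decision_function(X))
print("margin violations:", int(np.sum(eps > 0)))
print("misclassified    :", int(np.sum(eps > 1)))
```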