
LECTURE 20: SUPPORT VECTOR MACHINES PT. 1 April 11, 2016 SDS 293 Machine Learning

Announcements

Final project deliverables
FP2: Dataset & Initial Model
- What I need: evidence that you have data, and that you've started thinking about how to model it
- Due April 15th by 4:59pm
FP3: Refined Model
- What I need: I'll give you feedback on FP2; you'll either want to revise your model or defend your original one
- Due April 22nd by 4:59pm
FP4: Presentation
- What I need: your poster, and you, ready to talk about your project
- Posters due during class on April 27th
- Reception April 27th, 6pm to 8pm
FP5: Write-up (2 pages)
- Due May 6th by 4:59pm

Outline
- Maximal margin classifier
- Support vector classification
  - 2 classes, linear boundaries
  - 2 classes, nonlinear boundaries
- Multiple classes
- Comparison to other methods
- Lab

Activity: name game

Maximal margin classifier
Claim: if a separating hyperplane exists, there are infinitely many of them (why?)
Big idea: pick the one that gives you the widest margin (why?)
Bigger margin = more confident
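In symbols (using the notation of ISLR, Section 9.1, which this course follows), the maximal margin hyperplane solves

\[
\begin{aligned}
&\max_{\beta_0,\beta_1,\dots,\beta_p,\,M}\; M\\
&\text{subject to}\quad \sum_{j=1}^{p}\beta_j^2 = 1,\\
&\quad y_i\bigl(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}\bigr) \ge M,\qquad i = 1,\dots,n,
\end{aligned}
\]

so every training observation lies on the correct side of the hyperplane, at distance at least M from it.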

Discussion
Question: what's wrong with this approach?
Answer: sometimes the margin is tiny (and therefore prone to overfitting), and other times there is no hyperplane that perfectly divides the data at all

Support vector classifier
Big idea: we might prefer to sacrifice a few observations if doing so lets us perform better on the rest
We can allow points to cross the margin, or even be completely misclassified

Support vector classifier (math)
Goal: maximize the margin M, subject to constraints on the distance from the i-th observation to the hyperplane
"Slack variables" let individual observations violate the margin, but no one gets rewarded for being extra far from the margin, and we only have so much "slack" to allocate across all the observations
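In symbols (ISLR, Section 9.2.2), with slack variables ε_i and total slack budget C, the support vector classifier solves

\[
\begin{aligned}
&\max_{\beta_0,\dots,\beta_p,\;\varepsilon_1,\dots,\varepsilon_n,\;M}\; M\\
&\text{subject to}\quad \sum_{j=1}^{p}\beta_j^2 = 1,\\
&\quad y_i\bigl(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}\bigr) \ge M(1-\varepsilon_i),\\
&\quad \varepsilon_i \ge 0,\qquad \sum_{i=1}^{n}\varepsilon_i \le C.
\end{aligned}
\]

An observation with 0 < ε_i ≤ 1 is inside the margin but still correctly classified; ε_i > 1 means it is misclassified.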

Support vectors
Decision rule is based only on the support vectors => SVC is robust to strange behavior far from the hyperplane!

A more general formulation
Goal: maximize the margin M, subject to constraints on the distance from the i-th observation to the hyperplane
Fun fact: the solution to this maximization can be found using only the inner products of the observations

Dot product = measure of similarity
The dot product of two (same-length) vectors a and b:
- Geometric: \(a \cdot b = \|a\|\,\|b\|\cos\theta\)
- Algebraic: \(a \cdot b = \sum_{i} a_i b_i\)

A more general formulation
We can rewrite the linear support vector classifier in terms of dot products between a new point and the training observations; the coefficients are only nonzero at the support vectors
The dot product is just one way to measure similarity
In general, we call any such similarity measure a kernel*
*which is why SVMs and related methods are often referred to as "kernel methods"
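Concretely (ISLR, Section 9.3.1), the linear support vector classifier can be written as

\[
f(x) = \beta_0 + \sum_{i=1}^{n} \alpha_i \langle x, x_i \rangle,
\]

with one parameter α_i per training observation; α_i is nonzero only for the support vectors, so writing S for the set of support vectors,

\[
f(x) = \beta_0 + \sum_{i \in S} \alpha_i \langle x, x_i \rangle .
\]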

Other kernels We’ve seen a linear kernel (i.e. the classification boundary is linear in the features)

Other kernels We could also have a polynomial kernel

Other kernels Or even a radial kernel
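For reference, the standard forms of these three kernels (ISLR, Section 9.3.2), with polynomial degree d and tuning parameter γ > 0:

\[
\begin{aligned}
\text{linear:}\quad & K(x_i, x_{i'}) = \sum_{j=1}^{p} x_{ij} x_{i'j}\\
\text{polynomial:}\quad & K(x_i, x_{i'}) = \Bigl(1 + \sum_{j=1}^{p} x_{ij} x_{i'j}\Bigr)^{d}\\
\text{radial:}\quad & K(x_i, x_{i'}) = \exp\Bigl(-\gamma \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2\Bigr)
\end{aligned}
\]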

Application: Heart dataset

Goal: predict whether an individual has heart disease on the basis of 13 variables (Age, Sex, Chol, etc.)
297 subjects, randomly split into 207 training and 90 test observations
Problem: no good way to plot in 13 dimensions
Solution: ROC curve
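A minimal sketch of this analysis in R, assuming the Heart data are in a file Heart.csv (as posted on the ISLR website) with the heart-disease label in a column named AHD; the file name, column name, and split are assumptions, not part of the slide:

library(e1071)   # provides svm()
library(ROCR)    # provides prediction() and performance()

# Assumed setup: Heart.csv in the working directory, AHD = "Yes"/"No"
heart <- na.omit(read.csv("Heart.csv"))
heart$AHD <- factor(heart$AHD)

set.seed(1)
train <- sample(nrow(heart), 207)   # 207 training, remainder test

# Support vector classifier (linear kernel); decision.values = TRUE lets us
# recover the fitted scores needed to trace out the ROC curve
fit <- svm(AHD ~ ., data = heart[train, ], kernel = "linear",
           cost = 1, decision.values = TRUE)

scores <- attributes(predict(fit, heart[-train, ],
                             decision.values = TRUE))$decision.values

# Depending on which class svm() treats as positive, the scores may need a
# sign flip (multiply by -1) for the curve to bend toward the upper left
pred <- prediction(as.numeric(scores), heart[-train, "AHD"] == "Yes")
perf <- performance(pred, "tpr", "fpr")
plot(perf, main = "SVM (linear kernel), test data")
abline(0, 1, lty = 2)   # diagonal = random guessing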

ROC curve: LDA vs. SVM, linear kernel

ROC curve: SVM, linear vs. radial kernel

Lab: SVMs
To do today's lab in R: e1071, ROCR
To do today's lab in Python:
Instructions and code:
Full version can be found beginning on p. 359 of ISLR
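A short sketch of how the R lab opens (following the simulated-data example that begins the ISLR SVM lab on p. 359):

library(e1071)

set.seed(1)
# Simulated two-class data that are not quite linearly separable
x <- matrix(rnorm(20 * 2), ncol = 2)
y <- c(rep(-1, 10), rep(1, 10))
x[y == 1, ] <- x[y == 1, ] + 1
dat <- data.frame(x = x, y = as.factor(y))

# Support vector classifier: kernel = "linear"; cost is the penalty for
# margin violations (small cost = wider margin, more support vectors)
svmfit <- svm(y ~ ., data = dat, kernel = "linear", cost = 10, scale = FALSE)
summary(svmfit)
plot(svmfit, dat)

# tune() runs 10-fold cross-validation over a grid of cost values
tune.out <- tune(svm, y ~ ., data = dat, kernel = "linear",
                 ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(tune.out)
bestmod <- tune.out$best.model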

Coming up
Multiclass support vector machines