CSC 4510: Support Vector Machines (SVMs). Spring 2012. © Paula Matuszek 2012.


1 CSC 4510, Spring 2012. © Paula Matuszek. CSC 4510 Support Vector Machines (SVMs)

2 CSC 4510, Spring 2012. © Paula Matuszek. A Motivating Problem
People who set up bird feeders want to feed birds. We don't want to feed squirrels:
– they eat too fast
– they drive the birds away
So how do we create a bird feeder that isn't a squirrel feeder?

3 CSC 4510, Spring 2012. © Paula Matuszek. Birds vs Squirrels
How are birds and squirrels different? Take a moment and write down a couple of features that distinguish birds from squirrels.

4 CSC 4510, Spring 2012. © Paula Matuszek. Birds vs Squirrels
How are birds and squirrels different? Take a moment and write down a couple of features that distinguish birds from squirrels. And now take another and write down how YOU tell a bird from a squirrel.

5 CSC 4510, Spring 2012. © Paula Matuszek. Possible Features
Birds:
Squirrels:
And what do people actually do?

6 CSC 4510, Spring 2012. © Paula Matuszek. The Typical Bird Feeder Solutions
– Squirrels are heavier
– Squirrels can't fly

7 CSC 4510, Spring 2012. © Paula Matuszek. And Then There's This: backyard-computer-vision-and-the-squirrel-hordes

8 CSC 4510, Spring 2012. © Paula Matuszek. Knowledge for SVMs
We are trying here to emulate our human decision making about whether something is a squirrel. So we will have:
– Features
– A classification
As we saw in our discussion, there are a couple of obvious features, but mostly we decide based on a number of visual cues: size, color, general arrangement of pixels.

9 CSC 4510, Spring 2012. © Paula Matuszek. Kurt Grandis' Squirrel Gun
The water gun is driven by a system which:
– uses a camera to watch the bird feeder
– detects blobs
– determines whether the blob is a squirrel
– targets the squirrel
– shoots!
Mostly in Python, using OpenCV for the vision. Of interest to us is that the decision about whether a blob is a squirrel is made by an SVM (a sketch of this kind of pipeline follows below).
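The pipeline is easier to see in code. Below is a minimal sketch of that kind of system, not Grandis' actual code: it assumes OpenCV 4's findContours signature, and the names extract_features, svm_model, SQUIRREL, and aim_and_fire are hypothetical placeholders for the feature extractor, the trained classifier, and the water-gun actuator.

```python
import cv2

# Hypothetical placeholders, standing in for the parts Grandis wrote:
# extract_features, svm_model, SQUIRREL, and aim_and_fire are NOT real
# library names; they are assumptions for illustration only.

cap = cv2.VideoCapture(0)                         # camera on the feeder
subtractor = cv2.createBackgroundSubtractorMOG2() # foreground/blob detector

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                # moving pixels -> blobs
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 500:              # skip tiny blobs (noise)
            continue
        x, y, w, h = cv2.boundingRect(c)
        features = extract_features(frame[y:y+h, x:x+w])
        if svm_model.predict([features])[0] == SQUIRREL:
            aim_and_fire(x + w // 2, y + h // 2)  # target the blob's center
```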

10 CSC 4510, Spring 2012. © Paula Matuszek. So What's an SVM?
A Support Vector Machine (SVM) is a classifier:
– It uses features of instances to decide which class each instance belongs to.
It is a supervised machine-learning classifier:
– Training cases are used to calculate parameters for a model, which can then be applied to new instances to make a decision.
It is a binary classifier:
– it distinguishes between two classes.
For squirrel vs bird, Grandis used size, a histogram of pixels, and a measure of texture as the features.
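As a concrete illustration, here is a minimal sketch of training and applying a linear SVM, using scikit-learn rather than the Orange toolkit this course uses (an assumption for illustration; the data and labels are made up):

```python
from sklearn import svm

# Six toy instances with two features each, and their known classes.
X_train = [[0.0, 0.1], [0.2, 0.3], [0.1, 0.0],
           [1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]
y_train = ["bird", "bird", "bird", "squirrel", "squirrel", "squirrel"]

clf = svm.SVC(kernel="linear")   # a binary, supervised, linear classifier
clf.fit(X_train, y_train)        # training computes the model parameters

# Apply the trained model to new instances.
print(clf.predict([[0.1, 0.2], [1.0, 1.0]]))  # -> ['bird' 'squirrel']
```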

11 CSC 4510, Spring 2012. © Paula Matuszek. Basic Idea Underlying SVMs
Find a line, or a plane, or a hyperplane, that separates our classes cleanly.
– This is the same concept as we have seen in regression.
By finding the greatest margin separating them.
– This is not the same concept as we have seen in regression.
What does it mean?

Borrowed heavily from Andrew Moore's tutorials. Copyright © 2001, 2003, Andrew W. Moore. Linear Classifiers
[figure: 2-D data points, labeled +1 and -1]
How would you classify this data?

Borrowed heavily from Andrew Moore's tutorials. Copyright © 2001, 2003, Andrew W. Moore. Linear Classifiers
[figure: 2-D data points, labeled +1 and -1]
How would you classify this data?

Borrowed heavily from Andrew Moore's tutorials. Copyright © 2001, 2003, Andrew W. Moore. Linear Classifiers
[figure: data points labeled +1 and -1, with several candidate separating lines]
Any of these would be fine... but which is best?

Borrowed heavily from Andrew Moore's tutorials. Copyright © 2001, 2003, Andrew W. Moore. Classifier Margin
[figure: data points labeled +1 and -1]
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Borrowed heavily from Andrew Moore's tutorials. Copyright © 2001, 2003, Andrew W. Moore. Maximum Margin
[figure: data points labeled +1 and -1]
The maximum margin linear classifier is the linear classifier with the maximum margin.

Borrowed heavily from Andrew Moore's tutorials. Copyright © 2001, 2003, Andrew W. Moore. Maximum Margin
[figure: data points labeled +1 and -1]
The maximum margin linear classifier is the linear classifier with the maximum margin. It is called the Linear Support Vector Machine (SVM).

Borrowed heavily from Andrew Moore's tutorials. Copyright © 2001, 2003, Andrew W. Moore. Maximum Margin
[figure: data points labeled +1 and -1]
The maximum margin linear classifier is the linear classifier with the, um, maximum margin. It is called the Linear Support Vector Machine (SVM). Support vectors are those datapoints that the margin pushes up against.

Borrowed heavily from Andrew Moore's tutorials. Copyright © 2001, 2003, Andrew W. Moore. Why Maximum Margin?
[figure: maximum-margin boundary, with the decision function f(x, w, b) = sign(w · x − b)]
The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM). Support vectors are those datapoints that the margin pushes up against.
1. If we've made a small error in the location of the boundary (it's been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
2. Empirically it works very, very well.
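The decision rule on this slide is simple to state in code; here is a tiny sketch with made-up values for the learned parameters:

```python
# The slide's decision rule f(x, w, b) = sign(w . x - b), with
# hypothetical values standing in for the learned weights and offset.
import numpy as np

w = np.array([2.0, -1.0])   # hypothetical learned weight vector
b = 0.5                     # hypothetical learned offset

def classify(x):
    """Return +1 or -1 depending on which side of the boundary x lies."""
    return int(np.sign(np.dot(w, x) - b))

print(classify(np.array([1.0, 0.0])))   # w.x - b = 1.5  -> +1
print(classify(np.array([0.0, 1.0])))   # w.x - b = -1.5 -> -1
```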

Borrowed heavily from Andrew Moore's tutorials. Copyright © 2001, 2003, Andrew W. Moore. Maximum Margin Classifier (SVM)
Find the linear classifier that:
– separates documents of the positive class from those of the negative class
– has the largest classification margin
Compute the classification margin, then search for the linear classifier with the largest classification margin.
[figure labels: classification margin, plus-plane, minus-plane, classifier boundary]
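In symbols, this is the standard maximum-margin formulation (a general fact, not specific to these slides): the plus-plane is w · x − b = +1, the minus-plane is w · x − b = −1, and the distance between them is the margin, so maximizing the margin becomes a constrained minimization.

```latex
% Width of the margin between the plus-plane and the minus-plane:
\[
\text{margin width} \;=\; \frac{2}{\lVert w \rVert}
\]
% Maximizing the margin is therefore the constrained minimization
\[
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\qquad \text{subject to} \qquad
y_i \,(w \cdot x_i - b) \,\ge\, 1 \quad \text{for all training points } i .
\]
```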

21 CSC 4510, Spring 2012. © Paula Matuszek. Concept Check
For which of these could we use a basic linear SVM?
– A: Classify the three kinds of iris in the Orange data set?
– B: Classify into spam and non-spam?
– C: Classify students into likely to pass or not?
Which of these is the SVM margin?
[figure: two candidate margins, labeled A and B]

22 CSC 4510, Spring 2012. © Paula Matuszek. Messy Data
This is all good so far. But suppose our data aren't that neat:

23 CSC 4510, Spring 2012. © Paula Matuszek. Soft Margins
Intuitively, it still looks like we can make a decent separation here.
– Can't make a clean margin
– But can almost do so, if we allow some errors
We introduce slack variables, which measure the degree of misclassification. A soft margin is one which lets us make some errors in order to get a wider margin: a tradeoff between a wide margin and classification errors.

24 CSC 4510, Spring 2012. © Paula Matuszek. Only Two Errors, Narrow Margin

25 CSC 4510, Spring 2012. © Paula Matuszek. Several Errors, Wider Margin

26 CSC 4510, Spring 2012. © Paula Matuszek. Slack Variables and Cost
In order to find a soft margin, we allow slack variables, which measure the degree of misclassification.
– This takes into account the distance from the margin as well as the number of misclassified instances.
We then modify this by a cost (C) for these misclassified instances.
– A high cost will give relatively narrow margins.
– A low cost will give broader margins but misclassify more data.
How much we want it to cost to misclassify instances depends on our domain: what we are trying to do (see the sketch below).
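A quick way to see the cost trade-off, sketched with scikit-learn (an assumption; the course itself uses Orange for this): a low C tolerates more margin violations, which shows up as more support vectors.

```python
from sklearn import svm
from sklearn.datasets import make_blobs

# Two overlapping clusters, so no clean margin exists.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = svm.SVC(kernel="linear", C=C).fit(X, y)
    # A low C tolerates more violations, giving a wider, softer margin,
    # which shows up as more support vectors.
    print(f"C={C}: {clf.n_support_.sum()} support vectors")
```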

27 CSC 4510, Spring 2012. © Paula Matuszek. Concept Check
Which of these represents a soft margin?
[figure: two margins, labeled A and B]

28 CSC 4510, Spring 2012. © Paula Matuszek. Evaluating SVMs
As with any classifier, we need to know how well our trained model performs on other data. Train on sample data, evaluate on test data (why?). Some things to look at:
– classification accuracy: percent correctly classified
– confusion matrix
– sensitivity and specificity
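A minimal train/test evaluation sketch, with scikit-learn standing in for Orange's sampling and testing widgets (an assumption; the data is synthetic):

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)  # toy data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)      # hold out 30% for testing

clf = svm.SVC(kernel="linear").fit(X_train, y_train)  # train on the sample
y_pred = clf.predict(X_test)                          # evaluate on held-out data

print("accuracy:", accuracy_score(y_test, y_pred))
print("confusion matrix:\n", confusion_matrix(y_test, y_pred))
```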

29 CSC 4510, Spring 2012. © Paula Matuszek. Confusion Matrix
Is it spam?    | Predicted yes   | Predicted no
Actually yes   | True positives  | False negatives
Actually no    | False positives | True negatives
Note that "positive" vs "negative" is arbitrary.

30 CSC 4510, Spring 2012. © Paula Matuszek. Specificity and Sensitivity
sensitivity: the proportion of actual positives that we label positive
– how much of the spam are we finding?
specificity: the proportion of actual negatives that we label negative
– how much of the "real" mail are we calling real?
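Both quantities fall straight out of the confusion-matrix counts; a small sketch with made-up numbers:

```python
# Sensitivity and specificity computed directly from the four
# confusion-matrix counts (the counts here are illustrative only).
def sensitivity(tp, fn):
    """Proportion of actual positives we labeled positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of actual negatives we labeled negative."""
    return tn / (tn + fp)

# E.g., 90 spam caught, 10 missed; 80 real messages kept, 20 flagged:
print(sensitivity(tp=90, fn=10))   # 0.9 -- we find 90% of the spam
print(specificity(tn=80, fp=20))   # 0.8 -- we keep 80% of the real mail
```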

31 CSC 4510, Spring 2012. © Paula Matuszek. More on Evaluating SVMs
Overfitting: a very close fit to the training data which takes advantage of irrelevant variations in instances.
– performance on test data will be much lower
– may mean that your training sample isn't representative
– in SVMs, may mean that C is too high
Is the SVM actually useful?
– Compare it to the "majority" classifier (sketched below).
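The "majority" baseline just predicts the most frequent class; a sketch using scikit-learn's DummyClassifier on imbalanced toy data (an assumption, as before). An SVM that barely beats this number isn't learning anything useful.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# 80/20 class imbalance, so "always guess the big class" is ~80% accurate.
X, y = make_classification(n_samples=200, weights=[0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("majority-baseline accuracy:", baseline.score(X_test, y_test))
```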

32 CSC 4510, Spring 2012. © Paula Matuszek. Concept Check
For binary classifiers A and B, on balanced data:
– Which is better: A is 80% accurate, B is 60% accurate?
– Which is better: A has 90% sensitivity, B has 70% sensitivity?
– Which is the better classifier: A has 100% sensitivity and 50% specificity; B has 80% sensitivity and 80% specificity?
Would you use a spam filter that was 80% accurate? Would you use a classifier for who needs major surgery that was 80% accurate? Would you ever use a binary classifier that is 50% accurate?

33 CSC 4510, Spring 2012. © Paula Matuszek. Orange and SVMs
We now know enough to start looking at the Orange SVM widget:
– create some linearly separable data
– sample it
– run a linear SVM and look at the results

34 CSC 4510, Spring 2012. © Paula Matuszek. Why SVMs?
– Focusing on the instances nearest the margin pays more attention to where the differences are critical.
– SVMs can handle very large feature sets effectively.
– In practice they have been shown to work well in a variety of domains.

35 CSC 4510, Spring 2012. © Paula Matuszek. But wait, there's more!

36 CSC 4510, Spring 2012. © Paula Matuszek. Non-Linearly-Separable Data
Suppose we can't do a good linear separation of our data? As with regression, allowing non-linearity will give us much better modeling of many data sets. In SVMs, we do this by using a kernel. A kernel is a function which maps our data into a higher-dimensional feature space where we can find a separating hyperplane.

37 CSC 4510, Spring 2012. © Paula Matuszek. Kernels for SVMs
As we saw in Orange, we always specify a kernel for an SVM. Linear is the simplest, but seldom a good match to the data. Other common ones are:
– polynomial
– RBF (Gaussian Radial Basis Function)
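To see why the kernel matters, here is a sketch (scikit-learn again standing in for the Orange SVM widget, an assumption) on concentric circles, which no line can separate:

```python
from sklearn import svm
from sklearn.datasets import make_circles

# Two concentric rings of points: hopeless for a linear boundary.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel in ("linear", "rbf"):
    clf = svm.SVC(kernel=kernel).fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))
# The linear kernel stays near chance (~0.5); RBF is near 1.0 because it
# implicitly maps the data into a space where a separating plane exists.
```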

Slides 38 and 39: figure-only slides, borrowed heavily from Andrew Moore's tutorials.

40 CSC 4510, Spring 2012. © Paula Matuszek. Back to Orange
Let's try some.

41 CSC 4510, Spring 2012. © Paula Matuszek. To be continued next week.