Support Vector Machines

Support Vector Machines Andrew W. Moore Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm awm@cs.cmu.edu 412-268-7599 Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received. Copyright © 2001, 2003, Andrew W. Moore Nov 23rd, 2001

Linear Classifiers
[Figure: input x feeds a classifier f(x, w, b) that outputs y_est; a scatter of +1 and −1 datapoints with one candidate linear boundary drawn]
f(x, w, b) = sign(w · x − b)
How would you classify this data?
Copyright © 2001, 2003, Andrew W. Moore

Linear Classifiers
[Figure: the same data with several different candidate linear boundaries drawn]
f(x, w, b) = sign(w · x − b)
Any of these would be fine... but which is best?
Copyright © 2001, 2003, Andrew W. Moore

Classifier Margin
[Figure: the data with one linear boundary, widened symmetrically until it touches the nearest datapoints]
f(x, w, b) = sign(w · x − b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
Copyright © 2001, 2003, Andrew W. Moore

Maximum Margin
[Figure: the linear boundary with the widest possible margin]
f(x, w, b) = sign(w · x − b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM). Linear SVM.
Copyright © 2001, 2003, Andrew W. Moore

Maximum Margin
[Figure: the maximum-margin boundary, with the datapoints lying on the margin circled]
f(x, w, b) = sign(w · x − b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM). Support vectors are those datapoints that the margin pushes up against. Linear SVM.
Copyright © 2001, 2003, Andrew W. Moore
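
A minimal sketch of fitting such a classifier and reading off its support vectors. The library (scikit-learn) and the toy data are my additions, not from the slides; note that scikit-learn's sign convention is sign(w · x + b) rather than the slides' sign(w · x − b).

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (hypothetical values, for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5],
              [-1.0, -1.0], [-2.0, -1.0], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin (maximum margin) classifier
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]   # boundary: sign(w . x + b)
print("w =", w, "b =", b)
print("support vectors:\n", clf.support_vectors_)
```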

Why Maximum Margin?
Intuitively this feels safest.
If we've made a small error in the location of the boundary (it's been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
LOOCV is easy, since the model is immune to removal of any non-support-vector datapoints.
There's some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
Empirically it works very, very well.
Copyright © 2001, 2003, Andrew W. Moore
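
The LOOCV point can be verified directly: deleting a point that is not a support vector and refitting leaves the boundary unchanged. A sketch continuing the scikit-learn example above (same hypothetical data; with other data, first check that a non-support-vector point actually exists):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5],
              [-1.0, -1.0], [-2.0, -1.0], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Drop one point that is NOT a support vector, then refit
non_sv = np.setdiff1d(np.arange(len(X)), clf.support_)[0]
keep = np.arange(len(X)) != non_sv
clf2 = SVC(kernel="linear", C=1e6).fit(X[keep], y[keep])

# The solution depends only on the support vectors, so w and b are unchanged
print(np.allclose(clf.coef_, clf2.coef_),
      np.allclose(clf.intercept_, clf2.intercept_))
```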

Specifying a line and margin
[Figure: three parallel lines: the Plus-Plane, the Classifier Boundary, and the Minus-Plane, with a "Predict Class = +1" zone above and a "Predict Class = −1" zone below]
How do we represent this mathematically? ...in m input dimensions?
Copyright © 2001, 2003, Andrew W. Moore

Specifying a line and margin
[Figure: the same three parallel planes, now labeled wx + b = +1, wx + b = 0, and wx + b = −1]
Plus-plane = { x : w · x + b = +1 }
Minus-plane = { x : w · x + b = −1 }
Classify as +1 if w · x + b >= 1; as −1 if w · x + b <= −1; and "universe explodes" (no prediction) if −1 < w · x + b < 1.
Copyright © 2001, 2003, Andrew W. Moore
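
The three-way rule translates directly into code. A plain NumPy sketch (the function name and example parameters are mine, not from the slides):

```python
import numpy as np

def classify(x, w, b):
    """Hard-margin decision rule from the slide."""
    s = np.dot(w, x) + b
    if s >= 1:
        return +1
    elif s <= -1:
        return -1
    else:
        return None  # the "universe explodes" zone: strictly inside the margin

w, b = np.array([1.0, 1.0]), -1.0              # hypothetical parameters
print(classify(np.array([2.0, 2.0]), w, b))    # +1
print(classify(np.array([-1.0, -1.0]), w, b))  # -1
print(classify(np.array([0.5, 0.5]), w, b))    # None: inside the margin
```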

Computing the margin width
[Figure: the plus and minus planes with M = margin width between them]
Plus-plane = { x : w · x + b = +1 }
Minus-plane = { x : w · x + b = −1 }
How do we compute M in terms of w and b?
Claim: the vector w is perpendicular to the Plus Plane. Why?
Copyright © 2001, 2003, Andrew W. Moore

Computing the margin width
Claim: the vector w is perpendicular to the Plus Plane. Why? Let u and v be two vectors on the Plus Plane. What is w · (u − v)? Since w · u + b = 1 and w · v + b = 1, subtracting gives w · (u − v) = 0, so w is perpendicular to every direction lying in the plane. And so of course the vector w is also perpendicular to the Minus Plane.
How do we compute M in terms of w and b?
Copyright © 2001, 2003, Andrew W. Moore

Computing the margin width
[Figure: x− on the minus plane and x+ on the plus plane, with M between them]
Plus-plane = { x : w · x + b = +1 }
Minus-plane = { x : w · x + b = −1 }
The vector w is perpendicular to the Plus Plane.
Let x− be any point on the minus plane, and let x+ be the closest plus-plane point to x−. (Each is any location in R^m: not necessarily a datapoint.)
Copyright © 2001, 2003, Andrew W. Moore

Computing the margin width
What we know:
w · x+ + b = +1
w · x− + b = −1
x+ = x− + λw (for some scalar λ, since the line from x− to x+ is perpendicular to the planes)
|x+ − x−| = M
Substituting the third line into the first: w · (x− + λw) + b = 1, and since w · x− + b = −1 this gives λ (w · w) = 2, so λ = 2 / (w · w).
Therefore M = |x+ − x−| = |λw| = λ sqrt(w · w) = 2 / sqrt(w · w).
Copyright © 2001, 2003, Andrew W. Moore
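
A quick numeric check of M = 2 / sqrt(w · w), reusing a fitted hard-margin model as in the earlier sketches (scikit-learn and the data are my assumptions):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
print("M =", 2 / np.sqrt(w @ w))   # margin width, 2 / ||w||

# Cross-check: project (x+ - x-) onto the unit normal w / ||w||;
# any support vector from each class lies on its plane, so this equals M
sv, lab = X[clf.support_], y[clf.support_]
x_plus, x_minus = sv[lab == 1][0], sv[lab == -1][0]
print("projection =", (x_plus - x_minus) @ w / np.linalg.norm(w))
```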

Learning the Maximum Margin Classifier
Given a guess of w and b we can: compute whether all datapoints are in the correct half-planes, and compute the margin width.
Assume R datapoints, each (xk, yk) where yk = +/−1.
What should our quadratic optimization criterion be? How many constraints will we have? What should they be?
Copyright © 2001, 2003, Andrew W. Moore

Learning the Maximum Margin Classifier
Given a guess of w and b we can: compute whether all datapoints are in the correct half-planes, and compute the margin width.
Assume R datapoints, each (xk, yk) where yk = +/−1.
What should our quadratic optimization criterion be? Minimize w · w.
How many constraints will we have? R.
What should they be? w · xk + b >= 1 if yk = 1, and w · xk + b <= −1 if yk = −1.
Copyright © 2001, 2003, Andrew W. Moore
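
The slide's QP can be handed to any constrained solver. A minimal sketch using scipy.optimize (the solver choice and toy data are my assumptions; a dedicated QP solver would be more robust). The two per-sign constraints on the slide collapse into the single form yk (w · xk + b) >= 1:

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data (hypothetical)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
d = X.shape[1]

# Variables z = (w, b); criterion: minimize w . w (the 1/2 is conventional)
def objective(z):
    w = z[:d]
    return 0.5 * (w @ w)

# One inequality constraint per datapoint: yk (w . xk + b) - 1 >= 0
constraints = [{"type": "ineq",
                "fun": lambda z, xk=xk, yk=yk: yk * (z[:d] @ xk + z[d]) - 1.0}
               for xk, yk in zip(X, y)]

res = minimize(objective, x0=np.zeros(d + 1), constraints=constraints)
w, b = res.x[:d], res.x[d]
print("w =", w, "b =", b, "margin width =", 2 / np.linalg.norm(w))
```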

What You Should Know
Linear SVMs.
The definition of a maximum margin classifier.
What QP can do for you (but, for this class, you don't need to know how it does it).
How maximum margin can be turned into a QP problem.
How we deal with noisy (non-separable) data.
How we permit non-linear boundaries.
How SVM kernel functions permit us to pretend we're working with ultra-high-dimensional basis-function terms.
Copyright © 2001, 2003, Andrew W. Moore
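
The last two points in practice: swapping in a kernel gives a non-linear boundary without ever constructing the high-dimensional features. A short scikit-learn sketch (the library and the XOR-style data are assumptions, not from the slides):

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: not linearly separable in the original 2-D space
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])

# An RBF kernel implicitly maps to an infinite-dimensional feature space;
# the soft-margin parameter C handles noisy, non-separable data
clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)
print(clf.predict([[0.9, 0.9], [0.9, -0.9]]))   # expect [ 1 -1 ]
```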