A Brief Introduction to Support Vector Machine (SVM) Most slides were from Prof. A. W. Moore, School of Computer Science, Carnegie Mellon University.

Overview A new, powerful method for 2-class classification.
–Original idea: Vapnik, 1965, for linear classifiers
–SVM: Cortes and Vapnik, 1995
–Became very popular after 2001
Better generalization (less overfitting); can handle linearly inseparable classification with a globally optimal solution. Key ideas:
–Use a kernel function to transform low-dimensional training samples into a higher-dimensional space (to address the linear separability problem)
–Use quadratic programming (QP) to find the best boundary hyperplane (to obtain a global optimum)
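To make these two key ideas concrete before the slides develop them, here is a minimal sketch using scikit-learn; the library, dataset, and parameter choices are illustrative additions, not part of the original slides:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 2-class data that is not linearly separable: one ring inside another.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel maps the samples to a higher-dimensional space, where
# a QP solver finds the maximum-margin separating hyperplane.
clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```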

Support Vector Machines: Slide 3 Linear Classifiers. A linear classifier f maps an input x to an estimated label y_est via f(x, w, b) = sign(w · x − b). [Figure: datapoints marked +1 and −1 with a candidate separating line.] How would you classify this data?

Support Vector Machines: Slide 7 Linear Classifiers. f(x, w, b) = sign(w · x − b). Any of these candidate lines would separate the data... but which is best?

Support Vector Machines: Slide 8 Classifier Margin. f(x, w, b) = sign(w · x − b). Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Support Vector Machines: Slides 9–10 Maximum Margin. The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM). Support vectors are those datapoints that the margin pushes up against.
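Fitted SVMs expose these support vectors directly. A small sketch, assuming scikit-learn (the toy data is made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-class data (made up).
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 1.0],
              [6.0, 5.0], [7.0, 7.5], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin (noise-free) LSVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# The datapoints that the margin pushes up against:
print("support vectors:\n", clf.support_vectors_)
print("their indices:", clf.support_)
```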

Support Vector Machines: Slide 11 Why Maximum Margin? The maximum margin linear classifier is the simplest kind of SVM (an LSVM), and the support vectors are the datapoints that the margin pushes up against. Why prefer it?
1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary, this gives us the least chance of causing a misclassification.
3. It also helps generalization.
4. There is theory suggesting that this is a good thing.
5. Empirically it works very well.

Support Vector Machines: Slide 12 Copyright © 2001, 2003, Andrew W. Moore. Computing the margin width. Plus-plane = { x : w · x + b = +1 }; minus-plane = { x : w · x + b = −1 }. [Figure: the parallel planes w · x + b = +1, 0, −1, with the "Predict Class = +1" zone on one side, the "Predict Class = −1" zone on the other, and M = margin width between them.] How do we compute M in terms of w and b?

Support Vector Machines: Slide 13 Learning the Maximum Margin Classifier. Given a guess of w and b we can: compute whether all data points lie in the correct half-planes; compute the width of the margin. Taking x- on the minus-plane and x+ as the closest point on the plus-plane, the margin width works out to M = |x+ − x-| = 2 / √(w · w). So now we just need to write a program to search the space of w's and b's to find the widest margin that correctly classifies all the datapoints. How? Gradient descent? Simulated annealing?
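The formula M = 2 / √(w · w) is easy to check numerically. A sketch, assuming scikit-learn for the fitting (data and parameters are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data (made up): nearest opposite points are (1,1) and (3,3).
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
w = clf.coef_[0]

M = 2.0 / np.sqrt(np.dot(w, w))  # distance between the +1 and -1 planes
print("margin width M =", M)     # ~2.83, matching |(3,3) - (1,1)| = 2*sqrt(2)
```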

Support Vector Machines: Slide 14 Learning via Quadratic Programming. QP is a well-studied class of optimization algorithms for maximizing a quadratic function of real-valued variables subject to linear constraints. For the SVM, we minimize both w · w (to maximize M) and the misclassification error.

Support Vector Machines: Slide 15 Quadratic Programming. Find arg max over u of the quadratic criterion c + dᵀu + ½ uᵀRu, subject to n linear inequality constraints of the form a_i · u ≤ b_i (i = 1 … n) and e additional linear equality constraints of the form a_j · u = b_j (j = n+1 … n+e).
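The slides do not name a particular QP solver. As one possibility, here is a sketch that solves the standard dual form of the hard-margin problem (an equivalent reformulation of the primal posed on slides 13–14, not shown on the slides themselves) with the cvxopt package, an assumed dependency, on made-up toy data:

```python
import numpy as np
from cvxopt import matrix, solvers

# Toy 2-class data (made up): two points per class along a diagonal.
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
m = len(y)

# Standard dual of the hard-margin problem:
#   maximize  sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j (x_i . x_j)
#   subject to  a_i >= 0  and  sum_i a_i y_i = 0.
# cvxopt minimizes 1/2 a'Pa + q'a  s.t.  Ga <= h, Aa = b.
K = X @ X.T
P = matrix(np.outer(y, y) * K)
q = matrix(-np.ones(m))
G = matrix(-np.eye(m))        # -a_i <= 0, i.e. a_i >= 0
h = matrix(np.zeros(m))
A = matrix(y.reshape(1, -1))  # sum_i a_i y_i = 0
b = matrix(0.0)

solvers.options["show_progress"] = False
a = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])

# Recover the primal solution from the support vectors (a_k > 0).
w = (a * y) @ X
sv = a > 1e-6
b_off = np.mean(y[sv] - X[sv] @ w)
print("w =", w, " b =", b_off)
```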

Support Vector Machines: Slide 16 Learning Maximum Margin with Noise. Given a guess of w, b we can: compute the sum of distances of points to their correct zones; compute the margin width. Assume R datapoints, each (x_k, y_k) where y_k = ±1. What should our quadratic optimization criterion be? How many constraints will we have? What should they be?

Support Vector Machines: Slide 17 Learning Maximum Margin with Noise. Introduce a slack variable ε_k for each datapoint, where ε_k is the distance of an error point to its correct place (zero for points already in the right zone). What should our quadratic optimization criterion be? Minimize ½ w · w + C Σ_k ε_k. How many constraints will we have? 2R. What should they be?
w · x_k + b ≥ 1 − ε_k if y_k = +1
w · x_k + b ≤ −1 + ε_k if y_k = −1
ε_k ≥ 0 for all k
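This trade-off between margin width and total slack is what the error cost C controls. A sketch assuming scikit-learn, with made-up overlapping data:

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping classes (made up), so some slack is unavoidable.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.2, (50, 2)), rng.normal(1.0, 1.2, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# Larger C penalizes the slacks more heavily: narrower margin, fewer
# support vectors. Smaller C tolerates slack: wider margin, more SVs.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:<7} support vectors={len(clf.support_):3d} "
          f"train accuracy={clf.score(X, y):.2f}")
```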

Support Vector Machines: Slide 18 From LSVM to general SVM. Suppose we're in 1 dimension. What would SVMs do with this data? [Figure: datapoints of the two classes on a 1-D axis around x = 0.]

Support Vector Machines: Slide 19 Not a big surprise. [Figure: the 1-D maximum margin boundary, with the positive "plane" and the negative "plane" reduced to points on either side of it.]

Support Vector Machines: Slide 20 Harder 1-dimensional dataset. These points are not linearly separable: no single threshold splits the two classes. What can we do now? [Figure: 1-D datapoints around x = 0.]

Support Vector Machines: Slide 21 Harder 1-dimensional dataset. Transform the data points from 1-D to 2-D by some nonlinear basis function, for example z_k = (x_k, x_k²); such mappings correspond to kernel functions. The transformed points are sometimes called feature vectors.

Support Vector Machines: Slide 22 Harder 1-dimensional dataset. These points are linearly separable now! The boundary can be found by QP. [Figure: the mapped 2-D points with a linear boundary separating the classes.]
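A sketch of this 1-D-to-2-D trick, using the example mapping z_k = (x_k, x_k²) from slide 21 and scikit-learn's linear SVM for the QP step (the data values are made up):

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data (made up) that no single threshold can separate:
# class -1 sits between the class +1 points.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])

# The nonlinear basis function: z_k = (x_k, x_k^2).
Z = np.column_stack([x, x ** 2])

# In the 2-D feature space a linear boundary (found by QP) separates them.
clf = SVC(kernel="linear").fit(Z, y)
print("accuracy in feature space:", clf.score(Z, y))  # 1.0: separable now
```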

Support Vector Machines: Slide 23 Common SVM basis functions:
–z_k = (polynomial terms of x_k of degree 1 to q)
–z_k = (radial basis functions of x_k)
–z_k = (sigmoid functions of x_k)
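As a sketch, these three families can also be written as kernel functions K(a, b), the inner products of the corresponding feature vectors; the parameter names q, sigma, kappa, and delta are common conventions, not fixed by the slides:

```python
import numpy as np

def polynomial_kernel(a, b, q=2):
    # inner product of polynomial feature vectors of degree 1..q
    return (np.dot(a, b) + 1.0) ** q

def rbf_kernel(a, b, sigma=1.0):
    # radial basis function (Gaussian) kernel
    return np.exp(-np.dot(a - b, a - b) / (2.0 * sigma ** 2))

def sigmoid_kernel(a, b, kappa=1.0, delta=-1.0):
    # sigmoid (tanh) kernel
    return np.tanh(kappa * np.dot(a, b) + delta)

a, b = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial_kernel(a, b), rbf_kernel(a, b), sigmoid_kernel(a, b))
```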

Support Vector Machines: Slide 24 Doing multi-class classification. SVMs can only handle two-class outputs (i.e., a categorical output variable with arity 2). What can be done? Answer: with N classes, learn N SVMs: SVM 1 learns "Output == 1" vs. "Output != 1", SVM 2 learns "Output == 2" vs. "Output != 2", ..., SVM N learns "Output == N" vs. "Output != N". Then, to predict the output for a new input, predict with each SVM and choose the one that puts the prediction furthest into the positive region.
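A sketch of this one-vs-rest scheme, assuming scikit-learn and the Iris dataset (both illustrative choices, not from the slides):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# SVM i learns "Output == i" vs "Output != i".
machines = {c: SVC(kernel="rbf").fit(X, np.where(y == c, 1, -1))
            for c in classes}

def predict(x):
    # choose the class whose SVM puts x furthest into the positive region
    scores = {c: m.decision_function(x.reshape(1, -1))[0]
              for c, m in machines.items()}
    return max(scores, key=scores.get)

print("predicted:", predict(X[0]), " true:", y[0])
```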

Support Vector Machines: Slide 25 Compare SVM with NN.
Similarities:
−SVM with a sigmoid kernel ~ two-layer feedforward NN
−SVM with a Gaussian kernel ~ RBF network
−For most problems, SVM and NN have similar performance
Advantages:
−Based on sound mathematical theory
−Learned result is more robust
−Overfitting is not common
−Not trapped in local minima (because of QP)
−Fewer parameters to consider (kernel, error cost C)
−Works well with fewer training samples (non-support vectors do not matter much)
Disadvantages:
−Problems need to be formulated as 2-class classification
−Training takes a long time (QP optimization)