1
Support Vector Machines (SVM): A Tool for Machine Learning
Yixin Chen, Ph.D. Candidate, CSE
1/10/2002
2
Presentation Outline: Introduction; Linear Learning Machines; Support Vector Machines (SVM); Examples; Conclusions
3
Introduction. The goal is to build machines capable of learning from experience. Experience is usually specified by a finite amount of training data, and the aim is to achieve high generalization performance by learning from the training set. The construction of a good learning machine is a compromise between the accuracy attained on a particular training set and the "capacity" of the machine. SVMs have large learning capacity and can achieve excellent generalization performance.
4
Linear Learning Machines. Binary classification uses a linear function g(x) = w^t x + w_0, where x is the feature vector, w is the weight vector, and w_0 is the bias or threshold weight. A two-category classifier implements the decision rule: decide class 1 if g(x) > 0 and class -1 if g(x) < 0.
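To make the decision rule concrete, here is a minimal Python sketch; the weight vector w and bias w_0 below are made-up illustrative values, not taken from the slides.

```python
import numpy as np

def linear_decision(x, w, w0):
    """Decide class 1 if g(x) = w^t x + w0 > 0, otherwise class -1."""
    g = np.dot(w, x) + w0
    return 1 if g > 0 else -1

# Hypothetical 2-D example: w and w0 are arbitrary illustrative values.
w = np.array([2.0, -1.0])
w0 = 0.5
print(linear_decision(np.array([1.0, 1.0]), w, w0))   # g = 1.5 > 0  -> class 1
print(linear_decision(np.array([-1.0, 2.0]), w, w0))  # g = -3.5 < 0 -> class -1
```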
5
A Simple Linear Classifier
6
Some Properties of Linear Learning Machines. The decision surface is a hyperplane, and it divides the feature space into two half-spaces.
7
Several Questions. Does there exist a hyperplane that separates the training set? If yes, how do we compute it? Is it unique? If it is not unique, can we find an "optimal" one, and how? What can we do if no separating hyperplane exists?
8
Facts. If the training set is linearly separable, then there exist infinitely many separating hyperplanes for it. If the training set is not linearly separable, then no separating hyperplane exists for it.
9
Support Vector Machines: Linearly Separable
10
Support Vector Machines. Margin: 2/|w|. Hyperplanes: H_1: w^t x - w_0 = 1; H: w^t x - w_0 = 0; H_2: w^t x - w_0 = -1.
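A quick numeric check of the margin formula; the weight vector here is arbitrary, chosen only for illustration.

```python
import numpy as np

# Arbitrary weight vector; the distance between H_1 and H_2 is 2 / |w|.
w = np.array([3.0, 4.0])             # |w| = 5
margin = 2.0 / np.linalg.norm(w)
print(margin)                         # 0.4
```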
11
Support Vector Machines. Maximizing the margin 2/|w| is equivalent to minimizing |w|/2.
12
Support Vector Machines. Quadratic Program (Maximal Margin): min_{w,w_0} |w|^2/2, s.t. w^t x_i ≥ w_0 + 1 for y_i = 1, and w^t x_i ≤ w_0 - 1 for y_i = -1 (or equivalently y_i(w^t x_i - w_0) ≥ 1). Dual QP (Maximal Margin): min_α 0.5 Σ_{i=1..m} Σ_{j=1..m} y_i y_j α_i α_j x_i^t x_j - Σ_{i=1..m} α_i, s.t. Σ_{i=1..m} y_i α_i = 0, α_i ≥ 0, i = 1,…,m. Support Vectors: w is a linear combination of support vectors.
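The following sketch checks the "linear combination of support vectors" statement on a tiny made-up data set using scikit-learn's SVC (an assumed dependency, not mentioned in the slides); a very large C approximates the hard-margin program above, and SVC's dual_coef_ attribute stores y_i * α_i for the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable toy data (made up for illustration).
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates the maximal (hard) margin formulation.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ holds y_i * alpha_i, so w = sum_i y_i alpha_i x_i is a
# linear combination of the support vectors.
w_from_sv = clf.dual_coef_ @ clf.support_vectors_
print(clf.support_vectors_)       # only the points that touch the margin
print(w_from_sv, clf.coef_)       # the two weight vectors should agree
```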
13
Support Vector Machines: Linearly Inseparable
14
Support Vector Machines. Maximize Margin and Minimize Error (Soft Margin): min_{w,w_0,z} |w|^2/2 + C Σ_{i=1..m} z_i, s.t. y_i(w^t x_i - w_0) + z_i ≥ 1, z_i ≥ 0, i = 1,…,m (z_i is a slack, or error, variable). Dual QP (Soft Margin): min_α 0.5 Σ_{i=1..m} Σ_{j=1..m} y_i y_j α_i α_j x_i^t x_j - Σ_{i=1..m} α_i, s.t. Σ_{i=1..m} y_i α_i = 0, C ≥ α_i ≥ 0, i = 1,…,m.
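The sketch below illustrates the role of C on overlapping (hence linearly inseparable) data; the synthetic data, the two C values, and the use of scikit-learn are all assumptions made for the example.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two overlapping Gaussian clouds, so some slack z_i > 0 is unavoidable.
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# Larger C penalizes slack more heavily: fewer margin violations, smaller margin.
for C in (0.1, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C}: |w|={np.linalg.norm(w):.2f}, "
          f"support vectors={len(clf.support_vectors_)}")
```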
15
Support Vector Machines. Nonlinear Mappings via Kernels. Idea: map the original features into a higher-dimensional feature space, x → Φ(x), and design the classifier in the new feature space. The classifier is nonlinear in the original feature space but linear in the new feature space. (With an appropriate nonlinear mapping to a sufficiently high dimension, data from two categories can always be separated by a hyperplane.)
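A small worked example of this idea: XOR-type data is not linearly separable in 2-D, but becomes separable after the hand-picked (hypothetical) mapping Φ(x) = (x_1, x_2, x_1 x_2).

```python
import numpy as np

# XOR-type data: not linearly separable in the original 2-D feature space.
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([-1, -1, 1, 1])

# Map into a higher-dimensional space: phi(x) = (x1, x2, x1 * x2).
def phi(x):
    return np.array([x[0], x[1], x[0] * x[1]])

Phi = np.array([phi(x) for x in X])

# In the new space the hyperplane with w = (0, 0, -1), w0 = 0 separates the classes.
w = np.array([0.0, 0.0, -1.0])
print(np.sign(Phi @ w))   # [-1. -1.  1.  1.]  -- matches y
```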
16
Support Vector Machines. Maximal Margin: min_α 0.5 Σ_{i=1..m} Σ_{j=1..m} y_i y_j α_i α_j Φ(x_i)^t Φ(x_j) - Σ_{i=1..m} α_i, s.t. Σ_{i=1..m} y_i α_i = 0, α_i ≥ 0, i = 1,…,m. Soft Margin: min_α 0.5 Σ_{i=1..m} Σ_{j=1..m} y_i y_j α_i α_j Φ(x_i)^t Φ(x_j) - Σ_{i=1..m} α_i, s.t. Σ_{i=1..m} y_i α_i = 0, C ≥ α_i ≥ 0, i = 1,…,m.
17
Support Vector Machines. Role of Kernels: a kernel simplifies the computation of the inner product in the new feature space, K(x,y) = Φ(x)^t Φ(y). Some Popular Kernels: Polynomial K(x,y) = (x^t y + 1)^p; Gaussian K(x,y) = e^(-|x-y|^2 / 2σ^2); Sigmoid K(x,y) = tanh(κ x^t y - δ). The kernel is substituted into both the maximal-margin and soft-margin formulations, as shown on the next slide.
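The three kernels above can be written down directly; the parameter values p, σ, κ, δ below are placeholders, not values recommended by the slides.

```python
import numpy as np

def poly_kernel(x, y, p=2):
    """Polynomial kernel (x^t y + 1)^p."""
    return (np.dot(x, y) + 1.0) ** p

def gauss_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-|x - y|^2 / (2 sigma^2))."""
    d = x - y
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=1.0):
    """Sigmoid kernel tanh(kappa x^t y - delta)."""
    return np.tanh(kappa * np.dot(x, y) - delta)

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(poly_kernel(x, y), gauss_kernel(x, y), sigmoid_kernel(x, y))
```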
18
Support Vector Machines. Maximal Margin: min_α 0.5 Σ_{i=1..m} Σ_{j=1..m} y_i y_j α_i α_j K(x_i, x_j) - Σ_{i=1..m} α_i, s.t. Σ_{i=1..m} y_i α_i = 0, α_i ≥ 0, i = 1,…,m. Soft Margin: min_α 0.5 Σ_{i=1..m} Σ_{j=1..m} y_i y_j α_i α_j K(x_i, x_j) - Σ_{i=1..m} α_i, s.t. Σ_{i=1..m} y_i α_i = 0, C ≥ α_i ≥ 0, i = 1,…,m.
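To make the kernelized soft-margin dual concrete, here is a sketch that hands it to a generic QP solver; the use of cvxopt is an assumption (the slides name no solver), with P_ij = y_i y_j K(x_i, x_j), a linear term of -1 per α_i, box constraints 0 ≤ α_i ≤ C, and the equality constraint Σ_i y_i α_i = 0.

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual_qp(X, y, kernel, C=1000.0):
    """Solve the soft-margin dual QP for alpha; y is a numpy array of +/-1 labels."""
    m = len(y)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    P = matrix(np.outer(y, y).astype(float) * K)            # P_ij = y_i y_j K(x_i, x_j)
    q = matrix(-np.ones((m, 1)))                             # minimize 0.5 a'Pa - sum(a)
    G = matrix(np.vstack([-np.eye(m), np.eye(m)]))           # -a_i <= 0 and a_i <= C
    h = matrix(np.vstack([np.zeros((m, 1)), C * np.ones((m, 1))]))
    A = matrix(y.reshape(1, -1).astype(float))               # sum_i y_i a_i = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol["x"])
```

In practice, dedicated decomposition solvers such as SVM Light (listed in the references) are used instead of a general-purpose QP package, but the problem they solve is the same dual shown above.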
19
Examples: Checker-Board Problem
20
169 training samples, Gaussian kernel, soft margin, C=1000
21
Checker-Board Problem: 169 training samples, Gaussian kernel, soft margin, C=1000
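The checker-board experiment can be roughly reproduced with off-the-shelf tools; the 13 x 13 grid (169 points), the 4 x 4 board layout, and the kernel width below are assumptions, since the slide only reports the sample count, the Gaussian kernel, and C=1000.

```python
import numpy as np
from sklearn.svm import SVC

# A 13 x 13 grid gives 169 training samples; the board layout is assumed.
xx, yy = np.meshgrid(np.linspace(0, 4, 13), np.linspace(0, 4, 13))
X = np.c_[xx.ravel(), yy.ravel()]
y = np.where((np.floor(xx.ravel()) + np.floor(yy.ravel())) % 2 == 0, 1, -1)

# Gaussian (RBF) kernel, soft margin with C = 1000; gamma = 1/(2 sigma^2) is assumed.
clf = SVC(kernel="rbf", C=1000, gamma=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```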
22
Examples: Two-Spiral Problem
23
154 training samples, Gaussian kernel, soft margin, C=1000
24
Two-Spiral Problem: 154 training samples, Gaussian kernel, soft margin, C=1000
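Similarly, a two-spiral data set of 154 points (77 per spiral) can be generated and fit; the spiral parametrization and the kernel width are assumptions, with only the sample count, the Gaussian kernel, and C=1000 taken from the slide.

```python
import numpy as np
from sklearn.svm import SVC

# Two interleaved spirals, 77 points each (154 total); parametrization is assumed.
n = 77
t = np.linspace(0, 3 * np.pi, n)
r = 0.5 + 4.5 * t / (3 * np.pi)           # radius grows from 0.5 to 5.0
spiral_a = np.c_[r * np.cos(t), r * np.sin(t)]
spiral_b = -spiral_a                       # the second spiral, rotated 180 degrees
X = np.vstack([spiral_a, spiral_b])
y = np.array([1] * n + [-1] * n)

clf = SVC(kernel="rbf", C=1000, gamma=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```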
25
Conclusions. Advantages: training is a convex quadratic program, so it always finds a global minimum; the solution has a simple and clear geometric interpretation. Limitations: the choice of kernel; training a multi-class SVM in one step.
26
References
N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, Inc., 2001.
C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2(2), 121-167, 1998.
K. P. Bennett and C. Campbell, Support Vector Machines: Hype or Hallelujah?, SIGKDD Explorations, 2(2), 1-13, 2000.
SVM Light, http://svmlight.joachims.org/