1
Support Vector Machines (SVM): A Tool for Machine Learning
Yixin Chen, Ph.D. Candidate, CSE
1/10/2002
2
Presentation Outline: Introduction; Linear Learning Machines; Support Vector Machines (SVM); Examples; Conclusions
3
Introduction. The goal is to build machines capable of learning from experience. Experience is usually specified by a finite amount of training data, and the aim is to achieve high generalization performance by learning from the training set. The construction of a good learning machine is a compromise between the accuracy attained on a particular training set and the "capacity" of the machine. SVMs have large learning capacity and can achieve excellent generalization performance.
4
Linear Learning Machines. Binary classification uses a linear function g(x) = w^t x + w_0, where x is the feature vector, w is the weight vector, and w_0 is the bias or threshold weight. A two-category classifier implements the decision rule: decide class 1 if g(x) > 0 and class -1 if g(x) < 0.
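To make the decision rule concrete, here is a minimal Python sketch; the weight vector w and bias w_0 below are made-up illustrative values, not taken from the slides.

```python
import numpy as np

def linear_decision(x, w, w0):
    """Decide class 1 if g(x) = w^t x + w0 > 0, otherwise class -1."""
    g = np.dot(w, x) + w0
    return 1 if g > 0 else -1

# Hypothetical 2-D example: w and w0 are arbitrary illustrative values.
w = np.array([2.0, -1.0])
w0 = 0.5
print(linear_decision(np.array([1.0, 1.0]), w, w0))   # g = 1.5 > 0  -> class 1
print(linear_decision(np.array([-1.0, 2.0]), w, w0))  # g = -3.5 < 0 -> class -1
```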
5
A Simple Linear Classifier
6
Some Properties of Linear Learning Machines. The decision surface is a hyperplane, and it divides the feature space into two half-spaces.
7
Several Questions. Does there exist a hyperplane that separates the training set? If yes, how do we compute it? Is it unique? If it is not unique, can we find an "optimal" one, and how? What can we do if no separating hyperplane exists?
8
Facts. If the training set is linearly separable, then there exist infinitely many separating hyperplanes for it. If the training set is not linearly separable, then no separating hyperplane exists for it.
9
Support Vector Machines: Linearly Separable
10
Support Vector Machines. Margin: 2/|w|. Hyperplanes: H_1: w^t x - w_0 = 1; H: w^t x - w_0 = 0; H_2: w^t x - w_0 = -1.
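A quick numeric check of the margin formula; the weight vector here is arbitrary, chosen only for illustration.

```python
import numpy as np

# Arbitrary weight vector; the distance between H_1 and H_2 is 2 / |w|.
w = np.array([3.0, 4.0])             # |w| = 5
margin = 2.0 / np.linalg.norm(w)
print(margin)                         # 0.4
```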
11
Support Vector Machines. Maximizing the margin 2/|w| is equivalent to minimizing |w|/2.
12
Support Vector Machines. Quadratic Program (Maximal Margin): min_{w,w_0} |w|^2/2, s.t. w^t x_i ≥ w_0 + 1 for y_i = 1, and w^t x_i ≤ w_0 - 1 for y_i = -1 (or equivalently y_i(w^t x_i - w_0) ≥ 1). Dual QP (Maximal Margin): min_α 0.5 Σ_{i=1..m} Σ_{j=1..m} y_i y_j α_i α_j x_i^t x_j - Σ_{i=1..m} α_i, s.t. Σ_{i=1..m} y_i α_i = 0, α_i ≥ 0, i = 1,…,m. Support Vectors: w is a linear combination of support vectors.
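The following sketch checks the "linear combination of support vectors" statement on a tiny made-up data set using scikit-learn's SVC (an assumed dependency, not mentioned in the slides); a very large C approximates the hard-margin program above, and SVC's dual_coef_ attribute stores y_i * α_i for the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable toy data (made up for illustration).
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates the maximal (hard) margin formulation.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ holds y_i * alpha_i, so w = sum_i y_i alpha_i x_i is a
# linear combination of the support vectors.
w_from_sv = clf.dual_coef_ @ clf.support_vectors_
print(clf.support_vectors_)       # only the points that touch the margin
print(w_from_sv, clf.coef_)       # the two weight vectors should agree
```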
13
Support Vector Machines: Linearly Inseparable
14
Support Vector Machines. Maximize Margin and Minimize Error (Soft Margin): min_{w,w_0,z} |w|^2/2 + C Σ_{i=1..m} z_i, s.t. y_i(w^t x_i - w_0) + z_i ≥ 1, z_i ≥ 0, i = 1,…,m (z_i is a slack, or error, variable). Dual QP (Soft Margin): min_α 0.5 Σ_{i=1..m} Σ_{j=1..m} y_i y_j α_i α_j x_i^t x_j - Σ_{i=1..m} α_i, s.t. Σ_{i=1..m} y_i α_i = 0, C ≥ α_i ≥ 0, i = 1,…,m.
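The sketch below illustrates the role of C on overlapping (hence linearly inseparable) data; the synthetic data, the two C values, and the use of scikit-learn are all assumptions made for the example.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two overlapping Gaussian clouds, so some slack z_i > 0 is unavoidable.
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# Larger C penalizes slack more heavily: fewer margin violations, smaller margin.
for C in (0.1, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C}: |w|={np.linalg.norm(w):.2f}, "
          f"support vectors={len(clf.support_vectors_)}")
```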
15
Support Vector Machines. Nonlinear Mappings via Kernels. Idea: map the original features into a higher-dimensional feature space, x → Φ(x), and design the classifier in the new feature space. The classifier is nonlinear in the original feature space but linear in the new feature space. (With an appropriate nonlinear mapping to a sufficiently high dimension, data from two categories can always be separated by a hyperplane.)
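A small worked example of this idea: XOR-type data is not linearly separable in 2-D, but becomes separable after the hand-picked (hypothetical) mapping Φ(x) = (x_1, x_2, x_1 x_2).

```python
import numpy as np

# XOR-type data: not linearly separable in the original 2-D feature space.
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([-1, -1, 1, 1])

# Map into a higher-dimensional space: phi(x) = (x1, x2, x1 * x2).
def phi(x):
    return np.array([x[0], x[1], x[0] * x[1]])

Phi = np.array([phi(x) for x in X])

# In the new space the hyperplane with w = (0, 0, -1), w0 = 0 separates the classes.
w = np.array([0.0, 0.0, -1.0])
print(np.sign(Phi @ w))   # [-1. -1.  1.  1.]  -- matches y
```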
16
Support Vector Machines. Maximal Margin: min_α 0.5 Σ_{i=1..m} Σ_{j=1..m} y_i y_j α_i α_j Φ(x_i)^t Φ(x_j) - Σ_{i=1..m} α_i, s.t. Σ_{i=1..m} y_i α_i = 0, α_i ≥ 0, i = 1,…,m. Soft Margin: min_α 0.5 Σ_{i=1..m} Σ_{j=1..m} y_i y_j α_i α_j Φ(x_i)^t Φ(x_j) - Σ_{i=1..m} α_i, s.t. Σ_{i=1..m} y_i α_i = 0, C ≥ α_i ≥ 0, i = 1,…,m.
17
Support Vector Machines. Role of Kernels: a kernel simplifies the computation of the inner product in the new feature space, K(x,y) = Φ(x)^t Φ(y). Some Popular Kernels: Polynomial K(x,y) = (x^t y + 1)^p; Gaussian K(x,y) = e^(-|x-y|^2 / 2σ^2); Sigmoid K(x,y) = tanh(κ x^t y - δ). The kernel is substituted into both the maximal-margin and soft-margin formulations, as shown on the next slide.
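The three kernels above can be written down directly; the parameter values p, σ, κ, δ below are placeholders, not values recommended by the slides.

```python
import numpy as np

def poly_kernel(x, y, p=2):
    """Polynomial kernel (x^t y + 1)^p."""
    return (np.dot(x, y) + 1.0) ** p

def gauss_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-|x - y|^2 / (2 sigma^2))."""
    d = x - y
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=1.0):
    """Sigmoid kernel tanh(kappa x^t y - delta)."""
    return np.tanh(kappa * np.dot(x, y) - delta)

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(poly_kernel(x, y), gauss_kernel(x, y), sigmoid_kernel(x, y))
```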
18
Support Vector Machines. Maximal Margin: min_α 0.5 Σ_{i=1..m} Σ_{j=1..m} y_i y_j α_i α_j K(x_i, x_j) - Σ_{i=1..m} α_i, s.t. Σ_{i=1..m} y_i α_i = 0, α_i ≥ 0, i = 1,…,m. Soft Margin: min_α 0.5 Σ_{i=1..m} Σ_{j=1..m} y_i y_j α_i α_j K(x_i, x_j) - Σ_{i=1..m} α_i, s.t. Σ_{i=1..m} y_i α_i = 0, C ≥ α_i ≥ 0, i = 1,…,m.
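To make the kernelized soft-margin dual concrete, here is a sketch that hands it to a generic QP solver; the use of cvxopt is an assumption (the slides name no solver), with P_ij = y_i y_j K(x_i, x_j), a linear term of -1 per α_i, box constraints 0 ≤ α_i ≤ C, and the equality constraint Σ_i y_i α_i = 0.

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual_qp(X, y, kernel, C=1000.0):
    """Solve the soft-margin dual QP for alpha; y is a numpy array of +/-1 labels."""
    m = len(y)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    P = matrix(np.outer(y, y).astype(float) * K)            # P_ij = y_i y_j K(x_i, x_j)
    q = matrix(-np.ones((m, 1)))                             # minimize 0.5 a'Pa - sum(a)
    G = matrix(np.vstack([-np.eye(m), np.eye(m)]))           # -a_i <= 0 and a_i <= C
    h = matrix(np.vstack([np.zeros((m, 1)), C * np.ones((m, 1))]))
    A = matrix(y.reshape(1, -1).astype(float))               # sum_i y_i a_i = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol["x"])
```

In practice, dedicated decomposition solvers such as SVM Light (listed in the references) are used instead of a general-purpose QP package, but the problem they solve is the same dual shown above.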
19
Examples: Checker-Board Problem
20
169 training samples, Gaussian kernel, soft margin, C=1000
21
Checker-Board Problem: 169 training samples, Gaussian kernel, soft margin, C=1000
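The checker-board experiment can be roughly reproduced with off-the-shelf tools; the 13 x 13 grid (169 points), the 4 x 4 board layout, and the kernel width below are assumptions, since the slide only reports the sample count, the Gaussian kernel, and C=1000.

```python
import numpy as np
from sklearn.svm import SVC

# A 13 x 13 grid gives 169 training samples; the board layout is assumed.
xx, yy = np.meshgrid(np.linspace(0, 4, 13), np.linspace(0, 4, 13))
X = np.c_[xx.ravel(), yy.ravel()]
y = np.where((np.floor(xx.ravel()) + np.floor(yy.ravel())) % 2 == 0, 1, -1)

# Gaussian (RBF) kernel, soft margin with C = 1000; gamma = 1/(2 sigma^2) is assumed.
clf = SVC(kernel="rbf", C=1000, gamma=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```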
22
Examples: Two-Spiral Problem
23
154 training samples, Gaussian kernel, soft margin, C=1000
24
Two-Spiral Problem: 154 training samples, Gaussian kernel, soft margin, C=1000
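Similarly, a two-spiral data set of 154 points (77 per spiral) can be generated and fit; the spiral parametrization and the kernel width are assumptions, with only the sample count, the Gaussian kernel, and C=1000 taken from the slide.

```python
import numpy as np
from sklearn.svm import SVC

# Two interleaved spirals, 77 points each (154 total); parametrization is assumed.
n = 77
t = np.linspace(0, 3 * np.pi, n)
r = 0.5 + 4.5 * t / (3 * np.pi)           # radius grows from 0.5 to 5.0
spiral_a = np.c_[r * np.cos(t), r * np.sin(t)]
spiral_b = -spiral_a                       # the second spiral, rotated 180 degrees
X = np.vstack([spiral_a, spiral_b])
y = np.array([1] * n + [-1] * n)

clf = SVC(kernel="rbf", C=1000, gamma=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```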
25
Conclusions. Advantages: training is a convex quadratic program, so it always finds a global minimum; the solution has a simple and clear geometric interpretation. Limitations: the choice of kernel; training a multi-class SVM in one step.
26
References
N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, Inc., 2001.
C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2(2), 121-167, 1998.
K. P. Bennett and C. Campbell, Support Vector Machines: Hype or Hallelujah?, SIGKDD Explorations, 2(2), 1-13, 2000.
SVM Light, http://svmlight.joachims.org/