An introduction to support vector machines (SVM)
Advisor: Dr. Hsu
Graduate student: Ching-Wen Hong
Outline
1. SVM: A brief overview
2. Simple SVM: Linear classifier for separable data
3. Simple SVM: Linear classifier for non-separable data
4. Conclusion
1. SVM: A brief overview
1-1 What is an SVM?
A family of learning algorithms for classifying objects into two classes.
Input: a training set $\{(x_1,y_1),\dots,(x_l,y_l)\}$ of objects $x_i \in \mathbb{R}^n$ (an n-dimensional vector space) and their known classes $y_i \in \{-1,+1\}$.
Output: a classifier $f:\mathbb{R}^n \to \{-1,+1\}$ that predicts the class $f(x)$ for any (new) object $x \in \mathbb{R}^n$.
1-2 Pattern recognition example
1-3 Examples of classification tasks
Optical character recognition: x is an image, y is a character.
Text classification: x is a text, y is a category.
Medical diagnosis: x is a set of features (age, sex, blood type, genome, …), y indicates the risk.
1-4 Are there other methods for classification?
Bayesian classifier (based on maximum a posteriori probability)
Fisher linear discriminant
Neural networks
Expert systems (rule-based)
Decision trees
…
1-5 Why is it gaining popularity?
Good performance in real-world applications.
Computational efficiency.
Robust in high dimensions.
No strong assumptions about the data-generating process (in contrast to the Bayesian approach).
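2. Simplest SVM: Linear SVM for separable training sets
Training set $S = \{(x_1,y_1),\dots,(x_l,y_l)\}$, $x_i \in \mathbb{R}^n$, $y_i \in \{-1,+1\}$.
2-1 Linearly separable training set
$S$ is linearly separable if some hyperplane $w\cdot x + b = 0$ puts every $x_i$ with $y_i = +1$ on one side and every $x_i$ with $y_i = -1$ on the other.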
2-2 Linear classifier
Which one is the best?
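How do we find the optimal hyperplane?
$x_i\cdot w + b \ge +1$ for $y_i = +1$ (1)
$x_i\cdot w + b \le -1$ for $y_i = -1$ (2)
Multiplying by $y_i$, (1) and (2) combine into a single constraint: $y_i(x_i\cdot w + b) - 1 \ge 0,\ i = 1,\dots,l$.
$w$ is the normal vector of the two parallel hyperplanes $H_1: x\cdot w + b = +1$ and $H_2: x\cdot w + b = -1$.
Margin $= 2/\lVert w\rVert$ (the distance between $H_1$ and $H_2$). ◎ marks a support vector.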
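To make the constraint and the margin formula concrete, here is a minimal pure-Python sketch that takes a candidate pair $(w, b)$, checks $y_i(x_i\cdot w + b) - 1 \ge 0$ for every point, computes $2/\lVert w\rVert$, and lists the points lying exactly on $H_1$ or $H_2$ (where support vectors sit). The function name `margin_report` and the four-point data set are hypothetical illustrations. For this particular toy set, $w = (0.5, 0.5)$, $b = 0$ is the maximum-margin hyperplane: the closest opposite-class pair, $(1,1)$ and $(-1,-1)$, lies exactly on $H_1$ and $H_2$.

```python
import math

def dot(u, v):
    """Inner product x . w of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def margin_report(X, y, w, b, tol=1e-9):
    """For a candidate (w, b), return (feasible, margin, on_boundary):
    feasible    -- True if y_i (x_i . w + b) - 1 >= 0 for every i,
    margin      -- 2 / ||w||, the distance between H1 and H2,
    on_boundary -- indices i with y_i (x_i . w + b) = 1, i.e. x_i on H1 or H2.
    """
    excess = [yi * (dot(xi, w) + b) - 1 for xi, yi in zip(X, y)]
    feasible = all(e >= -tol for e in excess)
    margin = 2 / math.sqrt(dot(w, w))
    on_boundary = [i for i, e in enumerate(excess) if abs(e) <= tol]
    return feasible, margin, on_boundary

# Hypothetical toy data: two positive and two negative points in R^2.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-3.0, -1.0)]
y = [+1, +1, -1, -1]
print(margin_report(X, y, w=(0.5, 0.5), b=0.0))  # feasible, margin ~2.83, on_boundary [0, 2]
```

Shrinking $w$ to $(0.25, 0.25)$ would double the margin formula's value but violate the constraint for the points nearest the boundary, which is why the margin must be maximized subject to the constraints rather than on its own.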
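Finding the optimal hyperplane
The optimal hyperplane is defined by the pair $(w,b)$. Maximizing the margin $2/\lVert w\rVert$ is equivalent to solving the optimization problem
$\min_{w,b}\ \tfrac12\lVert w\rVert^2$ s.t. $y_i(x_i\cdot w + b) - 1 \ge 0,\ i = 1,\dots,l$.
This is a classical quadratic (convex) program: the objective is quadratic and the constraints are linear.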
Lagrange Method
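A sketch of the standard derivation for the separable problem $\min \tfrac12\lVert w\rVert^2$ s.t. $y_i(x_i\cdot w + b) - 1 \ge 0$: introduce one multiplier $\alpha_i \ge 0$ per constraint, set the derivatives of the primal Lagrangian $L_P$ with respect to $w$ and $b$ to zero, and substitute back. The resulting dual depends on the data only through the inner products $x_i\cdot x_j$.

```latex
\begin{aligned}
L_P(w,b,\alpha) &= \tfrac12\lVert w\rVert^2
  - \sum_{i=1}^{l}\alpha_i\left[y_i(x_i\cdot w + b) - 1\right],
  \qquad \alpha_i \ge 0 \\
\frac{\partial L_P}{\partial w} = 0 &\;\Rightarrow\; w = \sum_{i=1}^{l}\alpha_i y_i x_i,
  \qquad
\frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{l}\alpha_i y_i = 0 \\
\max_{\alpha}\ L_D(\alpha) &= \sum_{i=1}^{l}\alpha_i
  - \tfrac12\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j\, x_i\cdot x_j
  \quad\text{s.t.}\quad \alpha_i \ge 0,\ \sum_{i=1}^{l}\alpha_i y_i = 0
\end{aligned}
```

Substituting $w = \sum_i \alpha_i y_i x_i$ makes the $\tfrac12\lVert w\rVert^2$ term and the $\sum_i \alpha_i y_i\, x_i\cdot w$ term differ only by a factor of $-2$, and the $b$ term vanishes because $\sum_i \alpha_i y_i = 0$, which leaves $L_D$.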
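Recovering the optimal hyperplane
Once the $\alpha_i$, $i = 1,\dots,l$, are found, we recover the pair $(w,b)$ of the optimal hyperplane: $w$ is given by $w = \sum_{i=1}^{l}\alpha_i y_i x_i$, and $b = y_k - w\cdot x_k$ for any support vector $x_k$ (any $k$ with $\alpha_k > 0$). The decision function is $f(x) = \operatorname{sign}(w\cdot x + b)$.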
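A minimal pure-Python sketch of this recovery step, assuming the multipliers $\alpha_i$ are already known. The helper names `recover_hyperplane` and `decide` and the toy data are hypothetical. For this four-point set the optimal multipliers work out to $\alpha = (0.25, 0, 0.25, 0)$, which satisfies $\sum_i \alpha_i y_i = 0$. The sketch averages $b = y_k - w\cdot x_k$ over all support vectors instead of using a single one, a common choice for numerical robustness, and treats $\operatorname{sign}(0)$ as $+1$, an arbitrary convention.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def recover_hyperplane(X, y, alpha, sv_tol=1e-8):
    """Return (w, b) from dual multipliers: w = sum_i alpha_i y_i x_i."""
    n = len(X[0])
    w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(n)]
    # Each support vector (alpha_k > 0) satisfies y_k (w . x_k + b) = 1,
    # hence b = y_k - w . x_k; average over all of them.
    sv = [i for i, a in enumerate(alpha) if a > sv_tol]
    b = sum(y[i] - dot(w, X[i]) for i in sv) / len(sv)
    return w, b

def decide(w, b, x):
    """Decision function f(x) = sign(w . x + b), with sign(0) taken as +1."""
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical toy data and its optimal multipliers.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-3.0, -1.0)]
y = [+1, +1, -1, -1]
alpha = [0.25, 0.0, 0.25, 0.0]

w, b = recover_hyperplane(X, y, alpha)
print(w, b)                      # [0.5, 0.5] 0.0
print(decide(w, b, (3.0, 0.0)))  # 1
print(decide(w, b, (0.0, -2.0))) # -1
```

Only the points with $\alpha_i > 0$ contribute to $w$ and $b$, so prediction needs just the support vectors.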
Solving the dual problem
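The slides do not name a solver for the dual $\max_\alpha \sum_i \alpha_i - \tfrac12\sum_{i,j}\alpha_i\alpha_j y_i y_j\, x_i\cdot x_j$ subject to $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$ (with $C = \infty$ in the separable case). One common choice, used here purely as an illustration, is a simplified form of sequential minimal optimization (SMO): repeatedly pick a pair $(\alpha_i, \alpha_j)$ where $\alpha_i$ violates the KKT conditions, move along the line that keeps $\sum_i \alpha_i y_i$ fixed, and clip to the box. This sketch assumes a linear kernel, uses the hypothetical name `smo_dual`, loops over partners $j$ deterministically, and is meant for tiny toy sets, not production use.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def smo_dual(X, y, C=float("inf"), tol=1e-3, max_passes=10, max_sweeps=1000):
    """Simplified SMO for the linear-kernel SVM dual. Returns (alpha, b).

    C = inf gives the hard-margin (separable) problem.
    """
    m = len(X)
    K = [[dot(X[i], X[j]) for j in range(m)] for i in range(m)]
    alpha, b = [0.0] * m, 0.0

    def error(k):
        # E_k = f(x_k) - y_k, with f(x) = sum_i alpha_i y_i x_i . x + b
        return sum(alpha[i] * y[i] * K[i][k] for i in range(m)) + b - y[k]

    passes = sweeps = 0
    while passes < max_passes and sweeps < max_sweeps:
        sweeps += 1
        changed = 0
        for i in range(m):
            Ei = error(i)
            # Only work on alpha_i if it violates the KKT conditions.
            if not ((y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0)):
                continue
            for j in range(m):
                if j == i:
                    continue
                Ej = error(j)
                ai_old, aj_old = alpha[i], alpha[j]
                # Feasible segment [L, H] for alpha_j that keeps sum(alpha_k y_k) fixed.
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i][j] - K[i][i] - K[j][j]
                if L == H or eta >= 0:
                    continue
                aj = min(H, max(L, aj_old - y[j] * (Ei - Ej) / eta))
                if abs(aj - aj_old) < 1e-8:
                    continue
                alpha[j] = aj
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - aj)
                # Re-estimate b so the updated pair satisfies the KKT conditions.
                dai, daj = alpha[i] - ai_old, aj - aj_old
                b1 = b - Ei - y[i] * dai * K[i][i] - y[j] * daj * K[i][j]
                b2 = b - Ej - y[i] * dai * K[i][j] - y[j] * daj * K[j][j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
                break  # move on to the next i with the updated multipliers
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

# Hypothetical toy data (linearly separable).
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-3.0, -1.0)]
y = [+1, +1, -1, -1]
alpha, b = smo_dual(X, y)
w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(2)]
print(alpha, b, w)
```

On this toy set the loop settles on $\alpha \approx (0.25, 0, 0.25, 0)$, so only $(1,1)$ and $(-1,-1)$ end up as support vectors. Passing a finite `C` solves the bounded dual of the non-separable formulation with the same code.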
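The Karush-Kuhn-Tucker conditions
Because the problem is convex, the KKT conditions are necessary and sufficient for $(w, b, \alpha)$ to be a solution. Thus solving the SVM problem is equivalent to finding a solution to the KKT conditions. In particular, the KKT conditions include complementary slackness, $\alpha_i\left[y_i(w\cdot x_i + b) - 1\right] = 0$, from which we can draw the following conclusions:
If $\alpha_i > 0$, then $y_i(w\cdot x_i + b) = 1$, and $x_i$ is a support vector.
If all other training points (those with $\alpha_i = 0$) were removed and training were repeated, the same separating hyperplane would be found.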
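A minimal sketch of checking the KKT conditions numerically for the hard-margin problem: stationarity ($w = \sum_i \alpha_i y_i x_i$, $\sum_i \alpha_i y_i = 0$), primal feasibility, dual feasibility ($\alpha_i \ge 0$) and complementary slackness. The helper name `kkt_violations`, the tolerance and the toy data are hypothetical. The usage lines also illustrate the second conclusion above: after dropping the points with $\alpha_i = 0$, the same $(w, b)$ still satisfies every KKT condition, so it is still the optimal hyperplane for the reduced set.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def kkt_violations(X, y, alpha, w, b, tol=1e-6):
    """Return a list of (index, reason) for each hard-margin KKT condition
    that fails; index is None for the global conditions. Empty list = all hold."""
    problems = []
    # Stationarity: w = sum_i alpha_i y_i x_i and sum_i alpha_i y_i = 0.
    w_alpha = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(len(w))]
    if any(abs(u - v) > tol for u, v in zip(w, w_alpha)):
        problems.append((None, "w != sum alpha_i y_i x_i"))
    if abs(sum(a * yi for a, yi in zip(alpha, y))) > tol:
        problems.append((None, "sum alpha_i y_i != 0"))
    for i, (xi, yi, a) in enumerate(zip(X, y, alpha)):
        g = yi * (dot(w, xi) + b) - 1          # primal constraint value
        if a < -tol:
            problems.append((i, "alpha_i < 0"))
        if g < -tol:
            problems.append((i, "y_i(w.x_i + b) - 1 < 0"))
        if abs(a * g) > tol:                   # complementary slackness
            problems.append((i, "alpha_i * (y_i(w.x_i + b) - 1) != 0"))
    return problems

# Hypothetical toy data with its optimal multipliers and hyperplane.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-3.0, -1.0)]
y = [+1, +1, -1, -1]
alpha = [0.25, 0.0, 0.25, 0.0]
w, b = (0.5, 0.5), 0.0

print(kkt_violations(X, y, alpha, w, b))  # []

# Keep only the support vectors and re-check with the same (w, b).
sv = [i for i, a in enumerate(alpha) if a > 0]
print(kkt_violations([X[i] for i in sv], [y[i] for i in sv], [alpha[i] for i in sv], w, b))  # []
```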
Examples by Pictures
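3. Simplest SVM: Linear classifier for non-separable data
3-1 Finding the optimal hyperplane
Introduce slack variables $\varepsilon_i$ and solve the quadratic program
$\min_{w,b,\varepsilon}\ \tfrac12\lVert w\rVert^2 + C\sum_{i=1}^{l}\varepsilon_i$ s.t. $y_i(x_i\cdot w + b) - 1 + \varepsilon_i \ge 0,\ \varepsilon_i \ge 0,\ i = 1,\dots,l$,
where $C > 0$ is a penalty parameter, typically chosen large so that margin violations are costly. In the dual, the multipliers become bounded: $0 \le \alpha_i \le C,\ i = 1,\dots,l$.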
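For a fixed candidate $(w, b)$, the smallest slacks that satisfy both constraints are $\varepsilon_i = \max(0,\ 1 - y_i(x_i\cdot w + b))$, so the objective can be evaluated directly. The pure-Python sketch below does only that: it scores a given $(w, b)$ and does not optimize it. The function name `soft_margin_objective`, the value $C = 10$ and the toy data (the separable set plus one hypothetical mislabeled point) are illustrative assumptions, and $(w, b)$ here is just a candidate, not the optimum for this data.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def soft_margin_objective(X, y, w, b, C):
    """Return (objective, slacks) for 1/2 ||w||^2 + C * sum(eps_i), where
    eps_i = max(0, 1 - y_i (x_i . w + b)) is the smallest eps_i with
    y_i (x_i . w + b) - 1 + eps_i >= 0 and eps_i >= 0."""
    eps = [max(0.0, 1 - yi * (dot(xi, w) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(eps), eps

# Hypothetical non-separable data: the last point is a negative example
# sitting on the positive side of the hyperplane w = (0.5, 0.5), b = 0.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-3.0, -1.0), (0.5, 0.5)]
y = [+1, +1, -1, -1, -1]

obj, eps = soft_margin_objective(X, y, w=(0.5, 0.5), b=0.0, C=10.0)
print(eps)  # [0.0, 0.0, 0.0, 0.0, 1.5]
print(obj)  # 15.25
```

A point with $0 < \varepsilon_i \le 1$ lies inside the margin but on the correct side; $\varepsilon_i > 1$, as for the last point, means it is misclassified. A larger $C$ makes each unit of slack more expensive relative to widening the margin.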
Lagrange Method
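A sketch of the standard derivation for the non-separable problem: add multipliers $\alpha_i \ge 0$ for the margin constraints and $\mu_i \ge 0$ for the constraints $\varepsilon_i \ge 0$. The conditions $\partial L_P/\partial w = 0$ and $\partial L_P/\partial b = 0$ are the same as in the separable case ($w = \sum_i \alpha_i y_i x_i$, $\sum_i \alpha_i y_i = 0$); the new condition $\partial L_P/\partial \varepsilon_i = 0$ is what bounds each $\alpha_i$ by $C$.

```latex
\begin{aligned}
L_P &= \tfrac12\lVert w\rVert^2 + C\sum_{i=1}^{l}\varepsilon_i
  - \sum_{i=1}^{l}\alpha_i\left[y_i(x_i\cdot w + b) - 1 + \varepsilon_i\right]
  - \sum_{i=1}^{l}\mu_i\varepsilon_i,
  \qquad \alpha_i \ge 0,\ \mu_i \ge 0 \\
\frac{\partial L_P}{\partial \varepsilon_i} = 0 &\;\Rightarrow\; C - \alpha_i - \mu_i = 0
  \;\Rightarrow\; 0 \le \alpha_i \le C \\
\max_{\alpha}\ L_D(\alpha) &= \sum_{i=1}^{l}\alpha_i
  - \tfrac12\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j\, x_i\cdot x_j
  \quad\text{s.t.}\quad 0 \le \alpha_i \le C,\ \sum_{i=1}^{l}\alpha_i y_i = 0
\end{aligned}
```

Because the $\varepsilon_i$ terms cancel once $C - \alpha_i - \mu_i = 0$, the dual objective is identical to the separable case; the only change is the upper bound $C$ on each $\alpha_i$.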
4. Simplest SVM: Conclusion
Finds the optimal hyperplane, which corresponds to the largest margin.
Can be solved easily using a dual formulation.
The solution is sparse: the number of support vectors can be very small compared to the size of the training set.
Only support vectors are important for predicting future points; all other points can be forgotten.