Support Vector Machines Mei-Chen Yeh 04/20/2010
The Classification Problem
Label instances, usually represented by feature vectors, into one of several predefined categories. Example: image classification.
Starting from the simplest setting
Two classes, with linearly separable samples. A separating hyperplane is g(x) = w^T x + w_0 = 0, where w is the weight vector and w_0 the threshold; g(x) > 0 on one side of the plane and g(x) < 0 on the other. How many classifiers could separate the data? Infinitely many!
Formulation
Given training data (x_i, y_i), i = 1, 2, …, N, where x_i is a feature vector and y_i its label, learn a hyperplane that separates all the data; the variables are w and w_0. Testing uses the decision function f(x) = sign(w^T x + w_0), where x is a test sample.
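A minimal sketch of this decision function in Python (the weight values below are made up purely for illustration; nothing here comes from the slides):

```python
import numpy as np

# Hypothetical learned parameters -- illustrative values, not real training output.
w = np.array([2.0, -1.0])   # weight vector
w0 = -0.5                   # threshold

def f(x):
    """Decision function: which side of the hyperplane does x fall on?"""
    return np.sign(w @ x + w0)

print(f(np.array([1.0, 0.5])))   # 2.0*1.0 - 1.0*0.5 - 0.5 = 1.0 -> +1.0
```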
[Figure: three candidate hyperplanes H1, H2, H3 separating Class 1 from Class 2]
Hyperplanes H1, H2, and H3 are all candidate classifiers. Which one is preferred? Why?
[Figure: two separating hyperplanes, one with a small margin and one with a large margin]
Choose the one with the largest margin!
[Figure: separating hyperplane w^T x + w_0 = 0 with the margin hyperplanes on either side]
How is the margin measured? Scale w and w_0 so that the two margin hyperplanes w^T x + w_0 = δ and w^T x + w_0 = -δ satisfy δ = 1; the margin is then 2/||w||.
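The reasoning behind this scaling, spelled out: the distance from a point x to the hyperplane H is |g(x)|/||w||, so after fixing δ = 1 each margin hyperplane sits at distance 1/||w|| from the separator:

```latex
% Distance from a point x to the hyperplane H: w^T x + w_0 = 0,
% and the resulting margin once the canonical scaling delta = 1 is fixed.
\[
d(x, H) = \frac{\lvert w^{\top}x + w_0 \rvert}{\lVert w \rVert},
\qquad
\text{margin} = \frac{1}{\lVert w \rVert} + \frac{1}{\lVert w \rVert}
              = \frac{2}{\lVert w \rVert}.
\]
```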
Formulation
Compute w and w_0 so as to minimize J(w) = (1/2)||w||^2 subject to the side information y_i(w^T x_i + w_0) ≥ 1, i = 1, 2, …, N. Minimizing ||w|| maximizes the margin 2/||w||.
Formulation
The problem is equivalent to the dual optimization task: maximize Σ_i λ_i − (1/2) Σ_i Σ_j λ_i λ_j y_i y_j x_i^T x_j subject to λ_i ≥ 0 (the Lagrange multipliers) and Σ_i λ_i y_i = 0. w can be recovered by w = Σ_i λ_i y_i x_i. Classification rule: assign x to ω_1 (ω_2) if g(x) = w^T x + w_0 > 0 (< 0).
Remarks
Only some of the λ_i are nonzero. The x_i with nonzero λ_i are called support vectors, and the hyperplane is determined by the support vectors alone. The cost function is expressed entirely in inner products, so it does not depend explicitly on the dimensionality of the input space!
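A small sketch of these remarks in code. The slides themselves use OpenCV/LIBSVM; scikit-learn's SVC is swapped in here only because it exposes the support vectors and multipliers compactly (SVC, support_vectors_, and dual_coef_ are sklearn names, not the slides'):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative only).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e3).fit(X, y)

# Only a few lambda_i are nonzero; those x_i are the support vectors.
print(clf.support_vectors_)

# The hyperplane is determined by them alone: w = sum_i lambda_i y_i x_i.
# sklearn stores the products lambda_i * y_i in dual_coef_.
w = clf.dual_coef_ @ clf.support_vectors_
print(w, clf.intercept_)
```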
Non-separable Classes
[Figure: overlapping classes, with samples violating the margin]
Allow training errors! Previous constraint: y_i(w^T x_i + w_0) ≥ 1. Introduce slack variables ξ_i: y_i(w^T x_i + w_0) ≥ 1 − ξ_i, where ξ_i > 1 for misclassified samples, 0 < ξ_i ≤ 1 for samples inside the margin but still correctly classified, and ξ_i = 0 otherwise.
Formulation
Compute w and w_0 so as to minimize J(w, ξ) = (1/2)||w||^2 + C Σ_i ξ_i subject to y_i(w^T x_i + w_0) ≥ 1 − ξ_i and ξ_i ≥ 0, where C is the penalty parameter.
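A hedged sketch of what the penalty parameter does in practice, again with scikit-learn's SVC, whose parameter C plays exactly this role (the data are synthetic):

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian blobs: perfect separation is impossible.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.2, (50, 2)), rng.normal(2.0, 1.2, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # Small C tolerates slack (wide margin, more violations);
    # large C punishes errors hard (narrow margin, risk of overfitting).
    print(C, clf.n_support_, clf.score(X, y))
```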
Formulation
The dual problem is the same as in the separable case, except that the multipliers are now bounded: maximize Σ_i λ_i − (1/2) Σ_i Σ_j λ_i λ_j y_i y_j x_i^T x_j subject to 0 ≤ λ_i ≤ C and Σ_i λ_i y_i = 0.
Non-linear Case
Could the data be linearly separable in some other space? Idea: map the feature vectors into a higher-dimensional space.
Non-linear Case
Example: data that are not linearly separable in the input space become separable after the mapping x ↦ φ(x).
[Figure: input-space data mapped by φ(·) into a space where a linear separator exists]
Problems with an explicit mapping:
– high computational burden in the high-dimensional space
– hard to get a good estimate (many more parameters to fit from the same data)
Kernel Trick
Recall that in the dual problem, w can be recovered by w = Σ_i λ_i y_i φ(x_i), so g(x) = w^T φ(x) + w_0 = Σ_i λ_i y_i φ(x_i)^T φ(x) + w_0. All we need here is the inner product of (transformed) feature vectors!
Kernel Trick
Decision function: f(x) = sign( Σ_i λ_i y_i K(x_i, x) + w_0 ), where the kernel function is K(x_i, x_j) = φ(x_i)^T φ(x_j).
Example kernel
A classic example is K(x, z) = (x^T z)^2 for two-dimensional inputs, which equals φ(x)^T φ(z) with φ(x) = (x_1^2, √2·x_1 x_2, x_2^2). The inner product can be computed directly, without ever going through the mapping φ(·).
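This identity is easy to verify numerically; a self-contained sketch of the degree-2 example above:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D inputs."""
    return np.array([x[0]**2, np.sqrt(2.0) * x[0] * x[1], x[1]**2])

def K(x, z):
    """The same quantity computed directly in the input space."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# Both print 16.0, but K never builds the higher-dimensional vectors.
print(K(x, z), phi(x) @ phi(z))
```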
Remarks
In practice we specify K, thereby specifying φ(·) indirectly, instead of choosing φ(·) itself. Intuitively, K(x, y) represents the similarity between data points x and y. K(x, y) needs to satisfy the Mercer condition for φ(·) to exist.
Examples of Kernel Functions
– Polynomial kernel with degree d: K(x, z) = (x^T z + 1)^d
– Radial basis function (RBF) kernel with width σ: K(x, z) = exp(−||x − z||^2 / (2σ^2))
– Sigmoid with parameters κ and θ: K(x, z) = tanh(κ·x^T z + θ)
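For reference, these kernels correspond directly to options in common SVM libraries; a sketch with scikit-learn's SVC (the kernel names and parameter spellings are sklearn's, and sklearn parameterizes the RBF width through gamma rather than σ):

```python
from sklearn.svm import SVC

# Polynomial kernel with degree d: (gamma * x.z + coef0)^d
poly = SVC(kernel='poly', degree=3, gamma=1.0, coef0=1.0)

# RBF kernel: exp(-gamma * ||x - z||^2); the width sigma enters
# through gamma = 1 / (2 * sigma**2).
rbf = SVC(kernel='rbf', gamma=0.5)

# Sigmoid kernel: tanh(gamma * x.z + coef0)
sig = SVC(kernel='sigmoid', gamma=0.1, coef0=-1.0)
```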
Pros and Cons
Strengths:
– Training is relatively easy.
– It scales relatively well to high-dimensional data.
– The tradeoff between classifier complexity and error can be controlled explicitly.
Weaknesses:
– No practical method exists for selecting the best kernel function.
– It handles binary classification alone.
Combining SVM binary classifiers for the multi-class problem (1)
M-category classification (ω_1, ω_2, …, ω_M). Two popular approaches:
1. One-against-all (ω_i vs. the M−1 others): train M classifiers and choose the class whose classifier gives the largest output. Example: 5 categories, 5 classifiers; winner: ω_1. A sketch follows below.
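A sketch of one-against-all built from binary SVMs (scikit-learn is assumed; train_one_vs_all and predict_one_vs_all are hypothetical helper names for illustration):

```python
import numpy as np
from sklearn.svm import SVC

def train_one_vs_all(X, y, classes):
    """Train one binary SVM per class: omega_i against the M-1 others."""
    return {c: SVC(kernel='linear').fit(X, np.where(y == c, 1, -1))
            for c in classes}

def predict_one_vs_all(models, x):
    """Choose the class whose classifier produces the largest output."""
    scores = {c: m.decision_function(x.reshape(1, -1))[0]
              for c, m in models.items()}
    return max(scores, key=scores.get)
```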
Combining SVM binary classifiers for the multi-class problem (2)
2. Pairwise coupling (ω_i vs. ω_j): train M(M−1)/2 classifiers and aggregate their outputs by voting. Example: 5 categories; the pairwise SVM outputs give the vote tally ω_1: 4, ω_2: 1, ω_3: 3, ω_4: 0, ω_5: 2, so the winner is ω_1. A matching sketch follows below.
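A matching sketch of pairwise coupling with majority voting (again scikit-learn; the helper names are hypothetical):

```python
import numpy as np
from itertools import combinations
from collections import Counter
from sklearn.svm import SVC

def train_pairwise(X, y, classes):
    """Train one binary SVM per (omega_i, omega_j) pair: M(M-1)/2 models."""
    models = {}
    for ci, cj in combinations(classes, 2):
        mask = (y == ci) | (y == cj)
        models[(ci, cj)] = SVC(kernel='linear').fit(X[mask], y[mask])
    return models

def predict_pairwise(models, x):
    """Every pairwise classifier casts one vote; the most-voted class wins."""
    votes = Counter(m.predict(x.reshape(1, -1))[0] for m in models.values())
    return votes.most_common(1)[0][0]
```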
Data normalization
The features may have different ranges. Example: we use weight (w) and height (h) for classifying male and female college students.
– male: avg. weight = 69.80 kg, avg. height = 174.36 cm
– female: avg. weight = 52.86 kg, avg. height = 159.77 cm
Different scales!
Data normalization
Also called “data pre-processing”: equalize the scales among different features, typically to zero mean and unit variance. Two cases in practice: scale to (0, 1) if all feature values are positive, or to (−1, 1) if feature values may be positive or negative.
Data normalization
Let x_ik be feature k of sample i, i = 1, 2, …, N. Mean and variance of feature k: x̄_k = (1/N) Σ_i x_ik and σ_k^2 = (1/(N−1)) Σ_i (x_ik − x̄_k)^2. Normalization: x̂_ik = (x_ik − x̄_k) / σ_k.
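A sketch of these formulas with numpy (scikit-learn's StandardScaler implements the same idea, though it divides by N rather than N−1):

```python
import numpy as np

def normalize(X):
    """Zero-mean, unit-variance scaling, feature by feature (column-wise)."""
    mean = X.mean(axis=0)          # x_bar_k over the N samples
    std = X.std(axis=0, ddof=1)    # sigma_k, dividing by N - 1
    return (X - mean) / std, mean, std

# The mean/std estimated on training data must be reused on test data.
X_train = np.array([[69.80, 174.36], [52.86, 159.77], [60.00, 170.00]])
X_norm, mean, std = normalize(X_train)
print(X_norm.mean(axis=0), X_norm.std(axis=0, ddof=1))   # ~0 and ~1
```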
Assignment #4
Develop an SVM classifier using either OpenCV or LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/). Use “training.txt” to train your classifier, and evaluate its performance on “test.txt”. Write a 1-page report that summarizes how you implemented your classifier and the classification accuracy rate.
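A minimal sketch using LIBSVM's Python interface from the link above (svm_read_problem, svm_train, and svm_predict are LIBSVM's own functions; the import path below assumes the pip packaging, and the flag values are just a starting point to tune):

```python
# LIBSVM's bundled Python wrapper (here via the pip `libsvm` packaging;
# the classic source distribution uses `from svmutil import *` instead).
from libsvm.svmutil import svm_read_problem, svm_train, svm_predict

# Files in LIBSVM's sparse "label index:value ..." format.
y_train, x_train = svm_read_problem('training.txt')
y_test, x_test = svm_read_problem('test.txt')

# -t 2: RBF kernel, -c 1: penalty parameter C -- tune both in practice.
model = svm_train(y_train, x_train, '-t 2 -c 1')

# Prints and returns the classification accuracy for the report.
p_labels, p_acc, p_vals = svm_predict(y_test, x_test, model)
```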
Final project announcement
Please prepare a short (under 5 minutes) presentation on what you’re going to develop for the final project.