Support Vector Machines Mei-Chen Yeh 04/20/2010
The Classification Problem
Label instances, usually represented by feature vectors, into one of several predefined categories. Example: image classification.
Starting from the simplest setting
Two classes, with linearly separable samples. A separating hyperplane is g(x) = w^T x + w_0 = 0, where w is the weight vector and w_0 the threshold; g(x) > 0 on one side of the plane and g(x) < 0 on the other. How many classifiers could separate the data? Infinitely many!
Formulation
Given training data (x_i, y_i), i = 1, 2, …, N, where x_i is a feature vector and y_i its label, learn a hyperplane that separates all the data; the variables are w and w_0. Testing uses the decision function f(x) = sign(w^T x + w_0), where x is a test sample.
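A minimal sketch of this decision function in Python (the weight values below are made up purely for illustration; nothing here comes from the slides):

```python
import numpy as np

# Hypothetical learned parameters -- illustrative values, not real training output.
w = np.array([2.0, -1.0])   # weight vector
w0 = -0.5                   # threshold

def f(x):
    """Decision function: which side of the hyperplane does x fall on?"""
    return np.sign(w @ x + w0)

print(f(np.array([1.0, 0.5])))   # 2.0*1.0 - 1.0*0.5 - 0.5 = 1.0 -> +1.0
```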
[Figure: three candidate hyperplanes H1, H2, H3 separating Class 1 from Class 2]
Hyperplanes H1, H2, and H3 are all candidate classifiers. Which one is preferred? Why?
[Figure: two separating hyperplanes, one with a small margin and one with a large margin]
Choose the one with the largest margin!
[Figure: separating hyperplane w^T x + w_0 = 0 with the margin hyperplanes on either side]
How is the margin measured? Scale w and w_0 so that the two margin hyperplanes w^T x + w_0 = δ and w^T x + w_0 = -δ satisfy δ = 1; the margin is then 2/||w||.
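The reasoning behind this scaling, spelled out: the distance from a point x to the hyperplane H is |g(x)|/||w||, so after fixing δ = 1 each margin hyperplane sits at distance 1/||w|| from the separator:

```latex
% Distance from a point x to the hyperplane H: w^T x + w_0 = 0,
% and the resulting margin once the canonical scaling delta = 1 is fixed.
\[
d(x, H) = \frac{\lvert w^{\top}x + w_0 \rvert}{\lVert w \rVert},
\qquad
\text{margin} = \frac{1}{\lVert w \rVert} + \frac{1}{\lVert w \rVert}
              = \frac{2}{\lVert w \rVert}.
\]
```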
Formulation
Compute w and w_0 so as to minimize J(w) = (1/2)||w||^2 subject to the side information y_i(w^T x_i + w_0) ≥ 1, i = 1, 2, …, N. Minimizing ||w|| maximizes the margin 2/||w||.
Formulation
The problem is equivalent to the dual optimization task: maximize Σ_i λ_i − (1/2) Σ_i Σ_j λ_i λ_j y_i y_j x_i^T x_j subject to λ_i ≥ 0 (the Lagrange multipliers) and Σ_i λ_i y_i = 0. w can be recovered by w = Σ_i λ_i y_i x_i. Classification rule: assign x to ω_1 (ω_2) if g(x) = w^T x + w_0 > 0 (< 0).
Remarks
Only some of the λ_i are nonzero. The x_i with nonzero λ_i are called support vectors, and the hyperplane is determined by the support vectors alone. The cost function is expressed entirely in inner products, so it does not depend explicitly on the dimensionality of the input space!
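A small sketch of these remarks in code. The slides themselves use OpenCV/LIBSVM; scikit-learn's SVC is swapped in here only because it exposes the support vectors and multipliers compactly (SVC, support_vectors_, and dual_coef_ are sklearn names, not the slides'):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative only).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e3).fit(X, y)

# Only a few lambda_i are nonzero; those x_i are the support vectors.
print(clf.support_vectors_)

# The hyperplane is determined by them alone: w = sum_i lambda_i y_i x_i.
# sklearn stores the products lambda_i * y_i in dual_coef_.
w = clf.dual_coef_ @ clf.support_vectors_
print(w, clf.intercept_)
```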
Non-separable Classes
[Figure: overlapping classes, with samples violating the margin]
Allow training errors! Previous constraint: y_i(w^T x_i + w_0) ≥ 1. Introduce slack variables ξ_i: y_i(w^T x_i + w_0) ≥ 1 − ξ_i, where ξ_i > 1 for misclassified samples, 0 < ξ_i ≤ 1 for samples inside the margin but still correctly classified, and ξ_i = 0 otherwise.
Formulation
Compute w and w_0 so as to minimize J(w, ξ) = (1/2)||w||^2 + C Σ_i ξ_i subject to y_i(w^T x_i + w_0) ≥ 1 − ξ_i and ξ_i ≥ 0, where C is the penalty parameter.
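A hedged sketch of what the penalty parameter does in practice, again with scikit-learn's SVC, whose parameter C plays exactly this role (the data are synthetic):

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian blobs: perfect separation is impossible.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.2, (50, 2)), rng.normal(2.0, 1.2, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # Small C tolerates slack (wide margin, more violations);
    # large C punishes errors hard (narrow margin, risk of overfitting).
    print(C, clf.n_support_, clf.score(X, y))
```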
Formulation
The dual problem is the same as in the separable case, except that the multipliers are now bounded: maximize Σ_i λ_i − (1/2) Σ_i Σ_j λ_i λ_j y_i y_j x_i^T x_j subject to 0 ≤ λ_i ≤ C and Σ_i λ_i y_i = 0.
Non-linear Case
Could the data be linearly separable in some other space? Idea: map the feature vectors into a higher-dimensional space.
Non-linear Case
Example: data that are not linearly separable in the input space become separable after the mapping x ↦ φ(x).
[Figure: input-space data mapped by φ(·) into a space where a linear separator exists]
Problems with an explicit mapping:
– high computational burden in the high-dimensional space
– hard to get a good estimate (many more parameters to fit from the same data)
Kernel Trick
Recall that in the dual problem, w can be recovered by w = Σ_i λ_i y_i φ(x_i), so g(x) = w^T φ(x) + w_0 = Σ_i λ_i y_i φ(x_i)^T φ(x) + w_0. All we need here is the inner product of (transformed) feature vectors!
Kernel Trick
Decision function: f(x) = sign( Σ_i λ_i y_i K(x_i, x) + w_0 ), where the kernel function is K(x_i, x_j) = φ(x_i)^T φ(x_j).
Example kernel
A classic example is K(x, z) = (x^T z)^2 for two-dimensional inputs, which equals φ(x)^T φ(z) with φ(x) = (x_1^2, √2·x_1 x_2, x_2^2). The inner product can be computed directly, without ever going through the mapping φ(·).
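This identity is easy to verify numerically; a self-contained sketch of the degree-2 example above:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D inputs."""
    return np.array([x[0]**2, np.sqrt(2.0) * x[0] * x[1], x[1]**2])

def K(x, z):
    """The same quantity computed directly in the input space."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# Both print 16.0, but K never builds the higher-dimensional vectors.
print(K(x, z), phi(x) @ phi(z))
```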
Remarks
In practice we specify K, thereby specifying φ(·) indirectly, instead of choosing φ(·) itself. Intuitively, K(x, y) represents the similarity between data points x and y. K(x, y) needs to satisfy the Mercer condition for φ(·) to exist.
Examples of Kernel Functions
– Polynomial kernel with degree d: K(x, z) = (x^T z + 1)^d
– Radial basis function (RBF) kernel with width σ: K(x, z) = exp(−||x − z||^2 / (2σ^2))
– Sigmoid with parameters κ and θ: K(x, z) = tanh(κ·x^T z + θ)
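For reference, these kernels correspond directly to options in common SVM libraries; a sketch with scikit-learn's SVC (the kernel names and parameter spellings are sklearn's, and sklearn parameterizes the RBF width through gamma rather than σ):

```python
from sklearn.svm import SVC

# Polynomial kernel with degree d: (gamma * x.z + coef0)^d
poly = SVC(kernel='poly', degree=3, gamma=1.0, coef0=1.0)

# RBF kernel: exp(-gamma * ||x - z||^2); the width sigma enters
# through gamma = 1 / (2 * sigma**2).
rbf = SVC(kernel='rbf', gamma=0.5)

# Sigmoid kernel: tanh(gamma * x.z + coef0)
sig = SVC(kernel='sigmoid', gamma=0.1, coef0=-1.0)
```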
Pros and Cons
Strengths:
– Training is relatively easy.
– It scales relatively well to high-dimensional data.
– The tradeoff between classifier complexity and error can be controlled explicitly.
Weaknesses:
– No practical method exists for selecting the best kernel function.
– It handles binary classification alone.
Combining SVM binary classifiers for the multi-class problem (1)
M-category classification (ω_1, ω_2, …, ω_M). Two popular approaches:
1. One-against-all (ω_i vs. the M−1 others): train M classifiers and choose the class whose classifier gives the largest output. Example: 5 categories, 5 classifiers; winner: ω_1. A sketch follows below.
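A sketch of one-against-all built from binary SVMs (scikit-learn is assumed; train_one_vs_all and predict_one_vs_all are hypothetical helper names for illustration):

```python
import numpy as np
from sklearn.svm import SVC

def train_one_vs_all(X, y, classes):
    """Train one binary SVM per class: omega_i against the M-1 others."""
    return {c: SVC(kernel='linear').fit(X, np.where(y == c, 1, -1))
            for c in classes}

def predict_one_vs_all(models, x):
    """Choose the class whose classifier produces the largest output."""
    scores = {c: m.decision_function(x.reshape(1, -1))[0]
              for c, m in models.items()}
    return max(scores, key=scores.get)
```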
Combining SVM binary classifiers for the multi-class problem (2)
2. Pairwise coupling (ω_i vs. ω_j): train M(M−1)/2 classifiers and aggregate their outputs by voting. Example: 5 categories; the pairwise SVM outputs give the vote tally ω_1: 4, ω_2: 1, ω_3: 3, ω_4: 0, ω_5: 2, so the winner is ω_1. A matching sketch follows below.
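A matching sketch of pairwise coupling with majority voting (again scikit-learn; the helper names are hypothetical):

```python
import numpy as np
from itertools import combinations
from collections import Counter
from sklearn.svm import SVC

def train_pairwise(X, y, classes):
    """Train one binary SVM per (omega_i, omega_j) pair: M(M-1)/2 models."""
    models = {}
    for ci, cj in combinations(classes, 2):
        mask = (y == ci) | (y == cj)
        models[(ci, cj)] = SVC(kernel='linear').fit(X[mask], y[mask])
    return models

def predict_pairwise(models, x):
    """Every pairwise classifier casts one vote; the most-voted class wins."""
    votes = Counter(m.predict(x.reshape(1, -1))[0] for m in models.values())
    return votes.most_common(1)[0][0]
```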
Data normalization
The features may have different ranges. Example: we use weight (w) and height (h) for classifying male and female college students.
– male: avg. weight = 69.80 kg, avg. height = 174.36 cm
– female: avg. weight = 52.86 kg, avg. height = 159.77 cm
Different scales!
Data normalization
Also called “data pre-processing”: equalize the scales among different features, typically to zero mean and unit variance. Two cases in practice: scale to (0, 1) if all feature values are positive, or to (−1, 1) if feature values may be positive or negative.
Data normalization
Let x_ik be feature k of sample i, i = 1, 2, …, N. Mean and variance of feature k: x̄_k = (1/N) Σ_i x_ik and σ_k^2 = (1/(N−1)) Σ_i (x_ik − x̄_k)^2. Normalization: x̂_ik = (x_ik − x̄_k) / σ_k.
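A sketch of these formulas with numpy (scikit-learn's StandardScaler implements the same idea, though it divides by N rather than N−1):

```python
import numpy as np

def normalize(X):
    """Zero-mean, unit-variance scaling, feature by feature (column-wise)."""
    mean = X.mean(axis=0)          # x_bar_k over the N samples
    std = X.std(axis=0, ddof=1)    # sigma_k, dividing by N - 1
    return (X - mean) / std, mean, std

# The mean/std estimated on training data must be reused on test data.
X_train = np.array([[69.80, 174.36], [52.86, 159.77], [60.00, 170.00]])
X_norm, mean, std = normalize(X_train)
print(X_norm.mean(axis=0), X_norm.std(axis=0, ddof=1))   # ~0 and ~1
```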
Assignment #4
Develop an SVM classifier using either OpenCV or LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/). Use “training.txt” to train your classifier, and evaluate its performance on “test.txt”. Write a 1-page report that summarizes how you implemented your classifier and the classification accuracy rate.
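A minimal sketch using LIBSVM's Python interface from the link above (svm_read_problem, svm_train, and svm_predict are LIBSVM's own functions; the import path below assumes the pip packaging, and the flag values are just a starting point to tune):

```python
# LIBSVM's bundled Python wrapper (here via the pip `libsvm` packaging;
# the classic source distribution uses `from svmutil import *` instead).
from libsvm.svmutil import svm_read_problem, svm_train, svm_predict

# Files in LIBSVM's sparse "label index:value ..." format.
y_train, x_train = svm_read_problem('training.txt')
y_test, x_test = svm_read_problem('test.txt')

# -t 2: RBF kernel, -c 1: penalty parameter C -- tune both in practice.
model = svm_train(y_train, x_train, '-t 2 -c 1')

# Prints and returns the classification accuracy for the report.
p_labels, p_acc, p_vals = svm_predict(y_test, x_test, model)
```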
Final project announcement
Please prepare a short (under 5 minutes) presentation on what you’re going to develop for the final project.