Classification IV
Lecturer: Dr. Bo Yuan
E-mail: yuanb@sz.tsinghua.edu.cn
Overview: Support Vector Machines
Linear Classifier [figure: a separating hyperplane with normal vector w; the boundary is w · x + b = 0, points on one side satisfy w · x + b > 0 and points on the other side satisfy w · x + b < 0]
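For reference, the decision rule this picture corresponds to (the standard form, not spelled out in the slide text):

\[
f(x) = \operatorname{sign}(w \cdot x + b),
\]

so the sign of w · x + b tells us on which side of the hyperplane a point lies.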
Distance to Hyperplane [figure: a point x and its projection x' onto the hyperplane]
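The quantity this slide works out is the standard point-to-hyperplane distance; since the slide's derivation is not reproduced here, this is a reconstruction: if x' is the projection of x onto the hyperplane w · x + b = 0, then

\[
d(x) = \|x - x'\| = \frac{|w \cdot x + b|}{\|w\|}.
\]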
Selection of Classifiers: Which classifier is the best? All have the same training error. How about generalization?
Unknown Samples [figure: classifiers A and B applied to new points]: Classifier B divides the space more consistently (unbiased).
Margins [figure: the maximum-margin boundary and its support vectors]
Margins
- The margin of a linear classifier is the width by which the boundary could be increased before hitting a data point.
- Intuitively, it is safer to choose a classifier with a larger margin: a wider buffer zone for mistakes.
- The hyperplane is decided by only a few data points, the support vectors; the others can be discarded.
- Select the classifier with the maximum margin: the linear Support Vector Machine (LSVM). It works very well in practice.
- How do we specify the margin formally?
Margins [figure: the “Predict Class = +1” zone and the “Predict Class = -1” zone, separated by the hyperplane w · x + b = 0; the parallel planes w · x + b = 1 and w · x + b = -1 pass through the closest positive point x+ and closest negative point x-; M is the margin width]
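Reading the margin width off this figure (a standard derivation, assumed to match the slide): x+ and x- lie on the two margin planes, so

\[
w \cdot x^{+} + b = 1,\qquad w \cdot x^{-} + b = -1
\;\Rightarrow\;
M = \frac{w}{\|w\|}\cdot\left(x^{+} - x^{-}\right) = \frac{2}{\|w\|}.
\]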
Objective Function
- Correctly classify all data points.
- Maximize the margin, which leads to a quadratic optimization problem: minimize a quadratic objective subject to the classification constraints (written out below).
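The optimization problem referred to above, in its standard hard-margin form (a reconstruction, since the slide's equations are not reproduced here):

\[
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^{2}
\quad\text{subject to}\quad
y_i\,(w \cdot x_i + b) \ge 1,\qquad i = 1,\dots,n.
\]

Maximizing the margin M = 2/||w|| is equivalent to minimizing ||w||^2, and each constraint says that point i is classified correctly and lies on or outside its margin plane.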
Lagrange Multipliers: attaching a multiplier to each constraint turns the primal problem into its dual problem, which is a quadratic problem again (see below).
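The dual problem alluded to above, in its standard form (assumed to match the slide): introduce a multiplier α_i ≥ 0 for each constraint and maximize

\[
\sum_{i=1}^{n} \alpha_i \;-\; \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j)
\quad\text{subject to}\quad
\alpha_i \ge 0,\quad \sum_{i=1}^{n} \alpha_i y_i = 0.
\]

The training data enter only through the inner products x_i · x_j, which is what later makes the kernel trick possible.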
Solutions of w & b: both the solution and the resulting classifier can be expressed entirely through inner products between data points (see below).
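The standard expressions behind this slide (a reconstruction):

\[
w = \sum_{i} \alpha_i\, y_i\, x_i,
\qquad
b = y_k - w \cdot x_k \ \text{ for any support vector } x_k,
\qquad
f(x) = \operatorname{sign}\!\Big(\sum_{i} \alpha_i\, y_i\, (x_i \cdot x) + b\Big).
\]

Only the support vectors have α_i > 0, so the sums effectively run over just a few points.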
An Example: two training points in the (x1, x2) plane, (1, 1) with label +1 and (0, 0) with label -1.
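A worked solution for this two-point problem (the slide's working is not reproduced, so this is a reconstruction): both points must be support vectors, so w · (1, 1) + b = 1 and w · (0, 0) + b = -1. By symmetry w = (a, a), which gives 2a + b = 1 and b = -1, hence a = 1:

\[
w = (1, 1),\qquad b = -1,\qquad \text{boundary: } x_1 + x_2 = 1,\qquad M = \frac{2}{\|w\|} = \sqrt{2}.
\]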
Soft Margin [figure: the planes w · x + b = 1, 0, -1, with slack variables attached to the points that fall inside the margin or on the wrong side]
Soft Margin (continued)
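The soft-margin formulation this slide presumably states, in its standard form: introduce a slack variable ξ_i per point and a penalty parameter C, and solve

\[
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0.
\]

A large C penalizes margin violations heavily (close to the hard margin); a small C tolerates more violations in exchange for a wider margin.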
Non-linear SVMs [figure: 1-D data on the x axis that is not linearly separable becomes separable after mapping each point x to (x, x^2)]
Feature Space [figure: the mapping Φ: x → φ(x) takes the input coordinates (x1, x2) to the feature coordinates (x1^2, x2^2)]
Feature Space [figure: under Φ: x → φ(x), a boundary that is nonlinear in the original (x1, x2) space becomes linear in the feature space]
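As an illustration of why such a map helps (based on the standard example that these figures appear to show): with φ(x1, x2) = (x1^2, x2^2), a circular boundary in the input space becomes a linear boundary in the feature space:

\[
x_1^{2} + x_2^{2} = r^{2}
\;\Longleftrightarrow\;
z_1 + z_2 = r^{2},
\qquad (z_1, z_2) = (x_1^{2}, x_2^{2}).
\]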
Quadratic Basis Functions: the feature vector consists of a constant term, the linear terms, the pure quadratic terms, and the quadratic cross-terms; the number of terms grows quadratically with the input dimension (see the reconstruction below).
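A reconstruction of the basis the slide lists (the sqrt(2) factors are the conventional scaling that makes the kernel identity two slides later come out cleanly); for an m-dimensional input x:

\[
\Phi(x) = \Big(1,\ \sqrt{2}\,x_1,\dots,\sqrt{2}\,x_m,\ x_1^{2},\dots,x_m^{2},\ \sqrt{2}\,x_1 x_2,\ \sqrt{2}\,x_1 x_3,\dots,\sqrt{2}\,x_{m-1} x_m\Big),
\]

i.e. one constant term, m linear terms, m pure quadratic terms and m(m-1)/2 cross-terms, which is (m+1)(m+2)/2 terms in total, roughly m^2/2.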
Calculation of Φ(x_i) · Φ(x_j)
It turns out …
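The punch line the last two slides build up to (standard result, assumed to match the slides): with the quadratic basis above,

\[
\Phi(a) \cdot \Phi(b) = 1 + 2\,(a \cdot b) + (a \cdot b)^{2} = (1 + a \cdot b)^{2},
\]

so the O(m^2)-dimensional dot product can be obtained from the ordinary m-dimensional dot product a · b alone.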
Kernel Trick
- The linear classifier relies on dot products between vectors: x_i · x_j.
- If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes φ(x_i) · φ(x_j).
- A kernel function is a function that corresponds to an inner product in some expanded feature space: K(x_i, x_j) = φ(x_i) · φ(x_j).
- Example: x = [x1, x2]; K(x_i, x_j) = (1 + x_i · x_j)^2.
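A small Python sketch (not from the slides) that checks the example kernel numerically: it compares K(x_i, x_j) = (1 + x_i · x_j)^2 against the explicit dot product of the quadratic features, using the sqrt(2)-scaled basis assumed above.

import numpy as np

def phi(x):
    # Quadratic feature map for a 2-D input, with the sqrt(2) scaling assumed above.
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2.0) * x1, np.sqrt(2.0) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2.0) * x1 * x2])

def K(a, b):
    # Polynomial kernel of degree 2: (1 + a.b)^2.
    return (1.0 + np.dot(a, b)) ** 2

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

print(K(xi, xj))                 # 4.0, computed in the 2-D input space
print(np.dot(phi(xi), phi(xj)))  # 4.0, computed via the explicit 6-D feature map

Both numbers agree, but the kernel never forms the feature vectors.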
Kernels: commonly used kernel functions (see the list below).
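The slide's list is not reproduced here; the kernels most commonly used in this setting (and asked about in the review questions below) are, as an assumed reconstruction:

\[
\begin{aligned}
\text{Linear:}\quad & K(x_i, x_j) = x_i \cdot x_j\\
\text{Polynomial:}\quad & K(x_i, x_j) = (1 + x_i \cdot x_j)^{d}\\
\text{Gaussian (RBF):}\quad & K(x_i, x_j) = \exp\!\big(-\|x_i - x_j\|^{2} / (2\sigma^{2})\big)\\
\text{Sigmoid:}\quad & K(x_i, x_j) = \tanh(\kappa\, x_i \cdot x_j + c)
\end{aligned}
\]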
String Kernel: kernels can also be defined for non-vector data such as text strings, e.g. the similarity between “car” and “custard”, which contains the letters c, a, r as a subsequence.
Solutions of w & b with a kernel (see below).
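The kernelized solution, mirroring the linear case earlier (a standard reconstruction):

\[
b = y_k - \sum_{i} \alpha_i\, y_i\, K(x_i, x_k)\ \text{ for any support vector } x_k,
\qquad
f(x) = \operatorname{sign}\!\Big(\sum_{i} \alpha_i\, y_i\, K(x_i, x) + b\Big).
\]

Here w lives in the feature space and is never formed explicitly; everything is expressed through K.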
Decision Boundaries [figure: the nonlinear decision boundaries obtained in the original input space]
More Maths …
SVM Roadmap
- Linear classifier + maximum margin → linear SVM.
- Noise → soft margin.
- Nonlinear problem → map the data: a · b → Φ(a) · Φ(b).
- High computational cost → kernel trick: K(a, b) = Φ(a) · Φ(b).
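To tie the roadmap together, a short illustrative sketch (not part of the lecture; it uses scikit-learn's SVC, which wraps LIBSVM, the toolbox referenced later in these slides): a soft-margin SVM with an RBF kernel on a toy nonlinear dataset.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy nonlinear problem: two concentric circles, not linearly separable.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Soft-margin SVM (penalty C) with an RBF kernel, i.e. the kernel trick.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("number of support vectors:", clf.support_vectors_.shape[0])
print("test accuracy:", clf.score(X_test, y_test))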
Reading Materials
- Text book: Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
- Online resources:
  http://www.kernel-machines.org/
  http://www.support-vector-machines.org/
  http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
  http://www.csie.ntu.edu.tw/~cjlin/libsvm/
- A list of papers uploaded to the web learning portal
- Wikipedia & Google
Review
- What is the definition of the margin of a linear classifier?
- Why do we want to maximize the margin?
- What is the mathematical expression for the margin?
- How do we solve the objective function of the SVM?
- What are support vectors?
- What is a soft margin?
- How does the SVM solve nonlinear problems?
- What is the so-called “kernel trick”?
- What are the commonly used kernels?
Next Week's Class Talk
- Volunteers are required for next week's class talk.
- Topic: SVM in Practice
- Hints: applications, demos, multi-class problems, software (a very popular toolbox: Libsvm), or any other interesting topics beyond this lecture.
- Length: 20 minutes plus question time