Statistical Learning
Dong Liu, Dept. EEIS, USTC
Chapter 3. Support Vector Machine (SVM)
Max-margin linear classification
Soft-margin linear classification
The kernel trick
Efficient algorithm for SVM
2018/11/27 Chap 3. SVM
Linear classification
Many hyperplanes separate the data; which one is optimal?
Classification margin
We want to maximize the margin. Intuitively, such a classifier is the most tolerant to noise; theoretically, it has the best generalization ability.
Geometric margin & Functional margin
For a point (xi, yi), its distance to the decision boundary wT x + b = 0 is |wT xi + b| / ||w||. The geometric margin is yi (wT xi + b) / ||w||, and the functional margin is yi (wT xi + b). Since we can amplify both w and b by a scaling factor without changing the geometric margin, we can set the functional margin of the closest samples to 1: min_i yi (wT xi + b) = 1.
Maximize geometric margin 1/2
The problem is
max_{w,b} 1/||w||  s.t.  yi (wT xi + b) ≥ 1, i = 1, …, N
Equivalent to
min_{w,b} (1/2)||w||^2  s.t.  yi (wT xi + b) ≥ 1, i = 1, …, N
Maximize geometric margin 2/2
Using the Lagrange multipliers αi ≥ 0, the Lagrangian is
L(w, b, α) = (1/2)||w||^2 − Σi αi [yi (wT xi + b) − 1]
According to the KKT conditions,
w = Σi αi yi xi,  Σi αi yi = 0,  αi [yi (wT xi + b) − 1] = 0
so w is determined by the samples that have non-zero αi, which satisfy yi (wT xi + b) = 1. These samples are termed support vectors.
Support vectors
The margin is bounded by the hyperplanes wT x + b = 1 and wT x + b = −1, and its width is 2/||w||. The margin is determined by the samples that have non-zero αi, which lie exactly on these two hyperplanes; these samples are termed support vectors.
Lagrange dual
For a general constrained optimization problem
min_x f(x)  s.t.  gi(x) ≤ 0 (i = 1, …, m),  hj(x) = 0 (j = 1, …, p)
we form the Lagrangian
L(x, λ, ν) = f(x) + Σi λi gi(x) + Σj νj hj(x)
We try to solve the primal form min_x max_{λ≥0,ν} L(x, λ, ν), and its dual problem is
max_{λ≥0,ν} min_x L(x, λ, ν)
Under certain conditions (e.g. convexity together with Slater's condition), the original problem and its dual problem are equivalent.
Lagrange dual of max-margin
Original problem:
min_{w,b} (1/2)||w||^2  s.t.  yi (wT xi + b) ≥ 1
Dual problem:
max_α Σi αi − (1/2) Σi Σj αi αj yi yj xiT xj  s.t.  αi ≥ 0,  Σi αi yi = 0
Plus KKT condition:
αi [yi (wT xi + b) − 1] = 0
Solution of max-margin
Once we have solved the dual problem, we have
w* = Σi αi yi xi,  b* = yk − (w*)T xk for any support vector xk
Summary: for max-margin classification, we can solve the dual problem to find the support vectors, and then determine the best decision boundary.
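As a sanity check, consider a hand-constructed two-point toy dataset (not from the slides). The constraint Σi αi yi = 0 forces α1 = α2 = α, the dual objective reduces to 2α − 4α^2, which is maximized at α = 1/4, and the primal solution then follows from the KKT conditions:

```python
import numpy as np

# Toy linearly separable dataset: two points on opposite sides
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])

# For this dataset the dual has the closed-form solution alpha = (1/4, 1/4)
alpha = np.array([0.25, 0.25])

# Recover the primal solution: w* = sum_i alpha_i y_i x_i
w = (alpha * y) @ X
# b* = y_k - (w*)T x_k for any support vector x_k (here, both points)
b = y[0] - w @ X[0]

print(w, b)             # [0.5 0.5] 0.0
# Both samples sit exactly on the margin hyperplanes wT x + b = ±1
print(y * (X @ w + b))  # [1. 1.]
```

Both points have non-zero αi and functional margin exactly 1, so both are support vectors, consistent with the KKT condition.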
Chapter 3. Support Vector Machine (SVM)
Max-margin linear classification
Soft-margin linear classification
The kernel trick
Efficient algorithm for SVM
Why soft margin 1/4
If the dataset is linearly non-separable, how do we define the margin?
Why soft margin 2/4
We may still define the margin, disregarding the “error” samples.
Why soft margin 3/4
Or, even if the dataset is linearly separable, we may still prefer a larger margin at the cost of a few violations.
Why soft margin 4/4
We may still define the margin but allow some samples to be exceptions.
Soft margin formulation 1/3
We change our objective to
min_{w,b} (1/2)||w||^2 + C Σi I[yi (wT xi + b) < 1]
where the indicator function is I[z] = 1 if z is true and 0 otherwise. Compared to the “hard” margin, samples are now allowed to violate the margin constraint, at a cost controlled by C.
Soft margin formulation 2/3
Since the indicator function is intractable, we replace it with the hinge loss max(0, 1 − yi (wT xi + b)). So the problem becomes
min_{w,b} (1/2)||w||^2 + C Σi max(0, 1 − yi (wT xi + b))
It can be interpreted as minimizing the hinge loss with L2-norm regularization.
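Because the hinge-loss form is an unconstrained objective, it can be minimized directly by subgradient descent. A minimal sketch assuming synthetic two-blob data and hypothetical hyperparameters (C, step size, iteration count are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: two well-separated Gaussian blobs, labels in {-1, +1}
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

# Minimize (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (wT x_i + b))
# by subgradient descent (hypothetical step size / iteration count)
C, lr = 1.0, 0.01
w, b = np.zeros(2), 0.0
for _ in range(500):
    margins = y * (X @ w + b)
    viol = margins < 1                    # samples with non-zero hinge loss
    grad_w = w - C * (y[viol] @ X[viol])  # subgradient w.r.t. w
    grad_b = -C * y[viol].sum()           # subgradient w.r.t. b
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean(np.sign(X @ w + b) == y)
print(acc)  # 1.0 on this separable toy data
```

Only the margin-violating samples contribute to the subgradient, mirroring the fact that the SVM solution depends only on support vectors.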
Soft margin formulation 3/3
Define slack variables ξi ≥ 0. The problem becomes
min_{w,b,ξ} (1/2)||w||^2 + C Σi ξi  s.t.  yi (wT xi + b) ≥ 1 − ξi,  ξi ≥ 0
(Figure: the hyperplanes wT x + b = 0, wT x + b = 1, and wT x + b = −1 in the (x1, x2) plane; ξi measures how far sample i falls on the wrong side of its margin hyperplane.)
Soft margin solution 1/2
Using the Lagrange multipliers αi ≥ 0 and μi ≥ 0, the Lagrangian is
L(w, b, ξ, α, μ) = (1/2)||w||^2 + C Σi ξi − Σi αi [yi (wT xi + b) − 1 + ξi] − Σi μi ξi
The KKT conditions give:
w = Σi αi yi xi,  Σi αi yi = 0,  C − αi − μi = 0
Soft margin solution 2/2
Thus the Lagrange dual problem is
max_α Σi αi − (1/2) Σi Σj αi αj yi yj xiT xj  s.t.  0 ≤ αi ≤ C,  Σi αi yi = 0
Samples are categorized into:
αi = 0: correctly classified, outside the margin
0 < αi < C: exactly on the margin hyperplane (support vectors)
αi = C: inside the margin or misclassified (support vectors)
Support vectors in soft margin SVM
The margin is bounded by the hyperplanes wT x + b = 1 and wT x + b = −1. The decision boundary is determined by the samples that have non-zero αi (0 < αi ≤ C); these samples are termed support vectors.
Chapter 3. Support Vector Machine (SVM)
Max-margin linear classification
Soft-margin linear classification
The kernel trick
Efficient algorithm for SVM
Using basis functions
A non-linear transform φ(x) can map the data into a feature space that allows for an easier (linear) classification.
SVM with basis functions
Consider solving
min_{w,b,ξ} (1/2)||w||^2 + C Σi ξi  s.t.  yi (wT φ(xi) + b) ≥ 1 − ξi,  ξi ≥ 0
The dual problem is
max_α Σi αi − (1/2) Σi Σj αi αj yi yj φ(xi)T φ(xj)  s.t.  0 ≤ αi ≤ C,  Σi αi yi = 0
The solution is
w* = Σi αi yi φ(xi)
From basis function to kernel function
We notice that all the basis functions appear in the form of inner products. We define the inner product of basis functions as
k(x, x') = φ(x)T φ(x')
which is termed a kernel function. The function space spanned by the basis functions is then termed a Reproducing Kernel Hilbert Space (RKHS).
Kernel function: example
For example, we can prove that if x, z ∈ R^2, then k(x, z) = (xT z)^2 is a kernel function, since we can set
φ(x) = (x1^2, √2 x1 x2, x2^2)
so that φ(x)T φ(z) = (xT z)^2. Similarly, we can prove the following kernel functions:
Polynomial kernel: k(x, z) = (xT z + c)^d
RBF kernel: k(x, z) = exp(−||x − z||^2 / (2σ^2))
RBF = Radial-Basis Function
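The identity (xT z)^2 = φ(x)T φ(z) for the explicit feature map above can be verified numerically (a minimal sketch; the function name and random test points are my own):

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(v):
    # Explicit feature map for the kernel k(x, z) = (xT z)^2 in 2-d
    return np.array([v[0]**2, np.sqrt(2.0) * v[0] * v[1], v[1]**2])

x, z = rng.normal(size=2), rng.normal(size=2)
lhs = (x @ z) ** 2       # kernel evaluated directly in the input space
rhs = phi(x) @ phi(z)    # inner product in the 3-d feature space

print(np.isclose(lhs, rhs))  # True
```

Expanding φ(x)T φ(z) = x1^2 z1^2 + 2 x1 x2 z1 z2 + x2^2 z2^2 = (x1 z1 + x2 z2)^2 confirms the identity algebraically as well.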
Kernel function: benefit
For SVM (and similar methods), defining a kernel function is equivalent to designing basis functions; this is termed the kernel trick. Sometimes it is easier to express a model with a kernel function than with basis functions, e.g. the RBF kernel. Sometimes we can prove a function is a kernel function even though it is difficult to write out its corresponding basis functions. If a function satisfies Mercer's condition, it is a kernel function.
Kernelized SVM
The dual problem is
max_α Σi αi − (1/2) Σi Σj αi αj yi yj k(xi, xj)  s.t.  0 ≤ αi ≤ C,  Σi αi yi = 0
Once solved, we have the decision function
f(x) = sign(Σi αi yi k(xi, x) + b)
Kernelized SVM: example
Using the RBF kernel k(x, z) = exp(−||x − z||^2 / (2σ^2)), the decision boundary can be highly non-linear in the input space.
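The RBF kernel can be sketched as a pairwise Gram-matrix computation (a minimal NumPy sketch; the function name and the default σ are my own choices):

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 sigma^2)), evaluated for all row pairs
    sq_dist = (np.sum(X**2, axis=1)[:, None]
               + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T)
    return np.exp(-sq_dist / (2.0 * sigma**2))

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 3))
K = rbf_kernel(X, X)

# A valid Gram matrix is symmetric, and for RBF k(x, x) = 1
print(np.allclose(K, K.T), np.allclose(np.diag(K), 1.0))  # True True
```

The decision function of the kernelized SVM only needs such kernel evaluations between test points and support vectors, never the (possibly infinite-dimensional) feature map itself.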
More about the kernel trick
There are many problems that can be formulated using the kernel trick, i.e. using a kernel function in place of basis functions. The representer theorem states that the solution to
min_f Σi L(yi, f(xi)) + Ω(||f||_H)
where Ω is non-decreasing and H is the RKHS, can be expressed as
f(x) = Σi αi k(xi, x)
Chapter 3. Support Vector Machine (SVM)
Max-margin linear classification
Soft-margin linear classification
The kernel trick
Efficient algorithm for SVM
SMO algorithm
The (dual) problem is
max_α Σi αi − (1/2) Σi Σj αi αj yi yj k(xi, xj)  s.t.  0 ≤ αi ≤ C,  Σi αi yi = 0
For this problem, sequential minimal optimization (SMO) is an efficient algorithm: choose two Lagrange multipliers as variables, optimize over them while keeping the other multipliers unchanged, and iterate.
SMO algorithm: considering two variables
Due to the constraint Σi αi yi = 0 with the other multipliers fixed, we have either α1 + α2 = ζ (if y1 = y2) or α1 − α2 = ζ (if y1 ≠ y2), for some constant ζ. And our objective, restricted to the two chosen variables, is a quadratic function, so each two-variable subproblem has a closed-form solution (clipped to the box [0, C]).
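The two-variable update can be sketched end to end. This is a teaching sketch of the simplified SMO variant (second multiplier chosen at random, linear kernel, synthetic blob data); the function name, tolerances, and iteration caps are my own, and a production solver would use smarter multiplier selection:

```python
import numpy as np

def smo_train(X, y, C=1.0, tol=1e-4, max_passes=10, seed=0):
    # Simplified SMO with a linear kernel: pick a KKT-violating alpha_i,
    # pick a random alpha_j != alpha_i, solve the 2-variable subproblem in
    # closed form, clip to the box [0, C], and repeat until no change.
    rng = np.random.default_rng(seed)
    n = len(y)
    K = X @ X.T                      # linear-kernel Gram matrix
    alpha, b = np.zeros(n), 0.0
    passes, iters = 0, 0
    while passes < max_passes and iters < 500:
        iters += 1
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] + b - y[i]
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = int(rng.integers(n - 1))
                j = j + 1 if j >= i else j           # random j != i
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai, aj = alpha[i], alpha[j]
                # Box ends from 0 <= alpha <= C and the equality constraint
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]  # curvature along the line
                if L == H or eta >= 0:
                    continue
                # Closed-form update of alpha_j, clipped to [L, H]
                alpha[j] = np.clip(aj - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj) < 1e-5:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                # Update the bias using a multiplier strictly inside (0, C)
                b1 = b - Ei - y[i] * (alpha[i] - ai) * K[i, i] - y[j] * (alpha[j] - aj) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai) * K[i, j] - y[j] * (alpha[j] - aj) * K[j, j]
                b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X              # recover w for the linear kernel
    return w, b

# Two well-separated blobs, labels in {-1, +1}
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])
w, b = smo_train(X, y)
print(np.mean(np.sign(X @ w + b) == y))
```

Note how the equality constraint appears only through the box ends L and H: once αj is chosen, αi is fully determined, which is exactly why two is the smallest number of multipliers SMO can update at a time.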
Chapter summary
Dictionary: margin (geometric ~, soft ~), hinge loss, kernel function, kernel trick, Mercer's condition, representer theorem, RKHS, slack variable, support vector
Toolbox: Lagrange dual, sequential minimal optimization (SMO)