Geometrical intuition behind the dual problem
Based on: K. P. Bennett and E. J. Bredensteiner, "Duality and Geometry in SVM Classifiers", Proceedings of the International Conference on Machine Learning, 2000.
[Figure: convex hulls of two point sets A and B in the $(x_1, x_2)$ plane.]
The convex hulls of the two point sets A and B are defined by all possible points $P_A$ and $P_B$ of the form
$$P_A = \sum_{i\in A}\lambda_i \mathbf{x}_i, \quad \sum_{i\in A}\lambda_i = 1,\ \lambda_i \ge 0\ (i\in A); \qquad P_B = \sum_{j\in B}\lambda_j \mathbf{x}_j, \quad \sum_{j\in B}\lambda_j = 1,\ \lambda_j \ge 0\ (j\in B).$$
For the plane separating A and B, we choose the bisector of the segment joining the specific $P_A$ and $P_B$ that are at minimal distance, i.e. that minimize $\lVert P_A - P_B \rVert^2$. [Figure: in the illustrated example, a single coefficient $\lambda_i$ is non-zero for $P_A$, while two coefficients are non-zero for $P_B$.]
Assigning labels $y_i = +1$ to points in A and $y_j = -1$ to points in B, the minimal squared distance becomes
$$\min \lVert P_A - P_B \rVert^2 = \min_{\lambda} \Big\lVert \sum_{i\in A}\lambda_i \mathbf{x}_i - \sum_{j\in B}\lambda_j \mathbf{x}_j \Big\rVert^2 = \min_{\lambda} \Big\lVert \sum_{i\in A}\lambda_i y_i \mathbf{x}_i + \sum_{j\in B}\lambda_j y_j \mathbf{x}_j \Big\rVert^2 = \min_{\lambda} \sum_{i,j} \lambda_i \lambda_j y_i y_j\,(\mathbf{x}_i \cdot \mathbf{x}_j),$$
subject to $\sum_{i\in A}\lambda_i = \sum_{j\in B}\lambda_j = 1$ and $\lambda \ge 0$.
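This nearest-points problem is a small quadratic program. Below is a minimal sketch (the toy points for A and B are made up for illustration, and SciPy's generic SLSQP solver stands in for a dedicated QP solver):

```python
# Closest points between the convex hulls of A and B:
#   min_λ ||P_A - P_B||²,  P_A = Σ_{i∈A} λ_i x_i,  P_B = Σ_{j∈B} λ_j x_j,
#   Σ_{i∈A} λ_i = 1,  Σ_{j∈B} λ_j = 1,  λ ≥ 0.
import numpy as np
from scipy.optimize import minimize

A = np.array([[0.0, 0.0], [1.0, 0.5], [0.2, 1.0]])   # toy points in class A
B = np.array([[3.0, 2.0], [4.0, 3.0], [3.5, 1.5]])   # toy points in class B
nA, nB = len(A), len(B)

def objective(lam):
    pA = lam[:nA] @ A          # convex combination of A-points
    pB = lam[nA:] @ B          # convex combination of B-points
    return np.sum((pA - pB) ** 2)

constraints = [
    {"type": "eq", "fun": lambda lam: np.sum(lam[:nA]) - 1.0},  # Σ λ_i = 1 over A
    {"type": "eq", "fun": lambda lam: np.sum(lam[nA:]) - 1.0},  # Σ λ_j = 1 over B
]
bounds = [(0.0, None)] * (nA + nB)                              # λ ≥ 0
lam0 = np.concatenate([np.full(nA, 1 / nA), np.full(nB, 1 / nB)])

res = minimize(objective, lam0, bounds=bounds,
               constraints=constraints, method="SLSQP")
pA, pB = res.x[:nA] @ A, res.x[nA:] @ B
print("closest points:", pA, pB, "distance:", np.linalg.norm(pA - pB))
```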
Going from the Primal to the Dual
Constrained optimization problem (the primal):
$$\min_{\mathbf{w},b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 \qquad \text{s.t.}\ \ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \quad i=1,\dots,m,$$
where $y_i$ is the label and $\mathbf{x}_i$ the input. This is a convex optimization problem (convex objective, linear inequality constraints), with a unique solution if the data points are linearly separable.
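As a sketch of what solving the primal directly looks like, here is the same generic SLSQP approach on a toy 2-D dataset (the data and solver choice are illustrative assumptions, not part of the original slides):

```python
# Primal hard-margin SVM:  min_{w,b} (1/2)||w||²  s.t.  y_i (w·x_i + b) >= 1.
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

def objective(wb):                      # wb = [w_1, w_2, b]
    return 0.5 * np.dot(wb[:2], wb[:2])

constraints = [{"type": "ineq",         # y_i (w·x_i + b) - 1 >= 0
                "fun": lambda wb, i=i: y[i] * (X[i] @ wb[:2] + wb[2]) - 1.0}
               for i in range(len(X))]

res = minimize(objective, np.zeros(3), constraints=constraints, method="SLSQP")
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b)               # margin boundaries: w·x + b = ±1
```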
Lagrangian:
$$L(\mathbf{w},b,\lambda) = \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 - \sum_{i=1}^{m} \lambda_i \big[ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \big]$$
KKT conditions:
$$\nabla_{\mathbf{w}} L(\mathbf{w},b,\lambda) = \mathbf{w} - \sum_{i=1}^{m} \lambda_i y_i \mathbf{x}_i = 0, \qquad \nabla_{b} L(\mathbf{w},b,\lambda) = -\sum_{i=1}^{m} \lambda_i y_i = 0.$$
Together with complementary slackness, the KKT conditions give:
$$\mathbf{w} = \sum_{i=1}^{m} \lambda_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{m} \lambda_i y_i = 0, \qquad \lambda_i \big[ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \big] = 0, \quad i=1,\dots,m.$$
Substituting $\mathbf{w} = \sum_{i=1}^{m} \lambda_i y_i \mathbf{x}_i$ and $\sum_{i=1}^{m} \lambda_i y_i = 0$ back into the Lagrangian:
$$L = \tfrac{1}{2}\Big\lVert \sum_{i=1}^{m} \lambda_i y_i \mathbf{x}_i \Big\rVert^2 - \sum_{i,j=1}^{m} \lambda_i \lambda_j y_i y_j\, \mathbf{x}_i\cdot\mathbf{x}_j - b \sum_{i=1}^{m} \lambda_i y_i + \sum_{i=1}^{m} \lambda_i = -\tfrac{1}{2} \sum_{i,j=1}^{m} \lambda_i \lambda_j y_i y_j\, \mathbf{x}_i\cdot\mathbf{x}_j + \sum_{i=1}^{m} \lambda_i.$$
Equivalent dual problem:
$$\max_{\lambda}\ -\tfrac{1}{2} \sum_{i,j=1}^{m} \lambda_i \lambda_j y_i y_j\, \mathbf{x}_i\cdot\mathbf{x}_j + \sum_{i=1}^{m} \lambda_i \qquad \text{s.t.}\ \ \lambda_i \ge 0,\ i=1,\dots,m, \quad \sum_{i=1}^{m} \lambda_i y_i = 0.$$
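A minimal sketch of solving this dual numerically on the same toy data (again with SciPy's generic SLSQP in place of a dedicated QP solver); the recovered $\mathbf{w} = \sum_i \lambda_i y_i \mathbf{x}_i$ should match the primal solution:

```python
# Dual SVM:  max_λ  -1/2 Σ_ij λ_i λ_j y_i y_j x_i·x_j + Σ_i λ_i
#            s.t.   λ_i >= 0,  Σ_i λ_i y_i = 0.
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
G = (y[:, None] * X) @ (y[:, None] * X).T       # G_ij = y_i y_j x_i·x_j

def neg_dual(lam):                              # minimize the negated dual
    return 0.5 * lam @ G @ lam - lam.sum()

res = minimize(neg_dual, np.zeros(len(X)),
               bounds=[(0.0, None)] * len(X),
               constraints=[{"type": "eq", "fun": lambda lam: lam @ y}],
               method="SLSQP")
lam = res.x
w = (lam * y) @ X                               # w = Σ λ_i y_i x_i  (from KKT)
print("λ =", lam.round(3), " w =", w.round(3))  # only support vectors get λ > 0
```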
Solution to the Dual Problem
The dual problem above admits the following solution:
$$f_{\text{dual}}(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i=1}^{m} \lambda_i y_i\, \mathbf{x}_i\cdot\mathbf{x} + b \Big), \qquad b = y_i - \sum_{j=1}^{m} \lambda_j y_j\, \mathbf{x}_j\cdot\mathbf{x}_i \ \text{ for any } i \text{ with } \lambda_i > 0.$$
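For a cross-check, a library SVM can be fit and its decision function rebuilt from these dual quantities. The sketch below assumes scikit-learn is available; SVC stores $\lambda_i y_i$ for the support vectors in dual_coef_ and $b$ in intercept_:

```python
# Rebuild f(x) = sign(Σ λ_i y_i x_i·x + b) from a fitted linear SVC.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)     # very large C ≈ hard margin
alpha_y = clf.dual_coef_[0]                     # λ_i y_i for support vectors
sv = clf.support_vectors_
b = clf.intercept_[0]

x_new = np.array([2.5, 2.5])
f = np.sign(alpha_y @ (sv @ x_new) + b)         # Σ λ_i y_i (x_i·x_new) + b
print("prediction:", f, " sklearn:", clf.predict([x_new])[0])
```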
What are Support Vector Machines?
- Linear classifiers
- (Mostly) binary classifiers
- Supervised training
- Good generalization with explicit bounds
Main Ideas Behind Support Vector Machines
- Maximal margin
- Dual space
- Linear classifiers in a high-dimensional space via a non-linear mapping
- Kernel trick
Quadratic Programming
Using the Lagrangian
Dual Space
Dual Form
Non-linear separation of datasets
Linear separation is impossible in most problems. [Illustration from Prof. Mohri's lecture notes.]
Non-separable datasets
Solutions: 1) non-linear classifiers, or 2) increase the dimensionality of the dataset by applying a non-linear mapping Φ (sketched below).
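A minimal sketch of option 2 (the example is an illustrative assumption: points inside vs. outside the unit circle, lifted with the quadratic map $\Phi(x_1,x_2) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, under which the circle becomes a plane):

```python
# Lift non-linearly-separable 2-D data into 3-D where a plane separates it.
import numpy as np

def phi(X):
    """Explicit non-linear mapping Φ: R² -> R³."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1**2, np.sqrt(2) * x1 * x2, x2**2])

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.where(np.sum(X**2, axis=1) < 1.0, 1, -1)   # inside vs. outside circle

Z = phi(X)
# In the lifted space, x1² + x2² < 1 becomes the linear condition z1 + z3 < 1,
# i.e. the plane w = (1, 0, 1), b = -1 separates the classes exactly:
pred = np.where(Z[:, 0] + Z[:, 2] < 1.0, 1, -1)
print("separable in 3-D:", np.all(pred == y))
```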
Kernel Trick
A "similarity measure" between two data samples.
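A minimal sketch of the trick with the same quadratic map as above: the polynomial kernel $k(\mathbf{x},\mathbf{z}) = (\mathbf{x}\cdot\mathbf{z})^2$ returns exactly $\langle\Phi(\mathbf{x}),\Phi(\mathbf{z})\rangle$ without ever computing $\Phi$:

```python
# The kernel computes the feature-space inner product implicitly.
import numpy as np

def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def k(x, z):
    return np.dot(x, z) ** 2            # kernel: works entirely in 2-D

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)))           # explicit 3-D inner product
print(k(x, z))                          # same number via the kernel
```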
Kernel Trick Illustrated
Curse of Dimensionality Due to the Non-Linear Mapping
Positive Semi-Definite (P.S.D.) Kernels (Mercer Condition)
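Numerically, the Mercer condition can be spot-checked by verifying that the Gram matrix $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$ is positive semi-definite on a sample of points; a minimal sketch (one random sample, so evidence rather than proof):

```python
# A valid kernel must give a P.S.D. Gram matrix for every set of points.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

def gaussian_kernel(X, gamma=0.5):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)          # K_ij = exp(-γ ||x_i - x_j||²)

K = gaussian_kernel(X)
eigs = np.linalg.eigvalsh(K)            # symmetric -> real eigenvalues
print("min eigenvalue:", eigs.min())    # >= 0 (up to rounding) if P.S.D.
```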
Advantages of SVM
- Work very well in practice
- Error bounds are easy to obtain: generalization error is small and predictable
- Fool-proof method:
  - (Mostly) three kernels to choose from: Gaussian; linear and polynomial; sigmoid (see the sketch after this list)
  - Very small number of parameters to optimize
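For reference, minimal implementations of the kernels listed above (parameter names and default values are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def linear(x, z):
    return np.dot(x, z)

def polynomial(x, z, degree=3, c=1.0):
    return (np.dot(x, z) + c) ** degree

def gaussian(x, z, gamma=0.5):          # a.k.a. RBF kernel
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid(x, z, kappa=0.1, c=-1.0):   # P.S.D. only for some κ, c
    return np.tanh(kappa * np.dot(x, z) + c)

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear(x, z), polynomial(x, z), gaussian(x, z), sigmoid(x, z))
```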
Limitations of SVM
- Size limitation: the kernel matrix grows quadratically with the number of training vectors (see the estimate after this list)
- Speed limitations:
  1) During training: a very large quadratic programming problem must be solved numerically. Solutions: chunking; Sequential Minimal Optimization (SMO), which breaks the QP problem into many small QP problems solved analytically; hardware implementations
  2) During testing: cost scales with the number of support vectors. Solution: online SVM
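A back-of-the-envelope estimate of the size limitation (the n = 100,000 figure is an illustrative assumption):

```python
# The dense Gram matrix needs n² floats, which is why chunking and SMO
# avoid materializing it.
n = 100_000                              # number of training vectors
bytes_needed = n * n * 8                 # float64 entries
print(f"{bytes_needed / 1e9:.0f} GB")    # 80 GB for n = 100,000
```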