Geometrical intuition behind the dual problem


1 Geometrical intuition behind the dual problem
Based on: K. P. Bennett, E. J. Bredensteiner, "Duality and Geometry in SVM Classifiers", Proceedings of the International Conference on Machine Learning, 2000

2 Geometrical intuition behind the dual problem
Convex hulls for two sets of points A and B. [Figure: the convex hulls of A and B in the $(x_1, x_2)$ plane.]

3 Geometrical intuition behind the dual problem
The convex hulls of A and B consist of all possible points $P_A$ and $P_B$ of the form
$P_A = \sum_{i\in A} \lambda_i \mathbf{x}_i$, with $\sum_{i\in A} \lambda_i = 1$ and $\lambda_i \ge 0$ for $i \in A$;
$P_B = \sum_{i\in B} \lambda_i \mathbf{x}_i$, with $\sum_{i\in B} \lambda_i = 1$ and $\lambda_i \ge 0$ for $i \in B$.
[Figure: the hulls of A and B in the $(x_1, x_2)$ plane, with candidate points $P_A$ and $P_B$.]
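A convex combination under these constraints can be checked in a few lines of numpy; the set A and the coefficients below are made-up illustrations, not data from the slides:

```python
import numpy as np

# Made-up set A (rows are points) and made-up coefficients lambda_i.
A = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
lam = np.array([0.2, 0.5, 0.3])

# Constraints from the slide: lambda_i >= 0 and sum_i lambda_i = 1.
assert np.all(lam >= 0) and np.isclose(lam.sum(), 1.0)

P_A = lam @ A   # P_A = sum_i lambda_i x_i, a point inside the hull of A
```

Any choice of coefficients satisfying the two constraints yields a point inside (or on the boundary of) the hull.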

4 Geometrical intuition behind the dual problem
For the plane separating A and B we choose the bisector of the segment joining the specific $P_A$ and $P_B$ that attain the minimal distance $\|P_A - P_B\|^2$, subject to the same constraints as before ($\sum_{i\in A}\lambda_i = 1$, $\sum_{i\in B}\lambda_i = 1$, $\lambda_i \ge 0$). In the illustration, $P_A$ has a single non-zero coefficient (it coincides with a vertex of the hull), while $P_B$ has two non-zero coefficients (it lies on an edge).

5 Geometrical intuition behind the dual problem
π‘₯ 1 π‘₯ 2 π‘šπ‘–π‘› βˆ₯ 𝑃 𝐴 βˆ’ 𝑃 𝐡 βˆ₯ 2 = π‘šπ‘–π‘› π›Œ βˆ₯ π‘–βˆˆπ΄ Ξ» 𝑖 𝐱 𝐒 βˆ’ π‘—βˆˆπ΅ Ξ» 𝑗 𝐱 𝐣 βˆ₯ 2 A = π‘šπ‘–π‘› π›Œ βˆ₯ π‘–βˆˆπ΄ Ξ» 𝑖 𝑦 𝑖 𝐱 𝐒 + π‘—βˆˆπ΅ Ξ» 𝑗 𝑦 𝑗 𝐱 𝐣 βˆ₯ 2 +1 -1 B PA = π‘šπ‘–π‘› π›Œ 𝑖,𝑗 Ξ» 𝑖 Ξ» 𝑗 𝑦 𝑖 𝑦 𝑗 ξ‚ž 𝐱 𝐒 . 𝐱 𝐣 ξ‚Ÿ PB π‘–βˆˆπ΄ Ξ» 𝑖 =1 Ξ» 𝑖 β‰₯0,π‘–βˆˆπ΄ π‘–βˆˆπ΅ Ξ» 𝑖 =1 Ξ» 𝑖 β‰₯0,π‘–βˆˆπ΅ 𝑃 𝐴 = π‘–βˆˆπ΄ Ξ» 𝑖 π‘₯ 𝑖 𝑃 𝐡 = π‘–βˆˆπ΅ Ξ» 𝑖 π‘₯ 𝑖

6 Going from the Primal to the Dual
π‘šπ‘–π‘› 𝑀 βˆ₯𝐰βˆ₯ 2 𝑠.𝑑.: 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 β‰₯1,𝑖=1,…,π‘š Constrained Optimization Problem label input A convex optimization problem (objective and constraints) Unique solution if datapoints are linearly separable

7 Going from the Primal to the Dual
π‘šπ‘–π‘› 𝑀 βˆ₯𝐰βˆ₯ 2 𝑠.𝑑.: 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 β‰₯1,𝑖=1,…,π‘š Constrained Optimization Problem label input 𝐿 𝐰,𝑏,π›Œ = 1 2 βˆ₯𝐰βˆ₯ 2 βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 βˆ’1 Lagrange:

8 Going from the Primal to the Dual
π‘šπ‘–π‘› 𝑀 βˆ₯𝐰βˆ₯ 2 𝑠.𝑑.: 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 β‰₯1,𝑖=1,…,π‘š Constrained Optimization Problem label input 𝐿 𝐰,𝑏,π›Œ = 1 2 βˆ₯𝐰βˆ₯ 2 βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 βˆ’1 Lagrange: KKT conditions: βˆ‡ 𝐰 𝐿 𝐰,𝑏,π›Œ =π°βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐱 𝐒 =0 βˆ‡ 𝑏 𝐿 𝐰,𝑏,π›Œ =βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 =0

9 Going from the Primal to the Dual
π‘šπ‘–π‘› 𝑀 βˆ₯𝐰βˆ₯ 2 𝑠.𝑑.: 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 β‰₯1,𝑖=1,…,π‘š Constrained Optimization Problem label input 𝐿 𝐰,𝑏,π›Œ = 1 2 βˆ₯𝐰βˆ₯ 2 βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 βˆ’1 Lagrange: KKT conditions: Plus KKT: βˆ‡ 𝐰 𝐿 𝐰,𝑏,π›Œ =π°βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐱 𝐒 =0 βˆ‡ 𝑏 𝐿 𝐰,𝑏,π›Œ =βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 =0 𝐰= 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐱 𝐒 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 =0 Ξ» 𝑖 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 βˆ’1 =0,𝑖=1,...,π‘š

10 Going from the Primal to the Dual
π‘šπ‘–π‘› 𝑀 βˆ₯𝐰βˆ₯ 2 𝑠.𝑑.: 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 β‰₯1,𝑖=1,…,π‘š Constrained Optimization Problem label input 𝐿 𝐰,𝑏,π›Œ = 1 2 βˆ₯𝐰βˆ₯ 2 βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 βˆ’1 Lagrange: 1 2 βˆ₯ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐱 𝐒 βˆ₯ 2 βˆ’ 𝑖,𝑗=1 π‘š Ξ» 𝑖 Ξ» 𝑗 𝑦 𝑖 𝑦 𝑗 𝐱 𝐒 . 𝐱 𝐣 βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝑏+ 𝑖=1 π‘š Ξ» 𝑖 𝐿 𝐰,𝑏,π›Œ = 𝐰= 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐱 𝐒 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 =0 βˆ’ 1 2 𝑖,𝑗=1 π‘š Ξ» 𝑖 Ξ» 𝑗 𝑦 𝑖 𝑦 𝑗 𝐱 𝐒 . 𝐱 𝐣 + 𝑖=1 π‘š Ξ» 𝑖 𝐿 𝐰,𝑏,π›Œ =

11 Going from the Primal to the Dual
π‘šπ‘–π‘› 𝑀 βˆ₯𝐰βˆ₯ 2 𝑠.𝑑.: 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 β‰₯1,𝑖=1,…,π‘š Constrained Optimization Problem label input π‘šπ‘Žπ‘₯ π›Œ βˆ’ 1 2 𝑖,𝑗=1 π‘š Ξ» 𝑖 Ξ» 𝑗 𝑦 𝑖 𝑦 𝑗 𝐱 𝐒 . 𝐱 𝐣 + 𝑖=1 π‘š Ξ» 𝑖 𝑠.𝑑.: Ξ» 𝑖 β‰₯0,𝑖=1,...,π‘š 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 =0 Equivalent Dual Problem:

12 Solution to the Dual Problem
The dual problem above admits the following solution: $h(\mathbf{x}) = \mathrm{sign}\Big( \sum_{i=1}^m \lambda_i y_i\, \mathbf{x}_i \cdot \mathbf{x} + b \Big)$, where $b = y_i - \sum_{j=1}^m \lambda_j y_j\, \mathbf{x}_j \cdot \mathbf{x}_i$ for any $i$ with $\lambda_i > 0$ (i.e., any support vector).
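A minimal sketch of recovering $b$ and evaluating the classifier, assuming dual coefficients have already been obtained; the two-point dataset and its lambdas below are made up for illustration:

```python
import numpy as np

# Made-up two-point dataset with dual coefficients lam (both support vectors).
X = np.array([[0.0, 0.0], [2.0, 0.0]])
y = np.array([-1.0, 1.0])
lam = np.array([0.5, 0.5])

w = (lam * y) @ X                        # w = sum_i lam_i y_i x_i  -> (1, 0)
i = 0                                    # any index with lam[i] > 0
b = y[i] - np.sum(lam * y * (X @ X[i]))  # b = y_i - sum_j lam_j y_j x_j . x_i

def h(x):
    # Decision function in dual form: sign(sum_i lam_i y_i x_i . x + b).
    return np.sign(np.sum(lam * y * (X @ x)) + b)
```

Here the separating hyperplane is $x_1 = 1$, so $b = -1$ and points are classified by which side of that line they fall on.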

13 What are Support Vector Machines?
Linear classifiers (Mostly) binary classifiers Supervised training Good generalization with explicit bounds

14 Main Ideas Behind Support Vector Machines
Maximal margin Dual space Linear classifiers in high-dimensional space using non-linear mapping Kernel trick

15 Quadratic Programming

16 Using the Lagrangian

17 Dual Space

18 Dual Form

19 Non-linear separation of datasets
Linear separation is impossible for many datasets, so a non-linear decision boundary is needed. Illustration from Prof. Mohri's lecture notes.

20 Non-separable datasets
Solutions: 1) non-linear classifiers; 2) increase the dimensionality of the dataset by applying a non-linear mapping Φ.
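Option 2 can be illustrated with a small sketch: radially separated points are not linearly separable in the plane, but become separable after a non-linear map. The specific map and points are illustrative choices, not from the slides:

```python
import numpy as np

# Made-up data: class -1 inside the unit circle, class +1 outside.
inner = np.array([[0.5, 0.0], [0.0, 0.5], [-0.3, 0.3]])
outer = np.array([[2.0, 0.0], [0.0, 2.0], [-1.5, 1.5]])

def phi(x):
    # Map R^2 -> R^3: (x1, x2) -> (x1^2, sqrt(2) x1 x2, x2^2).
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

# In feature space, w . phi(x) = x1^2 + x2^2 (the squared radius), so the
# hyperplane w = (1, 0, 1), b = -1 separates the two classes linearly.
w, b = np.array([1.0, 0.0, 1.0]), -1.0
pred_inner = [np.sign(w @ phi(x) + b) for x in inner]
pred_outer = [np.sign(w @ phi(x) + b) for x in outer]
```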

21 Kernel Trick
A "similarity measure" between two data samples.
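The trick can be verified on a small example: the polynomial kernel $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x} \cdot \mathbf{z})^2$ equals the inner product $\Phi(\mathbf{x}) \cdot \Phi(\mathbf{z})$ for the map $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, so the feature-space similarity is computed without ever forming $\Phi$ explicitly. The test vectors are made up:

```python
import numpy as np

def phi(x):
    # Explicit feature map R^2 -> R^3 for the degree-2 polynomial kernel.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, z):
    # Kernel: same value as phi(x) . phi(z), at O(d) cost instead of O(d^2).
    return (x @ z) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
lhs, rhs = k(x, z), phi(x) @ phi(z)
```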

22 Kernel Trick Illustrated

23 Curse of Dimensionality Due to the Non-Linear Mapping

24 Positive Semi-Definite (P.S.D.) Kernels (Mercer Condition)

25 Advantages of SVM
Work very well in practice. Error bounds are easy to obtain: the generalization error is small and predictable. Fool-proof method: (mostly) three kernels to choose from (Gaussian; linear and polynomial; sigmoid), and a very small number of parameters to optimize.
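The kernels named above can be written in a few lines each; the gamma, degree, and coef0 values below are illustrative hyperparameter choices, not recommendations:

```python
import numpy as np

def linear(x, z):
    return x @ z

def polynomial(x, z, degree=3, coef0=1.0):
    return (x @ z + coef0) ** degree

def gaussian(x, z, gamma=0.5):
    # Also called the RBF kernel: similarity decays with squared distance.
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid(x, z, gamma=0.5, coef0=1.0):
    return np.tanh(gamma * (x @ z) + coef0)

x, z = np.array([1.0, 0.0]), np.array([0.0, 1.0])
vals = [linear(x, z), polynomial(x, z), gaussian(x, z), sigmoid(x, z)]
```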

26 Limitations of SVM
Size limitation: the kernel matrix grows quadratically with the number of training vectors. Speed limitations: 1) During training, a very large quadratic programming problem must be solved numerically. Solutions: chunking; Sequential Minimal Optimization (SMO), which breaks the QP problem into many small QP problems solved analytically; hardware implementations. 2) During testing, cost scales with the number of support vectors. Solution: online SVM.
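The quadratic size limitation can be made concrete with a back-of-the-envelope calculation (float64 storage assumed):

```python
# A dense kernel matrix for m training vectors has m * m entries.
m = 100_000
bytes_needed = m * m * 8       # float64: 8 bytes per entry
gigabytes = bytes_needed / 1e9
```

At 100,000 training vectors the dense matrix already needs about 80 GB, which is why chunking and SMO avoid materializing it.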

