Geometrical intuition behind the dual problem


1 Geometrical intuition behind the dual problem
Based on: K. P. Bennett, E. J. Bredensteiner, "Duality and Geometry in SVM Classifiers", Proceedings of the International Conference on Machine Learning, 2000

2 Geometrical intuition behind the dual problem
Convex hulls for two sets of points A and B. [Figure: the convex hulls of A and B in the $(x_1, x_2)$ plane.]

3 Geometrical intuition behind the dual problem
The convex hulls of A and B consist of all possible points $P_A$ and $P_B$ of the form
$P_A = \sum_{i\in A} \lambda_i \mathbf{x}_i$, with $\sum_{i\in A} \lambda_i = 1$ and $\lambda_i \ge 0$ for $i \in A$;
$P_B = \sum_{i\in B} \lambda_i \mathbf{x}_i$, with $\sum_{i\in B} \lambda_i = 1$ and $\lambda_i \ge 0$ for $i \in B$.
[Figure: the hulls of A and B in the $(x_1, x_2)$ plane, with candidate points $P_A$ and $P_B$.]
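A convex combination under these constraints can be checked in a few lines of numpy; the set A and the coefficients below are made-up illustrations, not data from the slides:

```python
import numpy as np

# Made-up set A (rows are points) and made-up coefficients lambda_i.
A = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
lam = np.array([0.2, 0.5, 0.3])

# Constraints from the slide: lambda_i >= 0 and sum_i lambda_i = 1.
assert np.all(lam >= 0) and np.isclose(lam.sum(), 1.0)

P_A = lam @ A   # P_A = sum_i lambda_i x_i, a point inside the hull of A
```

Any choice of coefficients satisfying the two constraints yields a point inside (or on the boundary of) the hull.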

4 Geometrical intuition behind the dual problem
For the plane separating A and B we choose the bisector of the segment joining the specific $P_A$ and $P_B$ that attain the minimal distance $\|P_A - P_B\|^2$, subject to the same constraints as before ($\sum_{i\in A}\lambda_i = 1$, $\sum_{i\in B}\lambda_i = 1$, $\lambda_i \ge 0$). In the illustration, $P_A$ has a single non-zero coefficient (it coincides with a vertex of the hull), while $P_B$ has two non-zero coefficients (it lies on an edge).

5 Geometrical intuition behind the dual problem
π‘₯ 1 π‘₯ 2 π‘šπ‘–π‘› βˆ₯ 𝑃 𝐴 βˆ’ 𝑃 𝐡 βˆ₯ 2 = π‘šπ‘–π‘› π›Œ βˆ₯ π‘–βˆˆπ΄ Ξ» 𝑖 𝐱 𝐒 βˆ’ π‘—βˆˆπ΅ Ξ» 𝑗 𝐱 𝐣 βˆ₯ 2 A = π‘šπ‘–π‘› π›Œ βˆ₯ π‘–βˆˆπ΄ Ξ» 𝑖 𝑦 𝑖 𝐱 𝐒 + π‘—βˆˆπ΅ Ξ» 𝑗 𝑦 𝑗 𝐱 𝐣 βˆ₯ 2 +1 -1 B PA = π‘šπ‘–π‘› π›Œ 𝑖,𝑗 Ξ» 𝑖 Ξ» 𝑗 𝑦 𝑖 𝑦 𝑗 ξ‚ž 𝐱 𝐒 . 𝐱 𝐣 ξ‚Ÿ PB π‘–βˆˆπ΄ Ξ» 𝑖 =1 Ξ» 𝑖 β‰₯0,π‘–βˆˆπ΄ π‘–βˆˆπ΅ Ξ» 𝑖 =1 Ξ» 𝑖 β‰₯0,π‘–βˆˆπ΅ 𝑃 𝐴 = π‘–βˆˆπ΄ Ξ» 𝑖 π‘₯ 𝑖 𝑃 𝐡 = π‘–βˆˆπ΅ Ξ» 𝑖 π‘₯ 𝑖

6 Going from the Primal to the Dual
π‘šπ‘–π‘› 𝑀 βˆ₯𝐰βˆ₯ 2 𝑠.𝑑.: 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 β‰₯1,𝑖=1,…,π‘š Constrained Optimization Problem label input A convex optimization problem (objective and constraints) Unique solution if datapoints are linearly separable

7 Going from the Primal to the Dual
π‘šπ‘–π‘› 𝑀 βˆ₯𝐰βˆ₯ 2 𝑠.𝑑.: 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 β‰₯1,𝑖=1,…,π‘š Constrained Optimization Problem label input 𝐿 𝐰,𝑏,π›Œ = 1 2 βˆ₯𝐰βˆ₯ 2 βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 βˆ’1 Lagrange:

8 Going from the Primal to the Dual
π‘šπ‘–π‘› 𝑀 βˆ₯𝐰βˆ₯ 2 𝑠.𝑑.: 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 β‰₯1,𝑖=1,…,π‘š Constrained Optimization Problem label input 𝐿 𝐰,𝑏,π›Œ = 1 2 βˆ₯𝐰βˆ₯ 2 βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 βˆ’1 Lagrange: KKT conditions: βˆ‡ 𝐰 𝐿 𝐰,𝑏,π›Œ =π°βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐱 𝐒 =0 βˆ‡ 𝑏 𝐿 𝐰,𝑏,π›Œ =βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 =0

9 Going from the Primal to the Dual
π‘šπ‘–π‘› 𝑀 βˆ₯𝐰βˆ₯ 2 𝑠.𝑑.: 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 β‰₯1,𝑖=1,…,π‘š Constrained Optimization Problem label input 𝐿 𝐰,𝑏,π›Œ = 1 2 βˆ₯𝐰βˆ₯ 2 βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 βˆ’1 Lagrange: KKT conditions: Plus KKT: βˆ‡ 𝐰 𝐿 𝐰,𝑏,π›Œ =π°βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐱 𝐒 =0 βˆ‡ 𝑏 𝐿 𝐰,𝑏,π›Œ =βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 =0 𝐰= 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐱 𝐒 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 =0 Ξ» 𝑖 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 βˆ’1 =0,𝑖=1,...,π‘š

10 Going from the Primal to the Dual
π‘šπ‘–π‘› 𝑀 βˆ₯𝐰βˆ₯ 2 𝑠.𝑑.: 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 β‰₯1,𝑖=1,…,π‘š Constrained Optimization Problem label input 𝐿 𝐰,𝑏,π›Œ = 1 2 βˆ₯𝐰βˆ₯ 2 βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 βˆ’1 Lagrange: 1 2 βˆ₯ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐱 𝐒 βˆ₯ 2 βˆ’ 𝑖,𝑗=1 π‘š Ξ» 𝑖 Ξ» 𝑗 𝑦 𝑖 𝑦 𝑗 𝐱 𝐒 . 𝐱 𝐣 βˆ’ 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝑏+ 𝑖=1 π‘š Ξ» 𝑖 𝐿 𝐰,𝑏,π›Œ = 𝐰= 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 𝐱 𝐒 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 =0 βˆ’ 1 2 𝑖,𝑗=1 π‘š Ξ» 𝑖 Ξ» 𝑗 𝑦 𝑖 𝑦 𝑗 𝐱 𝐒 . 𝐱 𝐣 + 𝑖=1 π‘š Ξ» 𝑖 𝐿 𝐰,𝑏,π›Œ =

11 Going from the Primal to the Dual
π‘šπ‘–π‘› 𝑀 βˆ₯𝐰βˆ₯ 2 𝑠.𝑑.: 𝑦 𝑖 𝐰. 𝐱 𝐒 +𝑏 β‰₯1,𝑖=1,…,π‘š Constrained Optimization Problem label input π‘šπ‘Žπ‘₯ π›Œ βˆ’ 1 2 𝑖,𝑗=1 π‘š Ξ» 𝑖 Ξ» 𝑗 𝑦 𝑖 𝑦 𝑗 𝐱 𝐒 . 𝐱 𝐣 + 𝑖=1 π‘š Ξ» 𝑖 𝑠.𝑑.: Ξ» 𝑖 β‰₯0,𝑖=1,...,π‘š 𝑖=1 π‘š Ξ» 𝑖 𝑦 𝑖 =0 Equivalent Dual Problem:

12 Solution to the Dual Problem
The dual problem above admits the following solution: $h(\mathbf{x}) = \mathrm{sign}\Big( \sum_{i=1}^m \lambda_i y_i\, \mathbf{x}_i \cdot \mathbf{x} + b \Big)$, where $b = y_i - \sum_{j=1}^m \lambda_j y_j\, \mathbf{x}_j \cdot \mathbf{x}_i$ for any $i$ with $\lambda_i > 0$ (i.e., any support vector).
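A minimal sketch of recovering $b$ and evaluating the classifier, assuming dual coefficients have already been obtained; the two-point dataset and its lambdas below are made up for illustration:

```python
import numpy as np

# Made-up two-point dataset with dual coefficients lam (both support vectors).
X = np.array([[0.0, 0.0], [2.0, 0.0]])
y = np.array([-1.0, 1.0])
lam = np.array([0.5, 0.5])

w = (lam * y) @ X                        # w = sum_i lam_i y_i x_i  -> (1, 0)
i = 0                                    # any index with lam[i] > 0
b = y[i] - np.sum(lam * y * (X @ X[i]))  # b = y_i - sum_j lam_j y_j x_j . x_i

def h(x):
    # Decision function in dual form: sign(sum_i lam_i y_i x_i . x + b).
    return np.sign(np.sum(lam * y * (X @ x)) + b)
```

Here the separating hyperplane is $x_1 = 1$, so $b = -1$ and points are classified by which side of that line they fall on.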

13 What are Support Vector Machines?
Linear classifiers (Mostly) binary classifiers Supervised training Good generalization with explicit bounds

14 Main Ideas Behind Support Vector Machines
Maximal margin Dual space Linear classifiers in high-dimensional space using non-linear mapping Kernel trick

15 Quadratic Programming

16 Using the Lagrangian

17 Dual Space

18 Dual Form

19 Non-linear separation of datasets
Linear separation is impossible for many datasets, so a non-linear decision boundary is needed. Illustration from Prof. Mohri's lecture notes.

20 Non-separable datasets
Solutions: 1) non-linear classifiers; 2) increase the dimensionality of the dataset by applying a non-linear mapping Φ.
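Option 2 can be illustrated with a small sketch: radially separated points are not linearly separable in the plane, but become separable after a non-linear map. The specific map and points are illustrative choices, not from the slides:

```python
import numpy as np

# Made-up data: class -1 inside the unit circle, class +1 outside.
inner = np.array([[0.5, 0.0], [0.0, 0.5], [-0.3, 0.3]])
outer = np.array([[2.0, 0.0], [0.0, 2.0], [-1.5, 1.5]])

def phi(x):
    # Map R^2 -> R^3: (x1, x2) -> (x1^2, sqrt(2) x1 x2, x2^2).
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

# In feature space, w . phi(x) = x1^2 + x2^2 (the squared radius), so the
# hyperplane w = (1, 0, 1), b = -1 separates the two classes linearly.
w, b = np.array([1.0, 0.0, 1.0]), -1.0
pred_inner = [np.sign(w @ phi(x) + b) for x in inner]
pred_outer = [np.sign(w @ phi(x) + b) for x in outer]
```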

21 Kernel Trick
A "similarity measure" between two data samples.
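The trick can be verified on a small example: the polynomial kernel $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x} \cdot \mathbf{z})^2$ equals the inner product $\Phi(\mathbf{x}) \cdot \Phi(\mathbf{z})$ for the map $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, so the feature-space similarity is computed without ever forming $\Phi$ explicitly. The test vectors are made up:

```python
import numpy as np

def phi(x):
    # Explicit feature map R^2 -> R^3 for the degree-2 polynomial kernel.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, z):
    # Kernel: same value as phi(x) . phi(z), at O(d) cost instead of O(d^2).
    return (x @ z) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
lhs, rhs = k(x, z), phi(x) @ phi(z)
```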

22 Kernel Trick Illustrated

23 Curse of Dimensionality Due to the Non-Linear Mapping

24 Positive Semi-Definite (P.S.D.) Kernels (Mercer Condition)

25 Advantages of SVM
Work very well in practice. Error bounds are easy to obtain: the generalization error is small and predictable. Fool-proof method: (mostly) three kernels to choose from (Gaussian; linear and polynomial; sigmoid), and a very small number of parameters to optimize.
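The kernels named above can be written in a few lines each; the gamma, degree, and coef0 values below are illustrative hyperparameter choices, not recommendations:

```python
import numpy as np

def linear(x, z):
    return x @ z

def polynomial(x, z, degree=3, coef0=1.0):
    return (x @ z + coef0) ** degree

def gaussian(x, z, gamma=0.5):
    # Also called the RBF kernel: similarity decays with squared distance.
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid(x, z, gamma=0.5, coef0=1.0):
    return np.tanh(gamma * (x @ z) + coef0)

x, z = np.array([1.0, 0.0]), np.array([0.0, 1.0])
vals = [linear(x, z), polynomial(x, z), gaussian(x, z), sigmoid(x, z)]
```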

26 Limitations of SVM
Size limitation: the kernel matrix grows quadratically with the number of training vectors. Speed limitations: 1) During training, a very large quadratic programming problem must be solved numerically. Solutions: chunking; Sequential Minimal Optimization (SMO), which breaks the QP problem into many small QP problems solved analytically; hardware implementations. 2) During testing, cost scales with the number of support vectors. Solution: online SVM.
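The quadratic size limitation can be made concrete with a back-of-the-envelope calculation (float64 storage assumed):

```python
# A dense kernel matrix for m training vectors has m * m entries.
m = 100_000
bytes_needed = m * m * 8       # float64: 8 bytes per entry
gigabytes = bytes_needed / 1e9
```

At 100,000 training vectors the dense matrix already needs about 80 GB, which is why chunking and SMO avoid materializing it.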

