Geometrical intuition behind the dual problem

Geometrical intuition behind the dual problem
Based on: K. P. Bennett and E. J. Bredensteiner, "Duality and Geometry in SVM Classifiers", Proceedings of the International Conference on Machine Learning, 2000.

Geometrical intuition behind the dual problem
Convex hulls for two sets of points A and B.
[Figure: the convex hulls of the point sets A and B drawn in the (x_1, x_2) plane.]

Geometrical intuition behind the dual problem
The convex hulls of A and B consist of all possible points $P_A$ and $P_B$ of the form
$$P_A = \sum_{i\in A}\lambda_i \mathbf{x}_i, \qquad \sum_{i\in A}\lambda_i = 1, \quad \lambda_i \ge 0,\ i\in A$$
$$P_B = \sum_{i\in B}\lambda_i \mathbf{x}_i, \qquad \sum_{i\in B}\lambda_i = 1, \quad \lambda_i \ge 0,\ i\in B$$
[Figure: points $P_A$ and $P_B$ inside the convex hulls of A and B in the (x_1, x_2) plane.]
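As a concrete illustration (my own sketch, not part of the original slides, assuming numpy is available): points of the convex hull are obtained by drawing coefficients λ that are non-negative and sum to one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two small 2-D point sets (rows are points); purely illustrative data.
A = np.array([[1.0, 2.0], [2.0, 3.0], [1.5, 1.0]])
B = np.array([[4.0, 0.5], [5.0, 1.5], [4.5, -0.5]])

def random_hull_point(X, rng):
    """Return a random convex combination of the rows of X."""
    lam = rng.random(len(X))
    lam /= lam.sum()          # enforce sum(lambda) = 1 with lambda_i >= 0
    return lam @ X

P_A = random_hull_point(A, rng)
P_B = random_hull_point(B, rng)
print("P_A =", P_A, " P_B =", P_B)
```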

Geometrical intuition behind the dual problem
For the plane separating A and B, we choose the bisector of the segment joining the particular points $P_A$ and $P_B$ of the two hulls that have minimal distance, i.e. that minimize $\|P_A - P_B\|^2$, with
$$P_A = \sum_{i\in A}\lambda_i \mathbf{x}_i, \quad \sum_{i\in A}\lambda_i = 1, \ \lambda_i \ge 0; \qquad P_B = \sum_{i\in B}\lambda_i \mathbf{x}_i, \quad \sum_{i\in B}\lambda_i = 1, \ \lambda_i \ge 0$$
[Figure: here, $P_A$ has a single non-zero coefficient, while $P_B$ has two non-zero coefficients.]

Geometrical intuition behind the dual problem
Labelling the points of A with $y_i = +1$ and the points of B with $y_j = -1$, the minimal squared distance between the hulls becomes
$$\min \|P_A - P_B\|^2 = \min_{\boldsymbol{\lambda}} \Big\|\sum_{i\in A}\lambda_i \mathbf{x}_i - \sum_{j\in B}\lambda_j \mathbf{x}_j\Big\|^2 = \min_{\boldsymbol{\lambda}} \Big\|\sum_{i\in A}\lambda_i y_i \mathbf{x}_i + \sum_{j\in B}\lambda_j y_j \mathbf{x}_j\Big\|^2 = \min_{\boldsymbol{\lambda}} \sum_{i,j}\lambda_i \lambda_j y_i y_j \langle\mathbf{x}_i, \mathbf{x}_j\rangle$$
subject to $\sum_{i\in A}\lambda_i = 1$, $\sum_{j\in B}\lambda_j = 1$, and $\lambda_i \ge 0$ for all $i$.
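A minimal numerical sketch of this closest-points problem (my own illustration, not from the slides; numpy and scipy are assumed, and the toy data are invented):

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 2.0], [2.0, 3.0], [1.5, 1.0]])   # class +1
B = np.array([[4.0, 0.5], [5.0, 1.5], [4.5, -0.5]])  # class -1
nA, nB = len(A), len(B)

def objective(lam):
    P_A = lam[:nA] @ A          # convex combination of A
    P_B = lam[nA:] @ B          # convex combination of B
    return np.sum((P_A - P_B) ** 2)

constraints = [
    {"type": "eq", "fun": lambda lam: lam[:nA].sum() - 1.0},  # sum over A = 1
    {"type": "eq", "fun": lambda lam: lam[nA:].sum() - 1.0},  # sum over B = 1
]
bounds = [(0.0, None)] * (nA + nB)                            # lambda_i >= 0
lam0 = np.full(nA + nB, 0.5)

res = minimize(objective, lam0, bounds=bounds, constraints=constraints)
print("closest points:", res.x[:nA] @ A, res.x[nA:] @ B)
```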

Going from the Primal to the Dual
Constrained optimization problem (the primal), where $y_i$ is the label and $\mathbf{x}_i$ the input:
$$\min_{\mathbf{w},b}\ \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b)\ge 1,\quad i=1,\dots,m$$
This is a convex optimization problem (convex objective, affine constraints), with a unique solution when the data points are linearly separable.

Going from the Primal to the Dual
Lagrangian of the constrained optimization problem above:
$$L(\mathbf{w},b,\boldsymbol{\lambda}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^m \lambda_i\big[y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1\big]$$

Going from the Primal to the Dual
Setting the gradients of the Lagrangian to zero (KKT stationarity conditions):
$$\nabla_{\mathbf{w}} L(\mathbf{w},b,\boldsymbol{\lambda}) = \mathbf{w} - \sum_{i=1}^m \lambda_i y_i \mathbf{x}_i = 0, \qquad \nabla_b L(\mathbf{w},b,\boldsymbol{\lambda}) = -\sum_{i=1}^m \lambda_i y_i = 0$$

Going from the Primal to the Dual
From the stationarity conditions:
$$\mathbf{w} = \sum_{i=1}^m \lambda_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^m \lambda_i y_i = 0$$
plus the KKT complementarity condition:
$$\lambda_i\big[y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1\big] = 0, \quad i=1,\dots,m$$

Going from the Primal to the Dual
Substituting $\mathbf{w} = \sum_{i=1}^m \lambda_i y_i \mathbf{x}_i$ and $\sum_{i=1}^m \lambda_i y_i = 0$ back into the Lagrangian gives
$$L(\mathbf{w},b,\boldsymbol{\lambda}) = \frac{1}{2}\Big\|\sum_{i=1}^m \lambda_i y_i \mathbf{x}_i\Big\|^2 - \sum_{i,j=1}^m \lambda_i\lambda_j y_i y_j\,(\mathbf{x}_i\cdot\mathbf{x}_j) - \sum_{i=1}^m \lambda_i y_i b + \sum_{i=1}^m \lambda_i$$
$$\phantom{L(\mathbf{w},b,\boldsymbol{\lambda})} = -\frac{1}{2}\sum_{i,j=1}^m \lambda_i\lambda_j y_i y_j\,(\mathbf{x}_i\cdot\mathbf{x}_j) + \sum_{i=1}^m \lambda_i$$

Going from the Primal to the Dual
Equivalent dual problem:
$$\max_{\boldsymbol{\lambda}}\ -\frac{1}{2}\sum_{i,j=1}^m \lambda_i\lambda_j y_i y_j\,(\mathbf{x}_i\cdot\mathbf{x}_j) + \sum_{i=1}^m \lambda_i \quad \text{s.t.}\quad \lambda_i\ge 0,\ i=1,\dots,m, \qquad \sum_{i=1}^m \lambda_i y_i = 0$$
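A minimal sketch of solving this dual numerically (my own addition, not from the slides): it uses scipy's generic constrained minimizer on invented toy data; a dedicated QP or SMO solver would be used in practice.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data; labels y are +1 / -1.
X = np.array([[1.0, 2.0], [2.0, 3.0], [1.5, 1.0],
              [4.0, 0.5], [5.0, 1.5], [4.5, -0.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
m = len(y)

Q = (y[:, None] * X) @ (y[:, None] * X).T   # Q_ij = y_i y_j <x_i, x_j>

def neg_dual(lam):
    # Negative of the dual objective (minimize instead of maximize).
    return 0.5 * lam @ Q @ lam - lam.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(m),
    bounds=[(0.0, None)] * m,                                   # lambda_i >= 0
    constraints=[{"type": "eq", "fun": lambda lam: lam @ y}],   # sum_i lambda_i y_i = 0
)
lam = res.x
print("dual variables:", np.round(lam, 4))
```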

Solution to the Dual Problem
The dual problem
$$\max_{\boldsymbol{\lambda}}\ -\frac{1}{2}\sum_{i,j=1}^m \lambda_i\lambda_j y_i y_j\,(\mathbf{x}_i\cdot\mathbf{x}_j) + \sum_{i=1}^m \lambda_i \quad \text{s.t.}\quad \lambda_i\ge 0,\ i=1,\dots,m, \qquad \sum_{i=1}^m \lambda_i y_i = 0$$
admits the following solution for the classifier:
$$h(\mathbf{x}) = \mathrm{sign}\Big(\sum_{i=1}^m \lambda_i y_i\,(\mathbf{x}_i\cdot\mathbf{x}) + b\Big), \qquad b = y_i - \sum_{j=1}^m \lambda_j y_j\,(\mathbf{x}_j\cdot\mathbf{x}_i)\ \text{ for any support vector } \mathbf{x}_i \text{ (i.e. any } i \text{ with } \lambda_i > 0\text{)}$$
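Continuing the illustrative sketch above (again my own addition, reusing `lam`, `X`, `y` from the previous block): the weight vector, the offset, and the decision function can be recovered from the dual variables.

```python
# Continuation of the dual sketch above (needs lam, X, y and numpy as np).
support = lam > 1e-6                       # support vectors have lambda_i > 0
w = (lam * y) @ X                          # w = sum_i lambda_i y_i x_i
i = np.argmax(support)                     # index of any support vector
b = y[i] - np.sum(lam * y * (X @ X[i]))    # b = y_i - sum_j lambda_j y_j <x_j, x_i>

def predict(x_new):
    """Decision function h(x) = sign(w . x + b)."""
    return np.sign(w @ x_new + b)

print("w =", np.round(w, 3), " b =", round(b, 3))
print("prediction for [3, 1.5]:", predict(np.array([3.0, 1.5])))
```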

What are Support Vector Machines?
- Linear classifiers
- (Mostly) binary classifiers
- Supervised training
- Good generalization with explicit bounds

Main Ideas Behind Support Vector Machines
- Maximal margin
- Dual space
- Linear classifiers in a high-dimensional space using a non-linear mapping
- Kernel trick

Quadratic Programming

Using the Lagrangian

Dual Space

Dual Form

Non-linear separation of datasets
Linear separation is impossible in most problems.
Illustration from Prof. Mohri's lecture notes.

Non-separable datasets
Solutions:
1) Non-linear classifiers
2) Increase the dimensionality of the dataset by adding a non-linear mapping Ф (see the sketch below)
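As a hedged illustration of such a mapping (my own example, not from the slides): the degree-2 polynomial feature map for 2-D inputs, whose inner product in the mapped space equals the polynomial kernel $(\mathbf{x}\cdot\mathbf{z})^2$ in the original space.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for 2-D input:
    (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# Inner product in the mapped space equals the kernel (x . z)^2 in the original space.
print(phi(x) @ phi(z), (x @ z) ** 2)   # both print 1.0
```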

Kernel Trick
A kernel $K(\mathbf{x}_i, \mathbf{x}_j)$ acts as a "similarity measure" between two data samples.
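A minimal sketch (my own addition; these are the standard formulas for the kernels listed later in this presentation, with hypothetical parameter values):

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, degree=3, c=1.0):
    return (x @ z + c) ** degree

def gaussian_kernel(x, z, gamma=0.5):
    # Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.0])
print(linear_kernel(x, z), polynomial_kernel(x, z), gaussian_kernel(x, z))
```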

Kernel Trick Illustrated

Curse of Dimensionality Due to the Non-Linear Mapping

Positive Semi-Definite (P.S.D.) Kernels (Mercer Condition)
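The Mercer condition requires the Gram matrix built from a valid kernel on any finite set of points to be positive semi-definite. A quick numerical check of this property (my own sketch, assuming numpy; random data for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))

# Gram matrix of the Gaussian kernel; a Mercer kernel must make this PSD.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq_dists)

eigvals = np.linalg.eigvalsh(K)
print("smallest eigenvalue:", eigvals.min())   # >= 0 up to numerical error
```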

Advantages of SVM
- Work very well...
- Error bounds are easy to obtain: generalization error is small and predictable
- Fool-proof method: (mostly) three kernels to choose from:
  - Gaussian
  - Linear and polynomial
  - Sigmoid
- Very small number of parameters to optimize

Limitations of SVM
- Size limitation: the size of the kernel matrix grows quadratically with the number of training vectors (a rough memory estimate is sketched below)
- Speed limitations:
  1) During training: a very large quadratic programming problem must be solved numerically. Solutions: chunking; Sequential Minimal Optimization (SMO), which breaks the QP problem into many small QP problems solved analytically; hardware implementations
  2) During testing: cost scales with the number of support vectors. Solution: online SVM
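A rough back-of-the-envelope illustration of the size limitation (my own numbers, not from the slides), assuming a dense kernel matrix of 8-byte floats:

```python
# Memory needed to store a dense m x m kernel matrix of float64 entries.
for m in (1_000, 10_000, 100_000):
    gib = m * m * 8 / 2 ** 30
    print(f"m = {m:>7,} training vectors -> kernel matrix ~ {gib:,.2f} GiB")
```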