1 Support Vector Machines: Maximum Margin Classifiers
Machine Learning and Pattern Recognition, September 23, 2010
Piotr Mirowski
Based on slides by Sumit Chopra, Fu-Jie Huang and Mehryar Mohri
2 Outline
What is behind Support Vector Machines?
  Constrained optimization
  Lagrange constraints
  “Dual” solution
Support Vector Machines in detail
  Kernel trick
  LibSVM demo
3 Binary Classification Problem
Given: training data {(xᵢ, yᵢ)} generated according to a distribution D, with inputs xᵢ ∈ X (the input space) and labels yᵢ ∈ Y = {−1, +1} (the label space).
Problem: find a classifier (a function f: X → Y) that generalizes well on a test set obtained from the same distribution D.
Solution:
  Linear approach: linear classifiers (e.g. logistic regression, Perceptron)
  Non-linear approach: non-linear classifiers (e.g. neural networks, SVM)
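To make this setup concrete, here is a minimal NumPy sketch (the two Gaussian blobs, their centers and the random seed are made-up choices for illustration, not from the slides) that draws a small linearly separable training set; the sketches further below reuse this X and y:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20
    # Two made-up, well-separated Gaussian blobs with labels y in {-1, +1}
    X = np.vstack([rng.normal(loc=(-2.0, -2.0), scale=0.5, size=(n, 2)),
                   rng.normal(loc=(+2.0, +2.0), scale=0.5, size=(n, 2))])
    y = np.concatenate([-np.ones(n), np.ones(n)])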
4–6 Linearly Separable Data
Assume that the training data is linearly separable.
[Figure: two classes of points separated by the hyperplane w·x + b = 0, with normal vector w; on an axis through the origin 0 parallel to w, the abscissa of the hyperplane is determined by b.]
7 Linearly Separable Data
Assume that the training data is linearly separable.
Then the classifier is: f(x) = w·x + b
Inference: y = sign(f(x)) = sign(w·x + b)
8 Linearly Separable Data
Assume that the training data is linearly separable.
Margin ρ (in the input space): the distance from the separating hyperplane to the closest training points.
Maximize the margin ρ (or 2ρ, the full width between the two classes) so that:
  yᵢ(w·xᵢ + b) / ‖w‖ ≥ ρ for all points
For the closest points:
  yᵢ(w·xᵢ + b) / ‖w‖ = ρ
Fixing the scale of (w, b) so that yᵢ(w·xᵢ + b) = 1 for the closest points gives ρ = 1 / ‖w‖.
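As a quick check of these definitions, a sketch that measures the margin of a candidate hyperplane on the X, y from the sketch above (the candidate w0, b0 are made-up values):

    # Functional margins y_i (w0.x_i + b0) and geometric margin rho
    w0 = np.array([1.0, 1.0])   # made-up candidate hyperplane w0.x + b0 = 0
    b0 = 0.0
    functional = y * (X @ w0 + b0)
    print(np.all(functional > 0))                # True: (w0, b0) separates the data
    rho = functional.min() / np.linalg.norm(w0)  # distance from hyperplane to closest point
    print(rho)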
9 Optimization Problem
A constrained optimization problem:
  minimize over (w, b): ½‖w‖²
  subject to: yᵢ(w·xᵢ + b) ≥ 1 for all i (xᵢ: input, yᵢ: label)
Equivalent to maximizing the margin ρ = 1/‖w‖.
A convex optimization problem:
  the objective is convex
  the constraints are affine, hence convex
Therefore it admits a unique optimum w₀.
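This QP can be handed directly to a generic convex solver. A minimal sketch using the cvxpy library (an assumed dependency, not something the slides use; they demo LibSVM later), again on the X, y from above:

    import cvxpy as cp

    w = cp.Variable(2)
    b = cp.Variable()
    # Hard-margin SVM: minimize (1/2)||w||^2  s.t.  y_i (w.x_i + b) >= 1
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
    prob.solve()
    print(w.value, b.value)               # maximum-margin hyperplane
    print(1.0 / np.linalg.norm(w.value))  # the achieved margin rho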
10 Optimization Problem
Compare:
  minimize ½‖w‖² (objective)
  subject to yᵢ(w·xᵢ + b) ≥ 1 for all i (constraints)
With the unconstrained, regularized form of a learning problem:
  minimize R(w) + Σᵢ E(w, xᵢ, yᵢ) (regularization + energy/errors)
The objective plays the role of the regularization term, and the constraints play the role of the energy/error terms.
11 Optimization: Some Theory
The problem:
  minimize f₀(x) (objective function)
  subject to fᵢ(x) ≤ 0, i = 1, …, m (inequality constraints)
  hⱼ(x) = 0, j = 1, …, p (equality constraints)
(notation change: the parameters to optimize are now noted x)
Solution of the problem:
  global (unique) optimum, if the problem is convex
  local optimum, if the problem is not convex
12 Optimization: Some Theory
Example: standard linear program (LP):
  minimize cᵀx subject to Ax = b, x ⪰ 0
Example: least-squares solution of linear equations, with L2-norm regularization of the solution x (i.e. ridge regression):
  minimize ‖Ax − b‖² + λ‖x‖²
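As an illustration of the regularized least-squares form, a short NumPy sketch (A, b and the penalty lam are made-up values) solving the regularized normal equations (AᵀA + λI)x = Aᵀb:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 5))   # made-up over-determined system
    b = rng.normal(size=20)
    lam = 0.1                      # assumed L2 penalty strength

    # Ridge solution of  minimize ||Ax - b||^2 + lam * ||x||^2
    x = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)
    print(x)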
13 Big Picture
Constrained / unconstrained optimization.
Hierarchy of objective functions:
  smooth = infinitely differentiable
  convex = has a global optimum

              smooth    non-smooth
  convex                SVM
  non-convex  NN
14 Introducing the concept of the Lagrange function on a toy example
15 Toy Example: Equality Constraint
Example 1: minimize f(x) subject to the equality constraint h(x) = 0.
At the optimal solution x*: the gradients ∇f(x*) and ∇h(x*) are parallel.
16 Toy Example: Equality Constraint
x is not an optimal solution if there exists a step s such that:
  h(x + s) = 0 and f(x + s) < f(x)
Using a first-order Taylor expansion:
  h(x + s) ≈ h(x) + ∇h(x)ᵀs = 0
  f(x + s) ≈ f(x) + ∇f(x)ᵀs < f(x)
Such an s can exist only when ∇f(x) and ∇h(x) are not parallel.
17 Toy Example: Equality Constraint
Thus at the solution: ∇f(x*) = −λ ∇h(x*)
λ is the Lagrange multiplier, or dual variable, for the constraint h(x) = 0.
Thus we have the Lagrangian:
  L(x, λ) = f(x) + λ h(x)
x* solution implies ∇ₓL(x*, λ) = 0.
This is just a necessary (not a sufficient) condition.
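A small check of these conditions on a made-up instance (the objective f, constraint h and use of SymPy are illustrative choices, not from the slides): minimize f(x) = x₁ + x₂ subject to h(x) = x₁² + x₂² − 2 = 0, by finding stationary points of the Lagrangian:

    import sympy as sp

    x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
    f = x1 + x2                  # made-up objective
    h = x1**2 + x2**2 - 2        # made-up equality constraint
    L = f + lam * h              # Lagrangian L(x, lam) = f(x) + lam*h(x)

    # Stationarity in x recovers grad f = -lam * grad h;
    # stationarity in lam recovers the constraint h(x) = 0.
    stationarity = [sp.diff(L, v) for v in (x1, x2, lam)]
    print(sp.solve(stationarity, [x1, x2, lam], dict=True))
    # Two stationary points: (x1, x2, lam) = (-1, -1, 1/2) and (1, 1, -1/2);
    # the first, (-1, -1), is the constrained minimum (f = -2).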
18 Toy Example: Inequality Constraint
Example 2: minimize f(x) subject to the inequality constraint g(x) ≤ 0.
19 Toy Example: Inequality Constraint
x is not an optimal solution if there exists a step s such that:
  g(x + s) ≤ 0 and f(x + s) < f(x)
Using a first-order Taylor expansion:
  g(x + s) ≈ g(x) + ∇g(x)ᵀs ≤ 0
  f(x + s) ≈ f(x) + ∇f(x)ᵀs < f(x)
20 Toy Example: Inequality Constraint
Case 1: inactive constraint, g(x) < 0.
Any sufficiently small s is feasible, and improves f as long as ∇f(x)ᵀs < 0.
Thus at the optimum: ∇f(x*) = 0 (the constraint plays no role).
Case 2: active constraint, g(x) = 0.
In that case no improving step s exists when:
  ∇f(x*) = −λ ∇g(x*) with λ ≥ 0
21 Toy Example: Inequality Constraint
Thus we have the Lagrange function (as before):
  L(x, λ) = f(x) + λ g(x)
λ is the Lagrange multiplier, or dual variable, for the constraint g(x) ≤ 0.
The optimality conditions:
  ∇ₓL(x*, λ) = 0 and λ ≥ 0
Complementarity condition: λ g(x*) = 0
  either g(x*) = 0 (active) or λ = 0 (inactive)
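A numeric illustration with SciPy, on a made-up problem (objective, constraint and starting point are all illustrative): minimize f(x) = (x₁ − 2)² + (x₂ − 1)² subject to g(x) = x₁² + x₂² − 1 ≤ 0. The unconstrained minimum (2, 1) is infeasible, so the constraint is active at the solution and the multiplier comes out positive:

    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: (x[0] - 2.0)**2 + (x[1] - 1.0)**2   # made-up objective
    g = lambda x: x[0]**2 + x[1]**2 - 1.0             # made-up constraint g(x) <= 0

    # SciPy expects inequality constraints written as c(x) >= 0, so pass -g.
    res = minimize(f, x0=np.zeros(2),
                   constraints=[{'type': 'ineq', 'fun': lambda x: -g(x)}])

    x = res.x
    print(x, g(x))                     # solution on the unit circle: g(x*) ~ 0 (active)
    grad_f = 2.0 * (x - np.array([2.0, 1.0]))
    grad_g = 2.0 * x
    lam = -grad_f[0] / grad_g[0]       # from grad f = -lam * grad g
    print(lam, grad_f + lam * grad_g)  # lam > 0, residual ~ 0: optimality conditions hold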
22 Same Concepts in a More General Setting
23 Lagrange Function
The problem:
  minimize f₀(x) (objective function)
  subject to fᵢ(x) ≤ 0, i = 1, …, m (m inequality constraints)
  hⱼ(x) = 0, j = 1, …, p (p equality constraints)
Standard tool for constrained optimization: the Lagrange function
  L(x, λ, ν) = f₀(x) + Σᵢ λᵢ fᵢ(x) + Σⱼ νⱼ hⱼ(x)
where the λᵢ and νⱼ are the dual variables or Lagrange multipliers.
24 Lagrange Dual Function
Defined, for λ ∈ ℝᵐ and ν ∈ ℝᵖ, as the minimum value of the Lagrange function over x:
  g(λ, ν) = inf over x of L(x, λ, ν) = inf over x of ( f₀(x) + Σᵢ λᵢ fᵢ(x) + Σⱼ νⱼ hⱼ(x) )
(m inequality constraints, p equality constraints)
25 Lagrange Dual Function
Interpretation of the Lagrange dual function: write the original problem as an unconstrained problem with hard indicator (penalty) functions:
  minimize f₀(x) + Σᵢ I₋(fᵢ(x)) + Σⱼ I₀(hⱼ(x))
where the indicator functions are:
  I₋(u) = 0 if u ≤ 0 (satisfied), ∞ if u > 0 (unsatisfied)
  I₀(u) = 0 if u = 0 (satisfied), ∞ otherwise (unsatisfied)
26 Lagrange Dual Function
Interpretation of the Lagrange dual function: the Lagrange multipliers in the Lagrange dual function can be seen as a “softer” version of the indicator (penalty) functions, replacing the hard penalty I₋(u) with the linear function λᵢu (with λᵢ ≥ 0).
27 Sufficient Condition
If (x*, λ*) is a saddle point of the Lagrangian, i.e. if
  L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*) for all x and all λ ⪰ 0
then x* is a solution of the primal problem.
28 Lagrange Dual Problem
The Lagrange dual function gives a lower bound on the optimal value p* of the problem:
  g(λ, ν) ≤ p* for any λ ⪰ 0 and any ν
We seek the “best” lower bound on the minimum of the objective:
  maximize g(λ, ν) subject to λ ⪰ 0
The dual optimal value and solution: d* = g(λ*, ν*)
The Lagrange dual problem is convex even if the original problem is not.
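Weak and strong duality can be seen on a one-dimensional made-up example (everything below is illustrative): minimize x² subject to x ≥ 1, so p* = 1 at x* = 1. The Lagrangian is L(x, λ) = x² + λ(1 − x); minimizing over x (at x = λ/2) gives the dual function g(λ) = λ − λ²/4:

    import numpy as np

    # Toy problem: minimize x^2 subject to 1 - x <= 0  ->  p* = 1 at x* = 1
    # Dual function: g(lam) = min over x of [x^2 + lam*(1 - x)] = lam - lam**2 / 4
    lam = np.linspace(0.0, 4.0, 401)
    g = lam - lam**2 / 4.0
    print(np.all(g <= 1.0))          # True: every g(lam) is a lower bound on p* (weak duality)
    print(g.max(), lam[g.argmax()])  # 1.0 at lam* = 2: d* = p* (strong duality)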
29 Primal / Dual Problems
Primal problem:
  minimize f₀(x) subject to fᵢ(x) ≤ 0, i = 1, …, m and hⱼ(x) = 0, j = 1, …, p (optimal value p*)
Dual problem:
  maximize g(λ, ν) subject to λ ⪰ 0 (optimal value d*)
30 Optimality Conditions: First Order
Karush-Kuhn-Tucker (KKT) conditions. If strong duality holds (p* = d*), then at optimality:
  primal feasibility: fᵢ(x*) ≤ 0 and hⱼ(x*) = 0
  dual feasibility: λ* ⪰ 0
  complementary slackness: λᵢ* fᵢ(x*) = 0
  stationarity: ∇f₀(x*) + Σᵢ λᵢ* ∇fᵢ(x*) + Σⱼ νⱼ* ∇hⱼ(x*) = 0
KKT conditions are
➔ necessary in general (local optimum)
➔ necessary and sufficient in the case of convex problems (global optimum)
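Complementary slackness can be checked on the hard-margin SVM sketch from earlier (same hypothetical cvxpy objects X, y, w, b, constraints): the multiplier αᵢ of each margin constraint should be nonzero only for the support vectors, the points with yᵢ(w·xᵢ + b) = 1:

    # Continuing the cvxpy sketch above: after prob.solve(), the optimal
    # multipliers of the margin constraints are exposed as dual_value.
    alpha = constraints[0].dual_value
    margins = y * (X @ w.value + b.value)
    for a, m in zip(alpha, margins):
        # alpha_i > 0 only where the constraint is active (margin = 1)
        print(f"alpha = {a:8.4f}   margin = {m:8.4f}")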