1 Support Vector Machines: Maximum Margin Classifiers
Machine Learning and Pattern Recognition, September 23, 2010
Piotr Mirowski
Based on slides by Sumit Chopra, Fu-Jie Huang and Mehryar Mohri
2 Outline
What is behind Support Vector Machines?
  Constrained optimization
  Lagrange constraints
  “Dual” solution
Support Vector Machines in detail
  Kernel trick
  LibSVM demo
3 Binary Classification Problem
Given: training data {(xᵢ, yᵢ)} generated according to a distribution D, with inputs xᵢ ∈ X (the input space) and labels yᵢ ∈ Y = {−1, +1} (the label space).
Problem: find a classifier (a function f: X → Y) that generalizes well on a test set obtained from the same distribution D.
Solution:
  Linear approach: linear classifiers (e.g. logistic regression, Perceptron)
  Non-linear approach: non-linear classifiers (e.g. neural networks, SVM)
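To make this setup concrete, here is a minimal NumPy sketch (the two Gaussian blobs, their centers and the random seed are made-up choices for illustration, not from the slides) that draws a small linearly separable training set; the sketches further below reuse this X and y:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20
    # Two made-up, well-separated Gaussian blobs with labels y in {-1, +1}
    X = np.vstack([rng.normal(loc=(-2.0, -2.0), scale=0.5, size=(n, 2)),
                   rng.normal(loc=(+2.0, +2.0), scale=0.5, size=(n, 2))])
    y = np.concatenate([-np.ones(n), np.ones(n)])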
4–6 Linearly Separable Data
Assume that the training data is linearly separable.
[Figure: two classes of points separated by the hyperplane w·x + b = 0, with normal vector w; on an axis through the origin 0 parallel to w, the abscissa of the hyperplane is determined by b.]
7 Linearly Separable Data
Assume that the training data is linearly separable.
Then the classifier is: f(x) = w·x + b
Inference: y = sign(f(x)) = sign(w·x + b)
8 Linearly Separable Data
Assume that the training data is linearly separable.
Margin ρ (in the input space): the distance from the separating hyperplane to the closest training points.
Maximize the margin ρ (or 2ρ, the full width between the two classes) so that:
  yᵢ(w·xᵢ + b) / ‖w‖ ≥ ρ for all points
For the closest points:
  yᵢ(w·xᵢ + b) / ‖w‖ = ρ
Fixing the scale of (w, b) so that yᵢ(w·xᵢ + b) = 1 for the closest points gives ρ = 1 / ‖w‖.
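As a quick check of these definitions, a sketch that measures the margin of a candidate hyperplane on the X, y from the sketch above (the candidate w0, b0 are made-up values):

    # Functional margins y_i (w0.x_i + b0) and geometric margin rho
    w0 = np.array([1.0, 1.0])   # made-up candidate hyperplane w0.x + b0 = 0
    b0 = 0.0
    functional = y * (X @ w0 + b0)
    print(np.all(functional > 0))                # True: (w0, b0) separates the data
    rho = functional.min() / np.linalg.norm(w0)  # distance from hyperplane to closest point
    print(rho)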
9 Optimization Problem
A constrained optimization problem:
  minimize over (w, b): ½‖w‖²
  subject to: yᵢ(w·xᵢ + b) ≥ 1 for all i (xᵢ: input, yᵢ: label)
Equivalent to maximizing the margin ρ = 1/‖w‖.
A convex optimization problem:
  the objective is convex
  the constraints are affine, hence convex
Therefore it admits a unique optimum w₀.
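This QP can be handed directly to a generic convex solver. A minimal sketch using the cvxpy library (an assumed dependency, not something the slides use; they demo LibSVM later), again on the X, y from above:

    import cvxpy as cp

    w = cp.Variable(2)
    b = cp.Variable()
    # Hard-margin SVM: minimize (1/2)||w||^2  s.t.  y_i (w.x_i + b) >= 1
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
    prob.solve()
    print(w.value, b.value)               # maximum-margin hyperplane
    print(1.0 / np.linalg.norm(w.value))  # the achieved margin rho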
10 Optimization Problem
Compare:
  minimize ½‖w‖² (objective)
  subject to yᵢ(w·xᵢ + b) ≥ 1 for all i (constraints)
With the unconstrained, regularized form of a learning problem:
  minimize R(w) + Σᵢ E(w, xᵢ, yᵢ) (regularization + energy/errors)
The objective plays the role of the regularization term, and the constraints play the role of the energy/error terms.
11 Optimization: Some Theory
The problem:
  minimize f₀(x) (objective function)
  subject to fᵢ(x) ≤ 0, i = 1, …, m (inequality constraints)
  hⱼ(x) = 0, j = 1, …, p (equality constraints)
(notation change: the parameters to optimize are now noted x)
Solution of the problem:
  global (unique) optimum, if the problem is convex
  local optimum, if the problem is not convex
12 Optimization: Some Theory
Example: standard linear program (LP):
  minimize cᵀx subject to Ax = b, x ⪰ 0
Example: least-squares solution of linear equations, with L2-norm regularization of the solution x (i.e. ridge regression):
  minimize ‖Ax − b‖² + λ‖x‖²
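As an illustration of the regularized least-squares form, a short NumPy sketch (A, b and the penalty lam are made-up values) solving the regularized normal equations (AᵀA + λI)x = Aᵀb:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 5))   # made-up over-determined system
    b = rng.normal(size=20)
    lam = 0.1                      # assumed L2 penalty strength

    # Ridge solution of  minimize ||Ax - b||^2 + lam * ||x||^2
    x = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)
    print(x)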
13 Big Picture
Constrained / unconstrained optimization.
Hierarchy of objective functions:
  smooth = infinitely differentiable
  convex = has a global optimum

              smooth    non-smooth
  convex                SVM
  non-convex  NN
14 Introducing the concept of the Lagrange function on a toy example
15 Toy Example: Equality Constraint
Example 1: minimize f(x) subject to the equality constraint h(x) = 0.
At the optimal solution x*: the gradients ∇f(x*) and ∇h(x*) are parallel.
16 Toy Example: Equality Constraint
x is not an optimal solution if there exists a step s such that:
  h(x + s) = 0 and f(x + s) < f(x)
Using a first-order Taylor expansion:
  h(x + s) ≈ h(x) + ∇h(x)ᵀs = 0
  f(x + s) ≈ f(x) + ∇f(x)ᵀs < f(x)
Such an s can exist only when ∇f(x) and ∇h(x) are not parallel.
17 Toy Example: Equality Constraint
Thus at the solution: ∇f(x*) = −λ ∇h(x*)
λ is the Lagrange multiplier, or dual variable, for the constraint h(x) = 0.
Thus we have the Lagrangian:
  L(x, λ) = f(x) + λ h(x)
x* solution implies ∇ₓL(x*, λ) = 0.
This is just a necessary (not a sufficient) condition.
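A small check of these conditions on a made-up instance (the objective f, constraint h and use of SymPy are illustrative choices, not from the slides): minimize f(x) = x₁ + x₂ subject to h(x) = x₁² + x₂² − 2 = 0, by finding stationary points of the Lagrangian:

    import sympy as sp

    x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
    f = x1 + x2                  # made-up objective
    h = x1**2 + x2**2 - 2        # made-up equality constraint
    L = f + lam * h              # Lagrangian L(x, lam) = f(x) + lam*h(x)

    # Stationarity in x recovers grad f = -lam * grad h;
    # stationarity in lam recovers the constraint h(x) = 0.
    stationarity = [sp.diff(L, v) for v in (x1, x2, lam)]
    print(sp.solve(stationarity, [x1, x2, lam], dict=True))
    # Two stationary points: (x1, x2, lam) = (-1, -1, 1/2) and (1, 1, -1/2);
    # the first, (-1, -1), is the constrained minimum (f = -2).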
18 Toy Example: Inequality Constraint
Example 2: minimize f(x) subject to the inequality constraint g(x) ≤ 0.
19 Toy Example: Inequality Constraint
x is not an optimal solution if there exists a step s such that:
  g(x + s) ≤ 0 and f(x + s) < f(x)
Using a first-order Taylor expansion:
  g(x + s) ≈ g(x) + ∇g(x)ᵀs ≤ 0
  f(x + s) ≈ f(x) + ∇f(x)ᵀs < f(x)
20 Toy Example: Inequality Constraint
Case 1: inactive constraint, g(x) < 0.
Any sufficiently small s is feasible, and improves f as long as ∇f(x)ᵀs < 0.
Thus at the optimum: ∇f(x*) = 0 (the constraint plays no role).
Case 2: active constraint, g(x) = 0.
In that case no improving step s exists when:
  ∇f(x*) = −λ ∇g(x*) with λ ≥ 0
21 Toy Example: Inequality Constraint
Thus we have the Lagrange function (as before):
  L(x, λ) = f(x) + λ g(x)
λ is the Lagrange multiplier, or dual variable, for the constraint g(x) ≤ 0.
The optimality conditions:
  ∇ₓL(x*, λ) = 0 and λ ≥ 0
Complementarity condition: λ g(x*) = 0
  either g(x*) = 0 (active) or λ = 0 (inactive)
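A numeric illustration with SciPy, on a made-up problem (objective, constraint and starting point are all illustrative): minimize f(x) = (x₁ − 2)² + (x₂ − 1)² subject to g(x) = x₁² + x₂² − 1 ≤ 0. The unconstrained minimum (2, 1) is infeasible, so the constraint is active at the solution and the multiplier comes out positive:

    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: (x[0] - 2.0)**2 + (x[1] - 1.0)**2   # made-up objective
    g = lambda x: x[0]**2 + x[1]**2 - 1.0             # made-up constraint g(x) <= 0

    # SciPy expects inequality constraints written as c(x) >= 0, so pass -g.
    res = minimize(f, x0=np.zeros(2),
                   constraints=[{'type': 'ineq', 'fun': lambda x: -g(x)}])

    x = res.x
    print(x, g(x))                     # solution on the unit circle: g(x*) ~ 0 (active)
    grad_f = 2.0 * (x - np.array([2.0, 1.0]))
    grad_g = 2.0 * x
    lam = -grad_f[0] / grad_g[0]       # from grad f = -lam * grad g
    print(lam, grad_f + lam * grad_g)  # lam > 0, residual ~ 0: optimality conditions hold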
22 Same Concepts in a More General Setting
23 Lagrange Function
The problem:
  minimize f₀(x) (objective function)
  subject to fᵢ(x) ≤ 0, i = 1, …, m (m inequality constraints)
  hⱼ(x) = 0, j = 1, …, p (p equality constraints)
Standard tool for constrained optimization: the Lagrange function
  L(x, λ, ν) = f₀(x) + Σᵢ λᵢ fᵢ(x) + Σⱼ νⱼ hⱼ(x)
where the λᵢ and νⱼ are the dual variables or Lagrange multipliers.
24 Lagrange Dual Function
Defined, for λ ∈ ℝᵐ and ν ∈ ℝᵖ, as the minimum value of the Lagrange function over x:
  g(λ, ν) = inf over x of L(x, λ, ν) = inf over x of ( f₀(x) + Σᵢ λᵢ fᵢ(x) + Σⱼ νⱼ hⱼ(x) )
(m inequality constraints, p equality constraints)
25 Lagrange Dual Function
Interpretation of the Lagrange dual function: write the original problem as an unconstrained problem with hard indicator (penalty) functions:
  minimize f₀(x) + Σᵢ I₋(fᵢ(x)) + Σⱼ I₀(hⱼ(x))
where the indicator functions are:
  I₋(u) = 0 if u ≤ 0 (satisfied), ∞ if u > 0 (unsatisfied)
  I₀(u) = 0 if u = 0 (satisfied), ∞ otherwise (unsatisfied)
26 Lagrange Dual Function
Interpretation of the Lagrange dual function: the Lagrange multipliers in the Lagrange dual function can be seen as a “softer” version of the indicator (penalty) functions, replacing the hard penalty I₋(u) with the linear function λᵢu (with λᵢ ≥ 0).
27 Sufficient Condition
If (x*, λ*) is a saddle point of the Lagrangian, i.e. if
  L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*) for all x and all λ ⪰ 0
then x* is a solution of the primal problem.
28 Lagrange Dual Problem
The Lagrange dual function gives a lower bound on the optimal value p* of the problem:
  g(λ, ν) ≤ p* for any λ ⪰ 0 and any ν
We seek the “best” lower bound on the minimum of the objective:
  maximize g(λ, ν) subject to λ ⪰ 0
The dual optimal value and solution: d* = g(λ*, ν*)
The Lagrange dual problem is convex even if the original problem is not.
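Weak and strong duality can be seen on a one-dimensional made-up example (everything below is illustrative): minimize x² subject to x ≥ 1, so p* = 1 at x* = 1. The Lagrangian is L(x, λ) = x² + λ(1 − x); minimizing over x (at x = λ/2) gives the dual function g(λ) = λ − λ²/4:

    import numpy as np

    # Toy problem: minimize x^2 subject to 1 - x <= 0  ->  p* = 1 at x* = 1
    # Dual function: g(lam) = min over x of [x^2 + lam*(1 - x)] = lam - lam**2 / 4
    lam = np.linspace(0.0, 4.0, 401)
    g = lam - lam**2 / 4.0
    print(np.all(g <= 1.0))          # True: every g(lam) is a lower bound on p* (weak duality)
    print(g.max(), lam[g.argmax()])  # 1.0 at lam* = 2: d* = p* (strong duality)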
29 Primal / Dual Problems
Primal problem:
  minimize f₀(x) subject to fᵢ(x) ≤ 0, i = 1, …, m and hⱼ(x) = 0, j = 1, …, p (optimal value p*)
Dual problem:
  maximize g(λ, ν) subject to λ ⪰ 0 (optimal value d*)
30 Optimality Conditions: First Order
Karush-Kuhn-Tucker (KKT) conditions. If strong duality holds (p* = d*), then at optimality:
  primal feasibility: fᵢ(x*) ≤ 0 and hⱼ(x*) = 0
  dual feasibility: λ* ⪰ 0
  complementary slackness: λᵢ* fᵢ(x*) = 0
  stationarity: ∇f₀(x*) + Σᵢ λᵢ* ∇fᵢ(x*) + Σⱼ νⱼ* ∇hⱼ(x*) = 0
KKT conditions are
➔ necessary in general (local optimum)
➔ necessary and sufficient in the case of convex problems (global optimum)
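Complementary slackness can be checked on the hard-margin SVM sketch from earlier (same hypothetical cvxpy objects X, y, w, b, constraints): the multiplier αᵢ of each margin constraint should be nonzero only for the support vectors, the points with yᵢ(w·xᵢ + b) = 1:

    # Continuing the cvxpy sketch above: after prob.solve(), the optimal
    # multipliers of the margin constraints are exposed as dual_value.
    alpha = constraints[0].dual_value
    margins = y * (X @ w.value + b.value)
    for a, m in zip(alpha, margins):
        # alpha_i > 0 only where the constraint is active (margin = 1)
        print(f"alpha = {a:8.4f}   margin = {m:8.4f}")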