1 Support Vector Machines: Maximum Margin Classifiers. Machine Learning and Pattern Recognition, September 23, 2010. Piotr Mirowski, based on slides by Sumit Chopra, Fu-Jie Huang and Mehryar Mohri.

2 Outline
What is behind Support Vector Machines?
Constrained optimization
Lagrange constraints
“Dual” solution
Support Vector Machines in detail
Kernel trick
LibSVM demo

3 Binary Classification Problem
Given: training data (input, label) pairs generated according to a distribution D, with inputs drawn from the input space and labels from the label space.
Problem: find a classifier (a function from inputs to labels) that generalizes well on a test set obtained from the same distribution D.
Solution:
Linear approach: linear classifiers (e.g. logistic regression, the Perceptron)
Non-linear approach: non-linear classifiers (e.g. neural networks, SVMs)

4 Linearly Separable Data Assume that the training data is linearly separable

5 Linearly Separable Data Assume that the training data is linearly separable

6 Linearly Separable Data
Assume that the training data is linearly separable.
(Figure annotation: on the axis through the origin 0 and parallel to the normal vector of the hyperplane, the hyperplane's abscissa is determined by the offset b.)

7 Linearly Separable Data
Assume that the training data is linearly separable.
Then the classifier is a separating hyperplane with normal vector w and offset b; inference amounts to checking on which side of the hyperplane a point lies (see the sketch below).
(Same figure annotation as on the previous slide: the offset b fixes the hyperplane's abscissa on the axis parallel to w through the origin 0.)
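A minimal reconstruction of the inference rule that the original slide leaves to a figure, in standard notation with weight vector w and offset b (the exact symbols on the slide are not visible in this transcript):

```latex
% Linear classifier for linearly separable data (standard notation)
f(\mathbf{x}) = \operatorname{sign}\!\big(\mathbf{w}^\top \mathbf{x} + b\big),
\qquad
\hat{y} =
\begin{cases}
+1 & \text{if } \mathbf{w}^\top \mathbf{x} + b \ge 0,\\
-1 & \text{otherwise.}
\end{cases}
```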

8 Linearly Separable Data
Assume that the training data is linearly separable.
Margin ρ (in the input space): the distance from the separating hyperplane to the closest training points.
Maximize the margin ρ (or, equivalently, the width 2ρ of the band separating the two classes) so that every point lies at distance at least ρ from the hyperplane; the closest points lie exactly on the margin (sketched below).
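A hedged sketch of the margin-maximization conditions the slide refers to, assuming the standard setup with labels y_i in {−1, +1} and a unit-norm normalization of w:

```latex
% Maximize the margin rho: every point at signed distance >= rho from the
% hyperplane, and the closest points lie exactly on the margin.
\max_{\mathbf{w},\,b,\,\rho}\; \rho
\quad \text{s.t.} \quad
y_i\big(\mathbf{w}^\top \mathbf{x}_i + b\big) \ge \rho,\;\; \|\mathbf{w}\| = 1,\;\; i = 1,\dots,n;
\qquad
y_i\big(\mathbf{w}^\top \mathbf{x}_i + b\big) = \rho \;\;\text{for the closest points.}
```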

9 Optimization Problem
A constrained optimization problem, equivalent to maximizing the margin.
It is a convex optimization problem:
the objective is convex;
the constraints are affine, hence convex;
therefore it admits a unique optimum w_0.
(A sketch of the resulting quadratic program follows below.)
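Rescaling w and b so that the closest points satisfy y_i(wᵀx_i + b) = 1 turns margin maximization into the usual convex quadratic program: minimize ½‖w‖² subject to y_i(wᵀx_i + b) ≥ 1. A small sketch of how one might solve it numerically, assuming the cvxpy package and synthetic data (neither is part of the original lecture, which demos LibSVM instead):

```python
import numpy as np
import cvxpy as cp

# Synthetic, linearly separable 2-D data (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 1.0, (20, 2)),   # class +1
               rng.normal(-2.0, 1.0, (20, 2))])  # class -1
y = np.hstack([np.ones(20), -np.ones(20)])

# Hard-margin primal SVM: minimize 0.5 * ||w||^2
# subject to y_i * (w^T x_i + b) >= 1 for all i.
w = cp.Variable(2)
b = cp.Variable()
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                     [cp.multiply(y, X @ w + b) >= 1])
problem.solve()

print("w =", w.value, " b =", b.value)
print("margin rho =", 1.0 / np.linalg.norm(w.value))  # rho = 1 / ||w||
```

With this rescaling the geometric margin is ρ = 1/‖w‖, which the last line reports.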

10 Optimization Problem
Compare the SVM problem with a standard regularized learning objective: in both, the objective contains a regularization term, but in the SVM the data-fit (energy/errors) term appears through the constraints rather than in the objective (see the comparison below).
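A hedged reconstruction of the comparison the slide gestures at, placing the hard-margin SVM next to a generic regularized energy/error minimization (the exact loss term shown on the original slide is not visible here):

```latex
\underbrace{\min_{\mathbf{w},\,b}\; \tfrac{1}{2}\|\mathbf{w}\|^2}_{\text{objective = regularization}}
\;\;\text{s.t.}\;\;
\underbrace{y_i\big(\mathbf{w}^\top \mathbf{x}_i + b\big) \ge 1}_{\text{constraints}}
\qquad \text{vs.} \qquad
\min_{\mathbf{w},\,b}\;
\underbrace{\lambda\,\|\mathbf{w}\|^2}_{\text{regularization}}
+ \underbrace{\textstyle\sum_{i=1}^{n} E\big(\mathbf{w}, b;\, \mathbf{x}_i, y_i\big)}_{\text{energy / errors}}
```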

11 Optimization: Some Theory
The problem: minimize an objective function subject to inequality constraints and equality constraints (standard form below).
Solution of the problem:
a global (unique) optimum if the problem is convex;
a local optimum if the problem is not convex.
(Notation change: from here on, the parameters to optimize are denoted x.)
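The general constrained problem in the standard form used in the remainder of the lecture (parameters to optimize denoted x):

```latex
\min_{x \in \mathbb{R}^n} \; f(x)
\quad \text{s.t.} \quad
g_i(x) \le 0, \;\; i = 1,\dots,m \;\; \text{(inequality constraints)},
\qquad
h_j(x) = 0, \;\; j = 1,\dots,p \;\; \text{(equality constraints)}.
```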

12 Optimization: Some Theory
Example: the standard linear program (LP).
Example: the least-squares solution of linear equations with L2-norm regularization of the solution x, i.e. ridge regression (both written out below).
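Standard forms for both examples, written in common notation (the exact symbols on the original slide are not visible):

```latex
% Standard linear program
\min_{x} \; c^\top x
\quad \text{s.t.} \quad A x = b, \;\; x \succeq 0.

% Ridge regression: L2-regularized least squares, with a closed-form solution
\min_{x} \; \|A x - b\|_2^2 + \lambda \|x\|_2^2
\qquad \Longrightarrow \qquad
x^\star = \big(A^\top A + \lambda I\big)^{-1} A^\top b.
```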

13 Big Picture
Constrained / unconstrained optimization.
Hierarchy of objective functions: smooth = infinitely differentiable; convex = has a global optimum.
Objectives can be placed on a grid of convex vs. non-convex and smooth vs. non-smooth; the SVM objective is convex, whereas neural network (NN) objectives are in general non-convex.

14 Introducing the concept of the Lagrange function on a toy example

15 Toy Example: Equality Constraint
Example 1: minimize an objective f(x) subject to a single equality constraint h(x) = 0 (the specific functions appear only in the original figure).
At the optimal solution, the gradients of the objective and of the constraint are parallel (shown on the next slides).

16 Toy Example: Equality Constraint
A feasible point x is not an optimal solution if there exists a small step s such that h(x + s) = 0 and f(x + s) < f(x).
Using a first-order Taylor expansion, this requires ∇h(x)ᵀs = 0 and ∇f(x)ᵀs < 0.
Such an s can exist only when ∇f(x) and ∇h(x) are not parallel.

17 Toy Example: Equality Constraint
Thus, at the solution, ∇f(x) and ∇h(x) are parallel: ∇f(x) + β∇h(x) = 0 for some scalar β, the Lagrange multiplier or dual variable for the constraint h.
This gives the Lagrangian (written out below); x being a solution implies that the Lagrangian is stationary with respect to x.
This is just a necessary (not a sufficient) condition.
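The Lagrangian and the resulting necessary condition for the equality-constrained toy problem, in standard notation:

```latex
L(x, \beta) = f(x) + \beta\, h(x),
\qquad
x^\star \text{ optimal} \;\Longrightarrow\;
\nabla_x L(x^\star, \beta)
= \nabla f(x^\star) + \beta\, \nabla h(x^\star) = 0
\;\text{ for some } \beta \in \mathbb{R}.
```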

18 Toy Example: Inequality Constraint
Example 2: minimize an objective f(x) subject to a single inequality constraint g(x) ≤ 0 (the specific functions appear only in the original figure).

19 Toy Example: Inequality Constraint
A feasible point x is not an optimal solution if there exists a small step s such that g(x + s) ≤ 0 and f(x + s) < f(x).
Using a first-order Taylor expansion: g(x) + ∇g(x)ᵀs ≤ 0 and ∇f(x)ᵀs < 0.

20 Toy Example: Inequality Constraint
Case 1: inactive constraint (g(x) < 0). Any sufficiently small step s remains feasible, so no improving step exists only if ∇f(x) = 0.
Case 2: active constraint (g(x) = 0). An improving feasible step must satisfy ∇g(x)ᵀs ≤ 0 and ∇f(x)ᵀs < 0; no such s exists when ∇f(x) = −α∇g(x) for some α ≥ 0.

21 Toy Example: Inequality Constraint
Thus we have the Lagrange function (as before), with a Lagrange multiplier or dual variable α ≥ 0 for the constraint g.
The optimality conditions include a complementarity condition: either g(x) = 0 (active constraint) or α = 0 (inactive constraint). Both are written out below.
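The corresponding Lagrange function and optimality conditions for the inequality-constrained toy problem, with α the Lagrange multiplier (dual variable) for g:

```latex
L(x, \alpha) = f(x) + \alpha\, g(x), \qquad \alpha \ge 0,
\qquad
\nabla f(x^\star) + \alpha\, \nabla g(x^\star) = 0,
\qquad
\alpha\, g(x^\star) = 0
\;\;\text{(complementarity: either } g(x^\star) = 0 \text{, active, or } \alpha = 0 \text{, inactive).}
```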

22 Same Concepts in a More General Setting

23 Lagrange Function
The problem: minimize an objective function subject to m inequality constraints and p equality constraints.
The standard tool for constrained optimization is the Lagrange function, which adds to the objective one term per constraint, each weighted by a dual variable or Lagrange multiplier (see below).
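The Lagrange function in the general setting, with one multiplier per constraint (λ for the inequalities, ν for the equalities):

```latex
L(x, \lambda, \nu)
= \underbrace{f(x)}_{\text{objective}}
+ \sum_{i=1}^{m} \lambda_i\, g_i(x)
+ \sum_{j=1}^{p} \nu_j\, h_j(x),
\qquad \lambda_i \ge 0 .
```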

24 Lagrange Dual Function
Defined, for a given choice of the multipliers of the m inequality constraints and the p equality constraints, as the minimum value of the Lagrange function over x (see below).
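In symbols (written here as q to avoid a clash with the constraint functions g_i; many texts call it g):

```latex
q(\lambda, \nu)
= \min_{x} \; L(x, \lambda, \nu)
= \min_{x} \Big( f(x) + \sum_{i=1}^{m} \lambda_i\, g_i(x) + \sum_{j=1}^{p} \nu_j\, h_j(x) \Big),
\qquad \lambda \succeq 0 .
```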

25 Lagrange Dual Function
Interpretation of the Lagrange dual function: write the original problem as an unconstrained problem with hard indicator (penalty) functions, which are zero when a constraint is satisfied and infinite when it is unsatisfied (see below).
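One standard way to write this interpretation, using hard indicator functions:

```latex
\min_{x} \; f(x)
+ \sum_{i=1}^{m} I_{-}\big(g_i(x)\big)
+ \sum_{j=1}^{p} I_{0}\big(h_j(x)\big),
\qquad
I_{-}(u) = \begin{cases} 0 & u \le 0 \;\text{(satisfied)} \\ \infty & u > 0 \;\text{(unsatisfied)} \end{cases}
\quad
I_{0}(u) = \begin{cases} 0 & u = 0 \;\text{(satisfied)} \\ \infty & u \ne 0 \;\text{(unsatisfied).} \end{cases}
```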

26 Lagrange Dual Function
Interpretation of the Lagrange dual function: the Lagrange multipliers in the Lagrange dual function can be seen as a “softer” version of the indicator (penalty) functions.

27 Sufficient Condition
If (x*, λ*, ν*) is a saddle point of the Lagrange function, i.e. if L(x*, λ, ν) ≤ L(x*, λ*, ν*) ≤ L(x, λ*, ν*) for all x, all λ ⪰ 0 and all ν, then x* is a solution of the primal problem.

28 Lagrange Dual Problem
The Lagrange dual function gives a lower bound on the optimal value of the original problem.
We seek the “best” (largest) lower bound on the minimum of the objective: this defines the dual optimal value and the dual solution.
The Lagrange dual problem is convex even if the original problem is not.
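The lower-bound (weak duality) property, in the notation above:

```latex
q(\lambda, \nu) \;\le\; p^\star
\qquad \text{for every } \lambda \succeq 0 \text{ and every } \nu,
```

where p* is the optimal value of the original (primal) problem; the best lower bound is obtained by maximizing q over the dual variables, which is the dual problem stated on the next slide.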

29 Primal / Dual Problems
Primal problem and dual problem (stated side by side below).
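The two problems stated side by side:

```latex
\text{Primal:}\quad
p^\star = \min_{x} \; f(x)
\;\;\text{s.t.}\;\; g_i(x) \le 0,\; h_j(x) = 0;
\qquad
\text{Dual:}\quad
d^\star = \max_{\lambda,\,\nu} \; q(\lambda, \nu)
\;\;\text{s.t.}\;\; \lambda \succeq 0 .
```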

30 Optimality Conditions: First Order
Karush-Kuhn-Tucker (KKT) conditions: if strong duality holds, then at optimality the KKT conditions below are satisfied.
The KKT conditions are:
➔ necessary in general (local optimum);
➔ necessary and sufficient in the case of convex problems (global optimum).
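The first-order KKT conditions at a primal-dual optimal pair (x*, λ*, ν*), in the notation above:

```latex
\nabla f(x^\star) + \sum_{i=1}^{m} \lambda_i^\star\, \nabla g_i(x^\star)
+ \sum_{j=1}^{p} \nu_j^\star\, \nabla h_j(x^\star) = 0
\quad \text{(stationarity)},
\qquad
g_i(x^\star) \le 0, \;\; h_j(x^\star) = 0 \quad \text{(primal feasibility)},
\qquad
\lambda_i^\star \ge 0 \quad \text{(dual feasibility)},
\qquad
\lambda_i^\star\, g_i(x^\star) = 0 \quad \text{(complementary slackness)}.
```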