Exploiting Duality (Particularly the dual of SVM) M. Pawan Kumar VISUAL GEOMETRY GROUP.

Presentation transcript:

Exploiting Duality (Particularly the dual of SVM) M. Pawan Kumar VISUAL GEOMETRY GROUP

Outline
PART I: General duality theory
- Basics of Mathematical Optimization
- The algebra
- The geometry
- Examples
PART II: Solving the SVM dual
- General Decomposition Algorithm
- Good Working Set
- Implementation Details

Mathematical Optimization: $\min_x f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$. Here $f_0$ is the objective function, $f_i(x) \le 0$ are the inequality constraints, and $h_i(x) = 0$ are the equality constraints. $x$ is a feasible point $\iff$ $f_i(x) \le 0$ and $h_i(x) = 0$ for all $i$. $x$ is a strictly feasible point $\iff$ $f_i(x) < 0$ and $h_i(x) = 0$ for all $i$. Feasible region: the set of all feasible points.

Convex Optimization: $\min_x f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$ (objective function, inequality constraints, equality constraints), where the objective function is convex and the feasible region is convex. What is a convex set? What is a convex function?

Convex Set: the line segment with endpoints $x_1$ and $x_2$ is the set of points $c x_1 + (1 - c) x_2$ with $c \in [0,1]$.

Convex Set: for all line segments with endpoints $x_1$, $x_2$ in the set, all points on the line segment lie within the set.

Non-Convex Set (figure: points $x_1$ and $x_2$ in the set whose connecting line segment leaves the set).

Examples of Convex Sets: the line segment between $x_1$ and $x_2$.

Examples of Convex Sets: the line through $x_1$ and $x_2$.

Examples of Convex Sets: hyperplane $a^T x - b = 0$.

Examples of Convex Sets: halfspace $a^T x - b \le 0$.

Examples of Convex Sets: second-order cone $\|x\| \le t$ (figure axes: $x_1$, $x_2$, $t$).

Operations that Preserve Convexity: intersection. Example: a polyhedron / polytope, the intersection of finitely many halfspaces and hyperplanes.

Operations that Preserve Convexity: affine transformation $x \mapsto Ax + b$.

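Convex Function: for any $x_1$, $x_2$ in the domain, the point on the chord joining $(x_1, f(x_1))$ and $(x_2, f(x_2))$ (blue) always lies on or above the corresponding point on the graph of $f$ (red).

Convex Function: $f(c x_1 + (1 - c) x_2) \le c f(x_1) + (1 - c) f(x_2)$ for all $c \in [0,1]$. The domain of $f(\cdot)$ has to be convex.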
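
The defining inequality can be spot-checked numerically. The sketch below is only an illustration: the helper names `violates_convexity` and `looks_convex`, the sampling interval and the tolerance are assumptions, and a randomised test of this kind can refute convexity but never prove it.

```python
import random

def violates_convexity(f, x1, x2, c, tol=1e-9):
    """True if f(c*x1 + (1-c)*x2) exceeds c*f(x1) + (1-c)*f(x2)."""
    point = c * x1 + (1 - c) * x2
    return f(point) > c * f(x1) + (1 - c) * f(x2) + tol

def looks_convex(f, lo=-5.0, hi=5.0, trials=2000, seed=0):
    """Randomised check on [lo, hi]; returns False on the first violation."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2, c = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        if violates_convexity(f, x1, x2, c):
            return False
    return True

def square(x):
    return x * x          # convex

def neg_square(x):
    return -x * x         # concave, so the check should fail
```

Calling `looks_convex(square)` passes every sampled triple, while `looks_convex(neg_square)` stops at the first sampled chord that dips below the graph.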

Convex Function: $f(c x_1 + (1 - c) x_2) \le c f(x_1) + (1 - c) f(x_2)$. If $f(\cdot)$ is convex, then $-f(\cdot)$ is concave.

Convex Function: for once-differentiable functions, $f(y) + \nabla f(y)^T (x - y) \le f(x)$ for all $x$, $y$, i.e. the tangent at $(y, f(y))$ lies below the graph. For twice-differentiable functions, $\nabla^2 f(x) \succeq 0$.

Convex Function and Convex Sets: the epigraph of a convex function is a convex set.

Examples of Convex Functions: linear function $a^T x$; p-norm functions $(|x_1|^p + |x_2|^p + \dots + |x_n|^p)^{1/p}$, $p \ge 1$; quadratic functions $x^T Q x$ with $Q \succeq 0$.

Operations that Preserve Convexity: non-negative weighted sum $w_1 f_1(x) + w_2 f_2(x) + \dots$ with $w_i \ge 0$. Example: $x^T Q x + a^T x + b$ with $Q \succeq 0$.

Operations that Preserve Convexity: the pointwise maximum $\max\{f_1(x), f_2(x), \dots\}$ of convex functions is convex. The pointwise minimum of concave functions is concave.

Convex Optimization (recap): $\min_x f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$, where the objective function is convex and the feasible region is convex. Both notions are now defined.

Lagrangian: for $\min_x f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$, define $L(x, \lambda, \nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$ with $\lambda_i \ge 0$.

Lagrangian Dual: $g(\lambda, \nu) = \min_{x \in D} L(x, \lambda, \nu)$, where $x \in D$ means that $x$ belongs to the intersection of the domains of $f_0$, $f_i$ and $h_i$.

Lagrangian Dual: for each fixed $x$, $L(x, \lambda, \nu)$ is affine in $(\lambda, \nu)$, so $g(\lambda, \nu) = \min_x L(x, \lambda, \nu)$ is a pointwise minimum of affine (concave) functions. The dual function is concave.

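Lagrangian Dual: let $p^* = \min_x f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$. Then $p^* \ge g(\lambda, \nu) = \min_x \left( f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right)$ for all $(\lambda, \nu)$ with $\lambda_i \ge 0$, because at any feasible $x$ the terms $\lambda_i f_i(x)$ are non-positive and the terms $\nu_i h_i(x)$ vanish.

The Dual Problem: the lower bound could be far from $p^*$. Best lower bound? $d^* = \max_{\lambda \ge 0, \nu} \min_x \left( f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right)$. Easy to obtain, since it maximizes a concave function. Duality gap: $p^* - d^* \ge 0$.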
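
A one-variable example makes the lower bound concrete. The toy problem $\min x^2$ s.t. $1 - x \le 0$ is an assumption chosen for illustration (it is not from the slides); its primal optimum is $p^* = 1$ at $x = 1$, and the sketch evaluates the dual function on a grid of multipliers.

```python
def g(lam):
    """Dual function of  min x^2  s.t.  1 - x <= 0.

    L(x, lam) = x^2 + lam * (1 - x) is minimised over x at x = lam / 2.
    """
    x = lam / 2.0
    return x * x + lam * (1.0 - x)

p_star = 1.0                                   # primal optimum, attained at x = 1
lams = [k / 100.0 for k in range(0, 501)]      # grid over lam in [0, 5]
values = [g(lam) for lam in lams]              # every value is a lower bound on p*
d_star = max(values)                           # best lower bound on the grid
best_lam = lams[values.index(d_star)]
```

On this grid every value of `g` stays at or below `p_star`, and the best bound is reached at `best_lam = 2.0`, where the duality gap closes.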

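The Geometric Interpretation: map each $x \in D$ to the point $(u, v, t) = (f_i(x), h_i(x), f_0(x))$ and let $G$ be the set of all such points. In the $(u, t)$ plane, $p^*$ is the smallest $t$ among the points of $G$ with $u \le 0$ (and $v = 0$).

The Geometric Interpretation: for all $(u, v, t) \in G$, $(\lambda, \nu, 1)^T (u, v, t) \ge g(\lambda, \nu)$. The hyperplane with normal $(\lambda, \nu, 1)$ supports $G$ and meets the $t$-axis at $g(\lambda, \nu) \le p^*$; the highest such intercept is $d^*$.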

The Duality Gap: $p^* = \min_x f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$, and $d^* = \max_{\lambda \ge 0, \nu} \min_x \left( f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right)$, with $p^* \ge d^*$.

The Duality Gap: $p^* - d^*$. Weak duality: $p^* - d^* \ge 0$. Strong duality: $p^* - d^* = 0$.

Slater’s Condition: if the problem is convex and there exists a strictly feasible point, strong duality holds. Taken care of by most solvers.

At Strong Duality: $f_0(x^*) = g(\lambda^*, \nu^*) = \min_x \left( f_0(x) + \sum_i \lambda_i^* f_i(x) + \sum_i \nu_i^* h_i(x) \right) \le f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_i \nu_i^* h_i(x^*) \le f_0(x^*)$. The inequalities hold with equality. Hence $x^*$ minimizes the Lagrangian at $(\lambda^*, \nu^*)$, and $\lambda_i^* f_i(x^*) = 0$ for all $i$.

KKT Conditions:
- Primal feasible: $f_i(x^*) \le 0$, $h_i(x^*) = 0$
- Dual feasible: $\lambda_i^* \ge 0$
- Complementary slackness: $\lambda_i^* f_i(x^*) = 0$
- Stationarity: $\nabla f_0(x^*) + \sum_i \lambda_i^* \nabla f_i(x^*) + \sum_i \nu_i^* \nabla h_i(x^*) = 0$
Necessary conditions for strong duality. Necessary and sufficient for convex problems.
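
The four conditions can be checked mechanically at a candidate pair of primal and dual values. The sketch below does so for the toy problem $\min x^2$ s.t. $1 - x \le 0$; the problem, the function name `kkt_satisfied` and the tolerance are assumptions made for illustration.

```python
def kkt_satisfied(x, lam, tol=1e-9):
    """Check the KKT conditions for  min x^2  s.t.  f1(x) = 1 - x <= 0."""
    f1 = 1.0 - x                      # inequality constraint value
    grad_f0 = 2.0 * x                 # derivative of the objective x^2
    grad_f1 = -1.0                    # derivative of the constraint 1 - x
    primal_feasible = f1 <= tol
    dual_feasible = lam >= -tol
    complementary = abs(lam * f1) <= tol
    stationary = abs(grad_f0 + lam * grad_f1) <= tol
    return primal_feasible and dual_feasible and complementary and stationary
```

The pair `x = 1.0`, `lam = 2.0` passes all four checks. The pair `x = 2.0`, `lam = 4.0` is stationary and feasible but fails complementary slackness, so it is rejected.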

Linear Program: $\min_x c^T x$ s.t. $A x = b$, $x \ge 0$.
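
As a worked example of the general recipe, the dual of this linear program can be derived by hand. The derivation below is a sketch that assumes the multiplier names $\lambda$ (for $-x \le 0$) and $\nu$ (for $Ax - b = 0$); the slide itself only states the primal.

```latex
\begin{aligned}
L(x,\lambda,\nu) &= c^T x - \lambda^T x + \nu^T (A x - b), \qquad \lambda \ge 0 \\
                 &= -b^T \nu + (c + A^T \nu - \lambda)^T x \\
g(\lambda,\nu)   &= \min_x L(x,\lambda,\nu) =
  \begin{cases}
    -b^T \nu & \text{if } A^T \nu - \lambda + c = 0 \\
    -\infty  & \text{otherwise}
  \end{cases} \\
\text{Dual:}\quad & \max_{\nu} \; -b^T \nu \quad \text{s.t.} \quad A^T \nu + c \ge 0
\end{aligned}
```

The Lagrangian is linear in $x$, so its minimum is finite only when the coefficient of $x$ vanishes; eliminating $\lambda \ge 0$ from that condition leaves the inequality in the last line.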

QCQP: $\min_x \frac{1}{2} x^T P_0 x + q_0^T x + r_0$ s.t. $\frac{1}{2} x^T P_i x + q_i^T x + r_i \le 0$.

Entropy Maximization: $\min_x \sum_i x_i \log(x_i)$ s.t. $A x \le b$, $\sum_i x_i = 1$.

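The SVM Framework: points $X = \{x_i\}$, labels $y = \{y_i\}$ with $y_i \in \{-1, +1\}$, separating hyperplane $w^T x + b = 0$, margin $2/\|w\|$. $\min_{w, b, \xi} \frac{1}{2} w^T w + C \sum_i \xi_i$ s.t. $y_i (w^T x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$. This is a convex quadratic program.

The SVM Dual: $\min_\alpha \frac{1}{2} \alpha^T Q \alpha - \alpha^T 1$ s.t. $\alpha^T y = 0$, $0 \le \alpha \le C1$, where $Q_{ij} = y_i y_j x_i^T x_j$ in the linear case, or $Q_{ij} = y_i y_j k(x_i, x_j)$ with a kernel $k$.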
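
To make the pieces of the dual concrete, the sketch below builds the matrix Q, evaluates the dual objective and tests feasibility in plain Python. The helper names, the two-point toy data set and the choice of a linear kernel are all assumptions made for illustration.

```python
def dot(u, v):
    """Linear kernel k(u, v) = u^T v."""
    return sum(a * b for a, b in zip(u, v))

def build_Q(X, y, kernel=dot):
    """Q[i][j] = y_i * y_j * k(x_i, x_j)."""
    n = len(X)
    return [[y[i] * y[j] * kernel(X[i], X[j]) for j in range(n)] for i in range(n)]

def dual_objective(alpha, Q):
    """(1/2) alpha^T Q alpha - alpha^T 1, the quantity to be minimised."""
    n = len(alpha)
    quad = sum(alpha[i] * Q[i][j] * alpha[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)

def is_feasible(alpha, y, C, tol=1e-9):
    """Check the box constraint 0 <= alpha <= C and the equality alpha^T y = 0."""
    in_box = all(-tol <= a <= C + tol for a in alpha)
    return in_box and abs(dot(alpha, y)) <= tol

# Toy data (hypothetical): two one-dimensional points, one per class.
X = [[1.0], [-1.0]]
y = [1, -1]
Q = build_Q(X, y)
alpha = [0.5, 0.5]
```

For this data the equality constraint forces both multipliers to be equal, and the objective $2a^2 - 2a$ is minimised at $a = 0.5$ with value $-0.5$, which matches the primal value $\frac{1}{2} w^T w = 0.5$ at $w = 1$ up to the sign flip between the min and max forms of the dual.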

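The SVM Dual: $\min_\alpha \frac{1}{2} \alpha^T Q \alpha - \alpha^T 1$ s.t. $\alpha^T y = 0$, $0 \le \alpha \le C1$. General decomposition algorithm:
1. Choose ‘q’ variables. Fix the rest.
2. Change the unfixed variables, satisfying the constraints, to decrease the objective function (a small problem).
3. Repeat.
Open questions: What is the minimum ‘q’? Till when do we repeat? What is the best set B?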
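
The loop above can be sketched for the smallest useful working set, q = 2 (a single variable cannot move without breaking the equality constraint). The code below is an assumption-laden illustration, not the method of the slides: it picks the pair that most violates optimality (an SMO-style rule swapped in for simplicity), solves the two-variable problem in closed form, and stops when the violation falls below `eps`. The names `solve_dual_q2` and `snap` are hypothetical, and `K` is a precomputed kernel matrix.

```python
def snap(a, C, tol=1e-12):
    """Move values within tol of a bound onto the bound."""
    if a < tol:
        return 0.0
    if a > C - tol:
        return C
    return a

def solve_dual_q2(K, y, C, eps=1e-6, max_iter=10000):
    n = len(y)
    alpha = [0.0] * n
    grad = [-1.0] * n          # gradient Q alpha - 1 of the dual objective at alpha = 0
    for _ in range(max_iter):
        # Indices whose alpha may move along +y_t (up) or along -y_t (low).
        up = [t for t in range(n)
              if (y[t] > 0 and alpha[t] < C) or (y[t] < 0 and alpha[t] > 0)]
        low = [t for t in range(n)
               if (y[t] > 0 and alpha[t] > 0) or (y[t] < 0 and alpha[t] < C)]
        if not up or not low:
            break
        i = max(up, key=lambda t: -y[t] * grad[t])
        j = min(low, key=lambda t: -y[t] * grad[t])
        gap = -y[i] * grad[i] + y[j] * grad[j]     # size of the KKT violation
        if gap < eps:                              # stopping rule
            break
        curv = max(K[i][i] + K[j][j] - 2.0 * K[i][j], 1e-12)
        step = gap / curv                          # unconstrained minimiser
        step = min(step, C - alpha[i] if y[i] > 0 else alpha[i])
        step = min(step, alpha[j] if y[j] > 0 else C - alpha[j])
        new_i = snap(alpha[i] + y[i] * step, C)
        new_j = snap(alpha[j] - y[j] * step, C)
        di, dj = new_i - alpha[i], new_j - alpha[j]
        alpha[i], alpha[j] = new_i, new_j
        for t in range(n):                         # incremental gradient update
            grad[t] += y[t] * (y[i] * K[t][i] * di + y[j] * K[t][j] * dj)
    return alpha

# Toy problem (hypothetical): x_1 = 1 with label +1, x_2 = -1 with label -1.
K_toy = [[1.0, -1.0], [-1.0, 1.0]]
y_toy = [1, -1]
```

Moving `alpha[i]` by `y[i] * step` and `alpha[j]` by `-y[j] * step` leaves $\alpha^T y$ unchanged, which is why the pair can be updated without touching the fixed variables.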

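KKT Conditions: for $\min_\alpha \frac{1}{2} \alpha^T Q \alpha - \alpha^T 1$ s.t. $\alpha^T y = 0$, $0 \le \alpha \le C1$, introduce multipliers $\lambda^{eq}$ (for $\alpha^T y = 0$), $\lambda_i^{lo}$ (for $\alpha_i \ge 0$) and $\lambda_i^{up}$ (for $\alpha_i \le C$). With $g(\alpha) = Q \alpha$: $-1 + Q \alpha + \lambda^{eq} y - \lambda^{lo} + \lambda^{up} = 0$, $\lambda_i^{lo} \alpha_i = 0$, $\lambda_i^{up} (\alpha_i - C) = 0$, $\lambda_i^{lo} \ge 0$, $\lambda_i^{up} \ge 0$.

KKT Conditions, by case:
- For all $0 < \alpha_i < C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i = 0$
- For all $\alpha_i = 0$: $-1 + g_i(\alpha) + \lambda^{eq} y_i - \lambda_i^{lo} = 0$
- For all $\alpha_i = C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i + \lambda_i^{up} = 0$

KKT Conditions: $g_i(\alpha) = y_i \sum_j \alpha_j y_j k(x_i, x_j)$, which can be updated incrementally over the working set $B$: $g_i(\alpha^t) = g_i(\alpha^{t-1}) + y_i \sum_{j \in B} (\alpha_j^t - \alpha_j^{t-1}) y_j k(x_i, x_j)$. What is the best set of ‘q’ variables (the working set)?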
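
The incremental update only touches the kernel columns of the variables that changed. The sketch below compares it with a full recomputation; the function names, the 3x3 kernel matrix and the multiplier values are assumptions made for illustration.

```python
def full_g(alpha, y, K):
    """g_i = y_i * sum_j alpha_j * y_j * K[i][j], recomputed from scratch."""
    n = len(y)
    return [y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n)) for i in range(n)]

def update_g(g_old, alpha_old, alpha_new, working_set, y, K):
    """Update g using only the kernel columns of the working set B."""
    return [
        g_old[i] + y[i] * sum((alpha_new[j] - alpha_old[j]) * y[j] * K[i][j]
                              for j in working_set)
        for i in range(len(y))
    ]

# Hypothetical example: three points, working set B = {0, 1}.
K = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
y = [1, -1, 1]
alpha_old = [0.5, 0.75, 0.25]
alpha_new = [0.25, 0.5, 0.25]      # only the entries in B changed
g_new = update_g(full_g(alpha_old, y, K), alpha_old, alpha_new, [0, 1], y, K)
```

With n variables and a working set of size q, the update costs on the order of n times q kernel values per iteration instead of n squared.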

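Working Set: $g_i(\alpha) = y_i \sum_j \alpha_j y_j k(x_i, x_j)$. Let $d$ be a feasible direction of descent, $\alpha^t = \alpha^{t-1} + d$. Choose the steepest descent direction, using the first-order approximation of the objective, $(-1 + g(\alpha^{t-1}))^T d$.

Working Set: $\min_d (-1 + g(\alpha^{t-1}))^T d$ s.t. $y^T d = 0$; $d_i \ge 0$ if $\alpha_i^{t-1} = 0$; $d_i \le 0$ if $\alpha_i^{t-1} = C$; $-1 \le d_i \le 1$; and exactly $q$ components of $d$ are non-zero, $\mathrm{Card}\{d_i : d_i \ne 0\} = q$.

Working Set: let $s_i = y_i (-1 + g_i(\alpha^{t-1}))$ and sort according to decreasing values of $s_i$.
- Choose $q/2$ variables from the top for which $0 < \alpha_i^{t-1} < C$, or $d_i = -y_i$ satisfies feasibility of the direction.
- Choose $q/2$ variables from the bottom for which $0 < \alpha_i^{t-1} < C$, or $d_i = y_i$ satisfies feasibility of the direction.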
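
The sorting rule translates into a few lines of code. The sketch below is an illustration under stated assumptions: `grad` holds the gradient $-1 + g(\alpha^{t-1})$, the helper names are hypothetical, and when one side offers fewer than q/2 admissible variables both sides are truncated to the same length so that $y^T d = 0$ still holds.

```python
def direction_ok(alpha_i, d_i, C):
    """d_i >= 0 is required at alpha_i = 0 and d_i <= 0 at alpha_i = C."""
    if alpha_i <= 0.0 and d_i < 0:
        return False
    if alpha_i >= C and d_i > 0:
        return False
    return True

def select_working_set(alpha, grad, y, C, q):
    """Return {index: d_i} for the chosen working set."""
    # Sort indices by s_i = y_i * grad_i, largest first.
    order = sorted(range(len(y)), key=lambda i: y[i] * grad[i], reverse=True)
    half = q // 2
    # From the top: variables that may move along d_i = -y_i.
    top = [i for i in order if direction_ok(alpha[i], -y[i], C)][:half]
    # From the bottom: variables that may move along d_i = +y_i.
    bottom = [i for i in reversed(order)
              if i not in top and direction_ok(alpha[i], y[i], C)][:half]
    k = min(len(top), len(bottom))         # keep both sides balanced
    d = {i: -y[i] for i in top[:k]}
    d.update({i: y[i] for i in bottom[:k]})
    return d

# Hypothetical starting point: alpha = 0, so the gradient is -1 everywhere.
alpha0 = [0.0, 0.0, 0.0, 0.0]
y0 = [1, 1, -1, -1]
grad0 = [-1.0, -1.0, -1.0, -1.0]
d0 = select_working_set(alpha0, grad0, y0, 1.0, 2)
```

Each top variable contributes $-s_i$ and each bottom variable contributes $s_i$ to the linearised objective, so taking the largest and smallest $s_i$ gives the steepest feasible descent among directions of this form.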

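Shrinking: recall the KKT conditions by case.
- For all $0 < \alpha_i < C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i = 0$
- For all $\alpha_i = 0$: $-1 + g_i(\alpha) + \lambda^{eq} y_i - \lambda_i^{lo} = 0$
- For all $\alpha_i = C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i + \lambda_i^{up} = 0$
If $\lambda_i^{lo} > 0$ or $\lambda_i^{up} > 0$ for $n$ consecutive iterations, drop $\alpha_i$ from the problem (temporarily).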
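
The bookkeeping behind this rule can be sketched as a counter per variable. In the code below the multiplier estimates `lam_lo` and `lam_up` are taken as given inputs, and the function name, the counter list and the tolerance are assumptions; how the estimates are obtained and when dropped variables are brought back is left out.

```python
def update_shrinking(counts, active, lam_lo, lam_up, n_consecutive, tol=1e-12):
    """Drop variables whose bound multiplier stayed positive n times in a row.

    counts : list of consecutive-iteration counters, one per variable
    active : set of indices still in the (shrunk) problem; modified in place
    """
    for i in list(active):
        if lam_lo[i] > tol or lam_up[i] > tol:
            counts[i] += 1             # still looks stuck at a bound
        else:
            counts[i] = 0              # streak broken, start again
        if counts[i] >= n_consecutive:
            active.discard(i)          # temporarily remove from the problem
    return active

# Hypothetical run: variable 0 sits at its lower bound for two iterations.
counts = [0, 0]
active = {0, 1}
update_shrinking(counts, active, [0.5, 0.0], [0.0, 0.0], 2)
after_first = set(active)
update_shrinking(counts, active, [0.5, 0.0], [0.0, 0.0], 2)
```

After the first call both variables are still active; after the second call variable 0 has been dropped and only variable 1 remains.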

Caching: kernel evaluation can be expensive. Cache kernel values in a least-recently-used manner. Choose q’ variables for which cached values are available.
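
A least-recently-used cache of kernel rows can be sketched with an ordered dictionary. The class name `KernelRowCache`, the row-level granularity and the `misses` counter are assumptions made for illustration, not a description of any particular solver's cache.

```python
from collections import OrderedDict

class KernelRowCache:
    """Keep at most max_rows kernel rows, evicting the least recently used."""

    def __init__(self, X, kernel, max_rows):
        self.X = X
        self.kernel = kernel
        self.max_rows = max_rows
        self.rows = OrderedDict()      # index -> list of k(x_i, x_j) for all j
        self.misses = 0                # number of rows that had to be computed

    def cached(self, i):
        return i in self.rows

    def row(self, i):
        if i in self.rows:
            self.rows.move_to_end(i)   # mark as most recently used
            return self.rows[i]
        self.misses += 1
        values = [self.kernel(self.X[i], xj) for xj in self.X]
        self.rows[i] = values
        if len(self.rows) > self.max_rows:
            self.rows.popitem(last=False)   # evict the least recently used row
        return values

def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical usage with three one-dimensional points and room for two rows.
cache = KernelRowCache([[1.0], [2.0], [3.0]], linear_kernel, max_rows=2)
cache.row(0)
cache.row(1)
cache.row(0)      # hit: row 0 becomes the most recently used
cache.row(2)      # miss: evicts row 1
```

A working-set rule can consult `cached(i)` to prefer variables whose rows are already available.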

Results: Those who have used SVMlight: you know that it works very well. Those who haven’t used SVMlight: it works very well. See paper. Download.

Questions???