Exploiting Duality (Particularly the Dual of SVM)
M. Pawan Kumar, Visual Geometry Group
Outline
PART I: General duality theory
  Basics of Mathematical Optimization
  The algebra
  The geometry
  Examples
PART II: Solving the SVM dual
  General Decomposition Algorithm
  Good Working Set
  Implementation Details
Mathematical Optimization
$\min_x f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$
Objective function: $f_0(x)$. Inequality constraints: $f_i(x) \le 0$. Equality constraints: $h_i(x) = 0$.
x is a feasible point if $f_i(x) \le 0$ and $h_i(x) = 0$ for all i.
x is a strictly feasible point if $f_i(x) < 0$ and $h_i(x) = 0$ for all i.
Feasible region: the set of all feasible points.
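As a small illustration of these definitions, the sketch below classifies a point as strictly feasible, feasible, or infeasible. It assumes the constraints are given as lists of Python callables (`ineq` for the $f_i$, `eq` for the $h_i$). The function name `classify_point`, the tolerance `tol` that absorbs floating-point error, and the disk-and-line example are all our own hypothetical choices.

```python
from typing import Callable, Sequence

Point = Sequence[float]
Constraint = Callable[[Point], float]


def classify_point(x: Point, ineq: Sequence[Constraint],
                   eq: Sequence[Constraint], tol: float = 1e-9) -> str:
    """Classify x against f_i(x) <= 0 (ineq) and h_i(x) = 0 (eq)."""
    # Every equality constraint must hold (up to tol).
    if any(abs(h(x)) > tol for h in eq):
        return "infeasible"
    values = [f(x) for f in ineq]
    # Every inequality constraint must hold (up to tol).
    if any(v > tol for v in values):
        return "infeasible"
    # Strict feasibility needs every inequality to hold strictly.
    if all(v < -tol for v in values):
        return "strictly feasible"
    return "feasible"


# Unit disk f_1(x) = x0^2 + x1^2 - 1 <= 0 intersected with the line h_1(x) = x0 - x1 = 0.
ineq = [lambda x: x[0] ** 2 + x[1] ** 2 - 1.0]
eq = [lambda x: x[0] - x[1]]
print(classify_point((0.0, 0.0), ineq, eq))                # strictly feasible
print(classify_point((0.5 ** 0.5, 0.5 ** 0.5), ineq, eq))  # feasible (on the boundary)
print(classify_point((1.0, 1.0), ineq, eq))                # infeasible
```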
Convex Optimization
$\min_x f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$
The objective function is convex and the feasible region is convex (for example, when every $f_i$ is convex and every $h_i$ is affine).
What is a convex set? What is a convex function?
Convex Set
Given endpoints $x_1$ and $x_2$, the line segment between them is $\{c x_1 + (1 - c) x_2 : c \in [0, 1]\}$.
A set is convex if, for every line segment whose endpoints lie in the set, all points on the segment also lie in the set.
Non-Convex Set
[Figure: a set containing points $x_1$ and $x_2$ whose connecting line segment leaves the set]
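The segment definition suggests a simple randomized test, sketched below under our own assumptions. It samples endpoint pairs inside a membership predicate and checks points along each segment. A returned counterexample proves non-convexity, while `None` only means no counterexample was found. The helper names (`find_segment_violation`, `box`, `in_disk`, `in_annulus`) and the disk/annulus examples are hypothetical.

```python
import math
import random
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]


def find_segment_violation(contains: Callable[[Point], bool],
                           sampler: Callable[[random.Random], Point],
                           trials: int = 2000, steps: int = 20,
                           seed: int = 0) -> Optional[Tuple[Point, Point, float]]:
    """Return (x1, x2, c) with x1, x2 in the set but c*x1 + (1-c)*x2 outside it, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = sampler(rng), sampler(rng)
        if not (contains(x1) and contains(x2)):
            continue  # both endpoints must lie in the set
        for k in range(steps + 1):
            c = k / steps
            p = (c * x1[0] + (1 - c) * x2[0], c * x1[1] + (1 - c) * x2[1])
            if not contains(p):
                return x1, x2, c
    return None


def box(rng: random.Random) -> Point:
    """Uniform sample from the square [-1.5, 1.5]^2."""
    return (rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5))


def in_disk(p: Point) -> bool:
    """Closed unit disk (convex); the tiny slack absorbs rounding error."""
    return math.hypot(p[0], p[1]) <= 1.0 + 1e-12


def in_annulus(p: Point) -> bool:
    """Ring 0.5 <= ||p|| <= 1 (not convex)."""
    return 0.5 <= math.hypot(p[0], p[1]) <= 1.0


print(find_segment_violation(in_disk, box))              # None
print(find_segment_violation(in_annulus, box) is None)   # False
```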
Examples of Convex Sets
Line segment: $\{c x_1 + (1 - c) x_2 : c \in [0, 1]\}$
Line: $\{c x_1 + (1 - c) x_2 : c \in \mathbb{R}\}$
Hyperplane: $\{x : a^T x - b = 0\}$
Halfspace: $\{x : a^T x - b \le 0\}$
Second-order cone: $\{(x, t) : \|x\| \le t\}$
Operations that Preserve Convexity
Intersection: the intersection of convex sets is convex. Example: a polyhedron (a polytope, if bounded) is an intersection of halfspaces and hyperplanes.
Affine transformation: the image of a convex set under $x \mapsto Ax + b$ is convex.
Convex Function
$f(c x_1 + (1 - c) x_2) \le c f(x_1) + (1 - c) f(x_2)$ for all $x_1, x_2$ in the domain of f and all $c \in [0, 1]$. The domain of f has to be convex.
Geometrically, the chord joining $(x_1, f(x_1))$ and $(x_2, f(x_2))$ always lies on or above the graph of f.
f is concave if $-f$ is convex.
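Convex Function
Once-differentiable functions: f is convex if and only if $f(y) + \nabla f(y)^T (x - y) \le f(x)$ for all x, y. The tangent plane at $(y, f(y))$ is a global underestimator of f.
Twice-differentiable functions: f is convex if and only if $\nabla^2 f(x) \succeq 0$ for all x.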
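A quick numerical check of the first-order condition, for the quadratic $f(x) = x^T Q x$ with symmetric Q. Here $\nabla f(y) = 2Qy$, and the gap $f(x) - f(y) - \nabla f(y)^T (x - y)$ equals $(x - y)^T Q (x - y)$. Violations therefore appear exactly when Q is not positive semidefinite. The function names, sample range, and example matrices are our own choices, and sampling can reveal non-convexity but cannot prove convexity.

```python
import random
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def quad(Q: Matrix, x: Sequence[float]) -> float:
    """f(x) = x^T Q x."""
    n = len(x)
    return sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))


def grad_quad(Q: Matrix, x: Sequence[float]) -> List[float]:
    """Gradient of x^T Q x for symmetric Q: 2 Q x."""
    n = len(x)
    return [2.0 * sum(Q[i][j] * x[j] for j in range(n)) for i in range(n)]


def first_order_violations(Q: Matrix, trials: int = 500, seed: int = 0) -> int:
    """Count sampled pairs (x, y) with f(y) + grad f(y)^T (x - y) > f(x)."""
    rng = random.Random(seed)
    n = len(Q)
    count = 0
    for _ in range(trials):
        x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
        y = [rng.uniform(-2.0, 2.0) for _ in range(n)]
        g = grad_quad(Q, y)
        tangent = quad(Q, y) + sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))
        if tangent > quad(Q, x) + 1e-9:  # tangent plane above the function
            count += 1
    return count


print(first_order_violations([[2.0, 0.5], [0.5, 1.0]]))       # 0: Q is positive definite
print(first_order_violations([[1.0, 0.0], [0.0, -1.0]]) > 0)  # True: Q is indefinite
```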
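Convex Functions and Convex Sets
The epigraph of a convex function, $\mathrm{epi}\, f = \{(x, t) : f(x) \le t\}$, is a convex set (and conversely, f is convex if its epigraph is convex).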
Examples of Convex Functions
Linear functions: $a^T x$
p-norms: $(|x_1|^p + |x_2|^p + \dots + |x_n|^p)^{1/p}$, $p \ge 1$
Quadratic functions: $x^T Q x$ with $Q \succeq 0$
Operations that Preserve Convexity
Non-negative weighted sum: $w_1 f_1(x) + w_2 f_2(x) + \dots$ is convex when each $f_i$ is convex and each $w_i \ge 0$. Example: $x^T Q x + a^T x + b$ with $Q \succeq 0$.
Pointwise maximum: $\max\{f_1(x), f_2(x)\}$ is convex when $f_1$ and $f_2$ are convex. Likewise, the pointwise minimum of concave functions is concave.
Lagrangian
For the problem $\min_x f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$:
$L(x, \lambda, \nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$, with $\lambda_i \ge 0$.
Lagrangian Dual
$g(\lambda, \nu) = \min_{x \in D} L(x, \lambda, \nu)$, where D is the intersection of the domains of $f_0$, $f_i$ and $h_i$.
For each fixed x, $L(x, \lambda, \nu)$ is an affine (hence concave) function of $(\lambda, \nu)$, so g is a pointwise minimum of affine functions. The dual function is therefore concave, even when the original problem is not convex.
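A worked toy example of our own choosing: minimize $x^2$ subject to $1 - x \le 0$, so $p^* = 1$ at $x^* = 1$. The Lagrangian is $L(x, \lambda) = x^2 + \lambda (1 - x)$. Setting $\partial L / \partial x = 2x - \lambda = 0$ gives $x = \lambda / 2$, hence $g(\lambda) = \lambda - \lambda^2 / 4$. The sketch compares this closed form with a brute-force minimum over a grid of x values, then checks the midpoint-concavity inequality. The grid range and step are arbitrary assumptions.

```python
from typing import List


def lagrangian(x: float, lam: float) -> float:
    """L(x, lambda) for: minimize x^2 s.t. 1 - x <= 0."""
    return x * x + lam * (1.0 - x)


def dual_closed_form(lam: float) -> float:
    """g(lambda) = min_x L(x, lambda), attained at x = lambda / 2."""
    return lam - lam * lam / 4.0


def dual_by_grid(lam: float, grid: List[float]) -> float:
    """Approximate g(lambda) by minimising L over a finite grid of x values."""
    return min(lagrangian(x, lam) for x in grid)


grid = [i / 1000.0 for i in range(-5000, 5001)]  # x in [-5, 5], step 0.001
for lam in (0.0, 1.0, 2.0, 4.0):
    assert abs(dual_closed_form(lam) - dual_by_grid(lam, grid)) < 1e-5

# Concavity: g at a midpoint is at least the average of the endpoint values.
a, b = 0.5, 3.5
assert dual_closed_form((a + b) / 2) >= (dual_closed_form(a) + dual_closed_form(b)) / 2
print(dual_closed_form(2.0))  # 1.0
```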
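Lagrangian Dual
Let $p^* = \min_x f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$. Then $g(\lambda, \nu) \le p^*$ for all $\lambda \ge 0$ and all $\nu$. For any feasible $\tilde{x}$, $\lambda_i f_i(\tilde{x}) \le 0$ and $\nu_i h_i(\tilde{x}) = 0$, so $g(\lambda, \nu) \le L(\tilde{x}, \lambda, \nu) \le f_0(\tilde{x})$.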
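The Dual Problem
The lower bound $g(\lambda, \nu)$ could be far from $p^*$. What is the best lower bound?
$d^* = \max_{\lambda \ge 0, \nu} \min_x \left( f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right)$
$d^*$ is easy to obtain, because the dual problem maximizes a concave function over a convex set.
Duality gap: $p^* - d^* \ge 0$.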
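Continuing the toy problem of our own choosing (minimize $x^2$ s.t. $1 - x \le 0$, with $g(\lambda) = \lambda - \lambda^2 / 4$), the sketch below checks weak duality on a grid of $\lambda$ values. It then computes $d^*$ by ternary search, which is valid here because g is concave. The function names and the search interval $[0, 10]$ are assumptions for illustration.

```python
from typing import Callable


def primal_objective(x: float) -> float:
    return x * x


def dual_function(lam: float) -> float:
    """g(lambda) = lambda - lambda^2 / 4 for: minimize x^2 s.t. 1 - x <= 0."""
    return lam - lam * lam / 4.0


def maximize_concave(f: Callable[[float], float], lo: float, hi: float,
                     iters: int = 200) -> float:
    """Ternary search for the maximiser of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1  # the maximiser cannot lie in [lo, m1)
        else:
            hi = m2  # the maximiser cannot lie in (m2, hi]
    return (lo + hi) / 2.0


p_star = primal_objective(1.0)  # x* = 1 is the constrained minimiser, so p* = 1
# Weak duality: every g(lambda) with lambda >= 0 is a lower bound on p*.
assert all(dual_function(k / 10.0) <= p_star for k in range(101))
lam_star = maximize_concave(dual_function, 0.0, 10.0)
d_star = dual_function(lam_star)
print(f"{lam_star:.4f} {d_star:.4f}")  # 2.0000 1.0000
```

Here $d^* = p^* = 1$, so the duality gap is zero. This is consistent with Slater's condition, since the problem is convex and $x = 2$ is strictly feasible.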
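The Geometric Interpretation
$G = \{(f_i(x), h_i(x), f_0(x)) = (u, v, t) : x \in D\}$
$p^* = \min \{t : (u, v, t) \in G, u \le 0, v = 0\}$
[Figure: the set G plotted with u on the horizontal axis and t on the vertical axis; $p^*$ is the lowest value of t over the part of G with $u \le 0$]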
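The Geometric Interpretation
For every $(u, v, t) \in G$: $(\lambda, \nu, 1)^T (u, v, t) \ge g(\lambda, \nu)$. So the hyperplane $(\lambda, \nu, 1)^T (u, v, t) = g(\lambda, \nu)$ supports G from below and meets the t-axis at $t = g(\lambda, \nu) \le p^*$. Maximizing this intercept over $\lambda \ge 0$ and $\nu$ gives $d^*$.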
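The Duality Gap
$p^* = \min_x \max_{\lambda \ge 0, \nu} L(x, \lambda, \nu) \ge \max_{\lambda \ge 0, \nu} \min_x L(x, \lambda, \nu) = d^*$
(The inner maximum equals $f_0(x)$ when x is feasible and $+\infty$ otherwise, so the left-hand side is exactly $p^*$.)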
Duality gap: $p^* - d^*$
Weak duality: $p^* - d^* \ge 0$ (always holds)
Strong duality: $p^* - d^* = 0$
Slater's Condition
If the problem is convex and there exists a strictly feasible point, then strong duality holds.
(Taken care of by most solvers.)
At Strong Duality
$f_0(x^*) = g(\lambda^*, \nu^*) = \min_x \left( f_0(x) + \sum_i \lambda_i^* f_i(x) + \sum_i \nu_i^* h_i(x) \right) \le f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_i \nu_i^* h_i(x^*) \le f_0(x^*)$
Both inequalities hold with equality. Hence:
x* minimizes the Lagrangian at $(\lambda^*, \nu^*)$ (from the first inequality);
$\lambda_i^* f_i(x^*) = 0$ for all i (from the second, since every term $\lambda_i^* f_i(x^*) \le 0$ and every $h_i(x^*) = 0$).
KKT Conditions
Primal feasibility: $f_i(x^*) \le 0$, $h_i(x^*) = 0$
Dual feasibility: $\lambda_i^* \ge 0$
Complementary slackness: $\lambda_i^* f_i(x^*) = 0$
Stationarity: $\nabla f_0(x^*) + \sum_i \lambda_i^* \nabla f_i(x^*) + \sum_i \nu_i^* \nabla h_i(x^*) = 0$
When strong duality holds (and the functions are differentiable), these conditions are necessary for optimality. For convex problems they are also sufficient, so under Slater's condition they are necessary and sufficient.
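To make the four conditions concrete, the sketch below evaluates their residuals for a toy problem of our own choosing: minimize $x^2$ s.t. $f_1(x) = 1 - x \le 0$, which has no equality constraints. Stationarity reads $2x - \lambda = 0$. The problem is convex and $x = 2$ is strictly feasible, so a pair passing all checks is optimal. The function names and tolerance are hypothetical.

```python
from typing import Dict


def kkt_residuals(x: float, lam: float) -> Dict[str, float]:
    """KKT residuals for: minimize x^2 s.t. f_1(x) = 1 - x <= 0 (no equality constraints)."""
    f1 = 1.0 - x
    return {
        "primal_feasibility": max(f1, 0.0),        # amount by which f_1(x) <= 0 fails
        "dual_feasibility": max(-lam, 0.0),        # amount by which lambda >= 0 fails
        "complementary_slackness": abs(lam * f1),  # |lambda * f_1(x)|
        "stationarity": abs(2.0 * x - lam),        # |d/dx (x^2 + lambda (1 - x))|
    }


def satisfies_kkt(x: float, lam: float, tol: float = 1e-9) -> bool:
    return all(r <= tol for r in kkt_residuals(x, lam).values())


print(satisfies_kkt(1.0, 2.0))  # True: x* = 1, lambda* = 2
print(satisfies_kkt(0.0, 0.0))  # False: x = 0 violates 1 - x <= 0
print(satisfies_kkt(2.0, 4.0))  # False: stationary, but lambda * f_1(x) = -4
```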
Linear Program
$\min_x c^T x$ s.t. $Ax = b$, $x \ge 0$
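For this standard-form LP, the Lagrange dual works out to $\max_y b^T y$ s.t. $A^T y \le c$. The sketch below uses a tiny instance of our own choosing, $\min x_1 + 2 x_2$ s.t. $x_1 + x_2 = 1$, $x \ge 0$. It checks weak duality, $c^T x \ge b^T y$, for every pair of feasible primal and dual points in a small hand-picked list. The helper names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A: Matrix, x: Sequence[float]) -> List[float]:
    """A x."""
    return [dot(row, x) for row in A]


def mat_t_vec(A: Matrix, y: Sequence[float]) -> List[float]:
    """A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def primal_feasible(A: Matrix, b: Sequence[float], x: Sequence[float],
                    tol: float = 1e-12) -> bool:
    """A x = b and x >= 0."""
    return (all(abs(r - bi) <= tol for r, bi in zip(mat_vec(A, x), b))
            and all(xi >= -tol for xi in x))


def dual_feasible(A: Matrix, c: Sequence[float], y: Sequence[float],
                  tol: float = 1e-12) -> bool:
    """A^T y <= c."""
    return all(v <= ci + tol for v, ci in zip(mat_t_vec(A, y), c))


# Toy instance: min x1 + 2 x2  s.t.  x1 + x2 = 1,  x >= 0.
A = [[1.0, 1.0]]
b = [1.0]
c = [1.0, 2.0]
primal_points = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
dual_points = [[1.0], [0.0], [-3.0]]
for x in primal_points:
    for y in dual_points:
        assert primal_feasible(A, b, x) and dual_feasible(A, c, y)
        assert dot(c, x) >= dot(b, y)  # weak duality
print(min(dot(c, x) for x in primal_points), max(dot(b, y) for y in dual_points))  # 1.0 1.0
```

The best primal point $x = (1, 0)$ and the best dual point $y = 1$ give the same value, so for this instance the gap is zero.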
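QCQP (Quadratically Constrained Quadratic Program)
$\min_x \frac{1}{2} x^T P_0 x + q_0^T x + r_0$ s.t. $\frac{1}{2} x^T P_i x + q_i^T x + r_i \le 0$
(convex when $P_0, P_i \succeq 0$)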
Entropy Maximization
$\min_x \sum_i x_i \log x_i$ s.t. $Ax \le b$, $\sum_i x_i = 1$
(minimizing the negative entropy)
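The SVM Framework
Points $X = \{x_i\}$, labels $y = \{y_i\}$, $y_i \in \{-1, +1\}$. Separating hyperplane $w^T x + b = 0$ with margin $2 / \|w\|$.
$\min_{w, b, \xi} \frac{1}{2} w^T w + C \sum_i \xi_i$ s.t. $y_i (w^T x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$
This is a convex quadratic program.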
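For fixed $(w, b)$, the smallest feasible slack is $\xi_i = \max(0, 1 - y_i (w^T x_i + b))$ (the hinge loss). The primal objective can therefore be evaluated directly, as sketched below. The function name and the 1-D toy data are our own assumptions.

```python
from typing import Sequence


def primal_svm_objective(w: Sequence[float], b: float, X: Sequence[Sequence[float]],
                         y: Sequence[int], C: float) -> float:
    """1/2 w^T w + C * sum_i xi_i, with xi_i = max(0, 1 - y_i (w^T x_i + b))."""
    slacks = [max(0.0, 1.0 - label * (sum(wk * xk for wk, xk in zip(w, point)) + b))
              for point, label in zip(X, y)]
    return 0.5 * sum(wk * wk for wk in w) + C * sum(slacks)


X = [(1.0,), (2.0,), (-1.0,), (-2.0,)]
y = [1, 1, -1, -1]
print(primal_svm_objective([1.0], 0.0, X, y, C=10.0))  # 0.5: every margin is >= 1, no slack
print(primal_svm_objective([0.5], 0.0, X, y, C=10.0))  # 10.125: a smaller w costs slack
```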
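The SVM Dual
$\min_\alpha \frac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $\alpha^T y = 0$, $0 \le \alpha \le C \mathbf{1}$
$Q_{ij} = y_i y_j x_i^T x_j = y_i y_j k(x_i, x_j)$, where k is a kernel.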
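A minimal sketch of the dual's ingredients: building Q from a kernel, evaluating the dual objective, and checking feasibility. The kernels, the RBF width `gamma`, the function names, and the 1-D toy data are illustrative assumptions. For this data, $\alpha = (0.5, 0, 0.5, 0)$ gives $w = \sum_i \alpha_i y_i x_i = 1$ and a dual objective of $-0.5$. Its negation, $0.5 = \frac{1}{2} \|w\|^2$, is the primal value at $w = 1, b = 0$, where no slack is needed.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def linear_kernel(a: Point, b: Point) -> float:
    return sum(p * q for p, q in zip(a, b))


def rbf_kernel(a: Point, b: Point, gamma: float = 0.5) -> float:
    """Gaussian kernel exp(-gamma ||a - b||^2); gamma is an illustrative choice."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def build_q(X: Sequence[Point], y: Sequence[int], kernel: Kernel) -> List[List[float]]:
    """Q_ij = y_i y_j k(x_i, x_j)."""
    n = len(X)
    return [[y[i] * y[j] * kernel(X[i], X[j]) for j in range(n)] for i in range(n)]


def dual_objective(alpha: Sequence[float], Q: Sequence[Sequence[float]]) -> float:
    """(1/2) alpha^T Q alpha - alpha^T 1."""
    n = len(alpha)
    quad = sum(alpha[i] * Q[i][j] * alpha[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)


def is_dual_feasible(alpha: Sequence[float], y: Sequence[int], C: float,
                     tol: float = 1e-9) -> bool:
    """alpha^T y = 0 and 0 <= alpha <= C."""
    return (abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
            and all(-tol <= a <= C + tol for a in alpha))


X = [(1.0,), (2.0,), (-1.0,), (-2.0,)]
y = [1, 1, -1, -1]
Q = build_q(X, y, linear_kernel)
alpha = [0.5, 0.0, 0.5, 0.0]
print(is_dual_feasible(alpha, y, C=10.0), dual_objective(alpha, Q))  # True -0.5
```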
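The SVM Dual
$\min_\alpha \frac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $\alpha^T y = 0$, $0 \le \alpha \le C \mathbf{1}$
General decomposition algorithm:
1. Choose q variables (the working set B). Fix the rest.
2. Change the unfixed variables, satisfying the constraints, to decrease the objective function (a small problem).
3. Repeat.
Open questions: What is the minimum q? (Changing a single $\alpha_i$ alone would violate $\alpha^T y = 0$, so q ≥ 2.) Until when? What is the best set B?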
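A minimal sketch of the decomposition loop with the smallest working set, q = 2. This SMO-style special case is our substitution for illustration and is not the SVMlight implementation. With $g = Q\alpha$ and $s_i = y_i (g_i - 1)$, it picks i with the largest $s_i$ among variables that may move in direction $-y_i$, and j with the smallest $s_j$ among those that may move in direction $+y_j$. It solves the two-variable subproblem in closed form, clipped to $[0, C]$, and updates g incrementally. It stops when $s_i - s_j \le$ `tol`. All names (`smo_train`, `select_pair`, `tol`, `max_iter`) and the toy data are hypothetical. The sketch precomputes the full kernel matrix and is meant for small problems only.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def linear_kernel(a: Point, b: Point) -> float:
    return sum(p * q for p, q in zip(a, b))


def _clip(value: float, C: float, eps: float = 1e-12) -> float:
    """Clip to [0, C], snapping values within eps of a bound onto the bound."""
    if value <= eps:
        return 0.0
    if value >= C - eps:
        return C
    return value


def select_pair(alpha, g, y, C) -> Tuple[int, int, float, List[float]]:
    """Most violating pair (i, j), its violation s_i - s_j, and the vector s."""
    n = len(alpha)
    s = [y[k] * (g[k] - 1.0) for k in range(n)]
    # up: moving alpha_k along -y_k stays in [0, C]; low: moving along +y_k does.
    up = [k for k in range(n) if (alpha[k] > 0 if y[k] > 0 else alpha[k] < C)]
    low = [k for k in range(n) if (alpha[k] < C if y[k] > 0 else alpha[k] > 0)]
    i = max(up, key=lambda k: s[k])
    j = min(low, key=lambda k: s[k])
    return i, j, s[i] - s[j], s


def smo_train(X, y, C=1.0, kernel: Kernel = linear_kernel, tol=1e-6, max_iter=10_000):
    """Decomposition with working set size q = 2. Needs both classes in y."""
    n = len(X)
    K = [[kernel(X[u], X[v]) for v in range(n)] for u in range(n)]
    alpha = [0.0] * n  # feasible start: alpha^T y = 0
    g = [0.0] * n      # g = Q alpha
    for _ in range(max_iter):
        i, j, violation, _ = select_pair(alpha, g, y, C)
        if violation <= tol:  # KKT conditions hold up to tol
            break
        # Direction d_i = -y_i, d_j = +y_j keeps y^T alpha = 0; eta = d^T Q d.
        eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
        tau_max = min(alpha[i] if y[i] > 0 else C - alpha[i],
                      C - alpha[j] if y[j] > 0 else alpha[j])
        tau = tau_max if eta <= 1e-12 else min(tau_max, violation / eta)
        old_i, old_j = alpha[i], alpha[j]
        alpha[i] = _clip(old_i - y[i] * tau, C)
        alpha[j] = _clip(old_j + y[j] * tau, C)
        di, dj = alpha[i] - old_i, alpha[j] - old_j
        for k in range(n):  # incremental gradient update with B = {i, j}
            g[k] += y[k] * (y[i] * K[k][i] * di + y[j] * K[k][j] * dj)
    i, j, _, s = select_pair(alpha, g, y, C)
    b = -(s[i] + s[j]) / 2.0  # a value of lambda^eq, which acts as the bias
    return alpha, b


def decision(x: Point, X, y, alpha, b, kernel: Kernel = linear_kernel) -> float:
    """f(x) = sum_j alpha_j y_j k(x_j, x) + b."""
    return sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X)) + b


X = [(1.0,), (2.0,), (-1.0,), (-2.0,)]
y = [1, 1, -1, -1]
alpha, b = smo_train(X, y, C=10.0)
print([round(a, 6) for a in alpha])                                 # [0.5, 0.0, 0.5, 0.0]
print([1 if decision(x, X, y, alpha, b) > 0 else -1 for x in X])  # [1, 1, -1, -1]
```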
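KKT Conditions
For $\min_\alpha \frac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $\alpha^T y = 0$, $0 \le \alpha \le C \mathbf{1}$, introduce multipliers $\lambda^{eq}$ (for $\alpha^T y = 0$), $\lambda_i^{lo}$ (for $\alpha_i \ge 0$) and $\lambda_i^{up}$ (for $\alpha_i \le C$), and write $g(\alpha) = Q\alpha$:
$-\mathbf{1} + g(\alpha) + \lambda^{eq} y - \lambda^{lo} + \lambda^{up} = 0$
$\lambda_i^{lo} \alpha_i = 0$, $\lambda_i^{up} (\alpha_i - C) = 0$
$\lambda_i^{lo} \ge 0$, $\lambda_i^{up} \ge 0$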
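KKT Conditions
Applying complementary slackness to each component of $-\mathbf{1} + g(\alpha) + \lambda^{eq} y - \lambda^{lo} + \lambda^{up} = 0$:
For all $0 < \alpha_i < C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i = 0$
For all $\alpha_i = 0$: $-1 + g_i(\alpha) + \lambda^{eq} y_i - \lambda_i^{lo} = 0$, i.e. $-1 + g_i(\alpha) + \lambda^{eq} y_i \ge 0$
For all $\alpha_i = C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i + \lambda_i^{up} = 0$, i.e. $-1 + g_i(\alpha) + \lambda^{eq} y_i \le 0$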
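These case-wise conditions can be checked numerically, which suggests one answer to "until when?". Write $G_i = -1 + g_i(\alpha)$. Each case restricts $\lambda^{eq}$: a free $\alpha_i$ pins it to $-y_i G_i$, and a bound $\alpha_i$ gives a one-sided inequality. The KKT conditions hold exactly when a common $\lambda^{eq}$ exists. The function names and tolerance below are hypothetical. The example values of $\alpha$ and $g = Q\alpha$ come from 1-D toy data $x = 1, 2, -1, -2$ with a linear kernel.

```python
from typing import Sequence, Tuple


def lambda_eq_interval(alpha: Sequence[float], g: Sequence[float],
                       y: Sequence[int], C: float) -> Tuple[float, float]:
    """Interval (lower, upper) of lambda^eq values compatible with the KKT cases.

    With G_i = -1 + g_i:
      0 < alpha_i < C : lambda^eq = -y_i G_i
      alpha_i = 0     : G_i + lambda^eq y_i >= 0
      alpha_i = C     : G_i + lambda^eq y_i <= 0
    """
    lower, upper = float("-inf"), float("inf")
    for a, gi, yi in zip(alpha, g, y):
        v = -yi * (gi - 1.0)  # the value -y_i G_i
        free = 0.0 < a < C
        # lambda^eq >= v: free, or (alpha_i = 0, y_i = +1), or (alpha_i = C, y_i = -1)
        if free or (a <= 0.0 and yi > 0) or (a >= C and yi < 0):
            lower = max(lower, v)
        # lambda^eq <= v: free, or (alpha_i = 0, y_i = -1), or (alpha_i = C, y_i = +1)
        if free or (a <= 0.0 and yi < 0) or (a >= C and yi > 0):
            upper = min(upper, v)
    return lower, upper


def kkt_satisfied(alpha, g, y, C, tol: float = 1e-6) -> bool:
    """True when some lambda^eq satisfies every case up to tol."""
    lower, upper = lambda_eq_interval(alpha, g, y, C)
    return lower <= upper + tol


y = [1, 1, -1, -1]
print(kkt_satisfied([0.5, 0.0, 0.5, 0.0], [1.0, 2.0, 1.0, 2.0], y, C=10.0))  # True
print(kkt_satisfied([0.0] * 4, [0.0] * 4, y, C=10.0))                        # False
```

The width of the violation, lower minus upper, is a natural stopping measure, and at the optimum the interval contains the bias b of the decision function.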
KKT Conditions
$g_i(\alpha) = y_i \sum_j \alpha_j y_j k(x_i, x_j)$
After an iteration that changes only the variables in the working set B, the gradient can be updated incrementally:
$g_i(\alpha^t) = g_i(\alpha^{t-1}) + y_i \sum_{j \in B} (\alpha_j^t - \alpha_j^{t-1}) y_j k(x_i, x_j)$
What is the best set of q variables (the working set)?
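The incremental update needs only the kernel columns of the working set, about $n|B|$ kernel evaluations instead of $n^2$. The sketch below checks that it agrees with recomputing $g$ from scratch. The RBF kernel, the data, and the $\alpha$ values (chosen so that $\alpha^T y = 0$ before and after) are our own illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def rbf(a: Point, b: Point, gamma: float = 0.5) -> float:
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def full_gradient(alpha, X, y, kernel: Kernel) -> List[float]:
    """g_i(alpha) = y_i sum_j alpha_j y_j k(x_i, x_j): n^2 kernel evaluations."""
    n = len(alpha)
    return [y[i] * sum(alpha[j] * y[j] * kernel(X[i], X[j]) for j in range(n))
            for i in range(n)]


def incremental_gradient(g_prev, alpha_prev, alpha_new, B, X, y,
                         kernel: Kernel) -> List[float]:
    """Update g using only the working-set columns j in B: n |B| kernel evaluations."""
    return [g_prev[i] + y[i] * sum((alpha_new[j] - alpha_prev[j]) * y[j] * kernel(X[i], X[j])
                                   for j in B)
            for i in range(len(g_prev))]


X = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0), (-1.0, 0.5)]
y = [1, 1, -1, -1]
alpha_prev = [0.2, 0.0, 0.1, 0.1]
alpha_new = [0.2, 0.3, 0.4, 0.1]  # only the variables in B = {1, 2} changed
B = [1, 2]
g_prev = full_gradient(alpha_prev, X, y, rbf)
g_inc = incremental_gradient(g_prev, alpha_prev, alpha_new, B, X, y, rbf)
g_full = full_gradient(alpha_new, X, y, rbf)
print(max(abs(a - b) for a, b in zip(g_inc, g_full)) < 1e-12)  # True
```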
Working Set
$g_i(\alpha) = y_i \sum_j \alpha_j y_j k(x_i, x_j)$
Let d be a feasible direction of descent, so that $\alpha^t = \alpha^{t-1} + d$.
Choose the steepest descent direction for the first-order approximation of the objective, $(-\mathbf{1} + g(\alpha^{t-1}))^T d$.
Working Set
$\min_d (-\mathbf{1} + g(\alpha^{t-1}))^T d$
s.t. $y^T d = 0$
$d_i \ge 0$ if $\alpha_i^{t-1} = 0$
$d_i \le 0$ if $\alpha_i^{t-1} = C$
$-1 \le d_i \le 1$
$|\{i : d_i \ne 0\}| = q$
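Working Set
Let $s_i = y_i (-1 + g_i(\alpha^{t-1}))$ and sort the variables by decreasing $s_i$.
Choose q/2 variables from the top for which $0 < \alpha_i^{t-1} < C$ or $d_i = -y_i$ is a feasible direction.
Choose q/2 variables from the bottom for which $0 < \alpha_i^{t-1} < C$ or $d_i = y_i$ is a feasible direction.
(With $d_i = -y_i$ for the top half and $d_i = y_i$ for the bottom half, $y^T d = 0$ and the linear objective changes by $\sum_{\text{bottom}} s_i - \sum_{\text{top}} s_i$, which is negative whenever the chosen top values exceed the chosen bottom values.)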
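The selection rule above can be sketched as follows. This is our reading of the rule as stated, not SVMlight's source code. It assumes q is even and that enough eligible variables exist, and it skips an index already taken from the top so that no variable is chosen twice. The function name `select_working_set` is hypothetical. It returns (index, $d_i$) pairs.

```python
from typing import List, Sequence, Tuple


def select_working_set(alpha: Sequence[float], g: Sequence[float], y: Sequence[int],
                       C: float, q: int) -> List[Tuple[int, int]]:
    """Pick q/2 indices from the top of the s-ordering (d_i = -y_i)
    and q/2 from the bottom (d_i = +y_i)."""
    n = len(alpha)
    s = [y[i] * (g[i] - 1.0) for i in range(n)]
    order = sorted(range(n), key=lambda i: s[i], reverse=True)  # decreasing s_i

    def can_move(i: int, direction: int) -> bool:
        # direction +1 increases alpha_i, -1 decreases it; alpha_i must stay in [0, C].
        # A free alpha_i (0 < alpha_i < C) can move either way.
        return alpha[i] < C if direction > 0 else alpha[i] > 0

    top: List[int] = []
    for i in order:
        if len(top) == q // 2:
            break
        if can_move(i, -y[i]):
            top.append(i)
    bottom: List[int] = []
    for i in reversed(order):
        if len(bottom) == q // 2:
            break
        if i not in top and can_move(i, y[i]):
            bottom.append(i)
    return [(i, -y[i]) for i in top] + [(i, y[i]) for i in bottom]


y = [1, 1, -1, -1]
print(select_working_set([0.0] * 4, [0.0] * 4, y, C=1.0, q=2))  # [(2, 1), (1, 1)]
```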
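Shrinking
Recall: for all $0 < \alpha_i < C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i = 0$; for all $\alpha_i = 0$: $-1 + g_i(\alpha) + \lambda^{eq} y_i - \lambda_i^{lo} = 0$; for all $\alpha_i = C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i + \lambda_i^{up} = 0$.
If $\lambda_i^{lo} > 0$ or $\lambda_i^{up} > 0$ for n consecutive iterations, $\alpha_i$ is likely to stay at its bound, so drop it from the problem (temporarily).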
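A sketch of the shrinking bookkeeping, under our own assumptions. It takes a current estimate of $\lambda^{eq}$ as input, treats a multiplier as "positive" only if it exceeds a hypothetical threshold `eps`, and never shrinks free variables. The name `update_shrinking` and its parameters are illustrative. Because dropping is temporary, a real solver would re-check the KKT conditions on all variables before terminating, and that step is not shown here.

```python
from typing import List, Sequence


def update_shrinking(alpha: Sequence[float], g: Sequence[float], y: Sequence[int],
                     C: float, lam_eq: float, counters: List[int],
                     n_consecutive: int, eps: float = 1e-3) -> List[int]:
    """Update per-variable counters in place and return the indices still active."""
    active = []
    for i in range(len(alpha)):
        r = -1.0 + g[i] + lam_eq * y[i]
        if alpha[i] <= 0.0:
            at_bound_strict = r > eps    # lambda_i^lo = r
        elif alpha[i] >= C:
            at_bound_strict = -r > eps   # lambda_i^up = -r
        else:
            at_bound_strict = False      # free variables are never shrunk
        counters[i] = counters[i] + 1 if at_bound_strict else 0
        if counters[i] < n_consecutive:
            active.append(i)
    return active


alpha = [0.5, 0.0, 0.5, 0.0]
g = [1.0, 2.0, 1.0, 2.0]
y = [1, 1, -1, -1]
counters = [0, 0, 0, 0]
for _ in range(3):
    active = update_shrinking(alpha, g, y, C=10.0, lam_eq=0.0,
                              counters=counters, n_consecutive=3)
print(active)  # [0, 2]: alpha_1 and alpha_3 stayed at 0 with lambda^lo = 1 for 3 iterations
```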
Caching
Kernel evaluations can be expensive.
Cache them in a least-recently-used (LRU) manner.
Choose q' variables for which cached kernel values are available.
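A minimal LRU cache of kernel rows, built on `collections.OrderedDict`, plus one possible reading of "choose q' variables where cache available": prefer candidates whose rows are already cached. The class `KernelRowCache`, the helper `pick_cached_first`, and the capacity are hypothetical and not taken from SVMlight.

```python
from collections import OrderedDict
from typing import Callable, List, Sequence

Point = Sequence[float]


class KernelRowCache:
    """LRU cache of kernel rows k(x_i, .) over a fixed training set."""

    def __init__(self, X: Sequence[Point], kernel: Callable[[Point, Point], float],
                 capacity: int) -> None:
        self.X = X
        self.kernel = kernel
        self.capacity = capacity
        self.rows: "OrderedDict[int, List[float]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def is_cached(self, i: int) -> bool:
        return i in self.rows

    def row(self, i: int) -> List[float]:
        if i in self.rows:
            self.hits += 1
            self.rows.move_to_end(i)  # mark as most recently used
            return self.rows[i]
        self.misses += 1
        values = [self.kernel(self.X[i], xj) for xj in self.X]  # n kernel evaluations
        self.rows[i] = values
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # evict the least recently used row
        return values


def pick_cached_first(candidates: Sequence[int], cache: KernelRowCache,
                      q_prime: int) -> List[int]:
    """Among candidates (best first), prefer those whose kernel rows are cached."""
    cached = [i for i in candidates if cache.is_cached(i)]
    uncached = [i for i in candidates if not cache.is_cached(i)]
    return (cached + uncached)[:q_prime]


def dot(a: Point, b: Point) -> float:
    return sum(p * q for p, q in zip(a, b))


cache = KernelRowCache([(0.0,), (1.0,), (2.0,)], dot, capacity=2)
for i in (0, 1, 0, 2, 1):
    cache.row(i)
print(cache.hits, cache.misses)                   # 1 4
print(pick_cached_first([0, 1, 2], cache, 2))     # [1, 2]
```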
Results
Those who have used SVMlight: you know that it works very well.
Those who haven't used SVMlight: it works very well. See the paper. Download it.
Questions???