Convex Optimization: Part 1 of Chapter 7 Discussion
Presenter: Brian Quanz
A KTEC Center of Excellence

Presentation transcript:

Slide 1: Convex Optimization: Part 1 of Chapter 7 Discussion
Presenter: Brian Quanz

Slide 2: About today's discussion
Chapter 7 has no separate discussion of convex optimization; it is discussed alongside the SVM problems. Instead:
Today: discuss convex optimization.
Next week: discuss some specific convex optimization problems from the text, e.g. SVMs.

Slide 3: About today's discussion
Mostly follows an alternate text: Convex Optimization by Stephen Boyd and Lieven Vandenberghe. Material here is borrowed from the book and related course notes, including some figures and equations. The book is available online, as are Stephen Boyd's course lecture videos. A corresponding convex optimization tool, CVX, is discussed later.

Slide 4: Overview
Why convex? What is convex?
Key examples: linear and quadratic programming.
Key mathematical ideas to discuss: Lagrange duality and the KKT conditions.
Brief concept of interior point methods.
CVX: convex optimization made easy.

Slide 5: Mathematical Optimization
All learning is some optimization problem, so we stick to the canonical form.
x = (x_1, x_2, ..., x_p) are the optimization variables, with x* an optimal point; f_0 : R^p -> R is the objective function, and each f_i : R^p -> R is a constraint function.
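A sketch of the canonical form in Boyd and Vandenberghe's notation (equality constraints h_i are included here too, since they reappear with the Lagrangian on slide 30):
\[
\begin{aligned}
\min_{x} \quad & f_0(x) \\
\text{s.t.} \quad & f_i(x) \le 0, \quad i = 1, \dots, m, \\
& h_i(x) = 0, \quad i = 1, \dots, q.
\end{aligned}
\]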

Slide 6: Optimization Example
We are already well familiar with one: regularized regression. Start from least squares and add some constraints or penalties, giving ridge regression or the lasso.
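For concreteness, the standard formulations (the design matrix X, targets y, and weights w are the usual regression notation, not symbols taken from the slide):
\[
\begin{aligned}
\text{least squares:} \quad & \min_w \ \|y - Xw\|_2^2 \\
\text{ridge:} \quad & \min_w \ \|y - Xw\|_2^2 + \lambda \|w\|_2^2 \\
\text{lasso:} \quad & \min_w \ \|y - Xw\|_2^2 + \lambda \|w\|_1
\end{aligned}
\]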

Slide 7: Why convex optimization?
We can't solve most OPs; e.g. some are NP-hard, and even high-degree polynomial time is too slow.
Convex OPs generally have no analytic solution, but there are efficient algorithms for finding the (global) solution.
Interior point methods (basically iterated Newton) can be used, at a cost of roughly [10-100] * max{p^3, p^2 m, F} operations, where F is the cost of evaluating the objective and constraint functions.
At worst, solve with general IP methods (CVX); specialized solvers are faster.

Slide 8: What is Convex Optimization?
An OP with convex objective and constraint functions: if f_0, ..., f_m are convex, it is a convex OP, and it has an efficient solution!

Slide 9: Convex Function
Definition: the weighted mean of the function evaluated at any two points is greater than or equal to the function evaluated at the weighted mean of the two points.
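In symbols, this is the standard definition: f is convex if dom f is a convex set and, for all x, y in dom f and any weight theta in [0, 1],
\[
f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y).
\]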

Slide 10: Convex Function
What does the definition mean? Pick any two points x, y and evaluate the function there, getting f(x), f(y). Draw the chord passing through the graph points (x, f(x)) and (y, f(y)). The function is convex if, evaluated at any point along the line between x and y, it lies on or below that chord.

Slide 11: Convex Function (figure)

Slide 12: Convex Function (figure: convex!)

Slide 13: Convex Function (figure: not convex!)

Slide 14: Convex Function
It is easy to see why convexity allows for an efficient solution: just "slide" down the objective function as far as possible, and you will reach a minimum.

Slide 15: A Local Optimum is Global (simple proof)
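A sketch of the standard argument: suppose x is a local minimum but some feasible y has f(y) < f(x). The points z = theta*y + (1-theta)*x with small theta > 0 lie arbitrarily close to x and remain feasible (the feasible set of a convex OP is convex), yet by convexity
\[
f(z) \le \theta f(y) + (1-\theta) f(x) < f(x),
\]
contradicting local optimality of x.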

Slide 16: Convex vs. Non-convex
Example: convex, the minimum is easy to find. Affine functions are the border case of convexity.

Slide 17: Convex vs. Non-convex
Example: non-convex, easy to get stuck in a local minimum. We can't rely on local search techniques alone.

Slide 18: Non-convex
Some non-convex problems are highly multi-modal or NP-hard. We could be forced to search over all solutions, or hope a stochastic search is successful. We cannot guarantee the best solution, and the search is inefficient. It is also harder to make performance guarantees with approximate solutions.

Slide 19: Determine/Prove Convexity
Can use the definition directly (prove that it holds).
If the function restricted to any line is convex, the function is convex.
If twice differentiable, show the Hessian is positive semidefinite (stated precisely below).
Often easier to:
Convert to a known convex OP, e.g. a QP, LP, SOCP, or SDP, often of a more general form.
Combine known convex functions (building blocks) using operations that preserve convexity; a similar idea to building kernels.
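The second-order condition mentioned above: a twice-differentiable f with convex domain is convex if and only if its Hessian is positive semidefinite everywhere,
\[
\nabla^2 f(x) \succeq 0 \quad \text{for all } x \in \operatorname{dom} f.
\]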

Slide 20: Some common convex OPs
Of particular interest for this book and chapter: linear programming (LP) and quadratic programming (QP).
LP: affine objective function, affine constraints; e.g. the LP SVM, portfolio management.
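The LP standard form, writing the inequality-constraint data as G, h and the equality-constraint data as A, b (a common convention, not symbols from the slide):
\[
\begin{aligned}
\min_x \quad & c^T x \\
\text{s.t.} \quad & G x \preceq h, \quad A x = b.
\end{aligned}
\]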

Slide 21: LP Visualization (figure)
Note: the constraints form the feasible set; for an LP, a polyhedron.

Slide 22: Quadratic Program
QP: quadratic objective, affine constraints; an LP is a special case. Many SVM problems result in a QP, as do regression problems. If the constraint functions are also quadratic, we have a Quadratically Constrained Quadratic Program (QCQP).
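The QP standard form under the same conventions (P must be positive semidefinite for the problem to be convex):
\[
\begin{aligned}
\min_x \quad & \tfrac{1}{2} x^T P x + q^T x \\
\text{s.t.} \quad & G x \preceq h, \quad A x = b, \qquad P \succeq 0.
\end{aligned}
\]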

Slide 23: QP Visualization (figure)

Slide 24: Second-Order Cone Program
Setting A_i = 0 results in an LP; setting c_i = 0 results in a QCQP. Each constraint requires an affine function of x to lie in a second-order cone.
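The SOCP form these remarks refer to, in Boyd and Vandenberghe's notation:
\[
\begin{aligned}
\min_x \quad & f^T x \\
\text{s.t.} \quad & \|A_i x + b_i\|_2 \le c_i^T x + d_i, \quad i = 1, \dots, m, \\
& F x = g.
\end{aligned}
\]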

Slide 25: Second-Order Cone (Boundary) in R^3 (figure)

Slide 26: Semidefinite Programming
Linear matrix inequality (LMI) constraints. Many problems can be expressed using LMIs, including LP and SOCP.
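A standard SDP form with an LMI constraint, where the F_i and G are given symmetric matrices (again following Boyd and Vandenberghe):
\[
\begin{aligned}
\min_x \quad & c^T x \\
\text{s.t.} \quad & x_1 F_1 + \cdots + x_n F_n + G \preceq 0, \\
& A x = b.
\end{aligned}
\]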

Slide 27: Semidefinite Programming (figure)

Slide 28: Building Convex Functions
From simple convex functions to complex ones, using operations that preserve convexity:
Nonnegative weighted sum
Composition with an affine function
Pointwise maximum and supremum
Composition
Minimization
Perspective (g(x,t) = t f(x/t))

Slide 29: Verifying Convexity Remarks
For more detail and expansion, consult the referenced text, Convex Optimization. Geometric programs are also convex and can be handled with a series of SDPs (details skipped here). CVX converts the problem to an SOCP or SDP (or a series of these) and uses an efficient solver.

Slide 30: Lagrangian
For the standard form problem, the Lagrangian L augments the objective with weighted constraint terms; lambda and nu are the Lagrange multipliers (dual variables).
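Reconstructed from the standard form on slide 5, the Lagrangian is:
\[
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{q} \nu_i h_i(x).
\]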

Slide 31: Lagrange Dual Function
The Lagrange dual function is found by minimizing L with respect to the primal variables. Often we can take the gradient of L w.r.t. the primal variables and set it equal to 0 (as in the SVM derivation).
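In symbols, the dual function is the infimum of the Lagrangian over the primal variable:
\[
g(\lambda, \nu) = \inf_{x} L(x, \lambda, \nu).
\]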

Slide 32: Lagrange Dual Function
Note: the Lagrange dual function is the pointwise infimum of a family of affine functions of (lambda, nu). Thus g is concave even if the problem is not convex.

Slide 33: Lagrange Dual Function
The Lagrange dual function provides a lower bound on the objective value at the solution.
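Concretely, for any lambda >= 0 and any feasible point x~, the constraint terms in L are non-positive, so
\[
g(\lambda, \nu) \le L(\tilde{x}, \lambda, \nu) \le f_0(\tilde{x}), \quad \text{hence} \quad g(\lambda, \nu) \le p^*.
\]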

Slide 34: Lagrangian as Linear Approximation, Lower Bound
A simple interpretation of the Lagrangian: we could incorporate the constraints into the objective as indicator functions, infinity if violated and 0 otherwise. In the Lagrangian we instead use a "soft" linear approximation to the indicator functions; it is an under-estimator, since for lambda_i >= 0 the linear term lambda_i * u never exceeds the indicator's value of 0 on the feasible side.

Slide 35: Lagrange Dual Problem
Why not make the lower bound the best possible? That gives the dual problem. It is always a convex optimization problem (even when the primal is non-convex). Weak duality: d* <= p* (we have already seen this).
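The dual problem referred to above is
\[
\begin{aligned}
\max_{\lambda, \nu} \quad & g(\lambda, \nu) \\
\text{s.t.} \quad & \lambda \succeq 0,
\end{aligned}
\]
with optimal value d*; it is convex because g is concave and the constraint is affine.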

Slide 36: Strong Duality
If d* = p*, strong duality holds. It does not hold in general. Slater's theorem: if the problem is convex and a strictly feasible point exists, then strong duality holds! (The proof is too involved; refer to the text.) Consequently, for convex problems we can use the dual problem to find the solution.

Slide 37: Complementary Slackness
When strong duality holds, f_0(x*) is sandwiched in a chain that starts and ends at f_0(x*), so the last two inequalities must be equalities; simple! (The middle step holds by definition of g, and the final step because the constraints are satisfied at x*.)
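The sandwich argument written out, with x* primal optimal and (lambda*, nu*) dual optimal:
\[
f_0(x^*) = g(\lambda^*, \nu^*) = \inf_x L(x, \lambda^*, \nu^*) \le L(x^*, \lambda^*, \nu^*) = f_0(x^*) + \sum_{i=1}^m \lambda_i^* f_i(x^*) + \sum_{i=1}^q \nu_i^* h_i(x^*) \le f_0(x^*).
\]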

Slide 38: Complementary Slackness
This means the sum of the terms lambda_i* f_i(x*) must equal zero; since each term is non-positive, each term must itself be zero, which is complementary slackness: whenever a constraint is non-active, the corresponding multiplier is zero.
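In symbols:
\[
\sum_{i=1}^{m} \lambda_i^* f_i(x^*) = 0 \quad \Longrightarrow \quad \lambda_i^* f_i(x^*) = 0, \quad i = 1, \dots, m.
\]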

Slide 39: Complementary Slackness
This can also be described by: lambda_i* > 0 implies f_i(x*) = 0, and f_i(x*) < 0 implies lambda_i* = 0. Since there are usually only a few active constraints at the solution (see the geometry), the dual variable lambda is often sparse. Note: in general there is no guarantee of this.

Slide 40: Complementary Slackness
As we will see, this is why support vector machines produce a solution involving only the key support vectors. These come from the dual problem: constraints correspond to points, and complementary slackness ensures only the "active" points are kept.

Slide 41: Complementary Slackness
However, avoid common misconceptions when it comes to the SVM and complementary slackness! E.g. if a Lagrange multiplier is 0, the constraint could still be active (the correspondence is not a bijection). This means the implications run only one way: a positive multiplier forces the constraint to be active, but a zero multiplier does not force it to be inactive.

Slide 42: KKT Conditions
The KKT conditions are then just the name we give to the set of conditions required at the solution (basically, a list of what we know). The KKT conditions play an important role: they can sometimes be used to find the solution analytically; otherwise, many methods can be thought of as ways of solving the KKT conditions.

Slide 43: KKT Conditions
Again, given strong duality and assuming differentiability, the gradient of L with respect to x must be 0 at x* (since x* minimizes L(x, lambda*, nu*)). Thus, putting it all together, for non-convex problems we have the following necessary conditions.

Slide 44: KKT Conditions – non-convex
Necessary conditions:
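Reconstructed from the preceding slides, the KKT conditions are (numbered for reference on the next slide):
\[
\begin{aligned}
&(1)\quad f_i(x^*) \le 0, \quad i = 1, \dots, m && \text{(primal feasibility)} \\
&(2)\quad h_i(x^*) = 0, \quad i = 1, \dots, q && \text{(primal feasibility)} \\
&(3)\quad \lambda_i^* \ge 0, \quad i = 1, \dots, m && \text{(dual feasibility)} \\
&(4)\quad \lambda_i^* f_i(x^*) = 0, \quad i = 1, \dots, m && \text{(complementary slackness)} \\
&(5)\quad \nabla f_0(x^*) + \textstyle\sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^q \nu_i^* \nabla h_i(x^*) = 0 && \text{(stationarity)}
\end{aligned}
\]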

Slide 45: KKT Conditions – convex
For convex problems, the KKT conditions are also sufficient (here x~, lambda~, nu~ denote a candidate point satisfying them):
Conditions 1+2: x~ is feasible.
Condition 3: L(x, lambda~, nu~) is convex in x.
Condition 5: x~ minimizes L(x, lambda~, nu~), so g(lambda~, nu~) = L(x~, lambda~, nu~); with condition 4 this equals f_0(x~), so strong duality holds and the point is optimal.

Slide 46: Brief description of interior point methods
Solve a series of equality-constrained problems with Newton's method, approximating the inequality constraints with a log-barrier (an approximation of the indicator function).
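The barrier subproblem, in the standard formulation (t > 0 is the barrier parameter):
\[
\min_x \quad t\, f_0(x) - \sum_{i=1}^{m} \log\bigl(-f_i(x)\bigr) \quad \text{s.t.} \quad A x = b.
\]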

Slide 47: Brief description of interior point methods
As t gets larger, the approximation becomes better.

Slide 48: Central Path Idea (figure)

Slide 49: CVX: Convex Optimization Made Easy
CVX is a Matlab toolbox that lets you flexibly express convex optimization problems. It translates them to a general form (an SOCP, SDP, or a series of these) and uses an efficient solver. All you have to do is design the convex optimization problem; plug it into CVX and you have a first version of the algorithm implemented. A more specialized solver may be necessary for some applications.

Slide 50: CVX - Examples
Quadratic program: given H, f, A, and b:

cvx_begin
    variable x(n)                 % optimization variable of length n
    minimize( x'*H*x + f'*x )     % quadratic objective; H must be PSD for convexity
    subject to
        A*x >= b                  % affine (elementwise) constraints
cvx_end

Slide 51: CVX - Examples
An SVM-type formulation with an L1-norm penalty:

cvx_begin
    variable w(p)                 % weight vector
    variable b(1)                 % bias
    variable e(n)                 % slack variables
    expression by(n)
    by = train_label.*b;          % bias signed by each training label
    minimize( w'*(L + I)*w + C*sum(e) + l1_lambda*norm(w,1) )
    subject to
        X*w + by >= a - e;        % margin constraints with slack
        e >= ec;                  % slack lower bound (ec given elsewhere)
cvx_end

Slide 52: CVX - Examples
More complicated terms can be built with expressions:

cvx_begin
    variable w(p+1+n);
    expression q(ec);             % one term per pair with A(i,j) == 1
    ct = 1;                       % running index into q (initialization assumed)
    for i = 1:p
        for j = i:p
            if (A(i,j) == 1)
                q(ct) = max( abs(w(i))/d(i), abs(w(j))/d(j) );
                ct = ct + 1;
            end
        end
    end
    minimize( f'*w + lambda*sum(q) )
    subject to
        X*w >= a;
cvx_end

Slide 53: Questions
Questions, comments?

Slide 54: Extra proof