Optimality Conditions for Nonlinear Optimization Ashish Goel Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A.

Based on slides by Yinyu Ye

PROOFS FROM THIS LECTURE ARE NOT PART OF THE EXAM/HWs

Mathematical Optimization Problems

Form: (MOP)  min f(x)  s.t. x ∈ F.

The first question: does the problem have a feasible solution, that is, a point that satisfies all the constraints of the problem (a point in F)?

The second question: how does one recognize or certify a (local) optimal solution? For LP we answered this by developing optimality conditions from LP duality. For a problem with a general objective and general constraints we need a more general optimality condition theory.

Continuously Differentiable Functions

The objective and constraint functions are often continuously differentiable (in C¹) over certain regions; sometimes they are twice continuously differentiable (in C²). The theory distinguishes these two cases and develops first-order optimality conditions and second-order optimality conditions. For convex optimization, first-order optimality conditions suffice.

Descent Direction

Let f be a differentiable function on R^n. For a given point x ∈ R^n and a vector d ∈ R^n with ∇f(x)^T d < 0, where ∇f(x) is the gradient vector of f (the vector of partial derivatives ∂f/∂x_j), there exists a scalar τ̄ > 0 such that f(x + τd) < f(x) for all τ ∈ (0, τ̄). Such a vector d is called a descent direction at x.

If ∇f(x) ≠ 0, then ∇f(x) is the direction of steepest ascent and −∇f(x) is the direction of steepest descent at x.

Denote by D_x the set of all descent directions at x, that is, D_x = { d ∈ R^n : ∇f(x)^T d < 0 }.
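As a quick numerical illustration (not from the slides; the quadratic test function and NumPy usage are my own choices), the sketch below checks that d = −∇f(x) satisfies ∇f(x)^T d < 0 and that a small step along d decreases f:

```python
import numpy as np

# Illustrative function f(x) = (x1 - 1)^2 + (x2 - 1)^2 and its gradient
f = lambda x: np.sum((x - 1.0) ** 2)
grad_f = lambda x: 2.0 * (x - 1.0)

x = np.array([3.0, -2.0])      # any non-optimal point
d = -grad_f(x)                 # steepest-descent direction

print(grad_f(x) @ d < 0)       # True: d is a descent direction
tau = 1e-3
print(f(x + tau * d) < f(x))   # True: a small step along d decreases f
```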

Feasible Direction

At a feasible point x, the set of feasible directions is F_x := { d ∈ R^n : d ≠ 0, x + λd ∈ F for all sufficiently small λ > 0 }.

Examples:
Unconstrained: F = R^n ⇒ F_x = R^n.
Linear equality: F = { x : Ax = b } ⇒ F_x = { d : Ad = 0 }.
Linear inequality: F = { x : Ax ≥ b } ⇒ F_x = { d : a_i^T d ≥ 0 for all i ∈ B(x) }, where B(x) is the binding constraint index set B(x) := { i : a_i^T x = b_i }.
Linear equality and nonnegativity: F = { x : Ax = b, x ≥ 0 } ⇒ F_x = { d : Ad = 0, d_j ≥ 0 for all j ∈ B(x) }, where B(x) := { j : x_j = 0 }.
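A small sketch (with made-up A, b, and a candidate direction d of my own) of the binding-set test for the linear inequality case F = { x : Ax ≥ b }:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
b = np.array([1.0, 0.0])
x = np.array([1.0, 0.0])        # feasible: Ax = [1, 0] >= b, both constraints binding

binding = np.isclose(A @ x, b)  # B(x): indices of binding constraints
d = np.array([1.0, 0.5])        # candidate direction

# d is a feasible direction iff a_i^T d >= 0 for every binding row i
print(binding, np.all(A[binding] @ d >= 0))
```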

Optimality Conditions

A fundamental question in optimization: given a feasible point x, what are the necessary conditions for x to be a local optimizer? A general answer: there is no direction d at x that is both a descent direction and a feasible direction; that is, the intersection of D_x and F_x is empty.

Optimality Conditions for Unconstrained Problems

Consider the unconstrained problem, where f is differentiable on R^n:

(UP)  min f(x)  s.t. x ∈ R^n.

Theorem 1. Let x be a (local) minimizer of (UP) where f is continuously differentiable at x. Then ∇f(x) = 0.

Equivalently, the descent direction set D_x = { d ∈ R^n : ∇f(x)^T d < 0 } must be empty.
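A minimal check of Theorem 1 (on an assumed smooth convex test function of my own, minimized with SciPy's default BFGS method): at the computed minimizer the gradient estimate is numerically zero.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 + 0.5 * x[0] * x[1]

res = minimize(f, x0=np.zeros(2))   # unconstrained minimization (BFGS by default)
print(res.x)                        # the computed minimizer
print(np.round(res.jac, 4))         # gradient at res.x: approximately [0, 0]
```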

Figure: local minimizers of a one-variable function.

Quadratic Optimization

Quadratic function f(x) = x^T Q x − 2 c^T x: a minimizer or maximizer x must satisfy ∇f(x) = 2Qx − 2c = 0, that is, Qx = c.

Example expressions from the slide's two-variable figure: d_1(x) = 2 − x_1 + x_2, d_2(x) = 3 − 2x_2 + x_1.
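A sketch (with an assumed positive definite Q and vector c of my own) of solving the stationarity condition Qx = c as a linear system:

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])     # positive definite, so the stationary point is the minimizer
c = np.array([1.0, 3.0])

x_star = np.linalg.solve(Q, c)  # solve Qx = c, i.e. grad f(x) = 2Qx - 2c = 0
print(x_star, np.allclose(2 * Q @ x_star - 2 * c, 0))
```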

Linear Equality Constrained Problems

Consider the linear equality constrained problem, where f is differentiable on R^n:

(LEP)  min f(x)  s.t. Ax = b.

Theorem 2. Let x be a (local) minimizer of (LEP) where f is continuously differentiable at x. Then ∇f(x) = A^T y for some vector y = (y_1; ...; y_m) ∈ R^m, whose entries are called Lagrange or dual multipliers.

Geometric interpretation: the objective gradient is perpendicular to the constraint hyperplanes; equivalently, the objective level set is tangent to them.

Linear Equality Constrained Problems: Example

min (x_1 − 1)^2 + (x_2 − 1)^2  s.t.  x_1 + x_2 = 1.

(Figure: the objective level set is tangent to the constraint hyperplane.)

The condition ∇f(x) = A^T y gives 2(x_1 − 1) = y and 2(x_2 − 1) = y, so x_1 = x_2 = (y + 2)/2. Substituting into the constraint, (y + 2)/2 + (y + 2)/2 = 1, hence y = −1 and x_1 = x_2 = 1/2.
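For this example, the optimality conditions form a 3×3 linear system in (x_1, x_2, y); a sketch solving it with NumPy recovers the values above:

```python
import numpy as np

# Unknowns ordered (x1, x2, y); equations: 2(x1 - 1) = y, 2(x2 - 1) = y, x1 + x2 = 1
M = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
rhs = np.array([2.0, 2.0, 1.0])

x1, x2, y = np.linalg.solve(M, rhs)
print(x1, x2, y)                      # 0.5 0.5 -1.0
```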

Farkas Lemma: Alternative Systems

Consider the primal feasible system {x : Ax = b, x ≥ 0}. If it is infeasible, then the dual must be unbounded, that is, there exists a y in the system {y : A^T y ≤ 0, b^T y > 0}. The reverse is also true. These two systems form an alternative pair: one and only one of the two is feasible.

Consider the dual feasible system {y : A^T y ≤ c}. If it is infeasible, then the primal must be unbounded, that is, there exists an x in the system {x : Ax = 0, x ≥ 0, c^T x < 0}. The reverse is also true. These two systems also form an alternative pair: one and only one of the two is feasible.
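A sketch (made-up A and b, feasibility tested with scipy.optimize.linprog and a zero objective) illustrating that exactly one system in the first pair is feasible; the strict inequality b^T y > 0 is handled by normalizing to b^T y = 1:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, -1.0])             # try b = [1, 2] to see the roles reversed

# Primal system: is {x : Ax = b, x >= 0} feasible?
primal = linprog(c=np.zeros(3), A_eq=A, b_eq=b, bounds=[(0, None)] * 3)

# Alternative system {y : A^T y <= 0, b^T y > 0}, normalized to b^T y = 1
alt = linprog(c=np.zeros(2), A_ub=A.T, b_ub=np.zeros(3),
              A_eq=b.reshape(1, -1), b_eq=[1.0], bounds=[(None, None)] * 2)

print(primal.status == 0, alt.status == 0)   # exactly one True
```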

Sketch of Proof

If x is a local optimizer, then the intersection of the descent and feasible direction sets at x must be empty; that is, the system Ad = 0, ∇f(x)^T d < 0 has no solution d. Equivalently, there are no d′ ≥ 0 and d″ ≥ 0 such that A(d′ − d″) = 0 and ∇f(x)^T (d′ − d″) < 0. By the alternative system theorem, the alternative system must have a solution: there is y ∈ R^m such that A^T y ≤ ∇f(x) and −A^T y ≤ −∇f(x), that is, ∇f(x) = A^T y.

Linear Inequality Constrained Problems

Consider the linear inequality constrained problem, where f is differentiable on R^n:

(LIP)  min f(x)  s.t. Ax ≥ b.

Theorem 3. Let x be a (local) minimizer of (LIP) where f is continuously differentiable at x. Then ∇f(x) = A^T y, y ≥ 0, and y_i (Ax − b)_i = 0 for all i = 1, …, m, for some vector y = (y_1; ...; y_m) ∈ R^m, whose entries are called Lagrange or dual multipliers.

Geometric interpretation: the objective gradient is a conic combination of the normal directions of the binding/active constraint hyperplanes.

Linear Inequality Constrained Problems: Example (constraint not binding)

min (x_1 − 1)^2 + (x_2 − 1)^2  s.t.  x_1 + x_2 ≥ 1.

If the constraint were binding, the conditions 2(x_1 − 1) = y, 2(x_2 − 1) = y, (y + 2)/2 + (y + 2)/2 = 1 would give y = −1 < 0, violating y ≥ 0. So the constraint cannot be binding at the optimum: y = 0 and the unconstrained minimizer x = (1, 1), which is feasible, is optimal.

Linear Inequality Constrained Problems: Example (constraint binding)

min (x_1 − 1)^2 + (x_2 − 1)^2  s.t.  −x_1 − x_2 ≥ −1.

Here the constraint is binding: 2(x_1 − 1) = −y, 2(x_2 − 1) = −y, (−y + 2)/2 + (−y + 2)/2 = 1 give y = 1 ≥ 0, so the conditions of Theorem 3 hold at x = (1/2, 1/2) with multiplier y = 1.

Sketch of Proof

Linear Equality and Nonnegativity Constrained Problems

Consider the linear equality and nonnegativity constrained problem, where f is differentiable on R^n:

(LENP)  min f(x)  s.t. Ax = b, x ≥ 0.

Theorem 4. Let x be a (local) minimizer of (LENP) where f is continuously differentiable at x. Then ∇f(x) − A^T y ≥ 0 and x_j (∇f(x) − A^T y)_j = 0 for all j = 1, …, n, for some vector y = (y_1; ...; y_m) ∈ R^m, whose entries are called Lagrange or dual multipliers.

Optimality Conditions for Linearly Constrained Problems

(LEP)  min f(x)  s.t. Ax = b:
  ∇f(x) = A^T y.

(LIP)  min f(x)  s.t. Ax ≥ b:
  ∇f(x) = A^T y, y ≥ 0, y_i (Ax − b)_i = 0 for i = 1, …, m.

(LENP)  min f(x)  s.t. Ax = b, x ≥ 0:
  ∇f(x) − A^T y ≥ 0, x_j (∇f(x) − A^T y)_j = 0 for j = 1, …, n.

Here y = (y_1; ...; y_m) ∈ R^m is the Lagrange or dual multiplier vector. These conditions are also called the KKT (Karush-Kuhn-Tucker) conditions.
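A hedged sketch of how one might verify these conditions numerically for the (LIP) case, using the binding-constraint example from the slides above (the function name and tolerances are my own):

```python
import numpy as np

def check_kkt_lip(grad_f, A, b, x, y, tol=1e-8):
    """Check the (LIP) conditions: grad f(x) = A^T y, y >= 0, y_i (Ax - b)_i = 0."""
    stationarity = np.allclose(grad_f(x), A.T @ y, atol=tol)
    primal_feas  = np.all(A @ x >= b - tol)
    dual_feas    = np.all(y >= -tol)
    compl_slack  = np.allclose(y * (A @ x - b), 0.0, atol=tol)
    return stationarity and primal_feas and dual_feas and compl_slack

# Example from the slides: min (x1-1)^2 + (x2-1)^2  s.t.  -x1 - x2 >= -1
A = np.array([[-1.0, -1.0]])
b = np.array([-1.0])
grad = lambda x: 2.0 * (x - 1.0)
print(check_kkt_lip(grad, A, b, x=np.array([0.5, 0.5]), y=np.array([1.0])))   # True
```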

These conditions are necessary, but not sufficient in general; they are both necessary and sufficient when f(x) is convex. An example of convex optimization follows.

Convex Functions

Let f(x) be a twice continuously differentiable function of one variable. Then f(x) is convex if any of the following equivalent conditions holds:
1. f′(x) is a non-decreasing function.
2. f(a x + (1 − a) y) ≤ a f(x) + (1 − a) f(y) for all x, y and all 0 ≤ a ≤ 1.
3. f″(x) is non-negative on the domain of f, i.e., wherever f is defined.

When f is a function of many variables, condition 2 still holds as stated, while conditions 1 and 3 have to be modified for higher dimensions. The sum of two convex functions is convex.
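A quick sketch of condition 3, and of its standard multivariable analog (a positive semidefinite Hessian), using NumPy on illustrative functions of my own:

```python
import numpy as np

# One variable: f(x) = x^4 is convex since f''(x) = 12 x^2 >= 0 at every sampled point
xs = np.linspace(-5.0, 5.0, 101)
print(np.all(12.0 * xs ** 2 >= 0))           # True

# Many variables: condition 3 becomes "the Hessian is positive semidefinite"
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])                   # Hessian of a convex quadratic
print(np.all(np.linalg.eigvalsh(H) >= 0))    # True: all eigenvalues are non-negative
```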

Example: the Log-Barrier Problem

min −log(x_1) − log(x_2)  s.t.  x_1 + 2x_2 = 1, x_1, x_2 ≥ 0.

The condition ∇f(x) = A^T y gives −1/x_1 = y and −1/x_2 = 2y, so x_1 = −1/y and x_2 = −1/(2y). Substituting into the constraint, (−1/y) + 2(−1/(2y)) = 1, hence y = −2, x_1 = 1/2, x_2 = 1/4.

More generally, for

min −log(x_1) − log(x_2) − … − log(x_n)  s.t.  Ax = b, x_1, x_2, …, x_n ≥ 0,

there is a y such that −1/x_j = a_j^T y for j = 1, …, n; equivalently, x_j (−a_j^T y) = 1 for all j.
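A sketch verifying the two-variable example numerically (SciPy's SLSQP with the equality constraint; the multiplier is recovered afterwards from the stationarity condition −1/x_1 = y):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: -np.log(x[0]) - np.log(x[1])
cons = {"type": "eq", "fun": lambda x: x[0] + 2.0 * x[1] - 1.0}

res = minimize(f, x0=np.array([0.4, 0.3]), method="SLSQP",
               constraints=[cons], bounds=[(1e-9, None)] * 2)

x1, x2 = res.x
y = -1.0 / x1                     # stationarity: -1/x1 = y
print(np.round([x1, x2, y], 4))   # approximately [0.5, 0.25, -2.0]
```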

Optimality Conditions for Nonlinearly Constrained Optimization

Nonlinearly Constrained Optimization Problems

(NEP)  min f(x)  s.t. h(x) = b′.
(NIP)  min f(x)  s.t. g(x) ≥ b.
(NEIP) min f(x)  s.t. h(x) = b′, g(x) ≥ b.

The vector function h(x) maps R^n to R^m and g(x) maps R^n to R^k; that is,
h(x) = (h_1(x); h_2(x); …; h_m(x)) ∈ R^m and g(x) = (g_1(x); g_2(x); …; g_k(x)) ∈ R^k.


Gradient Vector and Jacobian Matrix at x

∇f(x)^T = [∂f/∂x_1, ∂f/∂x_2, …, ∂f/∂x_n].

The Jacobian of h stacks the gradients of its components as rows:
∇h(x) = (∇h_1(x)^T; ∇h_2(x)^T; …; ∇h_m(x)^T) ∈ R^{m×n},
and similarly
∇g(x) = (∇g_1(x)^T; ∇g_2(x)^T; …; ∇g_k(x)^T) ∈ R^{k×n}.
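A minimal sketch (with a hypothetical map h : R^3 → R^2 of my own) of building the Jacobian row by row with finite differences, matching the layout above:

```python
import numpy as np

def h(x):
    # Hypothetical constraint map h : R^3 -> R^2
    return np.array([x[0] ** 2 + x[1] - 1.0,
                     x[1] * x[2]])

def jacobian(h, x, eps=1e-6):
    """Finite-difference Jacobian: row i is (grad h_i(x))^T, shape (m, n)."""
    hx = h(x)
    J = np.zeros((hx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (h(x + e) - hx) / eps
    return J

print(jacobian(h, np.array([1.0, 2.0, 3.0])))   # approximately [[2, 1, 0], [0, 3, 2]]
```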

Optimality Conditions for Nonlinearly Constrained Problems

(NEP)  min f(x)  s.t. h(x) = b′:
  ∇f(x) − ∇h(x)^T y′ = 0.

(NIP)  min f(x)  s.t. g(x) ≥ b:
  ∇f(x) − ∇g(x)^T y = 0, y ≥ 0, y_i (g(x) − b)_i = 0 for i = 1, …, k.

(NEIP) min f(x)  s.t. h(x) = b′, g(x) ≥ b:
  ∇f(x) − ∇h(x)^T y′ − ∇g(x)^T y = 0, y ≥ 0, y_i (g(x) − b)_i = 0 for i = 1, …, k.

Here y′ = (y′_1; …; y′_m) ∈ R^m and y = (y_1; …; y_k) ∈ R^k are the Lagrange multiplier vectors. These are also called the KKT conditions.
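A sketch of evaluating the (NIP) residuals on a small nonlinear example of my own (min x_1² + x_2² s.t. x_1 x_2 ≥ 1): all residuals vanish at x = (1, 1) with multiplier y = 2.

```python
import numpy as np

# (NIP) example:  min x1^2 + x2^2   s.t.  g(x) = x1 * x2 >= 1  (so b = 1)
grad_f = lambda x: 2.0 * x
g      = lambda x: np.array([x[0] * x[1]])
grad_g = lambda x: np.array([[x[1], x[0]]])      # Jacobian of g, shape (k, n)

def kkt_residuals(x, y, b=np.array([1.0])):
    r_stat = grad_f(x) - grad_g(x).T @ y         # stationarity
    r_feas = np.minimum(g(x) - b, 0.0)           # primal feasibility violation
    r_dual = np.minimum(y, 0.0)                  # dual feasibility violation
    r_comp = y * (g(x) - b)                      # complementary slackness
    return r_stat, r_feas, r_dual, r_comp

print(kkt_residuals(np.array([1.0, 1.0]), np.array([2.0])))   # all (near) zero
```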

These are the FONC (First Order Necessary Conditions).

Convex Optimization

(CVXP)  min f(x)  s.t. x ∈ F,

where f is a convex function and F is a convex set. Examples: linear programming, quadratic programming, semidefinite programming.

The problems (NEP), (NIP), and (NEIP) above are convex programs if h is linear, g is concave (i.e., −g is convex), and f is convex.

For convex programs, these optimality (KKT) conditions are both necessary and sufficient.


Optimality Conditions: Example I (figure)

Optimality Conditions: Example II (figure)

Optimality Conditions: Example III (figure)

Second Order Optimality Conditions

The fundamental idea behind the first order necessary condition (FONC) is that there is no direction d that is feasible and descent at the same time.

The fundamental idea behind the second order necessary condition (SONC) is that, even when the first order condition is satisfied at x, one must also have d^T ∇²f(x) d ≥ 0 for every feasible direction d. Consider f(x) = x² and f(x) = −x²: both satisfy the first-order necessary condition at x = 0, but only one of them satisfies the second-order condition.

The first order condition becomes sufficient (for a local minimizer) if, in addition, d^T ∇²f(x) d > 0 for every feasible direction d.
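A quick sketch contrasting f(x) = x² with f(x) = −x² at x = 0, and the multivariable analog of checking d^T ∇²f(x) d ≥ 0 via Hessian eigenvalues (the example Hessian is my own):

```python
import numpy as np

# Both f(x) = x^2 and f(x) = -x^2 have f'(0) = 0, but only the first satisfies SONC
print(2.0 >= 0)        # f''(0) = 2  for f(x) = x^2:  d * f''(0) * d >= 0 for all d
print(-2.0 >= 0)       # f''(0) = -2 for f(x) = -x^2: SONC fails, x = 0 is a maximizer

# Unconstrained multivariable analog: d^T H d >= 0 for all d iff all eigenvalues of H are >= 0
H = np.array([[2.0, 0.0],
              [0.0, -1.0]])                  # indefinite Hessian (a saddle point)
print(np.all(np.linalg.eigvalsh(H) >= 0))    # False: SONC fails
```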

When is the FONC Sufficient?

min f(x)  s.t. h(x) = b′, g(x) ≥ b.

Theorem. If f is a differentiable convex function on the feasible region and the feasible region is a convex set, then the first-order KKT optimality conditions are sufficient for global optimality.

How to check that f(x) is convex?
The Hessian matrix is positive semidefinite on the feasible region.
The epigraph is a convex set.
In this class we will only use functions where you can easily verify that the second derivative is non-negative, or that are sums of convex functions, or convex functions of a linear combination of the variables.

How to check that the feasible region is a convex set?
It is described by linear equality and inequality constraints.
h(x) is a linear function and g(x) is a concave function.

Remarks

Any problem of the form in the previous theorem is called a convex optimization problem. Convexity plays a big role in optimization and greatly simplifies the certification of optimality. Linear programming is a special case of convex optimization.

Many convex optimization modelers and solvers are available: CVX, MOSEK, CPLEX, and of course Excel.
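As a hedged illustration of such a modeler from Python (CVXPY, the Python analog of CVX; it is not mentioned on the slides), the sketch below solves the small inequality-constrained example used earlier and also reports the Lagrange multiplier:

```python
import cvxpy as cp

x = cp.Variable(2)
objective = cp.Minimize(cp.sum_squares(x - 1))   # (x1 - 1)^2 + (x2 - 1)^2
constraints = [cp.sum(x) >= 1]

prob = cp.Problem(objective, constraints)
prob.solve()

print(x.value)                     # approximately [1, 1]: the constraint is not binding
print(constraints[0].dual_value)   # approximately 0: the Lagrange multiplier
```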

Optimality Conditions for Linearly Constrained Problems

Another useful form: linear inequality and nonnegativity constraints.

(LINP)  min f(x)  s.t. Ax ≥ b, x ≥ 0:
  ∇f(x) ≥ A^T y, y ≥ 0,
  y_i (Ax − b)_i = 0 for i = 1, …, m,
  x_j (∇f(x) − A^T y)_j = 0 for j = 1, …, n.

“Convexification”

Another Interesting Example: Entropy Maximization

Assume we control a probability distribution over N outcomes, i = 1, 2, …, N, with x_i the probability of the i-th outcome.

Goal: maximize the entropy of the distribution, i.e., maximize Σ_i −x_i log x_i.

Since x_i log x_i is convex (so −x_i log x_i is concave), maximizing the entropy of a probability distribution under linear constraints is a convex optimization problem. Important for communications!

Natural constraints: x ≥ 0, Σ_i x_i = 1.
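A sketch of this model in CVXPY (my own formulation; cp.entr(x) is the elementwise concave function −x log x, and solving it assumes an exponential-cone-capable solver such as ECOS, installed with CVXPY by default). With only the simplex constraints, the maximizer is the uniform distribution.

```python
import cvxpy as cp
import numpy as np

n = 4
x = cp.Variable(n)

# Maximize the entropy  sum_i -x_i log x_i  over the probability simplex
objective = cp.Maximize(cp.sum(cp.entr(x)))
constraints = [x >= 0, cp.sum(x) == 1]

prob = cp.Problem(objective, constraints)
prob.solve()
print(np.round(x.value, 4))   # approximately [0.25, 0.25, 0.25, 0.25]
```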

OPTIONAL MATERIAL

Second Order Optimality Conditions: Special Cases

min f(x)  s.t. h(x) = b:
first-order condition ∇f(x) − ∇h(x)^T y = 0, with Lagrangian L(x, y) = f(x) − (h(x) − b)^T y.

Who satisfies the second order necessary condition? (figure)

Constraint Qualification

To develop optimality (KKT) conditions for nonlinearly constrained optimization, one needs some technical assumptions, called constraint qualifications. For equality constraints, the standard assumption is that the Jacobian matrix at the test solution has full rank, i.e., its rows are linearly independent. For inequality constraints, the standard assumption is that there is a feasible direction at the test solution pointing into the interior of the feasible region. When the problem data are randomly perturbed a little, the constraint qualification holds with probability one.