Numerical Optimization

Numerical Optimization. Alexander Bronstein, Michael Bronstein © 2008. All rights reserved. Web: tosca.cs.technion.ac.il

Common denominator: optimization problems. Many everyday questions ask for the slowest, longest, shortest, maximal, fastest, minimal, largest, or smallest of something.

Optimization problems. Generic unconstrained minimization problem: $\min_{x \in X} f(x)$, where the vector space $X$ is the search space and $f: X \to \mathbb{R}$ is a cost (or objective) function. A solution $x^* = \arg\min_{x \in X} f(x)$ is the minimizer of $f$; the value $f(x^*)$ is the minimum.

Local vs. global minimum. We find a minimum by analyzing the local behavior of the cost function; such analysis can only identify a local minimum, which need not coincide with the global minimum. (Figure: a cost function with a local minimum and a global minimum.)

Local vs. global in real life: Broad Peak (K3), the 12th highest mountain on Earth, has a false summit at 8,030 m and its main summit at 8,047 m.

Convex functions. A function $f$ defined on a convex set $C$ is called convex if for any $x, y \in C$ and $\lambda \in [0, 1]$, $f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)$. For a convex function, a local minimum is also a global minimum. (Figure: a convex and a non-convex function.)

One-dimensional optimality conditions. A point $x^*$ is a local minimizer of a $C^2$ function $f$ if $f'(x^*) = 0$ and $f''(x^*) > 0$. Approximate the function around $x^*$ as a parabola using a Taylor expansion: $f'(x^*) = 0$ guarantees the minimum of the parabola is at $x^*$, and $f''(x^*) > 0$ guarantees the parabola is convex.
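
In standard notation, the parabola is the second-order Taylor expansion of $f$ around $x^*$:

```latex
f(x) \approx f(x^*) + f'(x^*)\,(x - x^*) + \tfrac{1}{2}\, f''(x^*)\,(x - x^*)^2
```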

Gradient. In the multidimensional case, linearization of the function according to Taylor, $f(x + dx) \approx f(x) + \langle \nabla f(x), dx \rangle$, gives a multidimensional analogue of the derivative. The function $\nabla f(x)$, denoted $\nabla f$, is called the gradient of $f$. In the one-dimensional case, it reduces to the standard definition of the derivative.

Gradient. In Euclidean space ($X = \mathbb{R}^n$), the gradient can be represented in the standard basis by taking directions $e_i$ (1 in the $i$-th place), which gives the vector of partial derivatives $\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)^{\mathrm T}$.

Example 1: gradient of a matrix function. Given the space of real matrices with the standard inner product, compute the gradient of the given matrix function; the case of square matrices is treated separately.

Example 2: gradient of a matrix function. Compute the gradient of a second matrix function.

Hessian. Linearization of the gradient gives a multidimensional analogue of the second-order derivative. The function $\nabla^2 f(x)$ is called the Hessian of $f$, after Ludwig Otto Hesse (1811-1874). In the standard basis, the Hessian is a symmetric matrix of mixed second-order partial derivatives, $(\nabla^2 f)_{ij} = \partial^2 f / \partial x_i \partial x_j$.

Optimality conditions, bis. A point $x^*$ is a local minimizer of a $C^2$ function $f$ if $\nabla f(x^*) = 0$ and $\langle d, \nabla^2 f(x^*)\, d \rangle > 0$ for all $d \ne 0$, i.e., the Hessian is a positive definite matrix (denoted $\nabla^2 f(x^*) \succ 0$). Approximating the function around $x^*$ as a parabola using a Taylor expansion, $\nabla f(x^*) = 0$ guarantees the minimum is at $x^*$ and $\nabla^2 f(x^*) \succ 0$ guarantees the parabola is convex.

Optimization algorithms. The two main ingredients of an iterative algorithm: a descent direction and a step size.

Generic optimization algorithm:
1. Start with some initial point $x^{(0)}$; set $k = 0$.
2. Determine a descent direction $d^{(k)}$.
3. Choose a step size $\alpha^{(k)}$ such that $f(x^{(k)} + \alpha^{(k)} d^{(k)}) < f(x^{(k)})$.
4. Update the iterate: $x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$.
5. Increment the iteration counter: $k \leftarrow k + 1$.
6. Repeat until convergence (stopping criterion); output the solution $x^* \approx x^{(k)}$.
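
A minimal sketch of this loop in Python (the quadratic objective, the fixed step size, and the tolerance are illustrative assumptions; any descent direction and step-size rule can be plugged in):

```python
import numpy as np

# Generic descent loop sketch. Assumed toy objective: f(x) = 0.5 x^T A x - b^T x.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x = np.zeros(2)                     # start with some x^(0)
for k in range(1000):
    d = -grad(x)                    # descent direction (here: negative gradient)
    if np.linalg.norm(d) < 1e-8:    # stopping criterion on the gradient norm
        break
    alpha = 0.1                     # step size (a fixed step for simplicity)
    x = x + alpha * d               # update iterate
print("solution:", x)               # approaches the exact minimizer A^{-1} b
```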

Stopping criteria. Near a local minimum, $\nabla f(x) \approx 0$ (or, equivalently, $x^{(k+1)} \approx x^{(k)}$). Stop when the gradient norm becomes small: $\|\nabla f(x^{(k)})\| < \epsilon_g$. Stop when the step size becomes small: $\|x^{(k+1)} - x^{(k)}\| < \epsilon_x$. Stop when the relative objective change becomes small: $|f(x^{(k+1)}) - f(x^{(k)})| / |f(x^{(k)})| < \epsilon_f$.

Line search. The optimal step size can be found by solving a one-dimensional optimization problem, $\alpha^{(k)} = \arg\min_{\alpha \ge 0} f(x^{(k)} + \alpha d^{(k)})$. One-dimensional optimization algorithms for finding the optimal step size are generically called exact line search.

Armijo [ar-mi-xo] rule. The function sufficiently decreases if $f(x^{(k)} + \alpha d^{(k)}) \le f(x^{(k)}) + \sigma \alpha \langle \nabla f(x^{(k)}), d^{(k)} \rangle$ for some $\sigma \in (0, 1)$. Armijo rule (Larry Armijo, 1966): start with some $\alpha > 0$ and decrease it by multiplying by some $\beta \in (0, 1)$ until the function sufficiently decreases.
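
A sketch of the backtracking rule in Python (the constants $\sigma = 10^{-4}$ and $\beta = 0.5$, the initial $\alpha = 1$, and the toy usage example are illustrative assumptions):

```python
import numpy as np

def armijo_step(f, grad_fx, x, d, alpha0=1.0, sigma=1e-4, beta=0.5, max_iter=50):
    """Backtracking (Armijo) line search: shrink alpha until
    f(x + alpha d) <= f(x) + sigma * alpha * <grad f(x), d>."""
    fx = f(x)
    slope = grad_fx @ d              # directional derivative; negative for a descent d
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * d) <= fx + sigma * alpha * slope:
            return alpha             # sufficient decrease achieved
        alpha *= beta                # otherwise shrink the step
    return alpha

# Tiny usage example on f(x) = ||x||^2 with d = -grad f(x).
f = lambda x: x @ x
x = np.array([1.0, 2.0])
g = 2.0 * x
print(armijo_step(f, g, x, -g))      # some alpha in (0, 1] giving sufficient decrease
```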

Descent direction. How to descend in the fastest way? Go in the direction in which the height lines (contours) are the densest. (Figure: Devil's Tower and its topographic map.)

Steepest descent. The directional derivative $\langle \nabla f(x), d \rangle$ measures how much $f$ changes in the direction $d$ (negative for a descent direction). Find a unit-length direction minimizing the directional derivative: $d = \arg\min_{\|d\| = 1} \langle \nabla f(x), d \rangle$.

Steepest descent. With the $L_2$ norm, this gives the normalized steepest descent direction $d = -\nabla f(x) / \|\nabla f(x)\|_2$. With the $L_1$ norm, it gives coordinate descent (descend along the coordinate axis in which the descent is maximal).

Steepest descent algorithm:
1. Start with some $x^{(0)}$; set $k = 0$.
2. Compute the steepest descent direction $d^{(k)} = -\nabla f(x^{(k)})$.
3. Choose a step size $\alpha^{(k)}$ using line search.
4. Update the iterate: $x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$.
5. Increment the iteration counter: $k \leftarrow k + 1$.
6. Repeat until convergence.
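
A self-contained sketch of this algorithm in Python (the quadratic test function and the backtracking constants are illustrative assumptions, not from the slides):

```python
import numpy as np

# Steepest descent with backtracking line search on an assumed test function
# f(x) = 0.5 x^T A x - b^T x (gradient A x - b), minimized at x* = A^{-1} b.
A = np.array([[10.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x = np.zeros(2)
for k in range(5000):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:                # stopping criterion
        break
    d = -g                                      # steepest descent direction
    alpha, fx = 1.0, f(x)
    while f(x + alpha * d) > fx + 1e-4 * alpha * (g @ d):
        alpha *= 0.5                            # backtracking (Armijo) line search
    x = x + alpha * d                           # update iterate

print("iterations:", k, " x ≈", x, " exact:", np.linalg.solve(A, b))
```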

MATLAB® intermezzo: steepest descent.

Condition number. The condition number is the ratio of the maximal and minimal eigenvalues of the Hessian, $\kappa = \lambda_{\max}(\nabla^2 f) / \lambda_{\min}(\nabla^2 f)$. A problem with a large condition number is called ill-conditioned. The steepest descent convergence rate is slow for ill-conditioned problems.
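
A quick way to check this quantity numerically (the sample Hessian matrix is an arbitrary assumption):

```python
import numpy as np

# Condition number of a (symmetric positive definite) Hessian:
# ratio of its largest to its smallest eigenvalue.
H = np.array([[100.0, 0.0],
              [  0.0, 1.0]])          # assumed ill-conditioned example
eigvals = np.linalg.eigvalsh(H)       # eigenvalues of a symmetric matrix, ascending
kappa = eigvals[-1] / eigvals[0]
print("condition number:", kappa)     # 100.0 here; large => ill-conditioned
```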

Q-norm. Using the Q-norm $\|x\|_Q = \sqrt{x^{\mathrm T} Q x}$ (with $Q \succ 0$) instead of the $L_2$ norm amounts to a change of coordinates $\tilde{x} = Q^{1/2} x$: the function becomes $\tilde{f}(\tilde{x}) = f(Q^{-1/2} \tilde{x})$, the gradient becomes $\nabla \tilde{f}(\tilde{x}) = Q^{-1/2} \nabla f(x)$, and the steepest descent direction becomes $d = -Q^{-1} \nabla f(x)$.

Preconditioning. Using the Q-norm for steepest descent can be regarded as a change of coordinates, called preconditioning. The preconditioner $Q$ should be chosen to improve the condition number of the Hessian in the proximity of the solution. In the $\tilde{x}$ system of coordinates, the Hessian at the solution is $Q^{-1/2} \nabla^2 f(x^*) Q^{-1/2}$; ideally it would equal the identity (a dream).

Newton method as optimal preconditioner. The best theoretically possible preconditioner is $Q = \nabla^2 f(x^*)$, giving the descent direction $d = -(\nabla^2 f(x^*))^{-1} \nabla f(x)$ and the ideal condition number $\kappa = 1$. Problem: the solution $x^*$ is unknown in advance. Newton direction: use the Hessian $\nabla^2 f(x^{(k)})$ as a preconditioner at each iteration, $d^{(k)} = -(\nabla^2 f(x^{(k)}))^{-1} \nabla f(x^{(k)})$.

Another derivation of the Newton method. Approximate the function as a quadratic function of the step $d$ using a second-order Taylor expansion. Close to the solution the function looks like a quadratic function, so the Newton method converges fast.
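
In standard notation, the quadratic model and the minimizer of this quadratic (the Newton direction) are:

```latex
f(x + d) \approx f(x) + \nabla f(x)^{\mathrm T} d + \tfrac{1}{2}\, d^{\mathrm T}\, \nabla^2 f(x)\, d,
\qquad
d = -\big(\nabla^2 f(x)\big)^{-1} \nabla f(x).
```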

Newton method:
1. Start with some $x^{(0)}$; set $k = 0$.
2. Compute the Newton direction $d^{(k)} = -(\nabla^2 f(x^{(k)}))^{-1} \nabla f(x^{(k)})$.
3. Choose a step size $\alpha^{(k)}$ using line search.
4. Update the iterate: $x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$.
5. Increment the iteration counter: $k \leftarrow k + 1$.
6. Repeat until convergence.
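
A minimal sketch of the damped Newton iteration in Python (the convex test function $f(x, y) = e^{x+y} + x^2 + 2y^2$, its hand-coded derivatives, and the backtracking constants are illustrative assumptions):

```python
import numpy as np

def f(v):
    x, y = v
    return np.exp(x + y) + x**2 + 2*y**2

def grad(v):
    x, y = v
    e = np.exp(x + y)
    return np.array([e + 2*x, e + 4*y])

def hess(v):
    x, y = v
    e = np.exp(x + y)
    return np.array([[e + 2.0, e], [e, e + 4.0]])   # positive definite everywhere

v = np.array([2.0, 2.0])                       # initial iterate
for k in range(50):
    g = grad(v)
    if np.linalg.norm(g) < 1e-10:              # stopping criterion
        break
    d = np.linalg.solve(hess(v), -g)           # Newton direction
    alpha, fv = 1.0, f(v)
    while f(v + alpha * d) > fv + 1e-4 * alpha * (g @ d):
        alpha *= 0.5                           # backtracking line search
    v = v + alpha * d                          # update iterate

print("iterations:", k, " minimizer ≈", v)
```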

Frozen Hessian. Observation: close to the optimum, the Hessian does not change significantly. Reduce the number of Hessian inversions by keeping the Hessian from previous iterations and updating it only once every few iterations. Such a method is called Newton with frozen Hessian.

Cholesky factorization (André-Louis Cholesky, 1875-1918). Decompose the Hessian as $\nabla^2 f = L L^{\mathrm T}$, where $L$ is a lower triangular matrix. Solve the Newton system $\nabla^2 f\, d = -\nabla f$ in two steps: forward substitution $L y = -\nabla f$, then backward substitution $L^{\mathrm T} d = y$. Complexity: about $n^3/3$ operations, better than straightforward matrix inversion.
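
A sketch of the two-step solve in Python (the Hessian and gradient values are arbitrary assumptions; SciPy's triangular solver performs the forward and backward substitutions):

```python
import numpy as np
from scipy.linalg import solve_triangular

# Solve the Newton system  H d = -g  via the Cholesky factorization H = L L^T.
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])                 # assumed (SPD) Hessian
g = np.array([1.0, -2.0])                  # assumed gradient

L = np.linalg.cholesky(H)                  # H = L @ L.T, L lower triangular
y = solve_triangular(L, -g, lower=True)    # forward substitution:  L y = -g
d = solve_triangular(L.T, y, lower=False)  # backward substitution: L^T d = y

print("Newton direction:", d)
print("check H @ d + g ≈ 0:", H @ d + g)
```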

Truncated Newton. Solve the Newton system only approximately: a few iterations of conjugate gradients or another iterative algorithm for the solution of linear systems can be used. Such a method is called truncated or inexact Newton.
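
A sketch of the approximate solve in Python (the random Hessian, the gradient, and the cap of 10 CG iterations are arbitrary assumptions):

```python
import numpy as np
from scipy.sparse.linalg import cg

# Truncated (inexact) Newton idea: solve H d = -g only approximately with a
# few conjugate-gradient iterations instead of factorizing H.
n = 100
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)                # assumed SPD Hessian
g = rng.standard_normal(n)                 # assumed gradient

d, info = cg(H, -g, maxiter=10)            # stop CG after at most 10 iterations
print("residual norm:", np.linalg.norm(H @ d + g))   # small but generally nonzero
```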

Non-convex optimization. Using convex optimization methods with non-convex functions does not guarantee global convergence! There is no theoretically guaranteed global optimization, just heuristics such as good initialization and multiresolution (coarse-to-fine) strategies. (Figure: a function with a local minimum and a global minimum.)

Iterative majorization. Construct a majorizing function $g(x, x^{(k)})$ satisfying $g(x^{(k)}, x^{(k)}) = f(x^{(k)})$. Majorizing inequality: $g(x, x^{(k)}) \ge f(x)$ for all $x$. The majorizer $g$ is convex or otherwise easier to optimize with respect to $x$.

Iterative majorization:
1. Start with some $x^{(0)}$; set $k = 0$.
2. Find $x^{(k+1)}$ such that $g(x^{(k+1)}, x^{(k)}) \le g(x^{(k)}, x^{(k)})$, typically by minimizing the majorizing function $g(\cdot, x^{(k)})$.
3. Update the iterate and increment the iteration counter: $k \leftarrow k + 1$.
4. Repeat until convergence; output the solution $x^* \approx x^{(k)}$.
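
A minimal sketch of the scheme in Python. The 1-D objective $f(x) = \log(1 + e^x) - x/2$, the quadratic majorizer built from the curvature bound $f'' \le L = 1/4$, and the tolerances are illustrative assumptions; with this particular majorizer, minimizing it reproduces gradient descent with step $1/L$:

```python
import numpy as np

# Majorization-minimization (MM) sketch. The objective f is L-smooth with
# L = 1/4, so the quadratic
#   g(x, x_k) = f(x_k) + f'(x_k)*(x - x_k) + (L/2)*(x - x_k)**2
# touches f at x_k and lies above f everywhere (a valid majorizer).
# Its closed-form minimizer is x_{k+1} = x_k - f'(x_k)/L.

def f(x):
    return np.log1p(np.exp(x)) - 0.5 * x

def f_prime(x):
    return 1.0 / (1.0 + np.exp(-x)) - 0.5     # sigmoid(x) - 0.5

L = 0.25                                       # Lipschitz constant of f'
x = 2.0                                        # initial iterate
for k in range(100):
    x_new = x - f_prime(x) / L                 # minimizer of the majorizer
    if abs(x_new - x) < 1e-10:
        break
    x = x_new

print(f"x* ≈ {x:.6f}, f(x*) ≈ {f(x):.6f}")     # expect x* ≈ 0, f(x*) ≈ log 2
```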

Constrained optimization. (Figure: a "MINEFIELD, CLOSED ZONE" warning sign, suggesting that parts of the search space are off limits.)

Constrained optimization problems. Generic constrained minimization problem: minimize $f(x)$ subject to $c_i(x) \le 0$ (inequality constraints) and $h_j(x) = 0$ (equality constraints). The subset of the search space in which the constraints hold is called the feasible set. A point belonging to the feasible set is called a feasible solution. The unconstrained minimizer of $f$ may be infeasible!

An example. (Figure: an equality constraint, an inequality constraint, and the resulting feasible set.) An inequality constraint $c_i$ is active at a point $x$ if $c_i(x) = 0$, and inactive otherwise. A point is regular if the gradients of the equality constraints and of the active inequality constraints are linearly independent.

Lagrange multipliers. The main idea for solving constrained problems: arrange the objective and the constraints into a single function and minimize it as an unconstrained problem. The function $L(x, \lambda, \mu) = f(x) + \sum_i \lambda_i c_i(x) + \sum_j \mu_j h_j(x)$ is called the Lagrangian, and $\lambda_i$ and $\mu_j$ are called Lagrange multipliers.

KKT conditions. If $x^*$ is a regular point and a local minimum, there exist Lagrange multipliers $\lambda_i$ and $\mu_j$ such that the gradient of the Lagrangian vanishes at $x^*$, with $\lambda_i \ge 0$ for all $i$: $\lambda_i$ may be positive only for active constraints and must be zero for inactive constraints. Known as the Karush-Kuhn-Tucker conditions. Necessary but not sufficient!
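
In the notation of the Lagrangian above (inequality constraints $c_i(x) \le 0$, equality constraints $h_j(x) = 0$), these conditions are commonly written as:

```latex
\nabla f(x^*) + \sum_i \lambda_i \nabla c_i(x^*) + \sum_j \mu_j \nabla h_j(x^*) = 0, \\
c_i(x^*) \le 0, \qquad h_j(x^*) = 0, \\
\lambda_i \ge 0, \qquad \lambda_i\, c_i(x^*) = 0 \quad \text{(complementary slackness)}.
```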

KKT conditions. Sufficient conditions: if the objective $f$ is convex, the inequality constraints $c_i$ are convex, and the equality constraints $h_j$ are affine, and the conditions above hold at $x^*$ with multipliers $\lambda_i \ge 0$ (positive only for active constraints and zero for inactive ones), then $x^*$ is the solution of the constrained problem (the global constrained minimizer).

Geometric interpretation. Consider a simpler problem with a single equality constraint $h(x) = 0$: at the solution, the gradient of the objective and the gradient of the constraint must line up, i.e., $\nabla f(x^*) = -\mu\, \nabla h(x^*)$ for some multiplier $\mu$.

Penalty methods. Define a penalty aggregate $P(x; \rho) = f(x) + \sum_i \varphi_\rho(c_i(x)) + \sum_j \psi_\rho(h_j(x))$, where $\varphi_\rho$ and $\psi_\rho$ are parametric penalty functions for the inequality and equality constraints, respectively. For larger values of the parameter $\rho$, the penalty on constraint violation is stronger.

Penalty methods. (Figure: a typical inequality penalty function and a typical equality penalty function.)

Penalty method:
1. Start with some $x^{(0)}$ and an initial value of the penalty parameter $\rho^{(0)}$; set $k = 0$.
2. Find $x^{(k+1)}$ by solving the unconstrained optimization problem $\min_x P(x; \rho^{(k)})$, initialized with $x^{(k)}$.
3. Set $k \leftarrow k + 1$ and update (increase) $\rho^{(k)}$.
4. Repeat until convergence; output the solution $x^* \approx x^{(k)}$.
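
A sketch of the quadratic penalty method in Python on an assumed toy problem, minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$ (solution $(0.5, 0.5)$); the penalty function, the update factor 10, and the tolerance are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

# Quadratic-penalty sketch for an assumed toy problem (not from the slides):
#   minimize f(x) = x1^2 + x2^2   subject to   h(x) = x1 + x2 - 1 = 0.
# The equality violation is penalized by rho * h(x)^2, and rho is increased
# between outer iterations.

f = lambda x: x[0]**2 + x[1]**2
h = lambda x: x[0] + x[1] - 1.0

def penalty_aggregate(x, rho):
    return f(x) + rho * h(x)**2

x = np.array([0.0, 0.0])        # initial (infeasible) iterate
rho = 1.0                       # initial penalty parameter
for k in range(10):
    # Unconstrained minimization of the penalty aggregate, warm-started
    # from the previous outer iterate.
    res = minimize(penalty_aggregate, x, args=(rho,), method="BFGS")
    x = res.x
    if abs(h(x)) < 1e-6:        # stop when the constraint is nearly satisfied
        break
    rho *= 10.0                 # stronger penalty on constraint violation

print(f"x ≈ {x}, constraint violation ≈ {h(x):.2e}")   # expect x ≈ (0.5, 0.5)
```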