Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory 101 Quasi-Newton Methods

Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory 102 Background
Assumption: evaluation of the Hessian is impractical or costly. The central idea underlying quasi-Newton methods is to use an approximation of the inverse Hessian; the form of this approximation differs among methods. Question: What is the simplest approximation? The quasi-Newton methods that build up an approximation of the inverse Hessian are often regarded as the most sophisticated methods for solving unconstrained problems.

Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory 103 Modified Newton Method
Question: What is a measure of effectiveness for the Classical Modified Newton Method?
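The supporting equation on this slide does not appear to have survived the transcript. As commonly stated (e.g., in Luenberger), the classical modified Newton method fixes the Hessian at the starting point, so that only one Hessian evaluation and factorization is required:
x_{k+1} = x_k − α_k [H(x_0)]^{-1} ∇y(x_k)
Its effectiveness then depends on how well H(x_0) continues to represent the curvature of y along the iteration path.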

Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory 104 Quasi-Newton Methods
Big question: What is the update matrix?
In quasi-Newton methods, instead of the true Hessian, an initial matrix H_0 is chosen (usually H_0 = I), which is subsequently updated by an update formula:
H_{k+1} = H_k + H_k^u, where H_k^u is the update matrix.
This updating can also be done with the inverse of the Hessian, H^{-1}, as follows: let B = H^{-1}; then the updating formula for the inverse is also of the form
B_{k+1} = B_k + B_k^u

Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory 105 Hessian Matrix Updates
Given two points x_k and x_{k+1}, we define g_k = ∇y(x_k) and g_{k+1} = ∇y(x_{k+1}). Further, let p_k = x_{k+1} − x_k; then
g_{k+1} − g_k ≈ H(x_k) p_k
If the Hessian is constant, then g_{k+1} − g_k = H p_k, which can be rewritten as q_k = H p_k.
If the Hessian is constant, then the following condition would hold as well:
H_{k+1}^{-1} q_i = p_i, 0 ≤ i ≤ k
This is called the quasi-Newton condition.
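As an illustration (not part of the original slides), the sketch below checks numerically that for a quadratic objective the gradient differences satisfy q_k = H p_k exactly; the matrix H, the vector c, and the two iterates are arbitrary assumed values.

```python
import numpy as np

# Quadratic test function y(x) = 0.5 x^T H x + c^T x with constant Hessian H.
# H, c, and the two iterates below are illustrative assumptions, not from the slides.
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
c = np.array([-1.0, 2.0])

def grad(x):
    # g(x) = ∇y(x) = H x + c
    return H @ x + c

x_k  = np.array([0.0, 0.0])
x_k1 = np.array([0.5, -0.3])
p_k = x_k1 - x_k                 # step p_k = x_{k+1} - x_k
q_k = grad(x_k1) - grad(x_k)     # gradient difference q_k = g_{k+1} - g_k

print(np.allclose(q_k, H @ p_k))  # True: q_k = H p_k, the quasi-Newton condition
```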

Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory 106 Rank One and Rank Two Updates
Let B = H^{-1}; then the quasi-Newton condition becomes B_{k+1} q_i = p_i, 0 ≤ i ≤ k.
Substituting the updating formula B_{k+1} = B_k + B_k^u, the condition becomes
p_i = B_k q_i + B_k^u q_i (1)
(remember: p_i = x_{i+1} − x_i and q_i = g_{i+1} − g_i)
Note: there is no unique solution for finding the update matrix B_k^u. A general form is
B_k^u = a uu^T + b vv^T
where a and b are scalars and u and v are vectors satisfying condition (1). The quantities a uu^T and b vv^T are symmetric matrices of (at most) rank one.
Quasi-Newton methods that take b = 0 are using rank one updates. Quasi-Newton methods that take b ≠ 0 are using rank two updates. Note that b ≠ 0 provides more flexibility.
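The slides do not name a particular rank one update; a standard example, added here only for concreteness, is the symmetric rank one (SR1) formula, obtained from the general form with b = 0 and u = p_k − B_k q_k:
B_{k+1} = B_k + (p_k − B_k q_k)(p_k − B_k q_k)^T / ((p_k − B_k q_k)^T q_k)
Its main limitation is that the denominator can vanish or become negative, so B_{k+1} may fail to stay positive definite.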

Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory 107 Update Formulas
Rank one updates are simple, but have limitations. Rank two updates are the most widely used schemes. The rationale can be quite complicated (see, e.g., Luenberger).
The following two update formulas have received wide acceptance:
Davidon-Fletcher-Powell (DFP) formula
Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula

Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory 108 Davidon-Fletcher-Powell Formula
The earliest (and one of the most clever) schemes for constructing the inverse Hessian was originally proposed by Davidon (1959) and later developed by Fletcher and Powell (1963). It has the interesting property that, for a quadratic objective, it simultaneously generates the directions of the conjugate gradient method while constructing the inverse Hessian. The method is also referred to as the variable metric method (the name originally suggested by Davidon).
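The formula itself appears to have been an image that did not survive the transcript. For reference, the standard DFP update of the inverse-Hessian approximation B_k (with p_k and q_k as defined above) is:
B_{k+1} = B_k + (p_k p_k^T)/(p_k^T q_k) − (B_k q_k q_k^T B_k)/(q_k^T B_k q_k)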

Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory 109 Broyden–Fletcher–Goldfarb–Shanno Formula
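This slide's formula also appears to have been lost in the transcript. For reference, the standard BFGS update of the inverse-Hessian approximation B_k is:
B_{k+1} = B_k + (1 + (q_k^T B_k q_k)/(p_k^T q_k)) (p_k p_k^T)/(p_k^T q_k) − (p_k q_k^T B_k + B_k q_k p_k^T)/(p_k^T q_k)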

Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory 110 Some Comments on Broyden Methods
The Broyden–Fletcher–Goldfarb–Shanno formula is more complicated than DFP, but straightforward to apply. The BFGS update formula can be used exactly like the DFP formula. Numerical experiments have shown that the performance of the BFGS formula is superior to that of the DFP formula. Hence, BFGS is often preferred over DFP.
Both the DFP and BFGS updates have symmetric rank two corrections that are constructed from the vectors p_k and B_k q_k. Weighted combinations of these formulae will therefore also have the same properties. This observation leads to a whole collection of updates, known as the Broyden family, defined by:
B_φ = (1 − φ) B^DFP + φ B^BFGS
where φ is a parameter that may take any real value.
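A minimal sketch of one Broyden-family inverse-Hessian update, combining the DFP and BFGS formulas given above; the function name and its interface are assumptions made for illustration, not part of the slides.

```python
import numpy as np

def broyden_family_update(B, p, q, phi=1.0):
    """One Broyden-family update of the inverse-Hessian approximation B.

    phi = 0 gives the DFP update and phi = 1 gives the BFGS update.
    (Function name and interface are illustrative, not from the slides.)
    """
    Bq = B @ q
    pq = p @ q  # p_k^T q_k
    # DFP inverse-Hessian update
    B_dfp = B + np.outer(p, p) / pq - np.outer(Bq, Bq) / (q @ Bq)
    # BFGS inverse-Hessian update
    B_bfgs = (B + (1.0 + (q @ Bq) / pq) * np.outer(p, p) / pq
                - (np.outer(p, Bq) + np.outer(Bq, p)) / pq)
    return (1.0 - phi) * B_dfp + phi * B_bfgs
```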

Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory 111 Quasi-Newton Algorithm
Note: you do have to calculate the vector of first-order derivatives g at each iteration.
1. Input x_0, B_0, and the termination criteria.
2. For any k, set S_k = −B_k g_k.
3. Compute a step size α (e.g., by line search on y(x_k + α S_k)) and set x_{k+1} = x_k + α S_k.
4. Compute the update matrix B_k^u according to a given formula (say, DFP or BFGS) using the values q_k = g_{k+1} − g_k, p_k = x_{k+1} − x_k, and B_k.
5. Set B_{k+1} = B_k + B_k^u.
6. Continue with the next k until the termination criteria are satisfied.
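A minimal, runnable sketch of the loop above, assuming a BFGS inverse-Hessian update in step 4 and a simple backtracking line search in step 3; the function names, tolerances, and the quadratic test problem are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def quasi_newton(objective, gradient, x0, tol=1e-6, max_iter=100):
    """Quasi-Newton minimization with a BFGS inverse-Hessian update.

    Function names, tolerances, and the line search are illustrative choices.
    """
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                 # Step 1: B_0 = I
    g = gradient(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:    # termination criterion
            break
        s = -B @ g                     # Step 2: search direction S_k = -B_k g_k
        alpha = 1.0                    # Step 3: backtracking line search on y(x_k + alpha S_k)
        while (objective(x + alpha * s) > objective(x) + 1e-4 * alpha * (g @ s)
               and alpha > 1e-10):
            alpha *= 0.5
        x_new = x + alpha * s
        g_new = gradient(x_new)
        p = x_new - x                  # Step 4: p_k = x_{k+1} - x_k
        q = g_new - g                  #         q_k = g_{k+1} - g_k
        pq = p @ q
        if pq > 1e-12:                 # skip the update if curvature information is unusable
            Bq = B @ q
            # Step 5: BFGS inverse-Hessian update, B_{k+1} = B_k + B_k^u
            B = (B + (1.0 + (q @ Bq) / pq) * np.outer(p, p) / pq
                   - (np.outer(p, Bq) + np.outer(Bq, p)) / pq)
        x, g = x_new, g_new            # Step 6: continue with the next k
    return x

# Example usage on a simple quadratic (illustrative)
if __name__ == "__main__":
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    y  = lambda x: 0.5 * x @ A @ x
    dy = lambda x: A @ x
    print(quasi_newton(y, dy, x0=[2.0, -1.0]))   # converges to [0, 0]
```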

Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory 112 Some Closing Remarks
Both the DFP and BFGS methods have theoretical properties that guarantee a superlinear (fast) convergence rate and global convergence under certain conditions. However, both methods can fail for general nonlinear problems. Specifically, DFP is highly sensitive to inaccuracies in line searches. Both methods can get stuck at a saddle point. In Newton's method, a saddle point can be detected during modifications of the (true) Hessian; therefore, search around the final point when using quasi-Newton methods. The update of the Hessian can become "corrupted" by round-off and other inaccuracies. All kinds of "tricks", such as scaling and preconditioning, exist to boost the performance of these methods.