Quasi-Newton Methods

Problem: SD and CG are too slow to converge if the N×N Hessian H is ill-conditioned.
- SD: Δx = -g (slow, but no inverse to compute or store)
- QN: Δx = -H_k^{-1} g (fast; H_k^{-1} is a cheap approximation to H^{-1}, so no exact inverse has to be computed or stored)
- GN: Δx = -H^{-1} g (fast, but computing and storing H^{-1} is expensive)

Solution: Quasi-Newton converges in N iterations if the N×N H is S.P.D.

Quasi-Newton condition: g' - g = H Δx', i.e. the finite difference (g' - g)/Δx' approximates the curvature dg/dx = d^2f/dx^2.
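To make the condition concrete, here is a minimal NumPy sketch (the 2×2 quadratic test problem is an illustrative assumption, not from the slides): for f(x) = (1/2) x^T H x - b^T x the gradient is g(x) = H x - b, so g' - g = H Δx' holds exactly.

    import numpy as np

    H = np.array([[4.0, 1.0], [1.0, 3.0]])    # assumed S.P.D. Hessian of a quadratic f
    b = np.array([1.0, 2.0])
    grad = lambda x: H @ x - b                # g(x) = H x - b

    x0 = np.array([0.0, 0.0])
    x1 = np.array([1.0, -1.0])
    print(np.allclose(grad(x1) - grad(x0), H @ (x1 - x0)))   # True: g' - g = H dx'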

Outline
- Rank 1 QN Method
- Rank 2 QN Method: DFP
- Rank 2 QN Method: LBFGS

Quasi-Newton Methods
Key idea: iteratively precondition the GN equations H Δx = -g with a preconditioner H_k^{-1} ≈ H^{-1}, so we solve
    Δx^(k) ≈ -H_k^{-1} g^(k)
    x^(k+1) = x^(k) - a H_k^{-1} g^(k)
where H_k^{-1} is a cheap approximate inverse that satisfies the Quasi-Newton condition:
    H_{k+1}^{-1} (g^(k+1) - g^(k)) = x^(k+1) - x^(k).
[Figure: successive iterates x^(k-1), x^(k), x^(k+1) with steps Δx^(k), Δx^(k+1) and gradient g^(k).]

Rank 1 QN Methods HDx = g’-g  Dx = H-1(g’-g)  Dx ~ H1-1(g’-g) 1. H0-1 =I; x(1) = x(0) + a H0-1 g(0) = x(0) + a g(0) Note, H0-1 = I does not satisfy QN condition (g(1) – g(0)) = x(1) - x(0) (1) Dg(0) Dx(0) HDx = g’-g Require H1-1 is rank-one update: H1-1 = H0-1 + auuT (2) Each column of uuT is integer multiple of any other column. Hence, it is rank one. For example: [u1 ](u1 u2 ) = [u1u1 u1 u2 ] [u2 ] = [u2u1 u2 u2 ] Both Nx1 u & a are found by where u is Nx1 vector and a is a constant. Dx(0) = (H0-1 + auuT )Dg(0) (3) Rearranging above we get auuT Dg (0) = Dx(0) - H0(-1) Dg(0) (4) N equations and N+1 unknowns

Rank 1 QN Methods
One possible solution of eq. (4) is
    u = Δx^(0) - H_0^{-1} Δg^(0),   a = 1/[u^T Δg^(0)].   (5)
Plugging eq. (5) into eq. (3) gives the first approximate inverse H_1^{-1} ≈ H^{-1}:
    H_1^{-1} = H_0^{-1} + a u u^T.   (6)
Exercise: show that u in eq. (5) satisfies the QN condition Δx^(0) = (H_0^{-1} + a u u^T) Δg^(0).
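The exercise can also be checked numerically; a minimal sketch with made-up data (H_0^{-1}, Δx^(0), Δg^(0) are arbitrary illustrative values):

    import numpy as np

    H0_inv = np.eye(3)                       # initial inverse approximation H_0^{-1} = I
    dx = np.array([1.0, 2.0, -1.0])          # Dx(0), arbitrary test data
    dg = np.array([0.5, -1.0, 2.0])          # Dg(0), arbitrary test data

    u = dx - H0_inv @ dg                     # eq. (5)
    a = 1.0 / (u @ dg)                       # eq. (5)
    H1_inv = H0_inv + a * np.outer(u, u)     # eq. (6): rank-one update
    print(np.allclose(H1_inv @ dg, dx))      # True: QN condition Dx(0) = H_1^{-1} Dg(0)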

Rank 1 QN Methods
Summarizing:
    u = Δx^(0) - H_0^{-1} Δg^(0),    H_1^{-1} = H_0^{-1} + a u u^T.
Generalizing:
    u_k = Δx^(k) - H_k^{-1} Δg^(k),
    H_{k+1}^{-1} = H_k^{-1} + a u_k u_k^T
                 = H_k^{-1} + [Δx^(k) - H_k^{-1} Δg^(k)][Δx^(k) - H_k^{-1} Δg^(k)]^T / ([Δx^(k) - H_k^{-1} Δg^(k)]^T Δg^(k)),
and hence the update
    x^(k+1) = x^(k) - a H_k^{-1} g^(k).

Rank 1 QN Methods
For k = 1:N
    x^(k+1) = x^(k) - a H_k^{-1} g^(k)
    Δx^(k) = x^(k+1) - x^(k);    Δg^(k) = g^(k+1) - g^(k)
    H_{k+1}^{-1} = H_k^{-1} + [Δx^(k) - H_k^{-1} Δg^(k)][Δx^(k) - H_k^{-1} Δg^(k)]^T / ([Δx^(k) - H_k^{-1} Δg^(k)]^T Δg^(k))
end
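Below is a compact NumPy sketch of this loop (the quadratic test problem, the fixed step length a = 1, and the small-denominator safeguard are illustrative assumptions, not part of the slide):

    import numpy as np

    def quasi_newton_rank1(grad, x0, a=1.0, iters=20, tol=1e-10):
        # Rank-one quasi-Newton iteration on the inverse Hessian approximation.
        x = x0.copy()
        H_inv = np.eye(len(x0))                 # H_0^{-1} = I
        g = grad(x)
        for _ in range(iters):
            x_new = x - a * H_inv @ g           # x(k+1) = x(k) - a Hk^{-1} g(k)
            g_new = grad(x_new)
            dx, dg = x_new - x, g_new - g
            u = dx - H_inv @ dg
            denom = u @ dg
            if abs(denom) > 1e-12:              # skip the update if the denominator is tiny
                H_inv += np.outer(u, u) / denom # rank-one update of Hk^{-1}
            x, g = x_new, g_new
            if np.linalg.norm(g) < tol:
                break
        return x

    # Usage on the quadratic f(x) = 1/2 x^T A x - b^T x (assumed S.P.D. test problem)
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, 1.0])
    x_star = quasi_newton_rank1(lambda x: A @ x - b, np.zeros(2))
    print(np.allclose(A @ x_star, b))           # True: converged to the minimizer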

Outline
- Rank 1 QN Method
- Rank 2 QN Method: DFP
- Rank 2 QN Method: LBFGS

Rank 2 QN Methods (DFP)
QN condition:    Δx^(k) = H_{k+1}^{-1} Δg^(k)   (a)
Rank 2 update:   H_{k+1}^{-1} = H_k^{-1} + a u_k u_k^T + b v_k v_k^T   (b)
Plugging (b) into (a) enforces the QN condition:
    Δx^(k) = [H_k^{-1} + a u_k u_k^T + b v_k v_k^T] Δg^(k).
Solutions for u_k, v_k, a, and b:
    u_k = Δx^(k);    v_k = -H_k^{-1} Δg^(k);
    a = 1/[u_k^T Δg^(k)];    b = 1/[v_k^T Δg^(k)].
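A minimal NumPy sketch of one DFP update built from the vectors above (the test data are arbitrary illustrative values; the check confirms the QN condition):

    import numpy as np

    def dfp_update(H_inv, dx, dg):
        # Rank-two DFP update of the inverse Hessian approximation.
        u, v = dx, -H_inv @ dg                  # u_k = Dx(k), v_k = -Hk^{-1} Dg(k)
        a, b = 1.0 / (u @ dg), 1.0 / (v @ dg)   # a = 1/(u^T Dg), b = 1/(v^T Dg)
        return H_inv + a * np.outer(u, u) + b * np.outer(v, v)

    H_inv = np.eye(3)                           # current approximation (made-up)
    dx = np.array([1.0, 0.5, -2.0])             # Dx(k), made-up test data
    dg = np.array([2.0, 1.0, -1.0])             # Dg(k), made-up test data
    H_new = dfp_update(H_inv, dx, dg)
    print(np.allclose(H_new @ dg, dx))          # True: QN condition holds after the update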

Outline
- Rank 1 QN Method
- Rank 2 QN Method: Limited Memory DFP
- Rank 2 QN Method: LBFGS

Limited Memory Rank 2 QN Methods (DFP)
QN condition:  Δx^(k) = H_{k+1}^{-1} Δg^(k), with
    u_k = Δx^(k);    v_k = -H_k^{-1} Δg^(k);    a = 1/[u_k^T Δg^(k)];    b = 1/[v_k^T Δg^(k)].
It is too expensive to store H_k^{-1} explicitly, so only the vectors Δx^(k), Δg^(k), u_k, v_k are stored; H_k^{-1} is both stored and applied through vector-vector (outer) products, keeping perhaps at most the ~10 most recent iterates of vectors (see the sketch below).
Twice as fast as CG? For non-linear problems, QN builds a more accurate curvature estimate than CG, so it converges faster. Why?
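A sketch of the limited-memory storage idea (the class layout, the history length m = 10, and the decision to store the tuple (a, u, b, v) per update are illustrative assumptions, not taken verbatim from the slide):

    import numpy as np
    from collections import deque

    class LimitedMemoryDFP:
        # Keep only the last m DFP update vectors; apply Hk^{-1} to a vector as a
        # running sum of outer-product terms, never forming the N x N matrix.
        def __init__(self, m=10):
            self.terms = deque(maxlen=m)        # stores (a, u, b, v) per update

        def apply(self, q):
            # Hk^{-1} q with H_0^{-1} = I plus the stored rank-two corrections.
            r = q.astype(float)
            for a, u, b, v in self.terms:
                r = r + a * (u @ q) * u + b * (v @ q) * v
            return r

        def update(self, dx, dg):
            # Build the DFP vectors u_k = Dx(k), v_k = -Hk^{-1} Dg(k) from the newest pair.
            u = dx
            v = -self.apply(dg)
            self.terms.append((1.0 / (u @ dg), u, 1.0 / (v @ dg), v))

    # Usage sketch: after each optimization step compute dx, dg and call update(dx, dg);
    # the next search direction is then -lm.apply(g).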

Non-linear Quasi-Newton
Reset to the steepest-descent (gradient) direction roughly every 3-5 iterations, because the quadratic model behind the QN updates is only valid locally.

Outline
- Rank 1 QN Method
- Rank 2 QN Method: DFP
- Rank 2 QN Method: LBFGS

LBFGS Quasi-Newton
The DFP formula is quite effective, but it was soon superseded by the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula, which is its dual (interchanging the roles of Δx and Δg). The limited-memory variant, L-BFGS, applies H_k^{-1} using only the most recent (Δx, Δg) pairs.

Reference: Nocedal, Jorge & Wright, Stephen J. (1999), Numerical Optimization, Springer-Verlag, ISBN 0-387-98793-2.
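A minimal sketch of the standard L-BFGS two-loop recursion from Nocedal & Wright (the history length and the usual gamma scaling of H_0^{-1} are the conventional choices, not specified on the slide):

    import numpy as np

    def lbfgs_direction(g, s_list, y_list):
        # Two-loop recursion: apply Hk^{-1} to g using only the stored pairs
        # s_j = Dx(j), y_j = Dg(j); the matrix Hk^{-1} is never formed.
        if not s_list:                          # no curvature pairs yet:
            return g.copy()                     # fall back to H_0^{-1} = I
        q = g.astype(float)
        rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
        alphas = []
        for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):   # newest -> oldest
            a = rho * (s @ q)
            q = q - a * y
            alphas.append(a)
        s_new, y_new = s_list[-1], y_list[-1]
        q = q * (s_new @ y_new) / (y_new @ y_new)                     # H_0^{-1} = gamma * I
        for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):  # oldest -> newest
            b = rho * (y @ q)
            q = q + (a - b) * s
        return q                                # the quasi-Newton step is -q

    # Usage sketch: keep s_list, y_list as the ~10 most recent pairs and take
    #     x_new = x - step * lbfgs_direction(grad(x), s_list, y_list)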