1 Quasi-Newton Methods
Problem: SD and CG are too slow to converge if the NxN Hessian H is ill-conditioned.
SD: dx = -g (slow, but no inverse to store or compute)
QN: dx = -H_k^-1 g with a cheap approximate inverse (fast, no exact inverse to compute or store)
GN: dx = -H^-1 g (fast, but expensive)
Solution: Quasi-Newton converges in N iterations if the NxN H is S.P.D.
Quasi-Newton condition: g' - g = H dx'  =>  (g' - g)/dx' ~ dg/dx = H

2 Outline Rank 1 QN Method Rank 2 QN Method: DFP Rank 2 QN Method: LBFGS

3 Quasi-Newton Methods
Key idea: iteratively precondition the GN equations H dx = -g with a preconditioner H_k^-1 ~ H^-1, so we solve dx(k) ~ -H_k^-1 g(k), where H_k^-1 is a cheap approximate inverse:
x(k+1) = x(k) - a H_k^-1 g(k)
and H_k^-1 satisfies the Quasi-Newton condition:
H_{k+1}^-1 (g(k+1) - g(k)) = x(k+1) - x(k)
[Figure: successive iterates x(k-1), x(k), x(k+1) with steps dx(k), dx(k+1) and gradient g(k).]

4 Rank 1 QN Methods
QN condition: H Dx = g' - g  =>  Dx = H^-1 (g' - g)  =>  Dx ~ H_1^-1 (g' - g)
1. Start with H_0^-1 = I, so x(1) = x(0) - a H_0^-1 g(0) = x(0) - a g(0).
Note, H_0^-1 = I does not satisfy the QN condition
H_1^-1 (g(1) - g(0)) = x(1) - x(0), i.e. H_1^-1 Dg(0) = Dx(0). (1)
2. Require H_1^-1 to be a rank-one update of H_0^-1:
H_1^-1 = H_0^-1 + a u u^T, where u is an Nx1 vector and a is a constant. (2)
Each column of u u^T is a scalar multiple of any other column; hence it is rank one. For example:
[u1] (u1 u2) = [u1u1 u1u2]
[u2]           [u2u1 u2u2]
3. Both the Nx1 vector u and the constant a are found by enforcing the QN condition:
Dx(0) = (H_0^-1 + a u u^T) Dg(0). (3)
Rearranging, a u u^T Dg(0) = Dx(0) - H_0^-1 Dg(0): (4)
N equations and N+1 unknowns.
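The claim that u u^T is rank one is easy to check numerically; a minimal sketch (the vector u here is made-up test data):

```python
import numpy as np

u = np.array([2.0, -1.0, 3.0])   # any Nx1 vector
M = np.outer(u, u)               # u u^T: every column is a scalar multiple of u
print(np.linalg.matrix_rank(M))  # prints 1
```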


6 Rank 1 QN Methods
One possible solution to eq. 4 is
u = Dx(0) - H_0^-1 Dg(0),  with  a = 1/[u^T Dg(0)]. (5)
Substituting eq. 5 into the update (2) gives the first iterate of the approximate inverse:
H_1^-1 = H_0^-1 + a u u^T. (6)
Exercise: show that u and a in eq. 5 satisfy the QN condition Dx(0) = (H_0^-1 + a u u^T) Dg(0).

7 Rank 1 QN Methods
Summarizing: H_1^-1 = H_0^-1 + a u u^T, with u = Dx(0) - H_0^-1 Dg(0).
Generalizing to iteration k: u_k = Dx(k) - H_k^-1 Dg(k), so
H_{k+1}^-1 = H_k^-1 + a u_k u_k^T
           = H_k^-1 + [Dx(k) - H_k^-1 Dg(k)] [Dx(k) - H_k^-1 Dg(k)]^T / ([Dx(k) - H_k^-1 Dg(k)]^T Dg(k))
Hence x(k+1) = x(k) - a H_k^-1 g(k).

8 Rank 1 QN Methods
The full algorithm:
For k = 1:N
  dx(k) = -a H_k^-1 g(k);  x(k+1) = x(k) + dx(k)
  u_k = Dx(k) - H_k^-1 Dg(k)
  H_{k+1}^-1 = H_k^-1 + u_k u_k^T / (u_k^T Dg(k))
end
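The loop above can be sketched in NumPy on a small quadratic test problem (the SPD matrix H, the vector b, and the iteration cap are made-up test data; the step length a is fixed at 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)       # SPD NxN Hessian of a quadratic test problem
b = rng.standard_normal(n)
grad = lambda x: H @ x - b        # gradient of f(x) = 0.5 x^T H x - b^T x

x = np.zeros(n)
Hinv = np.eye(n)                  # H_0^-1 = I
g = grad(x)
for k in range(2 * n):
    dx = -Hinv @ g                # dx(k) = -H_k^-1 g(k), step length a = 1
    x = x + dx
    g_new = grad(x)
    dg = g_new - g
    u = dx - Hinv @ dg            # u_k = Dx(k) - H_k^-1 Dg(k), eq. (5)
    if abs(u @ dg) > 1e-12:       # standard safeguard: skip a near-singular update
        Hinv += np.outer(u, u) / (u @ dg)   # rank-1 update, a = 1/(u^T Dg)
    g = g_new
    if np.linalg.norm(g) < 1e-10:
        break

# On this quadratic, x converges to the solution of H x = b in a few steps.
```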

9 Outline Rank 1 QN Method Rank 2 QN Method: DFP Rank 2 QN Method: LBFGS

10 Rank 2 QN Methods
QN condition: Dx(k) = H_{k+1}^-1 Dg(k). (a)
Rank 2 update: H_{k+1}^-1 = H_k^-1 + a u_k u_k^T + b v_k v_k^T. (b)
Plugging (b) into (a) enforces the QN condition:
Dx(k) = [H_k^-1 + a u_k u_k^T + b v_k v_k^T] Dg(k)
Solutions for u_k, v_k, a, and b (the DFP update):
u_k = Dx(k);  v_k = -H_k^-1 Dg(k)
a = 1/[u_k^T Dg(k)];  b = 1/[v_k^T Dg(k)]
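Enforcing the QN condition with this rank-2 update can be checked directly; a minimal NumPy sketch (the vectors dx and dg are made-up test data):

```python
import numpy as np

def dfp_update(Hinv, dx, dg):
    """Rank-2 DFP update: H_{k+1}^-1 = H_k^-1 + a u u^T + b v v^T,
    with u = dx, v = -H_k^-1 dg, a = 1/(u^T dg), b = 1/(v^T dg)."""
    u = dx
    v = -Hinv @ dg
    return Hinv + np.outer(u, u) / (u @ dg) + np.outer(v, v) / (v @ dg)

rng = np.random.default_rng(1)
dx = rng.standard_normal(4)
dg = dx + 0.1 * rng.standard_normal(4)   # keep dx^T dg > 0 (curvature condition)
Hinv_new = dfp_update(np.eye(4), dx, dg)
print(np.allclose(Hinv_new @ dg, dx))    # QN condition (a) holds: True
```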

11 Outline Rank 1 QN Method Rank 2 QN Method: Limited-Memory DFP Rank 2 QN Method: LBFGS

12 Limited-Memory Rank 2 QN Methods (DFP)
QN condition: Dx(k) = H_{k+1}^-1 Dg(k)
u_k = Dx(k);  v_k = -H_k^-1 Dg(k);  a = 1/[u_k^T Dg(k)];  b = 1/[v_k^T Dg(k)]
It is too expensive to store H_k^-1 as a full matrix, so store only the vectors Dx(k), Dg(k), u_k, v_k. The product H_k^-1 g is then computed from stored vector-vector products, keeping at most about 10 of the most recent vector pairs.
Roughly twice as fast as CG? For non-linear problems, QN builds a more accurate curvature estimate than CG, hence faster convergence.
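One way to realize "store only vectors" is to keep H_k^-1 implicitly as the identity plus a sum of rank-one terms and apply it matrix-free; a sketch under that assumption (the function names and test vectors are my own, and discarding old pairs is the limited-memory approximation described above):

```python
import numpy as np

def apply_Hinv(w, pairs):
    """Matrix-free H_k^-1 w = w + sum_j [a_j u_j (u_j^T w) + b_j v_j (v_j^T w)];
    only vectors and scalars are stored, never an NxN matrix."""
    r = np.array(w, dtype=float)
    for u, v, a, b in pairs:
        r += a * (u @ w) * u + b * (v @ w) * v
    return r

def push_pair(pairs, dx, dg, m=10):
    """Append one DFP pair; keep only the m most recent (the limited-memory
    heuristic: dropping old pairs makes the stored inverse approximate)."""
    v = -apply_Hinv(dg, pairs)      # v_k = -H_k^-1 Dg(k), computed matrix-free
    a = 1.0 / (dx @ dg)             # a = 1/(u^T Dg)
    b = 1.0 / (v @ dg)              # b = 1/(v^T Dg)
    pairs.append((dx.copy(), v, a, b))
    if len(pairs) > m:
        pairs.pop(0)

rng = np.random.default_rng(2)
pairs = []
for _ in range(3):
    dx = rng.standard_normal(4)
    dg = dx + 0.1 * rng.standard_normal(4)   # keep dx^T dg > 0
    push_pair(pairs, dx, dg)

print(np.allclose(apply_Hinv(dg, pairs), dx))   # latest QN condition holds: True
```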

13 Non-linear Quasi-Newton
The quadratic model is only locally valid, so reset to the gradient (steepest-descent) direction after approximately every 3-5 iterations.

14 Outline Rank 1 QN Method Rank 2 QN Method: DFP Rank 2 QN Method: LBFGS

15 LBFGS Quasi-Newton
Broyden-Fletcher-Goldfarb-Shanno (BFGS) method: "The DFP formula is quite effective, but it was soon superseded by the BFGS formula, which is its dual (interchanging the roles of Dx and Dg)."
Nocedal, Jorge & Wright, Stephen J. (1999), Numerical Optimization, Springer-Verlag.
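The duality can be made concrete: a sketch of the BFGS update of the inverse Hessian in its standard product form (the test vectors are made up), which satisfies the same QN condition H_{k+1}^-1 Dg = Dx:

```python
import numpy as np

def bfgs_inverse_update(Hinv, s, y):
    """BFGS update of the inverse Hessian approximation.
    s = Dx, y = Dg; BFGS is the dual of DFP (roles of s and y interchanged)."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ Hinv @ V.T + rho * np.outer(s, s)

rng = np.random.default_rng(3)
s = rng.standard_normal(4)
y = s + 0.1 * rng.standard_normal(4)   # keep y^T s > 0 (curvature condition)
Hnew = bfgs_inverse_update(np.eye(4), s, y)
print(np.allclose(Hnew @ y, s))        # QN condition holds: True
```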

