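Conjugate Gradient
Problem: steepest descent (SD) converges too slowly when the NxN Hessian H is ill-conditioned.
SD: $dx = -a\,g$ (slow, but no inverse to compute or store)
CG: $dx = a\,p$ (fast, and no inverse to compute or store)
GN: $dx = -H^{-1} g$ (fast but expensive)
Solution: for a quadratic objective, conjugate gradient converges in at most N iterations (in exact arithmetic) if the NxN H is symmetric positive definite (S.P.D.).
Quasi-Newton condition: $g' - g = H\,dx'$, the finite-difference form of $(g' - g)/dx' \approx dg/dx = d^2 f/dx^2$.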
Outline
- CG Algorithm
- Step Length: Polak-Ribiere vs Fletcher-Reeves
- CG Solution to Even & Overdetermined Equations
- Regularized CG
- Preconditioned CG
- Non-Linear CG
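Conjugate Gradient
[Figure: contours of f around the minimizer x* (the bullseye). The previous step dx ends at the kiss point, where it is tangent to a contour; the steepest-descent direction -g at the kiss point misses x*, while the conjugate step dx' reaches it.]
Quasi-Newton condition: $g' - g = H\,dx'$ (1)
At the kiss point the previous step is tangent to the contour, so $dx^T g = 0$. For dx' ending at the bullseye x*, $g' = \nabla f(x^*) = 0$, so $dx'^T g' = 0$ as well.
Multiplying eqn. 1 on the left by $dx^T$ gives $dx^T (g' - g) = dx^T H\,dx'$. The left side is zero because $g' = 0$ at the bullseye and $dx^T g = 0$. Hence,
Conjugacy condition: $0 = dx^T H\,dx'$ (2)
Update: $x' = x + a\,p$, where p is conjugate to the previous direction (3)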
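Conjugate Gradient
Quasi-Newton condition: $g' - g = H\,dx'$ (1)
Conjugacy condition: $0 = dx^T H\,dx'$ (2)
Update: $x' = x + a\,p$, where p is conjugate to the previous direction and a linear combination of dx and g, $p = b\,dx - g$ (3)
Find b: solve for b such that dx is conjugate to dx'. From $0 = dx^T H (b\,dx - g)$,
$b = \frac{dx^T H g}{dx^T H\,dx}$
Find a: solve for a such that $dx' = a\,p$ kisses a contour (no longer going downhill at its end point). From $p^T g' = p^T (g + a H p) = 0$,
$a = -\frac{p^T g}{p^T H p}$
For i = 1:nit
    find b; p = b dx - g
    find a; dx' = a p; x = x + dx'
end
[Figure: the step dx ends at the kiss point; the new step dx' runs from the kiss point toward x*, away from the steepest-descent direction -g.]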
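Conjugate Gradient
For i = 1:nit
    find b: $b = \frac{dx^T H g}{dx^T H\,dx}$, then p = b dx - g (first iteration: p = -g)
    find a: $a = -\frac{p^T g}{p^T H p}$, then dx' = a p and x = x + dx'
end
Recall the quasi-Newton condition for the previous step: $a H d^{(k-1)} = g^{(k)} - g^{(k-1)}$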
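The loop above can be sketched in dependency-free Python. This is a minimal illustration, assuming a quadratic objective f(x) = 0.5 x^T H x - c^T x so that g = Hx - c; the name `cg_quadratic`, the vector `c`, and the helpers `dot` and `matvec` are illustrative and do not come from the slides.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(H, v):
    return [dot(row, v) for row in H]

def cg_quadratic(H, c, x, nit):
    """Minimize f(x) = 0.5 x^T H x - c^T x for SPD H (assumed form), g = H x - c."""
    g = [hx - ci for hx, ci in zip(matvec(H, x), c)]
    dx = [0.0] * len(x)                  # no previous step yet
    for _ in range(nit):
        if dot(g, g) < 1e-30:            # already at the bullseye
            break
        Hdx = matvec(H, dx)
        dHd = dot(dx, Hdx)
        # b makes p conjugate to the previous step: 0 = dx^T H (b dx - g).
        # H is symmetric, so dot(Hdx, g) equals dx^T H g.
        b = dot(Hdx, g) / dHd if dHd > 0.0 else 0.0   # first pass: p = -g
        p = [b * di - gi for di, gi in zip(dx, g)]
        Hp = matvec(H, p)
        # a places the new point where p kisses a contour: p^T (g + a H p) = 0
        a = -dot(p, g) / dot(p, Hp)
        dx = [a * pi for pi in p]
        x = [xi + di for xi, di in zip(x, dx)]
        g = [gi + a * hpi for gi, hpi in zip(g, Hp)]   # quasi-Newton update of g
    return x

H = [[4.0, 1.0], [1.0, 3.0]]
c = [1.0, 2.0]
print(cg_quadratic(H, c, [0.0, 0.0], nit=2))
```

For this 2x2 SPD example, two iterations should reproduce the exact solution (1/11, 7/11) up to rounding error, which illustrates the N-iteration property for N = 2.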
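Conjugate Gradient
For i = 1:nit
    find b; p = b dx - g
    find a; dx' = a p; x = x + dx'
end
Because $dx = a\,d^{(k-1)}$, the new direction can be written $d^{(k)} = -g^{(k)} + \beta\,d^{(k-1)}$ with $\beta = a\,b$. Substituting $a H d^{(k-1)} = g^{(k)} - g^{(k-1)}$ into the formula for b eliminates H:
Polak-Ribiere: $\beta = \frac{g^{(k)T} (g^{(k)} - g^{(k-1)})}{g^{(k-1)T} g^{(k-1)}}$
Fletcher-Reeves: $\beta = \frac{g^{(k)T} g^{(k)}}{g^{(k-1)T} g^{(k-1)}}$ (uses $g^{(k)T} g^{(k-1)} = 0$, which is exact for a quadratic f)
Note: a step is not going downhill if it moves perpendicular to the gradient direction -g.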
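A small numerical comparison of the two step-length rules, written as a sketch. The function names are illustrative, and beta here denotes the factor multiplying the previous search direction in the standard form "new direction = -g_new + beta * old direction".

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def beta_fletcher_reeves(g_new, g_old):
    return dot(g_new, g_new) / dot(g_old, g_old)

def beta_polak_ribiere(g_new, g_old):
    diff = [gn - go for gn, go in zip(g_new, g_old)]
    return dot(g_new, diff) / dot(g_old, g_old)

# Orthogonal gradients, as produced by an exact line search on a quadratic:
g_old, g_new = [-1.0, -2.0], [0.5, -0.25]
print(beta_fletcher_reeves(g_new, g_old))  # 0.0625
print(beta_polak_ribiere(g_new, g_old))    # 0.0625

# Non-orthogonal gradients (non-quadratic f or inexact line search):
print(beta_fletcher_reeves([1.0, 1.0], [1.0, 0.0]))  # 2.0
print(beta_polak_ribiere([1.0, 1.0], [1.0, 0.0]))    # 1.0
```

The two rules agree when successive gradients are orthogonal and differ otherwise. When the gradient barely changes between iterations, the Polak-Ribiere value approaches zero, which pushes the next direction back toward -g.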
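Conjugate Gradient: Lx=d
[Figure: contours around the solution x*. The previous step dx ends at the kiss point, where $dx^T g = 0$; the steepest-descent direction -g misses x*, while the conjugate step dx' reaches it. A second panel shows the same geometry with the two successive search directions labelled.]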
Conjugate Gradient: $L^T L x = L^T d$
Compared with the square system Lx = d, where $g = Lx - d$, the gradient for an overdetermined system of equations has an extra $L^T$: $g = L^T (Lx - d)$.
However, $L^T L$ has the squared condition number of L, which slows convergence.
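One way to apply CG to the normal equations without ever forming $L^T L$ is the CGLS recurrence, sketched below in dependency-free Python. The function names and the line-fitting example are illustrative assumptions, not part of the slides.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(L, v):
    """L v for L stored as a list of rows."""
    return [dot(row, v) for row in L]

def rmatvec(L, r):
    """L^T r without building the transpose."""
    return [sum(L[i][j] * r[i] for i in range(len(L))) for j in range(len(L[0]))]

def cg_normal_equations(L, d, nit):
    """CGLS sketch: least-squares solution of L x = d via L^T L x = L^T d."""
    x = [0.0] * len(L[0])
    r = list(d)                     # data residual d - L x
    s = rmatvec(L, r)               # s = L^T r, the negative gradient
    p = list(s)
    gamma = dot(s, s)
    for _ in range(nit):
        if gamma < 1e-30:           # gradient has vanished
            break
        q = matvec(L, p)
        alpha = gamma / dot(q, q)   # p^T (L^T L) p computed as |L p|^2
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(L, r)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x

# Fit intercept and slope to three points (overdetermined: 3 equations, 2 unknowns)
L = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
d = [1.0, 3.0, 5.0]
print(cg_normal_equations(L, d, nit=2))
```

With two unknowns, two iterations should recover intercept 1 and slope 2 up to rounding. Only products with L and $L^T$ are needed, so the squared-condition-number matrix is never stored.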
Conjugate Gradient Convergence
[Figure: two panels, "Well conditioned in most dimensions" and "Poorly conditioned in every dimension".]
If the NxN H is SPD and the problem is linear (quadratic f), CG converges in at most N iterations, and in practice often much sooner.
Stopping sooner is a form of regularization because it excludes the small-eigenvalue components.
Regularized Conjugate Gradient
Balance between the solution that minimizes the misfit and the one that minimizes the penalty.
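The slide does not say which penalty is used. As one possible instance, the sketch below assumes a damping penalty, minimizing 0.5 |Lx - d|^2 + 0.5 lam |x|^2, which leads to the SPD system (L^T L + lam I) x = L^T d. The names `regularized_cg`, `cg_operator`, and `lam` are illustrative.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(L, v):
    return [dot(row, v) for row in L]

def rmatvec(L, r):
    return [sum(L[i][j] * r[i] for i in range(len(L))) for j in range(len(L[0]))]

def cg_operator(apply_A, rhs, nit):
    """Plain CG for A x = rhs, with SPD A given as a function v -> A v."""
    x = [0.0] * len(rhs)
    r = list(rhs)                   # residual rhs - A x, the negative gradient
    p = list(r)
    rr = dot(r, r)
    for _ in range(nit):
        if rr < 1e-30:
            break
        Ap = apply_A(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def regularized_cg(L, d, lam, nit):
    """Assumed damping penalty: solve (L^T L + lam I) x = L^T d."""
    def apply_A(v):
        LtLv = rmatvec(L, matvec(L, v))
        return [a + lam * vi for a, vi in zip(LtLv, v)]
    return cg_operator(apply_A, rmatvec(L, d), nit)

L = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
d = [1.0, 3.0, 5.0]
print(regularized_cg(L, d, lam=0.0, nit=10))   # misfit only
print(regularized_cg(L, d, lam=1.0, nit=10))   # misfit plus penalty
```

Raising `lam` shifts the balance toward the penalty: in this example the lam = 1 solution has a smaller norm than the lam = 0 solution, at the cost of a larger data misfit.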
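Preconditioned Conjugate Gradient
Find a cheap approximate inverse $P \approx H^{-1}$ so that $PH \approx I$. Thus,
Ill-conditioned system of equations: $Hx = -g$
Well-conditioned system of equations: $PHx = -Pg$
A cheap approximate inverse is the diagonal one, $[H^{-1}]_{ii} \approx 1/H_{ii}$.
Warning: CG still needs an SPD system, so P itself should be SPD; PH then has the same positive eigenvalues as the SPD matrix $P^{1/2} H P^{1/2}$.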
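A minimal sketch of preconditioned CG with the diagonal (Jacobi) choice P = diag(1/H_ii), written for a generic SPD system H x = c. The function name `jacobi_pcg`, the right-hand side `c`, and the tolerance `tol` are illustrative assumptions.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(H, v):
    return [dot(row, v) for row in H]

def jacobi_pcg(H, c, nit, tol=1e-12):
    """Solve H x = c for SPD H with P = diag(1/H_ii). Returns (x, iterations used)."""
    n = len(c)
    x = [0.0] * n
    r = list(c)                                  # residual c - H x
    z = [r[i] / H[i][i] for i in range(n)]       # z = P r, the preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for k in range(nit):
        if dot(r, r) ** 0.5 <= tol:
            return x, k
        Hp = matvec(H, p)
        alpha = rz / dot(p, Hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        z = [r[i] / H[i][i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, nit

# Badly scaled SPD matrix: the diagonal entries differ by a factor of 100
H = [[100.0, 1.0], [1.0, 1.0]]
c = [101.0, 2.0]
x, used = jacobi_pcg(H, c, nit=50)
print(x, used)
```

The only extra work per iteration is one division per unknown, which is why the diagonal approximation counts as cheap. The diagonal of an SPD matrix is positive, so this P is itself SPD.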
Non-linear Conjugate Gradient
The objective is only locally quadratic, so conjugacy is gradually lost.
Reset to the gradient direction (p = -g) approximately every 3-5 iterations.
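A sketch of non-linear CG with periodic resets. Several choices here are assumptions rather than the slide's method: the Polak-Ribiere step length clamped at zero, a backtracking (Armijo) line search swapped in for the exact line search, a reset every 4 iterations, and a made-up test function with its minimum at (1, -2).

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def nonlinear_cg(f, grad, x, max_iter=500, restart=4, gtol=1e-6):
    """Non-linear CG sketch. Returns (x, iterations used)."""
    g = grad(x)
    p = [-gi for gi in g]
    for k in range(max_iter):
        if dot(g, g) ** 0.5 <= gtol:
            return x, k
        slope = dot(g, p)
        if slope >= 0.0:                      # not downhill: fall back to -g
            p = [-gi for gi in g]
            slope = -dot(g, g)
        # Backtracking line search: halve t until f decreases sufficiently
        t, fx = 1.0, f(x)
        for _ in range(60):
            x_new = [xi + t * pi for xi, pi in zip(x, p)]
            if f(x_new) <= fx + 1e-4 * t * slope:
                break
            t *= 0.5
        g_new = grad(x_new)
        if (k + 1) % restart == 0:
            beta = 0.0                        # periodic reset to the gradient direction
        else:
            diff = [gn - go for gn, go in zip(g_new, g)]
            beta = max(0.0, dot(g_new, diff) / dot(g, g))
        p = [-gn + beta * pi for gn, pi in zip(g_new, p)]
        x, g = x_new, g_new
    return x, max_iter

# Hypothetical non-quadratic test function with its minimum at (1, -2)
def f(v):
    u, w = v[0] - 1.0, v[1] + 2.0
    return u * u + 2.0 * w * w + 0.5 * u * u * w * w + 0.1 * u ** 4

def grad(v):
    u, w = v[0] - 1.0, v[1] + 2.0
    return [2.0 * u + u * w * w + 0.4 * u ** 3, 4.0 * w + u * u * w]

x, used = nonlinear_cg(f, grad, [-1.0, 1.0])
print(x, used)
```

Setting beta to zero on a reset makes the next direction exactly -g, discarding direction history built up where the quadratic model no longer holds. The `restart` argument can be varied within the 3-5 range to see how sensitive the iteration count is for a given objective.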