Conjugate Direction Methods

®Copyright of Shun-Feng Su

The class of conjugate direction methods can be viewed as intermediate between the steepest descent method and Newton's method. The conjugate direction methods have the following properties:
- They solve quadratics of $n$ variables in $n$ steps.
- The usual implementation does not require the Hessian matrix.
- No operation (inversion or even storage) on $n \times n$ matrices is required.

The conjugate direction methods typically perform better than the steepest descent method, but not as well as Newton's method. The crucial factor in the efficiency of an iterative search algorithm is the direction of search at each iteration. The conjugate direction methods therefore choose each search direction to be a so-called conjugate direction.

Definition: Let $Q$ be a real symmetric $n \times n$ matrix. The directions $d^{(0)}, d^{(1)}, \ldots, d^{(m)}$ are Q-conjugate if, for all $i \neq j$, we have $d^{(i)T} Q d^{(j)} = 0$.

Lemma: Let $Q$ be a symmetric positive definite $n \times n$ matrix. If the directions $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$ are nonzero and Q-conjugate, then they are linearly independent.

Proof: Let $\alpha_0, \ldots, \alpha_k$ be scalars such that $\alpha_0 d^{(0)} + \alpha_1 d^{(1)} + \cdots + \alpha_k d^{(k)} = 0$. Pre-multiplying by $d^{(i)T} Q$ gives $\alpha_i d^{(i)T} Q d^{(i)} = 0$ (the other terms are 0 by Q-conjugacy). Since $d^{(i)} \neq 0$ and $Q$ is positive definite, $d^{(i)T} Q d^{(i)} > 0$, so $\alpha_i = 0$ for $i = 0, 1, \ldots, k$. Hence the directions are linearly independent.

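Example: Let
$$Q = \begin{bmatrix} 3 & 0 & 1 \\ 0 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}.$$
$Q$ is symmetric positive definite, because all its leading principal minors are positive:
$$\Delta_1 = 3, \quad \Delta_2 = \det \begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix} = 12, \quad \Delta_3 = \det Q = 20.$$
Let $d^{(0)} = [1, 0, 0]^T$. Find $d^{(1)}$ from $d^{(0)T} Q d^{(1)} = 0$, that is, $3 d_1^{(1)} + d_3^{(1)} = 0$; select $d_1^{(1)} = 1$, $d_2^{(1)} = 0$, $d_3^{(1)} = -3$.

Find $d^{(2)}$ from $d^{(0)T} Q d^{(2)} = 0$ and $d^{(1)T} Q d^{(2)} = 0$, that is, $3 d_1^{(2)} + d_3^{(2)} = 0$ and $-6 d_2^{(2)} - 8 d_3^{(2)} = 0$. One solution is $d^{(2)} = [1, 4, -3]^T$.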
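The conjugacy conditions in this example can be checked numerically. The following is a minimal sketch, assuming $Q$ has rows (3, 0, 1), (0, 4, 2), (1, 2, 3), which is the matrix consistent with the minors and equations quoted in the example; the helper names `quad_form` and `is_q_conjugate` are illustrative only.

```python
from itertools import combinations

def quad_form(u, Q, v):
    """Return u^T Q v for vectors and a matrix given as plain Python lists."""
    return sum(u[i] * Q[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def is_q_conjugate(directions, Q, tol=1e-12):
    """True if d_i^T Q d_j = 0 (within tol) for every pair i != j."""
    return all(abs(quad_form(a, Q, b)) <= tol
               for a, b in combinations(directions, 2))

# Assumed matrix: reconstructed from the minors 3, 12, 20 and the
# two linear equations stated in the example.
Q = [[3, 0, 1],
     [0, 4, 2],
     [1, 2, 3]]
directions = [[1, 0, 0], [1, 0, -3], [1, 4, -3]]

print(is_q_conjugate(directions, Q))  # True
```

The standard basis vectors, by contrast, fail the same check for this $Q$, because the off-diagonal entries of $Q$ are not all zero.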

A systematic procedure for finding Q-conjugate vectors is the Gram-Schmidt process (the procedure used to find an orthonormal basis), applied as follows. Given a set of linearly independent vectors $p^{(0)}, p^{(1)}, \ldots, p^{(n-1)}$, the Gram-Schmidt process is
$$d^{(0)} = p^{(0)}, \qquad d^{(k+1)} = p^{(k+1)} - \sum_{i=0}^{k} \frac{p^{(k+1)T} Q d^{(i)}}{d^{(i)T} Q d^{(i)}} \, d^{(i)}.$$
Then $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$ are Q-conjugate.
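The Gram-Schmidt procedure can be sketched in a few lines of Python. This is an illustration under assumptions: it uses the usual Q-weighted projection $d^{(k+1)} = p^{(k+1)} - \sum_i \frac{p^{(k+1)T} Q d^{(i)}}{d^{(i)T} Q d^{(i)}} d^{(i)}$, and the function name `q_conjugate_gram_schmidt` and the sample matrix are hypothetical choices, not part of the slides.

```python
def matvec(Q, v):
    """Matrix-vector product for plain Python lists."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def q_conjugate_gram_schmidt(P, Q):
    """Turn linearly independent vectors P into Q-conjugate directions.

    Q is assumed symmetric positive definite, so d^T Q d > 0 for d != 0.
    """
    D = []
    for p in P:
        d = list(p)
        for prev in D:
            Q_prev = matvec(Q, prev)
            # Remove the Q-component of p along the earlier direction.
            coeff = dot(p, Q_prev) / dot(prev, Q_prev)
            d = [di - coeff * pi for di, pi in zip(d, prev)]
        D.append(d)
    return D

# Sample data (assumed): a 3x3 symmetric positive definite matrix
# and the standard basis as the starting vectors.
Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]]
P = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
D = q_conjugate_gram_schmidt(P, Q)
for i in range(len(D)):
    for j in range(i):
        print(i, j, dot(D[i], matvec(Q, D[j])))  # each value is (numerically) zero
```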

Consider the quadratic function
$$f(x) = \tfrac{1}{2} x^T Q x - b^T x,$$
where $Q$ is a symmetric positive definite matrix. It is easy to see that the global minimizer satisfies $Qx = b$.

Basic conjugate direction algorithm: Given a starting point $x^{(0)}$ and Q-conjugate vectors $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$, iterate
$$x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)} \quad \text{with} \quad \alpha_k = -\frac{g^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}},$$
where $g^{(k)} = \nabla f(x^{(k)}) = Q x^{(k)} - b$.
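The basic conjugate direction algorithm can be sketched as follows. This is a minimal illustration rather than a reference implementation: it assumes the step size $\alpha_k = -g^{(k)T} d^{(k)} / (d^{(k)T} Q d^{(k)})$ with $g^{(k)} = Q x^{(k)} - b$, and the function name, the matrix, the directions and the right-hand side `b` are sample values chosen for the demonstration.

```python
def matvec(Q, v):
    """Matrix-vector product for plain Python lists."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_direction_minimize(Q, b, x0, directions):
    """Minimize 0.5 x^T Q x - b^T x along given Q-conjugate directions."""
    x = list(x0)
    for d in directions:
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]  # gradient Qx - b
        alpha = -dot(g, d) / dot(d, matvec(Q, d))         # exact step along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x

# Sample problem (assumed data): n = 3, so three steps suffice.
Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]]
directions = [[1, 0, 0], [1, 0, -3], [1, 4, -3]]  # Q-conjugate for this Q
b = [1, 2, 3]

x_star = conjugate_direction_minimize(Q, b, [0, 0, 0], directions)
print(x_star)  # close to [0, 0, 1], the solution of Qx = b
```

Note that the loop never forms or inverts an $n \times n$ system; it only needs products of $Q$ with vectors.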

For any starting point $x^{(0)}$, the basic conjugate direction algorithm (with Q-conjugate vectors $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$) converges to the unique minimizer $x^*$ in $n$ steps.

Since $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$ are linearly independent, they form a basis, so
$$x^* - x^{(0)} = \beta_0 d^{(0)} + \beta_1 d^{(1)} + \cdots + \beta_{n-1} d^{(n-1)}.$$
Pre-multiply by $d^{(k)T} Q$, for $k = 0, 1, \ldots, n-1$. By Q-conjugacy we have $d^{(k)T} Q (x^* - x^{(0)}) = \beta_k d^{(k)T} Q d^{(k)}$. Then
$$\beta_k = \frac{d^{(k)T} Q (x^* - x^{(0)})}{d^{(k)T} Q d^{(k)}}.$$

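Since $x^{(i+1)} = x^{(i)} + \alpha_i d^{(i)}$, after $k$ steps we have $x^{(k)} = x^{(0)} + \alpha_0 d^{(0)} + \cdots + \alpha_{k-1} d^{(k-1)}$. Then
$$x^* - x^{(0)} = (x^* - x^{(k)}) + (x^{(k)} - x^{(0)}).$$
Pre-multiply by $d^{(k)T} Q$:
$$d^{(k)T} Q (x^* - x^{(0)}) = d^{(k)T} Q (x^* - x^{(k)}) + 0 = -d^{(k)T} g^{(k)},$$
where the second term vanishes by Q-conjugacy, and the last equality uses $g^{(k)} = Q x^{(k)} - b$ and $Q x^* = b$.

Let $\beta_k$ denote the coefficients of $x^* - x^{(0)}$ in the basis $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$. Then
$$\beta_k = -\frac{g^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}} = \alpha_k,$$
and therefore $x^* = x^{(n)}$: the algorithm reaches $x^*$ in $n$ steps.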

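Example:
$$f(x_1, x_2) = \tfrac{1}{2} x^T \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix} x - x^T \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
Let $x^{(0)} = [0, 0]^T$. It is easy to verify that $d^{(0)} = [1, 0]^T$ and $d^{(1)} = [-3/8, 3/4]^T$ are Q-conjugate.

Then $\nabla f(x^{(0)}) = g^{(0)} = -b = [1, -1]^T$, and
$$\alpha_0 = -\frac{g^{(0)T} d^{(0)}}{d^{(0)T} Q d^{(0)}} = -\frac{1}{4}, \qquad x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = [-1/4, 0]^T.$$
Next, $\nabla f(x^{(1)}) = g^{(1)} = Q x^{(1)} - b = [0, -3/2]^T$, and
$$\alpha_1 = -\frac{g^{(1)T} d^{(1)}}{d^{(1)T} Q d^{(1)}} = 2, \qquad x^{(2)} = x^{(1)} + \alpha_1 d^{(1)} = [-1, 3/2]^T.$$
It is easy to see that $x^{(2)} = x^*$, since $Q x^{(2)} = b$.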
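The arithmetic behind the two step sizes can be written out in full. This worked check assumes $Q = \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix}$ and $b = [-1, 1]^T$, the values consistent with $g^{(0)}$, $g^{(1)}$ and $x^{(2)}$ quoted in the example, together with the step size $\alpha_k = -g^{(k)T} d^{(k)} / (d^{(k)T} Q d^{(k)})$.

```latex
\begin{align*}
d^{(0)T} Q d^{(0)} &= 4, &
\alpha_0 &= -\frac{(1)(1) + (-1)(0)}{4} = -\frac{1}{4}, \\
Q d^{(1)} &= \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix}
             \begin{bmatrix} -3/8 \\ 3/4 \end{bmatrix}
           = \begin{bmatrix} 0 \\ 3/4 \end{bmatrix}, &
d^{(1)T} Q d^{(1)} &= \frac{3}{4} \cdot \frac{3}{4} = \frac{9}{16}, \\
g^{(1)T} d^{(1)} &= (0)\left(-\tfrac{3}{8}\right) + \left(-\tfrac{3}{2}\right)\left(\tfrac{3}{4}\right) = -\frac{9}{8}, &
\alpha_1 &= \frac{9/8}{9/16} = 2, \\
Q x^{(2)} &= \begin{bmatrix} 4(-1) + 2(3/2) \\ 2(-1) + 2(3/2) \end{bmatrix}
           = \begin{bmatrix} -1 \\ 1 \end{bmatrix} = b.
\end{align*}
```

The first entry of $Q d^{(1)}$ being zero is exactly the conjugacy condition $d^{(0)T} Q d^{(1)} = 0$ for $d^{(0)} = [1, 0]^T$.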

Lemma: In the conjugate direction algorithm, $g^{(k+1)T} d^{(i)} = 0$ for all $0 \le k \le n-1$ and $0 \le i \le k$.

Proof: $Q(x^{(k+1)} - x^{(k)}) = g^{(k+1)} - g^{(k)}$ (since $g^{(k)} = Q x^{(k)} - b$). Thus, we have $g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)}$. The lemma can then be proved by induction.

Induction outline:
1. Prove it is true for $k = 0$: $g^{(1)T} d^{(0)} = 0$.
2. Assume it is true for $k - 1$, and prove that it is then also true for $k$.
3. Then, by induction, it is true for all $0 \le k \le n-1$ and $0 \le i \le k$.

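For the base case, $g^{(1)T} d^{(0)} = (Q x^{(1)} - b)^T d^{(0)}$, where $x^{(1)} = x^{(0)} + \alpha_0 d^{(0)}$ and
$$\alpha_0 = -\frac{g^{(0)T} d^{(0)}}{d^{(0)T} Q d^{(0)}}.$$
Then we have $g^{(1)T} d^{(0)} = g^{(0)T} d^{(0)} + \alpha_0 d^{(0)T} Q d^{(0)} = 0$.

This in fact implies $\alpha_0 = \arg\min_{\alpha} f(x^{(0)} + \alpha d^{(0)})$, because the derivative $\partial f(x^{(0)} + \alpha d^{(0)}) / \partial \alpha$ evaluated at $\alpha = \alpha_0$ equals $g^{(1)T} d^{(0)}$.

With the lemma, in the conjugate direction algorithm we also have $\alpha_k = \arg\min_{\alpha} f(x^{(k)} + \alpha d^{(k)})$.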

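For the induction step, assume $g^{(k)T} d^{(i)} = 0$ for $0 \le i \le k-1$. We must prove $g^{(k+1)T} d^{(i)} = 0$ for $0 \le i \le k$.

Using $g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)}$, for $0 \le i \le k-1$ we get $g^{(k+1)T} d^{(i)} = g^{(k)T} d^{(i)} + \alpha_k d^{(k)T} Q d^{(i)} = 0$: the first term is zero by the induction hypothesis and the second by Q-conjugacy.

For $i = k$, substituting the expression for $\alpha_k$ gives $g^{(k+1)T} d^{(k)} = g^{(k)T} d^{(k)} + \alpha_k d^{(k)T} Q d^{(k)} = 0$; all terms cancel.

From this lemma, we have
$$f(x^{(k+1)}) = \min_{\alpha} f(x^{(k)} + \alpha d^{(k)}).$$
Also,
$$f(x^{(k+1)}) = \min_{a_0, \ldots, a_k} f\Big(x^{(0)} + \sum_{i=0}^{k} a_i d^{(i)}\Big).$$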
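The lemma can be confirmed on a small instance with exact rational arithmetic. The sketch below is illustrative only: the matrix, the directions, the right-hand side `b` and the function name `lemma_residuals` are assumed sample choices, and the step size is taken to be $\alpha_k = -g^{(k)T} d^{(k)} / (d^{(k)T} Q d^{(k)})$.

```python
from fractions import Fraction

def matvec(Q, v):
    """Matrix-vector product for plain Python lists."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def lemma_residuals(Q, b, x0, directions):
    """Run the conjugate direction algorithm and collect every inner
    product g(k+1)^T d(i) with i <= k, which the lemma says is zero."""
    x = [Fraction(v) for v in x0]  # exact arithmetic, no rounding
    residuals = []
    for k, d in enumerate(directions):
        g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
        alpha = -dot(g, d) / dot(d, matvec(Q, d))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_next = [qx - bi for qx, bi in zip(matvec(Q, x), b)]
        residuals.extend(dot(g_next, directions[i]) for i in range(k + 1))
    return residuals

# Sample data (assumed): Q-conjugate directions for this Q.
Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]]
directions = [[1, 0, 0], [1, 0, -3], [1, 4, -3]]
b = [2, -1, 5]

print(all(r == 0 for r in lemma_residuals(Q, b, [0, 0, 0], directions)))  # True
```

Each new gradient is orthogonal to every direction used so far, which is why each iterate minimizes $f$ over the whole subspace searched up to that point.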

Conjugate Gradient Methods
The conjugate gradient algorithm does not use pre-specified conjugate directions, but instead computes the directions as the algorithm progresses. At each stage, the direction is calculated as a linear combination of the previous direction and the current gradient. The combination is chosen so that the new direction is Q-conjugate to all previous directions.

The direction is calculated as a linear combination of the previous direction and the current gradient, i.e.,
$$d^{(k+1)} = -g^{(k+1)} + \beta_k d^{(k)}.$$
Requiring $d^{(k+1)T} Q d^{(k)} = 0$ then gives
$$\beta_k = \frac{g^{(k+1)T} Q d^{(k)}}{d^{(k)T} Q d^{(k)}}.$$
It can be proved that with such an approach, the obtained $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$ are Q-conjugate.
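For a quadratic $f(x) = \tfrac{1}{2} x^T Q x - b^T x$, the resulting conjugate gradient iteration can be sketched as below. The sketch makes several assumptions that the text above does not state explicitly: the first direction is the negative gradient, $d^{(0)} = -g^{(0)}$ (the usual choice); the step size is $\alpha_k = -g^{(k)T} d^{(k)} / (d^{(k)T} Q d^{(k)})$; and $\beta_k = g^{(k+1)T} Q d^{(k)} / (d^{(k)T} Q d^{(k)})$. The function name and the sample data are hypothetical.

```python
def matvec(Q, v):
    """Matrix-vector product for plain Python lists."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(Q, b, x0, tol=1e-12):
    """Minimize 0.5 x^T Q x - b^T x for symmetric positive definite Q."""
    x = list(x0)
    g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]  # g(0) = Q x(0) - b
    d = [-gi for gi in g]                             # assumed start: d(0) = -g(0)
    for _ in range(len(b)):                           # at most n steps
        if dot(g, g) <= tol:                          # gradient already (nearly) zero
            break
        Qd = matvec(Q, d)
        dQd = dot(d, Qd)
        alpha = -dot(g, d) / dQd
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g = [gi + alpha * qdi for gi, qdi in zip(g, Qd)]  # g(k+1) = g(k) + alpha Q d(k)
        beta = dot(g, Qd) / dQd
        d = [-gi + beta * di for gi, di in zip(g, d)]
    return x

# Sample data (assumed): a 2x2 problem, so two steps suffice.
Q = [[4, 2], [2, 2]]
b = [-1, 1]
print(conjugate_gradient(Q, b, [0, 0]))  # [-1.0, 1.5]
```

Only products of $Q$ with a vector appear, so $Q$ never has to be inverted or factored.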

Note that the conjugate direction methods are derived for the quadratic form. For a non-quadratic problem, a quadratic approximation of the objective function is therefore formed at the current point. When the point is near the solution, the objective function behaves approximately like its quadratic approximation. In this setting, however, $Q$ (the Hessian at the current point) is no longer a constant matrix.

If $Q$ is recalculated at every iteration, the method may be computationally expensive. However, $Q$ is only needed for the calculation of $\alpha_k$ and $\beta_k$. Since $\alpha_k = \arg\min_{\alpha \ge 0} f(x^{(k)} + \alpha d^{(k)})$, the value of $\alpha_k$ can be obtained by a numerical line search. The value of $\beta_k$ can be approximated using only the gradients. Three such modifications are introduced here.

The Hestenes-Stiefel formula: Recall that
$$\beta_k = \frac{g^{(k+1)T} Q d^{(k)}}{d^{(k)T} Q d^{(k)}}.$$
The Hestenes-Stiefel formula replaces the term $Q d^{(k)}$ by $(g^{(k+1)} - g^{(k)}) / \alpha_k$. (Because $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$, pre-multiplying by $Q$ and using $g^{(k)} = Q x^{(k)} - b$ and $g^{(k+1)} = Q x^{(k+1)} - b$ gives $Q d^{(k)} = (g^{(k+1)} - g^{(k)}) / \alpha_k$.) In other words,
$$\beta_k = \frac{g^{(k+1)T} [g^{(k+1)} - g^{(k)}]}{d^{(k)T} [g^{(k+1)} - g^{(k)}]}.$$
This identity is exact for quadratic forms, but it is now applied to non-quadratic functions.

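The Polak-Ribiere formula: Start from the Hestenes-Stiefel formula
$$\beta_k = \frac{g^{(k+1)T} [g^{(k+1)} - g^{(k)}]}{d^{(k)T} [g^{(k+1)} - g^{(k)}]}.$$
From the Lemma, we have $g^{(k+1)T} d^{(i)} = 0$ for all $0 \le k \le n-1$ and $0 \le i \le k$, and by construction $d^{(k)} = -g^{(k)} + \beta_{k-1} d^{(k-1)}$. Then $g^{(k)T} d^{(k)} = -g^{(k)T} g^{(k)} + \beta_{k-1} g^{(k)T} d^{(k-1)} = -g^{(k)T} g^{(k)}$, so the denominator becomes $d^{(k)T} [g^{(k+1)} - g^{(k)}] = g^{(k)T} g^{(k)}$. Hence
$$\beta_k = \frac{g^{(k+1)T} [g^{(k+1)} - g^{(k)}]}{g^{(k)T} g^{(k)}}.$$
The Lemma is true for quadratic forms, but the formula is now applied to non-quadratic functions.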

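The Fletcher-Reeves formula: Start from the Polak-Ribiere formula
$$\beta_k = \frac{g^{(k+1)T} [g^{(k+1)} - g^{(k)}]}{g^{(k)T} g^{(k)}}.$$
Similarly, $g^{(k+1)T} d^{(k)} = -g^{(k+1)T} g^{(k)} + \beta_{k-1} g^{(k+1)T} d^{(k-1)}$. With the Lemma, $g^{(k+1)T} d^{(i)} = 0$ for all $0 \le k \le n-1$ and $0 \le i \le k$, so both $g^{(k+1)T} d^{(k)}$ and $g^{(k+1)T} d^{(k-1)}$ vanish, and we have $g^{(k+1)T} g^{(k)} = 0$. Hence
$$\beta_k = \frac{g^{(k+1)T} g^{(k+1)}}{g^{(k)T} g^{(k)}}.$$
The Lemma is true for quadratic forms, but the formula is now applied to non-quadratic functions.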
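The three gradient-only formulas can be placed side by side in code. This is a sketch under assumptions: it takes the conventions $d^{(k+1)} = -g^{(k+1)} + \beta_k d^{(k)}$ and $g^{(k)} = Q x^{(k)} - b$, the function name `beta_formulas` is hypothetical, and the sample vectors come from one exact conjugate gradient step on an assumed 2x2 quadratic.

```python
def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def beta_formulas(g_new, g_old, d_old):
    """Return (Hestenes-Stiefel, Polak-Ribiere, Fletcher-Reeves) values
    of beta_k computed from gradients only, without the matrix Q."""
    y = [a - b for a, b in zip(g_new, g_old)]  # g(k+1) - g(k)
    hs = dot(g_new, y) / dot(d_old, y)
    pr = dot(g_new, y) / dot(g_old, g_old)
    fr = dot(g_new, g_new) / dot(g_old, g_old)
    return hs, pr, fr

# Sample data (assumed): Q = [[4, 2], [2, 2]], b = [-1, 1], x(0) = [0, 0].
# One exact step from d(0) = -g(0) gives x(1) = [-1, 1] and g(1) = [-1, -1].
g0 = [1, -1]
d0 = [-1, 1]
g1 = [-1, -1]

print(beta_formulas(g1, g0, d0))  # (1.0, 1.0, 1.0)
```

For this quadratic with an exact step, all three values agree with $g^{(1)T} Q d^{(0)} / (d^{(0)T} Q d^{(0)}) = 1$. On non-quadratic functions, or with an inexact line search, they generally differ.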

With the above modifications, $Q$ is no longer needed. A few further adjustments are still required for non-quadratic problems:
- The termination criterion $\nabla f(x^{(k+1)}) = 0$ is not practical; a tolerance-based stopping test is needed instead.
- The algorithm will not stop in $n$ steps, and the Q-conjugacy of the direction vectors deteriorates as the iterations proceed. The direction vector therefore needs to be re-initialized to the negative gradient after a few iterations (usually $n$ or $n+1$).
- If the line search is not accurate, the Hestenes-Stiefel formula is recommended.
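Putting these adjustments together, a nonlinear conjugate gradient loop might look like the sketch below. Several parts are assumptions rather than anything prescribed above: the line search is a simple backtracking (Armijo) search, the direction is reset to the negative gradient every `restart` iterations (default $n$) and whenever it fails to be a descent direction, and all names (`nonlinear_cg`, `backtracking`, `f_demo`, `grad_demo`) are hypothetical.

```python
import math

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def backtracking(f, x, d, g, alpha=1.0, shrink=0.5, c=1e-4, max_halvings=60):
    """Inexact line search (assumed): shrink alpha until Armijo's condition holds."""
    fx, slope = f(x), dot(g, d)
    for _ in range(max_halvings):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx + c * alpha * slope:
            return alpha
        alpha *= shrink
    return 0.0  # no acceptable step found

def nonlinear_cg(f, grad, x0, formula="PR", restart=None, tol=1e-8, max_iter=1000):
    """Nonlinear CG with the HS, PR or FR formula and periodic restarts."""
    x = list(x0)
    restart = restart or len(x)
    g = grad(x)
    d = [-gi for gi in g]
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol:      # practical stopping test
            break
        if dot(g, d) >= 0:                   # safeguard: not a descent direction
            d = [-gi for gi in g]
        alpha = backtracking(f, x, d, g)
        if alpha == 0.0:
            break
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x)
        y = [a - b for a, b in zip(g_new, g)]
        if (k + 1) % restart == 0:
            beta = 0.0                       # restart: next direction is -g
        elif formula == "HS":
            denom = dot(d, y)
            beta = dot(g_new, y) / denom if denom != 0 else 0.0
        elif formula == "PR":
            beta = dot(g_new, y) / dot(g, g)
        else:                                # "FR"
            beta = dot(g_new, g_new) / dot(g, g)
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        g = g_new
    return x

# Demo objective (assumed): a smooth non-quadratic function with minimizer (1, -2).
def f_demo(x):
    u, v = x[0] - 1.0, x[1] + 2.0
    return (u * u) * (u * u) + u * u + 2.0 * v * v

def grad_demo(x):
    u, v = x[0] - 1.0, x[1] + 2.0
    return [4.0 * u * u * u + 2.0 * u, 4.0 * v]

for name in ("HS", "PR", "FR"):
    x_min = nonlinear_cg(f_demo, grad_demo, [0.0, 0.0], formula=name)
    print(name, x_min, f_demo(x_min))
```

Because every accepted step satisfies Armijo's condition along a descent direction, the objective value never increases from one iterate to the next, whichever formula is selected.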

In general, the choice of which formula to use depends on the objective function. No formula is definitively superior to the others. Nevertheless, a global convergence analysis suggests that the Fletcher-Reeves formula may be better. There is also another suggestion for $\beta_k$.

Homework for Prob-5, 10.2, 10.5 and 10.7
