Conjugate Direction Methods


1 Conjugate Direction Methods
®Copyright of Shun-Feng Su
The class of conjugate direction methods can be viewed as intermediate between the steepest descent method and Newton's method. The conjugate direction methods have the following properties:
Solve quadratics of $n$ variables in $n$ steps.
The usual implementation does not require the Hessian matrix.
No operations (inversion or even storage) on $n \times n$ matrices are required.

2 Conjugate Direction Methods
The conjugate direction methods typically perform better than the steepest descent method, but worse than Newton's method. The crucial factor in the efficiency of an iterative search algorithm is the direction of search at each iteration. The conjugate direction methods therefore search along so-called conjugate directions.

3 Conjugate Direction Methods
Definition: Let $Q$ be a real symmetric $n \times n$ matrix. The directions $d^{(0)}, d^{(1)}, \dots, d^{(m)}$ are Q-conjugate if, for all $i \ne j$, we have $d^{(i)T} Q d^{(j)} = 0$.
Lemma: Let $Q$ be a symmetric positive definite $n \times n$ matrix. If the directions $d^{(0)}, d^{(1)}, \dots, d^{(k)}$ are non-zero and Q-conjugate, then they are linearly independent.
Proof: Let $\alpha_0, \dots, \alpha_k$ be scalars such that $\alpha_0 d^{(0)} + \alpha_1 d^{(1)} + \dots + \alpha_k d^{(k)} = 0$. Pre-multiply by $d^{(i)T} Q$: $\alpha_i d^{(i)T} Q d^{(i)} = 0$ (the other terms are 0 by Q-conjugacy). Since $d^{(i)} \ne 0$ and $Q$ is positive definite, $d^{(i)T} Q d^{(i)} > 0$, so $\alpha_i = 0$ for $i = 0, 1, \dots, k$. Hence the directions are linearly independent.

4 Conjugate Direction Methods
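Example: $Q = \begin{bmatrix} 3 & 0 & 1 \\ 0 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}$ (symmetric positive definite).
All leading principal minors are positive: $\Delta_1 = 3$, $\Delta_2 = \det\begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix} = 12$, $\Delta_3 = \det(Q) = 20$.
Let $d^{(0)} = [1, 0, 0]^T$. Find $d^{(1)}$ from $d^{(0)T} Q d^{(1)} = 0$: $3d_1^{(1)} + d_3^{(1)} = 0$; select $d_1^{(1)} = 1$, $d_2^{(1)} = 0$, $d_3^{(1)} = -3$.
Find $d^{(2)}$ with $d^{(0)T} Q d^{(2)} = 0$ and $d^{(1)T} Q d^{(2)} = 0$: $3d_1^{(2)} + d_3^{(2)} = 0$ and $-6d_2^{(2)} - 8d_3^{(2)} = 0$, so $d^{(2)} = [1, 4, -3]^T$.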
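The example can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, and the matrix `Q` is the one implied by the minors and conjugacy equations stated in the example (an assumption, since the matrix itself did not survive in the transcript).

```python
def mat_vec(Q, v):
    """Return the product Q v for a matrix stored as a list of rows."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def q_inner(u, Q, v):
    """Return the scalar u^T Q v."""
    return sum(a * b for a, b in zip(u, mat_vec(Q, v)))

def is_q_conjugate(directions, Q):
    """True if d(i)^T Q d(j) = 0 for every pair i != j."""
    n = len(directions)
    return all(
        q_inner(directions[i], Q, directions[j]) == 0
        for i in range(n) for j in range(n) if i != j
    )

# Matrix assumed from the minors 3, 12, 20 and the equations in the example.
Q = [[3, 0, 1],
     [0, 4, 2],
     [1, 2, 3]]
d0, d1, d2 = [1, 0, 0], [1, 0, -3], [1, 4, -3]

print(is_q_conjugate([d0, d1, d2], Q))
```

Integer arithmetic keeps the test for zero exact; with floating-point data a small tolerance would be needed instead of `== 0`.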

5 Conjugate Direction Methods
A systematic procedure for finding Q-conjugate vectors is the Gram-Schmidt process (as used for finding an orthonormal basis), with the inner product $u^T Q v$. Given a set of linearly independent vectors $p^{(0)}, p^{(1)}, \dots, p^{(n-1)}$, the Gram-Schmidt process is
$d^{(0)} = p^{(0)}$, and $d^{(k+1)} = p^{(k+1)} - \sum_{i=0}^{k} \frac{p^{(k+1)T} Q d^{(i)}}{d^{(i)T} Q d^{(i)}} d^{(i)}$;
then $d^{(0)}, d^{(1)}, \dots, d^{(n-1)}$ are Q-conjugate.
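A minimal sketch of this Gram-Schmidt procedure with respect to the inner product u^T Q v, assuming exact rational arithmetic. The function names, the choice of the standard basis as the input vectors p(i), and the 3x3 matrix `Q` are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

def mat_vec(Q, v):
    """Return the product Q v for a matrix stored as a list of rows."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def q_inner(u, Q, v):
    """Return the scalar u^T Q v."""
    return sum(a * b for a, b in zip(u, mat_vec(Q, v)))

def q_gram_schmidt(P, Q):
    """Turn linearly independent vectors P into Q-conjugate directions."""
    D = []
    for p in P:
        d = list(p)
        for prev in D:
            # Remove the component of p along each earlier direction.
            coef = Fraction(q_inner(p, Q, prev)) / q_inner(prev, Q, prev)
            d = [di - coef * pi for di, pi in zip(d, prev)]
        D.append(d)
    return D

# Illustrative symmetric positive definite matrix (an assumption).
Q = [[3, 0, 1],
     [0, 4, 2],
     [1, 2, 3]]
P = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
D = q_gram_schmidt(P, Q)
for d in D:
    print([str(c) for c in d])
```

Positive definiteness of Q guarantees that each denominator d(i)^T Q d(i) is non-zero as long as the input vectors are linearly independent.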

6 Conjugate Direction Methods
Consider a quadratic function $f(x) = \frac{1}{2} x^T Q x - b^T x$, where $Q$ is a symmetric positive definite matrix. It is easy to see that the global minimizer satisfies $Qx = b$.
Basic conjugate direction algorithm: Given a starting point $x^{(0)}$ and Q-conjugate vectors $d^{(0)}, d^{(1)}, \dots, d^{(n-1)}$,
$x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$ with $\alpha_k = -\frac{g^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}}$, where $g^{(k)} = \nabla f(x^{(k)}) = Qx^{(k)} - b$.

7 Conjugate Direction Methods
For any starting point $x^{(0)}$, the basic conjugate direction algorithm (with Q-conjugate vectors $d^{(0)}, d^{(1)}, \dots, d^{(n-1)}$) converges to the unique $x^*$ in $n$ steps.
Proof: Since $d^{(0)}, d^{(1)}, \dots, d^{(n-1)}$ are linearly independent, they form a basis, so $x^* - x^{(0)} = \beta_0 d^{(0)} + \beta_1 d^{(1)} + \dots + \beta_{n-1} d^{(n-1)}$.
Pre-multiply by $d^{(k)T} Q$, for $k = 0, 1, \dots, n-1$. We have $d^{(k)T} Q (x^* - x^{(0)}) = \beta_k d^{(k)T} Q d^{(k)}$. Then $\beta_k = \frac{d^{(k)T} Q (x^* - x^{(0)})}{d^{(k)T} Q d^{(k)}}$.

8 Conjugate Direction Methods
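Since $x^{(i+1)} = x^{(i)} + \alpha_i d^{(i)}$, after $k$ steps $x^{(k)} = x^{(0)} + \alpha_0 d^{(0)} + \dots + \alpha_{k-1} d^{(k-1)}$.
Then $x^* - x^{(0)} = (x^* - x^{(k)}) + (x^{(k)} - x^{(0)})$. Pre-multiply by $d^{(k)T} Q$:
$d^{(k)T} Q (x^* - x^{(0)}) = d^{(k)T} Q (x^* - x^{(k)}) + 0$ (the second term vanishes by Q-conjugacy) $= -d^{(k)T} g^{(k)}$ (note $g^{(k)} = Qx^{(k)} - b$ and $Qx^* = b$).
Then $\beta_k = -\frac{g^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}} = \alpha_k$ and $x^* = x^{(n)}$: the algorithm reaches $x^*$ in $n$ steps.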

9 Conjugate Direction Methods
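Example: $f(x_1, x_2) = \frac{1}{2} x^T \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix} x - x^T \begin{bmatrix} -1 \\ 1 \end{bmatrix}$. Let $x^{(0)} = [0, 0]^T$. It is easy to verify that $d^{(0)} = [1, 0]^T$ and $d^{(1)} = [-3/8, 3/4]^T$ are Q-conjugate.
Then $\nabla f(x^{(0)}) = g^{(0)} = -b = [1, -1]^T$, $\alpha_0 = -\frac{g^{(0)T} d^{(0)}}{d^{(0)T} Q d^{(0)}} = -\frac{1}{4}$, and $x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = [-1/4, 0]^T$.
Next, $\nabla f(x^{(1)}) = g^{(1)} = Qx^{(1)} - b = [0, -3/2]^T$, $\alpha_1 = -\frac{g^{(1)T} d^{(1)}}{d^{(1)T} Q d^{(1)}} = 2$, then $x^{(2)} = x^{(1)} + \alpha_1 d^{(1)} = [-1, 3/2]^T$.
It is easy to see that $x^{(2)} = x^*$, since $Qx^{(2)} = b$.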
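The basic conjugate direction algorithm can be sketched in a few lines and run on this example. This is a minimal illustration, not the author's code: the function names are hypothetical, and `Q` and `b` are the values consistent with the gradients and iterates stated in the example (an assumption where the transcript lost the matrices).

```python
from fractions import Fraction as F

def mat_vec(Q, v):
    """Return the product Q v for a matrix stored as a list of rows."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def dot(u, v):
    """Return the inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_direction(Q, b, x0, directions):
    """Minimize 1/2 x^T Q x - b^T x along given Q-conjugate directions."""
    x = [F(v) for v in x0]
    iterates = [x]
    for d in directions:
        g = [qx - bi for qx, bi in zip(mat_vec(Q, x), b)]   # g(k) = Q x(k) - b
        alpha = -dot(g, d) / dot(d, mat_vec(Q, d))          # exact step size
        x = [xi + alpha * di for xi, di in zip(x, d)]
        iterates.append(x)
    return iterates

# Data assumed from the worked example.
Q = [[4, 2], [2, 2]]
b = [-1, 1]
directions = [[1, 0], [F(-3, 8), F(3, 4)]]

iterates = conjugate_direction(Q, b, [0, 0], directions)
for x in iterates:
    print([str(c) for c in x])
```

Exact fractions are used so that the final iterate satisfies Qx = b exactly; with floats the same code works but equality holds only up to rounding.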

10 Conjugate Direction Methods
Lemma: In the conjugate direction algorithm, $g^{(k+1)T} d^{(i)} = 0$ for all $0 \le k \le n-1$ and $0 \le i \le k$.
Proof: $Q(x^{(k+1)} - x^{(k)}) = g^{(k+1)} - g^{(k)}$ (since $g^{(k)} = Qx^{(k)} - b$). Thus, we have $g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)}$. Then, we can prove the lemma by induction.
Induction, basic rules:
1. Prove it is true for $k = 0$: $g^{(1)T} d^{(0)} = 0$.
2. Assume it is true for some $k$. Prove that it is also true for $k+1$.
3. Then by induction, it is true for all $0 \le k \le n-1$ and $0 \le i \le k$.

11 Conjugate Direction Methods
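For $g^{(1)T} d^{(0)} = 0$: $g^{(1)T} d^{(0)} = (Qx^{(1)} - b)^T d^{(0)}$, where $x^{(1)} = x^{(0)} + \alpha_0 d^{(0)}$ and $\alpha_0 = -\frac{g^{(0)T} d^{(0)}}{d^{(0)T} Q d^{(0)}}$. Then we have $g^{(1)T} d^{(0)} = g^{(0)T} d^{(0)} + \alpha_0 d^{(0)T} Q d^{(0)} = 0$.
This in fact implies $\alpha_0 = \arg\min_\alpha f(x^{(0)} + \alpha d^{(0)})$, because $\partial f(x^{(0)} + \alpha d^{(0)}) / \partial \alpha = g^{(1)T} d^{(0)}$ at $\alpha = \alpha_0$.
With the lemma, in the conjugate direction algorithm, we also have $\alpha_k = \arg\min_\alpha f(x^{(k)} + \alpha d^{(k)})$.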
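The link between the step size and the one-dimensional minimization can be written out as a short worked derivation for the quadratic $f(x) = \frac{1}{2} x^T Q x - b^T x$. The function $\phi$ is notation introduced here for illustration only.

$$
\begin{aligned}
\phi(\alpha) &= f(x^{(k)} + \alpha d^{(k)}) = \tfrac{1}{2} (x^{(k)} + \alpha d^{(k)})^T Q (x^{(k)} + \alpha d^{(k)}) - b^T (x^{(k)} + \alpha d^{(k)}), \\
\phi'(\alpha) &= d^{(k)T} \big( Q (x^{(k)} + \alpha d^{(k)}) - b \big) = g^{(k)T} d^{(k)} + \alpha \, d^{(k)T} Q d^{(k)}, \\
\phi'(\alpha_k) &= 0 \;\Longrightarrow\; \alpha_k = -\frac{g^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}}, \qquad \phi''(\alpha) = d^{(k)T} Q d^{(k)} > 0 .
\end{aligned}
$$

Since $\phi'' > 0$ for positive definite $Q$ and $d^{(k)} \ne 0$, the stationary point is the unique minimizer along the direction, and $\phi'(\alpha_k) = g^{(k+1)T} d^{(k)}$ is exactly the quantity the lemma shows to be zero.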

12 Conjugate Direction Methods
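For the induction step: assume $g^{(k)T} d^{(i)} = 0$ for $0 \le i \le k-1$. To prove: $g^{(k+1)T} d^{(i)} = 0$ for $0 \le i \le k$.
Using $g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)}$, for $0 \le i \le k-1$ we get $g^{(k+1)T} d^{(i)} = g^{(k)T} d^{(i)} + \alpha_k d^{(k)T} Q d^{(i)} = 0$ (the first term by the assumption, the second by Q-conjugacy).
For $i = k$, substituting the expression for $\alpha_k$ gives $g^{(k+1)T} d^{(k)} = g^{(k)T} d^{(k)} + \alpha_k d^{(k)T} Q d^{(k)} = 0$; all terms cancel.
From this lemma, we have $f(x^{(k+1)}) = \min_\alpha f(x^{(k)} + \alpha d^{(k)})$. Also $f(x^{(k+1)}) = \min_{a_0, \dots, a_k} f\big(x^{(0)} + \sum_{i=0}^{k} a_i d^{(i)}\big)$.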

13 Conjugate Gradient Methods
The conjugate gradient algorithm does not use pre-specified conjugate directions, but instead computes the directions as the algorithm progresses. At each stage, the direction is calculated as a linear combination of the previous direction and the current gradient. The idea is that the direction is Q-conjugate to all previous directions.

14 Conjugate Gradient Methods
The direction is calculated as a linear combination of the previous direction and the current gradient, i.e., $d^{(k+1)} = -g^{(k+1)} + \beta_k d^{(k)}$. Requiring $d^{(k+1)T} Q d^{(k)} = 0$ then gives $\beta_k = \frac{g^{(k+1)T} Q d^{(k)}}{d^{(k)T} Q d^{(k)}}$.
It can be proved that with such an approach, the obtained $d^{(0)}, d^{(1)}, \dots, d^{(n-1)}$ are Q-conjugate.
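A minimal sketch of the conjugate gradient algorithm for the quadratic f(x) = 1/2 x^T Q x - b^T x, using the update d(k+1) = -g(k+1) + beta_k d(k). Several details are assumptions made for illustration: the first direction is taken as d(0) = -g(0) (the usual choice, not stated on this slide), and the matrix `Q`, the vector `b`, and all function names are hypothetical.

```python
from fractions import Fraction as F

def mat_vec(Q, v):
    """Return the product Q v for a matrix stored as a list of rows."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def dot(u, v):
    """Return the inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(Q, b, x0):
    """Minimize 1/2 x^T Q x - b^T x; return the minimizer and the directions used."""
    x = [F(v) for v in x0]
    g = [qx - bi for qx, bi in zip(mat_vec(Q, x), b)]    # g(0) = Q x(0) - b
    d = [-gi for gi in g]                                # assumed start: d(0) = -g(0)
    directions = []
    for _ in range(len(b)):                              # at most n steps
        if all(gi == 0 for gi in g):
            break
        Qd = mat_vec(Q, d)
        alpha = -dot(g, d) / dot(d, Qd)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        directions.append(d)
        g = [qx - bi for qx, bi in zip(mat_vec(Q, x), b)]
        beta = dot(g, Qd) / dot(d, Qd)                   # makes d(k+1) Q-conjugate to d(k)
        d = [-gi + beta * di for gi, di in zip(g, d)]
    return x, directions

# Illustrative data (assumptions): minimizer is [1, 0, 0] since Q [1, 0, 0]^T = b.
Q = [[3, 0, 1],
     [0, 4, 2],
     [1, 2, 3]]
b = [3, 0, 1]
x_star, directions = conjugate_gradient(Q, b, [0, 0, 0])
print([str(c) for c in x_star])
```

Rational arithmetic is used so that the n-step termination and the Q-conjugacy of the generated directions can be checked exactly; in floating point both hold only approximately.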

21 Conjugate Gradient Methods
Note that the conjugate direction methods are derived for the quadratic form. Thus, for a non-quadratic problem, a quadratic approximation of the objective function is formed at the current point, beginning with the starting point. When the point is near the solution, the function behaves approximately like its quadratic approximation. But now, in the process, Q (the Hessian at the current point) is no longer a constant matrix.

22 Conjugate Gradient Methods
If Q is calculated each time, it may be computationally expensive. It can be seen that Q is only needed for the calculation of $\alpha_k$ and $\beta_k$. Since $\alpha_k = \arg\min_{\alpha \ge 0} f(x^{(k)} + \alpha d^{(k)})$, the value of $\alpha_k$ can be obtained by a numerical line search. For $\beta_k$, it can be approximated by using only the gradients. Three modifications are introduced here.

23 Conjugate Gradient Methods
The Hestenes-Stiefel formula: Recall that $\beta_k = \frac{g^{(k+1)T} Q d^{(k)}}{d^{(k)T} Q d^{(k)}}$. The Hestenes-Stiefel formula replaces the term $Q d^{(k)}$ by $(g^{(k+1)} - g^{(k)}) / \alpha_k$. (Because $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$, pre-multiplying by Q and using $g^{(k)} = Qx^{(k)} - b$ and $g^{(k+1)} = Qx^{(k+1)} - b$ gives $Q d^{(k)} = (g^{(k+1)} - g^{(k)}) / \alpha_k$.)
In other words, $\beta_k = \frac{g^{(k+1)T} [g^{(k+1)} - g^{(k)}]}{d^{(k)T} [g^{(k+1)} - g^{(k)}]}$. This is exact for quadratic forms, but now non-quadratic forms are considered.

24 Conjugate Gradient Methods
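The Polak-Ribiere formula: Starting from the Hestenes-Stiefel formula $\beta_k = \frac{g^{(k+1)T} [g^{(k+1)} - g^{(k)}]}{d^{(k)T} [g^{(k+1)} - g^{(k)}]}$.
From the Lemma, we have $g^{(k+1)T} d^{(i)} = 0$ for all $0 \le k \le n-1$ and $0 \le i \le k$, and $d^{(k)} = -g^{(k)} + \beta_{k-1} d^{(k-1)}$. Then $g^{(k)T} d^{(k)} = -g^{(k)T} g^{(k)} + \beta_{k-1} g^{(k)T} d^{(k-1)} = -g^{(k)T} g^{(k)}$, so the denominator becomes $d^{(k)T} [g^{(k+1)} - g^{(k)}] = g^{(k)T} g^{(k)}$.
Hence $\beta_k = \frac{g^{(k+1)T} [g^{(k+1)} - g^{(k)}]}{g^{(k)T} g^{(k)}}$.
The Lemma is true for quadratic forms, but now non-quadratic forms are considered.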

25 Conjugate Gradient Methods
The Fletcher-Reeves formula: Starting from the Polak-Ribiere formula $\beta_k = \frac{g^{(k+1)T} [g^{(k+1)} - g^{(k)}]}{g^{(k)T} g^{(k)}}$.
Similarly, $g^{(k+1)T} d^{(k)} = -g^{(k+1)T} g^{(k)} + \beta_{k-1} g^{(k+1)T} d^{(k-1)}$. With the Lemma, $g^{(k+1)T} d^{(i)} = 0$ for all $0 \le k \le n-1$ and $0 \le i \le k$, so we have $g^{(k+1)T} g^{(k)} = 0$.
Hence $\beta_k = \frac{g^{(k+1)T} g^{(k+1)}}{g^{(k)T} g^{(k)}}$.
The Lemma is true for quadratic forms, but now non-quadratic forms are considered.

26 Conjugate Gradient Methods
With the above modifications, Q is not needed. But some further modifications are still required:
The termination criterion $g^{(k+1)} = 0$ is not practical.
The algorithm will not stop in n steps. The Q-conjugacy of the direction vectors deteriorates in the process. Thus, the direction vector needs to be re-initialized after a few iterations (usually n or n+1).
If the line search is not accurate, the Hestenes-Stiefel formula is recommended.
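The pieces above can be combined into a rough sketch of a nonlinear conjugate gradient loop with a selectable formula for beta_k. Everything beyond the three formulas is an assumption made for illustration: the golden-section line search, the gradient-norm tolerance used in place of g = 0, restarting with the negative gradient every n iterations, the fallback to beta = 0 on a zero denominator, and the convex test function.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def safe_div(num, den):
    """Return num/den, or 0.0 (a steepest descent restart) if den is zero."""
    return num / den if den != 0 else 0.0

def beta_hs(g_new, g, d):   # Hestenes-Stiefel
    y = [a - b for a, b in zip(g_new, g)]
    return safe_div(dot(g_new, y), dot(d, y))

def beta_pr(g_new, g, d):   # Polak-Ribiere
    y = [a - b for a, b in zip(g_new, g)]
    return safe_div(dot(g_new, y), dot(g, g))

def beta_fr(g_new, g, d):   # Fletcher-Reeves
    return safe_div(dot(g_new, g_new), dot(g, g))

def line_search(phi, step=1.0, iters=80):
    """Approximate argmin of phi over alpha >= 0, assuming phi is unimodal."""
    hi = step
    for _ in range(60):                     # expand until the minimum is bracketed
        if phi(hi) >= phi(hi / 2):
            break
        hi *= 2
    r = (math.sqrt(5) - 1) / 2
    a, b = 0.0, hi
    for _ in range(iters):                  # golden-section shrinking of [a, b]
        c, e = b - r * (b - a), a + r * (b - a)
        if phi(c) < phi(e):
            b = e
        else:
            a = c
    return (a + b) / 2

def nonlinear_cg(f, grad, x0, beta_rule, tol=1e-8, max_iter=1000):
    x, n = list(x0), len(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol:     # practical stopping test
            break
        alpha = line_search(lambda a: f([xi + a * di for xi, di in zip(x, d)]))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x)
        beta = 0.0 if (k + 1) % n == 0 else beta_rule(g_new, g, d)   # periodic restart
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        if dot(g_new, d) >= 0:              # not a descent direction: restart
            d = [-gi for gi in g_new]
        g = g_new
    return x

# Hypothetical convex, non-quadratic test function with minimizer (1, -0.5).
def f(x):
    u, v = x[0] - 1, x[1] + 0.5
    return u**4 + u**2 + 2 * v**2 + u * v

def grad(x):
    u, v = x[0] - 1, x[1] + 0.5
    return [4 * u**3 + 2 * u + v, 4 * v + u]

results = {rule.__name__: nonlinear_cg(f, grad, [0.0, 0.0], rule)
           for rule in (beta_hs, beta_pr, beta_fr)}
print(results)
```

On a quadratic with exact line search the three rules coincide; they differ only once the function is non-quadratic or the line search is inexact, which is where the choice discussed in the slides matters.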

27 Conjugate Gradient Methods
In general, the choice of which formula to use depends on the objective function. No formula is definitely superior to the others. Nevertheless, a global convergence analysis suggests that the Fletcher-Reeves formula may be better. Another formula for $\beta_k$ has also been suggested.
Homework: Prob-5, 10.2, 10.5 and 10.7

