Conjugate Direction Methods


Conjugate Direction Methods
The class of conjugate direction methods can be viewed as intermediate between the steepest descent method and Newton's method. The conjugate direction methods have the following properties: they solve quadratics of n variables in n steps; the usual implementation does not require the Hessian matrix; and no operation (inverse or even storage) on n×n matrices is required.

Conjugate Direction Methods
The conjugate direction methods typically perform better than the steepest descent method, but worse than Newton's method. The crucial factor in the efficiency of an iterative search algorithm is the direction of search at each iteration. Thus, the conjugate direction methods are built around so-called conjugate directions of search.

Conjugate Direction Methods
Definition: Let Q be a real symmetric matrix. The directions d(0), d(1), …, d(m) are Q-conjugate if for all i ≠ j we have d(i)TQd(j) = 0.
Lemma: Let Q be a symmetric positive definite n×n matrix. If the directions d(0), d(1), …, d(k) are nonzero and Q-conjugate, then they are linearly independent.
Proof: Let α0, …, αk be scalars such that α0d(0) + α1d(1) + … + αkd(k) = 0. Pre-multiplying by d(i)TQ gives αi d(i)TQd(i) = 0 (the other terms vanish by Q-conjugacy). Since d(i) ≠ 0 and Q is positive definite, αi = 0 for i = 0, 1, …, k, so the directions are linearly independent.

Conjugate Direction Methods
Example: Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]] (symmetric positive definite: all leading principal minors are positive, Δ1 = 3, Δ2 = det([[3, 0], [0, 4]]) = 12, Δ3 = det(Q) = 20).
Let d(0) = [1, 0, 0]T. Find d(1) from d(0)TQd(1) = 0: 3d1(1) + d3(1) = 0; select d1(1) = 1, d2(1) = 0, d3(1) = −3, i.e., d(1) = [1, 0, −3]T.
Find d(2) with d(0)TQd(2) = 0 and d(1)TQd(2) = 0: 3d1(2) + d3(2) = 0 and 6d2(2) + 8d3(2) = 0, giving d(2) = [1, 4, −3]T.
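As a quick numerical check of this example, here is a minimal Python/NumPy sketch (using the matrix Q and directions given above); every pairwise product d(i)TQd(j) with i ≠ j should vanish:

import numpy as np

# Q and the directions from the example above.
Q = np.array([[3.0, 0.0, 1.0],
              [0.0, 4.0, 2.0],
              [1.0, 2.0, 3.0]])
d0 = np.array([1.0, 0.0, 0.0])
d1 = np.array([1.0, 0.0, -3.0])
d2 = np.array([1.0, 4.0, -3.0])

# Each pairwise product d(i)^T Q d(j), i != j, should be zero.
for a, c in [(d0, d1), (d0, d2), (d1, d2)]:
    print(a @ Q @ c)   # prints 0.0 for every pair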

Conjugate Direction Methods
A systematic procedure for finding Q-conjugate vectors is the Gram-Schmidt process (analogous to constructing an orthogonal basis), as follows. Given a set of linearly independent vectors p(0), p(1), …, p(n−1), the Gram-Schmidt process is d(0) = p(0) and
d(k+1) = p(k+1) − Σ i=0..k [ p(k+1)TQd(i) / (d(i)TQd(i)) ] d(i);
then d(0), d(1), …, d(n−1) are Q-conjugate.
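A minimal Python/NumPy sketch of this Q-conjugate Gram-Schmidt procedure (the function name and the choice of the standard basis as starting vectors are illustrative assumptions, not from the slides):

import numpy as np

def q_conjugate_gram_schmidt(P, Q):
    # Columns of P are linearly independent vectors p(0), ..., p(n-1).
    # Returns Q-conjugate directions d(0), ..., d(n-1) as columns, using
    # d(k+1) = p(k+1) - sum_i [p(k+1)^T Q d(i) / (d(i)^T Q d(i))] d(i).
    directions = []
    for p in P.T:
        d = p.astype(float).copy()
        for di in directions:
            d -= (p @ Q @ di) / (di @ Q @ di) * di
        directions.append(d)
    return np.column_stack(directions)

# Example: start from the standard basis and the Q used earlier.
Q = np.array([[3.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 3.0]])
D = q_conjugate_gram_schmidt(np.eye(3), Q)
print(np.round(D.T @ Q @ D, 10))   # off-diagonal entries are all zero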

Conjugate Direction Methods
Consider a quadratic function f(x) = 1/2 xTQx − bTx, where Q is a symmetric positive definite matrix. It is easy to see that the global minimizer satisfies Qx = b.
Basic conjugate direction algorithm: Given a starting point x(0) and Q-conjugate vectors d(0), d(1), …, d(n−1),
x(k+1) = x(k) + αkd(k), with αk = − g(k)Td(k) / (d(k)TQd(k)), where g(k) = ∇f(x(k)) = Qx(k) − b.
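A minimal Python/NumPy sketch of this basic conjugate direction algorithm (the function name and the sample right-hand side b are illustrative assumptions; the directions are those found in the 3×3 example above):

import numpy as np

def conjugate_direction(Q, b, x0, D):
    # Minimize f(x) = 1/2 x^T Q x - b^T x along the Q-conjugate columns of D.
    x = x0.astype(float)
    for k in range(D.shape[1]):
        d = D[:, k]
        g = Q @ x - b                      # g(k) = Qx(k) - b
        alpha = -(g @ d) / (d @ Q @ d)     # closed-form step length
        x = x + alpha * d
    return x

Q = np.array([[3.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])              # an arbitrary right-hand side
D = np.array([[1.0, 1.0, 1.0],             # Q-conjugate directions as columns
              [0.0, 0.0, 4.0],
              [0.0, -3.0, -3.0]])
x = conjugate_direction(Q, b, np.zeros(3), D)
print(np.allclose(x, np.linalg.solve(Q, b)))   # True: x solves Qx = b in n = 3 steps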

Conjugate Direction Methods
For any starting point x(0), the basic conjugate direction algorithm (with Q-conjugate vectors d(0), d(1), …, d(n−1)) converges to the unique minimizer x* in n steps.
Since d(0), d(1), …, d(n−1) are linearly independent, they form a basis, so x* − x(0) = β0d(0) + β1d(1) + … + βn−1d(n−1).
Pre-multiplying by d(k)TQ, for k = 0, 1, …, n−1, we have d(k)TQ(x* − x(0)) = βk d(k)TQd(k). Then βk = d(k)TQ(x* − x(0)) / (d(k)TQd(k)).

Conjugate Direction Methods
Since x(i+1) = x(i) + αid(i), after k steps x(k) = x(0) + α0d(0) + … + αk−1d(k−1). Then x* − x(0) = (x* − x(k)) + (x(k) − x(0)).
Pre-multiplying by d(k)TQ: d(k)TQ(x* − x(0)) = d(k)TQ(x* − x(k)) + 0 (the second term vanishes by Q-conjugacy) = −d(k)T∇f(x(k)) (note that ∇f(x(k)) = Qx(k) − b and Qx* = b).
Hence βk = −d(k)Tg(k) / (d(k)TQd(k)) = αk, and x* = x(n), i.e., the algorithm reaches x* in n steps.

Conjugate Direction Methods
Example: f(x1, x2) = 1/2 xT [[4, 2], [2, 2]] x − xT [−1, 1]T. Let x(0) = [0, 0]T. It is easy to verify that d(0) = [1, 0]T and d(1) = [−3/8, 3/4]T are Q-conjugate.
Then ∇f(x(0)) = g(0) = [1, −1]T, and α0 = −g(0)Td(0) / (d(0)TQd(0)) = −1/4, so x(1) = x(0) + α0d(0) = [−1/4, 0]T.
Next, ∇f(x(1)) = g(1) = Qx(1) − b = [0, −3/2]T, and α1 = −g(1)Td(1) / (d(1)TQd(1)) = 2, so x(2) = x(1) + α1d(1) = [−1, 3/2]T. It is easy to see that x(2) = x*.
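The steps of this example can be reproduced with a short Python/NumPy sketch (Q and b are as reconstructed above):

import numpy as np

Q = np.array([[4.0, 2.0], [2.0, 2.0]])
b = np.array([-1.0, 1.0])

x = np.array([0.0, 0.0])
for d in (np.array([1.0, 0.0]), np.array([-3/8, 3/4])):
    g = Q @ x - b                       # gradient at the current iterate
    alpha = -(g @ d) / (d @ Q @ d)      # exact step length
    x = x + alpha * d
    print(x)                            # [-0.25  0.] then [-1.   1.5]

print(np.allclose(Q @ x, b))            # True: x(2) is the minimizer x*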

Conjugate Direction Methods
Lemma: In the conjugate direction algorithm, g(k+1)Td(i) = 0 for all 0 ≤ k ≤ n−1 and 0 ≤ i ≤ k.
Proof: Q(x(k+1) − x(k)) = g(k+1) − g(k) (since g(k) = Qx(k) − b); thus g(k+1) = g(k) + αkQd(k). Then we can prove the lemma by induction.
<induction> Basic steps: 1. Prove it is true for k = 0: g(1)Td(0) = 0. 2. Assume it is true for some k; prove it is also true for k+1. 3. Then by induction it is true for all 0 ≤ k ≤ n−1 and 0 ≤ i ≤ k.

Conjugate Direction Methods
For g(1)Td(0) = 0: g(1)Td(0) = (Qx(1) − b)Td(0), where x(1) = x(0) + α0d(0) and α0 = −g(0)Td(0) / (d(0)TQd(0)); substituting, we get g(1)Td(0) = g(0)Td(0) + α0 d(0)TQd(0) = 0.
This in fact implies α0 = arg minα f(x(0) + αd(0)), because ∂f(x(0) + αd(0))/∂α = g(1)Td(0) at α = α0.
With the lemma, in the conjugate direction algorithm we also have αk = arg minα f(x(k) + αd(k)).
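A small numerical check of this fact (a sketch using SciPy's 1-D minimizer for the line search; Q, b, and the starting data are those of the two-dimensional example above):

import numpy as np
from scipy.optimize import minimize_scalar

Q = np.array([[4.0, 2.0], [2.0, 2.0]])
b = np.array([-1.0, 1.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x

x0 = np.array([0.0, 0.0])
d0 = np.array([1.0, 0.0])
g0 = Q @ x0 - b

alpha_closed = -(g0 @ d0) / (d0 @ Q @ d0)                   # closed-form alpha_0
alpha_search = minimize_scalar(lambda a: f(x0 + a * d0)).x  # 1-D line search
print(alpha_closed, alpha_search)                           # both are -0.25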

Conjugate Direction Methods
For the induction step: assume g(k)Td(i) = 0 for 0 ≤ i ≤ k−1; we want to prove g(k+1)Td(i) = 0 for 0 ≤ i ≤ k.
Using g(k+1) = g(k) + αkQd(k), it is easy to prove g(k+1)Td(i) = 0 for 0 ≤ i ≤ k−1 (both terms vanish, by the induction hypothesis and by Q-conjugacy). To prove g(k+1)Td(k) = 0, expand the remaining term using the definition of αk; all terms cancel.
From this lemma, we have f(x(k+1)) = minα f(x(k) + αd(k)). Also f(x(k+1)) = min over all α0, …, αk of f(x(0) + Σi αid(i)).

Conjugate Gradient Methods
The conjugate gradient algorithm does not use pre-specified conjugate directions, but instead computes the directions as the algorithm progresses. At each stage, the direction is calculated as a linear combination of the previous direction and the current gradient. The idea is that the new direction is Q-conjugate to all previous directions.

Conjugate Gradient Methods
The direction is calculated as a linear combination of the previous direction and the current gradient, i.e., starting from d(0) = −g(0),
d(k+1) = −g(k+1) + βkd(k), with βk = g(k+1)TQd(k) / (d(k)TQd(k)).
It can be proved that with such an approach, the obtained d(0), d(1), …, d(n−1) are Q-conjugate.
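A minimal Python/NumPy sketch of the conjugate gradient algorithm for the quadratic case, using this direction update (the function name is an illustrative assumption; the example reuses Q and b from the earlier two-dimensional example):

import numpy as np

def conjugate_gradient(Q, b, x0, tol=1e-10):
    # Minimize f(x) = 1/2 x^T Q x - b^T x for symmetric positive definite Q,
    # building the directions on the fly: d(k+1) = -g(k+1) + beta_k d(k).
    x = x0.astype(float)
    g = Q @ x - b
    d = -g                                 # d(0) = -g(0)
    for _ in range(len(b)):
        if np.linalg.norm(g) < tol:
            break
        alpha = -(g @ d) / (d @ Q @ d)     # exact line search step
        x = x + alpha * d
        g = Q @ x - b
        beta = (g @ Q @ d) / (d @ Q @ d)   # beta_k = g(k+1)^T Q d(k) / d(k)^T Q d(k)
        d = -g + beta * d
    return x

Q = np.array([[4.0, 2.0], [2.0, 2.0]])
b = np.array([-1.0, 1.0])
print(conjugate_gradient(Q, b, np.zeros(2)))   # [-1.   1.5], the minimizer found earlier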

Conjugate Gradient Methods
(Several slides titled "Conjugate Gradient Methods" contain only equations or figures that were not captured in this transcript.)

Conjugate Gradient Methods
Note that the conjugate direction methods are derived from the quadratic form. Thus, for a non-quadratic problem, a quadratic approximation of the objective function is formed at the current point. When the point is near the solution, this quadratic approximation is accurate. But now, in the process, Q is no longer a constant matrix.

Conjugate Gradient Methods
If Q is recalculated at every iteration, the method may be computationally expensive. However, Q is only needed for the calculation of αk and βk. Since αk = arg minα≥0 f(x(k) + αd(k)), the value of αk can be obtained by a numerical line search (see the sketch below). For βk, it can be approximated using only the gradients. Three modifications are introduced here.
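A minimal sketch of such a numerical line search, using SciPy's Wolfe-condition routine (the Rosenbrock function is used here only as a stand-in non-quadratic objective; it is not from the slides):

import numpy as np
from scipy.optimize import line_search

def f(x):   # Rosenbrock function, a stand-in non-quadratic objective
    return 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2

def grad(x):
    return np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                     200 * (x[1] - x[0]**2)])

x = np.array([-1.2, 1.0])
d = -grad(x)                            # search direction (steepest descent for the first step)
alpha = line_search(f, grad, x, d)[0]   # alpha_k from a Wolfe line search; no Q needed
print(alpha, f(x + alpha * d))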

Conjugate Gradient Methods
The Hestenes-Stiefel formula: Recall that βk = g(k+1)TQd(k) / (d(k)TQd(k)). The Hestenes-Stiefel formula replaces the term Qd(k) by (g(k+1) − g(k))/αk (because x(k+1) = x(k) + αkd(k); pre-multiplying by Q and using g(k) = Qx(k) − b and g(k+1) = Qx(k+1) − b gives Qd(k) = (g(k+1) − g(k))/αk). In other words,
βk = g(k+1)T(g(k+1) − g(k)) / (d(k)T(g(k+1) − g(k))).
This identity is exact for quadratic forms, but now non-quadratic forms are considered.

Conjugate Gradient Methods
The Polak-Ribiere formula: Start from the Hestenes-Stiefel formula βk = g(k+1)T(g(k+1) − g(k)) / (d(k)T(g(k+1) − g(k))). From the Lemma, g(k+1)Td(i) = 0 for all 0 ≤ k ≤ n−1 and 0 ≤ i ≤ k, and d(k) = −g(k) + βk−1d(k−1). Then d(k)Tg(k) = −g(k)Tg(k) + βk−1d(k−1)Tg(k) = −g(k)Tg(k), so the denominator becomes d(k)T(g(k+1) − g(k)) = g(k)Tg(k), giving
βk = g(k+1)T(g(k+1) − g(k)) / (g(k)Tg(k)).
The Lemma holds for quadratic forms, but now non-quadratic forms are considered.

Conjugate Gradient Methods
The Fletcher-Reeves formula: Start from the Polak-Ribiere formula βk = g(k+1)T(g(k+1) − g(k)) / (g(k)Tg(k)). Similarly, g(k+1)Td(k) = −g(k+1)Tg(k) + βk−1g(k+1)Td(k−1). With the Lemma, g(k+1)Td(i) = 0 for all 0 ≤ k ≤ n−1 and 0 ≤ i ≤ k, so we have g(k+1)Tg(k) = 0, giving
βk = g(k+1)Tg(k+1) / (g(k)Tg(k)).
The Lemma holds for quadratic forms, but now non-quadratic forms are considered.
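Putting the three formulas together, here is a minimal nonlinear conjugate gradient sketch in Python (the function names, the restart rule, and the Rosenbrock test function are illustrative assumptions, not the slides' implementation):

import numpy as np
from scipy.optimize import line_search

def nonlinear_cg(f, grad, x0, beta_rule="FR", max_iter=200, tol=1e-6):
    # Nonlinear CG with a numerical line search and a periodic restart,
    # selecting beta by the Hestenes-Stiefel, Polak-Ribiere, or Fletcher-Reeves formula.
    x = x0.astype(float)
    g = grad(x)
    d = -g
    n = len(x0)
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = line_search(f, grad, x, d)[0]
        if alpha is None:              # line search failed: restart with a small step
            d, alpha = -g, 1e-4
        x = x + alpha * d
        g_new = grad(x)
        y = g_new - g
        if beta_rule == "HS":
            beta = (g_new @ y) / (d @ y)
        elif beta_rule == "PR":
            beta = (g_new @ y) / (g @ g)
        else:                          # "FR"
            beta = (g_new @ g_new) / (g @ g)
        d = -g_new + beta * d
        if (k + 1) % n == 0:           # re-initialize the direction every n iterations
            d = -g_new
        g = g_new
    return x

rosen = lambda x: 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
rosen_grad = lambda x: np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                                 200 * (x[1] - x[0]**2)])
print(nonlinear_cg(rosen, rosen_grad, np.array([-1.2, 1.0]), beta_rule="PR"))  # approaches [1, 1]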

Conjugate Gradient Methods
With the above modifications, Q is no longer needed. But some further modifications are still required: The termination criterion g(k+1) = 0 is not practical, and the algorithm will not stop in n steps. The Q-conjugacy of the direction vectors deteriorates in the process, so the direction vector needs to be re-initialized (reset to the negative gradient) after a few iterations (usually n or n+1). If the line search is not accurate, the Hestenes-Stiefel formula is recommended.

Conjugate Gradient Methods
In general, the choice of formula depends on the objective function; none is definitively superior. Nevertheless, a global convergence analysis suggests that the Fletcher-Reeves formula may be better. There is another suggestion for βk as well.
Homework: Prob-5, 10.2, 10.5 and 10.7.