
Outline (Copyright of Shun-Feng Su)
- Preface
- Fundamentals of Optimization
- Unconstrained Optimization
- Ideas of finding solutions
- One-Dimensional Search
- Gradient Methods
- Newton’s Method and Its Variations

Newton’s Method
In the above, only the first derivatives (gradients) are used to define a suitable search direction. This is referred to as incremental search. If direct approaches are considered, the task is to find a solution of df(x)/dx = 0, i.e., ∇f(x) = 0. Recall that in one dimension we also mentioned Newton’s method, which finds f′(x) = 0 by iterating x(k+1) = x(k) − f′(x(k))/f″(x(k)). In other words, we can use the second derivatives to define the search.
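As a concrete illustration of the one-dimensional update above, here is a minimal Python sketch; the particular objective and its derivatives are illustrative choices, not taken from the slides.

```python
# One-dimensional Newton's method for solving f'(x) = 0.
# fp (f') and fpp (f'') below are illustrative examples.

def newton_1d(fp, fpp, x0, tol=1e-8, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = fp(x) / fpp(x)      # x(k+1) = x(k) - f'(x(k)) / f''(x(k))
        x -= step
        if abs(step) < tol:        # stop when the update is negligible
            break
    return x

# Example: minimize f(x) = x**4 - 3*x**2 + 2 starting from x0 = 2.
fp  = lambda x: 4 * x**3 - 6 * x   # f'(x)
fpp = lambda x: 12 * x**2 - 6      # f''(x)
print(newton_1d(fp, fpp, x0=2.0))  # converges to sqrt(1.5) ~ 1.2247
```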

Newton’s Method
This is referred to as Newton’s method or the Newton-Raphson method. Note that Newton’s formula is obtained from a quadratic form. The idea is that, given a starting point, a quadratic approximation of the objective function at this point is constructed, and the minimizer of that quadratic function is taken as the next starting point. At the new point the related quadratic approximation and its corresponding minimizer are obtained again, and this procedure is repeated.
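To make the quadratic-approximation idea concrete, here is the standard second-order Taylor argument written out in LaTeX (a sketch of the usual derivation, using the g and H notation defined in the next slide):

```latex
% Quadratic model of f around the current iterate x^{(k)}:
q(x) = f(x^{(k)}) + g(x^{(k)})^{T}(x - x^{(k)})
       + \tfrac{1}{2}(x - x^{(k)})^{T} H(x^{(k)}) (x - x^{(k)}),
% where g = \nabla f and H = \nabla^2 f.
% Setting \nabla q(x) = 0 gives
g(x^{(k)}) + H(x^{(k)})(x - x^{(k)}) = 0
\quad\Longrightarrow\quad
x^{(k+1)} = x^{(k)} - H^{-1}(x^{(k)})\, g(x^{(k)}).
```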


Newton’s Method
In fact, the approach can be stated as follows. Given an objective function f(x), define ∇f(x) = g(x) and ∇²f(x) = H(x). Then the search algorithm is
x(k+1) = x(k) − H⁻¹(x(k)) g(x(k)).
For the requirement of a minimum, H(x(k)) > 0 (for a matrix, ">" means positive definite). This recursive formula is referred to as Newton’s method. Newton’s method indeed performs better than the steepest descent method does if the initial point is close to the minimizer.
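A minimal Python sketch of this update rule, assuming the gradient g and Hessian H are supplied as callables (the function names and stopping rule are illustrative):

```python
import numpy as np

def newton(g, H, x0, tol=1e-8, max_iter=100):
    """Pure Newton's method: x(k+1) = x(k) - H(x(k))^{-1} g(x(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        grad = g(x)
        if np.linalg.norm(grad) < tol:      # stop near a stationary point
            break
        # Solve H d = g instead of forming H^{-1} explicitly.
        d = np.linalg.solve(H(x), grad)
        x = x - d
    return x
```

Solving the linear system H(x(k)) d = g(x(k)) rather than forming H⁻¹ explicitly is the usual implementation choice, and it connects to the cost concerns raised later in the deck.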

Newton’s Method
Example: consider the starting point x(0) = [3, −1, 0, 1]T for the given objective function.

Newton’s Method
The search algorithm is x(k+1) = x(k) − H⁻¹(x(k)) g(x(k)), so
x(1) = x(0) − H⁻¹(x(0)) g(x(0)), with g(x(0)) = [306, −144, −2, −310]T.
Evaluating H(x(0)) and H⁻¹(x(0)) numerically gives
x(1) = [1.5873, −0.1587, 0.2540, 0.2540]T, f(x(1)) = 31.8.

Newton’s Method
With a similar process, we get
x(2) = [1.0582, −0.1058, 0.1694, 0.1694]T, f(x(2)) = 6.28.
Conducting the process again,
x(3) = [0.7037, −0.0704, 0.1111, 0.1111]T, f(x(3)) = 1.24.
Continuing in this way, the approach seems promising for finding the minimizer.
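The slides do not transcribe the objective function itself, but the printed gradient [306, −144, −2, −310]T at [3, −1, 0, 1]T is consistent with Powell's function, so the following sketch reproduces the iterates under that assumption:

```python
import numpy as np

# Assumed objective (Powell's function); consistent with the gradient
# printed on the slide at x(0) = [3, -1, 0, 1]^T, but still an assumption.
def f(x):
    x1, x2, x3, x4 = x
    return (x1 + 10*x2)**2 + 5*(x3 - x4)**2 + (x2 - 2*x3)**4 + 10*(x1 - x4)**4

def g(x):
    x1, x2, x3, x4 = x
    return np.array([
        2*(x1 + 10*x2) + 40*(x1 - x4)**3,
        20*(x1 + 10*x2) + 4*(x2 - 2*x3)**3,
        10*(x3 - x4) - 8*(x2 - 2*x3)**3,
        -10*(x3 - x4) - 40*(x1 - x4)**3,
    ])

def H(x):
    x1, x2, x3, x4 = x
    a = 120*(x1 - x4)**2
    b = 12*(x2 - 2*x3)**2
    return np.array([
        [2 + a,   20,        0,        -a    ],
        [20,      200 + b,  -2*b,       0    ],
        [0,      -2*b,       10 + 4*b, -10   ],
        [-a,      0,        -10,        10 + a],
    ], dtype=float)

x = np.array([3.0, -1.0, 0.0, 1.0])
for k in range(3):
    x = x - np.linalg.solve(H(x), g(x))
    print(k + 1, np.round(x, 4), round(float(f(x)), 2))
# Prints iterates close to those on the slides: f values about 31.8, 6.28, 1.24.
```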


Newton’s Method
As in the one-variable case, there is no guarantee that Newton’s algorithm heads in a direction of decreasing values of the objective function if H(x(k)) is not positive definite. Even if H(x(k)) > 0, Newton’s method may not be a descent method (i.e., possibly f(x(k+1)) ≥ f(x(k))). Again, Newton’s method has superior convergence properties when the starting point is near the solution.

Newton’s Method
When f(x) is a quadratic function, Newton’s method reaches ∇f(x) = 0 (the minimizer) in just one step. Consider the quadratic function f(x) = ½ xTQx − bTx, where Q is a symmetric matrix. Then ∇f(x) = g(x) = Qx − b and ∇²f(x) = H(x) = Q. Given an initial x(0),
x(1) = x(0) − H⁻¹(x(0)) g(x(0)) = x(0) − Q⁻¹(Qx(0) − b) = Q⁻¹b = x*.
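A quick numerical check of this one-step property on a small quadratic (the particular Q, b, and starting point below are made up for illustration):

```python
import numpy as np

# f(x) = 0.5 x^T Q x - b^T x with a symmetric positive definite Q.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

x0 = np.array([10.0, -7.0])          # arbitrary starting point
g0 = Q @ x0 - b                      # gradient at x0
x1 = x0 - np.linalg.solve(Q, g0)     # one Newton step (H = Q)

x_star = np.linalg.solve(Q, b)       # exact minimizer Q^{-1} b
print(np.allclose(x1, x_star))       # True: one step reaches x*
```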

Newton’s Method
Definition: Given a sequence {x(k)} that converges to x*, i.e., lim(k→∞) ||x(k) − x*|| = 0, we say that the order of convergence is p, where p ∈ ℝ, if
0 < lim(k→∞) ||x(k+1) − x*|| / ||x(k) − x*||^p < ∞.
If the limit is 0 for all p > 0, then we say the order of convergence is ∞.

Newton’s Method
Theorem: Suppose that f ∈ C³ and x* is a point such that ∇f(x*) = 0 and H(x*) is invertible. Then for all x(0) sufficiently close to x*, Newton’s method is well defined for all k and converges to x* with order of convergence at least 2.
Note that this order of convergence is a local property: it is guaranteed only for initial points sufficiently close to x*, not for an arbitrary initial point.
Theorem: The order of convergence of the steepest descent algorithm is 1 in the worst case.

Convergence of Newton’s Method

Convergence of Steepest Descent

Newton’s Method
Theorem: Let {x(k)} be the sequence generated by Newton’s method. If H(x(k)) > 0 and g(x(k)) ≠ 0, then the direction d(k) = −H⁻¹(x(k)) g(x(k)) = x(k+1) − x(k) is a descent direction, in the sense that there exists an ᾱ > 0 such that for any α ∈ (0, ᾱ), f(x(k) + α d(k)) < f(x(k)).
Proof: Define φ(α) = f(x(k) + α d(k)). Then φ′(α) = ∇f(x(k) + α d(k))T d(k), and
φ′(0) = ∇f(x(k))T d(k) = −gT(x(k)) H⁻¹(x(k)) g(x(k)) < 0,
since H⁻¹(x(k)) > 0 and g(x(k)) ≠ 0. In other words, f(x(k) + α d(k)) < f(x(k)) for sufficiently small α > 0.

x(k+1)=x(k)kH-1(x(k))g(x(k)) ®Copyright of Shun-Feng Su Newton’s Method Since the direction d(k)=H-1(x(k))g(x(k)) is the descent direction, it is then possible to have the following modification of Newton’s method: x(k+1)=x(k)kH-1(x(k))g(x(k)) where k =arg min0 f(x(k)H-1(x(k))g(x(k))) Similar to the steepest descent, we can perform line search on the H-1(x(k))g(x(k)) direction. It can be concluded that this modified Newton’s method has the descent property.

Newton’s Method
A drawback of Newton’s method is that we need to calculate H and then H⁻¹ (or solve the corresponding linear system), which may be computationally expensive and may be problematic when the number of variables is large. Another problem is that the Hessian matrix may not be positive definite. We will discuss some approaches to these problems in the following.

x(k+1)=x(k)(H (x(k))+kI)-1g(x(k)) where k>0. ®Copyright of Shun-Feng Su Newton’s Method If the Hessian matrix is not positive definite, the search may not be in a descent direction. To overcome this problem, the Levenberg- Marquardt modification is considered. x(k+1)=x(k)(H (x(k))+kI)-1g(x(k)) where k>0. The idea is to make it positive definite. Not positive definite means some eigenvalues are not positive, then by adding some sufficient large k it will make the matrix become (H (x(k))+kI) positive definite.

Newton’s Method
The Levenberg-Marquardt modification of Newton’s method reduces to Newton’s method as μk → 0 and behaves like a gradient method with a small step size as μk → ∞. In practice, we can start with a small value of μk and then slowly increase it until the iteration is descent (i.e., f(x(k+1)) < f(x(k))).
Homework (prob-4): 9.1 and 9.3.
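A sketch of this strategy, assuming the same f, g, and H callables as before; the initial μ, the growth factor of 10, and the cap are illustrative choices:

```python
import numpy as np

def lm_newton(f, g, H, x0, mu0=1e-3, tol=1e-8, max_iter=100):
    """Newton step with H replaced by H + mu*I; mu grows until the step is descent."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        grad = g(x)
        if np.linalg.norm(grad) < tol:
            break
        d = -grad                                  # fallback: gradient direction
        mu = mu0
        while mu < 1e12:
            A = H(x) + mu * np.eye(x.size)
            try:
                np.linalg.cholesky(A)              # succeeds only if A is PD
            except np.linalg.LinAlgError:
                mu *= 10.0                         # not PD yet: increase mu
                continue
            step = -np.linalg.solve(A, grad)
            if f(x + step) < f(x):                 # accept only descent steps
                d = step
                break
            mu *= 10.0                             # otherwise increase mu and retry
        x = x + d
    return x
```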

Conjugate Direction Methods
The class of conjugate direction methods can be viewed as intermediate between the steepest descent method and Newton’s method. The conjugate direction methods have the following properties:
- They solve quadratics of n variables in n steps.
- The usual implementation does not require the Hessian matrix.
- No operations (inverse or even storage) on n×n matrices are required.

Conjugate Direction Methods
The conjugate direction methods typically perform better than the steepest descent method, but worse than Newton’s method. The crucial factor in the efficiency of an iterative search algorithm is the direction of search at each iteration. Thus, the conjugate direction methods define the so-called conjugate directions for the search.

Conjugate Direction Methods
Definition: Let Q be a real symmetric matrix. The directions d(0), d(1), …, d(m) are Q-conjugate if, for all i ≠ j, we have d(i)T Q d(j) = 0.
Lemma: Let Q be a symmetric positive definite n×n matrix. If the directions d(0), d(1), …, d(k) are nonzero and Q-conjugate, then they are linearly independent.
Proof: Let α0, …, αk be scalars such that α0 d(0) + α1 d(1) + … + αk d(k) = 0. Pre-multiplying by d(i)T Q gives αi d(i)T Q d(i) = 0 (the other terms vanish by Q-conjugacy). Since d(i) ≠ 0 and Q > 0, we have αi = 0 for every i = 0, 1, …, k, so the directions are linearly independent.

Conjugate Direction Methods
Example: Q = [[3, 0, 1], [0, 4, 2], [1, 2, 3]] (symmetric positive definite: all leading principal minors are positive, Δ1 = 3, Δ2 = det([[3, 0], [0, 4]]) = 12, Δ3 = det(Q) = 20).
Let d(0) = [1, 0, 0]T. Find d(1) from d(0)T Q d(1) = 0: 3d1(1) + d3(1) = 0; select d1(1) = 1, d2(1) = 0, d3(1) = −3.
Find d(2) with d(0)T Q d(2) = 0 and d(1)T Q d(2) = 0: 3d1(2) + d3(2) = 0 and −6d2(2) − 8d3(2) = 0; choosing d1(2) = 1 gives d(2) = [1, 4, −3]T.
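A quick check of this example in Python; the Q below is the matrix reconstructed above from the stated leading principal minors and conjugacy conditions (the slide's own matrix was not transcribed):

```python
import numpy as np

# Q reconstructed from the example's leading principal minors (3, 12, 20)
# and the stated conjugacy conditions.
Q = np.array([[3.0, 0.0, 1.0],
              [0.0, 4.0, 2.0],
              [1.0, 2.0, 3.0]])

d0 = np.array([1.0, 0.0, 0.0])
d1 = np.array([1.0, 0.0, -3.0])
d2 = np.array([1.0, 4.0, -3.0])

# All pairwise Q-inner products should be zero.
print(d0 @ Q @ d1, d0 @ Q @ d2, d1 @ Q @ d2)   # 0.0 0.0 0.0
print(np.all(np.linalg.eigvalsh(Q) > 0))       # True: Q is positive definite
```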

Conjugate Direction Methods
A systematic procedure for finding Q-conjugate vectors is a Gram-Schmidt process (analogous to finding an orthogonal basis), as follows. Given a set of linearly independent vectors p(0), p(1), …, p(n−1), the Gram-Schmidt process is
d(0) = p(0), and d(k+1) = p(k+1) − Σ(i=0..k) [d(i)T Q p(k+1) / d(i)T Q d(i)] d(i);
then d(0), d(1), …, d(n−1) are Q-conjugate.
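A minimal sketch of this Q-conjugation procedure (function and variable names are illustrative):

```python
import numpy as np

def q_conjugate_directions(P, Q):
    """Gram-Schmidt-style Q-conjugation of the linearly independent columns of P."""
    n = P.shape[1]
    D = np.zeros_like(P, dtype=float)
    D[:, 0] = P[:, 0]
    for k in range(1, n):
        d = P[:, k].astype(float).copy()
        for i in range(k):
            di = D[:, i]
            d -= (di @ Q @ P[:, k]) / (di @ Q @ di) * di   # remove Q-component along d(i)
        D[:, k] = d
    return D

# Example: conjugate the standard basis with respect to the Q used above.
Q = np.array([[3.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 3.0]])
D = q_conjugate_directions(np.eye(3), Q)
print(np.round(D.T @ Q @ D, 10))   # off-diagonal entries are 0: columns are Q-conjugate
```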

Conjugate Direction Methods
Consider the quadratic function f(x) = ½ xTQx − bTx, where Q is a symmetric positive definite matrix. It is easy to see that the global minimizer satisfies Qx = b.
Basic conjugate direction algorithm: given a starting point x(0) and Q-conjugate vectors d(0), d(1), …, d(n−1),
x(k+1) = x(k) + αk d(k), with αk = −g(x(k))T d(k) / d(k)T Q d(k), where g(x(k)) = ∇f(x(k)) = Qx(k) − b.
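A sketch of the basic conjugate direction algorithm for this quadratic, reusing the Q-conjugate directions from the example above (the specific b and starting point are illustrative):

```python
import numpy as np

def conjugate_direction(Q, b, x0, D):
    """Minimize 0.5 x^T Q x - b^T x along the Q-conjugate columns of D."""
    x = np.asarray(x0, dtype=float)
    for k in range(D.shape[1]):
        d = D[:, k]
        g = Q @ x - b                          # gradient at the current iterate
        alpha = -(g @ d) / (d @ Q @ d)         # exact minimizing step along d(k)
        x = x + alpha * d
    return x

Q = np.array([[3.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 3.0]])
b = np.array([1.0, 1.0, 1.0])
D = np.column_stack([[1, 0, 0], [1, 0, -3], [1, 4, -3]]).astype(float)

x = conjugate_direction(Q, b, np.zeros(3), D)
print(np.allclose(x, np.linalg.solve(Q, b)))   # True: reaches x* in n = 3 steps
```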

Conjugate Direction Methods
Theorem: For any starting point x(0), the basic conjugate direction algorithm (with Q-conjugate vectors d(0), d(1), …, d(n−1)) converges to the unique x* in n steps.
Proof: Since d(0), d(1), …, d(n−1) are linearly independent, they form a basis, so we can write
x* − x(0) = β0 d(0) + β1 d(1) + … + βn−1 d(n−1).
Pre-multiplying by d(k)T Q for k = 0, 1, …, n−1, we have d(k)T Q (x* − x(0)) = βk d(k)T Q d(k), so
βk = d(k)T Q (x* − x(0)) / d(k)T Q d(k).

Conjugate Direction Methods
Since x(i+1) = x(i) + αi d(i), after k steps x(k) = x(0) + α0 d(0) + α1 d(1) + … + αk−1 d(k−1). Then x* − x(0) = (x* − x(k)) + (x(k) − x(0)).
Pre-multiplying by d(k)T Q:
d(k)T Q (x* − x(0)) = d(k)T Q (x* − x(k)) + 0 (the second term vanishes by Q-conjugacy)
= −d(k)T ∇f(x(k)) (note that ∇f(x(k)) = Qx(k) − b and Qx* = b).
Hence βk = −d(k)T g(x(k)) / d(k)T Q d(k) = αk for every k, and therefore x(n) = x(0) + Σ(k=0..n−1) αk d(k) = x(0) + (x* − x(0)) = x*: with Q-conjugate directions d(0), d(1), …, d(n−1), the algorithm reaches x* in n steps.