Tutorial 12: Unconstrained Optimization. Conjugate Gradients

Method of Conjugate Gradients
Suppose that we want to minimize the quadratic function
$f(x) = \tfrac{1}{2} x^T Q x - b^T x$,
where Q is a symmetric positive definite matrix and x has n components. As we well know, the minimum x* is the solution of the linear system
$Q x^* = b$.
The explicit solution of this system (Newton's method) requires about O(n^3) operations and O(n^2) memory, which is very expensive.
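
To make the setup concrete, here is a minimal NumPy sketch (an illustration added here, not part of the slides): it builds a small symmetric positive definite Q, solves Q x* = b directly, and checks that the gradient of f vanishes at x*. The matrix Q, the vector b, and the size n are arbitrary choices.

```python
# A minimal NumPy sketch: build a small symmetric positive definite Q, solve
# Q x* = b directly, and check that the gradient of f vanishes at x*.
# Q, b, and the size n are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
Q = A.T @ A + n * np.eye(n)        # symmetric positive definite by construction
b = rng.standard_normal(n)

x_star = np.linalg.solve(Q, b)     # "explicit" solution, roughly O(n^3) work

# x* minimizes f(x) = 0.5 x^T Q x - b^T x, so its gradient Q x - b vanishes there
print(np.allclose(Q @ x_star - b, np.zeros(n)))   # True
```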

Conjugate Gradients 2
We now consider an alternative solution method that does not need the inversion of Q, but only the gradient of f(x_k) (just like SD, but better), evaluated at n different points x_1, ..., x_n.
[Figure: trajectories of the gradient (steepest descent) method and the conjugate gradient method.]

Conjugate Gradients 3
Consider, for example, the case n = 3, in which the variable x in f(x) is a three-dimensional vector. The quadratic function f(x) is then constant over 3D ellipsoids, called isosurfaces, centered at the minimum x*. How can we start from a point x_0 on one of these ellipsoids and reach x* by a finite sequence of one-dimensional searches? In steepest descent, for poorly conditioned Hessians the successive orthogonal directions lead to many small steps and hence to slow convergence.

Conjugate Gradients: Spherical Case
In the spherical case, the very first step in the direction of the gradient takes us to x* right away. Suppose, however, that we cannot afford to compute this special direction, and that we can only pick some direction p_0, then some direction p_1 orthogonal to p_0 (there is an (n-1)-dimensional space of such directions!), and so on, reaching the minimum of f(x) along each direction in turn. In that case, n steps will take us to the center x* of the sphere, since the coordinate of the minimum along each of the n directions is independent of the others.
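
A small sketch of this claim (an added illustration, with a made-up center c and random orthonormal directions): for the spherical quadratic f(x) = 0.5 ||x - c||^2, exact line searches along any n mutually orthogonal directions reach the center in exactly n steps.

```python
# Sketch: for the spherical quadratic f(x) = 0.5 ||x - c||^2, exact line
# searches along any n mutually orthogonal directions reach the center c in
# exactly n steps. The center c and the orthonormal directions are made up.
import numpy as np

rng = np.random.default_rng(1)
n = 4
c = rng.standard_normal(n)                          # center of the spheres
P, _ = np.linalg.qr(rng.standard_normal((n, n)))    # n random orthonormal directions

x = np.zeros(n)                     # starting point x_0
for k in range(n):
    p = P[:, k]
    g = x - c                       # gradient of f at x
    alpha = -(g @ p) / (p @ p)      # exact line minimization along p
    x = x + alpha * p

print(np.allclose(x, c))            # True: n steps reach the center
```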

Conjugate Gradients: Elliptical Case
Any set of orthogonal directions, with a line search along each direction, will therefore lead to the minimum for spherical isosurfaces. Given an arbitrary set of ellipsoidal isosurfaces, there is a one-to-one mapping with a spherical system: if Q = U Σ U^T is the SVD of the symmetric positive definite matrix Q, then we can write
$\tfrac{1}{2} x^T Q x = \tfrac{1}{2} \| y \|^2$, where $y = \Sigma^{1/2} U^T x$.   (1)
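
The following numeric check (added here; Q, x, and the random seed are arbitrary) verifies that the change of variables y = Σ^{1/2} U^T x turns the ellipsoidal quadratic into the spherical one.

```python
# Numeric check: the change of variables y = Sigma^(1/2) U^T x maps the
# ellipsoidal quadratic 0.5 x^T Q x to the spherical quadratic 0.5 ||y||^2.
# Q and x are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
Q = A.T @ A + n * np.eye(n)            # symmetric positive definite

U, S, _ = np.linalg.svd(Q)             # Q = U diag(S) U^T for symmetric PD Q
x = rng.standard_normal(n)
y = np.sqrt(S) * (U.T @ x)             # y = Sigma^(1/2) U^T x

print(np.isclose(0.5 * x @ Q @ x, 0.5 * y @ y))   # True
```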

Elliptical Case 2
Consequently, there must be a condition on the original problem (in terms of Q) that is equivalent to orthogonality for the spherical problem. If two directions y_i and y_j are orthogonal in the spherical context, that is, if
$y_i^T y_j = 0$, where $y_i = \Sigma^{1/2} U^T x_i$ and $y_j = \Sigma^{1/2} U^T x_j$,
what does this translate into in terms of the directions x_i and x_j for the ellipsoidal problem? We have
$y_i^T y_j = x_i^T U \Sigma^{1/2} \Sigma^{1/2} U^T x_j = x_i^T Q x_j = 0$.   (2)
This condition is called Q-conjugacy, or Q-orthogonality: if this equation holds, then x_i and x_j are said to be Q-conjugate or Q-orthogonal to each other, or simply "conjugate".
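
A short added check (with made-up Q and directions) that Q-conjugacy of x_i and x_j is exactly orthogonality of the mapped directions y_i and y_j:

```python
# Check that Q-conjugacy, x_i^T Q x_j = 0, is exactly orthogonality of the
# transformed directions y = Sigma^(1/2) U^T x. Q, x_i, and x_j are made up.
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
Q = A.T @ A + n * np.eye(n)

U, S, _ = np.linalg.svd(Q)
to_sphere = lambda v: np.sqrt(S) * (U.T @ v)

x_i = rng.standard_normal(n)
z = rng.standard_normal(n)
# Make x_j Q-conjugate to x_i by removing the Q-component of z along x_i
x_j = z - (x_i @ Q @ z) / (x_i @ Q @ x_i) * x_i

print(np.isclose(x_i @ Q @ x_j, 0.0))                      # Q-conjugate
print(np.isclose(to_sphere(x_i) @ to_sphere(x_j), 0.0))    # orthogonal after mapping
```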

Elliptical Case 3
In summary, if we can find n directions p_0, ..., p_{n-1} that are mutually conjugate, i.e. that comply with (2), and if we do a line minimization along each direction p_i, we reach the minimum in at most n steps. Such an algorithm is called "Conjugate Directions (CD)". The special case we will consider derives the construction of these directions from the local gradients, thus giving birth to "Conjugate Gradients". Of course, we cannot use the transformation (1) in the algorithm, because Σ and especially U^T are too expensive to compute and store. So, for computational efficiency, we need a method for generating n conjugate directions without using the SVD or other complex operations on the Hessian Q.

Hestenes-Stiefel Procedure
Here
$p_0 = -g_0$,   $x_{k+1} = x_k + \alpha_k p_k$,   $\alpha_k = -\frac{g_k^T p_k}{p_k^T Q p_k}$,
$g_k = \nabla f(x_k) = Q x_k - b$,
$p_{k+1} = -g_{k+1} + \sum_{i=0}^{k} \frac{g_{k+1}^T Q p_i}{p_i^T Q p_i} \, p_i$.
The first step is a steepest descent step with an optimal line search. Once we have the new solution, the gradient is evaluated there. The next search direction is built by taking the current gradient vector and Q-orthogonalizing it against all previous directions.
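
Below is a minimal NumPy sketch of the procedure as described above (the quadratic test problem is a made-up example). It verifies that n steps reach the minimizer and that the generated directions are mutually Q-conjugate.

```python
# A minimal NumPy sketch of the Hestenes-Stiefel construction for the quadratic
# f(x) = 0.5 x^T Q x - b^T x. The matrix Q, the vector b, and the starting
# point are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
Q = A.T @ A + n * np.eye(n)            # symmetric positive definite
b = rng.standard_normal(n)

x = np.zeros(n)
g = Q @ x - b                          # gradient at x_0
p = -g                                 # first direction: steepest descent
directions = []

for k in range(n):
    alpha = -(g @ p) / (p @ Q @ p)     # optimal line search along p
    x = x + alpha * p
    g = Q @ x - b                      # gradient at the new point
    directions.append(p)
    # Q-orthogonalize the new gradient against all previous directions
    p = -g + sum((g @ Q @ d) / (d @ Q @ d) * d for d in directions)

print(np.allclose(x, np.linalg.solve(Q, b)))    # True: n steps reach the minimum
print(max(abs(directions[i] @ Q @ directions[j])
          for i in range(n) for j in range(i)) < 1e-8)   # True: mutually conjugate
```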

Hestenes-Stiefel Procedure 2
Let us check that the directions are indeed Q-conjugate. First, it is easy to see that p_1 and p_0 are conjugate. Now assume that p_0, ..., p_k are already mutually conjugate, and let us verify that p_{k+1} is conjugate to each of them, i.e. for an arbitrary j ≤ k:
$p_{k+1}^T Q p_j = -g_{k+1}^T Q p_j + \sum_{i=0}^{k} \frac{g_{k+1}^T Q p_i}{p_i^T Q p_i} \, p_i^T Q p_j = -g_{k+1}^T Q p_j + \frac{g_{k+1}^T Q p_j}{p_j^T Q p_j} \, p_j^T Q p_j = 0$,
where all the terms with i ≠ j vanish by the induction hypothesis. One can see that the vectors p_k are found by a generalization of Gram-Schmidt that produces conjugate rather than orthogonal vectors. In practical cases, it can be worthwhile to discard all the orthogonalization terms except the one involving the most recent direction p_k. For the sake of simplicity, we will assume that this is the case.

Removing the Hessian
In the described algorithm, the expression for p_k contains the Hessian Q, which is too large. We now show that p_k can be rewritten in terms of the gradient values g_k and g_{k+1} only. To this end, we notice that
$g_{k+1} = g_k + \alpha_k Q p_k$,   or   $Q p_k = \frac{g_{k+1} - g_k}{\alpha_k}$.
Proof:
$g_{k+1} = Q x_{k+1} - b = Q (x_k + \alpha_k p_k) - b = g_k + \alpha_k Q p_k$.
So that
$g_{k+1}^T Q p_k = \frac{g_{k+1}^T (g_{k+1} - g_k)}{\alpha_k}$   and   $p_k^T Q p_k = \frac{p_k^T (g_{k+1} - g_k)}{\alpha_k}$.
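
A quick numeric check (added illustration; all data is made up) of the identity above:

```python
# Quick numeric check of the identity above: for g(x) = Q x - b and any step
# x_{k+1} = x_k + alpha p, we have g_{k+1} = g_k + alpha Q p. Data is made up.
import numpy as np

rng = np.random.default_rng(5)
n = 5
A = rng.standard_normal((n, n))
Q = A.T @ A + n * np.eye(n)
b = rng.standard_normal(n)

x = rng.standard_normal(n)
p = rng.standard_normal(n)
alpha = 0.3
g_k = Q @ x - b
g_k1 = Q @ (x + alpha * p) - b

print(np.allclose(Q @ p, (g_k1 - g_k) / alpha))   # True
```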

Removing the Hessian 2
We can therefore write
$p_{k+1} = -g_{k+1} + \frac{g_{k+1}^T (g_{k+1} - g_k)}{p_k^T (g_{k+1} - g_k)} \, p_k$,
and Q has disappeared. This expression can be further simplified by noticing that
$p_k^T g_{k+1} = 0$,
because the line along p_k is tangent to an isosurface at x_{k+1}, while the gradient g_{k+1} is orthogonal to the isosurface at x_{k+1}. Similarly,
$p_{k-1}^T g_k = 0$.
Then, since p_k is the sum of -g_k and a multiple of p_{k-1}, the denominator of the expression above becomes
$p_k^T (g_{k+1} - g_k) = -p_k^T g_k = g_k^T g_k$.

Polak-Ribiere Formula
In conclusion, we obtain the Polak-Ribiere formula
$p_{k+1} = -g_{k+1} + \frac{g_{k+1}^T (g_{k+1} - g_k)}{g_k^T g_k} \, p_k$.
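
As an added illustration, here is a sketch of conjugate gradients driven only by gradient evaluations, using the Polak-Ribiere update above; the quadratic test problem and the secant-based exact line search are assumptions made for this demo, not part of the slides.

```python
# Sketch of conjugate gradients that touches the problem only through gradient
# evaluations: the direction update is the Polak-Ribiere formula above, and the
# exact line search for a quadratic is done with a secant step on the
# directional derivative (two gradient evaluations, no explicit Hessian).
import numpy as np

rng = np.random.default_rng(6)
n = 8
A = rng.standard_normal((n, n))
Q = A.T @ A + n * np.eye(n)            # hidden inside grad(); never used below
b = rng.standard_normal(n)
grad = lambda x: Q @ x - b

x = np.zeros(n)
g = grad(x)
p = -g
for k in range(n):
    # phi'(a) = p . grad(x + a p) is linear in a for a quadratic,
    # so one secant step finds its root exactly.
    d0 = p @ g
    d1 = p @ grad(x + 1.0 * p)
    alpha = -d0 / (d1 - d0)
    x = x + alpha * p
    g_new = grad(x)
    beta = g_new @ (g_new - g) / (g @ g)   # Polak-Ribiere coefficient
    p = -g_new + beta * p
    g = g_new

print(np.linalg.norm(grad(x)) < 1e-8 * np.linalg.norm(b))   # True after n steps
```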

General Case
When the function f(x) is arbitrary, the same algorithm can be used, but n iterations will not suffice, since the Hessian, which was constant in the quadratic case, is now a function of x_k. Strictly speaking, we then lose conjugacy, since p_k and p_{k+1} are associated with different Hessians. That is the reason why it is worth keeping conjugacy only between p_{k+1} and p_k, setting the coefficients of all earlier directions to zero. However, as the algorithm approaches the minimum x*, the quadratic approximation becomes more and more valid, and a few cycles of n iterations each will achieve convergence.
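
A sketch of this nonlinear variant (an assumed implementation, not taken from the slides): Polak-Ribiere directions, a simple backtracking line search, and a restart to steepest descent every n iterations; the test function is an arbitrary smooth non-quadratic example.

```python
# Sketch of nonlinear conjugate gradients with the Polak-Ribiere update and a
# restart to steepest descent every n iterations. The test function f and the
# backtracking (Armijo) line search are illustrative choices, not from the slides.
import numpy as np

def f(x):
    return np.sum((x - 1.0) ** 2) + 0.1 * np.sum(x ** 4)

def grad_f(x):
    return 2.0 * (x - 1.0) + 0.4 * x ** 3

def backtracking(x, p, g, t=1.0, c=1e-4, rho=0.5):
    # Shrink the step until the Armijo sufficient-decrease condition holds.
    while f(x + t * p) > f(x) + c * t * (g @ p):
        t *= rho
    return t

n = 5
x = np.zeros(n)
g = grad_f(x)
p = -g
for k in range(200):
    t = backtracking(x, p, g)
    x = x + t * p
    g_new = grad_f(x)
    if np.linalg.norm(g_new) < 1e-10:
        break
    if (k + 1) % n == 0:                      # periodic restart: pure steepest descent
        p = -g_new
    else:
        beta = g_new @ (g_new - g) / (g @ g)  # Polak-Ribiere
        p = -g_new + beta * p
        if g_new @ p >= 0:                    # safeguard: keep a descent direction
            p = -g_new
    g = g_new

print(x, np.linalg.norm(grad_f(x)))           # close to the minimizer, tiny gradient
```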

Conjugate Gradients: Example 1
Consider the elliptic function f(x, y) = (x - 1)^2 + 2(y - 1)^2 and find the first three terms of its Taylor expansion. Find the first step of steepest descent from (0, 0).
[Figure: level sets of f, with the steepest descent direction -f'(0) drawn from the origin.]

Conjugate Gradients: Example 1 (continued)
[Figure: level sets of f, with the first steepest descent step along -f'(0) from the origin.]
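
As a check on the example (an added worked solution, not from the slides): since f is quadratic with Hessian Q = diag(2, 4), its Taylor expansion about the origin is exact, f(x, y) = 3 - 2x - 4y + x^2 + 2y^2, which gives the three requested terms. The sketch below computes the first steepest descent step from (0, 0).

```python
# Worked check of the first steepest descent step for
# f(x, y) = (x - 1)^2 + 2 (y - 1)^2 starting from (0, 0).
import numpy as np

grad = lambda v: np.array([2.0 * (v[0] - 1.0), 4.0 * (v[1] - 1.0)])

x0 = np.array([0.0, 0.0])
g0 = grad(x0)                          # (-2, -4)
p0 = -g0                               # steepest descent direction (2, 4)

# Exact line search: the Hessian is Q = diag(2, 4), so
# alpha = -g0.p0 / (p0^T Q p0) = 20 / 72 = 5/18.
Q = np.diag([2.0, 4.0])
alpha = -(g0 @ p0) / (p0 @ Q @ p0)
x1 = x0 + alpha * p0

print(p0, alpha, x1)                   # p0 = (2, 4), alpha = 5/18, x1 = (5/9, 10/9)
```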