
Multivariate Unconstrained Optimisation: First we consider algorithms for functions for which derivatives are not available. We could try to extend a direct method such as Golden Section search from one dimension to higher dimensions (figure: bracketing a minimum in one dimension versus searching a region in two dimensions), but the number of function evaluations increases as e^n, where n is the number of dimensions.

The Polytope Algorithm: This is a direct search method, also known as the "simplex" method. In the n-dimensional case, at each stage we have n+1 points x_1, x_2, …, x_{n+1} such that F(x_1) ≤ F(x_2) ≤ … ≤ F(x_{n+1}). The algorithm seeks to replace the worst point, x_{n+1}, with a better one. The x_i lie at the vertices of an n-dimensional polytope.

The Polytope Algorithm 2: The new point is formed by reflecting the worst point through the centroid of the best n vertices. Mathematically the new point can be written x_r = c + α(c − x_{n+1}), where α > 0 is the reflection coefficient. In two dimensions the polytope is a triangle; in three dimensions it is a tetrahedron.

Polytope Example: For n = 2 we have three points at each step. (Figure: the worst point x_3 is reflected through the centroid c of x_1 and x_2 to give x_r = c + α(c − x_3).)

Detailed Polytope Algorithm
1. Evaluate F(x_r) ≡ F_r. If F_1 ≤ F_r ≤ F_n, then x_r replaces x_{n+1}.
2. If F_r < F_1 then x_r is the new best point; we assume the direction of reflection is "good" and attempt to expand the polytope in that direction by defining the point x_e = c + β(x_r − c), where β > 1. If F_e < F_r then x_e replaces x_{n+1}; otherwise x_r replaces x_{n+1}.

Detailed Polytope Algorithm 2
3. If F_r > F_n then the polytope is too big and we attempt to contract it by defining
   x_c = c + γ(x_{n+1} − c) if F_r ≥ F_{n+1},
   x_c = c + γ(x_r − c) if F_r < F_{n+1},
where 0 < γ < 1. If F_c < min(F_r, F_{n+1}) then x_c replaces x_{n+1}; otherwise a further contraction is done.
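As a concrete illustration, here is a minimal MATLAB sketch of one iteration of the algorithm above; the function name polytope_step, the row-wise storage of vertices, and the re-sorting at the end are implementation choices, not part of the original slides.

function X = polytope_step(F, X, alpha, beta, gamma)
% One reflection / expansion / contraction step of the polytope method.
% X is an (n+1)-by-n matrix of vertices, ordered so that
% F(X(1,:)) <= F(X(2,:)) <= ... <= F(X(n+1,:)).
n  = size(X, 2);
Fv = zeros(n + 1, 1);
for i = 1:n+1, Fv(i) = F(X(i,:)); end
c  = mean(X(1:n, :), 1);                    % centroid of the best n vertices
xr = c + alpha*(c - X(n+1, :));             % reflect the worst point
Fr = F(xr);
if Fr < Fv(1)                               % new best point: try to expand
    xe = c + beta*(xr - c);
    if F(xe) < Fr, X(n+1, :) = xe; else, X(n+1, :) = xr; end
elseif Fr <= Fv(n)                          % neither best nor worst: accept reflection
    X(n+1, :) = xr;
else                                        % polytope too big: contract
    if Fr >= Fv(n+1), xc = c + gamma*(X(n+1, :) - c);
    else,             xc = c + gamma*(xr - c);
    end
    if F(xc) < min(Fr, Fv(n+1)), X(n+1, :) = xc; end   % else a further contraction is needed
end
for i = 1:n+1, Fv(i) = F(X(i,:)); end       % re-sort so the worst vertex is last
[~, idx] = sort(Fv);
X = X(idx, :);
end

Saved as polytope_step.m, repeated calls such as X = polytope_step(F, X, 1, 1.5, 0.5) carry out steps like those worked by hand in the example that follows, for whatever objective F is supplied as a function handle.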

MATLAB Example Polytope
>> banana = @(x) 10*(x(2) - x(1)^2)^2 + (1 - x(1))^2;   % Rosenbrock's function as defined later in these notes
>> [x,fval] = fminsearch(banana,[-1.2, 1],optimset('Display','iter'))

Polytope Example by Hand

Polytope Example: Start with an equilateral triangle: x_1 = (0,0), x_2 = (0,0.5), x_3 = (√3,1)/4. Take α = 1, β = 1.5, and γ = 0.5.

Polytope Example: Step 1
Polytope: x_1 = (0,0), x_2 = (0,0.5), x_3 = (0.433,0.25).
Worst point is x_1; c = (x_2 + x_3)/2 = (0.2165,0.375). Relabel points: x_3 → x_1, x_1 → x_3.
x_r = c + α(c − x_3) = (0.433,0.75). F(x_r) < F(x_1), so x_r is the best point, so try to expand.
x_e = c + β(x_r − c) = (0.5413,0.9375). F(x_e) < F(x_r), so accept the expansion.

After Step 1

Polytope Example: Step 2
Polytope: x_1 = (0.433,0.25), x_2 = (0,0.5), x_3 = (0.5413,0.9375).
Worst point is x_2; c = (x_1 + x_3)/2 = (0.4871,0.5938). Relabel points: x_3 → x_1, x_2 → x_3, x_1 → x_2.
x_r = c + α(c − x_3) = (0.9743,0.6875). F(x_r) < F(x_1), so x_r is the best point, so try to expand.
x_e = c + β(x_r − c) = (1.2179,0.7344). F(x_e) > F(x_r), so reject the expansion.

After Step 2

Polytope Example: Step 3
Polytope: x_1 = (0.5413,0.9375), x_2 = (0.433,0.25), x_3 = (0.9743,0.6875).
Worst point is x_2; c = (x_1 + x_3)/2 = (0.7578,0.8125). Relabel points: x_3 → x_1, x_2 → x_3, x_1 → x_2.
x_r = c + α(c − x_3) = (1.0826,1.375). F(x_r) > F(x_2), so the polytope is too big; need to contract.
x_c = c + γ(x_r − c) = (0.9202,1.0938). F(x_c) < F(x_r), so accept the contraction.

After Step 3

Polytope Example: Step 4
Polytope: x_1 = (0.9743,0.6875), x_2 = (0.5413,0.9375), x_3 = (0.9202,1.0938).
Worst point is x_2; c = (x_1 + x_3)/2 = (0.9472,0.8906). Relabel points: x_3 → x_2, x_2 → x_3.
x_r = c + α(c − x_3) = (1.3532,0.8438). F(x_r) > F(x_2), so the polytope is too big; need to contract.
x_c = c + γ(x_r − c) = (1.1502,0.8672). F(x_c) < F(x_r), so accept the contraction.

After Step 4

Polytope Example: Step 5
Polytope: x_1 = (0.9743,0.6875), x_2 = (0.9202,1.0938), x_3 = (1.1502,0.8672).
Worst point is x_2; c = (x_1 + x_3)/2 = (1.0622,0.7773). Relabel points: x_3 → x_2, x_2 → x_3.
x_r = c + α(c − x_3) = (1.2043,0.4609). F(x_r) ≥ F(x_3), so the polytope is too big; need to contract.
x_c = c + γ(x_3 − c) = (0.9912,0.9355). F(x_c) < F(x_r), so accept the contraction.

After Step 5

Polytope Example: Step 6
Polytope: x_1 = (0.9743,0.6875), x_2 = (1.1502,0.8672), x_3 = (0.9912,0.9355).
Worst point is x_2; c = (x_1 + x_3)/2 = (0.9827,0.8117). Relabel points: x_3 → x_2, x_2 → x_3.
x_r = c + α(c − x_3) = (0.8153,0.7559). F(x_r) > F(x_2), so the polytope is too big; need to contract.
x_c = c + γ(x_r − c) = (0.8990,0.7837). F(x_c) < F(x_r), so accept the contraction.

Polytope Example: Final Result: So after 6 steps the best estimate of the minimum is x = (0.8990,0.7837).

Alternating Variables Method: Start from the point x = (a_1, a_2, …, a_n). Take the first variable, x_1, and minimise F(x_1, a_2, …, a_n) with respect to x_1; this gives x_1 = a_1′. Take the second variable, x_2, and minimise F(a_1′, x_2, a_3, …, a_n) with respect to x_2; this gives x_2 = a_2′. Continue with each variable in turn until the minimum is reached.

AVM in Two Dimensions: (Figure: the zig-zag path of successive one-dimensional minimisations from the starting point.) The method of minimisation over each variable can be any univariate method.

AVM Example in 2D: Minimise F(x,y) = x² + y² + xy − 2x − 4y. Start at (0,0).

AVM Example in 2D: (Table of iterates: x, y, F(x,y), and |error| at each step.)
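A minimal MATLAB sketch of the alternating variables method for this example, using fminbnd as the univariate minimiser; the search interval [-10, 10] and the fixed number of sweeps are illustrative assumptions.

F = @(x, y) x.^2 + y.^2 + x.*y - 2*x - 4*y;    % objective from this example
x = 0;  y = 0;                                 % start at (0,0)
for k = 1:10
    x = fminbnd(@(t) F(t, y), -10, 10);        % minimise over x with y fixed
    y = fminbnd(@(t) F(x, t), -10, 10);        % minimise over y with x fixed
    fprintf('%2d  x = %8.5f  y = %8.5f  F = %9.5f\n', k, x, y, F(x, y));
end

The iterates converge towards the true minimum of this function at (0, 2), where F = −4.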

Definition of Gradient Vector: The gradient vector is g(x) = (∂F/∂x_1, ∂F/∂x_2, …, ∂F/∂x_n)^T. The gradient vector is also written as ∇F(x).

Definition of Hessian Matrix: The Hessian matrix G(x) is defined as the matrix with (i,j) entry ∂²F/∂x_i∂x_j. The Hessian matrix is symmetric, and is also written as ∇²F(x).
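If the Symbolic Math Toolbox is available, these definitions can be checked directly in MATLAB; a small sketch using the objective from the earlier AVM example (the toolbox requirement is an assumption):

>> syms x y
>> F = x^2 + y^2 + x*y - 2*x - 4*y;
>> g = gradient(F, [x, y])    % gradient vector of F
>> G = hessian(F, [x, y])     % Hessian matrix of F (symmetric)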

Conditions for a Minimum of a Multivariate Function
1. |g(x*)| = 0. That is, all partial derivatives are zero.
2. G(x*) is positive definite. That is, x^T G(x*) x > 0 for all vectors x ≠ 0.
The second condition implies that the eigenvalues of G(x*) are strictly positive.

Stationary Points: If g(x*) = 0 then x* is said to be a stationary point. There are 3 types of stationary point:
1. Minimum, e.g., x² + y² at (0,0)
2. Maximum, e.g., 1 − x² − y² at (0,0)
3. Saddle point, e.g., x² − y² at (0,0)

Definition: Level Surface: F(x) = constant defines a "level surface". For different values of the constant we generate different level surfaces. For example, in 3-D suppose F(x,y,z) = x²/4 + y²/9 + z²/4; then F(x,y,z) = constant is an ellipsoidal surface centred on the origin, so the level surfaces are a series of concentric ellipsoidal surfaces. The gradient vector at a point x is normal to the level surface passing through x.

Definition: Tangent Hyperplane: For a differentiable multivariate function F, the tangent hyperplane at a point x_t on the surface F(x) = constant is normal to the gradient vector.

Definition: Quadratic Function: If the Hessian matrix of F is constant then F is said to be a quadratic function. In this case F can be expressed as F(x) = (1/2)x^T G x + c^T x + α for a constant matrix G, vector c, and scalar α. Then ∇F(x) = Gx + c and ∇²F(x) = G.

Example Quadratic Function: F(x,y) = x² + 2y² + xy − x + 2y. The gradient vector is zero at a stationary point, so Gx + c = 0 there. We need to solve Gx = −c to find the stationary point: x* = −G⁻¹c, giving x* = (6/7, −5/7)^T.

Hessian Matrix Again: We can predict the behaviour of a general nonlinear function near a stationary point x* by looking at the eigenvalues of the Hessian matrix. Let u_j and λ_j denote the jth eigenvector and eigenvalue of G. If λ_j > 0 the function will increase as we move away from x* in direction u_j. If λ_j < 0 the function will decrease as we move away from x* in direction u_j. If λ_j = 0 the function will stay constant as we move away from x* in direction u_j.

Example Again: λ_1 = 3 − √2 ≈ 1.586 and λ_2 = 3 + √2 ≈ 4.414, so both eigenvalues are positive and F increases as we move away from the stationary point at (6/7, −5/7)^T. So the stationary point is a minimum.
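A quick MATLAB check of this example (the variable names are illustrative):

>> G = [2 1; 1 4];     % Hessian of F(x,y) = x^2 + 2y^2 + xy - x + 2y
>> c = [-1; 2];        % coefficients of the linear term
>> xstar = G\(-c)      % stationary point (6/7, -5/7)
>> lambda = eig(G)     % eigenvalues 3 -/+ sqrt(2): both positive, so a minimum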

Example in 4D: In MATLAB:
>> c = [ ]';
>> G = [ ; ; ; ];
>> x = G\(-c)
>> [u, lambda] = eigs(G)

Descent Methods Seek a general algorithm for unconstrained minimisation of a smooth multivariate function. Require that F decreases at each iteration. A method that imposes this type of condition is called a descent method.

A General Descent Algorithm
1. Let x_k be the current iterate. If converged then quit; x_k is the estimate of the minimum.
2. Compute a nonzero vector p_k giving the direction of search.
3. Compute a positive scalar step length α_k for which F(x_k + α_k p_k) < F(x_k).
4. The new estimate of the minimum is x_{k+1} = x_k + α_k p_k. Increment k by 1, and go to step 1.

Method of Steepest Descent: The direction in which F decreases most steeply is −∇F, so we use this as the search direction. The new iterate is x_{k+1} = x_k − α_k ∇F(x_k), where α_k is a non-negative scalar chosen so that x_{k+1} is the minimum point along the line from x_k in the direction −∇F(x_k). Thus, α_k minimises F(x_k − α∇F(x_k)) with respect to α.

Steepest Descent Algorithm
Initialise: x_0, k = 0
Loop: u = ∇F(x_k)
  if |u| = 0 then quit
  else minimise h(α) = F(x_k − αu) to get α_k
  x_{k+1} = x_k − α_k u
  k = k + 1
  if (not finished) go to Loop
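A minimal MATLAB sketch of this loop, applied to the example on the next slide; the use of fminbnd over the interval [0, 1] for the line search, the tolerance, and the iteration cap are illustrative assumptions.

F     = @(v) v(1)^3 + v(2)^3 - 2*v(1)^2 + 3*v(2)^2 - 8;   % example objective
gradF = @(v) [3*v(1)^2 - 4*v(1); 3*v(2)^2 + 6*v(2)];      % its gradient
x = [1; -1];                                              % starting point x_0
for k = 1:50
    u = gradF(x);
    if norm(u) < 1e-8, break, end                         % converged
    alpha = fminbnd(@(a) F(x - a*u), 0, 1);               % line search for alpha_k
    x = x - alpha*u;
end
x    % approaches the minimum at (4/3, 0)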

Example: F(x,y) = x³ + y³ − 2x² + 3y² − 8. Setting ∇F(x,y) = 0 gives 3x² − 4x = 0, so x = 0 or 4/3; and 3y² + 6y = 0, so y = 0 or −2.
(x,y)      G                   Type
(0,0)      Indefinite          Saddle point
(0,−2)     Negative definite   Maximum
(4/3,0)    Positive definite   Minimum
(4/3,−2)   Indefinite          Saddle point
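The classification can be confirmed by examining the eigenvalues of the Hessian at each stationary point; a short MATLAB check (Ghess is an illustrative name):

>> Ghess = @(x, y) [6*x - 4, 0; 0, 6*y + 6];   % Hessian of x^3 + y^3 - 2x^2 + 3y^2 - 8
>> eig(Ghess(0, 0))      % -4 and 6: indefinite, saddle point
>> eig(Ghess(0, -2))     % -4 and -6: negative definite, maximum
>> eig(Ghess(4/3, 0))    %  4 and 6: positive definite, minimum
>> eig(Ghess(4/3, -2))   %  4 and -6: indefinite, saddle point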

Solve with Steepest Descent: Take x_0 = (1, −1)^T; then ∇F(x_0) = (−1, −3)^T.
h(α) ≡ F(x_0 − α∇F(x_0)) = F(1 + α, −1 + 3α) = (1 + α)³ + (3α − 1)³ − 2(1 + α)² + 3(3α − 1)² − 8.
Minimise h(α) with respect to α:
dh/dα = 3(1 + α)² + 9(3α − 1)² − 4(1 + α) + 18(3α − 1) = 84α² + 2α − 10 = 0.
So α = 1/3 or α = −5/14; α must be non-negative, so α = 1/3.

Solve with Steepest Descent: x_1 = x_0 − α∇F(x_0) = (1, −1)^T − (−1/3, −1)^T = (4/3, 0)^T. This is the exact minimum. We were lucky that the search direction at x_0 points directly towards (4/3, 0)^T; usually we would need to do more than one iteration to get a good solution.

Newton's Method: Approximate F locally by a quadratic function and minimise this exactly. By Taylor's Theorem:
F(x) ≈ F(x_k) + g(x_k)^T (x − x_k) + (1/2)(x − x_k)^T G(x_k)(x − x_k)
     = F(x_k) − g(x_k)^T x_k + (1/2) x_k^T G(x_k) x_k + (g(x_k) − G(x_k)x_k)^T x + (1/2) x^T G(x_k) x.
The right-hand side is minimised when g(x_k) − G(x_k)x_k + G(x_k)x_{k+1} = 0, so x_{k+1} = x_k − [G(x_k)]⁻¹ g(x_k).
The search direction is −[G(x_k)]⁻¹ g(x_k) and the step length is 1.

Newton's Method Example: Rosenbrock's function: F(x,y) = 10(y − x²)² + (1 − x)². Use Newton's Method starting at (−1.2, 1)^T.

MATLAB Solution (here fgrad1, fgrad2, and G11 are the gradient components and the (1,1) Hessian entry of Rosenbrock's function):
>> fgrad1 = @(x,y) -40*x.*(y - x.^2) - 2*(1 - x);   % dF/dx
>> fgrad2 = @(x,y) 20*(y - x.^2);                   % dF/dy
>> G11 = @(x,y) 120*x.^2 - 40*y + 2;                % d2F/dx2
>> x = [-1.2; 1]
>> x = x - inv([G11(x(1),x(2)) -40*x(1); -40*x(1) 20]) * [fgrad1(x(1),x(2)) fgrad2(x(1),x(2))]'

MATLAB Iterations: (Table of Newton iterates: x, y, and F(x,y) at each iteration.)

Notes on Newton’s Method Newton’s Method converges quadratically if the quadratic model is a good fit to the objective function. Problems arise if the quadratic model is not a good fit outside a small neighbourhood of the current point.