Nonlinear programming: Unconstrained optimization techniques


Introduction This chapter deals with the various methods of solving the unconstrained minimization problem: find X = (x_1, x_2, …, x_n) that minimizes f(X). It is true that a practical design problem would rarely be unconstrained; still, a study of this class of problems is important for the following reasons: –The constraints do not have a significant influence in certain design problems. –Some of the powerful and robust methods of solving constrained minimization problems require the use of unconstrained minimization techniques. –The unconstrained minimization methods can be used to solve certain complex engineering analysis problems. For example, the displacement response (linear or nonlinear) of a structure under a specified load condition can be found by minimizing its potential energy. Similarly, the eigenvalues and eigenvectors of a discrete system can be found by minimizing the Rayleigh quotient.

Classification of unconstrained minimization methods
Direct search methods:
–Random search method
–Grid search method
–Univariate method
–Pattern search methods (Powell's method, Hooke-Jeeves method)
–Rosenbrock's method
–Simplex method
Descent methods:
–Steepest descent (Cauchy method)
–Fletcher-Reeves method
–Newton's method
–Marquardt method
–Quasi-Newton methods (Davidon-Fletcher-Powell method, Broyden-Fletcher-Goldfarb-Shanno method)

Direct search methods They require only the objective function values, not the partial derivatives of the function, in finding the minimum and hence are often called nongradient methods. The direct search methods are also known as zeroth-order methods, since they use only zeroth-order information (function values) about the function. These methods are most suitable for simple problems involving a relatively small number of variables. They are in general less efficient than the descent methods.

Descent methods The descent techniques require, in addition to the function values, the first and in some cases the second derivatives of the objective function. Since more information about the function being minimized is used (through the use of derivatives), descent methods are generally more efficient than direct search techniques. The descent methods are known as gradient methods. Among the gradient methods, those requiring only first derivatives of the function are called first-order methods; those requiring both first and second derivatives of the function are termed second-order methods.

General approach All unconstrained minimization methods are iterative in nature and hence they start from an initial trial solution and proceed toward the minimum point in a sequential manner. Different unconstrained minimization techniques differ from one another only in the method of generating the new point X i+1 from X i and in testing the point X i+1 for optimality.
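This general approach can be written as a short Python template (a sketch; the `step` and `converged` callables are placeholders of my own, not part of the chapter): every method below supplies its own rule for generating X_{i+1} from X_i and its own optimality test.

```python
def minimize_iteratively(step, converged, x0, max_iter=1000):
    """Generic iterative minimization template.

    All unconstrained minimization methods in this chapter fit this loop;
    they differ only in `step` (how X_{i+1} is generated from X_i) and
    `converged` (how X_{i+1} is tested for optimality).
    """
    x = x0
    for _ in range(max_iter):
        x_new = step(x)          # generate the new point X_{i+1}
        if converged(x, x_new):  # test X_{i+1} for optimality
            return x_new
        x = x_new
    return x
```

As a trivial usage example, a step rule that halves the current point converges to 0 under an absolute-difference test.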

Univariate method In this method we change only one variable at a time and seek to produce a sequence of improved approximations to the minimum point. Starting at a base point X_i in the ith iteration, we fix the values of n − 1 variables and vary the remaining variable. Since only one variable is changed, the problem becomes a one-dimensional minimization problem, and any of the methods discussed in the previous chapter on one-dimensional minimization can be used to produce a new base point X_{i+1}. The search is then continued in a new direction, obtained by changing any one of the n − 1 variables that were fixed in the previous iteration.

Univariate method In fact, the search procedure is continued by taking each coordinate direction in turn. After all the n directions are searched sequentially, the first cycle is complete, and we repeat the entire process of sequential minimization. The procedure is continued until no further improvement is possible in the objective function in any of the n directions of a cycle. The univariate method can be summarized as follows: 1.Choose an arbitrary starting point X_1 and set i=1. 2.Find the search direction S_i as the coordinate direction u_i, i.e., the unit vector with a 1 in position i and zeros elsewhere (cycling through i = 1, 2, …, n).

Univariate method 3.Determine whether λ_i should be positive or negative. For the current direction S_i, this means finding whether the function value decreases in the positive or the negative direction. For this we take a small probe length ε and evaluate f_i = f(X_i), f⁺ = f(X_i + εS_i), and f⁻ = f(X_i − εS_i). If f⁺ < f_i, S_i is the correct direction for decreasing the value of f; if f⁻ < f_i, −S_i is the correct one. If both f⁺ and f⁻ are greater than f_i, we take X_i as the minimum along the direction S_i.

Univariate method 4. Find the optimal step length λ_i* such that f(X_i ± λ_i* S_i) = min over λ_i of f(X_i ± λ_i S_i), where the + or − sign is used depending on whether S_i or −S_i is the direction for decreasing the function value. 5. Set X_{i+1} = X_i ± λ_i* S_i depending on the direction for decreasing the function value, and f_{i+1} = f(X_{i+1}). 6. Set the new value of i = i + 1 and go to step 2. Continue this procedure until no significant change is achieved in the value of the objective function.
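Steps 1–6 can be sketched in Python. This is an illustrative implementation under my own assumptions (function and parameter names are hypothetical, a golden-section search serves as the one-dimensional minimizer of step 4, and the step length is bracketed in a fixed interval [0, 10]):

```python
import math

def golden_section(phi, a, b, tol=1e-6):
    """Minimize a unimodal 1-D function phi on [a, b] by golden-section search."""
    gr = (math.sqrt(5) - 1) / 2
    c, d = b - gr * (b - a), a + gr * (b - a)
    while abs(b - a) > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - gr * (b - a)
        else:
            a, c = c, d
            d = a + gr * (b - a)
    return (a + b) / 2

def univariate_search(f, x0, probe=1e-2, tol=1e-8, max_cycles=100):
    """Univariate method: minimize f one coordinate direction at a time."""
    x = list(x0)
    n = len(x)
    for _ in range(max_cycles):
        improved = False
        for i in range(n):                       # one cycle over all n directions
            fi = f(x)
            xp = x[:]; xp[i] += probe            # step 3: probe +/- with length probe
            xm = x[:]; xm[i] -= probe
            if f(xp) < fi:
                sign = 1.0
            elif f(xm) < fi:
                sign = -1.0
            else:
                continue                         # x is the minimum along this direction
            # step 4: one-dimensional search for the optimal step length lam*
            phi = lambda t: f([x[j] + (sign * t if j == i else 0.0) for j in range(n)])
            lam = golden_section(phi, 0.0, 10.0)
            x[i] += sign * lam                   # step 5: move to the new base point
            if abs(f(x) - fi) > tol:
                improved = True
        if not improved:                         # no gain in any direction of the cycle
            break
    return x
```

For example, on the separable quadratic f(x) = (x_1 − 1)² + (x_2 + 2)² the method reaches the minimum (1, −2) in a single cycle, since each coordinate can be minimized independently.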

Univariate method The univariate method is very simple and can be implemented easily. However, it will not converge rapidly to the optimum solution, as it has a tendency to oscillate with steadily decreasing progress toward the optimum. Hence it is better to stop the computations at some point near the optimum rather than trying to find the precise optimum point. In theory, the univariate method can be applied to find the minimum of any function that possesses continuous derivatives. However, if the function has a steep valley, the method may not even converge.

Univariate method For example, consider the contours of a function of two variables with a valley as shown in the figure. If the univariate search starts at point P, the function value cannot be decreased either in the direction ±S_1 or in the direction ±S_2. Thus the search comes to a halt, and one may be misled into taking the point P, which is certainly not the optimum point, as the optimum point. This situation arises whenever the value of the probe length ε needed for detecting the proper direction (±S_1 or ±S_2) happens to be smaller than the precision of the significant figures used in the computations.

Example Minimize f(x_1, x_2) with the starting point (0,0). Solution: We take the probe length ε as 0.01 to find the correct direction for decreasing the function value in step 3. Further, we use the differential calculus method to find the optimum step length λ_i* along the direction ±S_i in step 4.

Example Iteration i=1. Step 2: Choose the search direction S_1 as the first coordinate direction, S_1 = (1, 0)^T. Step 3: To find whether the value of f decreases along S_1 or −S_1, we use the probe length ε. Since f⁻ < f_1, −S_1 is the correct direction for minimizing f from X_1.

Example Step 4: To find the optimum step length λ_1*, we minimize f(X_1 − λS_1) with respect to λ. Step 5: Set X_2 = X_1 − λ_1* S_1 and f_2 = f(X_2).

Example Iteration i=2. Step 2: Choose the search direction S_2 as the second coordinate direction, S_2 = (0, 1)^T. Step 3: Since f⁺ < f_2, S_2 is the correct direction for decreasing the value of f from X_2.

Example Step 4: We minimize f(X_2 + λS_2) to find λ_2*. Step 5: Set X_3 = X_2 + λ_2* S_2 and f_3 = f(X_3).

Powell’s method Powell’s method is an extension of the basic pattern search method. It is the most widely used direct search method and can be proved to be a method of conjugate directions. A conjugate-directions method will minimize a quadratic function in a finite number of steps. Since a general nonlinear function can be approximated reasonably well by a quadratic function near its minimum, a conjugate-directions method is expected to speed up the convergence even for general nonlinear objective functions.

Powell’s method Definition: Conjugate directions. Let [A] be an n × n symmetric matrix. A set of n vectors (or directions) {S_i} is said to be conjugate (more accurately, [A]-conjugate) if S_i^T [A] S_j = 0 for all i ≠ j. It can be seen that orthogonal directions are a special case of conjugate directions (obtained with [A] = [I]). Definition: Quadratically convergent method. If a minimization method, using exact arithmetic, can find the minimum point in n steps while minimizing a quadratic function in n variables, the method is called a quadratically convergent method.
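The conjugacy condition can be checked numerically. In this Python sketch (names and the example matrix are illustrative, not from the text), `A` is a symmetric 2 × 2 matrix; the pair {(1, 0), (1, −2)} is [A]-conjugate for it, while the plain coordinate directions, although orthogonal, are not:

```python
def is_conjugate(A, dirs, tol=1e-9):
    """Return True if S_i^T A S_j = 0 for every pair of distinct directions."""
    def quad(u, v):
        # u^T A v for plain nested-list matrices and vectors
        return sum(u[r] * A[r][c] * v[c]
                   for r in range(len(u)) for c in range(len(v)))
    return all(abs(quad(dirs[i], dirs[j])) <= tol
               for i in range(len(dirs))
               for j in range(len(dirs)) if i != j)

A = [[4.0, 2.0], [2.0, 2.0]]  # an illustrative symmetric (Hessian-like) matrix

# (1,0) and (1,-2) satisfy S_1^T A S_2 = 4*1 + 2*(-2) = 0: A-conjugate
print(is_conjugate(A, [[1.0, 0.0], [1.0, -2.0]]))
# the coordinate directions are orthogonal but give S_1^T A S_2 = 2 != 0
print(is_conjugate(A, [[1.0, 0.0], [0.0, 1.0]]))
```

This illustrates the remark above: orthogonality coincides with conjugacy only when [A] = [I].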

Powell’s method Theorem 1: Given a quadratic function Q of n variables and two parallel hyperplanes of dimension k < n, let the constrained stationary points of the quadratic function in the two hyperplanes be X_1 and X_2, respectively. Then the line joining X_1 and X_2 is conjugate to any line parallel to the hyperplanes. The meaning of this theorem is illustrated in a two-dimensional space in the figure. If X_1 and X_2 are the minima of Q obtained by searching along the direction S from two different starting points X_a and X_b, respectively, the line (X_1 − X_2) will be conjugate to the search direction S.

Powell’s method Theorem 2: If a quadratic function Q is minimized sequentially, once along each direction of a set of n mutually conjugate directions, the minimum of Q will be found at or before the nth step, irrespective of the starting point.

Example Consider the minimization of a quadratic function f(x_1, x_2). If S_1 denotes a search direction, find a direction S_2 that is conjugate to the direction S_1. Solution: The objective function can be expressed in matrix form, from which the Hessian matrix can be identified.

Example The Hessian matrix [A] can be identified from the matrix form of f. The direction S_2 = (s_1, s_2)^T will be conjugate to S_1 if S_1^T [A] S_2 = 0,

Example which upon expansion gives 2s_2 = 0, i.e., s_1 is arbitrary and s_2 = 0. Since s_1 can have any value, we select s_1 = 1, and the desired conjugate direction can be expressed as S_2 = (1, 0)^T.

Powell’s Method: The Algorithm The basic idea of Powell’s method is illustrated graphically for a two-variable function in the figure. In this figure, the function is first minimized once along each of the coordinate directions, starting with the second coordinate direction, and then in the corresponding pattern direction. This leads to point 5. For the next cycle of minimization, we discard one of the coordinate directions (the x_1 direction in the present case) in favor of the pattern direction.

Powell’s Method: The Algorithm Thus we minimize along u_2 and the pattern direction S_p^(1), obtaining points 6 and 7. Then we generate a new pattern direction as shown in the figure. For the next cycle of minimization, we discard one of the previously used coordinate directions (the x_2 direction in this case) in favor of the newly generated pattern direction.

Powell’s Method: The Algorithm Then, by starting from point 8, we minimize along the directions S_p^(1) and S_p^(2), thereby obtaining points 9 and 10, respectively. For the next cycle of minimization, since there is no coordinate direction left to discard, we restart the whole procedure by minimizing along the x_2 direction. This procedure is continued until the desired minimum point is found.

Powell’s Method: The Algorithm

Note that the search will be made sequentially in the directions S_n; S_1, S_2, S_3, …, S_{n−1}, S_n; S_p^(1); S_2, S_3, …, S_{n−1}, S_n, S_p^(1); S_p^(2); S_3, S_4, …, S_{n−1}, S_n, S_p^(1), S_p^(2); S_p^(3); … until the minimum point is found. Here S_i indicates the coordinate direction u_i and S_p^(j) the jth pattern direction. In the flowchart, the previous base point is stored as the vector Z in block A, and the pattern direction is constructed by subtracting the previous base point from the current one in block B. The pattern direction is then used as a minimization direction in blocks C and D.

Powell’s Method: The Algorithm For the next cycle, the first direction used in the previous cycle is discarded in favor of the current pattern direction. This is achieved by updating the numbers of the search directions as shown in block E. Thus, both points Z and X used in block B for the construction of the pattern direction are points that are minima along S_n in the first cycle, the first pattern direction S_p^(1) in the second cycle, the second pattern direction S_p^(2) in the third cycle, and so on.
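The cycle structure above can be sketched in Python. This is a simplified variant under my own assumptions (hypothetical names, a fixed golden-section line-search bracket, and no safeguard against dependent directions): each cycle minimizes along the current direction set, forms the pattern direction from the cycle's start and end points, discards the first direction in its favor, and then minimizes once along the pattern direction.

```python
import math

def line_min(f, x, s, a=-10.0, b=10.0, tol=1e-7):
    """Golden-section search for the step lambda* minimizing f(x + lambda*s)."""
    gr = (math.sqrt(5) - 1) / 2
    phi = lambda t: f([xi + t * si for xi, si in zip(x, s)])
    c, d = b - gr * (b - a), a + gr * (b - a)
    while abs(b - a) > tol:
        if phi(c) < phi(d):
            b, d = d, c; c = b - gr * (b - a)
        else:
            a, c = c, d; d = a + gr * (b - a)
    lam = (a + b) / 2
    return [xi + lam * si for xi, si in zip(x, s)]

def powell(f, x0, cycles=20, tol=1e-10):
    """Simplified Powell conjugate-directions sketch (not the full flowchart)."""
    n = len(x0)
    dirs = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(cycles):
        x_start = x[:]                      # plays the role of the stored point Z
        for s in dirs:                      # minimize along each current direction
            x = line_min(f, x, s)
        # pattern direction: current point minus the cycle's base point
        pattern = [xi - zi for xi, zi in zip(x, x_start)]
        if sum(p * p for p in pattern) < tol:
            break                           # no progress in a full cycle: stop
        dirs = dirs[1:] + [pattern]         # discard first direction for the pattern
        x = line_min(f, x, pattern)         # minimize once along the pattern direction
    return x
```

On the quadratic f(x) = x_1 − x_2 + 2x_1² + 2x_1x_2 + x_2² (an illustrative test function with minimum at (−1, 1.5)), this variant converges in a few cycles, consistent with the quadratic-convergence argument that follows.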

Quadratic convergence It can be seen from the flowchart that the pattern directions S_p^(1), S_p^(2), S_p^(3), … are nothing but the lines joining the minima found along the directions S_n, S_p^(1), S_p^(2), …, respectively. Hence, by Theorem 1, the pairs of directions (S_n, S_p^(1)), (S_p^(1), S_p^(2)), and so on, are [A]-conjugate. Thus all the directions S_n, S_p^(1), S_p^(2), … are [A]-conjugate. Since, by Theorem 2, any search method involving minimization along a set of conjugate directions is quadratically convergent, Powell’s method is quadratically convergent. From the method used for constructing the conjugate directions S_p^(1), S_p^(2), …, we find that n minimization cycles are required to complete the construction of n conjugate directions. In the ith cycle, the minimization is done along the i already constructed conjugate directions and the n − i nonconjugate (coordinate) directions. Thus, after n cycles, all the n search directions are mutually conjugate, and a quadratic will theoretically be minimized in n² one-dimensional minimizations. This proves the quadratic convergence of Powell’s method.

Quadratic Convergence of Powell’s Method It is to be noted that, as with most numerical techniques, the convergence in many practical problems may not be as good as the theory seems to indicate. Powell’s method may require many more iterations to minimize a function than the theoretically estimated number. There are several reasons for this: 1.Since the cycle count n is valid only for quadratic functions, it will generally take more than n cycles for nonquadratic functions. 2.The proof of quadratic convergence has been established with the assumption that the exact minimum is found in each of the one-dimensional minimizations. However, the actual minimizing step lengths λ_i* will be only approximate, and hence the subsequent directions will not be exactly conjugate. Thus the method requires more iterations to achieve overall convergence.

Quadratic Convergence of Powell’s Method 3. Powell’s method as described above can break down before the minimum point is found, because the search directions S_i might become linearly dependent or almost dependent during numerical computation. Example: Minimize f(x_1, x_2) from the given starting point X_1 using Powell’s method.

Example Cycle 1: Univariate search. We minimize f along S_2 from X_1. To find the correct direction (+S_2 or −S_2) for decreasing the value of f, we take the probe length ε = 0.01. Since f_1 = f(X_1) = 0.0 and f⁺ < f_1, f decreases along the direction +S_2. To find the minimizing step length λ* along S_2, we minimize f(X_1 + λS_2). As df/dλ = 0 at λ* = 1/2, we have X_2 = X_1 + λ*S_2.

Example Next, we minimize f along ±S_1 from X_2; the probe test shows that f decreases along −S_1. As f(X_2 − λS_1) = f(−λ, 0.50), setting df/dλ = 0 gives λ* = 1/2. Hence X_3 = X_2 − λ*S_1 = (−0.50, 0.50).

Example Now we minimize f along ±S_2 from X_3; the probe test shows that f decreases along the +S_2 direction. Minimizing f(X_3 + λS_2) with respect to λ gives X_4 = X_3 + λ*S_2.

Example Cycle 2: Pattern search. Now we generate the first pattern direction as S_p^(1) = X_4 − X_2 and minimize f along S_p^(1) from X_4. The probe test shows that f decreases in the positive direction of S_p^(1); minimizing along it gives the point X_5.

Example The point X_5 can be identified to be the optimum point. If we do not recognize X_5 as the optimum point at this stage, we proceed to minimize f along the direction S_2 from X_5. The probe test then shows that f cannot be decreased along ±S_2, and hence X_5 is the optimum point. In this example, convergence has been achieved in the second cycle itself. This is to be expected here, since f is a quadratic function and the method is quadratically convergent.

Steepest descent (Cauchy method) The use of the negative of the gradient vector as a direction for minimization was first made by Cauchy in 1847. In this method we start from an initial trial point X_1 and iteratively move along the steepest descent directions until the optimum point is found. The steepest descent method can be summarized by the following steps: 1.Start with an arbitrary initial point X_1. Set the iteration number as i=1. 2.Find the search direction S_i as S_i = −∇f(X_i). 3.Determine the optimal step length λ_i* in the direction S_i and set X_{i+1} = X_i + λ_i* S_i.

Steepest descent (Cauchy method) 4.Test the new point, X_{i+1}, for optimality. If X_{i+1} is optimum, stop the process; otherwise go to step 5. 5.Set the new iteration number i = i + 1 and go to step 2. The method of steepest descent may appear to be the best unconstrained minimization technique, since each one-dimensional search starts in the 'best' direction. However, owing to the fact that the steepest descent direction is a local property, the method is not really effective in most problems.
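Steps 1–5 can be sketched in Python. This is an illustrative implementation with my own names, using an Armijo backtracking rule in place of the exact optimal step length λ_i* of step 3 and a small gradient norm as the optimality test of step 4:

```python
def steepest_descent(f, grad, x0, tol=1e-6, max_iter=500):
    """Cauchy's steepest descent (sketch; backtracking replaces the exact
    one-dimensional minimization of the textbook variant)."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        gg = sum(gi * gi for gi in g)
        if gg ** 0.5 < tol:              # step 4: small gradient -> optimum
            return x
        s = [-gi for gi in g]            # step 2: S_i = -grad f(X_i)
        lam, fx = 1.0, f(x)
        # Armijo backtracking: shrink lam until sufficient decrease holds
        while f([xi + lam * si for xi, si in zip(x, s)]) > fx - 1e-4 * lam * gg:
            lam *= 0.5
        # steps 3 and 5: move to the new point and continue iterating
        x = [xi + lam * si for xi, si in zip(x, s)]
    return x
```

On a well-conditioned quadratic such as f(x) = (x_1 − 3)² + 2(x_2 + 1)² this converges quickly; on an ill-conditioned (steep-valley) function the iterates zigzag, illustrating the local-property weakness noted above.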