Nonlinear programming Unconstrained optimization techniques
Introduction This chapter deals with the various methods of solving the unconstrained minimization problem:
Find X = {x_1, x_2, ..., x_n}^T which minimizes f(X)
It is rare for a practical design problem to be truly unconstrained; still, a study of this class of problems is important for the following reasons:
–The constraints do not have a significant influence in certain design problems.
–Some of the powerful and robust methods of solving constrained minimization problems require the use of unconstrained minimization techniques.
–The unconstrained minimization methods can be used to solve certain complex engineering analysis problems. For example, the displacement response (linear or nonlinear) of any structure under any specified load condition can be found by minimizing its potential energy. Similarly, the eigenvalues and eigenvectors of any discrete system can be found by minimizing the Rayleigh quotient.
Classification of unconstrained minimization methods
Direct search methods
–Random search method
–Grid search method
–Univariate method
–Pattern search methods (Powell’s method, Hooke-Jeeves method)
–Rosenbrock’s method
–Simplex method
Descent methods
–Steepest descent (Cauchy) method
–Fletcher-Reeves method
–Newton’s method
–Marquardt method
–Quasi-Newton methods (Davidon-Fletcher-Powell method, Broyden-Fletcher-Goldfarb-Shanno method)
Direct search methods They require only the objective function values, not the partial derivatives of the function, in finding the minimum and hence are often called nongradient methods. The direct search methods are also known as zeroth-order methods since they use only zeroth-order information about the function. These methods are most suitable for simple problems involving a relatively small number of variables. They are in general less efficient than the descent methods.
Descent methods The descent techniques require, in addition to the function values, the first and in some cases the second derivatives of the objective function. Since more information about the function being minimized is used (through the use of derivatives), descent methods are generally more efficient than direct search techniques. The descent methods are known as gradient methods. Among the gradient methods, those requiring only first derivatives of the function are called first-order methods; those requiring both first and second derivatives of the function are termed second-order methods.
General approach All unconstrained minimization methods are iterative in nature and hence they start from an initial trial solution and proceed toward the minimum point in a sequential manner. Different unconstrained minimization techniques differ from one another only in the method of generating the new point X_{i+1} from X_i and in testing the point X_{i+1} for optimality.
Univariate method In this method, we change only one variable at a time and seek to produce a sequence of improved approximations to the minimum point. Starting from a base point X_i in the ith iteration, we fix the values of n-1 variables and vary the remaining variable. Since only one variable is changed, the problem becomes a one-dimensional minimization problem, and any of the methods discussed in the previous chapter on one-dimensional minimization can be used to produce a new base point X_{i+1}. The search is then continued in a new direction. This new direction is obtained by changing any one of the n-1 variables that were fixed in the previous iteration.
Univariate method In fact, the search procedure is continued by taking each coordinate direction in turn. After all the n directions are searched sequentially, the first cycle is complete and hence we repeat the entire process of sequential minimization. The procedure is continued until no further improvement is possible in the objective function in any of the n directions of a cycle. The univariate method can be summarized as follows:
1. Choose an arbitrary starting point X_1 and set i = 1.
2. Find the search direction S_i as
S_i^T = (1, 0, 0, ..., 0) for i = 1, n+1, 2n+1, ...
S_i^T = (0, 1, 0, ..., 0) for i = 2, n+2, 2n+2, ...
...
S_i^T = (0, 0, 0, ..., 1) for i = n, 2n, 3n, ...
Univariate method 3. Determine whether λ_i should be positive or negative. For the current direction S_i, this means finding whether the function value decreases in the positive or negative direction. For this, we take a small probe length ε and evaluate f_i = f(X_i), f^+ = f(X_i + ε S_i), and f^- = f(X_i - ε S_i). If f^+ < f_i, S_i will be the correct direction for decreasing the value of f, and if f^- < f_i, -S_i will be the correct one. If both f^+ and f^- are greater than f_i, we take X_i as the minimum along the direction S_i.
Univariate method 4. Find the optimal step length λ_i^* such that
f(X_i ± λ_i^* S_i) = min over λ_i of f(X_i ± λ_i S_i)
where the + or - sign has to be used depending upon whether S_i or -S_i is the direction for decreasing the function value.
5. Set X_{i+1} = X_i ± λ_i^* S_i depending on the direction for decreasing the function value, and f_{i+1} = f(X_{i+1}).
6. Set the new value of i = i + 1 and go to step 2. Continue this procedure until no significant change is achieved in the value of the objective function. A code sketch of this procedure is given below.
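The following is a minimal sketch of the univariate (cyclic coordinate) search summarized in steps 1-6 above. The function names, the use of scipy.optimize.minimize_scalar for the one-dimensional minimization in step 4, and the tolerances are illustrative assumptions rather than part of the original notes; the sketch also omits the ± sign bookkeeping by folding it into the chosen direction vector.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def univariate_search(f, x0, eps=0.01, tol=1e-8, max_cycles=100):
    """Cyclic coordinate (univariate) search: vary one variable at a time."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    for _ in range(max_cycles):
        f_start = f(x)
        for j in range(n):                      # one cycle: each coordinate direction in turn
            s = np.zeros(n)
            s[j] = 1.0                          # step 2: coordinate direction S_i
            fi = f(x)
            f_plus = f(x + eps * s)             # step 3: probe with length eps
            f_minus = f(x - eps * s)
            if f_plus < fi:
                d = s
            elif f_minus < fi:
                d = -s
            else:
                continue                        # x is already a minimum along this direction
            lam = minimize_scalar(lambda a: f(x + a * d)).x   # step 4: line minimization
            x = x + lam * d                     # step 5: new base point
        if abs(f_start - f(x)) < tol:           # step 6: stop when a cycle gives no improvement
            break
    return x

# Example usage on the quadratic assumed for the worked example later in these notes:
f = lambda x: x[0] - x[1] + 2*x[0]**2 + 2*x[0]*x[1] + x[1]**2
print(univariate_search(f, [0.0, 0.0]))         # should approach the minimum near (-1.0, 1.5)
```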
Univariate method The univariate method is very simple and can be implemented easily. However, it does not converge rapidly to the optimum solution, as it has a tendency to oscillate with steadily decreasing progress toward the optimum. Hence it is often better to stop the computations at some point near the optimum rather than trying to find the precise optimum point. In theory, the univariate method can be applied to find the minimum of any function that possesses continuous derivatives. However, if the function has a steep valley, the method may not even converge.
Univariate method For example, consider the contours of a function of two variables with a valley as shown in the figure. If the univariate search starts at point P, the function value cannot be decreased either in the direction ±S_1 or in the direction ±S_2. Thus the search comes to a halt, and one may be misled into taking the point P, which is certainly not the optimum point, as the optimum point. This situation arises whenever the value of the probe length ε needed for detecting the proper direction (±S_1 or ±S_2) falls below the precision (the number of significant figures) used in the computations.
Example Minimize f(x_1, x_2) = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2 with the starting point (0, 0).
Solution: We will take the probe length ε as 0.01 to find the correct direction for decreasing the function value in step 3. Further, we will use the differential calculus method to find the optimum step length λ_i^* along the direction ±S_i in step 4.
Example Iteration i = 1
Step 2: Choose the search direction S_1 as S_1 = (1, 0)^T.
Step 3: To find whether the value of f decreases along S_1 or -S_1, we use the probe length ε. Since f_1 = f(X_1) = f(0, 0) = 0, f^+ = f(X_1 + ε S_1) = f(0.01, 0) = 0.0102 > f_1, and f^- = f(X_1 - ε S_1) = f(-0.01, 0) = -0.0098 < f_1, -S_1 is the correct direction for minimizing f from X_1.
Example Step 4: To find the optimum step length λ_1^*, we minimize
f(X_1 - λ S_1) = f(-λ, 0) = -λ + 2λ^2
As df/dλ = -1 + 4λ = 0 at λ_1^* = 1/4, we have
Step 5: Set X_2 = X_1 - λ_1^* S_1 = (-0.25, 0)^T, f_2 = f(X_2) = -0.125.
Example Iteration i = 2
Step 2: Choose the search direction S_2 as S_2 = (0, 1)^T.
Step 3: Since f_2 = f(X_2) = -0.125, f^+ = f(X_2 + ε S_2) = f(-0.25, 0.01) = -0.1399 < f_2, and f^- = f(X_2 - ε S_2) = f(-0.25, -0.01) = -0.1099 > f_2, +S_2 is the correct direction for decreasing the value of f from X_2.
Example Step 4: We minimize f(X_2 + λ S_2) to find λ_2^*. Here
f(X_2 + λ S_2) = f(-0.25, λ) = λ^2 - 1.5λ - 0.125
df/dλ = 2λ - 1.5 = 0 at λ_2^* = 0.75
Step 5: Set X_3 = X_2 + λ_2^* S_2 = (-0.25, 0.75)^T, f_3 = f(X_3) = -0.6875. The iterations continue in this manner until the change in f over a full cycle becomes negligible.
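A quick numerical check of the two iterations above, under the assumption that the objective is the quadratic f(x_1, x_2) = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2 reconstructed for this example; the printed values are the ones quoted in the steps.

```python
# Assumed objective for this worked example
f = lambda x1, x2: x1 - x2 + 2*x1**2 + 2*x1*x2 + x2**2

eps = 0.01
# Iteration 1: probe along S1 = (1, 0) from X1 = (0, 0)
print(f(0.0, 0.0), f(eps, 0.0), f(-eps, 0.0))   # 0.0, 0.0102, -0.0098 -> -S1 is the descent direction
# Exact line minimum of f(-lam, 0) = -lam + 2*lam**2 at lam = 1/4
print(f(-0.25, 0.0))                            # -0.125 (this is f_2 at X2 = (-0.25, 0))

# Iteration 2: probe along S2 = (0, 1) from X2 = (-0.25, 0)
print(f(-0.25, eps), f(-0.25, -eps))            # -0.1399, -0.1099 -> +S2 is the descent direction
# Exact line minimum of f(-0.25, lam) = lam**2 - 1.5*lam - 0.125 at lam = 0.75
print(f(-0.25, 0.75))                           # -0.6875 (this is f_3 at X3 = (-0.25, 0.75))
```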
Powell’s method Powell’s method is an extension of the basic pattern search method. It is the most widely used direct search method and can be proved to be a method of conjugate directions. A conjugate directions method will minimize a quadratic function in a finite number of steps. Since a general nonlinear function can be approximated reasonably well by a quadratic function near its minimum, a conjugate directions method is expected to speed up the convergence of even general nonlinear objective functions.
Powell’s method Definition: Conjugate Directions Let A = [A] be an n x n symmetric matrix. A set of n vectors (or directions) {S_i} is said to be conjugate (more accurately A-conjugate) if
S_i^T [A] S_j = 0 for all i ≠ j, i = 1, 2, ..., n, j = 1, 2, ..., n
It can be seen that orthogonal directions are a special case of conjugate directions (obtained with [A] = [I]).
Definition: Quadratically Convergent Method If a minimization method, using exact arithmetic, can find the minimum point in n steps while minimizing a quadratic function in n variables, the method is called a quadratically convergent method.
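As a small illustration of the definition, the snippet below checks A-conjugacy numerically for an assumed 2 x 2 symmetric matrix and a pair of directions; the matrix and vectors here are chosen only for demonstration and are not taken from the notes.

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 2.0]])      # assumed symmetric positive definite matrix

s1 = np.array([1.0, 0.0])
s2 = np.array([1.0, -2.0])      # chosen so that s1^T A s2 = 4 - 4 = 0

print(s1 @ A @ s2)              # 0.0 -> s1 and s2 are A-conjugate
print(s1 @ s2)                  # 1.0 -> conjugate directions need not be orthogonal
```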
Powell’s method Theorem 1: Given a quadratic function of n variables and two parallel hyperplanes P_1 and P_2 of dimension k < n. Let the constrained stationary points of the quadratic function in the hyperplanes be X_1 and X_2, respectively. Then the line joining X_1 and X_2 is conjugate to any line parallel to the hyperplanes. The meaning of this theorem is illustrated in a two-dimensional space in the figure. If X_1 and X_2 are the minima of Q obtained by searching along the direction S from two different starting points X_a and X_b, respectively, the line (X_1 - X_2) will be conjugate to the search direction S.
Powell’s method Theorem 2: If a quadratic function is minimized sequentially, once along each direction of a set of n mutually conjugate directions, the minimum of the function Q will be found at or before the nth step irrespective of the starting point.
Example Consider the minimization of the function
f(x_1, x_2) = 6x_1^2 + 2x_2^2 - 6x_1x_2 - x_1 - 2x_2
If S_1 = (1, 2)^T denotes a search direction, find a direction S_2 which is conjugate to the direction S_1.
Solution: The objective function can be expressed in matrix form as
f(X) = B^T X + (1/2) X^T [A] X, with B = (-1, -2)^T and X = (x_1, x_2)^T.
Example The Hessian matrix [A] can be identified as
[A] = [ 12  -6
        -6   4 ]
The direction S_2 = (s_1, s_2)^T will be conjugate to S_1 = (1, 2)^T if
S_1^T [A] S_2 = (1  2) [A] (s_1, s_2)^T = 0
Example which upon expansion gives 2s_2 = 0, or s_1 = arbitrary and s_2 = 0. Since s_1 can have any value, we select s_1 = 1 and the desired conjugate direction can be expressed as S_2 = (1, 0)^T.
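A quick numerical confirmation of this result, assuming the Hessian reconstructed above:

```python
import numpy as np

A = np.array([[12.0, -6.0],
              [-6.0,  4.0]])    # Hessian of the assumed objective

s1 = np.array([1.0, 2.0])
s2 = np.array([1.0, 0.0])       # the conjugate direction found above

print(s1 @ A @ s2)              # 0.0 -> S2 is A-conjugate to S1
```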
Powell’s Method: The Algorithm The basic idea of Powell’s method is illustrated graphically for a two-variable function in the figure. In this figure, the function is first minimized once along each of the coordinate directions, starting with the second coordinate direction, and then in the corresponding pattern direction. This leads to point 5. For the next cycle of minimization, we discard one of the coordinate directions (the x_1 direction in the present case) in favor of the pattern direction.
Powell’s Method: The Algorithm Thus we minimize along u_2 and the pattern direction S_1 and obtain points 6 and 7, respectively. Then we generate a new pattern direction S_2 as shown in the figure. For the next cycle of minimization, we discard one of the previously used coordinate directions (the x_2 direction in this case) in favor of the newly generated pattern direction.
Powell’s Method: The Algorithm Then, by starting from point 8, we minimize along the directions S_1 and S_2, thereby obtaining points 9 and 10, respectively. For the next cycle of minimization, since there is no coordinate direction left to discard, we restart the whole procedure by minimizing along the x_2 direction. This procedure is continued until the desired minimum point is found.
Powell’s Method: The Algorithm [Flowchart of the iterative procedure of Powell’s method; blocks A-E of this flowchart are referenced in the discussion below.]
Note that the search will be made sequentially in the directions S_n; S_1, S_2, S_3, ..., S_{n-1}, S_n; S_p^(1); S_2, S_3, ..., S_{n-1}, S_n, S_p^(1); S_p^(2); S_3, S_4, ..., S_{n-1}, S_n, S_p^(1), S_p^(2); S_p^(3); ... until the minimum point is found. Here S_i indicates the coordinate direction u_i and S_p^(j) the jth pattern direction. In the flowchart, the previous base point is stored as the vector Z in block A, and the pattern direction is constructed by subtracting the previous base point from the current one in block B. The pattern direction is then used as a minimization direction in blocks C and D.
Powell’s Method: The Algorithm For the next cycle, the first direction used in the previous cycle is discarded in favor of the current pattern direction. This is achieved by updating the numbering of the search directions as shown in block E. Thus, both points Z and X used in block B for the construction of the pattern direction are points that are minima along S_n in the first cycle, along the first pattern direction S_p^(1) in the second cycle, along the second pattern direction S_p^(2) in the third cycle, and so on.
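The following is a minimal sketch of the cycle structure just described: minimize along the current set of directions, build the pattern direction from the old and new base points, minimize along it, and replace the first direction of the set with it. The function names, the use of scipy.optimize.minimize_scalar for the line searches, and the stopping tolerance are illustrative assumptions; the sketch also omits the extra initial minimization along S_n that the flowchart performs before the first cycle.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def line_min(f, x, d):
    """One-dimensional minimization of f along direction d from point x."""
    lam = minimize_scalar(lambda a: f(x + a * d)).x
    return x + lam * d

def powell(f, x0, tol=1e-10, max_cycles=50):
    x = np.asarray(x0, dtype=float)
    n = len(x)
    dirs = [np.eye(n)[j] for j in range(n)]      # start with the coordinate directions u_1 ... u_n
    for _ in range(max_cycles):
        z = x.copy()                             # previous base point (vector Z, block A)
        for d in dirs:                           # minimize once along each current direction
            x = line_min(f, x, d)
        sp = x - z                               # pattern direction (block B)
        if np.linalg.norm(sp) < tol:
            break
        x = line_min(f, x, sp)                   # minimize along the pattern direction (blocks C, D)
        dirs = dirs[1:] + [sp]                   # discard the first direction in favor of S_p (block E)
    return x

# Example usage on the quadratic assumed for the worked example below:
f = lambda x: x[0] - x[1] + 2*x[0]**2 + 2*x[0]*x[1] + x[1]**2
print(powell(f, [0.0, 0.0]))                     # expected to reach approximately (-1.0, 1.5)
```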
Quadratic convergence It can be seen from the flowchart that the pattern directions S_p^(1), S_p^(2), S_p^(3), ... are nothing but the lines joining the minima found along the directions S_n, S_p^(1), S_p^(2), ..., respectively. Hence, by Theorem 1, the pairs of directions (S_n, S_p^(1)), (S_p^(1), S_p^(2)), and so on, are A-conjugate. Thus all the directions S_n, S_p^(1), S_p^(2), ... are A-conjugate. Since, by Theorem 2, any search method involving minimization along a set of conjugate directions is quadratically convergent, Powell’s method is quadratically convergent. From the method used for constructing the conjugate directions S_p^(1), S_p^(2), ..., we find that n minimization cycles are required to complete the construction of n conjugate directions. In the ith cycle, the minimization is done along the i already constructed conjugate directions and the (n - i) nonconjugate (coordinate) directions. Thus, after n cycles, all the n search directions are mutually conjugate, and a quadratic will theoretically be minimized in n^2 one-dimensional minimizations. This establishes the quadratic convergence of Powell’s method.
Quadratic Convergence of Powell’s Method It is to be noted that, as with most numerical techniques, the convergence in many practical problems may not be as good as the theory seems to indicate. Powell’s method may require many more iterations to minimize a function than the theoretically estimated number. There are several reasons for this:
1. The bound of n cycles is valid only for quadratic functions; for nonquadratic functions the method will generally take more than n cycles.
2. The proof of quadratic convergence has been established with the assumption that the exact minimum is found in each of the one-dimensional minimizations. However, the actual minimizing step lengths λ_i^* will be only approximate, and hence the subsequent directions will not be exactly conjugate. Thus the method requires more iterations to achieve overall convergence.
Quadratic Convergence of Powell’s Method 3. Powell’s method, as described above, can break down before the minimum point is found. This is because the search directions S_i might become linearly dependent or almost dependent during numerical computation.
Example: Minimize f(x_1, x_2) = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2 from the starting point X_1 = (0, 0)^T using Powell’s method.
Example Cycle 1: Univariate search
We minimize f along S_2 = (0, 1)^T from X_1. To find the correct direction (+S_2 or -S_2) for decreasing the value of f, we take the probe length ε = 0.01. As f_1 = f(X_1) = 0.0, f^+ = f(X_1 + ε S_2) = f(0, 0.01) = -0.0099 < f_1, and f^- = f(X_1 - ε S_2) = f(0, -0.01) = 0.0101 > f_1, f decreases along the direction +S_2. To find the minimizing step length λ^* along S_2, we minimize
f(X_1 + λ S_2) = f(0, λ) = λ^2 - λ
As df/dλ = 0 at λ^* = 1/2, we have X_2 = X_1 + λ^* S_2 = (0, 0.5)^T.
Example Next, we minimize f along S_1 = (1, 0)^T from X_2 = (0, 0.5)^T. Since f_2 = f(X_2) = -0.25, f^+ = f(0.01, 0.5) = -0.2298 > f_2, and f^- = f(-0.01, 0.5) = -0.2698 < f_2, f decreases along -S_1. As f(X_2 - λ S_1) = f(-λ, 0.50) = 2λ^2 - 2λ - 0.25, df/dλ = 0 at λ^* = 1/2. Hence X_3 = X_2 - λ^* S_1 = (-0.5, 0.5)^T.
Example Now we minimize f along S_2 = (0, 1)^T from X_3 = (-0.5, 0.5)^T. Since f_3 = f(X_3) = -0.75, f^+ = f(-0.5, 0.51) = -0.7599 < f_3, and f^- = f(-0.5, 0.49) = -0.7399 > f_3, f decreases along the +S_2 direction. Since
f(X_3 + λ S_2) = f(-0.5, 0.5 + λ) = λ^2 - λ - 0.75, with df/dλ = 0 at λ^* = 1/2.
This gives X_4 = X_3 + λ^* S_2 = (-0.5, 1.0)^T.
Example Cycle 2: Pattern Search
Now we generate the first pattern direction as
S_p^(1) = X_4 - X_2 = (-0.5, 1.0)^T - (0, 0.5)^T = (-0.5, 0.5)^T
and minimize f along S_p^(1) from X_4. Since f_4 = f(X_4) = -1.0 and f^+ = f(X_4 + ε S_p^(1)) = f(-0.505, 1.005) = -1.004975 < f_4, f decreases in the positive direction of S_p^(1). As
f(X_4 + λ S_p^(1)) = f(-0.5 - 0.5λ, 1.0 + 0.5λ) = 0.25λ^2 - 0.5λ - 1.0,
df/dλ = 0 at λ^* = 1.0, and hence X_5 = X_4 + λ^* S_p^(1) = (-1.0, 1.5)^T.
Example The point X_5 = (-1.0, 1.5)^T can be identified as the optimum point, with f_5 = f(X_5) = -1.25. If we do not recognize X_5 as the optimum point at this stage, we proceed to minimize f along the direction S_2 = (0, 1)^T from X_5. Then
f(X_5 + λ S_2) = f(-1.0, 1.5 + λ) = λ^2 - 1.25, with df/dλ = 0 at λ^* = 0.
This shows that f cannot be minimized along S_2, and hence X_5 will be the optimum point. In this example, convergence has been achieved in the second cycle itself. This is to be expected here, as f is a quadratic function and the method is quadratically convergent.
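A short numerical check of this result, again assuming the quadratic objective reconstructed for this example; the gradient expression is obtained by differentiating that assumed objective.

```python
import numpy as np

f = lambda x: x[0] - x[1] + 2*x[0]**2 + 2*x[0]*x[1] + x[1]**2
grad = lambda x: np.array([1 + 4*x[0] + 2*x[1], -1 + 2*x[0] + 2*x[1]])

x5 = np.array([-1.0, 1.5])
print(f(x5))       # -1.25, the minimum value found in the example
print(grad(x5))    # [0. 0.] -> the gradient vanishes, confirming X5 is the optimum
```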
Steepest descent (Cauchy method) The use of the negative of the gradient vector as a direction for minimization was first made by Cauchy in 1847. In this method, we start from an initial trial point X_1 and iteratively move along the steepest descent directions until the optimum point is found. The steepest descent method can be summarized by the following steps:
1. Start with an arbitrary initial point X_1. Set the iteration number as i = 1.
2. Find the search direction S_i as
S_i = -∇f_i = -∇f(X_i)
3. Determine the optimal step length λ_i^* in the direction S_i and set
X_{i+1} = X_i + λ_i^* S_i = X_i - λ_i^* ∇f_i
Steepest descent (Cauchy method) 4. Test the new point, X_{i+1}, for optimality. If X_{i+1} is optimum, stop the process. Otherwise go to step 5.
5. Set the new iteration number i = i + 1 and go to step 2.
The method of steepest descent may appear to be the best unconstrained minimization technique since each one-dimensional search starts in the "best" direction. However, owing to the fact that the steepest descent direction is a local property, the method is not really effective in most problems.
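A minimal sketch of these steps, using scipy.optimize.minimize_scalar for the step-length determination in step 3 and a simple gradient-norm test for the optimality check in step 4; the function names and tolerance are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=1000):
    """Cauchy's steepest descent: move along -grad f(X_i) with an exact line search each iteration."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = -grad(x)                                       # step 2: S_i = -grad f(X_i)
        if np.linalg.norm(s) < tol:                        # step 4: optimality test on the gradient norm
            break
        lam = minimize_scalar(lambda a: f(x + a * s)).x    # step 3: optimal step length
        x = x + lam * s
    return x

# Example usage on the quadratic of the earlier examples (assumed form):
f = lambda x: x[0] - x[1] + 2*x[0]**2 + 2*x[0]*x[1] + x[1]**2
grad = lambda x: np.array([1 + 4*x[0] + 2*x[1], -1 + 2*x[0] + 2*x[1]])
print(steepest_descent(f, grad, [0.0, 0.0]))               # converges (slowly, in a zigzag) toward (-1.0, 1.5)
```

The zigzag behavior seen when running this sketch on a function with elongated contours illustrates why the steepest descent direction, being only a local property, does not make the method effective in most problems.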