Optimization formulation

Optimization methods help us find the best of something. This lecture is about how we formulate the problem mathematically. We assume that we have choices and that we can attach a numerical value to the ‘goodness’ of each alternative. This is not always the case: we may have problems where the only thing we can do is compare pairs of alternatives and tell which one is better, but not by how much. Can you think of an example?

Optimization is the process of finding the best alternative among all available alternatives. This lecture is about formulating optimization mathematically, under the assumption that we can attach a numerical value to the ‘goodness’ of each alternative. This is not always the case. For example, in looking for the piece of jewelry we like best, we may be able to compare any two pieces on their appeal without being able to assign a numerical value to that appeal. I often go through optimization of this kind when I am tested by an optometrist or an ophthalmologist for eyeglasses: I am asked repeatedly to compare option A to option B and say which one is better, and the optimum prescription is obtained without any objective function value.

Attaching a numerical value to each alternative allows us to pose the optimization problem as a maximization problem or a minimization problem. For example, going back to the jewelry, we may seek the cheapest piece that we still like well enough. In this case we minimize the price and treat attractiveness as a requirement that defines the acceptable alternatives.
Young's modulus example

The pairs (1,1), (2,2), (4,3) represent strain (millistrains) and stress (ksi) measurements. Our model is $\sigma = E\varepsilon$, where $\varepsilon$ is strain and $\sigma$ is stress. Beware: $\sigma$ here is not the standard deviation! We seek the E that minimizes the differences between the data and the model. We can choose the maximum difference as our measure of goodness; an alternative measure is the root-mean-square (rms) difference. E is our “design variable,” and the maximum or rms differences are possible “objective functions.”

A common problem where we use optimization is fitting a curve to data, looking for the “best fit.” Here we assume that we conducted a stress-strain test and measured the following pairs of strain (in millistrains) and stress (in ksi): (1,1), (2,2), (4,3). We assume that the material is linearly elastic, so that the stress-strain relation is $\sigma = E\varepsilon$, where $\sigma$ is the stress, $\varepsilon$ is the strain, and E is Young's modulus. We seek the Young's modulus that fits the data best. This could be viewed as a curve-fit problem, or as the best estimate of a physical property, Young's modulus.

When we try a value of E, we obtain three errors between the data and the curve, one at each data point. When we say best fit, we want all three errors to be small. However, to compare values of E we need a single measure of the smallness of the three errors. Here we select the maximum of the three errors as that measure. We call this measure our objective function. The variable that we can change to achieve a good objective function, E, is an example of a design variable. Another popular measure combining the three errors is the root-mean-square error. The choice of error measure leads to alternative formulations of the optimization problem.
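To make the two candidate objective functions concrete, here is a minimal Python sketch that evaluates the three errors for a trial value of E and reduces them to the maximum and rms measures. The function names, and the use of Python rather than Matlab, are assumptions made for illustration only.

```python
import math

# Data from the example: strain in millistrains, stress in ksi.
STRAIN = [1.0, 2.0, 4.0]
STRESS = [1.0, 2.0, 3.0]


def errors(E):
    """Differences between measured stress and the model sigma = E * eps."""
    return [s - E * e for e, s in zip(STRAIN, STRESS)]


def max_error(E):
    """Largest absolute difference over the data points."""
    return max(abs(d) for d in errors(E))


def rms_error(E):
    """Root-mean-square difference over the data points."""
    d = errors(E)
    return math.sqrt(sum(x * x for x in d) / len(d))


if __name__ == "__main__":
    # Tabulate both measures for a few trial values of E.
    for E in (0.75, 0.80, 0.85, 0.90):
        print(f"E={E:.2f}  max={max_error(E):.3f}  rms={rms_error(E):.3f}")
```

Each trial E gives one number per measure, which is what allows the alternatives to be ranked.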
Unconstrained formulations

Minimize the maximum difference:
$$d_{\max}(E) = \max_{i=1,2,3} |\sigma_i - E\varepsilon_i|$$
Minimize the rms of the errors:
$$d_{rms}(E) = \sqrt{\frac{1}{3}\sum_{i=1}^{3} (\sigma_i - E\varepsilon_i)^2}$$
The two formulations lead to different optima: E=5/6 for the maximum difference and E=17/21 for the rms. The rms formulation is popular because its objective function is smooth in the design variable.

We first compare two formulations that minimize an objective function without constraints: the maximum difference between the fit and the data, $d_{\max}(E)$, and the root mean square of the differences, $d_{rms}(E)$, as defined above. With only one design variable, E, we can perform the optimization by plotting the two functions versus E. The right figure shows that the minimum of the maximum difference is about 0.33 at E=0.83, while the minimum rms is about 0.28 at E=0.81. The rms is a smooth function of E, while the maximum difference is not, which makes rms formulations more popular.

The Matlab sequence used to generate the data and the right figure is

    strain = [1 2 4]; stress = [1 2 3];    % data: strain (millistrains), stress (ksi)
    e = linspace(0.5,1,101);               % trial values of E
    sigmodel = e'*strain;                  % model stresses, one row per trial E
    sigr = ones(101,1)*stress;             % measured stresses repeated on every row
    err = abs(sigr-sigmodel);              % absolute differences
    maxdiff = max(err,[],2);               % maximum difference for each E
    rmsdiff = sqrt(sum(err.^2,2)/3);       % rms difference for each E
    plot(e,maxdiff); xlabel('E'); ylabel('objective function')
    hold on; plot(e,rmsdiff,'r-'); legend('max error','rms error','Location','North')
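Both optima quoted above can also be checked by hand. The short derivation below uses only the three data points and the definitions of the two objectives; restricting the search to 3/4 ≤ E ≤ 1 in the second part is an assumption justified by the plot range.

```latex
% rms: minimizing d_rms is the same as minimizing the sum of squares S(E)
S(E) = \sum_{i=1}^{3} (\sigma_i - E\varepsilon_i)^2, \qquad
\frac{dS}{dE} = -2\sum_{i=1}^{3} \varepsilon_i(\sigma_i - E\varepsilon_i) = 0
\;\Rightarrow\;
E = \frac{\sum_i \sigma_i\varepsilon_i}{\sum_i \varepsilon_i^2}
  = \frac{1\cdot 1 + 2\cdot 2 + 3\cdot 4}{1 + 4 + 16} = \frac{17}{21} \approx 0.81

% maximum difference: for 3/4 <= E <= 1 the three errors are
|\sigma_1 - E\varepsilon_1| = 1 - E, \qquad
|\sigma_2 - E\varepsilon_2| = 2(1 - E), \qquad
|\sigma_3 - E\varepsilon_3| = 4E - 3

% 2(1-E) decreases with E and 4E-3 increases, so their maximum is smallest where they cross
2(1 - E) = 4E - 3 \;\Rightarrow\; E = \frac{5}{6}, \qquad d_{\max} = \frac{1}{3} \approx 0.33
```

The kink in $d_{\max}$ at E=5/6, where the largest error switches from point 2 to point 3, is the non-smoothness mentioned above.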
Top Hat question

Recall that the data are (1,1), (2,2), and (4,3), and that minimizing the maximum error gave E=5/6. If we estimate the error of this solution in predicting stress from strain by cross validation, what would the cross-validation error be? 0.5? 0.75? 1?
Constrained formulation

To avoid a non-smooth function, we can add a bound design variable b, together with bounds on the errors. The objective function is then equal to one design variable, b, and E appears only in the constraints. The new objective and constraints are not only smooth but also linear. Here we know the signs of the differences, so each two-sided bound can be replaced by a single one-sided bound.

Even if we limit ourselves to minimizing the maximum error, we can have more than one formulation. In particular, we can get rid of the non-smoothness by adding one design variable b and constraints that the three errors are no larger than b:

$$\min_{b,E}\; b \quad \text{such that} \quad d_i = |\sigma_i - E\varepsilon_i| \le b, \quad i=1,2,3$$

Since the absolute value is still not a smooth function, we replace the bound on the absolute value with upper and lower bounds on the differences:

$$\min_{b,E}\; b \quad \text{such that} \quad -b \le \sigma_i - E\varepsilon_i \le b, \quad i=1,2,3$$

This gives us a problem where the objective function and constraints are linear in the two design variables, and such problems are easy to solve.

In our case (see the left figure on the earlier slide on unconstrained formulations), it is clear that to minimize the differences we need a larger E for the first two points than for the third one. So the optimal E, which is a compromise, will underestimate the stress at the first two points and overestimate it at the third one. If we use this knowledge, the optimization problem becomes

$$\min_{b,E}\; b \quad \text{such that} \quad \sigma_1 - E\varepsilon_1 \le b, \quad \sigma_2 - E\varepsilon_2 \le b, \quad -b \le \sigma_3 - E\varepsilon_3$$
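A quick way to see that the bound formulation is equivalent to minimizing the maximum error is to check, for a fixed E, which values of b satisfy the two-sided constraints. The sketch below is illustrative only; the function names and the small tolerance are assumptions.

```python
# Data from the example: strain in millistrains, stress in ksi.
STRAIN = [1.0, 2.0, 4.0]
STRESS = [1.0, 2.0, 3.0]


def is_feasible(E, b, tol=1e-12):
    """True if -b <= sigma_i - E*eps_i <= b holds at every data point."""
    return all(-b - tol <= s - E * e <= b + tol
               for e, s in zip(STRAIN, STRESS))


def smallest_bound(E):
    """Smallest b that is feasible for a given E; this is the maximum error."""
    return max(abs(s - E * e) for e, s in zip(STRAIN, STRESS))


if __name__ == "__main__":
    E = 5.0 / 6.0
    print(smallest_bound(E))      # close to 1/3
    print(is_feasible(E, 0.30))   # False: b is below the largest error
    print(is_feasible(E, 0.34))   # True
```

For every E the smallest feasible b equals the maximum error, so minimizing b over both variables reproduces the min-max fit.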
Graphical optimization

Recall the linear constrained optimization formulation. With only two design variables we can plot it. The optimum is at the intersection of constraint boundaries.

We have seen that with one variable we can find the optimum by just plotting the function. With two design variables, the optimization can also be solved graphically, by plotting the constraint boundaries and the objective function contours. The figure shows the constraint boundaries, with hatching marking the side where the constraint is violated, which is a standard way of marking an inequality. The markings were created with a Matlab routine, crosshatch_poly, available from Matlab Central. Matlab's gtext may be used instead to add slashes by clicking on the figure.

Since the objective function is equal to b, there is no need for objective function contours; we simply look for the lowest point in the region where no constraints are violated (called the feasible domain). There are two active constraints, on the differences at point 2 (red) and point 3 (green). It is easy to check that the difference at point 1 is half of that at point 2 for any value of E (1-E versus 2-2E), so that constraint is not binding. The optimum is therefore found at the intersection, where the differences between the model and the data at point 2 and point 3 have the same magnitude and opposite signs.
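The graphical search can be mimicked numerically. Because the problem is linear with two variables, its optimum (when it exists) lies at an intersection of two constraint boundaries, so one can enumerate all pairwise intersections, discard the infeasible ones, and keep the one with the lowest b. The sketch below does this for the three one-sided constraints; it is an illustrative brute-force check under that assumption, not a general linear-programming solver, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is (a_E, a_b, rhs), meaning a_E*E + a_b*b <= rhs.
CONSTRAINTS = [
    (Fraction(-1), Fraction(-1), Fraction(-1)),  # sigma_1 - E*eps_1 <= b
    (Fraction(-2), Fraction(-1), Fraction(-2)),  # sigma_2 - E*eps_2 <= b
    (Fraction(4), Fraction(-1), Fraction(3)),    # -b <= sigma_3 - E*eps_3
]


def intersection(c1, c2):
    """Point (E, b) where two constraint boundaries cross, or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    # Cramer's rule for the 2x2 system of boundary equations.
    E = (r1 * b2 - r2 * b1) / det
    b = (a1 * r2 - a2 * r1) / det
    return E, b


def is_feasible(point):
    """True if the point violates none of the constraints."""
    E, b = point
    return all(aE * E + ab * b <= rhs for aE, ab, rhs in CONSTRAINTS)


def lowest_feasible_vertex():
    """Feasible boundary intersection with the smallest b."""
    vertices = [intersection(c1, c2) for c1, c2 in combinations(CONSTRAINTS, 2)]
    feasible = [v for v in vertices if v is not None and is_feasible(v)]
    return min(feasible, key=lambda v: v[1])


E_opt, b_opt = lowest_feasible_vertex()
print(E_opt, b_opt)  # prints: 5/6 1/3
```

Only the intersection of the point 2 and point 3 boundaries is feasible, which matches the two active constraints seen in the figure.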
Standard notation

Most textbooks and software use a common standard form for an optimization problem, and the Young's modulus fit can be written in that form as well.

To facilitate the description of algorithms for solving optimization problems, there is a fairly standard notation for writing them. The letter f is typically used for the objective function, g for inequality constraints, and h for equality constraints. Lower and upper bounds on design variables are often called side constraints and are written separately.

Given an optimization problem that is not in standard form, we often have to do the following to convert it. First, if the objective function is to be maximized, we change the sign and minimize the negative of the function. Second, we rewrite the constraints so that the right-hand side is zero. Third, if an inequality is of the form g≥0, we replace it by -g≤0. For example, an inequality of the form x+y≥3 is first replaced by x+y-3≥0 and then multiplied by -1 to get 3-(x+y)≤0.
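As a sketch of what this notation looks like when written out, the block below gives a generic standard form and then the Young's modulus fit in that form. The index ranges and the bound symbols $x_i^L$, $x_i^U$ are assumed notation; the three constraints are the one-sided bounds of the constrained formulation with everything moved to the left-hand side.

```latex
% Generic standard form
\begin{aligned}
\text{minimize}\quad  & f(\mathbf{x}) \\
\text{such that}\quad & g_j(\mathbf{x}) \le 0, \quad j = 1,\dots,n_g \\
                      & h_k(\mathbf{x}) = 0, \quad k = 1,\dots,n_h \\
                      & x_i^{L} \le x_i \le x_i^{U}, \quad i = 1,\dots,n
\end{aligned}

% Young's modulus fit with design variables x = (b, E)
\begin{aligned}
\text{minimize}\quad  & f(b,E) = b \\
\text{such that}\quad & g_1 = \sigma_1 - E\varepsilon_1 - b \le 0 \\
                      & g_2 = \sigma_2 - E\varepsilon_2 - b \le 0 \\
                      & g_3 = E\varepsilon_3 - \sigma_3 - b \le 0
\end{aligned}
```

The third constraint comes from $-b \le \sigma_3 - E\varepsilon_3$ by moving all terms to one side, which is the same sign manipulation as in the x+y≥3 example.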
Column design example

The height is fixed, and the design variables are D and t. The objective function is the cross-sectional area. There are three failure modes.

As another example of writing constraints well, we take a problem from Vanderplaats, Numerical Optimization Techniques for Engineering Design, 3rd edition, Example 1-2. It involves the minimum-weight design of a tubular column under compressive load, where failure may occur in the three modes shown in the figure. The two design variables are the diameter D and the thickness t of the tube. The weight, which is the objective function, is proportional to the cross-sectional area, so in the optimization formulation we use the area as the objective function. The three modes of failure are stress failure, global buckling (also called Euler buckling), and local buckling, where the tube wrinkles into a diamond pattern.
Failure modes

Stress failure:
$$\sigma = \frac{P}{A} \le \bar{\sigma}$$
Global (Euler) buckling failure:
$$P_b = \frac{\pi^2 E I}{h^2} \ge P$$
Local buckling failure:
$$\sigma_s = \frac{2Et}{D\sqrt{3(1-\nu^2)}} \ge \sigma$$
Geometrical relations:
$$A = \pi D t, \qquad I = \frac{A(D^2 + t^2)}{8}$$
The equations indicate that if buckling is critical, both buckling modes will be active.

The conditions for avoiding the three failure modes are given in the slide, where $\bar{\sigma}$ denotes the allowable stress. Each failure mode weights the diameter and the thickness differently. The stress depends only on the area, which is also the objective function, and the area depends only on the product of the thickness and the diameter. This means that with a stress constraint only, any combination of diameter and thickness that is at the stress limit $\bar{\sigma}$ has the same weight. The global buckling load depends on the moment of inertia I, which depends on D more strongly than on t, so it drives the design toward a thin-walled tube of large diameter. Local buckling pushes in the other direction, in that it benefits from large t and small D. This indicates that if buckling is critical (which happens if the column is slender enough, i.e., h is large), both buckling constraints will be active at the optimum.
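The sketch below evaluates the quantities that enter the three failure conditions for a given D and t. It assumes that D is the mean diameter of a thin-walled tube, for which the exact moment of inertia is I = A(D²+t²)/8; the function name and the input values in the usage lines are hypothetical and chosen only to exercise the formulas.

```python
import math


def column_responses(D, t, P, E, nu, h):
    """Responses of a tubular column with mean diameter D and wall thickness t."""
    A = math.pi * D * t                    # cross-sectional area
    I = A * (D**2 + t**2) / 8.0            # area moment of inertia (mean-diameter tube)
    sigma = P / A                          # axial stress under load P
    P_b = math.pi**2 * E * I / h**2        # global (Euler) buckling load
    sigma_s = 2.0 * E * t / (D * math.sqrt(3.0 * (1.0 - nu**2)))  # local buckling stress
    return {"A": A, "I": I, "sigma": sigma, "P_b": P_b, "sigma_s": sigma_s}


if __name__ == "__main__":
    # Hypothetical inputs in consistent units, for illustration only.
    r = column_responses(D=10.0, t=0.5, P=1.0e4, E=1.0e7, nu=0.3, h=250.0)
    for name, value in r.items():
        print(f"{name} = {value:.4g}")
```

Doubling D while halving t leaves A and the stress unchanged but raises P_b and lowers sigma_s, which is the opposing pull of the two buckling modes described above.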
Standard dimensionless formulation

Normalized constraints are better both numerically and for communicating the degree of satisfaction.

The standard formulation requires us to write the failure constraints so that all are of the form g(D,t)≤0. In the slide they are also written as the ratio of the response to its allowable limit, minus one. This provides a handy measure of constraint satisfaction. For example, if the constraint is equal to -0.1, we have a 10% margin between the response and the allowable response. If the constraint is equal to 0.05, we exceed the allowable by 5%. This normalized, non-dimensional formulation of the constraints is also typically better for the numerical performance of optimization algorithms. That is, when we use an optimization routine, it is likely to lead to faster convergence and to increase the chance that we find the true optimum.
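The margin interpretation can be illustrated with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the response and allowable values are made-up numbers chosen to reproduce the -0.1 and 0.05 cases mentioned above.

```python
def normalized_constraint(response, allowable):
    """g = response/allowable - 1; g <= 0 means the constraint is satisfied."""
    return response / allowable - 1.0


def describe(g):
    """Translate a normalized constraint value into a margin statement."""
    if g <= 0:
        return f"satisfied with a {-g:.0%} margin"
    return f"violated: allowable exceeded by {g:.0%}"


if __name__ == "__main__":
    # Hypothetical response/allowable pairs in the same units.
    print(describe(normalized_constraint(90.0, 100.0)))   # 10% margin
    print(describe(normalized_constraint(105.0, 100.0)))  # exceeded by 5%
```

Because every constraint is reported on the same relative scale, values for different failure modes can be compared directly regardless of their units.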
Problems

1. Provide two formulations for minimizing the surface area of a cylinder of a given volume when the diameter and height are the design variables. One formulation should use the volume as an equality constraint, and the other should use it to reduce the number of design variables.

2. You need to go from point A to point B in minimum time while maintaining a safe distance from point C. Formulate an optimization problem to find the path with no more than three design variables when A=(0,0), B=(10,10), C=(4,4), and the minimum safe distance is 7.