1
Dynamic Programming
Study Guide for ES205, Yu-Chi Ho and Jonathan T. Lee, Jan. 11, 2001 (Version 3.1)
2
Outline Sample Problem General Formulation Linear-Quadratic Problem
General Problems
3
Path-Cost Problem: Find the path with minimal cost.
[Figure: a four-stage directed graph; each edge is labeled with its cost, and the terminal nodes with their terminal costs.]
Here we have a 4-stage path-cost problem. The cost of each edge and the terminal costs are as stated. The goal is to find the path that incurs the minimal cost. This is a multistage optimization problem, or dynamic optimization problem, in that earlier decisions affect later costs. We will use this problem to demonstrate how the dynamic programming (DP) algorithm works.
4
Principle of Optimality
“An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.”

Another way to interpret the principle of optimality is that any part of the optimal path is itself optimal. This principle is the basis for dynamic programming, and it suggests that we can solve the optimization problem in stages. More precisely, to find the optimal path from stage 1 to stage 4, we also need to find the optimal path from stage 2 to stage 4, assuming that we know the optimal initial state and decision. We can apply the principle iteratively: assuming that we know the optimal path to stage 3, the problem becomes finding the optimal path from stage 3 to stage 4.

What does all this mean? It means that we can identify the optimal set of decisions backward, one stage at a time. This decomposes the large optimization problem into a series of smaller optimization problems, in each of which we only need to find one component of the optimal path. It can be thought of as divide-and-conquer. The following slides demonstrate how dynamic programming and the principle of optimality apply to the path-cost problem.
5
Path-Cost Problem (cont.)
At any intermediate stage of the dynamic programming, we assume that we have the optimum path from the next stage onward. The goal is to identify the optimum choice of path for every point at the current stage (however we arrived at that point and that stage) and to calculate the cost-to-go, which is the optimal cost from that point onward.

Since there are only 4 stages in this problem, we move back to the 3rd stage and identify the optimal choice of path from each of the points (states) at the 3rd stage. The arrows indicate the optimal choice of path from a particular point in the 3rd stage to the terminal stage, and the number in the box is the cost-to-go. For example, from the top point in the third stage there is only one choice of path to the terminal state; it is therefore optimum by definition, and its cost-to-go is 7. For the second point from the top on the third stage, both choices of path yield the same cost, so both paths are optimum and the cost-to-go is 6.

The cost-to-go summarizes the “goodness” of each of the paths from this stage onward. Note that the idea of cost-to-go only works when the total cost is additive over the costs of the individual stages; in other words, the total cost can be decomposed into independent parts. Some optimization problems do not fulfill this criterion.
6
Path-Cost Problem (cont.)
[Figure: the graph annotated with the stage-2 cost-to-go values.]
Now, we go back to stage 2 and repeat the process.
7
Path-Cost Problem (cont.)
[Figure: the graph annotated with the stage-1 cost-to-go values; the optimal path is drawn with thick lines.]
Once the cost-to-go for each of the starting points is computed, the optimal choice of path from each of those points at stage 1 is determined. The optimal path for the problem then starts at the point with the lowest cost-to-go value and is indicated by the thick lines. In this case, either of the two paths between stages 3 and 4 results in the same overall cost, so both are optimal with a cost of 9.

A word on the computational savings: there are a total of 24 paths, but with DP only 21 calculations are needed to locate the optimal path. The savings become more obvious and dramatic as the number of states or stages grows. On the other hand, if the problem is too large, no amount of savings helps, i.e., 50% of infinity is still infinity. One way this occurs is when the “state” denoting the position at each stage is multi-dimensional; then the optimal cost-to-go function and the decision at each state become multi-dimensional functions. The computational burden associated with multi-dimensional functions is known as the “curse of dimensionality” and is impossibly difficult to deal with in practice.

Question: Why? Suppose the optimal cost-to-go function and the decision at each state are 10-dimensional, and each dimension can take 10 values. Then the decision at each state is chosen from a set of size 10^10, and the maximum possible number of states at each stage is also 10^10. The burden of computing the cost-to-go at each stage is therefore on the order of 10^10 × 10^10 times higher than in the one-dimensional case. As the dimension increases, the computational burden grows exponentially, which makes such problems impossibly difficult to deal with in practice.
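The backward pass just described can be sketched in code. The function and the small instance below are illustrative (the slide's actual edge costs are not reproduced in this transcript), but the recursion is the one used above: the cost-to-go of a point is the best edge cost plus the cost-to-go of the chosen successor.

```python
def backward_dp(stages, edge_cost, terminal_cost):
    """stages: list of lists of node names, one list per stage (first to last).
    edge_cost[(x, y)]: cost of the edge from x to a next-stage node y.
    terminal_cost[x]: terminal cost at the last-stage nodes.
    Returns (J, best): cost-to-go and the optimal successor of each node."""
    J = dict(terminal_cost)                    # cost-to-go, seeded at the last stage
    best = {}
    for i in range(len(stages) - 2, -1, -1):   # backward, one stage at a time
        for x in stages[i]:
            # principle of optimality: best (edge cost + successor cost-to-go)
            J[x], best[x] = min((edge_cost[(x, y)] + J[y], y)
                                for y in stages[i + 1])
    return J, best
```

On a hypothetical 4-stage instance, `J` of the start node is the minimal total path cost, and following `best` from the start node traces an optimal path.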
8
Formulation for Cost-Path Pb.
Here is the formulation that will help us generalize to other multistage optimization problems. As demonstrated earlier, we solve the problem backward. The first step is to compute the cost-to-go at stage N-1, the second-to-last stage, for each of the possible points, and to identify the optimum path from each of those points.
9
Formulation (cont.) More generally,
The cost-to-go can be rewritten more precisely as above. The cost-to-go is then computed for each of the N-1 non-terminal stages, working backward, and the optimum choice of path is identified for each point. The optimal path for the problem is the collection of optimal choices of path from the starting point leading to the terminal stage. The performance J here is not the performance index of the optimum choice but the performance index of that state at that stage; i.e., J(x(N)) is not the performance index of the whole optimum path but of the terminal state. Thus J(x(N)) may be less than J(x(N-1)). Understanding the process of the previous example helps in understanding the illustration here. J(.) is also called the cost-to-go.
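The recursion referred to above is not reproduced in this transcript; for the path-cost problem it takes the following standard form, where c(x(i), x(i+1)) (an assumed symbol) denotes the edge cost between successive points and J(x(N)) is the terminal cost:

```latex
J\big(x(N)\big) = \text{terminal cost at } x(N), \qquad
J\big(x(i)\big) = \min_{x(i+1)} \Big[ c\big(x(i), x(i+1)\big) + J\big(x(i+1)\big) \Big],
\quad i = N-1, \dots, 1.
```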
10
General Formulation Multistage optimization problem:
The cost-to-go with initial condition. A general multistage optimization problem, or optimal control problem, can be stated as above, where φ(•) is the terminal cost and L(•) is the cost associated with a particular state x, control decision u, and time i. Since L is a function of time, the cost structure can vary over time; in other words, we can deal with non-stationary problems. The above depicts the cost-to-go for the general multistage optimization problem together with the initial condition of the cost-to-go. To deal with a maximization problem, one maximizes, instead of minimizes, over the possible actions in the cost-to-go. For more details, please refer to Sec. 4.2 of Bryson and Ho.
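When the state and control sets are finite, the general formulation admits a direct backward recursion. The sketch below assumes discrete sets; the names `solve_dp`, `f`, `L`, `phi` are illustrative, not from the slides:

```python
# Generic finite-horizon dynamic programming for the general formulation:
#   minimize  phi(x(N)) + sum_i L(x(i), u(i), i)
#   subject to x(i+1) = f(x(i), u(i), i),
# assuming finite state and control sets.

def solve_dp(states, controls, f, L, phi, N):
    """Returns cost-to-go tables J[i][x] and a policy table policy[i][x]."""
    J = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    for x in states:
        J[N][x] = phi(x)                      # terminal (initial) condition
    for i in range(N - 1, -1, -1):            # backward over stages
        for x in states:
            J[i][x], policy[i][x] = min(
                (L(x, u, i) + J[i + 1][f(x, u, i)], u) for u in controls)
    return J, policy
```

Note that L takes the time index i explicitly, so a non-stationary cost structure is handled with no extra machinery.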
11
Multistage or Optimal Control Problem
This can be approached as a static optimization problem with specialized equality (staircase) constraints; see the study guides titled “Dynamic Systems.” These two equivalent approaches will be made clear below in the solution of a specific class of problems. The two equivalent approaches are: 1) the dynamic programming introduced previously, and 2) the static optimization problem with specialized equality constraints.
12
Linear-Quadratic Problem
subject to linear system dynamics, given the initial state x(0), where
x(i) is the state variable at time i
u(i) is the control variable at time i
a(i) and b(i) are the cost factors at time i
For more details on the setup of a Linear-Quadratic (LQ) problem, please refer to the handout titled “Linear-Quadratic-Gaussian Problem.”
13
LQ Problem (cont.) Mirroring the formulation shown on slide 9, we obtain the above.
14
LQ Problem (cont.) Substitute the dynamics into the cost-to-go, set the derivative with respect to u(N-1) to zero, and we get the optimal control.

Substituting for x(N) puts everything in terms of time N-1. Since the cost is quadratic, there exists a unique optimal choice of u(N-1), which occurs where the derivative is zero. Obviously, the denominator should not be zero.
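The unique-minimizer argument can be checked numerically. In this sketch, the scalar symbols f, g, b, S and the quadratic form are assumptions standing in for the slide's equations, which are not reproduced in this transcript: minimizing b·u² + S·(f·x + g·u)² over u by setting the derivative to zero gives a closed-form control.

```python
# One step of the LQ derivation, numerically: minimize over u the quadratic
#   b*u**2 + S*(f*x + g*u)**2
# (stage control cost plus the quadratic cost-to-go of the next state).
# Setting the derivative 2*b*u + 2*S*g*(f*x + g*u) to zero gives the
# unique minimizer below; the denominator b + S*g**2 must be nonzero.

def optimal_u(x, f, g, b, S):
    return -S * f * g * x / (b + S * g * g)

def stage_cost(u, x, f, g, b, S):
    x_next = f * x + g * u
    return b * u * u + S * x_next * x_next
```

Because the cost is a strictly convex quadratic in u (its curvature is b + S·g² > 0 for positive weights), this stationary point is the unique minimum.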
15
LQ Problem (cont.) With some work, we have the optimal u(N-1); defining the gain term as on the slide, we then have the control in closed form.

With some algebraic manipulation, one arrives at this expression for u(N-1).
16
LQ Problem (cont.)

Notice that the optimal control at time N-1 is a function of S(N). This suggests that there might be a recursive way to write the solution of the LQ problem, i.e., u(i) as a function of S(i+1).
17
LQ Problem (cont.) Substituting the optimum u(N-1), we then have the cost-to-go at stage N-1; define S(N-1) accordingly.

To find the recursive form of the solution, we first need to guess the form of S(·).
18
LQ Problem (cont.) This is exactly the same as what is on slide 14, with a different time index. Thus, we can write the optimal control for time N-2 as a function of S(N-1).
19
LQ Problem (cont.) By induction, we have the optimal solution, where the gain satisfies a backward recursion with the stated boundary condition.

Having done the dynamic programming for the last two stages of the LQ problem, we observe a pattern, which is summarized in this slide.

Some comments regarding the computational burden: in general, we cannot obtain a closed-form solution for the optimal control. In such cases, we need to search through every single point in the space. Assume for the moment an N-dimensional problem with M possible points in each dimension: there are M^N points to search through for one stage of the dynamic programming. For a reasonable problem, we might have 100 points in each of 10 dimensions, i.e., 10 different parameters, which means a total of 100^10 = 10^20 possible points to search in one stage of DP. The explosion in size grows exponentially with the dimension of the problem; this is commonly known as the curse of dimensionality.
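For the scalar case, the induction above can be sketched as a backward sweep. The dynamics x(i+1) = f(i)x(i) + g(i)u(i), the symbols f, g, and the terminal weight S(N) are assumptions about notation not fully reproduced in this transcript; a(i) and b(i) are the cost factors defined earlier.

```python
# Backward sweep for a scalar LQ problem:
#   minimize  S_N*x(N)**2 + sum_i [ a[i]*x(i)**2 + b[i]*u(i)**2 ]
#   subject to x(i+1) = f[i]*x(i) + g[i]*u(i).
# Guessing J(x(i)) = S(i)*x(i)**2 and minimizing at each stage yields a
# backward recursion for S(i) and a linear feedback law u(i) = -K(i)*x(i).

def lq_backward(a, b, f, g, S_N):
    N = len(a)
    S = [0.0] * (N + 1)
    K = [0.0] * N
    S[N] = S_N                                   # boundary condition
    for i in range(N - 1, -1, -1):
        K[i] = S[i + 1] * f[i] * g[i] / (b[i] + S[i + 1] * g[i] ** 2)
        S[i] = a[i] + S[i + 1] * f[i] ** 2 - K[i] * S[i + 1] * f[i] * g[i]
    return S, K
```

With the feedback law u(i) = -K(i)·x(i), the optimal cost from any initial state x(0) is S(0)·x(0)², so the whole horizon is solved by one backward pass rather than a search over control sequences.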
20
General Problems Stochastic problems Combinatorial problems
Variable termination time Constraints in the problem We will elaborate a bit more on how DP works on stochastic and combinatorial problems.
21
Stochastic Problem The cost-to-go with initial condition
The only difference when applying dynamic programming to a stochastic problem rather than a deterministic one is that an expected value appears in the performance criterion. Once the expected value is taken, the solution process for stochastic problems is the same as for deterministic problems.
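A minimal sketch of the stochastic recursion: the next state depends on a random disturbance w with a known distribution, and the cost-to-go minimizes an expectation over w. The state space, disturbance model, and helper names below are all illustrative:

```python
# Stochastic DP: the next state is x' = f(x, u, w) with disturbance w,
# and the cost-to-go minimizes an expectation over w:
#   J(x, i) = min_u  E_w[ L(x, u, i) + J(f(x, u, w), i + 1) ].

def stochastic_dp(states, controls, disturbances, f, L, phi, N):
    """disturbances: list of (w, prob) pairs. Returns tables J[i][x]."""
    J = [dict() for _ in range(N + 1)]
    for x in states:
        J[N][x] = phi(x)
    for i in range(N - 1, -1, -1):
        for x in states:
            J[i][x] = min(
                sum(p * (L(x, u, i) + J[i + 1][f(x, u, w)])
                    for w, p in disturbances)
                for u in controls)
    return J
```

Compared with the deterministic version, the only change is the probability-weighted sum inside the minimization, exactly mirroring the note above.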
22
Combinatorial Problem
The cost-to-go with initial condition. Each of the x(i)'s can take on a range of values, and L_i(·) is the reward or cost associated with the assignment of x(i). Note that the decision variable u(i) is absent, since the states themselves represent the decisions. Examples of combinatorial problems include the traveling salesman problem, the knapsack problem, and the resource allocation problem, all of which fit the model described above.
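As one concrete instance, the 0/1 knapsack problem fits this staged model: stage i decides item i, the "state" is the remaining capacity, and the stage reward is the item's value. A minimal sketch (the item data in the usage example is illustrative):

```python
# 0/1 knapsack by dynamic programming: stage i chooses whether to take
# item i; the state is the remaining capacity. V[i][c] is the best value
# achievable using items i..n-1 with capacity c (a reward-maximizing
# cost-to-go), computed backward like the recursions above.

def knapsack(values, weights, capacity):
    n = len(values)
    V = [[0] * (capacity + 1) for _ in range(n + 1)]   # V[n][c] = 0: no items left
    for i in range(n - 1, -1, -1):
        for c in range(capacity + 1):
            V[i][c] = V[i + 1][c]                      # skip item i
            if weights[i] <= c:                        # or take it, if it fits
                V[i][c] = max(V[i][c],
                              values[i] + V[i + 1][c - weights[i]])
    return V[0][capacity]
```

For example, `knapsack([60, 100, 120], [10, 20, 30], 50)` returns 220 (taking the second and third items).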
23
References:
Bellman, R., Dynamic Programming, Princeton University Press, 1957.
Bryson, Jr., A. E. and Y.-C. Ho, Applied Optimal Control: Optimization, Estimation, and Control, Taylor & Francis, 1975.
Dreyfus, S. E. and A. M. Law, The Art and Theory of Dynamic Programming, Academic Press, 1977.
Ho, Y.-C., Lecture Notes, Harvard University, 1997.
24
References (cont.):
National Institute of Standards and Technology, Dictionary of Algorithms, Data Structures, and Problems.
Ortega, A. and K. Ramchandran, “Rate-Distortion Methods for Image and Video Compression: An Overview,” IEEE Signal Processing Magazine, Nov.