Session 5a.

Agenda
- Sequential decision making
- Deterministic dynamic programming

Important concepts
- State space
- Value-to-go function
- Principle of optimality

Examples
- Fishing problem and Little Bear Oil problems
- Shortest path
- Resource allocation

A Simple Motivating Example
What is the shortest travel time from A to D?
Network (travel times): A to B 5 min, B to D 25 min, A to C 10 min, C to D 10 min.
A "myopic policy" chooses the shortest travel time for the current step, without considering its future impact. It is generally not optimal!
There are only two paths from A to D: ABD and ACD. ABD takes 5 + 25 = 30 min, while ACD takes 10 + 10 = 20 min, so we should choose ACD. Note that in this computation we consider not only the travel time of the current step but also its impact on our future choices. For ABD, the current travel time is 5 min, but the next step takes 25 min. For ACD, the current travel time is 10 min and the next step takes 10 min.
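The comparison between the myopic choice and the full-path choice can be sketched in a few lines of Python. This is only an illustration; the function names `myopic_route` and `best_time` are hypothetical and not part of the slides.

```python
# Travel times (minutes) from the motivating example.
times = {
    "A": {"B": 5, "C": 10},
    "B": {"D": 25},
    "C": {"D": 10},
    "D": {},
}

def myopic_route(start, goal):
    """Always take the cheapest next arc, ignoring what follows."""
    route, total, node = [start], 0, start
    while node != goal:
        nxt = min(times[node], key=times[node].get)
        total += times[node][nxt]
        route.append(nxt)
        node = nxt
    return route, total

def best_time(node, goal):
    """Shortest remaining travel time from node to goal (value-to-go)."""
    if node == goal:
        return 0
    return min(t + best_time(nxt, goal) for nxt, t in times[node].items())

print(myopic_route("A", "D"))  # (['A', 'B', 'D'], 30)
print(best_time("A", "D"))     # 20
```

The myopic rule picks B because 5 < 10, and then has no choice but the 25-minute arc.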

Sequential Decision Problems
Decisions must often be made in a sequential manner over time. Earlier decisions may affect the feasibility and performance of later decisions. Myopic decisions that optimize only the immediate impact are frequently suboptimal for the overall process. To find optimal strategies, one must consider current and future decisions simultaneously. In some cases the sequential nature of the decision process is obvious; in other cases one reinterprets the original problem as a sequential decision problem.

Dynamic Programming
A technique that can be used to solve many optimization problems, including sequential decision problems. It usually obtains solutions by working backward from the end of the problem toward the beginning, thus breaking up a large, unwieldy problem into a series of smaller, more tractable problems.

Fishing Problem
You own both a lake and a fishing boat as an investment package. You plan to profit by taking fish from the lake. Each season you decide either to fish or not to fish.
- If you do not fish: the population will double by the start of the next season.
- If you fish: you extract 70% of the fish that were in the lake at the beginning of the season. The fish population at the beginning of the next season will be the same as at the beginning of the current season.
- The initial fish population is 10 tons.
- Your profit is $1 per ton.
- The interest rate is constant at 25%.
- You have only three seasons to fish.

Fishing Problem
Each year we have two options: fish or not fish. There are 8 possible choices (alternatives) over three years. What is the cash flow for each alternative? What if you have 10 seasons to fish?
We need to calculate the cash flow for each alternative.

Alternative 1: (Fish, Fish, Fish)
- Cash flow in year 1: $1 * 70% * population = 0.7 * 10 = $7
- Cash flow in year 2: $1 * 70% * population = 0.7 * 10 = $7. Discounted: $7 * 0.8 = $5.60
- Cash flow in year 3: $1 * 70% * population = 0.7 * 10 = $7. Discounted: $7 * 0.8 * 0.8 = $4.48
- Total = $17.08

Alternative 3: (Fish, No, Fish)
- Cash flow in year 1: $1 * 70% * population = 0.7 * 10 = $7
- Cash flow in year 2: $0
- Cash flow in year 3: $1 * 70% * population = 0.7 * 20 = $14 (why multiply by 20?). Discounted: $14 * 0.8 * 0.8 = $8.96
- Total = $15.96

Alternative 4: (No, Fish, Fish)
- Cash flow in year 1: $0
- Cash flow in year 2: $1 * 70% * population = 0.7 * 20 = $14. Discounted: $14 * 0.8 = $11.20
- Cash flow in year 3: $1 * 70% * population = 0.7 * 20 = $14. Discounted: $14 * 0.8 * 0.8 = $8.96
- Total = $20.16

What matters in this calculation is the fish population at the beginning of each year, and this quantity depends on the earlier decisions. (Keep this in mind.)
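The enumeration of all eight alternatives can be sketched as follows. This is an illustrative script rather than part of the original slides; the names `discounted_cash_flow` and `enumerate_plans` are hypothetical, and the constants are taken from the problem statement.

```python
from itertools import product

GROWTH = 2.0          # population doubles in a season without fishing
CATCH = 0.7           # fraction of the population extracted when fishing
PRICE = 1.0           # dollars of profit per ton
DISCOUNT = 1 / 1.25   # one-period discount factor at 25% interest

def discounted_cash_flow(plan, population=10.0):
    """Total discounted cash flow of a plan such as (True, False, True)."""
    total = 0.0
    for season, fish in enumerate(plan):
        if fish:
            # Fishing pays now and leaves the population unchanged.
            total += DISCOUNT ** season * PRICE * CATCH * population
        else:
            # Not fishing pays nothing and doubles the population.
            population *= GROWTH
    return total

def enumerate_plans(seasons=3):
    """Map every fish / no-fish plan to its discounted cash flow."""
    return {plan: discounted_cash_flow(plan)
            for plan in product([True, False], repeat=seasons)}

for plan, value in enumerate_plans().items():
    label = ", ".join("Fish" if f else "No" for f in plan)
    print(f"({label}): {value:.2f}")
```

With 10 seasons the same loop would visit 2^10 = 1024 plans, which is why enumeration does not scale.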

Decision Tree Representation
Draw a decision tree here. And do the calculation. Emphasize again the current population is important in the calculation and for decisions in the future.

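Binomial Lattice
[Figure: binomial lattice of fish populations (tons) at the start of each season. Season 1: 10. Season 2: 20, 10. Season 3: 40, 20, 10. Season 4: 80, 40, 20, 10. The arcs for the decision to fish carry the cash flows 7, 14, and 28.]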

Observations
- Only the fish population in the lake is relevant at any time.
- The nodes are marked with the starting fish population.
- The manner by which that population was achieved has no effect on future cash flow.
- There are several paths to reach the same node.
- The value on an arc indicates the cash flow associated with the decision that arc represents.
- Note: the discount rate is 25%, so the one-period time value of money is 1 / (1 + 0.25) = 0.80.

Backward Recursion We assign the value of 0 to each of the final nodes (start of Season 4), since once we are there we can no longer fish.

Backward Recursion In Season 3, no matter what the fish population, the optimal strategy is to fish. If fish population is 40, the max cash flow is $1/ton*28 tons = $28 If fish population is 20, the max cash flow is $1/ton*14 tons = $14 If fish population is 10, the max cash flow is $1/ton*7 tons = $7

Backward Recursion
In Season 2, compare the profit from not fishing with the profit from fishing:
- If the fish population is 20, the max cash flow is max(0.8 * 28, 14 + 0.8 * 14) = max(22.4, 25.2) = 25.2
- If the fish population is 10, the max cash flow is max(0.8 * 14, 7 + 0.8 * 7) = max(11.2, 12.6) = 12.6
In each max, the first term is the profit from not fishing and the second term is the profit from fishing.

Backward Recursion
In Season 1, the starting fish population is 10. The max cash flow is max(0.8 * 25.2, 7 + 0.8 * 12.6) = max(20.16, 17.08) = 20.16
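The node values on the lattice can be reproduced with a short recursive sketch. The function name `value_to_go` and its signature are hypothetical. The recursion is written top-down for brevity, but it evaluates the same quantities as the backward pass from Season 4 described above.

```python
DISCOUNT = 1 / 1.25   # 0.80, one-period discount factor
CATCH = 0.7           # fraction of the population caught when fishing
PRICE = 1.0           # dollars per ton

def value_to_go(season, population, last_season=3):
    """Return (max discounted cash flow from `season` onward, best decision),
    given the population at the start of that season."""
    if season > last_season:
        return 0.0, None                      # start of Season 4: no more fishing
    fish = (PRICE * CATCH * population
            + DISCOUNT * value_to_go(season + 1, population, last_season)[0])
    wait = DISCOUNT * value_to_go(season + 1, 2 * population, last_season)[0]
    return (fish, "fish") if fish >= wait else (wait, "do not fish")

for season, populations in [(3, [40, 20, 10]), (2, [20, 10]), (1, [10])]:
    for pop in populations:
        value, decision = value_to_go(season, pop)
        print(f"Season {season}, population {pop}: {value:.2f} ({decision})")
```

For three seasons the plain recursion is cheap; for long horizons one would cache the values per (season, population) node, which is exactly what the lattice does.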

Optimal Strategy
- Do not fish the first season (to let the fish population increase).
- Do fish the next two seasons (to harvest the population).
- Myopic policy is not optimal (for this example)!
- What is the optimal strategy if you have four seasons to fish? Do not fish in the first two seasons, and then fish.
- The optimal strategy depends on the length of the planning horizon!
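To explore how the strategy changes with the planning horizon, the recursion can be memoized and asked for the plan as well as the value. This is a sketch under the problem's stated assumptions; `optimal_plan` and the label "wait" (for "do not fish") are hypothetical names.

```python
from functools import lru_cache

DISCOUNT = 1 / 1.25   # one-period discount factor
CATCH = 0.7           # fraction caught when fishing, at $1 per ton

def optimal_plan(seasons, initial_population=10):
    """Return (max discounted cash flow, tuple of decisions) for a horizon."""

    @lru_cache(maxsize=None)
    def best(season, doublings):
        # State: how many times the population has doubled so far.
        if season == seasons:
            return 0.0, ()
        population = initial_population * 2 ** doublings
        fish_future, fish_plan = best(season + 1, doublings)
        wait_future, wait_plan = best(season + 1, doublings + 1)
        fish = CATCH * population + DISCOUNT * fish_future
        wait = DISCOUNT * wait_future
        if fish >= wait:
            return fish, ("fish",) + fish_plan
        return wait, ("wait",) + wait_plan

    return best(0, 0)

for horizon in (3, 4, 10):
    value, plan = optimal_plan(horizon)
    print(horizon, round(value, 3), plan)
```

Because each season has at most one node per number of doublings, the work grows roughly with the square of the horizon rather than doubling every season.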


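Binomial Lattice: 4 Seasons
[Figure: the lattice extended to four fishing seasons. Populations (tons) by season: Season 1: 10. Season 2: 20, 10. Season 3: 40, 20, 10. Season 4: 80, 40, 20, 10. Season 5: 160, 80, 40, 20, 10.]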

Characteristics of DP Applications
- Characteristic 1: The problem can be divided into stages with a decision required at each stage.
- Characteristic 2: Each stage has a number of states associated with it. By a state, we mean the information that is needed at any stage to make an optimal decision.
- Characteristic 3: The decision chosen at any stage describes how the state at the current stage is transformed into the state at the next stage (transition state).

Characteristics of DP Applications
- Characteristic 4: Given the current state, the optimal decision for each of the remaining stages must not depend on previously reached states or previously chosen decisions. This idea is known as the principle of optimality.
- Characteristic 5: If the states for the problem have been classified into one of T stages, there must be a recursion that relates the cost or reward earned during stages t, t+1, …, T to the cost or reward earned from stages t+1, t+2, …, T (cost/value-to-go function).

Characteristics of Fishing Example
Characteristic 1: The problem can be divided into stages with a decision required at each stage.
- The problem is divided into seasons (each stage is a season).
- At each season, a decision is to be made: to fish or not to fish.
Characteristic 2: Each stage has a number of states associated with it.
- The state of each stage is the fish population: 10 tons, 20 tons, 40 tons, etc.
- This information is needed to make an optimal decision.

Characteristics of Fishing Example
Characteristic 3: The decision chosen at any stage describes how the state at the current stage is transformed into the state at the next stage.
- The "Not to fish" decision at Year 0 transforms the current state (10 tons) to the state (20 tons) at the end of Year 1.
- The "To fish" decision at Year 0 transforms the current state (10 tons) to the state (10 tons) at the end of Year 1.
Characteristic 4: Given the current state, the optimal decision for each of the remaining stages must not depend on previously reached states or previously chosen decisions.
- At each stage, the decision depends only on the state (fish population), not on how we reached that population.

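Characteristics of Fishing Example
Characteristic 5: A recursion that relates the cost or reward earned during stages t, t+1, …, T to the cost or reward earned from stages t+1, t+2, …, T. (T is the end of the planning horizon.)
- Let b_t be the population at the end of stage t = 0, 1, 2, …
- Let f_t(b_t) be the (discounted) cash flow (value function) from stage t to stage T.
- Recursion: f_t(b_t) = max{ 0.7 b_t + 0.8 f_{t+1}(b_t), 0.8 f_{t+1}(2 b_t) }, with f_T(b_T) = 0.
- In each term of the max, the first part (0.7 b_t when fishing, 0 when not fishing) is the immediate value in the current stage, and the discounted f_{t+1} part is the future impact.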

Shortest Path Problem
Ben plans to drive from NY to LA. He has friends in several cities.
- After 1 day of driving he can reach Columbus, Nashville, or Louisville.
- After 2 days of driving he can reach Kansas City, Omaha, or Dallas.
- After 3 days of driving he can reach Denver or San Antonio.
- After 4 days of driving he can reach Los Angeles.
The actual mileages between cities are given in the figure (next slide). Where should Ben spend each night of the trip to minimize the number of miles traveled?

[Figure: network of cities by day of travel. Today: JFK (1). Day 1: CMH (2), BNA (3), SDF (4). Day 2: MCI (5), OMA (6), DFW (7). Day 3: DEN (8), SAT (9). Day 4: LAX (10). Each arc is labeled with the mileage between the two cities. Arc labels (miles), in the order extracted: 680, 610, 790, 790, 1050, 550, 1030, 580, 540, 900, 760, 660, 940, 1390, 770, 510, 790, 700, 270, 830.]
Note: Discuss the traveling salesman problem, and why it cannot be solved efficiently by DP.

Shortest Path Example: DP
The problem can be divided into five stages:
- Stage 1 has one state (city 1)
- Stage 2 has three states (cities 2, 3, 4)
- Stage 3 has three states (cities 5, 6, 7)
- Stage 4 has two states (cities 8, 9)
- Stage 5 has one state (city 10)
At each stage and each state, a decision must be made: where to go in the next stage. The current state and the decision determine the transition state in the next stage. Given the current state (where Ben is), the decisions for the remaining stages do not depend on how he got to where he is now.

Shortest Path Example: DP
Backward recursion
- Aspect 1: the feasible decisions are the next places Ben can go directly from here.
- Aspect 2: the immediate cost is the cost of the arc that Ben chooses now.
- Aspect 3: the transition state depends on where Ben is and the arc he chooses next.

Shortest Path Example: Solution
- Let c_ij be the mileage between cities i and j.
- Let f_t(i) be the length of the shortest path from city i to LAX (city i is in stage t).
- Stage 4 computations are obvious: f_4(8) = 1030 and f_4(9) = 1390.

Stage 3 Computation
To determine f_3(5), note that the shortest path from city 5 to LA must be one of the following:
- Path 1: Go from city 5 to city 8 and then take the shortest path from city 8 to city 10.
- Path 2: Go from city 5 to city 9 and then take the shortest path from city 9 to city 10.
That is, f_3(5) = min{ c_58 + f_4(8), c_59 + f_4(9) }. The same comparison gives f_3(6) and f_3(7). The candidate paths are:
- From MCI: MCI-DEN-LAX, MCI-SAT-LAX
- From OMA: OMA-DEN-LAX, OMA-SAT-LAX
- From DFW: DFW-DEN-LAX, DFW-SAT-LAX

Stage 2 Computation
Work backward one stage, using f_2(i) = min over j of { c_ij + f_3(j) }. The candidate paths are:
- From CMH: CMH-MCI-DEN-LAX, CMH-OMA-DEN-LAX, CMH-DFW-SAT-LAX
- From BNA: BNA-MCI-DEN-LAX, BNA-OMA-DEN-LAX, BNA-DFW-SAT-LAX
- From SDF: SDF-MCI-DEN-LAX, SDF-OMA-DEN-LAX, SDF-DFW-SAT-LAX

Stage 1 Computation
Shortest distance from JFK to LAX. The candidate paths are:
- JFK-CMH-MCI-DEN-LAX
- JFK-BNA-MCI-DEN-LAX
- JFK-SDF-MCI-DEN-LAX
Checking back through our calculations, the shortest path is 1 – 2 – 5 – 8 – 10, that is, NY – Columbus – Kansas City – Denver – LA, with total mileage 2870.
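The stage-by-stage computation can be sketched in Python. One important assumption: the transcript lists the mileages but not which arc each belongs to, so the `MILES` table below is an assumed assignment, chosen because it uses exactly the listed values and reproduces f_4(8) = 1030, f_4(9) = 1390, and the stated total of 2870. The function name `shortest_paths` is hypothetical.

```python
# City numbering follows the slides.
CITY = {1: "JFK", 2: "CMH", 3: "BNA", 4: "SDF", 5: "MCI",
        6: "OMA", 7: "DFW", 8: "DEN", 9: "SAT", 10: "LAX"}

# ASSUMPTION: arc-to-mileage assignment (not recoverable from the transcript).
MILES = {
    1: {2: 550, 3: 900, 4: 770},
    2: {5: 680, 6: 790, 7: 1050},
    3: {5: 580, 6: 760, 7: 660},
    4: {5: 510, 6: 700, 7: 830},
    5: {8: 610, 9: 790},
    6: {8: 540, 9: 940},
    7: {8: 790, 9: 270},
    8: {10: 1030},
    9: {10: 1390},
}

def shortest_paths(miles, destination):
    """Backward recursion: f[i] = min over j of miles[i][j] + f[j]."""
    f = {destination: 0}
    next_city = {}
    # Arcs only go from lower to higher city numbers, so visiting cities
    # in decreasing order guarantees f[j] is known before it is needed.
    for i in sorted(miles, reverse=True):
        j_best = min(miles[i], key=lambda j: miles[i][j] + f[j])
        f[i] = miles[i][j_best] + f[j_best]
        next_city[i] = j_best
    return f, next_city

f, next_city = shortest_paths(MILES, 10)
route = [1]
while route[-1] != 10:
    route.append(next_city[route[-1]])
print(f[1], " - ".join(CITY[i] for i in route))
```

Each city is evaluated once and each arc is examined once, in contrast to listing every complete route from JFK to LAX.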

Capital Budgeting
The Tatham Company is considering seven investments. The cash required for each investment and the net present value (NPV) each investment adds to the firm are listed in the table. The cash available for investment is $15,000. Tatham wants to find the investment policy that maximizes its NPV. If Tatham wants to take part in any of these investments, it must buy 100% of it.

A ratio heuristic:
- Calculate for each project the ratio NPV / cash required.
- Choose projects with the highest ratios: Project 4, Project 1, Project 2, Project 5 (not enough funds for Project 5).
- Total NPV: $43.5K. Not optimal! (Should do 1, 2, and 5; total NPV $46K, cost $14.5K.)

DP formulation:
- Stage i: allocating funds to investment i (whether or not to invest in i).
- State: the amount x available for investments {i, i+1, …, 7}. We may consider a dummy investment 8 with no NPV.
- Decision: takes the state in stage i to the state in stage i+1.
- Recursion: F_8(x) = 0 and F_i(x) = max{ F_{i+1}(x), NPV_i + F_{i+1}(x - C_i) }, where the second term is allowed only if C_i <= x.
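The recursion can be sketched in Python. Since the Tatham table is not reproduced in this transcript, the data below is hypothetical (three made-up investments, in $ thousands) and is chosen only so that the ratio heuristic and the DP disagree. The function names are also hypothetical.

```python
from functools import lru_cache

def best_portfolio(costs, npvs, budget):
    """F_i(x): best NPV from investments i..n with x of cash available.
    Returns (NPV, tuple of chosen investment numbers, 1-based)."""
    n = len(costs)

    @lru_cache(maxsize=None)
    def F(i, x):
        if i == n:                         # dummy last stage: F_{n+1}(x) = 0
            return 0, ()
        best = F(i + 1, x)                 # skip investment i+1
        if costs[i] <= x:                  # taking it is feasible
            value, chosen = F(i + 1, x - costs[i])
            if npvs[i] + value > best[0]:
                best = (npvs[i] + value, (i + 1,) + chosen)
        return best

    return F(0, budget)

def ratio_heuristic(costs, npvs, budget):
    """Pick by decreasing NPV/cost ratio, skipping what no longer fits."""
    order = sorted(range(len(costs)), key=lambda i: npvs[i] / costs[i], reverse=True)
    total, chosen = 0, []
    for i in order:
        if costs[i] <= budget:
            budget -= costs[i]
            total += npvs[i]
            chosen.append(i + 1)
    return total, chosen

# HYPOTHETICAL data, not the Tatham table.
costs = [6, 5, 5]
npvs = [20, 15, 14]
print(ratio_heuristic(costs, npvs, 10))   # (20, [1])
print(best_portfolio(costs, npvs, 10))    # (29, (2, 3))
```

In this made-up instance the best-ratio investment uses up so much cash that nothing else fits, which is the same failure mode as the heuristic on the slide.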

Resource Allocation Example
A corporation has $5 million to allocate to its three plants for possible expansion. Each plant has submitted a number of proposals on how it intends to spend the money. Each proposal gives the cost of the expansion and the total profit expected. The following table gives the proposals generated:

Resource Allocation Example
- Each plant will only be permitted to enact one of its proposals.
- The goal is to maximize the firm's revenues resulting from the allocation of the $5 million.
- We will assume that any of the $5 million we don't spend is lost (you can work out how a more reasonable assumption will change the problem as an exercise).
For those of you who have already learned linear programming: this problem cannot be formulated as a linear program, because the revenues returned are not linear functions. (It can, however, be formulated as a binary integer program.) We can solve it using dynamic programming.

Resource Allocation Example: Enumeration
A straightforward way to solve this is to try all possibilities and choose the best. In this case, there are only relatively few ways of allocating the money. Many of these are infeasible (e.g., proposals 3, 4, and 1 for the three plants cost $6 million). Other combinations are feasible but very poor (like proposals 1, 1, and 2, which is feasible but returns only $4 million).
Disadvantages of total enumeration:
- For larger problems, the enumeration of all possible solutions may not be computationally feasible.
- Infeasible combinations cannot be detected a priori, leading to inefficiency.
- Information about previously investigated combinations is not used to eliminate inferior or infeasible combinations.
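Total enumeration can be sketched as follows. The proposal table is not reproduced in this transcript, so the `PROPOSALS` data below is hypothetical: it is chosen to agree with the two statements above (proposals 3, 4, and 1 cost $6 million; proposals 1, 1, and 2 return $4 million) but should not be read as the original table. The function name is also hypothetical.

```python
from itertools import product

# HYPOTHETICAL proposals as (cost, revenue) in $ millions, one list per plant.
PROPOSALS = [
    [(0, 0), (1, 5), (2, 6)],             # plant 1, proposals 1-3
    [(0, 0), (2, 8), (3, 9), (4, 12)],    # plant 2, proposals 1-4
    [(0, 0), (1, 4)],                     # plant 3, proposals 1-2
]
BUDGET = 5

def enumerate_allocations(proposals, budget):
    """Try every combination; return (feasible list, infeasible count).
    Each feasible entry is (revenue, 1-based proposal numbers)."""
    feasible, infeasible = [], 0
    for choice in product(*(range(len(p)) for p in proposals)):
        cost = sum(proposals[k][c][0] for k, c in enumerate(choice))
        revenue = sum(proposals[k][c][1] for k, c in enumerate(choice))
        if cost <= budget:
            feasible.append((revenue, tuple(c + 1 for c in choice)))
        else:
            infeasible += 1                # only discovered after costing it
    return feasible, infeasible

feasible, infeasible = enumerate_allocations(PROPOSALS, BUDGET)
print(len(feasible), "feasible,", infeasible, "infeasible")
print("best revenue:", max(revenue for revenue, _ in feasible))
```

Every combination has to be costed before it can be rejected, which illustrates the second disadvantage listed above.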

Resource Allocation Example: DP
Let's break the problem into three stages: each stage represents allocating money to a single plant. We will artificially place an ordering on the stages, saying that we will first allocate to plant 1, then plant 2, then plant 3. (We could use another ordering as well.)
- Stage 1 represents allocating money to plant 1.
- Stage 2 represents allocating money to plant 2.
- Stage 3 represents allocating money to plant 3.

Resource Allocation Example: DP
Each stage is divided into states. A state encompasses the information required to go from one stage to the next.
- The state space for stage 1 is {5}: decide the amount of money spent on plant 1, denoted by x_1.
- The states for stage 2 are {0, 1, 2, 3, 4, 5}: decide the amount of money spent on plants 1 and 2, denoted by x_2.
- The states for stage 3 are {0, 1, 2, 3, 4, 5}: decide the amount of money spent on plants 1, 2, and 3, denoted by x_3.
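A staged computation along these lines can be sketched in Python. Two assumptions are made here: the proposal data is hypothetical (the original table is not reproduced in this transcript), and the state is taken to be the amount of money that may be spent on the plants processed so far. The function name `allocate` is also hypothetical.

```python
# HYPOTHETICAL proposals as (cost, revenue) in $ millions, one list per plant.
PROPOSALS = [
    [(0, 0), (1, 5), (2, 6)],             # plant 1
    [(0, 0), (2, 8), (3, 9), (4, 12)],    # plant 2
    [(0, 0), (1, 4)],                     # plant 3
]
BUDGET = 5

def allocate(proposals, budget):
    """Stage-by-stage DP. best[x] is the max revenue from the plants
    processed so far when at most x may be spent on them."""
    best = [0] * (budget + 1)
    picks = [[] for _ in range(budget + 1)]
    for plant in proposals:
        new_best, new_picks = [], []
        for x in range(budget + 1):
            # Assumes every plant has a zero-cost proposal, so this
            # list is never empty.
            options = [
                (best[x - cost] + revenue, picks[x - cost] + [idx + 1])
                for idx, (cost, revenue) in enumerate(plant)
                if cost <= x
            ]
            value, chosen = max(options, key=lambda option: option[0])
            new_best.append(value)
            new_picks.append(chosen)
        best, picks = new_best, new_picks
    return best[budget], picks[budget]

revenue, chosen = allocate(PROPOSALS, BUDGET)
print(revenue, chosen)
```

Each stage looks only at the table from the previous stage, so infeasible combinations are never built and dominated partial allocations are discarded as soon as a better one reaches the same state.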

[Figure, shown as a sequence of animation builds: allocation lattice with columns Plant 1, Plant 2, Plant 3, and a dummy "Plant 4"; nodes are labeled with amounts from 5 down to 1 ($ million).]
Computations shown in the builds for the Plant 2 column:
- Node 5: max{0+5, 3+5, 9+5} = 14
- Node 4: max{0+5, 3+5, 9+2} = 11
- Node 3: max{0+5, 3+5, 9+0} = 9
- Node 2: max{0+5, 3+2} = 5
- Node 1: max{0+2, 3+0} = 3
Computation shown for Plant 1 at node 5: max{0+14, 2+11, 4+9, 10+3} = 14.

Assignment 2
Assignment 2 will be posted tomorrow on Juran's download page. You should be able to solve problem 1 now. Problem 2 is about stochastic dynamic programming, which you will learn next time.
