A SIMPLE INTRODUCTION TO DYNAMIC PROGRAMMING
PROBLEM SET-UP
- The problem is arrayed as a set of decisions made over time.
- The system has a discrete state.
- Each decision incurs some reward or cost and moves the system to another state.
- There is usually a finite number of transitions.
- Transitions can be probabilistic, as can the rewards.
- The solution is a decision strategy that maximizes the summed reward (or minimizes the summed cost).
Notation
- N = the finite planning horizon.
- S_n(x) = the cost of operating optimally from n to N, given state x at time n.
- d_n*(x) = the optimal policy at stage n, given state x at time n.
- x(d_n) = the state resulting from taking decision d at stage n.
- c(d_n) = the cost of taking decision d_n.
EXAMPLE
- You have moved to Singapore, and you need to operate a car for 3 yrs.
- You plan to sell the car when you leave.
- Your QOL is not affected by your wheels.
- Cost/resale of cars and operating costs are below:

car age        0      1      2      3
sale price  1000    800    450    150
op cost      200    400    600
MAPPING TO THE NOTATION
- State: the age of your car
- Stage: years you have been in S-pore
- Policy: the age of the car you buy at the END of the year
COST EXAMPLE
Suppose you have a 2-yr-old car:
- you operate it for the year ($600)
- you sell your now 3-yr-old car (-$150)
- you buy a new (to you) 1-yr-old used car ($800)
TOTAL: $1250
One-year transition costs (start = the car's age at the start of the year, finish = the age of the car you hold at the end of the year):

             finish
start      0      1      2      3
  0      400    200
  1      950    750    400
  2     1450   1250    900    600
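Each entry is the year's operating cost plus the sell/buy adjustment at year end. A minimal Python sketch (the data structures and function name are mine, not from the deck) that reproduces the table:

```python
SALE = {0: 1000, 1: 800, 2: 450, 3: 150}  # buy/sell price by car age
OP = {0: 200, 1: 400, 2: 600}             # operating cost by age at start of year

def transition_cost(start, finish):
    """Operate an age-`start` car for a year; end it owning an age-`finish` car."""
    aged = start + 1                       # the car ages over the year
    cost = OP[start]
    if finish != aged:                     # sell the aged car, buy another
        cost += SALE[finish] - SALE[aged]
    return cost

for i in range(3):
    print(i, [transition_cost(i, j) for j in range(i + 2)])
# 0 [400, 200]
# 1 [950, 750, 400]
# 2 [1450, 1250, 900, 600]
```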
car age    "cost" end of yr 3
   0           -1000
   1            -800
   2            -450
   3            -150
CONTINUED COST EXAMPLE
It's the beginning of yr 2, and you possess a 2-yr-old car. You can...
- operate the car (600 + S_3(3-yr-old car))
- operate the car, sell it, buy a new car (600 - 150 + 1000 + S_3(new))
- operate the car, sell it, buy a 1-yr-old car (600 - 150 + 800 + S_3(1-yr-old car))
- ...
123 "cost" end of yr 3 01200-200-600-1000 11550350-50-800 21700850450-450 3 -150 1450 1250 900
12
123 "cost" end of yr 3 01200-200-600-1000 11550350-50-800 21700850450-450 3 -150
BELLMAN'S EQUATION
S_n(x) = min over d of [ c(d_n) + S_{n+1}(x(d_n)) ], with S_N(x) given by the terminal values.
Sometimes it's easy to get your name on something!
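To make the equation concrete, here is a sketch (all names are mine, not from the deck) that runs the recursion backward through the car example and reproduces the stage-1 column of the table above:

```python
SALE = {0: 1000, 1: 800, 2: 450, 3: 150}   # buy/sell price by age
OP = {0: 200, 1: 400, 2: 600}              # operating cost by age

def c(x, j):
    """One-year cost: operate an age-x car, finish the year owning age j."""
    return OP[x] if j == x + 1 else OP[x] - SALE[x + 1] + SALE[j]

S = {4: {a: -p for a, p in SALE.items()}}  # stage 4: the final sale
for n in (3, 2):
    S[n] = {x: min(c(x, j) + S[n + 1][j] for j in range(x + 2))
            for x in range(3)}
    # an age-3 car cannot be operated: sell it, buy age j, finish at age j+1
    S[n][3] = min(-SALE[3] + SALE[j] + OP[j] + S[n + 1][j + 1] for j in range(3))

# Year 1: arrive with no car, buy one of age x, then operate optimally.
S[1] = {x: SALE[x] + min(c(x, j) + S[2][j] for j in range(x + 2))
        for x in range(3)}
print(S[1])   # {0: 1200, 1: 1550, 2: 1700}, matching the stage-1 column above
```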
EXEMPLAR
- A specialized tool is available during the period 9am, ..., 3pm.
- Each hour, a bid for the asset is made according to the table below.
- The asset is busy for 3 hr if the bid is accepted.

hour    9    10    11    12     1     2     3
bid   100   150   160    50   175    40    10
[Slides 15 through 21 build up a staged network diagram for this problem: nodes are (hour, hours the asset remains busy), arcs carry the bid values 100, 150, 160, 50, 175, 40, 10 when accepting or 0 when rejecting/waiting, and the backward pass accumulates values such as 175 and finally the optimal total of 325, obtained by accepting the 10 o'clock bid (150) and the 1 o'clock bid (175).]

Note 1: Once the diagram is drawn, the problem can be solved by a shortest (longest) path algorithm.
Note 2: Dynamic Programming = Shortest Path
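The same answer falls out of a direct backward recursion. A minimal Python sketch (names are mine, not from the deck) that reproduces the longest-path value of 325:

```python
from functools import lru_cache

BIDS = [100, 150, 160, 50, 175, 40, 10]   # hourly bids at 9, 10, 11, 12, 1, 2, 3

@lru_cache(maxsize=None)
def best(t=0, busy=0):
    """Maximum revenue from hour t onward, with the tool busy `busy` more hours."""
    if t == len(BIDS):
        return 0
    reject = best(t + 1, max(busy - 1, 0))
    if busy:
        return reject                              # cannot accept while engaged
    return max(reject, BIDS[t] + best(t + 1, 2))   # accepting ties it up 3 hours

print(best())   # 325: accept the 10 o'clock (150) and 1 o'clock (175) bids
```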
PROBABILISTIC TRANSITIONS
1. c(d) is a random variable.
2. x(d) is random.
3. The "trial" takes place after the decision.
EXEMPLAR (Probabilistic)
- An "asset" is available during the period 8pm, 9pm, ..., 3am.
- Each hour, a bid for the asset is made according to the discrete probability density below.
- The asset is busy for 3 hr if the bid is accepted.
MANY APPROACHES TO FORMULATION
- N = 4am.
- S_n(x) = the profit of operating optimally from n to N, given state x at time n.
- d_n*(x) = the optimal policy at stage n, given state x at time n (ACCEPT, REJECT).
- c(d_n) = the profit of taking decision d_n.
- x(d_n) = the proposed bid (3, 6, 9) or the number of hours left in the remaining engagement (1 hr, 2 hr).
RECURSION
[The slide diagrams the recursion over two state variables: time, and the hours before the asset is available again.]
See DP Example.xls
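A sketch of that recursion in Python. The deck's probability table is not reproduced in this transcript, so the probabilities below are placeholders; only the bid levels (3, 6, 9) come from the slides:

```python
BID_PDF = {3: 0.5, 6: 0.3, 9: 0.2}   # HYPOTHETICAL P(bid); not the deck's data
HOURS = 8                             # 8pm, 9pm, ..., 3am; N = 4am

def S(t, busy):
    """Expected profit from hour t to N, with the asset busy `busy` more hours."""
    if t == HOURS:
        return 0
    if busy:
        return S(t + 1, busy - 1)
    accept_cont = S(t + 1, 2)         # accepting keeps the asset busy 3 hours
    reject_cont = S(t + 1, 0)
    # the bid is revealed, then we take the better of ACCEPT and REJECT
    return sum(p * max(bid + accept_cont, reject_cont)
               for bid, p in BID_PDF.items())

print(S(0, 0))
```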
UNLOCKING THE JARGON
- x(d) can be governed by a Markov chain, with a different P_ij matrix for each decision d.
- The result is a Markov Decision Process.
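As a generic illustration of that last point (the numbers below are invented for the sketch, not taken from the deck), value iteration on a tiny two-state, two-decision, discounted MDP looks like this:

```python
import numpy as np

P = {0: np.array([[0.9, 0.1], [0.4, 0.6]]),    # P_ij under decision 0
     1: np.array([[0.2, 0.8], [0.7, 0.3]])}    # P_ij under decision 1
r = {0: np.array([1.0, 0.0]),                  # expected reward under decision 0
     1: np.array([0.0, 2.0])}                  # expected reward under decision 1
gamma = 0.9                                    # discount factor

V = np.zeros(2)
for _ in range(500):                           # Bellman updates to convergence
    V = np.max([r[d] + gamma * P[d] @ V for d in (0, 1)], axis=0)
print(np.round(V, 2))                          # optimal value of each state
```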