Published by Richard Goodman. Modified over 9 years ago.
Slide 1 (p. 49)
ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm

DP can give a complete quantitative solution
Example 1: Discrete, finite-capacity inventory control problem
S_k = C_k = D_k = {0, 1, 2}
x_{k+1} = max(0, x_k + u_k − w_k)
x_k + u_k ≤ 2 (finite capacity), so u_k ≤ 2 − x_k
Prob{w_k = 0} = 0.1, Prob{w_k = 1} = 0.7, Prob{w_k = 2} = 0.2
No backlogging: U(x_k) = {0, …, 2 − x_k}
Slide 2 (p. 50)

DP can give a complete quantitative solution
Example 1 continued: Inventory control problem
N = 3
g_N(x_N) = 0
g_k(x_k, u_k, w_k) = u_k + 1·max(0, x_k + u_k − w_k) + 3·max(0, w_k − x_k − u_k)
                     (order)  (holding)                (lost demand)
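This small finite problem can be solved exactly by the backward DP recursion. A minimal sketch, assuming only the data on the two slides above (the variable names are mine, not the course's):

```python
# Backward DP for the 3-stage inventory problem above (a sketch; it
# minimizes the expected order + holding + lost-demand cost).
P = {0: 0.1, 1: 0.7, 2: 0.2}           # demand distribution for w_k
N, CAP = 3, 2                          # horizon, storage capacity

def stage_cost(x, u, w):
    # order cost (1/unit) + holding cost (1/unit) + lost-demand penalty (3/unit)
    return u + 1 * max(0, x + u - w) + 3 * max(0, w - x - u)

J = {x: 0.0 for x in range(CAP + 1)}   # terminal cost g_N = 0
policy = []                            # policy[k][x] = optimal order u at stage k
for k in reversed(range(N)):
    Jk, muk = {}, {}
    for x in range(CAP + 1):
        best_u, best_cost = 0, float("inf")
        for u in range(CAP - x + 1):   # U(x) = {0, ..., 2 - x}
            cost = sum(p * (stage_cost(x, u, w) + J[max(0, x + u - w)])
                       for w, p in P.items())
            if cost < best_cost:
                best_u, best_cost = u, cost
        Jk[x], muk[x] = best_cost, best_u
    J, policy = Jk, [muk] + policy

print(J)       # J_0: expected cost-to-go from each initial stock level
print(policy)  # optimal order quantities mu_k(x) for k = 0, 1, 2
```

The nested loops enumerate every state, control, and demand outcome, which is exactly why DP gives a complete quantitative solution for a problem this small.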
Slide 3 (p. 51)

DP can give a closed-form solution
Example 2: A gambling model
A gambler bets in N successive plays; at each play he can bet any nonnegative amount up to his present fortune.
P(win) = p, P(lose) = q = 1 − p (Bernoulli trials)
What betting strategy maximizes his final fortune?
Solution: For convenience, and with no loss of generality, we maximize the log of the final fortune; that is, the utility of fortune/wealth is U(x) = log(x). The model is as follows.
Slide 4 (p. 52)

DP can give a closed-form solution
Example 2 continued: Variable definitions
x_k = fortune at the beginning of the kth play (after the outcome of the (k − 1)th play, before the kth)
u_k = bet for the kth play, as a fraction of x_k
w_k = +1 (win) w.p. p; −1 (lose) w.p. q = 1 − p
x_{k+1} = x_k + w_k·u_k·x_k = (1 + w_k·u_k)·x_k
g_k(x_k, u_k, w_k) = 0 for 0 ≤ k ≤ N − 1
g_N(x_N) = −log(x_N), so minimizing the total cost maximizes E[log(x_N)]
Slide 5 (p. 53)

DP can give a closed-form solution
Example 2 continued: DP algorithm for the problem (written as maximization of expected log fortune)
J_N(x_N) = log(x_N)
J_k(x_k) = max_{0 ≤ u_k ≤ 1} E_{w_k}[ J_{k+1}((1 + w_k·u_k)·x_k) ]
Slide 6 (p. 54)

DP can give a closed-form solution
Example 2 continued: Solving the DP at k = N − 1
J_{N−1}(x_{N−1}) = max_{0 ≤ u ≤ 1} [ p·log((1 + u)·x_{N−1}) + q·log((1 − u)·x_{N−1}) ]
Thus:
if p = 1 (q = 0): u*_{N−1} = 1, bet it all!
if 1/2 < p < 1: u*_{N−1} = p − q
if 0 ≤ p < 1/2: u*_{N−1} = 0, since p < q means the q·log(1 − u_{N−1}) term dominates:
p·log(1 + u_{N−1}) + q·log(1 − u_{N−1}) < q·log(1 − u²_{N−1}) ≤ 0 for u_{N−1} > 0
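The claimed maximizer can be checked numerically by brute force over a grid of bet fractions. A sketch (the particular values p = 0.7 and p = 0.4 are arbitrary choices, not from the slides):

```python
# Brute-force check of the one-stage gambling objective
# f(u) = p*log(1+u) + q*log(1-u) over u in [0, 1).
import math

def f(u, p):
    # expected log growth of the fortune for bet fraction u
    q = 1 - p
    return p * math.log(1 + u) + q * math.log(1 - u)

grid = [i / 10000 for i in range(10000)]     # u = 0, 0.0001, ..., 0.9999
u_star = max(grid, key=lambda u: f(u, 0.7))  # p > 1/2: expect u* = p - q = 0.4
u_zero = max(grid, key=lambda u: f(u, 0.4))  # p < 1/2: expect u* = 0
print(u_star, u_zero)
```

The grid search lands on u* = p − q = 0.4 for the favorable game and on u* = 0 for the unfavorable one, matching the case analysis above.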
Slide 7 (p. 55)

DP can give a closed-form solution
Example 2 continued: Find critical points
d/du [ p·log(1 + u) + q·log(1 − u) ] = p/(1 + u) − q/(1 − u) = 0
⇒ p·(1 − u) = q·(1 + u) ⇒ u*_{N−1} = p − q
Slide 8 (p. 56)

DP can give a closed-form solution
Example 2 continued: Closed-form solution for k = N − 1
Hence, substituting u*_{N−1} = p − q (for p > 1/2):
J_{N−1}(x_{N−1}) = log(x_{N−1}) + C, where C = log 2 + p·log p + q·log q
Slide 9 (p. 57)

DP can give a closed-form solution
Example 2 continued: Closed-form solution for k = N − 1
Hence, the optimal controls can be viewed either as constant functions (controls = fractions of the current fortune) or as feedback policies (total bet = u*_{N−1}·x_{N−1}).
Slide 10 (p. 58)

DP can give a closed-form solution
Example 2 continued: Solving the DP at k = N − 2
Proceeding one stage (play) back: since J_{N−1}(x) = log(x) + C, the same maximization appears again, giving u*_{N−2} = p − q and J_{N−2}(x_{N−2}) = log(x_{N−2}) + 2C.
Slide 11 (p. 59)

DP can give a closed-form solution
Example 2 continued: General closed-form DP solution
By induction, for p > 1/2:
J_k(x_k) = log(x_k) + (N − k)·C and u*_k = p − q for all k.
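The closed form can be verified by a single numerical Bellman backup: if J_{k+1}(x) = log(x) + (N − k − 1)·C, one maximization step should return J_k(x) = log(x) + (N − k)·C. A sketch, with p = 0.7, N = 5, and the test point (k, x) = (2, 3.0) as arbitrary choices:

```python
# Verify J_k(x) = log(x) + (N - k)*C, with C = log 2 + p*log p + q*log q,
# by applying one step of the DP recursion to the claimed closed form.
import math

p, q, N = 0.7, 0.3, 5                      # example parameters (p > 1/2)
C = math.log(2) + p * math.log(p) + q * math.log(q)

def J(k, x):
    return math.log(x) + (N - k) * C       # claimed closed form

def backup(k, x):
    # one Bellman backup using the closed form at stage k+1
    return max(p * J(k + 1, (1 + u) * x) + q * J(k + 1, (1 - u) * x)
               for u in (i / 10000 for i in range(10000)))

err = abs(backup(2, 3.0) - J(2, 3.0))
print(err)   # essentially zero: the closed form is a fixed point
```

The maximizing u on the grid is again p − q = 0.4, and the backed-up value matches the closed form to within grid and floating-point error.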
Slide 12 (p. 60)

DP can be used to obtain qualitative properties (structure) of optimal solutions
Example 3: A stock option model
x_k: price of a given stock at the beginning of the kth day
x_{k+1} = x_k + w_k, with {w_k} i.i.d. and w_k ~ F(·): a random walk
Slide 13 (p. 61)

DP can be used to obtain qualitative properties (structure) of optimal solutions
Example 3 continued: A stock option model
Actions: you hold an option to buy one share of the stock at a fixed price c, with N days left to exercise it.
If you buy when the stock's price is s, the profit is s − c (which can be negative).
What strategy maximizes profit?
This is a terminating process (Bertsekas, Prob. 8, Ch. 1).
Slide 14 (p. 62)

DP can be used to obtain qualitative properties (structure) of optimal solutions
Example 3 continued: Solution
Slide 15 (p. 63)

DP can be used to obtain qualitative properties (structure) of optimal solutions
Example 3 continued: Solution
However, the process terminates (see Prob. 8, Ch. 1): when u_k = B (buy), the system moves to a fictitious termination state T. The state space thus mixes symbolic and numeric states, i.e., a discrete event system.
Slide 16 (p. 64)

DP can be used to obtain qualitative properties (structure) of optimal solutions
Example 3 continued: Solution
The cost structure is changed accordingly: the profit x_k − c is collected when the option is exercised (u_k = B), and the terminated state T earns nothing.
There is no simple analytical solution for J_k(x_k) or u*_k = μ*_k(x_k), but we can obtain some qualitative properties (structure) of the solutions.
Slide 17 (p. 65)

DP can be used to obtain qualitative properties (structure) of optimal solutions
Example 3 continued: DP algorithm for the problem
J_N(x_N) = max(x_N − c, 0)
J_k(x_k) = max{ x_k − c, E_{w_k}[ J_{k+1}(x_k + w_k) ] }: the expected "profit-to-go"
Slide 18 (p. 66)

DP can be used to obtain qualitative properties (structure) of optimal solutions
Example 3 continued: Lemma (Ross)
(i) J_k(x_k) − x_k + c is decreasing in x_k (adding the constant c does not affect the property); beyond a certain stock price, the extra profit-to-go from holding rather than exercising is exhausted.
(ii) J_k(x_k) is increasing and continuous in x_k (shown by backward induction).
Slide 19 (p. 67)

DP can be used to obtain qualitative properties (structure) of optimal solutions
Example 3 continued: Theorem (Ross)
There exist numbers s_1 ≤ s_2 ≤ … ≤ s_{N−k} ≤ … ≤ s_N (critical stock values) such that, at stage k (i.e., with N − k periods remaining), it is optimal to exercise the option if and only if x_k ≥ s_{N−k}.
These results can be used to solve the problem numerically, or to gain insight into the process.
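The threshold structure can be observed numerically on a discretized version of the model. This is only a sketch: the integer price grid, its cap, the symmetric three-point distribution for w_k, and the values c = 10, N = 5 are my assumptions, not from the slides.

```python
# Backward DP for the stock-option problem on a capped integer price grid.
P = {-1: 0.3, 0: 0.4, 1: 0.3}      # assumed daily price-change distribution
c, N, MAX = 10, 5, 30              # strike price, horizon, grid cap (assumed)

prices = range(MAX + 1)
J = {x: max(x - c, 0) for x in prices}   # last day: exercise iff profitable
thresholds = []                          # critical price with n days remaining
for k in reversed(range(N)):
    # expected profit-to-go if we hold one more day (prices clipped to the grid)
    hold = {x: sum(p * J[min(max(x + w, 0), MAX)] for w, p in P.items())
            for x in prices}
    Jk = {x: max(x - c, hold[x]) for x in prices}
    # smallest price at which immediate exercise is (weakly) optimal
    thresholds.append(min(x for x in prices if x - c >= hold[x]))
    J = Jk

print(thresholds)   # nondecreasing: more days remaining, higher critical price
```

The computed thresholds increase with the number of remaining days, exactly the s_1 ≤ s_2 ≤ … ≤ s_N ordering the theorem asserts.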
Slide 20 (p. 68)

DP can be used to obtain qualitative properties (structure) of optimal solutions
Example 3 continued: Remark
For a deterministic problem, optimizing over feedback policies gives no advantage over optimizing over actions (open-loop sequences of controls/decisions). Hence, the optimization problem can be solved using linear/nonlinear programming. Furthermore, for a deterministic problem with finite state and action spaces, we can equivalently formulate the problem as a shortest-path problem on an acyclic graph.
Slide 21 (p. 69)

DP can be used to obtain qualitative properties (structure) of optimal solutions
Example 3 continued: Forward search
There are efficient ways to find a shortest path, e.g., branch-and-bound algorithms. However, DP has some advantages:
- it always leads to a global optimum
- it can handle difficult constraint sets
[Figure: a staged acyclic graph with states 1, 2, 3 at stages k = 0, 1, 2, …, N − 1, N, arc costs c_01, c_02, c_03, …, c_ij, a start node, and an artificial end node.]
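The equivalence can be illustrated on a tiny staged graph: backward DP over (stage, state) nodes computes the shortest path to the artificial end node. The arc costs below are made up for illustration, not taken from the figure.

```python
# Deterministic finite-state DP as shortest path on a staged acyclic graph.
N, states = 3, [0, 1, 2]
cost = {  # cost[k][(i, j)] = arc cost from state i at stage k to state j
    k: {(i, j): abs(i - j) + 1 for i in states for j in states}
    for k in range(N)
}

J = {j: 0.0 for j in states}          # terminal (artificial end node) cost
for k in reversed(range(N)):
    # Bellman backup = relaxing all arcs of one stage of the acyclic graph
    J = {i: min(cost[k][(i, j)] + J[j] for j in states) for i in states}

print(J[0])   # shortest-path length from state 0 at stage 0
```

With these costs the cheapest arc out of any node has cost 1 (stay in place), so the shortest path from any start state has length 3 over the three stages; the backward sweep visits each arc exactly once, which is the usual acyclic shortest-path argument.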
Slide 22 (p. 70)

DP can handle difficult constraint sets
Example 4: Integer-valued variables
Remark: the reachable set from x_0 = 1 is contained in Z (the integers).
N = 2, with no cost at the final stage (g_2 = 0).
Slide 23 (p. 71)

DP can handle difficult constraint sets
Example 4 continued: Solution for k = 2 and k = 1
[Equations not shown: the terminal cost J_2, the one-stage cost, and the singleton constraint set at k = 1.]
Slide 24 (p. 72)

DP can handle difficult constraint sets
Example 4 continued: Solution for k = 0
Slide 25 (p. 73)

DP can handle difficult constraint sets
Example 4 continued: Optimal policy for k = 0