Probabilistic Dynamic Programming
Gambling Problem
Start with 3 chips
Have 3 plays of the game
P{win} = 2/3
Objective: have 5 or more chips at game end
Gambling Problem
Let n = play of the game (n = 1, 2, 3)
sn = number of chips on hand to begin play n
xn = number of chips to bet at play n
If s3 = 0, 1, or 2, we cannot bet enough chips to meet the objective
Gambling Problem
[Figure: diagram of chip counts 1 to 5 with branches labeled P{win} = 2/3, No bet, and P{lose} = 1/3]
Gambling Problem: Consider Bet 3
If s3 ≥ 5, we have already met the objective: bet x3 = 0 (or any amount up to s3 - 5)
If s3 = 4, we could bet 1, 2, 3, or 4
If s3 = 3, we could bet 2 or 3
If s3 ≤ 2, we can't win: no bet
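The case analysis for the last play can be checked by brute force. The following is a minimal sketch, not part of the original slides: the function name best_last_bet and the constants are illustrative assumptions, and only the problem data (P{win} = 2/3, target of 5 chips) come from the deck. For each s3 it tries every bet from 0 to s3 and reports the best probability of finishing with 5 or more chips, together with every bet that achieves it.

```python
from fractions import Fraction

P_WIN = Fraction(2, 3)  # P{win} from the problem statement
TARGET = 5              # objective: at least 5 chips after the last play


def best_last_bet(s3):
    """Return (best success probability, bets achieving it) for play 3.

    Hypothetical helper: enumerates every bet x3 in 0..s3.
    """
    results = {}
    for x3 in range(s3 + 1):
        p = Fraction(0)
        if s3 + x3 >= TARGET:   # chip count after a win
            p += P_WIN
        if s3 - x3 >= TARGET:   # chip count after a loss
            p += 1 - P_WIN
        results[x3] = p
    best = max(results.values())
    return best, [x for x, p in results.items() if p == best]


for s3 in range(7):
    prob, bets = best_last_bet(s3)
    print(f"s3={s3}: P{{objective}}={prob}, best bets={bets}")
```

For s3 ≤ 2 every bet ties at probability 0, so the sketch lists all of them; the slide summarizes that case as "no bet".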
Gambling Problem: Consider Bet 2
If s2 ≥ 5, we have already met the objective: bet x2 = 0 (or any amount up to s2 - 5)
If s2 = 4, we could bet 0, 1, 2, 3, or 4
If s2 = 3, we could bet 0, 1, 2, or 3
If s2 = 2, we could bet 1 or 2
Gambling Problem: Bet 2
If s2 = 0 or 1, P{objective} = 0
If s2 = 2, x2 = 1 or 2
Gambling Problem: Bet 2
If s2 = 2 and x2 = 1,
P{objective} = P{win stage 2 and win stage 3}
Gambling Problem: Bet 2
If s2 = 2 and x2 = 1,
P{objective} = P{lose 2}P{objective from s3 = 1} + P{win 2}P{objective from s3 = 3}
= (1/3)(0) + (2/3)(2/3) = 4/9
Gambling Problem: Bet 2
If s2 = 4 and x2 = 1,
P{objective} = P{win 2}P{objective from s3 = 5} + P{lose 2}P{objective from s3 = 3}
= (2/3)(1) + (1/3)(2/3) = 8/9
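The stage-2 calculations for s2 = 2 and s2 = 4 follow one pattern: weight the stage-3 success probability of each resulting chip count by P{win} or P{lose}. A small sketch of that pattern follows; the names f3 and stage2_probability are assumptions made for illustration, and the stage-3 values are the ones read off the "Consider Bet 3" slide.

```python
from fractions import Fraction

P_WIN = Fraction(2, 3)  # P{win} from the problem statement


def f3(s3):
    """Best stage-3 success probability, as read off the Bet 3 slide:
    1 for s3 >= 5, 2/3 for s3 = 3 or 4, and 0 for s3 <= 2."""
    if s3 >= 5:
        return Fraction(1)
    if s3 >= 3:
        return P_WIN
    return Fraction(0)


def stage2_probability(s2, x2):
    """P{objective} when holding s2 chips and betting x2 at play 2."""
    return P_WIN * f3(s2 + x2) + (1 - P_WIN) * f3(s2 - x2)


print(stage2_probability(2, 1))  # 4/9
print(stage2_probability(2, 2))  # 4/9
print(stage2_probability(4, 1))  # 8/9
```

Under these assumptions, betting 1 or 2 from s2 = 2 gives the same 4/9, which is why both bets are listed for that state.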
Gambling Problem: Bet 1
If s1 = 3 and x1 = 1,
P{objective} = (2/3)P{meeting objective from s2 = 4} + (1/3)P{meeting objective from s2 = 2}
= (2/3)(8/9) + (1/3)(4/9) = 20/27
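The slide evaluates only x1 = 1. To see how the other opening bets compare, the whole problem can be solved by backward recursion on f(n, s) = max over x of [(2/3) f(n+1, s+x) + (1/3) f(n+1, s-x)], with f equal to 1 after the last play if s ≥ 5 and 0 otherwise. The code below is a sketch under that formulation; the function names are illustrative and not from the slides.

```python
from fractions import Fraction
from functools import lru_cache

P_WIN = Fraction(2, 3)  # P{win} from the problem statement
PLAYS = 3               # number of plays
TARGET = 5              # chips needed at game end


@lru_cache(maxsize=None)
def f(n, s):
    """Max probability of ending with >= TARGET chips,
    holding s chips at the start of play n."""
    if n > PLAYS:  # game over: success or failure is now certain
        return Fraction(1) if s >= TARGET else Fraction(0)
    return max(value_of_bet(n, s, x) for x in range(s + 1))


def value_of_bet(n, s, x):
    """Success probability of betting x now and playing optimally afterwards."""
    return P_WIN * f(n + 1, s + x) + (1 - P_WIN) * f(n + 1, s - x)


for x1 in range(4):
    print(f"x1={x1}: {value_of_bet(1, 3, x1)}")
print("best:", f(1, 3))  # best: 20/27
```

Under these assumptions, the bets x1 = 0, 2, and 3 each give 2/3, so x1 = 1 with 20/27 is the only optimal opening bet.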
Recapture Optimal
x1* = 1
if win, x2* = 1
if lose, x2* = 1 or 2
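As a sanity check, the recaptured policy can be evaluated by enumerating all eight win/lose sequences. This sketch makes two assumptions that the recap slide does not state: where bets tie, it picks x2 = 1 after a loss, and at play 3 it uses the smallest bet that can reach 5 chips (1 from 4 chips, 2 from 3 chips), both of which are among the bets listed on the earlier slides. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

P_WIN = Fraction(2, 3)  # P{win} from the problem statement
TARGET = 5              # chips needed at game end


def bet(play, chips):
    """Hypothetical policy table assembled from the slides."""
    if play in (1, 2):
        return 1  # x1* = 1; x2* = 1 after a win, and 1 (tied with 2) after a loss
    if chips >= TARGET or chips <= 2:
        return 0  # objective already met, or out of reach
    return TARGET - chips  # assumed tie-break: smallest bet that can reach 5


def success_probability(start=3, plays=3):
    """Sum the probabilities of all outcome sequences that end at >= TARGET."""
    total = Fraction(0)
    for outcomes in product((True, False), repeat=plays):
        chips, prob = start, Fraction(1)
        for play, won in enumerate(outcomes, start=1):
            x = bet(play, chips)
            chips += x if won else -x
            prob *= P_WIN if won else 1 - P_WIN
        if chips >= TARGET:
            total += prob
    return total


print(success_probability())  # 20/27
```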