Probability CSE 473 – Autumn 2003 Henry Kautz. ExpectiMax.

1 Probability CSE 473 – Autumn 2003 Henry Kautz

2 ExpectiMax

3 Hungry Monkey: 2-Ply Game Tree [Figure: game tree with actions jump and shake; edge probabilities 2/3, 1/3, 1/6, 5/6, 1/6, 5/6; leaf values, left to right: 0 0 1 0, 0 0 1 0, 1 1 2 1, 0 0 1 0]

4 ExpectiMax 1 – Chance Nodes [Figure: same tree with the bottom-level chance nodes evaluated; pairs of chance-node values, left to right: (0, 2/3), (0, 1/6), (1, 7/6), (0, 1/6)]

5 ExpectiMax 2 – Max Nodes [Figure: same tree with the second-level max nodes evaluated; max-node values, left to right: 2/3, 1/6, 7/6, 1/6]

6 ExpectiMax 3 – Chance Nodes [Figure: same tree with the top-level chance nodes evaluated: 1/2 and 1/3]

7 ExpectiMax 4 – Max Node [Figure: same tree with the root max node evaluated from its chance-node children, 1/2 and 1/3]
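The four ExpectiMax passes can be sketched in a few lines of Python. This is only an illustration: the tree layout is inferred from the leaf values and probabilities on the slides (jump listed before shake, outcomes read left to right), and the helper names `chance`, `max_node`, `expectimax`, and `best_action` are made up here. The probabilities on the second-level jump branches are an assumption; both of their leaves are equal, so they do not affect the result.

```python
from fractions import Fraction as F

# A node is either a leaf (a number), ("max", {action: child}),
# or ("chance", ((probability, child), ...)).
def chance(*branches):
    return ("chance", branches)

def max_node(**actions):
    return ("max", actions)

def expectimax(node):
    """Back up values: expectation at chance nodes, maximum at max nodes."""
    if not isinstance(node, tuple):
        return F(node)  # leaf payoff
    kind, children = node
    if kind == "max":
        return max(expectimax(child) for child in children.values())
    return sum(p * expectimax(child) for p, child in children)

def best_action(node):
    """Action with the highest backed-up value at a max node."""
    _, actions = node
    return max(actions, key=lambda a: expectimax(actions[a]))

# Second-level max nodes (layout inferred from the slide values).
after_jump_ok = max_node(jump=chance((F(2, 3), 0), (F(1, 3), 0)),
                         shake=chance((F(2, 3), 1), (F(1, 3), 0)))
after_jump_fail = max_node(jump=chance((F(2, 3), 0), (F(1, 3), 0)),
                           shake=chance((F(1, 6), 1), (F(5, 6), 0)))
after_shake_ok = max_node(jump=chance((F(2, 3), 1), (F(1, 3), 1)),
                          shake=chance((F(1, 6), 2), (F(5, 6), 1)))
after_shake_fail = max_node(jump=chance((F(2, 3), 0), (F(1, 3), 0)),
                            shake=chance((F(1, 6), 1), (F(5, 6), 0)))

root = max_node(
    jump=chance((F(2, 3), after_jump_ok), (F(1, 3), after_jump_fail)),
    shake=chance((F(1, 6), after_shake_ok), (F(5, 6), after_shake_fail)),
)

print(expectimax(root))   # 1/2
print(best_action(root))  # jump
```

The intermediate values match the slides: the second-level max nodes come out as 2/3, 1/6, 7/6, and 1/6, and the top-level chance nodes as 1/2 (jump) and 1/3 (shake).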

8 Policies
The result of the ExpectiMax analysis is a conditional plan (also called a policy):
– Optimal plan for 2 steps: jump; shake
– Optimal plan for 3 steps: jump; if (ontable) {shake; shake} else {jump; shake}
Probabilistic planning can be generalized in many ways, including:
– Action costs
– Hidden state
The general problem is that of solving a Markov Decision Process (MDP)
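One way to see where such a conditional plan comes from is to run ExpectiMax over states rather than over an explicit tree. The sketch below is not the lecture's model. It assumes a jump from the floor reaches the table with probability 2/3, a shake drops a banana with probability 2/3 from the table and 1/6 from the floor, shaking does not change the monkey's position, and jumping while on the table does nothing. The function names `outcomes` and `plan` are hypothetical.

```python
from fractions import Fraction as F
from functools import lru_cache

def outcomes(on_table, action):
    """Assumed model: list of (probability, reward, next_on_table)."""
    if action == "jump":
        if on_table:
            return [(F(1), 0, True)]  # assumption: no effect on the table
        return [(F(2, 3), 0, True), (F(1, 3), 0, False)]
    p = F(2, 3) if on_table else F(1, 6)  # chance that a shake drops a banana
    return [(p, 1, on_table), (1 - p, 0, on_table)]

@lru_cache(maxsize=None)
def plan(on_table, steps):
    """Return (expected bananas, best first action) with `steps` actions left."""
    if steps == 0:
        return F(0), None
    best = None
    for action in ("jump", "shake"):
        value = sum(p * (reward + plan(nxt, steps - 1)[0])
                    for p, reward, nxt in outcomes(on_table, action))
        if best is None or value > best[0]:
            best = (value, action)
    return best

print(plan(False, 2))  # value 1/2, first action: jump
print(plan(False, 3))  # value 19/18, first action: jump
print(plan(True, 2))   # value 4/3, first action: shake
```

Under these assumptions the best first actions agree with the slide: with three steps the monkey jumps, then shakes twice if it is on the table, and otherwise jumps again and shakes.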

9 2 Player Games of Chance

10 Backgammon
Branching factor:
– Chance node: 21
– Max node: about 20 on average
– Size of tree: O(c^k m^k)
– In practice: can search 3 plies
Neurogammon & TD-Gammon (Tesauro 1995)
– Learned weights on static evaluation function by playing against itself
– Use results of games to optimize weights:
  “Punish” features that were on in losing games
  “Reward” features that were on in winning games
– A kind of reinforcement learning
– Became world’s best backgammon player!
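To get a feel for the tree-size estimate, here is a rough calculation with the branching factors from the slide. It assumes that each ply pairs one dice roll (c = 21) with one move choice (m = 20), so that k plies give about (c·m)^k nodes at the bottom level. The function name `tree_size` is illustrative only.

```python
def tree_size(chance_branching, max_branching, plies):
    """Approximate leaf count: (c * m) ** k."""
    return (chance_branching * max_branching) ** plies

size = tree_size(21, 20, 3)
print(f"{size:,}")  # 74,088,000
```

Under that reading a 3-ply search already visits tens of millions of positions, and each further ply multiplies the count by about 420.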

