Probability CSE 473 – Autumn 2003 Henry Kautz
ExpectiMax
Hungry Monkey: 2-Ply Game Tree
[Figure: game tree with actions jump and shake; branch probabilities 2/3, 1/3, 1/6, 5/6, 1/6, 5/6]
ExpectiMax 1 – Chance Nodes
ExpectiMax 2 – Max Nodes
ExpectiMax 3 – Chance Nodes
ExpectiMax 4 – Max Node
[Figure: the same tree repeated for each step, annotated with node values; the values 1/2 and 1/3 first appear at step 3]
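The four steps alternate between averaging at chance nodes and maximizing at max nodes, working up from the leaves. A minimal sketch of that recursion follows. The tuple encoding of nodes, the function names, and the leaf payoffs (1 for food, 0 for none) are assumptions made for illustration, not taken from the slides. The example tree is a single-ply stand-in that borrows the branch probabilities 2/3, 1/3 and 1/6, 5/6; it is not the Hungry Monkey tree itself.

```python
from fractions import Fraction as F

# Hypothetical node encoding chosen for this sketch:
#   ("leaf", value)
#   ("chance", [(probability, child), ...])
#   ("max", {action_name: child, ...})

def expectimax(node):
    """Return the backed-up value of a node."""
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "chance":
        # Chance node: probability-weighted average of the children.
        return sum(p * expectimax(child) for p, child in node[1])
    if kind == "max":
        # Max node: the agent picks the action with the highest expected value.
        return max(expectimax(child) for child in node[1].values())
    raise ValueError(f"unknown node kind: {kind}")

def best_action(max_node):
    """Return the action name with the highest expected value at a max node."""
    actions = max_node[1]
    return max(actions, key=lambda a: expectimax(actions[a]))

# Illustrative one-ply tree; payoffs and outcome ordering are assumed.
tree = ("max", {
    "jump":  ("chance", [(F(2, 3), ("leaf", 0)), (F(1, 3), ("leaf", 1))]),
    "shake": ("chance", [(F(1, 6), ("leaf", 1)), (F(5, 6), ("leaf", 0))]),
})

print(expectimax(tree), best_action(tree))
```

Exact fractions are used so the backed-up values can be compared with hand calculations without rounding.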
Policies
The result of the ExpectiMax analysis is a conditional plan (also called a policy):
– Optimal plan for 2 steps: jump; shake
– Optimal plan for 3 steps: jump; if (ontable) {shake; shake} else {jump; shake}
Probabilistic planning can be generalized in many ways, including:
– Action costs
– Hidden state
The general problem is that of solving a Markov Decision Process (MDP)
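A conditional plan differs from a fixed action sequence in that later actions depend on what is observed along the way. The 3-step plan above can be written as a small function. The function name and the boolean parameter are hypothetical; the sketch assumes `ontable` is observed after the first jump.

```python
def three_step_plan(on_table_after_first_jump):
    """Return the 3-step action sequence chosen by the conditional plan.

    The argument stands for the slide's `ontable` test, assumed here to be
    observed once the first jump has been executed.
    """
    plan = ["jump"]
    if on_table_after_first_jump:
        plan += ["shake", "shake"]
    else:
        plan += ["jump", "shake"]
    return plan

print(three_step_plan(True))
print(three_step_plan(False))
```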
2-Player Games of Chance
Backgammon
Branching factor:
– Chance node: 21
– Max node: about 20 on average
– Size of tree: O(c^k m^k)
– In practice: can search 3 plies
Neurogammon & TD-Gammon (Tesauro 1995)
– Learned weights on static evaluation function by playing against itself
– Use results of games to optimize weights:
    “Punish” features that were on in losing games
    “Reward” features that were on in winning games
– A kind of reinforcement learning
– Became world’s best backgammon player!
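Two of the points above can be made concrete with a short sketch. The first function evaluates the tree-size expression, assuming c is the chance branching factor, m the max branching factor, and k the number of rounds, each round being one chance node followed by one max node. The second is a deliberately simplified outcome-based update for a linear evaluation function. It only illustrates the "punish"/"reward" idea and is not Tesauro's actual training procedure; the function names, the step size, and the 0/1 feature encoding are all assumptions.

```python
def tree_size(c, m, k):
    """Leaf count for k rounds, each a chance node (c outcomes) then a max node (m moves)."""
    return (c * m) ** k

def update_weights(weights, active_features, won, step=0.5):
    """Nudge the weights of features that were on during a game.

    active_features holds 1 for a feature that was on and 0 otherwise.
    Weights of active features go up after a win and down after a loss.
    This is a hypothetical simplification, not the TD-Gammon update rule.
    """
    delta = step if won else -step
    return [w + delta * f for w, f in zip(weights, active_features)]

# With the slide's numbers: 21 dice outcomes, about 20 moves, 3 rounds.
print(tree_size(21, 20, 3))

weights = [0.0, 0.0, 0.0]
weights = update_weights(weights, [1, 0, 1], won=True)
print(weights)
```

Under that reading of k, three rounds already give tens of millions of leaves, which is consistent with the slide's note that search stops at about 3 plies.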