1
Upper Confidence Trees for Game AI Chahine Koleejan
2
Background on Game AI For many years, computer chess was considered an ideal sandbox for testing AI algorithms: simple rules and clear benchmarks of performance against human intelligence. The domination of alpha-beta search programs over human players changed this.
3
The Game of Go Researchers moved on to Go as their new challenge. The game of Go is much harder to crack: 1. Massive search space – a 19x19 board gives up to 361 possible moves per turn and more than 10^170 possible states. 2. The game itself is very complex – it is hard to find good heuristics.
4
Example of a Game of Go Honinbo Shusaku (Black) vs Gennan Inseki (White), 1846
5
The Multi-armed Bandit Setting A hypothetical probability setting: a gambler stands at a row of k “bandits” (slot machines). When a bandit’s arm is pulled, the gambler receives some amount of money. Each bandit has a different reward distribution. The gambler must decide which bandits to pull to maximise his reward.
6
Exploitation and Exploration We need to balance the exploitation of the action currently believed to be optimal with the exploration of other actions that may be better in the long run. Upper Confidence Bound: we want to pull the arm j that maximises UCB1 = x̄_j + √(2 ln n / n_j), where x̄_j is the average reward from arm j, n_j is the number of times arm j has been pulled, and n is the total number of pulls so far.
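To make the bound concrete, here is a small Python sketch of UCB1; the function name and the example numbers are illustrative, not from the slides:

```python
import math

def ucb1(mean_reward, arm_pulls, total_pulls):
    """UCB1 value of an arm: its average reward plus an exploration
    bonus that shrinks the more often the arm has been pulled."""
    if arm_pulls == 0:
        return float("inf")  # unpulled arms are always tried first
    return mean_reward + math.sqrt(2 * math.log(total_pulls) / arm_pulls)

# Arm A has the better average, but arm B is under-explored, so its
# exploration bonus makes it the arm to pull next.
print(ucb1(0.6, arm_pulls=50, total_pulls=60))  # ~1.00
print(ucb1(0.4, arm_pulls=10, total_pulls=60))  # ~1.30
```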
7
Why do we care?
8
Sequential decision-making games are basically a multi-armed bandit problem!
9
Why do we care? Sequential decision-making games are basically a multi-armed bandit problem! …But worse.
10
Why do we care? Sequential decision-making games are basically a multi-armed bandit problem! …But worse. …But it’s close enough that we can use the math.
11
Monte Carlo Tree Search (MCTS) A tree search method that has revolutionised computer Go. It works by simulating thousands of random games and does not need any prior knowledge of the game: no heuristics or evaluation functions, it just observes the outcomes of the simulations.
12
UCT Algorithm We have a tree where each node has a value given by the UCB1 bound. Steps of the algorithm: 1. Selection 2. Expansion 3. Simulation 4. Backpropagation (a complete sketch in code follows the Backpropagation slide).
13
Selection and Expansion Starting at the root node, recursively choose the child with the highest value until we reach an expandable node. A node is expandable if it is non-terminal and has unvisited children. One such child node is then added to our tree.
14
Simulation A simulation is run from the new node to the end of the game according to our defined default policy. At the most basic level, the default policy is just random legal play.
15
Backpropagation The simulation result is “backed up” (i.e. backpropagated) through the selected nodes to update their values; for example, +1 if we won and -1 if we lost.
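Putting the four steps from the last three slides together, here is a minimal, self-contained Python sketch of UCT. The game (a one-pile Nim where players alternately take 1 or 2 stones, and whoever takes the last stone wins) and all names in the code are illustrative choices for the demo, not something from the presentation; it also counts wins as a 0/1 reward rather than the ±1 mentioned above, a common equivalent variant.

```python
import math
import random

class Node:
    """One node of the UCT tree for a toy one-pile Nim game."""
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones    # stones left in the pile
        self.player = player    # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move        # the move (1 or 2 stones) that led here
        self.children = []
        self.visits = 0
        self.wins = 0           # wins for the player who moved INTO this node

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2) if m <= self.stones and m not in tried]

def ucb1(child, parent_visits):
    return (child.wins / child.visits
            + math.sqrt(2 * math.log(parent_visits) / child.visits))

def uct_search(stones, iterations=2000):
    root = Node(stones, player=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: while fully expanded and non-terminal, follow UCB1
        while node.stones > 0 and not node.untried_moves():
            node = max(node.children, key=lambda c: ucb1(c, node.visits))
        # 2. Expansion: add one unvisited child node to the tree
        if node.stones > 0:
            move = random.choice(node.untried_moves())
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: random default policy until the game ends
        stones_left, player = node.stones, node.player
        while stones_left > 0:
            take = random.choice([m for m in (1, 2) if m <= stones_left])
            stones_left -= take
            player = 1 - player
        winner = 1 - player  # whoever took the last stone won
        # 4. Backpropagation: update the selected path back to the root
        while node is not None:
            node.visits += 1
            if winner != node.player:  # a win for the player who moved here
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda c: c.visits).move

print(uct_search(5))  # with enough iterations this settles on taking 2
```

Each child stores wins from the perspective of the player who moved into it, so the UCB1 selection at every node maximises the reward of the player choosing there; the final move is picked by visit count, the usual robust choice.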
16
Example
17
References Browne, C. B., et al. “A Survey of Monte Carlo Tree Search Methods.” IEEE Transactions on Computational Intelligence and AI in Games, 2012. Gelly, S., and Silver, D. “Monte-Carlo Tree Search and Rapid Action Value Estimation in Computer Go.” Artificial Intelligence 175, 2011.
18
If you’re interested in Go, talk to me! It’s really cool!
19
Othello Demo