Progressive Strategies For Monte-Carlo Tree Search Presenter: Ling Zhao University of Alberta November 5, 2007 Authors: G.M.J.B. Chaslot, M.H.M. Winands, J.W.H.M. Uiterwijk, H.J. van den Herik and B. Bouzy

2 Outline Monte-Carlo Tree Search (MCTS) and its implementation in MANGO. Progressive strategies: progressive bias and progressive unpruning. Experiments. Conclusions and future work.

3 MCTS

4 Selection Process: select moves in the UCT tree that give the best balance between exploitation and exploration; this is a multi-armed bandit problem. UCB formula: $k = \operatorname{argmax}_{i}\big(v_i + C\sqrt{\ln n_p / n_i}\big)$, where $k$ is the selected child of node $p$, $v_i$ is the value of child $i$, $n_i$ is the visit count of child $i$, $n_p$ is the visit count of the parent $p$, and $C$ is a constant. Selection precondition: $n_p \geq T$ (= 30).
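A minimal sketch of this selection step (class and field names are assumptions for illustration, not MANGO's actual code):

```python
import math

C = 0.7   # exploration constant (value assumed here; tuned empirically in practice)
T = 30    # visit-count threshold below which the selection strategy is not used

class Node:
    def __init__(self, move=None):
        self.move = move
        self.children = []   # filled in by the expansion step
        self.value = 0.0     # v_i: average result of simulated games through this node
        self.visits = 0      # n_i: number of simulated games through this node

def select_child(p):
    """UCB selection: pick the child of p that best balances exploitation
    (high value) and exploration (low visit count)."""
    def ucb(c):
        if c.visits == 0:
            return float("inf")   # unvisited children are tried first
        return c.value + C * math.sqrt(math.log(p.visits) / c.visits)
    return max(p.children, key=ucb)
```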

5 Expansion Process: for a given leaf node, decide whether it will be expanded, i.e., whether one or more of its children will be stored in the UCT tree. Simple rule: expand one node per simulated game (the first node encountered that is not yet in the UCT tree). In MANGO, when $n_p = T$ (= 30), all of the node's children are expanded.
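A hedged sketch of MANGO's expansion rule, reusing Node and T from the selection sketch; legal_moves is a hypothetical helper that enumerates the moves available at a node:

```python
def expand(node, legal_moves):
    """MANGO-style expansion: once a node's visit count reaches T, store all of
    its children in the UCT tree. Node and T are the names from the selection
    sketch; legal_moves is a hypothetical helper returning the moves available
    at this node."""
    if node.visits == T and not node.children:
        node.children = [Node(move=m) for m in legal_moves(node)]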

6 Simulation Process: self-play until the end of the game. Rules: 1. Disallow moves that fill a player's own eyes. 2. Stop the game after a certain number of moves. In MANGO, the probability of a move being selected during simulation is proportional to its urgency: the sum of its capture value, its 3x3 pattern value, and a proximity modification.
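A rough sketch of such an urgency-weighted simulation policy; the position API and the urgency function are placeholders, not MANGO's implementation:

```python
import random

def simulate(position, urgency, max_moves=200):
    """Play one pseudo-random game to the end. Each move is drawn with probability
    proportional to its urgency (capture value + 3x3 pattern value + proximity
    modification); urgencies are assumed to be positive. The position API
    (legal_moves, is_own_eye, play, result) is a placeholder, not MANGO's code."""
    for _ in range(max_moves):                 # stop after a fixed number of moves
        moves = [m for m in position.legal_moves() if not position.is_own_eye(m)]
        if not moves:
            break
        weights = [urgency(position, m) for m in moves]
        position.play(random.choices(moves, weights=weights, k=1)[0])
    return position.result()                   # +1 win, -1 loss, 0 draw
```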

7 Backpropagation Process: use the result of a simulated game to update the nodes it traversed. Result: +1 for a win, -1 for a loss, 0 for a draw. The value $v_i$ of node $i$ is the average result of all simulated games played through it.
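A small sketch of this update; how the result's sign is handled per player is an assumption, and MANGO's bookkeeping may differ:

```python
def backpropagate(path, result):
    """Update every node on the path from the root to the node where the
    simulation started. Each node keeps a running average of the results
    (+1 / -1 / 0) of the games played through it."""
    for node in path:
        node.visits += 1
        node.value += (result - node.value) / node.visits   # incremental average
```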

8 Progressive Strategies Provide a soft transition between the selection strategy and the simulation strategy. Intuition: the selection strategy becomes more accurate than the simulation strategy only when the number of simulated games is large. A progressive strategy uses the information already available to the selection strategy, plus some (possibly expensive) domain knowledge. It behaves like the simulation strategy when only a few games have been played, and converges to the selection strategy as the number of games grows.

9 Progressive Bias Direct the search using possibly expensive heuristic knowledge. Modify the selection strategy so that the influence of this knowledge decreases quickly as more games are played.

10 Progressive Bias Formula The selection value of child $i$ becomes $v_i + C\sqrt{\ln n_p / n_i} + f(n_i)$, with $f(n_i) = H_i / (n_i + 1)$, where $H_i$ is a coefficient representing heuristic knowledge. For children with $n_i = 0$, the UCB part is replaced by a constant $M$ with $M \gg$ any $v_i$, so among unvisited children the one with the highest $f(n_i)$ is selected. If $n_p \in [30, 100]$, $f(n_i)$ is dominant. If $n_p \in (100, 500]$, $f(n_i)$ has a partial impact. When $n_p > 500$, $f(n_i)$ is dominated, but can still serve as a tie-breaker.
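A sketch of selection with progressive bias, reusing the node fields from the earlier selection sketch; the heuristic hook and the constants are assumptions:

```python
import math

C = 0.7     # exploration constant, as in the selection sketch
M = 1e9     # large constant for unvisited children (value assumed)

def select_child_with_bias(p, heuristic):
    """Selection with progressive bias: the UCB value is augmented by
    f(n_i) = H_i / (n_i + 1), which dominates while n_i is small and fades as
    more games are played. heuristic(c) returns H_i and is an assumed hook."""
    def score(c):
        f = heuristic(c) / (c.visits + 1)
        if c.visits == 0:
            return M + f    # unvisited children tie on M, so f(n_i) breaks the tie
        return c.value + C * math.sqrt(math.log(p.visits) / c.visits) + f
    return max(p.children, key=score)
```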

11 Alternative Approach Using prior knowledge (Gelly and Silver): “Scalability of this approach to larger board sizes is an open question”.

12 Progressive Unpruning Artificially reduce the branching factor when the selection strategy is used, then increase it progressively as more games are simulated. Pruning and unpruning are done according to the heuristic values of the children.

13 Progressive Unpruning (Details) If $n_p = T$, only the $k_0$ (= 5) children with the highest heuristic values are left unpruned. If $n_p > T$, $k = k_0 \cdot \lg(n_p / 40)$ children are left unpruned: k = 5 ($n_p$ = 40), 7 ($n_p$ = 80), 10 ($n_p$ = 120). A similar idea is used by Coulom (progressive widening).
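A sketch of progressive unpruning under the schedule stated above; the base of the logarithm and the exact constants are assumptions:

```python
import math

K0 = 5    # number of children left unpruned when pruning starts
T = 30    # visit-count threshold at which pruning starts

def unpruned_children(p, heuristic):
    """Progressive unpruning: rank the children by heuristic value and keep only
    the best k, where k grows slowly with the parent's visit count. The schedule
    below follows the slide's k = k0 * lg(n_p / 40) with a floor of k0."""
    if p.visits < T:
        return list(p.children)        # selection is not used below T, so no pruning
    ranked = sorted(p.children, key=heuristic, reverse=True)
    k = max(K0, int(K0 * math.log2(max(p.visits, 40) / 40)))
    return ranked[:k]
```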

14 Heuristic Values Pattern value: learned offline by pattern matching (89,119 patterns from 2,000 professional games). Capture value: the number of stones the move would capture, or save from capture. Proximity value: based on the Euclidean distance to the last move.

15 Heuristic Value Formula The heuristic value combines: $C_i$, the capture value; $P_i$, the pattern value; and $D_{k,i}$, the distance to the $k$-th last move, with a coefficient for the $k$-th last move equal to $k/2$. Computing $P_i$ is the time-consuming part.
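Since the formula itself was an image on the slide, the following is only an illustrative combination of the three components, not the formula used in MANGO; capture_value and pattern_value are hypothetical hooks:

```python
import math

def heuristic_value(move, capture_value, pattern_value, last_moves):
    """Illustrative only: one way to combine the capture value, the pattern value,
    and proximity to the recent moves. `move` and the entries of `last_moves` are
    (x, y) coordinates. The actual weighting used in MANGO is the paper's formula."""
    h = capture_value(move) + pattern_value(move)
    for k, prev in enumerate(last_moves, start=1):      # k-th last move
        d = math.dist(move, prev)                       # Euclidean distance D_{k,i}
        h += 1.0 / ((k / 2.0) * (1.0 + d))              # assumed proximity weighting
    return h
```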

16 Time For Computing Heuristics Computing $H$ is around 1000 times slower than playing one move in a simulated game, so $H$ is computed only once per node, when $T$ (= 30) games have been played through it. The speed reduction is only about 4%, since the number of nodes with a visit count of at least 30 is small compared to the total number of moves played in simulated games.
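A minimal sketch of this lazy, once-per-node evaluation; compute_H is a hypothetical hook for the knowledge computation:

```python
def heuristic(node, compute_H):
    """Lazy evaluation of the expensive knowledge H: it is computed only once per
    node, the first time it is needed (i.e., once the node has been visited T
    times). compute_H wraps the pattern, capture and proximity computation,
    roughly 1000x the cost of one simulated move."""
    if getattr(node, "H", None) is None:
        node.H = compute_H(node)   # cache the result on the node
    return node.H
```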

17 Domain Knowledge Calls Vs. T

18 Visit Count Vs. Number of Nodes

19 Experiments Self-play games on the 13x13 board (10 seconds per move): MANGO with progressive strategies won 91% of 500 games against MANGO without progressive strategies. MANGO: 20,000 simulated games per move; 1 second on 9x9, 2 seconds on 13x13, 5 seconds on 19x19. GNU Go: level 10 on 9x9 and 13x13, level 0 on 19x19.

20 MANGO Vs. GNU Go

21 MANGO Vs. GNU Go Plain MCTS does not scale well to the 13x13 or 19x19 board. Progressive strategies are useful on every board size. The two progressive strategies combined are the most powerful, especially on 19x19.

22 Tournament Results Always in the top half. But were negative results removed?

23 Conclusions and Future Work The two progressive strategies are useful: they provide a soft transition between selection and simulation, and their overhead is negligible. Future work: combine with RAVE and with UCT using prior knowledge; combine with the advanced knowledge developed by Coulom; use life-and-death information; design a better progressive bias. Reference: P.-A. Coquelin and R. Munos, Bandit Algorithms for Tree Search, Technical Report 6141, INRIA, 2007.