Combining Tactical Search and Monte-Carlo in the Game of Go
  by Tristan Cazenave & Bernard Helmstetter
  Presenter: Ling Zhao, University of Alberta
  November 1, 2005
Outline
  Monte-Carlo Go
  Motivations
  Tactical search
  Gather statistics
  Combining search and Monte-Carlo
  Experimental results
Monte-Carlo Go
  Invented in 1993 by Bruegmann, using simulated annealing.
  Based on Abramson’s expected-outcome model (1990).
  Achieved moderate success on 9x9 boards.
Basic idea
  Play a number of random games.
  Choose a move by 1-ply search, maximizing the expected score.
  The only domain-dependent knowledge concerns eyes.
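The 1-ply selection described on this slide can be sketched as follows. This is a minimal illustration, not the authors' program: the function name `choose_move_one_ply` and the toy game (a running total with uniform noise standing in for random Go games) are assumptions made so the sketch runs without a Go engine.

```python
import random
from statistics import mean


def choose_move_one_ply(state, legal_moves, play, random_playout, n_games, rng):
    """Return (move, mean score) for the move whose random games score best."""
    best_move, best_mean = None, float("-inf")
    for move in legal_moves(state):
        next_state = play(state, move)
        # Estimate the expected score of the move by averaging random games.
        scores = [random_playout(next_state, rng) for _ in range(n_games)]
        avg = mean(scores)
        if avg > best_mean:
            best_move, best_mean = move, avg
    return best_move, best_mean


# Hypothetical toy game standing in for Go: the state is a running total,
# a move adds its value, and a "random game" adds uniform noise to the total.
def toy_legal_moves(state):
    return [1, 2, 3]


def toy_play(state, move):
    return state + move


def toy_playout(state, rng):
    return state + rng.uniform(-0.5, 0.5)


rng = random.Random(0)
move, avg = choose_move_one_ply(0, toy_legal_moves, toy_play, toy_playout, 200, rng)
```

In a real program the three toy functions would be replaced by a Go move generator, a board update, and a random game played to the end and scored.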
Weaknesses
  Poor scalability, due to the large amount of computation.
  Blunders, due to lack of knowledge.
Framework
  Tactical search: capture, connection, eye, life and death.
  Play random games and gather statistics for goals.
  Goal evaluation: mean score of the games in which the goal is achieved minus the mean score of the games in which the goal fails.
  Pick the move associated with the best goal.
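The goal evaluation on this slide can be written as a formula. The notation is an assumption introduced here for illustration: $S_i$ is the final score of random game $i$, $A_g$ is the set of random games in which goal $g$ is achieved, and $F_g$ is the set in which it fails.

```latex
\[
  \mathrm{eval}(g)
  = \frac{1}{|A_g|} \sum_{i \in A_g} S_i
  \;-\;
  \frac{1}{|F_g|} \sum_{i \in F_g} S_i
\]
```

A goal with a large positive value is one whose achievement changes the expected outcome of the game the most.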
Tactical search
  Capture search: for any string, determine whether it can be captured or saved.
  Connection search: for any two strings, determine whether they can be connected.
  Empty connection search: determine whether a string can be connected to an empty point.
  Eye search: determine whether an eye can be made on an empty point or its neighbors.
  Life and death search: use generalized widening for groups of strings.
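All of these searches work on strings and their liberties. The sketch below shows one way to collect a string and its liberties with a flood fill. It is only the primitive a capture search would build on, not the search itself. The board encoding (a list of text rows with `.` for empty, `X` and `O` for stones) and the function name are assumptions.

```python
def string_and_liberties(board, row, col):
    """Return (stones, liberties) of the string containing the stone at (row, col)."""
    color = board[row][col]
    if color == ".":
        raise ValueError("no stone at this point")
    stones, liberties = set(), set()
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in stones:
            continue
        stones.add((r, c))
        # Visit the four orthogonal neighbors that lie on the board.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < len(board) and 0 <= nc < len(board[nr]):
                if board[nr][nc] == ".":
                    liberties.add((nr, nc))    # adjacent empty point
                elif board[nr][nc] == color:
                    stack.append((nr, nc))     # same-colored stone: same string
    return stones, liberties


# Hypothetical 5x5 position used only for illustration.
board = [
    ".XO..",
    ".XO..",
    "..O..",
    ".....",
    ".....",
]
stones, liberties = string_and_liberties(board, 0, 1)
```

Here the two-stone `X` string has three liberties, so a capture search would start from those three points.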
Statistics on random games
  Compute the mean for the random games where a goal is achieved and the mean for those where a goal has failed.
  Two new goals for intersections:
  1. The goal of playing first on an intersection.
  2. The goal of owning an intersection at the end of a game.
Selecting problems
  Strings that cannot be disconnected form groups.
  Select the simplest problem for a goal, to avoid over-estimating goals.
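Forming groups from strings that cannot be disconnected is a merging problem. One way to do it is with union-find, sketched below. The slide does not say how the authors do it, so the data structure, the function name `build_groups`, and the string labels are all assumptions.

```python
def build_groups(string_ids, unbreakable_pairs):
    """Merge strings into groups, given pairs that search proved cannot be disconnected."""
    parent = {s: s for s in string_ids}

    def find(s):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in unbreakable_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for s in string_ids:
        groups.setdefault(find(s), set()).add(s)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical labels: A-B and B-C cannot be cut, D stands alone.
groups = build_groups(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
```

The pairs would come from the connection search. Strings linked only through a chain of such pairs still end up in the same group.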
Gather statistics
  Play random games.
  For each selected goal, find the mean score of the games in which it succeeds and the mean score of the games in which it fails.
  Score of a life problem of a string: mean score of the games in which an intersection of the string keeps its color.
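A minimal sketch of this bookkeeping, assuming each random game is recorded as its final score plus the set of goals that were achieved in it. The record format, the function name, and the goal labels are hypothetical.

```python
from statistics import mean


def goal_statistics(games, goals):
    """Map each goal to (mean score when achieved, mean score when failed)."""
    stats = {}
    for goal in goals:
        achieved = [score for score, done in games if goal in done]
        failed = [score for score, done in games if goal not in done]
        # A mean is None when no random game falls into that category.
        stats[goal] = (
            mean(achieved) if achieved else None,
            mean(failed) if failed else None,
        )
    return stats


# Hypothetical records: (final score, goals achieved in that random game).
games = [
    (10, {"capture_A"}),
    (6, {"capture_A", "connect_B_C"}),
    (-4, set()),
    (-2, {"connect_B_C"}),
]
stats = goal_statistics(games, ["capture_A", "connect_B_C"])
```

In this made-up data, achieving `capture_A` goes with much better scores (8 against -3), while `connect_B_C` makes little difference (2 against 3).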
Choose a move
  Find the goal with the maximum difference between its two mean scores.
  Choose the move associated with that goal.
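The selection rule can be sketched as below, assuming each goal already carries its two mean scores and the move that tactical search associates with it. The dictionary layout, goal names, and move coordinates are hypothetical.

```python
def choose_goal(goal_stats):
    """Return (goal, move, gain) for the goal with the largest mean-score difference."""
    if not goal_stats:
        raise ValueError("no goals to choose from")
    best_goal, best_gain = None, float("-inf")
    for goal, (mean_achieved, mean_failed, _move) in goal_stats.items():
        gain = mean_achieved - mean_failed
        if gain > best_gain:
            best_goal, best_gain = goal, gain
    return best_goal, goal_stats[best_goal][2], best_gain


# Hypothetical input: goal -> (mean when achieved, mean when failed, associated move).
goal_stats = {
    "capture_A": (8.0, -3.0, "C3"),
    "connect_B_C": (2.0, 3.0, "E5"),
}
goal, move, gain = choose_goal(goal_stats)
```

With these numbers the program would play `C3`, since capturing `A` shifts the expected score by 11 points while connecting `B` and `C` shifts it by -1.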
Why is it useful?
  High-level classification of points on the board.
  Successful incorporation of tactical search.
Positive and negative goals
  Positive goals: confidence in the search results.
  Negative goals: less confidence.
  Example: save a string (a string is considered safe when it has more than 4 liberties).
  Fix over-estimation.
Experimental results
  New enhancement vs. standard MC
  Each plays 10,000 random games to choose a move, over 20 games on 9x9. Result: 52.1 (±34.2).
  The first one plays 1,000 random games, and the second one plays 10,000. Result: 24.6 (±40).
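To judge how reliable a mean over 20 games is, the standard error of the mean is a useful check. The sketch below assumes that the ± figure is a standard deviation over the games, which the slide does not state, and that the first result comes from exactly 20 games.

```python
import math


def standard_error(std_dev, n_games):
    """Standard error of the mean for n_games independent game results."""
    return std_dev / math.sqrt(n_games)


# First experiment, under the assumptions above: std dev 34.2 over 20 games.
se = standard_error(34.2, 20)
print(round(se, 2))
```

Under these assumptions the standard error is about 7.6, small compared with the mean of 52.1.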
Experimental results (cont’d)
  New enhancement vs. Golois
  Both use the same tactical search.
  The second one uses global search and hand-tuned heuristics.
  40 games were played. Result: 26 points.
Conclusions
  A creative idea to incorporate tactical search and Monte-Carlo.
  Nice extension to the authors’ previous work.
  The experimental results are good.
  The program should be tested against the strongest program.