Evaluation-Function Based Monte-Carlo LOA
Mark H.M. Winands and Yngvi Björnsson
Department of Knowledge Engineering – Maastricht University

Outline
– Background
– MC-LOA
– Evaluators and Simulations
– Experiments
– Conclusions

Introduction
– Monte-Carlo Tree Search (Kocsis et al., 2006; Coulom, 2007)
– Caused a revolution in computer Go
– Applied in Amazons, Clobber, Havannah, Hex, GGP, etc.
– Here we focus on the sudden-death game Lines of Action (LOA)

Test Domain: Lines of Action (LOA)
– Two-player zero-sum chess-like connection game
– A piece moves in a straight line, exactly as many squares as there are pieces of either colour anywhere along the line of movement
– The goal is to group all of your pieces into one connected unit
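To make the movement rule concrete, here is a minimal sketch in Python; the dict-based 8×8 board representation and the function name are illustrative assumptions, not the authors' implementation:

```python
def move_distance(board, square, step):
    """Number of squares a piece on `square` must move along `step`:
    the count of all pieces (either colour) anywhere on that line.

    `board` maps (row, col) -> 'B' or 'W' for occupied squares (an
    assumed representation); `step` is a direction such as (0, 1)."""
    (r, c), (dr, dc) = square, step
    count = 1  # the moving piece itself sits on the line
    for sign in (1, -1):  # walk the full line in both directions
        rr, cc = r + sign * dr, c + sign * dc
        while 0 <= rr < 8 and 0 <= cc < 8:
            if (rr, cc) in board:
                count += 1
            rr, cc = rr + sign * dr, cc + sign * dc
    return count
```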

Computer LOA
– First LOA program: Stanford AI Laboratory, 1975
– Since 2000, LOA has also been played at the Computer Olympiad
– Several programs: YL (Björnsson), Mona (Billings), Bing (Helmstetter), T-T (JAIST Lab), Lola (Coulom)
– All programs use αβ search + an evaluation function
– αβ enhancements: null move, multi-cut, RPS
– Evaluation functions are quite advanced

Monte-Carlo Tree Search (MCTS)
– A best-first search guided by the results of Monte-Carlo simulations
– We will discuss the steps as used in MC-LOA: selection, play-out (simulation), and backpropagation; see the sketch below
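The following is a minimal, generic MCTS sketch with plain UCT selection and uniform-random play-outs, i.e., without the LOA-specific enhancements of the next slides; the `Node` class and the assumed game-state interface are illustrative, not the paper's implementation:

```python
import math
import random

class Node:
    """Minimal MCTS node over an abstract game state (illustrative)."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []
        self.untried = list(state.legal_moves())
        self.visits, self.total = 0, 0.0

def mcts(root_state, n_simulations, c=1.4):
    """Assumed state interface (not from the paper): legal_moves(),
    play(move) -> new state, is_terminal(), and result() giving +1/0/-1
    from the viewpoint of the player to move at that state."""
    root = Node(root_state)
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch:
                       -ch.total / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: add one new child position to the tree.
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.state.play(move), parent=node)
            node.children.append(child)
            node = child
        # 3. Play-out: finish the game with uniform-random moves.
        state, flips = node.state, 0
        while not state.is_terminal():
            state = state.play(random.choice(list(state.legal_moves())))
            flips += 1
        result = state.result() * (-1) ** flips  # view of node's mover
        # 4. Backpropagation: flip the sign at every ply (negamax).
        while node is not None:
            node.visits += 1
            node.total += result
            result = -result
            node = node.parent
    # Play the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits)
```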

Selection Step
– Traversal from the root until we reach a position that is not yet part of the tree
– Selection strategy: controls the balance between exploitation and exploration
– UCT (Kocsis, 2006) combined with Progressive Bias (PB) (Chaslot et al., 2008)
– The child k of node p is selected according to

  k = argmax_i ( v_i + C·sqrt(ln n_p / n_i) + (W · P_mc) / (n_i + 1) )

  where v_i is the value of node i, n_i is the visit count of i, and n_p is the visit count of p; C is a coefficient (these first two terms form UCT); P_mc is the transition probability of move category mc and W is a constant (the PB term). A sketch of this score follows.
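A sketch of the combined selection score; the node attribute names and the constants C and W are placeholders, not the paper's tuned values:

```python
import math

def selection_score(child, parent_visits, C=0.6, W=10.0):
    """UCT + Progressive Bias score of one child (sketch).

    `child.value` is v_i (mean play-out result), `child.visits` is n_i,
    `parent_visits` is n_p, and `child.p_mc` is the transition
    probability P_mc of the move category leading to this child."""
    uct = child.value + C * math.sqrt(math.log(parent_visits) / child.visits)
    pb = (W * child.p_mc) / (child.visits + 1)  # fades with more visits
    return uct + pb

def select_child(parent):
    """Descend to the child maximizing the combined score."""
    return max(parent.children,
               key=lambda ch: selection_score(ch, parent.visits))
```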

Progressive Bias: Transition Probabilities
– Transition probabilities were originally used in Realization Probability Search (Tsuruoka, 2002)
– Transition probability: the probability that a move belonging to a category mc will be played
– The statistics are retrieved from self-play games; a sketch of the bookkeeping follows
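A minimal sketch of the self-play bookkeeping; the counters, the smoothing prior, and the function names are illustrative assumptions:

```python
from collections import Counter

played = Counter()     # category -> times a move of that category was played
available = Counter()  # category -> positions where such a move was legal

def record_position(legal_move_categories, played_category):
    """Update the statistics for one self-play position."""
    for mc in set(legal_move_categories):
        available[mc] += 1
    played[played_category] += 1

def transition_probability(mc, prior=1e-3):
    """Estimate P_mc. The small prior keeping unseen categories at a
    non-zero weight is an assumption, not from the paper."""
    return (played[mc] + prior) / (available[mc] + prior)
```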

Progressive Bias: Move Categories
Move categories in LOA are based on:
– Capture or non-capture
– The move's origin and destination squares
– The number of squares moved towards or away from the centre-of-mass
In total there are 277 move categories

Simulation / Play-out
– If a node has been visited fewer times than a threshold T, the next move is selected pseudo-randomly according to the simulation strategy, with moves weighted by their move-category weights
– At a leaf node, all moves are generated and checked for a mate in one
– Play-out step: outside the tree, moves are chosen by self-play using the simulation strategy (see the sketch below)
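A sketch of the weighted pseudo-random choice with the mate-in-one check; the `wins_immediately` predicate is an assumed caller-supplied helper:

```python
import random

def simulation_move(moves, weights, wins_immediately):
    """Pick the next play-out move pseudo-randomly, weighted by the
    move-category weights; an immediate win is played outright."""
    for m in moves:
        if wins_immediately(m):  # the slide's mate-in-one check
            return m
    return random.choices(moves, weights=weights, k=1)[0]
```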

Backpropagation Step
– The result of a simulated game k is backpropagated from the leaf node L through the previously traversed path (+1 win, 0 draw, −1 loss)
– The value v of a node is computed by taking the average of the results of all simulated games made through this node
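A sketch of the averaging backpropagation over an assumed node structure with parent links; the per-ply sign flip assumes a negamax convention, which the slide does not spell out:

```python
def backpropagate(leaf, result):
    """Propagate a simulation result (+1 win, 0 draw, -1 loss) from the
    leaf up the traversed path, keeping a running average per node."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.total += result
        node.value = node.total / node.visits  # average of all results
        result = -result  # negamax sign convention (assumption)
        node = node.parent
```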

MCTS-Solver: Backpropagation
– MCTS-Solver (Winands et al., 2008)
– Terminal positions are assigned the values +∞ (win) or −∞ (loss)
– Internal nodes are assigned these proven values in a minimax way (sketched below)
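A sketch of the proven-value upgrade at an internal node, in a negamax view where `proven` is stored from the perspective of the side to move at that node; declaring a loss assumes all legal moves are already present as children:

```python
INF = float('inf')

def solver_update(node):
    """MCTS-Solver upgrade: `node.proven` is +INF, -INF, or None, from
    the viewpoint of the side to move at the node. The side to move at
    a child is the opponent."""
    if any(ch.proven == -INF for ch in node.children):
        node.proven = INF   # some move reaches an opponent loss: proven win
    elif node.children and all(ch.proven == INF for ch in node.children):
        node.proven = -INF  # every move reaches an opponent win: proven loss
```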

Simulation Strategies
– MCTS does not require a positional evaluation function, but is based on the results of simulations
– Plain simulations are not good enough for LOA
– How can a positional evaluation function be used?
– Four different strategies: Evaluation Cut-off, Corrective, Greedy, and Mixed

Evaluation Function (MIA 2006)
Features:
– Concentration
– Centralisation
– Centre-of-mass position
– Quads
– Mobility
– Walls
– Connectedness
– Uniformity
– Variance

Evaluation Cut-off
– Stops a simulated game before a terminal state is reached if, according to heuristic knowledge, the game is judged to be effectively over
– Starting at the second ply of the play-out, the evaluation function is called every three plies:
– If the value exceeds a threshold (700), the game is scored as a win
– If the value falls below a threshold (−700), the game is scored as a loss
– The evaluation function can return quite a trustworthy score, more so than even elaborate simulation strategies
– The game can thus be safely terminated both earlier and with a more accurate score than if the simulation were continued (sketch below)
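A sketch of a play-out loop with the cut-off rule; the perspective handling (evaluation from the side to move, result from the play-out starter's view) is an assumption, as is the game-state interface:

```python
WIN_BOUND, LOSS_BOUND = 700, -700  # the slide's thresholds

def play_out_with_cutoff(state, pick_move, evaluate):
    """Play-out with the Evaluation Cut-off strategy (sketch). `evaluate`
    is assumed to score a position from the view of the player to move
    there; the returned result is from the view of the player to move
    at the start of the play-out."""
    ply = 0
    while not state.is_terminal():
        # From the second ply on, evaluate every three plies.
        if ply >= 2 and (ply - 2) % 3 == 0:
            score = evaluate(state)
            mover_is_starter = (ply % 2 == 0)
            if score >= WIN_BOUND:   # effectively won for the side to move
                return 1 if mover_is_starter else -1
            if score <= LOSS_BOUND:  # effectively lost for the side to move
                return -1 if mover_is_starter else 1
        state = state.play(pick_move(state))
        ply += 1
    # No cut-off triggered: the true terminal result (assumed to be
    # reported from the starter's view here).
    return state.result()
```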

Corrective
– Minimises the risk of choosing an obviously bad move
– First, we evaluate the position for which we are choosing a move
– Next, we generate the moves and scan them to get their move-category weights
– If a move leads to a successor with a lower evaluation score than its parent, we set the weight of that move to a preset minimum value (close to zero)
– If a move leads to a win, it is played immediately (sketch below)
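A sketch of the corrective re-weighting; the perspective convention of `evaluate` and helpers such as `winner()` and `to_move()` are assumptions:

```python
MIN_WEIGHT = 1e-6  # preset minimum, "close to zero"

def corrective_weights(state, moves, weights, evaluate):
    """Corrective strategy (sketch). `evaluate(pos)` is assumed to score
    every position from the view of the player to move at `state`, so
    parent and successor scores are directly comparable."""
    parent_score = evaluate(state)
    out = list(weights)
    for i, m in enumerate(moves):
        successor = state.play(m)
        if successor.is_terminal() and successor.winner() == state.to_move():
            # A winning move gets all the probability mass: played at once.
            return [1.0 if j == i else 0.0 for j in range(len(moves))]
        if evaluate(successor) < parent_score:
            out[i] = MIN_WEIGHT  # demote an obviously bad move
    return out
```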

Greedy
– The move leading to the position with the highest evaluation score is selected
– We evaluate only moves that have a good potential for being the best: the k best moves according to their move-category weights are fully evaluated (k = 7)
– When a move leads to a position with an evaluation above a preset threshold, the play-out is stopped and scored as a win
– The remaining moves, which are not heuristically evaluated, are checked for a mate
– The evaluation function has a small random factor (sketch below)
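A sketch of the greedy choice; reusing the 700 cut-off bound as the win threshold and the size of the random factor are assumptions, and the (move, win-flag) return convention is illustrative:

```python
import random

K_BEST = 7
WIN_THRESHOLD = 700  # reusing the cut-off bound here is an assumption

def greedy_move(state, moves, weights, evaluate, mate_in_one):
    """Greedy strategy (sketch): fully evaluate only the K_BEST moves by
    category weight; stop and score a win past the threshold; check the
    remaining moves only for a mate in one."""
    order = sorted(range(len(moves)), key=lambda i: weights[i], reverse=True)
    best_move, best_score = None, -float('inf')
    for i in order[:K_BEST]:
        score = evaluate(state.play(moves[i])) + random.uniform(-5.0, 5.0)
        if score >= WIN_THRESHOLD:
            return moves[i], True      # play-out stops, scored as a win
        if score > best_score:
            best_move, best_score = moves[i], score
    for i in order[K_BEST:]:           # not evaluated: mate check only
        if mate_in_one(moves[i]):
            return moves[i], True
    return best_move, False
```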

Mixed
– The Greedy strategy can be too deterministic
– The Mixed strategy combines the Corrective strategy and the Greedy strategy
– The Corrective strategy is used in the selection step, i.e., at tree nodes where a simulation strategy is needed (n < T), as well as for the first position entered in the play-out step
– For the remainder of the play-out the Greedy strategy is applied (dispatch sketched below)
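A sketch of the dispatch, reusing the `corrective_weights` and `greedy_move` sketches from the previous two slides; the boolean flags are illustrative:

```python
import random

def mixed_move(state, moves, weights, evaluate, mate_in_one,
               in_tree_below_T, first_playout_ply):
    """Mixed strategy (sketch): Corrective at tree nodes with n < T and
    at the first play-out position, Greedy for the rest of the play-out."""
    if in_tree_below_T or first_playout_ply:
        w = corrective_weights(state, moves, weights, evaluate)
        return random.choices(moves, weights=w, k=1)[0]
    move, _ = greedy_move(state, moves, weights, evaluate, mate_in_one)
    return move
```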

Experiments: MIA
– Test the strength against a non-MC program: MIA
– MIA won the 8th, 9th, and 11th Computer Olympiad
– αβ depth-first iterative-deepening search in the PVS framework
– Enhancements: transposition table, ETC, killer moves, relative history heuristic, etc.
– Variable-depth techniques: multi-cut, (adaptive) null move, enhanced realization probability search, quiescence search

Parameter Tuning: Evaluation Cut-off
– Tune the bounds of the Evaluation Cut-off strategy
– Three programs: No-Bound, MIA III, and MIA
– Game results, 1 sec per move (results table not preserved in the transcript)

Round-Robin Tournament Results
– 1,000 games
– 1 sec per move
– (results table not preserved in the transcript)

MC-LOA vs MIA
– Mixed strategy
– 5 sec per move
– Root parallelization (sketched below)
– (results table not preserved in the transcript)
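A sketch of root parallelization: independent searches from the same root, merged by visit count. `run_mcts` is a hypothetical helper returning {move: visit count} for the root's children; note that in CPython, true CPU parallelism would need processes rather than threads:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def root_parallel_search(state, run_mcts, n_threads=2, sims_each=10_000):
    """Run independent MCTS trees on the same root position and play the
    move with the highest combined visit count across all trees."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = pool.map(lambda _: run_mcts(state, sims_each),
                           range(n_threads))
    merged = Counter()
    for counts in results:
        merged.update(counts)
    return merged.most_common(1)[0][0]  # most-visited move overall
```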

MC-LOA vs MIA
– There is no parallel version of MIA
– A two-threaded MC-LOA program therefore competed against MIA, where MIA was given 50% more time
– This simulates a search-efficiency increase of 50% if MIA were given two processors
– A 1,000-game match resulted in a 52% winning percentage for MC-LOA

Conclusions (1)
– Applying an evaluation function to stop simulations early increases the playing strength
– The Mixed strategy of playing greedily in the play-out phase, but exploring more in the earlier selection phase, works best

Conclusions (2)
– MCTS using simple root parallelization outperforms MIA
– MCTS programs are competitive with αβ-based programs in the game of LOA