Solving Probabilistic Combinatorial Games Ling Zhao & Martin Mueller University of Alberta September 7, 2005 Paper link:
Motivations Ken Chen's previous work on maximizing the winning chance in Go search. Maximizing points vs. maximizing winning probability. How can the abstract game be solved efficiently, or played well?

Motivations Results (black plays first): +15 (80%), -7 (20%)

Combinatorial Games

Probabilistic Combinatorial Games (PCG) A terminal node expressed as a probability distribution d = {[p1, v1], [p2, v2], …, [pn, vn]} is a PCG. If A1, A2, …, An, B1, B2, …, Bn are PCGs, then {A1, A2, …, An | B1, B2, …, Bn} is a PCG, where the Ai are the Left options and the Bi are the Right options. A sum of PCGs is a PCG. A move in a sum game consists of a move to an option in exactly one subgame and leaves all other subgames unchanged.

Simple PCG (SPCG) Each PCG has exactly one left option and one right option. Each option leads immediately to a terminal node. Each distribution has exactly two values with associated probabilities.

Problems to Address How to solve PCGs efficiently? How to play PCGs well if resources are limited or fast play is required?

Game Tree Analysis Very regular game tree: a node at depth k has exactly n − k children, so there are n!/(n−k)! nodes in total at depth k. Very large number of transpositions: only C(n, k) · C(k, k/2) distinct nodes at depth k.
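These two counts can be checked directly. The sketch below (function names are mine, not from the paper) computes both for the slide's experimental setting of n = 14 subgames; `k // 2` stands in for the slide's C(k, k/2), which counts which of the k played subgames received Left's moves under alternating play:

```python
from math import comb, perm

def nodes_at_depth(n, k):
    # A path of k moves plays an ordered sequence of k distinct
    # subgames out of n, giving n!/(n-k)! nodes at depth k.
    return perm(n, k)

def distinct_nodes_at_depth(n, k):
    # A position depends only on WHICH k subgames were played and
    # which of them got Left's moves: C(n, k) * C(k, k//2).
    return comb(n, k) * comb(k, k // 2)

n, k = 14, 4
print(nodes_at_depth(n, k))           # 24024 search-tree nodes
print(distinct_nodes_at_depth(n, k))  # 6006 distinct positions
```

The gap between the two numbers (24024 vs. 6006 already at depth 4) is what makes the transposition table so effective for this game.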

Terminal Node Evaluation A terminal node is a sum of probability distributions. Its winning probability is the probability that the sum of the values drawn from those distributions is positive.

Monte-Carlo Terminal Evaluation (MCTE) Use Monte-Carlo methods to randomly collect k samples from the 2^n data points in the sum of n distributions. Use the average winning percentage of the samples (P'w) to approximate the overall winning probability (Pw). From statistics: Pw − P'w is approximately normally distributed with mean 0 and standard deviation sqrt(Pw(1 − Pw)/k) ≤ 1/(2·sqrt(k)). Experimental results: (figure omitted)

Monte-Carlo Interior Evaluation (MCIE) The evaluation of a node is approximated by averaging the values of terminal nodes reached from it through random play. Proposed by Abramson as the expected-outcome model, using 4x4 tic-tac-toe and 6x6 Othello for experiments. Applied to Monte-Carlo Go by Bouzy and Helmstetter and several other researchers.
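For an SPCG sum, a random playout just resolves the remaining subgames in a random order with the players alternating (each side has a single option per subgame), then samples the resulting terminal sum. The sketch below is my illustration of that idea, not the paper's implementation; the "positive total = Left win" convention is assumed as before:

```python
import random

def sample(dist, rng):
    # Draw one value from a distribution given as [(value, prob), ...].
    r, acc = rng.random(), 0.0
    for v, p in dist:
        acc += p
        if r < acc:
            return v
    return dist[-1][0]

def mcie(subgames, left_to_move, playouts, rng=random):
    """Expected-outcome estimate for a sum of SPCG subgames.
    Each subgame is (left_option, right_option); each option is a
    terminal distribution [(value, prob), ...]."""
    wins = 0
    for _ in range(playouts):
        pending = list(subgames)
        rng.shuffle(pending)                 # random order of subgames
        resolved, left = [], left_to_move
        for left_opt, right_opt in pending:  # players alternate moves
            resolved.append(left_opt if left else right_opt)
            left = not left
        total = sum(sample(d, rng) for d in resolved)
        wins += total > 0                    # positive total: Left wins
    return wins / playouts

# One subgame whose left option is surely +1: Left moves first and wins.
print(mcie([([(1, 1.0)], [(-1, 1.0)])], left_to_move=True, playouts=100))  # 1.0
```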

SPCG Solver and Player Solver: alpha-beta search with transposition tables and move ordering (MCIE & MCTE). Player: alpha-beta search to a fixed depth, using Monte-Carlo interior evaluation at frontier nodes.
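To make the solver's task concrete, here is a minimal exact minimax solver for a tiny SPCG sum, deliberately without the paper's alpha-beta pruning, transposition table, or Monte-Carlo move ordering. The encoding and function names are mine; terminal positions are evaluated exactly by enumerating all joint outcomes:

```python
from itertools import product

def win_prob(dists):
    # Exact P(total > 0) over the product of independent distributions,
    # each given as [(value, prob), ...].
    p = 0.0
    for outcome in product(*dists):
        prob, total = 1.0, 0
        for v, pr in outcome:
            prob *= pr
            total += v
        if total > 0:
            p += prob
    return p

def solve(unresolved, resolved, left_to_move):
    """Exact value (Left's winning probability) of an SPCG sum.
    unresolved: tuples (left_option, right_option); moving in a subgame
    replaces it by the mover's option and leaves the others unchanged."""
    if not unresolved:
        return win_prob(resolved)
    best = None
    for i in range(len(unresolved)):
        lo, ro = unresolved[i]
        rest = unresolved[:i] + unresolved[i + 1:]
        v = solve(rest, resolved + [lo if left_to_move else ro],
                  not left_to_move)
        if best is None or (v > best if left_to_move else v < best):
            best = v
    return best

a = ([(10, 1.0)], [(-10, 1.0)])  # big-swing subgame
b = ([(1, 1.0)], [(-1, 1.0)])    # small-swing subgame
print(solve((a, b), [], True))   # 1.0: Left takes the big subgame first
```

Even this toy case shows why move ordering matters: Left wins only by playing the larger subgame first.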

Experimental Results 100 randomly generated games, each with 14 subgames. Value distributions: probabilities from 0 to 1, values from −1000 to 1000. AMD 2400MHz CPUs. 2^20 cache entries (terminal nodes have higher priority). About 8 seconds to solve a game.

Solver Performance Monte-Carlo move ordering: d_m, the depth limit within which Monte-Carlo move ordering is used (beyond it, the history heuristic is used). Monte-Carlo interior evaluation: n_t, the percentage of the current node's descendant terminal nodes that are sampled. Monte-Carlo terminal evaluation: n_c, the number of data points sampled. Accurate terminal evaluation takes about 90% of the overall running time.

Solver Performance: Results

Monte-Carlo Player Tested against the perfect player. Each game has two rounds: each side plays first once. Winning probability is the average over the two rounds. Parameters: search depth and n_c.

Monte-Carlo Player: Results

Error in Move Ordering Example:
Move:             A       B
Actual win prob:  0.18  < 0.19
Estimate:         0.32  > 0.30
Win prob lost:    0.01
Average probability error: the average of the winning probability lost over all wrongly ordered move pairs of a node. Worst probability error: the probability lost due to the wrong best move being chosen.
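The two error measures can be sketched as follows; the function name is mine, and the interpretation of "all move pairs" as the wrongly ordered pairs follows the definition above:

```python
def ordering_errors(actual, estimate):
    """Move-ordering error for one node.
    actual[i]   -- true winning probability after move i
    estimate[i] -- Monte-Carlo estimate used for ordering
    Returns (average_error, worst_error)."""
    n = len(actual)
    losses = []
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is wrongly ordered if the estimates disagree
            # with the actual winning probabilities.
            if (estimate[i] - estimate[j]) * (actual[i] - actual[j]) < 0:
                losses.append(abs(actual[i] - actual[j]))
    avg = sum(losses) / len(losses) if losses else 0.0
    # Worst error: loss from playing the move the estimates rank first.
    best_est = max(range(n), key=lambda i: estimate[i])
    worst = max(actual) - actual[best_est]
    return avg, worst

# The slide's example: actual 0.18 < 0.19, but estimates 0.32 > 0.30.
avg, worst = ordering_errors([0.18, 0.19], [0.32, 0.30])
print(avg, worst)  # both are approximately 0.01
```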

Results

Conclusions Efficient exact and heuristic solvers for SPCGs. Successfully incorporated Monte-Carlo move ordering into alpha-beta search for SPCGs. A heuristic evaluation technique based on Monte-Carlo sampling with performance close to the perfect player. Extensive experiments on the two solvers.

Future Work Better algorithms to accurately evaluate terminal nodes? Progressive pruning. Why does Abramson's simple expected-outcome model perform so well in move ordering?

New Directions to Apply Monte-Carlo to Computer Go (diagram omitted: Monte-Carlo Go vs. the new direction)

Future Work: Transformation (diagram omitted: transforming a game position into input for the PCG solver or player?)