Monte Carlo Go Has a Way to Go
Haruhiro Yoshimoto (*1), Kazuki Yoshizoe (*1), Tomoyuki Kaneko (*1), Akihiro Kishimoto (*2), Kenjiro Taura (*1)
(*1) University of Tokyo  (*2) Future University Hakodate
Adapted from the slides presented at AAAI 2006

Games in AI
- Ideal test bed for AI research
  - Clear results
  - Clear motivation
  - Good challenge
- Success of the search-based approach
  - chess (1997, Deep Blue)
  - and others
- Not successful in the game of Go
  - "Go is to Chess as Poetry is to Double-entry accounting"
  - It goes to the core of artificial intelligence, which involves the study of learning and decision-making, strategic thinking, knowledge representation, pattern recognition and, perhaps most intriguingly, intuition

The Game of Go
- A 4,000-year-old board game from China
- Standard board size: 19×19
- Two players, Black and White, place stones in turn
- Stones cannot be moved, but can be captured and taken off the board
- The player with the larger territory wins

Terminology of Go
- Block: connected stones of the same color
- Liberty: empty intersection adjacent to a block
- Capture: a block is captured when it has no liberties left
- Eye: surrounded region providing one or more safe liberties
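These terms can be made concrete with a small sketch, assuming a square board stored as a dict mapping (row, col) to '.', 'B', or 'W'; this encoding and the helper itself are illustrative assumptions, not from the slides.

```python
def block_and_liberties(board, size, start):
    """Flood-fill the block of same-colored stones containing `start`;
    return (block, liberties) as sets of coordinates."""
    color = board[start]
    block, liberties, stack = set(), set(), [start]
    while stack:
        r, c = stack.pop()
        if (r, c) in block:
            continue
        block.add((r, c))
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= q[0] < size and 0 <= q[1] < size:
                if board[q] == '.':
                    liberties.add(q)      # adjacent empty point = liberty
                elif board[q] == color:
                    stack.append(q)       # same-colored neighbor joins the block
    return block, liberties

# Two adjacent black stones on an otherwise empty 3x3 board form one block
# with five liberties; a block whose liberty set becomes empty is captured.
board = {(r, c): '.' for r in range(3) for c in range(3)}
board[(1, 0)] = board[(1, 1)] = 'B'
block, libs = block_and_liberties(board, 3, (1, 0))
```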

Playing Strength
- A $1.2M prize was offered for beating a professional with no handicap (now expired!)
- Handtalk in 1997 claimed $7,700 for winning an 11-stone handicap match against an 8-9-year-old master

Difficulties in Computer Go
- Large search space
  - the game becomes progressively more complex, at least for the first 100 ply

                      Chess      Go
    Board size        8×8        19×19
    Depth             ~80        ~300
    Branching factor  ~35        ~250
    Search space      ~35^80     ~250^300

Difficulties in Computer Go
- Lack of a good evaluation function
  - a material advantage does not imply a simple path to victory, and may just mean that short-term gain has been given priority
  - around 150-250 legal moves per position; usually fewer than 50 are acceptable (often fewer than 10), but computers have a hard time distinguishing them
- A very high degree of pattern recognition is involved in the human capacity to play well

Why Monte Carlo Go?
- Replace the evaluation function by random sampling [Brügmann 1993; Bouzy 2003]
- Success in other domains: Bridge [Ginsberg 1999], Poker [Billings et al. 2002]
- Reasonable position evaluation based on sampling
  - search space reduced from O(b^d) to O(Nbd)
- Easy to parallelize
- Can win against the search-based approach
  - Crazy Stone won the 11th Computer Olympiad in 9x9 Go
  - MoGo won the 19th and 20th KGS 9x9 tournaments and is rated highest on CGOS

Basic Idea of Monte Carlo Go
- Generate the next moves by 1-ply search
- For each move, play a number of random games and compute the expected score
- Choose the move with the maximal expected score
- The only domain-dependent knowledge used is the eye (eyes are never filled)
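The steps above can be sketched as follows; the game interface (`legal_moves`, `play`, `random_playout`) is a hypothetical stand-in, not the paper's actual implementation.

```python
import random

def monte_carlo_move(state, legal_moves, play, random_playout, n_samples, rng):
    """Pick the 1-ply successor whose random playouts score best on average."""
    best_move, best_avg = None, float('-inf')
    for move in legal_moves(state):
        successor = play(state, move)
        total = sum(random_playout(successor, rng) for _ in range(n_samples))
        avg = total / n_samples               # expected score of this move
        if avg > best_avg:
            best_move, best_avg = move, avg
    return best_move

# Toy stand-in for Go: each move has a true expected score, and a "playout"
# returns that score plus uniform noise. With enough samples the noise
# averages out and the better move is found.
state = {'A': 7.0, 'B': 2.0}
rng = random.Random(0)
chosen = monte_carlo_move(
    state,
    legal_moves=lambda s: sorted(s),
    play=lambda s, m: s[m],
    random_playout=lambda v, r: v + r.uniform(-1.0, 1.0),
    n_samples=100,
    rng=rng,
)
```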

Terminal Position of Go
- Larger territory wins
- Territory = surrounded area + stones
- Example (figure): Black's territory is 36 points, White's territory is 45 points, so White wins by 9 points

Example
- Play many sample games, in which each player plays the rest of the game randomly
- Compute the average score for each candidate move
- Select the move with the highest average
- Example (figure): after move A, one random continuation ends in a 9-point win for Black and another in a 5-point win for Black, so move A scores (5 + 9) / 2 = 7 points

Monte Carlo Go and Sample Size
- Additional samples reduce statistical error
  - sampling error ∝ 1/√N, where N is the number of random games
- The relationship between sample size and playing strength has not yet been investigated
- Diminishing returns must appear eventually
- Example (figure): Monte Carlo with 1000 sample games is stronger than Monte Carlo with 100 sample games
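Because the sampling error shrinks only like 1/√N, quadrupling the number of playouts merely halves the error; this is why diminishing returns are inevitable. A small empirical sketch, under the illustrative assumption that playouts behave like fair ±1 coin flips:

```python
import random
import statistics

def spread_of_estimate(n_games, trials, rng):
    """Std. deviation of the mean playout score across repeated experiments."""
    means = [
        sum(rng.choice((-1, 1)) for _ in range(n_games)) / n_games
        for _ in range(trials)
    ]
    return statistics.stdev(means)

rng = random.Random(42)
spread_n = spread_of_estimate(100, 200, rng)    # N samples per move
spread_4n = spread_of_estimate(400, 200, rng)   # 4N samples per move
ratio = spread_n / spread_4n                    # expect roughly sqrt(4) = 2
```

The measured ratio hovers around 2, matching the 1/√N prediction.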

Our Monte Carlo Go Implementation
- Basic Monte Carlo Go
- Atari-50 enhancement: utilization of simple Go knowledge in move selection
- Progressive pruning [Bouzy 2003]: statistical move pruning in simulations

Atari-50 Enhancement
- Basic Monte Carlo: assigns uniform probability to each move in a sample game (no eye filling)
- Atari-50: assigns higher probability (50%) to capture moves
  - capture is "mostly" a good move
- Example (figure): move A captures black stones
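A minimal sketch of the atari-50 selection rule: with probability 0.5 a capture move is played (when one exists), otherwise a move is drawn uniformly. The move representation and the `is_capture` predicate are hypothetical stand-ins for the program's internals.

```python
import random

def pick_playout_move(moves, is_capture, rng):
    captures = [m for m in moves if is_capture(m)]
    if captures and rng.random() < 0.5:
        return rng.choice(captures)   # biased toward captures, "mostly" good
    return rng.choice(moves)          # uniform fallback (captures included)

# With one capture among three moves, the capture is played about
# 0.5 + 0.5/3 ~ 67% of the time instead of the uniform 33%.
rng = random.Random(1)
moves = ['a', 'b', 'capture']
picks = sum(
    pick_playout_move(moves, lambda m: m == 'capture', rng) == 'capture'
    for _ in range(10_000)
)
capture_rate = picks / 10_000
```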

Progressive Pruning [Bouzy 2003]
- Try sampling with a smaller sample size first
- Prune statistically inferior moves (figure: score distribution per move)
- More sample games can then be assigned to promising moves
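The pruning step can be sketched as follows: after an initial batch of playouts per move, discard any move whose optimistic bound (mean + margin) falls below the best move's pessimistic bound (mean - margin). Using a fixed z-multiple of the standard error as the margin is a simplification of Bouzy's statistical test, assumed here for illustration.

```python
import statistics

def surviving_moves(scores_by_move, z=2.0):
    """scores_by_move maps each move to its list of sampled game scores;
    returns the set of moves that survive pruning."""
    bounds = {}
    for move, scores in scores_by_move.items():
        mean = statistics.fmean(scores)
        margin = z * statistics.stdev(scores) / len(scores) ** 0.5
        bounds[move] = (mean - margin, mean + margin)
    best_lower = max(lo for lo, _ in bounds.values())
    return {move for move, (_, hi) in bounds.items() if hi >= best_lower}

# A clearly inferior move is pruned after a small batch; the remaining
# playout budget can then be spent on the surviving candidates.
samples = {
    'good': [10, 11, 9, 10, 10, 11, 9, 10],
    'bad': [1, 2, 0, 1, 1, 2, 0, 1],
}
kept = surviving_moves(samples)
```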

Experimental Design
- Machine
  - Intel Xeon dual-CPU nodes at 2.40 GHz with 2 GB memory
  - 64 PCs (128 processors) connected by a 1 Gb/s network
- Three versions of the program
  - BASIC: basic Monte Carlo Go
  - ATARI: BASIC + Atari-50 enhancement
  - ATARIPP: ATARI + progressive pruning
- Experiments
  - 200 self-play games
  - analysis of decision quality on 58 professional games

Diminishing Returns (figure): 4N samples vs. N samples for each move

Additional enhancements and Winning Percentage

Decision Quality of Each Move
- "Oracle": evaluation scores from 64 million sample games
- Compared against the move selected by Monte Carlo Go with 100 sample games
- Example (figure): over 10 trials on candidate moves a, b, c, the 100-sample player picks move b 9 times and move c once; move b matches the oracle (0-point error), so the average error per move is ((30 − 30) × 9 + 15 × 1) / 10 = 1.5 points

Decision Quality of Each Move (Basic)

Decision Quality of Each Move (with Atari50 Enhancement)

Summary of Experimental Results
- The additional enhancements improve the strength of Monte Carlo Go
- Returns eventually diminish
- The additional enhancements reach diminishing returns more quickly
- More samples need to be collected in the early stage of a 9x9 game

Conclusions and Future Work
- Conclusions
  - Additional samples achieve only small improvements
    - unlike search algorithms, e.g. in chess
  - Good at strategy, not tactics
    - blunders due to lack of domain knowledge
  - Easy to evaluate
  - Easy to parallelize
  - The way for Monte Carlo Go to go: small sample games combined with many enhancements look promising
- Future work
  - Adjust move probabilities with pattern matching
  - Learning
  - Search + Monte Carlo Go, as in MoGo (exploration-exploitation in the search tree using UCT)
  - Scale to 19×19

References
- Go wiki
- GNU Go
- KGS Go Server
- CGOS 9x9 Computer Go Server

Questions?