AI techniques for the game of Go Erik van der Werf Universiteit Maastricht / ReSound Algorithm R&D

Contents
- Introduction
- Searching techniques
  - The Capture Game
  - Solving Go on Small Boards
- Learning techniques
  - Move Prediction
  - Learning to Score
  - Predicting Life & Death
  - Estimating Potential Territory
- Summary of results
- Conclusions

The game of Go
Deceptively simple rules:
- Black and White move in turns
- A move places a stone on the board
- Surrounded stones are captured
- Direct repetition is forbidden (ko rule)
- The game is over when both players pass
- The player controlling most intersections wins

Some basic terminology
- Block: connected stones of one colour, no diagonal connections (see the sketch below)
- Liberty: adjacent empty intersection
- Eye: surrounded region providing a safe liberty
- Group: stones of one colour controlling a local region
- Alive: a group that cannot be captured
- Dead: a group that can eventually be captured
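
Blocks and liberties recur throughout the search and learning components that follow. As an illustration (not taken from the original programs), here is a minimal Python sketch that extracts blocks and counts their liberties with a flood fill; the board encoding (0 empty, 1 black, 2 white) is an assumption made for this example.

```python
# Minimal sketch: extract blocks (4-connected stones of one colour) and
# count their liberties. Board encoding (0 empty, 1 black, 2 white) and
# all function names are assumptions for illustration only.
EMPTY, BLACK, WHITE = 0, 1, 2

def neighbours(x, y, size):
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield nx, ny

def find_blocks(board):
    """Return a list of (colour, stones, liberties) for every block."""
    size = len(board)
    seen = set()
    blocks = []
    for x in range(size):
        for y in range(size):
            colour = board[x][y]
            if colour == EMPTY or (x, y) in seen:
                continue
            stones, liberties, stack = set(), set(), [(x, y)]
            while stack:                      # flood fill over same-coloured stones
                px, py = stack.pop()
                if (px, py) in stones:
                    continue
                stones.add((px, py))
                seen.add((px, py))
                for nx, ny in neighbours(px, py, size):
                    if board[nx][ny] == colour:
                        stack.append((nx, ny))
                    elif board[nx][ny] == EMPTY:
                        liberties.add((nx, ny))
            blocks.append((colour, stones, liberties))
    return blocks
```

A block with no liberties has been captured; a block with exactly one liberty is in atari, which is the condition the capture-game solver described later races for.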

Computer Go
- Even the best Go programs have no chance against strong amateurs
- Human players are superior in areas such as:
  - pattern recognition
  - spatial reasoning
  - learning

Playing strength 29 stones handicap

Problem statement
How can Artificial Intelligence techniques be used to improve the strength of Go programs?
We focused on:
- Searching techniques
- Learning techniques

Searching techniques
- Very successful for other board games
- Evaluate positions by 'thinking ahead'
Research topics:
- Recognizing positions that are irrelevant
- Fast heuristic evaluations
- Provably correct knowledge
- Move ordering (the best moves first)
- Re-use of partial results from the search process

The Capture Game
- Simplified version of Go: the first player to capture a stone wins
- Passing is not allowed, so detecting final positions is trivial (unlike normal Go)
Search method:
- Iterative deepening
- Principal variation search (see the sketch below)
- Enhanced transposition table
- Move ordering using shared tables for both colours for the killer and history heuristics
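
To make the search machinery concrete, here is a compact Python sketch of iterative deepening wrapped around a negamax-style principal variation search with a transposition table. It is a generic illustration under an assumed position interface (legal_moves, play, is_terminal, evaluate, hash_position), not the solver's actual code; the killer and history move-ordering tables mentioned above are omitted for brevity.

```python
# Generic sketch of iterative deepening + principal variation search (PVS)
# with a transposition table. The position interface (legal_moves, play,
# is_terminal, evaluate, hash_position) is assumed for illustration.
INF = 10**9
EXACT, LOWER, UPPER = 0, 1, 2
transposition = {}                        # hash -> (depth, score, bound)

def pvs(pos, depth, alpha, beta):
    key = hash_position(pos)
    entry = transposition.get(key)
    if entry is not None and entry[0] >= depth:
        _, score, bound = entry
        if bound == EXACT:
            return score
        if bound == LOWER and score >= beta:
            return score
        if bound == UPPER and score <= alpha:
            return score
    if depth == 0 or is_terminal(pos):
        return evaluate(pos)              # heuristic or exact terminal value
    original_alpha = alpha
    best = -INF
    first = True
    for move in legal_moves(pos):         # ideally ordered best-first
        child = play(pos, move)
        if first:
            score = -pvs(child, depth - 1, -beta, -alpha)
            first = False
        else:
            # Null-window probe; re-search with a full window on a fail-high.
            score = -pvs(child, depth - 1, -alpha - 1, -alpha)
            if alpha < score < beta:
                score = -pvs(child, depth - 1, -beta, -score)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break                         # beta cutoff
    bound = EXACT
    if best <= original_alpha:
        bound = UPPER
    elif best >= beta:
        bound = LOWER
    transposition[key] = (depth, best, bound)
    return best

def iterative_deepening(pos, max_depth):
    score = None
    for depth in range(1, max_depth + 1): # shallow iterations seed move ordering
        score = pvs(pos, depth, -INF, INF)
    return score
```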

Heuristic evaluation for the capture game
Based on four principles:
1. Maximize liberties
2. Maximize territory
3. Connect stones
4. Make eyes
Ingredients:
- Low-order liberties (max. distance 3)
- Euler number (objects minus holes)
- Fast computation using a bit-board representation (see the sketch below)
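
Low-order liberties can be computed cheaply by repeatedly dilating a bitboard of the stones and counting the empty points reached. The sketch below illustrates the idea on a small board packed into a Python integer; the row-major bit layout, the masks, and the restriction to growth through empty points are assumptions made for this example, not the thesis implementation.

```python
# Sketch: counting low-order liberties with bitboard dilations on an n x n
# board packed into a Python int (bit index = y * n + x).
def masks(n):
    full = (1 << (n * n)) - 1
    not_west, not_east = 0, 0
    for y in range(n):
        for x in range(n):
            bit = 1 << (y * n + x)
            if x > 0:
                not_west |= bit          # west neighbour stays on the board
            if x < n - 1:
                not_east |= bit          # east neighbour stays on the board
    return full, not_west, not_east

def dilate(bits, n):
    """One step of 4-neighbour dilation."""
    full, not_west, not_east = masks(n)
    grown = bits
    grown |= (bits & not_east) << 1      # east
    grown |= (bits & not_west) >> 1      # west
    grown |= (bits << n) & full          # north
    grown |= bits >> n                   # south
    return grown

def low_order_liberties(stones, empty, n, distance=3):
    """Count empty points within `distance` steps of the given stones."""
    reach = stones
    for _ in range(distance):
        # grow only through empty points (or the stones themselves)
        reach = dilate(reach, n) & (empty | stones)
    return bin(reach & empty).count("1")
```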

Solutions for the Capture Game
- All boards up to 5x5 were solved
- Winner decided by board-size parity
- Will initiative take over at 6x6?

Board   Winner   Depth   Time (s)   Nodes (log10)
2×2     W
3×3     B
4×4     W
5×5     B
6×6     ?        >23     >10^6      >12

Diagrams: solution for 5×5 (Black wins); solution for 4×4 (White wins).

Solutions for the Capture Game on 6x6

Starting position   Stable                   Crosscut
Winner              Black                    Black
Depth               26 (+5)                  15 (+4)
Nodes (log10)       11                       8.0
Time                8.3 × 10^5 s (10 days)   185 s

Initiative takes over at 6×6.

Solving Go on Small Boards
Search:
- Iterative deepening principal variation search
- Enhanced transposition table
- Exploit board symmetry
- Internal unconditional bounds
- Effective move ordering
Evaluation function:
- Heuristic component, similar to the capture game
- Provably correct component: Benson's algorithm for recognizing unconditional life (sketched below), extended with detection of unconditional territory
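
Benson's algorithm can be stated compactly once blocks and their colour-enclosed regions are available. The sketch below is a hedged reconstruction over an assumed input structure (liberty sets per block, empty points and adjacent blocks per region); it is not the extended version with unconditional territory used in the thesis.

```python
# Hedged sketch of Benson's algorithm for unconditional (pass-alive) life.
# Input abstraction (an assumption for this example):
#   blocks[b]  -> set of liberty points of block b (all blocks of one colour)
#   regions[r] -> {'empties': set of empty points, 'adjacent': set of block ids},
#                 each region being enclosed solely by that colour's blocks.
def benson_pass_alive(blocks, regions):
    alive = set(blocks)
    candidates = set(regions)
    changed = True
    while changed:
        changed = False
        # A region is 'vital' to a block if every empty point in it is a liberty of the block.
        vital = {b: set() for b in alive}
        for r in candidates:
            for b in regions[r]['adjacent']:
                if b in alive and regions[r]['empties'] <= blocks[b]:
                    vital[b].add(r)
        # Blocks need at least two vital regions to be unconditionally alive.
        removed_blocks = {b for b in alive if len(vital[b]) < 2}
        if removed_blocks:
            alive -= removed_blocks
            changed = True
        # Regions touching a removed block can no longer be counted on.
        removed_regions = {r for r in candidates if regions[r]['adjacent'] - alive}
        if removed_regions:
            candidates -= removed_regions
            changed = True
    return alive          # ids of blocks proven unconditionally alive
```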

Recognizing Unconditional Territory
1. Find regions surrounded by unconditionally alive stones of one colour
2. Find the interior of the regions (eyespace)
3. Remove false eyes
4. Contract the eyespace around defender stones
5. Count maximum sure liberties (MSL)
If MSL < 2, the region is unconditional territory; otherwise, play it out.

Solutions for Small Boards

Board   Result   Depth   Time        Nodes (log10)
2×2     draw     5       n.a.
3×3     B+9      11      n.a.
4×4     B                (seconds)   5.8
5×5     B+25             (hours)     9.2

Value of opening moves on 5x5: (3,2), (2,2), (3,3).

Learning techniques
- Successful in several related domains
- Heuristic knowledge can be 'learned' from the analysis of human games
Research topics:
- Representation & generalization
- Learning maximally from a limited number of examples
- Pros and cons of different architectures
- Clever use of available domain knowledge

Move prediction
- Many moves in Go conform to local patterns and can be played almost reflexively
- Train an MLP network to rank moves
- Use move pairs {expert, random} extracted from human game records
- Training attempts to rank the expert moves first

Move Prediction - Representation
Selection of raw features:
- Edge
- Liberties
- Captures
- Last move
- Stones
- Ko
- Liberties after the move
- Nearest stones
Remove symmetry by canonical ordering & colour reversal (illustrated below).
The high-dimensional representation suffers from the curse of dimensionality, so linear feature extraction is applied to reduce the dimensionality.
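
Removing symmetry means mapping each local pattern to one canonical representative of its 8 rotations and reflections, with a colour reversal so that the side to move always has the same colour. A small numpy sketch of that canonicalisation is shown below; the +1/-1/0 encoding for own stones, opponent stones, and empty points is an assumption made for the example.

```python
# Sketch: map a local pattern to a canonical form over the 8 board symmetries
# and colour reversal. Encoding (+1 own stones, -1 opponent, 0 empty) is an
# assumption made for this example.
import numpy as np

def symmetries(pattern):
    """Yield the 8 rotations/reflections of a square pattern."""
    p = np.asarray(pattern)
    for k in range(4):
        rotated = np.rot90(p, k)
        yield rotated
        yield np.fliplr(rotated)

def canonical(pattern, colour_to_move=+1):
    """Return a canonical, colour-normalised key for the pattern."""
    p = np.asarray(pattern) * colour_to_move   # colour reversal: side to move becomes +1
    # Pick the lexicographically smallest byte string among all symmetries.
    return min(s.tobytes() for s in symmetries(p))

# Two patterns that differ only by a rotation map to the same key:
a = np.array([[1, 0], [0, -1]])
b = np.array([[0, -1], [1, 0]])                # a rotated by 90 degrees
assert canonical(a) == canonical(b)
```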

Move Prediction - Feature Extraction
- Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA): standard techniques, sub-optimal for ranking (PCA sketched below)
- Move-Pair Analysis (MPA): linear projection maximizing the expected quadratic distance between pairs; weakness: ignores global features
- Modified Eigenspace Separation Transform (MEST): linear projection on the eigenvectors with the largest absolute eigenvalues of the correlation difference matrix
- Good results using a combination of MEST & MPA
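
As a baseline for the linear feature-extraction step, a minimal PCA projection in numpy is sketched below; it simply keeps the top-k principal directions of the raw move features. This is a generic illustration, not the MPA or MEST code from the thesis, and the feature dimensions are placeholders.

```python
# Sketch: linear feature extraction with PCA. X holds one raw feature vector
# per candidate move (rows); k is the reduced dimensionality.
import numpy as np

def pca_fit(X, k):
    """Return the mean and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    centred = X - mean
    # Eigen-decomposition of the covariance via SVD (rows = samples).
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:k]                # each row of vt is a principal direction

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))        # e.g. 50 raw features per move (placeholder)
mean, comps = pca_fit(X, k=10)
Z = pca_transform(X, mean, comps)      # 10-dimensional features fed to the MLP
```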

Human & Computer Performance Compared
Move-prediction performance on three test games (Game 1, Game 2, Game 3, and the average) for human players ranging from 3 dan down to weak kyu level and for the move predictor (MP*).
Test position: Black must choose between two red intersections.

Performance on professional 19×19 games

Ranking   Performance
First     25 %
Top-3     45 %
Top-20    80 %

(Plot: cumulative performance (%) against the number of ranked moves.)

Learning to Score
Using archives of (online) Go servers, such as NNGS, for machine learning is non-trivial because of:
1. Missing information: only a single numeric result is given; the status of individual board points is not available
2. Unfinished games: humans resign early or do not finish the game at all
3. Bad moves
To overcome 1 & 2 we need reliable final scores:
- A large dataset was created: 18k labelled final 9x9 positions
- Several tricks were used to identify dubious scores
- A few thousand positions were scored/verified manually

The scoring method
1. Classify life & death for all blocks
2. Remove dead blocks
3. Mark empty intersections using flood fills or the distance to the nearest remaining colour (see the sketch below)
4. (Optional) Recursively update the representation to take the status of adjacent blocks into account; return to 1
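
Steps 2 and 3 amount to ordinary area scoring once the dead blocks are known. A minimal Python sketch of that part is given below; it assumes the life & death classifier has already produced the set of dead stones and reuses the 0/1/2 board encoding from the earlier block example.

```python
# Sketch: score a final position once dead blocks are known (steps 2-3 above).
# Dead stones are removed, then each empty region is awarded to a colour if it
# touches stones of only that colour. Encoding: 0 empty, 1 black, 2 white.
EMPTY, BLACK, WHITE = 0, 1, 2

def score(board, dead_stones):
    size = len(board)
    cleaned = [[EMPTY if (x, y) in dead_stones else board[x][y]
                for y in range(size)] for x in range(size)]
    counts = {BLACK: 0, WHITE: 0}
    seen = set()
    for x in range(size):
        for y in range(size):
            colour = cleaned[x][y]
            if colour != EMPTY:
                counts[colour] += 1                  # stones count as area
            elif (x, y) not in seen:
                region, borders, stack = set(), set(), [(x, y)]
                while stack:                         # flood fill one empty region
                    px, py = stack.pop()
                    if (px, py) in region:
                        continue
                    region.add((px, py))
                    seen.add((px, py))
                    for nx, ny in ((px+1, py), (px-1, py), (px, py+1), (px, py-1)):
                        if 0 <= nx < size and 0 <= ny < size:
                            if cleaned[nx][ny] == EMPTY:
                                stack.append((nx, ny))
                            else:
                                borders.add(cleaned[nx][ny])
                # a region touching only one colour is that colour's territory
                if len(borders) == 1:
                    counts[borders.pop()] += len(region)
    return counts[BLACK] - counts[WHITE]             # positive = Black ahead (area scoring)
```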

Blocks to Classify
For final positions there are 3 types of blocks:
1. Alive (O): at the border of its own territory
2. Dead (X): inside the opponent's territory
3. Irrelevant (?): removal does not change the area score
We only train on blocks of type 1 and 2!

Representation of the blocks
Direct features of the block:
- Size
- Perimeter
- Adjacent opponent stones
- 1st-, 2nd-, 3rd-order liberties
- Protected liberties
- Auto-atari liberties
- Adjacent opponent blocks
- Local majority (MD < 3)
- Centre of mass
- Bounding box size
Adjacent fully accessible CERs (colour-enclosed regions):
- Number of regions
- Size
- Perimeter
- Split points
Adjacent partially accessible CERs:
- Number of partially accessible regions
- Accessible size
- Accessible perimeter
- Inaccessible size
- Inaccessible perimeter
- Inaccessible split points
Disputed territory:
- Direct liberties of the block in disputed territory
- Liberties of all friendly blocks in disputed territory
- Liberties of all enemy blocks in disputed territory
Directly adjacent eyespace:
- Size
- Perimeter
Optimistic chain:
- Number of blocks
- Size
- Perimeter
- Split points
- Adjacent CERs
- Adjacent CERs with eyespace
- Adjacent CERs fully accessible from at least 1 block
- Size of adjacent eyespace
- Perimeter of adjacent eyespace
- External opponent liberties
Opponent blocks (3x): (1) the weakest directly adjacent opponent block (weakest = block with the fewest liberties), (2) the 2nd-weakest directly adjacent opponent block, (3) the weakest opponent block adjacent to or sharing liberties with the block's optimistic chain; for each:
- Perimeter
- Liberties
- Shared liberties
- Split points
- Perimeter of adjacent eyespace
Recursive features:
- Predicted value of the strongest adjacent friendly block
- Predicted value of the weakest adjacent opponent block
- Predicted value of the second-weakest adjacent opponent block
- Average predicted value of the weakest opponent block's optimistic chain
- Adjacent eyespace size of the weakest opponent block's optimistic chain
- Adjacent eyespace perimeter of the weakest opponent block's optimistic chain

Scoring Performance
Blocks (direct/recursive classification): classification error (%) as a function of training size (in blocks), measured for direct, 2-step, 3-step, and 4-step recursive classification.
Full board (4-step recursive classification):
- Incorrect score: 1.1%, better than the average rated NNGS player (~7 kyu)
- Incorrect winner: 0.5%, comparable to the average NNGS player
- Average absolute score difference: 0.15 points

Life & Death during the game
- Predict whether blocks of stones can be captured
- Perfect predictions are not possible in non-final positions, so we approximate the a posteriori probability that a block will be alive at the end of the game
4 block types:
- The first 3 types are identified from the final position (as before)
- 4th type: blocks captured during the game are labelled dead
- Irrelevant blocks are not used during training!
Representation extended with 5 additional features: player to move, ko, distance to ko, number of black/white stones on the board.
Example position: Black blocks 50% alive.

Performance over the game
- MLP, 25 hidden units, 175,000 training examples (see the sketch below)
- Average prediction error: 11.7%
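
For readers who want to reproduce the flavour of this classifier, the sketch below trains a comparable small MLP with scikit-learn on block feature vectors. The synthetic data, the feature count, and every hyper-parameter other than the 25 hidden units are assumptions made for the example.

```python
# Sketch: a small MLP block classifier in the spirit of the one above
# (25 hidden units). Features and labels are synthetic placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 40))              # 40 block features per example (placeholder)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # 1 = alive, 0 = dead (synthetic labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(25,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)

# Probability of being alive at the end of the game, per block:
p_alive = clf.predict_proba(X_test)[:, 1]
print("test error:", 1.0 - clf.score(X_test, y_test))
```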

Estimating Potential Territory
Why estimate territory?
1. For predicting the score (potential territory)
   - Main purpose: to build an evaluation function
   - May also be used to adjust strategy (e.g., play safe when ahead)
2. To detect safe regions (secure territory)
   - Main purpose: forward pruning (risky unless provably correct)
Our main focus is on (1), potential territory.
We investigate:
- Direct methods, known or derived from the literature
- ML methods, trained on game records
- Enhancements with (heuristic) knowledge of life & death

Direct methods
1. Explicit control
2. Direct control
3. Distance-based control
4. Influence-based control (~ numerical dilations)
5. Bouzy's method (numerical dilations + erosions; sketched below)
6. Combinations 5+3 or 5+4
Enhancements use knowledge of life & death to remove dead stones (or reverse their colour).
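
Bouzy's method spreads influence from the stones by a few Zobrist-style dilations and then erodes it again so that only well-surrounded areas remain. The sketch below is a hedged reconstruction of that idea on a numeric board, using the commonly cited 5 dilations and 21 erosions; the starting value of ±128 for stones and the helper layout are assumptions for the example, not the thesis code.

```python
# Hedged sketch of Bouzy's dilation/erosion territory estimate (5/21).
# Stones start at +/-128 (an assumed constant); positive values = black influence.
def neighbours(x, y, n):
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < n and 0 <= y + dy < n:
            yield x + dx, y + dy

def dilate(v):
    n = len(v)
    out = [row[:] for row in v]
    for x in range(n):
        for y in range(n):
            ns = [v[nx][ny] for nx, ny in neighbours(x, y, n)]
            if v[x][y] >= 0 and all(w >= 0 for w in ns):
                out[x][y] += sum(1 for w in ns if w > 0)
            elif v[x][y] <= 0 and all(w <= 0 for w in ns):
                out[x][y] -= sum(1 for w in ns if w < 0)
    return out

def erode(v):
    n = len(v)
    out = [row[:] for row in v]
    for x in range(n):
        for y in range(n):
            ns = [v[nx][ny] for nx, ny in neighbours(x, y, n)]
            if v[x][y] > 0:
                out[x][y] = max(0, v[x][y] - sum(1 for w in ns if w <= 0))
            elif v[x][y] < 0:
                out[x][y] = min(0, v[x][y] + sum(1 for w in ns if w >= 0))
    return out

def bouzy(board, dilations=5, erosions=21):
    """board: 0 empty, +1 black, -1 white. Returns an influence map;
    the sign of each point gives the estimated owner."""
    v = [[128 * c for c in row] for row in board]
    for _ in range(dilations):
        v = dilate(v)
    for _ in range(erosions):
        v = erode(v)
    return v
```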

ML methods
Simple representation (per intersection in the ROI):
- Colour {+1 black, -1 white, 0 empty}
Enhanced representation (per intersection in the ROI):
- Colour × Prob(alive)
- Edge
- Colour of the nearest stone
- Colour of the nearest living stone
Prob(alive) is obtained from the pre-trained MLP.
Predicted colour: +1 sure black, 0 neutral, -1 sure white.

Performance at various levels of confidence

Predicting the winner (percentage correct)

Predicting the score (absolute error)

Summary: Searching Techniques
The capture game:
- Simplified Go rules (whoever captures the first stone wins)
- Boards up to 6x6 solved
Go on small boards:
- Normal Go rules
- First program in the world to have solved 5x5 Go
- Perfect solutions up to ~30 intersections
- Heuristic knowledge required for larger boards

Summary: Learning Techniques 1
Move prediction:
- Very good results (strong kyu level)
- Strong play is possible with a limited selection of moves
Scoring final positions:
- Excellent classification
- Reliable training data

Summary: Learning Techniques 2
Predicting life and death:
- Good results
- The most important ingredient for accurate evaluation of positions during the game
Estimating potential territory:
- Comparison of non-learning and learning methods
- Best results with learning methods

Conclusions
Knowledge is the most important ingredient for improving Go programs.
Searching techniques:
- Provably correct knowledge is sufficient for solving small problems up to ~30 intersections
- Heuristic knowledge is essential for larger problems
Learning techniques:
- Heuristic knowledge is learned quite well from games
- The learned heuristic knowledge is at least at the level of reasonably strong kyu players

Questions?
More information at: