A Scalable Machine Learning Approach to Go Pierre Baldi and Lin Wu UC Irvine

Contents
–Introduction to Go
–Existing approaches
–Our approach
–Results
–Conclusion & Future work

What is Go?

–Black & white play alternately
–Stones with zero liberties are removed
–The player with more territory wins
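To make the capture rule concrete, here is a minimal sketch (not from the slides; the board encoding and function names are assumptions): a flood fill collects a group of connected same-color stones together with its liberties, and a group whose liberty set is empty is removed from the board.

```python
# Minimal sketch of the capture rule. Assumed board encoding:
# board is an n x n list of lists with 0 = empty, 1 = black, 2 = white.

def group_and_liberties(board, x, y):
    """Flood-fill from the stone at (x, y); return (group, liberties)."""
    n = len(board)
    color = board[x][y]
    assert color != 0, "no stone at this intersection"
    group, liberties, stack = set(), set(), [(x, y)]
    while stack:
        i, j = stack.pop()
        if (i, j) in group:
            continue
        group.add((i, j))
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < n:
                if board[ni][nj] == 0:
                    liberties.add((ni, nj))      # adjacent empty point
                elif board[ni][nj] == color:
                    stack.append((ni, nj))       # same-color neighbor
    return group, liberties

# After a move, any opposing group whose liberty set is empty is
# captured: every stone in `group` is taken off the board.
```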

Why is Go interesting? Go is a hard game for computers.
–The best computer Go programs are easily defeated by an average human amateur
Other board games already have expert-level programs:
–Chess: Deep Blue (1997) & Fritz (2002)
–Checkers: Chinook (1994)
–Othello (Reversi): Logistello (2002)
–Backgammon: TD-Gammon (1992)

Why is Go interesting for AI? It poses unique opportunities and challenges for AI and machine learning:
–Hard to build a high-quality evaluation function
–Large branching factor (roughly 250, versus roughly 35 for chess)

Existing approaches
–Hard-coded programs
–Evaluating the next move by playing a large number of random games
–Using machine learning to learn the evaluation function

Existing approaches ── hard-coded programs
–Hand-tailored pattern libraries
–Hard-coded rules to choose among multiple pattern hits
–Tactical search (or reading)
–E.g. “Many Faces of Go”, “GnuGo”

Existing approaches ── hard-coded programs
Pros:
–Good performance
Cons:
–Intensive manual work
–The pattern library is not complete
–Hard to manage and improve

Existing approaches ── Random games
–Play a huge number of random games from the given position
–Use the results of the games to evaluate all legal moves
–Choose the legal move with the best evaluation
–E.g. Gobble, Go81
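A minimal sketch of this scheme follows; it is an illustration only, and legal_moves, play, and random_playout_result are hypothetical helpers, not functions from Gobble or Go81. Each legal move is scored by the average outcome of uniformly random games played to the end.

```python
# Hypothetical sketch of move evaluation by random games (Monte Carlo).
# Assumed helpers (not from Gobble or Go81):
#   legal_moves(board, color)            -> iterable of legal moves
#   play(board, move, color)             -> new board with the move made
#   random_playout_result(board, color)  -> +1 if `color` wins a uniformly
#                                           random playout, -1 otherwise

def evaluate_moves(board, color, playouts=100):
    scores = {}
    for move in legal_moves(board, color):
        total = 0
        for _ in range(playouts):
            total += random_playout_result(play(board, move, color), color)
        scores[move] = total / playouts      # average result in [-1, +1]
    return scores

def best_move(board, color, playouts=100):
    scores = evaluate_moves(board, color, playouts)
    return max(scores, key=scores.get)       # move with the best evaluation
```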

Existing approaches ── Random games
Pros:
–Easy to implement
–Reasonable performance
Cons:
–Works on small boards only; cannot scale to the normal 19x19 board

Existing approaches ── Machine learning
–Schraudolph et al., 1994: TD(0) with a neural network
–Graepel et al., 2001: board condensed into a common-fate graph; SVM
–Stern, Graepel, and MacKay, 2005: conditional Markov random field

Existing approaches ── Machine learning
Pros:
–Learn automatically
Cons:
–Poor performance

Our approach
–Use scalable algorithms to learn high-quality evaluation functions automatically
–Imitate the human evaluation process

Our approach ── Human evaluation process
Three key components:
–The understanding of patterns
–The ability to combine patterns
–The ability to relate strategic rewards to tactical ones

Our approach ── System components
A 3x3 pattern library
–Learns tactical patterns automatically
A structure-rich recursive neural network (RNN)
–Propagates interactions between patterns
–Learns the correlation between strategic rewards (targets) and tactical rewards (inputs)

Our approach ── RNN architecture
Six planes:
–One input plane
–One output plane
–Four hidden planes
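The slides give no equations, but one plausible reading of a recursive network with four hidden planes is a lattice recurrence in which each hidden plane sweeps the board from one of the four corners, so every intersection eventually receives context from the whole board. The sketch below is schematic: the plane sizes, the tanh/sigmoid nonlinearities, and the weight shapes are assumptions, not the paper's exact architecture.

```python
import numpy as np

N, D_IN, D_H = 9, 5, 8   # board size, input features, hidden units (assumed)
rng = np.random.default_rng(0)
W_in  = [rng.normal(size=(D_H, D_IN)) for _ in range(4)]  # input -> hidden
W_row = [rng.normal(size=(D_H, D_H)) for _ in range(4)]   # row-neighbor recurrence
W_col = [rng.normal(size=(D_H, D_H)) for _ in range(4)]   # column-neighbor recurrence
W_out = rng.normal(size=(D_IN + 4 * D_H,))                # output-plane weights

def forward(inputs):
    """inputs: (N, N, D_IN) feature array; returns an (N, N) output plane."""
    sweeps = [(1, 1), (1, -1), (-1, 1), (-1, -1)]   # one direction per corner
    hidden = [np.zeros((N, N, D_H)) for _ in range(4)]
    for k, (dr, dc) in enumerate(sweeps):
        rows = range(N) if dr == 1 else range(N - 1, -1, -1)
        cols = range(N) if dc == 1 else range(N - 1, -1, -1)
        for r in rows:
            for c in cols:
                # Hidden states of the already-visited row and column neighbors.
                h_r = hidden[k][r - dr, c] if 0 <= r - dr < N else np.zeros(D_H)
                h_c = hidden[k][r, c - dc] if 0 <= c - dc < N else np.zeros(D_H)
                hidden[k][r, c] = np.tanh(
                    W_in[k] @ inputs[r, c] + W_row[k] @ h_r + W_col[k] @ h_c)
    # Output plane: combine the input plane and all four hidden planes.
    feats = np.concatenate([inputs] + hidden, axis=-1)   # (N, N, D_IN + 4*D_H)
    return 1.0 / (1.0 + np.exp(-feats @ W_out))          # sigmoid per intersection

out = forward(rng.normal(size=(N, N, D_IN)))   # random stand-in features
```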

Our approach ── Update sequence

Our approach ── Provide relevant inputs
For intersections:
–Intersection type: black, white, or empty
–Influence: influence from the same and the opposite color
–Pattern stability: a statistical value computed from 3x3 patterns
For groups:
–Number of eyes
–Number of 1st-, 2nd-, 3rd-, and 4th-order liberties
–Number of liberties of the 1st and 2nd weakest opponent groups
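The slides do not define the higher-order liberties; the sketch below assumes one common reading, in which the k-th-order liberties of a group are the empty points first reached at breadth-first-search depth k (so 1st-order liberties are the ordinary liberties). It reuses group_and_liberties from the capture-rule sketch above.

```python
# Sketch of higher-order liberty counts under the assumed BFS definition.

def order_liberties(board, x, y, max_order=4):
    """Return {k: count of k-th-order liberties} for the group at (x, y)."""
    n = len(board)
    group, first = group_and_liberties(board, x, y)
    orders, seen, frontier = {1: first}, set(first), first
    for k in range(2, max_order + 1):
        nxt = set()
        for i, j in frontier:
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if (0 <= ni < n and 0 <= nj < n
                        and board[ni][nj] == 0 and (ni, nj) not in seen):
                    nxt.add((ni, nj))            # empty point at depth k
        seen |= nxt
        orders[k] = nxt
        frontier = nxt
    return {k: len(v) for k, v in orders.items()}
```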

Our approach ── Pattern stability (I)
–The 9x9 board is split into 10 unique locations for 3x3 patterns, once mirror and rotation symmetries are taken into account
–Stability is measured for each intersection of each pattern within each unique location
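The count of ten can be checked with a short sketch, under the assumption that a 3x3 window location is classified by the distances of its center (rows and columns 1 through 7 on a 9x9 board) to the nearest edges, with the distance pair sorted so that mirror and rotation symmetries collapse onto one class.

```python
# Sketch: enumerate the unique 3x3 pattern locations on a 9x9 board,
# assuming locations are equivalent under mirror and rotation symmetry.

def location_class(r, c, n=9):
    dr = min(r, n - 1 - r)              # distance to the nearest horizontal edge
    dc = min(c, n - 1 - c)              # distance to the nearest vertical edge
    return (min(dr, dc), max(dr, dc))   # sorted pair: symmetry-invariant

# 3x3 window centers range over rows/columns 1..7.
classes = {location_class(r, c) for r in range(1, 8) for c in range(1, 8)}
print(sorted(classes))                  # (1,1), (1,2), ..., (4,4)
assert len(classes) == 10               # the ten unique locations
```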

Our approach ── Pattern stability (II)
Ten unique pattern locations

Our approach ── Pattern stability (III)

Our approach ── Pattern stability results (I)

Our approach ── Pattern stability results (II)

Results ── Validation error

Results ── Move prediction results

Results ── Matched move (I)

Results ── Matched move (II)

Conclusion & Future work