Learning to Play the Game of GO
Lei Li
Computer Science Department
May 3, 2007
Outline
Part I: Computer GO, a brief taste
Part II: Learning to Predict Moves
Part I
Computer GO: a brief taste
The Game of GO
19-by-19 grid
Two players (black and white)
Stones are placed at intersections of the lines
Object: surround more territory, i.e. control a larger part of the board than your opponent
Why is GO special?
Very simple rules
Winning is decided globally
No two games are the same
Handicap system
Current best program: Handtalk, unable to beat experienced amateurs

Game      Program     Computer vs. Human
Checkers  Chinook     > H
Othello   Logistello  > H
Chess     Deep Blue   >= H
Go        Handtalk    << H
Complexity results
PSPACE-hard [Lichtenstein & Sipser 80]
EXPTIME-complete [Robson 83]
Major approaches
Tree search based (minimax, alpha-beta)
–Handtalk, GO++, GNU GO
Monte-Carlo methods
–select the best move from random playouts
Learning based
–neural network [Enzenberger 96]
–SVM
–graphical model [Stern et al 06]
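The Monte-Carlo idea above can be sketched in a few lines: play each candidate move, finish the game randomly many times, and keep the move with the best win rate. This is an illustrative toy, not code from any Go engine; `playout` and `monte_carlo_best_move` are names chosen here.

```python
import random

def monte_carlo_best_move(moves, playout, n_playouts=100, rng=None):
    """Pick the move with the highest win rate over random playouts.

    `playout(move, rng)` plays the rest of the game randomly after
    `move` and returns 1 for a win, 0 for a loss.
    """
    rng = rng or random.Random(0)
    best_move, best_rate = None, -1.0
    for move in moves:
        wins = sum(playout(move, rng) for _ in range(n_playouts))
        rate = wins / n_playouts
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move, best_rate

# Toy playout: after move "a", random continuations win 80% of the
# time; after "b", only 30%.
def toy_playout(move, rng):
    return 1 if rng.random() < {"a": 0.8, "b": 0.3}[move] else 0

best, rate = monte_carlo_best_move(["a", "b"], toy_playout, n_playouts=500)
```

With enough playouts the estimate concentrates on the true win rate, which is why pure random play already gives a usable (if weak) evaluation.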
Search in Computer GO
Tree search with:
–pattern matching
–heuristics, expert rules
–local search
–early stopping
–alpha-beta pruning (which is successful in chess)
–high-level abstract strategies
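Alpha-beta pruning, mentioned above, cuts off branches that cannot change the minimax value. A minimal sketch over a game tree given as nested lists of leaf scores (a textbook toy, not a Go search):

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    """Minimax with alpha-beta pruning over nested lists of leaf scores."""
    if not isinstance(node, list):       # leaf: static evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:            # prune remaining siblings
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Max node over three min nodes; the minimax value is 6, and the
# subtree [1, 2] is pruned after seeing the leaf 1.
tree = [[3, 5], [6, 9], [1, 2]]
```

The pruning relies on a good move ordering and a reliable static evaluation, both of which are much harder to obtain in Go than in chess — one reason alpha-beta alone has not cracked Go.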
Major challenges
Too many possible moves (361)
How to evaluate a move (subtlety)
Implicit control vs. explicit control
Connectivity viewpoint
Local state vs. global state
Local vs Global
Part II
Learning to Predict Moves
Move Prediction in the Learning Setting
Given:
–a database of professional game records
Goal:
–learn the distribution over moves given the current board state
–rank the candidate moves
Assumption:
–experts always make the best moves
Learning features
State explosion with the full board:
–full board state: configuration (c)
–3^361 possible configurations (each point is empty, black, or white)
Local state (t):
–local pattern (within a region of size 64)
–plus 8 extra features on liveness (situation)
Local pattern region
Local Liveness Features
Liberties of the new chain: 1, 2, 3, >3
Liberties of the opponent chain: 1, 2, 3, >3
Is there an active ko?
Is the new chain captured immediately?
Distance to the board edge: 5
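One plausible way to turn the liveness features above into a fixed-length vector is to one-hot the bucketed liberty counts and append the flags and the (capped) edge distance. The exact encoding in the original system may differ; this layout and the cap at 5 are assumptions for illustration.

```python
def liveness_features(own_liberties, opp_liberties, ko_active,
                      captured, edge_distance):
    """Encode the liveness features as a flat vector.

    Liberty counts are bucketed into {1, 2, 3, >3}; the edge distance
    is capped at 5 (an assumed encoding, not taken from the paper).
    """
    def liberty_bucket(n):
        one_hot = [0, 0, 0, 0]
        one_hot[min(n, 4) - 1] = 1       # buckets: 1, 2, 3, >3 liberties
        return one_hot

    return (liberty_bucket(own_liberties)
            + liberty_bucket(opp_liberties)
            + [int(ko_active), int(captured), min(edge_distance, 5)])
```

For example, a new chain with 2 liberties facing an opponent chain with 5 liberties, no ko, no immediate capture, 7 lines from the edge, yields an 11-dimensional vector.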
Move Distribution Model
Move v, given configuration c: model P(v | c)
–u: prior value of a pattern, u ~ Normal(μ, σ²)
–latent value of a move: x | u ~ Normal(u, β²)
–pick the move with the largest latent value: v* = argmax_v x_v
Learn the posterior by the sum-product algorithm
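The Gaussian latent-value model above can be simulated directly: given a Gaussian posterior (μ, σ) over each candidate move's pattern value, draw a latent value x ~ Normal(μ, σ² + β²) per move and count how often each move has the largest draw. The sampling estimator and the move names are illustrative; the paper itself uses message passing rather than sampling.

```python
import random

def rank_moves(patterns, beta=1.0, n_samples=2000, rng=None):
    """Estimate P(move has the largest latent value) by sampling.

    `patterns` maps move -> (mu, sigma), the Gaussian posterior over
    that move's pattern value u; x | u ~ Normal(u, beta^2), so
    marginally x ~ Normal(mu, sigma^2 + beta^2).
    """
    rng = rng or random.Random(0)
    wins = {move: 0 for move in patterns}
    for _ in range(n_samples):
        draws = {move: rng.gauss(mu, (sigma**2 + beta**2) ** 0.5)
                 for move, (mu, sigma) in patterns.items()}
        wins[max(draws, key=draws.get)] += 1
    return sorted(((w / n_samples, move) for move, w in wins.items()),
                  reverse=True)

# Hypothetical posteriors for three candidate moves.
ranking = rank_moves({"D4": (2.0, 0.5), "Q16": (1.0, 0.5), "K10": (0.0, 0.5)})
```

The resulting probabilities sum to one and give exactly the kind of move ranking the slide describes.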
Results
Real data:
–181,000 game records
–600 million patterns (after pruning)
34% of expert moves ranked first
–86% ranked in the top 20
–can be used to score or rank moves during search
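The top-1 and top-20 numbers above are straightforward to compute from any ranker; a minimal sketch of the evaluation (illustrative code, not from the paper):

```python
def top_k_accuracy(ranked_moves, expert_moves, k):
    """Fraction of positions where the expert's move is in the top k.

    `ranked_moves[i]` is the model's ranking (best first) for position
    i; `expert_moves[i]` is the move the professional actually played.
    """
    hits = sum(expert in ranking[:k]
               for ranking, expert in zip(ranked_moves, expert_moves))
    return hits / len(expert_moves)

# Two toy positions: the expert move "a" is ranked 1st, then 2nd.
rankings = [["a", "b", "c"], ["b", "a", "c"]]
experts = ["a", "a"]
```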
Testing in real games
Opening: rather good
Weaker in the later stages:
–missing pattern details
–global state needed
Some ideas for discussion
Iterative search and learning:
–learning for move ranking/prediction
–use the ranking to score nodes in the search tree
–search results as new data for learning
Learning local regions / global strategy:
–learn abstract strategies (e.g. fight, defense)
–group move sequences together?
Other questions
Can a computer learn from non-expert humans?
Can a computer learn to play by playing against itself?
References
Graepel et al. Learning on graphs in the game of Go. 2001.
Stern et al. Modelling uncertainty in the game of Go. 2004.
Stern et al. Bayesian pattern ranking for move prediction in the game of Go. 2006.
Bouzy & Cazenave. Computer Go: an AI oriented survey. 2001.
Thanks!
Combinatorial game theory
Some key features
Life and death
Number of eyes
Liberty