Learning to Play the Game of GO
Lei Li
Computer Science Department
May 3, 2007
Outline
Part I: Computer GO, a brief taste
Part II: Learning to Predict Moves
Part I
Computer GO: a brief taste
The Game of GO
19-by-19 grid
Two players (black and white)
Stones are placed at intersections of the lines
Object: surround more territory, i.e. control a larger part of the board than your opponent
Why is GO special?
Very simple rules
Winning is decided globally
No two games are the same
Handicap system
Current best program: Handtalk, unable to beat experienced amateurs

Game      Program     Computer vs. Human
Checkers  Chinook     > H
Othello   Logistello  > H
Chess     Deep Blue   >= H
Go        Handtalk    << H
Complexity results
PSPACE-hard [Lichtenstein & Sipser 80]
EXPTIME-complete [Robson 83]
Major approaches
Tree search based (minimax, alpha-beta)
–Handtalk, GO++, GNU GO
Monte-Carlo methods
–select the best move from random playouts
Learning based
–neural network [Enzenberger 96]
–SVM
–graphical model [Stern et al 06]
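The Monte-Carlo idea above can be sketched in a few lines: play each candidate move, finish the game randomly many times, and keep the move with the best win rate. This is an illustrative toy, not code from any Go engine; `playout` and `monte_carlo_best_move` are names chosen here.

```python
import random

def monte_carlo_best_move(moves, playout, n_playouts=100, rng=None):
    """Pick the move with the highest win rate over random playouts.

    `playout(move, rng)` plays the rest of the game randomly after
    `move` and returns 1 for a win, 0 for a loss.
    """
    rng = rng or random.Random(0)
    best_move, best_rate = None, -1.0
    for move in moves:
        wins = sum(playout(move, rng) for _ in range(n_playouts))
        rate = wins / n_playouts
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move, best_rate

# Toy playout: after move "a", random continuations win 80% of the
# time; after "b", only 30%.
def toy_playout(move, rng):
    return 1 if rng.random() < {"a": 0.8, "b": 0.3}[move] else 0

best, rate = monte_carlo_best_move(["a", "b"], toy_playout, n_playouts=500)
```

With enough playouts the estimate concentrates on the true win rate, which is why pure random play already gives a usable (if weak) evaluation.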
Search in Computer GO
Tree search with:
–pattern matching
–heuristics, expert rules
–local search
–early stopping
–alpha-beta pruning (which is successful in chess)
–high-level abstract strategies
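Alpha-beta pruning, mentioned above, cuts off branches that cannot change the minimax value. A minimal sketch over a game tree given as nested lists of leaf scores (a textbook toy, not a Go search):

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    """Minimax with alpha-beta pruning over nested lists of leaf scores."""
    if not isinstance(node, list):       # leaf: static evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:            # prune remaining siblings
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Max node over three min nodes; the minimax value is 6, and the
# subtree [1, 2] is pruned after seeing the leaf 1.
tree = [[3, 5], [6, 9], [1, 2]]
```

The pruning relies on a good move ordering and a reliable static evaluation, both of which are much harder to obtain in Go than in chess — one reason alpha-beta alone has not cracked Go.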
Major challenges
Too many possible moves (361)
How to evaluate a move (subtlety)
Implicit control vs. explicit control
Connectivity viewpoint
Local state vs. global state
Local vs Global
Part II
Learning to Predict Moves
Move Prediction in the Learning Setting
Given:
–a database of professional game records
Goal:
–learn the distribution over moves given the current board state
–rank the candidate moves
Assumption:
–experts always make the best moves
Learning features
State explosion with the full board:
–full board state: configuration (c)
–3^361 possible configurations (each point is empty, black, or white)
Local state (t):
–local pattern (within a region of size 64)
–plus 8 extra features on liveness (situation)
Local pattern region
Local Liveness Features
Liberties of the new chain: 1, 2, 3, >3
Liberties of the opponent chain: 1, 2, 3, >3
Is there an active ko?
Is the new chain captured immediately?
Distance to the board edge: 5
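One plausible way to turn the liveness features above into a fixed-length vector is to one-hot the bucketed liberty counts and append the flags and the (capped) edge distance. The exact encoding in the original system may differ; this layout and the cap at 5 are assumptions for illustration.

```python
def liveness_features(own_liberties, opp_liberties, ko_active,
                      captured, edge_distance):
    """Encode the liveness features as a flat vector.

    Liberty counts are bucketed into {1, 2, 3, >3}; the edge distance
    is capped at 5 (an assumed encoding, not taken from the paper).
    """
    def liberty_bucket(n):
        one_hot = [0, 0, 0, 0]
        one_hot[min(n, 4) - 1] = 1       # buckets: 1, 2, 3, >3 liberties
        return one_hot

    return (liberty_bucket(own_liberties)
            + liberty_bucket(opp_liberties)
            + [int(ko_active), int(captured), min(edge_distance, 5)])
```

For example, a new chain with 2 liberties facing an opponent chain with 5 liberties, no ko, no immediate capture, 7 lines from the edge, yields an 11-dimensional vector.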
Move Distribution Model
Move v, given configuration c: model P(v | c)
–u: prior value of a pattern, u ~ Normal(μ, σ²)
–latent value of a move: x | u ~ Normal(u, β²)
–pick the move with the largest latent value: v* = argmax_v x_v
Learn the posterior by the sum-product algorithm
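The Gaussian latent-value model above can be simulated directly: given a Gaussian posterior (μ, σ) over each candidate move's pattern value, draw a latent value x ~ Normal(μ, σ² + β²) per move and count how often each move has the largest draw. The sampling estimator and the move names are illustrative; the paper itself uses message passing rather than sampling.

```python
import random

def rank_moves(patterns, beta=1.0, n_samples=2000, rng=None):
    """Estimate P(move has the largest latent value) by sampling.

    `patterns` maps move -> (mu, sigma), the Gaussian posterior over
    that move's pattern value u; x | u ~ Normal(u, beta^2), so
    marginally x ~ Normal(mu, sigma^2 + beta^2).
    """
    rng = rng or random.Random(0)
    wins = {move: 0 for move in patterns}
    for _ in range(n_samples):
        draws = {move: rng.gauss(mu, (sigma**2 + beta**2) ** 0.5)
                 for move, (mu, sigma) in patterns.items()}
        wins[max(draws, key=draws.get)] += 1
    return sorted(((w / n_samples, move) for move, w in wins.items()),
                  reverse=True)

# Hypothetical posteriors for three candidate moves.
ranking = rank_moves({"D4": (2.0, 0.5), "Q16": (1.0, 0.5), "K10": (0.0, 0.5)})
```

The resulting probabilities sum to one and give exactly the kind of move ranking the slide describes.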
Results
Real data:
–181,000 game records
–600 million patterns (after pruning)
34% of expert moves ranked first
–86% ranked in the top 20
–can be used to score or rank moves during search
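The top-1 and top-20 numbers above are straightforward to compute from any ranker; a minimal sketch of the evaluation (illustrative code, not from the paper):

```python
def top_k_accuracy(ranked_moves, expert_moves, k):
    """Fraction of positions where the expert's move is in the top k.

    `ranked_moves[i]` is the model's ranking (best first) for position
    i; `expert_moves[i]` is the move the professional actually played.
    """
    hits = sum(expert in ranking[:k]
               for ranking, expert in zip(ranked_moves, expert_moves))
    return hits / len(expert_moves)

# Two toy positions: the expert move "a" is ranked 1st, then 2nd.
rankings = [["a", "b", "c"], ["b", "a", "c"]]
experts = ["a", "a"]
```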
Testing in real games
Opening: rather good
Weaker in the later stages:
–missing pattern details
–global state needed
Some ideas for discussion
Iterative search and learning:
–learning for move ranking/prediction
–use the ranking to score nodes in the search tree
–search results as new data for learning
Learning local regions / global strategy:
–learn abstract strategies (e.g. fight, defense)
–group move sequences together?
Other questions
Can a computer learn from non-expert humans?
Can a computer learn to play by playing against itself?
References
Graepel et al. Learning on graphs in the game of Go. 2001.
Stern et al. Modelling uncertainty in the game of Go. 2004.
Stern et al. Bayesian pattern ranking for move prediction in the game of Go. 2006.
Bouzy & Cazenave. Computer Go: an AI oriented survey. 2001.
Thanks!
Combinatorial game theory
Some key features
Life and death
Number of eyes
Liberty