Machine Learning for Go
Jung-Yun Lo
Dept. of Computer Science and Information Engineering, National Dong Hwa University
Outline
- A survey of the application of machine learning to the game of Go
- A learning architecture for the game of Go
A survey
Some possible directions of research:
- Global approaches
- Learning in search
- Learning in the endgame
- Learning in the opening
- The representation language
Global approaches
- Learn a function that maps a board position and a move to a reward
- On large boards, it is probably better to use more specific methods for the different subproblems of the game
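To make the idea of a function from (board position, move) to reward concrete, here is a minimal sketch. It assumes a linear model over a few hand-picked features and plain stochastic gradient descent on squared error; the feature set, the names `features`, `predict` and `train`, and the learning rate are all illustrative assumptions, not taken from the surveyed work.

```python
# Hypothetical sketch: a linear model that scores (position, move) pairs.
EMPTY, BLACK, WHITE = 0, 1, 2

def features(board, move, color):
    """Small, hand-picked feature vector for `color` playing at move = (x, y)."""
    size = len(board)
    x, y = move
    neighbours = [(x + dx, y + dy)
                  for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                  if 0 <= x + dx < size and 0 <= y + dy < size]
    own = sum(1 for nx, ny in neighbours if board[ny][nx] == color)
    empty = sum(1 for nx, ny in neighbours if board[ny][nx] == EMPTY)
    enemy = len(neighbours) - own - empty
    edge_distance = min(x, y, size - 1 - x, size - 1 - y)
    # Bias term, friendly neighbours, enemy neighbours, empty neighbours, distance to edge.
    return [1.0, own, enemy, empty, edge_distance]

def predict(weights, feats):
    """Predicted reward: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, feats))

def train(samples, epochs=200, lr=0.01):
    """samples: list of (board, move, color, reward). Plain SGD on squared error."""
    weights = [0.0] * 5
    for _ in range(epochs):
        for board, move, color, reward in samples:
            feats = features(board, move, color)
            error = predict(weights, feats) - reward
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
    return weights
```

A call such as `train([(board, (2, 2), BLACK, 1.0), (board, (0, 0), BLACK, 0.0)])` returns a weight vector; candidate moves can then be ranked by `predict`. A single global model of this kind is what the second bullet cautions against on large boards.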
Learning in search
- Temperature: high → search deeper; low → stop the search
- Candidate move ordering
- Static evaluation of leaf nodes
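The three learnable pieces of the search (temperature, move ordering, leaf evaluation) can be sketched as plug-in functions of a negamax search with alpha-beta pruning. Everything below is an assumption made for illustration: `PickGame` is a toy stand-in for Go, its "temperature" is simply the largest value still on offer, and `threshold` is a hypothetical cut-off. In a learning system each of these functions would be replaced by a learned predictor.

```python
class PickGame:
    """Toy game: players alternately take one item and score its value.
    A state is (items, lead); lead is the score lead of the player to move."""

    @staticmethod
    def moves(state):
        return list(range(len(state[0])))

    @staticmethod
    def play(state, move):
        items, lead = state
        rest = items[:move] + items[move + 1:]
        return (rest, -(lead + items[move]))  # sign flips: opponent moves next

    @staticmethod
    def evaluate(state):
        return state[1]  # static evaluation: current lead of the player to move

    @staticmethod
    def temperature(state):
        return max(state[0], default=0.0)  # how much the best move is still worth

    @staticmethod
    def move_score(state, move):
        return state[0][move]  # ordering heuristic: bigger items first


def selective_search(game, state, depth, alpha=float("-inf"),
                     beta=float("inf"), threshold=1.0):
    """Negamax with alpha-beta; stops early in 'cold' positions."""
    moves = game.moves(state)
    # Low temperature, no moves, or no depth left: evaluate statically.
    if depth == 0 or not moves or game.temperature(state) < threshold:
        return game.evaluate(state)
    # Candidate move ordering: try the most promising moves first.
    moves.sort(key=lambda m: game.move_score(state, m), reverse=True)
    for move in moves:
        value = -selective_search(game, game.play(state, move), depth - 1,
                                  -beta, -alpha, threshold)
        if value >= beta:
            return beta  # cut-off: the opponent will avoid this line
        alpha = max(alpha, value)
    return alpha
```

With the start state `((5, 3, 0.5), 0)` and the default threshold, the search stops once only the 0.5-point item is left and returns 2; with `threshold=0.0` it plays the position out and returns 2.5.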
Learning in the endgame
- Each local endgame position is evaluated; the whole game is then treated as a sum of games
- Decomposition search
Learning in the opening
- Hard to quantify
- Using joseki, which depend on the surrounding situation
- Learning global rules for opening moves
The representation language
- It is difficult to express high-level concepts such as liberty, atari, ladder and eye
- Making the representation language more expressive
The representation language
- block(BlockID, Color, Size, LibertyCount)
- board(X, Y, GroupID)
- adjacent(BlockID1, BlockID2)
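As an illustration of how these relations could be derived from a raw position, the sketch below flood-fills connected stones and emits tuples shaped like the three predicates. The board encoding (`'X'`, `'O'`, `'.'`), the function name `extract_facts`, and the choice to reuse the block ID as the `GroupID` in `board` facts are assumptions made for this example.

```python
def extract_facts(rows):
    """rows: equal-length strings using 'X' (black), 'O' (white), '.' (empty)."""
    height, width = len(rows), len(rows[0])
    block_of = {}      # (x, y) -> block id
    blocks = []        # (color, size, liberty_count), indexed by block id
    adjacent = set()   # unordered pairs of touching blocks

    def neighbours(x, y):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                yield nx, ny

    for y in range(height):
        for x in range(width):
            if rows[y][x] == "." or (x, y) in block_of:
                continue
            # Flood-fill one block of connected same-coloured stones.
            block_id, color = len(blocks), rows[y][x]
            stack, stones, liberties = [(x, y)], set(), set()
            block_of[(x, y)] = block_id
            while stack:
                cx, cy = stack.pop()
                stones.add((cx, cy))
                for nx, ny in neighbours(cx, cy):
                    if rows[ny][nx] == ".":
                        liberties.add((nx, ny))  # each empty point counted once
                    elif rows[ny][nx] == color and (nx, ny) not in block_of:
                        block_of[(nx, ny)] = block_id
                        stack.append((nx, ny))
            blocks.append((color, len(stones), len(liberties)))

    # Touching blocks with different ids necessarily have different colours.
    for (x, y), block_id in block_of.items():
        for nx, ny in neighbours(x, y):
            other = block_of.get((nx, ny))
            if other is not None and other != block_id:
                adjacent.add((min(block_id, other), max(block_id, other)))

    block_facts = [(i, c, s, l) for i, (c, s, l) in enumerate(blocks)]
    board_facts = sorted((x, y, b) for (x, y), b in block_of.items())
    return block_facts, board_facts, sorted(adjacent)
```

For `[".XO.", ".XO.", "...."]` this yields two blocks of size 2 with 3 liberties each and one `adjacent` pair. A higher-level concept such as atari then becomes a simple query on the facts: a block whose `LibertyCount` is 1.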
The representation language
- The Common Fate Graph (Enzenberger, 1996)
The representation language
[Figure: projection]
The representation language
- Loss of information in the CFG
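A rough sketch of how a common fate graph can be built, and of why information is lost, follows. It assumes the usual construction in which every block of connected same-coloured stones collapses into a single node while empty points stay as individual nodes; the board encoding and the name `common_fate_graph` are illustrative choices.

```python
def common_fate_graph(rows):
    """rows: equal-length strings using 'X', 'O' and '.'.
    Returns (labels, edges): labels[i] is the colour of node i,
    edges is a sorted list of (i, j) pairs with i < j."""
    height, width = len(rows), len(rows[0])
    node_of, labels = {}, []

    def neighbours(x, y):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                yield nx, ny

    for y in range(height):
        for x in range(width):
            if (x, y) in node_of:
                continue
            node, color = len(labels), rows[y][x]
            labels.append(color)       # a node keeps its colour and nothing else
            node_of[(x, y)] = node
            if color == ".":
                continue               # empty points are never merged
            stack = [(x, y)]
            while stack:               # merge the whole block into this node
                cx, cy = stack.pop()
                for n in neighbours(cx, cy):
                    if rows[n[1]][n[0]] == color and n not in node_of:
                        node_of[n] = node
                        stack.append(n)

    edges = set()
    for (x, y), node in node_of.items():
        for n in neighbours(x, y):
            if node_of[n] != node:
                edges.add((min(node, node_of[n]), max(node, node_of[n])))
    return labels, sorted(edges)
```

Because a block node stores only its colour, the size and shape of the block are not recoverable from the graph. As a deliberately tiny illustration, the one-row positions `["XX."]` and `["X."]` both produce `(['X', '.'], [(0, 1)])`.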
Conclusions
- Learning results are promising, but the field is nearly unexplored and offers many opportunities for research
A learning architecture for the game of Go
- Combinatorial game theory
- The HUGO architecture
- Three components of HUGO
  - Choice of subgames
  - Initiative engine
  - Computing game values
Combinatorial game theory
- G = {F|O}
  - F: the set of options that player Friend can reach with one legal move
  - O: the set of options that player Opponent can reach with one legal move
- F can take one of two possible values
  - W: win for Friend
  - L: loss for Friend
Combinatorial game theory
Four possible outcomes for a combinatorial game: WW, WL, LL and LW
- WW: won by Friend, irrespective of who moves first
- WL: an unsettled game, won by the player who moves first
- LL: lost for Friend, even if Friend moves first
- LW: the player who moves first loses the game
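The four outcome classes can be computed from a small game tree. The sketch below assumes that a game is either a settled leaf `"W"` or `"L"` (the result for Friend) or a pair `(friend_options, opponent_options)` of lists of games, and that a player with no legal move loses (the normal-play convention). This encoding and the function names are assumptions for illustration only.

```python
def friend_wins(game, friend_to_move):
    """True if Friend wins `game` under best play by both sides."""
    if game in ("W", "L"):
        return game == "W"                  # settled leaf
    friend_options, opponent_options = game
    if friend_to_move:
        # Friend needs at least one option that still wins with Opponent to move.
        return any(friend_wins(g, False) for g in friend_options)
    # Opponent to move: Friend wins only if every Opponent option fails.
    return all(friend_wins(g, True) for g in opponent_options)


def outcome_class(game):
    """First letter: result if Friend moves first; second: if Opponent moves first."""
    first = "W" if friend_wins(game, True) else "L"
    second = "W" if friend_wins(game, False) else "L"
    return first + second
```

For example, `outcome_class((["W"], ["L"]))` is `"WL"` (whoever moves first wins), while `outcome_class((["L"], ["W"]))` is `"LW"` (whoever moves first loses).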
The HUGO architecture
- Can be applied to any two-player, deterministic, full-information, partizan combinatorial game
Three components of HUGO
- Choice of subgames
  - Select a collection of well-defined subgames
  - Ensure high discriminative ability
- Initiative engine
  - Find the move that yields the most points
  - Prefer holding the initiative
- Computing game values
  - Compute the game-theoretic value of a particular game
Future work
- Study more references on machine learning
References
- Jan Ramon, Hendrik Blockeel (Katholieke Univ. Leuven). A Survey of the Application of Machine Learning to the Game of Go.
- A.B. Meijer, H. Koppelaar (Delft Univ. of Tech.). A Learning Architecture for the Game of Go.
- Thore Graepel. Computer Go and Machine Learning.
- http://bbs2.xilubbs.com/cgi-bin/bbs/view?forum=godknows&message=105