1 of 81 Evolution and Coevolution of ANNs playing Go Peter Mayer, 2004.


1 1 of 81 Evolution and Coevolution of ANNs playing Go Peter Mayer, 2004

2 (81) 2 Outline  Computers and Games  The game of Go  Experimental Setup  Training of Go playing ANNs  Evolution of Go Playing ANNs  Summary and Outlook

3 (81) 3 Games  Algorithms designed since AI's onset  Clearly defined rules  Still complex  Chess received the most attention  More researched than Go  Two main approaches  Rely on expertise – directly programmed weighted features; extensive knowledge  Use evolution – less knowledge; more versatility

4 (81) 4 The game of Go  Oldest (unaltered) strategic board game in the world  10,000,000 players in Japan “alone”  Fairly simple rules  BUT difficult to master  Immense tree (~200 opts)  Complex structures  Many concurrent goals

5 (81) 5 Go Rules  19x19 board  Empty in the beginning  Black & White “stones”  Black starts  Each turn  Place 1 stone  At an intersection  Never move stones  OR pass

6 (81) 6 Go Rules [2]  Objective - Get the most points !  Points are acquired by:  Securing Territories  Capturing opp’s pieces

7 (81) 7 Go Rules [3]  Stones of one color at vertically or horizontally adjacent intersections form a group  An empty intersection adjacent to a stone or group is called a "liberty" of that group  1 liberty = group in “atari”  No liberties -> CAPTURE! Group is removed  Example – Black places a stone at X, resulting in the right figure
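The group and liberty definitions on this slide can be sketched as a flood fill. This is a minimal illustration, not code from the presentation: the board encoding (`"B"`, `"W"`, `"."` in a list of strings) and the name `group_and_liberties` are assumptions made for the example.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]

def group_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the group containing `start`; return (stones, liberties)."""
    size = len(board)
    color = board[start[0]][start[1]]
    if color == ".":
        raise ValueError("start must be a stone, not an empty intersection")
    stones: Set[Point] = set()
    liberties: Set[Point] = set()
    stack = [start]
    while stack:
        r, c = stack.pop()
        if (r, c) in stones:
            continue
        stones.add((r, c))
        # Only vertical and horizontal neighbours count (no diagonals).
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == ".":
                    liberties.add((nr, nc))
                elif board[nr][nc] == color:
                    stack.append((nr, nc))
    return stones, liberties

# Hypothetical 5x5 position: the two white stones have a single liberty left.
board = [
    ".....",
    ".BWB.",
    ".BWB.",
    "..B..",
    ".....",
]
white_stones, white_liberties = group_and_liberties(board, (1, 2))
in_atari = len(white_liberties) == 1      # one liberty left
captured = len(white_liberties) == 0      # no liberties: group is removed
```

In this position a black stone on the last liberty would capture both white stones.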

8 (81) 8 Go Rules [4]  Stones can be placed anywhere, but cannot commit suicide (except Chinese)  Legal if stone simultaneously captures opponent’s group (2 right figures)  Figure captions: Suicide – White cannot place at X; White CAN place at X; Result: capture

9 (81) 9 Go Rules [5]  Same position cannot occur more than once  Endless repetitions:  Black can capture at upper figure by placing at X  White - same by placing at Y  Black – repeat…  Ko rule  White may not place at Y before playing somewhere else first  Avoid any repetitions

10 (81) 10 Go Rules – Live and Dead groups  “Dead” groups if impossible to prevent capture  It is not necessary to do so  Group remains on board  At end of game, removed and added to captured stones  “Living” groups are impossible to capture  Group with 2 “eyes” – even if white surrounds it, playing at X or Y is suicide  Opponent must play elsewhere

11 (81) 11 Go Basics – End game  Play continues until both players pass  Players then alternately play stones at “neutral” points – adjacent to both White and Black  Also known as “dame” (DAH-MAY)  Dead stones are removed from the board and counted with other prisoners (1 point per prisoner)  Also - 1 point for each intersection surrounded by player’s stones (“territory”)

12 (81) 12 Go Basics – End game example  Prisoners were removed already  All 4 points marked X are dame – worthless  Black has  7 points in UR (territory); 2 points in LL  1 removed prisoner  TOTAL = 10 points  White has  5 in UL; 2 in LR  2 prisoners  TOTAL = 9 points  Black wins unless komi (5.5 pts compensation) is due
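The arithmetic of this example can be checked with a few lines. The function name `final_score` is hypothetical, and adding komi to White's total is the usual convention, assumed here.

```python
from typing import Iterable

def final_score(territory: Iterable[int], prisoners: int, komi: float = 0.0) -> float:
    """Territory points plus one point per prisoner, plus komi if any."""
    return sum(territory) + prisoners + komi

black = final_score([7, 2], prisoners=1)                    # 7 + 2 + 1
white = final_score([5, 2], prisoners=2)                    # 5 + 2 + 2
white_with_komi = final_score([5, 2], prisoners=2, komi=5.5)
```

Without komi Black leads 10 to 9; with a komi of 5.5 White's total becomes 14.5 and White wins.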

13 (81) 13 Ranking and Handicaps  Determine Go players’ strength  Resemblance to martial arts  Both amateur and professional ranking system  Amateur  35 kyu to 1 kyu  THEN 1 dan to 7 dan  Pro  1 dan to 9 dan  Awarded only by Go institutions  Pro dans are much stronger than amateur dans

14 (81) 14 Ranking and Handicaps (2)  Handicaps  Weaker player starts with several stones on the board  Placed at specific places  Helps make games more even  Difference in ranks ~ number of handicap stones needed to win  2 stones to even 2 dan against 4 dan  4 to even 3 kyu and 2 dan  The most powerful Go programs reach only …  … 10 kyu!

15 (81) 15 Outline  Computers and Games  The game of Go  Experimental Setup  Training of Go playing ANNs  Evolution of Go Playing ANNs  Summary and Outlook

16 (81) 16 Experimental Setup  Opponent Go players  ANN player  Go board (input) representations  Move (output) representations  Coevolution  Hall of Fame coevolution  Cultural coevolution  General evolution setup

17 (81) 17 Go Players - Random  No strategy  Pass move also  “Knows” only the rules of go  legality of moves  Usually weakest opponent

18 (81) 18 Go Players – Naïve Player  Roughly human-beginner level  Able to save and capture stones  Knows about  Lost stones  Saving - connecting stones to living groups  Weak stones (not savable)

19 (81) 19 Go Players – Naïve Strategy  A subset of JaGo’s (main opponent) strategy  Outline (arranged by priority):  Attempt to save  Try to put opponent into atari  Connect weak stones  Capture opponent groups in atari  Check intersections for placing stones  In random order  Make sure no (own) liberties decrease below 2 as a result  Perform Random move

20 (81) 20 Go Players – JaGo Player  Java based program  Best computer player used  Not a strong player ~16 kyu  Knows standard techniques  Mainly save & capture  Uses pattern matching  Looks at entire board  32 patterns, with rotations and mirrors

21 (81) 21 Go Players – JaGo Strategy (1)  Save stones in atari  Try to decrease liberties of large groups  Find own savable larger groups  Attack opponent’s groups (decreasing order:)  With 2 or more liberties and attackable  With 2 or more stones & less than 3 liberties  With 2 or less liberties

22 (81) 22 Go Players – JaGo Strategy (2)  Save own groups with few liberties if savable  Start pattern matching – Response; Center  Random move order  Seek opponent’s groups to capture in 2 moves  Perform random move which isn’t of a bad pattern  Capture opponent’s single liberties  Connect own weak stones  PASS

23 (81) 23 Go Players – JaGo Patterns (1)

24 (81) 24 Go Players – JaGo Patterns (2)

25 (81) 25 Go Players – GNU Go  Advantages  5x5 to 19x19 boards  Handles handicaps well  Rated 10 kyu  Problems  5x5 solved – open at C3 for 18.5 points (komi=5.5) – Black always wins  GNU Go passes on B3, C2-4, D3 (only correct at C3)  Premature convergence of evolution

26 (81) 26 ANN Player  Inform ANN about actual position  Evaluate ANN output to receive next move  Representation is important!  Intention maps  For each Go move (including PASS) – value between [0,1]  High value – high intention to make move (and v.v.)  Select legal move with highest value  To avoid predictability – consider suboptimal moves also (“creativity factor”)
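Move selection from an intention map might look like the sketch below. The slide does not say how the "creativity factor" is applied; treating it as the probability of picking a random non-best legal move is an assumption of this example, as are the names `select_move` and `creativity`.

```python
import random
from typing import Dict, Hashable, Iterable, Optional

def select_move(
    intentions: Dict[Hashable, float],
    legal: Iterable[Hashable],
    creativity: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Hashable:
    """Pick the legal move with the highest intention value.

    With probability `creativity` (an assumed interpretation), pick one
    of the other legal moves at random instead.
    """
    rng = rng or random.Random()
    ranked = sorted(
        (m for m in legal if m in intentions),
        key=lambda m: intentions[m],
        reverse=True,
    )
    if not ranked:
        raise ValueError("no legal move has an intention value")
    if len(ranked) > 1 and rng.random() < creativity:
        return rng.choice(ranked[1:])
    return ranked[0]

# C3 has the highest intention but is illegal here, so B2 is chosen.
intentions = {"C3": 0.9, "B2": 0.7, "PASS": 0.1}
best = select_move(intentions, legal=["B2", "PASS"])
```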

27 (81) 27 Player Strength  To receive a rating, unrated Go players commonly play against rated players (same in Chess)  The strength s of a player is determined by  The score of 1,000 double games  Against each of 3 opponents: Random, Naïve, JaGo  Divided by the number of games (6,000)  1 is perfect strength  3 opponents help resist over-fitting
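The strength measure can be sketched as follows, assuming that a "double game" means one game with each color (2,000 games per opponent) and that the score counts wins. Both readings are assumptions, and the win counts below are made up for illustration.

```python
from typing import Dict

def strength(wins_by_opponent: Dict[str, int], games_per_opponent: int = 2000) -> float:
    """Total wins divided by total games; 1.0 is perfect strength."""
    total_games = games_per_opponent * len(wins_by_opponent)
    return sum(wins_by_opponent.values()) / total_games

# Hypothetical win counts against the three opponents (6,000 games in total).
s = strength({"Random": 1500, "Naive": 600, "JaGo": 300})
```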

28 (81) 28 Player Competence  Strength is not understanding of rules (legality)  E.g. 2 players receive the same score but only one always tried legal moves first  The competence C of a player is defined as follows:  b i = games; i = moves; t ij = #tried illegal moves; k ij = #possible illegal moves  C is averaged over all games

29 (81) 29 Board Representations  19x19 boards → far too large  Even for evolved agents  Use only 5x5 boards

30 (81) 30 Board Representations  Should preprocess position to make the ANN's life easier  Tested in training experiments  Standard Input Representation (SIR)  2 neurons at each intersection:  1 for the player’s piece; 1 for the opponent’s  No distinction between B and W stones  Optional – 1 neuron to tell if B or W  (2*b^2) neurons (where b is board size) = 50

31 (81) 31 Representations - NIR  Naïve Input Representation  More compact  1 neuron per intersection  Set to -1 (player’s stone) or 1 (opponent’s)  0 if empty  Uses half of SIR's neurons = 25
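The two encodings, SIR and NIR, can be compared side by side. In this sketch the board is a list of strings using `"B"`, `"W"` and `"."`, and SIR neurons take the values 0 or 1; the slides fix only the neuron counts and the NIR values, so the rest is assumed.

```python
from typing import List

def sir(board: List[str], player: str) -> List[float]:
    """Standard Input Representation: (own stone, opponent stone) per intersection."""
    opponent = "W" if player == "B" else "B"
    inputs: List[float] = []
    for row in board:
        for cell in row:
            inputs.append(1.0 if cell == player else 0.0)
            inputs.append(1.0 if cell == opponent else 0.0)
    return inputs

def nir(board: List[str], player: str) -> List[float]:
    """Naive Input Representation: -1 own stone, 1 opponent stone, 0 empty."""
    opponent = "W" if player == "B" else "B"
    return [
        -1.0 if cell == player else 1.0 if cell == opponent else 0.0
        for row in board
        for cell in row
    ]

board = ["BW...", ".....", ".....", ".....", "....."]
sir_inputs = sir(board, player="B")   # 2 * 5^2 = 50 values
nir_inputs = nir(board, player="B")   # 5^2 = 25 values
```

Both encodings are relative to the player to move, which is why no distinction between Black and White stones is needed.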

32 (81) 32 Representations - LVIR  Limited View Input Representation  Splits the Go board into several quadratic areas of size 3x3  Idea – simplest way of capturing stones works within this area  E.g. capture of 1 stone by surrounding it  Areas overlap at middle row and middle column  Coding – similar to SIR  w is number of areas (=4)  72 Neurons  Could also be Naïve
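The 72-neuron figure follows from four overlapping 3x3 windows, each coded with two neurons per intersection (4 × 9 × 2). A possible layout is sketched below; the window order and the 0/1 coding are assumptions.

```python
from typing import List

def lvir(board: List[str], player: str, window: int = 3) -> List[float]:
    """Limited View Input Representation: SIR-style coding of overlapping windows."""
    size = len(board)
    step = size - window  # 2 on a 5x5 board, so windows share the middle row/column
    opponent = "W" if player == "B" else "B"
    inputs: List[float] = []
    for top in (0, step):
        for left in (0, step):
            for r in range(top, top + window):
                for c in range(left, left + window):
                    cell = board[r][c]
                    inputs.append(1.0 if cell == player else 0.0)
                    inputs.append(1.0 if cell == opponent else 0.0)
    return inputs

# A single black stone on the center point is seen by all four windows.
board = [".....", ".....", "..B..", ".....", "....."]
lvir_inputs = lvir(board, player="B")
```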

33 (81) 33 Clever Representations  Based on image processing and circuits  We want fewer raw inputs to allow the ANN to concentrate more on features  Manhattan distance  Used in integrated circuits where wires run parallel to X or Y axis  Got its name from Manhattan NY, where streets are aligned in a grid  For P1 = (x1, x2) and P2 = (y1, y2): d(P1, P2) = |x1 - y1| + |x2 - y2|

34 (81) 34 Clever Representations  Manhattan distance is related to distance of Go stones (no diagonals)  distance = [#(separating stones) + 1]  1 if next to each other  2 if separated by one stone  3 for knight’s move or two separating stones
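The three example distances on this slide can be reproduced with the standard Manhattan distance. Coordinates are (row, column) pairs, which is a choice made for this sketch.

```python
from typing import Tuple

def manhattan(p1: Tuple[int, int], p2: Tuple[int, int]) -> int:
    """Sum of the absolute coordinate differences (no diagonal steps)."""
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])

adjacent = manhattan((2, 2), (2, 3))       # stones next to each other
one_between = manhattan((2, 2), (2, 4))    # one intersection in between
knights_move = manhattan((2, 2), (3, 4))   # one step one way, two the other
```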

35 (81) 35 Representations: c-o-Matrix  Co-occurrence-matrices  Used in image processing  Many parameters are derived from it  Mean, Sd, energy, contrast, homogeneity, …  Quadratic  Based on a relation p between image positions (symmetric if p is)

36 (81) 36 Representations: c-o-Matrix  Elements C [i][j] =  Number of times pixels occur in an image of a specified value (color)  In the relation specified by p  Relative to other pixels  Size is number of different colors

37 (81) 37 Representations: c-o-Matrix  An actual go board is an “image” with 3 different colors (including empty)  Example  p1: Manhattan distance of 1 between 2 points  First matrix row:  B near B 16 times  B near W 3 times  B near empty 11 times
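A co-occurrence matrix for a Go position under the relation "Manhattan distance equals d" can be computed as below. Counting ordered pairs, which makes the matrix symmetric, is an assumption; the slide does not say whether pairs are ordered. The row and column order (B, W, empty) follows the example on the slide.

```python
from typing import List

def cooccurrence(board: List[str], distance: int = 1, colors: str = "BW.") -> List[List[int]]:
    """matrix[i][j] = number of ordered point pairs (p, q) at the given
    Manhattan distance where p has color i and q has color j."""
    index = {color: i for i, color in enumerate(colors)}
    n = len(colors)
    matrix = [[0] * n for _ in range(n)]
    size = len(board)
    points = [(r, c) for r in range(size) for c in range(size)]
    for r1, c1 in points:
        for r2, c2 in points:
            if abs(r1 - r2) + abs(c1 - c2) == distance:
                matrix[index[board[r1][c1]]][index[board[r2][c2]]] += 1
    return matrix

# Tiny 2x2 "board" so the counts are easy to verify by hand.
tiny = ["B.", "W."]
m = cooccurrence(tiny, distance=1)
# Row 0 reads: B near B, B near W, B near empty.
```

The matrix has a fixed 3x3 size regardless of board size, which is what makes it attractive as a compact ANN input.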

38 (81) 38 Representations: c-o-Matrix  Does not say much about absolute positions – must combine  SIR and C for whole board  NIR and C for whole board  NIR and Cs for 3x3 areas  sLVIR and Cs for 3x3 areas  NLVIR and Cs for 3x3 areas

39 (81) 39 Output Representations  Only 2  Standard Output Representation (SOR)  Each intersection is represented by 1 neuron  1 for PASS  (b^2 + 1) neurons

40 (81) 40 Output Representations  Row Column Output Representation (RCOR)  Used to decrease ANN size  5 neurons for columns; 5 for rows  1 for PASS  (2b + 1) neurons  Intention more complicated:  PASS intention is square of relevant neuron  RCOR Limits intention map:  v1>v2  y1>y2  v4>v3  All values positive, non-zero
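One plausible way to turn the 2b + 1 RCOR outputs into an intention map is sketched here. The slide states only that the PASS intention is the square of its neuron; taking the intention of a board move as the product of its row and column neurons is an assumption of this example.

```python
from typing import Dict, Hashable, List

def rcor_intentions(rows: List[float], cols: List[float], pass_value: float) -> Dict[Hashable, float]:
    """Build an intention map from row, column and PASS outputs.

    Assumed rule: intention(r, c) = rows[r] * cols[c]; PASS = pass_value ** 2.
    """
    intentions: Dict[Hashable, float] = {
        (r, c): rows[r] * cols[c]
        for r in range(len(rows))
        for c in range(len(cols))
    }
    intentions["PASS"] = pass_value ** 2
    return intentions

rows = [0.5, 1.0, 0.25, 0.5, 0.5]
cols = [0.5, 0.25, 1.0, 0.5, 0.5]
imap = rcor_intentions(rows, cols, pass_value=0.5)
```

Under this assumption the ordering of values along one row is repeated in every other row, so not every intention map can be expressed, which would be consistent with the slide's remark that RCOR limits the intention map.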

41 (81) 41 Coevolution  Derives non-static fitness, as in nature  1 or more populations; interacting  Competitive [battle] vs. Cooperative [subtasks]  Advantages  “Who needs enemies when you got friends like these?” – saves finding opponents; Especially in Go where no strong program exists  Variety in fitness – adaptive opponents  No upper bound for improvement

42 (81) 42 Coevolution Methods Applied  Based on work by Lubberts & Miikkulainen [2001]  Hall of Fame  Host population and Master population  Maintaining the ability of the host population to beat opponents of previous generations  Each generation, the best individual is added to HoF  The whole population competes against a sample of the HoF

43 (81) 43 Coevolution - HoF  Applied in this research  HoF initially filled without competition  Individuals get their fitness by competing against the masters  When full - host with highest win rate (against masters) joins HoF  Replace first Master to lose all games  Coevolutionary progress cannot be directly seen  Both populations constantly changing
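The Hall of Fame update described on this slide might be sketched as follows. The names `update_hall_of_fame` and `lost_all_games` are hypothetical, and leaving the HoF unchanged when no master lost all its games is an assumption, since the slide does not cover that case.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def update_hall_of_fame(
    hof: List[T],
    capacity: int,
    best_host: T,
    lost_all_games: Callable[[T], bool],
) -> None:
    """Add the best host to a fixed-size Hall of Fame (modified in place)."""
    if len(hof) < capacity:
        # Initial filling: no competition needed.
        hof.append(best_host)
        return
    for i, master in enumerate(hof):
        if lost_all_games(master):
            # Replace the first master that lost all its games.
            hof[i] = best_host
            return

hof = ["m1", "m2"]
update_hall_of_fame(hof, capacity=2, best_host="h1", lost_all_games=lambda m: m == "m2")
```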

44 (81) 44 Cultural Coevolution  A new approach!  Maintains “culture” of masters resembling HoF  To enter culture, host must defeat all masters  Masters never replaced – unlimited culture size  Every individual receives a fitness score by competing against all masters  Culture growth rate decreases rapidly  Every new master is the strongest found (yet)

45 (81) 45 Cultural Coevolution [2]  Numerous advantages  Maintains ability to defeat weak players  Keeps good solutions found  Same player cannot enter twice  Needs to defeat itself  Culture’s performance never decreases  Avoid focusing on a specific player’s weakness  As soon as any master is immune, the hosts have to find another way  More masters → less likely to remember all weaknesses
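The entry rule for the culture (a host must defeat all current masters, and masters are never replaced) can be sketched in a few lines. The `beats` callback and the toy players, plain numbers where the larger one wins, are assumptions for illustration only.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def try_enter_culture(culture: List[T], host: T, beats: Callable[[T, T], bool]) -> bool:
    """Append `host` to the culture if it defeats every current master."""
    if all(beats(host, master) for master in culture):
        culture.append(host)   # masters are never removed: unlimited size
        return True
    return False

# Toy model: a player is a number and the larger number wins.
culture: List[int] = []
results = [try_enter_culture(culture, host, lambda a, b: a > b) for host in (3, 2, 5, 5)]
```

In the toy run the second 5 is rejected because it would have to defeat itself, and every accepted master is the strongest found so far.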

46 (81) 46 General Evolution Setup  Opponents – Random; Naïve; JaGo  Fitness = strength  Rate of wins against all 3 opponents  6,000 games of both colors  Not using scores, only win rates  Defeating more opponents is better  Generalized Multi-Layer Perceptrons (GMLPs)  All non-loop connections are permitted  Evolving  Hidden neurons; connections; weights; bias (for non-input)

47 (81) 47 General Evolution Setup [2]  2 binary Chromosomes used  1 for connections : 0-no 1-yes  1 for hidden neurons (if 0, no connections also)  Number of possible connections:  n i, n h, n o – number of input, hidden and output neurons  Determines size of chromosome  Real-Chromosome  Weights & Bias values (seen as weights)  Size is number of connections + number of bias vals (for non-input)
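The formula for the number of possible connections did not survive in the transcript. The sketch below is one reconstruction, assuming that "all non-loop connections" means every feed-forward connection (input to hidden, input to output, hidden to later hidden, hidden to output). With an assumed maximum of 20 hidden neurons it reproduces the 3,010 connections quoted on the Evolution Setup slide, but the formula itself is inferred, not taken from the source.

```python
def max_gmlp_connections(n_i: int, n_h: int, n_o: int) -> int:
    """Feed-forward connections in a generalized MLP (assumed definition)."""
    input_to_rest = n_i * (n_h + n_o)          # inputs feed hidden and output neurons
    hidden_to_hidden = n_h * (n_h - 1) // 2    # each hidden neuron feeds later ones only
    hidden_to_output = n_h * n_o
    return input_to_rest + hidden_to_hidden + hidden_to_output

def layered_connections(n_i: int, n_h: int, n_o: int) -> int:
    """Fully connected 3-layer net, for comparison with the training setup."""
    return n_i * n_h + n_h * n_o

gmlp_max = max_gmlp_connections(n_i=50, n_h=20, n_o=26)   # SIR inputs, SOR outputs
layered = layered_connections(n_i=50, n_h=100, n_o=26)    # trained reference net
```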

48 (81) 48 General Evolution Setup [3]  Tournament selection (size 2)  2 point crossover  Binary mutation  Flip bits with 1/L probability  Real-Chromosome Mutation  multiple-σSA  Each object maintains altering “strategy” params which alter distribution of “object” params  Normal distributions used for both
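The operators for the binary chromosomes named on this slide can be sketched as below. This is a generic textbook version, not the author's implementation; the function names are hypothetical, and the self-adaptive mutation of the real-valued chromosome is left out because the slide gives too little detail.

```python
import random
from typing import Callable, List, Sequence, Tuple

def tournament_select(population: Sequence, fitness: Callable, rng: random.Random, size: int = 2):
    """Draw `size` distinct individuals at random and return the fittest."""
    contestants = rng.sample(list(population), size)
    return max(contestants, key=fitness)

def two_point_crossover(a: List[int], b: List[int], rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def bit_flip_mutation(bits: List[int], rng: random.Random) -> List[int]:
    """Flip each bit independently with probability 1/L (L = chromosome length)."""
    p = 1.0 / len(bits)
    return [1 - bit if rng.random() < p else bit for bit in bits]

rng = random.Random(0)
parent_a, parent_b = [0] * 10, [1] * 10
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
mutated = bit_flip_mutation(child_a, rng)
```

With a flip probability of 1/L, one bit per chromosome changes on average, whatever the chromosome length.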

49 (81) 49 Setup – Recurrent Nets  Difficult to learn Go without structured input  Experiments with recurrent nets included  Allow loops for input Ns  Naturally represent adjacent board intersections  No hidden Ns  Played against JaGo  Typically output changes without input change due to feedback loops  Computed output only once!  Only 2 directly connected Ns influence each other  Evolution should connect only close Ns

50 (81) 50 Outline  Computers and Games  The game of Go  Experimental Setup  Training of Go playing ANNs  Evolution of Go Playing ANNs  Summary and Outlook

51 (81) 51 Training ANNs – Setup  Testing IRs mentioned previously  No Go-specific knowledge used  Each experiment was repeated 20 times  Nets, same as Richards [1998]  3 layers; Fully connected; Feed forward  Linear activation for input Ns; Sigmoid for rest  50 input; 26 output; 100 hidden - 7,600 connections  Patterns:  JaGo vs JaGo; 5x5 board;  Rprop – resilient variant of Backprop

52 (81) 52 Training ANNs – Experiment 1  Determine number of training cycles  Too few cycles → weights not adjusted properly  Too many → over-fitting  Determine training pattern set  Limit the level a Go player can reach  Should include all 3 game stages  Both expert and novice moves  JaGo vs JaGo  All game stages  No distinction between winner and loser moves  1,000 to 5,000 cycles; 50/100/200 games

53 (81) 53 Training ANNs – Results 1  Average of 20 runs  100 & 200 games better than 50  3,000/5,000 cycles don’t add strength  Best – 200 games; 2,000 cycles  Used hereafter

54 (81) 54 Training ANNs – Experiment 2  Determine number of hidden Ns  Many → diverse features  Few → few stronger features (perhaps better ones)  Less time-consuming  100 Ns yielded best results → selected

55 (81) 55 Training ANNs – Experiment 3  Output representations  Standard (SOR) vs Row-Column (RCOR)  200 games; 2,000 cycles; 100 hidden Ns  Similar strength; RCOR competence slightly lower  RCOR still expensive and adds constraints  SOR is used in the following experiments

56 (81) 56 Training ANNs – IR Experiments  Various input representations  Used reference-ANN (RANN)  SIR & SOR; 100 hidden; 7,600 connections  Strength = 0.2908; Competence = 0.8467  200 games; 2,000 cycles  NIR (half input size) & SOR  Strength = 0.2093; Competence = 0.8031  Naïve input makes it difficult to learn Go  LVIR (3x3 windows) & SOR  Strength = 0.2755; Competence = 0.8258  Slightly lower; LVIR doesn’t add input difficulty

57 (81) 57 Training ANNs – IRs [2]  Whole co-occurrence matrix (dist=1,2,3); SIR&SOR  Found better Strength & Competence!  Knight’s-Move matrix adds relevant information  Whole matrix (dist=1,2,3); NIR&SOR  21% fewer connections due to NIR  Better than standard NIR, but still low

58 (81) 58 Training ANNs – IRs [3]  3x3 matrices (dist=1,2,3); NIR&SOR  Low but ~20% better than previous (whole matrix) NIR  3x3 matrices (dist=1,2,3); LVIR/NLVIR  Both matrices and board views use 3x3 windows  No improvement; Huge number of Ns not necessary

59 (81) 59 Training ANNs – IRs Summary

60 (81) 60 Training ANNs – IRs Summary  Trained ANNs better against JaGo compared to Naïve  Although JaGo is better  Some over-fitting for good players  Against Naïve outputs close to zero – no response  NIR ANNs generally weaker than SIR  Manhattan distance of 2 good against Random  IR + whole matrix (dist=2) was strongest  RANN is still best; Selected for evolution

61 (81) 61 Outline  Computers and Games  The game of Go  Experimental Setup  Training of Go playing ANNs  Evolution of Go Playing ANNs  Summary and Outlook

62 (81) 62 Evolving Go ANNs  Setup of Evolution experiments  Evolution of ANNs against Computer Players  Random Player; Naïve; JaGo  Recurrent against JaGo  Coevolution  Cultural  Hall of Fame  Training Evolved ANNs

63 (81) 63 Evolution Setup  5x5 boards; Komi of 5.5  50 Individuals  Described previously (3 chromosomes)  GMLPs with SIR and SOR  Max 3,010 connections  Recurrent ANNs  Using NIR (25 Ns) and SOR (26)  Max 2,601 connections  Same strength measure as training (6k games)

64 (81) 64 Evolution Against Random  Empirically 64 games to determine fitness  Best ANN evolved {Str=0.4005; Comp=0.48}  After 47 gens; 929 connections  Evolved ANNs hardly reacted to different positions  Always in the middle; Never in corners – creates eyes  Unnecessary to “think” against Random  Occasionally Random places at strategic intersection and then usually wins  Only 3 of 20 best ANNs open at optimal C3

65 (81) 65 Evolution Against Naïve  Better player; ANNs develop better strategies  Same setting  200 gens for ALL of the population to win ½ of games – fast learning  Best {Str=0.69; Comp=0.487} after 2,915 gens  High strength and only 10 hidden !!  Win rates  Same against Naïve and Random  Low against JaGo (~0.2)  25% use optimal opening move (still low)  Exploit Naïve’s weaknesses at endgames

66 (81) 66 Evolution Against JaGo  Far stronger than Naïve (85% wins)  Takes significantly more time for each move  Used distributed computing  64 games would take 32 hours per run  Only 32 games for fitness - empirically sufficient  Best {Str=0.772; Comp=0.476} after 1909 gens  Scores 100% wins  1k gens to score 0.4;  In 4 runs 100% wins in 3k gens!!!  Sd twice as large – harder for evolution  Weak against Naïve ~0.4;Strong against Random

67 (81) 67 Evolution Against JaGo  Again, low competence ~0.5  Evolved strategies  Still connecting stones but faster (responsive)  Tenuki (abandon & play elsewhere) to distract JaGo  9 open optimally; All in 3x3 area around center  Strength depends heavily on opening move  Mid games sometimes show standard Go sequences!  Take advantage of JaGo’s weakness – capturing weak stones

68 (81) 68 Recurrent Nets Evolution  Natural representation on Go board  Input are connected  More time consuming  Only 2 runs; 32 games; setting described previously  100% win rate within 1k generations!!!  Both nets open at C3  Strategies  1 aggressive;1 distractive  Protect; Create living groups; Bad Endgames  Very high relative strength  0.94 Random; 0.49 Naïve (never played before)

69 (81) 69 Cultural Coevolution  Until now much over-fitting was observed  Fitness  8 games against all masters (4 each color)  Few because games are quite similar  Results of typical run – host population  3,500 gens  90% wins at 500 gens  Stagnation around 1k  Last master added at 462  After 2k Mean fitness decreases

70 (81) 70 Cultural Coevolution [2]  Masters  21 ANNs  After number 8 all have R>0.8  Last obtained Strength of 0.365  Strategy (both populations)  Many random move selections  Due to many saturated Ns (output=1)  Games usually similar but multiple random moves are hard to defeat  May be caused by mutation (multiple-σ self-adaptation)

71 (81) 71 Cultural Coevolution [3]  Strategy (cont.)  Coevolution found easy solution  Computer players are very difficult to beat with saturated neurons  New extremely long experiment (60k gens!) was performed with different mutation (single-SA)  Similar results, Except:  Now most culture growth until gen 10k (last at 40k)  Now less saturated Neurons  Less fitness decrease despite increasing culture Strength

72 (81) 72 Cultural Coevolution [4]  Culture Summary  80 members  After #16 Random>0.94  After #29 all opened optimally  After #57 all Strength>0.4  Wins against JaGo ~0.5 Naïve  ~15 hidden Ns – fluctuate between successive

73 (81) 73 Recurrent & Cultural  10k gens  Faster learning but basically same results  R>0.9 at C11 (compared to C14)  N>0.2 at 14 (compared to C37)  Strategy  Still bad against JaGo  Bad openings! (only 2% optimal)  Only last 5 masters close to center  Learned not to capture dead groups

74 (81) 74 Hall of Fame Coevolution  Compared to Cultural  Parameters  Important parameter is HoF size={1,2,4,8,16}  Eight games against each master  3k gens were coevolved  After coevolution all HoF ANNs were evaluated  Every 100 gens the best ANN was evaluated

75 (81) 75 Hall of Fame Coevolution [2]  Results – HoF size 1  Masters – low strength of 0.3625  In gen 1k – one ANN had 0.4  Lost solution  HoF changed every generation → cycles  Results – HoF size 16  Master 5 – highest strength of 0.4403 in gen 400  Strength of 0.5057 was obtained and lost  One master was replaced in every generation!  Somehow weak masters remained in the HoF  Host population stagnates (cycles)

76 (81) 76 Hall of Fame Coevolution [3]  Strategies  All place first stone at D4!  HoF coevolution does not encourage diversity among ANNs

77 (81) 77 Training Evolved ANNs  Evolution against JaGo –  Strength ~0.77  4-16 hidden Ns  Training  Strength ~0.3  100 hidden Ns  Check whether evolved structure is good  Train after evolution  Train without evolution only using structure

78 (81) 78 Training Evolved ANNs [2]  Used best 2 evolved ANNs against JaGo  Taken from runs 11 & 17  ANN11 – 10 hidden; 1178 connections  ANN17 – 14 hidden; 1162 connections  Trained with 200 games; 2,000 cycles  Experiment 1 (post-evolution) Results  Bad!  Strength of 0.11 and 0.10 –  Lower than any trained ANN (RANN has 0.29)  High competence 0.89

79 (81) 79 Training Evolved ANNs [3]  Experiment 2 – keep only evolved structure  Strength below 0.152 (RANN is 0.29)  Weakest against JaGo (0.05) although trained with JaGo  Against Naïve 0.11 (same as RANN)  Evolution creates efficient structures  Few hidden Ns  Difficult to learn with training  High competence because they seldom responded with the same move to different positions

80 (81) 80 Summary  Training could not achieve high Go playing skills  Evolved ANNs specialized in the opponent which was used during evolution  Cultural coevolution generated strong players  Strength increasing throughout the process  Perhaps an ANN stronger than amateurs can be coevolved  Recurrent nets learned faster

81 (81) 81 Summary [2]  2 coevolved ANNs (recurrent and feed-forward) won the grand tournament  Coevolution proved better than evolution for developing Go strategies  Recurrent ANNs would provide a field for further research  More natural board representation  Could contain a fixed input layer representing the board

