1
1 of 81 Evolution and Coevolution of ANNs playing Go Peter Mayer, 2004
2
(81) 2 Outline Computers and Games The game of Go Experimental Setup Training of Go playing ANNs Evolution of Go Playing ANNs Summary and Outlook
3
(81) 3 Games Algorithms designed since AI's onset Clearly defined rules Still complex Chess received the most attention More researched than Go Two main approaches Rely on expertise – directly programmed weighted features; Extensive knowledge Use evolution – less knowledge; more versatility
4
(81) 4 The game of Go Oldest (unaltered) strategic board game in the world 10,000,000 players in Japan “alone” Fairly simple rules BUT difficult to master Immense tree (~200 opts) Complex structures Many concurrent goals
5
(81) 5 Go Rules 19x19 board Empty in the beginning Black & White “stones” Black starts Each turn Place 1 stone At an intersection Never move stones OR pass
6
(81) 6 Go Rules [2] Objective - Get the most points ! Points are acquired by: Securing Territories Capturing opp’s pieces
7
(81) 7 Go Rules [3] Stones of the same color at vertically or horizontally adjacent intersections form a group An empty intersection adjacent to a stone or group is called a "liberty" of that group 1 liberty = group in “atari” No liberties -> CAPTURE ! Group is removed Example – Black places stone in X resulting in right figure
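The group and liberty rules on this slide can be sketched with a flood fill. This is only an illustration: the board encoding (`"B"`, `"W"`, `"."`) and the function names are assumptions, not taken from the presentation.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]
EMPTY = "."  # assumed encoding: "B" = black, "W" = white, "." = empty


def neighbors(p: Point, size: int) -> List[Point]:
    """Vertically and horizontally adjacent intersections (no diagonals)."""
    r, c = p
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(y, x) for y, x in candidates if 0 <= y < size and 0 <= x < size]


def group_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Return the group containing `start` and the set of its liberties."""
    size = len(board)
    color = board[start[0]][start[1]]
    if color == EMPTY:
        raise ValueError("start must be an occupied intersection")
    group, liberties, stack = {start}, set(), [start]
    while stack:
        for q in neighbors(stack.pop(), size):
            value = board[q[0]][q[1]]
            if value == EMPTY:
                liberties.add(q)          # empty neighbor is a liberty
            elif value == color and q not in group:
                group.add(q)              # same-colored neighbor joins the group
                stack.append(q)
    return group, liberties


# White stone at (2, 2) has a single liberty left, so it is in atari.
board = [
    ".....",
    "..B..",
    ".BW..",
    "..B..",
    ".....",
]
white_group, white_liberties = group_and_liberties(board, (2, 2))
in_atari = len(white_liberties) == 1
```

A group whose liberty set becomes empty after the opponent's move would be removed from the board.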
8
(81) 8 Go Rules [4] Stones can be placed anywhere, but cannot commit suicide (except Chinese) Legal if stone simultaneously captures opponent’s group (2 right figures) Suicide – white cannot place at X White CAN place at X Result: capture
9
(81) 9 Go Rules [5] Same position cannot occur more than once Endless repetitions: Black can capture at upper figure by placing at X White - same by placing at Y Black – repeat… Ko rule White may not place at Y before playing somewhere else first Avoid any repetitions
10
(81) 10 Go Rules – Live and Dead groups “Dead” groups if impossible to prevent capture It is not necessary to do so Group remains on board At end of game, removed and added to captured stones “Living” groups are impossible to capture Group with 2 “eyes” – even if white surrounds it, playing at X or Y is suicide Opponent must play elsewhere
11
(81) 11 Go Basics – End game Play continues until both players pass Players then alternately play stones at “neutral” points – adjacent to both White and Black Also known as “dame” (DAH-MAY) Dead stones are removed from the board and counted with other prisoners (1 point per prisoner) Also - 1 point for each intersection surrounded by player’s stones (“territory”)
12
(81) 12 Go Basics – End game example Prisoners were removed already All 4 points marked X are dame – worthless Black has 7 points in UR (territory); 2 points in LL 1 removed prisoner TOTAL = 10 points White has 5 in UL; 2 in LR 2 prisoners TOTAL = 9 points Black wins unless komi (5.5 pts compensation) is due
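The tally in this example can be checked with a few lines of arithmetic. The function name is hypothetical, and adding komi to White's total is the usual convention, assumed here.

```python
from typing import Sequence


def final_score(territories: Sequence[int], prisoners: int, komi: float = 0.0) -> float:
    """Territory points plus one point per prisoner, plus any komi."""
    return sum(territories) + prisoners + komi


black = final_score([7, 2], prisoners=1)                 # 10 points
white = final_score([5, 2], prisoners=2)                 # 9 points
white_with_komi = final_score([5, 2], prisoners=2, komi=5.5)

black_wins_without_komi = black > white
black_wins_with_komi = black > white_with_komi
```

Without komi Black is ahead by one point; with 5.5 points of komi added to White's total the result flips.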
13
(81) 13 Ranking and Handicaps Determine Go players’ strength Resemblance to martial arts Both amateur and professional ranking system Amateur 35 kyu to 1 kyu THEN 1 dan to 7 dan Pro 1 dan to 9 dan Awarded only by Go institutions Pro dans are much stronger than amateur dans
14
(81) 14 Ranking and Handicaps (2) Handicaps Weaker player starts with several stones on the board Placed at specific places Helps make games more even Difference in ranks ~ number of handicap stones needed to win 2 stones to even 2 dan against 4 dan 4 to even 3 kyu and 2 dan The most powerful Go programs reach only … … 10 kyu!
15
(81) 15 Outline Computers and Games The game of Go Experimental Setup Training of Go playing ANNs Evolution of Go Playing ANNs Summary and Outlook
16
(81) 16 Experimental Setup Opponent Go players ANN player Go board (input) representations Move (output) representations Coevolution Hall of Fame coevolution Cultural coevolution General evolution setup
17
(81) 17 Go Players - Random No strategy Pass move also “Knows” only the rules of go legality of moves Usually weakest opponent
18
(81) 18 Go Players – Naïve Player Roughly human-beginner level Able to save and capture stones Knows about Lost stones Saving - connecting stones to living groups Weak stones (not savable)
19
(81) 19 Go Players – Naïve Strategy A subset of JaGo’s (main opponent) strategy Outline (arranged by priority): Attempt to save Try to put opponent into atari Connect weak stones Capture opponent groups in atari Check intersections for placing stones In random order Make sure no (own) liberties decrease below 2 as a result Perform Random move
20
(81) 20 Go Players – JaGo Player Java based program Best computer player used Not a strong player ~16 kyu Knows standard techniques Mainly save & capture Uses pattern matching Looks at entire board 32 patterns, with rotations and mirrors
21
(81) 21 Go Players – JaGo Strategy (1) Save stones in atari Try to decrease liberties of large groups Find own savable larger groups Attack opponent’s groups (decreasing order:) With 2 or more liberties and attackable With 2 or more stones & less than 3 liberties With 2 or less liberties
22
(81) 22 Go Players – JaGo Strategy (2) Save own groups with few liberties if savable Start pattern matching – Response; Center Random move order Seek opponent’s groups to capture in 2 moves Perform random move which isn’t of a bad pattern Capture opponent’s single liberties Connect own weak stones PASS
23
(81) 23 Go Players – JaGo Patterns (1)
24
(81) 24 Go Players – JaGo Patterns (2)
25
(81) 25 Go Players – GNU Go Advantages 5x5 to 19x19 boards Handles handicaps well Rated 10 kyu Problems 5x5 solved – open at C3 for 18.5 points (komi=5.5) – Black always wins GNU Go passes on B3, C2-4, D3 (only correct at C3) Premature convergence of evolution
26
(81) 26 ANN Player Inform ANN about actual position Evaluate ANN output to receive next move Representation is important! Intention maps For each Go move (including PASS) – value between [0,1] High value – high intention to make move (and v.v) Select legal move with highest value To avoid predictability – consider sub optimal moves also (“creativity factor”)
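A minimal sketch of reading a move off an intention map follows. The slide does not define the "creativity factor", so the rule used here (pick at random among legal moves whose value lies within `creativity` of the best one) is an assumption, as are all names.

```python
import random
from typing import Dict, Iterable, Optional


def select_move(
    intentions: Dict[str, float],
    legal_moves: Iterable[str],
    creativity: float = 0.0,           # assumed meaning: tolerated gap to the best value
    rng: Optional[random.Random] = None,
) -> str:
    """Pick a legal move with the highest (or near-highest) intention value."""
    rng = rng or random.Random()
    legal = [m for m in legal_moves if m in intentions]
    if not legal:
        return "PASS"                  # fall back to passing if nothing is legal
    best = max(intentions[m] for m in legal)
    candidates = [m for m in legal if intentions[m] >= best - creativity]
    return rng.choice(candidates)


# "B2" has the highest intention but is illegal here, so "C3" is chosen.
intentions = {"C3": 0.90, "B2": 0.95, "PASS": 0.10}
move = select_move(intentions, legal_moves=["C3", "PASS"])
```

With `creativity=0.0` the choice is deterministic unless several legal moves tie for the best value.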
27
(81) 27 Player Strength To receive a rating, unrated Go players commonly play against rated players (same in Chess) The strength s of a player is determined by The score of 1000 double games Against each of 3 opponents: R, N, JaGo Divided by the number of games (6,000) 1 is perfect strength 3 opponents help resist over-fitting
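The strength measure can be written out as below. The sketch assumes that "score" means the number of games won and that a double game is two games (one per color); the win counts in the example are made up for illustration.

```python
from typing import Dict


def strength(wins: Dict[str, int], double_games: int = 1000) -> float:
    """Wins summed over all opponents, divided by the total number of games."""
    games_per_opponent = 2 * double_games        # one game with each color
    total_games = games_per_opponent * len(wins)  # 6,000 for three opponents
    return sum(wins.values()) / total_games


# Hypothetical win counts out of 2,000 games per opponent.
example = strength({"Random": 1800, "Naive": 1200, "JaGo": 600})
```

A player that wins every game against all three opponents would reach the perfect strength of 1.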
28
(81) 28 Player Competence Strength is not understanding of rules (legality) E.g. 2 players receive same score but only one always tried legal moves first The competence C of a player is defined as follows: b i = games; i = moves; t_ij = number of tried illegal moves; k_ij = number of possible illegal moves C is averaged over all games
29
(81) 29 Board Representations 19x19 boards far too large Even for evolved agents Use only 5x5 boards
30
(81) 30 Board Representations Should preprocess position to make the ANN's life easier Tested in training experiments Standard Input Representation (SIR) 2 neurons at each intersection: 1 for the player’s piece; 1 for the opponent’s No distinction between B and W stones Optional – 1 neuron to tell if B or W (2*b^2) neurons (where b is board size) = 50
31
(81) 31 Representations - NIR Naïve Input Representation More compact 1 neuron per intersection Set to -1 (player’s stone) or 1 (opponent’s) 0 if empty Uses half of SIR's neurons = 25
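The two encodings can be compared side by side. This is a sketch only: the board encoding and function names are assumptions, and the slides do not say which value an active SIR neuron takes (1.0 is assumed here).

```python
from typing import List


def encode_sir(board: List[str], player: str) -> List[float]:
    """SIR: two neurons per intersection (own stone, opponent stone)."""
    opponent = "W" if player == "B" else "B"
    vector: List[float] = []
    for row in board:
        for cell in row:
            vector.append(1.0 if cell == player else 0.0)    # assumed activation
            vector.append(1.0 if cell == opponent else 0.0)
    return vector


def encode_nir(board: List[str], player: str) -> List[float]:
    """NIR: one neuron per intersection, -1 own stone, 1 opponent, 0 empty."""
    opponent = "W" if player == "B" else "B"
    values = {player: -1.0, opponent: 1.0}
    return [values.get(cell, 0.0) for row in board for cell in row]


board = [
    "W....",
    ".....",
    "..B..",
    ".....",
    ".....",
]
sir = encode_sir(board, player="B")   # 2 * 5^2 = 50 inputs
nir = encode_nir(board, player="B")   # 5^2 = 25 inputs
```

Both encodings are relative to the player to move, which is why neither distinguishes Black from White directly.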
32
(81) 32 Representations - LVIR Limited View Input Representation Splits the Go board into several square areas of size 3x3 Idea – simplest way of capturing stones works within this area E.g. capture of 1 stone by surrounding it Areas overlap at middle row and middle column Coding – similar to SIR 2 × 9 × w neurons, where w is the number of areas (=4): 72 neurons Could also be Naïve
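A sketch of the limited-view encoding on a 5x5 board: four 3x3 windows anchored at the corners, overlapping on the middle row and column, each coded with two neurons per intersection. The window order and all names are assumptions.

```python
from typing import List


def encode_lvir(board: List[str], player: str, window: int = 3) -> List[float]:
    """Encode four overlapping corner windows, SIR-style (2 neurons per point)."""
    size = len(board)
    opponent = "W" if player == "B" else "B"
    step = size - window                      # 2 on a 5x5 board
    origins = [(r, c) for r in (0, step) for c in (0, step)]
    vector: List[float] = []
    for top, left in origins:
        for r in range(top, top + window):
            for c in range(left, left + window):
                cell = board[r][c]
                vector.append(1.0 if cell == player else 0.0)
                vector.append(1.0 if cell == opponent else 0.0)
    return vector


board = [
    ".....",
    ".....",
    "..B..",
    ".....",
    ".....",
]
lvir = encode_lvir(board, player="B")   # 4 windows * 9 points * 2 neurons = 72
```

The center stone lies in all four windows, so it activates four input neurons.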
33
(81) 33 Clever Representations Based on image processing and circuits We want fewer raw inputs to allow ANN to concentrate more on features Manhattan distance Used in integrated circuits where wires run parallel to X or Y axis Got its name from Manhattan NY, where streets are aligned in grid P1 = (x1, x2) P2 = (y1, y2)
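The formula itself did not survive in the transcript. For the two points named on the slide, the standard definition of Manhattan distance is:

```latex
d(P_1, P_2) = |x_1 - y_1| + |x_2 - y_2|,
\qquad P_1 = (x_1, x_2),\; P_2 = (y_1, y_2)
```

For example, two intersections offset by a knight's move (one step along one axis, two along the other) are at distance 1 + 2 = 3.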
34
(81) 34 Clever Representations Manhattan distance is related to distance of Go stones (no diagonals) distance = [#(separating stones) + 1] 1 if next to each other 2 if separated by one stone 3 for knight’s move or two separating stones
35
(81) 35 Representations: c-o-Matrix Co-occurrence-matrices Used in image processing Many parameters are derived from it Mean, Sd, energy, contrast, homogeneity, … Square Based on a relation p between image positions (symmetric if p is)
36
(81) 36 Representations: c-o-Matrix Element C[i][j] = Number of times a pixel of value (color) i occurs in the relation specified by p to a pixel of value j Size is number of different colors
37
(81) 37 Representations: c-o-Matrix An actual go board is an “image” with 3 different colors (including empty) Example p1: Manhattan distance of 1 between 2 points First matrix row: B near B 16 times B near W 3 times B near empty 11 times
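A minimal sketch of such a co-occurrence matrix for a Go position, using the relation "Manhattan distance equals d". Counting ordered pairs (so each neighboring pair contributes in both directions and the matrix is symmetric) is an assumption; the slide does not state the counting convention.

```python
from itertools import product
from typing import List

COLORS = ("B", "W", ".")   # assumed order: black, white, empty


def cooccurrence(board: List[str], distance: int = 1) -> List[List[int]]:
    """C[i][j] = number of ordered point pairs (p, q) at the given Manhattan
    distance where p has color i and q has color j."""
    index = {color: i for i, color in enumerate(COLORS)}
    matrix = [[0] * len(COLORS) for _ in COLORS]
    points = list(product(range(len(board)), range(len(board[0]))))
    for (r1, c1), (r2, c2) in product(points, repeat=2):
        if abs(r1 - r2) + abs(c1 - c2) == distance:
            matrix[index[board[r1][c1]]][index[board[r2][c2]]] += 1
    return matrix


# Tiny 2x2 position to keep the counts easy to verify by hand.
tiny = [
    "B.",
    "BW",
]
c1 = cooccurrence(tiny, distance=1)
```

The first row of `c1` reads "B near B", "B near W", "B near empty", in the same layout as the example on this slide.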
38
(81) 38 Representations: c-o-Matrix Does not say much about absolute positions – must combine SIR and C for whole board NIR and C for whole board NIR and Cs for 3x3 areas sLVIR and Cs for 3x3 areas NLVIR and Cs for 3x3 areas
39
(81) 39 Output Representations Only 2 Standard Output Representation (SOR) Each intersection is represented by 1 neuron 1 for PASS (b^2 + 1) neurons
40
(81) 40 Output Representations Row Column Output Representation (RCOR) Used to decrease ANN size 5 neurons for columns; 5 for rows 1 for PASS (2b + 1) neurons Intention more complicated: PASS intention is square of relevant neuron RCOR Limits intention map: v1>v2 y1>y2 v4>v3 All values positive, non-zero
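One way to read the row-column scheme is sketched below. The slide only states that the PASS intention is the square of its neuron; taking the intention of a board move as the product of its row and column neurons is an assumption made here for illustration, as are the names.

```python
from typing import Dict, List, Tuple


def rcor_intentions(
    rows: List[float], cols: List[float], pass_value: float
) -> Tuple[Dict[Tuple[int, int], float], float]:
    """Expand 2b + 1 outputs into an intention per intersection plus PASS."""
    moves = {
        (r, c): rows[r] * cols[c]          # assumed combination rule
        for r in range(len(rows))
        for c in range(len(cols))
    }
    return moves, pass_value ** 2          # PASS: square of its own neuron


rows = [0.2, 0.9, 0.5, 0.4, 0.1]
cols = [0.3, 0.6, 0.8, 0.7, 0.2]
moves, pass_intention = rcor_intentions(rows, cols, pass_value=0.5)
```

Under this product assumption, with all outputs positive, the ordering of columns is the same in every row, so many intention maps that the standard representation can express are not reachable.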
41
(81) 41 Coevolution Derives non-static fitness, as in nature 1 or more populations; interacting Competitive [battle] vs. Cooperative [subtasks] Advantages “Who needs enemies when you got friends like these?” – saves finding opponents; Especially in Go where no strong program exists Variety in fitness – adaptive opponents No upper bound for improvement
42
(81) 42 Coevolution Methods Applied Based on work by Lubberts & Miikkulainen [2001] Hall of Fame Host population and Master population Maintaining the ability of host population to beat opponents of previous generations Each generation, the best individual is added to HoF The whole population competes against a sample of the HoF
43
(81) 43 Coevolution - HoF Applied in this research HoF initially filled without competition Individuals get their fitness by competing against the masters When full - host with highest win rates (against masters) joins HoF Replace first Master to lose all games Coevolutionary progress cannot be directly seen Both populations constantly changing
44
(81) 44 Cultural Coevolution A new approach! Maintains “culture” of masters resembling HoF To enter culture, host must defeat all masters Masters never replaced – unlimited culture size Every individual receives a fitness score by competing against all masters Culture growth rate decreases rapidly Every new master is the strongest found (yet)
45
(81) 45 Cultural Coevolution [2] Numerous advantages Maintains ability to defeat weak players Keeps good solutions found Same player cannot enter twice Needs to defeat itself Culture’s performance never decreases Avoid focusing on a specific player’s weakness As soon as any master is immune, the hosts have to find another way More masters less likely to remember all weaknesses
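The entry rule of cultural coevolution can be sketched as follows. A single `beats(host, master)` call stands in for the several games actually played per master, and admitting the first host into an empty culture is an assumption; all names are hypothetical.

```python
from typing import Callable, List, TypeVar

Player = TypeVar("Player")


def fitness(host: Player, culture: List[Player],
            beats: Callable[[Player, Player], bool]) -> float:
    """Fraction of masters the host defeats."""
    if not culture:
        return 0.0
    return sum(1 for master in culture if beats(host, master)) / len(culture)


def try_enter_culture(host: Player, culture: List[Player],
                      beats: Callable[[Player, Player], bool]) -> bool:
    """A host joins only if it defeats every master; masters are never removed."""
    if all(beats(host, master) for master in culture):
        culture.append(host)
        return True
    return False


# Toy stand-in: players are numbers and the larger number wins.
def stronger(a: int, b: int) -> bool:
    return a > b


culture: List[int] = []
results = [try_enter_culture(p, culture, stronger) for p in (3, 3, 2, 5)]
```

In the toy run the second `3` is rejected because it cannot defeat its own copy already in the culture, which mirrors the "same player cannot enter twice" point above.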
46
(81) 46 General Evolution Setup Opponents – Random; Naïve; JaGo Fitness = strength Rate of wins against all 3 opponents 6,000 games of both colors Not using scores, only win rates Defeating more opponents is better Generalized Multi-Layer Perceptrons (GMLPs) All non-loop connections are permitted Evolving Hidden neurons; connections; weights; bias (for non-input)
47
(81) 47 General Evolution Setup [2] 2 binary Chromosomes used 1 for connections : 0-no 1-yes 1 for hidden neurons (if 0, no connections also) Number of possible connections: n_i, n_h, n_o – number of input, hidden and output neurons Determines size of chromosome Real-Chromosome Weights & Bias values (seen as weights) Size is number of connections + number of bias vals (for non-input)
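The slide's formula for the number of possible connections is missing from the transcript. The sketch below is one counting rule for a loop-free generalized MLP: inputs feed hidden and output neurons, each hidden neuron may feed any later hidden neuron, and hidden neurons feed outputs. It is a reconstruction, not the author's stated formula; with 50 inputs, 26 outputs and an assumed maximum of 20 hidden neurons it reproduces the 3,010 maximum quoted later in the evolution setup.

```python
def max_connections_gmlp(n_in: int, n_hidden: int, n_out: int) -> int:
    """Upper bound on feed-forward (loop-free) connections under the
    assumed rule: in->out, in->hidden, hidden->later hidden, hidden->out."""
    input_to_output = n_in * n_out
    through_hidden = (n_in + n_out) * n_hidden
    hidden_to_later_hidden = n_hidden * (n_hidden - 1) // 2
    return input_to_output + through_hidden + hidden_to_later_hidden


def connections_three_layer(n_in: int, n_hidden: int, n_out: int) -> int:
    """Fully connected 3-layer net, as used in the training experiments."""
    return n_in * n_hidden + n_hidden * n_out


gmlp_max = max_connections_gmlp(50, 20, 26)       # 20 hidden is an assumption
trained_net = connections_three_layer(50, 100, 26)
```

The second helper matches the 7,600 connections given for the fully connected training networks (50 inputs, 100 hidden, 26 outputs).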
48
(81) 48 General Evolution Setup [3] Tournament selection (size 2) 2 point crossover Binary mutation Flip bits with 1/L probability Real-Chromosome Mutation multiple-σSA Each object maintains altering “strategy” params which alter distribution of “object” params Normal distributions used for both
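The binary-chromosome operators listed on this slide can be sketched in a few lines. Function names are hypothetical, and the self-adaptive mutation of the real-valued chromosome is not covered here.

```python
import random
from typing import Callable, List, Tuple

Bits = List[int]


def tournament_select(population: List[Bits], fitness: Callable[[Bits], float],
                      rng: random.Random, size: int = 2) -> Bits:
    """Draw `size` distinct individuals at random and keep the fittest."""
    return max(rng.sample(population, size), key=fitness)


def two_point_crossover(a: Bits, b: Bits, rng: random.Random) -> Tuple[Bits, Bits]:
    """Swap the segment between two randomly chosen cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def bit_flip_mutation(bits: Bits, rng: random.Random) -> Bits:
    """Flip each bit independently with probability 1/L (L = chromosome length)."""
    p = 1.0 / len(bits)
    return [1 - bit if rng.random() < p else bit for bit in bits]


rng = random.Random(42)
population = [[0, 0, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]
parent_a = tournament_select(population, sum, rng)   # fitness = number of ones (toy)
parent_b = tournament_select(population, sum, rng)
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
mutant = bit_flip_mutation(child_a, rng)
```

With a mutation rate of 1/L, one bit per chromosome is flipped on average.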
49
(81) 49 Setup – Recurrent Nets Difficult to learn Go without structured input Experiments with recurrent nets included Allow loops for input Ns Naturally represent adjacent board intersections No hidden Ns Played against JaGo Typically output changes without input change due to feedback loops Computed output only once! Only 2 directly connected Ns influence each other Evolution should connect only close Ns
50
(81) 50 Outline Computers and Games The game of Go Experimental Setup Training of Go playing ANNs Evolution of Go Playing ANNs Summary and Outlook
51
(81) 51 Training ANNs – Setup Testing IRs mentioned previously No Go-specific knowledge used Each experiment was repeated 20 times Nets, same as Richards [1998] 3 layers; Fully connected; Feed forward Linear activation for input Ns; Sigmoid for rest 50 input; 26 output; 100 hidden - 7600 connections Patterns: JaGo vs JaGo; 5x5 board; Rprop – resilient variant of Backprop
52
(81) 52 Training ANNs – Experiment 1 Determine number of training cycles Too few cycles Weights not adjusted properly Too many over-fitting Determine training pattern set Limit the level a Go player can reach Should include all 3 game stages Both expert and novice moves JaGo vs JaGo All game stages No distinction between winner and loser moves 1,000.. 5,000 Cycles; 50/100/200 Games
53
(81) 53 Training ANNs – Results 1 Average of 20 runs 100 & 200 games better than 50 3000/5000 cycles don’t add strength Best – 200 games; 2000 cycles Used hereafter
54
(81) 54 Training ANNs – Experiment 2 Determine number of hidden Ns Many Diverse features Few Few stronger features (perhaps better ones) Less time-consuming 100 Ns yielded best results selected
55
(81) 55 Training ANNs – Experiment 3 Output representations Standard (SOR) vs Row-Column (RCOR) 200 games; 2,000 cycles; 100 hidden Ns Similar strength; RCOR competence slightly lower RCOR still expensive and adds constraints SOR is used in the following experiments
56
(81) 56 Training ANNs – IR Experiments Various input representations Used reference-ANN (RANN) SIR & SOR; 100 hidden; 7,600 connections Strength = 0.2908; Competence = 0.8467 200 games; 2,000 cycles NIR (half input size) & SOR Strength = 0.2093; Competence = 0.8031 Naïve input makes it difficult to learn Go LVIR (3x3 windows) & SOR Strength = 0.2755; Competence = 0.8258 Slightly lower; LVIR doesn’t add input difficulty
57
(81) 57 Training ANNs – IRs [2] Whole Co-occur-matrix (dist=1,2,3); SIR&SOR Found better Strength & Competence! Knight’s-Move matrix adds relevant information Whole matrix (dist=1,2,3); NIR&SOR 21% fewer connections due to NIR Better than standard NIR, but still low
58
(81) 58 Training ANNs – IRs [3] 3x3 matrices (dist=1,2,3) ; NIR&SOR Low but ~20% better than previous (whole matrix) NIR 3x3 matrices (dist=1,2,3) ; LVIR\NLVIR Both matrices and board views use 3x3 windows No improvement; Huge number of Ns not necessary
59
(81) 59 Training ANNs – IRs Summary
60
(81) 60 Training ANNs – IRs Summary Trained ANNs better against JaGo compared to Naïve Although JaGo is better Some over-fitting for good players Against Naïve outputs close to zero – no response NIR ANNs generally weaker than SIR Manhattan distance of 2 good against Random IR + whole matrix (dist=2) was strongest RANN is still best; Selected for evolution
61
(81) 61 Outline Computers and Games The game of Go Experimental Setup Training of Go playing ANNs Evolution of Go Playing ANNs Summary and Outlook
62
(81) 62 Evolving Go ANNs Setup of Evolution experiments Evolution of ANNs against Computer Players Random Player; Naïve; JaGo Recurrent against JaGo Coevolution Cultural Hall of Fame Training Evolved ANNs
63
(81) 63 Evolution Setup 5x5 boards; Komi of 5.5 50 Individuals Described previously (3 chromosomes) GMLPs with SIR and SOR Max 3,010 connections Recurrent ANNs Using NIR (25 Ns) and SOR (26) Max 2,601 connections Same strength measure as training (6k games)
64
(81) 64 Evolution Against Random Empirically 64 games to determine fitness Best ANN evolved {Str=0.4005; Comp=0.48} After 47 gens; 929 connections Evolved ANNs hardly reacted to different positions Always in the middle; Never in corners – creates eyes Unnecessary to “think” against Random Occasionally Random places at strategic intersection and then usually wins Only 3 of 20 best ANNs open at optimal C3
65
(81) 65 Evolution Against Naive Better player; ANNs develop better strategies Same setting 200 gens for ALL population to win ½ of games – fast learning Best {Str=0.69; Comp=0.487} after 2915 gens High strength and only 10 hidden !! Win rates Same against Naïve and Random Low against JaGo (~0.2) 25% use optimal opening move (still low) Exploit Naïve’s weaknesses at endgames
66
(81) 66 Evolution Against JaGo Far stronger than Naïve (85% wins) Takes significantly more time for each move Used distributed computing 64 games would take 32 hours per run Only 32 games for fitness - empirically sufficient Best {Str=0.772; Comp=0.476} after 1909 gens Scores 100% wins 1k gens to score 0.4; In 4 runs 100% wins in 3k gens!!! Sd twice as large – harder for evolution Weak against Naïve ~0.4;Strong against Random
67
(81) 67 Evolution Against JaGo Again, low competence ~0.5 Evolved strategies Still connecting stones but faster (responsive) Tenuki (abandon & play elsewhere) to distract JaGo 9 open optimally; All in 3x3 area around center Strength depends heavily on opening move Mid games sometimes show standard Go sequences! Take advantage of JaGo’s weakness – capturing weak stones
68
(81) 68 Recurrent Nets Evolution Natural representation of Go board Inputs are connected More time consuming Only 2 runs; 32 games; setting described previously 100% win rate within 1k generations!!! Both nets open at C3 Strategies 1 aggressive; 1 distractive Protect; Create living groups; Bad Endgames Very high relative strength 0.94 Random; 0.49 Naïve (never played before)
69
(81) 69 Cultural Coevolution Until now much over-fitting was observed Fitness 8 games against all masters (4 each color) Few because games are quite similar Results of typical run – host population 3,500 gens 90% wins at 500 gens Stagnation around 1k Last master added at 462 After 2k Mean fitness decreases
70
(81) 70 Cultural Coevolution [2] Masters 21 ANNs After number 8 all have R>0.8 Last obtained Strength of 0.365 Strategy (both populations) Many random move selections Due to many saturated Ns (output=1) Games usually similar but multiple random moves are hard to defeat May be caused by mutation (multiple self-adaptation)
71
(81) 71 Cultural Coevolution [3] Strategy (cont.) Coevolution found easy solution Computer players are very difficult to beat with saturated neurons New extremely long experiment (60k gens!) was performed with different mutation (single-SA) Similar results, Except: Now most culture growth until gen 10k (last at 40k) Now less saturated Neurons Less fitness decrease despite increasing culture Strength
72
(81) 72 Cultural Coevolution [4] Culture Summary 80 members After #16 Random>0.94 After #29 all opened optimally After #57 all Strength>0.4 Wins against JaGo ~0.5 Naïve ~15 hidden Ns – fluctuate between successive
73
(81) 73 Recurrent & Cultural 10k gens Faster learning but basically same results R>0.9 at C11 (compared to C14) N>0.2 at 14 (compared to C37) Strategy Still bad against JaGo Bad openings! (only 2% optimal) Only last 5 masters close to center Learned not to capture dead groups
74
(81) 74 Hall of Fame Coevolution Compared to Cultural Parameters Important parameter is HoF size={1,2,4,8,16} Eight games against each master 3k gens were coevolved After coevolution all HoF ANNs were evaluated Every 100 gens the best ANN was evaluated
75
(81) 75 Hall of Fame Coevolution [2] Results – HoF size 1 Masters – low strength of 0.3625 In gen 1k – one ANN had 0.4 Lost solution HoF changed every generation cycles Results – HoF size 16 Master 5 – highest strength of 0.4403 in gen 400 Strength of 0.5057 was obtained and lost One master was replaced in every generation! Somehow weak masters remained in the HoF Host population stagnates (cycles)
76
(81) 76 Hall of Fame Coevolution [3] Strategies All place first stone at D4! HoF coevolution does not encourage diversity among ANNs
77
(81) 77 Training Evolved ANNs Evolution against JaGo – Strength ~0.77 4-16 hidden Ns Training Strength ~0.3 100 hidden Ns Check whether evolved structure is good Train after evolution Train without evolution only using structure
78
(81) 78 Training Evolved ANNs [2] Used best 2 evolved ANNs against JaGo Taken from runs 11 & 17 ANN11 – 10 hidden; 1178 connections ANN17 – 14 hidden; 1162 connections Trained with 200 games; 2,000 cycles Experiment 1 (post-evolution) Results Bad! Strength of 0.11 and 0.10 – Lower than any trained ANN (RANN has 0.29) High competence 0.89
79
(81) 79 Training Evolved ANNs [3] Experiment 2 – keep only evolved structure Strength below 0.152 (RANN is 0.29) Weakest against JaGo (0.05) although trained with JaGo Against Naïve 0.11 (same as RANN) Evolution creates efficient structures Few hidden Ns Difficult to learn with training High competence because they seldom responded with the same move to different positions
80
(81) 80 Summary Training could not achieve high Go playing skills Evolved ANNs specialized in the opponent which was used during evolution Cultural coevolution generated strong players Strength increasing throughout the process Perhaps an ANN stronger than amateurs can be coevolved Recurrent nets learned faster
81
(81) 81 Summary [2] 2 coevolved (recurrent and feed-forward) won the grand tournament Coevolution proved better than evolution for developing Go strategies Recurrent ANNs would provide a field for further research More natural board representation Could contain a fixed input layer representing the board