1 of 81 Evolution and Coevolution of ANNs playing Go Peter Mayer, 2004.

Presentation transcript:

1 of 81 Evolution and Coevolution of ANNs playing Go Peter Mayer, 2004

(81) 2 Outline  Computers and Games  The game of Go  Experimental Setup  Training of Go playing ANNs  Evolution of Go Playing ANNs  Summary and Outlook

(81) 3 Games  Algorithms designed since AI's onset  Clearly defined rules  Still complex  Chess received the most attention  More researched than Go  Two main approaches  Rely on expertise – directly programmed weighted features; Extensive knowledge  Use evolution – less knowledge; more versatility

(81) 4 The game of Go  Oldest (unaltered) strategic board game in the world  10,000,000 players in Japan “alone”  Fairly simple rules  BUT difficult to master  Immense tree (~200 opts)  Complex structures  Many concurrent goals

(81) 5 Go Rules  19x19 board  Empty in the beginning  Black & White “stones”  Black starts  Each turn  Place 1 stone  At an intersection  Never move stones  OR pass

(81) 6 Go Rules [2]  Objective - Get the most points !  Points are acquired by:  Securing Territories  Capturing opp’s pieces

(81) 7 Go Rules [3]  Stones of the same color at vertically or horizontally adjacent intersections form a group  An empty intersection adjacent to a stone or group is called a "liberty" of that group  1 Liberty = group in “atari”  No liberties -> CAPTURE ! Group is removed  Example – Black places stone in X resulting in right figure
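The group and liberty definitions above can be sketched with a flood fill. This is a minimal illustration, not code from the thesis: the board encoding (`"."`, `"B"`, `"W"`) and the function name `group_and_liberties` are assumptions made for the example.

```python
from collections import deque

EMPTY = "."  # assumed encoding: "." empty, "B" black, "W" white


def group_and_liberties(board, row, col):
    """Return (stones, liberties) of the group containing board[row][col]."""
    color = board[row][col]
    if color == EMPTY:
        raise ValueError("no stone at this intersection")
    size = len(board)
    stones, liberties = {(row, col)}, set()
    queue = deque([(row, col)])
    while queue:
        r, c = queue.popleft()
        # Only vertical and horizontal neighbours count (no diagonals).
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < size and 0 <= nc < size):
                continue
            if board[nr][nc] == EMPTY:
                liberties.add((nr, nc))
            elif board[nr][nc] == color and (nr, nc) not in stones:
                stones.add((nr, nc))
                queue.append((nr, nc))
    return stones, liberties


board = [
    ".....",
    ".WB..",
    ".WB..",
    "..W..",
    ".....",
]
stones, libs = group_and_liberties(board, 1, 2)
in_atari = len(libs) == 1   # exactly one liberty left
captured = len(libs) == 0   # no liberties: the group is removed
```

In this position the two black stones form one group with three liberties, so the group is neither in atari nor captured.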

(81) 8 Go Rules [4]  Stones can be placed anywhere, but cannot commit suicide (except Chinese)  Legal if stone simultaneously captures opponent’s group (2 right figures) Suicide – white cannot place at X White CAN place at X Result: capture

(81) 9 Go Rules [5]  Same position cannot occur more than once  Endless repetitions:  Black can capture at upper figure by placing at X  White - same by placing at Y  Black – repeat…  Ko rule  White may not place at Y before playing somewhere else first  Avoid any repetitions

(81) 10 Go Rules – Live and Dead groups  “Dead” groups if impossible to prevent capture  It is not necessary to do so  Group remains on board  At end of game, removed and added to captured stones  “Living” groups are impossible to capture  Group with 2 “eyes” – even if white surrounds it, playing at X or Y is suicide  Opponent must play elsewhere

(81) 11 Go Basics – End game  Play continues until both players pass  Players then alternately play stones at “neutral” points – adjacent to both White and Black  Also known as “dame” (DAH-MAY)  Dead stones are removed from the board and counted with other prisoners (1 point per prisoner)  Also - 1 point for each intersection surrounded by player’s stones (“territory”)

(81) 12 Go Basics – End game example  Prisoners were removed already  All 4 points marked X are dame – worthless  Black has  7 points in UR (territory); 2 points in LL  1 removed prisoner  TOTAL = 10 points  White has  5 in UL; 2 in LR  2 prisoners  TOTAL = 9 points  Black wins unless komi (5.5 pts compensation) is due
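The tally in this example can be checked with a few lines of arithmetic. The helper `final_score` is a hypothetical name used only for this sketch; the territory, prisoner, and komi values are the ones quoted on the slide.

```python
def final_score(territory, prisoners, komi=0.0):
    """Territory points plus prisoners, plus komi if it is due."""
    return sum(territory) + prisoners + komi


black = final_score(territory=[7, 2], prisoners=1)            # UR + LL + prisoner
white = final_score(territory=[5, 2], prisoners=2)            # UL + LR + prisoners
white_with_komi = final_score(territory=[5, 2], prisoners=2, komi=5.5)

black_wins_without_komi = black > white
black_wins_with_komi = black > white_with_komi
```

Without komi Black leads 10 to 9; once the 5.5-point komi is added to White's total, the result flips.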

(81) 13 Ranking and Handicaps  Determine Go players’ strength  Resemblance to martial arts  Both amateur and professional ranking system  Amateur  35 kyu to 1 kyu  THEN 1 dan to 7 dan  Pro  1 dan to 9 dan  Awarded only by Go institutions  Pro dans are much stronger than amateur dans

(81) 14 Ranking and Handicaps (2)  Handicaps  Weaker player starts with several stones on the board  Placed at specific places  Helps make games more even  Difference in ranks ~ number of handicap stones needed to win  2 stones to even 2 dan against 4 dan  4 to even 3 kyu and 2 dan  The most powerful Go programs reach only …  … 10 kyu!

(81) 15 Outline  Computers and Games  The game of Go  Experimental Setup  Training of Go playing ANNs  Evolution of Go Playing ANNs  Summary and Outlook

(81) 16 Experimental Setup  Opponent Go players  ANN player  Go board (input) representations  Move (output) representations  Coevolution  Hall of Fame coevolution  Cultural coevolution  General evolution setup

(81) 17 Go Players - Random  No strategy  Pass move also  “Knows” only the rules of go  legality of moves  Usually weakest opponent

(81) 18 Go Players – Naïve Player  Roughly human-beginner level  Able to save and capture stones  Knows about  Lost stones  Saving - connecting stones to living groups  Weak stones (not savable)

(81) 19 Go Players – Naïve Strategy  A subset of JaGo’s (main opponent) strategy  Outline (arranged by priority):  Attempt to save  Try to put opponent into atari  Connect weak stones  Capture opponent groups in atari  Check intersections for placing stones  In random order  Make sure no (own) liberties decrease below 2 as a result  Perform Random move

(81) 20 Go Players – JaGo Player  Java based program  Best computer player used  Not a strong player ~16 kyu  Knows standard techniques  Mainly save & capture  Uses pattern matching  Looks at entire board  32 patterns, with rotations and mirrors

(81) 21 Go Players – JaGo Strategy (1)  Save stones in atari  Try to decrease liberties of large groups  Find own savable larger groups  Attack opponent’s groups (decreasing order:)  With 2 or more liberties and attackable  With 2 or more stones & less than 3 liberties  With 2 or less liberties

(81) 22 Go Players – JaGo Strategy (2)  Save own groups with few liberties if savable  Start pattern matching – Response; Center  Random move order  Seek opponent’s groups to capture in 2 moves  Perform random move which isn’t of a bad pattern  Capture opponent’s single liberties  Connect own weak stones  PASS

(81) 23 Go Players – JaGo Patterns (1)

(81) 24 Go Players – JaGo Patterns (2)

(81) 25 Go Players – GNU Go  Advantages  5x5 to 19x19 boards  Handles handicaps well  Rated 10 kyu  Problems  5x5 solved – open at C3 for 18.5 points (komi=5.5) – always wins as Black  GNU Go passes on B3, C2-4, D3 (only correct at C3)  Premature convergence of evolution

(81) 26 ANN Player  Inform ANN about actual position  Evaluate ANN output to receive next move  Representation is important!  Intention maps  For each Go move (including PASS) – value between [0,1]  High value – high intention to make move (and vice versa)  Select legal move with highest value  To avoid predictability – consider suboptimal moves also (“creativity factor”)
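A minimal sketch of reading a move off an intention map follows. The slide does not say how the "creativity factor" works, so the rule used here (occasionally playing the second-best legal move) is purely an assumption, as are the names `select_move` and `PASS`.

```python
import random

PASS = "pass"  # assumed key for the pass move in the intention map


def select_move(intentions, legal_moves, creativity=0.0, rng=random):
    """Pick the legal move with the highest intention value.

    `creativity` is an assumed probability of playing the second-best
    legal move instead, to make the player less predictable.
    """
    ranked = sorted(legal_moves, key=lambda move: intentions[move], reverse=True)
    if not ranked:
        return PASS
    if len(ranked) > 1 and rng.random() < creativity:
        return ranked[1]
    return ranked[0]


# The highest-valued move (0, 0) is illegal here, so it is skipped.
intentions = {(0, 0): 0.9, (1, 1): 0.8, (2, 2): 0.3, PASS: 0.1}
legal = [(1, 1), (2, 2), PASS]
best = select_move(intentions, legal)
```

Filtering by legality before ranking means an ANN that assigns high intention to illegal moves still produces a playable move, which is why strength and competence are measured separately.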

(81) 27 Player Strength  Unrated Go players commonly receive a rating by playing against rated players (same in Chess)  The strength s of a player is determined by  The score of 1,000 double games  Against each of 3 opponents: Random, Naïve, JaGo  Divided by the number of games (6,000)  1 is perfect strength  3 opponents help resist over-fitting
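The strength measure can be sketched as below, assuming each win counts 1 and each loss 0 (the transcript later says only win rates are used). The win counts in the example are invented for illustration and are not results from the thesis.

```python
def strength(wins_per_opponent, double_games=1000):
    """Wins over all opponents divided by the total number of games.

    A double game is two games (one per color), so 1,000 double games
    against 3 opponents give 6,000 games in total.
    """
    games_per_opponent = 2 * double_games
    total_games = games_per_opponent * len(wins_per_opponent)
    return sum(wins_per_opponent.values()) / total_games


# Hypothetical win counts, for illustration only.
wins = {"Random": 1800, "Naive": 900, "JaGo": 300}
s = strength(wins)
perfect = strength({"Random": 2000, "Naive": 2000, "JaGo": 2000})
```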

(81) 28 Player Competence  Strength does not measure understanding of the rules (legality)  E.g. 2 players receive same score but only one always tried legal moves first  The competence C of a player is defined as follows:  b i = games; i = moves; t ij = #tried illegal moves; k ij = #possible illegal moves  C is averaged over all games

(81) 29 Board Representations  19x19 boards  far too large  Even for evolved agents  Use only 5x5 boards

(81) 30 Board Representations  Should preprocess position to make ANN's life easier  Tested in training experiments  Standard Input Representation (SIR)  2 neurons at each intersection :-  1 per player’s piece; 1 per opponent’s  No distinction between B and W stones  Optional – 1 neuron to tell if B or W  (2*b^2) neurons (where b is board size) = 50

(81) 31 Representations - NIR  Naïve Input Representation  More compact  1 neuron per intersection  Set to -1 (player’s stone) or 1 (opponent’s)  0 if empty  Uses half of SIR's neurons = 25
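The two encodings (SIR from the previous slide and NIR from this one) can be compared side by side. This is a sketch under an assumed board encoding (`"."`, `"B"`, `"W"`); the function names and the row-by-row neuron ordering are illustrative choices, not taken from the thesis.

```python
EMPTY = "."  # assumed encoding: "." empty, "B" black, "W" white


def encode_sir(board, player):
    """SIR: 2 inputs per intersection, [own stone, opponent stone]."""
    inputs = []
    for row in board:
        for cell in row:
            inputs.append(1.0 if cell == player else 0.0)
            inputs.append(1.0 if cell not in (player, EMPTY) else 0.0)
    return inputs


def encode_nir(board, player):
    """NIR: 1 input per intersection, -1 own stone, 1 opponent, 0 empty."""
    inputs = []
    for row in board:
        for cell in row:
            if cell == EMPTY:
                inputs.append(0.0)
            else:
                inputs.append(-1.0 if cell == player else 1.0)
    return inputs


board = [
    "B....",
    ".W...",
    ".....",
    ".....",
    ".....",
]
sir = encode_sir(board, "B")   # 2 * 5^2 = 50 values
nir = encode_nir(board, "B")   # 5^2 = 25 values
```

Because both encodings are relative to the player to move, the same network can play either color without knowing which one it holds.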

(81) 32 Representations - LVIR  Limited View Input Representation  Splits the Go board into several square areas of size 3x3  Idea – simplest way of capturing stones works within this area  E.g. capture of 1 stone by surrounding it  Areas overlap at middle row and middle column  Coding – similar to SIR  w is number of areas (=4)  2 × 3^2 × w = 72 Neurons  Could also be Naïve
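A sketch of how the four overlapping 3x3 areas could be cut out of a 5x5 board and coded in the SIR style. The window order and function names are assumptions; the slide only fixes the window size, the overlap at the middle row and column, and the total of 72 neurons.

```python
EMPTY = "."  # assumed encoding: "." empty, "B" black, "W" white


def lvir_windows(board, window=3):
    """Cut the board into square areas that overlap by one row/column."""
    size = len(board)
    starts = range(0, size - window + 1, window - 1)  # 0 and 2 on a 5x5 board
    return [
        [row[c:c + window] for row in board[r:r + window]]
        for r in starts
        for c in starts
    ]


def encode_lvir(board, player):
    """SIR-style coding (own, opponent) of every cell in every area."""
    inputs = []
    for area in lvir_windows(board):
        for row in area:
            for cell in row:
                inputs.append(1.0 if cell == player else 0.0)
                inputs.append(1.0 if cell not in (player, EMPTY) else 0.0)
    return inputs


board = [
    "B....",
    ".W...",
    "..B..",
    "...W.",
    "....B",
]
areas = lvir_windows(board)
inputs = encode_lvir(board, "B")
```

The centre intersection belongs to all four areas, which is where the redundancy relative to SIR (72 versus 50 inputs) comes from.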

(81) 33 Clever Representations  Based on image processing and circuits  We want fewer raw inputs to allow ANN to concentrate more on features  Manhattan distance  Used in integrated circuits where wires run parallel to X or Y axis  Got its name from Manhattan NY, where streets are aligned in grid  P1 = (x1, x2)  P2 = (y1, y2)  d(P1, P2) = |x1 - y1| + |x2 - y2|

(81) 34 Clever Representations  Manhattan distance is related to distance of Go stones (no diagonals)  distance = [#(separating stones) + 1]  1 if next to each other  2 if separated by one stone  3 for knight’s move or two separating stones
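The three cases listed above can be checked directly. The coordinates are (row, column) pairs chosen for the example; `manhattan` is an illustrative name.

```python
def manhattan(p, q):
    """Sum of the absolute row and column differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


adjacent = manhattan((2, 2), (2, 3))       # stones next to each other
one_between = manhattan((2, 2), (2, 4))    # one intersection in between
knights_move = manhattan((2, 2), (3, 4))   # one step one way, two the other
two_between = manhattan((2, 2), (2, 5))    # two intersections in between
```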

(81) 35 Representations: c-o-Matrix  Co-occurrence-matrices  Used in image processing  Many parameters are derived from it  Mean, Sd, energy, contrast, homogeneity, …  Quadratic  Based on a relation p between image positions (symmetric if p is)

(81) 36 Representations: c-o-Matrix  Elements C [i][j] =  Number of times pixels occur in an image of a specified value (color)  In the relation specified by p  Relative to other pixels  Size is number of different colors

(81) 37 Representations: c-o-Matrix  An actual go board is an “image” with 3 different colors (including empty)  Example  p1: Manhattan distance of 1 between 2 points  First matrix row:  B near B 16 times  B near W 3 times  B near empty 11 times
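A small sketch of a co-occurrence matrix for a Go position, using the relation "Manhattan distance equals d". It counts ordered pairs of intersections, which makes the matrix symmetric; whether the slide's figures use ordered or unordered pairs is not stated, so that choice is an assumption, as are the color order and the 3x3 toy board.

```python
COLORS = ("B", "W", ".")  # assumed row/column order: black, white, empty


def cooccurrence(board, distance=1):
    """C[i][j] = number of ordered point pairs with colors (i, j)
    whose Manhattan distance equals `distance`."""
    index = {color: i for i, color in enumerate(COLORS)}
    size = len(board)
    matrix = [[0] * len(COLORS) for _ in COLORS]
    points = [(r, c) for r in range(size) for c in range(size)]
    for r1, c1 in points:
        for r2, c2 in points:
            if abs(r1 - r2) + abs(c1 - c2) == distance:
                matrix[index[board[r1][c1]]][index[board[r2][c2]]] += 1
    return matrix


board = [
    "BB.",
    "W..",
    "...",
]
c1 = cooccurrence(board, distance=1)
```

On this toy board the first row reads: B near B 2 times, B near W once, B near empty 2 times. The matrix size depends only on the number of colors, not on the board size, which is what makes it a compact input feature.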

(81) 38 Representations: c-o-Matrix  Does not say much about absolute positions – must combine  SIR and C for whole board  NIR and C for whole board  NIR and Cs for 3x3 areas  sLVIR and Cs for 3x3 areas  NLVIR and Cs for 3x3 areas

(81) 39 Output Representations  Only 2  Standard Output Representation (SOR)  Each intersection is represented by 1 neuron  1 for PASS  (b^2 + 1) neurons

(81) 40 Output Representations  Row Column Output Representation (RCOR)  Used to decrease ANN size  5 neurons for columns; 5 for rows  1 for PASS  (2b + 1) neurons  Intention more complicated:  PASS intention is square of relevant neuron  RCOR Limits intention map:  v1>v2  y1>y2  v4>v3  All values positive, non-zero
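One way to read an intention map from RCOR outputs is to multiply the row neuron by the column neuron for each intersection. The slide does not spell this out; the product rule is an assumption suggested by the PASS intention being the square of its neuron. The function name and the 2x2 example values are illustrative.

```python
def rcor_intentions(row_out, col_out, pass_out):
    """Assumed decoding: intention(r, c) = row_out[r] * col_out[c];
    PASS intention is the square of the pass neuron."""
    grid = [[r * c for c in col_out] for r in row_out]
    return grid, pass_out ** 2


# Tiny 2x2 example with positive, non-zero outputs.
grid, pass_intention = rcor_intentions([0.5, 1.0], [0.2, 0.4], 0.3)

# With a product decoding, every row of the map is a scaled copy of the
# others, so the ordering of columns is the same in every row.
same_order_in_all_rows = all(
    (row[0] < row[1]) == (grid[0][0] < grid[0][1]) for row in grid
)
```

Under this decoding the map has far fewer degrees of freedom than SOR (2b + 1 instead of b^2 + 1), which is one way to see why RCOR limits the intention maps a network can express.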

(81) 41 Coevolution  Derives non-static fitness, as in nature  1 or more populations; interacting  Competitive [battle] vs. Cooperative [subtasks]  Advantages  “Who needs enemies when you got friends like these?” – saves finding opponents; Especially in Go where no strong program exists  Variety in fitness – adaptive opponents  No upper bound for improvement

(81) 42 Coevolution Methods Applied  Based on work by Lubberts & Miikkulainen [2001]  Hall of Fame  Host population and Master population  Maintaining the ability of host population to beat opponents of previous generations  Each generation, the best individual is added to HoF  The whole population competes against a sample of the HoF

(81) 43 Coevolution - HoF  Applied in this research  HoF initially filled without competition  Individuals get their fitness by competing against the masters  When full - host with highest win rates (against masters) joins HoF  Replace first Master to lose all games  Coevolutionary progress cannot be directly seen  Both populations constantly changing

(81) 44 Cultural Coevolution  A new approach!  Maintains “culture” of masters resembling HoF  To enter culture, host must defeat all masters  Masters never replaced – unlimited culture size  Every individual receives a fitness score by competing against all masters  Culture growth rate decreases rapidly  Every new master is the strongest found (yet)

(81) 45 Cultural Coevolution [2]  Numerous advantages  Maintains ability to defeat weak players  Keeps good solutions found  Same player cannot enter twice  Needs to defeat itself  Culture’s performance never decreases  Avoid focusing on a specific player’s weakness  As soon as any master is immune, the hosts have to find another way  More masters  less likely to remember all weaknesses
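The admission rule for the culture can be sketched as follows. Players are stood in for by plain numbers and `beats` by a simple comparison, both of which are toy assumptions; in the experiments a match would be a series of Go games. Treating an empty culture as open to the first host is also an assumption.

```python
def update_culture(culture, hosts, beats):
    """Admit every host that defeats all current masters.

    Masters are never removed, so the culture only grows.
    """
    for host in hosts:
        if all(beats(host, master) for master in culture):
            culture.append(host)
    return culture


def fitness(host, culture, beats):
    """Fraction of masters the host defeats."""
    if not culture:
        return 0.0
    return sum(1 for master in culture if beats(host, master)) / len(culture)


def stronger(a, b):
    """Toy stand-in for a match: the larger number wins, ties lose."""
    return a > b


culture = update_culture([], [3, 1, 5, 5, 4], stronger)
score = fitness(4, culture, stronger)
```

The second `5` is rejected because it would have to defeat the `5` already in the culture, which mirrors the point that the same player cannot enter twice.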

(81) 46 General Evolution Setup  Opponents – Random; Naïve; JaGo  Fitness = strength  Rate of wins against all 3 opponents  6,000 games of both colors  Not using scores, only win rates  Defeating more opponents is better  Generalized Multi-Layer Perceptrons (GMLPs)  All non-loop connections are permitted  Evolving  Hidden neurons; connections; weights; bias (for non- input)

(81) 47 General Evolution Setup [2]  2 binary Chromosomes used  1 for connections : 0-no 1-yes  1 for hidden neurons (if 0, no connections also)  Number of possible connections:  n i, n h, n o – number of input, hidden and output neurons  Determines size of chromosome  Real-Chromosome  Weights & Bias values (seen as weights)  Size is number of connections + number of bias vals (for non-input)
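The formula for the number of possible connections is missing from this transcript. The sketch below shows one count that agrees with the maxima quoted later in the transcript (3,010 connections for 50 inputs and 26 outputs, 2,601 for the recurrent nets). It assumes a feed-forward ordering in which hidden neurons may connect only to later hidden neurons, and an upper limit of 20 hidden neurons. Both assumptions are inferred from those totals rather than stated in the source.

```python
def max_feedforward_connections(n_i, n_h, n_o):
    """Input->hidden, input->output, hidden->output, and hidden->later-hidden
    connections (assumed ordering, no loops)."""
    return n_i * n_h + n_i * n_o + n_h * n_o + n_h * (n_h - 1) // 2


def max_recurrent_connections(n_neurons):
    """Every neuron may connect to every neuron, itself included."""
    return n_neurons ** 2


# SIR (50 inputs) and SOR (26 outputs) with an assumed cap of 20 hidden neurons.
gmlp_max = max_feedforward_connections(n_i=50, n_h=20, n_o=26)

# Recurrent nets: NIR (25) plus SOR (26) neurons, no hidden neurons.
recurrent_max = max_recurrent_connections(25 + 26)
```

The connection chromosome would then need one bit per possible connection, so its length follows directly from this count.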

(81) 48 General Evolution Setup [3]  Tournament selection (size 2)  2 point crossover  Binary mutation  Flip bits with 1/L probability  Real-Chromosome Mutation  multiple-σSA  Each object maintains altering “strategy” params which alter distribution of “object” params  Normal distributions used for both
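The operators for the binary chromosomes can be sketched as below. This covers tournament selection, 2-point crossover, and bit-flip mutation with probability 1/L only; the self-adaptive mutation of the real-valued chromosome is not shown. Function names and the seeded `random.Random` are choices made for the example.

```python
import random


def tournament_select(population, fitness, rng, size=2):
    """Draw `size` distinct individuals and return the fittest."""
    contestants = rng.sample(population, size)
    return max(contestants, key=fitness)


def two_point_crossover(a, b, rng):
    """Swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def bit_flip_mutation(bits, rng):
    """Flip each bit independently with probability 1/L."""
    p = 1.0 / len(bits)
    return [1 - bit if rng.random() < p else bit for bit in bits]


rng = random.Random(0)
parent_a, parent_b = [0] * 8, [1] * 8
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
mutant = bit_flip_mutation(child_a, rng)
winner = tournament_select([1, 2, 3, 4], fitness=lambda x: x, rng=rng, size=4)
```

With a mutation rate of 1/L, one bit per chromosome is flipped on average, regardless of chromosome length.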

(81) 49 Setup – Recurrent Nets  Difficult to learn Go without structured input  Experiments with recurrent nets included  Allow loops for input Ns  Naturally represent adjacent board intersections  No hidden Ns  Played against JaGo  Typically output changes without input change due to feedback loops  Computed output only once!  Only 2 directly connected Ns influence each other  Evolutions should connect only close Ns

(81) 50 Outline  Computers and Games  The game of Go  Experimental Setup  Training of Go playing ANNs  Evolution of Go Playing ANNs  Summary and Outlook

(81) 51 Training ANNs – Setup  Testing IRs mentioned previously  No Go-specific knowledge used  Each experiment was repeated 20 times  Nets, same as Richards [1998]  3 layers; Fully connected; Feed forward  Linear activation for input Ns; Sigmoid for rest  50 input; 26 output; 100 hidden Ns  Patterns:  JaGo vs JaGo; 5x5 board;  Rprop – resilient variant of Backprop

(81) 52 Training ANNs – Experiment 1  Determine number of training cycles  Too few cycles  Weights not adjusted properly  Too many  over-fitting  Determine training pattern set  Limit the level a Go player can reach  Should include all 3 game stages  Both expert and novice moves  JaGo vs JaGo  All game stages  No distinction between winner and loser moves  1, ,000 Cycles; 50/100/200 Games

(81) 53 Training ANNs – Results 1  Average of 20 runs  100 & 200 games better than 50  3000/5000 cycles don’t add strength  Best – 200 games; 2000 cycles  Used hereafter

(81) 54 Training ANNs – Experiment 2  Determine number of hidden Ns  Many  Diverse features  Few  Few stronger features (perhaps better ones)  Less time-consuming  100 Ns yielded best results  selected

(81) 55 Training ANNs – Experiment 3  Output representations  Standard (SOR) vs Row-Column (RCOR)  200 games; 2,000 cycles; 100 hidden Ns  Similar strength; RCOR competence slightly lower  RCOR still expensive and adds constraints  SOR is used in the following experiments

(81) 56 Training ANNs – IR Experiments  Various input representations  Used reference-ANN (RANN)  SIR & SOR; 100 hidden; 7,600 connections  Strength = ; Competence =  200 games; 2,000 cycles  NIR (half input size) & SOR  Strength = ; Competence =  Naïve input makes it difficult to learn Go  LVIR (3x3 windows) & SOR  Strength = ; Competence =  Slightly lower; LVIR doesn’t add input difficulty

(81) 57 Training ANNs – IRs [2]  Whole Co-occur-matrix (dist=1,2,3); SIR&SOR  Found better Strength & Competence!  Knight’s-Move matrix adds relevant information  Whole matrix (dist=1,2,3); NIR&SOR  21% fewer connections due to NIR  Better than standard NIR, but still low

(81) 58 Training ANNs – IRs [3]  3x3 matrices (dist=1,2,3) ; NIR&SOR  Low but ~20% better than previous (whole matrix) NIR  3x3 matrices (dist=1,2,3) ; LVIR\NLVIR  Both matrices and board views use 3x3 windows  No improvement; Huge number of Ns not necessary

(81) 59 Training ANNs – IRs Summary

(81) 60 Training ANNs – IRs Summary  Trained ANNs played better against JaGo than against Naïve  Although JaGo is the stronger player  Some over-fitting for good players  Against Naïve outputs close to zero – no response  NIR ANNs generally weaker than SIR  Manhattan distance of 2 good against Random  IR + whole matrix (dist=2) was strongest  RANN is still best; Selected for evolution

(81) 61 Outline  Computers and Games  The game of Go  Experimental Setup  Training of Go playing ANNs  Evolution of Go Playing ANNs  Summary and Outlook

(81) 62 Evolving Go ANNs  Setup of Evolution experiments  Evolution of ANNs against Computer Players  Random Player; Naïve; JaGo  Recurrent against JaGo  Coevolution  Cultural  Hall of Fame  Training Evolved ANNs

(81) 63 Evolution Setup  5x5 boards; Komi of 5.5  50 Individuals  Described previously (3 chromosomes)  GMLPs with SIR and SOR  Max 3,010 connections  Recurrent ANNs  Using NIR (25 Ns) and SOR (26)  Max 2,601 connections  Same strength measure as training (6k games)

(81) 64 Evolution Against Random  Empirically 64 games to determine fitness  Best ANN evolved {Str=0.4005; Comp=0.48}  After 47 gens; 929 connections  Evolved ANNs hardly reacted to different positions  Always in the middle; Never in corners – creates eyes  Unnecessary to “think” against Random  Occasionally Random places at strategic intersection and then usually wins  Only 3 of 20 best ANNs open at optimal C3

(81) 65 Evolution Against Naive  Better player; ANNs develop better strategies  Same setting  200 gens for ALL population to win ½ of games – fast learning  Best {Str=0.69; Comp=0.487} after 2915 gens  High strength and only 10 hidden !!  Win rates  Same against Naïve and Random  Low against JaGo (~0.2)  25% use optimal opening move (still low)  Exploit Naïve’s weaknesses at endgames

(81) 66 Evolution Against JaGo  Far stronger than Naïve (85% wins)  Takes significantly more time for each move  Used distributed computing  64 games would take 32 hours per run  Only 32 games for fitness - empirically sufficient  Best {Str=0.772; Comp=0.476} after 1909 gens  Scores 100% wins  1k gens to score 0.4;  In 4 runs 100% wins in 3k gens!!!  Sd twice as large – harder for evolution  Weak against Naïve ~0.4;Strong against Random

(81) 67 Evolution Against JaGo  Again, low competence ~0.5  Evolved strategies  Still connecting stones but faster (responsive)  Tenuki (abandon & play elsewhere) to distract JaGo  9 open optimally; All in 3x3 area around center  Strength depends heavily on opening move  Mid games sometimes show standard Go sequences!  Take advantage of JaGo’s weakness – capturing weak stones

(81) 68 Recurrent Nets Evolution  Natural representation of the Go board  Inputs are connected  More time consuming  Only 2 runs; 32 games; setting described previously  100% win rate within 1k generations!!!  Both nets open at C3  Strategies  1 aggressive; 1 distractive  Protect; Create living groups; Bad Endgames  Very high relative strength  0.94 Random; 0.49 Naïve (never played before)

(81) 69 Cultural Coevolution  Until now much over-fitting was observed  Fitness  8 games against all masters (4 each color)  Few because games are quite similar  Results of typical run – host population  3,500 gens  90% wins at 500 gens  Stagnation around 1k  Last master added at 462  After 2k Mean fitness decreases

(81) 70 Cultural Coevolution [2]  Masters  21 ANNs  After number 8 all have R>0.8  Last obtained Strength of  Strategy (both populations)  Many random move selections  Due to many saturated Ns (output=1)  Games usually similar but multiple random moves are hard to defeat  May be caused by mutation (multiple self-adaptation)

(81) 71 Cultural Coevolution [3]  Strategy (cont.)  Coevolution found easy solution  Computer players are very difficult to beat with saturated neurons  New extremely long experiment (60k gens!) was performed with different mutation (single-SA)  Similar results, Except:  Now most culture growth until gen 10k (last at 40k)  Now less saturated Neurons  Less fitness decrease despite increasing culture Strength

(81) 72 Cultural Coevolution [4]  Culture Summary  80 members  After #16 Random>0.94  After #29 all opened optimally  After #57 all Strength>0.4  Wins against JaGo ~0.5 Naïve  ~15 hidden Ns – fluctuate between successive

(81) 73 Recurrent & Cultural  10k gens  Faster learning but basically same results  R>0.9 at C11 (compared to C14)  N>0.2 at C14 (compared to C37)  Strategy  Still bad against JaGo  Bad openings! (only 2% optimal)  Only last 5 masters close to center  Learned not to capture dead groups

(81) 74 Hall of Fame Coevolution  Compared to Cultural  Parameters  Important parameter is HoF size={1,2,4,8,16}  Eight games against each master  3k gens were coevolved  After coevolution all HoF ANNs were evaluated  Every 100 gens the best ANN was evaluated

(81) 75 Hall of Fame Coevolution [2]  Results – HoF size 1  Masters – low strength of  In gen 1k – one ANN had 0.4  Lost solution  HoF changed every generation  cycles  Results – HoF size 16  Master 5 – highest strength of in gen 400  Strength of was obtained and lost  One master was replaced in every generation!  Somehow weak masters remained in the HoF  Host population stagnates (cycles)

(81) 76 Hall of Fame Coevolution [3]  Strategies  All place first stone at D4!  HoF coevolution does not encourage diversity among ANNs

(81) 77 Training Evolved ANNs  Evolution against JaGo –  Strength ~0.77  4-16 hidden Ns  Training  Strength ~0.3  100 hidden Ns  Check whether evolved structure is good  Train after evolution  Train without evolution only using structure

(81) 78 Training Evolved ANNs [2]  Used best 2 evolved ANNs against JaGo  Taken from runs 11 & 17  ANN11 – 10 hidden; 1178 connections  ANN17 – 14 hidden; 1162 connections  Trained with 200 games; 2,000 cycles  Experiment 1 (post-evolution) Results  Bad!  Strength of 0.11 and 0.10 –  Lower than any trained ANN (RANN has 0.29)  High competence 0.89

(81) 79 Training Evolved ANNs [3]  Experiment 2 – keep only evolved structure  Strength below (RANN is 0.29)  Weakest against JaGo (0.05) although trained with JaGo  Against Naïve 0.11 (same as RANN)  Evolution creates efficient structures  Few hidden Ns  Difficult to learn with training  High competence because they seldom responded with the same move to different positions

(81) 80 Summary  Training could not achieve high Go playing skills  Evolved ANNs specialized in the opponent which was used during evolution  Cultural coevolution generated strong players  Strength increasing throughout the process  Perhaps an ANN stronger than amateurs can be coevolved  Recurrent nets learned faster

(81) 81 Summary [2]  2 coevolved (recurrent and feed-forward) won the grand tournament  Coevolution proved better than evolution for developing Go strategies  Recurrent ANNs would provide a field for further research  More natural board representation  Could contain a fixed input layer representing the board