A Distributed Genetic Algorithm for Learning Evaluation Parameters in a Strategy Game
Gary Matthias
Objective
- Design a genetic algorithm that learns the optimal set of evaluation parameters for the 5,5,4-game (a Tic-Tac-Toe variant)
- Implement a game simulation engine that uses an alpha-beta search to select moves
- Parallelize the natural selection segment of the algorithm, i.e., game simulation
- Compare parallel performance and serial performance
The 5,5,4-Game
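The 5,5,4-game is the m,n,k-game with m = n = 5 and k = 4: players alternate placing pieces on a 5x5 board, and the first to line up 4 in a row (horizontally, vertically, or diagonally) wins. A minimal win check might look like the sketch below; the board representation (a list of lists holding "X", "O", or None) and the function name are assumptions, not taken from the project.

```python
SIZE = 5  # the board is SIZE x SIZE
K = 4     # pieces in a row needed to win

def has_won(board, player):
    """Return True if `player` has K consecutive pieces in any direction."""
    # right, down, down-right diagonal, down-left diagonal
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(SIZE):
        for c in range(SIZE):
            for dr, dc in directions:
                end_r = r + (K - 1) * dr
                end_c = c + (K - 1) * dc
                # skip lines that would run off the board
                if not (0 <= end_r < SIZE and 0 <= end_c < SIZE):
                    continue
                if all(board[r + i * dr][c + i * dc] == player for i in range(K)):
                    return True
    return False

def empty_board():
    return [[None] * SIZE for _ in range(SIZE)]

board = empty_board()
for col in range(4):
    board[1][col] = "X"  # four X pieces in row 1
x_wins = has_won(board, "X")
```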
Natural Selection
- Initial population of size n
- Round-robin tournament: n*(n-1) games
- Fitness/survival determined heuristically from game results: 5*wins + 1*draws - 4*losses
- Non-surviving competitors replaced with children of surviving competitors
- Chance of reproduction proportional to fitness score
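The fitness formula and fitness-proportional reproduction can be sketched as follows. The function names are hypothetical, and clipping negative scores to zero before sampling is an assumption: the slides do not say how non-positive scores are handled.

```python
import random

def fitness(wins, draws, losses):
    """Heuristic score from the slides: 5*wins + 1*draws - 4*losses."""
    return 5 * wins + 1 * draws - 4 * losses

def pick_parent(survivors, scores, rng):
    """Pick one survivor with probability proportional to its score.

    Assumption: negative scores are clipped to 0, and if every weight
    is 0 the choice falls back to uniform.
    """
    weights = [max(score, 0) for score in scores]
    if sum(weights) == 0:
        return rng.choice(survivors)
    return rng.choices(survivors, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded so the example is repeatable
survivors = ["a", "b", "c"]
scores = [fitness(3, 2, 1), fitness(5, 0, 1), fitness(1, 1, 4)]
parent = pick_parent(survivors, scores, rng)
```

Here the scores are 13, 21, and -10, so under the clipping assumption competitor "c" is never chosen as a parent.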
Recombination Mutation
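Since the parameters are encoded as bit strings, recombination and mutation can be sketched on lists of bits. The specific operators below (single-point crossover and independent per-bit flips) and the mutation rate are assumptions; the slides do not state which variants were used.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: prefix from parent_a, suffix from parent_b."""
    point = rng.randrange(1, len(parent_a))  # cut strictly inside the string
    return parent_a[:point] + parent_b[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in bits]

rng = random.Random(42)  # seeded so the example is repeatable
mother = [0] * 8
father = [1] * 8
child = mutate(crossover(mother, father, rng), rate=0.05, rng=rng)
```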
Evaluation Parameters Used
- Specific squares (e.g. corner vs. middle): 12 parameters, 8 bits each
- Consecutive pieces (number and direction) (8 bits)
- Scale factors (weight given to specific squares or consecutive pieces): 2 parameters, 8 bits each
- Search depth: 1 parameter, 3 bits
Convergence
- Convergence was not considered for this project. However, a reasonable termination criterion would be that all games end in draws for several consecutive rounds. (Beware local solutions!)
- Since the 5,5,4-game is a theoretical draw with perfect play, a tournament of all draws may suggest that all competitors are playing a near-perfect strategy.
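The suggested stopping rule could be checked as in the sketch below. It was not part of the project; the result encoding (one list of "win"/"loss"/"draw" strings per round) and the default window of 3 rounds are assumptions standing in for "several consecutive rounds".

```python
def has_converged(round_results, window=3):
    """True if every game in each of the last `window` rounds was a draw.

    `round_results` holds one list of game outcomes per tournament round.
    """
    if len(round_results) < window:
        return False
    recent = round_results[-window:]
    return all(all(outcome == "draw" for outcome in games) for games in recent)

history = [
    ["win", "draw", "loss"],
    ["draw", "draw", "draw"],
    ["draw", "draw", "draw"],
    ["draw", "draw", "draw"],
]
done = has_converged(history)
```

An all-draw streak can also occur when the population has collapsed onto one mediocre strategy, so a check like this is only a hint of near-perfect play.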
Parallel vs. Serial Performance
- Performance determined by running time of n simulated matches of the 5,5,4-game
- On starp.csail.mit.edu with 8 processes
- Parallel vs. serial ratio decreases with increasing n
- For a round-robin tournament of very large n, the parallel algorithm would significantly decrease running time

            n = 8      n = 16     n = 32
Parallel    19.33 s    36.08 s    67.29 s
Serial      27.26 s    77.60 s    178.45 s
Ratio       0.71       0.46       0.38
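The ratio row is parallel time divided by serial time. The same timings can be restated as speedup (serial / parallel) and as efficiency on 8 processes (speedup / 8). The short calculation below uses only the numbers in the table; the variable and function names are illustrative.

```python
# n -> (parallel seconds, serial seconds), copied from the table
timings = {8: (19.33, 27.26), 16: (36.08, 77.60), 32: (67.29, 178.45)}
PROCESSES = 8

def summarize(timings, processes):
    """Return n -> (ratio, speedup, efficiency), each rounded to 2 places."""
    rows = {}
    for n, (parallel, serial) in timings.items():
        ratio = parallel / serial   # the "Ratio" row of the table
        speedup = serial / parallel
        efficiency = speedup / processes
        rows[n] = (round(ratio, 2), round(speedup, 2), round(efficiency, 2))
    return rows

rows = summarize(timings, PROCESSES)
for n, (ratio, speedup, efficiency) in rows.items():
    print(f"n = {n:2d}  ratio {ratio:.2f}  speedup {speedup:.2f}x  efficiency {efficiency:.2f}")
```

Speedup grows from about 1.41x at n = 8 to about 2.65x at n = 32, which matches the falling ratio, while efficiency on 8 processes stays near or below one third.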