1 UM Stratego: Colin Schepers, Daan Veltman, Enno Ruijters, Leon Gerritsen, Niek den Teuling, Yannick Thimister
2 Content: Introduction (Yannick), The game of Stratego (Daan), Evaluation Function (Leon), Monte Carlo (Colin), Genetic Algorithm (Enno), Opponent modeling and strategy (Niek), Conclusion (Yannick)
3 The game of Stratego Board of 10x10 squares, with a setup area of 4x10 per player
4 The game of Stratego B Bomb, 1 Marshal, 2 General, 3 Colonel, 4 Major, 5 Captain, 6 Lieutenant, 7 Sergeant, 8 Miner, 9 Scout, S Spy, F Flag
5 The game of Stratego Win: capture the enemy flag, or the opponent has no movable pieces left. Draw: neither player has movable pieces, or the maximum number of moves is reached.
6 Starting Positions Flag placed Bombs placed Remaining pieces placed randomly
7 Starting Positions: Distance to Freedom, being bombed in, partial obstruction, adjacency, flag defence, startup pieces
8 Starting Positions Distance to Freedom
9 Starting Positions Startup Pieces
10 Evaluation Function Sub-functions of the evaluation function: Material value, Information value, Near enemy piece value, Near flag value, Progressive bonus value, First-move penalty
11 Evaluation Function How it works: all sub-functions return a value; these values are weighted and summed. The higher the total, the better that move is for the player.
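A minimal sketch of this weighted sum in Python; the function and parameter names are illustrative, not the project's code:

    def evaluate(board, player, sub_functions, weights):
        # Each sub-function scores the board from the player's perspective;
        # the weighted sum is the move's value, higher being better.
        return sum(w * f(board, player) for f, w in zip(sub_functions, weights))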
12 Evaluation Function Material value: compares the two players' board strengths. Each piece type has a value; the total value of the opponent's pieces is subtracted from the player's total. A positive result means the player's board is stronger, a negative one that it is weaker.
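A sketch of the material term; the piece values below are placeholders, since the slides do not give the actual table:

    # Placeholder values, ordered by rank; the project's table is not given.
    PIECE_VALUES = {'Marshal': 10, 'General': 9, 'Colonel': 8, 'Major': 7,
                    'Captain': 6, 'Lieutenant': 5, 'Sergeant': 4, 'Miner': 3,
                    'Scout': 2, 'Spy': 1, 'Bomb': 0, 'Flag': 0}

    def material_value(own_pieces, enemy_pieces):
        # Positive when the player's remaining material outweighs the opponent's.
        return (sum(PIECE_VALUES[p] for p in own_pieces)
                - sum(PIECE_VALUES[p] for p in enemy_pieces))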
13 Evaluation Function Information value: encourages collecting information about the opponent's pieces while keeping one's own pieces hidden. Each piece type has an information value; the totals of both sides are summed and subtracted from each other. A Marshal being discovered is worse than a Scout being discovered.
14 Evaluation Function Near enemy piece value: checks whether a movable piece can defeat an enemy piece next to it. If the enemy piece can be defeated, return a positive score; if not, a negative one; if the enemy piece is unknown, return 0.
15 Evaluation Function Near flag value: encourages defending one's own flag and attacking the enemy's. Constructs an array of possible enemy flag locations. If an enemy piece is near the own flag, return a negative number; if an own piece is near a possible enemy flag location, return a positive number.
16 Evaluation Function Progressive bonus value: encourages the advancement of pieces towards the enemy lines. Returns a positive value if a piece moves forward, a negative one if it moves backward.
17 Evaluation Function First-move penalty: keeps pieces from giving away information by moving for the first time (a moved piece cannot be a bomb or the flag); keeps the number of unmoved pieces high.
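The same pattern fits the information term as a sketch; the per-type information values are assumptions, not the slides' numbers:

    def information_value(own_revealed, enemy_revealed, info_values):
        # Knowing the opponent's pieces is good, having one's own revealed is
        # bad; info_values makes a revealed Marshal cost more than a Scout.
        return (sum(info_values[p] for p in enemy_revealed)
                - sum(info_values[p] for p in own_revealed))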
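A sketch, with `beats` standing in for Stratego's combat rules (an assumed helper):

    def near_enemy_piece_value(own_rank, enemy_rank, beats):
        # 0 while the neighbour's rank is unknown, otherwise score the fight.
        if enemy_rank is None:
            return 0
        return 1 if beats(own_rank, enemy_rank) else -1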
18 Monte Carlo A subset of all possible moves is played, with no strategy or weights. An evaluation value is received after every move; at the end, comparing the evaluation values determines the best move. A depth limit keeps the tree from growing too big and guarantees the algorithm terminates.
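A minimal sketch of this depth-limited Monte Carlo selection; the state interface (legal_moves, apply, is_terminal) is assumed, not taken from the project:

    import random

    def monte_carlo_move(state, evaluate, playouts=50, depth_limit=10):
        # For each move, run random playouts cut off at the depth limit,
        # evaluate the resulting positions, and keep the best average.
        best_move, best_score = None, float('-inf')
        for move in state.legal_moves():
            total = 0.0
            for _ in range(playouts):
                s = state.apply(move)
                for _ in range(depth_limit):
                    if s.is_terminal():
                        break
                    s = s.apply(random.choice(s.legal_moves()))
                total += evaluate(s)
            if total / playouts > best_score:
                best_score, best_move = total / playouts, move
        return best_move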
19 Monte Carlo Advantages: Simple implementation Can be changed quickly Easy observation of behavior Good documentation Good for partial information situations
20 Monte Carlo Disadvantages: Generally not smart Dependent on the evaluation function Computationally slow Tree grows very fast
21 Monte Carlo Experiments MC against lower-depth MC
Player  Wins  Losses  Draws
MC        28      59     49
MC-LD     59      28     49
22 Monte Carlo Experiments MC against no-depth MC
Player  Wins  Losses  Draws
MC        15       2     12
MC-ND      2      15     12
23 Monte Carlo Experiments MC against deeper-depth but narrower MC
Player  Wins  Losses  Draws
MC         5       2     11
MC-DDN     2       5     11
24 Monte Carlo Experiments MC against narrower MC
Player  Wins  Losses  Draws
MC        62      18     85
MC-N      18      62     85
25 Genetic Algorithm Evolve the weights of the terms in the evaluation function. The AI uses a standard expectiminimax search tree. Evolution strategies (the evolution parameters are themselves evolved).
26 Genetic Algorithm Genome: Mutation:
27 Genetic Algorithm Crossover: the σ and α of the parents are averaged; weights are either averaged or else randomly chosen from the parents.
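A sketch of a textbook self-adaptive evolution strategy of the kind described, with a genome of weights plus per-weight step sizes σ; the learning rate tau and the per-weight random recombination are assumptions, not the slides' formulas:

    import math, random

    def mutate(weights, sigmas, tau=0.1):
        # Self-adaptation: step sizes mutate first, then each weight is
        # perturbed with its own (new) step size.
        new_sigmas = [s * math.exp(tau * random.gauss(0, 1)) for s in sigmas]
        new_weights = [w + s * random.gauss(0, 1)
                       for w, s in zip(weights, new_sigmas)]
        return new_weights, new_sigmas

    def crossover(parent_a, parent_b):
        # Strategy parameters are averaged; each weight is taken from one
        # parent or the other at random (an assumed variant of the slide's rule).
        (wa, sa), (wb, sb) = parent_a, parent_b
        sigmas = [(x + y) / 2 for x, y in zip(sa, sb)]
        weights = [random.choice(pair) for pair in zip(wa, wb)]
        return weights, sigmas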
28 Genetic Algorithm Fitness function: Win bonus Number of own pieces left Number of turns spent
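A sketch of such a fitness function; the slides name the terms but not their weights, so all constants here are placeholders:

    def fitness(won, own_pieces_left, turns, win_bonus=1000, piece_weight=10):
        # Reward winning and keeping material, penalize long games.
        return (win_bonus if won else 0) + piece_weight * own_pieces_left - turns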
29 Genetic Algorithm Reference AI: Monte Carlo AI Self-selecting reference genome Select average genome from each generation Pick winner between this genome and previous reference
30 Hill climbing The GA takes too long to train Hill climbing is faster
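A sketch of hill climbing over the evaluation weights; `plays_better` is a hypothetical helper that plays a match between two weight vectors and reports the winner:

    import random

    def hill_climb(weights, plays_better, step=0.1, iterations=100):
        # Nudge one weight at a time; keep the change only if it wins.
        current = list(weights)
        for _ in range(iterations):
            candidate = list(current)
            i = random.randrange(len(candidate))
            candidate[i] += random.uniform(-step, step)
            if plays_better(candidate, current):
                current = candidate
        return current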
31 Opponent modeling Observing moves Ruling out pieces Stronger pieces are moved towards you Weaker pieces are moved away
32 Opponent modeling No knowledge about enemy pieces at the start Updating the probabilities Update the probability of the moving piece Update probabilities of all other pieces
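A minimal sketch of the rank-probability bookkeeping, assuming one distribution per enemy piece; the project's exact update rule is not given in the slides:

    def rule_out(distributions, piece_id, impossible_ranks):
        # A piece that moves cannot be a bomb or the flag: zero those ranks
        # and renormalize. (The full model also shifts the distributions of
        # all other pieces, since the number of pieces per rank is fixed.)
        dist = distributions[piece_id]
        for rank in impossible_ranks:
            dist[rank] = 0.0
        total = sum(dist.values())
        distributions[piece_id] = {r: p / total for r, p in dist.items()}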
33 Monte Carlo Experiments MC against MC with opponent modeling using a database of human versus human games
Player  Wins  Losses  Draws
MC        39      44     58
MC-OM     44      39     58
34 Monte Carlo Experiments MC against MC with opponent modeling using a database of MC versus MC games
Player  Wins  Losses  Draws
MC
MC-OM
35 Strategy Split the game into phases: Exploration phase, until 25% of the enemy pieces are identified; Elimination phase, until 70% of the enemy pieces are killed; End-game phase. The evaluation function is altered per phase.
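The phase switch as a sketch, using the thresholds from the slide:

    def game_phase(identified_fraction, killed_fraction):
        # Explore until 25% of enemy pieces are identified, eliminate until
        # 70% are killed, then switch the evaluation to the end game.
        if identified_fraction < 0.25:
            return 'exploration'
        if killed_fraction < 0.70:
            return 'elimination'
        return 'end-game'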
36 Conclusion Both AIs are very slow. The genetic AI takes too long to train. In the case of Stratego, tweaking a few weights may not be an optimal way to create an intelligent player.