1
Co-evolution: time, changing environments, agents
2
Static search space solutions are determined by the optimization process only, through variation and selection; evaluation determines the fitness of each solution BUT…
3
Dynamic search space what happens if the state of the environment changes between actions of the optimization process, AND / OR other processes are concurrently trying to change the state as well?
4
Agents to act in changing environments
the search is now for an agent strategy to act successfully in a changing environment
example: an agent is defined by an algorithm based on a set of parameters
'learning' means searching for parameter settings that make the algorithm successful in the environment
the agent algorithm takes the current state of the environment as input and outputs an action to change the environment
5
Optimizing an agent procedural optimization
recall 7.1.3 finite state machines (recognize a string) and 7.1.4 symbolic expressions (control the operation of a cart), which are functions
6
The environment of activity
parameter space, e.g. a game
nodes are game states
edges are legal moves that transform one state to another
example: tic-tac-toe, rock-scissors-paper
7
Agent in the environment
a procedure that inputs the current state and determines an action to alter the state
evaluated according to its success at achieving goals in the environment
external goal - environment in a desirable state (win the game)
internal goal - maintain internal state (survive)
8
Models of agent implementation
1. look-up table: state / action
2. deterministic algorithm
   hard-coded intelligence: ideal if a perfect strategy is known
   otherwise, a parameterized algorithm: 'tune' the parameters to optimize performance
3. neural network
9
Optimizing agent procedure
initialize agent parameters
evaluate agent success in the environment
repeat
   modify parameters
   evaluate modified agent success
   if success improved, retain the modified parameters
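A minimal Python sketch of this loop (simple hill climbing on a real-valued parameter vector; the evaluate function standing in for "success in the environment" is a hypothetical placeholder):

import random

def optimize_agent(evaluate, params, iterations=1000, step=0.1):
    # evaluate(params) -> how successful the agent is in the environment
    best_score = evaluate(params)
    for _ in range(iterations):
        # modify parameters by a small random change
        candidate = [p + random.gauss(0, step) for p in params]
        score = evaluate(candidate)
        # retain the modified parameters only if success improved
        if score > best_score:
            params, best_score = candidate, score
    return params, best_score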
10
Environment features
1. other agents?
2. state changes independent of agent(s)?
3. randomness?
4. discrete or continuous?
5. sequential or parallel?
11
Environment features game examples
1. other agents? solitaire, two-person, multi-player
12
Environment features game examples
2. state changes independent of agent(s)? YES: simulators (weather, resources); NO: board games
13
Environment features game examples
3. randomness? NO: chess, sudoku(!); YES: dice, card games
14
Environment features game examples
4. discrete or continuous? continuous: video games; discrete: board games
15
Environment features game examples
5. sequential or parallel? turn-taking vs. simultaneous
which sports? marathon, javelin, hockey, tennis
16
Models of agent optimization
evolving agent vs. environment
evolving agent vs. skilled player
co-evolving agents
   competing as equals, e.g. playing a symmetric game
   competing as non-equals (predator - prey), e.g. playing an asymmetric game
   co-operative
17
Example: Game Theory
invented by John von Neumann
models concurrent* interaction between agents
each player has a set of choices
players concurrently make a choice
each player receives a payoff based on the outcome of play: the vector of choices
e.g. paper-scissors-rock: 2 players with 3 choices each; 3 outcomes: win, lose, draw
*there are sequential Game Theory models also
18
Payoff matrix
tabulates payoffs for all outcomes, e.g. paper-scissors-rock
P1 \ P2     paper           scissors        rock
paper       (draw, draw)    (lose, win)     (win, lose)
scissors    (win, lose)     (draw, draw)    (lose, win)
rock        (lose, win)     (win, lose)     (draw, draw)
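One way to represent such a matrix in code is a lookup table keyed by the pair of choices; a small Python sketch:

# (P1's choice, P2's choice) -> (P1's result, P2's result)
PAYOFF = {
    ('paper', 'paper'):       ('draw', 'draw'),
    ('paper', 'scissors'):    ('lose', 'win'),
    ('paper', 'rock'):        ('win', 'lose'),
    ('scissors', 'paper'):    ('win', 'lose'),
    ('scissors', 'scissors'): ('draw', 'draw'),
    ('scissors', 'rock'):     ('lose', 'win'),
    ('rock', 'paper'):        ('lose', 'win'),
    ('rock', 'scissors'):     ('win', 'lose'),
    ('rock', 'rock'):         ('draw', 'draw'),
}
print(PAYOFF[('scissors', 'paper')])  # ('win', 'lose')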
19
Two-player two-choice ordinal games (2 x 2 games)
four outcomes; each player has four ordered payoffs
example game:
P1 \ P2    L      R
U          1,2    3,1
D          2,4    4,3
20
2 x 2 games
144 distinct games, based on the relation of payoffs to outcomes: (4! x 4!) / (2 x 2)
model many real-world encounters
an environment for studying agent strategies: how do players decide what choice to make?
P1 \ P2    L      R
U          1,2    3,1
D          2,4    4,3
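Working out the count from the slide: (4! x 4!) / (2 x 2) = (24 x 24) / 4 = 576 / 4 = 144.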
21
2 x 2 games player strategy: minimax - minimize losses
pick the row / column with the largest minimum payoff
assumes nothing about the other player
P1 \ P2    L      R
U          1,2    3,1
D          2,4    4,3
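A small Python sketch of this maximin choice for the row player, using the example matrix above:

def maximin_row(payoffs):
    # payoffs[row][col] = (row player's payoff, column player's payoff)
    # pick the row whose smallest own payoff is largest (minimize the worst case)
    return max(range(len(payoffs)),
               key=lambda r: min(cell[0] for cell in payoffs[r]))

game = [[(1, 2), (3, 1)],   # row U
        [(2, 4), (4, 3)]]   # row D
print(maximin_row(game))    # 1, i.e. D: its worst payoff 2 beats U's worst payoff 1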
22
2 x 2 games player strategies: dominant strategy
a preferred choice, whatever the opponent does
does not determine a choice in all games
P1 \ P2    L      R
U          1,3    4,2
D          3,4    2,1

P1 \ P2    L      R
U          1,2    4,3
D          3,4    2,1

P1 \ P2    L      R
U          2,1    1,2
D          4,3    3,4
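A companion sketch that checks for a dominant row strategy; it returns None when, as the slide notes, no single choice is preferred against every column:

def dominant_row(payoffs):
    # a row is dominant if it gives the row player a higher payoff in every column
    rows, cols = len(payoffs), len(payoffs[0])
    for r in range(rows):
        if all(payoffs[r][c][0] > payoffs[o][c][0]
               for o in range(rows) if o != r
               for c in range(cols)):
            return r
    return None

print(dominant_row([[(1, 2), (4, 3)], [(3, 4), (2, 1)]]))  # None: neither row dominates
print(dominant_row([[(2, 1), (1, 2)], [(4, 3), (3, 4)]]))  # 1: D is dominant for the row player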
23
2 x 2 games some games are dilemmas
the "prisoner's dilemma" (C = 'cooperate', D = 'defect'):
P1 \ P2    C      D
C          3,3    1,4
D          4,1    2,2
24
2 x 2 games - playing strategies
what happens if players use different strategies?
how do / should players decide in difficult games?
modeling players and evaluating them in games
P1 \ P2    C      D
C          3,3    1,4
D          4,1    2,2
25
iterated games
can players analyze games after playing and determine a better strategy for the next encounter?
iterated play, e.g. the prisoner's dilemma: can players learn to trust each other and get better payoffs?
P1 \ P2    C      D
C          3,3    1,4
D          4,1    2,2
26
iterated prisoner's dilemma
a player's strategy gives its choice as a function of previous choices and outcomes
P1 \ P2    C      D
C          3,3    1,4
D          4,1    2,2

game    1   2   …   i-1   i   i+1   …
P1      C   C   …   D     D   ?
P2      D   C   …   C     D   ?

P1: choice(i+1, P1) = f( choice(i, P1), choice(i, P2) )
27
iterated prisoner's dilemma
evaluating strategies based on payoffs in iterated play
P1 \ P2    C      D
C          3,3    1,4
D          4,1    2,2

P1: choice(n+1, P1) = f( choice(n, P1), choice(n, P2) )

game        1   2   …   i-1   i   i+1   …   n-1   n     Σ
P1 payoff   1   3   …   4     2   3     …   4     3     298
P1 choice   C   C   …   D     D   C     …   D     C
P2 choice   D   C   …   C     D   C     …   C     C
P2 payoff   4   3   …   1     2   3     …   1     3     343
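A minimal Python sketch of this evaluation, assuming each strategy depends only on the previous round's choices (payoff values follow the matrix above):

PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (1, 4),
          ('D', 'C'): (4, 1), ('D', 'D'): (2, 2)}

def play_iterated(strategy1, strategy2, rounds=100):
    # each strategy maps (own previous choice, opponent's previous choice) -> 'C' or 'D';
    # None is passed on the first round, when there is no history
    prev1 = prev2 = None
    total1 = total2 = 0
    for _ in range(rounds):
        c1 = strategy1(prev1, prev2)
        c2 = strategy2(prev2, prev1)
        p1, p2 = PAYOFF[(c1, c2)]
        total1, total2 = total1 + p1, total2 + p2
        prev1, prev2 = c1, c2
    return total1, total2

# example: tit-for-tat against an unconditional defector
tit_for_tat = lambda me, opp: 'C' if opp is None else opp
always_defect = lambda me, opp: 'D'
print(play_iterated(tit_for_tat, always_defect, rounds=10))  # (19, 22)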
28
tournament play
a set of K strategies for a game, e.g. the iterated prisoner's dilemma
evaluate strategies by their performance against all other strategies: a round-robin tournament
29
example of tournament play
        P1    P2    …   P(K-1)  PK      Σ
P1      308   298   …   340     278     7140
P2      343   302   …   297     254     6989
…       …     …     …   …       …       …
P(K-1)  280   288   …   301     289     6860
PK      312   322   …   297     277     7045
30
tournament play
        P1    P2    …   P(K-1)  PK      Σ
P1      308   298   …   340     278     7140
P2      343   302   …   297     254     6989
…       …     …     …   …       …       …
P(K-1)  280   288   …   301     289     6860
PK      312   322   …   297     277     7045
order players by total payoff
change 'weights' (proportion of population) based on success
repeat the tournament until the weights stabilize
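A round-robin sketch in Python with a simple replicator-style weight update; play_match is assumed to be an iterated-game function such as play_iterated above:

import itertools

def round_robin(strategies, play_match):
    # strategies: dict name -> strategy; play_match(s1, s2) -> (payoff to s1, payoff to s2)
    totals = {name: 0 for name in strategies}
    for a, b in itertools.combinations_with_replacement(strategies, 2):
        pa, pb = play_match(strategies[a], strategies[b])
        totals[a] += pa
        if a != b:
            totals[b] += pb
    return totals

def update_weights(weights, totals):
    # a strategy's share of the population grows in proportion to its tournament payoff
    norm = sum(weights[n] * totals[n] for n in weights)
    return {n: weights[n] * totals[n] / norm for n in weights}

Repeating round_robin and update_weights until the weights stop changing corresponds to the "repeat tournament until weights stabilize" step above.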
31
tournament survivors
the final weights of the players define who is successful
typical distribution:
   1 dominant player
   a few minor players
   most players eliminated
32
variations on fitness evaluation: spatially distributed agents
random players on an n x n grid
each player plays 4 iterated games against its neighbours and computes its total payoff
each player compares its total payoff with its 4 neighbours, and is replaced by the neighbour with the best payoff
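A sketch of one generation of this spatial scheme on an n x n grid with wrap-around neighbours (play_match is again an assumed iterated-game function returning the pair of payoffs):

def spatial_step(grid, play_match):
    # grid[r][c] holds a strategy; each cell plays 4 iterated games against its
    # neighbours (with wrap-around) and is replaced by the best-scoring cell nearby
    n = len(grid)
    def neighbours(r, c):
        return [((r - 1) % n, c), ((r + 1) % n, c),
                (r, (c - 1) % n), (r, (c + 1) % n)]
    scores = {(r, c): sum(play_match(grid[r][c], grid[nr][nc])[0]
                          for nr, nc in neighbours(r, c))
              for r in range(n) for c in range(n)}
    new_grid = [row[:] for row in grid]
    for r in range(n):
        for c in range(n):
            best_r, best_c = max([(r, c)] + neighbours(r, c), key=lambda rc: scores[rc])
            new_grid[r][c] = grid[best_r][best_c]
    return new_grid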
33
spatial distribution
34
defining agents, e.g., prisoner's dilemma
algorithms from experts (Axelrod 1978)
exhaustive variations on a model, e.g. react to the previous choices of both players
need choices for:
   first move
   after (C,C)   // (self, opponent)
   after (C,D)
   after (D,C)
   after (D,D)
5-bit representation of a strategy: 32 possible strategies, 32 players in a tournament
tit-for-tat:
   first: C
   after (C,C): C
   after (C,D): D
   after (D,C): C
   after (D,D): D
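The 5-bit representation can be decoded directly into a strategy function; a sketch, with the bit order following the slide (first move, then the response to each previous (self, opponent) pair):

def make_strategy(bits):
    # bits: a string of five 'C'/'D' characters, e.g. 'CCDCD' is tit-for-tat
    first = bits[0]
    responses = dict(zip([('C', 'C'), ('C', 'D'), ('D', 'C'), ('D', 'D')], bits[1:]))
    def strategy(my_prev, opp_prev):
        if my_prev is None:        # first move of the iterated game
            return first
        return responses[(my_prev, opp_prev)]
    return strategy

tit_for_tat = make_strategy('CCDCD')
print(tit_for_tat(None, None), tit_for_tat('C', 'D'))  # C D

Enumerating all 32 five-character bit-strings gives the 32 tournament players mentioned above.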
35
tit-for-tat interpreting the strategy
start by trusting
repeat the opponent's last behaviour:
   cooperate after cooperation
   punish defection
   don't hold a grudge
BUT… Axelrod's payoffs became the standard:
tit-for-tat: first: C; after (C,C): C; after (C,D): D; after (D,C): C; after (D,D): D
P1 \ P2    C      D
C          3,3    0,5
D          5,0    1,1
36
problems with the research
if the payoffs are changed but the prisoner's dilemma pattern remains, tournaments can produce different winners --> relative payoff does matter
P1 \ P2    C      D
C          3,3    1,4
D          4,1    2,2

P1 \ P2    C      D
C          3,3    0,5
D          5,0    1,1

game        1   2   …   i-1   i   i+1   …   n-1   n     Σ
P1 payoff   1   3   …   4     2   3     …   4     3     298
P1 choice   C   C   …   D     D   C     …   D     C
P2 choice   D   C   …   C     D   C     …   C     C
P2 payoff   4   3   …   1     2   3     …   1     3     343
37
agent is a function of more than the previous strategy
choice(P1, n) is influenced by:            (number of variables)
   previous choices - how many?             2, 4, 6, ..?
   payoffs - of both players                4, 8?
   the individual's goals                   1
      same or different measures? maximum payoff, maximum difference, altruism, equality, social utility
   other factors??
38
agent research (Robinson & Goforth)
choice(P1, n) influenced by previous choices (how many?) and the payoffs of both players
repeated tournaments, sampling the space of payoffs - demonstrated that payoffs matter
example of discovery: an extra level of trust beyond tit-for-tat
effect of players' goals on outcomes and payoffs, and on the classic strategies
3 x 3 games: ~3 billion games -> grid computing
39
general model
the environment is a payoff space where encounters take place
multiple agents interact in the space
repeated encounters in which agents get payoffs
agents 'display' their own strategies and agents 'learn' the strategies of others
goals of the research:
   reproduce real-world behaviour by simulating strategies
   develop effective strategies
40
Developing effective agent strategies
co-evolution: bootstrapping
symmetric competition: games, symmetric game theory
asymmetric competition: asymmetric game theory
cooperation: common goal, social goal
41
Coevolution in symmetric competition Fogel's checkers player
evolving player strategies
search space of strategies
fitness is determined by game play
variation by random changes to parameters
selection by ranking in play against others
initial strategies are randomly generated
42
Checkers game representation
a 32-element vector for the playable squares on the board (one colour)
domain of each element: {-K, -1, 0, 1, K}
e.g. start state of the game:
{-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,0,0,0,0,0,0,0, 1,1,1,1,1,1,1,1,1,1,1,1}
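A minimal sketch of this representation in Python (the numeric value of K here is only an illustrative placeholder; it is one of the parameters being optimized):

K = 2.0  # illustrative king value; in the search it is a tunable parameter
# -K / -1: opponent's kings / men, 0: empty square, 1 / K: own men / kings
start_state = [-1] * 12 + [0] * 8 + [1] * 12   # 32 playable squares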
43
Game play algorithm
board.initialize()
while (not board.gameOver())
    player1.makeMove(board)
    if (board.gameOver()) break
    player2.makeMove(board)
end while
board.assignPoints(player1, player2)
44
makeMove algorithm
makeMove(board) {
    boardList = board.getNextBoards(this)   // all boards reachable by one legal move
    bestBoardVal = -infinity                // below any value evaluate() can return
    bestBoard = null
    forAll (tempBoard in boardList)
        boardVal = evaluate(tempBoard)
        if (boardVal > bestBoardVal)
            bestBoardVal = boardVal
            bestBoard = tempBoard
    board = bestBoard                       // play the best-scoring move
}
45
evaluate algorithm
evaluate(board) {
    a function of the 32 elements of the board
    its parameters are the variables of the search space, including K
    output is in the range [-1, 1], used for comparing boards
}
what to put in the evaluation?
46
Chellapilla and Fogel's evaluation
no knowledge of checkers rules built in (the board generates the legal moves and updates the state)
no 'look-ahead' to future moves
implicitly a pattern recognition system, BUT only the relational patterns are predefined - see diagram (neural network)
47
Neural network
input layer, hidden layer(s), output layer
e.g. a hidden node: h_1 = w_11·i_1 + w_12·i_2 + w_13·i_3
the output node: o_1 = w_31·g_1 + w_32·g_2 + w_33·g_3
(diagram: layered network with inputs i_1, …, hidden nodes g_1, …, output o_1)
48
Neural network agent
32 inputs - the board state
1st hidden layer: 91 nodes - one per sub-square of the board: 3x3 (36), 4x4 (25), 5x5 (16), 6x6 (9), 7x7 (4), 8x8 (1)
2nd hidden layer: 40 nodes
3rd hidden layer: 10 nodes
1 output - evaluation of the board, in [-1, 1]
parameters - the weights of the neural network, plus K (the value of a king)
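A simplified forward pass with these layer sizes, treating every layer as fully connected and using tanh to keep the output in [-1, 1]; note the real network's first hidden layer is wired to the board sub-squares listed above, and the activation function here is an assumption:

import math, random

SIZES = [32, 91, 40, 10, 1]   # layer sizes from the slide

def random_weights():
    # one weight matrix per pair of adjacent layers (biases omitted for brevity)
    return [[[random.uniform(-0.2, 0.2) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(SIZES, SIZES[1:])]

def evaluate(board, weights):
    x = board                      # 32-element board vector
    for layer in weights:
        # tanh keeps every activation, including the final output, in [-1, 1]
        x = [math.tanh(sum(w * v for w, v in zip(node, x))) for node in layer]
    return x[0]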
49
Genetic algorithm
population: 15 player agents
one child agent from each parent: variation by random changes to K and the weights
fitness based on score in a tournament among the 30 agents (win: 1, loss: -2, draw: 0)
the agents with the best 15 scores are selected for the next generation
after 840 generations: an effective checkers player in competition against humans
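A sketch of this evolutionary loop; mutate and tournament_scores are placeholders standing in for the variation and fitness-evaluation steps described above:

def evolve(population, mutate, tournament_scores, generations=840):
    # population: the 15 parent agents (network weights plus K)
    for _ in range(generations):
        offspring = [mutate(parent) for parent in population]    # one child per parent
        candidates = population + offspring                      # 30 agents in total
        scores = tournament_scores(candidates)   # sum of win: 1, loss: -2, draw: 0
        ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
        population = [candidates[i] for i in ranked[:len(population)]]  # keep the best 15
    return population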