Co-evolution: time, changing environments, agents

Static search space: solutions are determined by the optimization process alone, through variation and selection; evaluation determines the fitness of each solution. BUT…

Dynamic search space: what happens if the state of the environment changes between actions of the optimization process, and/or other processes are concurrently trying to change the state?

Agents to act in changing environments

the search is now for an agent strategy that acts successfully in a changing environment
example: an agent is defined by an algorithm based on a set of parameters
'learning' means searching for parameter settings that make the algorithm successful in the environment
the agent algorithm takes the current state of the environment as input and outputs an action to change the environment

Optimizing an agent: procedural optimization. Recall 7.1.3: finite state machines (recognizing a string) and symbolic expressions (controlling the operation of a cart), both of which are functions.

The environment of activity

a state graph, e.g. a game:
    nodes are game states
    edges are legal moves that transform one state to another
examples: tic-tac-toe, rock-scissors-paper

Agent in the environment

a procedure that takes the current state as input and determines an action to alter the state
evaluated according to its success at achieving goals in the environment:
    external goal - put the environment in a desirable state (win the game)
    internal goal - maintain internal state (survive)

Models of agent implementation (the first two are sketched below)

1. look-up table: state / action
2. deterministic algorithm: hard-coded intelligence, ideal if a perfect strategy is known; otherwise a parameterized algorithm - 'tune' the parameters to optimize performance
3. neural network
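A minimal Python sketch of the first two models, using rock-scissors-paper as the environment; the state names and function names are illustrative assumptions, not from the slides:

import random

# Model 1 - look-up table: map each observed state directly to an action.
LOOKUP = {
    "opponent_last_rock": "paper",
    "opponent_last_paper": "scissors",
    "opponent_last_scissors": "rock",
}

def lookup_agent(state):
    return LOOKUP[state]

# Model 2 - parameterized algorithm: behaviour depends on tunable parameters;
# 'learning' means searching for good values of p_rock and p_paper.
def parameterized_agent(state, p_rock=0.4, p_paper=0.3):
    r = random.random()
    if r < p_rock:
        return "rock"
    if r < p_rock + p_paper:
        return "paper"
    return "scissors"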

Optimizing agent procedure

initialize agent parameters
evaluate agent success in environment
repeat:
    modify parameters
    evaluate modified agent success
    if improved success, retain modified parameters
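The same procedure as a hill-climbing sketch in Python; evaluate is assumed to return a success score for a parameter vector:

import random

def optimize_agent(evaluate, params, steps=1000, sigma=0.1):
    # evaluate agent success in the environment
    best_score = evaluate(params)
    for _ in range(steps):
        # modify parameters (small random perturbation)
        candidate = [p + random.gauss(0, sigma) for p in params]
        score = evaluate(candidate)
        # if improved success, retain the modified parameters
        if score > best_score:
            params, best_score = candidate, score
    return params, best_score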

Environment features

1. other agents?
2. state changes independent of agent(s)?
3. randomness?
4. discrete or continuous?
5. sequential or parallel?

Environment features - game examples

1. other agents? solitaire, two-person, multi-player

2. state changes independent of agent(s)? YES: simulators (weather, resources); NO: board games

3. randomness? NO: chess, sudoku(!); YES: dice, card games

4. discrete or continuous? discrete: board games; continuous: video games

5. sequential or parallel? turn-taking vs. simultaneous; which sports? marathon, javelin, hockey, tennis

Models of agent optimization

evolving agent vs environment
evolving agent vs skilled player
co-evolving agents:
    competing as equals, e.g. playing a symmetric game
    competing as non-equals (predator-prey), e.g. playing an asymmetric game
    co-operative

Example: Game Theory

invented by John von Neumann
models concurrent* interaction between agents:
    each player has a set of choices
    players concurrently make a choice
    each player receives a payoff based on the outcome of play: the vector of choices
e.g. paper-scissors-rock: 2 players, with 3 choices each; 3 outcomes: win, lose, draw

*there are sequential Game Theory models also

Payoff matrix: tabulates the payoffs for all outcomes, e.g. paper-scissors-rock

P1 \ P2     paper          scissors       rock
paper       (draw, draw)   (lose, win)    (win, lose)
scissors    (win, lose)    (draw, draw)   (lose, win)
rock        (lose, win)    (win, lose)    (draw, draw)
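The same payoff matrix as a small Python sketch, keyed by the pair of choices (the function name is illustrative):

PAYOFF = {
    ("paper", "paper"): ("draw", "draw"),
    ("paper", "scissors"): ("lose", "win"),
    ("paper", "rock"): ("win", "lose"),
    ("scissors", "paper"): ("win", "lose"),
    ("scissors", "scissors"): ("draw", "draw"),
    ("scissors", "rock"): ("lose", "win"),
    ("rock", "paper"): ("lose", "win"),
    ("rock", "scissors"): ("win", "lose"),
    ("rock", "rock"): ("draw", "draw"),
}

def play(choice1, choice2):
    # returns (P1 result, P2 result), e.g. play("rock", "scissors") -> ("win", "lose")
    return PAYOFF[(choice1, choice2)]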

Two-player two-choice ordinal games (2 x 2 games)

four outcomes
each player has four ordered payoffs
example game:

P1 \ P2   L      R
U         1,2    3,1
D         2,4    4,3

2 x 2 games

144 distinct games, based on the relation of payoffs to outcomes: (4! x 4!) / (2 x 2)
model many real-world encounters
an environment for studying agent strategies: how do players decide what choice to make?

P1 \ P2   L      R
U         1,2    3,1
D         2,4    4,3

2 x 2 games - player strategy

minimax - minimize losses:
    pick the row / column with the largest minimum
    assumes nothing about the other player

P1 \ P2   L      R
U         1,2    3,1
D         2,4    4,3
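A minimal Python sketch of this maximin rule on the example game above (function names are assumptions):

payoffs = {
    "U": {"L": (1, 2), "R": (3, 1)},
    "D": {"L": (2, 4), "R": (4, 3)},
}

def maximin_row(payoffs):
    # P1 picks the row whose worst-case (minimum) own payoff is largest
    return max(payoffs, key=lambda row: min(p1 for p1, p2 in payoffs[row].values()))

def maximin_col(payoffs):
    # P2 picks the column whose worst-case own payoff is largest
    cols = next(iter(payoffs.values())).keys()
    return max(cols, key=lambda col: min(payoffs[row][col][1] for row in payoffs))

print(maximin_row(payoffs), maximin_col(payoffs))   # -> D L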

2 x 2 games - player strategies

dominant strategy - the preferred choice, whatever the opponent does
does not determine a choice in all games

P1 \ P2   L      R          L      R          L      R
U         1,3    4,2        1,2    4,3        2,1    1,2
D         3,4    2,1        3,4    2,1        4,3    3,4

2 x 2 games

some games are dilemmas, e.g. the "prisoner's dilemma" (C = 'cooperate', D = 'defect'):

P1 \ P2   C      D
C         3,3    1,4
D         4,1    2,2

2 x 2 games - playing strategies

what happens if players use different strategies?
how do/should players decide in difficult games?
modeling players and evaluating them in games

iterated games

can players analyze games after playing and determine a better strategy for the next encounter?
iterated play, e.g. the prisoner's dilemma: can players learn to trust each other and get better payoffs?

iterated prisoner's dilemma

a player's strategy for the next choice is a function of the previous choices and outcomes:

game   1   2   …   i-1   i   i+1   …
P1     C   C   …   D     D   ?
P2     D   C   …   C     D   ?

P1: choice(i+1, P1) = f( choice(i, P1), choice(i, P2) )

iterated prisoner's dilemma

evaluating strategies based on total payoffs in iterated play:

game        1   2   …   i-1   i   i+1   …   n-1   n     Σ
P1 payoff   1   3   …   4     2   3     …   4     3     298
P1          C   C   …   D     D   C     …   D     C
P2          D   C   …   C     D   C     …   C     C
P2 payoff   4   3   …   1     2   3     …   1     3     343

P1: choice(n+1, P1) = f( choice(n, P1), choice(n, P2) )
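A Python sketch of iterated play under these payoffs; a strategy is modelled as a function of the previous pair of choices, matching choice(i+1) = f(own last, opponent's last) - the exact interface is an assumption:

# payoffs from the matrix above: (P1 payoff, P2 payoff)
PD_PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (1, 4),
             ("D", "C"): (4, 1), ("D", "D"): (2, 2)}

def iterate(strat1, strat2, first1, first2, n):
    # strat(my_last, their_last) -> "C" or "D"
    c1, c2 = first1, first2
    total1 = total2 = 0
    for _ in range(n):
        p1, p2 = PD_PAYOFF[(c1, c2)]
        total1, total2 = total1 + p1, total2 + p2
        c1, c2 = strat1(c1, c2), strat2(c2, c1)
    return total1, total2

# example: an unconditional cooperator against "copy their last move"
def always_c(my_last, their_last): return "C"
def copy_them(my_last, their_last): return their_last
print(iterate(always_c, copy_them, "C", "C", 10))   # -> (30, 30)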

tournament play

a set of K strategies for a game, e.g. the iterated prisoner's dilemma
evaluate strategies by their performance against all other strategies
round-robin tournament

example of tournament play

        P1    P2    …   P(K-1)   PK    Σ
P1      …     …     …   …        …     …
P2      …     …     …   …        …     …
…       …     …     …   …        …     …
P(K-1)  280   288   …   …        …     …
PK      312   322   …   …        …     …

(entry in row Pi, column Pj is Pi's total payoff against Pj; the row sums Σ rank the players)

tournament play (table as on the previous slide)

order players by total payoff Σ
change 'weights' (proportion of population) based on success
repeat the tournament until weights stabilize
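A sketch of the round-robin loop with a simple proportional reweighting; the exact update rule is an assumption, and iterate is the function from the earlier sketch. Weights are assumed to sum to 1:

def round_robin(strategies, n=100):
    # total payoff of every strategy against every other strategy
    scores = {name: 0 for name in strategies}
    names = list(strategies)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sa, sb = iterate(strategies[a], strategies[b], "C", "C", n)
            scores[a] += sa
            scores[b] += sb
    return scores

def reweight(weights, scores):
    # proportional (replicator-style) update: successful players grow
    mean = sum(weights[k] * scores[k] for k in weights)
    return {k: weights[k] * scores[k] / mean for k in weights}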

tournament survivors

the final weights of the players define who is successful
typical distribution:
    1 dominant player
    a few minor players
    most players eliminated

variations on fitness evaluation: spatially distributed agents

random players on an n x n grid
each player plays 4 iterated games against its neighbours and computes its total payoff
each player compares its total payoff with its 4 neighbours; it is replaced by the neighbour with the best payoff
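A grid-update sketch of this scheme; the torus wrap-around and game length are assumptions, and iterate is reused from above:

def neighbours(i, j, n):
    # 4 neighbours on an n x n torus
    return [((i - 1) % n, j), ((i + 1) % n, j), (i, (j - 1) % n), (i, (j + 1) % n)]

def spatial_step(grid, n_games=10):
    n = len(grid)
    # each player's total payoff over 4 iterated games
    score = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for a, b in neighbours(i, j, n):
                s, _ = iterate(grid[i][j], grid[a][b], "C", "C", n_games)
                score[i][j] += s
    # each player is replaced by the best-scoring player in its neighbourhood
    new_grid = []
    for i in range(n):
        row = []
        for j in range(n):
            best = max(neighbours(i, j, n) + [(i, j)], key=lambda p: score[p[0]][p[1]])
            row.append(grid[best[0]][best[1]])
        new_grid.append(row)
    return new_grid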

spatial distribution

defining agents, e.g. for the prisoner's dilemma

algorithms from experts (Axelrod 1978)
exhaustive variations on a model, e.g. react to the previous choices of both players; need choices for:
    first move
    after (C,C)   // (self, opponent)
    after (C,D)
    after (D,C)
    after (D,D)
5-bit representation of a strategy: 32 possible strategies, 32 players in the tournament

tit-for-tat:   first: C   (C,C): C   (C,D): D   (D,C): C   (D,D): D
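A sketch of the 5-bit representation in Python, written over the alphabet {C, D}; the bit ordering is an assumption:

OUTCOMES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]   # (self, opponent)

def decode(bits):
    # bits: 5 characters over {'C','D'}: first move, then responses to the 4 outcomes
    first, table = bits[0], dict(zip(OUTCOMES, bits[1:]))
    def strategy(my_last=None, their_last=None):
        return first if my_last is None else table[(my_last, their_last)]
    return strategy

tit_for_tat = decode("CCDCD")   # first: C, (C,C)->C, (C,D)->D, (D,C)->C, (D,D)->D
all_32 = [decode(f"{k:05b}".translate(str.maketrans("01", "CD"))) for k in range(32)]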

tit-for-tat: interpreting the strategy

start by trusting
repeat the opponent's last behaviour:
    cooperate after cooperation
    punish defection
don't hold a grudge

BUT… Axelrod's payoffs became the standard:

P1 \ P2   C      D
C         3,3    0,5
D         5,0    1,1

problems with the research

if the payoffs are changed but the prisoner's dilemma pattern remains, tournaments can produce different winners --> relative payoff does matter

P1 \ P2   C      D            C      D
C         3,3    1,4          3,3    0,5
D         4,1    2,2          5,0    1,1

agent is a function of more than the previous strategy

choice(P1, n) influenced by:            variables
    previous choices - how many?        2, 4, 6, ...?
    payoffs - of both players           4, 8?
    individual's goals                  1
same or different measures?
    maximum payoff
    maximum difference
    altruism
    equality
    social utility
other factors??

agent research (Robinson & Goforth)

choice(P1, n) influenced by:
    previous choices - how many
    payoffs - of both players
repeated tournaments, sampling the space of payoffs - demonstrated that payoffs matter
example of discovery: an extra level of trust beyond tit-for-tat
effect of players' goals on outcomes, payoffs, and the classic strategies
3 x 3 games: ~3 billion games -> grid computing

general model

the environment is a payoff space where encounters take place
multiple agents interact in the space, in repeated encounters where:
    agents get payoffs
    agents 'display' their own strategies and
    agents 'learn' the strategies of others
goals of research:
    reproduce real-world behaviour by simulating strategies
    develop effective strategies

Developing effective agent strategies

co-evolution: bootstrapping
symmetric competition: games, symmetric game theory
asymmetric competition: asymmetric game theory
cooperation: common goal, social goal

Coevolution in symmetric competition: Fogel's checkers player

evolving player strategies in a search space of strategies:
    fitness is determined by game play
    variation by random change to parameters
    selection by ranking in play against others
    initial strategies randomly generated

Checkers game representation

32-element vector for the playable squares on the board (all of one colour)
domain of each element: {-K, -1, 0, 1, K}
e.g. the start state of the game:
{-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,0,0,0,0,0,0,0, 1,1,1,1,1,1,1,1,1,1,1,1}
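The same encoding as Python data; the sign convention matches the slide, but the numeric value 1.5 for K is only a placeholder (in the evolved players K is itself a search parameter):

K = 1.5                                        # king value: part of the search space
start_state = [-1] * 12 + [0] * 8 + [1] * 12   # 32 playable squares
assert len(start_state) == 32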

Game play algorithm

board.initialize()
while (not board.gameOver())
    player1.makeMove(board)
    if (board.gameOver()) break
    player2.makeMove(board)
end while
board.assignPoints(player1, player2)

makeMove algorithm

makeMove(board) {
    boardList = board.getNextBoards(this)
    bestBoardVal = -infinity    // below evaluate's minimum of -1, so any board is accepted
    bestBoard = null
    forAll (tempBoard in boardList)
        boardVal = evaluate(tempBoard)
        if (boardVal > bestBoardVal)
            bestBoardVal = boardVal
            bestBoard = tempBoard
    board = bestBoard
}

evaluate algorithm

evaluate(board) {
    // a function of the 32 elements of board
    // parameters are the variables of the search space, including K
    // output is in the range [-1, 1], used for comparing boards
}

what to put in the evaluation?

Chellapilla and Fogel's evaluation

no knowledge of checkers rules: the board generates the legal moves and updates the state
no 'look-ahead' to future moves
implicitly a pattern recognition system, BUT only relational patterns are predefined - see the diagram (neural network)

Neural network (diagram)

input layer, hidden layer(s), output layer
each node is a weighted sum of the previous layer's nodes, e.g.
    h1 = w11*i1 + w12*i2 + w13*i3
    o1 = w31*g1 + w32*g2 + w33*g3

Neural network agent

32 inputs - the board state
1st hidden layer: 91 nodes, one per square subboard: 3x3 (36), 4x4 (25), 5x5 (16), 6x6 (9), 7x7 (4), 8x8 (1)
2nd hidden layer: 40 nodes
3rd hidden layer: 10 nodes
1 output - evaluation of the board, in [-1, 1]
parameters - the weights of the neural network, plus K (the value of a king)
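A one-line check of the 91-node count: each k x k window fits on the 8 x 8 board in (9 - k)^2 positions:

counts = [(9 - k) ** 2 for k in range(3, 9)]   # [36, 25, 16, 9, 4, 1]
print(counts, sum(counts))                     # sum is 91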

Genetic algorithm

population: 15 player agents
one child agent from each parent: variation by random changes to K and the weights
fitness based on score in a tournament among the 30 agents (win: 1, loss: -2, draw: 0)
the agents with the best 15 scores are selected for the next generation
after 840 generations: an effective checkers player in competition against humans
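A self-contained sketch of this (15 parents + 15 children) evolutionary loop; mutate and the stand-in tournament below are assumptions, not Chellapilla and Fogel's actual code:

import random

def mutate(params, sigma=0.05):
    # child = parent with Gaussian noise on every parameter (weights and K)
    return [p + random.gauss(0, sigma) for p in params]

def play_tournament(pool):
    # stand-in: random scores; the real system scores checkers games
    # among the 30 agents with win +1, loss -2, draw 0
    return [random.uniform(-10, 5) for _ in pool]

def evolve(population, generations=840):
    for _ in range(generations):
        children = [mutate(p) for p in population]        # one child per parent
        pool = population + children                      # 30 agents
        scores = play_tournament(pool)
        order = sorted(range(len(pool)), key=scores.__getitem__, reverse=True)
        population = [pool[i] for i in order[:len(population)]]   # best 15 survive
    return population

population = evolve([[random.uniform(-0.2, 0.2) for _ in range(10)] for _ in range(15)])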