Repeated Games

Examples of the Repeated Prisoner's Dilemma:
– Overfishing
– Transboundary pollution
– Cartel enforcement
– Labor unions
– Public goods
– The Tragedy of the Global Commons
– Free-rider problems

Repeated Games

Some questions:
– What happens when a game is repeated?
– Can threats and promises about the future influence behavior in the present? Cheap talk
– Finitely repeated games: backward induction
– Indefinitely repeated games: trigger strategies

Repeated Games

Can threats and promises about future actions influence behavior in the present? Consider the following game, played 2x:

        C      D
  C    3,3    0,5
  D    5,0    1,1

See Gibbons:

Repeated Games

Draw the extensive-form game. [Game tree: each of the four first-round outcomes (3,3), (0,5), (5,0), (1,1) leads to a second play of the stage game; terminal payoffs are the two-round sums, e.g. (6,6), (3,8), (8,3), (4,4) following first-round (3,3), through (4,4), (1,6), (6,1), (2,2) following first-round (1,1).]

Repeated Games

Now consider three repeated-game strategies:
D (ALWAYS DEFECT): Defect on every move.
C (ALWAYS COOPERATE): Cooperate on every move.
T (TRIGGER): Cooperate on the first move, then keep cooperating as long as the other cooperates. If the other ever defects, defect forever.

Repeated Games

If the game is played twice, the V(alue) to a player using ALWAYS DEFECT (D) against an opponent using ALWAYS DEFECT (D) is V(D/D) = 1 + 1 = 2, and so on:

V(D/D) = 1 + 1 = 2
V(C/C) = 3 + 3 = 6
V(T/T) = 3 + 3 = 6
V(D/C) = 5 + 5 = 10
V(D/T) = 5 + 1 = 6
V(C/D) = 0 + 0 = 0
V(C/T) = 3 + 3 = 6
V(T/D) = 0 + 1 = 1
V(T/C) = 3 + 3 = 6

Repeated Games

And 3x:

V(D/D) = 1 + 1 + 1 = 3
V(C/C) = 3 + 3 + 3 = 9
V(T/T) = 3 + 3 + 3 = 9
V(D/C) = 5 + 5 + 5 = 15
V(D/T) = 5 + 1 + 1 = 7
V(C/D) = 0 + 0 + 0 = 0
V(C/T) = 3 + 3 + 3 = 9
V(T/D) = 0 + 1 + 1 = 2
V(T/C) = 3 + 3 + 3 = 9

Repeated Games

Time-average payoffs, n = 3:

V(D/D) = 3/3 = 1
V(C/C) = 9/3 = 3
V(T/T) = 9/3 = 3
V(D/C) = 15/3 = 5
V(D/T) = 7/3
V(C/D) = 0/3 = 0
V(C/T) = 9/3 = 3
V(T/D) = 2/3
V(T/C) = 9/3 = 3

Repeated Games

Time-average payoffs as n grows large:

V(D/D) = 1
V(C/C) = 3
V(T/T) = 3
V(D/C) = 5
V(D/T) = 1 + ε
V(C/D) = 0
V(C/T) = 3
V(T/D) = 1 - ε
V(T/C) = 3

where ε → 0 as n grows: the one-round gain from defecting against TRIGGER (and the one-round loss TRIGGER suffers against ALWAYS DEFECT) vanishes in the average.
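These tables are easy to check by brute-force simulation. Below is a minimal sketch (not from the slides), assuming the stage-game payoffs above (T=5, R=3, P=1, S=0); all function names are illustrative.

PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def always_defect(my_hist, opp_hist):
    return 'D'

def always_cooperate(my_hist, opp_hist):
    return 'C'

def trigger(my_hist, opp_hist):
    # Cooperate until the opponent defects once, then defect forever.
    return 'D' if 'D' in opp_hist else 'C'

def value(strat_a, strat_b, rounds):
    # Total payoff to strat_a against strat_b over `rounds` repetitions.
    hist_a, hist_b, total = [], [], 0
    for _ in range(rounds):
        a, b = strat_a(hist_a, hist_b), strat_b(hist_b, hist_a)
        total += PAYOFF[(a, b)][0]
        hist_a.append(a)
        hist_b.append(b)
    return total

strategies = {'D': always_defect, 'C': always_cooperate, 'T': trigger}
n = 1000
for name_a, sa in strategies.items():
    for name_b, sb in strategies.items():
        # n = 2 and n = 3 reproduce the tables above; large n shows the limits.
        print(f"V({name_a}/{name_b}) time average = {value(sa, sb, n) / n:.3f}")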

Repeated Games

Now draw the matrix form of this game, played 1x (in a one-shot game, TRIGGER plays like ALWAYS COOPERATE):

        C      D      T
  T    3,3    0,5    3,3
  C    3,3    0,5    3,3
  D    5,0    1,1    5,0

Repeated Games

Time-average payoffs of the repeated game:

        C           D            T
  T    3,3     1-ε, 1+ε        3,3
  C    3,3        0,5          3,3
  D    5,0        1,1       1+ε, 1-ε

If the game is repeated, ALWAYS DEFECT is no longer dominant… and TRIGGER achieves "a NE with itself."

Repeated Games

Time-average payoffs in general, with T(emptation) > R(eward) > P(unishment) > S(ucker):

        C           D            T
  T    R,R     P-ε, P+ε        R,R
  C    R,R        S,T          R,R
  D    T,S        P,P       P+ε, P-ε

Discounting

The discount parameter, δ, is the weight of the next payoff relative to the current payoff. In an indefinitely repeated game, δ can also be interpreted as the likelihood of the game continuing for another round (so that the expected number of moves per game is 1/(1-δ)). The V(alue) to someone using ALWAYS DEFECT (D) when playing against someone using TRIGGER (T) is the sum of T for the first move, δP for the second, δ²P for the third, and so on (Axelrod: 13-4):

V(D/T) = T + δP + δ²P + …

"The Shadow of the Future"

Discounting

Writing this as V(D/T) = T + δP + δ²P + …, we have the following:

V(D/D) = P + δP + δ²P + … = P/(1-δ)
V(C/C) = R + δR + δ²R + … = R/(1-δ)
V(T/T) = R + δR + δ²R + … = R/(1-δ)
V(D/C) = T + δT + δ²T + … = T/(1-δ)
V(D/T) = T + δP + δ²P + … = T + δP/(1-δ)
V(C/D) = S + δS + δ²S + … = S/(1-δ)
V(C/T) = R + δR + δ²R + … = R/(1-δ)
V(T/D) = S + δP + δ²P + … = S + δP/(1-δ)
V(T/C) = R + δR + δ²R + … = R/(1-δ)
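Each of these sums is a geometric series, which is what the closed forms on the right exploit. A minimal numerical check of two of them (the payoffs T=5, R=3, P=1, S=0 and δ = 0.9 are assumptions for illustration):

T, R, P, S = 5, 3, 1, 0
delta = 0.9

def discounted_sum(payoff_at, horizon=10_000):
    # Truncated numerical sum of per-round payoffs weighted by delta**t.
    return sum(payoff_at(t) * delta**t for t in range(horizon))

# V(D/T): Temptation once, then mutual punishment forever.
print(discounted_sum(lambda t: T if t == 0 else P),  # ≈ 14.0
      T + delta * P / (1 - delta))                   # = 14.0 exactly

# V(T/T): mutual reward forever.
print(discounted_sum(lambda t: R),                   # ≈ 30.0
      R / (1 - delta))                               # = 30.0 exactly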

Discounting

Discounted payoffs, with T > R > P > S and 0 < δ < 1:

          C                         D                            T
  T   R/(1-δ), R/(1-δ)    S+δP/(1-δ), T+δP/(1-δ)    R/(1-δ), R/(1-δ)
  C   R/(1-δ), R/(1-δ)    S/(1-δ), T/(1-δ)          R/(1-δ), R/(1-δ)
  D   T/(1-δ), S/(1-δ)    P/(1-δ), P/(1-δ)          T+δP/(1-δ), S+δP/(1-δ)

T weakly dominates C: it does strictly better against D (since P > S) and exactly as well against C and T.

Discounting

Now consider what happens to these values as δ varies (from 0 to 1). Rewriting V(D/D) = P/(1-δ) as P + δP/(1-δ) makes it directly comparable with V(T/D) = S + δP/(1-δ): since P > S, V(D/D) > V(T/D) for every δ, so D is a best response to D.

What about V(D/T) = T + δP/(1-δ) versus V(T/T) = R/(1-δ)?

Discounting

For all values of δ:

V(D/T) > V(D/D) > V(T/D)
V(T/T) > V(D/D) > V(T/D)

Is there a value of δ such that V(D/T) = V(T/T)? Call this δ*. If δ < δ*, the following ordering holds:

V(D/T) > V(T/T) > V(D/D) > V(T/D)

so D is dominant: GAME SOLVED. To find δ*, set V(D/T) = V(T/T):

T + δP/(1-δ) = R/(1-δ)
T - δT + δP = R
T - R = δ(T - P)
δ* = (T - R)/(T - P)

Discounting

δ* = (T - R)/(T - P). If δ > δ*, the following ordering holds:

V(T/T) > V(D/T) > V(D/D) > V(T/D)

D is a best response to D; T is a best response to T: multiple NE.
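A quick check of the threshold, again assuming the standard payoffs T=5, R=3, P=1, S=0 (so δ* = (5-3)/(5-1) = 0.5):

T, R, P = 5, 3, 1
delta_star = (T - R) / (T - P)   # 0.5

for delta in (0.25, 0.5, 0.75):
    v_dt = T + delta * P / (1 - delta)   # defect against TRIGGER
    v_tt = R / (1 - delta)               # cooperate with TRIGGER
    print(f"delta={delta}: V(D/T)={v_dt:.2f}, V(T/T)={v_tt:.2f}")
# delta=0.25: V(D/T)=5.33 > V(T/T)=4.00   -> defection pays
# delta=0.50: V(D/T)=6.00 = V(T/T)=6.00   -> indifferent (delta = delta*)
# delta=0.75: V(D/T)=8.00 < V(T/T)=12.00  -> cooperation pays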

Discounting

[Graph: V(D/T) = T + δP/(1-δ) and V(T/T) = R/(1-δ) plotted as functions of the discount parameter δ. At δ = 0, V(D/T) = T lies above V(T/T) = R; V(T/T) rises faster, and the curves cross at δ*, beyond which V(T/T) > V(D/T).]

The Folk Theorem

The payoff set of the repeated PD is the convex closure of the points (T,S), (R,R), (S,T), (P,P). [Diagram of this set.]

The Folk Theorem

[Diagram, continued] The shaded area is the set of payoffs that Pareto-dominate the one-shot NE (P,P).

The Folk Theorem

Theorem: Any payoff that Pareto-dominates the one-shot NE can be supported in a SPNE of the repeated game, if the discount parameter is sufficiently high.

The Folk Theorem

In other words, in the repeated game, if the future matters "enough," i.e., δ > δ*, there are zillions of equilibria!

The Folk Theorem

The theorem tells us that, in general, repeated games give rise to a very large set of Nash equilibria. In the repeated PD, these are Pareto-rankable, i.e., some are efficient and some are not. In this context, evolution can be seen as a process that selects for repeated-game strategies with efficient payoffs: "survival of the fittest."

Thinking About Evolution Fifteen months after I had begun my systematic enquiry, I happened to read for amusement ‘Malthus on Population’... It at once struck me that... favorable variations would tend to be preserved, and unfavorable ones to be destroyed. Here then I had at last got a theory by which to work. Charles Darwin

Thinking About Evolution

Biological evolution: Under the pressure of natural selection, any population (capable of reproduction and variation) will evolve so as to become better adapted to its environment, i.e., will develop in the direction of increasing "fitness."

Economic evolution: Firms that adopt efficient "routines" will survive, expand, and multiply, whereas others will be "weeded out" (Nelson and Winter, 1982).

The Evolution of Cooperation

Under what conditions will cooperation emerge in a world of egoists without central authority? Axelrod uses an experimental method, the indefinitely repeated PD tournament, to investigate a series of questions:
– Can a cooperative strategy gain a foothold in a population of rational egoists?
– Can it survive better than its uncooperative rivals?
– Can it resist invasion and eventually dominate the system?

The Indefinitely Repeated Prisoner’s Dilemma Tournament Axelrod (1980a,b, Journal of Conflict Resolution). A group of scholars were invited to design strategies to play indefinitely repeated prisoner’s dilemmas in a round robin tournament. Contestants submitted computer programs that select an action, Cooperate or Defect, in each round of the game, and each entry was matched against every other, itself, and a control, RANDOM. The Evolution of Cooperation

The Indefinitely Repeated Prisoner’s Dilemma Tournament Axelrod (1980a,b, Journal of Conflict Resolution). Contestants did not know the length of the games. (The first tournament lasted 200 rounds; the second varied probabilistically with an average of 151.) The first tournament had 14 entrants, including game theorists, mathematicians, psychologists, political scientists, and others. Results were published and new entrants solicited. The second tournament included 62 entrants... The Evolution of Cooperation

The Indefinitely Repeated Prisoner’s Dilemma Tournament TIT FOR TAT won both tournaments! TFT cooperates in the first round, and then does whatever the opponent did in the previous round. TFT “was the simplest of all submitted programs and it turned out to be the best!” (31). TFT was submitted by Anatol Rapoport to both tournaments, even after contestants could learn from the results of the first. The Evolution of Cooperation

The Indefinitely Repeated Prisoner’s Dilemma Tournament TIT FOR TAT won both tournaments! In addition, Axelrod provides a “theory of cooperation” based on his analysis of the repeated prisoner’s dilemma game. In particular, if the “shadow of the future” looms large, then players may have an incentive to cooperate. A cooperative strategy such as TFT is “collectively stable.” He also offers an evolutionary argument, i.e., TFT wins in an evolutionary competition in which payoffs play the role of reproductive rates. The Evolution of Cooperation

The Indefinitely Repeated Prisoner’s Dilemma Tournament

This result has been so influential that "some authors use TIT FOR TAT as though it were a synonym for a self-enforcing, cooperative agreement" (Binmore, 1992, p. 433). And many have taken these results to show that TFT is the "best way to play" in the IRPD.
– While TFT won these tournaments, will it win every tournament?
– Is showing that TFT is collectively stable equivalent to predicting a winner in the computer tournaments?
– Is TFT evolutionarily stable?

The Evolution of Cooperation

The Evolution of Cooperation

An Evolutionary Tournament: Imagine a population of strategies matched in pairs to play the repeated PD, where outcomes determine the number of offspring each leaves to the next generation.
– In each generation, each strategy is matched against every other, itself, and RANDOM.
– Between generations, the strategies reproduce, where the chance of successful reproduction ("fitness") is determined by the payoffs (i.e., payoffs play the role of reproductive rates).
Then strategies that do better than average will grow as a share of the population, and those that do worse than average will eventually die out: the Replicator Dynamic.

Replicator Dynamics

There is a very simple way to describe this process. Let:

x(A) = the proportion of the population using strategy A in a given generation;
V(A) = strategy A's tournament score;
V̄ = the population's average score.

Then A's population share in the next generation is:

x'(A) = x(A) · V(A) / V̄
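A minimal sketch of one replicator step; the shares and tournament scores below are made-up numbers for illustration only.

def replicator_step(shares, scores):
    # shares: {strategy: population share}; scores: {strategy: V(A)}.
    v_bar = sum(shares[s] * scores[s] for s in shares)   # population average
    return {s: shares[s] * scores[s] / v_bar for s in shares}

shares = {'TFT': 0.3, 'D': 0.3, 'C': 0.4}
scores = {'TFT': 2.7, 'D': 2.0, 'C': 2.4}    # hypothetical tournament scores
print(replicator_step(shares, scores))
# TFT scores above the average (2.37) and grows; D shrinks; C grows slightly.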

Replicator Dynamics For any finite set of strategies, the replicator dynamic will attain a fixed-point, where population shares do not change and all strategies are equally fit, i.e., V(A) = V(B), for all B. However, the dynamic described is population-specific. For instance, if the population consists entirely of naive cooperators (ALWAYS COOPERATE), then x(A) = x’(A) = 1, and the process is at a fixed-point. To be sure, the population is in equilibrium, but only in a very weak sense. For if a single D strategy were to “invade” the population, the system would be driven away from equilibrium, and C would be driven toward extinction.
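The invasion argument is easy to see numerically. The sketch below is an illustration (using the one-shot PD payoffs R=3, T=5, S=0, P=1 as per-generation scores): a population of ALWAYS COOPERATE sits at a fixed point, but once a 0.1% share of D invades, iterating the replicator step drives C out.

def c_share_next(x):
    # Replicator step for the share x of C in a C/D population.
    v_c = 3 * x + 0 * (1 - x)       # C earns R against C, S against D
    v_d = 5 * x + 1 * (1 - x)       # D earns T against C, P against D
    v_bar = x * v_c + (1 - x) * v_d
    return x * v_c / v_bar

x = 1.0
print(c_share_next(x))   # 1.0: all-C is a fixed point

x = 0.999                # a tiny D "invasion"
for _ in range(60):
    x = c_share_next(x)
print(round(x, 6))       # ≈ 0.0: cooperation is driven toward extinction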

Simulating Evolution?

[Figure: population shares over the generations for the tournament strategies; each curve is labeled by the strategy's position after the first generation, with 1 = TFT. Source: Axelrod 1984, p. 51.]

The Evolution of Cooperation

Simulating Evolution. An evolutionary model includes three components: reproduction + selection + variation.

[Diagram: a population of strategies, a selection mechanism (reproduction, competition, invasion), and a variation mechanism (mutation or learning).]

The Trouble with TIT FOR TAT

TIT FOR TAT is susceptible to two types of perturbations:
– Mutations: random Cs can invade TFT (TFT is not ESS), which in turn allows exploiters to gain a foothold.
– Noise: a "mistake" between a pair of TFTs induces CD, DC cycles (a "mirroring" or "echo" effect).
TIT FOR TAT never beats its opponent; it wins because it elicits reciprocal cooperation. It never exploits "naively" nice strategies. (See Poundstone: ; Casti )

The Trouble with TIT FOR TAT

Noise, in the form of random errors in implementing or perceiving an action, is a common problem in real-world interactions. Such misunderstandings may lead "well-intentioned" cooperators into periods of alternating or mutual defection, resulting in lower tournament scores:

TFT: C C C C D C D …
TFT: C C C D C D C …
             ↑
         "mistake"

Once one TFT defects by mistake, each player punishes the other's last move, and the pair echoes the defection back and forth. Average payoff falls from R to (T+S)/2.

The Trouble with TIT FOR TAT

Nowak and Sigmund (1993) ran an extensive series of computer-based experiments and found that the simple learning rule PAVLOV outperformed TIT FOR TAT in the presence of noise.

PAVLOV (win-stay, lose-switch): Cooperate after both cooperated or both defected; otherwise defect.

The Trouble with TIT FOR TAT

PAVLOV cannot be invaded by random C; PAVLOV is an exploiter (it will "fleece a sucker" once it discovers there is no need to fear retaliation). A mistake between a pair of PAVLOVs causes only a single round of mutual defection, followed by a return to mutual cooperation:

PAV: C C C C D C C …
PAV: C C C D D C C …
             ↑
         "mistake"
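The contrast is easy to reproduce. A minimal sketch (illustrative, not the tournament code) that plays two copies of a strategy against each other and flips one move to simulate a mistake:

def tft(my_hist, opp_hist):
    return opp_hist[-1] if opp_hist else 'C'

def pavlov(my_hist, opp_hist):
    # Win-stay, lose-switch: cooperate iff both moves matched last round.
    if not my_hist:
        return 'C'
    return 'C' if my_hist[-1] == opp_hist[-1] else 'D'

def self_play_with_error(strategy, rounds=10, error_round=3):
    h1, h2 = [], []
    for t in range(rounds):
        a, b = strategy(h1, h2), strategy(h2, h1)
        if t == error_round:                 # player 1 "trembles" once
            a = 'D' if a == 'C' else 'C'
        h1.append(a)
        h2.append(b)
    return ''.join(h1), ''.join(h2)

print(self_play_with_error(tft))
# ('CCCDCDCDCD', 'CCCCDCDCDC')  -- the echo: defections alternate forever
print(self_play_with_error(pavlov))
# ('CCCDDCCCCC', 'CCCCDCCCCC')  -- one round of mutual D, then recovery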

Designing Repeated Game Strategies Imagine a very simple decision making machine playing a repeated game. The machine has very little information at the start of the game: no knowledge of the payoffs or “priors” over the opponent’s behavior. It merely makes a choice, receives a payoff, then adapts its behavior, and so on. The machine, though very simple, is able to implement a strategy against any possible opponent, i.e., it “knows what to do” in any possible situation of the game.

Designing Repeated Game Strategies

A repeated game strategy is a map from a history to an action. A history is all the actions in the game thus far. For example, at time T0, with earlier rounds labeled … T-3, T-2, T-1:

Player 1: … C C C C D C C
Player 2: … C C C D D C D

What should be played at T0?

Designing Repeated Game Strategies

A repeated game strategy is a map from a history to an action, subject to the constraint of a finite memory. A memory-4 strategy, for example, conditions only on the last four rounds:

Player 1: … C C C [C D C C]
Player 2: … C C C [D D C C]

What to play at T0, given this memory-4 history?

Designing Repeated Game Strategies

TIT FOR TAT is a remarkably simple repeated game strategy. It merely requires recall of what happened in the last round (memory-1):

Player 1: … C C C C D D [C]
Player 2: … C C C D D C [D]

TFT's next move depends only on the opponent's move in the last round: here, D.
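In code, a memory-1 strategy is just a lookup table from the last joint move to an action, plus an opening move. A sketch of TIT FOR TAT in that form (the names are illustrative):

TFT_MAP = {        # (my last move, opponent's last move) -> my next move
    ('C', 'C'): 'C',
    ('C', 'D'): 'D',
    ('D', 'C'): 'C',
    ('D', 'D'): 'D',
}

def next_move(strategy_map, my_last, opp_last, opening='C'):
    if my_last is None:        # no history yet: play the opening move
        return opening
    return strategy_map[(my_last, opp_last)]

print(next_move(TFT_MAP, None, None))   # 'C' (opening move)
print(next_move(TFT_MAP, 'C', 'D'))     # 'D' (retaliate)

Note that TFT's table ignores its own last move; keeping both keys lets the same format express PAVLOV, which conditions on both.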

Finite Automata

A FINITE AUTOMATON (FA) is a mathematical representation of a simple decision-making process. An FA is completely described by:
– a finite set of internal states
– an initial state
– an output function
– a transition function
The output function determines an action, C or D, in each state. The transition function determines how the FA changes states in response to the inputs it receives (e.g., the actions of other FA). (Rubinstein, "Finite Automata Play the Repeated PD," JET, 1986)

FA will implement a strategy against any possible opponent, i.e., they “know what to do” in any possible situation of the game. FA meet in 2-player repeated games and make a move in each round (either C or D). Depending upon the outcome of that round, they “decide” what to play on the next round, and so on. FA are very simple, have no knowledge of the payoffs or priors over the opponent’s behavior, and no deductive ability. They simply read and react to what happens. Nonetheless, they are capable of a crude form of “learning” — they receive payoffs that reinforce certain behaviors and “punish” others. Finite Automata

Finite Automata

[State diagram: TIT FOR TAT as a two-state machine. One state outputs C, the other outputs D; the opponent's C sends the machine to the C-state and the opponent's D to the D-state. Play starts in the C-state.]

Finite Automata

[State diagram: TIT FOR TWO TATS, a three-state machine that defects only after two consecutive opponent defections and returns to cooperation as soon as the opponent cooperates.]

Finite Automata

Some examples:

[State diagrams: ALWAYS DEFECT (a single D-state), TIT FOR TAT (two states), GRIM (TRIGGER) (two states, with the D-state absorbing), PAVLOV (two states), and a five-state machine, "M5".]
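The four-part description above translates directly into code. A minimal sketch (an illustration, not the course's fa tool) encoding TIT FOR TAT and GRIM (TRIGGER) as two-state machines:

from dataclasses import dataclass

@dataclass
class Automaton:
    initial: int       # the initial state
    output: dict       # output function: state -> 'C' or 'D'
    transition: dict   # transition function: (state, opponent's action) -> state

TIT_FOR_TAT = Automaton(
    initial=0,
    output={0: 'C', 1: 'D'},
    transition={(0, 'C'): 0, (0, 'D'): 1, (1, 'C'): 0, (1, 'D'): 1},
)

GRIM = Automaton(
    initial=0,
    output={0: 'C', 1: 'D'},
    # One opposing defection moves GRIM to state 1, which it never leaves.
    transition={(0, 'C'): 0, (0, 'D'): 1, (1, 'C'): 1, (1, 'D'): 1},
)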

Calculating Automata Payoffs

Time-average payoffs can be calculated because any pair of FA will settle into a cycle, since each FA takes as input only the actions in the previous period (i.e., play is "Markovian"). For example, consider the pair PAVLOV and M5. [State diagrams of PAVLOV and M5.]

Calculating Automata Payoffs

PAVLOV: C D D C D D C D …    payoffs 0, 5, 1, 0, 5, 1, …
M5:     D C D D C D D C …    payoffs 5, 0, 1, 5, 0, 1, …

Play settles into a three-round cycle (CD, DC, DD), so each machine's time-average payoff is (0 + 5 + 1)/3 = 2.
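Because the joint state space is finite, play must eventually cycle, and the time average is just the average over one cycle. M5's transitions are garbled in this transcript, so the sketch below pairs PAVLOV with ALWAYS DEFECT instead (an assumed stand-in), which yields a two-round cycle with PAVLOV averaging (0 + 1)/2 = 0.5:

from dataclasses import dataclass

@dataclass
class Automaton:           # as in the earlier sketch
    initial: int
    output: dict
    transition: dict

PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

PAVLOV = Automaton(
    initial=0,
    output={0: 'C', 1: 'D'},
    # Win-stay, lose-switch over (own state, opponent's last action).
    transition={(0, 'C'): 0, (0, 'D'): 1, (1, 'C'): 1, (1, 'D'): 0},
)

ALWAYS_D = Automaton(initial=0, output={0: 'D'},
                     transition={(0, 'C'): 0, (0, 'D'): 0})

def play(fa1, fa2, rounds=12):
    s1, s2, moves = fa1.initial, fa2.initial, []
    for _ in range(rounds):
        a1, a2 = fa1.output[s1], fa2.output[s2]
        moves.append((a1, a2))
        s1, s2 = fa1.transition[(s1, a2)], fa2.transition[(s2, a1)]
    return moves

moves = play(PAVLOV, ALWAYS_D)
print(''.join(a for a, _ in moves))                    # CDCDCDCDCDCD
print(sum(PAYOFF[m][0] for m in moves) / len(moves))   # 0.5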

Tournament Assignment

To design your strategy, access the programs through your fas Unix account. The Finite Automaton Creation Tool (fa) will prompt you to create a finite automaton to implement your strategy: select the number of internal states, designate the initial state, and define the output and transition functions, which together determine how an automaton "behaves." The program also allows you to specify probabilistic output and transition functions. Simple probabilistic strategies such as GENEROUS TIT FOR TAT have been shown to perform particularly well in noisy environments, because they avoid the costly sequences of alternating defections that undermine sustained cooperation.

Tournament Assignment

Creating your automaton. The program prompts the user to:
– specify the number of states in the automaton, with an upper limit of 50;
– for each state, "choose an action (cooperate or defect)" and "in response to cooperate (defect), transition to what state?";
– finally, specify the initial state.
The program also allows the user to specify probabilistic outputs and transitions.

Design a strategy to play an Evolutionary Prisoner’s Dilemma Tournament. Entries will meet in a round robin tournament, with 1% noise (i.e., for each intended choice there is a 1% chance that the opposite choice will be implemented). Games will last at least 1000 repetitions (each generation), and after each generation, population shares will be adjusted according to the replicator dynamic, so that strategies that do better than average will grow as a share of the population whereas others will be driven to extinction. The winner or winners will be those strategies that survive after at least 10,000 generations.
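A minimal end-to-end sketch of the tournament mechanics just described: round-robin matches with 1% noise, 1000 rounds per pairing, and replicator updating between generations. The three entrant strategies and all names here are illustrative stand-ins, not actual entries:

import random

PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tft(me, opp):      return opp[-1] if opp else 'C'
def all_d(me, opp):    return 'D'
def rand_cd(me, opp):  return random.choice('CD')

def noisy(move, eps=0.01):
    # With probability eps, the opposite of the intended move is implemented.
    return move if random.random() > eps else ('D' if move == 'C' else 'C')

def avg_score(s1, s2, rounds=1000):
    h1, h2, total = [], [], 0
    for _ in range(rounds):
        a, b = noisy(s1(h1, h2)), noisy(s2(h2, h1))
        total += PAYOFF[(a, b)][0]
        h1.append(a); h2.append(b)
    return total / rounds

strategies = {'TFT': tft, 'ALL-D': all_d, 'RANDOM': rand_cd}
shares = {name: 1 / len(strategies) for name in strategies}

for generation in range(10):                  # 10 generations for illustration
    scores = {n1: sum(shares[n2] * avg_score(strategies[n1], strategies[n2])
                      for n2 in strategies)   # share-weighted round robin
              for n1 in strategies}
    v_bar = sum(shares[n] * scores[n] for n in strategies)
    shares = {n: shares[n] * scores[n] / v_bar for n in strategies}
    print(generation, {n: round(shares[n], 3) for n in shares})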

Simulating Evolution

[Figure: population shares over the generations for six repeated-PD strategies (PAVLOV, TFT, GRIM (TRIGGER), D, R(ANDOM), C), with noise at the 0.01 level; annotation: "GTFT?"]

Preliminary Tournament Results

[Charts: average scores (×10) by strategy at successive checkpoints: after 5000 generations (as of 4/25/02); after 5000 generations (10pm 4/27/02); and after further generations (7am 4/28/02), (4/28/05), and (8/09/05).]