Strategic Thinking
Lecture 7: Repeated Strategic Situations
Suggested reading: Dixit and Skeath, ch. 9
University of East Anglia, School of Economics
Dr. Anders Poulsen
6 March 2011
Repeated Games

Many real-life strategic situations are not one-shot games: they are longer, repeated games. In these situations, what a player does early on can affect what others choose to do later on. Repeated games can therefore give rise to important behaviour that is not captured when we restrict our attention to static, one-shot games.

Our question is: under what conditions can cooperation emerge as an equilibrium of a repeated game?
The One-Shot Prisoner’s Dilemma

Two players, each with two strategies. Each cell lists Player 1's payoff, then Player 2's:

                        Player 2
                        Cooperate    Defect
Player 1   Cooperate    150, 150     50, 200
           Defect       200, 50      75, 75

The strategy profile (Defect, Defect) is the only Nash equilibrium.
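The claim that (Defect, Defect) is the only Nash equilibrium can be checked by brute force. The sketch below is illustrative only: the function and variable names are assumptions, and the payoffs are copied from the slide's table.

```python
from itertools import product

# Payoffs from the slide: (Player 1 action, Player 2 action) -> (Player 1, Player 2)
ACTIONS = ("Cooperate", "Defect")
PAYOFFS = {
    ("Cooperate", "Cooperate"): (150, 150),
    ("Cooperate", "Defect"): (50, 200),
    ("Defect", "Cooperate"): (200, 50),
    ("Defect", "Defect"): (75, 75),
}

def pure_nash_equilibria(payoffs, actions):
    """Return every pure-strategy profile where neither player gains by deviating alone."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        # Player 1 deviates along the rows, holding Player 2's action fixed.
        p1_ok = all(payoffs[(d, a2)][0] <= u1 for d in actions)
        # Player 2 deviates along the columns, holding Player 1's action fixed.
        p2_ok = all(payoffs[(a1, d)][1] <= u2 for d in actions)
        if p1_ok and p2_ok:
            equilibria.append((a1, a2))
    return equilibria

print(pure_nash_equilibria(PAYOFFS, ACTIONS))
```

Only the profile in which both players defect survives the check, because Defect is a dominant strategy for each player.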
A Finitely Repeated Prisoner’s Dilemma

Let’s consider a subgame perfect equilibrium of a prisoner’s dilemma game that lasts for, say, 10 periods. To find it, we use backwards induction, so we start with the last period.

The last period of a finitely repeated prisoner’s dilemma is exactly the same as playing the game only once, as both players know that the game will not continue further. Therefore, it is optimal for each player to choose the dominant strategy, Defect, so (Defect, Defect) is played in the last period.
Given that players will play (Defect, Defect) for sure in the last period, there is no benefit to playing Cooperate in period 9. Why? Players will be ‘punished’ in period 10 whether they cooperate or not. As a result, they will play (Defect, Defect) in period 9 as well.

The same argument continues for period 8: players will be punished whether they cooperate or not, so it is optimal to play (Defect, Defect). And so on…

Therefore the unique subgame perfect equilibrium is for players to choose (Defect, Defect) in all ten periods.

As long as there is a known, finite end to the game, repetition does not change the equilibrium outcome of the game. This is true for all finitely repeated games whose one-shot game has a unique Nash equilibrium.
Does This Theory Work in Reality?

In a one-shot game about 40% of people try to cooperate (Tversky, 2004).
In repeated games, the unique subgame perfect equilibrium of a prisoner’s dilemma game (always Defect) is not usually observed in all periods (Andreoni and Miller, 1993).
There is incomplete information about players’ types: your opponent may be altruistic or ‘crazy’ (Kreps et al., 1982).
People may be, or pretend to be, altruistic, or behave reciprocally.
People need to learn the subgame perfect equilibrium (Selten and Stoecker, 1986).
The Infinitely Repeated Prisoner’s Dilemma

Let us return to the same game as before:

                        Player 2
                        Cooperate    Defect
Player 1   Cooperate    150, 150     50, 200
           Defect       200, 50      75, 75

Consider the following strategy: start out playing C; then, in any period, play C if the opponent has played C in all previous periods; otherwise play D in this and all future periods. This is called the Grim trigger strategy. Denote it Grim.
Discounting

We assume players discount payoffs received in the future using a constant discount factor, δ, where 0 < δ < 1. For example, if δ = 0.8, then a player values £1 received one period in the future as being equivalent to £0.80 in the present period (that is, δ × £1 = 0.8 × £1).

Why discount the future? Most people are inherently impatient. Also, players can invest present-period payoffs to receive interest. Let r be the interest rate. Then, if £0.80 is invested for one period, it is worth 0.8 × (1 + r) in the next period. Setting this equal to £1 gives 0.8 × (1 + r) = 1, so δ = 1/(1 + r) and r = 0.25.
Discounting in an Infinitely Repeated Game

Suppose a player expects to receive a payoff of X today and in every future period. The present value of each period's payoff is:

Today (period 0): $X$
Period 1: $\delta X$
Period 2: $\delta\delta X = \delta^2 X$
Period 3: $\delta\delta\delta X = \delta^3 X$
…
Period n: $\delta\delta\dots\delta X = \delta^n X$, and so on.

The expected present discounted value of this stream of payoffs is:
$E(\pi) = X + X\delta + X\delta^2 + X\delta^3 + \dots + X\delta^n + \dots$
Fact: $E(\pi) = X/(1-\delta)$, as long as $0 < \delta < 1$. Proof: see notes.
Another fact: $X(\delta + \delta^2 + \dots) = \delta X/(1-\delta)$.
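Both facts can be checked numerically by truncating the infinite sum at a large number of periods. This is a minimal sketch rather than a proof; the values X = 150 and δ = 0.8 and the function name are illustrative assumptions.

```python
def discounted_sum(x, delta, periods):
    """Present value of receiving x in periods 0, 1, ..., periods - 1."""
    return sum(x * delta**t for t in range(periods))

X, DELTA = 150, 0.8  # illustrative values, not taken from the slide

# Closed forms: X received from today onwards, and X received from period 1 onwards.
value_from_today = X / (1 - DELTA)
value_from_tomorrow = DELTA * X / (1 - DELTA)

# A long finite horizon approximates the infinite sum because DELTA**t shrinks to zero.
approx_from_today = discounted_sum(X, DELTA, 500)
approx_from_tomorrow = approx_from_today - X  # drop the period-0 payoff

print(approx_from_today, value_from_today)
print(approx_from_tomorrow, value_from_tomorrow)
```

The two closed forms differ by exactly X, the undiscounted payoff received today.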
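Payoff from (Grim, Grim): $E(\pi^c) = 150/(1-\delta)$
Payoff from best deviation (defect today for 200, then receive 75 in every later period): $E(\pi^d) = 200 + 75\delta/(1-\delta)$

                        Player 2
                        Cooperate    Defect
Player 1   Cooperate    150, 150     50, 200
           Defect       200, 50      75, 75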
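How Patient Must Players Be to Sustain Cooperation?

The strategy profile (Grim, Grim) is a subgame perfect equilibrium if $E(\pi^c) \ge E(\pi^d)$, or
$150/(1-\delta) \ge 200 + 75\delta/(1-\delta)$
$150 \ge 200(1-\delta) + 75\delta$
$125\delta \ge 50$
Solve to get: $\delta \ge 50/125 = 0.4$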
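The threshold δ ≥ 0.4 can be illustrated by playing the strategies against each other and discounting the resulting payoffs. This is a sketch under stated assumptions: the infinite game is approximated by 500 periods, the best deviation against Grim is taken to be defecting in every period, and all function names are hypothetical.

```python
# Payoffs from the slide's table: (Player 1 action, Player 2 action) -> (Player 1, Player 2)
PAYOFFS = {
    ("C", "C"): (150, 150),
    ("C", "D"): (50, 200),
    ("D", "C"): (200, 50),
    ("D", "D"): (75, 75),
}

def grim(opponent_history):
    """Cooperate until the opponent has defected at least once, then defect forever."""
    return "D" if "D" in opponent_history else "C"

def always_defect(opponent_history):
    """Assumed best deviation against Grim: defect in every period."""
    return "D"

def discounted_payoff(strategy_1, strategy_2, delta, periods=500):
    """Player 1's discounted payoff over a long finite horizon (approximates the infinite sum)."""
    history_1, history_2, total = [], [], 0.0
    for t in range(periods):
        action_1 = strategy_1(history_2)
        action_2 = strategy_2(history_1)
        total += delta**t * PAYOFFS[(action_1, action_2)][0]
        history_1.append(action_1)
        history_2.append(action_2)
    return total

for delta in (0.3, 0.4, 0.5):
    cooperate = discounted_payoff(grim, grim, delta)
    deviate = discounted_payoff(always_defect, grim, delta)
    print(f"delta={delta}: cooperate={cooperate:.2f}, deviate={deviate:.2f}")
```

Below the threshold the deviation pays more, at δ = 0.4 the two payoffs coincide, and above it cooperation pays more.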
A General Version of the Infinitely Repeated Prisoner’s Dilemma

                        Player 2
                        Cooperate    Defect
Player 1   Cooperate    C, C         A, B
           Defect       B, A         D, D

The game is a Prisoner’s Dilemma if B > C > D > A.
Ask: when is (Grim, Grim) a subgame perfect Nash equilibrium?
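Computing the Critical Value of the Discount Factor that Sustains Cooperation as an Equilibrium

The strategy profile (Grim, Grim) is subgame perfect if $E(\pi^c) \ge E(\pi^d)$, where
$E(\pi^c) = \frac{1}{1-\delta} C$
$E(\pi^d) = B + \frac{\delta}{1-\delta} D$

Solve:
$\frac{1}{1-\delta} C \ge B + \frac{\delta}{1-\delta} D$
$C \ge (1-\delta)B + \delta D$
$C \ge B - \delta B + \delta D$
$(B-D)\delta \ge B - C$, so $\delta \ge (B-C)/(B-D)$

Notice that the critical value $(B-C)/(B-D)$ is greater than 0 because $B > C > D$. Also, it is less than 1 because $C > D$.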
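The critical value δ ≥ (B − C)/(B − D) is easy to wrap in a small helper. The sketch below uses a hypothetical function name and applies it to the base game from earlier slides (B = 200, C = 150, D = 75).

```python
def critical_discount_factor(B, C, D):
    """Smallest discount factor at which (Grim, Grim) sustains cooperation.

    B: payoff from defecting against a cooperator
    C: payoff from mutual cooperation
    D: payoff from mutual defection
    """
    if not B > C > D:
        raise ValueError("a Prisoner's Dilemma requires B > C > D")
    return (B - C) / (B - D)

# Base game from the lecture: B = 200, C = 150, D = 75.
base_threshold = critical_discount_factor(200, 150, 75)
print(base_threshold)
```

Because B > C > D, the numerator is positive and smaller than the denominator, so the result always lies strictly between 0 and 1.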
The Tit-For-Tat Strategy

Other punishment strategies also sustain cooperation. Consider the Tit-For-Tat (TFT) strategy:
(1) Cooperate in the first period.
(2) Then, in any subsequent period, do what the other person did in the previous period.

Example (Player 2 plays TFT):
Period:    1       2       3       4       5       6
Play:    (C,C)   (D,C)   (D,D)   (D,D)   (C,D)   (C,C)   …
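The example sequence on this slide can be reproduced by letting Player 2 follow TFT while Player 1 plays a fixed list of actions. This is a minimal sketch; the function names and the choice of Player 1's action list are assumptions made to match the example.

```python
def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous action."""
    return "C" if not opponent_history else opponent_history[-1]

def play_against_tft(player_1_actions):
    """Return the sequence of (Player 1, Player 2) actions when Player 2 plays TFT."""
    history_1, outcomes = [], []
    for action_1 in player_1_actions:
        action_2 = tit_for_tat(history_1)  # TFT only sees Player 1's past actions
        outcomes.append((action_1, action_2))
        history_1.append(action_1)
    return outcomes

# Player 1 cooperates, defects three times, then returns to cooperation.
example = play_against_tft(["C", "D", "D", "D", "C", "C"])
print(example)
```

Player 2's action always lags Player 1's by one period, which is why Player 1 is still punished in the first period after returning to cooperation.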
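A Tit-for-Tat Strategy Can Also Sustain Cooperation

Consider again our base game:

                        Player 2
                        Cooperate    Defect
Player 1   Cooperate    150, 150     50, 200
           Defect       200, 50      75, 75

Suppose Player 2 plays TFT, and consider the strategy where Player 1 defects in period one and then cooperates forever. Player 1 earns 200 in the first period, 50 in the second (when Player 2 retaliates), and 150 in every period thereafter:
$E(\pi^d) = 200 + 50\delta + 150\delta^2/(1-\delta)$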
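$E(\pi^d) = 200 + 50\delta + 150\delta^2/(1-\delta)$
$E(\pi^c) = \frac{1}{1-\delta}150 = 150 + 150\delta + 150\delta^2/(1-\delta)$

The strategy profile (TFT, TFT) is a Nash equilibrium if $E(\pi^c) \ge E(\pi^d)$:
$150 + 150\delta + 150\delta^2/(1-\delta) \ge 200 + 50\delta + 150\delta^2/(1-\delta)$
$150 + 150\delta \ge 200 + 50\delta$
$100\delta \ge 50$, so $\delta \ge 0.5$

Recall $\delta \ge 0.4$ for the Grim trigger strategy. This suggests that players have to be more patient to sustain cooperation if their punishment strategies are weaker.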
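The TFT threshold can be checked numerically from the two payoff streams. The sketch assumes, following the base game's payoff table, that a one-period deviation against TFT yields 200 today, 50 in the next period, and 150 thereafter; the function names are hypothetical.

```python
def tft_cooperation_value(delta):
    """Cooperate forever against TFT: 150 in every period."""
    return 150 / (1 - delta)

def tft_one_period_deviation_value(delta):
    """Defect once against TFT, then cooperate forever.

    200 today, 50 next period (TFT retaliates), 150 in every later period.
    """
    return 200 + 50 * delta + 150 * delta**2 / (1 - delta)

for delta in (0.45, 0.5, 0.6):
    gain = tft_cooperation_value(delta) - tft_one_period_deviation_value(delta)
    print(f"delta={delta}: cooperation minus deviation = {gain:.2f}")
```

The difference simplifies to 100δ − 50, so it changes sign exactly at δ = 0.5.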
Games With an Unknown Last Period

Many interactions are unlikely to be infinite: they all have to end sometime! Suppose instead that in each period there is a probability θ that the game will continue for at least another period, and a probability 1 - θ that the game will end. Assume players use a Grim trigger strategy and set (for simplicity) δ = 1.

                        Player 2
                        Cooperate    Defect
Player 1   Cooperate    4, 4         2, 6
           Defect       6, 2         3, 3
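An Unknown End is Similar to Discounting

$E(\pi^c) = 4 + 4(\theta + \theta^2 + \theta^3 + \dots + \theta^n + \dots) = 4 + 4\theta/(1-\theta)$
$E(\pi^d) = 6 + 3(\theta + \theta^2 + \theta^3 + \dots + \theta^n + \dots) = 6 + 3\theta/(1-\theta)$

Solve:
$4/(1-\theta) \ge 6 + 3\theta/(1-\theta)$
$4 \ge 6(1-\theta) + 3\theta$
$4 \ge 6 - 6\theta + 3\theta$
$3\theta \ge 2$
$\theta \ge 2/3 \approx 0.67$

Cooperation is sustainable if the probability of playing another game is at least 2/3.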
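The continuation-probability threshold can be verified with exact fractions. This is a small sketch with hypothetical function names; the payoffs 4, 6 and 3 come from the game on the previous slide, with δ = 1 as assumed there.

```python
from fractions import Fraction

def expected_cooperation_payoff(theta):
    """Receive 4 today and in every period the game continues."""
    return 4 / (1 - theta)

def expected_deviation_payoff(theta):
    """Receive 6 today, then the punishment payoff 3 in every later period."""
    return 6 + 3 * theta / (1 - theta)

for theta in (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)):
    coop = expected_cooperation_payoff(theta)
    dev = expected_deviation_payoff(theta)
    print(f"theta={theta}: cooperate={coop}, deviate={dev}")
```

At θ = 2/3 the two payoffs are equal, which marks the boundary of the region where cooperation can be sustained.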
When to Model a Strategic Situation as a Finitely or an Infinitely Repeated Game?

Even if a game is finitely repeated, it may be better to model it as an infinitely repeated game. This is the case when decision makers perceive the game as being one that will never end. See the discussion in Osborne and Rubinstein, A Course in Game Theory (MIT Press, 1994).
Application: Repeated Duopoly

Assume there are two firms that can set high and low prices:

                     Firm 2
                     high          low
Firm 1     high      500, 500      100, 1000
           low       1000, 100     250, 250
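As an illustration, the critical value δ ≥ (B − C)/(B − D) derived earlier for the Grim trigger strategy can be applied to this pricing game, reading B = 1000, C = 500 and D = 250 off the table. The slide itself does not report this number; the sketch below is a worked example under that reading.

```python
from fractions import Fraction

# Payoffs read from the duopoly table (the mapping to B, C, D is an assumption
# based on the general Prisoner's Dilemma labelling used in the lecture).
B = 1000  # price low while the rival prices high
C = 500   # both price high (collusion)
D = 250   # both price low (punishment)

duopoly_threshold = Fraction(B - C, B - D)
print(duopoly_threshold)
```

Under these payoffs the firms would need δ of at least 2/3 to sustain high prices with Grim trigger strategies.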
The Bertrand Paradox

We assume that the firms sell homogeneous products with (i) no fixed costs and (ii) marginal cost = c.
Firm i’s profit function: $E(\pi_i) = [p_i(p_j) - c]\,q_i$
Will firms set $p_i > c$? At any $p_i > c$, a rival firm has an incentive to set $p_i - \varepsilon$ to attract the whole market.
Subgame perfect equilibrium is to set: $p_i^n = c$
As a result, firms receive: $E(\pi_i^n) = [c - c]\,q_i^n = 0$
Breaking the Bertrand Paradox

For policymakers the Bertrand Paradox represents the best outcome for society:
(i) products are sold at cost
(ii) consumers receive all the benefits from competition

Given that firms cannot receive positive profit in equilibrium, they will try to break the Bertrand Paradox. This can be done in a number of ways. We are interested in whether firms can collude to set higher prices. This will only happen if they can threaten a credible punishment if a rival defects from cooperation…
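The Infinitely Repeated Game

Consider whether the homogeneous duopoly can sustain the monopoly price in an infinitely repeated game…

In each period, if collusion is sustained, each firm receives $\pi_i^m/2$, so
$E(\pi^c) = \frac{1}{1-\delta}\left[\pi_i^m/2\right]$

Assume firms follow a Grim trigger strategy (revert to $p_i^n$ forever). A deviating firm undercuts slightly and captures the whole market today, so $\pi_i^d = \pi_i^m$, and then earns $\pi_i^n = 0$ in every later period:
$E(\pi^d) = \pi_i^d + \frac{\delta}{1-\delta}\pi_i^n = \pi_i^m$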
The firms can sustain the monopoly price if $E(\pi^c) \ge E(\pi^d)$:
$\frac{1}{1-\delta}\left[\pi_i^m/2\right] \ge \pi_i^m$
$\pi_i^m/2 \ge \pi_i^m(1-\delta)$
$\pi_i^m/2 \ge \pi_i^m - \delta\pi_i^m$
$\delta\pi_i^m \ge \pi_i^m - \pi_i^m/2$
$\delta\pi_i^m \ge \pi_i^m/2$
$\delta \ge 1/2 = 0.5$
How can the theory of repeated games help policymakers?

When intervening in a market, policymakers try to make markets work well (better) for consumers. When firms collude they charge higher prices than they would otherwise.
(1) Tacit collusion versus explicit collusion
(2) Mergers: may lead to a market structure that facilitates collusion
(3) Interventions: can affect firms’ abilities to sustain collusion
Collusion is sustainable if:
Short-term benefit from deviation is outweighed by long-term punishment.

A market has the potential to be collusive if there is:
(i) frequently repeated interaction
(ii) the ability to monitor each other’s strategies
(iii) few firms
(iv) symmetry in firms’ market share
(v) high entry barriers
The Number of Firms in the Market is Important

Assume there are n > 1 firms in a market that share the market evenly. In each period, if collusion is sustained, each firm receives $\pi_i^m/n$:
$E(\pi^c) = \pi_i^m/n + \frac{\delta}{1-\delta}\left[\pi_i^m/n\right] = \frac{1}{1-\delta}\left[\pi_i^m/n\right]$

Assume firms follow a Grim trigger strategy (revert to $p_i^n$ forever):
$E(\pi^d) = \pi_i^d + \frac{\delta}{1-\delta}\pi_i^n = \pi_i^m$
The firms can sustain the monopoly price if $E(\pi^c) \ge E(\pi^d)$:
$\frac{1}{1-\delta}\left[\pi_i^m/n\right] \ge \pi_i^m$
$\pi_i^m/n \ge \pi_i^m(1-\delta)$
$\pi_i^m/n \ge \pi_i^m - \delta\pi_i^m$
$\delta\pi_i^m \ge \pi_i^m - \pi_i^m/n$
$\delta\pi_i^m \ge \pi_i^m(n-1)/n$
$\delta \ge 1 - 1/n$

As the number of firms goes up, firms must be more patient to sustain collusion. As a result, collusion becomes harder to sustain.
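To see how quickly the requirement δ ≥ 1 − 1/n tightens, the threshold can be tabulated for a few market sizes. This is a minimal sketch; the function name and the chosen values of n are illustrative assumptions.

```python
def critical_discount_factor(n_firms):
    """Minimum discount factor for n symmetric firms to sustain the monopoly price."""
    if n_firms < 2:
        raise ValueError("the derivation assumes n > 1 firms")
    return 1 - 1 / n_firms

thresholds = {n: critical_discount_factor(n) for n in (2, 3, 5, 10)}
for n, threshold in thresholds.items():
    print(f"n={n}: delta must be at least {threshold:.3f}")
```

With two firms the threshold is 0.5, matching the duopoly result, and it approaches 1 as the number of firms grows.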
Asymmetries in firms’ market shares are important

Let us return to the homogeneous duopoly case, but assume:
(i) firm 1 has a market share of s > 0.5
(ii) firm 2 has a market share of 1 - s < 0.5

If collusion is sustained, firms receive:
$E(\pi_1^c) = s\pi^m + \frac{\delta}{1-\delta}\left[s\pi^m\right] = \frac{1}{1-\delta}\left[s\pi^m\right]$
$E(\pi_2^c) = \frac{1}{1-\delta}\left[(1-s)\pi^m\right]$

Assume firms follow a Grim trigger strategy (revert to $p_i^n$ forever):
$E(\pi^d) = \pi_i^d + \frac{\delta}{1-\delta}\pi_i^n = \pi^m$
Two constraints must be satisfied…

Firm 1:
$\frac{1}{1-\delta}\left[s\pi^m\right] \ge \pi^m$
$s\pi^m \ge \pi^m(1-\delta)$
$s \ge 1 - \delta$
$\delta \ge 1 - s$

Firm 2:
$\frac{1}{1-\delta}\left[(1-s)\pi^m\right] \ge \pi^m$
$(1-s)\pi^m \ge \pi^m(1-\delta)$
$1 - s \ge 1 - \delta$
$\delta \ge s$

Firm 2 has to be more patient than firm 1 to sustain collusion.
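The two constraints can be evaluated together for a given market share. In this sketch the function name and the example share s = 0.6 are assumptions; the binding constraint is the larger of the two thresholds, which belongs to the smaller firm.

```python
def collusion_thresholds(s):
    """Return (firm 1 threshold, firm 2 threshold, binding threshold) for share s > 0.5."""
    if not 0.5 < s < 1:
        raise ValueError("firm 1's market share must satisfy 0.5 < s < 1")
    firm_1 = 1 - s   # the larger firm needs delta >= 1 - s
    firm_2 = s       # the smaller firm needs delta >= s
    return firm_1, firm_2, max(firm_1, firm_2)

firm_1, firm_2, binding = collusion_thresholds(0.6)
print(f"firm 1: {firm_1:.2f}, firm 2: {firm_2:.2f}, binding: {binding:.2f}")
```

Since s > 0.5, the binding threshold exceeds the symmetric duopoly value of 0.5, so asymmetry makes collusion harder to sustain.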
Two Types of Government Interventions:
(i) increase consumer activity within a market, and
(ii) increase firms’ ability to monitor each other’s strategies

i) Increased Consumer Activity

Collusion is sustainable if the short-term benefit from deviation is outweighed by the long-term punishment.

The effect of an increase in consumer activity is ambiguous:
(i) increased short-term incentive to deviate
(ii) harsher punishment

Kühn (2001) and OECD (2001) suggest that the former effect dominates the latter. There is no empirical evidence for this conjecture…
Increased Consumer Activity Can Go Either Way

                     Firm 2
                     high      low
Firm 1     high      C, C      A, B
           low       B, A      D, D

Recall B > C > D > A. Increased consumer activity:
(i) increases B, which increases the short-term benefit
(ii) decreases D, which harshens the long-term punishment

Collusion is sustainable if
$\frac{\delta}{1-\delta}[C - D] \ge B - C$
where the left-hand side is the long-term punishment and the right-hand side is the short-term benefit.
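The ambiguity can be made concrete by rearranging the condition to δ ≥ (B − C)/(B − D) and changing B and D together. All payoff numbers below are hypothetical and chosen only to show that the threshold can move in either direction.

```python
from fractions import Fraction

def critical_discount_factor(B, C, D):
    """Threshold implied by delta/(1-delta) * (C - D) >= B - C."""
    return Fraction(B - C, B - D)

# Hypothetical baseline payoffs.
baseline = critical_discount_factor(B=1000, C=500, D=250)

# Scenario 1: B rises a lot and D falls a little, so the deviation effect dominates.
harder = critical_discount_factor(B=1200, C=500, D=200)

# Scenario 2: B rises a little and D falls a lot, so the punishment effect dominates.
easier = critical_discount_factor(B=1050, C=500, D=50)

print(baseline, harder, easier)
```

In the first scenario the threshold rises above the baseline, making collusion harder to sustain; in the second it falls, making collusion easier.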
ii) Increased Monitoring Ability

Collusion is sustainable if the short-term benefit from deviation is outweighed by the long-term punishment.

The effect of improving firms’ ability to monitor each other is detrimental:
(i) assists firms’ ability to coordinate upon strategies
(ii) increases likelihood of punishment

Empirical evidence:
Albæk et al. (1997): Danish concrete market (B2B)
Fuller et al. (1990): US freight (railroads) (B2B)
Kauffman et al. (2005): online book market (B2C)
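Improving Firms’ Ability to Monitor is Bad

                     Firm 2
                     high      low
Firm 1     high      C, C      A, B
           low       B, A      D, D

Remember B > C > D > A. Improved monitoring ability:
(i) decreases D, so the long-term punishment is harsher

Collusion is sustainable if
$\frac{\delta}{1-\delta}[C - D] \ge B - C$
where the left-hand side is the long-term punishment and the right-hand side is the short-term benefit.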