The Evolution of Conventions
H. Peyton Young
Presented by Na Li and Cory Pender

What is a convention?
Customary behavior
Self-enforcing
Not always symmetric
Follow given that other people do
Examples
–Driving on the right
–Eating with utensils
–Men propose to women

How are conventions “chosen”?
A convention is an equilibrium, but there could be others
Some equilibria are inherently more reasonable (Harsanyi and Selten)
One equilibrium is more prominent (Schelling)

Evolutionary explanation
Past plays influence players’ choices
One equilibrium eventually becomes more prevalent
This paper shows that behavior converges over time to a Nash equilibrium, given some limitations on the game

The model
n people randomly selected from a large population
Players base actions on a sample of plays from the recent past
No individual learning
Mistakes possible
“Adaptive play”

Goals
In weakly acyclic games:
–If samples are sufficiently incomplete and memory is finite, play converges to a Nash equilibrium
With mistakes:
–Play almost always converges to a particular equilibrium

Adaptive play
n-person game G; player i has strategy set S_i
Population N divided into classes C_1, C_2, ..., C_n
G played once per period; t = 1, 2, ...
Play at time t is s(t) = (s_1(t), s_2(t), ..., s_n(t))
Players in class C_i have utility u_i(s)
History of plays is h(t) = (s(1), s(2), ..., s(t))

Choosing strategies
Choose m, k such that 1 ≤ k ≤ m
In period t + 1, where t ≥ m:
–Each player sees k plays from the past m periods
–k/m is the completeness of information
–Plays are not necessarily equally likely to be seen

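
The sampling step above can be sketched in a few lines. This is only an illustration for a two-player game, not the paper's formal model: the names `sample_recent` and `best_reply` are hypothetical, the draw is uniform (the slides allow unequal sampling probabilities), and ties are broken by list order rather than at random.

```python
import random
from collections import Counter

def sample_recent(history, m, k, rng):
    """Draw k of the last m plays without replacement.
    Uniform sampling is an assumption made for this sketch."""
    return rng.sample(history[-m:], k)

def best_reply(payoff, own_actions, sample):
    """Return the own action with the highest average payoff against
    the sampled opponent actions. payoff maps (own, other) -> utility.
    Ties go to the first action listed (a simplification)."""
    counts = Counter(sample)

    def expected(action):
        total = sum(payoff[(action, other)] * c for other, c in counts.items())
        return total / len(sample)

    return max(own_actions, key=expected)
```

A player would call `sample_recent` on the opponent's past actions and pass the result to `best_reply` to choose the action for period t + 1.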
First m plays random
H consists of all sequences of length m drawn from ∏ S_i
Finite Markov chain on H with initial state h(m)
Successor of h ∈ H is h’ ∈ H, obtained by dropping the oldest play in h and appending a new play s
For s ∈ S_i, p_i(s|h) is the probability that i plays s given history h
p_i(·) is a best-reply distribution:
–p_i(s|h) > 0 iff s is i’s best reply to some sample of size k
–p_i(s|h) independent of t
Probability of moving from h to h’ is ∏_{i=1..n} p_i(s_i|h)

Convergence of adaptive play
h is an absorbing state iff it is a strict Nash equilibrium played m times
–h = h’ = (s, s, ..., s)
Convergence ⇒ strict Nash
–But strict Nash does not guarantee convergence
–Cycling
Use weakly acyclic games

[Figure: best-reply graph, with vertices s, s’ and s*]

Theorem
G is a weakly acyclic n-person game
L(s) = length of the shortest best-reply path from s to a strict Nash equilibrium
L_G = max_s L(s)
If k ≤ m/(L_G + 2), adaptive play “almost surely” converges to a convention
Main idea: if information is sufficiently incomplete, adaptive play converges

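
To make L(s) and L_G concrete, here is a rough sketch that builds a best-reply graph for a small game and measures shortest paths to its sinks. It assumes the usual definition (an edge from s to s’ when exactly one player switches to a best reply against the others' current strategies, so that sinks are strict Nash equilibria); the function names and the payoff-dictionary layout are made up for this example.

```python
from collections import deque
from itertools import product

def best_reply_graph(actions, payoffs):
    """actions: one list of actions per player.
    payoffs: dict mapping a profile (tuple) to a tuple of utilities.
    Returns {profile: [profiles reachable by one best-reply switch]}."""
    graph = {}
    for s in product(*actions):
        edges = []
        for i, acts in enumerate(actions):
            def u(a, i=i, s=s):
                # Utility of player i after unilaterally switching to a.
                return payoffs[s[:i] + (a,) + s[i + 1:]][i]
            best = max(u(a) for a in acts)
            for a in acts:
                if a != s[i] and u(a) == best:
                    edges.append(s[:i] + (a,) + s[i + 1:])
        graph[s] = edges
    return graph

def path_lengths(graph):
    """Return {s: L(s)} by breadth-first search backwards from the sinks.
    Profiles with no path to a sink are left out, so the game is
    weakly acyclic exactly when every profile appears in the result."""
    reverse = {s: [] for s in graph}
    for s, outs in graph.items():
        for t in outs:
            reverse[t].append(s)
    dist = {s: 0 for s, outs in graph.items() if not outs}
    queue = deque(dist)
    while queue:
        t = queue.popleft()
        for s in reverse[t]:
            if s not in dist:
                dist[s] = dist[t] + 1
                queue.append(s)
    return dist
```

For a 2×2 coordination-type game with payoffs (0,0), (1,√2), (√2,1), (0,0), the two off-diagonal profiles are sinks and the other two are one step away, so L_G = 1 and the bound reads k ≤ m/3.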
Proof
Positive probability that:
–At some t + 1, all agents sample the last k plays (call this sample µ)
–From periods t + 1 to t + k, all agents choose sample µ
–Each agent makes the same best reply to µ k times in a row
So there is positive probability of a run (s, s, ..., s) from t + 1 to t + k

If s is a strict Nash:
Positive probability that from t + k + 1 to t + m, each agent samples the last k plays
s is played for m - k more periods, and then an absorbing state has been reached

If s is not a strict Nash:
There is a best-reply path from s to a strict Nash s_r: s → s_1 → s_2 → ... → s_r
For s → s_1:
–Player i samples from periods t + 1 to t + k (i.e. samples s)
–Everyone else samples µ
–Positive probability that these will occur for the next k periods
By a similar argument, play can move from s_1 to s_2, and so on to s_r
Each step needs k more periods inside the m-period memory, hence the limit on the size of k

Example
Battle of the sexes
–Opera vs. football game - yield or not yield

Players: Man, Woman
             Yield     Not Yield
Yield        0, 0      1, √2
Not Yield    √2, 1     0, 0

Why must we limit k?
Let k = m
Consider an initial sequence where they both yielded or both didn’t yield
To decide the next round: pick the choice with the highest expected payoff (in this case, each yields if 1 - f > f√2, where f is the fraction of sampled rounds in which the other player yielded)
What would happen if k is bounded as specified by adaptive play?

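
A small simulation suggests what goes wrong when k = m. The sketch below is an illustration under stated assumptions, not the presenters' simulation: both players see the entire m-period window, the history starts with both yielding m times, and the rule "yield iff 1 - f > f√2" is taken from the slide. The function names are invented for the example.

```python
import math

R2 = math.sqrt(2)

def yields(f):
    """Best reply from the slide: yield iff 1 - f > f * sqrt(2),
    where f is the share of observed rounds in which the other yielded."""
    return 1 - f > f * R2

def simulate_full_memory(m, periods, initial="Y"):
    """Both players observe the whole window (k = m).
    History holds (man, woman) action pairs, "Y" = yield, "N" = not yield."""
    history = [(initial, initial)] * m
    for _ in range(periods):
        window = history[-m:]
        f_woman = sum(w == "Y" for _, w in window) / m  # what the man observes
        f_man = sum(a == "Y" for a, _ in window) / m    # what the woman observes
        man = "Y" if yields(f_woman) else "N"
        woman = "Y" if yields(f_man) else "N"
        history.append((man, woman))
    return history
```

Because both players react to the same complete window, they make the same choice every period: they alternate between runs of both yielding and both not yielding, and never land on an outcome where exactly one of them yields.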
Is this the best we can do?
Note that the theorem guarantees convergence to an equilibrium
–But which equilibrium?
Also, it seems unlikely that people would always play a best response perfectly

Back to our example...
With slightly different payoffs

Players: Man, Woman
             Yield        Not Yield
Yield        0, 0         1, √2
Not Yield    √2/2, 1/2    0, 0

Let k = 1, m = 3
We can imagine a situation where
–Both yield on the first round
–Both do not yield on the second round
–On the 3rd round, the woman samples the yielding round and the man samples the not-yielding round
–What would be each player’s best reply?
–Next round?
–Get stuck in a suboptimal equilibrium
Perhaps introducing mistakes could solve this problem

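
The third-round best replies can be worked out mechanically. The sketch below rests on an assumption the extracted payoff table does not confirm: the woman is the row player and the man the column player, with each cell listing (woman's payoff, man's payoff). That orientation is chosen because it is the one under which the outcome matches the "suboptimal equilibrium" remark; the helper names are hypothetical.

```python
import math

R2 = math.sqrt(2)

# Assumed orientation: keys are (woman's action, man's action),
# values are (woman's payoff, man's payoff). "Y" = yield, "N" = not yield.
PAYOFF = {
    ("Y", "Y"): (0.0, 0.0),
    ("Y", "N"): (1.0, R2),
    ("N", "Y"): (R2 / 2, 0.5),
    ("N", "N"): (0.0, 0.0),
}

def woman_reply(man_action):
    """Woman's best reply to a single sampled action of the man (k = 1)."""
    return max("YN", key=lambda a: PAYOFF[(a, man_action)][0])

def man_reply(woman_action):
    """Man's best reply to a single sampled action of the woman (k = 1)."""
    return max("YN", key=lambda a: PAYOFF[(woman_action, a)][1])

# Round 3: the woman sampled the round where the man yielded,
# the man sampled the round where the woman did not yield.
round3 = (woman_reply("Y"), man_reply("N"))
```

Under this assumption the woman does not yield and the man yields, which is the equilibrium paying (√2/2, 1/2). Both players would do better at the other equilibrium, which pays (1, √2), but if they keep sampling rounds like this one, play stays where it is.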
Simulation