Tit-for-tat algorithm (auxiliary document)
Lectured by Chang-jin Suh, Soongsil University, Dept. of Computer Science
Tel: 820-0686, cjsuh@ssu.ac.kr
0. Contents
  1. Prisoner's Dilemma
  2. Iterated PD (IPD)
  3. Famous IPD solutions
II-a: Application Layer
1. Prisoner's Dilemma: description
  Two very clever suspects are arrested by the police.
  The evidence is insufficient for a conviction.
  Having separated the two prisoners, a police officer visits each of them to offer the same plea deal.
    deal: see the table on the next slide.
  Each prisoner must choose either to betray the other or to stay silent.
  Each is assured that the other will not learn of the betrayal until the sentence is pronounced.
  What would you choose if you were a prisoner?
1. Prisoner's Dilemma: payoff matrices

  Plea deal matrix (A and B are the prisoners):

                     B stays silent               B betrays
    A stays silent   A, B each serve 0.5 year     A: 10 years, B: goes free
    A betrays        A: goes free, B: 10 years    A, B each serve 5 years

  Simplified plea deal matrix, (x, y) = (A's sentence, B's sentence), y = years:

                     B cooperates      B betrays
    A cooperates     (-0.5y, -0.5y)    (-10y, 0)
    A betrays        (0, -10y)         (-5y, -5y)

  PD payoff matrix: the (negative) penalty is changed to a (positive) payoff.

                     B cooperates      B betrays
    A cooperates     (3, 3)            (0, 5)
    A betrays        (5, 0)            (1, 1)

  ** Fair payoff: each player faces the same payoff numbers.
1. Prisoner's Dilemma (PD): general PD problem
  Given the PD matrix, what should a player do to maximize its own payoff?

                  cooperate   betray
    cooperate     (R, R)      (S, T)
    betray        (T, S)      (P, P)

    with T > R > P > S and 2R > T + S
    (T: temptation, R: reward, P: punishment, S: sucker's payoff)

  Peace-war game table (example values):

                  peace       war
    peace         (3, 3)      (0, 5)
    war           (5, 0)      (1, 1)

  PD problem's solution: choose "betray" (= "war").
  (proof) Whatever the other player's (unknown) decision is, I always gain by choosing "betray/war"
  ('c' = cooperate, 'd' = defect, i.e. betray):
    if the other plays 'c', R(3) < T(5);
    if the other plays 'd', S(0) < P(1).
  This holds even though my decision damages the other player:
    if the other plays 'c', the other's payoff drops from R(3) to S(0);
    if the other plays 'd', the other's payoff drops from T(5) to P(1).
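The dominance argument in the proof can be checked mechanically. The following is a minimal sketch, assuming the example values T=5, R=3, P=1, S=0 from the peace-war table; the names `is_pd` and `best_reply` are hypothetical helpers introduced only for this illustration.

```python
# Example payoffs taken from the peace-war table on the slide.
T, R, P, S = 5, 3, 1, 0

# payoff[(mine, theirs)] -> my payoff; 'c' = cooperate, 'd' = betray.
payoff = {('c', 'c'): R, ('c', 'd'): S, ('d', 'c'): T, ('d', 'd'): P}

def is_pd(T, R, P, S):
    """Check the two conditions that define a PD payoff matrix."""
    return T > R > P > S and 2 * R > T + S

def best_reply(theirs):
    """Return my payoff-maximizing move against a fixed opponent move."""
    return max('cd', key=lambda mine: payoff[(mine, theirs)])

pd_ok = is_pd(T, R, P, S)
replies = {theirs: best_reply(theirs) for theirs in 'cd'}
print(pd_ok)    # True
print(replies)  # {'c': 'd', 'd': 'd'}
```

Betraying is the best reply to both possible opponent moves, which is exactly what the proof states.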
1. Prisoner's Dilemma (PD): the dilemma
  Both (very clever) prisoners know that they achieve the maximum joint payoff if both choose "cooperate" (= "peace").
  But they cannot reach it, precisely because they are clever enough to know the previous proof.
2. Iterated PD (IPD): the iterated prisoner's dilemma problem
  Repeat the PD problem without announcing the number of rounds.
    (If the number of rounds is known, playing "war" in every round is the best solution.)
  Players remember the results of earlier rounds in the current IPD game.
    A player can punish the opponent's "war" by choosing "war" in later rounds.
  Goal of the game: maximize the accumulated payoff.
    Winning or losing the current IPD game does not count!
    Greedy players (who prefer war) tend to win individual games but tend to accumulate less payoff.
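The difference between winning games and accumulating payoff can be illustrated with a tiny round-robin. This is a sketch under stated assumptions: the player set (one always-war player and two players that copy the opponent's last move), the 10-round game length, and all function names are hypothetical choices for the example, not part of the slides.

```python
from itertools import combinations

# Payoffs (row player, column player) from the peace-war table.
PAYOFF = {('p', 'p'): (3, 3), ('p', 'w'): (0, 5),
          ('w', 'p'): (5, 0), ('w', 'w'): (1, 1)}

def always_war(opp_moves):
    """Greedy player: war in every round."""
    return 'w'

def copy_last(opp_moves):
    """Retaliating player: open with peace, then copy the opponent."""
    return opp_moves[-1] if opp_moves else 'p'

def play(strat_a, strat_b, rounds):
    """Play one IPD game and return both accumulated scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(moves_b)  # each player sees only the opponent's history
        b = strat_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

players = [('greedy', always_war), ('copy1', copy_last), ('copy2', copy_last)]
totals = {name: 0 for name, _ in players}
wins = {name: 0 for name, _ in players}
for (name_a, strat_a), (name_b, strat_b) in combinations(players, 2):
    a, b = play(strat_a, strat_b, rounds=10)
    totals[name_a] += a
    totals[name_b] += b
    if a > b:
        wins[name_a] += 1
    elif b > a:
        wins[name_b] += 1

print(wins)    # {'greedy': 2, 'copy1': 0, 'copy2': 0}
print(totals)  # {'greedy': 28, 'copy1': 39, 'copy2': 39}
```

The greedy player wins both of its games yet ends with the lowest accumulated payoff, because the two retaliating players earn 3 per round against each other.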
2. Iterated PD (IPD): contest
  The contest is called the "peace war game" or "IPD tournament".
  It has been held once a year since 1975.
  Objective of the game: maximize payoff while playing IPD against many players.
2. Iterated PD (IPD): properties of well-known good IPD strategies
  Nice (also called "optimistic")
    def: do not "defect" before the opponent does.
  Retaliating
    def: do not practice "blind optimism" (always-nice).
    why? A "nasty" (un-nice) strategy ruthlessly exploits an always-nice player.
  Forgiving
    def: do not practice "infinite retaliation".
    why? To cut short long runs of revenge and counter-revenge, and so maximize payoff.
  Non-envious
    def: do not strive to win the game (to score more than the opponent).
    Nice players are always non-envious.
3. Famous IPD solutions: (very simple, original) tit-for-tat
  rule
    1st decision: 'p' (peace)
    n-th decision: the opponent's (n-1)-th decision (n = 2, 3, 4, ...)
  property: nice, retaliating, non-envious
    non-forgiving: if two players both use this strategy, neither will ever forgive.
  examples (peace-war payoffs: (p,p) = (3,3), (p,w) = (0,5), (w,p) = (5,0), (w,w) = (1,1))

  Two original tit-for-tat players:

    round          1  2  3  4  5  score
    tit-for-tat1   p  p  p  p  p  15
    tit-for-tat2   p  p  p  p  p  15

  Pessimist vs. (original) tit-for-tat:

    round          1  2  3  4  5  score
    pessimist      w  w  w  w  w  9
    tit-for-tat    p  w  w  w  w  4
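The two example games can be replayed in a few lines. This is a minimal sketch, assuming that the "pessimist" plays war in every round (an assumption consistent with its score of 9); the function names are hypothetical.

```python
# Payoffs (row player, column player) from the peace-war table.
PAYOFF = {('p', 'p'): (3, 3), ('p', 'w'): (0, 5),
          ('w', 'p'): (5, 0), ('w', 'w'): (1, 1)}

def tit_for_tat(opp_moves):
    """Open with peace, then repeat the opponent's previous move."""
    return opp_moves[-1] if opp_moves else 'p'

def pessimist(opp_moves):
    """Assumed behaviour: play war in every round."""
    return 'w'

def play(strat_a, strat_b, rounds=5):
    """Play one IPD game; return both move strings and both scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(moves_b)  # each player sees only the opponent's history
        b = strat_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return ''.join(moves_a), ''.join(moves_b), score_a, score_b

tft_vs_tft = play(tit_for_tat, tit_for_tat)
pes_vs_tft = play(pessimist, tit_for_tat)
print(tft_vs_tft)  # ('ppppp', 'ppppp', 15, 15)
print(pes_vs_tft)  # ('wwwww', 'pwwww', 9, 4)
```

Tit-for-tat loses the single game against the pessimist, but only by the 5 points given away in the first round; after that it cannot be exploited again.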
3. Famous IPD solutions (continued): death spiral example
  Players A and B both use original tit-for-tat, but B plays "war" in the first round.
  (peace-war payoffs: (p,p) = (3,3), (p,w) = (0,5), (w,p) = (5,0), (w,w) = (1,1))

    round   1  2  3  4  5  score
    A       p  w  p  w  p  10
    B       w  p  w  p  w  15
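The death spiral follows directly from the copy rule and can be reproduced as below. This is a sketch only: `make_tit_for_tat` is a hypothetical factory whose `first` parameter models B's opening "war".

```python
# Payoffs (row player, column player) from the peace-war table.
PAYOFF = {('p', 'p'): (3, 3), ('p', 'w'): (0, 5),
          ('w', 'p'): (5, 0), ('w', 'w'): (1, 1)}

def make_tit_for_tat(first='p'):
    """Build an original tit-for-tat player with a chosen opening move."""
    def strategy(opp_moves):
        return opp_moves[-1] if opp_moves else first
    return strategy

def play(strat_a, strat_b, rounds=5):
    """Play one IPD game; return both move strings and both scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(moves_b)  # each player sees only the opponent's history
        b = strat_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return ''.join(moves_a), ''.join(moves_b), score_a, score_b

spiral = play(make_tit_for_tat('p'), make_tit_for_tat('w'))
peaceful = play(make_tit_for_tat('p'), make_tit_for_tat('p'))
print(spiral)    # ('pwpwp', 'wpwpw', 10, 15)
print(peaceful)  # ('ppppp', 'ppppp', 15, 15)
```

The two players alternate revenge forever, and their joint payoff over five rounds falls from 30 (mutual peace) to 25.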
3. Famous IPD solutions: tit-for-tat with forgiveness
  This is what is generally called simply "tit-for-tat".
  rule
    Unless provoked, the agent always cooperates: nice.
    If provoked, the agent retaliates: retaliating.
    The agent is quick to forgive: forgiving.
    The agent must have a good chance of competing against the opponent more than once. (?)
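The slide does not say how forgiveness is implemented. One common way, shown here purely as an assumption, is to cooperate with a small probability even after being provoked; `make_forgiving_tft`, `forgive_prob`, and the `first` opening-move parameter are hypothetical names for this sketch.

```python
import random

# Payoffs (row player, column player) from the peace-war table.
PAYOFF = {('p', 'p'): (3, 3), ('p', 'w'): (0, 5),
          ('w', 'p'): (5, 0), ('w', 'w'): (1, 1)}

def make_forgiving_tft(forgive_prob, rng, first='p'):
    """Tit-for-tat that forgives a 'w' with probability forgive_prob (assumed rule)."""
    def strategy(opp_moves):
        if not opp_moves:
            return first
        if opp_moves[-1] == 'w' and rng.random() < forgive_prob:
            return 'p'           # forgive instead of retaliating
        return opp_moves[-1]     # otherwise behave like original tit-for-tat
    return strategy

def play(strat_a, strat_b, rounds=5):
    """Play one IPD game; return both move strings and both scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(moves_b)  # each player sees only the opponent's history
        b = strat_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return ''.join(moves_a), ''.join(moves_b), score_a, score_b

rng = random.Random(0)  # seeded only to make runs repeatable
# B opens with war; with forgive_prob = 0.0 this is the original death spiral.
never = play(make_forgiving_tft(0.0, rng), make_forgiving_tft(0.0, rng, first='w'))
# With forgive_prob = 1.0 the provocation is forgiven at once.
always = play(make_forgiving_tft(1.0, rng), make_forgiving_tft(1.0, rng, first='w'))
print(never)   # ('pwpwp', 'wpwpw', 10, 15)
print(always)  # ('ppppp', 'wpppp', 12, 17)
```

A practical value for `forgive_prob` lies between the two extremes: high enough to escape revenge cycles, low enough that a nasty opponent cannot exploit the forgiveness.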
3. Famous IPD solutions: tit-for-two-tats
  rule: only the part that differs from tit-for-tat is defined.
    The agent retaliates only if provoked twice in a row.
  property: nicer than tit-for-tat
  usage: a variant of tit-for-two-tats is used in BitTorrent.
    BitTorrent calls it "optimistic unchoking".
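The rule can be sketched as follows. The opponents used for the demonstration (a tit-for-tat player that opens with war, and an always-war player) and all function names are assumptions made for this example; the sketch does not model BitTorrent's actual mechanism.

```python
# Payoffs (row player, column player) from the peace-war table.
PAYOFF = {('p', 'p'): (3, 3), ('p', 'w'): (0, 5),
          ('w', 'p'): (5, 0), ('w', 'w'): (1, 1)}

def tit_for_two_tats(opp_moves):
    """Retaliate only after two consecutive 'w' moves by the opponent."""
    return 'w' if opp_moves[-2:] == ['w', 'w'] else 'p'

def tft_opening_with_war(opp_moves):
    """Original tit-for-tat, except that the first move is war."""
    return opp_moves[-1] if opp_moves else 'w'

def always_war(opp_moves):
    """Play war in every round."""
    return 'w'

def play(strat_a, strat_b, rounds=5):
    """Play one IPD game; return both move strings and both scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(moves_b)  # each player sees only the opponent's history
        b = strat_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return ''.join(moves_a), ''.join(moves_b), score_a, score_b

no_spiral = play(tit_for_two_tats, tft_opening_with_war)
exploited = play(tit_for_two_tats, always_war)
print(no_spiral)  # ('ppppp', 'wpppp', 12, 17)
print(exploited)  # ('ppwww', 'wwwww', 3, 13)
```

A single provocation no longer starts a revenge cycle, at the price of giving a persistently hostile opponent one extra free round.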