Collaboration in Repeated Games

Presentation on theme: "Collaboration in Repeated Games"— Presentation transcript:

1 Collaboration in Repeated Games
Michael L. Littman, Rutgers University

2 Motivation
Create agents that achieve their goals, perhaps working together.
Separate payoff functions
General sum
Compute Nash equilibria (stable strategies)
Algorithm assigns strategies
Not learning (at present)

3 Grid Game 3 (Hu & Wellman 01)
Actions: U, D, R, L, X
No move on collision
Semiwalls (passable 50% of the time)
Rewards: -1 per step, -10 for collision, +100 for goal
Both players can reach the goal.
[Figure: grid world with agents A and B]

4 Repeated Markov Game
S: finite set of states
A1, A2: finite sets of action choices
R1(s, a1, a2): payoff to first player
R2(s, a1, a2): payoff to second player
P(s' | s, a1, a2): transition function
G: goal (terminal) states (subset of S)
Objective: maximize average (over repetitions) total reward
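A minimal Python sketch of how this tuple might be encoded; the class name, type aliases, and dict-based tables are illustrative assumptions, not from the talk.

```python
# Minimal container for the repeated Markov game tuple defined above.
# The encoding (dicts keyed by (state, a1, a2)) is an illustrative choice.
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

State = int
Action = int

@dataclass
class MarkovGame:
    states: FrozenSet[State]                                    # S
    actions1: Tuple[Action, ...]                                # A1
    actions2: Tuple[Action, ...]                                # A2
    reward1: Dict[Tuple[State, Action, Action], float]          # R1(s, a1, a2)
    reward2: Dict[Tuple[State, Action, Action], float]          # R2(s, a1, a2)
    transition: Dict[Tuple[State, Action, Action], Dict[State, float]]  # P(s' | s, a1, a2)
    goals: FrozenSet[State]                                      # G, terminal states
```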

5 Nash Equilibrium
A pair of strategies such that neither player has an incentive to deviate unilaterally.
Can be a function of history
Can be randomized
Always exists.
Claim: assumes games are repeated and players choose best responses.

6 Nash in Grid Game
[Figure: joint policies for agents A and B]
Average totals: (97, 48), (48, 97), (-, -) (not Nash), (64, 64) (not Nash), (75, 75)?

7 Collaborative Solution
Average total: (96, 96) (not Nash)
A won't wait. B changes incentives.
[Figure: sequence of grid positions for agents A and B]

8 Repeated Matrix Game
A one-state Markov game
A1 = A2 = {cooperate, defect}: the Prisoner's Dilemma (PD)
One (single-step) Nash equilibrium
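For concreteness, here are the PD payoff tables using the numbers that appear later on the "Tit-For-Tat is Nash" slide; a small numpy sketch with illustrative variable names.

```python
import numpy as np

# Prisoner's Dilemma payoff tables (slide 12's numbers).
# Row = player 1's action, column = player 2's action; 0 = cooperate, 1 = defect.
R1 = np.array([[3.0, 0.0],    # (C,C) -> 3, (C,D) -> 0
               [5.0, 1.0]])   # (D,C) -> 5, (D,D) -> 1
R2 = R1.T                     # symmetric game: R2(a1, a2) = R1(a2, a1)

# The unique single-step Nash equilibrium is mutual defection:
# defect strictly dominates cooperate for both players (5 > 3 and 1 > 0).
```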

9 Nash-Value Problem
Computational problem:
Given a one-state Markov game (two tables)
Find a Nash (always exists)
Return each player's value.
In NP ∩ co-NP; exact complexity open.
Useful subproblem for Markov games.

10 Two Special Cases
Saddle-point equilibrium: deviation helps the other player. Value is the unique solution to the zero-sum game.
Coordination equilibrium: both players get the maximum reward possible. Value is the unique maximum value.

11 Tit-for-Tat
The PD equilibrium is a saddle point, not coordination.
Consider Tit-for-Tat: cooperate, but defect iff defected on.
Better (3 per step) than defect-defect (1 per step).

12 Tit-For-Tat is Nash
PD payoffs (own action, opponent's action): (C,C) = 3, (C,D) = 0, (D,C) = 5, (D,D) = 1
Cooperation (TFT) is a best response; long-run averages of reactive replies (reply to C : reply to D) against TFT:
C: C, D: D = 3
C: C, D: C = 3
C: D, D: D = 1
C: D, D: C = 2.5
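A small simulation sketch to check these best-response numbers. It assumes the deviator reacts to TFT's current move (the reading that reproduces 3, 3, 1, and 2.5); the function name and horizon are illustrative.

```python
# Player 2's payoff table R2[a1][a2]; 0 = cooperate (C), 1 = defect (D).
R2 = [[3.0, 5.0],    # TFT plays C: opponent gets 3 (C) or 5 (D)
      [0.0, 1.0]]    # TFT plays D: opponent gets 0 (C) or 1 (D)

def avg_vs_tft(reply_to_C, reply_to_D, steps=100_000):
    """Long-run average payoff of a reactive strategy against Tit-for-Tat.
    TFT echoes the opponent's previous move (opening with C); the opponent
    reacts to TFT's current move."""
    a1, total = 0, 0.0            # TFT opens with cooperation
    for _ in range(steps):
        a2 = reply_to_C if a1 == 0 else reply_to_D
        total += R2[a1][a2]
        a1 = a2                    # next round TFT echoes the opponent's move
    return total / steps

for rc, rd in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"C->{'CD'[rc]}, D->{'CD'[rd]}:", round(avg_vs_tft(rc, rd), 2))
# Expected output, matching the slide: 3.0, 3.0, 2.5, 1.0
```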

13 Generalized TFT
TFT stabilizes a mutually beneficial outcome.
General class of policies:
Play the beneficial action
Punish deviation to suppress temptation
Need to generalize both components.

14 Security Level
Values achievable without collaboration
Solution of a zero-sum game (an LP; see the sketch below)
Can force a player to this value; the player can guarantee this value
Possibly stochastic
Useful as a punishment, but also as a threshold
PD: (1, 1)
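A sketch of the security-level computation as a maximin LP, assuming scipy is available; applied to the PD table it recovers the slide's value of 1 for each player.

```python
import numpy as np
from scipy.optimize import linprog

def security_level(R):
    """Maximin value of a matrix game: the payoff the row player can guarantee
    with a (possibly stochastic) strategy, i.e. the value of the associated
    zero-sum game, computed as a small linear program."""
    n_rows, n_cols = R.shape
    # Variables: x_0..x_{n-1} (mixed strategy) and v (guaranteed value).
    # Maximize v  <=>  minimize -v.
    c = np.zeros(n_rows + 1); c[-1] = -1.0
    # For every opponent column j:  v - sum_i x_i * R[i, j] <= 0
    A_ub = np.hstack([-R.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])   # sum_i x_i = 1
    b_eq = np.ones(1)
    bounds = [(0, 1)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:-1]

R1 = np.array([[3.0, 0.0], [5.0, 1.0]])      # Prisoner's Dilemma, player 1
v, strategy = security_level(R1)
print(round(v, 3), strategy)                  # 1.0, [0. 1.]: always defect
```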

15 Two-Player Plot
[Figure: payoff pairs (C,C), (C,D), (D,C), (D,D) plotted in the two-player payoff plane]
Mark the payoff for each combination of actions.
Mark the security level.

16 Dominating Pair
Let (s1, s2) be the security-level values.
Dominating pair of actions (a1, a2): c1 = R1(a1, a2), c2 = R2(a1, a2) with c1 > s1 and c2 > s2.
ti is the temptation payoff (e.g., t1 = max_a R1(a, a2)).
ni > (ti - ci) / (ci - si) punishments suffice to stabilize cooperation (folk theorem); see the sketch below.
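A tiny sketch of this folk-theorem bound with the PD numbers (temptation 5, cooperation 3, security 1); the helper name is illustrative.

```python
import math

def punishment_rounds(t, c, s):
    """Smallest integer number of punishment rounds strictly exceeding
    (t - c) / (c - s): enough security-level punishment to wipe out the
    one-shot gain from deviating away from the dominating pair."""
    return math.floor((t - c) / (c - s)) + 1

# Prisoner's Dilemma: temptation 5, cooperative payoff 3, security level 1.
print(punishment_rounds(t=5, c=3, s=1))   # 2 punishment rounds suffice
```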

17 Alternation
Play one outcome, then the other; repeat.
If the security level lies below the convex hull of payoff pairs, the alternation can be stabilized (see the sketch below).
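A quick numeric check of the alternation idea in the PD, assuming the security levels (1, 1) computed earlier: alternating (D,C) and (C,D) averages (2.5, 2.5), which dominates them.

```python
import numpy as np

# Alternating between two joint outcomes yields the midpoint of their payoff pairs.
payoff_DC = np.array([5.0, 0.0])   # (player 1, player 2) payoffs at (D, C)
payoff_CD = np.array([0.0, 5.0])   # payoffs at (C, D)
security = np.array([1.0, 1.0])    # security levels from the zero-sum LP above

average = (payoff_DC + payoff_CD) / 2
print(average, bool(np.all(average > security)))   # [2.5 2.5] True -> stabilizable
```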

18 Algorithm
Find a Nash pair for the repeated matrix game in polynomial time.
Idea: if a point of the convex hull dominates the security levels, use generalized TFT (sketch below).
Else, a one-step Nash can be found quickly.
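A hedged sketch of the idea on this slide, not the exact published procedure: search pure outcomes and pairwise alternations for average payoffs that dominate both security levels; if one exists, generalized TFT can support it, otherwise fall back to the one-step-Nash branch. The function name and the restriction to pairwise alternations are my assumptions.

```python
import itertools
import numpy as np

def repeated_game_nash_sketch(R1, R2, security1, security2):
    """Look for a single outcome, or an alternation of two outcomes, whose
    average payoffs strictly dominate both security levels.  If found, it can
    be supported by generalized Tit-for-Tat with punishment threats;
    otherwise report the one-step-Nash branch."""
    outcomes = list(itertools.product(range(R1.shape[0]), range(R1.shape[1])))
    for a, b in itertools.combinations_with_replacement(outcomes, 2):
        v1 = (R1[a] + R1[b]) / 2
        v2 = (R2[a] + R2[b]) / 2
        if v1 > security1 and v2 > security2:
            return ("generalized TFT", (a, b), (v1, v2))
    return ("one-step Nash", None, None)

R1 = np.array([[3.0, 0.0], [5.0, 1.0]])   # Prisoner's Dilemma
R2 = R1.T
print(repeated_game_nash_sketch(R1, R2, security1=1.0, security2=1.0))
# -> ('generalized TFT', ((0, 0), (0, 0)), (3.0, 3.0)): support (C, C) with threats
```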

19 Proof Sketch
[Diagram: strategy pairs (S1, S2), (S1*, S2), (S1, S2*)]
Try improving the S1 policy. If you can't, try improving S2.
If you can't, it's a Nash equilibrium.
If you can, the change won't hurt player 1 and player 1 can't improve, so it's a Nash equilibrium.

20 Symmetric Case
R1(a, a') = R2(a', a)
The value of the game is just the maximum average (see the sketch below).
Alternate, or accept the security level.
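A one-line check of the symmetric-case claim for the PD: the maximum per-player average over joint actions.

```python
import numpy as np

# In a symmetric game R2(a, a') = R1(a', a), so the best the pair can do on
# average is max over joint actions of (R1(a, a') + R1(a', a)) / 2, achieved
# by alternating the two role assignments of the maximizing outcome.
R1 = np.array([[3.0, 0.0], [5.0, 1.0]])    # Prisoner's Dilemma, player 1
avg = (R1 + R1.T) / 2                      # per-player average for each outcome
print(avg.max())                            # 3.0: mutual cooperation is best here
```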

21 Symmetric Markov Game
Episodic; roles chosen randomly.
Algorithm:
Maximize the sum of payoffs (an MDP)
Compute the security level (zero-sum game)
Choose the max-sum policy if it is better
Converges to a Nash equilibrium.
[Figure: two grid configurations with agents A and B]

22 Conclusion
Threats can help (Littman & Stone 01)
Find a repeated-game Nash equilibrium in polynomial time
Very simple structure for symmetric games
Applies to Markov games

23 Discussion
Objectives in game theory / RL for agents? Desiderata?
How to learn the state space when the game is repeated?
Multiobjective negotiation?
Learning: combine leading and following?
Different unknown discount rates?
Incomplete rationality?
Incomplete information about rewards?

