Collaboration in Repeated Games
Michael L. Littman, Rutgers University

Motivation
- Create agents that achieve their goals, perhaps working together.
- Separate payoff functions; general sum.
- Compute Nash equilibria (stable strategies).
- Algorithm assigns strategies; not learning (at present).

Grid Game 3 (Hu & Wellman 01)
- Two players, A and B, on a grid.
- Actions: U, D, R, L, X; no move on collision.
- Semiwalls (passable 50% of the time).
- Rewards: -1 per step, -10 for a collision, +100 for reaching the goal.
- Both players can reach the goal.

Repeated Markov Game
- S: finite set of states.
- A1, A2: finite sets of action choices.
- R1(s, a1, a2): payoff to the first player.
- R2(s, a1, a2): payoff to the second player.
- P(s' | s, a1, a2): transition function.
- G: goal (terminal) states (a subset of S).
- Objective: maximize the average (over repetitions) total reward.

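A minimal sketch of how this tuple could be represented in code; the class and field names are my own, not from the talk.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MarkovGame:
    """Two-player repeated Markov game (S, A1, A2, R1, R2, P, G)."""
    R1: np.ndarray     # shape (|S|, |A1|, |A2|): payoff to player 1
    R2: np.ndarray     # shape (|S|, |A1|, |A2|): payoff to player 2
    P: np.ndarray      # shape (|S|, |A1|, |A2|, |S|): transition probabilities
    goal: np.ndarray   # boolean mask over S marking goal (terminal) states
```
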
Nash Equilibrium
- A pair of strategies such that neither player has an incentive to deviate unilaterally.
- A strategy can be a function of history.
- A strategy can be randomized.
- A Nash equilibrium always exists.
- Claim: assumes games are repeated and players choose best responses.

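For the one-shot (single-step) case, the no-deviation test is easy to state in code. A minimal sketch, assuming numpy payoff tables; the helper name is mine, not from the talk.

```python
import numpy as np

def pure_nash_equilibria(R1, R2):
    """All pure-strategy Nash equilibria of a bimatrix game.

    R1[a1, a2] is player 1's payoff, R2[a1, a2] is player 2's payoff;
    (a1, a2) is Nash if neither player gains by deviating unilaterally."""
    equilibria = []
    for a1 in range(R1.shape[0]):
        for a2 in range(R1.shape[1]):
            best_for_1 = R1[a1, a2] >= R1[:, a2].max()
            best_for_2 = R2[a1, a2] >= R2[a1, :].max()
            if best_for_1 and best_for_2:
                equilibria.append((a1, a2))
    return equilibria
```
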
Nash in Grid Game
- Average total payoffs (player A, player B) for candidate joint strategies:
  (97, 48), (48, 97), (-, -) (not Nash), (64, 64) (not Nash), (75, 75)?

Collaborative Solution
- Average total: (96, 96), but not Nash: A won't wait.
- B changes incentives.
[Figure: sequence of grid positions for players A and B.]

Repeated Matrix Game
- A one-state Markov game.
- A1 = A2 = {cooperate, defect}: the Prisoner's Dilemma (PD).
- One (single-step) Nash: mutual defection.

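The PD payoffs used later in the deck (slide "Tit-For-Tat is Nash") make the single-step claim concrete; the checks below just confirm that defection dominates for both players, so mutual defection is the unique one-step Nash.

```python
import numpy as np

# Prisoner's Dilemma payoffs (index 0 = cooperate, 1 = defect),
# taken from the "Tit-For-Tat is Nash" slide below.
R1 = np.array([[3, 0],
               [5, 1]])   # row player's payoff
R2 = R1.T                 # symmetric game: column player's payoff

# Defection dominates for both players, so (defect, defect) is the
# unique single-step Nash equilibrium.
assert all(R1[1, a2] >= R1[0, a2] for a2 in range(2))
assert all(R2[a1, 1] >= R2[a1, 0] for a1 in range(2))
```
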
Nash-Value Problem
- Computational problem: given a one-state Markov game (two payoff tables), find a Nash equilibrium (one always exists) and return each player's value.
- In NP ∩ co-NP; exact complexity open.
- A useful subproblem for Markov games.

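Not the algorithm from the talk, but as a point of reference, small instances of this problem can be solved with the third-party nashpy package (an assumption on my part; the helper name is mine). A sketch:

```python
import numpy as np
import nashpy as nash   # third-party bimatrix-game solver, not part of the talk

def nash_values(R1, R2):
    """Return (value1, value2) at some Nash equilibrium of the one-state game."""
    game = nash.Game(R1, R2)
    sigma1, sigma2 = next(game.support_enumeration())   # first equilibrium found
    return float(sigma1 @ R1 @ sigma2), float(sigma1 @ R2 @ sigma2)

# Prisoner's Dilemma: the only equilibrium is mutual defection, value (1, 1).
R1 = np.array([[3, 0], [5, 1]])
print(nash_values(R1, R1.T))   # -> (1.0, 1.0)
```
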
Two Special Cases
- Saddle-point equilibrium: deviation helps the other player; the value is the unique solution of the corresponding zero-sum game.
- Coordination equilibrium: both players get the maximum reward possible; the value is the unique maximum value.

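The coordination case is easy to test directly. A small sketch (the function name is mine): it returns the action pairs, if any, at which both players simultaneously achieve their global maxima.

```python
import numpy as np

def coordination_equilibria(R1, R2):
    """Action pairs where both players simultaneously get their maximum payoff."""
    return [(a1, a2)
            for a1 in range(R1.shape[0])
            for a2 in range(R1.shape[1])
            if R1[a1, a2] == R1.max() and R2[a1, a2] == R2.max()]
```
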
Tit-for-Tat
- The PD's single-step equilibrium is a saddle point, not a coordination equilibrium.
- Consider Tit-for-Tat: cooperate, but defect iff defected on.
- Mutual TFT (3) is better than defect-defect (1).

Tit-For-Tat is Nash
- PD payoffs: (C,C) = 3, (D,C) = 5, (C,D) = 0, (D,D) = 1.
- Cooperation (TFT) is a best response to TFT. Average payoffs of reactive strategies against TFT:
  - respond to C with C, to D with D (TFT itself): 3
  - respond to C with C, to D with C (always cooperate): 3
  - respond to C with D, to D with D (always defect): 1
  - respond to C with D, to D with C: 2.5

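These averages are easy to reproduce by simulation; a sketch (helper name mine) that plays each reactive strategy against TFT and reports the long-run average, matching the numbers above.

```python
import numpy as np

# PD payoffs from the slide: PAYOFF[my_action, their_action], 0 = C, 1 = D.
PAYOFF = np.array([[3, 0],
                   [5, 1]])

def average_vs_tft(response_to_c, response_to_d, rounds=1000):
    """Long-run average payoff of a reactive strategy against Tit-for-Tat.

    The strategy plays `response_to_c` after TFT cooperated and `response_to_d`
    after TFT defected; TFT starts by cooperating, then copies the last move."""
    tft_action, total = 0, 0
    for _ in range(rounds):
        my_action = response_to_c if tft_action == 0 else response_to_d
        total += PAYOFF[my_action, tft_action]
        tft_action = my_action          # TFT copies my last move
    return total / rounds

for rc, rd, name in [(0, 1, "TFT itself"), (0, 0, "always cooperate"),
                     (1, 1, "always defect"), (1, 0, "defect, then appease")]:
    print(name, average_vs_tft(rc, rd))   # -> 3.0, 3.0, ~1.0, 2.5
```
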
Generalized TFT
- TFT stabilizes a mutually beneficial outcome.
- General class of policies: play the beneficial action; punish deviation to suppress temptation.
- Need to generalize both components.

Security Level
- Values achievable without collaboration.
- Solution of a zero-sum game (a linear program).
- The opponent can force a player down to this value; the player can guarantee at least this value.
- Possibly stochastic.
- Useful as a punishment, but also as a threshold.
- PD: (1, 1).

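A sketch of the LP computation, assuming scipy; the function name and variable layout are mine. It maximizes the value the row player can guarantee against any column, and reproduces the PD security level of 1.

```python
import numpy as np
from scipy.optimize import linprog

def security_level(R):
    """Maximin value for the player whose actions index the rows of R.

    Solves max_x min_a2 sum_a1 x[a1] * R[a1, a2] over mixed strategies x."""
    m, n = R.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                               # variables: x (m weights) then v; maximize v
    A_ub = np.hstack([-R.T, np.ones((n, 1))])  # v - x . R[:, a2] <= 0 for every column a2
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                          # weights sum to 1
    bounds = [(0, None)] * m + [(None, None)]  # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[-1]

# PD with the slide's payoffs: security level 1 (by symmetry, for both players).
R1 = np.array([[3.0, 0.0], [5.0, 1.0]])
print(security_level(R1))   # -> 1.0
```
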
Two-Player Plot
[Figure: the four payoff points (C,C), (C,D), (D,C), (D,D) in the plane of the two players' payoffs.]
- Mark the payoff for each combination of actions.
- Mark the security level.

Dominating Pair
- Let (s1, s2) be the security-level values.
- Dominating pair of actions (a1, a2): c1 = R1(a1, a2), c2 = R2(a1, a2) with c1 > s1 and c2 > s2.
- ti is the temptation payoff (t1 = max_a R1(a, a2)).
- ni > (ti - ci) / (ci - si) punishments are sufficient to stabilize cooperation (folk theorem).

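A small sketch of that bound (the function name and the strict-inequality rounding are my own): given a dominating pair and the security levels, it computes each player's temptation and the minimum integer number of punishment rounds.

```python
import math
import numpy as np

def punishment_lengths(R1, R2, a1, a2, s1, s2):
    """Smallest integers n_i satisfying n_i > (t_i - c_i) / (c_i - s_i)."""
    c1, c2 = R1[a1, a2], R2[a1, a2]
    assert c1 > s1 and c2 > s2, "(a1, a2) must dominate the security levels"
    t1 = R1[:, a2].max()   # player 1's temptation: best deviation against a2
    t2 = R2[a1, :].max()   # player 2's temptation: best deviation against a1
    n1 = math.floor((t1 - c1) / (c1 - s1)) + 1
    n2 = math.floor((t2 - c2) / (c2 - s2)) + 1
    return n1, n2

# PD: c = 3, t = 5, s = 1, so n > 1, i.e. two punishment rounds per player.
R1 = np.array([[3, 0], [5, 1]])
print(punishment_lengths(R1, R1.T, 0, 0, 1, 1))   # -> (2, 2)
```
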
Alternation
- Play one joint action, then the other; repeat.
- If the security level lies below the convex hull, the alternation can be stabilized.

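For the simple case of alternating evenly between two payoff pairs, the check is just an average against the security levels; a sketch (names mine), illustrated with the PD from earlier slides.

```python
import numpy as np

def alternation_stabilizable(payoffs, security):
    """Does alternating evenly among the given joint payoffs beat both security levels?"""
    avg = np.mean(payoffs, axis=0)                       # long-run average under alternation
    return bool(np.all(avg > np.asarray(security))), avg

# PD: alternating (D,C) and (C,D) averages (2.5, 2.5), above the (1, 1) security levels.
print(alternation_stabilizable([(5, 0), (0, 5)], security=(1, 1)))
```
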
Algorithm
- Finds a Nash pair for a repeated matrix game in polynomial time.
- Idea: if a point of the convex hull dominates the security levels, use generalized TFT; else, a one-step Nash can be found quickly.

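A rough sketch of the top-level case split only, not the talk's actual algorithm: it searches single pairs and even two-pair alternations for a target dominating the security levels (a narrower test than the full convex-hull condition) and otherwise falls back to a one-step equilibrium, with both subroutines left abstract. All names are mine.

```python
from itertools import product
import numpy as np

def repeated_game_case_split(R1, R2, s1, s2):
    """Pick a target for generalized TFT if one dominates the security levels,
    otherwise report that a one-step Nash of the stage game should be used."""
    pairs = list(product(range(R1.shape[0]), range(R1.shape[1])))
    # Candidate targets: single pairs, then even alternations of two pairs.
    targets = [((p,), (R1[p], R2[p])) for p in pairs]
    targets += [((p, q), ((R1[p] + R1[q]) / 2, (R2[p] + R2[q]) / 2))
                for p in pairs for q in pairs]
    for cycle, (v1, v2) in targets:
        if v1 > s1 and v2 > s2:
            return "generalized TFT", cycle, (v1, v2)
    return "one-step Nash", None, None

# PD: mutual cooperation (3, 3) dominates the security levels (1, 1),
# so the generalized-TFT branch is selected with target cycle ((0, 0),).
R1 = np.array([[3, 0], [5, 1]])
print(repeated_game_case_split(R1, R1.T, 1, 1))
```
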
Proof Sketch
- Strategy pairs considered: (S1, S2), (S1*, S2), (S1, S2*).
- Try improving player 1's policy S1; if that's not possible, try improving player 2's policy S2.
- If neither can improve, the pair is a Nash.
- If player 2 can improve, the change won't hurt player 1, and player 1 still can't improve, so the result is a Nash.

Symmetric Case
- R1(a, a') = R2(a', a).
- The value of the game is just the maximum average!
- Alternate, or accept the security level.

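A sketch of that maximum-average computation (helper name mine): average the payoff table with its role-swapped transpose and take the best entry; alternating (a, a') and (a', a) achieves it.

```python
import numpy as np

def symmetric_max_average(R1):
    """Maximum average payoff in a symmetric game where R2(a', a) = R1(a, a')."""
    avg = (R1 + R1.T) / 2                     # average payoff when roles alternate
    a, a_prime = np.unravel_index(np.argmax(avg), avg.shape)
    return avg[a, a_prime], (int(a), int(a_prime))

# PD: the best average is 3, from repeated mutual cooperation (no alternation needed).
R1 = np.array([[3, 0], [5, 1]])
print(symmetric_max_average(R1))   # -> (3.0, (0, 0))
```
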
Symmetric Markov Game
- Episodic; roles chosen randomly.
- Algorithm: maximize the summed payoff (an MDP), compute the security level (a zero-sum game), and choose the max-sum solution if it is better.
- Converges to Nash.

Conclusion
- Threats can help (Littman & Stone 01).
- A repeated-game Nash can be found in polynomial time.
- Very simple structure for symmetric games.
- Applies to Markov games.

Discussion
- Objectives in game theory / RL for agents? Desiderata?
- How to learn the state space when the game is repeated?
- Multiobjective negotiation?
- Learning: combine leading and following?
- Different, unknown discount rates?
- Incomplete rationality?
- Incomplete information about rewards?