A Polynomial-time Nash Equilibrium Algorithm for Repeated Stochastic Games
Enrique Munoz de Cote and Michael L. Littman
Rutgers University
Polytime Nash for repeated stochastic games

Main Result
Given a repeated stochastic game, return, in polynomial time, a strategy profile that is a Nash equilibrium of the average-payoff repeated stochastic game (specifically, one whose payoffs match the egalitarian point). Concretely, we address the following computational problem:
[Figure: convex hull of the average payoffs in the (v1, v2) plane, with the egalitarian line]
Framework

                  Single agent       Multiple agents
Single state      decision theory    matrix games
Multiple states   planning (MDPs)    stochastic games
Stochastic Games (SGs)
A superset of MDPs and normal-form games:
- S is the set of states
- T is the transition function, mapping a state and joint action to a probability distribution over next states
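As a concrete encoding of the definition above (a sketch with illustrative names, not the paper's notation), a finite two-player SG can be represented by its states, a joint-action transition function, and per-player rewards:

```python
# Minimal container for a finite two-player stochastic game.
# All field names here are illustrative, not from the paper.
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
JointAction = Tuple[str, str]  # (action of player 1, action of player 2)

@dataclass
class StochasticGame:
    states: List[State]
    # T[s][a] is a distribution over next states: {s_next: probability}
    T: Dict[State, Dict[JointAction, Dict[State, float]]]
    # R[s][a] is the pair of immediate rewards (r1, r2)
    R: Dict[State, Dict[JointAction, Tuple[float, float]]]
    gamma: float = 0.95

# Degenerate one-state example: an SG with |S| = 1 is just a matrix game
chicken = StochasticGame(
    states=["s"],
    T={"s": {("D", "C"): {"s": 1.0}, ("C", "D"): {"s": 1.0},
             ("C", "C"): {"s": 1.0}, ("D", "D"): {"s": 1.0}}},
    R={"s": {("D", "C"): (7.0, 2.0), ("C", "D"): (2.0, 7.0),
             ("C", "C"): (6.0, 6.0), ("D", "D"): (0.0, 0.0)}},
)
print(sum(chicken.T["s"][("C", "C")].values()))  # -> 1.0 (probs sum to one)
```

With a single state the transition function is trivial, which is exactly why matrix games sit in the single-state cell of the framework table.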
A computational example: an SG version of Chicken [Hu & Wellman, 2003]
- Actions: U, D, R, L, X
- Coin flip on collision; semiwalls are passable with probability 50%
- Collision = -5; step cost = -1; goal = +100; discount factor = 0.95; both players can reach the goal.
[Figure: grid world with players A and B and goal $]
Strategies in the SG of Chicken (discount factor 0.95)
Average expected rewards of representative strategy profiles: (88.3, 43.7); (43.7, 88.3); (66, 66); (43.7, 43.7); (38.7, 38.7); (83.6, 83.6)
[Figure: grid world with players A, B and goal $]
Equilibrium values
Average total reward at equilibrium:
- Nash: (88.3, 43.7) and (43.7, 88.3), very imbalanced and inefficient; (53.6, 53.6), a half-half mix, still inefficient
- Correlated: ([43.7, 88.3], [43.7, 88.3])
- Minimax: (43.7, 43.7)
- Friend: (38.7, 38.7)
Nash equilibria are computationally difficult to find in general.
[Figure: grid world with players A, B and goal $]
Repeated Games
What if players are allowed to play multiple times?
- Many more equilibrium alternatives (folk theorems)
- Equilibrium strategies can depend on past interactions and can be randomized
- A Nash equilibrium still exists
[Figure: convex hull of the average payoffs in the (v1, v2) plane]
Nash equilibria of the repeated game (folk theorems)
For any vector of average payoffs that is
- strictly enforceable and
- feasible,
there exist equilibrium strategy profiles that achieve those payoffs.
- Mutual-advantage strategies: up and to the right of the disagreement point v = (v1, v2)
- Threats: attack strategies against deviations
[Figure: convex hull of the average payoffs with disagreement point v]
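The two folk-theorem conditions can be checked mechanically for a candidate payoff vector. The sketch below (helper names are hypothetical, not from the paper) tests strict enforceability against the disagreement point and feasibility via membership in the convex hull of achievable payoffs, using the payoff profiles from the Chicken slide:

```python
# Hypothetical helpers: check the folk-theorem conditions for a
# candidate average-payoff vector p in a two-player repeated game.
from itertools import combinations

def strictly_enforceable(p, v):
    # both players must strictly beat their disagreement (minimax) value
    return p[0] > v[0] and p[1] > v[1]

def in_triangle(p, a, b, c, tol=1e-9):
    # barycentric-coordinate test for a 2-D point-in-triangle query
    det = (b[1]-c[1])*(a[0]-c[0]) + (c[0]-b[0])*(a[1]-c[1])
    if abs(det) < tol:
        return False  # degenerate (collinear) triangle
    l1 = ((b[1]-c[1])*(p[0]-c[0]) + (c[0]-b[0])*(p[1]-c[1])) / det
    l2 = ((c[1]-a[1])*(p[0]-c[0]) + (a[0]-c[0])*(p[1]-c[1])) / det
    return min(l1, l2, 1 - l1 - l2) >= -tol

def feasible(p, vertices):
    # p is feasible if it lies in the convex hull of achievable average
    # payoffs; by Caratheodory in 2-D, some triangle of vertices contains p
    return any(in_triangle(p, a, b, c) for a, b, c in combinations(vertices, 3))

# Payoff profiles from the Chicken example on the earlier slide
verts = [(88.3, 43.7), (43.7, 88.3), (66.0, 66.0),
         (43.7, 43.7), (38.7, 38.7), (83.6, 83.6)]
v = (43.7, 43.7)  # minimax (disagreement) point from the slides
p = (66.0, 66.0)
print(strictly_enforceable(p, v), feasible(p, verts))  # -> True True
```

So (66, 66) is both feasible and strictly enforceable, hence achievable as a repeated-game equilibrium payoff by the folk theorems.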
Egalitarian equilibrium point
Conceptual drawback of the folk theorems: infinitely many feasible and enforceable strategies.
- Egalitarian point P: maximizes the minimum advantage of the players' rewards over the disagreement point v
- Egalitarian line: the line where payoffs are equally high above v
[Figure: convex hull of the average payoffs with disagreement point v, egalitarian line, and egalitarian point P]
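A minimal numeric sketch of the definition (hypothetical frontier segment and disagreement point, not taken from the Chicken game): on the egalitarian line the two advantages are equal, p1 - v1 = p2 - v2, so the egalitarian point on a frontier segment can be found by solving for that equality along the segment.

```python
# Sketch: egalitarian point on a segment [L, R] of the payoff frontier,
# given disagreement point v. Numbers below are hypothetical.

def egalitarian_on_segment(L, R, v):
    # parametrize p(t) = (1-t)*L + t*R and solve p1(t)-v1 = p2(t)-v2 for t
    num = (v[0] - v[1]) - (L[0] - L[1])
    den = (R[0] - R[1]) - (L[0] - L[1])
    t = num / den
    return ((1 - t) * L[0] + t * R[0], (1 - t) * L[1] + t * R[1])

# L favors player 2, R favors player 1 (hypothetical frontier endpoints)
L, R, v = (2.0, 10.0), (10.0, 2.0), (1.0, 1.0)
print(egalitarian_on_segment(L, R, v))  # -> (6.0, 6.0)
```

At (6.0, 6.0) both players are exactly 5.0 above their disagreement value, and no point on the segment gives both players more than 5.0, which is what "maximizes the minimum advantage" means.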
How? (the short-story version)
- Compute attack and defense strategies.
- Solve two linear-programming problems.
- Search for the point P with the highest egalitarian value.
[Figure: convex hull of a hypothetical SG with the egalitarian line and point P]
Game representation
The folk theorems can be interpreted computationally:
- Matrix form [Littman & Stone, 2005]
- Stochastic-game form [Munoz de Cote & Littman, 2008]
Define a weighted combination value σ_w. A strategy profile π that achieves σ_w(p_π) can be found by modeling an MDP.
Markov Decision Processes
We use an MDP to model the two players as a single meta-player.
Return: a joint strategy profile that maximizes a weighted combination of the players' payoffs.
- Friend solutions: (R0, π1) = MDP(1), (L0, π2) = MDP(0)
- A weighted solution: (P, π) = MDP(w)
[Figure: convex hull with disagreement point v and points R0, L0, P]
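The meta-player construction can be sketched as follows (a toy deterministic game with hypothetical states, actions, and rewards): the meta-player's action is the joint action, its reward is the scalarized w*r1 + (1-w)*r2, and standard value iteration then yields the joint policy for MDP(w).

```python
# Toy sketch of MDP(w): the meta-player controls the joint action and
# maximizes w*r1 + (1-w)*r2. All names and numbers are hypothetical.
GAMMA = 0.95

# state -> joint action -> (next_state, (r1, r2)); deterministic toy SG
T = {
    "s0": {("a", "a"): ("s0", (3.0, 1.0)),
           ("a", "b"): ("s1", (0.0, 0.0)),
           ("b", "a"): ("s1", (0.0, 0.0)),
           ("b", "b"): ("s0", (1.0, 3.0))},
    "s1": {("a", "a"): ("s0", (0.0, 0.0)),
           ("b", "b"): ("s1", (1.0, 1.0))},
}

def solve_mdp_w(w, iters=500):
    """Value iteration on the joint-action MDP with scalarized reward."""
    V = {s: 0.0 for s in T}
    for _ in range(iters):
        V = {s: max(w * r[0] + (1 - w) * r[1] + GAMMA * V[ns]
                    for (ns, r) in T[s].values())
             for s in T}
    # greedy joint policy with respect to the converged values
    pi = {s: max(T[s], key=lambda a: w * T[s][a][1][0]
                 + (1 - w) * T[s][a][1][1] + GAMMA * V[T[s][a][0]])
          for s in T}
    return V, pi

V, pi = solve_mdp_w(1.0)   # "friend" of player 1: maximize r1 alone
print(pi["s0"])            # -> ('a', 'a')
```

Calling `solve_mdp_w(1.0)` and `solve_mdp_w(0.0)` gives the two friend solutions R0 and L0 from the slide; an intermediate w gives a weighted solution P between them.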
The algorithm: FolkEgal(U1, U2, ε)
1. Compute attack1, attack2, defense1, defense2, and R = friend1, L = friend2.
2. Find the egalitarian point and its strategy profile:
   - If R is left of the egalitarian line: P = R
   - Else if L is right of the egalitarian line: P = L
   - Else: P = EgalSearch(R, L, T)
[Figure: convex hull of a hypothetical SG showing the cases P = R and P = L relative to the egalitarian line]
The key subroutine: EgalSearch(L, R, T)
Finds the intersection between the achievable-payoff set and the egalitarian line; close to a binary search.
Input:
- a point L to the left of the egalitarian line
- a point R to the right of the egalitarian line
- a bound T on the number of iterations
Return: the egalitarian point P (with accuracy ε)
Each iteration solves an MDP(w).
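The search can be sketched as a bisection over the weight w. In the sketch below, `mdp_w` is a hypothetical stand-in for solving the scalarized MDP: it simply returns the vertex of a toy payoff polygon maximizing w*p1 + (1-w)*p2. Bisection narrows to the two frontier vertices bracketing the egalitarian line, and the egalitarian point is then read off the connecting segment.

```python
# Bisection sketch of EgalSearch(L, R, T). All numbers are hypothetical.
VERTICES = [(10.0, 0.0), (9.0, 6.0), (6.0, 9.0), (0.0, 10.0)]
V_DIS = (1.0, 2.0)  # hypothetical disagreement point

def mdp_w(w):
    # stand-in for solving MDP(w): best vertex under the scalarized value
    return max(VERTICES, key=lambda p: w * p[0] + (1 - w) * p[1])

def side(p):
    # > 0: right of the egalitarian line (player 1's advantage is larger)
    return (p[0] - V_DIS[0]) - (p[1] - V_DIS[1])

def egal_search(T=50):
    lo, hi = 0.0, 1.0          # w = 0 favors player 2, w = 1 favors player 1
    Lp, Rp = mdp_w(lo), mdp_w(hi)
    for _ in range(T):
        mid = (lo + hi) / 2
        p = mdp_w(mid)
        if side(p) > 0:        # still right of the line: shrink from above
            hi, Rp = mid, p
        else:
            lo, Lp = mid, p
    # egalitarian point lies on segment [Lp, Rp]; intersect with the line
    num = (V_DIS[0] - V_DIS[1]) - (Lp[0] - Lp[1])
    den = (Rp[0] - Rp[1]) - (Lp[0] - Lp[1])
    t = num / den
    return ((1 - t) * Lp[0] + t * Rp[0], (1 - t) * Lp[1] + t * Rp[1])

print(egal_search())  # -> approximately (7.0, 8.0)
```

Here the search settles on the segment between (6, 9) and (9, 6), and the returned point (7, 8) gives both players an equal advantage of 6 over the disagreement point, matching the egalitarian-line condition.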
Complexity
- The disagreement point is computed to accuracy ε in time polynomial in 1 / (1 - γ), 1 / ε, and Umax.
- MDPs are solved in polynomial time [Puterman, 1994].
- The algorithm is polynomial iff T is bounded by a polynomial.
Result: the running time is polynomial in the discount-factor term 1 / (1 - γ), the approximation factor 1 / ε, and the magnitude of the largest utility Umax.
SG version of the Prisoner's Dilemma
[Figure: grid world with players A, B and goals $A, $B]

Algorithm     Agent A   Agent B   Observed behavior
security-VI   46.5      46.5      mutual defection
friend-VI     46        46        mutual defection
CE-VI         46.5      46.5      mutual defection
folkEgal      88.8      88.8      mutual cooperation with threat of defection
Compromise game
[Figure: grid world with players A, B and goals $A, $B]

Algorithm     Agent A   Agent B   Observed behavior
security-VI   0         0         attacker blocking goal
friend-VI     -20       -20       mutual defection
CE-VI         68.2      70.1      suboptimal waiting strategy
folkEgal      78.7      78.7      mutual cooperation (w = 0.5) with threat of defection
Asymmetric game
[Figure: grid world with players A, B, goal $B and two goals $A]

Algorithm     Agent A   Agent B   Observed behavior
security-VI   0         0         attacker blocking goal
friend-VI     -20       0         mutual defection
CE-VI         32.1      32.1      suboptimal mutual cooperation
folkEgal      37.2      37.2      mutual cooperation with threat of defection
Thanks for your attention!