A Polynomial-time Nash Equilibrium Algorithm for Repeated Stochastic Games. Enrique Munoz de Cote, Michael L. Littman. Rutgers University.

Presentation transcript:

Rutgers University. A Polynomial-time Nash Equilibrium Algorithm for Repeated Stochastic Games. Enrique Munoz de Cote, Michael L. Littman.

Polytime Nash for repeated stochastic games 2 Main Result. Concretely, we address the following computational problem: given a repeated stochastic game, return, in polynomial time, a strategy profile that is a Nash equilibrium of the average-payoff repeated stochastic game (specifically, one whose payoffs match the egalitarian point). [Figure: convex hull of the average payoffs in the (v1, v2) plane, with the egalitarian line.]

Polytime Nash for repeated stochastic games 3 Framework. Single agent, single state: decision theory, planning. Single agent, multiple states: MDPs. Multiple agents, single state: matrix games. Multiple agents, multiple states: stochastic games.

Polytime Nash for repeated stochastic games 4 Stochastic Games (SG). A superset of MDPs and normal-form games (NFGs). S is the set of states; T is the transition function, T : S × A1 × A2 → Δ(S), such that Σ_{s'} T(s' | s, a1, a2) = 1 for every state and joint action. [backgrounds]
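For concreteness, a minimal two-player stochastic game container might look like the sketch below. The field names (n_states, transition, reward1, reward2, discount) are illustrative assumptions, not the paper's notation beyond S and T.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

State = int
JointAction = Tuple[int, int]  # (action of player 1, action of player 2)

@dataclass
class StochasticGame:
    """Two-player stochastic game (S, A1, A2, T, R1, R2) -- hypothetical layout."""
    n_states: int
    n_actions: Tuple[int, int]                                   # |A1|, |A2|
    # transition[s][(a1, a2)] is a distribution over next states: {s': prob}
    transition: Dict[State, Dict[JointAction, Dict[State, float]]]
    # reward_i[s][(a1, a2)] is player i's immediate reward
    reward1: Dict[State, Dict[JointAction, float]]
    reward2: Dict[State, Dict[JointAction, float]]
    discount: float = 0.95

    def check_transitions(self) -> bool:
        """Each T(. | s, a1, a2) must sum to 1."""
        return all(
            abs(sum(dist.values()) - 1.0) < 1e-9
            for joint in self.transition.values()
            for dist in joint.values()
        )
```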

Polytime Nash for repeated stochastic games A Computational Example: SG version of chicken [Hu & Wellman, 03]. Actions: U, D, R, L, X; coin flip on collision; semiwalls (50%); collision = -5; step cost = -1; goal = +100; discount factor = 0.95; both agents can reach the goal. [Grid figure with agents A and B and the goal $.] [backgrounds]

Polytime Nash for repeated stochastic games Strategies on the SG of chicken. [Grid figure with agents A and B and the goal $.] Average expected reward of candidate strategy profiles (discount factor 0.95): (88.3, 43.7); (43.7, 88.3); (66, 66); (43.7, 43.7); (38.7, 38.7); (83.6, 83.6). [backgrounds]

Polytime Nash for repeated stochastic games Equilibrium values. [Grid figure with agents A and B and the goal $.] Average total reward in equilibrium: Nash: (88.3, 43.7) very imbalanced, inefficient; (43.7, 88.3) very imbalanced, inefficient; (53.6, 53.6) ½ mix, still inefficient. Correlated: ([43.7, 88.3], [43.7, 88.3]). Minimax: (43.7, 43.7). Friend: (38.7, 38.7). Nash equilibria are computationally difficult to find in general. [backgrounds]

Polytime Nash for repeated stochastic games Repeated Games. What if players are allowed to play multiple times? There are many more equilibrium alternatives (folk theorems). Equilibrium strategies can depend on past interactions and can be randomized. A Nash equilibrium still exists. [Figure: convex hull of the average payoffs in the (v1, v2) plane.] [backgrounds]

Polytime Nash for repeated stochastic games Nash equilibrium of the repeated game [folk theorems]. For any vector of average payoffs that is strictly enforceable and feasible, there exist equilibrium strategy profiles that achieve those payoffs. Mutual-advantage strategies: payoffs up and to the right of the disagreement point v = (v1, v2). Threats: attack strategies punish deviations (see the sketch below). [Figure: convex hull in the (v1, v2) plane with the disagreement point v.] [backgrounds]
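As a tiny illustration of the folk-theorem conditions, the hedged sketch below checks whether a candidate average-payoff pair is strictly enforceable with respect to the disagreement point v, i.e., strictly up and to the right of it; feasibility (membership in the convex hull) is assumed to be checked separately.

```python
def strictly_enforceable(payoff, disagreement):
    """True if each player strictly prefers `payoff` to the disagreement point.

    payoff, disagreement: pairs (p1, p2) and (v1, v2) of average payoffs.
    """
    p1, p2 = payoff
    v1, v2 = disagreement
    return p1 > v1 and p2 > v2
```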

Polytime Nash for repeated stochastic games Egalitarian equilibrium point. Conceptual drawback of the folk theorems: infinitely many feasible and enforceable strategies. Egalitarian point P: maximizes the minimum advantage of the players' rewards over the disagreement point v. Egalitarian line: the line where both payoffs are equally high above v. [Figure: convex hull of the average payoffs in the (v1, v2) plane, with disagreement point v, egalitarian line, and egalitarian point P.] [backgrounds]
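A hedged sketch of the definition: among candidate feasible payoff pairs, the egalitarian point maximizes the minimum advantage over the disagreement point v. Restricting attention to a finite candidate set is a simplification for illustration; the actual algorithm searches the convex hull.

```python
def egalitarian_value(payoff, disagreement):
    """Minimum advantage of the two players over the disagreement point."""
    return min(payoff[0] - disagreement[0], payoff[1] - disagreement[1])

def best_egalitarian(candidates, disagreement):
    """Pick the candidate payoff pair with the highest egalitarian value."""
    return max(candidates, key=lambda p: egalitarian_value(p, disagreement))

# With the chicken-game values from the earlier slide and v = (43.7, 43.7),
# the balanced (83.6, 83.6) wins over the imbalanced (88.3, 43.7).
```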

Polytime Nash for repeated stochastic games 11 How? (the short-story version). Compute attack and defense strategies by solving two linear programming problems. The algorithm then searches for the point P on the egalitarian line, where P is the feasible point with the highest egalitarian value. [Figure: convex hull of a hypothetical SG in the (v1, v2) plane, with the egalitarian line and point P.] [Repeated SG Nash algorithm → result]
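The attack and defense strategies come from zero-sum (minimax) computations, which reduce to linear programs. Below is a hedged sketch, using scipy.optimize.linprog, of the maximin strategy for a single zero-sum matrix game; the reduction from the stochastic game to matrix games (e.g., a Shapley-style state-by-state computation) is omitted here.

```python
import numpy as np
from scipy.optimize import linprog

def maximin_strategy(M):
    """Maximin (defense) mixed strategy for the row player of a zero-sum matrix game.

    M[i, j] is the row player's payoff when the row player plays i and the
    column player plays j. Returns (mixed strategy, game value).
    """
    M = np.asarray(M, dtype=float)
    n_rows, n_cols = M.shape
    # Variables: x_1..x_n (mixed strategy), v (game value). Minimize -v.
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0
    # For every opponent column j:  v - sum_i x_i * M[i, j] <= 0
    A_ub = np.hstack([-M.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    # Probabilities sum to 1 (v unconstrained in this row).
    A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_rows], res.x[-1]
```

The attack strategy against a player can be obtained from the same routine applied to the negated, transposed matrix: maximin_strategy(-M.T) gives the opponent's mixed strategy that minimizes the first player's payoff.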

Polytime Nash for repeated stochastic games Game representation. The folk theorems can be interpreted computationally: matrix form [Littman & Stone, 2005]; stochastic game form [Munoz de Cote & Littman, 2008]. Define a weighted combination value σ_w(p) = w·p1 + (1 − w)·p2, a convex combination of the two players' payoffs. A strategy profile π that achieves σ_w(p_π) can be found by modeling an MDP.

Polytime Nash for repeated stochastic games Markov Decision Processes. We use MDPs to model the two players as a single meta-player; solving the MDP returns a joint strategy profile that maximizes a weighted combination of the players' payoffs. Friend solutions: (R0, π1) = MDP(1), (L0, π2) = MDP(0). A weighted solution: (P, π) = MDP(w). [Figure: (v1, v2) plane with disagreement point v and points R0, L0, P.]
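One way to picture MDP(w) is as a single-agent MDP over joint actions whose scalar reward is w·r1 + (1 − w)·r2. The value-iteration sketch below, reusing the hypothetical StochasticGame container from earlier, is an illustration of that idea rather than the paper's exact construction.

```python
def solve_weighted_mdp(game, w, n_iters=1000, tol=1e-6):
    """Value iteration for the meta-player MDP(w).

    The two players are merged into one controller choosing joint actions
    (a1, a2); the scalar reward is w * R1 + (1 - w) * R2.
    Returns (state values, greedy joint policy).
    """
    def backup(s, a, V):
        return (w * game.reward1[s][a] + (1 - w) * game.reward2[s][a]
                + game.discount * sum(p * V[s2]
                                      for s2, p in game.transition[s][a].items()))

    V = {s: 0.0 for s in range(game.n_states)}
    for _ in range(n_iters):
        delta = 0.0
        for s in range(game.n_states):
            best = max(backup(s, a, V) for a in game.transition[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(game.transition[s], key=lambda a: backup(s, a, V))
              for s in range(game.n_states)}
    return V, policy
```

With w = 1 this recovers the "friend of player 1" solution R0, and with w = 0 the "friend of player 2" solution L0.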

Polytime Nash for repeated stochastic games 14 The algorithm FolkEgal(U1, U2, ε). Compute attack1, attack2, defense1, defense2, and the friend points R = friend1, L = friend2. Then find the egalitarian point and its strategy profile: if R is left of the egalitarian line, P = R; else if L is right of the egalitarian line, P = L; else P = egalSearch(R, L, T). [Figure: convex hull of a hypothetical SG with the egalitarian line, illustrating the three cases P = R, P = L, and the search between R and L.] [Repeated SG Nash algorithm → result]
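A hedged sketch of the branching logic on this slide, once the friend points and the disagreement point are known. The egal_search argument is a placeholder callable standing in for the EgalSearch subroutine on the next slide.

```python
def folk_egal_select(R_point, L_point, disagreement, egal_search):
    """Case analysis of FolkEgal after the friend points are computed.

    R_point: payoff pair of MDP(1) (friend of player 1).
    L_point: payoff pair of MDP(0) (friend of player 2).
    disagreement: (v1, v2) from the attack/defense computation.
    egal_search: callable implementing EgalSearch(L, R, T) (placeholder).
    """
    def advantage(p):
        # Positive when p lies right of the egalitarian line
        # (player 1's advantage over v exceeds player 2's).
        return (p[0] - disagreement[0]) - (p[1] - disagreement[1])

    if advantage(R_point) <= 0:       # R already left of the egalitarian line
        return R_point
    if advantage(L_point) >= 0:       # L already right of the egalitarian line
        return L_point
    return egal_search(L_point, R_point)
```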

Polytime Nash for repeated stochastic games The key subroutine EgalSearch(L, R, T). Finds the intersection between X (the convex hull of feasible payoffs) and the egalitarian line; the procedure is close to a binary search. Input: a point L (to the left of the egalitarian line), a point R (to the right of the egalitarian line), and a bound T on the number of iterations. Return: the egalitarian point P (with accuracy ε). Each iteration solves an MDP(w) for an intermediate weight w.
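A hedged sketch of the binary-search idea: each iteration solves MDP(w) for a midpoint weight and keeps the half of the weight interval whose solutions still bracket the egalitarian line. The solve_mdp_point callable is a placeholder for the MDP(w) solver returning the resulting payoff pair; the paper's actual EgalSearch works on the hull points L and R directly.

```python
def egal_search(solve_mdp_point, disagreement, T, w_lo=0.0, w_hi=1.0):
    """Binary search over the weight w for the egalitarian point.

    solve_mdp_point(w) -> (p1, p2): average payoffs of the MDP(w) solution.
    Invariant: the w_lo solution lies left of the egalitarian line and the
    w_hi solution lies right of it; T bounds the number of iterations.
    """
    def advantage_gap(p):
        return (p[0] - disagreement[0]) - (p[1] - disagreement[1])

    best = None
    for _ in range(T):
        w_mid = 0.5 * (w_lo + w_hi)
        p = solve_mdp_point(w_mid)
        best = p
        if abs(advantage_gap(p)) < 1e-9:   # (nearly) on the egalitarian line
            break
        if advantage_gap(p) > 0:           # right of the line: shrink from above
            w_hi = w_mid
        else:                              # left of the line: shrink from below
            w_lo = w_mid
    return best
```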

Polytime Nash for repeated stochastic games 16 Complexity. Disagreement point (to accuracy ε): computed in time polynomial in 1/(1 − γ), 1/ε, and U_max. MDPs are solved in polynomial time [Puterman, 1994]. The algorithm is polynomial iff T is bounded by a polynomial. Result: the running time is polynomial in the discount factor 1/(1 − γ), the approximation factor 1/ε, and the magnitude of the largest utility U_max. [Repeated SG Nash algorithm → result]

Polytime Nash for repeated stochastic games 17 SG version of the PD game. [Grid figure with agents A and B and their goals $A, $B.] Results (Agent A / Agent B): security-VI: 46.5 (mutual defection); friend-VI: 46 (mutual defection); CE-VI: 46.5 (mutual defection); folkEgal: 88.8 (mutual cooperation with threat of defection). [experiments]

Polytime Nash for repeated stochastic games 18 Compromise game. [Grid figure with agents A and B and their goals $A, $B.] Results (Agent A / Agent B): security-VI: 0 / 0 (attacker blocking goal); friend-VI: -20 (mutual defection); CE-VI: suboptimal waiting strategy; folkEgal: 78.7 (mutual cooperation, w = 0.5, with threat of defection). [experiments]

Polytime Nash for repeated stochastic games 19 Asymmetric game. [Grid figure with agents A and B and their goals $A, $B.] Results (Agent A / Agent B): security-VI: 0 / 0 (attacker blocking goal); friend-VI: -20 / 0 (mutual defection); CE-VI: 32.1 (suboptimal mutual cooperation); folkEgal: 37.2 (mutual cooperation with threat of defection). [experiments]

Rutgers University. Thanks for your attention!