Constraints in Repeated Games


Constraints in Repeated Games

Rational Learning Leads to Nash Equilibrium …so what is rational learning? Kalai & Lehrer, 1993

What is Rational Learning?
Rational learning is Bayesian updating. First, a refresher: frequentist vs. Bayesian statistics.

Frequentist vs. Bayesian Statistics

Frequentist Approach
Suppose a coin is flipped 10 times and comes up heads 8 times. A frequentist would conclude that the coin comes up heads 80% of the time, using the relative frequency as the probability estimate. This is the maximum likelihood estimate (MLE): for θ_m the model asserting P(heads) = m, and s the observed sequence, the MLE is arg max_m P(s | θ_m). The frequentist MLE is not accurate in all contexts.
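
A minimal sketch of this estimate in Python (illustrative only; the grid search and function names are my own, not from the slides):

```python
# Frequentist MLE for a coin by brute-force grid search.
def likelihood(m: float, heads: int, flips: int) -> float:
    """P(s | theta_m) for a sequence with `heads` heads out of `flips` flips."""
    return m**heads * (1 - m)**(flips - heads)

grid = [i / 1000 for i in range(1001)]
mle = max(grid, key=lambda m: likelihood(m, heads=8, flips=10))
print(mle)  # 0.8 -- the relative frequency 8/10 maximizes the likelihood
```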

Bayesian Approach
Allows us to incorporate prior beliefs, e.g., that our coin is fair (why not?). We measure degrees of belief, which are updated in the face of evidence using Bayes' theorem: P(θ_m | s) = P(s | θ_m) P(θ_m) / P(s). We already have P(s | θ_m); we can quantify the prior P(θ_m) and ignore the normalization factor P(s). For the prior P(θ_m) = 6m(1 - m), arg max_m P(θ_m | s) = 0.75.
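
The same data under the slide's 6m(1 - m) prior (a Beta(2, 2) density), again as a hedged sketch; the grid search is my own device:

```python
# Bayesian MAP estimate for 8 heads in 10 flips with prior P(theta_m) = 6m(1-m).
def posterior_unnorm(m: float, heads: int = 8, flips: int = 10) -> float:
    prior = 6 * m * (1 - m)                           # Beta(2, 2) density
    likelihood = m**heads * (1 - m)**(flips - heads)  # P(s | theta_m)
    return prior * likelihood                         # P(s) ignored: constant in m

grid = [i / 1000 for i in range(1001)]
map_estimate = max(grid, key=posterior_unnorm)
print(map_estimate)  # 0.75 -- the prior pulls the estimate from 0.8 toward 0.5
```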

Under What Conditions?
- Infinitely repeated game
- Subjective beliefs about others are compatible with the true strategies
- Players know their own payoff matrices
- Players choose strategies to maximize their expected utility
- Perfect monitoring
- Discounted payoffs
…then the players must eventually play according to a Nash equilibrium of the repeated game.

What Isn’t Needed
- Assumptions about the rationality of other players
- Knowledge of the payoff matrices of other players

Definitions
A game is perfectly monitored if all players have access to the complete history of the game up to the current point.
Discounting introduces a factor λ_i that future payoffs are multiplied by: u_i(f) = (1 - λ_i) Σ_{t=0..∞} λ_i^t E_f(x_i^{t+1}). Note the relation to geometric series: since Σ_{t=0..∞} λ_i^t = 1/(1 - λ_i), the leading factor (1 - λ_i) normalizes the sum.
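
A quick numeric check of that normalization (my own illustration, not from the paper): for a constant payoff stream c, the normalized discounted sum returns c itself.

```python
# Normalized discounted utility of a payoff stream, truncated at T terms.
def discounted_utility(payoffs, lam: float) -> float:
    return (1 - lam) * sum(lam**t * x for t, x in enumerate(payoffs))

T, lam, c = 10_000, 0.9, 2.0
print(discounted_utility([c] * T, lam))  # ~2.0, independent of lam
```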

…continued
Beliefs are compatible with the true strategies if the distribution over infinite play paths induced by the beliefs is absolutely continuous with respect to the distribution induced by the true strategies.
A measure μ_f is absolutely continuous with respect to μ_g (denoted μ_f << μ_g) if every event having positive measure according to μ_f also has positive measure according to μ_g.

More Definitions
Let ε > 0 and let μ and μ′ be two probability measures defined on the same space. μ is ε-close to μ′ if there is a measurable set Q satisfying:
- μ(Q) and μ′(Q) are greater than 1 - ε
- for every measurable set A ⊆ Q: (1 - ε) μ′(A) <= μ(A) <= (1 + ε) μ′(A)
For ε >= 0, f plays ε-like g if μ_f is ε-close to μ_g.
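
On a finite outcome space the definition can be checked directly. This sketch is my own construction (the paper works with general measure spaces): it takes Q to be the largest set where the pointwise ratio bounds hold, which suffices because summing the pointwise bounds gives the bound for every A ⊆ Q.

```python
# epsilon-closeness of two finitely supported probability measures (dicts).
def eps_close(mu: dict, mu2: dict, eps: float) -> bool:
    Q = {w for w in mu
         if (1 - eps) * mu2.get(w, 0) <= mu[w] <= (1 + eps) * mu2.get(w, 0)}
    return (sum(mu[w] for w in Q) > 1 - eps
            and sum(mu2.get(w, 0) for w in Q) > 1 - eps)

mu  = {"a": 0.50, "b": 0.49, "c": 0.01}
mu2 = {"a": 0.51, "b": 0.48, "c": 0.01}
print(eps_close(mu, mu2, eps=0.1))  # True: the measures agree to within 10%
```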

Optimality and Domination in Repeated Games Fortnow & Whang …so what are optimality and domination?

Two Types of Infinite Repeated Games
History (h) → strategy (σ) → action (a) → … → payoff (u)
Limit-of-means game (G∞): π_i^{G∞}(σ_I, σ_II) = lim inf_{k→∞} (1/k) Σ_{j=1..k} u_i(a_I^j(σ_I, σ_II), a_II^j(σ_I, σ_II))
Discounted game (G_δ) with discount 0 < δ < 1: π_i^{G_δ}(σ_I, σ_II) = (1 - δ) Σ_{j=1..∞} δ^{j-1} u_i(a_I^j(σ_I, σ_II), a_II^j(σ_I, σ_II))
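
To make the two criteria concrete, here is a finite-horizon approximation (my own illustration) of tit-for-tat against always-defect, using the prisoner's dilemma payoffs from the example slide below:

```python
# Approximate limit-of-means and discounted payoffs for player I.
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 4),
      ("D", "C"): (4, 0), ("D", "D"): (1, 1)}

def play(rounds: int):
    history, a1 = [], "C"            # tit-for-tat opens with C
    for _ in range(rounds):
        a2 = "D"                     # always-defect
        history.append(PD[(a1, a2)])
        a1 = a2                      # tit-for-tat copies the last opponent move
    return history

payoffs = play(10_000)
avg = sum(p[0] for p in payoffs) / len(payoffs)                  # limit of means
delta = 0.9
disc = (1 - delta) * sum(delta**j * p[0] for j, p in enumerate(payoffs))
print(avg, disc)  # ~1.0 and 0.9: discounting weighs the early exploitation more
```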

Definitions
Optimality: their way of saying Nash equilibrium. For every alternative strategy σ_I′ of player I:
- in G∞: π_i^{G∞}(σ_I, σ_II) - π_i^{G∞}(σ_I′, σ_II) >= 0
- in G_δ: lim inf_{δ→1-} (π_i^{G_δ}(σ_I, σ_II) - π_i^{G_δ}(σ_I′, σ_II)) >= 0
Domination: best response not against just one opposing strategy, but against all; for every choice of strategy σ_II, the strategy σ_I is optimal.

Example 2, 2 0, 0 0, 0 1, 1 3, 30, 4 4, 01, 1 Mozart or Mahler has two optimal strategies, and prisoners’ dilemma has one Mozart or Mahler has no dominant strategies and prisoners’ dilemma has one

Classes of Strategies
- All possible strategies (rational): uncountably many
- Strategies implemented by a Turing machine that always halts (recursive)
- Strategies implemented by a Turing machine that, between rounds r and r + 1, halts in time polynomial in r (polynomial)
- Strategies implemented by a finite state automaton (regular)
We can also allow behavioral versions of these strategies.

Bounding the Number of Rounds
We want our payoff functions to have a reasonable rate of convergence to the final payoff of the game. With the average payoff function π_i^k(σ_I, σ_II) = (1/k) Σ_{j=1..k} u_i(a_I^j(σ_I, σ_II), a_II^j(σ_I, σ_II)), we have π_i^{G∞}(σ_I, σ_II) = lim inf_{k→∞} π_i^k(σ_I, σ_II).

…continued
We know that for all ε > 0 there is a round t such that for all k >= t: π_i^k(σ_I, σ_II) >= π_i^{G∞}(σ_I, σ_II) - ε. Our bound will be to require that t be a function of ε and the size s(σ_II) of player II's strategy (the number of states of its FSA). For discounted games, we use δ to bound the number of rounds: σ_I converges in t rounds if, for ε > 0 and δ > 2^{-1/t(s(σ_II)/ε)}, we have π_i^{G_δ}(σ_I, σ_II) - π_i^{G_δ}(σ_I′, σ_II) >= -ε.

Previous Work
Gilboa & Samet (1989) showed that if player II is limited to strategies realized by strongly connected FSAs, then there exists a recursive dominant strategy. Strong connectivity is needed to protect against vengeful strategies, i.e., strategies that penalize the opponent forever for some earlier choice made in the game. To extend this result to arbitrary finite automata we must weaken our notion of domination: an eventually dominant strategy is one that only requires domination of strategies that agree with it for some initial finite number of rounds.

Extension of Previous Work
For any game G, there is a recursive strategy σ_I which is eventually dominant for the class of rational strategies against the class of strategies realized by finite automata. The following results aim to show how well strategies of differing complexities perform against one another in certain cases. In the next paper, we will see what happens when both strategies are restricted to the same complexity class.

Prisoner’s Dilemma vs. Matching Pennies
Prisoner's dilemma: max(u_I(a_1, A_1), u_I(a_2, A_1)) != max(u_I(a_1, A_2), u_I(a_2, A_2))
Matching pennies: max(u_I(a_1, A_1), u_I(a_2, A_1)) = max(u_I(a_1, A_2), u_I(a_2, A_2))
That is, in the prisoner's dilemma the value of player I's best response depends on player II's action; in matching pennies it does not.
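
A numeric check of this distinction (my own illustration), using the example payoffs above:

```python
# Best-response values for player I, conditioned on player II's action.
u_pd = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}    # PD
u_mp = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}  # MP

print(max(u_pd[a, "C"] for a in "CD"), max(u_pd[a, "D"] for a in "CD"))  # 4 1: unequal
print(max(u_mp[a, "H"] for a in "HT"), max(u_mp[a, "T"] for a in "HT"))  # 1 1: equal
```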

More Results
Consider the prisoner's dilemma for any fixed 0 <= ε < 1 and n. There is some rational strategy σ_II of player II, implemented by an FSA with n states, such that any ε-optimal strategy σ_I against σ_II will require an exponential number of rounds to converge.
For matching pennies, there exists a polynomial-time strategy that dominates all finite automata and converges in a polynomial number of rounds.
There exists a behavioral regular strategy for which there is no optimal rational strategy, even for matching pennies.

…continued
For the prisoner's dilemma, there is a polynomial-time strategy σ_II for which there is no eventually optimal rational strategy σ_I.
For the prisoner's dilemma, there is some polynomial-time strategy σ_II such that there is an optimal rational strategy, but for all 0 <= ε < 1 there is no eventually ε-optimal recursive strategy for the class of rational strategies.
For matching pennies, there is a recursive strategy σ_I which is dominant for the class of rational strategies against the class of polynomial-time strategies.

Further Questions
- Does there exist a behavioral strategy for which there is no eventually optimal rational strategy?
- For some ε > 0, does there exist a behavioral regular strategy for which there is no eventually ε-optimal recursive or polynomial-time strategy?
- Does there exist a polynomial-time or recursive strategy that eventually dominates all behavioral regular strategies?
- Incomplete information? Finite games? Infinite non-repeated stage games?

On Bounded Rationality and Computational Complexity Papadimitriou & Yannakakis …so what is bounded rationality?

Bounded Rationality
From Simon: reasoning and computation are costly, so agents don't invest inordinate amounts of computational resources and reasoning power to achieve relatively insignificant gains in their payoff. We can implement bounded rationality by restricting the computational complexity of a strategy. But why would we want to?

Motivation
- Leads to a more accurate model of the world
- Has interesting game-theoretic consequences
- Increased elegance (no needlessly complicated strategies)
- Leads to more and better cooperation, and therefore higher payoffs

Example
Consider the prisoner's dilemma repeated n times for n > 1. The only Nash equilibrium of this game is (D^n, D^n): both players play the stage-game equilibrium (D, D) in the last round, and backward induction extends this to all earlier rounds. Shouldn't we be able to do better than this?

...continued
For the n-times-repeated prisoner's dilemma, the strategy space is doubly exponential in n. This is not realistic for even small n. Our undesirable result (no cooperation) arises when we place no constraints on the complexity (i.e., the number of states in an FSA) of the strategies. What happens when we constrain the complexity?

Good Things
If we require that the state bounds s_I(n) and s_II(n) are less than n - 1, then the FSAs can't count to n and backward induction fails; in this case, tit-for-tat is a Nash equilibrium (see the sketch below).
Neyman: for s_I(n) and s_II(n) between n^{1/k} and n^k for k > 1, there is an equilibrium that approximates cooperation (payoff 3 - 1/k).
If s_I(n), s_II(n) >= 2^n, then backward induction is possible via dynamic programming.
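
The sketch referenced above: tit-for-tat as a two-state automaton (my own encoding; state names double as the actions played). With no spare states to count rounds, the machine cannot trigger an end-game defection, which is exactly why the backward-induction argument fails under these state bounds.

```python
# Tit-for-tat: transition on the opponent's last move; play the state's name.
TFT = {
    "start": "C",
    ("C", "C"): "C", ("C", "D"): "D",
    ("D", "C"): "C", ("D", "D"): "D",
}

def run(fsa, opponent_moves):
    state, actions = fsa["start"], []
    for opp in opponent_moves:
        actions.append(state)        # play the action named by the current state
        state = fsa[(state, opp)]    # transition on the opponent's move
    return actions

print(run(TFT, ["C", "D", "D", "C"]))  # ['C', 'C', 'D', 'D']
```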

Theorem 1
For all subexponential complexities there are equilibria that are arbitrarily close to collaborative behavior: if at least one of the state bounds in the n-round prisoner's dilemma is 2^{O(εn)}, then for large enough n there is a mixed equilibrium with average payoff for each player of at least 3 - ε. This can be extended to arbitrary games and payoffs by making ε a function of these new parameters.

The Idea
The number of histories is exponential in the length of the game. Memory states can be filled up with small histories (to use up space); the remaining states are then few enough that the automata can't count too high and use backward induction to always defect, so cooperation is enabled.

Some Details
Players exchange short customized sequences of Cs and Ds ("business cards"), then periodically replay the XOR of these sequences, interleaved with long periods of cooperation (a toy version is sketched below). Two complications:
- The advantage held by players with D-heavy business cards must be cancelled; the imbalance in the periodic repetitions is solved with the XOR, so both players get the same payoff as each other.
- The possibility of saving states by misusing punitive transitions, detectable only through dishonesty, must be eliminated.
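
A heavily simplified toy version of the exchange (my own; the paper's construction is far more involved, and the differ-means-D convention here is an assumption):

```python
# XOR of two "business cards" over {C, D}: differ -> D, agree -> C.
# Both players replay this common sequence, so any payoff advantage from a
# D-heavy card is shared symmetrically.
def xor_cards(card1: str, card2: str) -> str:
    return "".join("D" if a != b else "C" for a, b in zip(card1, card2))

print(xor_cards("CCDDC", "CDCDC"))  # 'CDDCC': the sequence both players replay
```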

General Games (definitions)
The minimax value of player I is v_1 = min_{y∈Y} max_{x∈X} g_I(x, y). Player I can always guarantee this much payoff, assuming that player II uses a strategy known to player I. v = (v_1, v_2) is called the threat point ((1, 1) in the prisoner's dilemma). The feasible region is the convex hull of the payoff combinations. The individually rational region is the part of the feasible region that dominates the corresponding threat point.
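
For the prisoner's dilemma payoffs used throughout, the threat point can be computed directly. A sketch restricted to pure strategies (the general definition ranges over mixed strategies as well):

```python
# Pure-strategy minimax values for the prisoner's dilemma.
g1 = [[3, 0], [4, 1]]   # player I's payoffs; rows are I's actions (C, D)
g2 = [[3, 4], [0, 1]]   # player II's payoffs

v1 = min(max(g1[x][y] for x in range(2)) for y in range(2))  # min over II's action y
v2 = min(max(g2[x][y] for y in range(2)) for x in range(2))  # min over I's action x
print((v1, v2))  # (1, 1): the threat point named on the slide
```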

General Games (theorems)
The Folk Theorem: in the infinitely repeated game, all points in the mixed individually rational region are equilibria.
The Folk Theorem for Automata: let (a, b) be a payoff combination in the infinitely repeated game with automata. The following are equivalent:
- (a, b) is a pure equilibrium payoff
- (a, b) is a mixed equilibrium payoff with finite support and rational coefficients
- (a, b) is a rational point in the pure nonstrict individually rational region

More Definitions
For pure strategy pairs (A, B) and (A′, B′):
- they are dependent if A = A′ or B = B′, and independent otherwise
- they are aligned if g_I(A, B) = g_I(A′, B′) or g_II(A, B) = g_II(A′, B′), and nonaligned otherwise
Every point on the Pareto boundary corresponds to either a pure strategy pair or a convex combination of two nonaligned pure strategy pairs.

Another Theorem
Let G be an arbitrary game and let p = (p_1, p_2) be a point in the strict, pure individually rational region. For every ε > 0, there are a, c, n > 0 such that for every m >= n, in the m-round repeated game G played by automata with sizes bounded by a, there is a mixed equilibrium with average payoff for each player within ε of p_i if either:
(i) p can be realized by pure strategies and at least one of the bounds is smaller than 2^{c·m}, or
(ii) p can be realized as the convex combination of two nonaligned (or independent) pure strategy pairs, and both bounds are smaller than 2^{c·m}.