Learning Games
Presented by: Aggelos Papazissis, Alexandros Papapostolou

Subject
- Correlated Equilibria
- Learning, Mutation, and Long-Run Equilibria
- Rational Learning Leading to NE
- Dynamic Fictitious Play and Dynamic Gradient Play

Computing Correlated Equilibria in Multi-Player Games (Christos Papadimitriou, Tim Roughgarden)

Nash Equilibrium (N.E.) is the standard notion of rationality in game theory. Correlated Equilibrium (C.E.) is a competing notion of rationality, more general than N.E.

Advantages of C.E. over N.E.:
1. It is guaranteed to always exist.
2. It arises from simple and natural dynamics in a sense that the N.E. does not.
3. It can be found in polynomial time, for any number of players and strategies, by linear programming.
4. An optimal C.E. (e.g., one that is Pareto optimal within the set of correlated equilibria) can be computed.

Complexity of computing equilibria by game representation (n players, s strategies per player, d = degree of the graph):

  Representation | Size O(.)  | Pure NE      | Mixed NE      | CE | Optimal CE
  Normal form    | n*s^n      | Linear       | PPAD-complete | P  | P
  Graphical      | n*s^(d+1)  | NP-complete  | PPAD-complete | P  | NP-hard
  Symmetric      |            | NP-complete  | PPAD-complete | P  | P
  Anonymous      |            | NP-hard      |               | P  | P
  Polymatrix     | n^2 * s^2  |              | PPAD-complete | P  | NP-hard
  Circuit        |            | NP-complete  |               |    |
  Congestion     |            | PLS-complete |               | P  | NP-hard

The idea of Correlated Equilibria

While a mixed N.E. is a distribution on the strategy space that is "uncorrelated" (a product of independent distributions), a C.E. is a general distribution over strategy profiles. Each player chooses his action according to his observation of the value of the same public signal; a strategy assigns an action to every possible observation a player can make. If no player would want to deviate from the recommended strategy (assuming the others do not deviate but obey), because obeying is best in expectation, the distribution is called a correlated equilibrium.
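For reference, these incentive constraints can be written out explicitly; the following is the standard linear-inequality definition of a C.E. (notation is mine, matching the slides' u_i):

\[
\sum_{s_{-i}} \pi(a, s_{-i})\,\bigl[\, u_i(a, s_{-i}) - u_i(a', s_{-i}) \,\bigr] \;\ge\; 0
\qquad \text{for every player } i \text{ and every pair of actions } a, a',
\]

together with \(\pi(s) \ge 0\) and \(\sum_s \pi(s) = 1\). All of these constraints are linear in \(\pi\), which is exactly why a C.E. (and an optimal C.E.) of a normal-form game can be found by linear programming, as claimed above.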

An example of C.E.: the Dare / Chicken-Out game

            D        C
    D     0, 0     7, 2
    C     2, 7     6, 6

Five distributions that are C.E.:
1. The pure N.E. (D, C): all mass on (D, C).
2. The pure N.E. (C, D): all mass on (C, D).
3. The mixed N.E.: both players play the mixed strategy {1/2, 1/2}, i.e., mass 1/4 on every cell; expected payoffs (3.75, 3.75).
4. The "traffic light": mass 1/2 on each of (D, C) and (C, D); expected payoffs (4.5, 4.5).
5. A third party draws one of 3 cards, (C,C), (D,C), (C,D), each with probability 1/3, and tells each player only his own letter.
   - On recommendation D, the other player was surely told C, so don't deviate (7 > 6).
   - On recommendation C, the other plays C with probability 1/2 and D with probability 1/2. Daring yields 0(1/2) + 7(1/2) = 3.5; chickening out yields 2(1/2) + 6(1/2) = 4. Therefore the player chooses C.
   Nobody wants to change strategy, so this is a C.E. Expected payoff = 7(1/3) + 2(1/3) + 6(1/3) = 5, better than the mixed N.E.

(Example author: Iskander Karibzhanov.)
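A quick numerical check of this example (a minimal sketch; the action encoding and variable names are mine):

```python
import numpy as np

# Encoding: 0 = D, 1 = C; u1[i, j] is the row player's payoff when
# row plays i and column plays j.
u1 = np.array([[0, 7],
               [2, 6]])
u2 = u1.T                    # symmetric game

# Distribution 5: uniform over (D,C), (C,D), (C,C); no mass on (D,D).
pi = np.array([[0, 1/3],
               [1/3, 1/3]])

print("expected payoffs:", (pi * u1).sum(), (pi * u2).sum())   # 5.0 and 5.0

# Incentive constraints for the row player (the column player follows by
# symmetry): conditional on recommendation a, obeying must beat deviating.
for a in range(2):
    cond = pi[a] / pi[a].sum()       # opponent's conditional play given a
    assert cond @ u1[a] >= cond @ u1[1 - a]
print("all incentive constraints hold -> this distribution is a C.E.")
```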

Computing Correlated Equilibria (pmp's)

Existence proof: every game has a C.E. This follows from linear programming duality: the dual system (D) is infeasible, hence the primal (P) is feasible.

Constructive proof: apply the ellipsoid algorithm to the dual, which has polynomially many variables and exponentially many constraints, so as to reduce the number of dual constraints to a polynomial; that number suffices to compute C.E.'s, and furthermore an optimal C.E.
- At each step of the ellipsoid algorithm we find violated convex combinations of the constraints of (D) by using Markov chain computations.
- At the conclusion of the algorithm, we have a polynomial number of such combinations (the cuts of the ellipsoid algorithm) that are themselves infeasible. We call this system (D').
- Solving the dual of this new linear program, (P'), gives us the required C.E. as a polynomial mixture of products (pmp's).
- To optimize the C.E., we reduce the dimensionality of the linear program: for each player, the strategy profiles of all opponents are divided into equivalence classes within which the player's utility depends only on his own choice.

Main result: every game has a C.E. that is a convex combination of product distributions whose number is polynomial in n (players) and m (strategies).

Computing Optimal Correlated Equilibria
- The algorithm provides a refinement of a polynomial C.E. scheme that is guaranteed to sample the strategy space of the given game G according to an optimal C.E.
- The problem can be formulated as a linear program, the dual of which can be solved in polynomial time via the ellipsoid method.
- The ellipsoid method generates only a polynomial number of dual constraints during its execution, and these constraints suffice to compute an optimal dual solution.
- The primal variables (strategy profiles) that correspond to these dual constraints then suffice to solve the primal problem optimally.
- Since this "reduced primal" has a polynomial number of variables and constraints, it can be solved in polynomial time, yielding an optimal C.E.
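To make the LP formulation concrete, here is a minimal sketch that computes a welfare-maximizing C.E. of the Dare/Chicken game with scipy's LP solver; for a game this small the ellipsoid machinery above is unnecessary and a direct solve suffices. The matrix encoding and names are mine:

```python
import numpy as np
from scipy.optimize import linprog

# Variables: pi(a1, a2) flattened to [DD, DC, CD, CC], with 0 = D, 1 = C.
u1 = np.array([[0, 7], [2, 6]])   # row player's payoffs
u2 = u1.T                         # symmetric game

c = -(u1 + u2).flatten()          # linprog minimizes, so negate total welfare

# One incentive constraint per (player, recommendation a, deviation d).
A_ub, b_ub = [], []
for a in range(2):
    d = 1 - a
    row = np.zeros((2, 2)); row[a, :] = u1[d, :] - u1[a, :]
    A_ub.append(row.flatten()); b_ub.append(0.0)   # row player obeys a
    col = np.zeros((2, 2)); col[:, a] = u2[:, d] - u2[:, a]
    A_ub.append(col.flatten()); b_ub.append(0.0)   # column player obeys a

res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=[np.ones(4)], b_eq=[1.0], bounds=[(0, 1)] * 4)
print(res.x.reshape(2, 2))        # mass 1/4 on (D,C) and (C,D), 1/2 on (C,C)
print("total welfare:", -res.fun) # 10.5, vs 10 for the uniform card-drawing CE
```

The optimal C.E. here improves on every N.E. and on the 1/3-1/3-1/3 distribution: each player gets 5.25 in expectation.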

Learning, Mutation, and Long Run Equilibria in Games

Application to 2x2 symmetric games: natural selection of long-run equilibria through learning, bounded rationality, and mutation.

Three hypotheses:
1. Inertia: not all agents need to react instantaneously to their environment.
2. Myopia: players react myopically.
3. Mutation: some agents may change their strategies at random.

A Symmetric 2x2 Game

           s1       s2
    s1    2, 2     0, 0
    s2    0, 0     1, 1

Choice of computer system in a small community: ten students, each using one of the two systems s1, s2. They meet each other at random, and when two are on the same computer they collaborate; s1 is superior to s2.
- Two N.E.: E1 = (s1, s1) and E2 = (s2, s2).
- Path dependence: if at least 4 students are initially using s1, then all students eventually end up using s1.
- Mutation: when a student leaves, the newcomer chooses s1 with a probability given by the share of s1-users in the outside world. With mutations the process moves between the two equilibria E1 and E2 (Darwinian adjustments).
- Modeling the long run as a two-state chain over {E1, E2} with upset probabilities p (out of E1) and p' (out of E2), the stationary distribution is (p'/(p+p'), p/(p+p')). As the mutation rate e -> 0, this tends to (1, 0): the Pareto-dominant equilibrium E1 is the long-run equilibrium.
- Assuming the mutation rate is small, E1 takes many more periods to be upset than E2, which takes 78.
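The mutation story above is easy to simulate. Here is a minimal sketch (the one-agent-per-period update scheme, parameters, and names are my own illustrative choices, not the paper's):

```python
import numpy as np

# KMR-style dynamics on the 2x2 game above: each period one random agent
# best-responds to the current population, and with probability eps
# mutates to a random strategy instead.
rng = np.random.default_rng(0)
N, T, eps = 10, 200_000, 0.01
payoff = np.array([[2, 0],
                   [0, 1]])       # 0 = s1, 1 = s2

state = rng.integers(0, 2, size=N)   # start from a random population
time_at_E1 = 0
for t in range(T):
    i = rng.integers(N)
    others = np.delete(state, i)
    freq = np.bincount(others, minlength=2) / (N - 1)
    exp_pay = payoff @ freq          # expected payoff of s1 and s2
    state[i] = rng.integers(0, 2) if rng.random() < eps else exp_pay.argmax()
    time_at_E1 += (state == 0).all()

# For small eps the process spends most of its time at all-s1, i.e., at
# the Pareto-dominant equilibrium E1.
print("fraction of time at all-s1:", time_at_E1 / T)
```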

Coordination Games

[Figure: mutation-cost trees over the states, with critical level z*; the basin of stage 0 is {0, 1, 2} and the basin of stage 6 is {3, 4, 5, 6}. The least-cost 6-tree requires 3 mutations, while the least-cost 0-tree requires 4.]

Conclusion: stage 6, which has the larger basin of attraction, achieves the minimum cost among all states; the equilibrium with the largest basin of attraction is always selected in the long run. Paths of several small steps (4 mutations in the figure) are less likely than immediate jumps, so upsetting equilibria by large jumps is a natural consequence of independent mutations.

Rational Learning Leads to NE

- Nash equilibrium is the central solution concept, but for games played only once the process that leads players to it is not fully understood.
- Repeated interaction gives players enough time to observe opponents' behavior, supporting a statistical learning theory that leads to NE.
- Setting: repeated play among a small number of subjectively rational agents.

Shortcomings of "myopic" behavior of a player:
1. It ignores the fact that his opponents also use a dynamic learning process.
2. He would not perform a costly experiment.
3. It ignores strategic considerations regarding the future.

The theoretical approach to overcome these flaws:
1. A standard perfect-monitoring infinitely repeated game with discounting.
2. A fixed matrix of payoffs for every action combination of the group of players.
3. A discount factor used to evaluate future payoffs.
4. Each player maximizes the present value of total expected payoff.
5. Players need not have any information about opponents' payoff matrices or their rationality.

Rational Learning Leads to NE

Main message of the paper:
- If players start with a vector of subjectively rational strategies, and
- their individual subjective beliefs regarding opponents' actions are compatible with the truly chosen strategies,
- then they must converge in finite time to play according to an ε-NE (for small ε).
In other words, they will learn to predict the future play of the game and play an ε-NE.

Assumptions of the model:
1. Players' objective is to maximize their long-term expected discounted payoff; learning is not a goal in itself but a consequence of overall plans.
2. Learning proceeds through Bayesian updating of individual prior beliefs.
3. Players are not required to fully know each other's strategies, nor to have known prior distributions on the parameters of the game.
4. Strategies are chosen independently.

Learning in myopic theories may never converge, or may converge to false beliefs; dynamic approaches have the potential to overcome this difficulty. Experimentation is evaluated according to its long-run contribution to expected utility: a player chooses randomly when and how to experiment, and optimal experimentation leads to an individually optimal solution.

Examples and elaboration (1): infinitely repeated Prisoner's Dilemma

Each player chooses A (aggressive) or C (cooperative); PI's payoffs, with a > b > c > d:

           A      C
    A      c      a
    C      d      b

- PI (he) uses a discount parameter λ1 (0 < λ1 < 1) to evaluate infinite streams of payoffs.
- Each player knows his own parameters, and monitoring is perfect.
- Each player holds prior subjective beliefs, as probability distributions over the opponent's strategies.
- Consider the set of pure strategies g_t, t = 0, 1, 2, ..., ∞, where g_∞ is the grim trigger strategy and, if not triggered earlier, g_t prescribes unprovoked aggression (A) starting from time t on.
- PI believes that PII is likely to cooperate by playing her grim trigger strategy, but he also believes there are positive probabilities that she will stop cooperating earlier for some reason: he assigns her strategies g_0, g_1, ..., g_∞ probabilities β = (β_0, β_1, ..., β_∞) that sum to 1, with every β_t > 0.
- His best response is then a strategy of the form g_T1 for some T1 = 0, 1, ..., ∞.
- PII holds similar beliefs (a vector α) about PI's strategy and chooses a strategy g_T2 as her best response.
- The game is then played according to the two strategies (g_T1, g_T2).
- Learning to predict the future play must occur, but players' beliefs only converge to the truth as time goes on.
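The Bayesian-updating step in this example is simple enough to spell out. A minimal sketch (the truncated strategy set and the uniform prior are my illustrative assumptions):

```python
import numpy as np

# PI holds a prior beta over PII's strategies g_0, g_1, ..., g_T, where
# g_t cooperates until time t and then defects forever; the last index
# stands in for the grim trigger g_inf. Observing cooperation at time t
# rules out every g_s with s <= t, so Bayesian updating is just
# zeroing-out and renormalization.
T = 20                                   # truncate the strategy set
beta = np.full(T + 1, 1.0 / (T + 1))     # uniform prior (illustrative)

for t in range(10):                      # PII cooperates for 10 periods
    likelihood = np.zeros(T + 1)
    likelihood[t + 1:] = 1.0             # only g_s with s > t cooperate at t
    beta = beta * likelihood
    beta /= beta.sum()

# Mass shifts toward later triggers and the grim trigger: PI learns to
# predict continued cooperation, in the spirit of Kalai-Lehrer merging.
print(np.round(beta, 3))
```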

Examples and elaboration (2): learning and teaching

A 2-person infinite symmetric version of the "Chicken" game. Learning is not a passive state: optimizers who believe their opponents are open to learning may find it worthwhile to act as teachers.
- Simultaneously at the beginning, and with perfect monitoring after every history, each player chooses to yield (Y) or insist (I):

           Y        I
    Y    0, 0     1, 2
    I    2, 1    -1, -1

- The strategy S_t prescribes initial yielding at time t.
- There are no dominant strategies, and there are two symmetric pure-strategy NE: (S_0, S_∞) (he yields immediately and she insists forever) and (S_∞, S_0).
- Vectors α and β describe PII's and PI's beliefs, respectively.
- PI may think (putting himself partially in her shoes) that she is equally likely to wait any number of the first n periods before yielding, to see whether he yields first, or that she may insist forever with probability ε because (unlike him) she assigns a very large loss to ever yielding.
- If the future is important enough to PI, his best response to β is to wait n periods in case she yields first and, if she does not, to yield himself at time n+1.
- As long as she is willing to find out about him, he will try to convince her by his actions that he is tough.
- If both players adopt such reasoning, a pair of strategies (S_T1, S_T2) is chosen:
  - If T1 = 0 and T2 > 0, there was no attempt to teach on the part of PI, and the play is in some NE.
  - If 0 < T1 < T2, PI failed to teach her, and the play is not in a NE.
  - If T1 > T2, the play is again not in a NE, and he wins.

Main Results

Theorem 1. Let f and f^i be the n-vector of strategies actually chosen and the beliefs of player i, respectively. If f is absolutely continuous with respect to f^i, then for every ε > 0 and almost every play path z there is a time T such that for all t >= T, f_z(t) plays ε-like f^i_z(t).
1. This theorem states that if the vector of strategies actually chosen is absolutely continuous with respect to his beliefs, then player i will learn to accurately predict the future play of the game.
2. The real probability of any future history cannot differ from the beliefs of player i by more than ε.

Theorem 2. Let f and f^1, f^2, ..., f^n be the strategy vector actually played and the beliefs of the players, respectively. Suppose that for every player i: (1) player i's own strategy f_i is a best response to his beliefs f^i_-i, and (2) f is absolutely continuous with respect to f^i. Then for every ε > 0 and almost all play paths z there is a time T = T(z, ε) such that for every t >= T there exists an ε-equilibrium f^ of the repeated game satisfying that f_z(t) plays ε-like f^.
1. In other words, given any ε > 0, with probability 1 there will be some time T after which the players play ε-like an ε-NE.
2. If utility-maximizing players start with individual subjective beliefs, then in the long run their behavior must be essentially the same as behavior described by an ε-NE.

Dynamic fictitious play (intro)

Best-response strategy: a player jumps to the best response to the empirical frequencies of the opponent, i.e., the running average of the opponent's past actions.
- This is the continuous-time form of a repeated game in which players continually update strategies in response to observations of opponent actions, without knowledge of opponent intentions.
- A fictitious player does not have the ability to be forward-looking, accurately anticipate her opponents' play, or understand how her behavioral rule will affect her opponents' responses.
- Primary objective: to understand how interacting players could converge to a NE.
- By playing the optimized best response to the observed empirical frequencies, an optimizing player will eventually converge to its own optimal response to a fixed-strategy opponent. If both players presume that the other is using a constant strategy, their update mechanisms become intertwined: this is fictitious play (FP).
- The repeated game would be in equilibrium if the empirical frequencies converged. Will repeated play converge to a NE?

Results that establish convergence of FP (see the sketch below):
- 2-player zero-sum games
- 2-player, 2-move games
- noisy 2-player, 2-move games with a unique NE
- noisy 2-player, 2-move games with countably many NE
- 2-player games where one player has only 2 moves

Empirical frequencies need not converge:
- Shapley game (2 players, 3 moves each)
- Jordan counterexample (3 players, 2 moves each)
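A minimal discrete-time sketch of the FP update on a 2-player zero-sum game (matching pennies; the matrices and names are illustrative), one of the classes above where convergence is guaranteed:

```python
import numpy as np

# Each player best-responds to the empirical frequency of the opponent's
# past moves; the frequencies converge to the unique mixed NE (1/2, 1/2).
M1 = np.array([[1, -1], [-1, 1]])   # row player's payoffs
M2 = -M1.T                          # zero-sum: column player's payoffs

counts1, counts2 = np.ones(2), np.ones(2)   # fictitious initial counts
for k in range(100_000):
    q1, q2 = counts1 / counts1.sum(), counts2 / counts2.sum()
    a1 = (M1 @ q2).argmax()         # best response to opponent frequencies
    a2 = (M2 @ q1).argmax()
    counts1[a1] += 1
    counts2[a2] += 1

print(counts1 / counts1.sum(), counts2 / counts2.sum())   # approx [0.5 0.5]
```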

Dynamic fictitious play (FP setup)

Static game: each player i selects a strategy p_i and receives a real-valued reward according to the utility function u_i(p_i, p_-i):
    u_1(p_1, p_2) = p_1' M_1 p_2 + τ H(p_1)
    u_2(p_2, p_1) = p_2' M_2 p_1 + τ H(p_2)
where τ H(p_i) is the weighted entropy of the strategy. At equilibrium, u_i(p_i, p*_-i) <= u_i(p*_i, p*_-i), and p*_i = β_i(p*_-i) is the best response.

Discrete-time FP: the strategy of player i at time k is the optimal response to the running average of the opponent's actions,
    p_i(k) = β_i(q_-i(k)).

Dynamic (continuous-time) FP:
    q_1'(t) = β_1(q_2(t)) - q_1(t)
    q_2'(t) = β_2(q_1(t)) - q_2(t)

Derivative-action FP can in some cases lead to behaviors converging to NE in previously non-convergent situations. Each player's strategy is a best response to a combination of empirical frequencies and a weighted derivative of empirical frequencies:
    p_i(t) = β_i(q_-i(t) + γ q_-i'(t))

Exact derivative-action FP (exact DAFP):
    q_1' = β_1(q_2 + γ q_2') - q_1
    q_2' = β_2(q_1 + γ q_1') - q_2

Approximate derivative-action FP (approximate DAFP), with λ > 0 and filtered estimates r_i:
    q_1' = β_1(q_2 + γλ(q_2 - r_2)) - q_1
    q_2' = β_2(q_1 + γλ(q_1 - r_1)) - q_2

Noisy derivative measurements:
    q_1' = β_1(q_2 + q_2' + e_2) - q_1
    q_2' = β_2(q_1 + q_1' + e_1) - q_2
Here the empirical frequencies converge to a neighborhood of the set of Nash equilibria, whose size depends on the accuracy of the derivative measurements.
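A forward-Euler sketch of the approximate DAFP equations. Two assumptions on my part: the entropy-smoothed best response β_i is a softmax with temperature τ, and the estimate λ(q - r) comes from the first-order filter r' = λ(q - r), which the equations above leave implicit. Run here on matching pennies:

```python
import numpy as np

def softmax(x, tau):
    # Entropy-smoothed best response: argmax of p'x + tau*H(p).
    z = np.exp((x - x.max()) / tau)
    return z / z.sum()

M1 = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies
M2 = -M1.T
tau, gam, lam, dt = 0.05, 1.0, 5.0, 0.01

q1, q2 = np.array([0.9, 0.1]), np.array([0.3, 0.7])
r1, r2 = q1.copy(), q2.copy()               # derivative-estimate filters
for _ in range(100_000):
    b1 = softmax(M1 @ (q2 + gam * lam * (q2 - r2)), tau)
    b2 = softmax(M2 @ (q1 + gam * lam * (q1 - r1)), tau)
    q1, q2 = q1 + dt * (b1 - q1), q2 + dt * (b2 - q2)
    r1, r2 = r1 + dt * lam * (q1 - r1), r2 + dt * lam * (q2 - r2)

print(np.round(q1, 3), np.round(q2, 3))     # near the mixed NE (1/2, 1/2)
```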

Shapley "Fashion" Game (1)
- 2 players, each with 3 moves: {Red, Green, Blue}
- Player 1: the fashion leader, wants to differ from Player 2
- Player 2: the fashion follower, wants to copy Player 1
- Key assumption: players do not announce preferences
- Daily routine: play game, observe actions, update strategies

Shapley "Fashion" Game (2)
[Plot captions from the original slides:]
- Empirical frequencies approach the (unique) NE.
- Empirical frequencies converge to wrong values.
- As λ increases, the oscillations are progressively reduced.
- Derivative-action FP can be locally convergent when standard FP is not convergent.
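To reproduce the non-convergence of standard FP here, the following minimal sketch uses the classic Shapley payoff matrices (an assumption on my part; the Red/Green/Blue leader-follower story is the usual motivation). The unique NE is uniform (1/3, 1/3, 1/3) for both players, and the checkpoints keep drifting instead of settling:

```python
import numpy as np

M1 = np.array([[0, 1, 0],
               [0, 0, 1],
               [1, 0, 0]])            # row player's payoffs
M2 = np.array([[0, 0, 1],
               [1, 0, 0],
               [0, 1, 0]])            # column player's payoffs, M2[i, j]

# Slightly asymmetric fictitious counts to avoid degenerate tie-breaking.
c1 = np.array([1.0, 0.9, 0.8])
c2 = np.array([0.8, 1.0, 0.9])
for t in range(1, 300_001):
    q1, q2 = c1 / c1.sum(), c2 / c2.sum()
    c1[(M1 @ q2).argmax()] += 1       # row best-responds to column history
    c2[(M2.T @ q1).argmax()] += 1     # column best-responds to row history
    if t in (1_000, 10_000, 100_000, 300_000):
        print(t, np.round(c1 / c1.sum(), 3), np.round(c2 / c2.sum(), 3))
```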

Dynamic Gradient Play

Better-response strategy: a player adjusts his current strategy in a gradient direction suggested by the empirical frequencies of the opponent.
Utility function: u_i(p_i, p_-i) = p_i' M_i p_-i.
Strategy of each player: p_i(t) = Π_Δ[ q_i(t) + M_i q_-i(t) ], a combination of the player's own empirical frequency and a projected gradient step using the opponent's empirical frequencies (Π_Δ is the projection onto the simplex).

Gradient play (GP):
    q_1'(t) = Π_Δ[ q_1(t) + M_1 q_2(t) ] - q_1(t)
    q_2'(t) = Π_Δ[ q_2(t) + M_2 q_1(t) ] - q_2(t)

- The equilibrium points of continuous-time GP are precisely the NE.
- Gradient-based evolution cannot converge to a completely mixed NE.

Derivative-action GP (DAGP):
    q_1' = Π_Δ[ q_1 + M_1 (q_2 + γ q_2') ] - q_1
    q_2' = Π_Δ[ q_2 + M_2 (q_1 + γ q_1') ] - q_2

In the (ideal) case of exact DAGP there always exists a γ such that a completely mixed NE is locally asymptotically stable; stability of approximate DAGP may or may not be achieved.
1. Completely mixed NE: never asymptotically stable under standard GP, but derivative action can enable convergence.
2. Strict NE: approximate DAGP always results in locally stable behavior near (strict) NE.
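A minimal Euler-integration sketch of standard GP (no derivative action) on matching pennies, whose unique NE is completely mixed. It illustrates the claim above: the trajectories orbit the mixed NE rather than approach it, so the printed distances never shrink. The projection routine and parameters are my own choices:

```python
import numpy as np

def proj_simplex(v):
    # Euclidean projection onto the probability simplex (sort-based).
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

M1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
M2 = -M1.T
q1, q2 = np.array([0.7, 0.3]), np.array([0.4, 0.6])
dt, ne = 0.01, np.array([0.5, 0.5])
for k in range(1, 100_001):
    dq1 = proj_simplex(q1 + M1 @ q2) - q1
    dq2 = proj_simplex(q2 + M2 @ q1) - q2
    q1, q2 = q1 + dt * dq1, q2 + dt * dq2
    if k in (1, 25_000, 50_000, 100_000):
        print(k, np.linalg.norm(q1 - ne), np.linalg.norm(q2 - ne))
# Note: forward Euler adds a small outward drift to the ideal circular
# orbits, but the point stands: the distance to the NE never shrinks.
```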

Dynamic Gradient Play (simulation): with derivative action, the empirical frequencies converge to the completely mixed NE.

Jordan Anti-Coordination Game
- 3 players, each with 2 moves: {Left, Right}
- Player 1 wants to differ from Player 2
- Player 2 wants to differ from Player 3
- Player 3 wants to differ from Player 1
- Players do not announce preferences
- Daily routine: play game, observe actions, update strategies
- Standard FP does not converge (see the sketch below)
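A minimal sketch of simultaneous discrete FP on this game. The 0/1 mismatch payoffs are my assumption for how Jordan's counterexample is usually stated; the printout shows the empirical frequencies still oscillating around the uniform NE at large t:

```python
import numpy as np

# Player p is paid 1 for choosing a different move than player (p+1) mod 3,
# so the best response is to avoid that opponent's more frequent move.
rng = np.random.default_rng(1)
counts = np.ones((3, 2))
counts[:, 0] += rng.random(3)          # break symmetry at the start
for t in range(1, 200_001):
    q = counts / counts.sum(axis=1, keepdims=True)
    moves = [q[(p + 1) % 3].argmin() for p in range(3)]   # mismatch target
    for p, a in enumerate(moves):
        counts[p, a] += 1
    if t in (1_000, 10_000, 100_000, 200_000):
        print(t, np.round(counts[:, 0] / counts.sum(axis=1), 3))
```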