Learning in Games: Fictitious Play


Learning in Games

Fictitious Play

Notation!
For n players we have:
- n finite strategy spaces S_1, S_2, …, S_n
- n opponent strategy spaces S_{-1}, S_{-2}, …, S_{-n}
- n payoff functions u_1, u_2, …, u_n
- for each i and each s_{-i} in S_{-i}, a set of best responses BR_i(s_{-i})

What is Fictitious Play?
Each player forms an assessment of the opponent's strategies in the form of a weight function κ_t^i : S_{-i} → R_+, updated after every round:
κ_t^i(s_{-i}) = κ_{t-1}^i(s_{-i}) + 1 if s_{-i} was played at time t-1, and κ_{t-1}^i(s_{-i}) otherwise.

Prediction
The probability player i assigns at time t to player -i playing s_{-i}:
γ_t^i(s_{-i}) = κ_t^i(s_{-i}) / Σ_{s'_{-i} in S_{-i}} κ_t^i(s'_{-i})

Fictitious Play is …
… any rule that assigns a best response to the assessment: s_t^i in BR_i(γ_t^i). Note that this choice is NOT UNIQUE!
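As a concrete sketch, the assess-and-best-respond loop can be written out in a few lines of Python (the function name and matrix conventions are mine, not from the slides; ties in the best response are broken toward the lowest-index action):

```python
import numpy as np

def fictitious_play(payoffs, steps=1000):
    """Two-player fictitious play. payoffs[i] is player i's payoff matrix,
    rows = own actions, columns = opponent actions."""
    n = [payoffs[0].shape[0], payoffs[1].shape[0]]
    # kappa[i][a] = weight player i assigns to the opponent playing action a
    kappa = [np.ones(n[1]), np.ones(n[0])]
    counts = [np.zeros(n[0]), np.zeros(n[1])]
    for _ in range(steps):
        # assessment = normalized weights; play a best response to it
        beliefs = [k / k.sum() for k in kappa]
        a0 = int(np.argmax(payoffs[0] @ beliefs[0]))
        a1 = int(np.argmax(payoffs[1] @ beliefs[1]))
        kappa[0][a1] += 1
        kappa[1][a0] += 1
        counts[0][a0] += 1
        counts[1][a1] += 1
    return [c / steps for c in counts]  # marginal empirical distributions
```

Run on matching pennies, the empirical frequencies approach (0.5, 0.5) even though the actual play keeps cycling.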

Further Definitions
In 2-player games: the marginal empirical distribution of j's play (j = -i):
d_t^j(s_j) = (number of times j played s_j before time t) / t

Asymptotic Behavior
Propositions:
- Strict Nash equilibria are absorbing for the process of fictitious play.
- Any pure-strategy steady state of fictitious play must be a Nash equilibrium.

Example “matching pennies”

         H       T
  H    1,-1    -1,1
  T    -1,1    1,-1

Starting from the initial weights, each player repeatedly best-responds to the opponent's empirical frequencies; the slides step through the successive weight tables for the row and column player as play alternates between H and T.

Convergence?
The strategies cycle and do not converge … but what about the marginal empirical distributions?

MATLAB Simulation - Pennies
(Plots: game play, payoff, and weights over time.)

Proposition
Under fictitious play, if the empirical distributions over each player's choices converge, then the strategy profile corresponding to the product of these distributions is a Nash equilibrium.
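This proposition can be verified numerically: given candidate mixed strategies, check that no pure deviation improves either player's payoff by more than a tolerance ε (a hypothetical helper of my own; both payoff matrices here are indexed by (row action, column action)):

```python
import numpy as np

def is_eps_nash(R, C, p, q, eps=1e-6):
    """True if (p, q) is an eps-Nash equilibrium of the bimatrix game (R, C):
    no pure-strategy deviation gains more than eps for either player."""
    v_row = p @ R @ q          # row player's payoff
    v_col = p @ C @ q          # column player's payoff
    best_row = (R @ q).max()   # best pure deviation for the row player
    best_col = (p @ C).max()   # best pure deviation for the column player
    return best_row <= v_row + eps and best_col <= v_col + eps
```

For matching pennies, the uniform profile passes the check while a pure profile fails it.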

Rock-Paper-Scissors
(Plots: game play, payoff, and weights over time.)
Payoffs (the winner gets 1, everything else 0; one standard labeling):

         A      B      C
  A    0,0    0,1    1,0
  B    1,0    0,0    0,1
  C    0,1    1,0    0,0

Shapley Game
(Plots: game play, payoff, and weights over time.)

         A      B      C
  A    0,0    0,1    1,0
  B    1,0    0,0    0,1
  C    0,1    1,0    0,0

Persistent Miscoordination
(Plots: game play, payoff, and weights over time.)

         A      B
  A    0,0    1,1
  B    1,1    0,0

Initial weights: chosen so that the players miscoordinate in every round.
Nash equilibria: (1,0), (0,1), and the mixed (0.5, 0.5).

Summary on fictitious play
- In case of convergence, the time average of strategies forms a Nash equilibrium.
- The average payoff need not be that of a Nash equilibrium (e.g. persistent miscoordination).
- The time average may not converge at all (e.g. the Shapley game).

References
Fudenberg, D. and Levine, D. K. (1998). The Theory of Learning in Games. MIT Press.

Nash Convergence of Gradient Dynamics in General-Sum Games

Notation
2 players:
- Strategies α and β (each the probability of playing the first of two actions)
- Payoff matrices
  R = [ r11 r12 ; r21 r22 ]   C = [ c11 c12 ; c21 c22 ]

Objective Functions
Payoff functions:
V_r(α, β) = r11 αβ + r12 α(1-β) + r21 (1-α)β + r22 (1-α)(1-β)
V_c(α, β) = c11 αβ + c12 α(1-β) + c21 (1-α)β + c22 (1-α)(1-β)
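These bilinear payoff functions are easy to sanity-check in code (a small sketch; the function name is mine):

```python
def V(M, alpha, beta):
    """Expected payoff for payoff matrix M = [[m11, m12], [m21, m22]]
    when the row action 1 is played w.p. alpha and column action 1 w.p. beta."""
    return (M[0][0] * alpha * beta + M[0][1] * alpha * (1 - beta)
            + M[1][0] * (1 - alpha) * beta + M[1][1] * (1 - alpha) * (1 - beta))
```

At the corners it recovers the pure payoffs, e.g. V(R, 1, 0) = r12.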

Hill-Climbing Idea

Gradient Ascent for Iterated Games
∂V_r(α, β)/∂α = βu - (r22 - r12)
∂V_c(α, β)/∂β = αu' - (c22 - c21)
with u = (r11 + r22) - (r21 + r12) and u' = (c11 + c22) - (c21 + c12).

Update Rule
α_{k+1} = α_k + η ∂V_r/∂α (α_k, β_k)
β_{k+1} = β_k + η ∂V_c/∂β (α_k, β_k)
The initial pair (α_0, β_0) can be arbitrary strategies.
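A minimal discrete-time sketch of simultaneous gradient ascent, clipping to the unit square as a crude stand-in for the projected gradient (the function name, step size, and the prisoner's-dilemma test matrices are my own choices, not from the slides):

```python
def grad_play(R, C, alpha, beta, eta=0.01, steps=5000):
    """Simultaneous gradient ascent on V_r and V_c for a 2x2 bimatrix game.
    R, C are the row/column player's payoffs, indexed (row action, col action)."""
    u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    u2 = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    for _ in range(steps):
        da = beta * u - (R[1][1] - R[0][1])    # dV_r/d(alpha)
        db = alpha * u2 - (C[1][1] - C[1][0])  # dV_c/d(beta)
        alpha = min(1.0, max(0.0, alpha + eta * da))  # stay in the unit square
        beta = min(1.0, max(0.0, beta + eta * db))
    return alpha, beta
```

On a prisoner's-dilemma-like game, both gradients point toward the dominant action, so the strategies are driven to the pure Nash equilibrium.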

Problem
The gradient can lead the players to an infeasible point outside the unit square.

Solution: redefine the gradient at the boundary as the projection of the true gradient onto the boundary. This defines the constrained dynamics.

Infinitesimal Gradient Ascent (IGA)
In the limit of an infinitesimal step size, the strategies become functions of time:
dα/dt = ∂V_r/∂α (α(t), β(t)),  dβ/dt = ∂V_c/∂β (α(t), β(t))

Case 1: U is invertible
Writing the dynamics as d/dt (α, β)^T = U (α, β)^T + b with U = [ 0 u ; u' 0 ], there are two possible qualitative forms of the unconstrained strategy pair trajectory.

Case 2: U is not invertible
Some examples of qualitative forms of the unconstrained strategy pair:

Convergence
- If both players follow the IGA rule, then both players' average payoffs converge to the expected payoffs of some Nash equilibrium.
- If the strategy pair trajectory converges at all, then it converges to a Nash pair.

Proposition
Both of the previous propositions also hold with a finite, decreasing step size.

References
Singh, S., Kearns, M., and Mansour, Y. (2000). Nash Convergence of Gradient Dynamics in General-Sum Games. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann.

Dynamic Computation of Nash Equilibria in Two-Player General-Sum Games

Notation
2 players:
- Mixed strategies p = (p_1, …, p_n) and q = (q_1, …, q_n)
- Payoff matrices R (row player) and C (column player)

Objective Functions
Payoff functions:
- Row player: p^T R q
- Column player: p^T C q

Observation!
The payoff p^T R q is linear in each p_i and q_j. Let x_i denote the pure strategy for action i. This means: if the payoff of x_i against q exceeds the current payoff, then increasing the value of p_i increases the payoff.
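The observation can be illustrated directly: shifting probability mass toward a pure strategy whose payoff against q exceeds the current mixed payoff raises p^T R q (the matrix and the amounts moved are arbitrary illustrative choices):

```python
import numpy as np

R = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
q = np.array([0.2, 0.5, 0.3])
p = np.array([1 / 3, 1 / 3, 1 / 3])

base = p @ R @ q
adv = R @ q                               # payoff of each pure strategy x_i against q
i = int(np.argmax(adv))                   # best pure response
p_shifted = 0.9 * p + 0.1 * np.eye(3)[i]  # move 10% of the mass onto x_i

assert adv[i] > base                      # x_i beats the current mixed payoff
assert p_shifted @ R @ q > base           # so shifting mass toward it helps
```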

Hill climbing (again) Multiplicative Update Rules

Hill climbing (again)
System of differential equations (i = 1..n, j = 1..n):
dp_i/dt = p_i [ (Rq)_i - p^T R q ]
dq_j/dt = q_j [ (C^T p)_j - p^T C q ]

Fixed Points?
dp_i/dt = 0 holds if and only if either p_i = 0 or (Rq)_i = p^T R q (and similarly for q).

When is a Fixed Point a Nash Equilibrium?
Proposition: provided no p_i(0) is 0 or 1, if (p, q) converges to (p*, q*), then (p*, q*) is a Nash equilibrium.

Unit Square? No Problem!
At p_i = 0 and at p_i = 1 the derivative dp_i/dt is zero, so the dynamics never leave the strategy space.
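A quick numerical check of this invariance: under dp_i/dt = p_i [(Rq)_i - p^T R q], a coordinate at 0 has zero velocity, and even an Euler discretization keeps the strategies on the simplex (the step size and matrices below are arbitrary choices of mine):

```python
import numpy as np

def euler_step(p, q, R, C, h=0.01):
    """One Euler step of dp_i/dt = p_i [ (Rq)_i - p^T R q ] and the
    symmetric update for q. Coordinates at 0 stay at 0."""
    dp = p * (R @ q - p @ R @ q)
    dq = q * (C.T @ p - p @ C @ q)
    return p + h * dp, q + h * dq

# simulate matching pennies from an interior point
R = np.array([[1., -1.], [-1., 1.]])
C = -R
p = np.array([0.7, 0.3])
q = np.array([0.4, 0.6])
for _ in range(1000):
    p, q = euler_step(p, q, R, C)

assert p.min() > 0 and q.min() > 0                        # never left the simplex
assert abs(p.sum() - 1) < 1e-9 and abs(q.sum() - 1) < 1e-9
```

The q update uses (C^T p)_j because the column player's payoff p^T C q is linear in q_j with that coefficient.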

Convergence of the Average Payoff
If the (p, q) trajectory and both players' payoffs converge in average, then the average payoff must be the payoff of some Nash equilibrium.

2-Player, 2-Action Case
Either the strategies converge immediately to some pure strategy, or the difference between the Kullback-Leibler distances of (p, q) to some mixed Nash equilibrium stays constant.

(Figure: trajectories along which the difference between the Kullback-Leibler distances to the mixed Nash equilibrium is constant; the Nash point is marked.)

But… … for games with more than 2 actions, convergence is not guaranteed! Counterexample: Shapley Game