Derivative Action Learning in Games
Review of: J. Shamma and G. Arslan, "Dynamic Fictitious Play, Dynamic Gradient Play, and Distributed Convergence to Nash Equilibria," IEEE Transactions on Automatic Control, Vol. 50, No. 3, March 2005

Overview
The authors propose an extension of fictitious play (FP) and gradient play (GP) in which strategy adjustment is a function of both the estimated opponent strategy and its time derivative. They demonstrate that, when the learning rules are well calibrated, convergence (or near-convergence) to Nash equilibria and asymptotic stability in the vicinity of equilibria can be achieved in games where static FP and GP fail to do so.

Game Setup
This paper addresses a class of two-player games in which each player selects an action $a_i$ from a finite set at each instance of the game according to his mixed strategy $p_i$, and experiences utility $U(p_1, p_2)$ equal to his expected payoff plus some additional utility associated with playing a mixed strategy. The purpose of the entropy term is not discussed by the authors, but it may serve to avoid convergence to inferior local maxima of the utility function. The actual payoff depends on the combined player actions $a_1$ and $a_2$, each randomly selected according to the mixed strategies $p_1$ and $p_2$.
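Concretely, with $M_i$ denoting player i's payoff matrix and $\tau > 0$ the entropy weight, the utility can be sketched as follows (the notation here is an assumption for illustration, not quoted from the paper):

```latex
U_i(p_i, p_{-i}) = p_i^{\top} M_i\, p_{-i} + \tau H(p_i), \qquad
H(p_i) = -\sum_{k=1}^{m_i} p_i(k) \log p_i(k)
```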

[Figure: the entropy function $H(\cdot)$ rewards mixed strategies, plotted against the probability of selecting $a_1$ in a 2-dimensional strategy space]

Empirical Estimation and Best Response
Player i's strategy $p_i$ is, in general, mixed and lies within the simplex in $\mathbb{R}^{m_i}$ (where $m_i$ is the number of actions available to player i) whose vertices correspond to the pure actions. Each player adjusts his strategy by observing his opponent's actions, forming an empirical estimate $q_{-i}$ of the opponent's strategy, and calculating the best mixed-strategy response. The adjusted strategy then directs his next move.

Best Response Function
The best response is defined by the authors to be the mixed strategy that maximizes expected utility (payoff plus the entropy term). The authors claim (without proof) that, for entropy weight $\tau > 0$, the utility-maximizing strategy is given by the logit (soft-max) function.
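A minimal sketch of the logit best response under these assumptions (the function and parameter names are illustrative, not the paper's):

```python
import numpy as np

def logit_best_response(M, q_opp, tau):
    """Smoothed best response: soft-max of expected payoffs.

    M     : this player's payoff matrix (own actions x opponent actions)
    q_opp : opponent's (estimated) mixed strategy
    tau   : entropy weight; tau -> 0 approaches the pure best response
    """
    u = M @ q_opp                    # expected payoff of each pure action
    z = np.exp((u - u.max()) / tau)  # shift by max for numerical stability
    return z / z.sum()               # mixed strategy on the simplex
```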

FP in Continuous Time
The remaining discussion of fictitious play is conducted in the continuous-time domain. This allows the authors to describe the system dynamics with smooth differential equations, and player actions become equivalent to their mixed strategies. The discrete-time dynamics are then interpreted as stochastic approximations of solutions to the continuous-time differential equations. This transformation is discussed in [Benaïm, Hofbauer and Sorin 2003] and, presumably, in [Benaïm and Hirsch 1996], though I have not seen the latter myself.

Achieving Nash Equilibrium
Nash equilibria are reached at fixed points of the best response function. Convergence to fixed points occurs as the empirical frequency estimates converge to the actual strategies played.
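In continuous time, the FP dynamics take the familiar best-response form (a standard formulation, assumed here to be consistent with the paper's setup):

```latex
\dot{q}_i = \beta_i(q_{-i}) - q_i, \qquad i = 1, 2,
```

so a fixed point satisfies $q_i^* = \beta_i(q_{-i}^*)$: each empirical frequency is a (smoothed) best response to the other, i.e., a (perturbed) Nash equilibrium.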

Derivative Action FP (DAFP): Idealized Case – Exact DAFP
Exact DAFP uses the directly measured first-order forecast of the opponent's strategy, in addition to the observed empirical frequency, to calculate the best response.
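Schematically, and with illustrative notation rather than the paper's exact statement, exact DAFP feeds a first-order forecast into the best response, with derivative gain $\gamma$:

```latex
p_i = \beta_i\!\left( q_{-i} + \gamma\, \dot{q}_{-i} \right)
```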

Derivative Action FP (DAFP): Approximate DAFP
Approximate DAFP uses an estimated first-order forecast of the opponent's strategy, in addition to the observed empirical frequency, to calculate the best response.
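Since $\dot{q}_{-i}$ is not directly measurable, a first-order filter can supply the estimate. Below is a minimal Euler-discretized sketch; the names dafp_step, lam, and gamma are illustrative assumptions, and logit_best_response is the function sketched earlier:

```python
import numpy as np

def dafp_step(q_opp, r, M, tau, gamma, lam, dt):
    """One Euler step of approximate DAFP for a single player.

    q_opp : opponent's current empirical frequency
    r     : filter state tracking q_opp; lam*(q_opp - r) estimates dq_opp/dt
    """
    dq_est = lam * (q_opp - r)                       # filtered derivative estimate
    p = logit_best_response(M, q_opp + gamma * dq_est, tau)
    r_next = r + dt * lam * (q_opp - r)              # advance the filter state
    return p, r_next
```

A larger lam tracks the derivative more closely but amplifies measurement noise, which is the trade-off behind the noisy-measurement result below.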

Exact DAFP, Special Case: γ = 1
System Inversion: each player seeks to play the best response against the opponent's current (rather than time-averaged) strategy.

Convergence with Exact DAFP in the Special Case (γ = 1)

Convergence with Noisy Exact DAFP in the Special Case (γ = 1)
Suppose the derivative of the empirical frequencies is measurable only to within some error $(e_1, e_2)$. The authors prove that for any arbitrarily small $\varepsilon > 0$ there exists a $\delta > 0$ such that, if the measurement error $(e_1, e_2)$ eventually remains within a $\delta$-neighborhood of the origin, then the empirical frequencies $(q_1, q_2)$ will remain within an $\varepsilon$-neighborhood of a Nash equilibrium. This suggests that, if a sufficiently accurate estimate of the derivative of the empirical frequencies can be constructed, Approximate DAFP will converge to an arbitrarily small neighborhood of the Nash equilibria.

Convergence with Approximate DAFP in the Special Case (γ = 1)

Convergence with Approximate DAFP in the Special Case (γ = 1) (continued)

Simulation Demonstration: Shapley Game
Consider the 2-player 3×3 game devised by Lloyd Shapley to show that fictitious play does not converge in general.
[Figure: standard FP in discrete time (top) and continuous time (bottom)]
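A minimal discrete-time FP run on the Shapley game makes the cycling visible; the payoff matrices below are the standard Shapley bimatrix, and the horizon is arbitrary:

```python
import numpy as np

# Shapley's 3x3 game: pure best-response fictitious play cycles forever.
A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])  # row player's payoffs
B = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])  # column player's payoffs

q1 = np.ones(3) / 3.0  # empirical frequencies of player 1's actions
q2 = np.ones(3) / 3.0  # empirical frequencies of player 2's actions

for k in range(1, 5001):
    a1 = int(np.argmax(A @ q2))           # player 1 best-responds to q2
    a2 = int(np.argmax(B.T @ q1))         # player 2 best-responds to q1
    q1 += (np.eye(3)[a1] - q1) / (k + 1)  # running-average updates
    q2 += (np.eye(3)[a2] - q2) / (k + 1)

print(q1, q2)  # orbits around (1/3, 1/3, 1/3) with ever-longer swings
```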

Simulation Demonstration: Shapley Game
[Figure: Shapley game with Approximate DAFP in continuous time, with the derivative-estimator gain λ increasing: 1 (top), 10 (middle), 100 (bottom)]
Another interesting observation is that the players enter a correlated equilibrium, and their average payoff is higher than the expected Nash payoff. For the "modified" game, where the player utility matrices are not identical, the strategies converge to theoretically unsupported values, illustrating a violation of the weak continuity requirement for the smoothed best response $\beta_i$. This steady-state error can be corrected by setting the derivative gain according to the linearization/Routh-Hurwitz procedure noted earlier.

GP Review for 2-player Games
Gradient Play: player i adjusts his strategy by observing his own empirical action frequency and adding the gradient of his utility, as determined by his opponent's empirical action frequency.

GP in Discrete Time
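In one standard formulation (an assumption here, not a quotation from the paper), discrete-time GP averages in a projected gradient step:

```latex
q_i(k+1) = q_i(k) + \frac{1}{k+1}\Big( \Pi\!\big[ q_i(k) + \nabla_{q_i} U_i\big(q_i(k), q_{-i}(k)\big) \big] - q_i(k) \Big),
```

where $\Pi[\cdot]$ denotes Euclidean projection onto the strategy simplex.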

GP in Continuous Time
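Letting the step size shrink, the continuous-time limit (again a standard form, assumed here) is:

```latex
\dot{q}_i = \Pi\!\big[ q_i + \nabla_{q_i} U_i(q_i, q_{-i}) \big] - q_i
```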

Achieving Nash Equilibrium
As with FP, Nash equilibria are reached at stationary points of the gradient-play dynamics.
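By the standard variational characterization (stated here as background, not quoted from the paper), $q^*$ is stationary exactly when the projected gradient step leaves it unchanged:

```latex
\Pi\!\big[ q_i^* + \nabla_{q_i} U_i(q_i^*, q_{-i}^*) \big] = q_i^* \quad \text{for all } i,
```

which is precisely the first-order condition for a Nash equilibrium.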

Derivative Action Gradient Play
Standard GP cannot converge asymptotically to completely mixed Nash equilibria because the linearized dynamics are unstable at mixed equilibria. Exact DAGP always enables asymptotic stability at mixed equilibria with proper selection of the derivative gain. Under some conditions, Approximate DAGP also enables asymptotic stability near mixed equilibria, and it always ensures asymptotic stability in the vicinity of strict equilibria.
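A sketch of one approximate-DAGP update, Euler-discretized with illustrative names (the Euclidean projection onto the simplex is spelled out since NumPy has no built-in for it):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                  # sort descending
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u > css / idx)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def dagp_step(q_self, q_opp, r_opp, M, gamma, lam, dt):
    """One Euler step of approximate derivative-action gradient play."""
    dq_est = lam * (q_opp - r_opp)         # filtered derivative estimate
    grad = M @ (q_opp + gamma * dq_est)    # payoff gradient w.r.t. own strategy
    q_next = q_self + dt * (project_simplex(q_self + grad) - q_self)
    r_next = r_opp + dt * lam * (q_opp - r_opp)
    return q_next, r_next
```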

DAGP Simulation: Modified Shapley Game

Multiplayer Games
Consider the 3-player Jordan game. The authors demonstrate that DAGP converges to its mixed Nash equilibrium.
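One common statement of Jordan's game (assumed here; the paper may parameterize it differently): three players each choose Heads or Tails, player 1 wants to match player 2, player 2 wants to match player 3, and player 3 wants to mismatch player 1. Its unique Nash equilibrium is fully mixed, with every player randomizing 50/50:

```python
def jordan_payoffs(a1, a2, a3):
    """Payoffs in (one formulation of) Jordan's 3-player game; actions are 0 or 1."""
    return (int(a1 == a2),   # player 1 scores by matching player 2
            int(a2 == a3),   # player 2 scores by matching player 3
            int(a3 != a1))   # player 3 scores by mismatching player 1
```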

Jordan Game Demonstration