Multiple timescales for multiagent learning
David Leslie and E. J. Collins, University of Bristol
NIPS 2002 workshop on multiagent learning

David Leslie is supported by a CASE Research Studentship from the UK Engineering and Physical Sciences Research Council in cooperation with BAE SYSTEMS.

Introduction
- Learning in iterated normal form games.
- A simple environment.
- Theoretical properties of multiagent Q-learning.

Notation
- N players.
- Player i plays mixed strategy π^i.
- The opponents' joint mixed strategy is π^{−i}.
- The expected reward to player i for playing action a is r^i(a, π^{−i}).
- It is estimated by the value Q^i(a).

Mixed strategies
- Mixed equilibria are necessary.
- Mixed strategies are obtained from the values Q^i.
- Boltzmann smoothing with a fixed temperature parameter τ (sketched below).
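A minimal sketch of this smoothing step (the function name and the use of NumPy are illustrative, not from the talk):

```python
import numpy as np

def boltzmann_strategy(q_values, temperature):
    """Map a vector of Q values to a mixed strategy via Boltzmann smoothing."""
    # Subtracting the max before exponentiating avoids numerical overflow.
    weights = np.exp((q_values - np.max(q_values)) / temperature)
    return weights / weights.sum()
```

Because the temperature stays fixed and positive, every action keeps strictly positive probability, which is what removes the discontinuities of an exact best response.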

Fixed temperatures
- The Nash distribution approximates a Nash equilibrium.
- No discontinuities: the smoothed best response is continuous in the values.
- True convergence to mixed strategies.

Q-learning
- Standard Q-learning, except for division by the probability of the played action (sketched in code below):
  Q^i_{n+1}(a) = Q^i_n(a) + λ_n · (1{a^i_n = a} / π^i_n(a)) · (r^i_n − Q^i_n(a)).
- 1{·} is the indicator function and r^i_n is the received reward.
- Learning parameters satisfy Σ_n λ_n = ∞ and Σ_n λ_n² < ∞.
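A sketch of this update, assuming the reconstruction above is the intended rule (all names are illustrative):

```python
import numpy as np

def q_update(q_values, strategy, action, reward, step_size):
    """One step of the modified Q-learning rule.

    Only the played action's value changes; dividing the increment by the
    probability of playing that action means each action's value is updated
    at the same rate in expectation, whatever the current mixed strategy.
    """
    q_new = q_values.copy()
    q_new[action] += step_size * (reward - q_values[action]) / strategy[action]
    return q_new
```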

Three player pennies
Each of three players chooses heads or tails. Player 1 scores 1 point if their choice matches player 2's, player 2 scores 1 point if their choice matches player 3's, and player 3 scores 1 point if their choice is opposite to player 1's. The only Nash equilibrium is fully mixed.
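The payoff structure is simple enough to state in code (a hypothetical helper, with choices encoded as 0 for heads and 1 for tails):

```python
def pennies_rewards(c1, c2, c3):
    """Rewards for the three players in three player pennies."""
    return (int(c1 == c2),   # player 1 scores if matching player 2
            int(c2 == c3),   # player 2 scores if matching player 3
            int(c3 != c1))   # player 3 scores if differing from player 1
```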

[Figure: a plot of Q values, which cycle rather than converge.]

Stochastic approximation
- Relate the discrete-time learning process to an ODE.
- λ_n → 0 with Σ_n λ_n = ∞ implies the values track solutions of
  dQ^i(a)/dt = r^i(a, π^{−i}(Q)) − Q^i(a),
  where π(Q) denotes the Boltzmann-smoothed strategies.
- A deterministic, continuous-time system.

Analysis of the example
- The ODE has a unique fixed point.
- Small temperatures make the fixed point unstable; a periodic orbit is stable.
- This explains the cycling of the Q values.

Multiple timescales - I
- Generalise stochastic approximation to several coupled processes.
- Step sizes satisfy λ_n(i+1) / λ_n(i) → 0 for i = 1, …, N−1 (an example family follows).
- The quicker λ_n(i) decreases to zero, the slower the ith process adapts.
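One family of schedules satisfying these conditions (the exponents are illustrative, not taken from the talk):

```python
def step_sizes(n, exponents=(0.6, 0.8, 1.0)):
    """Timescale-separated step sizes lambda_n(i) = n**(-e_i).

    Each exponent lies in (1/2, 1], so sum lambda_n diverges while
    sum lambda_n**2 converges, and lambda_n(i+1) / lambda_n(i) -> 0,
    so player i+1 adapts on a slower timescale than player i.
    """
    return [n ** (-e) for e in exponents]
```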

Multiple timescales - II
- Fast processes can fully adapt to slow processes.
- Slow processes see fast processes as having completely converged.
- This works if the fast processes converge to a unique value for each fixed value of the slow processes.

Multiple-timescales Q-learning assumption
- Assume that, for fixed Q^1, the values Q^2, …, Q^N of the faster players converge to a unique value, resulting in a joint best response.
- For example, this holds for two-player games and for cyclic games.

Convergence of multiple-timescales Q-learning
- Behaviour is determined by the ODE for the slow player's values, with the faster players held at their converged joint response.
- Convergence can be proved if player 1 has only two actions.
- Hence the process converges for three player pennies (a simulation sketch follows).
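Putting the earlier sketches together, a hypothetical simulation of multiple-timescales Q-learning on three player pennies could look like this (it reuses boltzmann_strategy, q_update, pennies_rewards, and step_sizes from above; the temperature and horizon are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.1                             # fixed Boltzmann temperature
q = [np.zeros(2) for _ in range(3)]   # two actions per player

for n in range(1, 100_001):
    strategies = [boltzmann_strategy(q[i], tau) for i in range(3)]
    actions = [int(rng.choice(2, p=s)) for s in strategies]
    rewards = pennies_rewards(*actions)
    lams = step_sizes(n)              # player i learns on the i-th timescale
    for i in range(3):
        q[i] = q_update(q[i], strategies[i], actions[i], rewards[i], lams[i])
```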

[Figure: another plot of Q values.]

Conclusion
- A theoretical study of multiagent learning.
- A fixed temperature parameter achieves mixed equilibria from Q values.
- Multiple timescales assist convergence and enable theoretical study.

Future work
- Investigate when the convergence assumption holds.
- Experiments with multiple-timescales learning in Markov games.
- Theoretical results for multiple-timescales learning in Markov games.