Convergent Learning in Unknown Graphical Games. Dr Archie Chapman, Dr David Leslie, Dr Alex Rogers and Prof Nick Jennings. School of Mathematics, University of Bristol, and School of Electronics and Computer Science, University of Southampton.

Playing games?

Dense deployment of sensors to detect pedestrian and vehicle activity within an urban environment. Berkeley Engineering

Learning in games. Adapt to observations of past play and hope to converge to something “good”. Why? Bounded rationality justification of equilibrium; robustness to the behaviour of “opponents”; a language for describing distributed optimisation.

Notation. Players; discrete action sets; the joint action set; reward functions; mixed strategies; the joint mixed strategy space. Reward functions extend to mixed strategies by taking expectations.
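In one standard notation (the specific symbols are illustrative choices for this summary, not taken from the slides): players i \in \{1, \dots, N\}; discrete action sets A_i; joint action set A = A_1 \times \dots \times A_N; reward functions r_i : A \to \mathbb{R}; mixed strategies \pi_i \in \Delta(A_i); joint mixed strategy space \Delta = \Delta(A_1) \times \dots \times \Delta(A_N); and each r_i extends to \Delta by r_i(\pi) = E_{a \sim \pi}[ r_i(a) ].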

Best response / Equilibrium. The mixed strategies of all players other than i form the joint strategy of i's opponents. The best response of player i is the set of strategies maximising i's expected reward against that joint strategy. An equilibrium is a joint mixed strategy satisfying, for all i, the condition that player i's strategy is a best response to the strategies of the others.
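With \pi_{-i} denoting the joint strategy of all players other than i (notation as above), this reads:

b_i(\pi_{-i}) = \arg\max_{\pi_i \in \Delta(A_i)} r_i(\pi_i, \pi_{-i}),

and \pi^* is an equilibrium if and only if \pi_i^* \in b_i(\pi_{-i}^*) for every player i.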

Fictitious play. Estimate the strategies of the other players; use the known game matrix to select a best action given these estimates; then update the estimates after observing play.

Belief updates. The belief about the strategy of player i is the maximum likelihood estimate, i.e. the empirical frequency of i's past actions, and it can be updated online.
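Written out (a standard form of this recursion; the symbols are the ones used above):

\hat\pi^i_{t+1} = \hat\pi^i_t + \frac{1}{t+1} ( \mathbb{1}_{a^i_t} - \hat\pi^i_t ),

where \mathbb{1}_{a^i_t} is the indicator vector of the action that player i was observed to play at time t, so \hat\pi^i_t is simply the empirical frequency of i's past actions.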

Stochastic approximation. Consider processes of the form shown below, where the step sizes vanish but sum to infinity, the noise terms average out, and F is a set-valued map (convex-valued and u.s.c.). The limit points of such a process are contained in the chain-recurrent sets of the associated differential inclusion.
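The processes and limiting dynamics referred to here have the following form (symbols illustrative; this is the stochastic-approximation-for-differential-inclusions setting of Benaïm, Hofbauer and Sorin):

x_{t+1} - x_t \in \alpha_{t+1} ( F(x_t) + M_{t+1} + e_{t+1} ),

with step sizes \alpha_t \to 0 and \sum_t \alpha_t = \infty, M_t a martingale-difference noise sequence and perturbations e_t \to 0; the associated differential inclusion is dx/dt \in F(x).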

Best-response dynamics. Fictitious play has M and e identically zero and F equal to the best-response map minus the identity, so the beliefs track the best-response differential inclusion d\pi/dt \in b(\pi) - \pi. Limit points of the process are limit points of this best-response differential inclusion. In potential games (and zero-sum games and some others) the limit points must be Nash equilibria.

Generalised weakened fictitious play. Consider any process of the form shown below, in which exact best responses are relaxed to ε_t-best responses with ε_t → 0, the step sizes vanish but sum to infinity, and there is also an interplay between the step sizes and the noise term M. The convergence properties do not change.
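In the notation above (a paraphrase of the Leslie and Collins definition, with the same symbols as the stochastic approximation slide), a GWFP process is any process

\pi_{t+1} \in (1 - \alpha_{t+1}) \pi_t + \alpha_{t+1} ( b^{\epsilon_t}(\pi_t) + M_{t+1} ),

where \alpha_t \to 0 with \sum_t \alpha_t = \infty, \epsilon_t \to 0, b^{\epsilon} is the joint ε-best-response map, and the noise M_t satisfies a tail condition tied to the step sizes \alpha_t. Any such process has the same limiting behaviour as classical fictitious play.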

Learning the game

Reinforcement learning. Track the average reward for each joint action. If each joint action is played frequently enough, the estimates will be close to the expected values, and the estimated game converges to the true game.
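One simple way to track these averages is a running mean per observed joint action (a sketch consistent with the slide, not a quotation of the paper's update rule):

Q_{t+1}(a) = Q_t(a) + \lambda_t(a) ( r_t - Q_t(a) )   for the joint action a played at time t,

with Q_{t+1}(a') = Q_t(a') for every other joint action a', where r_t is the observed (noisy) reward and \lambda_t(a) is a step size such as one over the number of times a has been played so far.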

Q-learned fictitious play. Estimate the strategies of the other players; replace the unknown game matrix by an estimated game matrix learned from observed rewards; select a best action given both sets of estimates; then update the estimates.
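To make the loop concrete, here is a minimal Python sketch for one player of a two-player game. The variable names, the exploration scheme and the stand-in random opponent are illustrative choices of this summary, not details of the paper; in the full algorithm every player runs the same rule.

# Minimal sketch of Q-learned fictitious play for one player of a 2-player game.
import numpy as np

rng = np.random.default_rng(0)
n_self, n_opp = 3, 3

# Estimated game matrix: running average of observed rewards per joint action.
Q = np.zeros((n_self, n_opp))
counts = np.zeros((n_self, n_opp))

# Belief about the opponent's strategy: empirical action frequencies (start uniform).
opp_counts = np.ones(n_opp)

def true_reward(a_self, a_opp):
    """Stand-in for the unknown game: a noisy reward for the joint action."""
    payoff = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])  # simple coordination game
    return payoff[a_self, a_opp] + 0.1 * rng.standard_normal()

for t in range(5000):
    belief = opp_counts / opp_counts.sum()
    if rng.random() < 0.1:                     # keep playing all actions
        a_self = int(rng.integers(n_self))
    else:
        a_self = int(np.argmax(Q @ belief))    # best response to both estimates
    a_opp = int(rng.integers(n_opp))           # stand-in opponent

    r = true_reward(a_self, a_opp)

    # Update the estimated game matrix (running average for this joint action).
    counts[a_self, a_opp] += 1
    Q[a_self, a_opp] += (r - Q[a_self, a_opp]) / counts[a_self, a_opp]

    # Update the belief about the opponent's strategy.
    opp_counts[a_opp] += 1

print("Estimated game matrix:\n", np.round(Q, 2))
print("Belief about the opponent:", np.round(opp_counts / opp_counts.sum(), 3))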

Theoretical result. Theorem: if all joint actions are played infinitely often, then the beliefs follow a GWFP process. Proof sketch: the estimated game converges to the true game, so the selected strategies are ε_t-best responses with ε_t → 0.

Claus and Boutilier. Claus and Boutilier (1998) state a similar result, but it is restricted to team games.

Playing games? Dense deployment of sensors to detect pedestrian and vehicle activity within an urban environment. Berkeley Engineering

It’s impossible! N players, each with A actions: the game matrix has A^N entries to learn, and each individual must estimate the strategy of every other individual. It’s just not possible for realistic game scenarios.
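To put a number on this (an illustrative calculation, not a figure from the talk): with N = 20 sensors, each choosing among A = 4 actions,

A^N = 4^{20} = 2^{40} \approx 1.1 \times 10^{12},

so roughly a trillion joint-action entries would have to be estimated, and each sensor would have to track the strategies of 19 others.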

Marginal contributions. The marginal contribution of player i is the total system reward minus the system reward if i were absent. If every player maximises its marginal contribution, the system is at a (local) optimum. Crucially, a marginal contribution might depend only on the actions of a small number of neighbours.
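In symbols (an illustrative rendering of this definition, with U denoting the global system reward):

r_i(a) = U(a) - U(a_{-i}),

where a_{-i} is the joint action with player i removed, so U(a_{-i}) is the system reward that would be obtained if i were absent.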

Sensors – rewards: the global reward for a joint action, the marginal reward for sensor i, and the modified marginal reward actually used.

Marginal contributions

Graphical games

Local learning. Estimate the strategies of the neighbours only; replace the global game matrix by an estimated game matrix for the local interactions; select a best action given these estimates; then update the estimates.
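The gain is in the size of what has to be learned (an illustrative count, in the notation of the earlier slide): if player i has \nu_i neighbours, its local estimated game has A^{\nu_i + 1} entries rather than the A^N entries of the full game matrix, and i only needs beliefs about \nu_i other strategies instead of N - 1.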

Theoretical result. Theorem: if all joint actions of the local games are played infinitely often, then the beliefs follow a GWFP process. Proof sketch: the estimated local games converge to the true local games, so the selected strategies are ε_t-best responses with ε_t → 0.

Sensing results

So what?! Play converges to a (local) optimum using only noisy information and local communication. Each individual always chooses an action to maximise its expected reward given its information. If an individual doesn’t “play cricket”, the other individuals will still reach a point that is optimal conditional on the behaviour of the deviating individual.

Summary. Learning the game while playing is essential. This can be accommodated within the GWFP framework. Exploiting the neighbourhood structure of marginal contributions is essential for feasibility.