Multi-Agent Learning Mini-Tutorial
Gerry Tesauro
IBM T.J. Watson Research Center

Outline
- Statement of the problem
- Tools and concepts from RL & game theory
- "Naïve" approaches to multi-agent learning
  - ordinary single-agent RL; no-regret learning
  - fictitious play
  - evolutionary game theory
- "Sophisticated" approaches
  - minimax-Q (Littman), Nash-Q (Hu & Wellman)
  - tinkering with learning rates: WoLF (Bowling), multiple-timescale Q-learning (Leslie & Collins)
  - "strategic teaching" (Camerer talk)
- Challenges and opportunities

Normal single-agent learning
- Assume the environment has observable states, characterizable expected rewards and state transitions, and that all of the above is stationary (MDP-ish).
- Non-learning, theoretical solution to the fully specified problem: the DP formalism.
- Learning: solve by trial and error without a full specification: RL + exploration, Monte Carlo, ... (a minimal sketch follows below).
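
For concreteness, a minimal tabular Q-learning sketch in Python. This is illustrative only: the dictionary-based Q table, the epsilon-greedy rule, and the hyperparameter values are my assumptions, not details from the slides.

    import random

    def epsilon_greedy(Q, s, actions, epsilon=0.1):
        # Explore with probability epsilon, otherwise act greedily w.r.t. the current Q.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q.get((s, a), 0.0))

    def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
        # Q(s,a) <- Q(s,a) + alpha * [ r + gamma * max_a' Q(s',a') - Q(s,a) ]
        best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
        return Q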

Multi-Agent Learning
- Problem: an agent tries to solve its learning problem while the other agents in the environment are simultaneously trying to solve their own learning problems.
- Non-learning, theoretical solution to the fully specified problem: game theory.

Basics of game theory
- A game is specified by: players (1...N), actions, and payoff matrices (functions of joint actions).
- [Payoff table illustration: A's action indexes the rows, B's action the columns; each cell lists A's payoff and B's payoff.]
- If the payoff matrices are identical, the game is cooperative, else non-cooperative (zero-sum = purely competitive).
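
As a concrete illustration (my example, not from this slide), the Prisoner's Dilemma written as a pair of payoff matrices in Python; the row player is A, the column player is B, and the actions are (cooperate, defect):

    import numpy as np

    # Rows = A's action, columns = B's action; entry [i, j] = payoff for that joint action.
    A_payoff = np.array([[3, 0],
                         [5, 1]])   # A's payoffs
    B_payoff = np.array([[3, 5],
                         [0, 1]])   # B's payoffs
    # The two matrices differ, so the game is non-cooperative; it is not zero-sum
    # (A_payoff + B_payoff != 0), and (defect, defect) is the unique Nash equilibrium.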

Basic lingo... (2)
- Games with no states: (bi-)matrix games
- Games with states: stochastic games, Markov games (state transitions are functions of joint actions)
- Games with simultaneous moves: normal form
- Games with alternating turns: extensive form
- No. of rounds = 1: one-shot game
- No. of rounds > 1: repeated game
- Deterministic action choice: pure strategy
- Non-deterministic action choice: mixed strategy

Basic analysis
- A joint strategy x is Pareto-optimal if there is no x' that improves every agent's payoff.
- An agent's strategy x_i is a dominant strategy if it is best regardless of the others' actions.
- x_i is a best response to the others' x_{-i} if it maximizes the agent's payoff given x_{-i}.
- A joint strategy x is an equilibrium if each agent's strategy is simultaneously a best response to everyone else's strategy, i.e. no agent has an incentive to deviate unilaterally (Nash, correlated).
- A Nash equilibrium always exists, but there may be exponentially many of them, and they are not easy to compute.
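
These definitions are easy to operationalize for two-player matrix games. A sketch, reusing the A_payoff / B_payoff arrays from the earlier Prisoner's Dilemma example (the function names are mine):

    import numpy as np

    def best_response_row(A_payoff, col_strategy):
        # Expected payoff of each row action against the column player's mixed strategy;
        # a best response is any action attaining the maximum.
        return int(np.argmax(A_payoff @ col_strategy))

    def is_pure_nash(A_payoff, B_payoff, i, j):
        # (i, j) is a pure Nash equilibrium if neither player can gain by deviating alone.
        return (A_payoff[i, j] >= A_payoff[:, j].max() and
                B_payoff[i, j] >= B_payoff[i, :].max())

    # e.g. is_pure_nash(A_payoff, B_payoff, 1, 1) -> True for the Prisoner's Dilemma above.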

What about imperfect-information games?
- A Nash equilibrium requires knowledge of all payoffs.
- For imperfect-information games, the corresponding concept is the Bayes-Nash equilibrium (Nash plus Bayesian inference over the hidden information).
- Even more intractable than regular Nash.

Can we make game theory more tractable?
- Active area of research.
- Symmetric games: payoffs are invariant under swapping of player labels. Can look for symmetric equilibria, where all agents play the same mixed strategy.
- Network games: agent payoffs depend only on interactions with a small number of neighbors.
- Summarization games: payoffs are simple summarization functions of the population's joint actions (e.g. voting).

Summary: pros and cons of game theory
- Game theory provides a nice conceptual/theoretical framework for thinking about multi-agent learning.
- Game theory is appropriate provided that:
  - the game is stationary and fully specified;
  - there is enough computing power to compute an equilibrium;
  - we can assume the other agents are also game theorists;
  - the equilibrium coordination problem can be solved.
- These conditions rarely hold in real applications → multi-agent learning is not only a fascinating problem, it may be the only viable option.

Naïve approaches to multi-agent learning
- Basic idea: the agent adapts, ignoring the non-stationarity of the other agents' strategies.
1. Fictitious play: the agent observes the time-average frequency of the other players' action choices and models each opponent's mixed strategy as that empirical frequency; the agent then plays a best response to this model (see the sketch below).
- Variants of fictitious play: exponential recency weighting, "smoothed" best response (~softmax), small adjustment toward the best response, ...
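
A minimal sketch of two-player fictitious play in Python, reusing the bimatrix representation from the earlier example (the function and variable names, and the uniform prior counts, are my choices):

    import numpy as np

    def fictitious_play(A_payoff, B_payoff, rounds=1000):
        # Counts of how often each player has chosen each action so far (uniform prior).
        counts_A = np.ones(A_payoff.shape[0])
        counts_B = np.ones(B_payoff.shape[1])
        for _ in range(rounds):
            # Each player best-responds to the empirical frequency of the other's past play.
            belief_about_B = counts_B / counts_B.sum()
            belief_about_A = counts_A / counts_A.sum()
            a = int(np.argmax(A_payoff @ belief_about_B))
            b = int(np.argmax(belief_about_A @ B_payoff))
            counts_A[a] += 1
            counts_B[b] += 1
        # Empirical (time-average) strategies: these are what may converge even if play cycles.
        return counts_A / counts_A.sum(), counts_B / counts_B.sum()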

What if all agents use fictitious play?
- Strict Nash equilibria are absorbing points for fictitious play.
- The typical result is limit-cycle behavior of the strategies, with the period increasing as N → ∞.
- In certain cases the product of the empirical distributions converges to Nash even though actual play cycles (penny-matching example).

More naïve approaches...
2. Evolutionary game theory: "replicator dynamics" models a large population of agents using different strategies, where the fittest agents breed more copies.
- Let x = the population strategy vector, with x_k = the fraction of the population playing strategy k. The growth rate is then dx_k/dt = x_k [ u(e_k, x) − u(x, x) ]: strategy k grows when its payoff against the population exceeds the population-average payoff.
- The same equation can also be derived from an "imitation" model.
- Nash equilibria are fixed points of this equation, but not necessarily attractors (they can be unstable or neutrally stable).
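
A sketch of the replicator dynamics as a simple discrete-time simulation in Python; the Euler step size and the Rock-Paper-Scissors example game are my assumptions for illustration:

    import numpy as np

    def replicator_step(x, payoff, dt=0.01):
        # x: population strategy vector (fraction playing each pure strategy).
        fitness = payoff @ x          # u(e_k, x): payoff of each pure strategy vs. the population
        avg_fitness = x @ fitness     # u(x, x): population-average payoff
        # dx_k/dt = x_k * [ u(e_k, x) - u(x, x) ]
        return x + dt * x * (fitness - avg_fitness)

    # Example: Rock-Paper-Scissors. Its interior Nash equilibrium is a fixed point of the
    # dynamics but only neutrally stable: trajectories orbit it rather than converge.
    rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
    x = np.array([0.5, 0.3, 0.2])
    for _ in range(10000):
        x = replicator_step(x, rps)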

Many possible dynamic behaviors...
[Figure panels: limit cycles; attractors; unstable fixed points]
Also saddle points, chaotic orbits, ...

Replicator dynamics: auction bidding strategies

More naïve approaches...
3. Iterated gradient ascent (Singh, Kearns, and Mansour): again a myopic adaptation to the other players' current strategies. Each player follows the gradient of its own payoff, giving a coupled system of linear differential equations, since the payoff u is linear in x_i and in x_{-i} (see the sketch below).
- Analysis for two-player, two-action games: the dynamics either converge to a Nash fixed point on the boundary (at least one player using a pure strategy), or produce limit cycles.
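
A rough sketch of one gradient-ascent step for a two-player, two-action game in Python; x and y are each player's probability of playing their first action, and the step size and clipping to [0, 1] are my implementation choices, not details from the slides:

    import numpy as np

    def iga_step(x, y, A_payoff, B_payoff, eta=0.001):
        # u_A(x, y) = [x, 1-x] A [y, 1-y]^T is bilinear, so d u_A / d x is linear in y
        # (and likewise d u_B / d y is linear in x).
        grad_x = (A_payoff[0] - A_payoff[1]) @ np.array([y, 1 - y])
        grad_y = np.array([x, 1 - x]) @ (B_payoff[:, 0] - B_payoff[:, 1])
        # Each player ascends its own payoff gradient, staying inside the probability simplex.
        x = float(np.clip(x + eta * grad_x, 0.0, 1.0))
        y = float(np.clip(y + eta * grad_y, 0.0, 1.0))
        return x, y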

Further naïve approaches...
4. Dumb single-agent learning: use a single-agent algorithm in a multi-agent problem and hope that it works.
- No-regret learning by pricebots (Greenwald & Kephart)
- Simultaneous Q-learning by pricebots (Tesauro & Kephart)
- In many cases this actually works: the learners converge either exactly or approximately to self-consistent optimal strategies.

"Sophisticated" approaches
- These take into account the possibility that other agents' strategies might change.
5. Multi-agent Q-learning (core value update sketched below):
- Minimax-Q (Littman): convergent algorithm for two-player zero-sum stochastic games.
- Nash-Q (Hu & Wellman): convergent algorithm for two-player general-sum stochastic games; requires a Nash equilibrium solver in the inner loop.
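
Minimax-Q replaces the max in the ordinary Q-learning backup with the minimax value of the next state's stage game. A sketch in Python using scipy's linear-programming routine; the per-state Q matrices, the V table, and the hyperparameters are my representational choices, not specified in the slides:

    import numpy as np
    from scipy.optimize import linprog

    def minimax_value(Q_s):
        # Q_s[a, o]: Q-values at one state for our action a and the opponent's action o.
        # Solve max_pi min_o sum_a pi[a] * Q_s[a, o] as a linear program in (pi, v).
        n_a, n_o = Q_s.shape
        c = np.zeros(n_a + 1)
        c[-1] = -1.0                                              # maximize v <=> minimize -v
        A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])             # v - pi^T Q[:, o] <= 0 for all o
        b_ub = np.zeros(n_o)
        A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])   # probabilities sum to 1
        b_eq = np.array([1.0])
        bounds = [(0, 1)] * n_a + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[-1], res.x[:n_a]                             # state value V(s), minimax policy

    def minimax_q_update(Q, V, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
        # Like Q-learning, but the backup bootstraps from the minimax value of the next state;
        # V[s_next] should be refreshed via minimax_value(Q[s_next]) after each update.
        Q[s][a, o] = (1 - alpha) * Q[s][a, o] + alpha * (r + gamma * V[s_next])
        return Q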

More sophisticated approaches: varying learning rates
- WoLF, "Win or Learn Fast" (Bowling): the agent reduces its learning rate when performing well and increases it when doing badly. Improves the convergence of IGA and of policy hill-climbing (a sketch of the rate-switching rule follows).
- Multi-timescale Q-learning (Leslie): different agents use different power laws t^{-n} for learning-rate decay; this achieves simultaneous convergence in cases where ordinary Q-learning doesn't.
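
A sketch of the WoLF rate-switching idea in the style of WoLF-PHC, in Python: "winning" is judged by comparing the current policy's expected value to that of the long-run average policy under the current Q estimates. The specific rate values are my assumptions.

    import numpy as np

    def wolf_learning_rate(pi, pi_avg, q_values, delta_win=0.01, delta_lose=0.04):
        # "Win or Learn Fast": learn slowly (small delta) when the current policy pi is doing
        # at least as well as the average policy pi_avg, and quickly (larger delta) otherwise.
        winning = pi @ q_values >= pi_avg @ q_values
        return delta_win if winning else delta_lose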

More sophisticated approaches
- "Strategic teaching": recognizes that the other players' strategies are adaptive.
- "A strategic teacher may play a strategy which is not myopically optimal (such as cooperating in Prisoner's Dilemma) in the hope that it induces adaptive players to expect that strategy in the future, which triggers a best-response that benefits the teacher." (Camerer, Ho and Chong)

Theoretical research challenges
- Proper theoretical formulation?
- "No short-cut" hypothesis: massive on-line search à la Deep Blue to maximize expected long-term reward (Bayesian):
  - model and predict the behavior of the other players, including how they learn based on my actions (beware of infinite model recursion);
  - trial-and-error exploration;
  - continual Bayesian inference, using all evidence, over all uncertainties (Boutilier: Bayesian exploration).
- When can you get away with simpler methods?

Real-world opportunities
- Multi-agent systems where you can't do game theory (covers everything :-))
- Electronic marketplaces (Kephart)
- Mobile networks (Chang)
- Self-managing computer systems (Kephart)
- Teams of robots (Bowling, Stone)
- Video games
- Military/counter-terrorism applications