
Outline
MDP (brief)
– Background
– Learning MDP: Q-learning
Game theory (brief)
– Background
Markov games (2-player)
– Background
– Learning Markov games: Littman's Minimax-Q learning (zero-sum), Hu & Wellman's Nash-Q learning (general-sum)

SG / POSG: stochastic games (SG), partially observable stochastic games (POSG)

Terms of the value equation: immediate reward; expectation over next states; value of next state
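
The equation these three labels annotate did not survive in the transcript. A standard form of the Bellman optimality equation that matches the labels is sketched below; the symbols R (reward function), T (transition function) and γ (discount factor) are assumed notation, not taken from the slide.

```latex
V^*(s) = \max_{a \in A} \Big[
    \underbrace{R(s,a)}_{\text{immediate reward}}
    + \gamma \underbrace{\sum_{s' \in S} T(s,a,s')}_{\text{expectation over next states}}
      \underbrace{V^*(s')}_{\text{value of next state}}
\Big]
```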

Model-based reinforcement learning:
1. Learn the reward function and the state transition function
2. Solve for the optimal policy
Model-free reinforcement learning:
1. Directly learn the optimal policy without knowing the reward function or the state transition function

Number of times action a has been executed in state s
Number of times action a causes the state transition s → s'
Total reward accrued when applying a in s
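
The three counters above are enough to build maximum-likelihood estimates of the transition and reward functions for model-based learning. The following is a minimal sketch under that assumption; the class name `ModelEstimator` and its method names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ModelEstimator:
    """Count-based estimates of T(s, a, s') and R(s, a) (illustrative sketch)."""

    def __init__(self):
        self.n_sa = defaultdict(int)          # times a was executed in s
        self.n_sas = defaultdict(int)         # times a in s led to s'
        self.reward_sum = defaultdict(float)  # total reward accrued for a in s

    def observe(self, s, a, r, s_next):
        """Record one experienced transition (s, a, r, s')."""
        self.n_sa[(s, a)] += 1
        self.n_sas[(s, a, s_next)] += 1
        self.reward_sum[(s, a)] += r

    def transition_prob(self, s, a, s_next):
        """Estimated probability of reaching s' after a in s (0.0 if unseen)."""
        n = self.n_sa[(s, a)]
        return self.n_sas[(s, a, s_next)] / n if n else 0.0

    def expected_reward(self, s, a):
        """Estimated mean reward for a in s (0.0 if unseen)."""
        n = self.n_sa[(s, a)]
        return self.reward_sum[(s, a)] / n if n else 0.0
```

Once enough transitions have been observed, the estimates can be handed to any planner (for example value iteration) to solve for a policy.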


1. Start with arbitrary initial values of Q(s,a), for all s ∈ S, a ∈ A
2. At each time t the agent chooses an action and observes its reward r_t
3. The agent then updates its Q-values based on the Q-learning rule
4. The learning rate α_t needs to decay over time in order for the learning algorithm to converge
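
The four steps above can be sketched as tabular Q-learning on a toy problem. Everything specific here is an assumption for illustration: the three-state chain environment, the uniformly random behaviour policy, the discount factor 0.9, and the schedule α_t = 1 / (number of visits to the state-action pair). The update itself is the standard Q-learning rule, Q(s,a) ← (1 − α)Q(s,a) + α(r + γ max_a' Q(s',a')).

```python
import random
from collections import defaultdict

GAMMA = 0.9
ACTIONS = (0, 1)   # 0 = stay, 1 = move right
TERMINAL = 2


def step(state, action):
    """Hypothetical chain environment: returns (reward, next_state)."""
    if action == 1:
        next_state = state + 1
        return (1.0 if next_state == TERMINAL else 0.0), next_state
    return 0.0, state


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)       # step 1: arbitrary initial values (here 0)
    visits = defaultdict(int)
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            a = rng.choice(ACTIONS)          # step 2: choose an action, observe reward
            r, s_next = step(s, a)
            visits[(s, a)] += 1
            alpha = 1.0 / visits[(s, a)]     # step 4: decaying learning rate
            if s_next == TERMINAL:
                best_next = 0.0
            else:
                best_next = max(q[(s_next, b)] for b in ACTIONS)
            # step 3: Q-learning rule
            q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + GAMMA * best_next)
            s = s_next
    return q


def greedy(q, state):
    """Action with the highest learned Q-value in the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

Because Q-learning is off-policy, the random behaviour policy is sufficient here; in practice an exploration scheme such as ε-greedy is more common.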

Famous game theory example

A co-operative game

Mixed strategy Generalization of MDP

Stationary: the agent's policy does not change over time
Deterministic: the same action is always chosen whenever the agent is in state s

Example State 1 State

v(s, π*) ≥ v(s, π) for all s ∈ S and all policies π

Max V such that: π_rock + π_paper + π_scissors = 1
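
Only the normalisation constraint of this linear program survives in the transcript. As a hedged illustration of what "maximise V" means, the sketch below assumes the usual rock-paper-scissors payoffs (+1 win, −1 loss, 0 draw) and replaces the LP solver with a brute-force grid search over mixed strategies; both the payoff table and the grid search are assumptions, not the slide's method.

```python
from fractions import Fraction

ACTIONS = ("rock", "paper", "scissors")
# PAYOFF[a][o]: reward to the agent playing a against opponent action o (assumed)
PAYOFF = {
    "rock":     {"rock": 0, "paper": -1, "scissors": 1},
    "paper":    {"rock": 1, "paper": 0, "scissors": -1},
    "scissors": {"rock": -1, "paper": 1, "scissors": 0},
}


def worst_case_value(pi):
    """Expected payoff of mixed strategy pi against the opponent's best reply."""
    return min(sum(pi[a] * PAYOFF[a][o] for a in ACTIONS) for o in ACTIONS)


def maximin_grid(steps=30):
    """Search strategies whose probabilities are multiples of 1/steps."""
    best_pi, best_v = None, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j   # enforces pi_rock + pi_paper + pi_scissors = 1
            probs = (Fraction(i, steps), Fraction(j, steps), Fraction(k, steps))
            pi = dict(zip(ACTIONS, probs))
            v = worst_case_value(pi)
            if best_v is None or v > best_v:
                best_pi, best_v = pi, v
    return best_pi, best_v
```

With `steps` divisible by 3, the search lands on the uniform strategy, the only one whose worst-case value is not negative. A real implementation would solve the linear program directly rather than enumerate a grid.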

Terms of the minimax value: best response; worst case; expectation over all actions
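
The formula these labels point at is missing from the transcript. A form consistent with them, following the notation of Littman's minimax-Q formulation, is sketched below; here PD(A) denotes the set of probability distributions over the agent's actions A, O is the opponent's action set, and the mapping of labels to terms is an assumption.

```latex
V(s) = \underbrace{\max_{\pi \in PD(A)}}_{\text{best response}}
       \; \underbrace{\min_{o \in O}}_{\text{worst case}}
       \; \underbrace{\sum_{a \in A} Q(s,a,o)\, \pi_a}_{\text{expectation over all actions}}
```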

Quality of a state-action pair
Discounted value of all succeeding states weighted by their likelihood
Discounted value of all succeeding states
This learning rule converges to the correct values of Q and v
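
The equations annotated by these labels are not in the transcript. Under the assumption that the slide followed the usual minimax-Q formulation, with V(s) the minimax value of state s, T the transition function, γ the discount factor and α the learning rate, the definition of Q and its sampled learning rule would read roughly as follows.

```latex
% Quality of a state-action pair: immediate reward plus the discounted value
% of all succeeding states weighted by their likelihood
Q(s,a,o) = R(s,a,o) + \gamma \sum_{s' \in S} T(s,a,o,s')\, V(s')

% Learning rule: the observed next state s' replaces the weighted sum
Q(s,a,o) \leftarrow (1-\alpha)\, Q(s,a,o) + \alpha \big( r + \gamma V(s') \big)
```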

explor controls how often the agent will deviate from its current policy
Q(s,a,o): expected reward for taking action a when the opponent chooses o from state s
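
A minimal sketch of how an exploration parameter of this kind is typically used: with probability `explor` the agent picks a uniformly random action, otherwise it samples from its current mixed policy. The function name `choose_action` and its signature are hypothetical.

```python
import random


def choose_action(policy, actions, explor, rng=random):
    """Sample an action, deviating from `policy` with probability `explor`.

    policy: dict mapping each action to its probability under the current policy.
    actions: list of available actions.
    rng: a random.Random instance (or the random module) used for sampling.
    """
    if rng.random() < explor:
        return rng.choice(actions)   # exploratory move, uniform over actions
    weights = [policy[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Setting `explor` to 0 makes the agent follow its policy exactly, while 1 makes every move uniformly random.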

Hu and Wellman: general-sum Markov games as a framework for RL
Theorem (Nash, 1951): There exists a mixed-strategy Nash equilibrium for any finite bimatrix game
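
To make the theorem concrete, the sketch below checks whether a given pair of mixed strategies is a Nash equilibrium of a bimatrix game. It relies on the fact that checking deviations to pure strategies is sufficient. The function names and the matching-pennies example in the usage note are illustrative assumptions, not part of the slides.

```python
def expected_payoff(payoff, x, y):
    """Expected payoff when the row player mixes with x and the column player with y."""
    return sum(x[i] * y[j] * payoff[i][j]
               for i in range(len(x)) for j in range(len(y)))


def is_nash(A, B, x, y, tol=1e-9):
    """True if neither player gains more than tol by a unilateral pure deviation.

    A[i][j] is the row player's payoff and B[i][j] the column player's payoff
    when the row player plays i and the column player plays j.
    """
    u_row = expected_payoff(A, x, y)
    u_col = expected_payoff(B, x, y)
    best_row = max(sum(y[j] * A[i][j] for j in range(len(y)))
                   for i in range(len(x)))
    best_col = max(sum(x[i] * B[i][j] for i in range(len(x)))
                   for j in range(len(y)))
    return best_row <= u_row + tol and best_col <= u_col + tol
```

For matching pennies, `A = [[1, -1], [-1, 1]]` and `B = [[-1, 1], [1, -1]]`, the uniform pair `([0.5, 0.5], [0.5, 0.5])` passes the check, while no pair of pure strategies does, which is why the theorem needs mixed strategies.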