Reinforcement Learning Presentation: Markov Games as a Framework for Multi-agent Reinforcement Learning (Michael L. Littman). Presented by Jinzhong Niu, March 30, 2004.

Slide 2: Overview
The MDP framework can describe only single-agent environments, so a new mathematical framework is needed to support multi-agent reinforcement learning. Markov games are such a framework; this paper explores a single step in that direction: two-player zero-sum Markov games.

Slide 3: Definitions – Markov Decision Process (MDP)
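The formal definition on this slide was an image and did not survive transcription; a standard statement consistent with the paper (the notation is assumed, not copied from the slide) is:

```latex
% A Markov Decision Process is a tuple <S, A, T, R>:
% T gives a distribution over next states; R is the reward function.
\langle S, A, T, R \rangle, \qquad
T : S \times A \to \Delta(S), \qquad
R : S \times A \to \mathbb{R}.
% Objective: a policy maximizing the expected discounted return,
E\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right], \qquad 0 \le \gamma < 1.
```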

Slide 4: Definitions (cont.) – Markov Game (MG)
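Again the slide's formula was an image; a standard definition of a k-agent Markov game, consistent with the paper's framework, is:

```latex
% A Markov game: joint actions drive the transitions, and each
% agent i has its own reward function.
\langle S, A_{1}, \dots, A_{k}, T, R_{1}, \dots, R_{k} \rangle, \qquad
T : S \times A_{1} \times \cdots \times A_{k} \to \Delta(S), \qquad
R_{i} : S \times A_{1} \times \cdots \times A_{k} \to \mathbb{R}.
```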

Slide 5: Definitions (cont.) – Two-Player Zero-Sum Markov Game (2P-MG)
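The slide's definition was also an image; specializing the Markov game above to the two-player zero-sum case gives:

```latex
% The agent chooses a in A, the opponent chooses o in O,
% and the two rewards sum to zero.
\langle S, A, O, T, R \rangle, \qquad
T : S \times A \times O \to \Delta(S), \qquad
R : S \times A \times O \to \mathbb{R},
% with the opponent's reward fixed at -R (zero-sum).
R_{\mathrm{opponent}} = -R.
```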

Slide 6: Is the 2P-MG Framework Expressive Enough?
Yes, although the zero-sum restriction precludes cooperation. The framework generalizes both:
MDPs (when |O| = 1): the opponent has a constant behavior and can be viewed as part of the environment.
Matrix games (when |S| = 1): the environment holds no information, and rewards are decided entirely by the joint actions.

Slide 7: Matrix Games
Example: “rock, paper, scissors”
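The payoff table on this slide was an image; the standard zero-sum payoff matrix for this example (an assumption consistent with the game, not copied from the slide) is:

```latex
% Payoffs to the agent in rock-paper-scissors
% (rows: agent's action; columns: opponent's action;
%  both ordered rock, paper, scissors).
R = \begin{pmatrix}
  0 & -1 &  1 \\
  1 &  0 & -1 \\
 -1 &  1 &  0
\end{pmatrix}
```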

Slide 8: What Exactly Does ‘Optimality’ Mean?
MDP: a stationary, deterministic, undominated optimal policy always exists.
MG: a policy's performance depends on the opponent's policy, so policies cannot be evaluated out of context. Game theory therefore uses a new definition of optimality: an optimal policy is one that performs best in its worst case. At least one optimal policy exists, and it may or may not be deterministic, because the agent is uncertain of its opponent's move.

Slide 9: Finding the Optimal Policy – Matrix Games
The optimal agent's minimum expected reward should be as large as possible. Let V denote that minimum value, then consider how to maximize it.
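In symbols (the slide's formula was an image; this is the standard formulation, with notation assumed):

```latex
% The agent picks a mixed policy pi over its own actions so that its
% worst-case expected reward over opponent actions is maximized.
V = \max_{\pi \in \Delta(A)} \; \min_{o \in O} \; \sum_{a \in A} \pi_{a}\, R(a, o)
```

This max-min problem can be solved exactly by linear programming.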

Slide 10: Finding the Optimal Policy – MDP
Value of a state; quality of a state-action pair.
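The equations on this slide were images; the standard Bellman optimality equations they correspond to are:

```latex
% Value of a state and quality of a state-action pair.
V(s) = \max_{a \in A} Q(s, a)
\qquad
Q(s, a) = R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V(s')
```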

Slide 11: Finding the Optimal Policy – 2P-MG
Value of a state; quality of a state-action-opponent-action (s, a, o) triple.
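Again the slide's equations were images; the minimax analogues of the Bellman equations, consistent with the paper, are:

```latex
% The max over own actions becomes a minimax over a mixed policy pi
% against the opponent's action o.
V(s) = \max_{\pi \in \Delta(A)} \; \min_{o \in O} \; \sum_{a \in A} \pi_{a}\, Q(s, a, o)
\qquad
Q(s, a, o) = R(s, a, o) + \gamma \sum_{s' \in S} T(s, a, o, s')\, V(s')
```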

Slide 12: Learning Optimal Policies
Q-learning; minimax-Q learning.
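The two update rules shown on the slide as images differ in one place: minimax-Q replaces the max over successor actions with the minimax value of the successor state.

```latex
% Q-learning update (single-agent):
Q(s, a) \leftarrow (1-\alpha)\, Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') \right]
% Minimax-Q update, where V(s') is the minimax value of Q(s', \cdot, \cdot):
Q(s, a, o) \leftarrow (1-\alpha)\, Q(s, a, o) + \alpha \left[ r + \gamma\, V(s') \right]
```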

Slide 13: Minimax-Q Algorithm
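The slide shows the algorithm's pseudocode as an image. The following Python sketch reconstructs the core loop under our own naming: the `env` interface (`reset`, `step` returning next state, reward, and the opponent's observed action) and all function and variable names are assumptions, not the paper's; the exploration rate and learning-rate decay are chosen to echo the settings reported in the paper.

```python
import numpy as np
from scipy.optimize import linprog

def maxmin_value(Q_s):
    """Solve max_pi min_o sum_a pi[a] * Q_s[a, o] by linear programming.

    Q_s: (|A|, |O|) array of Q-values for one state.
    Returns (V, pi): the minimax value and the maximizing mixed policy.
    """
    n_a, n_o = Q_s.shape
    # Variables: x = [pi_1, ..., pi_nA, V]; linprog minimizes, so use -V.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For every opponent action o: V - sum_a pi[a] * Q_s[a, o] <= 0.
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # The policy's probabilities must sum to one.
    A_eq = np.ones((1, n_a + 1))
    A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]  # V is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:-1]

def minimax_q(env, n_states, n_a, n_o, steps=10**6,
              gamma=0.9, alpha=1.0, decay=0.9999954, explor=0.2):
    """Minimax-Q learning sketch over a hypothetical `env` interface."""
    rng = np.random.default_rng(0)
    Q = np.ones((n_states, n_a, n_o))           # optimistic initialization
    pi = np.full((n_states, n_a), 1.0 / n_a)    # start with uniform policies
    s = env.reset()
    for _ in range(steps):
        # Explore with probability `explor`, otherwise sample from pi[s].
        if rng.random() < explor:
            a = rng.integers(n_a)
        else:
            a = rng.choice(n_a, p=pi[s])
        s2, r, o = env.step(a)                  # opponent's action o is observed
        # Update Q toward reward plus discounted minimax value of s2.
        V2, _ = maxmin_value(Q[s2])
        Q[s, a, o] = (1 - alpha) * Q[s, a, o] + alpha * (r + gamma * V2)
        # Re-solve the LP at s to refresh the mixed policy there.
        _, pi[s] = maxmin_value(Q[s])
        alpha *= decay
        s = s2
    return Q, pi
```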

Slide 14: Experiment – The Problem: Soccer
A two-player zero-sum grid game from the paper: two players on a 4x5 grid, one of whom holds the ball; on each step both choose among N, S, E, W, and stand, and a player scores by carrying the ball into the opponent's goal.

Slide 15: Experiment – Training
Four agents were trained for 10^6 steps:
minimax-Q vs. a random opponent (MR); minimax-Q vs. itself (MM); Q-learning vs. a random opponent (QR); Q-learning vs. itself (QQ).

Slide 16: Experiment – Testing
Test 1: Is QR > MR? Test 2: Is QR << QQ? Test 3: Are QR and QQ 100% losers?

Slide 17: Contributions
A solution to two-player zero-sum Markov games via a modified Q-learning method in which the max operator is replaced by a minimax. Minimax can also be used in single-agent environments to avoid risky behavior.

Slide 18: Future Work
Possible performance improvements to the minimax-Q learning method: the linear program solved at every step accounts for most of the computational cost, and iterative methods could produce approximate minimax solutions much faster; approximate solutions are sufficient in practice.

Slide 19: Discussion
The paper states that the training was not sufficient for MR and MM to attain the optimal policy; how soon would they be able to do so?
It is claimed that MR and MM should break even with even the strongest opponent. Why?
After training and before testing, the agents' policies are fixed. What if they were not fixed, leaving the learning abilities in place? We could then examine how the agents adapt over the long run, e.g., how their winning rates change.
What is a “slow enough exponentially weighted average”?