Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs
Rosemary Emery-Montemerlo, joint work with Geoff Gordon, Jeff Schneider, and Sebastian Thrun

Presentation transcript:

Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs
Rosemary Emery-Montemerlo, joint work with Geoff Gordon, Jeff Schneider, and Sebastian Thrun
July 21, 2004, AAMAS 2004

Robot Teams
- With limited communication, existing paradigms for decentralized robot control are not sufficient.
- Game-theoretic methods are necessary for multi-robot coordination under these conditions.

Decentralized Decision Making
- A robot cannot choose actions based only on the joint observations consistent with its own sensor readings.
- It must consider all joint observations that are consistent with its possible sensor readings.

Relationship Between Decision-Theoretic Models
[Diagram: an MDP plans over the state space, a POMDP plans over the belief space, and the open question ("?") is the model that plans over a distribution over the belief space.]

Models of Multi-Agent Systems
- Partially observable stochastic games (POSGs): a generalization of stochastic games to partially observable worlds.
- Related models:
  - DEC-POMDP [Bernstein et al., 2000]
  - MTDP [Pynadath and Tambe, 2002]
  - I-POMDP [Gmytrasiewicz and Doshi, 2004]
  - POIPSG [Peshkin et al., 2000]

Partially Observable Stochastic Games
A POSG is a tuple {I, S, A, Z, T, R, O}:
- I is the set of agents, I = {1, …, n}
- S is the set of states
- A is the set of joint actions, A = A₁ × … × Aₙ
- Z is the set of joint observations, Z = Z₁ × … × Zₙ
- T is the transition function, T: S × A → S
- R is the reward function, R: S × A → ℝ
- O gives the observation emission probabilities, O: S × Z × A → [0, 1]
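
For concreteness, here is a minimal sketch of how these components might be held in code; this is an illustration, not the paper's implementation, and the class and field names are assumptions. The transition is written as a distribution over next states, since POSG dynamics are stochastic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

JointAction = Tuple[int, ...]       # one action index per agent
JointObservation = Tuple[int, ...]  # one observation index per agent

@dataclass
class POSG:
    """Container for the POSG tuple {I, S, A, Z, T, R, O} (illustrative)."""
    n_agents: int                    # I = {1, ..., n}
    states: List[int]                # S
    actions: List[List[int]]         # A_i for each agent; A = A_1 x ... x A_n
    observations: List[List[int]]    # Z_i for each agent; Z = Z_1 x ... x Z_n
    transition: Callable[[int, JointAction], Dict[int, float]]            # T(s, a) -> {s': prob}
    reward: Callable[[int, JointAction], float]                           # R(s, a), shared by all agents
    observation_fn: Callable[[int, JointObservation, JointAction], float] # O(s', z, a) in [0, 1]
```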

Solving POSGs
- POSGs are computationally infeasible to solve exactly (even the common-payoff case, the DEC-POMDP, is NEXP-complete [Bernstein et al., 2000]).

Solving POSGs
[Diagram: the full POSG is replaced, at each time t, by a one-step lookahead game (a Bayesian game).]
- We can approximate a POSG as a series of smaller Bayesian games.

Bayesian Games
- Players have private information relevant to the game, which creates uncertainty about utilities.
- A type encapsulates a player's private information.
- We limit ourselves to games with a finite number of types.
- In the robot example:
  - Type 1: the robot doesn't see anything.
  - Type 2: the robot sees the intruder at location x.

Bayesian Games
A Bayesian game is a tuple BG = {I, Θ, A, p(Θ), u}:
- Θ is the joint type space, Θ = Θ₁ × … × Θₙ
- θ is a specific joint type, θ = {θ₁, …, θₙ}
- p(Θ) is the common prior on the distribution over Θ
- u is the utility function, u = {u₁, …, uₙ}, with uᵢ(aᵢ, a₋ᵢ, (θᵢ, θ₋ᵢ))
- σᵢ is a strategy for player i: it defines what player i does for each of its possible types
- Actions are individual actions, not joint actions
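
As an illustration (not the paper's code), a common-payoff Bayesian game can be stored as its type space, prior, and utility table; the expected payoff of a joint strategy σ, where each σᵢ maps types to actions, is then an average over the prior. The names below are assumptions.

```python
from typing import Dict, List, Tuple

JointType = Tuple[int, ...]
JointAction = Tuple[int, ...]

class BayesianGame:
    """Common-payoff Bayesian game {I, Theta, A, p(Theta), u} (illustrative structure)."""
    def __init__(self, types: List[List[int]], actions: List[List[int]],
                 prior: Dict[JointType, float],
                 utility: Dict[Tuple[JointType, JointAction], float]):
        self.types = types        # Theta_i for each agent
        self.actions = actions    # A_i for each agent
        self.prior = prior        # p(theta) for each joint type
        self.utility = utility    # u(theta, a), shared by all agents (common payoffs)

    def expected_utility(self, strategy: List[Dict[int, int]]) -> float:
        """Expected common payoff when agent i plays action strategy[i][theta_i]."""
        total = 0.0
        for theta, p in self.prior.items():
            a = tuple(strategy[i][theta_i] for i, theta_i in enumerate(theta))
            total += p * self.utility[(theta, a)]
        return total
```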

Bayesian-Nash Equilibrium
- A set of mutually best-response strategies.
- Each agent maximizes its expected utility conditioned on its probability distribution over the other agents' types, p(θ₋ᵢ).
- Each agent has a policy σᵢ that, given σ₋ᵢ, maximizes the expectation of uᵢ(σᵢ(θᵢ), σ₋ᵢ(θ₋ᵢ), (θᵢ, θ₋ᵢ)).
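
For a common-payoff game, one simple way to search for such an equilibrium is alternating maximization: repeatedly fix all but one agent's strategy and replace that agent's strategy with a best response until nothing changes. This is a sketch of a local search (it reaches an equilibrium, not necessarily the best one) built on the illustrative BayesianGame structure above.

```python
def best_response(game, strategy, i):
    """Best response for agent i: for each of its types, choose the action that
    maximizes expected common payoff given the other agents' fixed strategies."""
    new_sigma_i = {}
    for theta_i in game.types[i]:
        best_action, best_value = None, float("-inf")
        for a_i in game.actions[i]:
            candidate = dict(strategy[i])
            candidate[theta_i] = a_i
            value = game.expected_utility(strategy[:i] + [candidate] + strategy[i + 1:])
            if value > best_value:
                best_action, best_value = a_i, value
        new_sigma_i[theta_i] = best_action
    return new_sigma_i

def alternating_maximization(game, init_strategy, max_iters=100):
    """Iterate per-agent best responses until a fixed point, i.e. a set of
    mutually best-response strategies for the common-payoff Bayesian game."""
    strategy = [dict(sigma_i) for sigma_i in init_strategy]
    for _ in range(max_iters):
        changed = False
        for i in range(len(strategy)):
            sigma_i = best_response(game, strategy, i)
            if sigma_i != strategy[i]:
                strategy[i] = sigma_i
                changed = True
        if not changed:
            break
    return strategy
```

Random restarts from several initial strategies are a common way to avoid settling on a poor local optimum.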

POSG to Bayesian Game Approximation
At each time step t, the POSG {I, S, A, Z, T, R, O} is approximated by a Bayesian game {I, Θᵗ, A, p(Θᵗ), u}:
- I and A carry over unchanged.
- Type space: Θᵢᵗ = all possible histories of agent i's actions and observations up to time t.
- p(Θᵗ) is calculated from S₀, A, T, Z, O, and Θᵗ⁻¹.
- Low-probability types are pruned.
- Each joint type θ maps to a joint belief.
- u is given by a heuristic, with uᵢ = uⱼ (common payoffs); here QMDP is used.
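
A rough sketch of how the joint type space could be propagated and pruned between time steps; the function and parameter names (policy, obs_prob, prune_threshold) are assumptions, and computing the QMDP-based utilities for the new types is omitted.

```python
def propagate_types(joint_types, policy, joint_observations, obs_prob, prune_threshold=1e-4):
    """Extend each joint type (one action/observation history per agent) by the
    action the current policy prescribes and every possible joint observation,
    weight by observation probability, prune low-probability types, renormalize."""
    new_types = {}
    for theta, p in joint_types.items():
        a = policy(theta)                      # joint action prescribed for this joint type
        for z in joint_observations:           # joint observations z = (z_1, ..., z_n)
            w = p * obs_prob(theta, a, z)      # p(z | theta, a), e.g. from the belief for theta
            if w <= 0.0:
                continue
            # append (a_i, z_i) to each agent's individual history
            extended = tuple(h + (a_i, z_i) for h, a_i, z_i in zip(theta, a, z))
            new_types[extended] = new_types.get(extended, 0.0) + w
    kept = {t: w for t, w in new_types.items() if w >= prune_threshold}
    total = sum(kept.values())
    return {t: w / total for t, w in kept.items()} if total else {}
```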

Algorithm
Every agent runs the same loop in parallel; shown here for agent i (agent j is identical):
- Initialize: t = 0, hᵢ = {}, p(Θ⁰); σ⁰ = solveGame(Θ⁰, p(Θ⁰))
- Repeat:
  - Make observation: hᵢ = obsᵢᵗ ∪ aᵢᵗ⁻¹ ∪ hᵢ
  - Determine type: θᵢᵗ = bestMatch(hᵢ, Θᵢᵗ)
  - Execute action: aᵢᵗ = σᵢᵗ(θᵢᵗ)
  - Propagate forward: Θᵗ⁺¹, p(Θᵗ⁺¹)
  - Find policy for t+1: σᵗ⁺¹ = solveGame(Θᵗ⁺¹, p(Θᵗ⁺¹))
  - t = t + 1
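
A sketch of one agent's runtime loop in Python; solve_game, best_match, propagate, get_observation, and execute_action stand in for the slide's solveGame, bestMatch, forward propagation, sensing, and actuation steps and are assumed to be supplied by the system.

```python
def run_agent(i, horizon, initial_types, initial_prior,
              solve_game, best_match, propagate, get_observation, execute_action):
    """One agent's view of the execution loop. Every agent runs identical code
    from the same prior, so they all compute the same sequence of joint policies."""
    types, prior = initial_types, initial_prior
    sigma = solve_game(types, prior)              # sigma^0: joint policy for t = 0
    history = ()                                  # h_i: this agent's own actions and observations
    last_action = None
    for t in range(horizon):
        obs = get_observation(i, t)               # obs_i^t from the agent's own sensors
        if last_action is not None:
            history = history + (last_action,)    # fold in a_i^{t-1}
        history = history + (obs,)
        theta_i = best_match(history, types, i)   # closest surviving type (low-probability ones were pruned)
        action = sigma[i][theta_i]                # a_i^t = sigma_i^t(theta_i^t)
        execute_action(i, action)
        last_action = action
        types, prior = propagate(types, prior, sigma)   # Theta^{t+1}, p(Theta^{t+1})
        sigma = solve_game(types, prior)                # joint policy for t + 1
    return history
```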

Robotic Team Tag
- A robotic version of the Team Tag problem.
- The environment is a portion of Gates Hall.
- Full teammate observability.
- The opponent can be captured by a single robot in any state.
- QMDP is used as the utility heuristic.
- Two Pioneer-class robots.

Robot Policies

Lady and the Tiger [Nair et al., 2003]

Contributions
- An algorithm for finding approximate solutions to POSGs with common payoffs.
- Tractability is achieved by modeling the POSG as a sequence of Bayesian games.
- Performs comparably to solving the full POSG on a small finite-horizon problem.
- Improved performance over 'blind' application of the utility heuristic on more complex problems.
- A successful real-time game-theoretic controller for indoor robots.

Questions?

Back-Up Slides

Lady and the Tiger [Nair et al., 2003]

Robotic Team Tag
- I = {1, 2}
- S = S₁ × S₂ × S_opponent
- Sᵢ = {s₀, …, s₂₈}; S_opponent = {s₀, …, s₂₈, s_tagged}
- |S| = 29 × 29 × 30 = 25,230
- Aᵢ = {N, S, E, W, Tag}
- Zᵢ = [{sᵢ, −1}, s₋ᵢ, a₋ᵢ]
- T: robots move to adjacent cells
- O: a robot sees the opponent only when they occupy the same cell
- R: minimize capture time
- Modified from [Pineau et al., 2003]
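
For concreteness, the joint state and action spaces from this slide could be enumerated as below (a sketch; the cell numbering s₀ … s₂₈ follows the slide).

```python
from itertools import product

CELLS = list(range(29))                   # s_0 ... s_28
OPPONENT_STATES = CELLS + ["tagged"]      # the opponent can also be in s_tagged
ACTIONS = ["N", "S", "E", "W", "Tag"]     # A_i, identical for both robots

# S = S_1 x S_2 x S_opponent
JOINT_STATES = list(product(CELLS, CELLS, OPPONENT_STATES))
# len(JOINT_STATES) == 29 * 29 * 30
```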

Environment

Robotic Team Tag Results
