Fifth International Conference on Autonomous Agents and Multi-agent Systems (AAMAS-06) Exact Solutions of Interactive POMDPs Using Behavioral Equivalence.


Exact Solutions of Interactive POMDPs Using Behavioral Equivalence
Fifth International Conference on Autonomous Agents and Multi-agent Systems (AAMAS-06)
Speaker: Prashant Doshi, University of Georgia
Authors: B. Rathnasabapathy, Prashant Doshi, and Piotr Gmytrasiewicz

2 Overview
● I-POMDP
  – Framework for sequential decision making for an agent in a multi-agent setting
  – Takes the perspective of an individual in an interaction
● Problem
  – Cardinality of the interactive state space → infinite
    ● The other agent's models (incl. beliefs) are part of an agent's state space (interactive epistemology)
● An algorithm for solving I-POMDPs exactly
  – Aggregate behaviorally equivalent models of other agents

3 Background – Properties of POMDPs and I-POMDPs
● Finitely nested
  – Beliefs are nested up to a finite strategic level l
  – Level 0 models are POMDPs
● The value function of a POMDP and of a finitely nested I-POMDP is piecewise linear and convex (PWLC)
● Agents' behaviors in POMDPs and finitely nested I-POMDPs can be represented using policy trees
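The PWLC property means the value function is the upper surface of a finite set of linear functions (alpha-vectors) over the belief simplex. A minimal sketch, using made-up alpha-vector values for a 2-state tiger-style problem (the numbers are illustrative, not taken from the slides):

```python
import numpy as np

# Hypothetical alpha-vectors for a 2-state problem; each row is the value of
# one complete conditional plan as a linear function of the belief.
alpha_vectors = np.array([
    [  10.0, -100.0],  # e.g., a plan that opens the left door first
    [-100.0,   10.0],  # e.g., a plan that opens the right door first
    [  -1.0,   -1.0],  # e.g., a plan that listens first
])

def value(belief):
    """PWLC value function: the maximum over alpha-vectors at the given belief."""
    return float(np.max(alpha_vectors @ belief))

print(value(np.array([0.5, 0.5])))  # -> -1.0: listening is best when uncertain
```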

4 Interactive POMDPs
● Definition
● Interactive state space: IS_i = S × M_j, where M_j = Θ_j ∪ SM_j
  – S: set of physical states
  – Θ_j: set of intentional models of agent j
  – SM_j: set of subintentional models of agent j
  – Intentional models contain the other agent's beliefs
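As a rough illustration of this definition, an interactive state pairs a physical state with a model of the other agent, and intentional models carry that agent's belief. The class and field names below are assumptions for exposition, not notation from the paper:

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class IntentionalModel:
    """A model that ascribes a belief (and a frame) to the other agent j."""
    belief_j: tuple   # j's belief, itself defined over j's interactive states
    frame_j: str      # j's frame: actions, observations, reward function, etc.

@dataclass(frozen=True)
class SubintentionalModel:
    """A simpler model of j, e.g., a fixed policy or a no-information model."""
    name: str

Model = Union[IntentionalModel, SubintentionalModel]

@dataclass(frozen=True)
class InteractiveState:
    """IS_i = S x M_j: a physical state paired with a model of agent j."""
    physical_state: str
    model_of_j: Model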

5 Example: Single-Agent Tiger Problem
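For reference, a compact specification of the standard single-agent tiger problem, using the commonly cited parameter values (85% accurate growls, -1 listening cost, +10 / -100 door rewards); these may differ slightly from the exact figures on the slide:

```python
# Single-agent tiger problem: a tiger sits behind the left or right door.
states = ["tiger-left", "tiger-right"]
actions = ["listen", "open-left", "open-right"]
observations = ["growl-left", "growl-right"]

# Listening is informative but noisy: the growl points to the tiger's door 85% of the time.
O = {("listen", "tiger-left"):  {"growl-left": 0.85, "growl-right": 0.15},
     ("listen", "tiger-right"): {"growl-left": 0.15, "growl-right": 0.85}}

# Typical rewards: -1 for listening, +10 for opening the tiger-free door, -100 otherwise.
R = {("listen", "tiger-left"): -1, ("listen", "tiger-right"): -1,
     ("open-left", "tiger-right"): 10, ("open-left", "tiger-left"): -100,
     ("open-right", "tiger-left"): 10, ("open-right", "tiger-right"): -100}

# Opening a door resets the tiger's location uniformly; listening leaves it unchanged.
```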

6 Behaviorally Equivalent Models
[Figure: equivalence classes of beliefs, each class mapping to one of the policies P1, P2, P3]
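The grouping idea can be sketched as follows: two models of j are behaviorally equivalent when they prescribe the same behavior, so models can be bucketed by the policy their solution induces. A minimal sketch, where `solve` stands in for whatever solver produces j's policy (a hypothetical helper, not from the paper):

```python
from collections import defaultdict

def behavioral_equivalence_classes(models_of_j, solve):
    """Group models of agent j by the policy (behavior) they induce.

    `solve` maps a model to its optimal policy; models that map to the same
    policy are behaviorally equivalent and collapse into one class.
    """
    classes = defaultdict(list)
    for model in models_of_j:
        policy = solve(model)            # e.g., a policy tree up to the horizon
        classes[policy].append(model)    # policies must be hashable for this
    return classes                       # {policy: [behaviorally equivalent models]}
```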

7 Equivalence Classes of Interactive States
● Definition
  – Combination of a physical state and an equivalence class of models

8 Lossless Aggregation
● In a finitely nested I-POMDP, a probability distribution over the interactive states, IS_i, provides a sufficient statistic for the past history of i's observations
● Transformation of the interactive state space into behavioral equivalence classes is value-preserving
● The optimal policy of the transformed finitely nested I-POMDP remains unchanged
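A belief over the original interactive states can then be compressed by summing the probability mass of behaviorally equivalent models; by the lossless-aggregation argument this preserves the value of every policy. An illustrative sketch with hypothetical data structures (a belief as a dict over (state, model) pairs and a precomputed class map):

```python
def aggregate_belief(belief, class_of):
    """Collapse a belief over (state, model) pairs into a belief over
    (state, equivalence class) pairs by summing mass within each class."""
    aggregated = {}
    for (state, model), prob in belief.items():
        key = (state, class_of[model])               # equivalence class of this model
        aggregated[key] = aggregated.get(key, 0.0) + prob
    return aggregated
```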

9 Solving I-POMDPs Exactly

Procedure Solve-IPOMDP(AGENT i, Belief Nesting L): Returns Policy
  If L = 0 Then
    Return { Policy := Solve-POMDP(AGENT i) }
  Else
    For all AGENT j ≠ AGENT i
      Policy_j := Solve-IPOMDP(AGENT j, L-1)
    End
    M_j := Behavioral-Equivalence-Models(Policy_j)
    ECIS_i := S × {×_j M_j}
    Policy := Modified-GIP(ECIS_i, A_i, T_i, Ω_i, O_i, R_i)
    Return Policy
  End
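A hedged Python transcription of the procedure above; `solve_pomdp`, `behavioral_equivalence_models`, and `modified_gip` are placeholders for the level-0 solver, the model-grouping step, and the modified generalized incremental pruning solver, and the agent attributes are assumed names:

```python
from itertools import product

def solve_ipomdp(agent_i, level, agents, solve_pomdp,
                 behavioral_equivalence_models, modified_gip):
    """Recursively solve a finitely nested I-POMDP for agent_i at the given level."""
    if level == 0:
        # Level-0 models are ordinary POMDPs.
        return solve_pomdp(agent_i)

    # Solve every other agent's model one level down and collapse the resulting
    # models into behavioral equivalence classes.
    model_classes = []
    for agent_j in agents:
        if agent_j is agent_i:
            continue
        policy_j = solve_ipomdp(agent_j, level - 1, agents, solve_pomdp,
                                behavioral_equivalence_models, modified_gip)
        model_classes.append(behavioral_equivalence_models(policy_j))

    # Equivalence classes of interactive states: physical states crossed with
    # the equivalence classes of the other agents' models.
    ecis_i = [(s, *classes) for s, classes in
              product(agent_i.states, product(*model_classes))]

    # Solve agent_i's decision problem over ECIS_i with the modified solver.
    return modified_gip(ecis_i, agent_i.actions, agent_i.transition,
                        agent_i.observations, agent_i.observation_fn, agent_i.reward)
```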

10 Multi-Agent Persistent-Tiger Problem
● Observations: {Growl Left, Growl Right} × {Creak Right, Creak Left, Silence}
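In the multi-agent version each agent also hears door creaks caused by the other agent, so its observation set is the cross product above. A quick way to enumerate it (illustrative only):

```python
from itertools import product

growls = ["growl-left", "growl-right"]
creaks = ["creak-right", "creak-left", "silence"]

observations_i = list(product(growls, creaks))  # 2 x 3 = 6 joint observations
print(len(observations_i))  # -> 6
```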

11 Beliefs on ECIS
[Figure: agent i's beliefs over equivalence classes of interactive states, shown alongside agent j's policy]

12 Agent i's Policy in the Presence of Another Agent j
● The policy becomes more diverse as i's ability to observe j's actions improves

14 Conclusions
● A method that enables exact solution of finitely nested interactive POMDPs
● Aggregate agent models into behavioral equivalence classes
  – Discretization is lossless
● Interesting behaviors emerge in the multi-agent Tiger problem

Thank You and Please Stop by my Poster
Questions?