Presentation transcript:

Solving Multi-agent Sequential Decision Making Problems Using I-DIDs
Muthukumaran Chandrasekaran, Computer Science Department, The University of Georgia

Introduction
Interactive dynamic influence diagrams (I-DIDs) are graphical models for sequential decision making in uncertain settings shared by other agents. The challenge is an exponentially growing space of candidate models ascribed to the other agents over time. Redundant and/or equivalent models are therefore pruned to reduce the model space. We present a new approximation technique that further reduces the candidate model space by replacing models that are ε-subjectively equivalent (ε-SE) with representative ones (ε is the approximation factor).

Background
An I-DID has nodes (decision (rectangle), chance (oval), utility (diamond), model (hexagon)), arcs (functional, conditional, informational), and links (policy (dashed), model update (dotted)), as shown in Fig 1. I-DIDs are the graphical counterparts of I-POMDPs [1].

Approach
The goal is to reduce the model space. First, we prune Behaviorally Equivalent (BE) models [2], i.e., models whose behavioral predictions for the agent are identical. We further reduce the space by pruning subjectively equivalent (SE) models, those that induce an approximately identical distribution over the subject agent's future action-observation history (Fig 5), and replacing them with a representative. The set of SE models includes those that are BE. It further includes models that induce identical distributions over the subject agent's action-observation paths but may be behaviorally distinct on paths that have zero probability; such models are not BE and are called (strictly) Observationally Equivalent. Fig 6 shows a recursive way to compute the distribution over the subject agent's action-observation history; illustrative sketches of this recursion and of the pruning step appear after the Discussion below.

Test domains: the multi-agent tiger problem (Fig 2) and the multi-agent machine maintenance problem. Implementation: exact and ε-SE solutions using the HUGIN Java API for DIDs. Empirical comparisons with the state-of-the-art approach, DMU [3], were performed, and the approximation error was bounded.

Discussion
The quality of the solution generated using ε-SE improves as we reduce ε and approaches that of the exact solution, which is indicative of the flexibility of the approximation (Fig 3). Compared to DMU, ε-SE obtains higher rewards for an identical number of initial models, which is indicative of more informed clustering and pruning, although it is less efficient (Fig 4).
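The recursive computation referenced above (Fig 6) can be pictured with the minimal sketch below. This is not the poster's HUGIN/I-DID implementation: the interfaces pi_i, pred_aj, trans, and obs are hypothetical stand-ins, and the other agent's behavior is summarized here by a per-step action distribution taken from the solved candidate model, a simplification made only for brevity.

```python
def path_distribution(b0, pi_i, pred_aj, trans, obs, horizon):
    """Distribution over the subject agent's action-observation paths
    induced by one candidate model of the other agent (sketch only).

    b0      : dict {state: prob}              initial belief
    pi_i    : f(path) -> action_i             subject agent's policy
    pred_aj : f(t) -> dict {action_j: prob}   other agent's predicted actions
                                              (from the solved candidate model)
    trans   : f(s, ai, aj) -> dict {s': prob} joint transition function
    obs     : f(s', ai, aj) -> dict {oi: prob} subject agent's observation fn
    """
    # joint[path][s] = Pr(path, state s); Pr(path) is the marginal over states
    joint = {(): dict(b0)}
    for t in range(horizon):
        nxt = {}
        for path, weights in joint.items():
            ai = pi_i(path)                       # subject agent's action on this path
            for aj, p_aj in pred_aj(t).items():   # other agent's predicted action
                for s, p_s in weights.items():
                    for s1, p_t in trans(s, ai, aj).items():
                        for oi, p_o in obs(s1, ai, aj).items():
                            p = p_s * p_aj * p_t * p_o
                            if p == 0.0:
                                continue
                            branch = nxt.setdefault(path + ((ai, oi),), {})
                            branch[s1] = branch.get(s1, 0.0) + p
        joint = nxt
    # marginalize out the state to obtain Pr(action-observation path)
    return {path: sum(ws.values()) for path, ws in joint.items()}
```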
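The ε-SE pruning step can then operate directly on these path distributions. The sketch below uses an L1 distance and a greedy single pass to pick representatives and transfer the pruned models' probability mass; both choices are assumptions for illustration, not necessarily the distance measure or clustering procedure used in the poster.

```python
def l1_distance(p, q):
    """L1 distance between two path distributions (dict: path -> probability)."""
    support = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)

def prune_eps_se(models, path_dists, weights, eps):
    """Greedy eps-SE pruning sketch.

    models     : list of (hashable) candidate models of the other agent
    path_dists : dict {model: path distribution induced by that model}
    weights    : dict {model: probability mass in the model node}
    eps        : approximation factor
    """
    representatives, new_weights = [], {}
    for m in models:
        for r in representatives:
            if l1_distance(path_dists[m], path_dists[r]) <= eps:
                new_weights[r] += weights[m]   # transfer mass to the representative
                break
        else:
            representatives.append(m)          # m becomes a new representative
            new_weights[m] = weights[m]
    return representatives, new_weights
```

Lowering eps makes the equivalence test stricter, so more representatives are retained and the solution approaches the exact one, matching the trade-off noted in the Discussion.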
Problems
Scalability issues remain due to the curse of history: the distribution computations are time and space consuming.

References
1. P. Doshi, Y. Zeng, and Q. Chen. Graphical models for interactive POMDPs: Representations and solutions. JAAMAS.
2. B. Rathnas., P. Doshi, and P. J. Gmytrasiewicz. Exact solutions to interactive POMDPs using behavioral equivalence. AAMAS.
3. P. Doshi and Y. Zeng. Improved approximation of interactive dynamic influence diagrams using discriminative model updates. AAMAS.

Acknowledgments
I thank Dr. Prashant Doshi and Dr. Yifeng Zeng for their valuable contributions to this paper. This research is partially supported by an NSF CAREER grant (IIS) and an AFOSR grant (#FA) to Prof. Prashant Doshi (PI).