
Approximate Solutions of Interactive Dynamic Influence Diagrams Using Model Clustering
Twenty Second Conference on Artificial Intelligence (AAAI'07)
Yifeng Zeng, Aalborg University, Denmark
Prashant Doshi, University of Georgia, USA
Qiongyu Chen, National University of Singapore

Outline
- Interactive Dynamic Influence Diagrams (I-DIDs)
- Curses of History and Dimensionality
- Model Clustering
- Computational Savings and Error Bound
- Experimental Results

Interactive Dynamic Influence Diagrams (I-DIDs) (Doshi et al. AAMAS'07)
- Graphical models for decision-making in multiagent settings
- Sequential decision-making over multiple time steps in multiagent settings
- Generalize dynamic IDs to multiagent domains
- Differ from MAIDs (Koller & Milch 01) and NIDs (Gal & Pfeffer 04)
- Online solutions to I-POMDPs (Gmytrasiewicz & Doshi, JAIR'05)
- Allow nested modeling of agents

Overview of I-ID
- A generic level l interactive ID (I-ID) for agent i situated with one other agent j
- Model node Mj,l-1: models of agent j at level l-1
- Policy link (dashed line): distribution over the other agent's actions given its models
- Beliefs on Mj,l-1: P(Mj,l-1|s); how is this updated?
[Figure: level l I-ID with chance nodes S, Oi, Aj, decision node Ai, utility node Ri, and model node Mj,l-1]

Details of the Model Node
- Members of the model node: the different chance nodes are solutions of the models mj,l-1
- Mod[Mj] represents the different models of agent j
- The CPT of the chance node Aj is a multiplexer: it takes on the distribution of each of the action nodes (Aj1, Aj2) depending on the value of Mod[Mj]
- mj,l-1^1, mj,l-1^2 could be I-IDs or IDs
[Figure: model node Mj,l-1 expanded into Mod[Mj] with action nodes Aj1, Aj2 feeding the chance node Aj]
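
To spell out what "multiplexer" means here (the notation below is ours, not the slide's): the CPT of Aj deterministically copies the action node that matches the current value of Mod[Mj],

\[ P\bigl(A_j = a \mid \mathrm{Mod}[M_j] = m_{j,l-1}^{k},\, A_j^{1}, \ldots, A_j^{n}\bigr) \;=\; \mathbf{1}\bigl[\, a = A_j^{k} \,\bigr], \]

so that, once the individual action nodes are marginalized out, Aj is distributed according to the solution of whichever model Mod[Mj] points to.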

Interactive Dynamic Influence Diagrams (I-DIDs)
[Figure: two time-slice I-DID with nodes St, Oit, Ait, Ri, Ajt, Mj,l-1t at time t and St+1, Oit+1, Ait+1, Ri, Ajt+1, Mj,l-1t+1 at time t+1, connected by the model update link]

Semantics of Model Update Link
- The model node at t+1 contains the models obtained by updating each model in Mj,l-1t for every action of j and every possible observation
- These models differ in their initial beliefs, each of which is the result of j updating its beliefs due to its actions and possible observations
[Figure: Mj,l-1t with models mj,l-1t,1 and mj,l-1t,2 expanding, through Aj and Oj, into Mj,l-1t+1 with models mj,l-1t+1,1 through mj,l-1t+1,4]
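
A minimal sketch of the update behind this link, assuming flat transition and observation arrays for the level l-1 model; the function and variable names are ours, not from the paper:

import numpy as np

def update_belief(b, a, o, T, O):
    """One Bayesian belief update for agent j after action a and observation o.

    b: belief over states, shape (|S|,)
    T: transition probabilities, T[a][s, s'] = P(s' | s, a)
    O: observation probabilities, O[a][s', o] = P(o | s', a)
    """
    pred = b @ T[a]                 # predicted state distribution
    new_b = pred * O[a][:, o]       # weight by the observation likelihood
    norm = new_b.sum()
    return new_b / norm if norm > 0 else new_b

def expand_models(beliefs, actions, observations, T, O):
    """Enumerate the candidate models at t+1: one updated belief per
    (prior model, action, observation) combination, hence the
    |M_t| * |A_j| * |Omega_j| growth noted in the deck's Notes slide."""
    return [update_belief(b, a, o, T, O)
            for b in beliefs for a in actions for o in observations]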

Curses of History and Dimensionality
- The primary complexity of solving I-DIDs is due to the large number of models that must be solved over time
- Curse of history (of agent j): at each time step the candidate model space grows by a factor of |Aj||Ωj| (see the Notes slide at the end)
- Curse of dimensionality: nested property of modeling; with more agents, an N+1 agent setting requires (NM)^l models, where M is the bounded number of models at each level and l the nesting depth
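
Writing the two growth rates out explicitly (the per-step factor is the one quoted on the Notes slide at the end of the deck; Mj^0 is the initial model set):

\[ \bigl|\mathcal{M}_j^{t}\bigr| \;\le\; \bigl|\mathcal{M}_j^{0}\bigr|\,\bigl(|A_j|\,|\Omega_j|\bigr)^{t} \qquad \text{(curse of history)} \]
\[ (NM)^{l} \text{ models in an } (N{+}1)\text{-agent setting nested to level } l \qquad \text{(curse of dimensionality)} \]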

Model Clustering
- Idea: prune the model space to K representative models from the M candidate models, K << M, at each time step
- Approach:
  - Cluster models with the k-means clustering method (MacQueen 67); note that k is not equal to K
  - Clusters contain models that are likely behaviorally equivalent
  - Select K representative models from the clusters
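
A minimal sketch of the clustering step for a one-dimensional belief such as P(TR) in the multiagent tiger problem. Scalar beliefs and all names are our simplification; the paper clusters full model beliefs, and the number of clusters (here, the number of initial means) is the k that the slide stresses is not K:

import numpy as np

def kmeans_1d(points, init_means, iters=20):
    """Plain k-means over scalar belief points, seeded with the initial
    means described on the next slides."""
    means = np.asarray(init_means, dtype=float)
    points = np.asarray(points, dtype=float)
    for _ in range(iters):
        # assign each belief to its nearest mean
        assign = np.argmin(np.abs(points[:, None] - means[None, :]), axis=1)
        # recompute each mean; keep the old one if a cluster goes empty
        for k in range(len(means)):
            members = points[assign == k]
            if len(members) > 0:
                means[k] = members.mean()
    return means, assign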

Selection of Initial Means
- Facilitate clustering of behaviorally equivalent models
- Behaviorally equivalent regions prescribe the same optimal behavior for j, e.g. [0, 0.1], [0.1, 0.9], [0.9, 1]
- Select the region boundary points as initial means: 0, 0.1, 0.9, 1
[Figure: value functions of L, OL, OR over P(TR), with the sensitivity points at 0.1 and 0.9]

Selection of Initial Means
- Sensitivity points: models that induce policies different from those induced by surrounding models
- Vertices of the belief simplex; one dimension: 0, 1; two dimensions: [0,0], [0,1], [1,0], and [1,1]

LP for Computing Sensitivity Points
- SPs are non-dominated points on intersections between value functions
[Figure: intersecting value functions with the non-dominated intersection marked as the SP]
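
The slide formulates SP computation as an LP; in a one-dimensional belief space the same non-dominated intersections can be found by a direct pairwise check, which is what this sketch does. The alpha-vector representation and all names are our assumptions:

import numpy as np

def sensitivity_points(alphas, eps=1e-9):
    """Sensitivity points for a 1-D belief p (e.g. p = P(TR)).
    Each policy's value is linear: V_k(p) = alphas[k][0]*(1-p) + alphas[k][1]*p.
    An SP is an intersection of two such lines that lies on the upper
    envelope, i.e. is not dominated by any other line."""
    values = lambda p: np.array([x0 * (1 - p) + x1 * p for x0, x1 in alphas])
    pts = []
    for i in range(len(alphas)):
        for j in range(i + 1, len(alphas)):
            (a0, a1), (b0, b1) = alphas[i], alphas[j]
            denom = (a1 - a0) - (b1 - b0)
            if abs(denom) < eps:
                continue                        # parallel lines never intersect
            p = (b0 - a0) / denom               # intersection point of the two lines
            if 0.0 <= p <= 1.0:
                v = a0 * (1 - p) + a1 * p
                if v >= values(p).max() - eps:  # keep only non-dominated intersections
                    pts.append(p)
    return sorted(set(round(p, 6) for p in pts))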

Example of Iterative Clustering
[Figure: beliefs over P(TR); starting from the initial means, clusters are refined over iterations 1 through n, after which K = 10 representative models are selected]

K Model Selection Algorithm
1. Compute SPs
2. Select initial means
3. Cluster models and re-compute the means
4. Select the K nearest models
(a sketch of this flow follows below)
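
Putting the two hypothetical helpers sketched above together gives one possible reading of the overall procedure. Step 4 is simplified relative to the proportional per-cluster allocation described on the deck's final slide:

import numpy as np

def k_model_selection(beliefs, alphas, K):
    """Sketch of the KModelSelection flow on this slide, reusing the
    hypothetical sensitivity_points and kmeans_1d helpers from above."""
    sps = sensitivity_points(alphas)                  # 1. compute SPs
    init_means = sorted(set([0.0, 1.0] + sps))        # 2. initial means: SPs + simplex vertices
    means, assign = kmeans_1d(beliefs, init_means)    # 3. cluster models, re-compute means
    b = np.asarray(beliefs, dtype=float)
    dist = np.abs(b - means[assign])                  # distance of each model to its cluster mean
    keep = np.argsort(dist)[:K]                       # 4. keep the K models nearest to the means
    return sorted(b[keep].tolist())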

Approximate Solution of I-DID
- Exact algorithm: expansion phase (expand all M models over time) followed by the look-ahead phase
- Approximation: modify the exact algorithm to prune the model space using KModelSelection, maintaining only K models over time
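
A schematic view of how the exact expansion phase is modified, with expand and prune as stand-ins for the model-update and KModelSelection steps sketched earlier (both names are ours):

def approx_expand(models0, horizon, K, expand, prune):
    """Expansion phase of the approximate I-DID solution: grow the model
    space one step at a time, but prune it back to K representative
    models after every step."""
    models = list(models0)
    for t in range(horizon):
        models = expand(models)          # grows by a factor of |A_j| * |Omega_j|
        if len(models) > K:
            models = prune(models, K)    # retain only K representative models
    return models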

Computational Savings and Error Bound
- (NM)^l vs. (NK)^l; M grows exponentially over time
- Retain K models (Mk) and discard the remaining M - K models
- The error is bounded by finding, among the K retained models, the one closest to each discarded model (as in PBVI; Pineau et al. 03)
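
A purely illustrative comparison with made-up numbers (not taken from the experiments): with N = 1 other agent, M = 100 candidate models, K = 20 retained models, and nesting level l = 2,

\[ (NM)^{l} = (1 \cdot 100)^{2} = 10{,}000 \qquad \text{versus} \qquad (NK)^{l} = (1 \cdot 20)^{2} = 400 . \]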

Error Bound
- Let [definitions on the slide]
- Error bound for agent j: [formula on the slide]
- Expected error bound for agent i: [formula on the slide]
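
Going by the PBVI-style argument cited on the previous slide, a bound of roughly the following shape is what one would expect, with δ the worst-case distance from a discarded belief to its closest retained one and H the horizon. This is a sketch of the general form under that assumption, not the paper's exact statement:

\[ \delta \;=\; \max_{m \,\in\, \mathcal{M} \setminus \mathcal{M}_K}\;\min_{m' \,\in\, \mathcal{M}_K} \bigl\lVert b_m - b_{m'} \bigr\rVert_1, \qquad \varepsilon_j \;\lesssim\; H \,\bigl(R_j^{\max} - R_j^{\min}\bigr)\,\delta . \]

The expected error bound for agent i then weights ε_j by the probability mass that i assigns to the discarded models.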

Empirical Results
- Two problem domains: multiagent tiger, multiagent machine maintenance
- Compared against: the exact solution of the I-DID for different M, and interactive particle filtering on the I-DID
- Measures: average rewards from solving the level 1 I-DIDs, variance over 50 runs, and run time

Run Time Comparison
- Slower than the I-PF; the reason is the convergence step
- Solves I-DIDs up to 8 horizons

Method              Tiger             Machine
Exact               83.6s             99.2s
MC (K=20 / K=50)    3.8s / 10.5s      6.2s / 18.7s
I-PF                3.9s / 9.5s       4.3s / 10.8s

Future Work
- Variants of model clustering
- Application domains
- Compose our package for I-DIDs

Thank You!

Together: I-ID
[Figure: the complete level l I-ID, with utility node Ri, decision node Ai, chance nodes S, Oi, Aj, and the expanded model node containing Mod[Mj], action nodes Aj1, Aj2 and models mj,l-1^1, mj,l-1^2]

Notes
- The updated set of models at time step t+1 will have at most |Mj,t| x |Aj| x |Ωj| models, where |Mj,t| is the number of models at time step t, |Aj| is the largest space of actions, and |Ωj| is the largest space of observations
- The new distribution over the updated models uses the original distribution over the models and the probability of the other agent performing the action and receiving the observation that led to the updated model

Exact Solution

[Figure: exact solution, shown as the two time-slice I-DID with both model nodes expanded into Mod[Mjt] and Mod[Mjt+1], their candidate models mj,l-1t,1, mj,l-1t,2 and mj,l-1t+1,1 through mj,l-1t+1,4, the corresponding action nodes, and observation nodes Oj1, Oj2]

One Example

K Model Selection
- Initial means: sensitivity points + vertices of the belief simplex
- Iteration: assign new models to clusters, re-compute the cluster means
- Selection: select K models, with Kn (the number taken from cluster n) in proportion to the size of cluster n
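
A small sketch of the proportional allocation Kn proportional to the size of cluster n; the rounding and tie-breaking rules below are our assumptions, not taken from the paper:

def allocate_k(cluster_sizes, K):
    """Split the budget K across clusters in proportion to their sizes,
    K_n ~ K * |cluster_n| / M, rounding down and handing any leftover
    slots to the largest clusters."""
    total = sum(cluster_sizes)
    ks = [K * s // total for s in cluster_sizes]
    leftover = K - sum(ks)
    for idx in sorted(range(len(ks)), key=lambda n: -cluster_sizes[n])[:leftover]:
        ks[idx] += 1
    return ks

# Example: allocate_k([30, 10, 60], K=10) keeps 3, 1 and 6 models respectively.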