Yifeng Zeng Aalborg University Denmark

Approximate Solutions of Interactive Dynamic Influence Diagrams Using Model Clustering
Twenty-Second Conference on Artificial Intelligence (AAAI’07)
Yifeng Zeng, Aalborg University, Denmark
Prashant Doshi, Univ. of Georgia, USA
Qiongyu Chen, National University of Singapore

Outline
Interactive Dynamic Influence Diagrams (I-DIDs)
Curses of History and Dimensionality
Model Clustering
Computational Savings and Error Bound
Experimental Results

Interactive Dynamic Influence Diagrams (I-DIDs) (Doshi et al., AAMAS’07)
Graphical models for decision-making in multiagent settings
Sequential decision-making over multiple time steps in multiagent settings
Generalize dynamic IDs to multiagent domains
Differ from MAIDs (Koller & Milch, 01) and NIDs (Gal & Pfeffer, 04)
Online solutions to I-POMDPs (Gmytrasiewicz & Doshi, JAIR’05)
Allow nested modeling of agents

Overview of I-ID
A generic level l Interactive-ID (I-ID) for agent i situated with one other agent j
Model node: M_{j,l-1}
  Models of agent j at level l-1
Policy link: dashed line
  Distribution over the other agent’s actions given its models
Beliefs on M_{j,l-1}: P(M_{j,l-1}|s)
  Update?
[Figure: level l I-ID with nodes S, Oi, Ai, Ri, Aj, and M_{j,l-1}]

Details of the Model Node
Members of the model node
  Different chance nodes are solutions of models m_{j,l-1}
  Mod[Mj] represents the different models of agent j
CPT of the chance node Aj is a multiplexer
  Assumes the distribution of each of the action nodes (Aj^1, Aj^2) depending on the value of Mod[Mj]
m_{j,l-1}^1 and m_{j,l-1}^2 could be I-IDs or IDs
[Figure: model node M_{j,l-1} expanded into S, Mod[Mj], Aj, and the models m_{j,l-1}^1, m_{j,l-1}^2 with their action nodes Aj^1, Aj^2]
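The multiplexer behavior of Aj can be illustrated with a small sketch: Aj takes the action distribution of whichever model Mod[Mj] selects, so its marginal is a mixture of the per-model distributions weighted by the belief over models. The model names, probabilities, and the function name `action_distribution` below are hypothetical and chosen only for illustration.

```python
def action_distribution(model_probs, policies):
    """Marginal P(Aj): mix each model's action distribution by P(model).

    model_probs: {model: P(model)} (the distribution on Mod[Mj])
    policies:    {model: {action: P(action | model)}} (the solved models)
    """
    actions = sorted({a for dist in policies.values() for a in dist})
    return {
        a: sum(model_probs[m] * policies[m].get(a, 0.0) for m in model_probs)
        for a in actions
    }


# Hypothetical example: two models of agent j, each prescribing one action.
model_probs = {"m1": 0.7, "m2": 0.3}
policies = {
    "m1": {"L": 1.0},   # model m1 prescribes action L
    "m2": {"OL": 1.0},  # model m2 prescribes action OL
}
p_aj = action_distribution(model_probs, policies)
print(p_aj)  # {'L': 0.7, 'OL': 0.3}
```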

Interactive Dynamic Influence Diagrams (I-DIDs)
[Figure: two-time-slice I-DID with nodes S, Oi, Ai, Ri, Aj, and M_{j,l-1} at time steps t and t+1; the model update link connects M_{j,l-1}^t to M_{j,l-1}^{t+1}]

Semantics of Model Update Link
[Figure: model update link expanded. Two models at time t (m_{j,l-1}^{t,1}, m_{j,l-1}^{t,2} in Mod[Mj^t], with action nodes Aj^1, Aj^2 and observations Oj^1, Oj^2) yield four models at time t+1 (m_{j,l-1}^{t+1,1} through m_{j,l-1}^{t+1,4} in Mod[Mj^{t+1}], with action nodes Aj^1 through Aj^4)]
The models at time t+1 differ in their initial beliefs, each of which is the result of j updating its beliefs due to its actions and possible observations
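A minimal sketch of how each model at time t spawns one updated model per possible observation. It assumes a two-state tiger-like problem in which j's model is summarized by its belief P(TR), j performs a listen action that does not change the state, and the observation ("GL" or "GR") is correct with a hypothetical accuracy of 0.85. None of these numbers come from the slides.

```python
def update_belief(p_tr, obs, accuracy=0.85):
    """Bayes update of P(TR) after listening and hearing 'GR' or 'GL'."""
    like_tr = accuracy if obs == "GR" else 1.0 - accuracy  # P(obs | TR)
    like_tl = 1.0 - accuracy if obs == "GR" else accuracy  # P(obs | TL)
    numerator = like_tr * p_tr
    return numerator / (numerator + like_tl * (1.0 - p_tr))


def expand_models(beliefs, observations=("GL", "GR")):
    """One updated belief (model) per current model and possible observation."""
    return [update_belief(b, o) for b in beliefs for o in observations]


models_t = [0.5, 0.3]                 # two models at time t
models_t1 = expand_models(models_t)   # four models at time t+1
print(len(models_t1))  # 4
```

With two models and two observations the sketch produces four updated models, matching the two-to-four expansion shown in the figure.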

Curses of History and Dimensionality
Primary complexity of solving I-DIDs is due to the large number of models that must be solved over time
Curse of dimensionality
  At time step t:
  Nested property of modeling
  More agents
  N+1 agent setting: (NM)^l models (M is the bounded number of models at each level)
Curse of history of agent j

Model Clustering
Idea: prune the model space to K representative models from M candidate models, K << M, at each time step
Approach
  Cluster models
    k-means clustering method (MacQueen, 67)
    Note: k is not equal to K
    Clusters contain models that are likely behaviorally equivalent
  Select K representative models from the clusters
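The clustering step can be sketched as plain k-means over scalar beliefs. This is a generic k-means, not the authors' implementation; the candidate beliefs and initial means below are hypothetical, and models are assumed to be summarized by a single number P(TR).

```python
def assign(points, means):
    """Put every point into the cluster of its nearest mean."""
    clusters = [[] for _ in means]
    for p in points:
        nearest = min(range(len(means)), key=lambda i: abs(p - means[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans_1d(points, initial_means, max_iterations=50):
    """k-means on scalars; an empty cluster keeps its previous mean."""
    means = list(initial_means)
    for _ in range(max_iterations):
        clusters = assign(points, means)
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:  # converged: assignments no longer change
            break
        means = new_means
    return means, assign(points, means)


# Hypothetical candidate models (beliefs) and k = 4 initial means.
points = [0.02, 0.04, 0.3, 0.4, 0.45, 0.93, 0.98]
means, clusters = kmeans_1d(points, [0.0, 0.1, 0.9, 1.0])
print(clusters)
```

Here k = 4 is the number of clusters; the number K of models finally retained is chosen separately, which is why the slide notes that k is not equal to K.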

Selection of Initial Means
Facilitate clustering of behaviorally equivalent models
Behaviorally equivalent regions
  Prescribe the same optimal behavior for j
  [0,0.1], [0.1,0.9], [0.9,1]
Select region boundary points as initial means
  0, 0.1, 0.9, 1
[Figure: value (axis ticks 10 and -1) plotted against P(TR), with actions L, OL, OR and sensitivity points marked at 0.1 and 0.9]

Selection of Initial Means
Sensitivity points Models that induce policies that are different from those by surrounding models Vertices of the belief simplex One dimension: 0, 1 Two dimensions: [0,0], [0,1],[1,0], and [1,1]
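A short sketch of assembling the initial means from the two sources named above. It assumes beliefs are represented as coordinate tuples and that the vertices are the 0/1 corner points listed on the slide; the function name `initial_means` is hypothetical.

```python
from itertools import product


def initial_means(sensitivity_points, dims=1):
    """Union of the sensitivity points and the 0/1 corner points, sorted."""
    corners = {tuple(float(x) for x in c) for c in product((0, 1), repeat=dims)}
    points = {tuple(float(x) for x in sp) for sp in sensitivity_points}
    return sorted(points | corners)


# One dimension, with sensitivity points at 0.1 and 0.9.
means_1d = initial_means([(0.1,), (0.9,)], dims=1)
print(means_1d)  # [(0.0,), (0.1,), (0.9,), (1.0,)]

# Two dimensions with no sensitivity points: only the four corners remain.
means_2d = initial_means([], dims=2)
```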

LP for Computing Sensitivity Points
SPs are non-dominated points on intersections between value functions
[Figure: intersecting value functions, with labels SP, Non-dominated, and Intersection]
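For a one-dimensional belief, the non-dominated intersections can be found by direct pairwise comparison of linear value functions, as sketched below. This replaces the LP named in the slide title with simple geometry and is only meant to illustrate the definition. The rewards used (listen -1, correct door 10, wrong door -100) are the commonly used single-step tiger-problem values and are an assumption here.

```python
from itertools import combinations


def sensitivity_points(lines, tol=1e-9):
    """lines: {action: (value at p=0, value at p=1)}, linear in p on [0, 1].

    Returns the crossing points of pairs of lines that are not dominated
    by any other line at the crossing.
    """
    def value(line, p):
        v0, v1 = line
        return v0 + (v1 - v0) * p

    points = []
    for a, b in combinations(lines.values(), 2):
        slope_diff = (a[1] - a[0]) - (b[1] - b[0])
        if abs(slope_diff) < tol:
            continue  # parallel lines never cross
        p = (b[0] - a[0]) / slope_diff
        if not 0.0 <= p <= 1.0:
            continue  # crossing lies outside the belief interval
        best = max(value(line, p) for line in lines.values())
        if value(a, p) >= best - tol:  # keep only non-dominated crossings
            points.append(p)
    return sorted(points)


# p = P(TR). Assumed rewards: OR pays 10 if the tiger is left, -100 if right.
lines = {"OR": (10.0, -100.0), "L": (-1.0, -1.0), "OL": (-100.0, 10.0)}
sps = sensitivity_points(lines)
print(sps)
```

Under these assumed rewards the OR/OL crossing at p = 0.5 is dominated by L and is discarded, leaving two sensitivity points near 0.1 and 0.9.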

Example of Iterative Clustering
[Figure: clustering of models along P(TR) (axis ticks 0.1, 0.9, 1), showing the initial means, iteration 1 through iteration n, and the final selection of K=10 models]

K Model Selection Algorithm
Compute SPs
  Compute SPs
  Select initial means
Clustering
  Cluster models
  Re-compute means
Selection
  Select K nearest models

Approximate Solution of I-DID
Exact algorithm
  Expansion phase: expand all M models over time
  Look-ahead phase
Approximation: modify the exact algorithm
  Prune the model space using KModelSelection
  Maintain only K models over time
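The difference between the two expansion phases can be illustrated by counting models only. In this sketch each model is assumed to spawn a fixed number of updated models per time step (`branching`, a hypothetical parameter), and pruning is modeled simply as capping the set at K; the actual choice of which models to keep is left to KModelSelection.

```python
def models_over_time(initial_models, branching, horizon, keep=None):
    """Number of models held at each time step, with optional pruning to `keep`."""
    counts = [initial_models]
    for _ in range(horizon):
        expanded = counts[-1] * branching  # every model is updated
        counts.append(expanded if keep is None else min(expanded, keep))
    return counts


exact = models_over_time(10, 2, 4)            # no pruning
approx = models_over_time(10, 2, 4, keep=20)  # prune to K = 20 each step
print(exact)   # [10, 20, 40, 80, 160]
print(approx)  # [10, 20, 20, 20, 20]
```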

Computational Savings and Error Bound
(NM)^l vs. (NK)^l
M grows exponentially over time
Retain K models (M_K) and discard M-K models (M/K)
Error is bounded by finding, for each discarded model, the closest model among the K retained models (as in PBVI; Pineau et al., 03)
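A quick arithmetic check of the two expressions, using hypothetical values N = 1 other agent, M = 100 candidate models, K = 20 retained models, and nesting level l = 2:

```python
def nested_model_count(num_other_agents, models_per_level, levels):
    """(N * M)^l: models to solve across l nesting levels."""
    return (num_other_agents * models_per_level) ** levels


exact = nested_model_count(1, 100, 2)   # (NM)^l
approx = nested_model_count(1, 20, 2)   # (NK)^l
print(exact, approx, exact // approx)   # 10000 400 25
```

The ratio is (M/K)^l, so the savings grow with the nesting level.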

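Error Bound
Let [definition not preserved in this transcript]
Error bound for agent j [expression not preserved]
Expected error bound for agent i [expression not preserved]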

Empirical Results
Two problem domains
  Multiagent tiger
  Multiagent machine maintenance
Comparison with
  Exact solution of I-DID for different M
  Interactive particle filtering (I-PF) on I-DID
Measure
  Average rewards from solving the level 1 I-DIDs
  Variance over 50 runs
  Run time

Run Time Comparison
Slower than the I-PF
  Reason: convergence step
Solve I-DIDs up to 8 horizons

Prob.     Exact    Method   K=20    K=50
Tiger     83.6s    MC       3.8s    10.5s
                   I-PF     3.9s    9.5s
Machine   99.2s    MC       6.2s    18.7s
                   I-PF     4.3s    10.8s

Future Work
Variants of model clustering
Application domains
Compose our package for I-DIDs

Thank You!

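Together: I-ID
[Figure: I-ID with the model node expanded, showing nodes S, Oi, Ai, Ri, Aj, Mod[Mj], the models m_{j,l-1}^1, m_{j,l-1}^2, and their action nodes Aj^1, Aj^2]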

Notes
The updated set of models at time step t+1 will have at most |M_j^t| |A_j| |Ω_j| models
  |M_j^t|: number of models at time step t
  |A_j|: largest space of actions
  |Ω_j|: largest space of observations
The new distribution over the updated models uses
  the original distribution over the models,
  the probability of the other agent performing the action, and
  the probability of receiving the observation that led to the updated model
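The three ingredients of the new distribution can be combined as a product, as in the sketch below. It assumes each updated model is identified by the triple (previous model, action, observation) that produced it; the dictionaries and probabilities are hypothetical.

```python
def updated_model_distribution(prior, policy, obs_prob):
    """Distribution over updated models, keyed by (model, action, observation).

    prior:    {model: P(model)}
    policy:   {model: {action: P(action | model)}}
    obs_prob: {(model, action): {obs: P(obs | model, action)}}
    """
    posterior = {}
    for m, p_m in prior.items():
        for a, p_a in policy[m].items():
            for o, p_o in obs_prob[(m, a)].items():
                weight = p_m * p_a * p_o
                if weight > 0:
                    posterior[(m, a, o)] = weight
    return posterior


# Hypothetical example: two models, both prescribing the action "L".
prior = {"m1": 0.6, "m2": 0.4}
policy = {"m1": {"L": 1.0}, "m2": {"L": 1.0}}
obs_prob = {
    ("m1", "L"): {"GL": 0.5, "GR": 0.5},
    ("m2", "L"): {"GL": 0.7, "GR": 0.3},
}
posterior = updated_model_distribution(prior, policy, obs_prob)
print(len(posterior))  # 4
```

In this example the four updated models equal the product of 2 models, 1 action, and 2 observations.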

Exact Solution

[Figure: two-time-slice I-DID with Mod[Mj^t] and Mod[Mj^{t+1}] expanded into their models and action nodes]

One Example

K Model Selection
Initial means
  Sensitivity points + vertices of the belief simplex
Iteration
  Re-compute the cluster mean
  Assign new models to clusters
Selection
  Select K models
  K_n: in proportion to the size of cluster n
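The selection step can be sketched as follows, assuming scalar beliefs, a per-cluster quota K_n obtained by rounding K times the cluster's share of all models, and a preference for the members nearest the cluster mean. The slides do not state how rounding or ties are handled, so the total may differ slightly from K in general; the clusters below are hypothetical.

```python
def select_k_models(clusters, k):
    """Pick roughly k models, cluster n contributing in proportion to its size."""
    total = sum(len(c) for c in clusters)
    selected = []
    for cluster in clusters:
        if not cluster:
            continue  # an empty cluster contributes nothing
        mean = sum(cluster) / len(cluster)
        quota = round(k * len(cluster) / total)           # K_n
        ranked = sorted(cluster, key=lambda b: abs(b - mean))
        selected.extend(ranked[:quota])                   # nearest to the mean first
    return selected


# Hypothetical clusters of sizes 2, 4, and 2; K = 4 gives quotas 1, 2, and 1.
clusters = [[0.02, 0.04], [0.3, 0.4, 0.45, 0.5], [0.93, 0.98]]
selected = select_k_models(clusters, 4)
print(len(selected))  # 4
```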
