Published by Meghan Hodge. Modified over 9 years ago.
1
Pradeep Varakantham, Singapore Management University
Joint work with J. Y. Kwak, M. Taylor, J. Marecki, P. Scerri, M. Tambe
2
Motivating Domains
  - Disaster rescue
  - Sensor networks
Characteristics of these domains:
  - Uncertainty
  - Coordinating multiple agents
  - Sequential decision making
3
Meeting the challenges
  - Problem: multiple agents coordinating to perform multiple tasks in the presence of uncertainty
  - Solution: represent the problem as a Distributed POMDP and solve it
      - Finding an optimal solution is NEXP-complete
      - Use an approximate algorithm that dynamically exploits structure in the interactions
  - Result: vast improvement in performance over existing algorithms
4
Outline
  - Illustrative domain
  - Model
  - Approach: exploit dynamic structure in interactions
  - Results
5
Illustrative Domain
  - Multiple types of robots
  - Uncertainty in movements
  - Reward
      - Saving victims
      - Collisions
      - Clearing debris
  - Maximize expected joint reward
6
Model
  - DisPOMDPs with Coordination Locales (DPCL)
  - Joint model: the global state represents completion of tasks
  - Agents are independent except in coordination locales (CLs)
  - Two types of CLs:
      - Same-time CL (e.g., agents colliding with each other)
      - Future-time CL (e.g., a cleaner robot clearing debris helps a rescue robot reach the goal)
  - Individual observability
7
Solving DPCLs with TREMOR
TREMOR: Teams REshaping of MOdels for Rapid execution
Two steps:
  1. Branch and bound search
      - MDP-based heuristics
  2. Task assignment evaluation
      - By computing policies for every agent
      - Perform joint policy computation only at CLs
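Step 1 can be illustrated with a small branch-and-bound over task assignments. This is only a sketch under stated assumptions, not the TREMOR implementation: the table `mdp_value[i][k]` is a hypothetical stand-in for an MDP-based heuristic that optimistically bounds the value of agent `i` doing task `k`, `evaluate` is a hypothetical placeholder for the step 2 evaluation, each agent receives exactly one distinct task, and there are at least as many tasks as agents.

```python
from typing import Callable, List, Optional, Tuple

Assignment = Tuple[int, ...]  # Assignment[i] = index of the task given to agent i


def branch_and_bound(
    mdp_value: List[List[float]],
    evaluate: Callable[[Assignment], float],
) -> Tuple[Optional[Assignment], float]:
    """Search one-task-per-agent assignments, pruning with an optimistic bound.

    mdp_value[i][k]: assumed upper bound on the value of agent i doing task k.
    evaluate(assignment): assumed (expensive) true value of a full assignment.
    """
    n_agents = len(mdp_value)
    n_tasks = len(mdp_value[0])
    best_assignment: Optional[Assignment] = None
    best_value = float("-inf")

    def upper_bound(partial: Assignment) -> float:
        # Assigned agents contribute their heuristic value; each unassigned
        # agent optimistically takes its best still-free task.
        bound = sum(mdp_value[i][k] for i, k in enumerate(partial))
        for i in range(len(partial), n_agents):
            bound += max(mdp_value[i][k] for k in range(n_tasks) if k not in partial)
        return bound

    def search(partial: Assignment) -> None:
        nonlocal best_assignment, best_value
        if len(partial) == n_agents:
            value = evaluate(partial)
            if value > best_value:
                best_assignment, best_value = partial, value
            return
        if upper_bound(partial) <= best_value:
            return  # prune: this branch cannot beat the incumbent
        children = [partial + (k,) for k in range(n_tasks) if k not in partial]
        children.sort(key=upper_bound, reverse=True)  # most promising first
        for child in children:
            search(child)

    search(())
    return best_assignment, best_value
```

Pruning is only safe if the heuristic never underestimates the evaluated value, which is what the optimistic bound is assumed to guarantee here.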
8
1. Branch and Bound search
9
2. Task Assignment Evaluation
Until convergence of policies or maximum iterations:
  1) Solve individual POMDPs
  2) Identify potential coordination locales
  3) Based on the type and value of coordination, shape P and R of the relevant individual agents
      - Capture interactions
      - Encourage/discourage interactions
  4) Go to step 1
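The loop above can be sketched as a small driver function. The three callables `solve`, `find_cls`, and `shape` are hypothetical placeholders for a single-agent POMDP solver, the CL detection step, and the model-shaping step; the slides do not specify their interfaces, so the signatures here are assumptions.

```python
def evaluate_assignment(models, solve, find_cls, shape, max_iters=10):
    """Iterate solve -> detect CLs -> shape until policies stop changing.

    models:   one individual model per agent (any representation)
    solve:    hypothetical, model -> policy
    find_cls: hypothetical, (models, policies) -> list of coordination locales
    shape:    hypothetical, (model, agent_index, locales) -> reshaped model
    """
    policies = None
    for _ in range(max_iters):
        # 1) Solve each agent's individual model independently.
        new_policies = [solve(model) for model in models]
        if new_policies == policies:
            break  # policies converged
        policies = new_policies
        # 2) Identify potential coordination locales under these policies.
        locales = find_cls(models, policies)
        # 3) Reshape each agent's model to account for the interactions.
        models = [shape(model, i, locales) for i, model in enumerate(models)]
        # 4) Loop back to step 1.
    return policies, models
```

Passing the solver in as a callable mirrors the point that the approach relies on an existing single-agent POMDP solver rather than a joint one.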
10
Identifying potential CLs
  - CL = probability of a CL occurring at a time step T, given the starting belief
  - Standard belief update given the policy
      - Policy over belief states
      - Probability of observing w in belief state "b"
      - Updating "b"
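The slide's equations are not reproduced in this transcript, but the textbook POMDP belief update it refers to can be sketched as follows. The array layouts `P[s][a][s2]` and `O[a][s2][w]` are assumptions chosen for this example, not notation from the presentation.

```python
def observation_probability(b, a, w, P, O):
    """Pr(w | b, a) = sum_{s2} O[a][s2][w] * sum_{s} P[s][a][s2] * b[s]."""
    n = len(b)
    return sum(
        O[a][s2][w] * sum(P[s][a][s2] * b[s] for s in range(n))
        for s2 in range(n)
    )


def belief_update(b, a, w, P, O):
    """Return the belief after taking action a in belief b and observing w."""
    n = len(b)
    unnormalized = [
        O[a][s2][w] * sum(P[s][a][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    norm = sum(unnormalized)  # equals Pr(w | b, a)
    if norm == 0:
        raise ValueError("observation has zero probability under this belief")
    return [x / norm for x in unnormalized]
```

Rolling this update forward from the starting belief under a fixed policy gives the distribution over states at each step, from which the probability of reaching a CL at time T could be read off.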
11
Type of CL
  - STCL, if there exist "s" and "a" for which the transition or reward function is not decomposable:
      $P(s,a,s') \neq \prod_{1 \le i \le N} P((s_g,s_i),a_i,(s_g',s_i'))$
      OR
      $R(s,a,s') \neq \sum_{1 \le i \le N} R((s_g,s_i),a_i,(s_g',s_i'))$
  - FTCL, if completion of a task (global state) by an agent at t' affects transitions/rewards of other agents at t
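The STCL transition test can be illustrated by comparing a joint transition table against the product of the agents' individual tables. This is a simplified sketch: the global state component is omitted, and the dictionary layouts (`joint_P[(s, a)][s_next]` with tuple-valued joint states and actions, and `local_P[i][(s_i, a_i)][s_i_next]`) are assumptions made for the example.

```python
def find_stcl(joint_P, local_P, tol=1e-9):
    """Return joint (state, action) pairs whose transition does not factor.

    joint_P: {(s, a): {s_next: prob}} with s, a, s_next tuples (one entry per agent)
    local_P: list of {(s_i, a_i): {s_i_next: prob}}, one dict per agent
    """
    locales = []
    for (s, a), next_dist in joint_P.items():
        for s_next, p_joint in next_dist.items():
            # Product of the individual transition probabilities.
            p_product = 1.0
            for i, P_i in enumerate(local_P):
                p_product *= P_i.get((s[i], a[i]), {}).get(s_next[i], 0.0)
            if abs(p_joint - p_product) > tol:
                locales.append((s, a))  # not decomposable here
                break
    return locales
```

The same comparison applies to rewards, with a sum over agents in place of the product.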
12
Shaping Model (STCL)
  - Shaping transition function
  - Shaping reward function
  - Joint transition probability when CL occurs
  - New transition probability for agent "i"
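The shaping formulas themselves are not reproduced in this transcript. As a purely illustrative assumption, one way to fold a CL into an agent's individual model is to mix its original transition distribution with the distribution it would face when the CL occurs, weighted by the probability of the CL. The function name and the convex-combination form below are hypothetical and may differ from what the slide showed.

```python
def shape_transition(p_individual, p_given_cl, cl_probability):
    """Blend two next-state distributions for one (state, action) of agent i.

    p_individual:   {next_state: prob} from the agent's original model
    p_given_cl:     {next_state: prob} derived from the joint model when the CL occurs
    cl_probability: probability that the CL occurs at this step
    NOTE: the convex combination is an assumed form, used only for illustration.
    """
    if not 0.0 <= cl_probability <= 1.0:
        raise ValueError("cl_probability must lie in [0, 1]")
    states = set(p_individual) | set(p_given_cl)
    return {
        s: cl_probability * p_given_cl.get(s, 0.0)
        + (1.0 - cl_probability) * p_individual.get(s, 0.0)
        for s in states
    }
```

With a CL probability of zero the agent's model is unchanged, and with probability one it fully adopts the CL-conditioned transition, so the result is always a valid distribution when both inputs are.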
13
Results
  - Benchmark algorithms
      - Independent POMDPs
      - Memory Bounded Dynamic Programming (MBDP)
  - Criteria
      - Decision quality
      - Run-time
  - Parameters: (i) agents; (ii) CLs; (iii) states; (iv) horizon
14
State space
15
Agents
16
Coordination Locales
17
Time Horizon
18
Related work
  - Existing research
      - DEC-MDPs
          - Assuming individual or collective full observability
          - Task allocation and dependencies as input
      - DEC-POMDPs
          - JESP
          - MBDP
          - Exploiting independence in transition/reward/observation
  - Model shaping
      - Guestrin and Gordon, 2002
19
Conclusion
  - DPCL, a specialization of Distributed POMDPs
  - TREMOR exploits the presence of few CLs in domains
  - TREMOR depends on single-agent POMDP solvers
  - Results: TREMOR outperformed DisPOMDP algorithms, except in tightly coupled small problems
20
Questions?
21
Same Time CL (STCL)
There is an STCL if any of the following holds:
  - Transition function not decomposable:
      $P(s,a,s') \neq \prod_{1 \le i \le N} P((s_g,s_i),a_i,(s_g',s_i'))$
  - Observation function not decomposable:
      $O(s',a,o) \neq \prod_{1 \le i \le N} O(o_i,a_i,(s_g',s_i'))$
  - Reward function not decomposable:
      $R(s,a,s') \neq \sum_{1 \le i \le N} R((s_g,s_i),a_i,(s_g',s_i'))$
Ex: Two robots colliding in a narrow corridor
22
Future Time CL
Actions of one agent at t' can affect transitions OR observations OR rewards of other agents at t, where $t' < t$:
  - $P((s_g^t,s_i^t),a_i^t,({s_g^t}',{s_i^t}') \mid a_j^{t'}) \neq P((s_g^t,s_i^t),a_i^t,({s_g^t}',{s_i^t}'))$
  - $R((s_g^t,s_i^t),a_i^t,({s_g^t}',{s_i^t}') \mid a_j^{t'}) \neq R((s_g^t,s_i^t),a_i^t,({s_g^t}',{s_i^t}'))$
  - $O(w_i^t,a_i^t,({s_g^t}',{s_i^t}') \mid a_j^{t'}) \neq O(w_i^t,a_i^t,({s_g^t}',{s_i^t}'))$
Ex: Clearing of debris assists rescue robots in getting to victims faster