Pradeep Varakantham Singapore Management University Joint work with J.Y.Kwak, M.Taylor, J. Marecki, P. Scerri, M.Tambe.

Motivating Domains
- Disaster rescue
- Sensor networks
Characteristics of domains:
- Uncertainty
- Coordinating multiple agents
- Sequential decision making

Meeting the challenges
- Problem: multiple agents coordinating to perform multiple tasks in the presence of uncertainty
- Solution: represent as Distributed POMDPs and solve
  - Finding an optimal solution is NEXP-complete
  - Approximate algorithm that dynamically exploits structure in interactions
- Result: vast improvement in performance over existing algorithms

Outline
- Illustrative domain
- Model
- Approach: exploit dynamic structure in interactions
- Results

Illustrative Domain
- Multiple types of robots
- Uncertainty in movements
- Reward
  - Saving victims
  - Collisions
  - Clearing debris
- Maximize expected joint reward

Model: DisPOMDPs with Coordination Locales (DPCL)
- Joint model: global state represents completion of tasks
- Agents are independent except in coordination locales (CLs)
- Two types of CLs:
  - Same-time CL (e.g., agents colliding with each other)
  - Future-time CL (e.g., a cleaner robot clearing debris helps a rescue robot reach the goal)
- Individual observability

Solving DPCLs with TREMOR
TREMOR: Teams REshaping of MOdels for Rapid execution
Two steps:
1. Branch and bound search
  - MDP-based heuristics
2. Task assignment evaluation
  - By computing policies for every agent
  - Joint policy computation is performed only at CLs

1. Branch and Bound search

2. Task Assignment Evaluation
Until convergence of policies or maximum iterations:
1) Solve individual POMDPs
2) Identify potential coordination locales
3) Based on type and value of coordination:
  - Shape P and R of the relevant individual agents
  - Capture interactions
  - Encourage/discourage interactions
4) Go to step 1
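The loop on this slide can be sketched in Python as below. This is only an illustration under stated assumptions: the callables `solve`, `find_cls`, and `shape` are hypothetical placeholders (they are not named in the talk) standing in for a single-agent POMDP solver, CL identification, and model shaping, and "convergence" is assumed to mean that the policies did not change between iterations.

```python
from typing import Callable, List, Sequence, Tuple


def evaluate_assignment(
    models: Sequence,
    solve: Callable,
    find_cls: Callable,
    shape: Callable,
    max_iters: int = 10,
) -> Tuple[List, int]:
    """Iterate solve -> identify CLs -> shape until policies stop changing.

    All three callables are hypothetical stand-ins:
      solve(model) -> policy                     (single-agent POMDP solver)
      find_cls(models, policies) -> list of CLs  (potential coordination locales)
      shape(models, policies, cl) -> new models  (reshaped P and R)
    Returns the final policies and the number of iterations used.
    """
    models = list(models)
    policies = None
    for iteration in range(1, max_iters + 1):
        new_policies = [solve(m) for m in models]       # step 1
        if new_policies == policies:                    # assumed convergence test
            return new_policies, iteration
        policies = new_policies
        for cl in find_cls(models, policies):           # step 2
            models = list(shape(models, policies, cl))  # step 3
    return policies, max_iters                          # iteration cap reached
```

Passing the three stages in as functions keeps the sketch independent of any particular POMDP solver, which matches the talk's point that the approach relies on single-agent solvers.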

Identifying potential CLs
- CL probability: probability of a CL occurring at time step T, given the starting belief
- Computed with the standard belief update, given the policy
- Equation labels (equations not included in this transcript): policy over belief states; probability of observing ω in belief state b; updating b
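The slide's equations are not in the transcript. As a reference point, the textbook POMDP belief update, which is presumably what "standard belief update" refers to, can be sketched as follows. The nested-list layouts `P[s][a][s2]` and `O[s2][a][o]` and the function name are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple


def belief_update(
    b: Sequence[float],
    a: int,
    o: int,
    P: Sequence[Sequence[Sequence[float]]],
    O: Sequence[Sequence[Sequence[float]]],
) -> Tuple[List[float], float]:
    """Textbook Bayes-filter update for a discrete POMDP.

    b[s]        : current belief over states
    P[s][a][s2] : transition probability (assumed layout)
    O[s2][a][o] : probability of observing o after reaching s2 via a (assumed layout)
    Returns (updated belief, probability of observing o in belief b under a).
    """
    n = len(b)
    # Unnormalized posterior: O(s2, a, o) * sum_s P(s, a, s2) * b(s)
    unnorm = [
        O[s2][a][o] * sum(P[s][a][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    pr_o = sum(unnorm)  # probability of the observation given b and a
    if pr_o == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [x / pr_o for x in unnorm], pr_o
```

Rolling this update forward under a fixed policy from the starting belief gives a distribution over reachable beliefs at each time step, which is the kind of quantity a CL-occurrence probability at step T would be built from.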

Type of CL
- STCL, if there exist s and a for which the transition or reward function is not decomposable:
  $P(s, a, s') \neq \prod_{1 \le i \le N} P((s_g, s_i), a_i, (s_g', s_i'))$, or
  $R(s, a, s') \neq \sum_{1 \le i \le N} R((s_g, s_i), a_i, (s_g', s_i'))$
- FTCL, if completion of a task (global state) by an agent at t' affects transitions/rewards of other agents at t
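A minimal sketch of the transition half of the STCL test: compare the joint transition probability with the product of the individual agents' probabilities. The function names, the callable-based model representation, and the state encoding `(s_g, (s_1, ..., s_N))` are assumptions for illustration; the reward test is analogous with a sum in place of the product.

```python
import math
from typing import Callable, Iterable, Sequence


def transition_decomposes(
    joint_p: Callable,
    agent_ps: Sequence[Callable],
    s, a, s_next,
    tol: float = 1e-9,
) -> bool:
    """True if P(s, a, s') equals the product of per-agent transition probabilities.

    Assumed encoding: s = (s_g, (s_1, ..., s_N)), a = (a_1, ..., a_N),
    and each agent_ps[i] is called as p_i((s_g, s_i), a_i, (s_g', s_i')).
    """
    s_g, local_states = s
    s_g_next, local_next = s_next
    product = math.prod(
        p_i((s_g, s_i), a_i, (s_g_next, s_i_next))
        for p_i, s_i, a_i, s_i_next in zip(agent_ps, local_states, a, local_next)
    )
    return abs(joint_p(s, a, s_next) - product) <= tol


def is_stcl(joint_p, agent_ps, s, a, successors: Iterable, tol: float = 1e-9) -> bool:
    """Flag (s, a) as a same-time CL if any listed successor breaks decomposability."""
    return any(
        not transition_decomposes(joint_p, agent_ps, s, a, s_next, tol)
        for s_next in successors
    )
```

For example, two robots that each move successfully with some probability on their own, but block each other when heading for the same corridor cell, would give a joint probability that differs from the product, so `is_stcl` would return `True` for that state and joint action.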

Shaping Model (STCL)
- Shaping transition function
- Shaping reward function
- Equation labels (equations not included in this transcript): joint transition probability when the CL occurs; new transition probability for agent i

Results
- Benchmark algorithms
  - Independent POMDPs
  - Memory Bounded Dynamic Programming (MBDP)
- Criteria
  - Decision quality
  - Run-time
- Parameters: (i) agents; (ii) CLs; (iii) states; (iv) horizon

Result slides (plots not included in this transcript): State space; Agents; Coordination Locales; Time Horizon

Related work
- Existing research
  - DEC-MDPs
    - Assuming individual or collective full observability
    - Task allocation and dependencies as input
  - DEC-POMDPs
    - JESP
    - MBDP
    - Exploiting independence in transition/reward/observation
- Model shaping
  - Guestrin and Gordon, 2002

Conclusion
- DPCL is a specialization of Distributed POMDPs
- TREMOR exploits the presence of few CLs in domains
- TREMOR depends on single-agent POMDP solvers
- Results: TREMOR outperformed DisPOMDP algorithms, except in tightly coupled small problems

Questions?

Same Time CL (STCL)
There is an STCL if any of the following holds:
- Transition function not decomposable:
  $P(s, a, s') \neq \prod_{1 \le i \le N} P((s_g, s_i), a_i, (s_g', s_i'))$
- Observation function not decomposable:
  $O(s', a, o) \neq \prod_{1 \le i \le N} O(o_i, a_i, (s_g', s_i'))$
- Reward function not decomposable:
  $R(s, a, s') \neq \sum_{1 \le i \le N} R((s_g, s_i), a_i, (s_g', s_i'))$
Ex: Two robots colliding in a narrow corridor

Future Time CL
Actions of one agent at t' can affect transitions, observations, or rewards of other agents at t:
- $P((s_g^t, s_i^t), a_i^t, ({s_g^t}', {s_i^t}') \mid a_j^{t'}) \neq P((s_g^t, s_i^t), a_i^t, ({s_g^t}', {s_i^t}')), \ \forall t' < t$
- $R((s_g^t, s_i^t), a_i^t, ({s_g^t}', {s_i^t}') \mid a_j^{t'}) \neq R((s_g^t, s_i^t), a_i^t, ({s_g^t}', {s_i^t}')), \ \forall t' < t$
- $O(\omega_i^t, a_i^t, ({s_g^t}', {s_i^t}') \mid a_j^{t'}) \neq O(\omega_i^t, a_i^t, ({s_g^t}', {s_i^t}')), \ \forall t' < t$
Ex: Clearing of debris assists rescue robots in getting to victims faster
