Distributed Planning for Large Teams Prasanna Velagapudi Thesis Committee: Katia Sycara (co-chair) Paul Scerri (co-chair) J. Andrew Bagnell Edmund H. Durfee.


Outline Motivation Background Approach – SI-Dec-POMDP – DIMS Preliminary Work – DPP – D-TREMOR Proposed Work Conclusion

Motivation 100s to 1000s of robots, agents, people Complex, collaborative tasks Dynamic, uncertain environment Offline planning

Motivation Scaling planning to large teams is hard – Need to plan (with uncertainty) for each agent in the team – Agents must consider the actions of a growing number of teammates – The full joint problem is NEXP-complete [Bernstein 2002] Optimality is infeasible at this scale Find and exploit structure in the problem Make good plans in a reasonable amount of time

Motivation Exploit three characteristics of these domains 1. Explicit Interactions Specific combinations of states and actions where effects depend on more than one agent 2. Sparsity of Interactions Many potential interactions could occur between agents Only a few will occur in any given solution 3. Distributed Computation Each agent has access to local computation A centralized algorithm has access to 1 unit of computation A distributed algorithm has access to N units of computation

Example: Interactions Rescue robot Cleaner robot Debris Victim

Example: Sparsity

Related Work (figure axes: Scalability vs. Generality)

Related Work Structured Dec-(PO)MDP planners – JESP [Nair 2003] – TD-Dec-POMDP [Witwicki 2010] – EDI-CR [Mostafa 2009] – SPIDER [Marecki 2009] Restrict generality slightly to get scalability High optimality

Related Work Heuristic Dec-(PO)MDP planners – TREMOR [Varakantham 2009] – OC-Dec-MDP [Beynier 2005] Sacrifice optimality for scalability High generality

Related Work Structured multiagent path planners – DPC [Bhattacharya 2010] – Optimal Decoupling [Van den Berg 2009] Sacrifice generality further to get scalability High optimality

Related Work Heuristic multiagent path planners – Dynamic Networks [Clark 2003] – Prioritized Planning [Van den Berg 2005] Sacrifice optimality to get scalability

Related Work Our approach: Fix high scalability and generality Explore what level of optimality is possible

Distributed, Iterative Planning Inspiration: – TREMOR [Varakantham 2009] – JESP [Nair 2003] Reduce the full joint problem into a set of smaller, independent sub-problems Solve independent sub-problems with a local algorithm Modify sub-problems to push locally optimal solutions towards a high-quality joint solution

Thesis Statement Agents in a large team with known sparse interactions can efficiently find high-quality solutions to planning problems through an iterative process of estimating the actions of teammates, locally planning based on these estimates, and refining their estimates by exchanging coordination messages.

Outline Motivation Background Approach – SI-Dec-POMDP – DIMS Preliminary Work – DPP – D-TREMOR Proposed Work Conclusion (callouts: Problem Formulation, Proposed Algorithm)

Problem Formulation POMDP Dec-POMDP Sparse-Interaction Dec-POMDP

Review: POMDP ⟨S, A, Ω, T, R, O⟩ S: set of states A: set of actions Ω: set of observations T: transition function R: reward function O: observation function

Review: Dec-POMDP For n agents, with joint actions and observations: T: joint transition function R: joint reward function O: joint observation function

Dec-POMDP → SI-Dec-POMDP

Sparse Interaction Dec-POMDP (the component definitions shown as equations on this slide did not survive extraction)
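Since the slide's definitions were lost, the following is a hedged sketch (an editorial assumption, loosely consistent with the surrounding slides and the DPCL-style models cited later) of what an SI-Dec-POMDP adds to a Dec-POMDP: a sparse set of interactions, each tied to specific joint states and actions and carrying its own transition, reward, and observation effects.

```latex
% Hedged sketch only; not the author's exact notation.
% Dec-POMDP core for n agents:
\langle S, A, \Omega, T, R, O \rangle
% plus a sparse set of interactions:
\mathcal{I} = \{ I_1, \ldots, I_k \}, \qquad
I_j = \langle S_j, A_j, T_j, R_j, O_j \rangle
% Outside the interaction regions, the joint dynamics factor into
% independent per-agent models:
T(s' \mid s, a) = \prod_{i=1}^{n} T_i(s'_i \mid s_i, a_i)
\quad \text{whenever } (s, a) \notin \textstyle\bigcup_j (S_j \times A_j)
```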

Proposed Approach: DIMS Distributed Iterative Model Shaping Reduce the full joint problem into a set of smaller, independent sub-problems (one for each agent) Solve independent sub-problems with existing state-of-the-art algorithms Modify sub-problems so that the locally optimal solutions approach a high-quality joint solution Task Allocation Local Planning Interaction Exchange Model Shaping

Proposed Approach: DIMS Distributed Iterative Model Shaping Task Allocation Local Planning Interaction Exchange Model Shaping Assign tasks to agents Reduce the search space considered by each agent Define a local sub-problem for each robot

Proposed Approach: DIMS Distributed Iterative Model Shaping Task Allocation Local Planning Interaction Exchange Model Shaping Assign tasks to agents Reduce the search space considered by each agent Define a local sub-problem for each robot Full SI-Dec-POMDP Local (Independent) POMDP

Proposed Approach: DIMS Distributed Iterative Model Shaping Task Allocation Local Planning Interaction Exchange Model Shaping Solve local sub-problems using an off-the-shelf centralized solver Result: locally-optimal policy

Proposed Approach: DIMS Distributed Iterative Model Shaping Task Allocation Local Planning Interaction Exchange Model Shaping Given the local policy: estimate the local probability and value of interactions Communicate the local probability and value of relevant interactions to team members Sparsity → relatively small # of messages

Proposed Approach: DIMS Distributed Iterative Model Shaping Task Allocation Local Planning Interaction Exchange Model Shaping Modify local sub-problems to account for the presence of interactions

Proposed Approach: DIMS Distributed Iterative Model Shaping Task Allocation Local Planning Interaction Exchange Model Shaping Reallocate tasks or re-plan using the modified local sub-problem

Proposed Approach: DIMS Distributed Iterative Model Shaping Task Allocation: any decentralized allocation mechanism (e.g. auctions) Local Planning: off-the-shelf graph, MDP, or POMDP solver Interaction Exchange: lightweight local evaluation and low-bandwidth messaging Model Shaping: methods to alter the local problem to incorporate non-local effects
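To make the loop concrete, here is a minimal Python sketch of one way the four DIMS stages could fit together; the stage functions (allocate_tasks, solve_local, estimate_interactions, shape_model) are hypothetical placeholders passed in as arguments, not code from the thesis.

```python
# Minimal sketch of the DIMS loop. The four stage functions are passed in as
# arguments because they are placeholders, not implementations from the thesis.
def dims(agents, joint_problem, allocate_tasks, solve_local,
         estimate_interactions, shape_model, num_iterations=20):
    # Task Allocation: each agent gets a local, independent sub-problem.
    sub_problems = {a: allocate_tasks(a, joint_problem) for a in agents}
    policies = {}
    for _ in range(num_iterations):
        # Local Planning: solve each sub-problem with an off-the-shelf solver.
        for a in agents:
            policies[a] = solve_local(sub_problems[a])
        # Interaction Exchange: estimate the probability and value of each
        # interaction under the current local policy, and share the estimates.
        messages = {a: estimate_interactions(a, policies[a], sub_problems[a])
                    for a in agents}
        # Model Shaping: fold teammates' interaction estimates back into each
        # local model so the next local optimum moves toward a better joint plan.
        for a in agents:
            incoming = [m for sender, m in messages.items() if sender != a]
            sub_problems[a] = shape_model(sub_problems[a], incoming)
    return policies
```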

Outline Motivation Background Approach – SI-Dec-POMDP – DIMS Preliminary Work – DPP – D-TREMOR Proposed Work Conclusion

Preliminary Results Distributed Prioritized Planning (DPP) Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR) P. Velagapudi, K. Sycara, and P. Scerri, "Decentralized prioritized planning in large multirobot teams," IROS. P. Velagapudi, P. Varakantham, K. Sycara, and P. Scerri, "Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents," AAMAS 2011 (in submission).

Preliminary Results Distributed Prioritized Planning (DPP): no uncertainty, many potential interactions, simple interactions Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR): action/observation uncertainty, fewer potential interactions, complex interactions

Multiagent Path Planning Start Goal

Multiagent Path Planning → SI-Dec-POMDP Only one interaction type: collisions Many potential collisions Few collisions in any solution

DIMS: Distributed Prioritized Planning Task Allocation: (given) Local Planning: A* Interaction Exchange: path messages Model Shaping: prioritized configuration-time obstacles

Prioritized Planning [van den Berg, et al 2005] Assign priorities to agents based on path length Longer path length estimate → higher priority

Sequentially plan from highest to lowest priority – Takes n steps for n agents Use previously planned agents as dynamic obstacles
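As a concrete illustration of the prioritized-planning scheme sketched on these two slides, here is a minimal centralized Python version; plan_path and estimate_path_length are hypothetical callables, and the decentralized DPP variant differs in how the sequential ordering is realized.

```python
# Minimal sketch of prioritized planning (the centralized baseline described
# above); plan_path and estimate_path_length are hypothetical placeholders.
def prioritized_planning(agents, estimate_path_length, plan_path):
    # Longer estimated path -> higher priority, so it is planned earlier.
    ordered = sorted(agents, key=estimate_path_length, reverse=True)
    committed_paths = []   # space-time (configuration-time) obstacles so far
    paths = {}
    for agent in ordered:  # n sequential steps for n agents
        # Plan around static obstacles plus the paths already committed to by
        # higher-priority agents, treated as dynamic obstacles.
        paths[agent] = plan_path(agent, committed_paths)
        committed_paths.append(paths[agent])
    return paths
```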

Distributed Prioritized Planning

Large-Scale Path Solutions

Large-Scale Path Solutions

Experimental Results Scaling Dataset – # robots varied: {40, 60, 80, 120, 160, 240} – Density of map held constant: 8 cells per robot Density Dataset – # robots held constant: 240 – Density of map varied: {32, 24, 16, 12, 8} cells per robot Cellular automata used to generate 15 random maps Maps also solved with centralized prioritized planning as a baseline

High-quality solutions (figures: DPP, varying team size and varying density with 240 agents)

Few sequential iterations (figures: varying team size and varying density with 240 agents)

Summary of DPP DPP achieves high-quality solutions – Same quality as centralized PP Prioritization + sparsity = rapid convergence – Able to handle large numbers of collision interactions – Far fewer sequential planning iterations

Preliminary Results Distributed Prioritized Planning (DPP): no uncertainty, many potential interactions, simple interactions Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR): action/observation uncertainty, fewer potential interactions, complex interactions

A Simple Rescue Domain Rescue Agent Cleaner Agent Narrow Corridor Victim Unsafe Cell Clearable Debris

A Simple (Large) Rescue Domain

Distributed POMDP with Coordination Locales (DPCL) [Varakantham, et al 2009] Subset of SI-Dec-POMDP: interactions modify only the transition and reward functions Coordination locales (CLs) are subtypes of interactions: explicit time, explicit time constraint, implicitly constructed interaction functions (the CL tuple shown on the slide did not survive extraction)
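For reference, a hedged reconstruction of what the lost CL tuple most likely resembled, based on the coordination-locale definition in Varakantham et al. 2009 rather than on the slide itself:

```latex
% Hedged reconstruction, not the slide's own notation: a (same-time)
% coordination locale names a decision epoch plus the joint states and
% actions at which the agents' transition/reward effects become coupled.
cl = \big\langle\, t,\; (s_1, \ldots, s_n),\; (a_1, \ldots, a_n) \,\big\rangle
```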

DIMS: D-TREMOR (extending [Varakantham, et al 2009]) Task Allocation: decentralized auction Local Planning: EVA POMDP solver Interaction Exchange: policy sub-sampling and Coordination Locale (CL) messages Model Shaping: prioritized/randomized reward and transition shaping

D-TREMOR: Task Allocation Assign "tasks" using a decentralized auction – Greedy, nearest allocation Create a local, independent sub-problem for each agent
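The slide only names the mechanism, so the following toy Python sketch of a greedy, nearest-first allocation round is purely illustrative (the distance function and task/agent representations are assumptions, not the thesis implementation):

```python
# Toy sketch of greedy, nearest-first task allocation: each task is awarded to
# the agent whose bid (here, just distance) is lowest. Placeholder types only.
def greedy_nearest_allocation(agents, tasks, distance):
    assignments = {agent: [] for agent in agents}
    for task in tasks:
        # Every agent "bids" its distance to the task; the closest agent wins.
        winner = min(agents, key=lambda agent: distance(agent, task))
        assignments[winner].append(task)
    return assignments
```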

D-TREMOR: Local Planning Solve using off-the-shelf algorithm (EVA) Result: locally-optimal policies

D-TREMOR: Interaction Exchange Finding Pr(CL_i) Evaluate the local policy [Kearns 2002] Compute the frequency of the associated (s_i, a_i) Example: entered the corridor in 95 of 100 runs: Pr(CL_i) = 0.95

D-TREMOR: Interaction Exchange Finding Val(CL_i) Sample the local policy value with/without interactions [Kearns 2002] – Test interactions independently Compute the change in value if the interaction occurred Example: no collision vs. collision: Val(CL_i) = -7
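A rough Python sketch of how these two quantities could be estimated by sampling policy executions; the rollout helper and the with/without-interaction model variants are assumptions introduced for illustration, not the thesis code.

```python
# Hedged sketch: estimate an interaction's probability and value by sampling
# executions of the local policy. rollout(), cl_occurs(), and the two model
# variants are hypothetical placeholders.
def estimate_interaction(policy, model_with_cl, model_without_cl,
                         rollout, cl_occurs, num_samples=100):
    hits = 0
    values_with, values_without = [], []
    for _ in range(num_samples):
        trajectory, value = rollout(policy, model_with_cl)
        values_with.append(value)
        if any(cl_occurs(state, action) for state, action in trajectory):
            hits += 1                        # interaction happened on this run
        _, baseline = rollout(policy, model_without_cl)
        values_without.append(baseline)
    pr_cl = hits / num_samples               # e.g. 95 of 100 runs -> 0.95
    val_cl = (sum(values_with) - sum(values_without)) / num_samples
    return pr_cl, val_cl                      # e.g. val_cl = -7 for a collision
```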

D-TREMOR: Interaction Exchange Send CL messages (Pr(CL), Val(CL)) to teammates Sparsity → relatively small # of messages

D-TREMOR: Model Shaping Shape local model rewards/transitions based on remote interactions (slide equation legend: probability of interaction, interaction model functions, independent model functions)
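The equation itself did not survive extraction; the legend suggests a shaping rule along the following lines, mixing interaction and independent model terms by the reported interaction probability (a hedged reconstruction, not the slide's exact formula):

```latex
% Hedged reconstruction of the shaping rule implied by the slide legend.
\tilde{R}_i(s_i, a_i) \;=\; \Pr(CL)\, R^{CL}_i(s_i, a_i)
    \;+\; \bigl(1 - \Pr(CL)\bigr)\, R_i(s_i, a_i)
\tilde{T}_i(s'_i \mid s_i, a_i) \;=\; \Pr(CL)\, T^{CL}_i(s'_i \mid s_i, a_i)
    \;+\; \bigl(1 - \Pr(CL)\bigr)\, T_i(s'_i \mid s_i, a_i)
```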

D-TREMOR: Local Planning (again) Re-solve shaped local models to get new policies Result: new locally-optimal policies → new interactions

D-TREMOR: Adv. Model Shaping In practice, we run into three common issues faced by concurrent optimization algorithms: – Slow convergence – Oscillation – Local optima We can alter our model-shaping to mitigate these by reasoning about the types of interactions we have

D-TREMOR: Adv. Model Shaping Slow convergence → Prioritization – Majority of interactions are collisions – Assign priorities to agents, only model-shape collision interactions for higher-priority agents – From DPP: prioritization can quickly resolve collision interactions – Similar properties for any purely negative interaction Negative interaction: when every agent is guaranteed to have a lower-valued local policy if an interaction occurs

Oscillation → Probabilistic shaping – Often caused by time dynamics between agents Agent 1 shapes based on Agent 2's old policy Agent 2 shapes based on Agent 1's old policy – Each agent only applies model-shaping with probability δ [Zhang 2005] – Breaks out of cycles between agent policies

D-TREMOR: Adv. Model Shaping Local optima → Optimistic initialization – Agents cannot detect mixed interactions (e.g. debris) Rescue agent policies can only improve if debris is cleared Cleaner agent policies can only worsen if they clear debris (slide annotation: Pr(CL) = low, Val(CL) = low; if Val(CL) = low, the optimal policy → do nothing)

D-TREMOR: Adv. Model Shaping Local optima → Optimistic initialization – Agents cannot detect mixed interactions (e.g. debris) Rescue agent policies can only improve if debris is cleared Cleaner agent policies can only worsen if they clear debris – Let each agent solve an initial model that uses an optimistic assumption about the interaction conditions
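The following Python sketch shows one plausible way the three heuristics could gate the shaping step on each iteration; the message fields, priority test, model methods, and δ value are illustrative assumptions rather than the thesis implementation.

```python
import random

# Hedged sketch: gate model shaping with the three heuristics above.
# Message fields, the priority test, delta, and the model methods
# (with_optimistic_interactions, shaped_by) are illustrative placeholders.
def shape_with_heuristics(model, messages, iteration, sender_outranks_me,
                          delta=0.5):
    if iteration == 0:
        # Optimistic initialization: start from a model that assumes favorable
        # interaction outcomes (e.g. debris already cleared), so mixed
        # interactions stay visible to the agents that would benefit from them.
        return model.with_optimistic_interactions()
    for msg in messages:
        if msg.kind == "collision" and not sender_outranks_me(msg):
            continue   # Prioritization: only defer to higher-priority agents.
        if random.random() > delta:
            continue   # Probabilistic shaping: skip some updates to break cycles.
        model = model.shaped_by(msg.pr_cl, msg.val_cl)
    return model
```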

Preliminary Results Scaling Dataset Density Dataset

Experimental Setup D-TREMOR policies – Max-joint-value – Last iteration Comparison policies – Independent – Optimistic – Do-nothing – Random Scaling: – 10 to 100 agents – Random maps Density: – 100 agents – Concentric ring maps 3 problems/condition 20 planning iterations 7 time step horizon 1 CPU per agent D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs. (with some caveats)

Preliminary Results: Scaling (figures: naïve policies vs. D-TREMOR policies)

Preliminary Results: Density Do-nothing does the best? Ignoring interactions = poor performance

Preliminary Results: Density D-TREMOR rescues the most victims D-TREMOR does not resolve every collision

Preliminary Results: Time Why is this increasing?

Preliminary Results: Time Increase in time related to # of CLs, not # of agents

Summary of D-TREMOR D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs. – Partially-observable, uncertain world – Multiple types of interactions & agents Improves over independent planning Resolved interactions in large problems Still some convergence/efficiency issues

Outline Motivation Background Approach – SI-Dec-POMDP – DIMS Preliminary Work – DPP – D-TREMOR Proposed Work Conclusion

Proposed Work

Proposed Work DIMS components used so far: Task Allocation: given (DPP) / auction (D-TREMOR) Local Planning: A* on a graph (DPP) / EVA POMDP solver (D-TREMOR) Interaction Exchange: policy evaluation & prioritized exchange (DPP) / policy sub-sampling & full exchange (D-TREMOR) Model Shaping: prioritized shaping (DPP) / stochastic shaping with optimistic initialization (D-TREMOR)

Proposed Work Consolidate and generalize DIMS: 1. Interaction classification 2. Model-shaping heuristics 3. Domain evaluation – Search & Rescue – Humanitarian Convoy (figure labels: DPP, D-TREMOR, DIMS Framework)

Interaction Classification What are the different classes of possible interactions between agents in DIMS? Known examples: collisions (reward only), collisions (neg. reward + transition), debris clearing (transition + delay), plus as-yet-unknown classes
Model-shaping terms: collisions (DPP): reward; collisions (D-TREMOR): reward + transition; debris clearing: transition
Policy effects: collisions (DPP): negative; collisions (D-TREMOR): negative; debris clearing: mixed

Interaction Classification 1. Determine the sets of interactions that occur in the domains of interest 2. Formalize the characteristics of useful classes of interactions from this relevant set – Start with identifying differences between interactions in preliminary work: Collisions: reward-only, same-time Collisions: reward + transition, same-time Debris-clearing: transition-only, different-time 3. Classify potential interactions by common features
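As a concrete illustration of what such a classification might look like in code, here is a small sketch encoding the two dimensions above (which model terms an interaction shapes, and its effect on policy value); the names and fields are editorial assumptions, not the thesis's final taxonomy.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Effect(Enum):
    NEGATIVE = auto()   # every affected agent's local policy value can only drop
    POSITIVE = auto()
    MIXED = auto()      # e.g. debris: one agent gains, the other loses

@dataclass(frozen=True)
class InteractionClass:
    name: str
    shapes_reward: bool
    shapes_transition: bool
    shapes_observation: bool
    effect: Effect
    same_time: bool     # do the coupled effects occur at the same decision epoch?

# The three interaction types from the preliminary work, encoded in this scheme:
COLLISION_DPP = InteractionClass("collision (DPP)", True, False, False,
                                 Effect.NEGATIVE, True)
COLLISION_DTREMOR = InteractionClass("collision (D-TREMOR)", True, True, False,
                                     Effect.NEGATIVE, True)
DEBRIS_CLEARING = InteractionClass("debris clearing", False, True, False,
                                   Effect.MIXED, False)
```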

Model-shaping Heuristics Given classes of relevant interactions, what do we need to do to find good solutions? (figure: mapping interaction classes (collisions: reward only; collisions: neg. reward + transition; debris clearing: transition + delay; unknown classes) to model-shaping heuristics (prioritized shaping, stochastic shaping, optimistic initialization, others))

Model-shaping Heuristics Explore which, if any, of the existing heuristics apply to each class of interaction Apply targeted heuristics for newly-identified classes of interactions Attempt to bound the performance of the heuristics for particular classes of interaction – e.g. prove that prioritization converges for negative interactions

Domain Evaluation Using our approach, how well can we do in realistic planning scenarios? Search and Rescue Humanitarian Convoy

Domain Evaluation (table: domains Search and Rescue (Simple Model, USARSim) and Humanitarian Convoy (Simple Model, VBS2); model types Graph, MDP, POMDP; DPP covers the graph case and D-TREMOR the POMDP case, with the proposed DIMS solver targeting the remaining cells; the table's cell marks did not survive extraction)

Proposed Work: Timeline
Nov 2010 – Feb 2011: Develop classification of interactions
Feb 2011 – Mar 2011: Design heuristics for common interactions
Mar 2011 – Jul 2011: Implementation of DIMS solver
Jul 2011 – Oct 2011: Rescue experiments
Oct 2011 – Jan 2012: Convoy experiments
Feb 2012 – May 2012: Thesis preparation
May 2012: Defend thesis

Outline Motivation Background Approach – SI-Dec-POMDP – DIMS Preliminary Work – DPP – D-TREMOR Proposed Work Conclusion

Conclusions (1/3): Work-to-date DPP: Distributed path planning for large teams D-TREMOR: Decentralized planning for sparse Dec-POMDPs with many agents Demonstrated complete distributability, fast heuristic interaction detection, and local message exchange to achieve high scalability Empirical results in a simulated search and rescue domain

Conclusions (2/3): Contributions 1. DIMS: a modular algorithm for solving planning problems in large teams with sparse interactions – Single framework, applied to path planning, MDPs, and POMDPs 2. Empirical results of distributed planning using DIMS in teams of at least 100 agents across two domains 3. Study of the characteristics of interaction in sparse planning problems – Provide a classification of interactions – Determine features for distinguishing interaction behaviors

Conclusions (3/3): Take-home Message This thesis will demonstrate that it is possible to efficiently find high-quality solutions, in a fully distributed manner, to planning problems with known sparse interactions, with and without uncertainty, for teams of at least a hundred agents.


The VECNA Bear (Yes, it exists!)

SI-Dec-POMDP vs. other models DPCL – Extends DPCL model – Adds observational interactions – Time integrated in state rather than explicit EDI/EDI-CR – Adds complex transitions and observations TD-Dec-MDP – Allows simultaneous interaction (within epoch) Factored MDP/POMDP – Adds interactions that span epochs

Proposed Approach: DIMS Distributed Iterative Model Shaping Task Allocation: assign tasks to agents, create local sub-problems Local Planning: use a local solver to find the optimal solution to the sub-problem Interaction Exchange: compute and exchange the probability and expected value of interactions Model Shaping: alter the local sub-problem to incorporate non-local effects

Motivation

D-TREMOR

D-TREMOR

D-TREMOR: Reward functions
Probability that debris will not allow a robot to enter the cell: P_Debris = 0.9
Probability of action failure: P_ActionFailure = 0.2
Probability that success is observed if the action succeeded: P_ObsSuccessOnSuccess = 0.8
Probability that success is observed if the action failed: P_ObsSuccessOnFailure = 0.2
Probability that a robot will return to the same cell after a collision: P_ReboundAfterCollision = 0.5
Reward for saving a victim: R_Victim = 10.0
Reward for cleaning debris: R_Cleaning = 0.25
Reward for moving: R_Move = -0.5
Reward for observing: R_Observe = -0.25
Reward for a collision: R_Collision = -5.0
Reward for landing in an unsafe cell: R_Unsafe = -1