Hierarchical mission control of automata with human supervision
Prof. David A. Castañon
Boston University

Problem of Interest
Coordination of heterogeneous teams to accomplish tasks in uncertain, risky environments
  - Vehicles with different capabilities, resources
  - Some resources are renewable (sensors), others are not
  - Tasks are spatially distributed, require combinations of capabilities
  - Successful completion of tasks not guaranteed
  - Likelihood of success depends on resources assigned
  - Tasks arrive, depart randomly
  - Task types may be unknown until observed
  - Vehicles may fail randomly, depending on trajectories
Key aspect: Real-time adaptation to events
Human Supervision
  - Determine task priority/value
  - Modify individual vehicle task assignments when desired
  - Determine specific vehicle schedules when desired

Problem Illustration
Experiment model
Multiple robots search for and perform tasks at BU’s Mechatronics Lab

Why is this a hard problem?
Uncertain environment and dynamics
  - Unknown targets
  - Uncertain effectiveness of sensing, actions
Requires highly adaptive system, anticipative of and responsive to new information
Hedge against loss of assets, new arrivals, action failures, …
Diverse set of vehicles with multiple capabilities
  - Dynamic role selection, ad hoc teaming
Dual control problems: Manage both information acquisition and action
  - Trade off search and sensing versus actions
  - Dynamic coupling of available capabilities to achieve desired effects
Support and adapt to human control inputs
  - Goals, constraints, fixed decisions
  - Provide information to assess effects of changes

Classes of algorithms
Operations Research
  - Deterministic and stochastic multi-vehicle task assignment and scheduling
  - Large vehicles, small tasks, limited cooperation, homogeneous activities
  - No risk, limited uncertainty to new task arrivals, departures independent of vehicle actions
  - Search theory and sensor management
  - Large-scale resource allocation and integer programming
Stochastic Control
  - Control of stochastic queuing systems in communications
  - Single vehicle routing and low level vehicle trajectory control
  - Swarm control approaches with stability and performance guarantees
  - Homogeneous vehicles
  - Approximate dynamic programming techniques
  - Not focused on combinatorial optimization in general, rare exceptions
  - Model predictive control of complex stochastic systems
Artificial Intelligence/Computer Science
  - Constraint satisfaction, temporal planning systems
  - Non-real time, off-line combinatorial constraint-based search
  - Limited incorporation of risk/reward, information dynamics
  - Behavioral control in robotics for simple tasks
  - Reinforcement learning for stochastic planning in well-defined repeated environments (e.g. games)

Proposed Approach: Hierarchical Model Predictive Control
Hierarchical approach: avoid combinatorial explosion of complexity through decomposition
Team strategy selection: address uncertainty
  - Allocate team capabilities to tasks, hedging against task type uncertainty, new task arrivals, action success probabilities
  - Simplify distribution of resources across vehicles
Team activity scheduling: address combinatorial complexity
  - Allocate team activities to platforms
  - Select schedules and routes
Model Predictive Control: re-run algorithms in response to new information or human directives
  - Receding horizon control
  - Respond to new tasks, changes in task status, platform loss, …
  - Adapt to human guidance and constraints
Requires fast algorithms for real-time control

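The receding-horizon loop on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the algorithm from the presentation: the function `plan_first_stage`, the even-spread budgeting rule, the one-unit-per-task allocation, and the single success probability `p_success` are hypothetical stand-ins for the strategy selection and scheduling layers.

```python
import math
import random


def plan_first_stage(open_tasks, resources, stages_left):
    """Hypothetical planner: spread the remaining resources evenly over the
    remaining stages and spend this stage's share, one unit per task, on the
    most valuable open tasks."""
    budget = math.ceil(resources / stages_left)
    ranked = sorted(open_tasks, key=open_tasks.get, reverse=True)
    return {task: 1 for task in ranked[:budget]}


def receding_horizon(task_values, resources, stages, p_success, rng):
    """Plan, execute only the first-stage allocation, observe outcomes, replan."""
    open_tasks = dict(task_values)
    for stage in range(stages):
        if not open_tasks or resources == 0:
            break
        # Re-plan from the currently observed state at every stage.
        allocation = plan_first_stage(open_tasks, resources, stages - stage)
        for task, units in allocation.items():
            resources -= units
            # Assumed outcome model: each attempt succeeds with p_success.
            if rng.random() < p_success:
                del open_tasks[task]
    return open_tasks, resources


if __name__ == "__main__":
    remaining, left = receding_horizon(
        {"a": 5.0, "b": 3.0, "c": 1.0}, resources=3, stages=3,
        p_success=0.7, rng=random.Random(0))
    print(remaining, left)
```

Only the first-stage allocation of each plan is ever executed; everything after it is recomputed once outcomes are observed, which is why the planner has to be fast.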
Team Strategy Selection
Stochastic dynamic programming formulation
  - Multistage formulation, with outcomes observed after each stage
[Figure: resources allocated over Stage 1, Stage 2 and Stage 3 to Task 1 through Task N; labels also show Type 1 through Type 4 and new tasks Task N+1 through Task N+M]

Notation
N tasks, i = 1, …, N
M resource types, j = 1, …, M
Assume independence of all task completion events

Example: Two-Stage Single Resource Problem
Define a task completion state after each stage
  - Task completion state observed after each stage
Decisions are now feedback policies
Task completion state dynamics: Controlled Markov chain
  - Resources assigned determine transition probabilities
  - Independence of completion event outcomes decouples transition dynamics across tasks

Two-Stage Problem Statement
Objective: minimize expected uncompleted task value plus expected resource use costs
Constraints: Resource limits

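To make the two-stage statement concrete, here is a brute-force sketch for very small instances. The success model is an assumption, not taken from the slides: each resource unit completes its task independently with probability p, so a task given u units stays open with probability (1 - p)^u. The names `two_stage_cost`, `stage2_cost` and `unit_cost` are hypothetical.

```python
from itertools import product


def allocations(n_tasks, budget):
    """All non-negative integer allocations over n_tasks using at most budget units."""
    return [x for x in product(range(budget + 1), repeat=n_tasks) if sum(x) <= budget]


def fail_prob(p, units):
    """Assumed model: probability a task is still open after `units` attempts."""
    return (1.0 - p) ** units


def stage2_cost(values, probs, open_mask, budget, unit_cost):
    """Best second-stage expected cost given which tasks are observed to be open."""
    best = float("inf")
    for x in allocations(len(values), budget):
        if any(u > 0 and not is_open for u, is_open in zip(x, open_mask)):
            continue  # never spend resources on a completed task
        cost = unit_cost * sum(x)
        cost += sum(v * fail_prob(p, u)
                    for v, p, u, is_open in zip(values, probs, x, open_mask) if is_open)
        best = min(best, cost)
    return best


def two_stage_cost(values, probs, budget, unit_cost):
    """Enumerate first-stage allocations; the second stage is a feedback policy
    that depends on the observed completion state."""
    best_cost, best_x1 = float("inf"), None
    for x1 in allocations(len(values), budget):
        expected = unit_cost * sum(x1)
        for open_mask in product([False, True], repeat=len(values)):
            prob = 1.0
            for p, u, is_open in zip(probs, x1, open_mask):
                q = fail_prob(p, u)
                prob *= q if is_open else 1.0 - q
            if prob > 0.0:
                expected += prob * stage2_cost(
                    values, probs, open_mask, budget - sum(x1), unit_cost)
        if expected < best_cost:
            best_cost, best_x1 = expected, x1
    return best_cost, best_x1


if __name__ == "__main__":
    print(two_stage_cost([10.0], [0.5], budget=2, unit_cost=1.0))
```

For one task of value 10 with p = 0.5, two units and unit cost 1, committing both units up front costs 4.5 in expectation, while sending one unit and holding the other until the outcome is observed costs 4.0. The enumeration grows exponentially with the number of tasks, which is the difficulty the relaxation on the next slide addresses.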
Relaxed Two-Stage Problem
Original problem is a stochastic integer program
  - PSPACE-complete, hard
Expand set of admissible feedback strategies in second stage
  - Generates lower bound to optimal value function
  - New constraint on average number of resources
  - Relaxes exponential number of constraints to a single constraint
  - Simple result: All feasible strategies in original problem are feasible in relaxed problem
  - Lower bound on original performance
  - Idea: select optimal strategies for lower bound

Characterization of Optimal Strategies
Important concept: Mixed local strategies
  - Local strategies: feedback strategies such that the actions on a given task depend only on the state of that task
  - Mixed strategy: random combination of pure strategies
  - Mixed strategies may achieve better performance than pure strategies in relaxed problem
Theorem: In relaxed problem, for every pure strategy, there is a mixed local strategy which uses same resources and achieves same expected performance
  - Proven by construction
  - Restricts search to local mixed strategies
  - Fast algorithm for solution of optimal strategies using convex optimization principles!
  - Can be solved exactly with complexity $O((M_1 + N)\log N)$

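One standard way to exploit a single average-resource constraint is to attach a price to expected resource use, so that the problem separates into one small subproblem per task, and then to search for the price by bisection. The sketch below illustrates that idea only; it is not the algorithm from the presentation and does not attain the complexity quoted above. The success model (each unit succeeds independently with probability p) and the names `best_local_strategy`, `solve_relaxed` and `max_units` are assumptions.

```python
def best_local_strategy(value, p, price, max_units):
    """Best pure local strategy for one task at a given resource price.

    A local strategy is (x1, x2): send x1 units in stage 1 and, only if the
    task is observed to be still open, x2 more units in stage 2.
    Returns (expected uncompleted value, expected units used, (x1, x2)).
    """
    q = 1.0 - p
    best = None
    for x1 in range(max_units + 1):
        for x2 in range(max_units + 1 - x1):
            use = x1 + (q ** x1) * x2      # stage-2 units are spent only if still open
            loss = value * q ** (x1 + x2)  # task remains uncompleted after both stages
            score = loss + price * use
            if best is None or score < best[0]:
                best = (score, loss, use, (x1, x2))
    return best[1:]


def solve_relaxed(values, probs, budget, max_units, iters=60):
    """Bisect on the price until total expected resource use fits the budget."""
    lo, hi = 0.0, max(values)  # at price max(values), using no resources is optimal
    for _ in range(iters):
        price = 0.5 * (lo + hi)
        use = sum(best_local_strategy(v, p, price, max_units)[1]
                  for v, p in zip(values, probs))
        if use > budget:
            lo = price
        else:
            hi = price
    plans = [best_local_strategy(v, p, hi, max_units) for v, p in zip(values, probs)]
    return hi, plans


if __name__ == "__main__":
    price, plans = solve_relaxed([10.0, 4.0], [0.5, 0.5], budget=2.0, max_units=3)
    print(price, plans)
```

The strategies returned here are pure and satisfy the budget only on average, so they may leave some of it unused. Randomizing between the two pure strategies on either side of the critical price is where mixed local strategies would enter, and that step is not implemented in this sketch.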
Comments and Extensions
MPC approach guarantees feasibility of approximate problem solution in terms of original problem
  - Obtain approximate solution, but implement only first stage allocations
  - Re-solve problem when new observations are available, with receding horizon
  - Fast algorithm allows for rapid computation
Main extensions:
  - Multiple stages
  - Multiple resource types
  - Multiple renewable and non-renewable resources
  - Solution NP-hard, but can solve approximately
  - Multiple task types: sensing and action
  - Must sense to observe outcomes
  - New task arrivals, discovered by searching
  - Unknown task types: Detect presence, but must observe to determine task type
  - Task departures, deadlines

Team Activity Scheduling
Inputs from team strategy selection
  - Desired resources assigned to each task in current period
  - Desired resources held in reserve when future information is collected
Guidance and constraints from human operators
  - Task values, select platform task assignments, select task resource assignments
Known parameters
  - Vehicle locations and resources in each vehicle, task locations
Problem: assign resource deliveries for tasks to individual vehicles, and select sequence of activities for each vehicle
  - Deterministic multi-vehicle routing problem (VRP)
  - NP-hard, with many useful approximate approaches available

Team Activity Assignment Formulation
Classical Application: Truck Routing
[Problem formulation equations not recovered; surviving labels: Discounted Cost; Subject to: Visit Customers, N vehicles to route, Integrality]
VRP is an NP-hard problem (traveling salesman) wrapped in an NP-hard problem (bin packing).

Team Activity Assignment Algorithm
Candidate algorithm: Tabu Search
  - Locally perturbs trial solutions
  - Uses “Tabu” list to avoid local minima
  - Evaluated by AFIT for UAV routing
  - Fast replanning, leads to rapid response to events
  - Handles time window constraints instead of precedence constraints
Significant extensions to date
  - Multiple task types
  - Multiple resource types
  - Compound tasks involving multiple vehicles
Alternative algorithms (AFOSR-sponsored)
  - Mixed Integer-Linear Programming, J. How, MIT
  - Receding horizon controller, C. Cassandras, BU

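The basic tabu search mechanics (local perturbation plus a tabu list) can be sketched on a toy single-vehicle routing instance. This is a generic textbook-style illustration, not the implementation evaluated for UAV routing: it ignores time windows, resources and multiple vehicles, and the swap neighborhood, the `tenure` parameter and the aspiration rule are choices made here for brevity.

```python
import math
from itertools import combinations


def route_length(route, points):
    """Length of the closed tour that visits points in the given order."""
    return sum(math.dist(points[a], points[b])
               for a, b in zip(route, route[1:] + route[:1]))


def tabu_search(points, iterations=100, tenure=5):
    """Swap-based tabu search; position 0 is kept fixed as the depot."""
    current = list(range(len(points)))
    best, best_len = current[:], route_length(current, points)
    tabu = {}  # move (i, j) -> first iteration at which it is allowed again
    for it in range(iterations):
        candidates = []
        for i, j in combinations(range(1, len(points)), 2):
            neighbor = current[:]
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            length = route_length(neighbor, points)
            # Skip tabu moves unless they beat the best tour so far (aspiration).
            if tabu.get((i, j), 0) > it and length >= best_len:
                continue
            candidates.append((length, (i, j), neighbor))
        if not candidates:
            break
        # Take the best admissible neighbor even if it is worse than the
        # current tour; this is what lets the search leave local minima.
        length, move, current = min(candidates, key=lambda c: c[0])
        tabu[move] = it + tenure
        if length < best_len:
            best, best_len = current[:], length
    return best, best_len


if __name__ == "__main__":
    corners = [(0, 0), (1, 1), (1, 0), (0, 1)]
    print(tabu_search(corners))
```

Because each iteration only evaluates a small neighborhood of the current solution, a search of this kind can be restarted from the current plan whenever tasks or vehicles change.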
Comments
Algorithms available for dynamic control of automata performing tasks in uncertain, risky environments
  - Fast generation of desired courses of action
  - Hedge against uncertain outcomes, adapt to new information
Operator interaction through value structure, plus fixed decision variables and constraints
  - Allows for “micro”-management
  - Very limited insight into effects of operator inputs on automata behavior and performance
Fundamental problem for this MURI research: prediction of course of action in the presence of uncertainty
  - Not a single plan, but a contingency tree of possible actions/responses
  - Hard to modify, approve

Experimental Platform for Research
Multiple robots search for and perform tasks at BU’s Mechatronics Lab
  - Can provide operator control of some platforms: human-automata teams
  - Can control the information and risk displayed to each operator using video

Future Activities
Implement research experiments involving tasks with performance uncertainty in test facility
  - Vary tempo, size, uncertainty, information
Develop algorithms to interact with operators in alternative roles
  - Supervisory control
  - Team partners
Extend existing algorithms to different classes of tasks
  - Area search, task discovery, risk to platforms
Develop algorithms to assist operators in predicting behavior of automata teams in uncertain environments
Collaborate with MURI team to design and analyze experiments involving alternative structures for human-automata teams