Planning to Gather Information
Richard Dearden, University of Birmingham
Joint work with Moritz Göbelbecker (ALU), Charles Gretton, Bramley Murton (NOC), Zeyn Saigol, Mohan Sridharan (Texas Tech), Jeremy Wyatt
Underwater Vent Finding
- An AUV is used to find hydrothermal vents
- It can detect the vent itself (reliably) and the plume of fresh water it emits
- The problem is where to go to collect data so that the vents are found as efficiently as possible
- Hard because plume detection is unreliable, and we can't easily assign 'blame' for the detections we do make
Vision Algorithm Planning
- Goal: answer queries and execute commands
  - Is there a red triangle in the scene?
  - Move the mug to the right of the blue circle.
- Our operators: colour, shape, SIFT identification, viewpoint change, zoom, etc.
- Problem: build a plan that achieves the goal with high confidence
Assumptions
- The visual operators are unreliable
- Reliability can be represented by a confusion matrix, computed from data
- Speed of response and answering the query correctly are what really matter
- We want to build the fastest plan that is 'reliable enough'
- We should include planning time in our performance estimate too

Example confusion matrix for the shape operator (rows: actual, columns: observed):

            Square  Circle  Triangle
  Square    0.85    0.10    0.05
  Circle            0.80

- The confusion matrices are learned from data; colour is much more reliable than shape or SIFT
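As a rough illustration of how such a learned confusion matrix can be used, the sketch below treats it directly as the observation model P(observed label | actual shape), and also samples from it to simulate the unreliable operator. Only the square row (0.85, 0.10, 0.05) comes from the slide; the other rows and the helper names are illustrative assumptions, not the authors' code.

```python
import random

CONFUSION = {                      # rows: actual shape, columns: observed shape
    "square":   {"square": 0.85, "circle": 0.10, "triangle": 0.05},
    "circle":   {"square": 0.08, "circle": 0.80, "triangle": 0.12},  # placeholder row
    "triangle": {"square": 0.05, "circle": 0.15, "triangle": 0.80},  # placeholder row
}

def p_obs_given_actual(observed, actual):
    """Observation probability read straight off the learned confusion matrix."""
    return CONFUSION[actual][observed]

def simulate_shape_operator(actual):
    """Sample a (possibly wrong) label, as the unreliable shape operator would."""
    r, cumulative = random.random(), 0.0
    for observed, p in CONFUSION[actual].items():
        cumulative += p
        if r < cumulative:
            return observed
    return observed
```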
POMDPs: Partially Observable Markov Decision Problems
- MDPs (discrete): states, stochastic actions, reward
  - Maximise expected (discounted) long-term reward
  - Assumption: the state is completely observable
- POMDPs: MDPs with observations
  - Infer the state from a (sequence of) observations
  - Typically maintain a belief state and plan over that
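A minimal sketch of the belief-state update this relies on, assuming transition and observation models stored as nested dictionaries (T[a][s][s'] and O[a][s'][o]); this is the textbook Bayes filter, not code from the work described here.

```python
def belief_update(belief, action, observation, T, O):
    """Standard POMDP belief update:
       b'(s') is proportional to O(obs | s', a) * sum_s T(s' | s, a) * b(s)."""
    new_belief = {}
    for s2 in belief:
        predicted = sum(T[action][s][s2] * belief[s] for s in belief)
        new_belief[s2] = O[action][s2][observation] * predicted
    total = sum(new_belief.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in new_belief.items()}
```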
POMDP Formulation
- States: Cartesian product of the individual state vectors
- Actions: A = {Colour, Shape, SIFT, terminal actions}
- Observations: {red, green, blue, circle, triangle, square, empty, unknown}
- Transition function
- Observation function given by the confusion matrices
- Reward specification: time cost of actions, large positive/negative rewards on terminal actions
- Maintain a belief over states and the likelihood of action outcomes
POMDP Formulation
- For a broad query, 'what is that?', for each ROI:
  - 26 states (5 colours x 5 shapes + terminal)
  - 12 actions (2 visual operators, 10 terminal actions: SayBlueSquare, SayRedTriangle, SayUnknown, …)
  - 8 observations
- For n ROIs: 25^n + 1 states
- Impractical for even a very small number of ROIs
- BUT: there is lots of structure. How can we exploit it?
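A quick worked instance of that state-space growth (the colour and shape names beyond those mentioned on the slides are placeholders, purely to make the counts come out):

```python
from itertools import product

COLOURS = ["red", "green", "blue", "yellow", "white"]        # 5 colours
SHAPES  = ["circle", "triangle", "square", "star", "cross"]  # 5 shapes

# One ROI: 5 x 5 colour-shape combinations plus a terminal state = 26 states.
roi_states = [c + "_" + s for c, s in product(COLOURS, SHAPES)] + ["terminal"]
assert len(roi_states) == 26

# n ROIs: the joint state space grows as 25^n + 1, which is why the flat model
# becomes impractical almost immediately.
def joint_state_count(n):
    return 25 ** n + 1

print(joint_state_count(3))   # 15626
print(joint_state_count(5))   # 9765626
```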
A Hierarchical POMDP
- Proposed solution: Hierarchical Planning in POMDPs (HiPPo)
- One LL-POMDP plans the actions within each ROI ("How to process?")
- A higher-level (HL) POMDP chooses which LL-POMDP to use at each step ("Which region to process?")
- Significantly reduces the complexity of the state-action-observation space
- Model creation and policy generation are automatic, based on the input query
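A minimal sketch of how the two levels might interact; all interfaces (hl_policy, run_ll_policy, hl_belief_update, ll_policies) are hypothetical stand-ins rather than the actual HiPPo implementation, and the "DoRi"/"Say..." action names follow the example slides.

```python
def hippo_loop(hl_policy, hl_belief, hl_belief_update, ll_policies, run_ll_policy):
    """hl_policy maps a belief to an HL action; ll_policies maps a region name to
    its precomputed LL policy; run_ll_policy executes that policy on the image and
    returns True iff the LL-POMDP answered 'found'."""
    while True:
        action = hl_policy(hl_belief)                # e.g. "DoR2", or a terminal "SayR1"
        if action.startswith("Say"):
            return action                            # terminal action: answer the query
        region = action[2:]                          # "DoR2" -> "R2"
        found = run_ll_policy(ll_policies[region])   # run visual operators on that ROI
        observation = ("Found" if found else "NotFound") + region
        hl_belief = hl_belief_update(hl_belief, action, observation)
```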
Low-level POMDP
- The LL-POMDP is the same as the flat POMDP but only ever operates on a single ROI
- 26 states, 12 actions
- The reward combines a time-based cost for actions with answer quality
- The terminal actions answer the query for this region
Example
- Query: 'where is the blue circle?'
- States: {RedCircle, RedTriangle, BlueCircle, BlueTriangle, …, Terminal}
- Actions: {Colour, Shape, …, SayFound, …}
- Observations: {Red, Blue, NoColour, UnknownColour, Triangle, Circle, NoShape, UnknownShape, …}
- Observation probabilities are given by the confusion matrices
Policy
- Policy tree for a uniform prior over the initial state
- We limit all LL policies to a fixed maximum number of steps
- [Policy tree figure: the root applies Colour; a Red observation leads to the terminal action sNotFound, a Blue observation leads to Shape actions, whose Circle/Triangle observations eventually lead to the terminal actions sFound/sNotFound]
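One way such a bounded-depth policy tree could be represented and executed, sketched with a hypothetical apply_operator(action, roi) callback that runs a visual operator and returns its (possibly wrong) label:

```python
class PolicyNode:
    """A node of an LL policy tree: an action to take, and for non-terminal
    actions a child node for each possible observation label."""
    def __init__(self, action, children=None):
        self.action = action            # visual operator, or terminal action such as "sFound"
        self.children = children or {}  # observation label -> PolicyNode

def execute_policy(root, roi, apply_operator, max_steps=5):
    """Walk the tree: run the operator, branch on its observation, and stop at a
    terminal action or when the fixed step limit is exceeded."""
    node = root
    for _ in range(max_steps):
        if not node.children:                       # terminal action: answer for this ROI
            return node.action
        observation = apply_operator(node.action, roi)
        node = node.children.get(observation) or node.children.get("unknown")
        if node is None:
            return "sUnknown"
    return "sUnknown"                               # step limit reached without an answer
```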
High-level POMDP
- The state space consists of which regions the object of interest is in
- Actions select a region to process
- Observations are whether the object of interest was found in the processed region
- We derive the observation function and action costs for the HL-POMDP from the LL-POMDP's policy tree
- The LL-POMDP is treated as a black box that returns definite labels (not belief densities)
Example
- Query: 'where is the blue circle?'
- States: which regions contain the blue circle
- Actions: {DoR1, DoR2, SayR1, SayR2, SayR1^R2, SayNo}
- Observations: {FoundR1, ¬FoundR1, FoundR2, ¬FoundR2}
- Observation probabilities are computed from the LL-POMDP
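The HL observation probabilities can be read off the LL policy tree by propagating the operators' confusion-matrix probabilities down to its leaves. The sketch below (reusing the PolicyNode sketch above, with a hypothetical p_obs(action, true_state, observation) accessor) computes the chance that the LL-POMDP terminates with a given label for a given true ROI state, e.g. P(FoundR1 | the blue circle really is in R1).

```python
def p_terminal_label(node, true_state, p_obs, target_label):
    """Probability that executing the LL policy tree from `node` terminates with
    `target_label`, given the ROI's true state. p_obs gives the confusion-matrix
    entry P(observation | true_state, action)."""
    if not node.children:                           # reached a terminal action
        return 1.0 if node.action == target_label else 0.0
    total = 0.0
    for observation, child in node.children.items():
        total += (p_obs(node.action, true_state, observation)
                  * p_terminal_label(child, true_state, p_obs, target_label))
    return total

# e.g. the HL observation probability P(FoundR1 | blue circle in R1) would be
# p_terminal_label(ll_policy_root, "BlueCircle", p_obs, "sFound")
```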
Results (very briefly)

  Approach      Reliability (%)
  No Planning   76.67
  CP
  Hier-P        91.67
Vent Finding Approach
- Assume mapping using an occupancy grid
- Rewards only for visiting cells that contain vents
- The state space is also too large to solve the POMDP
- Instead, do a fixed-length lookahead in belief space
- Reasoning in belief space allows us to account for the value of the information gained from observations
- Use P(vent | all observations so far) as the heuristic value at the end of the lookahead
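A sketch of fixed-depth lookahead in belief space under assumed interfaces (actions, successor, p_detect, update, reward are all hypothetical callbacks, and the belief is represented as a dict of per-cell vent probabilities): expand moves and plume-detection outcomes to the horizon, then score the frontier with the vent probability given everything observed so far.

```python
def lookahead_value(belief, cell, depth, actions, successor, p_detect, update, reward):
    """Depth-limited expectimax over moves and plume observations."""
    if depth == 0:
        return belief.get(cell, 0.0)                # heuristic: P(vent | observations so far)
    best = float("-inf")
    for action in actions(cell):
        next_cell = successor(cell, action)
        value = reward(belief, next_cell)           # immediate reward for visiting the cell
        for obs in (True, False):                   # plume detected or not
            p = p_detect(belief, next_cell, obs)
            if p > 0.0:
                new_belief = update(belief, next_cell, obs)
                value += p * lookahead_value(new_belief, next_cell, depth - 1,
                                             actions, successor, p_detect, update, reward)
        best = max(best, value)
    return best
```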
What we're working on now
- Most of these POMDPs are too big to solve
- Take a domain and problem description in a very general language and generate a classical planning problem from it
- Assume we can observe any variable we care about
- For each such observation, use a POMDP planner to determine the value of the variable with high confidence
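A very rough sketch of that idea, with hypothetical interfaces (solve_sensing_pomdp, execute) and an illustrative confidence threshold: run the classical plan built under the 'everything observable' assumption, and whenever a step depends on an assumed observation, invoke a small sensing POMDP to pin the variable down first, replanning if it cannot.

```python
def execute_with_sensing(classical_plan, assumed_obs, solve_sensing_pomdp, execute,
                         threshold=0.95):
    """classical_plan: list of steps; assumed_obs maps a step to the variables whose
    values the classical planner simply assumed it could observe."""
    for step in classical_plan:
        for variable in assumed_obs.get(step, []):
            value, confidence = solve_sensing_pomdp(variable)  # sensing sub-policy
            if confidence < threshold:                         # assumption did not hold
                return "replan"
        execute(step)
    return "done"
```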