Finding Approximate POMDP Solutions through Belief Compression
Based on slides by Nicholas Roy, MIT
Reliable Navigation
Conventional trajectories may not be robust to localisation error.
[Figure: estimated robot position, robot position distribution, true robot position, goal position]
Perception and Control
[Diagram: world state -> perception -> control algorithms]
Perception and Control
Assume full observability: control acts on argmax P(x). Brittle.
Exact POMDP planning: control acts on the full distribution P(x). Intractable.
Proposed: control acts on a compressed P(x) from the probabilistic perception model.
Main Insight
[Diagram: world state -> probabilistic perception model P(x) -> low-dimensional P(x) -> control]
Good policies for real-world POMDPs can be found by planning over low-dimensional representations of the belief space.
Belief Space Structure
The controller may be globally uncertain... but not usually.
Coastal Navigation
Represent beliefs using a small set of low-dimensional statistics.
Discretise into a low-dimensional belief-space MDP.
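The slide does not spell out which statistics are used; one common choice in coastal-navigation-style planners is the most-likely state together with the entropy of the belief. A minimal numpy sketch under that assumption (function and variable names are mine, not from the slides):

```python
import numpy as np

def coastal_features(belief):
    """Compress a discrete belief into (most-likely state, entropy).

    This is an illustrative statistic in the spirit of coastal navigation;
    the exact features used on the robot may differ.
    """
    belief = np.asarray(belief, dtype=float)
    belief = belief / belief.sum()                       # normalise defensively
    ml_state = int(np.argmax(belief))                    # most-likely state index
    entropy = -np.sum(belief * np.log(belief + 1e-12))   # uncertainty measure
    return ml_state, entropy

# Example: a fairly peaked belief over 5 states
print(coastal_features([0.05, 0.8, 0.1, 0.03, 0.02]))
```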
Coastal Navigation
A Hard Navigation Problem
[Plot: average distance to goal, in m]
Dimensionality Reduction
Principal Components Analysis: decompose the original beliefs into weights over a small set of characteristic beliefs.
Principal Components Analysis
Given a belief b ∈ R^n, we want b~ ∈ R^m with m << n.
[Figure: collection of beliefs drawn from a 200-state problem; axes: state vs. probability of being in state]
[Figure: m = 9 gives this representation for one sample distribution]
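A minimal numpy sketch of this decomposition: each belief row is approximated as a mean plus a weighted sum of m characteristic beliefs (the principal components). Function names are mine, not from the slides.

```python
import numpy as np

def pca_compress(beliefs, m):
    """Project n-dimensional beliefs onto m characteristic beliefs (PCA).

    beliefs: array of shape (num_beliefs, n), one belief distribution per row.
    Returns (weights, components, mean) so that beliefs ~= mean + weights @ components.
    """
    B = np.asarray(beliefs, dtype=float)
    mean = B.mean(axis=0)
    U, S, Vt = np.linalg.svd(B - mean, full_matrices=False)
    components = Vt[:m]                    # m characteristic beliefs (rows)
    weights = (B - mean) @ components.T    # m-dimensional representation of each belief
    return weights, components, mean

def pca_reconstruct(weights, components, mean):
    """Map low-dimensional weights back to full belief vectors."""
    return mean + weights @ components
```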
Principal Components Analysis
Many real-world POMDP distributions are characterised by large regions of low probability.
Idea: create a fitting criterion that is (exponentially) stronger in low-probability regions (E-PCA).
Example E-PCA
[Figure: reconstructions using 1, 2, 3, and 4 bases; axes: state vs. probability of being in state]
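As a rough illustration of the idea, here is a sketch of exponential-family PCA assuming a Poisson-style loss with an exponential link, fit by plain gradient descent. The actual E-PCA algorithm uses a more careful alternating Newton scheme; the names and hyperparameters below are mine.

```python
import numpy as np

def epca(B, m, iters=2000, lr=1e-3):
    """Exponential-family PCA sketch: fit B ~= exp(W @ C) under a Poisson-style
    loss, which penalises mis-fit in low-probability regions far more strongly
    than squared error does.

    B: (num_beliefs, n) matrix of belief vectors.
    Returns weights W (num_beliefs, m) and characteristic bases C (m, n).
    """
    rng = np.random.default_rng(0)
    num, n = B.shape
    W = 0.01 * rng.standard_normal((num, m))
    C = 0.01 * rng.standard_normal((m, n))
    for _ in range(iters):
        R = np.exp(W @ C)        # current reconstruction
        G = R - B                # gradient of the Poisson-style loss w.r.t. (W @ C)
        W -= lr * (G @ C.T)
        C -= lr * (W.T @ G)
        # A proper implementation alternates exact Newton steps per row/column.
    return W, C

def epca_reconstruct(W, C):
    return np.exp(W @ C)         # reconstructions are always non-negative
```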
Example Reduction
Finding Dimensionality
E-PCA will indicate the appropriate number of bases, depending on the beliefs encountered.
Planning
[Diagram: original POMDP (states s1, s2, s3) -> E-PCA -> low-dimensional belief space B~ -> discretise -> discrete belief-space MDP]
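Once the belief space has been compressed and discretised, planning reduces to solving an ordinary MDP over the grid of belief points. A minimal sketch, assuming a transition tensor T and reward matrix R built as on the following slides (names are mine):

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    """Solve the discretised belief-space MDP with standard value iteration.

    T: transitions, shape (num_beliefs, num_actions, num_beliefs)
    R: rewards,     shape (num_beliefs, num_actions)
    Returns the value function and greedy policy over belief grid points.
    """
    V = np.zeros(T.shape[0])
    while True:
        Q = R + gamma * (T @ V)          # Q-values, shape (num_beliefs, num_actions)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V, Q.argmax(axis=1)
```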
Model Parameters: Reward Function R(b~)
Back-project b~ to the high-dimensional belief b(s), then compute the expected reward from that belief: R(b~) = sum_s b(s) R(s).
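A one-function sketch of this step, assuming a decompression (back-projection) function such as the E-PCA reconstruction above; names are mine:

```python
import numpy as np

def reward_low_dim(b_tilde, decompress, R_states):
    """Expected reward for a low-dimensional belief point.

    b_tilde:    low-dimensional belief representation b~.
    decompress: function mapping b~ back to a full belief b(s).
    R_states:   per-state reward vector R(s).
    Implements R(b~) = sum_s b(s) R(s).
    """
    b = np.asarray(decompress(b_tilde), dtype=float)
    b = b / b.sum()                      # renormalise the reconstruction
    return float(b @ np.asarray(R_states))
```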
Model Parameters: Transition Function (low dimension <-> full dimension)
1. For each belief b~_i and action a
2. Recover the full belief b_i
3. Propagate b_i according to the action
4. Propagate according to the observation
5. Recover b~_j
6. Set T(b~_i, a, b~_j) to the probability of the observation
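A sketch of one entry of this computation, assuming access to the compression/reconstruction functions and the POMDP's per-action transition and observation matrices (argument names are mine):

```python
import numpy as np

def transition_entry(b_tilde_i, T_sa, O_za, compress, decompress):
    """One entry of the low-dimensional transition model, following the
    recover / propagate / re-compress steps on this slide.

    T_sa: state transition matrix for action a, T_sa[s, s'] = p(s' | s, a)
    O_za: observation likelihoods for action a, O_za[s'] = p(z | s', a)
    compress / decompress: E-PCA projection and reconstruction.
    Returns (b~_j, probability of the observation).
    """
    b_i = np.asarray(decompress(b_tilde_i), dtype=float)   # 2. recover full belief b_i
    b_i = b_i / b_i.sum()
    b_pred = b_i @ T_sa                                     # 3. propagate through the action
    unnorm = b_pred * O_za                                  # 4. propagate through the observation
    p_z = unnorm.sum()                                      # 6. T(b~_i, a, b~_j) = p(z | b_i, a)
    b_j = unnorm / p_z if p_z > 0 else b_pred
    b_tilde_j = compress(b_j)                               # 5. recover low-dimensional b~_j
    return b_tilde_j, p_z
```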
Robot Navigation Example
[Figure: initial distribution, true (hidden) robot position, goal position, goal state]
Robot Navigation Example
[Figure: true robot position, goal position]
Policy Comparison
[Plot: average distance to goal, in m; E-PCA policy uses 6 bases]
People Finding
People Finding as a POMDP
Robot position fully observable; position of the person unknown.
[Figure: robot position, true person position]
Finding and Tracking People
[Figure: robot position, true person position]
People Finding as a POMDP
Factored belief space:
- 2 dimensions: fully-observable robot position
- 6 dimensions: distribution over person positions
A regular grid over this space gives ≈ … states.
Variable Resolution
Use a non-regular grid of sampled beliefs b~_1 ... b~_5.
Compute model parameters using nearest neighbour, e.g. T(b~_1, a_1, b~_2), T(b~_1, a_2, b~_5).
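A sketch of the nearest-neighbour construction, assuming a propagate function like the transition_entry sketch above; the slide's exact interpolation scheme may differ and all names are mine:

```python
import numpy as np

def nearest(grid, b_tilde):
    """Index of the grid point closest to b_tilde (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(grid - b_tilde, axis=1)))

def build_nn_transitions(grid, actions, observations, propagate):
    """Nearest-neighbour transition model over an irregular grid of sampled
    low-dimensional beliefs.

    grid:       (k, m) array of sampled belief points b~_1 ... b~_k
    propagate:  function (b_tilde, a, z) -> (next b_tilde, p(z)),
                e.g. built from the transition_entry sketch above.
    Returns T with T[i, a, j] = probability of landing nearest to grid point j.
    """
    k = len(grid)
    T = np.zeros((k, len(actions), k))
    for i, b_tilde in enumerate(grid):
        for ai, a in enumerate(actions):
            for z in observations:
                b_next, p_z = propagate(b_tilde, a, z)
                T[i, ai, nearest(grid, b_next)] += p_z   # lump mass on the neighbour
    return T
```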
Refining the Grid
Sample beliefs b~' according to the current policy.
Construct a new model that includes the sampled points.
Keep a new belief b~'_1 if V(b~'_1) > V(b~_1).
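A sketch of that refinement loop, assuming helper functions that rebuild the model for a given grid and solve it (e.g. the value_iteration sketch above, keeping only the value function); names are mine:

```python
import numpy as np

def refine_grid(grid, candidates, build_model, solve):
    """Grid refinement sketch: add a sampled belief point only if it improves
    on the value of its nearest existing grid point.

    grid:        list of current low-dimensional belief points.
    candidates:  belief points b~' sampled by following the current policy.
    build_model: function grid -> (T, R) for the discrete belief MDP.
    solve:       function (T, R) -> value function over grid points.
    """
    T, R = build_model(grid)
    V = solve(T, R)
    for b_new in candidates:
        trial = grid + [b_new]
        T2, R2 = build_model(trial)
        V2 = solve(T2, R2)
        i = int(np.argmin(np.linalg.norm(np.asarray(grid) - b_new, axis=1)))
        if V2[len(trial) - 1] > V[i]:     # V(b~') under the new model beats V(b~)
            grid, V = trial, V2
    return grid
```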
The Optimal Policy
[Figure: original distribution vs. reconstruction using E-PCA with 6 bases; robot position, true person position]
E-PCA Policy Comparison
[Plot: average number of actions to find the person; compared: E-PCA (72 states), refined E-PCA (260 states), fully observable MDP]
Nick's Thesis Contributions
Good policies for real-world POMDPs can be found by planning over a low-dimensional representation of the belief space, using E-PCA.
POMDPs can scale to bigger, more complicated real-world problems.
POMDPs can be used for real, deployed robots.