1
Representing hierarchical POMDPs as DBNs for multi-scale robot localization G. Theocharous, K. Murphy, L. Kaelbling Presented by: Hannaneh Hajishirzi
2
Outline Define H-HMM –Flattening H-HMM Define H-POMDP –Flattening H-POMDP Approximate H-POMDP with DBN Inference and Learning in H-POMDP
3
Introduction H-POMDPs represent the state space at multiple levels of abstraction –Scale much better to large environments –Simplify planning: abstract states are more deterministic –Simplify learning: the number of free parameters is reduced
4
Hierarchical HMMs A generalization of HMMs to domains with hierarchical structure –Application: NLP Concrete states emit a single observation; abstract states emit strings of observations The strings emitted by abstract states are governed by sub-HMMs
5
Example HHMM representing a(xy) + b | c(xy) + d When a sub-HHMM finishes, control is returned to wherever it was called from
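The call-and-return semantics of the sub-HHMM can be sketched with a toy sampler; the exit probability below is illustrative, not taken from the slides:

```python
import random

def sample_string():
    """Sample a string from the example HHMM's language a(xy)+b | c(xy)+d.
    The top level picks one of two abstract states; each calls a sub-HMM
    that emits 'xy' one or more times before returning control."""
    first, last = random.choice([('a', 'b'), ('c', 'd')])
    s = first
    while True:
        s += 'xy'                  # concrete emission inside the sub-HMM
        if random.random() < 0.5:  # illustrative probability of finishing
            break                  # sub-HMM finishes; control returns
    return s + last
```

Every sampled string matches the regular expression `a(xy)+b|c(xy)+d`, mirroring how the sub-HMM governs the repeated `xy` segment.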
7
HHMM to HMM Create a state for every leaf in the HHMM Flat transition probability = sum of P(path) over all corresponding paths in the HHMM Disadvantages: –Flattening loses modularity –Learning requires more samples
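The path-summing can be sketched on a toy two-level model; all numbers are illustrative, and the convention that a sub-HMM row's leftover mass is its exit probability is an assumption of this sketch:

```python
import numpy as np

# Toy 2-level HHMM: 2 abstract states, each with 2 concrete sub-states.
A = np.array([[0.7, 0.3],          # abstract transition matrix
              [0.4, 0.6]])
sub = np.array([[[0.8, 0.1],       # sub-HMM "stay inside" transitions;
                 [0.2, 0.5]],      # leftover row mass is the exit
                [[0.6, 0.2],       # probability (assumed convention)
                 [0.3, 0.4]]])
entry = np.array([[0.9, 0.1],      # entry distribution per abstract state
                  [0.5, 0.5]])

K_abs, K_con = A.shape[0], sub.shape[1]
flat = np.zeros((K_abs * K_con, K_abs * K_con))
for a in range(K_abs):
    for i in range(K_con):
        p_exit = 1.0 - sub[a, i].sum()       # exit probability of leaf (a, i)
        for b in range(K_abs):
            for j in range(K_con):
                p = sub[a, i, j] if a == b else 0.0   # path: move inside
                p += p_exit * A[a, b] * entry[b, j]   # path: exit, jump, enter
                flat[a * K_con + i, b * K_con + j] = p
```

Each flat transition sums the probabilities of the hierarchical paths that realize it, and every row of `flat` is a proper distribution, while the modular sub-HMM structure is lost, as the slide notes.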
8
Representing HHMMs as DBNs Q_t^d: state at level d; F_t^d = 1 if the HMM at level d has finished
9
H-POMDPs HHMMs with inputs and a reward function Problems: –Planning: find a mapping from belief states to actions –Filtering: compute the belief state online –Smoothing: compute the belief state offline, given the full observation sequence –Learning: find the MLE of the model parameters
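The filtering problem listed above is the standard POMDP belief update; a minimal sketch with a made-up two-state model (all numbers illustrative):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One filtering step: b'(s') ∝ O[a][s', o] * Σ_s T[a][s, s'] * b(s)."""
    pred = b @ T[a]               # predict: push belief through the dynamics
    new_b = pred * O[a][:, o]     # correct: reweight by observation likelihood
    return new_b / new_b.sum()    # renormalize to a distribution

# Toy 2-state, 1-action, 2-observation model (illustrative numbers).
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]])}   # T[a][s, s']
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]])}   # O[a][s', o]
b = belief_update(np.array([0.5, 0.5]), a=0, o=1, T=T, O=O)
```

After observing `o=1`, which is more likely in state 1, the belief shifts toward state 1; the returned vector always sums to 1.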
10
H-POMDP for Robot Navigation Flat model: * Robot position: X_t (1..10) Hierarchical model: * Abstract state: X_t^1 (1..4) * Concrete state: X_t^2 (1..3) * Observation: Y_t (4 bits) In this paper, the problem of how to choose actions is ignored
11
State Transition Diagram for a 2-level H-POMDP [Figure: transition diagram with a sample path]
12
State Transition Diagram for Corridor Environment [Figure: abstract states, entry states, exit states, and concrete states]
13
Flattening H-POMDPs Advantages of H-POMDP over corresponding POMDP: –Learning is easier: Learn sub-models –Planning is easier: Reason in terms of “macro” actions
14
Dynamic Bayesian Networks [Figure: state-based POMDP vs. factored DBN POMDP, comparing the number of parameters]
15
Representing H-POMDPs as DBNs [Figure: state-based H-POMDP vs. factored DBN H-POMDP, with EAST/WEST abstract states]
20
H-POMDPs as DBNs Nodes in each time slice: –Abstract location (X_t^1) –Orientation –Concrete location (X_t^2) –Exit node (5 values: no-exit, s-exit, n-exit, l-exit, r-exit) –Observation (Y_t) –Action node
21
Transition Model Abstract horizontal transition: if e = no-exit, the abstract state stays the same (identity transition); otherwise the next abstract state is drawn from the abstract horizontal transition matrix, conditioned on which exit was taken
22
Transition Model Concrete transition: if e = no-exit, the next concrete state is drawn from the concrete horizontal transition matrix (movement within the current sub-model); otherwise it is drawn from the concrete vertical entry vector of the newly entered abstract state The exit node gives the probability of entering exit state e, given the current abstract and concrete states and the action
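The gating on the exit node can be sketched as follows; the `NO_EXIT` encoding and the array shapes are assumptions of this sketch, not taken from the paper:

```python
import numpy as np

NO_EXIT = 0  # assumed encoding of the 5-valued exit node

def next_concrete_dist(x2_prev, x1_new, e_prev, a, T2, V):
    """P(X_t^2 | X_{t-1}^2, X_t^1, E_{t-1}, A_t): a horizontal move inside
    the current sub-model if no exit fired, otherwise re-entry via the
    vertical entry vector of the newly entered abstract state."""
    if e_prev == NO_EXIT:
        return T2[a][x1_new, x2_prev]   # concrete horizontal transition row
    return V[a][x1_new]                 # concrete vertical entry vector

# Toy model: 1 action, 1 abstract state, 3 concrete states (made-up numbers).
T2 = {0: np.array([[[0.7, 0.2, 0.1],
                    [0.1, 0.7, 0.2],
                    [0.1, 0.2, 0.7]]])}
V = {0: np.array([[0.8, 0.1, 0.1]])}

stay = next_concrete_dist(1, 0, NO_EXIT, 0, T2, V)   # within the sub-model
enter = next_concrete_dist(1, 0, 2, 0, T2, V)        # after an exit fired
```

The same conditional structure applies at the abstract level: when `e_prev` is `NO_EXIT` the abstract state keeps an identity transition, and only an exit event lets it move.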
23
Observation Model Probability of seeing a wall or an opening on each of the 4 sides of the robot Naive Bayes assumption: P(Y_t | X_t) = ∏_{i=1}^{4} P(Y_t^i | X_t) The global coordinate frame is mapped to the robot's local coordinate frame (using the orientation); then the appearance of each cell is learned in all directions
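A sketch of the naive-Bayes likelihood with the global-to-local rotation; the side ordering (N, E, S, W) and the per-side Bernoulli parameterization are assumptions of this sketch:

```python
import numpy as np

def obs_likelihood(y, wall_prob, orientation):
    """P(Y_t | X_t) = ∏_{i=1}^4 P(Y_t^i | X_t).
    y: 4-bit observation (1 = wall seen) in the robot's local frame.
    wall_prob: learned P(wall) on each world-frame side (N, E, S, W).
    orientation: robot heading (0=N, 1=E, 2=S, 3=W), used to rotate
    the world frame into the robot's local frame."""
    local = np.roll(wall_prob, -orientation)            # world -> robot frame
    return float(np.prod(np.where(y == 1, local, 1.0 - local)))

# Cell with walls to the N and S; robot faces north and sees exactly that.
lik = obs_likelihood(np.array([1, 0, 1, 0]),
                     np.array([0.9, 0.1, 0.9, 0.1]), 0)
```

Because each side contributes an independent factor, a single noisy side only scales the likelihood rather than zeroing it out, which is what makes the naive-Bayes factorization robust here.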
24
Example
25
Inference Online filtering: –Input to the controller: MLE of the abstract and concrete states Offline smoothing: –Exact inference: O(D K^{1.5D} T) D: # of levels; K: # of states per level; T: sequence length –1.5D: size of the largest clique in the DBN = the state nodes at t-1 plus half of the state nodes at t –Approximation (belief propagation): O(D K T)
26
Learning Maximum likelihood parameter estimation using EM In the E step, compute expected counts (of transitions, exits, and entries) from the smoothed posteriors In the M step, normalize each matrix of expected counts to obtain the updated parameters
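The M step amounts to row-normalizing matrices of expected counts; a generic sketch, with made-up E-step counts standing in for the smoothed-posterior accumulations:

```python
import numpy as np

def m_step(expected_counts):
    """Normalize each row of a matrix of expected transition counts
    (accumulated during the E step from smoothed posteriors) to get
    the updated transition probabilities."""
    counts = np.asarray(expected_counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

# Illustrative expected counts for a 2-state transition matrix.
T_new = m_step([[8.0, 2.0],
                [1.0, 3.0]])
```

The same normalization is applied separately to each parameter block (abstract transitions, concrete transitions, exits, entries), which is why the hierarchical factorization reduces the data needed per block.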
27
Learning (Cont.) The same update re-estimates the concrete horizontal transition matrix, the exit probabilities, and the vertical transition vector by normalizing their expected counts
28
Estimating Observation Model Map local observations into world-centered coordinates, then estimate the probability of observing y when facing north
29
Hierarchical Models Localize Better [Figure: localization before training, comparing the factored DBN H-POMDP, the H-POMDP, and the flat state-based POMDP]
30
Conclusions Represent H-POMDPs with DBNs –Learn large models with less data Difference from SLAM: –SLAM is harder to generalize
31
Complexity of Inference [Figure: number of states in the state-based H-POMDP vs. the factored DBN H-POMDP]