Slide 2: Classical Situation
[Figure: grid world with a "heaven" goal cell and a "hell" penalty cell]
World: deterministic. State: observable.
Slide 3: MDP-Style Planning
[Figure: grid world with "heaven" and "hell" cells]
World: stochastic. State: observable.
The plan is a policy (also called a universal plan or navigation function). [Koditschek 87, Barto et al. 89]
Slide 4: Stochastic, Partially Observable
[Figure: grid world with a sign; it is unknown which of the two goal cells is heaven and which is hell]
[Sondik 72] [Littman/Cassandra/Kaelbling 97]
Slide 5: Stochastic, Partially Observable
[Figure: two possible worlds; the sign indicates which side is heaven and which is hell, with the two assignments swapped between the worlds]
Slide 6: Stochastic, Partially Observable
[Figure: from the start location, each of the two heaven/hell assignments is equally likely (50%); the sign reveals which assignment holds]
Slide 7: Robot Planning Frameworks

                          | Classical AI/robot planning
State/actions             | discrete & continuous
State                     | observable
Environment               | deterministic
Plans                     | sequences of actions
Completeness              | yes
Optimality                | rarely
State space size          | huge, often continuous, 6 dimensions
Computational complexity  | varies
Slide 8: MDP-Style Planning
[Figure: grid world with "heaven" and "hell" cells]
World: stochastic. State: observable.
The plan is a policy (also called a universal plan or navigation function). [Koditschek 87, Barto et al. 89]
Slide 9: Markov Decision Process (discrete)
[Figure: five-state MDP with states s1..s5, transition probabilities (0.7, 0.3, 0.9, 0.1, ...) on the edges, and rewards r = 10, r = 1, and r = 0 attached to the states]
[Bellman 57] [Howard 60] [Sutton/Barto 98]
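For reference, a discrete MDP of the kind sketched in the figure is usually written as a tuple; this is the standard formulation rather than something spelled out on the slide:

```latex
% A discrete MDP (standard formulation, not given explicitly on the slide):
% states, actions, transition model, state reward, and discount factor.
\[
  \mathcal{M} = \langle S, A, T, r, \gamma \rangle, \qquad
  T(s' \mid s, a) = p(s_{t+1} = s' \mid s_t = s,\, a_t = a), \qquad
  r : S \to \mathbb{R}, \quad \gamma \in [0, 1).
\]
```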
Slide 10: Value Iteration
- Value function of a policy
- Bellman equation for the optimal value function
- Value iteration: recursively estimating the value function
- Greedy policy
[Bellman 57] [Howard 60] [Sutton/Barto 98]
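The equations behind these four bullet points are not reproduced in the transcript; the following are the standard discounted-MDP forms they normally take (assumed from the cited sources, not copied from the slide):

```latex
% Standard value-iteration equations for a discounted MDP (assumed forms):
% value of a policy, Bellman optimality equation, the value-iteration
% update, and the greedy policy with respect to a value function V.
\begin{align*}
  V^{\pi}(s)  &= \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t) \;\Big|\; s_0 = s,\, \pi\Big] \\
  V^{*}(s)    &= r(s) + \gamma \max_{a} \sum_{s'} T(s' \mid s, a)\, V^{*}(s') \\
  V_{k+1}(s)  &\leftarrow r(s) + \gamma \max_{a} \sum_{s'} T(s' \mid s, a)\, V_{k}(s') \\
  \pi(s)      &= \arg\max_{a} \sum_{s'} T(s' \mid s, a)\, V(s')
\end{align*}
```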
Slide 11: Value Iteration for Motion Planning
(assumes knowledge of the robot's location)
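As an illustration of how this is typically done (a minimal sketch under simple assumptions, not the implementation behind the slide), value iteration on a 2-D occupancy grid with a single goal cell and deterministic 4-connected moves looks like this; following the locally best neighbor of the resulting value function from any free cell is the navigation function:

```python
import numpy as np

def grid_value_iteration(occupancy, goal, gamma=0.98, iters=500):
    """Value iteration on a 2-D occupancy grid (True = obstacle).

    Reward 1 at the goal, 0 elsewhere; deterministic 4-connected moves.
    Greedy ascent of the returned value array leads toward the goal.
    """
    rows, cols = occupancy.shape
    value = np.zeros((rows, cols))
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    for _ in range(iters):
        new_value = np.zeros_like(value)
        for r in range(rows):
            for c in range(cols):
                if occupancy[r, c]:
                    continue                      # obstacles are never entered
                if (r, c) == goal:
                    new_value[r, c] = 1.0         # reward only at the goal
                    continue
                best = 0.0
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and not occupancy[nr, nc]:
                        best = max(best, gamma * value[nr, nc])
                new_value[r, c] = best
        value = new_value
    return value

# Tiny example: 5x5 map with one wall segment, goal in the top-right corner.
occ = np.zeros((5, 5), dtype=bool)
occ[1:4, 2] = True
V = grid_value_iteration(occ, goal=(0, 4))
print(np.round(V, 2))
```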
Slide 12: Continuous Environments
From: A. Moore & C. G. Atkeson, "The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State Spaces," Machine Learning, 1995
Slide 13: Approximate Cell Decomposition [Latombe 91]
From: A. Moore & C. G. Atkeson, "The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State Spaces," Machine Learning, 1995
Slide 14: Parti-Game [Moore 96]
From: A. Moore & C. G. Atkeson, "The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State Spaces," Machine Learning, 1995
Slide 15: Robot Planning Frameworks

                          | Classical AI/robot planning           | Value Iteration in MDPs | Parti-Game
State/actions             | discrete & continuous                 | discrete                | continuous
State                     | observable                            | observable              | observable
Environment               | deterministic                         | stochastic              | stochastic
Plans                     | sequences of actions                  | policy                  | policy
Completeness              | yes                                   | yes                     | yes
Optimality                | rarely                                | yes                     | no
State space size          | huge, often continuous, 6 dimensions  | millions                | n/a
Computational complexity  | varies                                | quadratic               | n/a
Slide 16: Stochastic, Partially Observable
[Figure: from the start location the heaven/hell assignment is unknown (50% each); the two possible worlds differ only in which side of the sign is heaven and which is hell]
Slide 17: A Quiz

sensors          | actions       | # states          | size of belief space?
perfect          | deterministic | 3                 | 3: s1, s2, s3
perfect          | stochastic    | 3                 | 3: s1, s2, s3
abstract states  | deterministic | 3                 | 2^3 - 1: s1, s2, s3, s12, s13, s23, s123
stochastic       | deterministic | 3                 | 2-dim continuous*: p(S=s1), p(S=s2)
none             | stochastic    | 3                 | 2-dim continuous*: p(S=s1), p(S=s2)
stochastic       | deterministic | 1-dim continuous  | ∞-dim continuous*
stochastic       | stochastic    | 1-dim continuous  | ∞-dim continuous*
stochastic       | stochastic    | ∞-dim continuous  | aargh!

*) countable, but for all practical purposes continuous
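A one-line justification for the "2-dim continuous" entries (added here for clarity; the argument is only implicit in the quiz): with n discrete states, a belief is a probability distribution over those states, so it lives in the (n-1)-dimensional probability simplex.

```latex
% Belief space for n discrete states: the (n-1)-dimensional probability simplex.
\[
  b \in \Delta^{\,n-1} = \Big\{ b \in \mathbb{R}^{n} \;:\; b_i \ge 0,\; \textstyle\sum_{i=1}^{n} b_i = 1 \Big\},
  \qquad
  n = 3 \;\Rightarrow\; \text{2-dim continuous: } \big(p(S{=}s_1),\, p(S{=}s_2)\big).
\]
```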
Slide 18: Introduction to POMDPs (1 of 3)
[Figure: two-state POMDP (s1, s2) with actions a and b; their payoffs (100, 80, 40, 0) are plotted as linear functions of the belief p(s1)]
[Sondik 72, Littman, Kaelbling, Cassandra '97]
Slide 19: Introduction to POMDPs (2 of 3)
[Figure: the same two-state example extended with a state transition (action c moves the state with 80%/20% probability); the belief p(s1) is mapped to the next belief p(s1'), and the value function is transformed accordingly]
[Sondik 72, Littman, Kaelbling, Cassandra '97]
Slide 20: Introduction to POMDPs (3 of 3)
[Figure: the example extended with observations A and B (received with probabilities such as 70%/30% and 50%); each observation yields a different posterior belief p(s1' | A) or p(s1' | B), and the value is averaged over the possible observations]
[Sondik 72, Littman, Kaelbling, Cassandra '97]
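The three figures build up the standard piecewise-linear view of POMDP values. In symbols, this is a generic sketch of that construction, with payoffs written as r(s, a) rather than the slide's specific numbers:

```latex
% Two-state POMDP, belief summarized by p = p(s_1).
% (1) Each action's expected reward is linear in the belief; the one-step
%     value function is the upper envelope of the two lines:
\[
  V_a(p) = p\, r(s_1, a) + (1 - p)\, r(s_2, a), \qquad
  V(p) = \max\{ V_a(p),\, V_b(p) \}.
\]
% (2) A stochastic transition (action c) maps the belief linearly:
\[
  p' = p\, T(s_1 \mid s_1, c) + (1 - p)\, T(s_1 \mid s_2, c).
\]
% (3) An observation o in {A, B} updates the belief by Bayes' rule, and the
%     value is averaged over the observations that might be received:
\[
  p'_{o} = \frac{p(o \mid s_1)\, p'}{p(o \mid s_1)\, p' + p(o \mid s_2)\,(1 - p')},
  \qquad
  \bar{V}(p) = \sum_{o \in \{A, B\}} \big[ p(o \mid s_1)\, p' + p(o \mid s_2)(1 - p') \big]\, V\!\big(p'_{o}\big).
\]
```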
Slide 21: Value Iteration in POMDPs
- Value function of a policy
- Bellman equation for the optimal value function
- Value iteration: recursively estimating the value function
- Greedy policy
Substitute the belief b for the state s.
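Substituting beliefs for states in the MDP equations above gives the belief-space Bellman equation, value-iteration update, and greedy policy; again these are the standard forms, assumed rather than copied from the slide:

```latex
% Value iteration over beliefs b instead of states s (assumed standard forms).
\begin{align*}
  V^{*}(b)   &= r(b) + \gamma \max_{a} \int p(b' \mid b, a)\, V^{*}(b')\, db' \\
  V_{k+1}(b) &\leftarrow r(b) + \gamma \max_{a} \int p(b' \mid b, a)\, V_{k}(b')\, db' \\
  \pi(b)     &= \arg\max_{a} \int p(b' \mid b, a)\, V(b')\, db'
\end{align*}
```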
Slide 22: Missing Terms: Belief Space
- Expected reward
- Next-state density: Bayes filters! (Dirac distribution)
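Written out, these two terms take the following standard forms, consistent with the slide's keywords (the equations themselves are not in the transcript):

```latex
% Expected reward under a belief b:
\[
  r(b) = \int r(s)\, b(s)\, ds.
\]
% Next-belief density, obtained by marginalizing over the observation o.
% Given a, b, and o, the next belief is fully determined by the Bayes filter,
% so p(b' | a, b, o) is a Dirac distribution at the filtered belief:
\[
  p(b' \mid a, b) = \int p(b' \mid a, b, o)\, p(o \mid a, b)\, do,
  \qquad
  p(b' \mid a, b, o) = \delta\!\big(b' - \mathrm{BayesFilter}(b, a, o)\big).
\]
```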
Slide 23: Value Iteration in Belief Space
[Figure: backup diagram; from belief state b, an action leads via state s, next state s', and reward r' to an observation o and a next belief state b'; Q(b, a) is computed from max_a Q(b', a) over the successor beliefs, yielding the value function]
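A compact way to read the diagram (a minimal sketch for a discrete POMDP, not the slide's own code): the backup of Q(b, a) enumerates observations, pushes the belief through a Bayes filter, and averages the successor value over the observation probabilities.

```python
import numpy as np

def bayes_filter(b, a, o, T, Z):
    """Belief update: predict with transition model T[a], correct with Z[a]."""
    predicted = T[a].T @ b              # p(s') = sum_s T[a][s, s'] b(s)
    unnorm = Z[a][:, o] * predicted     # weight by p(o | s')
    return unnorm / unnorm.sum()

def q_backup(b, a, R, T, Z, V, gamma=0.95):
    """One-step backup of Q(b, a) as in the diagram: expected immediate
    reward plus the discounted, observation-averaged successor value V(b')."""
    predicted = T[a].T @ b
    q = float(R[a] @ b)                               # expected immediate reward
    for o in range(Z[a].shape[1]):
        p_o = float(Z[a][:, o] @ predicted)           # p(o | b, a)
        if p_o > 0:
            b_next = bayes_filter(b, a, o, T, Z)
            q += gamma * p_o * V(b_next)              # value of next belief b'
    return q

# Tiny 2-state, 2-action, 2-observation example with a stand-in value function
# (in full value iteration, V would itself be max_a Q(b', a)).
T = [np.array([[0.8, 0.2], [0.2, 0.8]]), np.eye(2)]       # T[a][s, s']
Z = [np.array([[0.7, 0.3], [0.3, 0.7]])] * 2              # Z[a][s', o]
R = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]          # R[a][s]
V = lambda b: float(b.max())
b = np.array([0.5, 0.5])
print([q_backup(b, a, R, T, Z, V) for a in range(2)])
```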
Slide 24: Why Is This So Complex?
State-space planning (no state uncertainty) vs. belief-space planning (full state uncertainty): what lies in between?
Slide 25: Augmented MDPs [Roy et al. 98/99]
The belief is summarized by the conventional state space plus its uncertainty (entropy), giving a compact augmented state space.
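Concretely, this is a standard formulation of the idea; the slide itself only shows the two axes "conventional state space" and "uncertainty (entropy)":

```latex
% Augmented state: most likely state plus the entropy of the belief.
\[
  \bar{s}(b) = \Big( \arg\max_{s}\, b(s),\; H(b) \Big),
  \qquad
  H(b) = -\int b(s)\, \log b(s)\, ds.
\]
```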
Slide 26: Path Planning with Augmented MDPs
[Figure: maps comparing the conventional planner with the probabilistic planner; the difference between the paths is labeled "information gain"]
[Roy et al. 98/99]
Slide 27: Robot Planning Frameworks

                          | Classical AI/robot planning           | Value Iteration in MDPs | Parti-Game | POMDP                | Augmented MDP
State/actions             | discrete & continuous                 | discrete                | continuous | discrete             | discrete
State                     | observable                            | observable              | observable | partially observable | partially observable
Environment               | deterministic                         | stochastic              | stochastic | stochastic           | stochastic
Plans                     | sequences of actions                  | policy                  | policy     | policy               | policy
Completeness              | yes                                   | yes                     | yes        | yes                  | no
Optimality                | rarely                                | yes                     | no         | yes                  | no
State space size          | huge, often continuous, 6 dimensions  | millions                | n/a        | dozens               | thousands
Computational complexity  | varies                                | quadratic               | n/a        | exponential          | O(N^4)