1
Planning under Uncertainty Or why POMDPs are a roboticist’s best friend
Joelle Pineau Robotics Institute Carnegie Mellon University Dagstuhl Workshop on Plan-based Control of Robotic Agents June, 2003
2
Structured Environments
Planning under Uncertainty
3
Unstructured Environments
Planning under Uncertainty
4
Highlights from the household robot challenge
“… accomplish jobs in changing and partly unknown environments; resolve conflicts between interfering jobs …” “… switch between different representations of the world with low effort and transfer reasoning results accordingly.” “… generate a plan that has a high expected utility with respect to the robots’ goals and beliefs.” “… percepts might be incomplete, ambiguous, outdated, incorrect.” “… proper modeling and reasoning about movements.” “use crude models of the physics of the system...” Planning under Uncertainty
5
The Nursebot project in its early days
Planning under Uncertainty
6
Talk Outline Why uncertainty matters in plan-based robotics
Partially Observable Markov Decision Processes Planning in large POMDPs Experiments with POMDP-controlled robots Planning under Uncertainty
7
Types of uncertainty in robotics
1. Non-deterministic effects of actions 2. Partial and noisy sensor information 3. Inaccurate model of the environment Planning under Uncertainty
8
Non-deterministic motion commands
Estimated robot position True robot position Goal position Planning under Uncertainty
9
Noisy speech recognition
Planning under Uncertainty
10
Inaccurate model of person
Robot position True person position Planning under Uncertainty
11
Why use a POMDP? POMDPs provide a rich framework for sequential decision-making, which can model: varying rewards across actions and goals; actions with random effects; uncertainty in the state of the world. Planning under Uncertainty
12
Existing applications of POMDPs
Maintenance scheduling [Puterman, 1994] Robot navigation [Koenig & Simmons, 1995; Roy & Thrun, 1999] Helicopter control [Bagnell & Schneider, 2001; Ng et al., 2002] Dialogue modeling [Roy, Pineau & Thrun, 2000; Paek & Horvitz, 2000] Preference elicitation [Boutilier, 2002] Planning under Uncertainty
13
POMDP Model st-1 st ot-1 ot POMDP is n-tuple { S, A, Ω, T, O, R }:
S = state set A = action set Ω = observation set What goes on: st-1 st What we see: ot-1 ot at-1 at Planning under Uncertainty
14
POMDP Model st-1 st ot-1 ot POMDP is n-tuple { S, A, Ω, T, O, R }:
S = state set A = action set Ω = observation set T(s,a,s') = state-to-state transition probabilities What goes on: st-1 st What we see: ot-1 ot at-1 at Planning under Uncertainty
15
POMDP Model st-1 st ot-1 ot POMDP is n-tuple { S, A, Ω, T, O, R }:
S = state set A = action set Ω = observation set T(s,a,s') = state-to-state transition probabilities O(s,a,o) = observation generation probabilities What goes on: st-1 st What we see: ot-1 ot at-1 at Planning under Uncertainty
16
POMDP Model st-1 st rt-1 rt ot-1 ot
POMDP is n-tuple { S, A, Ω, T, O, R }: S = state set A = action set Ω = observation set T(s,a,s') = state-to-state transition probabilities O(s,a,o) = observation generation probabilities R(s,a) = reward function What goes on: st-1 st rt-1 rt What we see: ot-1 ot at-1 at Planning under Uncertainty
17
POMDP Model st-1 st rt-1 rt ot-1 ot bt-1 bt
POMDP is n-tuple { S, A, Ω, T, O, R }: S = state set A = action set Ω = observation set T(s,a,s') = state-to-state transition probabilities O(s,a,o) = observation generation probabilities R(s,a) = reward function What goes on: st-1 st rt-1 rt What we see: ot-1 ot at-1 at What we infer: bt-1 bt Planning under Uncertainty
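To make the "what we infer" row concrete, here is a minimal sketch of the standard Bayes-filter belief update for a generic POMDP. It is not the talk's code: the array layout (T[s,a,s'], O[s',a,o]) and the toy numbers are assumptions of this write-up.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter: b'(s') is proportional to O(s',a,o) * sum_s T(s,a,s') * b(s)."""
    predicted = T[:, a, :].T @ b            # sum_s T(s,a,s') b(s)
    unnormalized = O[:, a, o] * predicted   # weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Toy 2-state, 1-action, 2-observation model (illustrative numbers only).
T = np.zeros((2, 1, 2)); T[:, 0, :] = [[0.9, 0.1], [0.2, 0.8]]
O = np.zeros((2, 1, 2)); O[:, 0, :] = [[0.8, 0.2], [0.3, 0.7]]
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=1, T=T, O=O))   # updated belief after seeing o=1
```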
18
Plausible robot beliefs
Belief over robot location P(s) state Planning under Uncertainty
19
Plausible robot beliefs
Belief over robot location P(s) state Belief over person location Planning under Uncertainty
20
Understanding the belief state
A belief is a probability distribution over states Where Dim(B) = |S|-1 E.g. Let S={s1, s2} 1 P(s1) Planning under Uncertainty
21
Understanding the belief state
A belief is a probability distribution over states Where Dim(B) = |S|-1 E.g. Let S={s1, s2, s3} 1 P(s1) P(s2) 1 Planning under Uncertainty
22
Understanding the belief state
A belief is a probability distribution over states Where Dim(B) = |S|-1 E.g. Let S={s1, s2, s3 , s4} 1 P(s3) P(s1) P(s2) 1 Planning under Uncertainty
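In symbols (a standard statement of what the slide shows; the slide's own formula did not survive the transcript):

$$ B \;=\; \Big\{\, b \in \mathbb{R}^{|S|} \;:\; b(s) \ge 0 \ \ \forall s,\ \ \textstyle\sum_{s \in S} b(s) = 1 \,\Big\}, \qquad \dim(B) = |S| - 1. $$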
23
The first curse of POMDP planning
The curse of dimensionality: dimension of planning problem = # of states Planning under Uncertainty
24
POMDP planning Goal: Pick actions such that we maximize the expected sum of rewards: To select appropriate actions, learn an action-selection policy: To learn the policy, first learn the value function: Planning under Uncertainty
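The three expressions referred to above are lost in the transcript; in standard POMDP notation (the discount factor γ is an assumption of this write-up, not shown on the slide) they read:

$$ \max_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\Big], \qquad \pi : B \to A, \qquad V^{\pi}(b) \;=\; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\Big|\, b_0 = b,\ \pi\Big]. $$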
25
POMDP value function Represent V as the upper surface of a set of hyper-planes. V is piecewise-linear convex. Backup operator T: V → TV. V(b) b P(s1) 2-states Planning under Uncertainty
26
POMDP value function Represent V as the upper surface of a set of hyper-planes. V is piecewise-linear convex. Backup operator T: V → TV. V(b) V(b) b P(s2) P(s1) b P(s1) 2-states 3-states Planning under Uncertainty
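A minimal sketch of the hyper-plane (alpha-vector) representation described above: V is stored as a set of vectors, and V(b) is the upper surface over them. NumPy usage and the example vectors are assumptions of this write-up, not the authors' implementation.

```python
import numpy as np

def value(b, alpha_vectors):
    """V(b) = max over hyper-planes of <alpha, b> (piecewise-linear convex)."""
    return max(float(np.dot(alpha, b)) for alpha in alpha_vectors)

# Two hyper-planes over a 2-state belief simplex (illustrative numbers only).
alphas = [np.array([1.0, 0.0]), np.array([0.2, 0.9])]
for p in (0.0, 0.5, 1.0):
    b = np.array([p, 1.0 - p])
    print(f"P(s1)={p:.1f}  V(b)={value(b, alphas):.2f}")
```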
27
Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 Iteration # hyper-planes V0(b) P(s1) b Planning under Uncertainty
28
Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 Iteration # hyper-planes V1(b) P(s1) b Planning under Uncertainty
29
Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 Iteration # hyper-planes 2 27 V2(b) P(s1) b Planning under Uncertainty
30
Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 Iteration # hyper-planes 2 27 V2(b) P(s1) b Planning under Uncertainty
31
Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 Iteration # hyper-planes 2 27 … 14,348,907 V2(b) P(s1) b Planning under Uncertainty
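For reference, this blow-up follows the standard count for exact backups without pruning (a textbook statement, not taken verbatim from the slide), which is presumably where the 14,348,907 figure comes from:

$$ |\Gamma_{n+1}| \;=\; |A|\,|\Gamma_n|^{|\Omega|} \;\;\Rightarrow\;\; 1 \to 3 \to 27 \to 2{,}187 \to 14{,}348{,}907 \quad (|A|=3,\ |\Omega|=2). $$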
32
Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 Many hyper-planes can be pruned away Iteration / # hyper-planes: 0: 1, 1: 3, 2: 5, 3: 9, 4: 7 V2(b) P(s1) b Planning under Uncertainty
33
Is pruning sufficient?
Larger problem [Hauskrecht, 2000]: |S|=20, |A|=6, |Ω|=8 Iteration / # hyper-planes: 0: 1, 1: 5, 2: 213, 3: ??? Not for this problem! Planning under Uncertainty
34
Certainly not for this problem!
10^4 (navigation) × 10^3 (dialogue) states 1000+ observations 100+ actions Planning under Uncertainty
35
The second curse of POMDP planning
The curse of dimensionality: the dimension of each hyper-plane = # of states The curse of history: the number of hyper-planes grows exponentially with the planning horizon Planning under Uncertainty
36
The second curse of POMDP planning
The curse of dimensionality: the dimension of each hyper-plane = # of states The curse of history: the number of hyper-planes grows exponentially with the planning horizon Complexity of POMDP value iteration (dimensionality × history): What is wrong with this??? Planning under Uncertainty
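The complexity expression itself did not survive the transcript; the cost of one exact value-iteration step is usually written as roughly

$$ \underbrace{|S|^{2}}_{\text{dimensionality}} \cdot\, |A| \cdot \underbrace{|\Gamma_n|^{|\Omega|}}_{\text{history}} , $$

so the two curses compound: each hyper-plane has one entry per state, and the number of hyper-planes grows exponentially with the planning horizon.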
37
Exact planning assumes all beliefs are equally likely
Planning under Uncertainty
38
Possible approximation approaches
1. Ignore the belief [Littman et al., 1995] 2. Discretize the belief [Lovejoy, 1991; Brafman, 1997; Hauskrecht, 2000; Zhou & Hansen, 2001] 3. Compress the belief [Poupart & Boutilier, 2002; Roy & Gordon, 2002] 4. Plan for trajectories [Baxter & Bartlett, 2000; Ng & Jordan, 2002] Planning under Uncertainty
39
A new algorithm: Point-based value iteration
Main idea: Select a small set of belief points V(b) P(s1) b1 b0 b2 Planning under Uncertainty
40
A new algorithm: Point-based value iteration
Main idea: Select a small set of belief points Plan for those belief points only V(b) P(s1) b1 b0 b2 Planning under Uncertainty
41
A new algorithm: Point-based value iteration
Main idea: Select a small set of belief points Focus on reachable beliefs Plan for those belief points only V(b) P(s1) b1 b0 b2 a,o a,o Planning under Uncertainty
42
A new algorithm: Point-based value iteration
Main idea: Select a small set of belief points Focus on reachable beliefs Plan for those belief points only Learn value and its gradient V(b) P(s1) b1 b0 b2 a,o a,o Planning under Uncertainty
43
Selecting good belief points
1. Leverage insight from policy search methods: Focus on reachable beliefs. |A||Ω||B| new points a1,o1 a2,o1 P(s1) ba1,o1 ba2,o1 ba2,o2 b ba1,o2 a2,o2 a1,o2 Planning under Uncertainty
44
Selecting good belief points
1. Leverage insight from policy search methods: Focus on reachable beliefs. 2. Avoid including all reachable beliefs: Consider all actions, but stochastic observation choice. |A||Ω||B| new points → |A||B| new points a2,o1 P(s1) ba1,o1 ba2,o1 ba2,o2 b ba1,o2 a1,o2 Planning under Uncertainty
45
Selecting good belief points
1. Leverage insight from policy search methods: Focus on reachable beliefs. 2. Avoid including all reachable beliefs: Consider all actions, but stochastic observation choice. 3. Use the error bound on point-based value updates: Select widely-spaced beliefs, rather than near-by beliefs. |A||Ω||B| new points → |A||B| new points → |B| new points a2,o1 P(s1) ba2,o1 b ba1,o2 a1,o2 Planning under Uncertainty
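A sketch of this expansion heuristic under assumed interfaces (B is a list of NumPy belief vectors; belief_update and obs_prob are placeholders for the model's Bayes filter and P(o|b,a); this is not the original implementation): from each b in B, try every action with a sampled observation, and keep the candidate farthest (in L1 distance) from the beliefs already collected.

```python
import numpy as np

def expand_beliefs(B, actions, observations, belief_update, obs_prob, rng):
    new_points = []
    for b in B:
        candidates = []
        for a in actions:
            probs = np.array([obs_prob(b, a, o) for o in observations])
            o = rng.choice(observations, p=probs / probs.sum())  # stochastic observation choice
            candidates.append(belief_update(b, a, o))
        # keep the candidate farthest from every belief kept so far
        dist = lambda c: min(np.abs(c - other).sum() for other in B + new_points)
        new_points.append(max(candidates, key=dist))
    return B + new_points   # the belief set roughly doubles each expansion phase
```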
46
The anytime PBVI algorithm
Alternate between: 1. Growing the set of belief points 2. Planning for those belief points Terminate when you run out of time or have a good policy. Planning under Uncertainty
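In pseudocode form (a sketch only; plan_for_points and expand_beliefs stand for the two steps named above and are not defined here):

```python
def anytime_pbvi(b0, out_of_time, good_enough, plan_for_points, expand_beliefs):
    B, Gamma = [b0], []                       # start from the initial belief
    while not out_of_time() and not good_enough(Gamma):
        Gamma = plan_for_points(B, Gamma)     # point-based value updates over B
        B = expand_beliefs(B)                 # grow the belief set
    return Gamma                              # current set of alpha-vectors
```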
47
Experimental results: Lasertag domain
State space = RobotPosition × OpponentPosition Observable: RobotPosition - always OpponentPosition - only if same as Robot Action space = {North, South, East, West, Tag} Opponent strategy: Move away from robot w/ Pr=0.8 |S|=870, |A|=5, |Ω|=30 Planning under Uncertainty
48
Performance of PBVI on Lasertag domain
Opponent tagged 70% of trials Opponent tagged 17% of trials Planning under Uncertainty
49
Performance on well-known POMDPs
Maze33 (|S|=36, |A|=5, |Ω|=17), Hallway (|S|=60, |A|=5, |Ω|=20), Hallway2 (|S|=92, |A|=5, |Ω|=17)
Method              QMDP    Grid    PBUA    PBVI
Maze33    Reward    0.198   0.94    2.30    2.25
          Time(s)   0.19    n.v.    12166   3448
          |B|       -       174     660     470
Hallway   %Goal     47      n.v.    100     95
          Reward    0.261   n.v.    0.53    …
          Time(s)   0.51    n.v.    450     288
          |B|       -       n.v.    300     86
Hallway2  %Goal     22      98      100     …
          Reward    0.109   n.v.    0.35    0.34
          Time(s)   1.44    n.v.    27898   360
          |B|       -       337     1840    95
Planning under Uncertainty
50
Validation of the belief expansion heuristic
Hallway domain: |S|=60, |A|=5, |Ω|=20 Planning under Uncertainty
51
Validation of the belief expansion heuristic
Tag domain: |S|=870, |A|=5, |Ω|=30 Planning under Uncertainty
52
Back to the grand challenge
How can we go from 870 states to real-world problems? Planning under Uncertainty
53
High-level controller
Structured POMDPs Many real-world decision-making problems exhibit structure inherent to the problem domain. High-level controller Navigation Cognitive support Social interaction Planning under Uncertainty
54
High-level controller
Structured POMDPs Many real-world decision-making problems exhibit structure inherent to the problem domain. High-level controller Navigation Cognitive support Social interaction Move AskWhere Left Right Forward Backward Planning under Uncertainty
55
Structured POMDP approaches
Factored models [Boutilier & Poole, 1996; Hansen & Feng, 2000; Guestrin et al., 2001] Idea: Represent state space with multi-valued state features. Insight: Independencies between state features can be leveraged to overcome the curse of dimensionality. Hierarchical POMDPs [Wiering & Schmidhuber, 1997; Theocharous et al., 2000; Hernandez-Gardiol & Mahadevan, 2000; Pineau & Thrun, 2000] Idea: Exploit domain knowledge to divide one POMDP into many smaller ones. Insight: Smaller action sets further help overcome the curse of history. Planning under Uncertainty
56
A hierarchy of POMDPs subtask abstract action primitive action Act
ExamineHealth Navigate Move ClarifyGoal VerifyFluids VerifyMeds primitive action North South East West Planning under Uncertainty
57
PolCA+: Planning with a hierarchy of POMDPs
Step 1: Select the action set Navigate AMove = {N,S,E,W} Move ClarifyGoal North South East West ACTIONS North South East West ClarifyGoal VerifyPulse VerifyMeds Planning under Uncertainty
58
PolCA+: Planning with a hierarchy of POMDPs
Step 1: Select the action set Step 2: Minimize the state set Navigate AMove = {N,S,E,W} SMove = {X,Y} Move ClarifyGoal North South East West ACTIONS North South East West ClarifyGoal VerifyPulse VerifyMeds STATE FEATURES X-position Y-position X-goal Y-goal HealthStatus Planning under Uncertainty
59
PolCA+: Planning with a hierarchy of POMDPs
Step 1: Select the action set Step 2: Minimize the state set Step 3: Choose parameters Navigate AMove = {N,S,E,W} SMove = {X,Y} Move ClarifyGoal North South East West ACTIONS North South East West ClarifyGoal VerifyPulse VerifyMeds STATE FEATURES X-position Y-position X-goal Y-goal HealthStatus PARAMETERS {bh,Th,Oh,Rh} Planning under Uncertainty
60
PolCA+: Planning with a hierarchy of POMDPs
Step 1: Select the action set Step 2: Minimize the state set Step 3: Choose parameters Step 4: Plan task h Navigate AMove = {N,S,E,W} SMove = {X,Y} Move ClarifyGoal North South East West ACTIONS North South East West ClarifyGoal VerifyPulse VerifyMeds STATE FEATURES X-position Y-position X-goal Y-goal HealthStatus PARAMETERS {bh,Th,Oh,Rh} PLAN h Planning under Uncertainty
61
PolCA+ in the Nursebot domain
Goal: A robot is deployed in a nursing home, where it provides reminders to elderly users and accompanies them to appointments. Planning under Uncertainty
62
The effects of confirmation actions
Planning under Uncertainty
63
Comparing user performance
(figure: user performance comparison; values 0.18, 0.1, 0.1) Planning under Uncertainty
64
The hierarchical POMDP planner in action
Planning under Uncertainty
65
Summary POMDPs offer a rich framework for large-scale real-world robot planning problems. POMDP planning is made difficult by: the curse of dimensionality the curse of history A combination of abstraction and structure can lead to tractable POMDP plans for large domains. Planning under Uncertainty
66
Project information: www.cs.cmu.edu/~nursebot Navigation software:
Papers and more: Joint work with: Geoff Gordon, Michael Montemerlo, Judy Matthews, Martha Pollack, Nicholas Roy, Sebastian Thrun
67
Point-based value update
V(b) P(s1) b1 b0 b2 Planning under Uncertainty
68
Point-based value update
Initialize the value function (…and skip ahead a few iterations) Vn(b) P(s1) b1 b0 b2 Planning under Uncertainty
69
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: Vn(b) P(s1) b Planning under Uncertainty
70
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Vn(b) P(s1) ba1,o1 ba2,o1 ba2,o2 b ba1,o2 Planning under Uncertainty
71
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Vn(b) ba1,o1, ba2,o1 ba1,o2 ba2,o2 P(s1) ba1,o1 ba2,o1 ba2,o2 b ba1,o2 Planning under Uncertainty
72
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Sum over observations: Vn(b) ba1,o1, ba2,o1 ba1,o2 ba2,o2 P(s1) ba1,o1 ba2,o1 ba2,o2 b ba1,o2 Planning under Uncertainty
73
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Sum over observations: Vn(b) ba1,o1, ba2,o1 ba1,o2 ba2,o2 P(s1) b Planning under Uncertainty
74
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Sum over observations: Vn+1(b) ba2 ba1 P(s1) b Planning under Uncertainty
75
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Sum over observations: Max over actions: Vn+1(b) ba2 ba1 P(s1) b Planning under Uncertainty
76
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Sum over observations: Max over actions: Vn+1(b) P(s1) b1 b0 b2 Planning under Uncertainty
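Putting the steps together, a minimal sketch of one point-based backup for a single belief point b, where Gamma is the current (non-empty) set of alpha-vectors. The array layout (T[s,a,s'], O[s',a,o], R[s,a]) and the discount factor gamma are assumptions of this write-up, not the original code.

```python
import numpy as np

def point_based_backup(b, Gamma, T, O, R, gamma):
    """Return the best alpha-vector at belief b for the next iteration."""
    nA, nO = R.shape[1], O.shape[2]
    best_alpha, best_val = None, -np.inf
    for a in range(nA):                       # max over actions
        alpha_a = R[:, a].astype(float)
        for o in range(nO):                   # sum over observations
            # project every alpha through (a,o): g(s) = sum_s' T(s,a,s') O(s',a,o) alpha(s')
            projections = [T[:, a, :] @ (O[:, a, o] * alpha) for alpha in Gamma]
            alpha_a = alpha_a + gamma * max(projections, key=lambda g: float(g @ b))
        if float(alpha_a @ b) > best_val:
            best_alpha, best_val = alpha_a, float(alpha_a @ b)
    return best_alpha

# One sweep over the belief set:
# Gamma_next = [point_based_backup(b, Gamma, T, O, R, 0.95) for b in B]
```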
77
Complexity of value update
Exact Update vs. Point-based Update — I - Projection: S²·A·Ω·Γn vs. S²·A·Ω·B; II - Sum: S·A·Γn^Ω vs. S·A·Ω·B²; III - Max: S·A·Γn^Ω vs. S·A·B. where: S = # states, Γn = # solution vectors at iteration n, A = # actions, B = # belief points, Ω = # observations Planning under Uncertainty
78
A bound on the approximation error
Bound the error of the point-based value update. The bound depends on how densely we sample belief points. Let Δ̄ be the set of reachable beliefs. Let B be the set of belief points. Theorem: For any belief set B and any horizon n, the error of the PBVI algorithm, εn = ||VnB − Vn*||∞, is bounded by: Planning under Uncertainty
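The bound itself is missing from the transcript; as usually stated for PBVI (a reconstruction, with γ the discount factor and Rmax, Rmin the extreme rewards):

$$ \varepsilon_n \;=\; \|V_n^{B} - V_n^{*}\|_{\infty} \;\le\; \frac{(R_{\max} - R_{\min})\,\delta_B}{(1-\gamma)^{2}}, \qquad \delta_B \;=\; \max_{b' \in \bar{\Delta}} \ \min_{b \in B} \ \|b - b'\|_{1}. $$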
79
A closer look at the error bound
V(b) Err Err P(s1) b1 b0 b2 a,o a,o Planning under Uncertainty
80
The anytime PBVI algorithm
Alternate between: Growing the set of belief points (e.g. B doubles in size every time) Planning for those belief points Terminate when you run out of time or have a good policy. Lasertag results: 13 phases: |B|=1334 ran out of time! Planning under Uncertainty
81
The anytime PBVI algorithm
Alternate between: Growing the set of belief points (e.g. B doubles in size every time) Planning for those belief points Terminate when you run out of time or have a good policy. Lasertag results: 13 phases: |B|=1334 ran out of time! Hallway2 results: 8 phases: |B|=95 found good policy. Planning under Uncertainty
82
Results on small dialogue domain
|S|=12, |A|=20, |O|=3 Planning under Uncertainty
83
Achieving a flexible trade-off
POMDP PolCA+ D1 PolCA+ D2 Reward FIB QMDP Planning time Planning under Uncertainty
84
Addressing the curse of dimensionality
Planning under Uncertainty