1
Planning under Uncertainty Or why POMDPs are a roboticist’s best friend
Joelle Pineau Robotics Institute Carnegie Mellon University Dagstuhl Workshop on Plan-based Control of Robotic Agents June, 2003
2
Structured Environments
Planning under Uncertainty
3
Unstructured Environments
Planning under Uncertainty
4
Highlights from the household robot challenge
“… accomplish jobs in changing and partly unknown environments; resolve conflicts between interfering jobs …” “… switch between different representations of the world with low effort and transfer reasoning results accordingly.” “… generate a plan that has a high expected utility with respect to the robots’ goals and beliefs.” “… percepts might be incomplete, ambiguous, outdated, incorrect.” “… proper modeling and reasoning about movements.” “use crude models of the physics of the system...” Planning under Uncertainty
5
The Nursebot project in its early days
Planning under Uncertainty
6
Talk Outline Why uncertainty matters in plan-based robotics
Partially Observable Markov Decision Processes Planning in large POMDPs Experiments with POMDP-controlled robots Planning under Uncertainty
7
Types of uncertainty in robotics
1. Non-deterministic effects of actions 2. Partial and noisy sensor information 3. Inaccurate model of the environment Planning under Uncertainty
8
Non-deterministic motion commands
Estimated robot position True robot position Goal position Planning under Uncertainty
9
Noisy speech recognition
Planning under Uncertainty
10
Inaccurate model of person
Robot position True person position Planning under Uncertainty
11
Why use a POMDP? POMDPs provide a rich framework for sequential decision-making, which can model: varying rewards across actions and goals; actions with random effects; uncertainty in the state of the world. Planning under Uncertainty
12
Existing applications of POMDPs
Maintenance scheduling [Puterman, 1994] Robot navigation [Koenig & Simmons, 1995; Roy & Thrun, 1999] Helicopter control [Bagnell & Schneider, 2001; Ng et al., 2002] Dialogue modeling [Roy, Pineau & Thrun, 2000; Paek & Horvitz, 2000] Preference elicitation [Boutilier, 2002] Planning under Uncertainty
13
POMDP Model st-1 st ot-1 ot POMDP is n-tuple { S, A, Ω, T, O, R }:
S = state set A = action set Ω = observation set What goes on: st-1 st What we see: ot-1 ot at-1 at Planning under Uncertainty
14
POMDP Model st-1 st ot-1 ot POMDP is n-tuple { S, A, Ω, T, O, R }:
S = state set A = action set Ω = observation set T(s,a,s') = state-to-state transition probabilities What goes on: st-1 st What we see: ot-1 ot at-1 at Planning under Uncertainty
15
POMDP Model st-1 st ot-1 ot POMDP is n-tuple { S, A, Ω, T, O, R }:
S = state set A = action set Ω = observation set T(s,a,s') = state-to-state transition probabilities O(s,a,o) = observation generation probabilities What goes on: st-1 st What we see: ot-1 ot at-1 at Planning under Uncertainty
16
POMDP Model st-1 st rt-1 rt ot-1 ot
POMDP is n-tuple { S, A, Ω, T, O, R }: S = state set A = action set Ω = observation set T(s,a,s') = state-to-state transition probabilities O(s,a,o) = observation generation probabilities R(s,a) = reward function What goes on: st-1 st rt-1 rt What we see: ot-1 ot at-1 at Planning under Uncertainty
17
POMDP Model st-1 st rt-1 rt ot-1 ot bt-1 bt
POMDP is n-tuple { S, A, Ω, T, O, R }: S = state set A = action set Ω = observation set T(s,a,s') = state-to-state transition probabilities O(s,a,o) = observation generation probabilities R(s,a) = reward function What goes on: st-1 st rt-1 rt What we see: ot-1 ot at-1 at What we infer: bt-1 bt Planning under Uncertainty
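To make the "what we infer" row concrete, here is a minimal sketch of the standard Bayes-filter belief update for a generic POMDP. It is not the talk's code: the array layout (T[s,a,s'], O[s',a,o]) and the toy numbers are assumptions of this write-up.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter: b'(s') is proportional to O(s',a,o) * sum_s T(s,a,s') * b(s)."""
    predicted = T[:, a, :].T @ b            # sum_s T(s,a,s') b(s)
    unnormalized = O[:, a, o] * predicted   # weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Toy 2-state, 1-action, 2-observation model (illustrative numbers only).
T = np.zeros((2, 1, 2)); T[:, 0, :] = [[0.9, 0.1], [0.2, 0.8]]
O = np.zeros((2, 1, 2)); O[:, 0, :] = [[0.8, 0.2], [0.3, 0.7]]
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=1, T=T, O=O))   # updated belief after seeing o=1
```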
18
Plausible robot beliefs
Belief over robot location P(s) state Planning under Uncertainty
19
Plausible robot beliefs
Belief over robot location P(s) state Belief over person location Planning under Uncertainty
20
Understanding the belief state
A belief is a probability distribution over states Where Dim(B) = |S|-1 E.g. Let S={s1, s2} 1 P(s1) Planning under Uncertainty
21
Understanding the belief state
A belief is a probability distribution over states Where Dim(B) = |S|-1 E.g. Let S={s1, s2, s3} 1 P(s1) P(s2) 1 Planning under Uncertainty
22
Understanding the belief state
A belief is a probability distribution over states Where Dim(B) = |S|-1 E.g. Let S={s1, s2, s3 , s4} 1 P(s3) P(s1) P(s2) 1 Planning under Uncertainty
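In symbols (a standard statement of what the slide shows; the slide's own formula did not survive the transcript):

$$ B \;=\; \Big\{\, b \in \mathbb{R}^{|S|} \;:\; b(s) \ge 0 \ \ \forall s,\ \ \textstyle\sum_{s \in S} b(s) = 1 \,\Big\}, \qquad \dim(B) = |S| - 1. $$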
23
The first curse of POMDP planning
The curse of dimensionality: dimension of planning problem = # of states Planning under Uncertainty
24
POMDP planning Goal: Pick actions such that we maximize the expected sum of rewards: To select appropriate actions, learn an action-selection policy: To learn the policy, first learn the value function: Planning under Uncertainty
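The three expressions referred to above are lost in the transcript; in standard POMDP notation (the discount factor γ is an assumption of this write-up, not shown on the slide) they read:

$$ \max_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\Big], \qquad \pi : B \to A, \qquad V^{\pi}(b) \;=\; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\Big|\, b_0 = b,\ \pi\Big]. $$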
25
POMDP value function Represent V as the upper surface of a set of hyper-planes. V is piecewise-linear convex. Backup operator T: V → TV. V(b) b P(s1) 2-states Planning under Uncertainty
26
POMDP value function Represent V as the upper surface of a set of hyper-planes. V is piecewise-linear convex. Backup operator T: V → TV. V(b) V(b) b P(s2) P(s1) b P(s1) 2-states 3-states Planning under Uncertainty
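A minimal sketch of the hyper-plane (alpha-vector) representation described above: V is stored as a set of vectors, and V(b) is the upper surface over them. NumPy usage and the example vectors are assumptions of this write-up, not the authors' implementation.

```python
import numpy as np

def value(b, alpha_vectors):
    """V(b) = max over hyper-planes of <alpha, b> (piecewise-linear convex)."""
    return max(float(np.dot(alpha, b)) for alpha in alpha_vectors)

# Two hyper-planes over a 2-state belief simplex (illustrative numbers only).
alphas = [np.array([1.0, 0.0]), np.array([0.2, 0.9])]
for p in (0.0, 0.5, 1.0):
    b = np.array([p, 1.0 - p])
    print(f"P(s1)={p:.1f}  V(b)={value(b, alphas):.2f}")
```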
27
Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 Iteration # hyper-planes V0(b) P(s1) b Planning under Uncertainty
28
Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 Iteration # hyper-planes V1(b) P(s1) b Planning under Uncertainty
29
Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 Iteration # hyper-planes 2 27 V2(b) P(s1) b Planning under Uncertainty
30
Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 Iteration # hyper-planes 2 27 V2(b) P(s1) b Planning under Uncertainty
31
Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 Iteration # hyper-planes 2 27 … 14,348,907 V2(b) P(s1) b Planning under Uncertainty
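For reference, this blow-up follows the standard count for exact backups without pruning (a textbook statement, not taken verbatim from the slide), which is presumably where the 14,348,907 figure comes from:

$$ |\Gamma_{n+1}| \;=\; |A|\,|\Gamma_n|^{|\Omega|} \;\;\Rightarrow\;\; 1 \to 3 \to 27 \to 2{,}187 \to 14{,}348{,}907 \quad (|A|=3,\ |\Omega|=2). $$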
32
Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 Many hyper-planes can be pruned away Iteration / # hyper-planes: 0: 1, 1: 3, 2: 5, 3: 9, 4: 7 V2(b) P(s1) b Planning under Uncertainty
33
Is pruning sufficient?
Larger problem [Hauskrecht, 2000]: |S|=20, |A|=6, |Ω|=8 Iteration / # hyper-planes: 0: 1, 1: 5, 2: 213, 3: ??? Not for this problem! Planning under Uncertainty
34
Certainly not for this problem!
10^4 (navigation) × 10^3 (dialogue) states 1000+ observations 100+ actions Planning under Uncertainty
35
The second curse of POMDP planning
The curse of dimensionality: the dimension of each hyper-plane = # of states The curse of history: the number of hyper-planes grows exponentially with the planning horizon Planning under Uncertainty
36
The second curse of POMDP planning
The curse of dimensionality: the dimension of each hyper-plane = # of states The curse of history: the number of hyper-planes grows exponentially with the planning horizon Complexity of POMDP value iteration (dimensionality × history): What is wrong with this??? Planning under Uncertainty
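The complexity expression itself did not survive the transcript; the cost of one exact value-iteration step is usually written as roughly

$$ \underbrace{|S|^{2}}_{\text{dimensionality}} \cdot\, |A| \cdot \underbrace{|\Gamma_n|^{|\Omega|}}_{\text{history}} , $$

so the two curses compound: each hyper-plane has one entry per state, and the number of hyper-planes grows exponentially with the planning horizon.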
37
Exact planning assumes all beliefs are equally likely
Planning under Uncertainty
38
Possible approximation approaches
1. Ignore the belief [Littman et al., 1995] 2. Discretize the belief [Lovejoy, 1991; Brafman, 1997; Hauskrecht, 2000; Zhou & Hansen, 2001] 3. Compress the belief [Poupart & Boutilier, 2002; Roy & Gordon, 2002] 4. Plan for trajectories [Baxter & Bartlett, 2000; Ng & Jordan, 2002] Planning under Uncertainty
39
A new algorithm: Point-based value iteration
Main idea: Select a small set of belief points V(b) P(s1) b1 b0 b2 Planning under Uncertainty
40
A new algorithm: Point-based value iteration
Main idea: Select a small set of belief points Plan for those belief points only V(b) P(s1) b1 b0 b2 Planning under Uncertainty
41
A new algorithm: Point-based value iteration
Main idea: Select a small set of belief points Focus on reachable beliefs Plan for those belief points only V(b) P(s1) b1 b0 b2 a,o a,o Planning under Uncertainty
42
A new algorithm: Point-based value iteration
Main idea: Select a small set of belief points Focus on reachable beliefs Plan for those belief points only Learn value and its gradient V(b) P(s1) b1 b0 b2 a,o a,o Planning under Uncertainty
43
Selecting good belief points
1. Leverage insight from policy search methods: Focus on reachable beliefs. |A||Ω||B| new points a1,o1 a2,o1 P(s1) ba1,o1 ba2,o1 ba2,o2 b ba1,o2 a2,o2 a1,o2 Planning under Uncertainty
44
Selecting good belief points
1. Leverage insight from policy search methods: Focus on reachable beliefs. 2. Avoid including all reachable beliefs: Consider all actions, but stochastic observation choice. |A||Ω||B| new points → |A||B| new points a2,o1 P(s1) ba1,o1 ba2,o1 ba2,o2 b ba1,o2 a1,o2 Planning under Uncertainty
45
Selecting good belief points
1. Leverage insight from policy search methods: Focus on reachable beliefs. 2. Avoid including all reachable beliefs: Consider all actions, but stochastic observation choice. 3. Use the error bound on point-based value updates: Select widely-spaced beliefs, rather than near-by beliefs. |A||Ω||B| new points → |A||B| new points → |B| new points a2,o1 P(s1) ba2,o1 b ba1,o2 a1,o2 Planning under Uncertainty
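A sketch of this expansion heuristic under assumed interfaces (B is a list of NumPy belief vectors; belief_update and obs_prob are placeholders for the model's Bayes filter and P(o|b,a); this is not the original implementation): from each b in B, try every action with a sampled observation, and keep the candidate farthest (in L1 distance) from the beliefs already collected.

```python
import numpy as np

def expand_beliefs(B, actions, observations, belief_update, obs_prob, rng):
    new_points = []
    for b in B:
        candidates = []
        for a in actions:
            probs = np.array([obs_prob(b, a, o) for o in observations])
            o = rng.choice(observations, p=probs / probs.sum())  # stochastic observation choice
            candidates.append(belief_update(b, a, o))
        # keep the candidate farthest from every belief kept so far
        dist = lambda c: min(np.abs(c - other).sum() for other in B + new_points)
        new_points.append(max(candidates, key=dist))
    return B + new_points   # the belief set roughly doubles each expansion phase
```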
46
The anytime PBVI algorithm
Alternate between: 1. Growing the set of belief points 2. Planning for those belief points Terminate when you run out of time or have a good policy. Planning under Uncertainty
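In pseudocode form (a sketch only; plan_for_points and expand_beliefs stand for the two steps named above and are not defined here):

```python
def anytime_pbvi(b0, out_of_time, good_enough, plan_for_points, expand_beliefs):
    B, Gamma = [b0], []                       # start from the initial belief
    while not out_of_time() and not good_enough(Gamma):
        Gamma = plan_for_points(B, Gamma)     # point-based value updates over B
        B = expand_beliefs(B)                 # grow the belief set
    return Gamma                              # current set of alpha-vectors
```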
47
Experimental results: Lasertag domain
State space = RobotPosition × OpponentPosition Observable: RobotPosition - always OpponentPosition - only if same as Robot Action space = {North, South, East, West, Tag} Opponent strategy: Move away from robot w/ Pr=0.8 |S|=870, |A|=5, |Ω|=30 Planning under Uncertainty
48
Performance of PBVI on Lasertag domain
Opponent tagged 70% of trials Opponent tagged 17% of trials Planning under Uncertainty
49
Performance on well-known POMDPs
Maze33 (|S|=36, |A|=5, |Ω|=17), Hallway (|S|=60, |A|=5, |Ω|=20), Hallway2 (|S|=92, |A|=5, |Ω|=17)
Method              QMDP    Grid    PBUA    PBVI
Maze33    Reward    0.198   0.94    2.30    2.25
          Time(s)   0.19    n.v.    12166   3448
          |B|       -       174     660     470
Hallway   %Goal     47      n.v.    100     95
          Reward    0.261   n.v.    0.53    …
          Time(s)   0.51    n.v.    450     288
          |B|       -       n.v.    300     86
Hallway2  %Goal     22      98      100     …
          Reward    0.109   n.v.    0.35    0.34
          Time(s)   1.44    n.v.    27898   360
          |B|       -       337     1840    95
Planning under Uncertainty
50
Validation of the belief expansion heuristic
Hallway domain: |S|=60, |A|=5, |Ω|=20 Planning under Uncertainty
51
Validation of the belief expansion heuristic
Tag domain: |S|=870, |A|=5, |Ω|=30 Planning under Uncertainty
52
Back to the grand challenge
How can we go from 870 states to real-world problems? Planning under Uncertainty
53
High-level controller
Structured POMDPs Many real-world decision-making problems exhibit structure inherent to the problem domain. High-level controller Navigation Cognitive support Social interaction Planning under Uncertainty
54
High-level controller
Structured POMDPs Many real-world decision-making problems exhibit structure inherent to the problem domain. High-level controller Navigation Cognitive support Social interaction Move AskWhere Left Right Forward Backward Planning under Uncertainty
55
Structured POMDP approaches
Factored models [Boutilier & Poole, 1996; Hansen & Feng, 2000; Guestrin et al., 2001] Idea: Represent state space with multi-valued state features. Insight: Independencies between state features can be leveraged to overcome the curse of dimensionality. Hierarchical POMDPs [Wiering & Schmidhuber, 1997; Theocharous et al., 2000; Hernandez-Gardiol & Mahadevan, 2000; Pineau & Thrun, 2000] Idea: Exploit domain knowledge to divide one POMDP into many smaller ones. Insight: Smaller action sets further help overcome the curse of history. Planning under Uncertainty
56
A hierarchy of POMDPs subtask abstract action primitive action Act
ExamineHealth Navigate Move ClarifyGoal VerifyFluids VerifyMeds primitive action North South East West Planning under Uncertainty
57
PolCA+: Planning with a hierarchy of POMDPs
Step 1: Select the action set Navigate AMove = {N,S,E,W} Move ClarifyGoal North South East West ACTIONS North South East West ClarifyGoal VerifyPulse VerifyMeds Planning under Uncertainty
58
PolCA+: Planning with a hierarchy of POMDPs
Step 1: Select the action set Step 2: Minimize the state set Navigate AMove = {N,S,E,W} SMove = {X,Y} Move ClarifyGoal North South East West ACTIONS North South East West ClarifyGoal VerifyPulse VerifyMeds STATE FEATURES X-position Y-position X-goal Y-goal HealthStatus Planning under Uncertainty
59
PolCA+: Planning with a hierarchy of POMDPs
Step 1: Select the action set Step 2: Minimize the state set Step 3: Choose parameters Navigate AMove = {N,S,E,W} SMove = {X,Y} Move ClarifyGoal North South East West ACTIONS North South East West ClarifyGoal VerifyPulse VerifyMeds STATE FEATURES X-position Y-position X-goal Y-goal HealthStatus PARAMETERS {bh,Th,Oh,Rh} Planning under Uncertainty
60
PolCA+: Planning with a hierarchy of POMDPs
Step 1: Select the action set Step 2: Minimize the state set Step 3: Choose parameters Step 4: Plan task h Navigate AMove = {N,S,E,W} SMove = {X,Y} Move ClarifyGoal North South East West ACTIONS North South East West ClarifyGoal VerifyPulse VerifyMeds STATE FEATURES X-position Y-position X-goal Y-goal HealthStatus PARAMETERS {bh,Th,Oh,Rh} PLAN h Planning under Uncertainty
61
PolCA+ in the Nursebot domain
Goal: A robot is deployed in a nursing home, where it provides reminders to elderly users and accompanies them to appointments. Planning under Uncertainty
62
The effects of confirmation actions
Planning under Uncertainty
63
Comparing user performance
(figure: user performance comparison; values 0.18, 0.1, 0.1) Planning under Uncertainty
64
The hierarchical POMDP planner in action
Planning under Uncertainty
65
Summary POMDPs offer a rich framework for large-scale real-world robot planning problems. POMDP planning is made difficult by: the curse of dimensionality the curse of history A combination of abstraction and structure can lead to tractable POMDP plans for large domains. Planning under Uncertainty
66
Project information: www.cs.cmu.edu/~nursebot Navigation software:
Papers and more: Joint work with: Geoff Gordon, Michael Montemerlo, Judy Matthews, Martha Pollack, Nicholas Roy, Sebastian Thrun
67
Point-based value update
V(b) P(s1) b1 b0 b2 Planning under Uncertainty
68
Point-based value update
Initialize the value function (…and skip ahead a few iterations) Vn(b) P(s1) b1 b0 b2 Planning under Uncertainty
69
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: Vn(b) P(s1) b Planning under Uncertainty
70
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Vn(b) P(s1) ba1,o1 ba2,o1 ba2,o2 b ba1,o2 Planning under Uncertainty
71
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Vn(b) ba1,o1, ba2,o1 ba1,o2 ba2,o2 P(s1) ba1,o1 ba2,o1 ba2,o2 b ba1,o2 Planning under Uncertainty
72
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Sum over observations: Vn(b) ba1,o1, ba2,o1 ba1,o2 ba2,o2 P(s1) ba1,o1 ba2,o1 ba2,o2 b ba1,o2 Planning under Uncertainty
73
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Sum over observations: Vn(b) ba1,o1, ba2,o1 ba1,o2 ba2,o2 P(s1) b Planning under Uncertainty
74
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Sum over observations: Vn+1(b) ba2 ba1 P(s1) b Planning under Uncertainty
75
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Sum over observations: Max over actions: Vn+1(b) ba2 ba1 P(s1) b Planning under Uncertainty
76
Point-based value update
Initialize the value function (…and skip ahead a few iterations) For each b ∈ B: For each (a,o): Project forward b → ba,o and find best value: Sum over observations: Max over actions: Vn+1(b) P(s1) b1 b0 b2 Planning under Uncertainty
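Putting the steps together, a minimal sketch of one point-based backup for a single belief point b, where Gamma is the current (non-empty) set of alpha-vectors. The array layout (T[s,a,s'], O[s',a,o], R[s,a]) and the discount factor gamma are assumptions of this write-up, not the original code.

```python
import numpy as np

def point_based_backup(b, Gamma, T, O, R, gamma):
    """Return the best alpha-vector at belief b for the next iteration."""
    nA, nO = R.shape[1], O.shape[2]
    best_alpha, best_val = None, -np.inf
    for a in range(nA):                       # max over actions
        alpha_a = R[:, a].astype(float)
        for o in range(nO):                   # sum over observations
            # project every alpha through (a,o): g(s) = sum_s' T(s,a,s') O(s',a,o) alpha(s')
            projections = [T[:, a, :] @ (O[:, a, o] * alpha) for alpha in Gamma]
            alpha_a = alpha_a + gamma * max(projections, key=lambda g: float(g @ b))
        if float(alpha_a @ b) > best_val:
            best_alpha, best_val = alpha_a, float(alpha_a @ b)
    return best_alpha

# One sweep over the belief set:
# Gamma_next = [point_based_backup(b, Gamma, T, O, R, 0.95) for b in B]
```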
77
Complexity of value update
Exact Update vs. Point-based Update — I - Projection: S²·A·Ω·Γn vs. S²·A·Ω·B; II - Sum: S·A·Γn^Ω vs. S·A·Ω·B²; III - Max: S·A·Γn^Ω vs. S·A·B. where: S = # states, Γn = # solution vectors at iteration n, A = # actions, B = # belief points, Ω = # observations Planning under Uncertainty
78
A bound on the approximation error
Bound the error of the point-based value update. The bound depends on how densely we sample belief points. Let Δ̄ be the set of reachable beliefs. Let B be the set of belief points. Theorem: For any belief set B and any horizon n, the error of the PBVI algorithm, εn = ||VnB − Vn*||∞, is bounded by: Planning under Uncertainty
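The bound itself is missing from the transcript; as usually stated for PBVI (a reconstruction, with γ the discount factor and Rmax, Rmin the extreme rewards):

$$ \varepsilon_n \;=\; \|V_n^{B} - V_n^{*}\|_{\infty} \;\le\; \frac{(R_{\max} - R_{\min})\,\delta_B}{(1-\gamma)^{2}}, \qquad \delta_B \;=\; \max_{b' \in \bar{\Delta}} \ \min_{b \in B} \ \|b - b'\|_{1}. $$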
79
A closer look at the error bound
V(b) Err Err P(s1) b1 b0 b2 a,o a,o Planning under Uncertainty
80
The anytime PBVI algorithm
Alternate between: Growing the set of belief points (e.g. B doubles in size every time) Planning for those belief points Terminate when you run out of time or have a good policy. Lasertag results: 13 phases: |B|=1334 ran out of time! Planning under Uncertainty
81
The anytime PBVI algorithm
Alternate between: Growing the set of belief points (e.g. B doubles in size every time) Planning for those belief points Terminate when you run out of time or have a good policy. Lasertag results: 13 phases: |B|=1334 ran out of time! Hallway2 results: 8 phases: |B|=95 found good policy. Planning under Uncertainty
82
Results on small dialogue domain
|S|=12, |A|=20, |O|=3 Planning under Uncertainty
83
Achieving a flexible trade-off
POMDP PolCA+ D1 PolCA+ D2 Reward FIB QMDP Planning time Planning under Uncertainty
84
Addressing the curse of dimensionality
Planning under Uncertainty