1 Planning under Uncertainty Or why POMDPs are a roboticist’s best friend
Joelle Pineau Robotics Institute Carnegie Mellon University Dagstuhl Workshop on Plan-based Control of Robotic Agents June, 2003

2 Structured Environments
Planning under Uncertainty

3 Unstructured Environments
Planning under Uncertainty

4 Highlights from the household robot challenge
“… accomplish jobs in changing and partly unknown environments; resolve conflicts between interfering jobs …” “… switch between different representations of the world with low effort and transfer reasoning results accordingly.” “… generate a plan that has a high expected utility with respect to the robots’ goals and beliefs.” “… percepts might be incomplete, ambiguous, outdated, incorrect.” “… proper modeling and reasoning about movements.” “use crude models of the physics of the system...” Planning under Uncertainty

5 The Nursebot project in its early days
Planning under Uncertainty

6 Talk Outline
Why uncertainty matters in plan-based robotics
Partially Observable Markov Decision Processes
Planning in large POMDPs
Experiments with POMDP-controlled robots
Planning under Uncertainty

7 Types of uncertainty in robotics
1. Non-deterministic effects of actions
2. Partial and noisy sensor information
3. Inaccurate model of the environment
Planning under Uncertainty

8 Non-deterministic motion commands
[Figure: estimated robot position, true robot position, and goal position.] Planning under Uncertainty

9 Noisy speech recognition
Planning under Uncertainty

10 Inaccurate model of person
[Figure: robot position and true person position.] Planning under Uncertainty

11 Why use a POMDP? POMDPs provide a rich framework for sequential decision-making, which can model:
varying rewards across actions and goals
actions with random effects
uncertainty in the state of the world
Planning under Uncertainty

12 Existing applications of POMDPs
Maintenance scheduling [Puterman, 1994]
Robot navigation [Koenig & Simmons, 1995; Roy & Thrun, 1999]
Helicopter control [Bagnell & Schneider, 2001; Ng et al., 2002]
Dialogue modeling [Roy, Pineau & Thrun, 2000; Paek & Horvitz, 2000]
Preference elicitation [Boutilier, 2002]
Planning under Uncertainty

13 POMDP Model
A POMDP is an n-tuple {S, A, Ω, T, O, R}: S = state set, A = action set, Ω = observation set. [Figure: two-slice model. What goes on: states s_{t-1}, s_t. What we see: observations o_{t-1}, o_t and actions a_{t-1}, a_t.] Planning under Uncertainty

14 POMDP Model
A POMDP is an n-tuple {S, A, Ω, T, O, R}: S = state set, A = action set, Ω = observation set, T(s,a,s') = state-to-state transition probabilities. [Figure: two-slice model as before.] Planning under Uncertainty

15 POMDP Model
A POMDP is an n-tuple {S, A, Ω, T, O, R}: S = state set, A = action set, Ω = observation set, T(s,a,s') = state-to-state transition probabilities, O(s,a,o) = observation generation probabilities. [Figure: two-slice model as before.] Planning under Uncertainty

16 POMDP Model
A POMDP is an n-tuple {S, A, Ω, T, O, R}: S = state set, A = action set, Ω = observation set, T(s,a,s') = state-to-state transition probabilities, O(s,a,o) = observation generation probabilities, R(s,a) = reward function. [Figure: two-slice model, now with rewards r_{t-1}, r_t.] Planning under Uncertainty

17 POMDP Model
A POMDP is an n-tuple {S, A, Ω, T, O, R}: S = state set, A = action set, Ω = observation set, T(s,a,s') = state-to-state transition probabilities, O(s,a,o) = observation generation probabilities, R(s,a) = reward function. [Figure: two-slice model. What goes on: states s_{t-1}, s_t and rewards r_{t-1}, r_t. What we see: observations o_{t-1}, o_t and actions a_{t-1}, a_t. What we infer: beliefs b_{t-1}, b_t.] Planning under Uncertainty
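To make the tuple concrete, here is a minimal tabular sketch in Python, including the standard Bayes-filter belief update b'(s') ∝ O(s',a,o) · Σ_s T(s,a,s') b(s) that produces the inferred beliefs b_t. The class, the array layout, and the toy numbers at the bottom are illustrative assumptions of this transcript edit, not code from the talk.

```python
import numpy as np

class POMDP:
    """Tabular POMDP {S, A, Omega, T, O, R} with a Bayes-filter belief update.

    Array layout (an assumption of this sketch):
      T[a, s, s'] = P(s' | s, a)   transition probabilities
      O[a, s', o] = P(o | s', a)   observation probabilities
      R[s, a]     = immediate reward
    """
    def __init__(self, T, O, R, gamma=0.95):
        self.T, self.O, self.R, self.gamma = T, O, R, gamma
        self.nA, self.nS, _ = T.shape
        self.nO = O.shape[-1]

    def update_belief(self, b, a, o):
        """b'(s') proportional to O(s',a,o) * sum_s T(s,a,s') b(s)."""
        b_pred = self.T[a].T @ b            # predict: sum_s T(s,a,s') b(s)
        b_new = self.O[a, :, o] * b_pred    # weight by the observation likelihood
        return b_new / b_new.sum()          # normalize (assumes P(o | b, a) > 0)

# Toy 2-state, 2-action, 2-observation model (numbers made up for illustration):
T = np.array([[[0.9, 0.1], [0.1, 0.9]]] * 2)
O = np.array([[[0.8, 0.2], [0.3, 0.7]]] * 2)
R = np.zeros((2, 2))
model = POMDP(T, O, R)
print(model.update_belief(np.array([0.5, 0.5]), a=0, o=1))   # belief shifts toward s2
```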

18 Plausible robot beliefs
[Figure: belief over robot location, P(s) vs. state.] Planning under Uncertainty

19 Plausible robot beliefs
[Figure: belief over robot location and belief over person location, P(s) vs. state.] Planning under Uncertainty

20 Understanding the belief state
A belief is a probability distribution over states, where Dim(B) = |S|-1. E.g. let S={s1, s2}: the belief space is the unit interval 0 ≤ P(s1) ≤ 1. Planning under Uncertainty

21 Understanding the belief state
A belief is a probability distribution over states, where Dim(B) = |S|-1. E.g. let S={s1, s2, s3}: the belief space is a triangle (2-simplex) with axes P(s1) and P(s2). Planning under Uncertainty

22 Understanding the belief state
A belief is a probability distribution over states, where Dim(B) = |S|-1. E.g. let S={s1, s2, s3, s4}: the belief space is a tetrahedron (3-simplex) with axes P(s1), P(s2), P(s3). Planning under Uncertainty

23 The first curse of POMDP planning
The curse of dimensionality: dimension of planning problem = # of states Planning under Uncertainty

24 POMDP planning Goal: Pick actions such that we maximize the expected sum of rewards, E[ Σt γ^t R(st, at) ]. To select appropriate actions, learn an action-selection policy π(b) → a. To learn the policy, first learn the value function Vπ(b) = E[ Σt γ^t R(bt, π(bt)) | b0 = b ]. Planning under Uncertainty

25 POMDP value function Represent V as the upper surface of a set of hyper-planes; V is piecewise-linear and convex. Backup operator T: V → TV. [Figure: V(b) over the belief interval P(s1), 2-state case.] Planning under Uncertainty

26 POMDP value function Represent V as the upper surface of a set of hyper-planes; V is piecewise-linear and convex. Backup operator T: V → TV. [Figure: V(b) over P(s1) for 2 states, and over the (P(s1), P(s2)) simplex for 3 states.] Planning under Uncertainty
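Because V is the upper surface of a finite set of hyper-planes (alpha-vectors), evaluating it and reading off its gradient at any belief is just a max over dot products. A small illustrative sketch (function and variable names are mine, not the talk's):

```python
import numpy as np

def value(belief, alpha_vectors):
    """V(b) = max over hyper-planes of (alpha . b): the upper surface."""
    return max(float(alpha @ belief) for alpha in alpha_vectors)

def supporting_vector(belief, alpha_vectors):
    """The hyper-plane that attains the max at b; it is also the gradient of V there."""
    return max(alpha_vectors, key=lambda alpha: float(alpha @ belief))

# Two hyper-planes over a 2-state belief space, evaluated at P(s1) = 0.5:
alphas = [np.array([1.0, 0.0]), np.array([0.2, 0.9])]
print(value(np.array([0.5, 0.5]), alphas))   # -> 0.55, from the second hyper-plane
```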

27 Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2. [Figure: V0(b) over P(s1).] Planning under Uncertainty

28 Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2. [Figure: V1(b) over P(s1).] Planning under Uncertainty

29 Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2. Iteration 2: 27 hyper-planes. [Figure: V2(b) over P(s1).] Planning under Uncertainty

30 Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2. Iteration 2: 27 hyper-planes. [Figure: V2(b) over P(s1).] Planning under Uncertainty

31 Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2. Iteration 2: 27 hyper-planes, growing to 14,348,907 by iteration 4 (unpruned growth: |Γn+1| = |A|·|Γn|^|Ω|). [Figure: V2(b) over P(s1).] Planning under Uncertainty

32 Exact value iteration for POMDPs
Simple problem: |S|=2, |A|=3, |Ω|=2 → many hyper-planes can be pruned away. Iteration vs. # hyper-planes after pruning: 0: 1, 1: 3, 2: 5, 3: 9, 4: 7. [Figure: V2(b) over P(s1).] Planning under Uncertainty

33 Is pruning sufficient?
Larger problem: |S|=20, |A|=6, |Ω|=8 [Hauskrecht, 2000]. Iteration vs. # hyper-planes: 0: 1, 1: 5, 2: 213, 3: ??? Not for this problem! [Figure: maze domain with goal.] Planning under Uncertainty

34 Certainly not for this problem!
10^4 (navigation) × 10^3 (dialogue) states, 1000+ observations, 100+ actions. Planning under Uncertainty

35 The second curse of POMDP planning
The curse of dimensionality: the dimension of each hyper-plane = # of states. The curse of history: the number of hyper-planes grows exponentially with the planning horizon. Planning under Uncertainty

36 The second curse of POMDP planning
The curse of dimensionality: the dimension of each hyper-plane = # of states. The curse of history: the number of hyper-planes grows exponentially with the planning horizon. Complexity of POMDP value iteration: each exact backup produces on the order of |A|·|Γn|^|Ω| new hyper-planes, each with |S| components (|S| drives the dimensionality, |Γn|^|Ω| the history). What is wrong with this??? Planning under Uncertainty
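The curse of history is easy to see numerically. Before pruning, an exact backup builds one new hyper-plane for every choice of action and of one predecessor hyper-plane per observation, so |Γn+1| = |A|·|Γn|^|Ω|. The few lines below just iterate that recurrence; they reproduce the counts quoted for the |S|=2 toy problem above (a sketch, assuming the standard unpruned count):

```python
def unpruned_hyperplane_counts(n_actions, n_observations, iterations, start=1):
    """Iterate |Gamma_{n+1}| = |A| * |Gamma_n| ** |Omega| (exact backup, no pruning)."""
    counts = [start]
    for _ in range(iterations):
        counts.append(n_actions * counts[-1] ** n_observations)
    return counts

print(unpruned_hyperplane_counts(n_actions=3, n_observations=2, iterations=4))
# -> [1, 3, 27, 2187, 14348907]   (the |S|=2, |A|=3, |Omega|=2 example)
```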

37 Exact planning assumes all beliefs are equally likely
Planning under Uncertainty

38 Possible approximation approaches
1. Ignore the belief [Littman et al., 1995]
2. Discretize the belief [Lovejoy, 1991; Brafman, 1997; Hauskrecht, 2000; Zhou & Hansen, 2001]
3. Compress the belief [Poupart & Boutilier, 2002; Roy & Gordon, 2002]
4. Plan for trajectories [Baxter & Bartlett, 2000; Ng & Jordan, 2002]
Planning under Uncertainty

39 A new algorithm: Point-based value iteration
Main idea: Select a small set of belief points. [Figure: V(b) over P(s1) with belief points b0, b1, b2.] Planning under Uncertainty

40 A new algorithm: Point-based value iteration
Main idea: Select a small set of belief points. Plan for those belief points only. [Figure: V(b) over P(s1) with belief points b0, b1, b2.] Planning under Uncertainty

41 A new algorithm: Point-based value iteration
Main idea: Select a small set of belief points → focus on reachable beliefs. Plan for those belief points only. [Figure: V(b) over P(s1); belief points b0, b1, b2 linked by (a,o) transitions.] Planning under Uncertainty

42 A new algorithm: Point-based value iteration
Main idea: Select a small set of belief points → focus on reachable beliefs. Plan for those belief points only → learn the value and its gradient. [Figure: V(b) over P(s1); belief points b0, b1, b2 linked by (a,o) transitions.] Planning under Uncertainty

43 Selecting good belief points
1. Leverage insight from policy search methods: focus on reachable beliefs. → |A|·|Ω|·|B| new points. [Figure: a belief b on the P(s1) axis and its successors b^{a1,o1}, b^{a1,o2}, b^{a2,o1}, b^{a2,o2}.] Planning under Uncertainty

44 Selecting good belief points
1. Leverage insight from policy search methods: focus on reachable beliefs. → |A|·|Ω|·|B| new points.
2. Avoid including all reachable beliefs: consider all actions, but a stochastic observation choice. → |A|·|B| new points.
[Figure: only one sampled successor belief per action.] Planning under Uncertainty

45 Selecting good belief points
1. Leverage insight from policy search methods: focus on reachable beliefs. → |A|·|Ω|·|B| new points.
2. Avoid including all reachable beliefs: consider all actions, but a stochastic observation choice. → |A|·|B| new points.
3. Use the error bound on point-based value updates: select widely-spaced beliefs rather than near-by beliefs (see the code sketch below). → |B| new points.
[Figure: only the single most distant successor of b is kept.] Planning under Uncertainty
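A sketch of the expansion heuristic above, reusing the POMDP class and update_belief from the earlier sketch. It is illustrative only: the function name, the RNG seed, and the use of L1 distance as the spacing measure are my choices; the slides only state the three criteria. Each belief simulates one step for every action, and only the successor farthest from the current set is kept, giving at most |B| new points per expansion.

```python
import numpy as np

def expand_beliefs(pomdp, B, rng=np.random.default_rng(0)):
    """One expansion phase: add at most one new, maximally spread-out successor per b in B."""
    new_points = []
    for b in B:
        candidates = []
        for a in range(pomdp.nA):
            s = rng.choice(pomdp.nS, p=b)                    # sample a state from the belief
            s_next = rng.choice(pomdp.nS, p=pomdp.T[a, s])   # simulate the transition
            o = rng.choice(pomdp.nO, p=pomdp.O[a, s_next])   # simulate the observation
            candidates.append(pomdp.update_belief(b, a, o))
        # keep the candidate farthest (L1 distance) from every belief we already have
        spread = lambda c: min(np.abs(c - other).sum() for other in B + new_points)
        best = max(candidates, key=spread)
        if spread(best) > 0:
            new_points.append(best)
    return B + new_points
```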

46 The anytime PBVI algorithm
Alternate between: 1. growing the set of belief points, 2. planning for those belief points. Terminate when you run out of time or have a good policy. Planning under Uncertainty
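The anytime loop is then a straight alternation of the two phases: a batch of point-based backups over the current B, followed by one expansion (which at most doubles |B|). This sketch assumes the expand_beliefs function above and the point_based_backup function sketched with the appendix slides near the end of the transcript; the time budget, batch size, and pessimistic initialization are arbitrary choices of mine.

```python
import time
import numpy as np

def anytime_pbvi(pomdp, b0, time_budget_s=60.0, backups_per_phase=25):
    """Alternate value backups over B with belief-set expansion until time runs out."""
    B = [np.asarray(b0, dtype=float)]
    # pessimistic initial value function: a single hyper-plane at R_min / (1 - gamma)
    alpha_vectors = [np.full(pomdp.nS, pomdp.R.min() / (1.0 - pomdp.gamma))]
    start = time.time()
    while time.time() - start < time_budget_s:
        for _ in range(backups_per_phase):      # plan for the current belief points
            alpha_vectors = point_based_backup(pomdp, B, alpha_vectors)
        B = expand_beliefs(pomdp, B)            # grow the set (at most doubles |B|)
    return alpha_vectors
```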

47 Experimental results: Lasertag domain
State space = RobotPosition × OpponentPosition. Observable: RobotPosition (always); OpponentPosition (only if same as Robot). Action space = {North, South, East, West, Tag}. Opponent strategy: move away from robot w/ Pr=0.8. |S|=870, |A|=5, |Ω|=30. Planning under Uncertainty

48 Performance of PBVI on Lasertag domain
Opponent tagged 70% of trials Opponent tagged 17% of trials Planning under Uncertainty

49 Performance on well-known POMDPs
Method:         QMDP    Grid    PBUA    PBVI
Maze33 (|S|=36, |A|=5, |Ω|=17)
  Reward:       0.198   0.94    2.30    2.25
  Time(s):      0.19    n.v.    12166   3448
  |B|:          -       174     660     470
Hallway (|S|=60, |A|=5, |Ω|=20)
  %Goal:        47      n.v.    100     95
  Reward:       0.261   n.v.    0.53    ?
  Time(s):      0.51    n.v.    450     288
  |B|:          -       n.v.    300     86
Hallway2 (|S|=92, |A|=5, |Ω|=17)
  %Goal:        22      98      100     ?
  Reward:       0.109   n.v.    0.35    0.34
  Time(s):      1.44    n.v.    27898   360
  |B|:          -       337     1840    95
Planning under Uncertainty

50 Validation of the belief expansion heuristic
Hallway domain: |S|=60, |A|=5, |Ω|=20 Planning under Uncertainty

51 Validation of the belief expansion heuristic
Tag domain: |S|=870, |A|=5, |Ω|=30 Planning under Uncertainty

52 Back to the grand challenge
How can we go from 870 states to real-world problems? Planning under Uncertainty

53 High-level controller
Structured POMDPs: Many real-world decision-making problems exhibit structure inherent to the problem domain. [Figure: a high-level controller coordinating Navigation, Cognitive support, and Social interaction modules.] Planning under Uncertainty

54 High-level controller
Structured POMDPs: Many real-world decision-making problems exhibit structure inherent to the problem domain. [Figure: the same controller with its modules expanded into actions such as Move, AskWhere, Left, Right, Forward, Backward.] Planning under Uncertainty

55 Structured POMDP approaches
Factored models [Boutilier & Poole, 1996; Hansen & Feng, 2000; Guestrin et al., 2001]. Idea: represent the state space with multi-valued state features. Insight: independencies between state features can be leveraged to overcome the curse of dimensionality.
Hierarchical POMDPs [Wiering & Schmidhuber, 1997; Theocharous et al., 2000; Hernandez-Gardiol & Mahadevan, 2000; Pineau & Thrun, 2000]. Idea: exploit domain knowledge to divide one POMDP into many smaller ones. Insight: smaller action sets further help overcome the curse of history.
Planning under Uncertainty
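To see why factoring helps, compare a handful of features to the flat state count they induce. The feature names below come from the later PolCA+ slides; the domain sizes are invented for illustration and are not from the talk.

```python
# A factored state is an assignment to a few multi-valued features,
# but the equivalent flat state space is the product of their domains.
features = {
    "X-position": range(20), "Y-position": range(20),
    "X-goal": range(20), "Y-goal": range(20),
    "HealthStatus": ("ok", "check"),
}
n_flat_states = 1
for domain in features.values():
    n_flat_states *= len(domain)
print(n_flat_states)   # 320000 flat states, described by just 5 compact features
```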

56 A hierarchy of POMDPs
[Figure: task hierarchy of subtasks, abstract actions, and primitive actions. Act has abstract actions Navigate and ExamineHealth; Navigate has Move and ClarifyGoal; Move has primitive actions North, South, East, West; ExamineHealth has VerifyFluids and VerifyMeds.] Planning under Uncertainty

57 PolCA+: Planning with a hierarchy of POMDPs
Step 1: Select the action set. [Figure: the primitive actions (North, South, East, West, ClarifyGoal, VerifyPulse, VerifyMeds) are partitioned across subtasks, e.g. A_Move = {N, S, E, W} and Navigate gets {Move, ClarifyGoal}.] Planning under Uncertainty

58 PolCA+: Planning with a hierarchy of POMDPs
Step 1: Select the action set. Step 2: Minimize the state set. [Figure: from the state features (X-position, Y-position, X-goal, Y-goal, HealthStatus), the Move subtask keeps only S_Move = {X, Y}.] Planning under Uncertainty

59 PolCA+: Planning with a hierarchy of POMDPs
Step 1: Select the action set. Step 2: Minimize the state set. Step 3: Choose the parameters {b_h, T_h, O_h, R_h}. [Figure: as before, with the Move subtask's parameters derived from the full model.] Planning under Uncertainty

60 PolCA+: Planning with a hierarchy of POMDPs
Step 1: Select the action set. Step 2: Minimize the state set. Step 3: Choose the parameters {b_h, T_h, O_h, R_h}. Step 4: Plan task h (compute its policy π_h). [Figure: the four steps applied to the Move subtask of the Navigate hierarchy.] Planning under Uncertainty
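The hierarchy itself is just a tree of subtasks, each carrying its own reduced action set, and the four steps are applied bottom-up so that a parent can treat each solved child as a single abstract action. The snippet below only builds the structure from slides 56-60 and shows the planning order; the state minimization, parameter derivation, and per-subtask solver of Steps 2-4 are summarized by a print statement, since their details are not in this transcript.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str
    actions: list                       # primitive actions or names of child subtasks
    children: list = field(default_factory=list)

# Hierarchy from the slides: Act -> {Navigate, ExamineHealth}, Navigate -> {Move, ClarifyGoal}, ...
move = Subtask("Move", ["North", "South", "East", "West"])
navigate = Subtask("Navigate", ["Move", "ClarifyGoal"], children=[move])
examine = Subtask("ExamineHealth", ["VerifyPulse", "VerifyMeds"])
act = Subtask("Act", ["Navigate", "ExamineHealth"], children=[navigate, examine])

def plan_bottom_up(task):
    """PolCA+ order: solve children first, so each appears to its parent as one abstract action."""
    for child in task.children:
        plan_bottom_up(child)
    # Steps 1-4 for this subtask: restrict actions, minimize states, derive {b_h,T_h,O_h,R_h}, plan pi_h
    print(f"planning subtask {task.name} with actions {task.actions}")

plan_bottom_up(act)   # prints Move, Navigate, ExamineHealth, Act (leaves first)
```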

61 PolCA+ in the Nursebot domain
Goal: A robot is deployed in a nursing home, where it provides reminders to elderly users and accompanies them to appointments. Planning under Uncertainty

62 The effects of confirmation actions
Planning under Uncertainty

63 Comparing user performance
[Figure: comparison of user performance; values shown include 0.18, 0.1, 0.1.] Planning under Uncertainty

64 The hierarchical POMDP planner in action
Planning under Uncertainty

65 Summary POMDPs offer a rich framework for large-scale real-world robot planning problems. POMDP planning is made difficult by: the curse of dimensionality the curse of history A combination of abstraction and structure can lead to tractable POMDP plans for large domains. Planning under Uncertainty

66 Project information: www.cs.cmu.edu/~nursebot Navigation software:
Papers and more: Joint work with: Geoff Gordon, Michael Montemerlo, Judy Matthews, Martha Pollack, Nicholas Roy, Sebastian Thrun

67 Point-based value update
[Figure: V(b) over P(s1) with belief points b0, b1, b2.] Planning under Uncertainty

68 Point-based value update
Initialize the value function (…and skip ahead a few iterations). [Figure: Vn(b) over P(s1) with belief points b0, b1, b2.] Planning under Uncertainty

69 Point-based value update
Initialize the value function (…and skip ahead a few iterations). For each b ∈ B: [Figure: Vn(b) over P(s1), one belief point b highlighted.] Planning under Uncertainty

70 Point-based value update
Initialize the value function (…and skip ahead a few iterations). For each b ∈ B: For each (a,o): project forward b → b^{a,o} and find the best value. [Figure: Vn(b) over P(s1) with the successors b^{a1,o1}, b^{a1,o2}, b^{a2,o1}, b^{a2,o2} of b.] Planning under Uncertainty

71 Point-based value update
Initialize the value function (…and skip ahead a few iterations). For each b ∈ B: For each (a,o): project forward b → b^{a,o} and find the best value. [Figure: the best hyper-plane of Vn at each successor belief b^{a,o} is highlighted.] Planning under Uncertainty

72 Point-based value update
Initialize the value function (…and skip ahead a few iterations). For each b ∈ B: For each (a,o): project forward b → b^{a,o} and find the best value. Sum over observations. [Figure: as before.] Planning under Uncertainty

73 Point-based value update
Initialize the value function (…and skip ahead a few iterations). For each b ∈ B: For each (a,o): project forward b → b^{a,o} and find the best value. Sum over observations. [Figure: Vn(b) over P(s1) at the belief point b.] Planning under Uncertainty

74 Point-based value update
Initialize the value function (…and skip ahead a few iterations). For each b ∈ B: For each (a,o): project forward b → b^{a,o} and find the best value. Sum over observations. [Figure: the resulting per-action vectors (one for a1, one for a2) as candidate hyper-planes of Vn+1 at b.] Planning under Uncertainty

75 Point-based value update
Initialize the value function (…and skip ahead a few iterations). For each b ∈ B: For each (a,o): project forward b → b^{a,o} and find the best value. Sum over observations. Max over actions. [Figure: the better of the two action vectors is kept at b.] Planning under Uncertainty

76 Point-based value update
Initialize the value function (…and skip ahead a few iterations). For each b ∈ B: For each (a,o): project forward b → b^{a,o} and find the best value. Sum over observations. Max over actions. [Figure: Vn+1(b) over P(s1) with one new hyper-plane at each of b0, b1, b2.] Planning under Uncertainty
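Collecting these steps into code gives the point_based_backup used by the anytime loop sketched earlier. This is an illustrative implementation against the POMDP class from the model sketch (its array layout and names are that sketch's assumptions): for each belief point it projects every hyper-plane through each (a,o) pair, keeps the best projection per observation, sums them onto the immediate reward, and finally maxes over actions, producing exactly one new hyper-plane per belief point.

```python
import numpy as np

def point_based_backup(pomdp, B, alpha_vectors):
    """One PBVI backup: returns one new alpha-vector per belief point in B."""
    T, O, R, g = pomdp.T, pomdp.O, pomdp.R, pomdp.gamma
    new_vectors = []
    for b in B:
        best_value, best_vector = -np.inf, None
        for a in range(pomdp.nA):
            vec = R[:, a].astype(float)                     # immediate reward for action a
            for o in range(pomdp.nO):
                # project every hyper-plane through (a, o): gamma * sum_s' T(s,a,s') O(s',a,o) alpha(s')
                projections = [g * T[a] @ (O[a, :, o] * alpha) for alpha in alpha_vectors]
                # the projection with the largest dot product with b corresponds to the
                # best hyper-plane at the successor belief b^{a,o}
                vec += projections[int(np.argmax([p @ b for p in projections]))]   # sum over observations
            if vec @ b > best_value:                        # max over actions at this belief point
                best_value, best_vector = float(vec @ b), vec
        new_vectors.append(best_vector)
    return new_vectors
```

Only one vector survives per belief point, which is what keeps each update polynomial; the complexity table on the next slide makes the comparison with the exact backup explicit.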

77 Complexity of value update
                   Exact update      Point-based update
I - Projection     S²·A·Ω·Γn         S²·A·Ω·B
II - Sum           S·A·Γn^Ω          S·A·Ω·B²
III - Max          S·A·Γn^Ω          S·A·B
where: S = # states, A = # actions, Ω = # observations, Γn = # solution vectors at iteration n, B = # belief points.
Planning under Uncertainty

78 A bound on the approximation error
Bound the error of the point-based value update; the bound depends on how densely we sample belief points. Let Δ be the set of reachable beliefs and let B be the set of belief points. Theorem: for any belief set B and any horizon n, the error of the PBVI algorithm, εn = ||VnB - Vn*||, is bounded by εn ≤ (Rmax - Rmin)·δB / (1 - γ)², where δB = max over b' in Δ of (min over b in B of ||b - b'||1) measures how densely B covers the reachable beliefs. Planning under Uncertainty
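The quantity driving the bound is just the worst-case distance from a reachable belief to its nearest sampled point. Two tiny helpers (my own, and estimating the density from a finite sample of reachable beliefs rather than the full set Δ):

```python
import numpy as np

def belief_set_density(B, reachable_sample):
    """delta_B: worst-case L1 distance from a (sampled) reachable belief to its nearest point in B."""
    return max(min(float(np.abs(b - bp).sum()) for bp in B) for b in reachable_sample)

def pbvi_error_bound(delta_B, r_max, r_min, gamma):
    """The bound from the slide: (Rmax - Rmin) * delta_B / (1 - gamma)**2."""
    return (r_max - r_min) * delta_B / (1.0 - gamma) ** 2
```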

79 A closer look at the error bound
[Figure: V(b) over P(s1); the error at a reachable belief is measured against the nearest sampled point among b0, b1, b2 and their (a,o) successors.] Planning under Uncertainty

80 The anytime PBVI algorithm
Alternate between: growing the set of belief points (e.g. |B| doubles in size every time) and planning for those belief points. Terminate when you run out of time or have a good policy. Lasertag results: 13 phases, |B|=1334, ran out of time! Planning under Uncertainty

81 The anytime PBVI algorithm
Alternate between: growing the set of belief points (e.g. |B| doubles in size every time) and planning for those belief points. Terminate when you run out of time or have a good policy. Lasertag results: 13 phases, |B|=1334, ran out of time! Hallway2 results: 8 phases, |B|=95, found a good policy. Planning under Uncertainty

82 Results on small dialogue domain
|S|=12, |A|=20, |Ω|=3 Planning under Uncertainty

83 Achieving a flexible trade-off
[Figure: reward vs. planning time for QMDP, FIB, PolCA+ (D1), PolCA+ (D2), and the full POMDP.] Planning under Uncertainty

84 Addressing the curse of dimensionality
Planning under Uncertainty

