1
A Hybridized Planner for Stochastic Domains
Mausam and Daniel S. Weld (University of Washington, Seattle)
Piergiorgio Bertoli (ITC-IRST, Trento)
2
Planning under Uncertainty (ICAPS'03 Workshop)
Qualitative (disjunctive) uncertainty: which real problem can you solve?
Quantitative (probabilistic) uncertainty: which real problem can you model?
3
The Quantitative View: Markov Decision Processes
models uncertainty with probabilistic outcomes
general decision-theoretic framework
algorithms are slow
do we need the full power of decision theory?
is an unconverged partial policy any good?
4
The Qualitative View: Conditional Planning
models uncertainty as a logical disjunction of outcomes
exploits classical planning techniques: FAST
ignores probabilities: poor solutions
how bad are pure qualitative solutions?
can we improve the qualitative policies?
5
HybPlan: A Hybridized Planner
combines probabilistic + disjunctive planners
produces good solutions in intermediate times
anytime: makes effective use of resources
can terminate with a quality (error-bound) guarantee
Quantitative view: completes the partial probabilistic policy by using qualitative policies in some states
Qualitative view: improves the qualitative policies in the more important regions
6
Outline
Motivation
Planning with Probabilistic Uncertainty (RTDP)
Planning with Disjunctive Uncertainty (MBP)
Hybridizing RTDP and MBP (HybPlan)
Experiments
Conclusions and Future Work
7
Markov Decision Process
S: a set of states
A: a set of actions
Pr: probabilistic transition model
C: cost model
s0: start state
G: a set of goals
Find a policy (S → A) that minimizes the expected cost to reach a goal, over an indefinite horizon, in a fully observable Markov decision process.
The optimal cost function J* corresponds to the optimal policy.
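For reference, this objective is the standard stochastic shortest-path formulation: the optimal cost function satisfies the Bellman optimality equation, and the optimal policy is greedy with respect to it.

```latex
J^*(s) = 0 \;\; \text{for } s \in G, \qquad
J^*(s) = \min_{a \in A}\Big[\, C(s,a) + \sum_{s' \in S} \Pr(s' \mid s,a)\, J^*(s') \Big] \;\; \text{otherwise},
\qquad
\pi^*(s) = \operatorname*{arg\,min}_{a \in A}\Big[\, C(s,a) + \sum_{s' \in S} \Pr(s' \mid s,a)\, J^*(s') \Big].
```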
8
Example: a grid-world problem with start state s0 and a goal. Annotations on the figure mark a longer path to the goal, a wrong direction from which the goal is still reachable, and a region in which all states are dead-ends.
9
Optimal State Costs: figure showing the optimal cost-to-goal J* of each grid cell (0 at the goal).
10
Optimal Policy: figure showing the optimal action at each grid cell, leading to the goal.
11
Bellman Backup: create a better approximation to the cost function at state s.
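Spelled out, a Bellman backup at s computes one-step lookahead values and takes their minimum as the new estimate:

```latex
Q^{n+1}(s,a) = C(s,a) + \sum_{s'} \Pr(s' \mid s,a)\, J^{n}(s'), \qquad
J^{n+1}(s) = \min_{a \in A} Q^{n+1}(s,a).
```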
12
Bellman Backup: create a better approximation to the cost function at state s. Trial = simulate the greedy policy and update the visited states.
13
Real Time Dynamic Programming (Barto et al. '95; Bonet & Geffner '03): repeat trials until the cost function converges. Trial = simulate the greedy policy and back up the visited states.
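A minimal Python sketch of this trial loop (not the authors' code): cost(s, a), prob(s, a) and the other names are illustrative assumptions. The visit counts it records are the quantity HybPlan later thresholds on.

```python
import random

def rtdp(S, A, goals, cost, prob, s0, n_trials=1000, max_depth=200):
    """Minimal RTDP sketch: repeated trials of greedy simulation plus Bellman backups.

    prob(s, a) returns a dict {next_state: probability}; cost(s, a) is a positive cost.
    J starts at 0 (an admissible lower bound) and is tightened by the backups."""
    J = {s: 0.0 for s in S}          # cost-to-goal estimates
    visited = {s: 0 for s in S}      # trial visit counts (HybPlan thresholds on these)

    def q(s, a):                     # one-step lookahead value of action a in s
        return cost(s, a) + sum(p * J[t] for t, p in prob(s, a).items())

    def backup(s):                   # Bellman backup at s; returns the greedy action
        best = min(A, key=lambda a: q(s, a))
        J[s] = q(s, best)
        return best

    for _ in range(n_trials):
        s, depth = s0, 0
        while s not in goals and depth < max_depth:
            visited[s] += 1
            a = backup(s)
            outcomes = prob(s, a)    # simulate the greedy action
            s = random.choices(list(outcomes), weights=list(outcomes.values()))[0]
            depth += 1
    return J, visited
```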
14
Planning with Disjunctive Uncertainty
S: a set of states
A: a set of actions
T: disjunctive transition model
s0: the start state
G: a set of goals
Find a strong-cyclic policy (S → A) that guarantees reaching a goal, over an indefinite horizon, for a fully observable planning problem.
15
Model Based Planner (Bertoli et al.)
States, transitions, etc. are represented logically (symbolically).
Uncertainty: multiple possible successor states.
Planning algorithm: iteratively removes "bad" states, where bad = states that don't reach anywhere or that reach other bad states.
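As a rough illustration only, here is an explicit-state sketch of that pruning idea, assuming a hypothetical helper succ(s, a) that returns the set of possible successors; MBP itself performs this kind of computation symbolically over BDD-represented state sets.

```python
def strong_cyclic_region(S, A, succ, goals):
    """Iteratively drop 'bad' states: states with no action that both keeps every
    possible outcome inside the surviving set and leaves a goal reachable.
    succ(s, a) returns the set of possible successor states (disjunctive outcomes)."""
    good = set(S)
    while True:
        # actions whose every possible outcome stays inside the surviving set
        safe = {s: [a for a in A if set(succ(s, a)) <= good]
                for s in good if s not in goals}
        # backward reachability of the goals through safe actions
        reach = set(goals)
        changed = True
        while changed:
            changed = False
            for s in good - reach:
                if any(set(succ(s, a)) & reach for a in safe[s]):
                    reach.add(s)
                    changed = True
        if reach == good:
            return good      # every surviving state can still reach a goal safely
        good = reach         # remove the bad states and repeat
```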
16
MBP Policy: figure of the policy MBP finds for the example grid; it reaches the goal but is a sub-optimal solution.
17
Outline
Motivation
Planning with Probabilistic Uncertainty (RTDP)
Planning with Disjunctive Uncertainty (MBP)
Hybridizing RTDP and MBP (HybPlan)
Experiments
Conclusions and Future Work
18
HybPlan Top-Level Code
0. Run MBP to find a solution (a policy π_mbp) to the goal.
1. Run RTDP for some time.
2. Compute the partial greedy policy π_rtdp.
3. Compute the hybridized policy π_hyb:
   π_hyb(s) = π_rtdp(s) if visited(s) > threshold
   π_hyb(s) = π_mbp(s) otherwise
4. Clean π_hyb by removing (a) dead-ends and (b) probability-1 cycles.
5. Evaluate π_hyb.
6. Save the best policy obtained so far.
Repeat steps 1-6 until (1) resources are exhausted or (2) a satisfactory policy is found.
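A small Python sketch of the hybridization rule in step 3, with illustrative names (policies and visit counts as dictionaries):

```python
def hybridize(states, pi_rtdp, pi_mbp, visited, threshold=0):
    """Step 3 (sketch): trust RTDP's greedy action where RTDP has visited the
    state often enough; fall back on MBP's qualitative policy everywhere else."""
    pi_hyb = {}
    for s in states:
        if visited.get(s, 0) > threshold and s in pi_rtdp:
            pi_hyb[s] = pi_rtdp[s]
        elif s in pi_mbp:
            pi_hyb[s] = pi_mbp[s]
        # states with neither policy (e.g. dead-ends) are left out and handled
        # by the cleaning step (step 4)
    return pi_hyb
```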
19
First RTDP Trial (step 1: run RTDP for some time): figure of the example grid with all cost estimates initialized to 0.
20
Bellman Backup (step 1, continued):
Q¹(s,N) = 1 + 0.5 × 0 + 0.5 × 0 = 1
Q¹(s,S) = Q¹(s,W) = Q¹(s,E) = 1
J¹(s) = 1
Let the greedy action be North.
21
Simulation of the Greedy Action (step 1, continued): the greedy action is simulated and the trial moves to a sampled successor state.
22
Continuing the First Trial (step 1, continued): the trial keeps performing a Bellman backup at the current state and simulating the greedy action.
24
Finishing the First Trial (step 1, continued): the trial reaches the goal.
25
Cost Function after the First Trial: figure of the updated cost estimates at the states visited during the trial; all other states remain at 0.
26
Partial Greedy Policy (step 2: compute the greedy policy π_rtdp): figure of the greedy actions at the states updated during the first trial.
27
Construct the Hybridized Policy with MBP (step 3: compute π_hyb, with threshold = 0): figure showing π_rtdp where RTDP has visited states and π_mbp everywhere else.
28
Evaluate the Hybridized Policy (steps 5-6: evaluate π_hyb and store the best policy so far): after the first trial, evaluating π_hyb gives J(π_hyb) = 5.
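Step 5 amounts to evaluating the fixed policy π_hyb; a sketch using iterative policy evaluation, with the same hypothetical cost/prob helpers as in the RTDP sketch above:

```python
def evaluate_policy(S, pi, goals, cost, prob, sweeps=1000, tol=1e-6):
    """Iterative policy evaluation of a fixed (proper) policy: expected cost-to-goal."""
    J = {s: 0.0 for s in S}
    for _ in range(sweeps):
        delta = 0.0
        for s in S:
            if s in goals or s not in pi:
                continue             # goals cost nothing; unassigned states are skipped
            a = pi[s]
            new = cost(s, a) + sum(p * J[t] for t, p in prob(s, a).items())
            delta = max(delta, abs(new - J[s]))
            J[s] = new
        if delta < tol:
            break
    return J
```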
29
Second Trial: figure of the cost estimates after a second RTDP trial.
30
Partial Greedy Policy: figure of the greedy policy after the second trial.
31
Absence of an MBP Policy: figure marking states where the MBP policy does not exist because there is no path from them to the goal.
32
Third Trial: figure of the cost estimates after a third RTDP trial.
33
Partial Greedy Policy: figure of the greedy policy after the third trial.
34
Probability-1 Cycles (step 4, shown over several build slides): the hybridized policy can contain a cycle that is followed with probability 1 and therefore never reaches the goal. Such cycles are removed as follows:
repeat: find a state s in the cycle; set π_hyb(s) = π_mbp(s); until the cycle is broken.
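A Python sketch of this cleaning step, again with illustrative names: it finds states that are reachable under π_hyb but can no longer reach a goal, and reroutes them one at a time through π_mbp.

```python
def break_cycles(pi_hyb, pi_mbp, s0, goals, prob):
    """Step 4 (sketch): repair probability-1 cycles. States reachable under the
    hybrid policy that cannot reach a goal any more are rerouted to MBP's action."""
    while True:
        # states reachable from s0 when following pi_hyb
        seen, stack = set(), [s0]
        while stack:
            s = stack.pop()
            if s in seen or s in goals:
                continue
            seen.add(s)
            if s in pi_hyb:
                stack.extend(prob(s, pi_hyb[s]).keys())
        # which of those states can still reach a goal under pi_hyb
        ok = set(goals)
        changed = True
        while changed:
            changed = False
            for s in seen - ok:
                if s in pi_hyb and set(prob(s, pi_hyb[s])) & ok:
                    ok.add(s)
                    changed = True
        # stuck states that MBP can reroute (dead-ends are handled separately)
        stuck = [s for s in seen - ok
                 if s in pi_mbp and pi_mbp[s] != pi_hyb.get(s)]
        if not stuck:
            return pi_hyb
        pi_hyb[stuck[0]] = pi_mbp[stuck[0]]   # patch one state, then re-check
```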
39
Error Bound (after the 1st trial): J(π_hyb) = 5, so J*(s0) ≤ 5; also J*(s0) ≥ 1, hence Error(π_hyb) = 5 - 1 = 4.
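One way to read the bound: the evaluated cost of the proper hybridized policy upper-bounds the optimal cost, while RTDP's current value function, grown by backups from an admissible (zero) initialization, lower-bounds it.

```latex
J^{\mathrm{rtdp}}(s_0) \;\le\; J^*(s_0) \;\le\; J^{\pi_{\mathrm{hyb}}}(s_0)
\quad\Longrightarrow\quad
\mathrm{Error}(\pi_{\mathrm{hyb}}) \;\le\; J^{\pi_{\mathrm{hyb}}}(s_0) - J^{\mathrm{rtdp}}(s_0) \;=\; 5 - 1 \;=\; 4 .
```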
40
Termination
when a policy with the required error bound is found
when the planning time is exhausted
when the available memory is exhausted
Properties
outputs a proper policy
anytime algorithm (once MBP terminates)
HybPlan = RTDP if infinite resources are available
HybPlan = MBP if resources are extremely limited
otherwise, HybPlan is better than both
41
Outline
Motivation
Planning with Probabilistic Uncertainty (RTDP)
Planning with Disjunctive Uncertainty (MBP)
Hybridizing RTDP and MBP (HybPlan)
Experiments
  Anytime Properties
  Scalability
Conclusions and Future Work
42
Domains
NASA Rover domain
Factory domain
Elevator domain
43
Anytime Properties: plots comparing HybPlan against RTDP.
45
Scalability
Problem | Time before memory exhausts | J(π_rtdp) | J(π_mbp) | J(π_hyb)
Rov5    | ~1100 sec   | 55.36  | 67.04  | 48.16
Rov2    | ~800 sec    | ∞      | 65.22  | 49.91
Mach9   | ~1500 sec   | 143.95 | 66.50  | 48.49
Mach6   | ~300 sec    | ∞      | 71.56  |
Elev14  | ~10000 sec  | ∞      | 46.49  | 44.48
Elev15  | ~10000 sec  | ∞      | 233.07 | 87.46
46
Conclusions
First algorithm that integrates disjunctive and probabilistic planners.
Experiments show that HybPlan:
is anytime
scales better than RTDP
produces better-quality solutions than MBP
can interleave planning and execution
47
Hybridized Planning: A General Notion
Hybridize other pairs of planners:
an optimal or close-to-optimal planner
a sub-optimal but fast planner
to yield a planner that produces a good-quality solution in intermediate running times.
Examples
POMDPs: RTDP/PBVI with POND/MBP/BBSP
Oversubscription planning: A* with greedy solutions
Concurrent MDPs: sampled RTDP with single-action RTDP