Forward-Chaining Planning in Nondeterministic Domains
Ugur Kuter and Dana Nau
Department of Computer Science and Institute for Systems Research
University of Maryland, College Park, Maryland
Nau: Univ of Alberta, 2004
Generating Plans of Action
- Programs to aid human planners
  - Project management (consumer software)
  - Plan storage and retrieval (e.g., variant process planning)
  - Automatic schedule generation (various OR and AI techniques)
- For some problems, we really want to generate plans automatically
  - Much more difficult
  - One source of difficulty: nondeterministic outcomes
    - If I plan to perform some action a, I cannot be sure in advance what outcome a will have
Planning with Nondeterminism
- Actions with multiple possible outcomes
  - Action failures (e.g., gripper drops its load)
  - Exogenous events (e.g., road closed)
- Like Markov Decision Processes (MDPs), but without probabilities attached to the outcomes
  - Useful if accurate probabilities aren't available, or if probability calculations would introduce inaccuracies
- Existing approaches
  - Conditional Planning (e.g., Penberthy & Weld, 1992)
  - Conformant Planning (e.g., Smith & Weld, 1998)
  - Symbolic Model Checking (e.g., Cimatti et al., 1998, 2003)
[Figure: grasping block c, with an intended outcome and an unintended outcome]
Research Motivation
- Algorithms for planning with nondeterminism have very high computational complexity
  - The search space is usually huge
  - Existing algorithms search most of the space
- Classical planning
  - Lots of work on generating plans quickly
  - Techniques for pruning large parts of the search space
  - Can we generalize any of these techniques for use in nondeterministic domains?
Our Results
- A way to nondeterminize any forward-chaining planner for deterministic planning domains
  - Rewrite it so that it works in nondeterministic domains
- Theoretical analysis
  - Under the appropriate conditions, some nondeterminized planners can run exponentially faster than the best previous planners for nondeterministic domains
- Experimental verification of the theoretical results
Forward-Chaining Planners
- Some of the most capable existing planners use forward chaining
  - Backtracking state-space search starting at the initial state
  - e.g., HSP, TLPlan, TALplanner, SHOP2
- FCP: abstract model of forward-chaining planners
- Among different forward-chaining planners, the main difference is the action-generation function actions(s), which returns a subset of {actions applicable to s}
- Can classify them based on this function
  - Domain-specific
  - Domain-independent
  - Domain-configurable
Procedure FCP(s0, g)
  π := the empty plan; s := s0
  loop
    if s satisfies g then return π
    else if s isn't in ancestors(s) then
      A := actions(s)
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π.a; s := γ(s,a)
    else return failure
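The FCP procedure can be sketched in Python as a depth-first search in which backtracking stands in for the nondeterministic choice. This is only an illustration under assumptions: the names `fcp`, `actions`, and `gamma`, and the toy counter domain, are hypothetical and do not come from the slides or from any of the planners named above.

```python
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

State = Hashable
Action = Hashable


def fcp(s: State,
        goal: Callable[[State], bool],
        actions: Callable[[State], Iterable[Action]],
        gamma: Callable[[State, Action], State],
        ancestors: Tuple[State, ...] = ()) -> Optional[List[Action]]:
    """Return a plan (list of actions) from s to a goal state, or None on failure."""
    if goal(s):
        return []
    if s in ancestors:            # s repeats one of its ancestors: this branch fails
        return None
    for a in actions(s):          # each candidate is one "nondeterministic choice"
        rest = fcp(gamma(s, a), goal, actions, gamma, ancestors + (s,))
        if rest is not None:
            return [a] + rest
    return None                   # no actions were generated, or every choice failed


# Hypothetical toy domain: a counter that must reach 3.
def counter_actions(s: int) -> List[str]:
    return (["dec"] if s > 0 else []) + ["inc"]


def counter_gamma(s: int, a: str) -> int:
    return s + 1 if a == "inc" else s - 1


plan = fcp(0, lambda s: s == 3, counter_actions, counter_gamma)
print(plan)  # ['inc', 'inc', 'inc']
```

The "dec" branches are abandoned as soon as they revisit an ancestor state, which is the role of the ancestors(s) test in the pseudocode.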
Classification of Forward-Chaining Planners
- Domain-specific: the action-generation function is designed or tuned for one specific domain
  - Several application-oriented planners work this way
    - e.g., EDAPS (process planning), Tignum 2 (used in Bridge Baron)
    - Good performance in the given domain, but hard to generalize
- Domain-independent: the action-generation function works in any domain within some class
  - Usually, works in any classical planning domain
  - Focus of most research on AI planning
  - So far, not practical for real-world planning
- Domain-configurable: …
Classification (continued)
- Domain-configurable
  - The action-generation function has a domain-independent computational engine
  - Give domain-specific information K to the action-generation function as part of the domain description
    - K tells it how to prune some of the actions it would otherwise generate
      1. Control rules written in temporal logic, used for pruning
      2. Hierarchical Task Networks (HTNs) and ordered decomposition
Procedure FCP(s0, g, K)
  π := the empty plan; s := s0
  loop
    if s satisfies g then return π
    else if s isn't in ancestors(s) then
      A := actions(s, K)
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π.a; s := γ(s,a)
    else return failure
Control Rules in Temporal Logic
- Depth-first forward search, with control rules written in temporal logic
  - For each state s, there is a control rule f
    - Prune s if it doesn't satisfy f
  - Control rules for successors of s are computed via logical progression
- TLPlan (Bacchus & Kabanza, Artificial Intelligence 2000)
- TALplanner (Doherty & Kvarnström, AMAI 2001)
  - Both work the same way, but they use different temporal logics
- Example (next slide):
  - A trivial blocks-world planning problem
  - LTL (the logic used in TLPlan)
Example
- State s: [Figure: blocks a and b]
- Goal: {on(b,a)}
- Control rule f: never pick up block x from the table unless x needs to be on top of another block
- Progressed formula f+ (must be true in all children of s)
  - If we pick up a, f+ will not be satisfied: prune this state
  - If we pick up b, f+ will be satisfied: keep searching below this state
- Can write rules to prune huge parts of the search space
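The effect of this control rule can be illustrated with a small Python filter. This is a deliberately simplified stand-in: it hard-codes the one rule from the example rather than progressing an LTL formula as TLPlan does, it assumes both blocks start on the table, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

On = Tuple[str, str]  # (x, y) stands for the goal literal on(x, y)


def needs_to_be_on_block(x: str, goal: Set[On]) -> bool:
    """True if the goal requires x to sit on top of some other block."""
    return any(top == x for top, _below in goal)


def allowed_pickups(table_blocks: List[str], goal: Set[On]) -> List[str]:
    """Keep only the pickup actions that the control rule does not prune."""
    return [x for x in table_blocks if needs_to_be_on_block(x, goal)]


goal = {("b", "a")}                       # on(b,a)
print(allowed_pickups(["a", "b"], goal))  # ['b']
```

Picking up a is pruned and picking up b survives, matching the two cases listed on the slide.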
HTN Planning
- Task: travel(x,y)
  - Method air-travel(x,y): get-ticket(a(x), a(y)), travel(x, a(x)), fly(a(x), a(y)), travel(a(y), y)
  - Method taxi-travel(x,y): get-taxi, ride-taxi(x,y), pay-driver
- Example: travel(UMD, U-of-Alberta)
  - get-ticket(DCA, YEG)
    - go to Orbitz
    - find-flights(DCA, YEG)
    - buy-ticket(DCA, YEG)
  - travel(UMD, DCA)
    - get-taxi
    - ride-taxi(UMD, DCA)
    - pay-driver
  - fly(DCA, YEG)
  - travel(YEG, U-of-Alberta)
    - get-taxi
    - ride-taxi(YEG, U-of-Alberta)
    - pay-driver
- Decompose tasks into subtasks
- Handle constraints (e.g., taxi not good for long distances)
- Resolve interactions (e.g., take taxi early enough to catch plane)
- If necessary, backtrack and try other decompositions
Ordered Decomposition
- Decompose tasks in the same order in which they'll be executed
- Whenever we want to plan the next task
  - We've already planned everything that comes before it
  - Thus, we know the current state of the world
- SHOP2 (Nau et al., IJCAI 2001, JAIR 2003)
[Figure: tasks t0, …, tm, …, tn decomposed left to right into operators op1, op2, …, opi, which produce the states s0, s1, s2, …, si–1]
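A minimal Python sketch of ordered decomposition on the travel example may help. It rests on several assumptions that are not in the slides: the state is reduced to the traveler's current location, get-ticket is treated as a primitive step, and the `NEAR` table of taxi-range pairs is made up for illustration. It is not SHOP2's implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Task = Tuple[str, ...]

AIRPORT = {"UMD": "DCA", "U-of-Alberta": "YEG"}      # plays the role of a(x)
NEAR = {("UMD", "DCA"), ("YEG", "U-of-Alberta")}     # assumed taxi-range pairs
PRIMITIVES = {"get-ticket", "get-taxi", "ride-taxi", "pay-driver", "fly"}


def apply_op(loc: str, task: Task) -> Optional[str]:
    """Apply a primitive task to the current location; None if inapplicable."""
    if task[0] in ("ride-taxi", "fly"):
        return task[2] if loc == task[1] else None
    return loc                                       # other steps leave the location unchanged


def taxi_travel(loc: str, x: str, y: str) -> Optional[List[Task]]:
    if loc == x and (x, y) in NEAR:
        return [("get-taxi",), ("ride-taxi", x, y), ("pay-driver",)]
    return None


def air_travel(loc: str, x: str, y: str) -> Optional[List[Task]]:
    if x in AIRPORT and y in AIRPORT:
        ax, ay = AIRPORT[x], AIRPORT[y]
        return [("get-ticket", ax, ay), ("travel", x, ax),
                ("fly", ax, ay), ("travel", ay, y)]
    return None


METHODS: Dict[str, List[Callable[..., Optional[List[Task]]]]] = {
    "travel": [taxi_travel, air_travel],
}


def seek_plan(loc: str, tasks: List[Task]) -> Optional[List[Task]]:
    """Decompose tasks left to right, so the current state is always known."""
    if not tasks:
        return []
    task, rest = tasks[0], tasks[1:]
    if task[0] in PRIMITIVES:
        new_loc = apply_op(loc, task)
        if new_loc is None:
            return None
        tail = seek_plan(new_loc, rest)
        return None if tail is None else [task] + tail
    for method in METHODS.get(task[0], []):          # backtrack over applicable methods
        subtasks = method(loc, *task[1:])
        if subtasks is not None:
            plan = seek_plan(loc, subtasks + rest)
            if plan is not None:
                return plan
    return None


plan = seek_plan("UMD", [("travel", "UMD", "U-of-Alberta")])
for step in plan:
    print(step)
```

Because every earlier step has already been planned when a method's precondition is tested, the test can simply inspect the current location.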
Performance
- Using control rules and HTNs
  - Can encode domain-specific problem-solving knowledge
  - Highly focused search
    - Go almost directly toward a near-optimal solution, with very little backtracking
- TLPlan, TALplanner, and SHOP2 have been the best performers in the International Planning Competitions
  - Several orders of magnitude faster than the domain-independent planners
  - Solved many more problems
Expressivity
- Forward-chaining planners always know the current state
  - This makes it easy to do things that would be difficult otherwise
  - States can be arbitrary data structures
    - Example (bridge): Us: East declarer, West dummy. Opponents: defenders, South & North. Contract: East – 3NT. On lead: West at trick 3. East: KJ74. West: A2. Out: QT98653.
  - Preconditions and effects can include
    - logical inference
    - complex numeric computations
    - interactions with other software packages
- Applications:
  - SHOP2 is open-source freeware and has been used in dozens of applications (Nau et al., 2004)
  - Bacchus and Kabanza are attempting to commercialize TLPlan
How to Nondeterminize Forward-Chaining Planners
- Two steps:
  1. Modify FCP to generate policies rather than plans
  2. Modify FCP to solve problems in which actions have multiple outcomes
- Want to do this in such a way that it will work for all instances of FCP
  - Nondeterminized versions of HSP, TLPlan, TALplanner, SHOP2, etc.
Plans Versus Policies
- In classical domains, a solution is a plan (sequence of actions)
  - e.g., π = (a0, a1, a2)
- For nondeterministic domains, that's not sufficient
  - An action may lead to more than one possible state
  - What to do next depends on what state we're in
  - Instead of a plan, use a policy: a partial function from states to actions
  - e.g., π = {(s0,a0), (s1,a1), (s2,a2)}
[Figure: a plan shown as a path from the initial state s0 through s1 and s2 to the goal state s3, and a policy shown as a graph over states s0 to s4]
Execution Graphs
- An action a has more than one possible outcome, so a policy π has more than one possible execution path
- Execution graph E(π) = the graph of all of π's possible execution paths
  - S_π = {all states in E(π)}
- Example: π = {(s0,a0), (s1,a1), (s2,a1), (s3,a2)}
[Figure: execution graph over states s0 to s5, leading from the initial states to the goal states]
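One way to make policies and execution graphs concrete is to store π as a dictionary and collect S_π by following every possible outcome. In the sketch below the policy is the one from the slide, but the transition table `TRANS` is hypothetical, since the figure's edges are not recoverable from the text; the function name is also an assumption.

```python
from typing import Dict, Iterable, Set, Tuple

# Policy from the slide: a partial function from states to actions.
policy: Dict[str, str] = {"s0": "a0", "s1": "a1", "s2": "a1", "s3": "a2"}

# Hypothetical nondeterministic transitions: (state, action) -> possible outcomes.
TRANS: Dict[Tuple[str, str], Set[str]] = {
    ("s0", "a0"): {"s1", "s2"},
    ("s1", "a1"): {"s3"},
    ("s2", "a1"): {"s4"},
    ("s3", "a2"): {"s5"},
}


def execution_states(policy: Dict[str, str],
                     trans: Dict[Tuple[str, str], Set[str]],
                     initial: Iterable[str]) -> Set[str]:
    """Return S_pi: every state on some execution path of the policy."""
    seen: Set[str] = set()
    frontier = list(initial)
    while frontier:
        s = frontier.pop()
        if s in seen:
            continue
        seen.add(s)
        if s in policy:                               # states without an action are leaves
            frontier.extend(trans.get((s, policy[s]), set()))
    return seen


print(sorted(execution_states(policy, TRANS, ["s0"])))
# ['s0', 's1', 's2', 's3', 's4', 's5']
```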
Nondeterminization (Step 1)
- Rewrite FCP so that it generates solution policies rather than solution plans
  - π becomes a set of state-action pairs instead of a sequence of actions
  - The loop test checks S_π instead of ancestors(s)
Procedure Policy-FCP(s0, g, K)
  π := ∅; s := s0
  loop
    if s satisfies g then return π
    else if s isn't in S_π then
      A := actions(s, K)
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π ∪ {(s,a)}; s := γ(s,a)
    else return failure
Types of Solutions (Cimatti et al., Artificial Intelligence, 2003)
- Weak solution: at least one execution path reaches a goal
- Strong solution: every execution path reaches a goal
- Strong-cyclic solution: every fair execution path reaches a goal
  - A fair path doesn't stay in a cycle forever if there's a state transition out of it
[Figure: example execution graphs over states s0 to s3 and a goal state, illustrating the types of solutions]
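The three definitions can be checked mechanically on small examples. The sketch below is an assumption-laden illustration, not code from MBP or from the authors: it treats goal states as terminal, reads "every fair path reaches a goal" as "every reachable state can still reach a goal", and uses made-up transition tables.

```python
from typing import Dict, Set, Tuple

Trans = Dict[Tuple[str, str], Set[str]]


def reachable(start: str, policy: Dict[str, str], trans: Trans, goals: Set[str]) -> Set[str]:
    """States reachable from start by following the policy (goals are terminal)."""
    seen: Set[str] = set()
    frontier = [start]
    while frontier:
        s = frontier.pop()
        if s in seen:
            continue
        seen.add(s)
        if s not in goals and s in policy:
            frontier.extend(trans.get((s, policy[s]), set()))
    return seen


def classify(policy: Dict[str, str], trans: Trans, s0: str, goals: Set[str]) -> str:
    states = reachable(s0, policy, trans, goals)
    if not states & goals:
        return "not a solution"          # no execution path reaches a goal
    if any(not reachable(s, policy, trans, goals) & goals for s in states):
        return "weak"                    # some reachable state can never reach a goal

    def on_cycle(s: str) -> bool:
        if s in goals or s not in policy:
            return False
        return any(s in reachable(t, policy, trans, goals)
                   for t in trans.get((s, policy[s]), set()))

    return "strong-cyclic" if any(on_cycle(s) for s in states) else "strong"


weak = classify({"s0": "a0"}, {("s0", "a0"): {"g", "dead"}}, "s0", {"g"})
strong = classify({"s0": "a0", "s1": "a1", "s2": "a1"},
                  {("s0", "a0"): {"s1", "s2"}, ("s1", "a1"): {"g"}, ("s2", "a1"): {"g"}},
                  "s0", {"g"})
cyclic = classify({"s0": "a0"}, {("s0", "a0"): {"s0", "g"}}, "s0", {"g"})
print(weak, strong, cyclic)  # weak strong strong-cyclic
```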
Nondeterminization (Step 2)
- Modify Policy-FCP to generate strong-cyclic solutions
  - Can also modify it to generate strong and weak solutions (won't discuss details)
Procedure ND-FCP(S0, g, K)
  π := ∅; S := S0; solved := ∅
  loop
    if S = ∅ then return π
    select s in S and remove it from S
    if s satisfies g then put s into solved
    else if s isn't in S_π then
      A := actions(s, K)
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π ∪ {(s,a)}; S := S ∪ γ(s,a)
    else if s has no descendants in (S ∪ solved) – S_π then return failure
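The ND-FCP loop can be approximated in Python with backtracking in place of the nondeterministic choice. This is a rough sketch under stated assumptions rather than the authors' implementation: the action-generation function is replaced by "every action listed in the transition table" (no pruning knowledge K), S_π is read as the set of states that already have an action in π, and the door domain is invented for illustration.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Trans = Dict[Tuple[str, str], Tuple[str, ...]]   # (state, action) -> possible outcomes


def has_escape(s: str, policy: Dict[str, str], trans: Trans, live: Set[str]) -> bool:
    """True if some descendant of s lies outside the policy and is open or solved."""
    seen, frontier = {s}, [s]
    while frontier:
        t = frontier.pop()
        for u in trans[(t, policy[t])]:
            if u not in policy:
                if u in live:
                    return True
            elif u not in seen:
                seen.add(u)
                frontier.append(u)
    return False


def nd_fcp(open_states: Tuple[str, ...], goals: Set[str], trans: Trans,
           policy: Optional[Dict[str, str]] = None,
           solved: FrozenSet[str] = frozenset()) -> Optional[Dict[str, str]]:
    policy = {} if policy is None else policy
    if not open_states:
        return policy
    s, rest = open_states[0], open_states[1:]
    if s in goals:
        return nd_fcp(rest, goals, trans, policy, solved | {s})
    if s not in policy:
        for (t, a), outcomes in trans.items():   # try each applicable action in turn
            if t != s:
                continue
            result = nd_fcp(rest + outcomes, goals, trans, {**policy, s: a}, solved)
            if result is not None:
                return result
        return None                              # dead end, or every choice failed
    if not has_escape(s, policy, trans, set(rest) | solved):
        return None                              # s closes a cycle with no way out
    return nd_fcp(rest, goals, trans, policy, solved)


# Hypothetical domain: someone may close the door while the robot goes through it.
DOOR: Trans = {
    ("outside-closed", "open-door"): ("outside-open",),
    ("outside-open", "go"): ("inside", "outside-closed"),
}
door_policy = nd_fcp(("outside-closed",), {"inside"}, DOOR)
print(door_policy)  # {'outside-closed': 'open-door', 'outside-open': 'go'}
```

The returned policy contains a cycle (the door may be closed again), which is acceptable for a strong-cyclic solution because the cycle has an exit to the goal.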
Bookkeeping
- Bookkeeping to generate graphs rather than paths
  - S = {nodes that have been generated but not yet explored}
  - solved = {nodes from which we know we can get to a solution}
Failure Detection
- A node s is unsolvable in the following cases:
  - s is a dead end,
  - s is part of a cycle from which there is no escape,
  - every descendant of s is unsolvable
- This happens if s has no descendants in (S ∪ solved) – S_π
[Figure: example execution graph over states s0 to s6]
Formal Properties
- Several planning algorithms are instances of FCP
  - TLPlan, TALplanner, SHOP2, etc.
  - Only difference: what the action-generation function is
- Nondeterminizing FCP preserves the action-generation function, so it works on any instance of FCP
  - ND-TLPlan, ND-TALplanner, ND-SHOP2, etc.
- Nondeterminizing them preserves soundness, completeness, and time complexity
  - Details on the next few slides
Nondeterministic Versions of Operators and Domains
- Nondeterministic version of an operator o
  - Same as o except that it may have additional possible outcomes
  - Failures, exogenous events, etc.
- Nondeterministic version of a domain D
  - The operators are nondeterministic versions of the ones in D
[Figure: grasping block c, with an intended outcome and an unintended outcome]
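As a small illustration of this definition, a deterministic transition function can be wrapped so that it returns the original outcome plus any additional ones. Everything named here (`nondeterminize`, the string-valued states, the dropped-block failure) is a hypothetical example and not taken from the authors' planners.

```python
from typing import Callable, Set


def nondeterminize(gamma: Callable[[str, str], str],
                   extra_outcomes: Callable[[str, str], Set[str]]
                   ) -> Callable[[str, str], Set[str]]:
    """Build a transition function that keeps the intended outcome and adds extras."""
    def nd_gamma(s: str, a: str) -> Set[str]:
        return {gamma(s, a)} | set(extra_outcomes(s, a))
    return nd_gamma


# Hypothetical blocks-world fragment with states written as strings.
def grasp_gamma(s: str, a: str) -> str:
    return "holding-c" if (s, a) == ("c-on-a", "grasp-c") else s


def grasp_failures(s: str, a: str) -> Set[str]:
    # Assumed failure mode: the gripper drops c onto the table.
    return {"c-on-table"} if (s, a) == ("c-on-a", "grasp-c") else set()


nd_gamma = nondeterminize(grasp_gamma, grasp_failures)
print(sorted(nd_gamma("c-on-a", "grasp-c")))  # ['c-on-table', 'holding-c']
```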
Formal Properties
- Nondeterminizing an algorithm preserves its soundness and completeness
  - Let P be any planning algorithm that's an instance of FCP
  - Let ND-P be the nondeterminization of P
  - Let D be any classical planning domain
  - Let D' be any nondeterministic version of D
- If P is sound/complete on D, then ND-P is sound/complete on D'
- Nondeterminizing an algorithm preserves its time complexity (as a function of its output)
  - Let T_P(n) and T_ND-P(n) be the running times of P and ND-P, where n = size of the solution found
  - Then T_ND-P(n) is polynomially bounded by T_P(n)
    - (Details on next slide)
Time-Complexity Theorem
- P = an instance of FCP; D = a classical domain
- Suppose P's time complexity is O(f(|π|)), where π is the plan that P finds and f is monotonic
- D' = a nondeterministic version of D
  - ND-P's time complexity is O(p(f(|π'|))), where π' is the policy that ND-P finds and p is a polynomial
- Caveat: π' may be exponentially larger than π
[Figure: a plan shown as a path from the initial state s0 to the goal state s3, next to a policy shown as an execution graph over states s0 to s5 from the initial states to the goal states]
Special Case
- Suppose that P runs in polynomial time and ND-P produces solutions of polynomial size
- Then ND-P runs in polynomial time
- Example: Blocks World
  - Given the appropriate domain knowledge,
    - TALplanner, TLPlan, and SHOP2 solve Blocks-World problems in polynomial time
    - ND-TALplanner, ND-TLPlan, and ND-SHOP2 produce solutions of polynomial size
  - With this domain knowledge,
    - ND-TALplanner, ND-TLPlan, and ND-SHOP2 solve nondeterministic Blocks-World problems in polynomial time
Experimental Verification
- Implementation of ND-SHOP2
- Compare with MBP (Bertoli et al., 2001)
  - The best-known planner for nondeterministic domains
  - Based on symbolic model checking
- Two experimental domains
  - Robot-Navigation (Kabanza et al., 1997)
    - The E. coli of research on planning with nondeterminism
  - Nondeterministic Blocks-World
Robot Navigation Domain
- Adapted from (Kabanza et al., 1997)
  - Rooms, doors, hallway
  - Robot can open/close doors, move packages to other rooms
  - Objective: move packages to their destinations
  - A kid runs around and randomly opens/closes doors
    - Robot may need to re-open a door repeatedly to go through
- Experimental setup
  - Kid doors: k = 1, …, 7
  - Packages: n = 1, …, 5
  - 20 randomly generated problems for each combination of n, k
[Figure: Varying the problem size (Robot Navigation domain)]
[Figure: Varying the amount of nondeterminism (Robot Navigation domain)]
Nondeterministic Blocks World
- Traditional Blocks-World operators:
  - pickup, putdown, stack, unstack
- Actions may have unintended outcomes
  - e.g., drop a block on the table
- Experimental setup
  - Vary the number of blocks from 3 to 10
  - 20 randomly generated problems for each case
[Figure: grasping block c, with an intended outcome and an unintended outcome]
[Figure: Varying the problem size (Nondeterministic Blocks World)]
Complexity Analysis
- Complexity analysis shows MBP running in exponential time and ND-SHOP2 running in time O(n^5)
- To see why, we need to understand how MBP and ND-SHOP2 work
Representing Policies
- A policy π is a partial function from states into actions
  - π(s0) = a0, π(s1) = a1, π(s2) = a1, π(s3) = a2
- Can use a symbolic representation roughly like this:
  - if in(r4) and holding(b) and door-closed(r4) then π(s) = open-door(r4)
  - if in(r4) and holding(b) and door-open(r4) then π(s) = go(r4, hall)
  - Each state description ignores all doors other than d4
  - Each state description therefore covers an exponential number of states
- Both MBP and ND-SHOP2 use symbolic representations of policies
  - Can write polynomial-size policies for exponentially large state spaces
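The compactness claim can be illustrated by writing the two rules as partial state descriptions and counting how many concrete states one rule matches. The encoding below (dictionaries of features, six other doors named `door_r1` and so on) is an assumption made for the example; it is not the representation used by MBP or ND-SHOP2.

```python
from itertools import product

# Each rule: (partial state description, action). Unmentioned features are ignored.
RULES = [
    ({"in": "r4", "holding": "b", "door_r4": "closed"}, ("open-door", "r4")),
    ({"in": "r4", "holding": "b", "door_r4": "open"}, ("go", "r4", "hall")),
]


def pi(state):
    """Return the action of the first rule whose description matches the state."""
    for condition, action in RULES:
        if all(state.get(k) == v for k, v in condition.items()):
            return action
    return None


# Assumed for illustration: six other doors, each either open or closed.
OTHER_DOORS = ["door_r%d" % i for i in (1, 2, 3, 5, 6, 7)]

covered = 0
for values in product(["open", "closed"], repeat=len(OTHER_DOORS)):
    state = {"in": "r4", "holding": "b", "door_r4": "closed",
             **dict(zip(OTHER_DOORS, values))}
    if pi(state) == ("open-door", "r4"):
        covered += 1

print(covered)  # 64, i.e. 2**6 concrete states matched by a single rule
```

With k irrelevant doors, one rule covers 2^k states while the rule itself stays the same size.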
How MBP Generates Policies
- MBP uses model-checking techniques
  - e.g., computing pre-images of sets of states
  - Roughly like a breadth-first backward search
- MBP may need to explore exponentially many states that are unreachable from the initial state
  - Exponentially many states => exponential time
  - That's what happens in the robot navigation and nondeterministic blocks world domains
How ND-SHOP2 Generates Policies
- ND-SHOP2 takes domain knowledge in the form of HTN methods
  - Method m1
    - Task: take-package(p, r, hall)
    - Precond: in(r), holding(p), door-open(r)
    - Subtasks: go(r, hall)
  - Method m2
    - Task: take-package(p, r, hall)
    - Precond: in(r), holding(p), door-closed(r)
    - Subtasks: open-door(r), go(r, hall)
- Consider the task take-package(b, r4, hall)
- ND-SHOP2 can very quickly develop the policy
  - if in(r4) and holding(b) and door-closed(r4) then π(s) = open-door(r4)
  - if in(r4) and holding(b) and door-open(r4) then π(s) = go(r4, hall)
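The link between the two methods and the two policy rules can be sketched as follows: ground each method for the task take-package(b, r4, hall), then pair its precondition with its first subtask. This is a simplification written for illustration and assumes the first subtask is a primitive action; it is not a description of how ND-SHOP2 is implemented, and the data layout and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Term = Tuple[str, ...]

METHODS = [
    {"name": "m1",
     "precond": [("in", "r"), ("holding", "p"), ("door-open", "r")],
     "subtasks": [("go", "r", "hall")]},
    {"name": "m2",
     "precond": [("in", "r"), ("holding", "p"), ("door-closed", "r")],
     "subtasks": [("open-door", "r"), ("go", "r", "hall")]},
]


def ground(term: Term, binding: Dict[str, str]) -> Term:
    """Replace variables (here p and r) by the objects bound to them."""
    return tuple(binding.get(x, x) for x in term)


def policy_rules(methods, binding: Dict[str, str]) -> List[Tuple[List[Term], Term]]:
    """One rule per method: if the precondition holds, do the method's first step."""
    rules = []
    for m in methods:
        condition = [ground(lit, binding) for lit in m["precond"]]
        first_action = ground(m["subtasks"][0], binding)
        rules.append((condition, first_action))
    return rules


rules = policy_rules(METHODS, {"p": "b", "r": "r4"})
for condition, action in rules:
    print(condition, "->", action)
```

After open-door(r4) succeeds, the door is open, so the rule derived from m1 supplies the remaining go(r4, hall) step.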
Conclusions
- A technique for "nondeterminization" of forward-chaining classical planners
- Theoretical analysis
  - Nondeterminization preserves soundness/completeness
  - Time complexity of the generalized planners is polynomially bounded by the time complexity of the original ones
- Experimental verification of the results
Future Work
- Nondeterministic planning domains are just like MDPs except that there are no probabilities
- We are quite confident that
  - We can generalize our approach to work in MDPs too
  - Our "MDP-ized" algorithms will be able to run exponentially faster than traditional MDP algorithms
- Preliminary implementation and experiments
  - So far, very encouraging
Related Work
- M. Ghallab, D. Nau, and P. Traverso, Automated Planning: Theory and Practice (Morgan Kaufmann, May 2004)
- First comprehensive textbook on automated planning
  - models, techniques, algorithms
  - case studies of applications
- Web site:
  - Lecture slides available online