4/1: Search Methods and Heuristics  Progression: Sapa (TLPlan; FF)  Regression: TP4  Partial order: Zeno (IxTET)


1 4/1: Search Methods and Heuristics  Progression: Sapa (TLPlan; FF)  Regression: TP4  Partial order: Zeno (IxTET)

2 Reading List  (3/27) Papers on Metric Temporal Planning:  Paper on the PDDL-2.1 standard (read up to, but not including, Section 6)  Paper on SAPA  Paper on Temporal TLPlan (see Section 3 for a slightly longer description of the progression search used in SAPA)  Paper on TP4 (regression search for temporal planning)  Paper on Zeno (plan-space search for temporal planning)

3 State-Space Search: Search is through time-stamped states. Search states should have information about -- what conditions hold at the current time slice (P, M below) -- what actions we have already committed to put into the plan (Π, Q below). S=(P,M,Π,Q,t), where: P is the set of predicates p_i with the time of their last achievement t_i < t; M is the set of functions representing resource values; Π is the set of protected persistent conditions (could be binary or resource conditions); Q is the event queue (contains resource as well as binary fluent events); t is the time stamp of S. In the initial state, P and M are non-empty; Q is non-empty only if we have exogenous events.
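The tuple S = (P, M, Π, Q, t) can be made concrete with a small data structure. A minimal sketch in Python (the class name, field encodings, and example literals are illustrative, not Sapa's actual implementation):

```python
from dataclasses import dataclass, field

@dataclass
class TimeStampedState:
    """Sketch of the time-stamped search state S = (P, M, Pi, Q, t)."""
    P: dict = field(default_factory=dict)   # predicate -> time of last achievement (t_i <= t)
    M: dict = field(default_factory=dict)   # resource function -> current value
    Pi: list = field(default_factory=list)  # protected persistent conditions (with intervals)
    Q: list = field(default_factory=list)   # event queue: binary-fluent and resource events
    t: float = 0.0                          # time stamp of S

# Initial state: P and M are non-empty; Q is empty unless there are exogenous events.
s0 = TimeStampedState(P={"have_light": 0, "at_steps": 0}, M={"fuel": 10.0})
```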

4 Let current state S be P:{have_light@0; at_steps@0}; Q:{~have_light@15}; t: 0 (presumably after doing the light-match action). Applying cross_cellar to this state gives S'= P:{have_light@0; crossing@0}; Π:{have_light, …}; Q:{at_fuse-box@10; ~have_light@15}; t: 0. [Figure: Light-match (duration 15) and Cross-cellar (duration 10) against the time-stamp axis]

5 “Advancing” the clock as a device for concurrency control  To support concurrency, we need to consider advancing the clock  How far to advance the clock?  One shortcut is to advance the clock to the time of the next earliest event in the event queue, since this is the least advance needed to make changes to P and M of S.  At this point, all the events happening at that time point are transferred from Q to P and M (to signify that they have happened)  This strategy will find “a” plan for every problem—but will have the effect of enforcing concurrency by making the concurrent actions “align on the left end”  In the candle/cellar example, we will find plans where the cross-cellar action starts right when the light-match action starts  If we need slack in the start times, we will have to post-process the plan  If we want plans with arbitrary slacks on start times to appear in the search space, we will have to consider advancing the clock by arbitrary amounts (even if it changes nothing in the state other than the clock time itself). [Figure: Light-match (15) and Cross-cellar (10) with ~have-light at 15. In the cellar plan above, the clock, if advanced, will be advanced to 15, where an event (~have-light) will occur. This means cross-cellar can be done either at 0 or at 15 (and the latter makes no sense).]
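The "advance to the next earliest event" shortcut can be sketched as follows (hypothetical state shape: a dict with P, M, a Q of (time, fluent, value) events, and t; this is exactly the least advance that can change P or M, which is why it left-aligns concurrent actions):

```python
def advance_clock(state):
    """Advance t to the earliest event in Q and fire all events at that time.

    Boolean event values are binary fluent events (transferred to P);
    numeric values are resource events (transferred to M).
    """
    if not state["Q"]:
        return state
    t_next = min(e[0] for e in state["Q"])
    fired = [e for e in state["Q"] if e[0] == t_next]
    state["Q"] = [e for e in state["Q"] if e[0] > t_next]
    for _, fluent, value in fired:
        if isinstance(value, bool):
            state["P"][fluent] = (t_next, value)  # binary fluent event
        else:
            state["M"][fluent] = value            # resource event
    state["t"] = t_next
    return state

# Cellar example: the only queued event is ~have_light at t=15.
s = {"P": {}, "M": {}, "Q": [(15, "have_light", False)], "t": 0}
advance_clock(s)
```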

6 Search Algorithm (cont.)  Goal Satisfaction: S=(P,M,Π,Q,t) ⊨ G if for each ⟨p_i,t_i⟩ ∈ G either:  ∃⟨p_i,t_j⟩ ∈ P with t_j < t_i and no event in Q deletes p_i, or  ∃ e ∈ Q that adds p_i at time t_e < t_i.  Action Application: Action A is applicable in S if:  All instantaneous preconditions of A are satisfied by P and M.  A's effects do not interfere with Π and Q.  No event in Q interferes with persistent preconditions of A.  A does not lead to concurrent resource change.  When A is applied to S:  P is updated according to A's instantaneous effects.  Persistent preconditions of A are put in Π.  Delayed effects of A are put in Q. S=(P,M,Π,Q,t) [TLPlan; Sapa; 2001]
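The goal-satisfaction test above can be sketched directly. Assumed (hypothetical) encoding: P maps predicates to achievement times, Q is a list of (time, predicate, adds) events, and goals are (predicate, time) pairs:

```python
def goal_satisfied(state, goals):
    """S |= G check: each goal (p_i, t_i) must either already be in P
    (achieved at some t_j < t_i, with no queued event deleting it), or be
    added by a queued event strictly before t_i."""
    P, Q = state["P"], state["Q"]
    for p, t_i in goals:
        in_P = (p in P and P[p] < t_i
                and not any(q == p and not adds for (_, q, adds) in Q))
        from_Q = any(q == p and adds and t_e < t_i for (t_e, q, adds) in Q)
        if not (in_P or from_Q):
            return False
    return True

# Illustrative states: at_fuse_box achieved at 10; have_light added by an event at 5.
s1 = {"P": {"at_fuse_box": 10}, "Q": [], "t": 10}
s2 = {"P": {}, "Q": [(5, "have_light", True)], "t": 0}
```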

7 Regression Search is similar…  In the case of regression over durative actions too, the main generalization we need is differentiating the “advancement of the clock” from the “application of a relevant action”  Can use the same state representation S=(P,M,Π,Q,t), with the semantics that:  P and M are the binary and resource subgoals needed at the current time point  Q holds the subgoals needed at earlier time points  Π holds subgoals to be protected over specific intervals  We can either add an action to support something in P or Q, or push the clock backward before considering subgoals  If we push the clock backward, we push it to the time of the latest subgoal in Q  TP4 uses a slightly different representation (with State and Action information) [TP4; 1999] [Figure: actions A1:Y, A2:X, A3:W regressed against subgoals R, W, X, y and Q; we can either work on R at t_∞, or on R and Q at t_∞ − D(A3)]

8 Let current state S be P:{at_fuse_box@0}; t: 0. Regressing cross_cellar over this state gives S'= P:{}; Π:{have_light, …}; Q:{have_light@-10; at_stairs@-10}; t: 0. (This example changed since the class.) Notice that in contrast to progression, regression will align the end points of concurrent actions (e.g. when we put in light-match to support have_light). [Figure: Cross_cellar with have_light protected over its duration]

9 S'= P:{}; Π:{have_light, …}; Q:{have_light@-10; at_stairs@-10}; t: 0. If we now decide to support the subgoal in Q using light-match: S''= P:{}; Q:{have-match@-15; at_stairs@-10}; Π:{have_light, …}; t: 0. Notice that in contrast to progression, regression will align the end points of concurrent actions (e.g. when we put in light-match to support have_light). [Figure: Cross_cellar and Light-match with have_light protected]

10 PO (Partial Order) Search [Zeno; 1994]  Split the interval into multiple overlapping intervals  Involves posting temporal constraints and durative goals  Involves LP solving over linear constraints (temporal constraints are linear too); waits for nonlinear constraints to become linear

11 More on Temporal planning by plan-space planners (Zeno)  The “accommodation” to complexity that Zeno makes by refusing to handle nonlinear constraints (waiting instead until they become linear) is sort of hilarious given that it doesn't care much about heuristic control otherwise  Basically Zeno is trying to keep the “per-node” cost of the search down (and if you do a nonlinear constraint consistency check, even that is quite hard)  Of course, we know now that there is no obvious reason to believe that reducing the per-node cost will, ipso facto, also lead to a reduction in overall search cost.  The idea of “goal reduction” by splitting a temporal subgoal into multiple sub-intervals is used only in Zeno, and helps it support a temporal goal over a long duration with multiple actions. Neat idea.  Zeno doesn't have much of a problem handling arbitrary concurrency—since we are only posting constraints on temporal variables denoting the start points of the various actions. In particular, Zeno does not force either right or left alignment of actions.  In addition to Zeno, IxTeT is another influential metric temporal planner that uses plan-space planning ideas.

12 [Figure: Zeno-style partial plan. Causal link I → Cross_cellar → G supports at_fuse_box@G; Cross_cellar runs over [t1, t2] with the condition have_light@t1; constraints: t2 − t1 = 10, t1 < tG, tI < t1]

13 [Figure: Burn_match over [t3, t4] added to support have_light@t1, with its ~have-light effect at t4; constraints: t2 − t1 = 10, t1 < tG, tI < t1, t4 − t3 = 15, t3 < t1, t4 < tG] The ~have_light effect at t4 can violate the causal link! Resolve by adding t4 < t3 ∨ t1 < t4.

14 [Figure: same plan with have_light needed over [t1, t2]; additional constraints: t3 < t2, t4 < t3 ∨ t2 < t4] To work on have_light@[t1, t2], we can either --support the whole interval directly by adding a causal link, or --first split it into two subintervals and work on supporting have_light on both intervals. Notice that Zeno allows arbitrary slack between the two actions.

15 4/3: Discussion of the Sapa/TP4/Zeno search algorithms; heuristics for temporal planning

16 Q/A on Search Methods for Temporal Planning  Menkes: What is meant by the argument that resources are always easy to handle for progression planners?  The idea is that the partial plans in the search space of a progression planner are “position constrained”—so you know exactly when each action starts. Given that, it is a simple matter to check whether a particular resource constraint (however complicated and nonlinear) holds over a time point or interval. In contrast, partial-order planners only have constraints on the start points. So, checking that a resource constraint is valid involves checking that it holds under every possible assignment of times to the temporal variables. The difference is akin to the difference between model checking and theorem proving [Halpern & Vardi; KR91] (you can check the consistency of more complicated formulas in more complicated logics if you only need to do model checking rather than inference/theorem proving).

17 Q/A contd.  Dan: Can the “interval goal reduction” used in Zeno be made more goal directed?  Yes. For example, regressing a goal have_light@[1,15] over an action that gives have_light@[1,7] will make it have_light@[7,15].  Making the reduction goal directed may actually be a smarter idea (especially for position-constrained planners—for Zeno, it doesn't make much difference since it splits the interval into two variable-sized intervals).
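The goal-directed interval reduction in the example above can be sketched as a small regression rule (illustrative encoding: intervals as (lo, hi) pairs; only the front-overlap case from the example is handled, which is an assumption of this sketch):

```python
def regress_interval_goal(goal, supported):
    """Regress a durative goal p@[lo, hi] over an action supporting p@[s_lo, s_hi].

    If the action supports a prefix of the goal interval, only the remainder
    survives as a subgoal: have_light@[1,15] regressed over support on [1,7]
    leaves have_light@[7,15].
    """
    (g_lo, g_hi), (s_lo, s_hi) = goal, supported
    if s_lo <= g_lo and s_hi >= g_hi:
        return None              # fully supported: subgoal discharged
    if s_lo <= g_lo < s_hi:
        return (s_hi, g_hi)      # prefix supported: remainder is the new subgoal
    return goal                  # no overlap at the front: goal unchanged
```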

18 Q/A contd.  Romeo: The TLPlan paper says that their strategy is to keep adding concurrent actions until no more actions can be added at the current point, and only then advance the clock. Is this used in SAPA too?  Rao: I am surprised to hear that TLPlan does that. If this is used as a “strategy” rather than as a “heuristic”, then it can lead to loss of completeness. In general, just because an action can be done doesn't mean that it should be done.  For example, consider a problem where you want a goal G. Ultimately, all actions that give G wind up requiring, among other conditions, the condition P*. P* is present in the init state. There is an action A that deletes P* and no action gives P*. It is applicable in the init state and doesn't interfere with ANY of the other actions. Now, if we put A in the plan, just because it can be done concurrently, then we know we are doomed.  I (Rao) made this mistake in my ECP-97 paper on Graphplan (see Footnote 2 in http://rakaposhi.eas.asu.edu/pub/rao/ewsp-graphplan.ps), and figured out my error later.

19 Tradeoffs: Progression/Regression/PO Planning for metric/temporal planning  Compared to PO, both progression and regression do a less than fully flexible job of handling concurrency (e.g. slacks may have to be handled through post-processing).  Progression planners have the advantage that the exact amount of a resource is known at any given state, so complex resource constraints are easier to verify. PO (and to some extent regression) planners have to verify this by posting and then checking resource constraints.  Currently, SAPA (a progression planner) does better than TP4 (a regression planner). Both do oodles better than Zeno/IxTET. However:  TP4 could possibly be improved significantly by giving up the insistence on admissible heuristics  Zeno (and IxTET) could benefit by adapting ideas from RePOP.

20 Heuristic Control  Temporal planners have to deal with more branching possibilities  More critical to have good heuristic guidance  Design of heuristics depends on the objective function:  Classical Planning: number of actions; parallel execution time; solving time  Temporal Resource Planning: number of actions; makespan; resource consumption; slack; …  In temporal planning, heuristics focus on richer objective functions that guide both planning and scheduling

21 Objectives in Temporal Planning  Number of actions: total number of actions in the plan.  Makespan: the shortest duration in which we can possibly execute all actions in the solution.  Resource Consumption: total amount of resources consumed by actions in the solution.  Slack: the duration between the time a goal is achieved and its deadline.  Optimize max, min, or average slack values  Combinations thereof

22 Deriving heuristics for SAPA  We use a phased relaxation approach to derive different heuristics:  Relax the negative logical and resource effects to build the Relaxed Temporal Planning Graph, pruning bad states while preserving completeness.  Derive admissible heuristics: –to minimize the solution's makespan; –to maximize slack-based objective functions.  Find a relaxed solution, which is used as a distance heuristic.  Adjust the heuristic values using the negative interactions (future work).  Adjust the heuristic values using the resource consumption information. [AltAlt, AIJ 2001]

23 Heuristics in Sapa are derived from the Graphplan-style bi-level relaxed temporal planning graph (RTPG). Sapa is a progression planner, so the RTPG is constructed anew for each state.

24 Relaxed Temporal Planning Graph  Relaxed Action:  No delete effects  May be okay given progression planning  No resource consumption  Will adjust later [Figure: persons and an airplane between cities A and B, with actions Load(P,A), Fly(A,B), Fly(B,A), Unload(P,A), Unload(P,B); the timeline runs from t=0 to the deadline goals at t_g]
while(true)
   forall A ≠ advance-time applicable in S
      S = Apply(A,S)  {involves changing P, Π, Q, t; update Q only with positive effects, and only when there is no other earlier event giving that effect}
   if S ⊨ G then Terminate{solution}
   S' = Apply(advance-time,S)
   if ∃(p_i,t_i) ∈ G such that t_i < Time(S') and p_i ∉ S then Terminate{non-solution}
   else S = S'
end while

25 Details on RTPG Construction  All our heuristics are based on the relaxed temporal planning graph structure (RTPG). This is a Graphplan-style [2] bi-level planning graph generalized to temporal domains. Given a state S = (P, M, Π, Q, t), the RTPG is built from S using the set of relaxed actions, which are generated from the original actions by eliminating all effects which (1) delete some fact (predicate) or (2) reduce the level of some resource. Since delete effects are ignored, the RTPG will not contain any mutex relations, which considerably reduces the cost of constructing it. The algorithm to build the RTPG structure is summarized in Figure 4.  To build the RTPG, we need three main data structures: a fact level, an action level, and an unexecuted event queue.  Each fact f or action A is marked in, and appears in the RTPG's fact/action level at time instant t_f / t_A, if it can be achieved/executed at t_f / t_A.  In the beginning, only facts which appear in P are marked in at t, the action level is empty, and the event queue holds all the unexecuted events in Q that add new predicates.  Action A will be marked in if (1) A is not already marked in and (2) all of A's preconditions are marked in. When action A is in, all of A's unmarked instant add effects are also marked in at t.  Any delayed effect e of A that adds fact f is put into the event queue Q if (1) f is not marked in and (2) there is no event e' in Q that is scheduled to happen before e and also adds f. Moreover, when an event e is added to Q, we take out of Q any event e' which is scheduled to occur after e and also adds f.  When there are no more unmarked applicable actions in S, we stop and return no-solution if either (1) Q is empty or (2) there exists some unmarked goal with a deadline that is smaller than the time of the earliest event in Q.  If none of the situations above occurs, we apply the advance-time action to S and activate all events at the time point t_e' of the earliest event e' in Q.  The process above is repeated until all the goals are marked in or one of the conditions indicating non-solution occurs. [From Do & Kambhampati; ECP 01]
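The RTPG expansion described above can be sketched in a few lines (a simplified model: relaxed actions are dicts with hypothetical fields 'pre', 'add_now', and '(delay, fact)' delayed adds; seeding the event queue from the state's Q, the dominated-event pruning, and the deadline/no-solution checks of the full algorithm are omitted):

```python
import heapq

def build_rtpg(init_facts, actions, t0=0.0):
    """Expand a relaxed temporal planning graph from a state.

    init_facts: {fact: time marked in}. Returns (fact_t, act_t): the earliest
    times at which facts and actions appear in the graph.
    """
    fact_t = dict(init_facts)         # fact level: fact -> earliest time marked in
    act_t = {}                        # action level: name -> earliest execution time
    events = []                       # pending delayed add effects: (time, fact)
    t = t0
    while True:
        progressed = True
        while progressed:             # mark in every newly applicable relaxed action at t
            progressed = False
            for a in actions:
                if a["name"] not in act_t and all(f in fact_t for f in a["pre"]):
                    act_t[a["name"]] = t
                    for f in a["add_now"]:
                        fact_t.setdefault(f, t)        # instant add effects appear at t
                    for delay, f in a["add_later"]:
                        if f not in fact_t:
                            heapq.heappush(events, (t + delay, f))
                    progressed = True
        if not events:
            return fact_t, act_t      # fixpoint: no events left to activate
        t, f = heapq.heappop(events)  # advance-time: activate the earliest event
        fact_t.setdefault(f, t)

# Toy example: fly to B (a 5-unit delayed effect), then unload.
acts = [
    {"name": "fly", "pre": {"at_A"}, "add_now": [], "add_later": [(5, "at_B")]},
    {"name": "unload", "pre": {"at_B"}, "add_now": ["done"], "add_later": []},
]
fact_t, act_t = build_rtpg({"at_A": 0}, acts)
```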

26 Heuristics directly from RTPG [Admissible]  For Makespan: the distance from a state S to the goals equals the duration between time(S) and the time the last goal appears in the RTPG.  For Min/Max/Sum Slack: the distance from a state to the goals equals the minimum, maximum, or summation of the slack estimates for all individual goals, using the RTPG.  The slack estimate is the difference between the deadline of the goal and the expected time of achievement of that goal.  Proof (admissibility): all goals appear in the RTPG at times smaller than or equal to their achievable times.
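The two admissible estimates can be computed directly from the RTPG's earliest appearance times (goal_times and deadlines are hypothetical dicts; the numbers below are made up for illustration):

```python
def makespan_heuristic(state_time, goal_times):
    """Admissible makespan estimate: time the last goal appears in the RTPG,
    minus time(S)."""
    return max(goal_times.values()) - state_time

def slack_heuristic(goal_times, deadlines, aggregate=min):
    """Slack estimate per goal = deadline - earliest RTPG achievement time;
    aggregate with min, max, or sum, depending on the objective."""
    return aggregate(deadlines[g] - t for g, t in goal_times.items())

# Made-up numbers: goals first appear at 5 and 9; their deadlines are 10 and 12.
gt = {"g1": 5, "g2": 9}
dl = {"g1": 10, "g2": 12}
```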

27 Heuristics from Relaxed Plan Extracted from RTPG  The RTPG can be used to find a relaxed solution, which is then used to estimate the distance from a given state to the goals.  Sum actions: the distance from a state S to the goals equals the number of actions in the relaxed plan.  Sum durations: the distance from a state S to the goals equals the summation of action durations in the relaxed plan. [Figure: same person/airplane example with deadline goals at t_g]
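Relaxed-plan extraction by backchaining through the RTPG can be sketched as follows (hypothetical action encoding with 'pre' and 'adds' sets; facts marked in at time 0 are treated as initially true, an assumption of this sketch; the sum-action value is then just the size of the returned plan):

```python
def extract_relaxed_plan(goals, fact_t, act_t, actions):
    """Backward relaxed-plan extraction: for each open (sub)goal, pick the
    earliest supporting action in the graph and add its preconditions as new
    subgoals; facts true at time 0 need no support."""
    plan, open_goals = set(), list(goals)
    while open_goals:
        g = open_goals.pop()
        if fact_t.get(g, float("inf")) == 0:
            continue                  # true in the initial state
        support = min((a for a in actions if g in a["adds"] and a["name"] in act_t),
                      key=lambda a: act_t[a["name"]])
        if support["name"] not in plan:
            plan.add(support["name"])
            open_goals.extend(support["pre"])
    return plan

# Toy RTPG from the fly/unload example (times as computed by graph expansion).
acts = [
    {"name": "fly", "pre": {"at_A"}, "adds": {"at_B"}},
    {"name": "unload", "pre": {"at_B"}, "adds": {"done"}},
]
fact_t = {"at_A": 0, "at_B": 5, "done": 5}
act_t = {"fly": 0, "unload": 5}
plan = extract_relaxed_plan(["done"], fact_t, act_t, acts)
```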

28 Resource-based Adjustments to Heuristics  Resource-related information, ignored originally, can be used to improve the heuristic values  Adjusted Sum-Action: h = h + Σ_R (Con(R) − (Init(R) + Pro(R))) / Δ_R  Adjusted Sum-Duration: h = h + Σ_R [(Con(R) − (Init(R) + Pro(R))) / Δ_R] · Dur(A_R)  Will not preserve admissibility
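The adjustment can be sketched as below (assuming Con(R) is the consumption of R by the relaxed plan, Init(R) its initial level, Pro(R) the production already in the plan, Δ_R the largest amount of R a single action can produce, and Dur(A_R) that action's duration; counting only positive shortfalls is an assumption of this sketch):

```python
def adjusted_sum_action(h, resources):
    """Sum-action value plus, per resource R, the minimum number of extra
    producing actions needed to cover the shortfall Con(R) - (Init(R) + Pro(R)),
    given that each such action produces at most delta_R units. Not admissible."""
    for r in resources:
        shortfall = r["con"] - (r["init"] + r["pro"])
        if shortfall > 0:            # assumption: resources with a surplus add nothing
            h += shortfall / r["delta"]
    return h

def adjusted_sum_duration(h, resources):
    """Same adjustment, weighted by the duration of the producing action A_R."""
    for r in resources:
        shortfall = r["con"] - (r["init"] + r["pro"])
        if shortfall > 0:
            h += (shortfall / r["delta"]) * r["dur"]
    return h

# Made-up profile: need 10 units, have 3 + 2; one action produces 5 in 4 time units.
r = {"con": 10.0, "init": 3.0, "pro": 2.0, "delta": 5.0, "dur": 4.0}
```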

29 Aims of Empirical Study  Evaluate the effectiveness of the different heuristics.  Ablation studies:  Test if the resource adjustment technique helps different heuristics.  Compare with other temporal planning systems.

30 Empirical Results

                Adjusted Sum-Action                 Sum-Duration
Prob    time    #act  nodes      dur       time    #act  nodes      dur
Zeno1   0.317   5     14/48      320       0.35    5     20/67      320
Zeno2   54.37   23    188/1303   950       -       -     -          -
Zeno3   29.73   13    250/1221   430       6.20    13    60/289     450
Zeno9   13.01   13    151/793    590       98.66   13    4331/5971  460
Log1    1.51    16    27/157     10.0      1.81    16    33/192     10.0
Log2    82.01   22    199/1592   18.87     38.43   22    61/505     18.87
Log3    10.25   12    30/215     11.75     -       -     -          -
Log9    116.09  32    91/830     26.25     -       -     -          -

 Sum-action finds solutions faster than sum-dur  Admissible heuristics do not scale up to bigger problems  Sum-dur finds shorter-duration solutions in most of the cases  Resource-based adjustment helps sum-action, but not sum-dur  Very few irrelevant actions  Better quality than Temporal TLPlan, so (transitively) better than LPSAT

31 Empirical Results (cont.)  Logistics domain with driving restricted to intra-city (the traditional logistics domain)  Sapa is the only planner that can solve all 80 problems

32 Empirical Results (cont.)  Logistics domain with inter-city driving actions  The “sum-action” heuristic used as the default in Sapa can be misled by the long-duration actions...  Future work: fixed-point time/level propagation

33 Next Class: Multi-objective search  Multi-dimensional nature of plan quality in metric temporal planning:  Temporal quality (e.g. makespan, slack)  Plan cost (e.g. cumulative action cost, resource consumption)  Necessitates multi-objective optimization:  Modeling objective functions  Tracking different quality metrics and heuristic estimation  Challenge: there may be inter-dependent relations between different quality metrics

