1 3/4: The slides on quotienting were added after the class to reflect the white-board discussion in class.

2 Thoughts on Candidate Set Semantics for Temporal Planning: Doing Temporal Planning Correctly [In search of a complete position-constrained planner]

3 Need
- Talking about "complete" and "completely optimal" makes little sense unless we first define the space over which we want completeness.
- Qn: What is the space over which the candidate set of a temporal plan is defined?
- For classical planning, we know it is the space of action sequences.
  - Interestingly, even partial-order planners are essentially aiming for completeness over these action sequences.

4 Dispatches as candidates
- We can define candidate sets in terms of "dispatches".
- A dispatch is a set of 3-tuples ⟨a, s_a, e_a⟩ where
  - a is a ground (durative) action
  - s_a is the start time of action a
  - e_a is the end time of action a
  - For fixed-duration actions, e_a is determined given s_a.
- Completeness, optimality, etc. should eventually be defined over these dispatches (a small representation sketch follows).
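As a concrete illustration, here is a minimal sketch (in Python, with illustrative names) of a dispatch as a set of ⟨action, start, end⟩ entries, assuming fixed-duration ground actions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GroundAction:
    name: str
    duration: float          # fixed duration, for simplicity

@dataclass(frozen=True)
class DispatchEntry:
    action: GroundAction
    start: float             # s_a
    end: float               # e_a

def is_well_formed(dispatch):
    """Basic structural check: every entry starts before it ends, and for
    fixed-duration actions the end time is determined by the start time."""
    return all(
        e.start <= e.end and abs(e.end - (e.start + e.action.duration)) < 1e-9
        for e in dispatch
    )

# A dispatch may contain the same action several times, at different times.
fly = GroundAction("fly(A,B)", duration=3.0)
dispatch = {DispatchEntry(fly, 0.0, 3.0), DispatchEntry(fly, 5.0, 8.0)}
print(is_well_formed(dispatch))   # True
```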

5 Quotient spaces
- The space of dispatches is "dense" when you have real-valued time points.
- It is more convenient to think of search in terms of quotient spaces defined over the space of dispatches.
  - In fact, it seems necessary that we search in quotient spaces for temporal planning (especially with real-valued time), since we want the complexity of planning to be somehow related to the number of actions in the plan, and not to their durations (?).
- A quotient space essentially involves setting up disjoint equivalence classes over the base space.
  - SNLP's partial plans actually set up a quotient space over the ground operator sequences (otherwise, the space of partially ordered plans would be much larger than the space of sequences).
- There are multiple ways of setting up quotient spaces over dispatches.
  - You can discuss completeness of any planner w.r.t. any legal quotient space, but some quotient spaces may be more natural for discussing some planners.

6 Start/End point permutations (SEPP)
- One quotient space over dispatches is the space of permutations over the start and end points of actions.
- Specifically, we consider the space of sequences over the alphabet {a_s, a_e} over all actions, where:
  - If the sequence contains a_s, it must contain a_e (and vice versa).
  - a_s must come before a_e in the sequence.
  - If the sequence contains the end points of two actions a1 and a2, then their order must not violate the durations of the actions. If d(a1) < d(a2), then we can't have ...a1_s...a2_s...a2_e...a1_e... in the sequence.
- Note that each element of the SEPP space is a representative for a possibly infinite number of dispatches.
- Completeness over the SEPP space is a necessary condition for completeness over the dispatch space. (A small validity-check sketch follows.)
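A minimal sketch (illustrative names, necessary conditions only) of checking the SEPP conditions above on a sequence of (action, start/end) tokens:

```python
def valid_sepp(seq, dur):
    """Check the SEPP conditions stated above on a sequence of
    (action, 's'|'e') tokens, given durations dur[action].
    These are necessary conditions only, not a full consistency check."""
    pos = {tok: i for i, tok in enumerate(seq)}
    acts = {a for a, _ in seq}
    for a in acts:
        # every start point has a matching end point, and comes before it
        if (a, 's') not in pos or (a, 'e') not in pos:
            return False
        if pos[(a, 's')] >= pos[(a, 'e')]:
            return False
    # a shorter action cannot strictly contain a longer one
    for a1 in acts:
        for a2 in acts:
            if a1 != a2 and dur[a1] < dur[a2]:
                if pos[(a1, 's')] < pos[(a2, 's')] < pos[(a2, 'e')] < pos[(a1, 'e')]:
                    return False
    return True

dur = {'a1': 2.0, 'a2': 5.0}
print(valid_sepp([('a1', 's'), ('a2', 's'), ('a2', 'e'), ('a1', 'e')], dur))  # False
print(valid_sepp([('a1', 's'), ('a1', 'e'), ('a2', 's'), ('a2', 'e')], dur))  # True
```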

7 POP space: The space of partially ordered causal-link plans that VHPOP/Zeno search in should be seen as quotienting further over the SEPP space, similar to the way SNLP plans can be seen as quotienting over action sequences.

8 SAPA-space? Another way of setting up a quotient space over dispatches is to consider specific dispatches themselves as the prototypes of equivalence classes of dispatches.

9 Prototype-based quotient spaces
- SAPA seems to be easiest to understand in terms of associating a specific dispatch as the representative of a set of dispatches.
- It then searches only over these representative dispatches, so it will be incomplete if the optimal solution of a problem is not in the space of these canonical dispatches.
- The basic result of [Cushing et al 2007] can be understood as saying that there is no easy way to set up a finite set of representative dispatches that will be complete for all problems.
  - This, I believe, is the lesson of the failed quest for complete decision-epoch (DEP) planners.
- Left-shifted plans as representatives?

10 Quotient Space & Navigation??
- Sapa can be understood as
  - trying to navigate in a quotient space of left-shifted dispatches,
  - but with an incomplete navigational strategy: navigation is effected through epochs.
- Our inability to find a good epoch-based navigation seems to suggest that there is no natural way to navigate this space?

11 Left-shifted plans: Two plans are equivalent if they have the same happening sequence; the left-shifted plan serves as the canonical representation.

12 Mid-term Feedback..
- 9 out of 12 gave feedback. I will post them all unedited.
- People are generally happy (perhaps embarrassingly happy) with the way the class is going.
  - One person said it is all too overwhelming and the pace and coverage should be reduced significantly.
- Readings: a mixture of reading before and after.
- Homeworks: the majority seem happy that they force them to re-read the paper. There seems to be little support for "more" homework.
  - One person said they should be more challenging and go beyond the readings.
- Semester project: the majority seem to be getting started, and want to spend time on "their" project rather than homeworks etc.
- Interactivity: people think there is enough discussion (I beg to disagree, but I am just an instructor).
  - One person thought that there should be more discussion, and suggested designing more incentives for discussion (sort of like the blog discussion requirement).

13 Temporal Constraints

14
- Qualitative
  - Interval constraints (and algebra)
  - Point constraints (and algebra)
- Metric constraints
  - Best seen as putting distance ranges over time points
- General temporal constraint reasoning is NP-hard; tractable subclasses exist.
- Hybrid: allow both qualitative and quantitative constraints.
- Most temporal constraint formalisms model only binary constraints.

15 Tradeoffs: Progression/Regression/PO Planning for metric/temporal planning
- Compared to PO, both progression and regression do a less than complete job of handling concurrency (e.g., slacks may have to be handled through post-processing).
- Progression planners have the advantage that the exact amount of a resource is known at any given state, so complex resource constraints are easier to verify. PO (and to some extent regression) will have to verify this by posting and then verifying resource constraints.
- Currently, SAPA (a progression planner) does better than TP4 (a regression planner). Both do oodles better than Zeno/IxTET. However:
  - TP4 could possibly be improved significantly by giving up the insistence on admissible heuristics.
  - Zeno (and IxTET) could benefit by adapting ideas from RePOP.

16 Interleaving-Space: TEMPO (Salvaging State-space Temporal Planning)
- Delay dispatch decisions until afterwards.
- Choose:
  - Start an action
  - End an action
  - Make a scheduling decision (solve temporal constraints)
- Temporally simple: complete, optimal.
- Temporally expressive: complete, optimal.
(figure: the light-match / fix-fuse example)

17 Qualitative Temporal Constraints (Allen 83)
- x before y / y after x
- x meets y / y met-by x
- x overlaps y / y overlapped-by x
- x during y / y contains x
- x starts y / y started-by x
- x finishes y / y finished-by x
- x equals y / y equals x
(figure: interval diagrams for each relation)

18 Intervals can be handled directly
- The 13 relations on the previous slide are primitive. The relation between a pair of intervals may well be a disjunction of these primitive ones: A meets B OR A starts B.
- There are "transitive" axioms for computing the relation between A and C, given the relations between A and B and between B and C:
  - A meets B & B starts C => A meets C
  - A starts B & B during C => ~[C before A]
- Using these axioms, we can do constraint propagation directly on interval relations, to check for the tightest relations between any given pair of intervals (as well as consistency of a set of relations): Allen's Interval Algebra.
- Intervals can also be handled in terms of their start and end points; this latter approach is what we will see next.

19 Example: Deep Space One Remote Agent Experiment (timeline figure: SEP_Segment, SEP Action, Accum, Attitude and Timer tokens, with Max_Thrust / Idle / Poke values)

20 Qualitative Temporal Constraints May Be Expressed as Inequalities (Vilain, Kautz 86)
(X- and X+ denote the start and end points of interval X)
- x before y: X+ < Y-
- x meets y: X+ = Y-
- x overlaps y: (X- < Y-) & (Y- < X+) & (X+ < Y+)
- x during y: (Y- < X-) & (X+ < Y+)
- x starts y: (X- = Y-) & (X+ < Y+)
- x finishes y: (Y- < X-) & (X+ = Y+)
- x equals y: (X- = Y-) & (X+ = Y+)
Such inequalities may in turn be expressed as binary interval constraints, e.g. X+ < Y- becomes X+ - Y- in [-inf, 0].

21 Metric Constraints
- Going to the store takes at least 10 minutes and at most 30 minutes:
  10 < [T+(store) - T-(store)] < 30
- Bread should be eaten within a day of baking:
  0 < [T-(eating) - T+(baking)] < 1 day
- Inequalities such as X+ < Y- may be expressed as binary interval constraints:
  -inf < [X+ - Y-] < 0

22 Metric Time: Quantitative Temporal Constraint Networks (Dechter, Meiri, Pearl 91)
- A set of time points X_i at which events occur.
- Unary constraints: (a_0 < X_i < b_0) or (a_1 < X_i < b_1) or ...
- Binary constraints: (a_0 < X_j - X_i < b_0) or (a_1 < X_j - X_i < b_1) or ...
- No n-ary constraints.
- An STN (simple temporal network) is a TCN that has no disjunctive constraints (each constraint has one interval).

23 TCSPs Are Visualized Using Directed Constraint Graphs (figure: nodes 0-4 with interval-labeled edges such as [10,20], [30,40], [60,inf], [20,30], [40,50], [60,70])

24 Digression: the less-than-fully-rational bias for binary CSP problems in the CSP community
- Much work in the CSP community (including temporal networks) is directed at "binary" CSPs, i.e., CSPs where all constraints are between exactly 2 variables.
  - E.g., arc-consistency, conflict-directed backjumping, etc. are clearly articulated for binary CSPs first. The temporal networks studied in Dechter et al. are all binary.
- Binary CSPs are a "canonical" subset of CSPs: any n-ary CSP can be compiled into a binary CSP by introducing additional (hidden) variables. The conversion is not always good; [Bacchus and van Beek, 98] provides a tradeoff analysis.
- The ostensible reason for the interest in binary CSPs is that most naturally occurring constraints are between 2 entities.
  - A less charitable characterization is that the constraint graphs of binary CSPs are normal graphs, so they can be analyzed better; the constraint graphs of n-ary CSPs are "hypergraphs" (edges are between sets of vertices).
- In the case of the temporal networks that arise in planning, even for the simple constraints caused by causal threats, the disjunctive constraint that is posted is 3-ary (between threat, producer and consumer), not binary.
  - If you split the disjunction into the search space, however, we get two simple temporal networks that are both binary.

25 TCSPs vs CSPs
- TCSP is a subclass of CSPs with some important properties:
  - The domains of the variables are totally ordered.
  - The domains of the variables are continuous.
- Most queries on TCSPs involve reasoning over all solutions of the TCSP (e.g., earliest/latest feasible time of a temporal variable).
- Since there are potentially an infinite number of solutions to a TCSP, we need a way of representing the set of all solutions compactly.
  - The minimal TCSP network is such a representation.

26 TCSP Queries (Dechter, Meiri, Pearl, AIJ 91)
- Is the TCSP consistent? [planning]
- What are the feasible times for each X_i?
- What are the feasible durations between each X_i and X_j?
- What is a consistent set of times? [scheduling / dispatch]
- What are the earliest possible times? [scheduling]
- What are the latest possible times?
All of these can be answered if we compute the minimal equivalent network.

27 Constraint Tightness & Minimal Networks
- A TCSP N1 is a minimal network if there is no other network N2 that has the same solutions as N1 and has at least one tighter constraint than N1.
- Tightness means fewer valid composite labels for the variables; it has nothing to do with the "syntactic complexity" of the constraint:
  - The constraint a [1 3] b is tighter than the constraint a [0 10] b.
  - The constraint a [1 1.5][1.6 1.9][1.9 2.3][2.3 4.8][5 6] b is also tighter than the constraint a [0 10] b.
- Computation of minimal networks, in general, involves two operations:
  - Composition over constraints: for each path p in the network connecting a pair of nodes a and b, find the path constraint between a and b.
  - Intersection over constraints: intersect all the (path) constraints between a pair of nodes a and b to find the tightest constraint between a and b.
- This can lead to "fragmentation of constraints" in the case of disjunctive TCSPs.

28 Union/Composition/Intersection of Temporal Constraints

29 Operations on Constraints: Intersection and Composition. Example: compose [10,20] with [30,40][60,inf] to get the constraint between nodes 0 and 3.

30 An example where the minimal network differs from the original one (figure: nodes 0, 1, 3 with edges 0->1 [10,20], 1->3 [30,40], and 0->3 [0,100]; in the minimal network the 0->3 edge becomes [40,60]). To compute the constraint between 0 and 3, we first compose [10,20] and [30,40] to get [40,60]; we then intersect [40,60] and [0,100] to get [40,60]. (A small sketch of these operations, together with path consistency, follows the next slide.)

31 Computing Minimal Networks Using Path Consistency
- Minimal networks for TCSPs can be computed by enforcing "path consistency":
  for each triple of vertices i, j, k:  C(i,k) := C(i,k) ∩ [C(i,j) ∘ C(j,k)]
- For STPs we are guaranteed to reach the fixpoint by the time we visit each constraint once (i.e., the outer loop executes only once).
- For disjunctive TCSPs, enforcing path consistency is NP-hard.
  - This shouldn't be surprising: consistency of disjunctive precedence constraints is NP-hard.
  - "Fragmentation" happens; approximation schemes are possible.
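A minimal sketch of the composition and intersection operations on (possibly disjunctive) interval constraints, and of the path-consistency update above. The representation is illustrative: a constraint on X_j - X_i is a list of [lo, hi] intervals, and the demo reuses the slide-30 example (with node 3 kept as index 3):

```python
from itertools import product

INF = float("inf")

def merge(ivs):
    """Normalize a disjunction of [lo, hi] intervals (sort, fuse overlaps)."""
    out = []
    for lo, hi in sorted(ivs):
        if out and lo <= out[-1][1]:
            out[-1][1] = max(out[-1][1], hi)
        else:
            out.append([lo, hi])
    return out

def compose(c1, c2):
    """Values of X_k - X_i allowed by constraints on X_j - X_i and X_k - X_j:
    pairwise sums of interval endpoints."""
    return merge([[a1 + a2, b1 + b2] for (a1, b1), (a2, b2) in product(c1, c2)])

def intersect(c1, c2):
    """Values allowed by both constraints."""
    out = [[max(a1, a2), min(b1, b2)] for (a1, b1), (a2, b2) in product(c1, c2)]
    return merge([iv for iv in out if iv[0] <= iv[1]])

def path_consistency(C, n):
    """C(i,k) := C(i,k) ∩ (C(i,j) ∘ C(j,k)) until fixpoint.  C maps node
    pairs (i,k) to interval lists; missing pairs mean 'unconstrained'.
    (A fuller implementation would also maintain the reverse constraints.)"""
    univ = [[-INF, INF]]
    changed = True
    while changed:                    # for STPs a single pass suffices
        changed = False
        for i, j, k in product(range(n), repeat=3):
            if len({i, j, k}) < 3:
                continue
            new = intersect(C.get((i, k), univ),
                            compose(C.get((i, j), univ), C.get((j, k), univ)))
            if new != C.get((i, k), univ):
                C[(i, k)], changed = new, True
    return C

# Slide 30's example: 0->1 in [10,20], 1->3 in [30,40], 0->3 in [0,100]
C = {(0, 1): [[10, 20]], (1, 3): [[30, 40]], (0, 3): [[0, 100]]}
path_consistency(C, 4)
print(C[(0, 3)])   # [[40, 60]]
```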

32 Solving Disjunctive TCSPs: Split the disjunction
- Suppose we have a TCSP where just one of the constraints is disjunctive: a [1 2][5 6] b.
- We then have two STPs: one containing the constraint a [1 2] b and the other containing a [5 6] b.
- Disjunctive TCSPs can be solved by solving the exponential number of STPs; the minimal network for the DTP is the union of the minimal networks of the STPs.
- This is a brute-force method: an exponential number of STPs, many of which have significantly overlapping constraints.
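A tiny sketch (illustrative data layout) of splitting a disjunctive TCSP into its component STPs by choosing one disjunct per edge:

```python
from itertools import product

def split_into_stps(constraints):
    """constraints: dict mapping an edge (i, j) to a list of disjunct
    intervals.  Yields one STP (a single interval per edge) for every
    combination of choices -- exponential in the number of disjunctive edges."""
    edges = list(constraints)
    for choice in product(*(constraints[e] for e in edges)):
        yield dict(zip(edges, choice))

# one disjunctive edge a->b ([1,2] or [5,6]) and one simple edge b->c
dtp = {('a', 'b'): [(1, 2), (5, 6)], ('b', 'c'): [(0, 3)]}
for stp in split_into_stps(dtp):
    print(stp)
# {('a', 'b'): (1, 2), ('b', 'c'): (0, 3)}
# {('a', 'b'): (5, 6), ('b', 'c'): (0, 3)}
```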

33 To Query an STN, Map It to a Distance Graph G_d
(figure: the example STN and its distance graph, with edge weights 20, 50, -10, 40, -30, 20, -10, -40, -60, 70)
- T_ij = (a_ij ≤ X_j - X_i ≤ b_ij) becomes two edges:
  - X_j - X_i ≤ b_ij (edge i -> j with weight b_ij)
  - X_i - X_j ≤ -a_ij (edge j -> i with weight -a_ij)
- Each edge encodes an upper bound on the distance to the target from the source.

34 G_d Induces Constraints
- Path constraint along i_0 = i, i_1, ..., i_k = j:  X_j - X_i ≤ Σ_{l=1..k} w(i_{l-1}, i_l)
- Conjoined path constraints result in the shortest path as the bound:  X_j - X_i ≤ d_ij, where d_ij is the shortest path from i to j.

35 Conjoined Paths are Computed Using All-Pairs Shortest Paths (e.g., Floyd-Warshall's algorithm)
1. for i := 1 to n do d_ii := 0;
2. for i, j := 1 to n do d_ij := a_ij;
3. for k := 1 to n do
4.   for i, j := 1 to n do
5.     d_ij := min{d_ij, d_ik + d_kj};
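A runnable sketch of the whole pipeline (STN to distance graph, Floyd-Warshall, consistency test, earliest/latest times). The constraint set below is what appears to be the running example from these slides; names and data layout are illustrative. It also reproduces the feasible windows shown later on the "Scheduling: Feasible Values" slide.

```python
INF = float("inf")

def stn_to_distance_graph(n, constraints):
    """Map an STN with constraints a_ij <= X_j - X_i <= b_ij to a distance
    graph: edge i->j with weight b_ij and edge j->i with weight -a_ij."""
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for (i, j), (a, b) in constraints.items():
        d[i][j] = min(d[i][j], b)     # X_j - X_i <= b
        d[j][i] = min(d[j][i], -a)    # X_i - X_j <= -a
    return d

def floyd_warshall(d):
    """All-pairs shortest paths (in place)."""
    n = len(d)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def consistent(d):
    """Consistent iff there is no negative cycle, i.e. no negative diagonal entry."""
    return all(d[i][i] >= 0 for i in range(len(d)))

# the slides' example (node 0 is the reference):
# X1-X0 in [10,20], X2-X1 in [30,40], X2-X3 in [10,20], X4-X3 in [40,50], X4-X0 in [60,70]
cons = {(0, 1): (10, 20), (1, 2): (30, 40), (3, 2): (10, 20),
        (3, 4): (40, 50), (0, 4): (60, 70)}
d = floyd_warshall(stn_to_distance_graph(5, cons))
print(consistent(d))                            # True
for i in range(1, 5):
    print(f"X{i} in [{-d[i][0]}, {d[0][i]}]")   # earliest = -d_i0, latest = d_0i
# X1 in [10, 20], X2 in [40, 50], X3 in [20, 30], X4 in [60, 70]
```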

36 d-graph: Shortest Paths of G_d (figure: the all-pairs shortest-path distances over nodes 0-4)

37 STN Minimum Network (figure: the d-graph and the corresponding STN minimum network)

38 Testing Plan Consistency (figure: d-graph). The STN is consistent iff the d-graph has no negative cycles: a negative cycle of weight -5, say, would require T_A - T_A ≤ -5 < 0, contradicting T_A - T_A = 0.

39 Latest Solution (figure: d-graph). Node 0 is the reference; the latest feasible time for each X_i is d_0i.

40 Earliest Solution (figure: d-graph). Node 0 is the reference; the earliest feasible time for each X_i is -d_i0.

41 Solution: Earliest Times (figure: d-graph). S_1 = (-d_10, ..., -d_n0)

42 Scheduling: Feasible Values (figure: d-graph, with the earliest and latest times) X_1 in [10, 20], X_2 in [40, 50], X_3 in [20, 30], X_4 in [60, 70].

43 Scheduling without Search: Solution by Decomposition (figure: d-graph) Select value for X1 -> 15, from [10,20].

44 Scheduling without Search: Solution by Decomposition (figure: d-graph) Select value for X1 -> 15, from [10,20].

45 Solution by Decomposition (figure: d-graph) X1 -> 15; select value for X2, consistent with X1 -> 45, from [40,50] ∩ 15+[30,40].

46 Solution by Decomposition (figure: d-graph) X1 -> 15; select value for X2, consistent with X1 -> 45, from [45,50].

47 Solution by Decomposition (figure: d-graph) X1 -> 15; select value for X2, consistent with X1 -> 45, from [45,50].

48 Solution by Decomposition (figure: d-graph) X1 -> 15; X2 -> 45; select value for X3, consistent with X1 & X2 -> 30, from [20,30] ∩ 15+[10,20] ∩ 45+[-20,-10].

49 Solution by Decomposition (figure: d-graph) X1 -> 15; X2 -> 45; select value for X3, consistent with X1 & X2 -> 30, from [25,30].

50 Solution by Decomposition (figure: d-graph) X1 -> 15; X2 -> 45; select value for X3, consistent with X1 & X2 -> 30, from [25,30].

51 Solution by Decomposition (figure: d-graph) X1 -> 15; X2 -> 45; X3 -> 30; select value for X4, consistent with X1, X2 & X3. The whole dispatch is O(N^2). (A sketch of this decomposition-based dispatch follows.)
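A sketch (continuing the earlier example; names are illustrative) of dispatching the STN by decomposition once the all-pairs shortest-path matrix is available: each variable is restricted to the window consistent with the values already chosen, and any value in that window is guaranteed to extend to a full solution.

```python
INF = float("inf")

# the example STN again (node 0 is the reference)
cons = {(0, 1): (10, 20), (1, 2): (30, 40), (3, 2): (10, 20),
        (3, 4): (40, 50), (0, 4): (60, 70)}
n = 5
d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
for (i, j), (a, b) in cons.items():
    d[i][j], d[j][i] = min(d[i][j], b), min(d[j][i], -a)
for k in range(n):                       # Floyd-Warshall, as in the earlier sketch
    for i in range(n):
        for j in range(n):
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])

def decompose_schedule(d, pick=lambda lo, hi: lo):
    """Dispatch without search: assign X_1..X_n one at a time, restricting
    each X_i to the window consistent with the already chosen values.
    With the minimal network in hand, any value in that window extends to a
    full solution, so the whole dispatch costs O(N^2)."""
    x = {0: 0.0}
    for i in range(1, len(d)):
        lo = max(x[j] - d[i][j] for j in x)    # X_j - X_i <= d[i][j]
        hi = min(x[j] + d[j][i] for j in x)    # X_i - X_j <= d[j][i]
        assert lo <= hi, "inconsistent STN"
        x[i] = pick(lo, hi)
    return x

# mimic the slides: pick 15 for X1, then take the earliest consistent values
print(decompose_schedule(d, pick=lambda lo, hi: 15 if lo <= 15 <= hi else lo))
# e.g. X1 = 15, X2 = 45, X3 = 25, X4 = 65 (each inside its remaining window)
```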

52 More on temporal planning by plan-space planners (Zeno)
- The "accommodation" to complexity that Zeno makes by refusing to handle nonlinear constraints (waiting instead until they become linear) is sort of hilarious given that it doesn't care much about heuristic control otherwise.
  - Basically, Zeno is trying to keep the "per-node" cost of the search down (and if you do a nonlinear constraint consistency check, even that is quite hard).
  - Of course, we know now that there is no obvious reason to believe that reducing the per-node cost will, ipso facto, also reduce the overall search.
- The idea of "goal reduction" by splitting a temporal subgoal into multiple sub-intervals is used only in Zeno, and helps it support a temporal goal over a long duration with multiple actions. Neat idea.
- Zeno doesn't have much of a problem handling arbitrary concurrency, since we are only posting constraints on the temporal variables denoting the start points of the various actions. In particular, Zeno does not force either right or left alignment of actions.
- In addition to Zeno, IxTeT is another influential metric temporal planner that uses the plan-space planning idea.

53 10/30 (Don’t print hidden slides)

54 Multi-objective search
- Multi-dimensional nature of plan quality in metric temporal planning:
  - Temporal quality (e.g., makespan, slack: the time when a goal is needed minus the time when it is achieved)
  - Plan cost (e.g., cumulative action cost, resource consumption)
- Necessitates multi-objective optimization:
  - Modeling objective functions
  - Tracking different quality metrics and heuristic estimation
- Challenge: there may be interdependent relations between the different quality metrics.

55 Example
- Option 1: Tempe -> Phoenix (bus) -> Los Angeles (airplane). Less time: 3 hours; more expensive: $200.
- Option 2: Tempe -> Los Angeles (car). More time: 12 hours; less expensive: $50.
- Given a deadline constraint (6 hours), only option 1 is viable.
- Given a money constraint ($100), only option 2 is viable.
(figure: map of Tempe, Phoenix, Los Angeles)

56 Solution Quality in the presence of multiple objectives
- When we have multiple objectives, it is not clear how to define the global optimum: e.g., how does a plan that is cheaper but has a longer makespan compare to one that is costlier but faster?
- Problem: we don't know what the user's utility metric is as a function of cost and makespan.

57 Solution 1: Pareto Sets
- Present Pareto sets/curves to the user.
- A Pareto set is a set of non-dominated solutions.
  - A solution S1 is dominated by another S2 if S2 is no worse than S1 in every objective and strictly better in at least one.
  - A travel agent shouldn't bother asking whether I would like a flight that leaves at 6pm, arrives at 9pm, and costs $100, or another one that also leaves at 6 and arrives at 9 but costs $200.
- A Pareto set is exhaustive if it contains all non-dominated solutions.
- Presenting the Pareto set allows users to state their preferences implicitly, by choosing what they like, rather than by stating the preferences explicitly.
- Problem: exhaustive Pareto sets can be large (exponentially large in many cases).
  - In practice, travel agents give you non-exhaustive Pareto sets, just so you have the illusion of choice.
- Optimizing with Pareto sets changes the nature of the problem: you are looking for multiple solutions rather than a single one. (A small non-domination filter is sketched below.)
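A minimal sketch of the dominance test and the non-dominated filter. Objective vectors are (makespan, cost) pairs to be minimized; the numbers echo the earlier travel example plus one dominated option.

```python
def dominates(s1, s2):
    """s1 dominates s2 if s1 is no worse in every objective and strictly
    better in at least one (all objectives minimized here)."""
    return (all(a <= b for a, b in zip(s1, s2)) and
            any(a < b for a, b in zip(s1, s2)))

def pareto_set(solutions):
    """Return the non-dominated subset of a list of objective vectors."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t != s)]

# (makespan hours, cost $): the two travel options plus a dominated one
plans = [(3, 200), (12, 50), (3, 250)]
print(pareto_set(plans))   # [(3, 200), (12, 50)]
```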

58 Solution 2: Aggregate Utility Metrics
- Combine the various objectives into a single utility measure:
  - E.g., w1*cost + w2*makespan could model grad students' preferences, with w1 = infinity, w2 = 0.
  - Log(cost) + 5*(makespan)^25 could model Bill Gates' preferences.
- How do we assess the form of the utility measure (linear? nonlinear?), and how will we get the weights?
  - Utility elicitation process.
  - Learning problem: ask tons of questions of the users and learn a utility function that fits their preferences. This can be cast as a sort of learning task (e.g., learn a neural net that is consistent with the examples).
  - Of course, if you want to learn a true nonlinear preference function, you will need many more examples, and the training takes much longer.
- With aggregate utility metrics, the multi-objective optimization, in theory, reduces to a single-objective optimization problem.
- *However*, if you are trying to get good heuristics to direct the search, then since estimators are likely to be available for naturally occurring factors of the solution quality, rather than random combinations thereof, we still have to follow a two-step process:
  1. Find estimators for each of the factors.
  2. Combine the estimates using the utility measure.
  THIS IS WHAT IS DONE IN SAPA

59 Sketch of how to get cost and time estimates
- The planning graph provides "level" estimates.
- Generalizing the planning graph to a "temporal planning graph" will allow us to get "time" estimates.
  - For the relaxed PG, the generalization is quite simple: just use the bi-level representation of the PG, and index each action and literal by the first time point (not level) at which it can be introduced into the PG.
- Generalizing the planning graph to a "cost planning graph" (i.e., propagating cost information over the PG) will get us cost estimates.
  - We discussed how to do cost propagation over classical PGs. Costs of literals can be represented as monotonically reducing step functions w.r.t. levels.
- To estimate cost and time together we need to generalize the classical PG into a temporal and cost-sensitive PG.
  - Now the costs of literals will be monotonically reducing step functions w.r.t. time points (rather than level indices).
  - This is what SAPA does.

60 SAPA approach
- Use the temporal planning graph (Smith & Weld) structure to track the time-sensitive cost function:
  - Estimation of the earliest time (makespan) to achieve all goals.
  - Estimation of the lowest cost to achieve the goals.
  - Estimation of the cost to achieve the goals given a specific makespan value.
- Use this information to calculate the heuristic value for an objective function involving both time and cost.
- Involves propagating cost over planning graphs.

61 Heuristic Control
- Temporal planners have to deal with more branching possibilities, so it is more critical to have good heuristic guidance.
- The design of heuristics depends on the objective function:
  - Classical planning: number of actions, parallel execution time, solving time.
  - Temporal/resource planning: number of actions, makespan, resource consumption, slack, ...
- In temporal planning, heuristics focus on richer objective functions that guide both planning and scheduling.

62 Objectives in Temporal Planning
- Number of actions: total number of actions in the plan.
- Makespan: the shortest duration in which we can possibly execute all actions in the solution.
- Resource consumption: total amount of resources consumed by actions in the solution.
- Slack: the duration between the time a goal is achieved and its deadline.
  - Optimize max, min or average slack values.
- Combinations thereof.

63 Deriving heuristics for SAPA
We use a phased relaxation approach to derive different heuristics:
- Relax the negative logical effects and the resource effects to build the Relaxed Temporal Planning Graph (RTPG), pruning bad states while preserving completeness.
- Derive admissible heuristics from the RTPG:
  - to minimize the solution's makespan;
  - to maximize slack-based objective functions.
- Find a relaxed solution, which is used as a distance heuristic.
- Adjust the heuristic values using negative interactions (future work).
- Adjust the heuristic values using resource consumption information. [AltAlt, AIJ 2001]

64 Heuristics in Sapa are derived from the Graphplan-style bi-level relaxed temporal planning graph (RTPG). Sapa is a progression planner, so the RTPG is constructed anew for each state.

65 Relaxed Temporal Planning Graph
- Relaxed action: no delete effects (may be okay given progression planning); no resource consumption (will adjust later).
- RTPG expansion (pseudocode):
  while (true)
    forall A != advance-time applicable in S
      S = Apply(A, S)   {involves changing P, ..., Q, t; update Q only with positive effects, and only when there is no other earlier event giving that effect}
    if S satisfies G then Terminate {solution}
    S' = Apply(advance-time, S)
    if there is (p_i, t_i) in G such that t_i < Time(S') and p_i not in S then Terminate {non-solution}
    else S = S'
  end while
- Deadline goals: the RTPG is modeled as a time-stamped plan! (but Q only has +ve events)
- Note: bi-level representation; we don't actually stack actions multiple times in the PG, we just keep track of the first time the action entered.
(figure: RTPG for a one-package logistics example with Load(P,A), Fly(A,B), Fly(B,A), Unload(P,A), Unload(P,B); init at t = 0, goal deadline t_g)

66 Details on RTPG Construction
- All our heuristics are based on the relaxed temporal planning graph structure (RTPG). This is a Graphplan-style [2] bi-level planning graph generalized to temporal domains. Given a state S = (P, M, Π, Q, t), the RTPG is built from S using the set of relaxed actions, which are generated from the original actions by eliminating all effects which (1) delete some fact (predicate) or (2) reduce the level of some resource. Since delete effects are ignored, the RTPG contains no mutex relations, which considerably reduces the cost of constructing it. The algorithm to build the RTPG structure is summarized in Figure 4.
- To build the RTPG we need three main data structures: a fact level, an action level, and an unexecuted event queue.
- Each fact f or action A is marked in, and appears in the RTPG's fact/action level at time instant t_f / t_A, if it can be achieved/executed at t_f / t_A.
- In the beginning, only the facts which appear in P are marked in at t, the action level is empty, and the event queue holds all the unexecuted events in Q that add new predicates.
- Action A will be marked in if (1) A is not already marked in and (2) all of A's preconditions are marked in. When action A is in, all of A's unmarked instant add effects are also marked in at t.
- Any delayed effect e of A that adds fact f is put into the event queue Q if (1) f is not marked in and (2) there is no event e' in Q that is scheduled to happen before e and that also adds f. Moreover, when an event e is added to Q, we take out of Q any event e' which is scheduled to occur after e and also adds f.
- When there are no more unmarked applicable actions in S, we stop and return no-solution if either (1) Q is empty or (2) there exists some unmarked goal with a deadline that is smaller than the time of the earliest event in Q.
- If neither of the situations above occurs, we apply the advance-time action to S and activate all events at the time point t_e' of the earliest event e' in Q.
- The process above is repeated until all the goals are marked in or one of the conditions indicating non-solution occurs. [From Do & Kambhampati; ECP 01]
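A simplified, runnable sketch of this expansion (instant add effects, resource levels, and goal deadlines are omitted; the action format and the one-package logistics example are illustrative):

```python
import heapq

def build_rtpg(init_facts, actions, goals, t0=0.0):
    """Relaxed temporal planning graph expansion: delete effects and
    resources ignored, bi-level representation (each fact/action is indexed
    by the first time it can appear).  actions: (name, preconds, duration,
    add_effects), with all add effects treated as delayed to the action's end.
    Returns (fact_time, action_time), or None if some goal is unreachable."""
    fact_time = {f: t0 for f in init_facts}
    action_time = {}
    events = []                              # unexecuted events: (time, fact)
    t = t0
    while True:
        progress = True
        while progress:                      # apply all newly applicable actions at t
            progress = False
            for name, pre, dur, adds in actions:
                if name in action_time:
                    continue
                if all(p in fact_time and fact_time[p] <= t for p in pre):
                    action_time[name] = t
                    for f in adds:           # delayed add effects -> event queue
                        if f not in fact_time:
                            heapq.heappush(events, (t + dur, f))
                    progress = True
        if all(g in fact_time for g in goals):
            return fact_time, action_time
        if not events:
            return None                      # no more events: goals unreachable
        t, f = heapq.heappop(events)         # advance time to the earliest event
        fact_time.setdefault(f, t)

# one-package logistics example in the spirit of the slides
acts = [("Load(P,A)",   {"At(P,A)", "At(Plane,A)"},     1.0, {"In(P,Plane)"}),
        ("Fly(A,B)",    {"At(Plane,A)"},                3.0, {"At(Plane,B)"}),
        ("Unload(P,B)", {"In(P,Plane)", "At(Plane,B)"}, 1.0, {"At(P,B)"})]
fact_time, _ = build_rtpg({"At(P,A)", "At(Plane,A)"}, acts, {"At(P,B)"})
print(fact_time["At(P,B)"])    # 4.0 -- earliest (relaxed) time the goal can appear
```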

67 Heuristics directly from the RTPG
- For makespan: the distance from a state S to the goals is the duration between time(S) and the time the last goal appears in the RTPG.
- For min/max/sum slack: the distance from a state to the goals is the minimum, maximum, or summation of the slack estimates for all individual goals using the RTPG.
  - The slack estimate is the difference between the deadline of the goal and the expected time of achievement of that goal.
- These heuristics are ADMISSIBLE. Proof sketch: all goals appear in the RTPG at times smaller than or equal to their real achievable times.

68 Heuristics from a Relaxed Plan Extracted from the RTPG
- The RTPG can be used to find a relaxed solution, which is then used to estimate the distance from a given state to the goals.
- Sum actions: the distance from a state S to the goals equals the number of actions in the relaxed plan.
- Sum durations: the distance from a state S to the goals equals the summation of the action durations in the relaxed plan.
(figure: the same one-package logistics RTPG example)

69 Resource-based Adjustments to Heuristics
- Resource-related information, ignored originally, can be used to improve the heuristic values.
- Adjusted sum-action: h = h + Σ_R ⌈(Con(R) - (Init(R) + Pro(R))) / Δ_R⌉
- Adjusted sum-duration: h = h + Σ_R ⌈(Con(R) - (Init(R) + Pro(R))) / Δ_R⌉ · Dur(A_R)
- These adjustments will not preserve admissibility.

70 Aims of Empirical Study
- Evaluate the effectiveness of the different heuristics.
- Ablation studies: test whether the resource adjustment technique helps the different heuristics.
- Compare with other temporal planning systems.

71 Empirical Results
Prob  | Adjusted Sum-Action: time, #act, nodes, dur | Sum-Duration: time, #act, nodes, dur
Zeno1 | 0.317, 5, 14/48, 320      | 0.35, 5, 20/67, 320
Zeno2 | 54.37, 23, 188/1303, 950  | -
Zeno3 | 29.73, 13, 250/1221, 430  | 6.20, 13, 60/289, 450
Zeno9 | 13.01, 13, 151/793, 590   | 98.66, 13, 4331/5971, 460
Log1  | 1.51, 16, 27/157, 10.0    | 1.81, 16, 33/192, 10.0
Log2  | 82.01, 22, 199/1592, 18.87 | 38.43, 22, 61/505, 18.87
Log3  | 10.25, 12, 30/215, 11.75  | -
Log9  | 116.09, 32, 91/830, 26.25 | -
- Sum-action finds solutions faster than sum-dur.
- Admissible heuristics do not scale up to bigger problems.
- Sum-dur finds shorter-duration solutions in most of the cases.
- Resource-based adjustment helps sum-action, but not sum-dur.
- Very few irrelevant actions; better quality than TemporalTLPlan, so (transitively) better than LPSAT.

72 Empirical Results (cont.): Logistics domain with driving restricted to intra-city (the traditional logistics domain). Sapa is the only planner that can solve all 80 problems.

73 Empirical Results (cont.)
- Logistics domain with inter-city driving actions.
- The "sum-action" heuristic used as the default in Sapa can be misled by the long-duration actions.
- Future work: fixed-point time/level propagation.

74 The (Relaxed) Temporal PG (figure: temporal planning graph for the Tempe-Phoenix-Los Angeles example, with actions Shuttle(T,P), Heli(T,P), Airplane(P,LA), Drive-car(Tempe,LA) indexed by the time points t = 0, 0.5, 1, 1.5, 10)

75 Time-sensitive Cost Function
- The standard (temporal) planning graph (TPG) shows time-related estimates, e.g., the earliest time to achieve a fact or to execute an action.
- The TPG does not show cost estimates for achieving facts or executing actions.
- Example: Shuttle(Tempe,Phx): cost $20, time 1.0 hour; Helicopter(Tempe,Phx): cost $100, time 0.5 hour; Car(Tempe,LA): cost $100, time 10 hours; Airplane(Phx,LA): cost $200, time 1.0 hour.
(figure: the temporal PG for this example, and the cost of reaching L.A. as a step function of time: $300 at t = 1.5, $220 at t = 2, $100 at t = 10)

76 Estimating the Cost Function
- Same example: Shuttle(Tempe,Phx): cost $20, time 1.0 hour; Helicopter(Tempe,Phx): cost $100, time 0.5 hour; Car(Tempe,LA): cost $100, time 10 hours; Airplane(Phx,LA): cost $200, time 1.0 hour.
- Cost(At(LA)) = Cost(At(Phx)) + Cost(Flight(Phx,LA)) for the flight option; propagated over the graph this yields the step function $300 at t = 1.5, $220 at t = 2, $100 at t = 10.
(figure: the temporal PG annotated with the evolving cost function)

77 Observations about cost functions
- Because cost functions decrease monotonically, we know that the cheapest cost is always at t_infinity (we don't need to look at other times).
- Cost functions will be monotonically decreasing as long as there are no exogenous events.
  - Actions with time-sensitive preconditions are in essence dependent on exogenous events (which is why PDDL 2.1 doesn't allow you to say that a precondition must be true at an absolute time point, only at a time point relative to the beginning of the action).
  - If you have to model an action such as "Take Flight" so that it can only be done with valid flights that are pre-scheduled (e.g., 9:40AM, 11:30AM, 3:15PM), we can model it with a precondition "Have-flight" that is asserted at 9:40AM, 11:30AM and 3:15PM using timed initial literals.
- Because cost functions are step functions, we need to evaluate the utility function U(makespan, cost) only at a finite number of time points (no matter how complex the U(.) function is).
  - Cost functions will be step functions as long as the actions do not model continuous change (which comes in at PDDL 2.1 Level 4). If you have continuous change, then the cost functions may change continuously too.
[ADDED]

78 Cost Propagation
- Issues:
  - At a given time point, each fact may be supported by multiple actions.
  - Each action has more than one precondition.
- Propagation rules:
  - Cost(f,t) = min {Cost(A,t) : f ∈ Effect(A)}
  - Cost(A,t) = Aggregate(Cost(f,t) : f ∈ Pre(A)), where the aggregation can be:
    - Sum-propagation: Σ Cost(f,t) (but the plans for the individual preconditions may be interacting)
    - Max-propagation: Max {Cost(f,t)}
    - Combination: 0.5 Σ Cost(f,t) + 0.5 Max {Cost(f,t)}
- Probably other, better ideas could be tried.
- We can't use something like the set-level idea here, because that would entail tracking the costs of subsets of literals.
(A small propagation sketch follows.)
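A runnable sketch of this propagation, using max-aggregation over preconditions and fixpoint termination. The action set and numbers mirror the Tempe-Phoenix-L.A. example; all other details are illustrative.

```python
INF = float("inf")

def cost_at(steps, t):
    """steps: list of (time, cost) pairs; cheapest cost achievable by time t."""
    return min([c for (tm, c) in steps if tm <= t], default=INF)

def propagate_costs(init_facts, actions, goals):
    """Cost propagation over a relaxed temporal planning graph: each fact f
    gets a monotonically decreasing step function, a list of (time, cost)
    pairs.  Precondition costs are aggregated with max (one of the options
    on the slide); actions are (preconds, duration, cost, add_effects).
    Stops at the fixpoint where no cost function improves."""
    cost = {f: [(0.0, 0.0)] for f in init_facts}
    changed = True
    while changed:
        changed = False
        times = sorted({t for steps in cost.values() for (t, _) in steps})
        for pre, dur, acost, adds in actions:
            for t in times:
                c_pre = max((cost_at(cost.get(p, []), t) for p in pre), default=0.0)
                if c_pre == INF:
                    continue
                c_new, t_new = c_pre + acost, t + dur
                for f in adds:
                    if c_new < cost_at(cost.get(f, []), t_new):
                        cost.setdefault(f, []).append((t_new, c_new))
                        changed = True
    return {g: sorted(cost.get(g, [])) for g in goals}

# the Tempe -> Phoenix -> L.A. example
acts = [({"Tempe"},    1.0,  20.0, {"Phoenix"}),   # shuttle
        ({"Tempe"},    0.5, 100.0, {"Phoenix"}),   # helicopter
        ({"Tempe"},   10.0, 100.0, {"LA"}),        # car
        ({"Phoenix"},  1.0, 200.0, {"LA"})]        # airplane
print(propagate_costs({"Tempe"}, acts, {"LA"}))
# {'LA': [(1.5, 300.0), (2.0, 220.0), (10.0, 100.0)]}
```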

79 Termination Criteria
- Deadline termination: terminate at time point t if
  - for every goal G: Deadline(G) ≤ t, or
  - there is a goal G with (Deadline(G) < t) and (Cost(G,t) = ∞).
- Fixpoint termination: terminate at the time point t where we can no longer improve the cost of any proposition.
- K-lookahead approximation: at the t where Cost(g,t) < ∞, repeat the process of applying the (sets of) actions that can improve the cost functions k times.
(figure: the At(LA) cost function, marking the earliest time point t = 1.5 and the cheapest cost $100 at t = 10)

80 Heuristic estimation using the cost functions
- If the objective function is to minimize time: h = t_0 (the earliest achievement time).
- If the objective function is to minimize cost: h = CostAggregate(G, t_∞).
- If the objective function is a function of both time and cost, O = f(time, cost), then: h = min f(t, Cost(G,t)) s.t. t_0 ≤ t ≤ t_∞.
  - E.g., with f(time, cost) = 100·makespan + Cost, h = 100×2 + 220, attained at t = 2 (with t_0 ≤ t ≤ t_∞).
- For Cost(At(LA)): earliest achievement time t_0 = 1.5; lowest-cost time t_∞ = 10.
- The cost functions carry the information needed to track both the temporal and the cost metrics of the plan, and their interdependent relations!
(figure: the Cost(At(LA)) step function)
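Because the cost function is a step function, the minimization only has to look at its breakpoints. A tiny sketch, reusing the step function computed in the previous example:

```python
def h_value(steps, f):
    """h = min_t f(t, Cost(G, t)): since the cost function is a step
    function, it is enough to evaluate f at its (finitely many) breakpoints."""
    return min(f(t, c) for (t, c) in steps)

cost_at_LA = [(1.5, 300.0), (2.0, 220.0), (10.0, 100.0)]
print(h_value(cost_at_LA, lambda t, c: 100 * t + c))   # 420.0, attained at t = 2
```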

81 Heuristic estimation by extracting the relaxed plan
- The relaxed plan satisfies all the goals ignoring negative interactions:
  - it takes positive interactions into account;
  - it gives a base set of actions for possible adjustment according to the neglected (relaxed) information (e.g., negative interactions, resource usage, etc.).
- We need to find a good relaxed plan (among multiple ones) according to the objective function.

82 Heuristic estimation by extracting the relaxed plan
- Initially supported facts: SF = Init state. Initial goals: G = Init goals \ SF.
- Traverse backward, searching for actions supporting all the goals. When A is added to the relaxed plan RP:
  - SF = SF ∪ Effects(A)
  - G = (G ∪ Precond(A)) \ Effects(A)
- If the objective function is f(time, cost), then A is selected such that f(t(RP+A), C(RP+A)) + f(t(G_new), C(G_new)) is minimal, where G_new = (G ∪ Precond(A)) \ Effects(A).
- When A is added, use mutexes to set orders between A and the actions already in RP so that fewer causal constraints are violated.
(figure: the cost function and the Tempe-Phoenix-L.A. map; f(t,c) = 100·makespan + Cost)

83 Heuristic estimation by extracting the relaxed plan
- General algorithm: traverse backward, searching for actions supporting all the goals. When A is added to the relaxed plan RP:
  - Supported facts: SF = SF ∪ Effects(A)
  - Goals: G = (G ∪ Precond(A)) \ SF
- Temporal planning with cost: if the objective function is f(time, cost), then A is selected such that f(t(RP+A), C(RP+A)) + f(t(G_new), C(G_new)) is minimal, where G_new = (G ∪ Precond(A)) \ Effects(A).
- Finally, use mutexes to set orders between A and the actions in RP so that fewer causal constraints are violated.
(figure: as on the previous slide)

84 End of 10/30 lecture

85 Adjusting the Heuristic Values
- Ignored resource-related information can be used to improve the heuristic values (much like +ve and -ve interactions in classical planning).
- Adjusted cost: C = C + Σ_R ⌈(Con(R) - (Init(R) + Pro(R))) / Δ_R⌉ · C(A_R)
- Cannot be applied to admissible heuristics.

86 Partialization Example
- A1 (duration 10) gives g1 but deletes p; A2 (duration 4) gives p at its end; A3 (duration 8) gives g2 but requires p at its start. We want g1, g2.
- (figure, left: a position-constrained plan A1, A2, A3 with makespan 22)
- (figure, right: the corresponding order-constrained plan, with causal links for p, g1, g2 and constraints such as [et(A1) = st(A3)], [et(A2) <= st(A3)], ...)
- The best-makespan dispatch of the order-constrained plan has makespan 14+.
- There could be multiple o.c. plans because of multiple possible causal sources; optimization will involve going through them all.

87 Problem Definitions
- Position-constrained (p.c) plan: the execution time of each action is fixed to a specific time point.
  - Can be generated more efficiently by state-space planners.
- Order-constrained (o.c) plan: only the relative orderings between actions are specified.
  - More flexible solutions; captures the causal relations between actions.
- Partialization: constructing an o.c plan from a p.c plan.
(figure: a p.c plan with actions at t1, t2, t3 and the corresponding o.c plan, with conditions Q, R, G)

88 Validity Requirements for a Partialization
An o.c plan P_oc is a valid partialization of a valid p.c plan P_pc if:
- P_oc contains the same actions as P_pc;
- P_oc is executable;
- P_oc satisfies all the top-level goals;
- (optional) P_pc is a legal dispatch (execution) of P_oc;
- (optional) P_oc contains no redundant ordering relations.
(figure: an example of a redundant ordering between P and Q)

89 Greedy Approximations
- Solving the optimization problem for makespan and number of orderings is NP-hard (Backstrom, 1998).
- Greedy approaches have been considered in classical planning (e.g., [Kambhampati & Kedar, 1993], [Veloso et al., 1990]):
  - Find a causal explanation of correctness for the p.c plan.
  - Introduce just the orderings needed for the explanation to hold.

90 Partialization: A simple example (figure: the p.c plan Pickup(A), Stack(A,B), Pickup(C), Stack(C,D) and its partialization, with causal links for the holding, hand-empty, On(A,B) and On(C,D) conditions). A sketch of the greedy procedure follows.
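A runnable sketch of greedy partialization for this example (classical, instantaneous actions; the blocks-world model below is simplified and illustrative, not the exact domain from the slide):

```python
def greedy_partialize(plan, init, goals):
    """Greedy partialization in the spirit of [Kambhampati & Kedar, 1993]:
    for every precondition pick the latest earlier step that adds it, then
    add only the orderings needed to protect those causal links.
    plan: list of (name, preconds, adds, deletes) in execution order.
    Returns ordering pairs (before, after); 'init' and 'goal' are dummy steps."""
    steps = [("init", set(), set(init), set())] + list(plan) \
            + [("goal", set(goals), set(), set())]
    pos = {s[0]: i for i, s in enumerate(steps)}
    orders, links = set(), []
    for i, (name, pre, _, _) in enumerate(steps):
        for p in pre:
            j = max(k for k in range(i) if p in steps[k][2])   # latest supporter
            links.append((steps[j][0], p, name))
            orders.add((steps[j][0], name))
    for prod, p, cons in links:                # protect links against deleters
        for name, _, _, dels in steps:
            if p in dels and name not in (prod, cons):
                if pos[name] < pos[prod]:
                    orders.add((name, prod))   # demotion: deleter before producer
                else:
                    orders.add((cons, name))   # promotion: deleter after consumer
    return orders

plan = [("Pickup(A)",  {"hand-empty", "clear(A)"},
         {"holding(A)"}, {"hand-empty"}),
        ("Stack(A,B)", {"holding(A)", "clear(B)"},
         {"on(A,B)", "hand-empty"}, {"holding(A)", "clear(B)"}),
        ("Pickup(C)",  {"hand-empty", "clear(C)"},
         {"holding(C)"}, {"hand-empty"}),
        ("Stack(C,D)", {"holding(C)", "clear(D)"},
         {"on(C,D)", "hand-empty"}, {"holding(C)", "clear(D)"})]
init = {"hand-empty", "clear(A)", "clear(B)", "clear(C)", "clear(D)"}
for o in sorted(greedy_partialize(plan, init, {"on(A,B)", "on(C,D)"})):
    print(o)
# includes Pickup(A) < Stack(A,B), Stack(A,B) < Pickup(C), Pickup(C) < Stack(C,D)
```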

91 Modeling greedy approaches as value ordering strategies
Key insight: we can capture many of the greedy approaches as specific value-ordering strategies on the CSOP encoding.
- A variation of the [Kambhampati & Kedar, 1993] greedy algorithm for temporal planning, as a value ordering:
  - Supporting variables: S^p_A = A' such that
    - et^p_A' < st^p_A in the p.c plan P_pc,
    - there is no B with et^p_A' < et^¬p_B < st^p_A, and
    - there is no C with et^p_C < et^p_A' that also satisfies the two conditions above.
  - Ordering and interference variables: ω^p_AB and ρ^r_AA' are assigned the orderings already implied by P_pc (e.g., order A after A' on resource r if st^r_A > et^r_A' in P_pc, and impose no ordering otherwise).

92 CSOP Variables and Values
- Continuous variables:
  - Temporal: st_A, with D(st_A) = [0, +∞), D(st_init) = {0}, D(st_Goals) = {Dl(G)}.
  - Resource level: V^r_A.
- Discrete variables:
  - Resource ordering: ρ^r_AA', whose domain is either a single ordering or also includes the "no ordering needed" value.
  - Causal effect (support): S^p_A, with Dom(S^p_A) = {B_1, B_2, ..., B_n}, p ∈ E(B_i).
  - Mutex: ω^p_AA', whose domain is the two possible orderings; p ∈ E(A), ¬p ∈ E(A') ∪ P(A').
- Example (the Q/R/G plan from slide 87): Dom(S^Q_A2) = {A_init, A_1}; Dom(S^R_A3) = {A_2}; Dom(S^G_Ag) = {A_3}; resource-ordering variables ρ^R_A1A2, ρ^R_A1A3.

93 Constraints
- Causal-link protection: S^p_A = B ⇒ for every A' with ¬p ∈ E(A'), the ordering variable ω^p_A'B is set so that A' does not fall between B and A.
- Ordering and temporal variables:
  - S^p_A = B ⇒ et^p_B < st^p_A
  - each value of ω^p_A'B translates into the corresponding inequality between the end/start points of A' and those of the link; similarly, ρ^r_AA' ⇒ st^r_A > et^r_A'.
- Optional temporal constraints:
  - Goal deadlines: st_Ag ≤ t_g.
  - Time constraints on individual actions: L ≤ st_A ≤ U.
- Resource precondition constraints: for each precondition V^r_A ⊗ K, with ⊗ ∈ {>, <, ≥, ≤, =}, set up one constraint involving all the ρ^r_AA' variables (e.g., Init_r plus the contributions of the actions ordered before A must exceed K if ⊗ is >).

94 Modeling Different Objective Functions
- Temporal quality:
  - Minimum makespan: minimize max_A (st_A + dur_A)
  - Maximize the summation of slacks: maximize Σ (st^g_Ag - et^g_A), where S^g_Ag = A
  - Maximize average flexibility: maximize Avg(|Dom(st_A)|)
- Fewest orderings: minimize #(st_A < st_A')

95 Empirical evaluation
- Objective: demonstrate that a metric temporal planner armed with our approach is able to produce plans that satisfy a variety of cost/makespan tradeoffs.
- Testing problems: randomly generated logistics problems from TP4 (Haslum & Geffner):
  - Load/Unload(package, location): cost = 1, duration = 1
  - Drive-inter-city(location1, location2): cost = 4.0, duration = 12.0
  - Flight(airport1, airport2): cost = 15.0, duration = 3.0
  - Drive-intra-city(location1, location2, city): cost = 2.0, duration = 2.0

