1
SAPA: A Domain-independent Heuristic Temporal Planner Minh B. Do & Subbarao Kambhampati Arizona State University
2
Good morning, friends. Obviously, this is Binh Minh's paper. In any case, I convinced him that it would be better for him to spend his time working on another upcoming paper than on visiting Toledo, a midwestern town in Ohio. I understand this is basically the same strategy Malik used to present Romain's paper as well.
3
Talk Outline
– Temporal planning and SAPA
– Action representation and search algorithm
– Objective functions and heuristics: admissible/inadmissible, resource adjustment
– Empirical results
– Related & future work
4
Planning
Most academic research has been done in the context of classical planning:
– Already PSPACE-complete
– Useful techniques are likely to be applicable in more expressive planning problems
Real-world applications normally have more complex requirements:
– Non-instantaneous actions
– Temporal constraints on goals
– Resource consumption
Classical planning has recently been able to scale up to big problems.
Can winning strategies from classical planning be applied in more expressive environments?
5
Related Work
Planners that can handle similar types of temporal and resource constraints: TLPlan, HSTS, IxTeT, Zeno
– Cannot scale up without domain knowledge
Planners that can handle a subset of the constraints:
– Only temporal: TGP
– Only resources: LPSAT, GRT-R
– A subset of temporal and resource constraints: TP4, Resource-IPP
6
SAPA
A forward state-space planner:
– Based on [Bacchus & Ady]
– Makes resource reasoning easier
Handles temporal constraints:
– Actions with static and dynamic durations
– Temporal goals with deadlines
– Continuous resource consumption and production
Heuristic functions to support a variety of objective functions
7
Action Representation
Example: Flying(?airplane ?city1 ?city2)
– Preconditions: (in-city ?airplane ?city1), (fuel ?airplane) > 0
– Effects: delete (in-city ?airplane ?city1), add (in-city ?airplane ?city2), consume (fuel ?airplane)
Actions are durative, with end time E_A = S_A + D_A
Instantaneous effects e occur at time t_e = S_A + d, 0 ≤ d ≤ D_A
Preconditions must be true at the starting point and protected over a period d, 0 ≤ d ≤ D_A
Actions can consume or produce continuous amounts of some resource
Action conflicts:
– Consuming the same resource
– One action's effect conflicting with another's precondition or effect
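A minimal sketch of this action representation in Python; the class and field names are illustrative, not taken from the SAPA implementation:

    from dataclasses import dataclass, field

    @dataclass
    class Effect:
        predicate: str    # e.g. "(in-city plane1 cityB)"
        delay: float      # d, with 0 <= d <= duration
        add: bool         # True for an add effect, False for a delete

    @dataclass
    class DurativeAction:
        name: str
        duration: float                                    # D_A
        preconds: list = field(default_factory=list)       # hold at start, protected for some d
        effects: list = field(default_factory=list)        # instantaneous effects at S_A + d
        resource_use: dict = field(default_factory=dict)   # resource -> amount (+produce / -consume)

    fly = DurativeAction(
        name="Flying(plane1, cityA, cityB)",
        duration=4.0,
        preconds=["(in-city plane1 cityA)"],               # plus (fuel plane1) > 0
        effects=[Effect("(in-city plane1 cityA)", 0.0, add=False),
                 Effect("(in-city plane1 cityB)", 4.0, add=True)],
        resource_use={"(fuel plane1)": -30.0},             # consumes fuel
    )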
8
Searching Through Time-Stamped States
Search through the space of time-stamped states S = (P, M, Π, Q, t):
– P: set of predicates p_i together with the time t_i < t of their last achievement
– M: set of function values representing resource levels
– Π: set of protected persistent conditions
– Q: event queue
– t: time stamp of S
9
Search Algorithm (cont.)
Goal satisfaction: S = (P, M, Π, Q, t) satisfies the goal set G if for each goal ⟨p_i, t_i⟩ ∈ G either:
– p_i ∈ P was achieved at some time t_j < t_i and no event in Q deletes p_i, or
– some event e ∈ Q adds p_i at a time t_e < t_i
Action application: an action A is applicable in S if:
– All instantaneous preconditions of A are satisfied by P and M
– A's effects do not interfere with Π and Q
– No event in Q interferes with the persistent preconditions of A
When A is applied to S:
– P and M are updated according to A's instantaneous effects
– Persistent preconditions of A are put in Π
– Delayed effects of A are put in Q
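A hedged Python sketch of the state tuple and the goal-satisfaction test described above; structures and names are illustrative, not SAPA's actual code:

    from dataclasses import dataclass

    @dataclass
    class State:
        P: dict      # predicate -> time of last achievement (t_i < t)
        M: dict      # resource -> current level
        Pi: set      # protected persistent conditions
        Q: list      # event queue: (time, predicate, is_add) tuples
        t: float     # time stamp

    def satisfies(state, goals):
        """goals: list of (predicate, deadline) pairs."""
        for pred, deadline in goals:
            achieved = state.P.get(pred)
            # Case 1: already in P, achieved before the deadline, never deleted later
            in_p = (achieved is not None and achieved < deadline and
                    not any(p == pred and not is_add for _, p, is_add in state.Q))
            # Case 2: some queued event adds it before the deadline
            in_q = any(p == pred and is_add and te < deadline
                       for te, p, is_add in state.Q)
            if not (in_p or in_q):
                return False
        return True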
10
Heuristic Control
Temporal planners have to deal with more branching possibilities, so good heuristic guidance is even more critical.
The design of heuristics depends on the objective function:
– Classical planning: number of actions, parallel execution time, solving time
– Temporal/resource planning: number of actions, makespan, resource consumption, slack, ...
In temporal planning, heuristics focus on richer objective functions that guide both planning and scheduling.
11
Objectives in Temporal Planning
– Number of actions: the total number of actions in the plan
– Makespan: the shortest duration in which all actions in the solution can possibly be executed
– Resource consumption: the total amount of resources consumed by actions in the solution
– Slack: the duration between the time a goal is achieved and its deadline; optimize the max, min, or average slack value
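The slack objectives reduce to simple arithmetic over goal achievement times; a tiny illustrative helper (not part of SAPA):

    # Slack of a goal = deadline - achievement time; aggregate per the objective.
    def slacks(achieved, deadlines):
        return [deadlines[g] - achieved[g] for g in deadlines]

    s = slacks({"at(pkg1, cityB)": 40.0}, {"at(pkg1, cityB)": 100.0})
    print(min(s), max(s), sum(s) / len(s))   # min / max / average slack (all 60.0 here)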
12
Deriving Heuristics for SAPA
We use a phased relaxation approach to derive different heuristics:
– Relax the negative logical effects and the resource effects to build the relaxed temporal planning graph (RTPG); this prunes bad states while preserving completeness and yields admissible heuristics to minimize the solution's makespan or to maximize slack-based objective functions
– Find a relaxed solution, which is used as a distance heuristic
– Adjust the heuristic values using negative interactions (future work)
– Adjust the heuristic values using resource consumption information [AltAlt, AIJ 2001]
13
Relaxed Temporal Planning Graph
Heuristics in Sapa are derived from a Graphplan-style bi-level relaxed temporal planning graph (RTPG).
Relaxed actions have:
– No delete effects
– No resource consumption
[Figure: bi-level RTPG for a small logistics example with Load(P,A), Fly(A,B), Fly(B,A), Unload(P,A), and Unload(P,B), from Init at t = 0 to the goal deadline t_g]
RTPG construction:
    while true:
        for all A ≠ advance-time applicable in S: S = Apply(A, S)
        if S ⊨ G then terminate {solution}
        S' = Apply(advance-time, S)
        if ∃ (p_i, t_i) ∈ G such that t_i < Time(S') and p_i ∉ S
            then terminate {non-solution}
        else S = S'
    end while
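A self-contained Python sketch of this relaxed expansion (no delete effects, no resources); names are illustrative and goal deadlines are omitted for brevity:

    import heapq

    def build_rtpg(init_facts, actions, goals, horizon=1000.0):
        """actions: list of (name, preconds, adds, duration) tuples.
        Returns the earliest appearance time of every reachable fact."""
        earliest = {f: 0.0 for f in init_facts}
        events, applied, t = [], set(), 0.0   # events: min-heap of (time, fact)
        while t <= horizon:
            # Apply every not-yet-applied relaxed action whose preconditions hold at t
            progress = True
            while progress:
                progress = False
                for name, pre, adds, dur in actions:
                    if name not in applied and all(
                            earliest.get(p, horizon + 1) <= t for p in pre):
                        applied.add(name)
                        for f in adds:
                            heapq.heappush(events, (t + dur, f))
                        progress = True
            if all(g in earliest for g in goals):
                return earliest            # every goal appears in the graph
            if not events:
                return None                # goals unreachable even under relaxation
            t, fact = heapq.heappop(events)   # advance-time to the next effect event
            earliest.setdefault(fact, t)
        return None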
14
Heuristics Directly from RTPG (admissible)
For makespan: the distance from a state S to the goals equals the duration between time(S) and the time the last goal appears in the RTPG.
For min/max/sum slack: the distance from a state to the goals equals the minimum, maximum, or summation of the slack estimates for all individual goals, using the RTPG.
Proof sketch: all goals appear in the RTPG at times smaller than or equal to their earliest achievable times.
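Given the earliest-appearance times computed by an RTPG expansion like the sketch above, these admissible estimates are direct to compute (illustrative helpers):

    def makespan_h(earliest, goals, t_state):
        # time at which the last goal appears in the RTPG, minus the state's time stamp
        return max(earliest[g] for g in goals) - t_state

    def slack_estimates(earliest, deadlines):
        # optimistic slack per goal: deadline minus earliest appearance in the RTPG
        return [deadlines[g] - earliest[g] for g in deadlines]

    # min(...), max(...), or sum(...) over slack_estimates give the three slack heuristics.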
16
Heuristics from a Solution Extracted from the RTPG
The RTPG can be used to find a relaxed solution, which is then used to estimate the distance from a given state to the goals.
Sum actions: the distance from a state S to the goals equals the number of actions in the relaxed plan.
– Motivation: planning progresses by adding actions to achieve goals, so choose the state closer to the goals in terms of the total number of actions.
Sum durations: the distance from a state S to the goals equals the summation of the action durations in the relaxed plan.
– Motivation: choose the state closer to the goals in terms of total action duration instead of number of actions, thus favoring actions with shorter durations.
[Figure: the logistics RTPG from the previous slide, used here for relaxed-plan extraction]
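A hedged sketch of relaxed-plan extraction by regressing from the goals over the RTPG actions; the first-supporter choice and helper names are illustrative simplifications:

    def extract_relaxed_plan(actions, init_facts, goals):
        """actions: list of (name, preconds, adds, duration); assumes goals reachable."""
        adder = {}
        for name, pre, adds, dur in actions:
            for f in adds:
                adder.setdefault(f, (name, pre, dur))   # first supporter wins
        plan, stack = {}, [g for g in goals if g not in init_facts]
        while stack:
            f = stack.pop()
            if f in init_facts or f not in adder:
                continue
            name, pre, dur = adder[f]
            if name not in plan:                        # add each supporter once
                plan[name] = dur
                stack.extend(p for p in pre if p not in init_facts)
        return plan

    def sum_action_h(plan):
        return len(plan)              # number of actions in the relaxed plan

    def sum_duration_h(plan):
        return sum(plan.values())     # total duration of the relaxed plan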
17
Resource-Based Adjustments to Heuristics
Resource-related information, ignored so far, can be used to improve the heuristic values.
Adjusted sum-action:
    h' = h + Σ_R ⌈(Con(R) − (Init(R) + Pro(R))) / Δ_R⌉
Adjusted sum-duration:
    h' = h + Σ_R ⌈(Con(R) − (Init(R) + Pro(R))) / Δ_R⌉ · Dur(A_R)
Here Con(R) is the amount of R consumed by actions in the relaxed plan, Init(R) its initial level, Pro(R) the amount produced, Δ_R the largest amount of R produced by a single action, and A_R that producing action.
These adjustments will not preserve admissibility.
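An illustrative computation of the sum-action adjustment, under the assumption stated above that Δ_R is the largest amount of R a single action can produce:

    from math import ceil

    def resource_adjustment(plan_resource_use, init_level, delta):
        """plan_resource_use: resource -> (consumed, produced) over the relaxed plan."""
        adj = 0
        for R, (con, pro) in plan_resource_use.items():
            shortfall = con - (init_level[R] + pro)
            if shortfall > 0:
                adj += ceil(shortfall / delta[R])   # extra producing actions needed
        return adj

    # Example: plan consumes 90 fuel, produces none, 20 in the tank, refueling adds 50
    print(resource_adjustment({"fuel": (90, 0)}, {"fuel": 20}, {"fuel": 50}))   # -> 2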
18
Aims of Empirical Study
– Evaluate the effectiveness of the different heuristics
– Ablation studies: test whether the resource-adjustment technique helps the different heuristics
– Compare with other temporal planning systems
19
Empirical Results

                 Adjusted Sum-Action                Sum-Duration
    Prob     time    #act  nodes      dur      time    #act  nodes      dur
    Zeno1    0.31    7     514/48     320      0.35    5     20/67      320
    Zeno2    54.37   23    188/1303   950      -       -     -          -
    Zeno3    29.73   13    250/1221   430      6.20    13    60/289     450
    Zeno9    13.01   13    151/793    590      98.66   13    4331/5971  460
    Log1     1.51    16    27/157     10.0     1.81    16    33/192     10.0
    Log2     82.01   22    199/1592   18.87    38.43   22    61/505     18.87
    Log3     10.25   12    30/215     11.75    -       -     -          -
    Log9     116.09  32    91/830     26.25    -       -     -          -

– Sum-action finds solutions faster than sum-duration
– The admissible heuristics do not scale up to the bigger problems
– Sum-duration finds shorter-duration solutions in most cases
– The resource-based adjustment helps sum-action, but not sum-duration
– Very few irrelevant actions; better quality than Temporal TLPlan, and so (transitively) better than LPSAT
20
Comparison to Other Planners
Planners with similar capabilities:
– IxTeT, Zeno: poor scale-up
– HSTS, TLPlan: domain-dependent search control
Planners with limited capabilities:
– TGP and TP4
– Compared on a set of random temporal logistics problems:
  – Domain specification and problems defined by TP4's creator (Patrik Haslum)
  – No resource requirements
  – No deadline constraints or actions with dynamic durations
21
Empirical Results (cont.)
Logistics domain with driving restricted to intra-city (the traditional logistics domain).
Sapa is the only planner that can solve all 80 problems.
22
Empirical Results (cont.)
Logistics domain with inter-city driving actions.
The "sum-action" heuristic used as the default in Sapa can be misled by the long-duration actions...
Future work: fixed-point time/level propagation.
23
Conclusion
Presented SAPA, a domain-independent forward temporal planner that can handle:
– Durative actions
– Deadline goals
– Continuous resources
Developed different heuristic functions, based on the relaxed temporal planning graph, to address both satisficing and optimizing search.
Presented a method to improve heuristic values by resource reasoning.
Reported promising initial empirical results.
25
Future Work
Exploit mutex information in:
– Building the temporal planning graph
– Adjusting the heuristic values in the relaxed solution
Relevance analysis.
Improving solution quality.
Relaxing constraints and integrating with a full-scale scheduler.