1
1001 Ways to Skin a Planning Graph for Heuristic Fun and Profit. Subbarao Kambhampati, Arizona State University, http://rakaposhi.eas.asu.edu (With tons of help from Daniel Bryce, Minh Binh Do, Xuan Long Nguyen, Romeo Sanchez Nigenda, Biplav Srivastava, Terry Zimmerman). Funding from NSF & NASA. 987 WMD-in-the-toilet “After the flush, you may find that there were no bombs to begin with”
2
Planning Graph and Projection. Envelope of Progression Tree (Relaxed Progression) –Proposition lists: Union of states at the k-th level –Mutex: Subsets of literals that cannot be part of any legal state. Lower-bound reachability information. [Figure: a progression tree over actions A1 to A4, next to the planning graph that envelopes it, with proposition levels {p}, {p,q,r,s}, {p,q,r,s,t}.] [Blum & Furst, 1995] [ECP, 1997] Planning Graphs can be used as the basis for heuristics!
3
And PG Heuristics for all.. –Classical (regression) planning –AltAlt (AAAI 2000; AIJ 2002); AltAlt^p (JAIR 2003) Serial vs. Parallel graphs; Level and Adjusted heuristics; Partial expansion –Graphplan style search –GP-HSP (AIPS 2000) Variable/Value ordering heuristics based on distances –Partial order planning –RePOP (IJCAI 2001) Mutexes used to detect Indirect Conflicts –Metric Temporal Planning –Sapa (ECP 2001; AIPS 2002; JAIR 2003) Propagation of cost functions; Phased relaxation –Conformant Planning –CAltAlt (ICAPS Uncertainty Workshop, 2003) Multiple graphs; Labelled graphs
4
And PG Heuristics for all.. –Classical (regression) planning –AltAlt (AAAI 2000; AIJ 2002); AltAlt^p (JAIR 2003) Serial vs. Parallel graphs; Level and Adjusted heuristics; Partial expansion –Graphplan style search –GP-HSP (AIPS 2000); PEGG (IJCAI 2003; AAAI 1999) Variable/Value ordering heuristics based on distances –Partial order planning –RePOP (IJCAI 2001) Mutexes used to detect Indirect Conflicts –Metric Temporal Planning –Sapa (ECP 2001; AIPS 2002; JAIR 2003) Propagation of cost functions; Phased relaxation –Conformant Planning –CAltAlt (ICAPS Uncertainty Workshop, 2003) Multiple graphs; Labelled graphs. Caveat: “All Tempe, All the time”
5
I. PG Heuristics for State-space (Regression) planners [AAAI 2000; AIPS 2000; AIJ 2002; JAIR 2003] Problem: Given a set of subgoals (regressed state) estimate how far they are from the initial state
6
Planning Graphs: Optimistic Projection of Achievability
[Figure: planning graph for the Grid Problem, from the initial state At(0,0), Key(0,1) through prop lists at Levels 0 to 2 and action lists at Levels 0 and 1 (noops, Move(0,0,0,1), Move(0,0,1,0), Move(0,1,1,1), Move(1,0,1,1), Pick_key(0,1)); mutexes are marked with x.]
Serial PG: PG where any pair of non-noop actions is marked mutex
lev(S): index of the first level where all props in S appear non-mutexed. –If there is no such level, then: if the graph is grown to level off, then \( \infty \); else k+1 (k is the current length of the graph)
7
Cost of a Set of Literals
lev(p): index of the first level at which p comes into the planning graph
lev(S): index of the first level where all props in S appear non-mutexed. –If there is no such level, then: if the graph is grown to level off, then \( \infty \); else k+1 (k is the current length of the graph)
Heuristics: Sum; Set-Level; Partition-k; Adjusted Sum; Combo; Set-Level with memos
Sum: \( h(S) = \sum_{p \in S} lev(\{p\}) \)
Set-Level: \( h(S) = lev(S) \) (Admissible)
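As a minimal sketch of the level-based costs on this slide, the Python below grows a planning graph without mutexes until it levels off, records lev(p) for each proposition, and evaluates the Sum heuristic. Ignoring mutexes is a simplifying assumption made here: under it, the set-level value collapses to the maximum individual level. The `Action` type, the helper names, and the encoding of the grid actions are hypothetical and are not taken from AltAlt.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple

INF = float("inf")


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]  # delete effects are ignored in this relaxed sketch


def proposition_levels(init: Iterable[str], actions: List[Action]) -> Dict[str, int]:
    """Return lev(p) for every reachable p, growing the graph until it levels off."""
    lev = {p: 0 for p in init}
    k = 0
    while True:
        new = {}
        for a in actions:
            # An action enters action level k once all its preconditions
            # are present at proposition level k.
            if all(p in lev for p in a.pre):
                for q in a.add:
                    if q not in lev:
                        new[q] = k + 1
        if not new:  # no new proposition: the (mutex-free) graph has leveled off
            return lev
        lev.update(new)
        k += 1


def h_sum(S: Iterable[str], lev: Dict[str, int]) -> float:
    """Sum heuristic: add up the individual levels (inadmissible in general)."""
    return sum(lev.get(p, INF) for p in S)


def h_set_level_no_mutex(S: Iterable[str], lev: Dict[str, int]) -> float:
    """Set-level without mutex information reduces to the max individual level."""
    return max((lev.get(p, INF) for p in S), default=0)


def act(name: str, pre: List[str], add: List[str]) -> Action:
    return Action(name, frozenset(pre), frozenset(add))


# Hypothetical encoding of the grid problem from the previous slide.
grid_actions = [
    act("Move(0,0,0,1)", ["At(0,0)"], ["At(0,1)"]),
    act("Move(0,0,1,0)", ["At(0,0)"], ["At(1,0)"]),
    act("Move(0,1,1,1)", ["At(0,1)"], ["At(1,1)"]),
    act("Move(1,0,1,1)", ["At(1,0)"], ["At(1,1)"]),
    act("Pick_key(0,1)", ["At(0,1)", "Key(0,1)"], ["Have_key"]),
]
lev = proposition_levels(["At(0,0)", "Key(0,1)"], grid_actions)
goal = {"At(1,1)", "Have_key"}
print(h_sum(goal, lev), h_set_level_no_mutex(goal, lev))  # prints: 4 2
```

With mutex propagation added, lev(S) for the pair could exceed 2, which is exactly the extra information the set-level heuristic is meant to capture.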
8
Adjusting the Sum Heuristic
Start with Sum heuristic and adjust it to take subgoal interactions into account –Negative interactions in terms of “degree of interaction” –Positive interactions in terms of co-achievement links
Ignore negative interactions when accounting for positive interactions (and vice versa) [AAAI 2000]
\( h_{AdjSum2M}(S) = length(RelaxedPlan(S)) + \max_{p,q \in S} \delta(p,q) \)
where \( \delta(p,q) = lev(\{p,q\}) - \max\{lev(p), lev(q)\} \) /* Degree of negative interaction */
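The adjusted-sum formula can be sketched in a few lines of Python once the level tables are available. The function names, the dictionary layout (`lev1` for single propositions, `lev2` for pairs), and all numeric values in the example are hypothetical; in particular the pair level of 3 simply assumes the two subgoals are mutex at level 2.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


def delta(p: str, q: str, lev1: Dict[str, int], lev2: Dict[FrozenSet[str], int]) -> int:
    """Degree of negative interaction: lev({p,q}) - max(lev(p), lev(q))."""
    return lev2[frozenset((p, q))] - max(lev1[p], lev1[q])


def h_adjsum2m(S: Iterable[str], relaxed_plan_length: int,
               lev1: Dict[str, int], lev2: Dict[FrozenSet[str], int]) -> int:
    """Relaxed-plan length (positive interactions) plus the worst pairwise delta."""
    pairs = combinations(sorted(S), 2)
    worst = max((delta(p, q, lev1, lev2) for p, q in pairs), default=0)
    return relaxed_plan_length + worst


# Hypothetical numbers: both subgoals first appear at level 2,
# but only become non-mutex together at level 3.
lev1 = {"At(1,1)": 2, "Have_key": 2}
lev2 = {frozenset(("At(1,1)", "Have_key")): 3}
h = h_adjsum2m({"At(1,1)", "Have_key"}, relaxed_plan_length=3, lev1=lev1, lev2=lev2)
print(h)  # prints: 4
```

The relaxed-plan length is passed in rather than computed, since relaxed-plan extraction is a separate step not shown here.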
9
Optimizations in Heuristic Computation Taming Space/Time costs Bi-level Planning Graph representation Partial expansion of the PG (stop before level-off) –It is FINE to cut corners when using PG for heuristics (instead of search)!! Branching factor can still be quite high –Use actions appearing in the PG Select actions in lev(S) vs Levels-off
10
AltAlt Performance Logistics Scheduling Problem sets from IPC 2000
11
Even Parallel Plans aren’t safe.. [JAIR 2003] Serial graph over-estimates Use “parallel” rather than serial PG as the basis for heuristics Projection over sets of actions too costly Select the branch with the best action and fatten it Use “push-up” to make the partial plans more parallel
12
II. PG heuristics for Graphplan..
13
PG Heuristics for Graphplan(!) Goal/Action Ordering Heuristics for Backward Search Propositions are ordered for consideration in decreasing value of their levels. Actions supporting a proposition are ordered for consideration in increasing values of their costs –Cost of an action = 1 + Cost of its set of preconditions Use of level heuristics improves the performance significantly. –The heuristics are surprisingly insensitive to the length of the planning graph [AIPS 2000]
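The two ordering rules above can be sketched as sort keys. This is an illustration only: the slide does not say how the cost of a precondition set is computed, so the sketch assumes the maximum individual level as a stand-in, and the names `order_goals`, `action_cost`, and `order_supporters` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def order_goals(goals: Iterable[str], lev: Dict[str, int]) -> List[str]:
    """Consider propositions in decreasing order of level (hardest first)."""
    return sorted(goals, key=lambda p: (-lev[p], p))


def action_cost(pre: Set[str], lev: Dict[str, int]) -> int:
    """Cost of an action = 1 + cost of its precondition set.

    Assumption: the set cost is approximated by the max individual level.
    """
    return 1 + max((lev[p] for p in pre), default=0)


def order_supporters(supporters: Iterable[Tuple[str, Set[str]]],
                     lev: Dict[str, int]) -> List[str]:
    """Consider supporting actions in increasing order of cost."""
    ranked = sorted(supporters, key=lambda a: (action_cost(a[1], lev), a[0]))
    return [name for name, _ in ranked]


# Hypothetical levels and supporters for one proposition.
lev = {"p": 0, "q": 1, "r": 2, "s": 3}
goal_order = order_goals({"q", "s", "r"}, lev)
supporter_order = order_supporters(
    [("A1", {"r", "s"}), ("A2", {"p"}), ("A3", {"q"})], lev)
print(goal_order, supporter_order)  # prints: ['s', 'r', 'q'] ['A2', 'A3', 'A1']
```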
14
…And then state-space heuristics for Graphplan (PEGG). 1: Capture a state space view of Graphplan’s search in a search trace. [Figure: planning graph proposition levels from Init State (A, C, E, F, K) to Goal (X, Y, Z), with action assignments (a2, a3, a4) and the regressed ‘states’ they generate.] No solution? extend graph…
15
…And then state-space heuristics for Graphplan. [Figure: the search trace after the planning graph is extended, showing the regressed ‘states’ between Init State (A, C, E, F, K) and Goal (X, Y, Z).]
16
PEGG now competitive with a heuristic state space planner [IJCAI 2003]
17
In the beginning it was all POP. Then it was cruelly UnPOPped The good times return with Re(vived)POP III. PG Heuristics for PO Planners
18
POP Algorithm
1. Plan Selection: Select a plan P from the search queue
2. Flaw Selection: Choose a flaw f (open cond or unsafe link)
3. Flaw resolution: If f is an open condition, choose an action S that achieves f. If f is an unsafe link, choose promotion or demotion. Update P. Return NULL if no resolution exists
4. If there is no flaw left, return P
Choice points: Flaw selection (open condition? unsafe link? Non-backtrack choice); Flaw resolution/Plan Selection (how to select (rank) partial plan?)
[Figure: 1. Initial plan: S0 and S_inf with goals g1, g2. 2. Plan refinement (flaw selection and resolution): a partial plan with steps S0, S1, S2, S3, S_inf, open conditions oc1, oc2, and conditions p, ~p, q1, g1, g2.]
19
PG Heuristics for Partial Order Planning. Distance heuristics to estimate cost of partially ordered plans (and to select flaws) –If we ignore negative interactions, then the set of open conditions can be seen as a regression state. Mutexes used to detect indirect conflicts in partial plans –A step threatens a link if there is a mutex between the link condition and the step’s effect or precondition –Post disjunctive precedences and use propagation to simplify
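The mutex-based threat test can be illustrated with a small Python check. This is a sketch under assumptions, not RePOP's implementation: steps are plain dictionaries, mutexes are unordered pairs taken from the leveled-off planning graph, and the ordering constraints that decide whether the step can actually fall between the link's producer and consumer are left out.

```python
from typing import Dict, FrozenSet, Set


def threatens(step: Dict[str, Set[str]], link_condition: str,
              mutexes: Set[FrozenSet[str]]) -> bool:
    """True if some effect or precondition of `step` is mutex with the link condition."""
    literals = step["pre"] | step["eff"]
    return any(frozenset((link_condition, lit)) in mutexes for lit in literals)


# Hypothetical mutex: the agent cannot be in two grid cells at once.
mutexes = {frozenset(("At(0,1)", "At(1,0)"))}
move = {"pre": {"At(1,0)"}, "eff": {"At(1,1)"}}
pick = {"pre": {"Key(0,1)"}, "eff": {"Have_key"}}

indirect_conflict = threatens(move, "At(0,1)", mutexes)
no_conflict = threatens(pick, "At(0,1)", mutexes)
print(indirect_conflict, no_conflict)  # prints: True False
```

The `move` step never deletes At(0,1) explicitly, so a classical threat check would miss it; the mutex between its precondition and the link condition is what exposes the indirect conflict.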
20
RePOP’s Performance. RePOP implemented on top of UCPOP –Dramatically better than any other partial order planner –Competitive with Graphplan and AltAlt –VHPOP carried the torch at IPC 2002 [IJCAI, 2001] You see, pop, it is possible to Re-use all the old POP work! Written in Lisp, runs on Linux, 500MHz, 250MB
21
IV. PG Heuristics for Metric Temporal Planning [ECP 2001; AIPS 2002; ICAPS 2003; JAIR 2003]
22
Multi-Objective Nature of MTP. Plan quality in Metric Temporal domains is inherently Multi-dimensional –Temporal quality (e.g. makespan, slack) –Plan cost (e.g. cumulative action cost, resource consumption). Necessitates multi-objective search –Modeling objective functions –Tracking different quality metrics and heuristic estimation. Challenge: Inter-dependencies between different quality metrics. Typically cost will go down with higher makespan… [Figure: Tempe, Phoenix, L.A.]
23
SAPA’s approach Use a temporal version of the Planning Graph (Smith & Weld) structure to track the time-sensitive cost function: –Estimation of the earliest time (makespan) to achieve all goals. –Estimation of the lowest cost to achieve goals –Estimation of the cost to achieve goals given the specific makespan value. Use this information to calculate the heuristic value for the objective function involving both time and cost Challenge: How to propagate cost over planning graphs?
24
Search through time-stamped states
Goal Satisfaction: \( S = (P, M, \Pi, Q, t) \) satisfies G if for each \( \langle p_i, t_i \rangle \in G \) either:
– \( \exists \langle p_i, t_j \rangle \in P \), \( t_j < t_i \) and no event in Q deletes \( p_i \).
– \( \exists e \in Q \) that adds \( p_i \) at time \( t_e < t_i \).
Action Application: Action A is applicable in S if:
–All instantaneous preconditions of A are satisfied by P and M.
–A’s effects do not interfere with \( \Pi \) and Q.
–No event in Q interferes with persistent preconditions of A.
–A does not lead to concurrent resource change
When A is applied to S:
–P is updated according to A’s instantaneous effects.
–Persistent preconditions of A are put in \( \Pi \)
–Delayed effects of A are put in Q.
25
Propagating Cost Functions
Shuttle(Tempe,Phx): Cost: $20; Time: 1.0 hour
Helicopter(Tempe,Phx): Cost: $100; Time: 0.5 hour
Car(Tempe,LA): Cost: $100; Time: 10 hours
Airplane(Phx,LA): Cost: $200; Time: 1.0 hour
[Figure: timeline starting in Tempe at t = 0. Hel(T,P) reaches Phoenix at t = 0.5 and Shuttle(T,P) at t = 1; Airplane(P,LA) then reaches L.A. at t = 1.5 or t = 2.0; Drive-car(Tempe,LA) reaches L.A. at t = 10. Cost(At(LA)) plotted against time: $300 at t = 1.5, $220 at t = 2, $100 at t = 10.]
26
Issues in Cost Propagation
Costing a set of literals
\( Cost(f,t) = \min \{ Cost(A,t) : f \in Effect(A) \} \)
\( Cost(A,t) = Aggregate(Cost(f,t) : f \in Pre(A)) \)
Aggregate can be Sum or Max. Set-level idea would entail tracking costs of subsets of literals
Termination Criteria
Deadline Termination: Terminate at time point t if: – \( \forall \) goal G: \( Deadline(G) \le t \) – \( \exists \) goal G: \( (Deadline(G) < t) \wedge (Cost(G,t) = \infty) \)
Fix-point Termination: Terminate at time point t where we can not improve the cost of any proposition.
K-lookahead approximation: At t where \( Cost(g,t) < \infty \), repeat the process of applying (set) of actions that can improve the cost functions k times.
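A minimal sketch of these propagation rules, using the Tempe/Phoenix/L.A. actions from the previous slide, is given below. It is an event-queue approximation written for illustration, not Sapa's temporal planning graph: the `DurativeAction` type and `propagate_costs` function are hypothetical, each action's own cost is added on top of the aggregated precondition cost, and propagation runs to the fix-point (the queue empties) rather than using deadline or k-lookahead termination.

```python
import heapq
from typing import Callable, Dict, Iterable, List, NamedTuple, Tuple


class DurativeAction(NamedTuple):
    name: str
    pre: Tuple[str, ...]
    eff: Tuple[str, ...]
    cost: float
    duration: float


def propagate_costs(init: Iterable[str], actions: List[DurativeAction],
                    aggregate: Callable = sum) -> Dict[str, List[Tuple[float, float]]]:
    """Return, per fact, the (time, cost) points at which its cost drops."""
    best: Dict[str, float] = {}  # cheapest cost found so far for each fact
    steps: Dict[str, List[Tuple[float, float]]] = {}
    queue = [(0.0, 0.0, f) for f in init]
    heapq.heapify(queue)
    while queue:  # fix-point termination: stop when nothing can improve
        t, c, f = heapq.heappop(queue)
        if c >= best.get(f, float("inf")):
            continue  # not an improvement over what was known before time t
        best[f] = c
        steps.setdefault(f, []).append((t, c))
        for a in actions:
            # Re-evaluate every action that uses f once all its preconditions are reachable.
            if f in a.pre and all(p in best for p in a.pre):
                total = a.cost + aggregate(best[p] for p in a.pre)
                for e in a.eff:
                    heapq.heappush(queue, (t + a.duration, total, e))
    return steps


travel = [
    DurativeAction("Shuttle(Tempe,Phx)", ("At(Tempe)",), ("At(Phx)",), 20, 1.0),
    DurativeAction("Helicopter(Tempe,Phx)", ("At(Tempe)",), ("At(Phx)",), 100, 0.5),
    DurativeAction("Car(Tempe,LA)", ("At(Tempe)",), ("At(LA)",), 100, 10.0),
    DurativeAction("Airplane(Phx,LA)", ("At(Phx)",), ("At(LA)",), 200, 1.0),
]
cost_fn = propagate_costs(["At(Tempe)"], travel)
print(cost_fn["At(LA)"])  # prints: [(1.5, 300.0), (2.0, 220.0), (10.0, 100.0)]
```

Passing `aggregate=max` instead of the default `sum` switches to the other aggregation option named on the slide; with single-precondition actions, as here, the two coincide.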
27
Heuristics based on cost functions
Direct:
If we want to minimize makespan: – \( h = t_0 \)
If we want to minimize cost: – \( h = CostAggregate(G, t_\infty) \)
If we want to minimize a function f(time,cost) of cost and makespan: – \( h = \min f(t, Cost(G,t)) \) s.t. \( t_0 \le t \le t_\infty \)
E.g. f(time,cost) = 100·makespan + Cost, then h = 100×2 + 220 at \( t_0 \le t = 2 \le t_\infty \)
[Figure: Cost(At(LA)) against time: $300 at \( t_0 = 1.5 \) (time of earliest achievement), $220 at t = 2, $100 at \( t_\infty = 10 \) (time of lowest cost).]
Using Relaxed Plan:
Extract a relaxed plan using h as the bias –If the objective function is f(time,cost), then action A (to be added to RP) is selected such that: \( f(t(RP+A), C(RP+A)) + f(t(G_{new}), C(G_{new})) \) is minimal, where \( G_{new} = (G \cup Precond(A)) \setminus Effects(A) \)
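The three direct heuristics can be checked against the slide's numbers with a short Python sketch. The cost function for At(LA) is hard-coded from the figure as (time, cost) breakpoints; the function names are hypothetical. Evaluating f only at breakpoints assumes f does not decrease as time grows for a fixed cost, which holds for the example objective.

```python
from typing import Callable, List, Tuple

# Breakpoints of Cost(At(LA)) read off the slide: (time, cost in dollars).
cost_at_la: List[Tuple[float, float]] = [(1.5, 300.0), (2.0, 220.0), (10.0, 100.0)]


def h_makespan(fn: List[Tuple[float, float]]) -> float:
    """h = t_0: the earliest time at which the goal becomes achievable."""
    return fn[0][0]


def h_cost(fn: List[Tuple[float, float]]) -> float:
    """h = cost at t_inf: the lowest cost the goal ever reaches."""
    return fn[-1][1]


def h_combined(fn: List[Tuple[float, float]],
               f: Callable[[float, float], float]) -> float:
    """h = min f(t, Cost(G,t)) over t_0 <= t <= t_inf, checked at breakpoints."""
    return min(f(t, c) for t, c in fn)


def objective(time: float, cost: float) -> float:
    """Example objective from the slide: 100 * makespan + cost."""
    return 100 * time + cost


print(h_makespan(cost_at_la), h_cost(cost_at_la), h_combined(cost_at_la, objective))
# prints: 1.5 100.0 420.0
```

The combined value 420 is attained at t = 2 (100×2 + 220), beating both the earliest option (450 at t = 1.5) and the cheapest one (1100 at t = 10).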
28
Phased Relaxation. The relaxed plan can be adjusted to take into account constraints that were originally ignored
Adjusting for Mutexes: Adjust the make-span estimate of the relaxed plan by marking actions that are mutex (and thus cannot be executed concurrently)
Adjusting for Resource Interactions: Estimate the number of additional resource-producing actions needed to make up for any resource short-fall in the relaxed plan
\( C = C + \sum_R \lceil (Con(R) - (Init(R) + Pro(R))) / \Delta_R \rceil \cdot C(A_R) \)
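The resource adjustment can be sketched as follows. Several points are assumptions made for this illustration: the per-resource record layout, rounding the number of extra producer actions up to a whole action, reading the divisor as the most a single producing action can add, and skipping resources with no shortfall. All numbers are hypothetical.

```python
import math
from typing import Dict, List


def adjust_cost_for_resources(cost: float, resources: List[Dict[str, float]]) -> float:
    """Add the cost of extra resource-producing actions for each shortfall."""
    for r in resources:
        # Shortfall = consumed - (initial level + produced in the relaxed plan).
        shortfall = r["consumed"] - (r["initial"] + r["produced"])
        if shortfall > 0:  # assumption: a surplus earns no credit
            extra_actions = math.ceil(shortfall / r["delta"])  # assumption: round up
            cost += extra_actions * r["producer_cost"]
    return cost


# Hypothetical relaxed plan: fuel falls 20 units short, one refuel adds 15 units.
fuel = {"consumed": 50, "initial": 10, "produced": 20, "delta": 15, "producer_cost": 5}
water = {"consumed": 5, "initial": 10, "produced": 0, "delta": 4, "producer_cost": 3}
adjusted = adjust_cost_for_resources(100, [fuel, water])
print(adjusted)  # prints: 110
```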
29
Handling Cost/Makespan Tradeoffs. Results over 20 randomly generated temporal logistics problems that involve moving 4 packages between different locations in 3 cities: \( O = f(time,cost) = \alpha \cdot Makespan + (1-\alpha) \cdot TotalCost \)
30
SAPA at IPC-2002 Rover (time setting) Satellite (complex setting) [JAIR 2003]
31
V. PG Heuristics for Conformant Planning
32
Conformant Planning as Regression
Actions: A1: M & P => K; A2: M & Q => K; A3: M & R => L; A4: K => G; A5: L => G
Initially: (P V Q V R) & (~P V ~Q) & (~P V ~R) & (~Q V ~R) & M
Goal State: G
Regression: G --A4--> (G V K) --A5--> (G V K V L) --A1--> (G V K V L V P) & M --A2--> (G V K V L V P V Q) & M --A3--> (G V K V L V P V Q V R) & M
G or K must be true before A4 for G to be true after A4
Each Clause is Satisfied by a Clause in the Initial Clausal State -- Done! (5 actions)
33
Using a Single, Unioned Graph. Union literals from all initial states into a conjunctive initial graph level. [Figure: one planning graph grown from the initial level P, Q, R, M through A1, A2, A3 (adding K, L) and A4, A5 (adding G).] Heuristic Estimate = 2. Easy to implement. Not effective. Lose world specific support information. Incorrect mutexes.
34
Using Multiple Graphs. [Figure: one planning graph per initial world: {P, M} via A1 and A4; {Q, M} via A2 and A4; {R, M} via A3 and A5; each reaching G.] Accurate Mutexes. Moderate Implementation Difficulty. Memory Intensive. Heuristic Computation Can be costly. Unioning these graphs a priori would give much savings …
35
Using a Single, Labeled Graph. Action Labels: Conjunction of Labels of Supporting Literals. Literal Labels: Disjunction of Labels of Supporting Actions. Label of a literal signifies the set of worlds in which it is supported --Full support means all init worlds. [Figure: one planning graph over P, Q, R, M, K, L, G and A1 to A5, with each node annotated by its label. Label Key: ~Q & ~R; ~P & ~R; ~P & ~Q; (~P & ~R) V (~Q & ~R); (~P & ~R) V (~Q & ~R) V (~P & ~Q); M; True.] Heuristic Value = 5. Memory Efficient. Cheap Heuristics. Scalable. Extensible. Tricky to Implement. Benefits from BDDs and a model checker. ATMS.
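Label propagation can be illustrated on the running example with explicit sets of worlds standing in for the BDD labels mentioned on the slide. This is a sketch under assumptions: the three initial worlds are named `wP`, `wQ`, `wR` (exactly one of P, Q, R holds, and M holds everywhere), mutexes are ignored, and the function only reports the first level at which the goal is supported in every world. It does not extract the labeled relaxed plan behind the slide's heuristic value.

```python
from typing import Dict, FrozenSet, Optional, Tuple

WORLDS = frozenset({"wP", "wQ", "wR"})  # hypothetical names for the initial worlds

# Level-0 labels: the set of initial worlds in which each literal holds.
INIT_LABELS: Dict[str, FrozenSet[str]] = {
    "P": frozenset({"wP"}), "Q": frozenset({"wQ"}),
    "R": frozenset({"wR"}), "M": WORLDS,
}

# name -> (preconditions, effect), following the slide's actions A1..A5.
ACTIONS: Dict[str, Tuple[Tuple[str, ...], str]] = {
    "A1": (("M", "P"), "K"), "A2": (("M", "Q"), "K"), "A3": (("M", "R"), "L"),
    "A4": (("K",), "G"), "A5": (("L",), "G"),
}


def first_fully_supported_level(init_labels, actions, goal, worlds,
                                max_levels: int = 10) -> Optional[int]:
    """First level where the goal's label covers every initial world, else None."""
    labels = dict(init_labels)
    for level in range(max_levels + 1):
        if labels.get(goal, frozenset()) == worlds:
            return level
        nxt = dict(labels)  # noops carry every label forward
        for pre, eff in actions.values():
            # Action label: conjunction (intersection) of its precondition labels.
            a_label = worlds
            for p in pre:
                a_label = a_label & labels.get(p, frozenset())
            # Literal label: disjunction (union) of its supporting action labels.
            if a_label:
                nxt[eff] = nxt.get(eff, frozenset()) | a_label
        if nxt == labels:  # leveled off without full support
            return None
        labels = nxt
    return None


level = first_fully_supported_level(INIT_LABELS, ACTIONS, "G", WORLDS)
print(level)  # prints: 2
```

Here K is labeled {wP, wQ} and L is labeled {wR} at level 1, so G only becomes supported in all three worlds at level 2, once both A4 and A5 contribute.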
36
CAltAlt Performance Label-graph based heuristics make CAltAlt competitive with the current best approaches
37
The Damage until now.. –Classical (regression) planning –AltAlt (AAAI 2000; AIJ 2002); AltAlt^p (JAIR 2003) Serial vs. Parallel graphs; Level and Adjusted heuristics; Partial expansion –Graphplan style search –GP-HSP (AIPS 2000); PEGG (IJCAI 2003; AAAI 1999) Variable/Value ordering heuristics based on distances –Partial order planning –RePOP (IJCAI 2001) Mutexes used to detect Indirect Conflicts –Metric Temporal Planning –Sapa (ECP 2001; AIPS 2002; JAIR 2003) Propagation of cost functions; Phased relaxation –Conformant Planning –CAltAlt (ICAPS Uncertainty Workshop, 2003) Multiple graphs; Labelled graphs. Still to come: PG Heuristics for: Probabilistic Conformant Planning; Conditional Planning; Lifted Planning; Trans-Atlantic camaraderie; Post-war reconstruction; Middle-east peace…
38
Meanwhile outside Tempe… Hoffmann’s FF uses relaxed plans from PG. Geffner & Haslum derive DP-versions of PG-heuristics. Gerevini & Serina’s LPG uses PG heuristics to cost the various repairs. Smith back-propagates (convolves) probability distributions over PG to decide the contingencies worth focusing on. Trinquart proposes a PG-clone that directly computes reachability in plan-space… …
39
Why do we love PG Heuristics? They work! They are “forgiving” – You don't like doing mutex? okay – You don't like growing the graph all the way? okay. Allow propagation of many types of information –Level, subgoal interaction, time, cost, world support, Support phased relaxation –E.g. Ignore mutexes and resources and bring them back later… Graph structure supports other synergistic uses –e.g. action selection Versatility…
40
Versatility of PG Heuristics. PG Variations: Serial, Parallel, Temporal, Labelled. Propagation Methods: Level, Mutex, Cost, Label. Planning Problems: Classical, Resource/Temporal, Conformant. Planners: Regression, Progression, Partial Order, Graphplan-style.