Presentation is loading. Please wait.

Presentation is loading. Please wait.

Planning Graph Based Reachability Heuristics Daniel Bryce & Subbarao Kambhampati IJCAI’07 Tutorial 12 January 8, 2007

Similar presentations


Presentation on theme: "Planning Graph Based Reachability Heuristics Daniel Bryce & Subbarao Kambhampati IJCAI’07 Tutorial 12 January 8, 2007"— Presentation transcript:

1 Planning Graph Based Reachability Heuristics Daniel Bryce & Subbarao Kambhampati IJCAI’07 Tutorial 12 January 8, 2007 http://rakaposhi.eas.asu.edu/pg-tutorial/ dan.bryce@asu.edudan.bryce@asu.edu http://verde.eas.asu.eduhttp://verde.eas.asu.edu rao@asu.edurao@asu.edu, http://rakaposhi.eas.asu.eduhttp://rakaposhi.eas.asu.edu Also See Our Tutorial Article in the Winter Issue of AI Magazine

2 January 18, 2007IJCAI'07 Tutorial T122 What is Planning?  Wikipedia says: Automated planning and scheduling is a branch of artificial intelligence that concerns the realization of strategies or action sequences, typically for execution by intelligent agents, autonomous robots and unmanned vehicles. Unlike classical control and classification problems, the solutions are complex, unknown and have to be discovered and optimized in multidimensional space.artificial intelligencestrategiesintelligent agents autonomous robotsunmanned vehiclescontrolclassification

3 January 18, 2007IJCAI'07 Tutorial T123 Why Automated Planning?  Manual Planning is slow and error prone  Autonomous agents must plan for themselves  Lots of Applications  Manufacturing: Xerox’s next generation copiers, supply chain management  Space: Mars Exploration Rovers, Deep Space One  Entertainment: non-player characters, narratives  Areas of CS: Web service composition, Database Query planning  Security: User (hacker) modeling  Defense: UAV Reconnaissance, Logistics, Tactical missions

4 January 18, 2007IJCAI'07 Tutorial T124 Classical Planning Problem  (P, A, I, G)  P: set of propositions  A: set of actions  pre(a) µ P: precondition propositions  eff + (a) µ P: positive effects  eff - (a) µ P: negative effects  I µ P: Initial State  G µ P: Goal propositions  Semantics  appl(a, s) : pre(a) µ s  s’ = succ(a, s) = s [ eff + (a) \ eff - (a)  s’ = succ({a1, a2, …, an}, s) = succ(an, …succ(a2, succ(a1, s)) …)  If succ({a1, a2, …, an}, I) µ G, then {a1, a2, …, an} is a correct plan

5 January 18, 2007IJCAI'07 Tutorial T125 Why is Planning Hard?  Subgoal Interactions make planning hard  The effect of an action can help or hinder the plan:  Enable later actions  Support a goal  Undo the effect of previous actions  Prevent the execution of subsequent actions  Plan Synthesis must find a sequence of actions that is executable and achieves the goals  Classical Planning Problems:  Finite Length: NP-Hard  Indefinite Length: PSPACE-Complete  Beyond Classical Planning it is more difficult:  Non-deterministic Planning: EXP-Complete to 2EXP-Complete  Probabilistic Planning: NP PP -Complete to Undecidable

6 January 18, 2007IJCAI'07 Tutorial T126 Scalability of Planning  Before, planning algorithms could synthesize about 6 – 10 action plans in minutes  Significant scale- up in the last 6-7 years  Now, we can synthesize 100 action plans in seconds. Realistic encodings of Munich airport! The primary revolution in planning in the recent years has been domain-independent heuristics to scale up plan synthesis Problem is Search Control!!!

7 January 18, 2007IJCAI'07 Tutorial T127 Motivation  Ways to improve Planner Scalability  Problem Formulation  Search Space  Reachability Heuristics  Domain (Formulation) Independent  Work for many search spaces  Flexible – work with most domain features  Overall complement other scalability techniques  Effective!!

8 January 18, 2007IJCAI'07 Tutorial T128 Topics  Classical Planning  Cost Based Planning  Partial Satisfaction Planning   Non-Deterministic/Probabilistic Planning  Resources (Continuous Quantities)  Temporal Planning  Wrap-up Rao Dan

9 January 18, 2007IJCAI'07 Tutorial T129 Classical Planning

10 January 18, 2007IJCAI'07 Tutorial T1210 Rover Domain

11 January 18, 2007IJCAI'07 Tutorial T1211 Classical Planning  Relaxed Reachability Analysis  Types of Heuristics  Level-based  Relaxed Plans  Mutexes  Heuristic Search  Progression  Regression  Plan Space  Exploiting Heuristics

12 January 18, 2007IJCAI'07 Tutorial T1212 Planning Graph and Search Tree  Envelope of Progression Tree (Relaxed Progression)  Proposition lists: Union of states at k th level  Lowerbound reachability information

13 January 18, 2007IJCAI'07 Tutorial T1213 Level Based Heuristics  The distance of a proposition is the index of the first proposition layer in which it appears  Proposition distance changes when we propagate cost functions – described later  What is the distance of a Set of propositions??  Set-Level: Index of first proposition layer where all goal propositions appear  Admissible  Gets better with mutexes, otherwise same as max  Max: Maximum distance proposition  Sum: Summation of proposition distances

14 January 18, 2007IJCAI'07 Tutorial T1214 Example of Level Based Heuristics set-level(s I, G) = 3 max(s I, G) = max(2, 3, 3) = 3 sum(s I, G) = 2 + 3 + 3 = 8

15 January 18, 2007IJCAI'07 Tutorial T1215 Distance of a Set of Literals SumSet-Level Partition-kAdjusted SumCombo Set-Level with memos h(S) =  p  S lev({p}) h(S) = lev(S) Admissible  lev(p) : index of the first level at which p comes into the planning graph  lev(S): index of the first level where all props in S appear non-mutexed.  If there is no such level, then If the graph is grown to level off, then  Else k+1 (k is the current length of the graph)

16 January 18, 2007IJCAI'07 Tutorial T1216 How do Level-Based Heuristics Break? g A p1p1 p2p2 p3p3 p 99 p 100 p1p1 p2p2 p3p3 p 99 p 100 B1 q B2 B3 B99 B100 qq B1 B2 B3 B99 B100 A1A1 P1P1 P2P2 A0A0 P0P0 The goal g is reached at level 2, but requires 101 actions to support it.

17 January 18, 2007IJCAI'07 Tutorial T1217 Relaxed Plan Heuristics  When Level does not reflect distance well, we can find a relaxed plan.  A relaxed plan is subgraph of the planning graph, where:  Every goal proposition is in the relaxed plan at the last level  Every proposition in the relaxed plan has a supporting action in the relaxed plan  Every action in the relaxed plan has its preconditions supported.  Relaxed Plans are not admissible, but are generally effective.  Finding the optimal relaxed plan is NP-hard, but finding a greedy one is easy. Later we will see how “greedy” can change.  Later, in over-subscription planning, we’ll see that the effort of finding an optimal relaxed plan is worthwhile

18 January 18, 2007IJCAI'07 Tutorial T1218 Example of Relaxed Plan Heuristic Count Actions RP(s I, G) = 8 Identify Goal Propositions Support Goal Propositions Individually

19 January 18, 2007IJCAI'07 Tutorial T1219 Results Relaxed Plan Level-Based

20 January 18, 2007IJCAI'07 Tutorial T1220 Optimizations in Heuristic Computation  Taming Space/Time costs  Bi-level Planning Graph representation  Partial expansion of the PG (stop before level-off)  It is FINE to cut corners when using PG for heuristics (instead of search)!!  Branching factor can still be quite high  Use actions appearing in the PG (complete)  Select actions in lev(S) vs Levels-off (incomplete)  Consider action appearing in RP (incomplete)

21 January 18, 2007IJCAI'07 Tutorial T1221 Adjusting for Negative Interactions  Until now we assumed actions only positively interact. What about negative interactions?  Mutexes help us capture some negative interactions  Types  Actions: Interference/Competing Needs  Propositions: Inconsistent Support  Binary are the most common and practical  |A| + 2|P|-ary will allow us to solve the planning problem with a backtrack-free GraphPlan search  An action layer may have |A| actions and 2|P| noops  Serial Planning Graph assumes all non-noop actions are mutex

22 January 18, 2007IJCAI'07 Tutorial T1222 Binary Mutexes at(  ) sample(soil,  ) drive( ,  ) drive( ,  ) avail(soil,  ) avail(rock,  ) avail(image,  ) at(  ) avail(soil,  ) avail(rock,  ) avail(image,  ) at(  ) at(  ) have(soil) sample(soil,  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) avail(rock,  ) avail(image,  ) at(  ) at(  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) commun(soil) sample(rock,  ) sample(image,  ) drive( ,  ) have(image) comm(soil) have(rock) A1A1 A0A0 P1P1 P0P0 P2P2 Set-Level(s I, {at(  ),.have(soil)}) = 2 Max(s I, {at(  ),. have(soil)}) = 1 sample needs at(  ), drive negates at(  ) --Interference-- have(soil) only supporter is mutex with at(  ) only supporter -- Inconsistent Support-- have(soil) has a supporter not mutex with a supporter of at(  ), --feasible together at level 2--

23 January 18, 2007IJCAI'07 Tutorial T1223 Adjusting the Relaxed Plans  Start with RP heuristic and adjust it to take subgoal interactions into account  Negative interactions in terms of “degree of interaction”  Positive interactions in terms of co-achievement links  Ignore negative interactions when accounting for positive interactions (and vice versa)  It is NP-hard to find a plan (when there are mutexes), and even NP-hard to find an optimal relaxed plan.  It is easier to add a penalty to the heuristic for ignored mutexes [AAAI 2000] HAdjSum2M(S) = length(RelaxedPlan(S)) + max p,q  S  (p,q) Where  (p,q) = lev({p,q}) – max{lev(p), lev(q)} /*Degree of –ve Interaction */

24 January 18, 2007IJCAI'07 Tutorial T1224 Anatomy of a State-space Regression planner [AAAI 2000; AIPS 2000; AIJ 2002; JAIR 2003] Problem: Given a set of subgoals (regressed state) estimate how far they are from the initial state

25 January 18, 2007IJCAI'07 Tutorial T1225 Rover Example in Regression comm(soil) comm(rock) comm(image) commun(image) comm(soil) comm(rock) have(image) comm(soil) have(rock) have(image) comm(soil) avail(rock,  ) have(image) at(  ) sample(rock,  ) commun(rock) comm(soil) avail(rock,  ) avail(image,  ) at(  ) at(  ) sample(image,  ) sGsG s1s1 s2s2 s3s3 s4s4 Sum(s I, s 3 ) = 0+1+2+2 =5 Sum(s I, s 4 ) = 0+0+1+1+2 =4 Should be ∞, s 4 is inconsistent, how do we improve the heuristic??

26 January 18, 2007IJCAI'07 Tutorial T1226 AltAlt Performance Logistics Scheduling Problem sets from IPC 2000 Adjusted RP Level-based

27 January 18, 2007IJCAI'07 Tutorial T1227 In the beginning it was all POP. Then it was cruelly UnPOPped The good times return with Re(vived)POP Plan Space Search

28 January 18, 2007IJCAI'07 Tutorial T1228 POP Algorithm 1.Plan Selection: Select a plan P from the search queue 2. Flaw Selection: Choose a flaw f (open cond or unsafe link) 3. Flaw resolution: If f is an open condition, choose an action S that achieves f If f is an unsafe link, choose promotion or demotion Update P Return NULL if no resolution exist 4. If there is no flaw left, return P S0S0 S1S1 S2S2 S3S3 S in f p ~p g1g1 g2g2 g2g2 oc 1 oc 2 q1q1 Choice points Flaw selection (open condition? unsafe link? Non-backtrack choice) Flaw resolution/Plan Selection (how to select (rank) partial plan?) S0S0 S inf g1g2g1g2 1. Initial plan: 2. Plan refinement (flaw selection and resolution):

29 January 18, 2007IJCAI'07 Tutorial T1229  Distance heuristics to estimate cost of partially ordered plans (and to select flaws)  If we ignore negative interactions, then the set of open conditions can be seen as a regression state  Mutexes used to detect indirect conflicts in partial plans  A step threatens a link if there is a mutex between the link condition and the steps’ effect or precondition  Post disjunctive precedences and use propagation to simplify PG Heuristics for Partial Order Planning

30 January 18, 2007IJCAI'07 Tutorial T1230 Regression and Plan Space comm(soil) comm(rock) comm(image) commun(image) comm(soil) comm(rock) have(image) comm(soil) have(rock) have(image) comm(soil) avail(rock,  ) have(image) at(  ) sample(rock,  ) S1S1 comm(rock) comm(image) comm(soil) S0S0 avail(rock,  ) avail(image,  ) avail(soil,  ) at(  ) commun(image) S1S1 comm(image) have(image) commun(rock) S2S2 comm(rock) have(rock) sample(rock,  ) S3S3 have(rock) avail(rock,  ) at(  ) commun(rock) comm(soil) avail(rock,  ) avail(image,  ) at(  ) at(  ) sample(image,  ) S4S4 have(image) avail(image,  ) at(  ) S1S1 comm(rock) comm(image) comm(soil) S0S0 avail(rock,  ) avail(image,  ) avail(soil,  ) at(  ) commun(image) S1S1 comm(image) have(image) commun(rock) S2S2 comm(rock) have(rock) sample(rock,  ) S3S3 have(rock) avail(rock,  ) at(  ) sGsG s1s1 s2s2 s3s3 s4s4

31 January 18, 2007IJCAI'07 Tutorial T1231 RePOP’s Performance  RePOP implemented on top of UCPOP  Dramatically better than any other partial order planner before it  Competitive with Graphplan and AltAlt  VHPOP carried the torch at ICP 2002 [IJCAI, 2001] You see, pop, it is possible to Re-use all the old POP work! Written in Lisp, runs on Linux, 500MHz, 250MB

32 January 18, 2007IJCAI'07 Tutorial T1232 Exploiting Planning Graphs  Restricting Action Choice  Use actions from:  Last level before level off (complete)  Last level before goals (incomplete)  First Level of Relaxed Plan (incomplete) – FF’s helpful actions  Only action sequences in the relaxed plan (incomplete) – YAHSP  Reducing State Representation  Remove static propositions. A static proposition is only ever true or false in the last proposition layer.

33 January 18, 2007IJCAI'07 Tutorial T1233 Classical Planning Conclusions  Many Heuristics  Set-Level, Max, Sum, Relaxed Plans  Heuristics can be improved by adjustments  Mutexes  Useful for many types of search  Progresssion, Regression, POCL

34 January 18, 2007IJCAI'07 Tutorial T1234 Cost-Based Planning

35 January 18, 2007IJCAI'07 Tutorial T1235 Cost-based Planning  Propagating Cost Functions  Cost-based Heuristics  Generalized Level-based heuristics  Relaxed Plan heuristics

36 January 18, 2007IJCAI'07 Tutorial T1236 Rover Cost Model

37 January 18, 2007IJCAI'07 Tutorial T1237 Cost Propagation at(  ) sample(soil,  ) drive( ,  ) drive( ,  ) avail(soil,  ) avail(rock,  ) avail(image,  ) at(  ) avail(soil,  ) avail(rock,  ) avail(image,  ) at(  ) at(  ) have(soil) sample(soil,  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) avail(rock,  ) avail(image,  ) at(  ) at(  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) commun(soil) sample(rock,  ) sample(image,  ) drive( ,  ) have(image) comm(soil) have(rock) 20 10 30 20 10 30 20 35 10 30 35 25 15 35 40 20 35 10 25 A1A1 A0A0 P1P1 P0P0 P2P2 Cost Reduces because Of different supporter At a later level

38 January 18, 2007IJCAI'07 Tutorial T1238 at(  ) Cost Propagation (cont’d) avail(soil,  ) avail(rock,  ) avail(image,  ) at(  ) at(  ) have(soil) have(image) comm(soil) have(rock) commun(image) commun(rock) comm(image) comm(rock) sample(soil,  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) avail(rock,  ) avail(image,  ) at(  ) at(  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) commun(soil) sample(rock,  ) sample(image,  ) drive( ,  ) have(image) comm(soil) have(rock) commun(image) commun(rock) comm(image) comm(rock) sample(soil,  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) avail(rock,  ) avail(image,  ) at(  ) at(  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) commun(soil) sample(rock,  ) sample(image,  ) drive( ,  ) have(image) comm(soil) have(rock) 20 10 30 20 10 30 20 35 10 25 30 35 40 25 30 40 15 25 30 25 15 35 A2A2 P2P2 P3P3 A3A3 P4P4 1-lookahead

39 January 18, 2007IJCAI'07 Tutorial T1239 Terminating Cost Propagation  Stop when:  goals are reached (no-lookahead)  costs stop changing (∞-lookahead)  k levels after goals are reached (k-lookahead)

40 January 18, 2007IJCAI'07 Tutorial T1240 Guiding Relaxed Plans with Costs Start Extract at last level (goal proposition is cheapest)

41 January 18, 2007IJCAI'07 Tutorial T1241 Cost-Based Planning Conclusions  Cost-Functions:  Remove false assumption that level is correlated with cost  Improve planning with non-uniform cost actions  Are cheap to compute (constant overhead)

42 January 18, 2007IJCAI'07 Tutorial T1242 Partial Satisfaction (Over-Subscription) Planning

43 January 18, 2007IJCAI'07 Tutorial T1243 Partial Satisfaction Planning  Selecting Goal Sets  Estimating goal benefit  Anytime goal set selection  Adjusting for negative interactions between goals

44 January 18, 2007IJCAI'07 Tutorial T1244 Actions have execution costs, goals have utilities, and the objective is to find the plan that has the highest net benefit. Partial Satisfaction (Oversubscription) Planning In many real world planning tasks, the agent often has more goals than it has resources to accomplish. Example: Rover Mission Planning (MER) Need automated support for Over-subscription/Partial Satisfaction Planning

45 January 18, 2007IJCAI'07 Tutorial T1245 Adapting PG heuristics for PSP  Challenges:  Need to propagate costs on the planning graph  The exact set of goals are not clear  Interactions between goals  Obvious approach of considering all 2 n goal subsets is infeasible  Idea: Select a subset of the top level goals upfront  Challenge: Goal interactions  Approach: Estimate the net benefit of each goal in terms of its utility minus the cost of its relaxed plan  Bias the relaxed plan extraction to (re)use the actions already chosen for other goals

46 January 18, 2007IJCAI'07 Tutorial T1246 Goal Set Selection In Rover Problem comm(soil) comm(rock)comm(image) 50-25 = 25 60-40 = 20 20-35 = -15 Found By Cost Propagation comm(rock)comm(image) 50-25 = 25 Found By RP 110-65 = 45 Found By Biased RP 70-60 = 20 comm(image) 130-100 = 30

47 January 18, 2007IJCAI'07 Tutorial T1247 SAPA PS (anytime goal selection)  A* Progression search  g-value: net-benefit of plan so far  h-value: relaxed plan estimate of best goal set  Relaxed plan found for all goals  Iterative goal removal, until net benefit does not increase  Returns plans with increasing g-values.

48 January 18, 2007IJCAI'07 Tutorial T1248 Some Empirical Results for AltAlt ps [AAAI 2004] Exact algorithms based on MDPs don’t scale at all

49 January 18, 2007IJCAI'07 Tutorial T1249 Adjusting for Negative Interactions (AltWlt)  Problem:  What if the apriori goal set is not achievable because of negative interactions?  What if greedy algorithm gets bad local optimum?  Solution:  Do not consider mutex goals  Add penalty for goals whose relaxed plan has mutexes.  Use interaction factor to adjust cost, similar to adjusted sum heuristic  max g1, g2 2 G {lev(g1, g2) – max(lev(g1), lev(g2)) }  Find Best Goal set for each goal

50 January 18, 2007IJCAI'07 Tutorial T1250 Goal Utility Dependencies High-res photo utility: 150 Low-res photo utility: 100 Both goals: Remove 80 High-res photo utility: 150 Soil sample utility: 100 Both goals: Additional 200 Cost: 50 Utility: 0 Cost: 100 Utility: 500 BAD GOOD

51 January 18, 2007IJCAI'07 Tutorial T1251 Dependencies cost dependencies Goals share actions in the plan trajectory Defined by the plan Goals may interact in the utility they give Explicitly defined by user utility dependencies Exists in classical planning, but is a larger issue in PSP No need to consider this in classical planning goal interactions exist as two distinct types All PSP PlannersSPUDS Our planner based on Sapa PS AltWlt, Optiplan, Sapa PS See talk Thurs. (11 th) in session G4 (Planning and Scheduling II)

52 January 18, 2007IJCAI'07 Tutorial T1252 (have left-shoe) (have right-shoe) +500 (have-image low-res loc1) 100 (have-image high-res loc1) (have-soil loc1) +200 Defining Goal Dependencies (have-image high-res loc1) 150 (have-soil loc1) 100 (have-image high-res loc1) (have-image low-res loc1) -80

53 January 18, 2007IJCAI'07 Tutorial T1253 Sapa PS g2 g3 g4 a1: Move(l1,l2) a2: Calibrate(camera) a3: Sample(l2) a4: Take_high_res(l2) a5: Take_low_res(l2) a6: Sample(l1) At(l1) At(l2)Calibrated c a1 =50 c a2 =20 c a3 = 40 c a4 =40 c a5 =25 g1 c a6 =40 Example of a relaxed plan (RP) in a rovers domain RP as a tree g1: Sample(l1) g2: Sample(l2) g3: HighRes(l2) g4: lowRes(l2) Propagate costs on relaxed planning graph Using the RP as a tree lets us remove supporting actions as we remove goals We remove goals whose relaxed cost outweighs utility Anytime Forward Search

54 January 18, 2007IJCAI'07 Tutorial T1254 SPUDS a1: Move(l1,l2) a2: Calibrate(camera) a3: Sample(l2) a4: Take_high_res(l2) a5: Take_low_res(l2) a6: Sample(l1) At(l1) At(l2)Calibrated c a1 =50 c a2 =20 g1 c a6 =40 RP as a tree In Sapa PS goals are removed independently (and in pairs) If cost > utility, the goal is removed. …but now we have more complicated goal interactions. New challenge: Goals no longer independent How do we select the best set of goals? Idea: Use declarative programming g1: Sample(l1) g2: Sample(l2) g3: HighRes(l2) g4: lowRes(l2) g3 c a4 =40 g4 c a5 =25 Example of a relaxed plan (RP) in a rovers domain

55 January 18, 2007IJCAI'07 Tutorial T1255 Heuristically Identifying Goals g2 g3 g4 At(l1) At(l2)Calibrated c a1 =50 c a2 =20 c a3 = 40 c a4 =40 c a5 =25 g1 c a6 =40 a1: Move(l1,l2) a2: Calibrate(camera) a3: Sample(l2) a4: Take_high_res(l2) a5: Take_low_res(l2) a6: Sample(l1) g1: Sample(l1) g2: Sample(l2) g3: HighRes(l2) g4: lowRes(l2) We can encode all goal sets and actions

56 January 18, 2007IJCAI'07 Tutorial T1256 Heuristically Identifying Goals goals goal utility dependencies actions Binary Variables We encode the relaxed plan Select goals in dependency Select actions for goals Select goals in dependency Constraints gives all goals actionis involved in reaching Objective function: wheregives the utility given by the goal set,cost ofand

57 January 18, 2007IJCAI'07 Tutorial T1257 Heuristically Identifying Goals g2 g3 g4 At(l1) At(l2)Calibrated c a1 =50 c a2 =20 c a3 = 40 c a4 =40 c a5 =25 g1 c a6 =40 a1: Move(l1,l2) a2: Calibrate(camera) a3: Sample(l2) a4: Take_high_res(l2) a5: Take_low_res(l2) a6: Sample(l1) g1: Sample(l1) g2: Sample(l2) g3: HighRes(l2) g4: lowRes(l2) We can now remove combinations of goals that give inoptimal relaxed solutions We are finding optimal goals given the (inadmissible) relaxed plan

58 January 18, 2007IJCAI'07 Tutorial T1258 Results

59 January 18, 2007IJCAI'07 Tutorial T1259 Results Only in this instance does Sapa PS do slightly better

60 January 18, 2007IJCAI'07 Tutorial T1260 PSP Conclusions  Goal Set Selection  Apriori for Regression Search  Anytime for Progression Search  Both types of search use greedy goal insertion/removal to optimize net-benefit of relaxed plans

61 January 18, 2007IJCAI'07 Tutorial T1261 Non-Deterministic Planning

62 January 18, 2007IJCAI'07 Tutorial T1262 Non-Deterministic Planning  Belief State Distance  Multiple Planning Graphs  Labelled Uncertainty Graph  Implicit Belief states and the CFF heuristic

63 January 18, 2007IJCAI'07 Tutorial T1263 Conformant Rover Problem

64 January 18, 2007IJCAI'07 Tutorial T1264 Search in Belief State Space at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) sample(soil,  ) drive( ,  ) have(soil) drive( ,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) drive( ,  ) have(soil) drive( ,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) sample(soil,  ) drive(b,  ) have(soil) sample(soil,  ) at(  ) avail(soil,  ) have(soil) at(  ) avail(soil,  ) have(soil) at(  ) avail(soil,  )

65 January 18, 2007IJCAI'07 Tutorial T1265 Belief State Distance BS 3 BS 1 BS 2 20 40 30 15 5 10 Max = 15, 20 Sum = 20, 20 Union = 17, 20 [ICAPS 2004] P2 P1 Compute Classical Planning Distance Measures Assume can reach closest goal state Aggregate State Distances

66 January 18, 2007IJCAI'07 Tutorial T1266 Belief State Distance BS 3 BS 1 15 5 Union = 17 a1 a3 a4 a5 a8 a44 a27 a14 a22 a37 a19 a50 a80 a34 a11 a19 a3 a4 a5 a13 a19 a13 P1 Capture Positive Interaction & Independence Estimate Plans for each state pair [ICAPS 2004]

67 January 18, 2007IJCAI'07 Tutorial T1267 State Distance Aggregations Max Sum Union Under-estimates Over-estimates Just Right [Bryce et.al, 2005]

68 January 18, 2007IJCAI'07 Tutorial T1268 at(  ) sample(soil,  ) drive( ,  ) drive( ,  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) at(  ) have(soil) at(  ) drive( ,  ) drive( ,  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) at(  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) sample(soil,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) at(  ) drive( ,  ) drive( ,  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) at(  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) sample(soil,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) sample(soil,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) commun(soil) comm(soil) commun(soil) comm(soil) sample(soil,  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) have(soil) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) sample(soil,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) commun(soil) comm(soil) A1A1 A0A0 A2A2 P1P1 P0P0 P2P2 P3P3 Multiple Planning Graphs Build A planning Graph For each State in the belief state Extract Relaxed Plans from each Step-wise union relaxed plans 1 3 3 h = 7

69 January 18, 2007IJCAI'07 Tutorial T1269 sample(soil,  ) avail(soil,  ) have(soil) at(  ) drive( ,  ) drive( ,  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) at(  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) sample(soil,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) avail(soil,  ) sample(soil,  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) sample(soil,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) commun(soil) comm(soil) commun(soil) comm(soil) sample(soil,  ) avail(soil,  ) avail(soil,  ) sample(soil,  ) sample(soil,  ) avail(soil,  ) A1A1 A0A0 A2A2 P1P1 P0P0 P2P2 P3P3 Labelled Uncertainty Graph Labels correspond To sets of states, and Are represented as Propositional formulas at(  ) Æ ( avail(soil,  ) Ç avail(soil,  ) Ç avail(soil,  ) ) Æ : at(  ) Æ … Action labels are the Conjunction (intersection) of their Precondition labels Effect labels are the Disjunction (union) of supporter labels Stop when goal is Labeled with every state

70 January 18, 2007IJCAI'07 Tutorial T1270 sample(soil,  ) avail(soil,  ) have(soil) at(  ) drive( ,  ) drive( ,  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) at(  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) sample(soil,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) avail(soil,  ) sample(soil,  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) sample(soil,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) commun(soil) comm(soil) commun(soil) comm(soil) sample(soil,  ) avail(soil,  ) avail(soil,  ) sample(soil,  ) sample(soil,  ) avail(soil,  ) A1A1 A0A0 A2A2 P1P1 P0P0 P2P2 P3P3 Labelled Relaxed Plan Must Pick Enough Supporters to cover the (sub)goals Subgoals and Supporters Need not be used for Every state where they are reached (labeled) h = 6

71 January 18, 2007IJCAI'07 Tutorial T1271 Comparison of Planning Graph Types [JAIR, 2006] LUG saves Time Sampling more worlds, Increases cost of MG Sampling more worlds, Improves effectiveness of LUG

72 January 18, 2007IJCAI'07 Tutorial T1272 State Agnostic Planning Graphs (SAG)  LUG represents multiple explicit planning graphs  SAG uses LUG to represent a planning graph for every state  The SAG is built once per search episode and we can use it for relaxed plans for every search node, instead of building a LUG at every node  Extract relaxed plans from SAG by ignoring planning graph components not labeled by states in our search node.

73 January 18, 2007IJCAI'07 Tutorial T1273 SAG 1 3 1 5 1 3 4 5 1 3 5 o 12 o 34 o 56 2 1 3 4 5 o 12 o 34 o 23 o 45 o 56 2 66 7 o 67 oGoG G GG oGoG oGoG 3 5  Ignore irrelevant labels  Largest LUG == all LUGs [ AAAI, 2005] Build a LUG for all states (union of all belief states)

74 January 18, 2007IJCAI'07 Tutorial T1274 Belief Space ProblemsClassical Problems Conformant Conditional

75 January 18, 2007IJCAI'07 Tutorial T1275 Non-Deterministic Planning Conclusions  Measure positive interaction and independence between states co- transitioning to the goal via overlap  Labeled planning graphs efficiently measure conformant plan distance  Conformant planning heuristics work for conditional planning without modification

76 January 18, 2007IJCAI'07 Tutorial T1276 Stochastic Planning

77 January 18, 2007IJCAI'07 Tutorial T1277 Stochastic Rover Example [ICAPS 2006]

78 January 18, 2007IJCAI'07 Tutorial T1278 Search in Probabilistic Belief State Space at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) sample(soil,  ) drive( ,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) sample(soil,  ) 0.4 0.5 0.1 at(  ) avail(soil,  ) 0.36 0.5 0.1 0.4 0.5 0.1 0.04 at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) have(soil) at(  ) avail(soil,  ) 0.36 0.5 0.1 0.04 at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) have(soil) at(  ) avail(soil,  ) 0.45 0.1 0.04 0.36 at(  ) avail(soil,  ) have(soil) 0.05

79 January 18, 2007IJCAI'07 Tutorial T1279 Handling Uncertain Actions  Extending LUG to handle uncertain actions requires label extension that captures:  State uncertainty (as before)  Action outcome uncertainty  Problem: Each action at each level may have a different outcome. The number of uncertain events grows over time – meaning the number of joint outcomes of events grows exponentially with time  Solution: Not all outcomes are important. Sample some of them – keep number of joint outcomes constant. [ICAPS 2006]

80 January 18, 2007IJCAI'07 Tutorial T1280 Monte Carlo LUG (McLUG)  Use Sequential Monte Carlo in the Relaxed Planning Space  Build several deterministic planning graphs by sampling states and action outcomes  Represent set of planning graphs using LUG techniques  Labels are sets of particles  Sample which Action outcomes get labeled with particles  Bias relaxed plan by picking actions labeled with most particles to prefer more probable support [ICAPS 2006]

81 January 18, 2007IJCAI'07 Tutorial T1281 at(  ) avail(soil,  ) Relaxed Conformant GraphPlan (CGP) at(  ) sample(soil,  ) drive( ,  ) drive( ,  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) at(  ) have(soil) comm(soil) at(  ) avail(soil,  ) at(  ) at(  ) have(soil) A1A1 A0A0 P1P1 P0P0 P2P2 at(  ) avail(soil,  ) at(  ) at(  ) at(  ) avail(soil,  ) at(  ) at(  ) have(soil) comm(soil) at(  ) avail(soil,  ) at(  ) at(  ) have(soil) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) have(soil) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) 0.4 0.5 0.1 commun(soil) sample(soil,  ) Generate a proposition layer for each joint Outcome of actions Initial Proposition Layer For Each Possible State

82 January 18, 2007IJCAI'07 Tutorial T1282 CGP-style planning graph 0.4 0.5 0.1 Planning Graph is a tree Of Deterministic Planning Graph Branches

83 January 18, 2007IJCAI'07 Tutorial T1283 Monte Carlo CGP 0.4 0.5 0.1 P(G) = 5/16 = 0.3125 P(G) = 8/16 = 0.5 P(G) = 13/16 = 0.8125 Problem: Have Multiple Planning Graphs, Which can still be costly Solution: Use Labeled Planning Graph

84 January 18, 2007IJCAI'07 Tutorial T1284 Monte Carlo LUG (McLUG) -- Initial Layer at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) 0.4 0.5 0.1 at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) avail(soil,  ) N = 4 Sample a State for Each particle Form Initial Layer at(  ) avail(soil,  ) avail(soil,  )

85 January 18, 2007IJCAI'07 Tutorial T1285 McLUG for rover example sample(soil,  ) avail(soil,  ) have(soil) at(  ) drive( ,  ) drive( ,  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) at(  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) sample(soil,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) sample(soil,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) commun(soil) comm(soil) commun(soil) comm(soil) sample(soil,  ) avail(soil,  ) sample(soil,  ) avail(soil,  ) A1A1 A0A0 A2A2 P1P1 P0P0 P2P2 P3P3 Sample States for Initial layer -- avail(soil,  ) not sampled Particles in action Label must sample action outcome ¼ of particles Support goal, need At least ½ ¾ of particles Support goal, Okay to stop [ICAPS 2006]

86 January 18, 2007IJCAI'07 Tutorial T1286 Monte Carlo LUG (McLUG) sample(soil,  ) avail(soil,  ) have(soil) at(  ) drive( ,  ) drive( ,  ) avail(soil,  ) at(  ) avail(soil,  ) at(  ) at(  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) sample(soil,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) at(  ) avail(soil,  ) at(  ) at(  ) sample(soil,  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) drive( ,  ) commun(soil) comm(soil) commun(soil) comm(soil) sample(soil,  ) avail(soil,  ) sample(soil,  ) avail(soil,  ) A1A1 A0A0 A2A2 P1P1 P0P0 P2P2 P3P3 Must Support Goal in All Particles Pick Persistence By default, Commun covers Other particles Support Preconditions for Particles the Action supports

87 January 18, 2007IJCAI'07 Tutorial T1287 Logistics Domain Results P2-2-2 time (s) P4-2-2 time (s)P2-2-4 time (s) P2-2-2 length P4-2-2 lengthP2-2-4 length [ICAPS 2006] Scalable, w/ reasonable quality

88 January 18, 2007IJCAI'07 Tutorial T1288 Grid Domain Results Grid(0.8) time(s) Grid(0.8) length Grid(0.5) time(s) Grid(0.5) length Again, good scalability and quality! Need More Particles for broad beliefs [ICAPS 2006]

89 January 18, 2007IJCAI'07 Tutorial T1289 Stochastic Planning Conclusions  Number of joint action outcomes too large  Sampling outcomes to represent in labels is much faster than exact representation  SMC gives us a good way to use multiple planning graph for heuristics, and the McLUG helps keep the representation small

90 January 18, 2007IJCAI'07 Tutorial T1290 Planning with Resources

91 January 18, 2007IJCAI'07 Tutorial T1291 Planning with Resources  Propagating Resource Intervals  Relaxed Plans  Handling resource subgoals

92 January 18, 2007IJCAI'07 Tutorial T1292 Rover with power Resource Resource Usage, Same as costs for This example

93 January 18, 2007IJCAI'07 Tutorial T1293 Resource Intervals at(  ) avail(soil,  ) avail(rock,  ) avail(image,  ) sample(soil,  ) drive( ,  ) at(  ) avail(soil,  ) avail(rock,  ) avail(image,  ) at(  ) have(soil) sample(soil,  ) drive( ,  ) recharge at(  ) avail(soil,  ) avail(rock,  ) avail(image,  ) at(  ) at(  ) have(soil) drive( ,  ) drive( ,  ) commun(soil) sample(rock,  ) comm(soil) have(rock) power [-5,50] recharge power [25,25] drive(  ) power [-115,75] 530 5 A1A1 A0A0 P1P1 P0P0 P2P2 Resource Interval assumes independence among Consumers/Producers UB: UB(noop) + recharge = 25 + 25 = 50 LB: LB(noop) + sample + drive = 25 + (-20) + (-10) = -5 UB: UB(noop) + recharge = 50 + 25 = 75 LB: LB(noop) + drive + sample + drive + sample + commun + drive + drive = -5 + (-30) + (-20) + (-10) + (-25) + (-5) + (-15) + (-5) = -115

94 January 18, 2007IJCAI'07 Tutorial T1294 Resource Intervals (cont’d) at(  ) avail(soil,  ) avail(rock,  ) avail(image,  ) at(  ) at(  ) have(soil) comm(soil) have(rock) commun(image) commun(rock) comm(image) comm(rock) at(  ) at(  ) at(  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) commun(soil) sample(image,  ) drive( ,  ) have(image) comm(soil) have(rock) sample(soil,  ) drive( ,  ) recharge avail(soil,  ) avail(rock,  ) avail(image,  ) drive(  ) sample(rock,  ) commun(rock) comm(rock) at(  ) at(  ) at(  ) have(soil) drive( ,  ) drive( ,  ) drive( ,  ) commun(soil) sample(image,  ) drive( ,  ) have(image) comm(soil) have(rock) sample(soil,  ) drive( ,  ) recharge avail(soil,  ) avail(rock,  ) avail(image,  ) power [-115,75] power [-250,100] power [-390,125] drive(  ) sample(rock,  ) 55 A2A2 P2P2 P3P3 A3A3 P4P4

95 January 18, 2007IJCAI'07 Tutorial T1295 Relaxed Plan Extraction with Resources Start Extraction as before Track “Maximum” Resource Requirements For actions chosen at each level May Need more than One Supporter For a resource!!

96 January 18, 2007IJCAI'07 Tutorial T1296 Results

97 January 18, 2007IJCAI'07 Tutorial T1297 Results (cont’d)

98 January 18, 2007IJCAI'07 Tutorial T1298 Planning With Resources Conclusion  Resource Intervals allow us to be optimistic about reachable values  Upper/Lower bounds can get large  Relaxed Plans may require multiple supporters for subgoals  Negative Interactions are much harder to capture

99 January 18, 2007IJCAI'07 Tutorial T1299 Temporal Planning

100 January 18, 2007IJCAI'07 Tutorial T12100 Temporal Planning  Temporal Planning Graph  From Levels to Time Points  Delayed Effects  Estimating Makespan  Relaxed Plan Extraction

101 January 18, 2007IJCAI'07 Tutorial T12101 Rover with Durative Actions

102 January 18, 2007IJCAI'07 Tutorial T12102 Search through time-stamped states  Goal Satisfaction: S=(P,M, ,Q,t)  G if   G either:    P, t j < t i and no event in Q deletes p i.   e  Q that adds p i at time t e < t i.  Action Application: Action A is applicable in S if:  All instantaneous preconditions of A are satisfied by P and M.  A’s effects do not interfere with  and Q.  No event in Q interferes with persistent preconditions of A.  A does not lead to concurrent resource change  When A is applied to S:  P is updated according to A’s instantaneous effects.  Persistent preconditions of A are put in   Delayed effects of A are put in Q.

103 January 18, 2007IJCAI'07 Tutorial T12103 Temporal Planning Record First time Point Action/Proposition is First Reachable Assume Latest Start Time for actions in RP

104 January 18, 2007IJCAI'07 Tutorial T12104 SAPA at IPC-2002 Rover (time setting) Satellite (complex setting) [JAIR 2003]

105 January 18, 2007IJCAI'07 Tutorial T12105 Temporal Planning Conclusion  Levels become Time Points  Makespan and plan length/cost are different objectives  Set-Level heuristic measures makespan  Relaxed Plans measure makespan and plan cost

106 January 18, 2007IJCAI'07 Tutorial T12106 Overall Conclusions  Relaxed Reachability Analysis  Concentrate strongly on positive interactions and independence by ignoring negative interaction  Estimates improve with more negative interactions  Heuristics can estimate and aggregate costs of goals or find relaxed plans  Propagate numeric information to adjust estimates  Cost, Resources, Probability, Time  Solving hybrid problems is hard  Extra Approximations  Phased Relaxation  Adjustments/Penalties

107 January 18, 2007IJCAI'07 Tutorial T12107 Why do we love PG Heuristics?  They work!  They are “forgiving”  You don't like doing mutex? okay  You don't like growing the graph all the way? okay.  Allow propagation of many types of information  Level, subgoal interaction, time, cost, world support, probability  Support phased relaxation  E.g. Ignore mutexes and resources and bring them back later…  Graph structure supports other synergistic uses  e.g. action selection  Versatility…

108 January 18, 2007IJCAI'07 Tutorial T12108  PG Variations  Serial  Parallel  Temporal  Labelled  Propagation Methods  Level  Mutex  Cost  Label  Planning Problems  Classical  Resource/Temporal  Conformant  Planners  Regression  Progression  Partial Order  Graphplan-style Versatility of PG Heuristics


Download ppt "Planning Graph Based Reachability Heuristics Daniel Bryce & Subbarao Kambhampati IJCAI’07 Tutorial 12 January 8, 2007"

Similar presentations


Ads by Google