1001 Ways to Skin a Planning Graph for Heuristic Fun and Profit Subbarao Kambhampati Arizona State University (With tons of.

Presentation transcript:

1001 Ways to Skin a Planning Graph for Heuristic Fun and Profit
Subbarao Kambhampati, Arizona State University
(With tons of help from Daniel Bryce, Minh Binh Do, Xuan Long Nguyen, Romeo Sanchez Nigenda, Biplav Srivastava, Terry Zimmerman)
Funding from NSF & NASA
987 WMD-in-the-toilet “After the flush, you may find that there were no bombs to begin with”

Planning Graph and Projection
Envelope of Progression Tree (Relaxed Progression)
–Proposition lists: Union of states at the k-th level
–Mutex: Subsets of literals that cannot be part of any legal state
Lower-bound reachability information [Blum & Furst, 1995] [ECP, 1997]
[Figure: a progression tree over states such as {p}, {p,q}, {p,r}, {p,s}, {p,q,r}, {p,q,s}, {p,s,t} under actions A1 to A4, shown next to the planning graph whose proposition lists grow from {p} to {p,q,r,s} to {p,q,r,s,t}]
Planning Graphs can be used as the basis for heuristics!


And PG Heuristics for all..
–Classical (regression) planning: AltAlt (AAAI 2000; AIJ 2002); AltAlt^p (JAIR 2003). Serial vs. parallel graphs; level and adjusted heuristics; partial expansion
–Graphplan-style search: GP-HSP (AIPS 2000); PEGG (IJCAI 2003; AAAI 1999). Variable/value ordering heuristics based on distances
–Partial order planning: RePOP (IJCAI 2001). Mutexes used to detect indirect conflicts
–Metric temporal planning: Sapa (ECP 2001; AIPS 2002; JAIR 2003). Propagation of cost functions; phased relaxation
–Conformant planning: CAltAlt (ICAPS Uncertainty Wkshp, 2003). Multiple graphs; labelled graphs
Caveat: “All Tempe, All the time”

I. PG Heuristics for State-space (Regression) planners [AAAI 2000; AIPS 2000; AIJ 2002; JAIR 2003]
Problem: Given a set of subgoals (a regressed state), estimate how far they are from the initial state

Planning Graphs: Optimistic Projection of Achievability
[Figure: planning graph for the Grid Problem, growing from the initial state {At(0,0), Key(0,1)} toward the goal state through alternating proposition lists (levels 0 to 2) and action lists (levels 0 and 1). Actions shown: noop, Move(0,0,0,1), Move(0,0,1,0), Move(0,1,1,1), Move(1,0,1,1), Pick_key(0,1). Propositions added along the way: At(0,1), At(1,0), At(1,1), Have_key, ~Key(0,1). Mutexes are marked with x.]
Serial PG: a PG where every pair of non-noop actions is marked mutex
lev(S): index of the first level where all props in S appear non-mutexed.
–If there is no such level, then lev(S) = \(\infty\) if the graph has been grown to level-off, else k+1 (k is the current length of the graph)

Cost of a Set of Literals
lev(p): index of the first level at which p comes into the planning graph
lev(S): index of the first level where all props in S appear non-mutexed.
–If there is no such level, then lev(S) = \(\infty\) if the graph has been grown to level-off, else k+1 (k is the current length of the graph)
Heuristics: Sum, Set-Level, Partition-k, Adjusted Sum, Combo, Set-Level with memos
–Sum: \( h(S) = \sum_{p \in S} lev(\{p\}) \)
–Set-Level: \( h(S) = lev(S) \) (admissible)
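The lev(S), Sum and Set-Level definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is handed in as a list of proposition sets plus a list of pairwise-mutex sets per level (a hypothetical representation, not AltAlt's), "k" is taken to be the index of the last level built, and the toy data at the bottom is invented.

```python
import math
from itertools import combinations

def lev(S, prop_levels, mutex_levels, leveled_off):
    """Index of the first level where all props in S appear and no pair is mutex."""
    S = frozenset(S)
    for k, (props, mutexes) in enumerate(zip(prop_levels, mutex_levels)):
        present = S <= props
        clash = any(frozenset(pair) in mutexes for pair in combinations(S, 2))
        if present and not clash:
            return k
    # No such level: infinity if the graph has leveled off, else k + 1,
    # where k is assumed to be the index of the last level built.
    return math.inf if leveled_off else len(prop_levels)

def h_sum(S, prop_levels, mutex_levels, leveled_off):
    """Sum heuristic: add up the levels of the individual propositions."""
    return sum(lev({p}, prop_levels, mutex_levels, leveled_off) for p in S)

def h_set_level(S, prop_levels, mutex_levels, leveled_off):
    """Set-Level heuristic: level of the whole set taken together."""
    return lev(S, prop_levels, mutex_levels, leveled_off)

# Hypothetical three-level graph: p and q both appear at level 1 but are
# mutex there; the mutex disappears at level 2.
PROPS = [frozenset("a"), frozenset("apq"), frozenset("apq")]
MUTEX = [set(), {frozenset("pq")}, set()]

print(h_sum({"p", "q"}, PROPS, MUTEX, False))        # levels 1 + 1
print(h_set_level({"p", "q"}, PROPS, MUTEX, False))  # first non-mutex level
```

On this toy graph both heuristics happen to agree; they diverge as soon as the individual levels add up to more (or less) than the first level at which the set is jointly non-mutex.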

Adjusting the Sum Heuristic [AAAI 2000]
Start with the Sum heuristic and adjust it to take subgoal interactions into account
–Negative interactions in terms of “degree of interaction”
–Positive interactions in terms of co-achievement links
Ignore negative interactions when accounting for positive interactions (and vice versa)
\( h_{AdjSum2M}(S) = length(RelaxedPlan(S)) + \max_{p,q \in S} \delta(p,q) \)
where \( \delta(p,q) = lev(\{p,q\}) - \max\{lev(p), lev(q)\} \) /* degree of negative interaction */
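As a small worked illustration of the adjusted-sum formula above, the sketch below computes the interaction degree for every pair in S and adds the largest one to a relaxed plan length. The level table, the relaxed plan length of 3, and the function names are all hypothetical; extracting the relaxed plan itself is not shown.

```python
from itertools import combinations

def delta(p, q, lev):
    """Degree of negative interaction between p and q."""
    return lev({p, q}) - max(lev({p}), lev({q}))

def h_adjsum2m(S, relaxed_plan_length, lev):
    """Relaxed plan length plus the worst pairwise interaction in S."""
    worst = max((delta(p, q, lev) for p, q in combinations(sorted(S), 2)),
                default=0)
    return relaxed_plan_length + worst

# Hypothetical level table standing in for a real planning graph.
LEVELS = {
    frozenset("p"): 1, frozenset("q"): 1, frozenset("r"): 2,
    frozenset("pq"): 3, frozenset("pr"): 2, frozenset("qr"): 2,
}

def toy_lev(props):
    return LEVELS[frozenset(props)]

# delta(p,q) = 3 - 1 = 2 dominates; the other pairs contribute 0.
print(h_adjsum2m({"p", "q", "r"}, 3, toy_lev))
```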

Optimizations in Heuristic Computation
Taming space/time costs
–Bi-level planning graph representation
–Partial expansion of the PG (stop before level-off)
–It is FINE to cut corners when using the PG for heuristics (instead of search)!!
Branching factor can still be quite high
–Use actions appearing in the PG
–Select actions in lev(S) vs. level-off

AltAlt Performance
[Plots: Logistics and Scheduling domains; problem sets from IPC 2000]

Even Parallel Plans aren’t safe.. [JAIR 2003]
–Serial graph over-estimates
–Use a “parallel” rather than serial PG as the basis for heuristics
–Projection over sets of actions is too costly
–Select the branch with the best action and fatten it
–Use “push-up” to make the partial plans more parallel

II. PG heuristics for Graphplan..

PG Heuristics for Graphplan(!) [AIPS 2000]
Goal/Action Ordering Heuristics for Backward Search
–Propositions are ordered for consideration in decreasing value of their levels.
–Actions supporting a proposition are ordered for consideration in increasing value of their costs
–Cost of an action = 1 + Cost of its set of preconditions
Use of level heuristics improves the performance significantly.
–The heuristics are surprisingly insensitive to the length of the planning graph
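The two ordering rules above amount to a pair of sort keys. The sketch below is an illustration only: the slide does not say how the cost of a precondition set is computed, so the maximum of the individual proposition levels is assumed here, and every name and number is hypothetical.

```python
def order_goals(goals, level):
    """Consider propositions in decreasing order of level (ties by name)."""
    return sorted(goals, key=lambda p: (-level[p], p))

def action_cost(action, preconds, level):
    """1 + cost of the precondition set, assumed here to be the max level."""
    return 1 + max((level[p] for p in preconds[action]), default=0)

def order_supporters(prop, supporters, preconds, level):
    """Consider supporting actions in increasing order of cost (ties by name)."""
    return sorted(supporters[prop],
                  key=lambda a: (action_cost(a, preconds, level), a))

# Hypothetical levels and actions.
LEVEL = {"p": 0, "q": 1, "r": 2, "g": 3}
PRECONDS = {"a1": ["q", "r"], "a2": ["p"], "a3": []}
SUPPORTERS = {"g": ["a1", "a2", "a3"]}

print(order_goals(["p", "g", "q"], LEVEL))
print(order_supporters("g", SUPPORTERS, PRECONDS, LEVEL))
```

Swapping the `max` for a sum or for a set-level lookup changes only `action_cost`; the ordering logic stays the same.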

…And then state-space heuristics for Graphplan (PEGG)
1: Capture a state space view of Graphplan’s search in a search trace
[Figure: search trace laid over the planning graph's proposition levels, from the Init State (A, C, E, F, K) to the Goal (X, Y, Z). Action assignments (a2, a3, a4) to the goals produce regressed ‘states’ such as {X,W,Q}, {W,T,S}, {W,T,R}, {E,Y,Q}, {E,Y,R,T}, {E,F,R}. No solution? Extend the graph…]

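…And then state-space heuristics for Graphplan
[Figure: the search trace after the graph is extended, from the Init State (A, C, E, F, K) to the Goal (X, Y, Z), now with further regressed states such as {W,E}, {R,E}, {C,E,T}, {F,D,K}, {F,W}, {Y,F}, {W,R}, {W,R,E}, {F,R}, {E,F,J}, {F,R,A} alongside {X,W,Q}, {W,T,S}, {W,T,R}, {E,Y,Q}, {E,Y,R,T}, {E,F,R}]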

PEGG now competitive with a heuristic state space planner [IJCAI 2003]

III. PG Heuristics for PO Planners
In the beginning it was all POP. Then it was cruelly UnPOPped. The good times return with Re(vived)POP.

POP Algorithm
1. Plan selection: Select a plan P from the search queue
2. Flaw selection: Choose a flaw f (open condition or unsafe link)
3. Flaw resolution:
–If f is an open condition, choose an action S that achieves f
–If f is an unsafe link, choose promotion or demotion
–Update P
–Return NULL if no resolution exists
4. If there is no flaw left, return P
Choice points
–Flaw selection (open condition? unsafe link? Non-backtrack choice)
–Flaw resolution/Plan selection (how to select (rank) partial plans?)
[Figure: 1. Initial plan: steps S0 and Sinf with goals g1, g2. 2. Plan refinement (flaw selection and resolution): steps S0, S1, S2, S3, Sinf with conditions p, ~p, q1, g1, g2 and open conditions oc1, oc2]

PG Heuristics for Partial Order Planning
Distance heuristics to estimate the cost of partially ordered plans (and to select flaws)
–If we ignore negative interactions, then the set of open conditions can be seen as a regression state
Mutexes used to detect indirect conflicts in partial plans
–A step threatens a link if there is a mutex between the link condition and the step’s effect or precondition
–Post disjunctive precedences and use propagation to simplify

RePOP’s Performance [IJCAI, 2001]
RePOP implemented on top of UCPOP
–Dramatically better than any other partial order planner
–Competitive with Graphplan and AltAlt
–VHPOP carried the torch at IPC 2002
You see, pop, it is possible to Re-use all the old POP work!
Written in Lisp, runs on Linux, 500MHz, 250MB

IV. PG Heuristics for Metric Temporal Planning [ECP 2001; AIPS 2002; ICAPS 2003; JAIR 2003]

Multi-Objective Nature of MTP
Plan quality in Metric Temporal domains is inherently multi-dimensional
–Temporal quality (e.g. makespan, slack)
–Plan cost (e.g. cumulative action cost, resource consumption)
Necessitates multi-objective search
–Modeling objective functions
–Tracking different quality metrics and heuristic estimation
Challenge: Inter-dependencies between different quality metrics
–Typically cost will go down with higher makespan…
[Figure: map with Tempe, Phoenix and L.A.]

SAPA’s approach
Use a temporal version of the Planning Graph (Smith & Weld) structure to track the time-sensitive cost function:
–Estimation of the earliest time (makespan) to achieve all goals.
–Estimation of the lowest cost to achieve goals
–Estimation of the cost to achieve goals given the specific makespan value.
Use this information to calculate the heuristic value for the objective function involving both time and cost
Challenge: How to propagate cost over planning graphs?

Search through time-stamped states
Goal Satisfaction: \( S = (P, M, \Pi, Q, t) \models G \) if \( \forall \langle p_i, t_i \rangle \in G \) either:
–\( \exists \langle p_i, t_j \rangle \in P \), \( t_j < t_i \), and no event in Q deletes \( p_i \).
–\( \exists e \in Q \) that adds \( p_i \) at time \( t_e < t_i \).
Action Application: Action A is applicable in S if:
–All instantaneous preconditions of A are satisfied by P and M.
–A’s effects do not interfere with \( \Pi \) and Q.
–No event in Q interferes with persistent preconditions of A.
–A does not lead to concurrent resource change
When A is applied to S:
–P is updated according to A’s instantaneous effects.
–Persistent preconditions of A are put in \( \Pi \)
–Delayed effects of A are put in Q.

Propagating Cost Functions
Shuttle(Tempe,Phx): Cost: $20; Time: 1.0 hour
Helicopter(Tempe,Phx): Cost: $100; Time: 0.5 hour
Car(Tempe,LA): Cost: $100; Time: 10 hours
Airplane(Phx,LA): Cost: $200; Time: 1.0 hour
[Figure: map of Tempe, Phoenix and L.A.; a propagation timeline that starts at t = 0 with Drive-car(Tempe,LA), Hel(T,P) and Shuttle(T,P), followed by Airplane(P,LA); and a step-function plot of cost against time with levels ∞, $300, $220, $100 and $20 and breakpoints including t = 1, 1.5, 2.0 and 10. The curves are labelled Cost(At(LA)) and Cost(At(Phx)) = Cost(Flight(Phx,LA)).]

Issues in Cost Propagation
Costing a set of literals
–\( Cost(f,t) = \min \{ Cost(A,t) : f \in Effect(A) \} \)
–\( Cost(A,t) = Aggregate(Cost(f,t) : f \in Pre(A)) \)
–Aggregate can be Sum or Max
–The set-level idea would entail tracking costs of subsets of literals
Termination Criteria
–Deadline Termination: Terminate at time point t if \( \forall \) goal G: \( Deadline(G) \le t \), or \( \exists \) goal G: \( (Deadline(G) < t) \wedge (Cost(G,t) = \infty) \)
–Fix-point Termination: Terminate at time point t where we cannot improve the cost of any proposition.
–K-lookahead approximation: At t where \( Cost(g,t) < \infty \), repeat the process of applying the (set of) actions that can improve the cost functions k times.
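To make the propagation rules above concrete, here is a minimal event-driven sketch run on the Tempe/Phoenix/L.A. travel example, using fix-point termination. It is an illustration under assumptions rather than Sapa's implementation: each action is a hypothetical `(preconditions, effect, cost, duration)` tuple, and the action's own cost is added to the aggregated precondition cost when it supports an effect, which is what the $220 = $20 + $200 step in the example suggests.

```python
import heapq
import math

def propagate_costs(init_facts, actions, aggregate=sum):
    """Return, per fact, the breakpoints (time, cost) of its cost step function."""
    best = {}      # fact -> lowest cost found so far
    history = {}   # fact -> [(time, cost), ...] in increasing time order
    queue = [(0.0, 0.0, f) for f in init_facts]  # (time, cost, fact) events
    heapq.heapify(queue)
    while queue:   # fix-point termination: stop when nothing can improve
        t, cost, fact = heapq.heappop(queue)
        if cost >= best.get(fact, math.inf):
            continue  # this event does not improve the fact's cost
        best[fact] = cost
        history.setdefault(fact, []).append((t, cost))
        for pre, eff, a_cost, duration in actions.values():
            if fact in pre and all(p in best for p in pre):
                exec_cost = aggregate(best[p] for p in pre) + a_cost
                heapq.heappush(queue, (t + duration, exec_cost, eff))
    return history

def cost_at(history, fact, t):
    """Cost(fact, t): value of the step function at time t (inf before t_0)."""
    cost = math.inf
    for time, c in history.get(fact, []):
        if time <= t:
            cost = c
    return cost

ACTIONS = {
    "Shuttle(Tempe,Phx)":    (["At(Tempe)"], "At(Phx)", 20, 1.0),
    "Helicopter(Tempe,Phx)": (["At(Tempe)"], "At(Phx)", 100, 0.5),
    "Car(Tempe,LA)":         (["At(Tempe)"], "At(LA)", 100, 10.0),
    "Airplane(Phx,LA)":      (["At(Phx)"], "At(LA)", 200, 1.0),
}

HISTORY = propagate_costs(["At(Tempe)"], ACTIONS)
print(HISTORY["At(LA)"])  # breakpoints at t = 1.5, 2.0 and 10.0
```

Passing `aggregate=max` switches the precondition aggregation from Sum to Max; with single-precondition actions, as here, the two coincide.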

Heuristics based on cost functions
Direct
–If we want to minimize makespan: \( h = t_0 \)
–If we want to minimize cost: \( h = CostAggregate(G, t_\infty) \)
–If we want to minimize a function f(time,cost) of cost and makespan: \( h = \min f(t, Cost(G,t)) \) s.t. \( t_0 \le t \le t_\infty \)
–E.g. f(time,cost) = 100·makespan + Cost, then h = 100×2 + 220 at \( t_0 \le t = 2 \le t_\infty \)
[Figure: Cost(At(LA)) plotted against time as a step function with levels ∞, $300, $220 and $100; \( t_0 = 1.5 \) is the time of earliest achievement and \( t_\infty = 10 \) is the time of lowest cost, with a breakpoint at t = 2 in between]
Using Relaxed Plan
–Extract a relaxed plan using h as the bias
–If the objective function is f(time,cost), then action A (to be added to RP) is selected such that \( f(t(RP+A), C(RP+A)) + f(t(G_{new}), C(G_{new})) \) is minimal, where \( G_{new} = (G \cup Precond(A)) \setminus Effects(A) \)
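The three "direct" heuristics can be read straight off a propagated cost function. The sketch below assumes the function is stored as a list of (time, cost) breakpoints, here the Cost(At(LA)) steps from the travel example, and that f is non-decreasing in time, so the minimum over the interval from t_0 to t_∞ is attained at a breakpoint. The function names are hypothetical.

```python
def h_makespan(steps):
    """t_0: the earliest time at which the goal becomes achievable."""
    return steps[0][0]

def h_cost(steps):
    """Cost at t_inf: the lowest cost the propagation ever reaches."""
    return steps[-1][1]

def h_combined(steps, f):
    """min f(t, Cost(G, t)) over the breakpoints between t_0 and t_inf."""
    return min(f(t, cost) for t, cost in steps)

# Cost(At(LA)) from the Tempe/Phoenix/L.A. example: (time, cost) breakpoints.
COST_AT_LA = [(1.5, 300), (2.0, 220), (10.0, 100)]

def objective(t, cost):
    return 100 * t + cost  # f(time, cost) = 100 * makespan + cost

print(h_makespan(COST_AT_LA))             # earliest achievement time
print(h_cost(COST_AT_LA))                 # lowest cost
print(h_combined(COST_AT_LA, objective))  # candidates: 450, 420, 1100
```

The combined objective picks the middle breakpoint: flying out at t = 2 is neither the fastest nor the cheapest option, but it gives the best trade-off under this particular weighting.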

Phased Relaxation
The relaxed plan can be adjusted to take into account constraints that were originally ignored
Adjusting for Mutexes: Adjust the makespan estimate of the relaxed plan by marking actions that are mutex (and thus cannot be executed concurrently)
Adjusting for Resource Interactions: Estimate the number of additional resource-producing actions needed to make up for any resource shortfall in the relaxed plan
\( C = C + \sum_R \lceil (Con(R) - (Init(R) + Pro(R))) / \Delta_R \rceil \cdot C(A_R) \)
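A small numeric sketch of the resource adjustment above. The slide does not define its symbols, so the reading used here is an assumption: Con(R), Init(R) and Pro(R) are the amounts of resource R consumed, initially available and produced in the relaxed plan, Δ_R is the amount one producing action A_R adds, and C(A_R) is that action's cost. Resources with no shortfall are assumed to contribute nothing. All numbers are invented.

```python
import math

def resource_adjusted_cost(plan_cost, resources):
    """Add the cost of the extra producer actions needed to cover shortfalls."""
    total = plan_cost
    for r in resources:
        shortfall = r["consumed"] - (r["initial"] + r["produced"])
        if shortfall > 0:  # assumption: no adjustment without a shortfall
            extra_actions = math.ceil(shortfall / r["delta"])
            total += extra_actions * r["producer_cost"]
    return total

# Hypothetical relaxed plan costing 100 that burns 50 units of fuel but
# starts with only 20; each refuel action adds 20 units and costs 10.
RESOURCES = [
    {"consumed": 50, "initial": 20, "produced": 0, "delta": 20, "producer_cost": 10},
    {"consumed": 5, "initial": 10, "produced": 0, "delta": 5, "producer_cost": 7},
]

print(resource_adjusted_cost(100, RESOURCES))  # two extra refuels are needed
```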

Handling Cost/Makespan Tradeoffs
Results over 20 randomly generated temporal logistics problems that involve moving 4 packages between different locations in 3 cities:
\( O = f(time, cost) = \alpha \cdot Makespan + (1 - \alpha) \cdot TotalCost \)

SAPA at IPC-2002 [JAIR 2003]
[Plots: Rover (time setting); Satellite (complex setting)]

V. PG Heuristics for Conformant Planning

Conformant Planning as Regression
Actions: A1: M P => K; A2: M Q => K; A3: M R => L; A4: K => G; A5: L => G
Initially: (P V Q V R) & (~P V ~Q) & (~P V ~R) & (~Q V ~R) & M
Goal State: G
[Figure: regression from the goal through the clausal states G, (G V K), (G V K V L), (G V K V L V P) & M, (G V K V L V P V Q) & M, (G V K V L V P V Q V R) & M, using actions A4, A5, A1, A2, A3. Annotation on the first step: G or K must be true before A4 for G to be true after A4.]
Each clause is satisfied by a clause in the initial clausal state. Done! (5 actions)

Using a Single, Unioned Graph
Union literals from all initial states into a conjunctive initial graph level
[Figure: the initial worlds {P, M}, {Q, M}, {R, M} are merged into one initial level {P, Q, R, M}; A1, A2, A3 add K and L; A4 and A5 add G. Heuristic Estimate = 2]
–Easy to implement
–Not effective
–Lose world-specific support information
–Incorrect mutexes

Using Multiple Graphs
[Figure: one planning graph per initial world. From {P, M}: A1 adds K, then A4 adds G. From {Q, M}: A2 adds K, then A4 adds G. From {R, M}: A3 adds L, then A5 adds G.]
–Accurate mutexes
–Moderate implementation difficulty
–Memory intensive
–Heuristic computation can be costly
Unioning these graphs a priori would give much savings …

Using a Single, Labeled Graph
[Figure: a single labeled planning graph over literals P, Q, R, M, K, L, G and actions A1 to A5. Label key: ~Q & ~R; ~P & ~R; ~P & ~Q; (~P & ~R) V (~Q & ~R); (~P & ~R) V (~Q & ~R) V (~P & ~Q); True. Heuristic Value = 5]
–Action Labels: Conjunction of labels of supporting literals
–Literal Labels: Disjunction of labels of supporting actions
–Label of a literal signifies the set of worlds in which it is supported; full support means all init worlds
–Memory efficient
–Cheap heuristics
–Scalable
–Extensible
–Tricky to implement
–Benefits from BDDs and a model checker
–ATMS
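The two labelling rules above (conjunction for actions, disjunction for literals) can be sketched with plain sets of world names standing in for the BDD labels. This is a simplified illustration, not CAltAlt's code: the three initial worlds and the five actions come from the conformant regression example, the world names `wP`, `wQ`, `wR` are made up, mutexes are ignored, and the sketch stops at label propagation instead of extracting the relaxed plan behind the heuristic value of 5.

```python
def propagate_labels(worlds, actions, goal, max_levels=20):
    """Grow label levels until the goal is supported in every initial world.

    Returns (level, labels), or (None, labels) if the goal never gets
    full support before the labels stop changing.
    """
    all_worlds = frozenset(worlds)
    labels = {}
    for world, literals in worlds.items():
        for lit in literals:
            labels[lit] = labels.get(lit, frozenset()) | {world}
    level = 0
    while labels.get(goal, frozenset()) != all_worlds:
        if level == max_levels:
            return None, labels
        new_labels = dict(labels)  # noops carry every label forward
        for pre, eff in actions.values():
            # Action label: conjunction (intersection) of precondition labels.
            act_label = all_worlds
            for p in pre:
                act_label &= labels.get(p, frozenset())
            # Literal label: disjunction (union) over supporting actions.
            if act_label:
                new_labels[eff] = new_labels.get(eff, frozenset()) | act_label
        if new_labels == labels:
            return None, labels  # leveled off without full support
        labels = new_labels
        level += 1
    return level, labels

WORLDS = {"wP": {"P", "M"}, "wQ": {"Q", "M"}, "wR": {"R", "M"}}
ACTIONS = {
    "A1": (["M", "P"], "K"), "A2": (["M", "Q"], "K"), "A3": (["M", "R"], "L"),
    "A4": (["K"], "G"), "A5": (["L"], "G"),
}

LEVEL, LABELS = propagate_labels(WORLDS, ACTIONS, "G")
print(LEVEL, sorted(LABELS["K"]), sorted(LABELS["L"]))
```

K ends up supported only in the worlds where P or Q holds and L only where R holds, so G needs both A4 and A5 before its label covers all three initial worlds.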

CAltAlt Performance Label-graph based heuristics make CAltAlt competitive with the current best approaches

The Damage until now..
–Classical (regression) planning: AltAlt (AAAI 2000; AIJ 2002); AltAlt^p (JAIR 2003). Serial vs. parallel graphs; level and adjusted heuristics; partial expansion
–Graphplan-style search: GP-HSP (AIPS 2000); PEGG (IJCAI 2003; AAAI 1999). Variable/value ordering heuristics based on distances
–Partial order planning: RePOP (IJCAI 2001). Mutexes used to detect indirect conflicts
–Metric temporal planning: Sapa (ECP 2001; AIPS 2002; JAIR 2003). Propagation of cost functions; phased relaxation
–Conformant planning: CAltAlt (ICAPS Uncertainty Wkshp, 2003). Multiple graphs; labelled graphs
Still to come: PG Heuristics for
–Probabilistic Conformant Planning
–Conditional Planning
–Lifted Planning
–Trans-Atlantic camaraderie
–Post-war reconstruction
–Middle-east peace…

Meanwhile outside Tempe…
–Hoffmann’s FF uses relaxed plans from the PG
–Geffner & Haslum derive DP-versions of PG-heuristics
–Gerevini & Serina’s LPG uses PG heuristics to cost the various repairs
–Smith back-propagates (convolves) probability distributions over the PG to decide the contingencies worth focusing on
–Trinquart proposes a PG-clone that directly computes reachability in plan-space…
–…

Why do we love PG Heuristics?
They work!
They are “forgiving”
–You don't like doing mutex? Okay
–You don't like growing the graph all the way? Okay.
Allow propagation of many types of information
–Level, subgoal interaction, time, cost, world support, …
Support phased relaxation
–E.g. ignore mutexes and resources and bring them back later…
Graph structure supports other synergistic uses
–E.g. action selection
Versatility…

Versatility of PG Heuristics
PG Variations: Serial; Parallel; Temporal; Labelled
Propagation Methods: Level; Mutex; Cost; Label
Planning Problems: Classical; Resource/Temporal; Conformant
Planners: Regression; Progression; Partial Order; Graphplan-style