Learning to Improve the Quality of Plans Produced by Partial-order Planners
M. Afzal Upal, Intelligent Agents & Multiagent Systems Lab

Outline
- Artificial Intelligence planning: problems and solutions
- Why learn to improve plan quality?
- The Performance Improving Partial-order planner (PIP)
- The Intra-solution Learning (ISL) algorithm
- Search-control vs. rewrite rules
- Empirical evaluation
- Conclusion

The Performance Task: Classical AI Planning
Given:
- an initial state
- goals
- actions, e.g. {up, down, left, right}
Find:
- a sequence of actions that achieves the goals when executed in the initial state, e.g., down(4), right(3), up(2)

Automated Planning Systems
- Domain-independent planning systems: modular, sound, and complete
- Domain-dependent planning systems: practical, efficient, produce high-quality plans

Domain-independent Systems
- State-space search (each search node is a valid world state), e.g., PRODIGY, FF
- Partial-order plan-space search (each search node is a partially ordered plan): partial-order planners, e.g., SNLP, UCPOP
- Graphplan-based search (a search node is a union of world states), e.g., STAN
- Compilation to general search: satisfiability engines, e.g., SATPLAN; constraint satisfaction engines, e.g., CPLAN

State-space vs. Plan-space Planning
[Figure comparing the state-space and plan-space views of a search over grid moves such as right(8), down(2), left(4), and up(6), ending at END.]

Partial-order Plan-space Planning
Partial-order planning is the process of removing flaws: unresolved goals, and pairs of unordered actions that cannot take place at the same time (threats).

Partial-order Plan-space Planning
Decouples the order in which actions are added during planning from the order in which they appear in the final plan.
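To make the flaw-removal view concrete, here is a minimal, hedged sketch of a plan-space refinement loop. It is not PIP's (or any particular planner's) implementation; the flaw and resolver representations are illustrative assumptions.

```python
# A minimal sketch (not PIP's implementation) of plan-space refinement:
# repeatedly pick a flaw (open condition or threat) and try every way of
# resolving it, backtracking when a resolution leads to a dead end.
def pop(plan, select_flaw, resolvers):
    """plan: a partial plan; select_flaw(plan) -> a flaw or None;
    resolvers(plan, flaw) -> list of refined partial plans (may be empty)."""
    flaw = select_flaw(plan)
    if flaw is None:
        return plan                      # no flaws left: the plan is complete
    for refined in resolvers(plan, flaw):
        result = pop(refined, select_flaw, resolvers)
        if result is not None:
            return result                # a flaw-free refinement was found
    return None                          # every resolution failed: backtrack
```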

Learning to Improve Plan Quality for Partial-order Planners
- How to represent plan quality information? Extended STRIPS operators + a value function.
- How to identify learning opportunities (there are no planning failures or successes to learn from)? Assume a better-quality model plan for the given problem is available (from a domain expert or through a more extensive automated search of the problem's search space).
- What search features should the quality-improving search-control knowledge be based on?

The Logistics Transportation Domain
Initial state: at-object(parcel, postoffice), at-truck(truck1, postoffice), at-plane(plane1, airport)
Goals: at-object(parcel, airport)

STRIPS Encoding of the Logistics Transportation Domain

LOAD-TRUCK(Object, Truck, Location)
  Preconditions: {at-object(Object, Location), at-truck(Truck, Location)}
  Effects: {in(Object, Truck), not(at-object(Object, Location))}

DRIVE-TRUCK(Truck, From, To)
  Preconditions: {at-truck(Truck, From), same-city(From, To)}
  Effects: {at-truck(Truck, To), not(at-truck(Truck, From))}

UNLOAD-TRUCK(Object, Truck, Location)
  Preconditions: {in(Object, Truck), at-truck(Truck, Location)}
  Effects: {at-object(Object, Location), not(in(Object, Truck))}
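As a minimal sketch of how such an operator might be represented in code (the `Operator` dataclass and its field names are illustrative assumptions, not the paper's implementation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: tuple     # literals that must hold before the action
    add_effects: tuple       # literals the action makes true
    delete_effects: tuple    # literals the action makes false (the not(...) effects)

LOAD_TRUCK = Operator(
    name="LOAD-TRUCK(Object, Truck, Location)",
    preconditions=("at-object(Object,Location)", "at-truck(Truck,Location)"),
    add_effects=("in(Object,Truck)",),
    delete_effects=("at-object(Object,Location)",),
)
```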

PR-STRIPS (similar to PDDL 2.1, level 2)
- A state is described using propositional as well as metric attributes (which specify the levels of the resources in that state).
- An action can have propositional as well as metric effects (functions that specify the amount of each resource the action consumes).
- A value function specifies the relative importance of each resource consumed and defines plan quality as a function of the amounts of resources consumed by all actions in the plan.

PR-STRIPS Encoding of the Logistics Transportation Domain

LOAD-TRUCK(Object, Truck, Location)
  Preconditions: {at-object(Object, Location), at-truck(Truck, Location)}
  Effects: {in(Object, Truck), not(at-object(Object, Location)), time(-0.5), money(-5)}

DRIVE-TRUCK(Truck, From, To)
  Preconditions: {at-truck(Truck, From)}
  Effects: {at-truck(Truck, To), not(at-truck(Truck, From)), time(-0.02*distance(From, To)), money(-distance(From, To))}

UNLOAD-TRUCK(Object, Truck, Location)
  Preconditions: {in(Object, Truck), at-truck(Truck, Location)}
  Effects: {at-object(Object, Location), not(in(Object, Truck)), time(-0.5), money(-5)}

PR-STRIPS Encoding of the Logistics Transportation Domain (continued)

LOAD-PLANE(Object, Plane, Location)
  Preconditions: {at-object(Object, Location), at-plane(Plane, Location)}
  Effects: {in(Object, Plane), not(at-object(Object, Location)), time(-0.5), money(-5)}

FLY-PLANE(Plane, From, To)
  Preconditions: {at-plane(Plane, From), airport(To)}
  Effects: {at-plane(Plane, To), not(at-plane(Plane, From)), time(-0.02*distance(From, To)), money(-distance(From, To))}

UNLOAD-PLANE(Object, Plane, Location)
  Preconditions: {in(Object, Plane), at-plane(Plane, Location)}
  Effects: {at-object(Object, Location), not(in(Object, Plane)), time(-0.5), money(-5)}

PR-STRIPS Encoding of the Logistics Transportation Domain: the value function
Quality(Plan) = 1 / (2*time-used(Plan) + 5*money-used(Plan))
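A hedged sketch of this value function, assuming each action's metric effects are given as a dict of resource deltas (negative values denote consumption); the numbers reproduce the load/drive/unload costs from the PR-STRIPS encoding above.

```python
WEIGHTS = {"time": 2.0, "money": 5.0}   # from Quality(Plan) = 1/(2*time + 5*money)

def plan_quality(metric_effects, weights=WEIGHTS):
    """metric_effects: one dict of resource deltas per action in the plan."""
    used = {r: 0.0 for r in weights}
    for effects in metric_effects:
        for resource, delta in effects.items():
            if resource in used:
                used[resource] += max(0.0, -delta)   # only count consumption
    return 1.0 / sum(weights[r] * used[r] for r in weights)

# A one-truck plan moving one object 250 distance units (lax -> sjc):
truck_plan = [
    {"time": -0.5, "money": -5},            # load-truck
    {"time": -0.02 * 250, "money": -250},   # drive-truck
    {"time": -0.5, "money": -5},            # unload-truck
]
print(plan_quality(truck_plan))             # = 1 / (2*6 + 5*260)
```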

The Learning Problem
Given:
- a planning problem (goals, initial state, and initial resource levels)
- domain knowledge (actions, plan-quality knowledge)
- a partial-order planner
- a model plan for the given problem
Find:
- domain-specific rules that the given planner can use to produce better-quality plans than the plans it would have produced had it not learned those rules.

Solution: The Intra-solution Learning (ISL) Algorithm
1. Find a learning opportunity.
2. Choose the relevant information and ignore the rest.
3. Generalize the relevant information using a generalization theory.

Phase 1: Find a Learning Opportunity
1. Generate the system's default plan and default planning trace using the given partial-order planner on the given problem.
2. Compare the default plan with the model plan. If the model plan is not of higher quality, go to Step 1.
3. Infer the planning decisions that produced the model plan.
4. Compare the inferred model planning trace with the default planning trace to identify the decision points where the two traces differ. These are the conflicting choice points.
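A simplified sketch of Step 4: align the two decision sequences and report where they first diverge. Real PIP traces are ordered constraint sets rather than flat lists, so this flattening is only illustrative.

```python
def first_conflicting_choice_point(default_trace, model_trace):
    """Return the index of the first decision where the traces differ, or None."""
    for i, (d, m) in enumerate(zip(default_trace, model_trace)):
        if d != m:
            return i
    return None   # one trace is a prefix of the other

default_trace = ["establish at-object(o1,sjc) via add-step unload-truck(o1,tr1,sjc)"]
model_trace   = ["establish at-object(o1,sjc) via add-step unload-plane(o1,p1,sjc)"]
print(first_conflicting_choice_point(default_trace, model_trace))   # -> 0
```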

[Figure: the model trace and the system's planning trace, with their common nodes highlighted.]

Phase 2: Choose the Relevant Information
Examine the downstream planning traces, identifying relevant planning decisions using the following heuristics:
1. A planning decision to add an action Q is relevant if Q supplies a relevant condition to a relevant action.
2. A planning decision to establish an open condition is relevant if it binds an uninstantiated variable of a relevant open condition.
3. A planning decision to resolve a threat is relevant if all three actions involved are relevant.
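An illustrative encoding of these three heuristics; the decision dictionaries and their keys are hypothetical stand-ins for PIP's internal trace representation, not its actual data structures.

```python
def is_relevant(decision, relevant_actions, relevant_conditions):
    """decision: a dict describing one planning decision (hypothetical format)."""
    kind = decision["kind"]
    if kind == "add-action":
        # relevant if the new action supplies a relevant condition to a relevant action
        return (decision["condition"] in relevant_conditions
                and decision["consumer"] in relevant_actions)
    if kind == "establish":
        # relevant if it binds an uninstantiated variable of a relevant open condition
        return decision["open_condition"] in relevant_conditions and decision["binds_variable"]
    if kind == "resolve-threat":
        # relevant only if all three actions involved are relevant
        return all(a in relevant_actions for a in decision["actions"])
    return False
```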

Phase 3: Generalize the Relevant Information
Generalize the relevant information using a generalization theory: replace all constants with variables.
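A small sketch of the "replace all constants with variables" step, mapping repeated constants to the same variable. The regular expression that recognizes constants is an assumption made for this example.

```python
import re

def variablize(literals):
    """Replace constants with variables, reusing one variable per distinct
    constant across all literals (o1 -> ?v1, p1 -> ?v2, ...)."""
    mapping = {}

    def repl(match):
        const = match.group(1)
        if const not in mapping:
            mapping[const] = f"?v{len(mapping) + 1}"
        return mapping[const]

    # assumption: constants are lower-case tokens between '(' or ',' and ',' or ')'
    pattern = r"(?<=[(,])([a-z][a-z0-9]*)(?=[,)])"
    return [re.sub(pattern, repl, lit) for lit in literals]

print(variablize(["load-plane(o1,p1,lax)", "fly-plane(p1,lax,sjc)"]))
# -> ['load-plane(?v1,?v2,?v3)', 'fly-plane(?v2,?v3,?v4)']
```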

An Example Logistics Problem
Initial state: {at-object(o1, lax), at-object(o2, lax), at-truck(tr1, lax), at-plane(p1, lax), airport(sjc), distance(lax, sjc)=250, time=0, money=500}
Goals: {at-object(o1, sjc), at-object(o2, sjc)}

Generate the System's Default Plan and Default Planning Trace
Use the given planner to generate the system's default planning trace (an ordered constraint set):
- each add-step/establishment decision adds a causal link and an ordering constraint
- each threat-resolution decision adds an ordering constraint
1- START < END
2- unload-truck() < END, unload-truck(o1,Tr,sjc) --at-object(o1,sjc)--> END
3- load-truck() < unload-truck(), load-truck(o1,Tr,sjc) --in-truck(o1,Tr)--> unload-truck(o1,Tr,sjc)
4- drive-truck() < unload-truck(), drive-truck(Tr,X,sjc) --at-truck(Tr,sjc)--> unload-truck(o1,Tr,sjc)
5- …
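A hedged sketch of the two constraint kinds that make up such a trace; the class names are illustrative, not PIP's internal representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ordering:            # "before" must precede "after" in the final plan
    before: str
    after: str

@dataclass(frozen=True)
class CausalLink:          # producer --condition--> consumer
    producer: str
    condition: str
    consumer: str

default_trace = [
    Ordering("START", "END"),
    Ordering("unload-truck(o1,Tr,sjc)", "END"),
    CausalLink("unload-truck(o1,Tr,sjc)", "at-object(o1,sjc)", "END"),
    Ordering("load-truck(o1,Tr,sjc)", "unload-truck(o1,Tr,sjc)"),
    CausalLink("load-truck(o1,Tr,sjc)", "in-truck(o1,Tr)", "unload-truck(o1,Tr,sjc)"),
]
```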

Compare the System's Default Plan with the Model Plan
System's default plan: load-truck(o1, tr1, lax), load-truck(o2, tr1, lax), drive-truck(tr1, lax, sjc), unload-truck(o1, tr1, sjc), unload-truck(o2, tr1, sjc)
Model plan: load-plane(o1, p1, lax), load-plane(o2, p1, lax), fly-plane(p1, lax, sjc), unload-plane(o1, p1, sjc), unload-plane(o2, p1, sjc)

Infer the Unordered Model Constraint Set
unload-plane(o1,p1,sjc) --at-object(o1,sjc)--> END
load-plane(o1,p1,lax) --in(o1,p1)--> unload-plane(o1,p1,sjc)
fly-plane(p1,lax,sjc) --at-plane(p1,sjc)--> unload-plane(o1,p1,sjc)
START --at-plane(p1,lax)--> load-plane(o1,p1,lax)
START --at-plane(p1,lax)--> fly-plane(p1,lax,sjc)
START --at-object(o1,lax)--> load-plane(o1,p1,lax)
unload-plane(o2,p1,sjc) --at-object(o2,sjc)--> END
load-plane(o2,p1,lax) --in(o2,p1)--> unload-plane(o2,p1,sjc)
fly-plane(p1,lax,sjc) --at-plane(p1,sjc)--> unload-plane(o2,p1,sjc)
START --at-plane(p1,lax)--> load-plane(o2,p1,lax)
START --at-plane(p1,lax)--> fly-plane(p1,lax,sjc)
START --at-object(o2,lax)--> load-plane(o2,p1,lax)

Compare the Two Planning Traces to Identify Learning Opportunities
Both traces start from the same root node (START < END, with open condition at-object(o1,sjc)) and then diverge:
- Default trace: START < END, unload-truck(o1,tr1,sjc) < END, unload-truck(o1,tr1,sjc) --at-object(o1,sjc)--> END
- Model trace: START < END, unload-plane(o1,p1,sjc) < END, unload-plane(o1,p1,sjc) --at-object(o1,sjc)--> END
This conflicting choice point is a learning opportunity.

Choose the Relevant Planning Decisions
[Figure: the tree of planning decisions below the learning opportunity (add-actions: START-END). The chains add-action: unload-plane(o1), fly-plane(), load-plane(o1) and add-action: unload-truck(o1), drive-truck(), load-truck(o1) are marked as relevant decisions; the decisions add-action: unload-plane(o2), load-plane(o2), drive-truck(), load-truck(o2) are marked as irrelevant.]

Generalize the Relevant Planning-decision Chains
add-actions: START-END
- Model chain: add-action: unload-plane(O, P), add-action: fly-plane(P, X, Y), add-action: load-plane(O, P)
- Default chain: add-action: unload-truck(O, T), add-action: drive-truck(T, X, Y), add-action: load-truck(O, T)

In What Form Should the Learned Knowledge Be Stored?
Rewrite rule:
- To-be-replaced actions: {load-truck(O,T,X), drive-truck(T,X,Y), unload-truck(O,T,Y)}
- Replacing actions: {load-plane(O,P,X), fly-plane(P,X,Y), unload-plane(O,P,Y)}
Search-control rule:
- Given the goal {at-object(O,Y)} to resolve, the effects {at-truck(T,X), at-plane(P,X), airport(Y)}, and distance(X,Y) > 100, prefer the planning decisions {add-step(unload-plane(O,P,Y)), add-step(load-plane(O,P,X)), add-step(fly-plane(P,X,Y))} over the planning decisions {add-step(unload-truck(O,T,Y)), add-step(load-truck(O,T,X)), add-step(drive-truck(T,X,Y))}.

Search-Control Knowledge
A heuristic function that provides an estimate of the quality of the plan a node is expected to lead to.
[Figure: a search tree rooted at "root"; nodes, including a node n, are annotated with quality estimates of 8, 4, and 2.]
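One way to read this slide: the learned search-control knowledge acts as a node-evaluation function for best-first search over partial plans. A minimal sketch, where `estimate_quality` stands in for whatever estimate the learned rules provide (the function names here are assumptions, not PIP's API):

```python
import heapq
import itertools

def best_first_search(root, expand, estimate_quality, is_solution):
    """Expand the node whose estimated plan quality is highest first."""
    counter = itertools.count()                       # tie-breaker for the heap
    frontier = [(-estimate_quality(root), next(counter), root)]
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if is_solution(node):
            return node
        for child in expand(node):
            heapq.heappush(frontier, (-estimate_quality(child), next(counter), child))
    return None
```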

Rewrite Rules
- A rewrite rule is a 2-tuple <to-be-replaced-subplan, replacing-subplan>.
- Used after search has produced a complete plan, to rewrite it into a higher-quality plan.
- Only useful in domains where it is easy to produce a low-quality plan efficiently but hard to produce a higher-quality one.
- E.g., to-be-replaced subplan: A4, A5; replacing subplan: B1.
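A simplified sketch of applying such a rule to a completed plan, treating the plan as a totally ordered action list (PIP's plans are partially ordered, so real matching is more involved):

```python
def apply_rewrite(plan, to_be_replaced, replacing):
    """Replace the first occurrence of the to-be-replaced subplan, if any."""
    n = len(to_be_replaced)
    for i in range(len(plan) - n + 1):
        if plan[i:i + n] == to_be_replaced:
            return plan[:i] + replacing + plan[i + n:]
    return plan   # rule does not apply; keep the original plan

plan = ["A1", "A2", "A3", "A4", "A5", "A6"]
print(apply_rewrite(plan, ["A4", "A5"], ["B1"]))
# -> ['A1', 'A2', 'A3', 'B1', 'A6']
```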

Planning by Rewriting
[Figure: a complete plan A1, A2, A3, A4, A5, A6 in which the subplan A4, A5 is replaced by B1.]

Empirical Evaluation I: In What Form Should the Learned Knowledge Be Stored?
- Empirical experiments compare the performance of a version of PIP that learns search-control rules (Sys-search-control) with a version that learns rewrite rules (Sys-rewrite).
- Both Sys-rewrite-first and Sys-rewrite-best perform up to two rewritings:
  - At each rewriting, Sys-rewrite-first randomly chooses one of the applicable rewrite rules.
  - Sys-rewrite-best applies all applicable rewrite rules, trying every way of rewriting a plan.

Experimental Set-up
- Three benchmark planning domains: logistics, softbot, and process planning.
- Randomly generate 120 unique problem instances.
- Train Sys-search-control and Sys-rewrite on optimal-quality solutions for 20, 30, 40, and 60 examples, and test them on the remaining examples (cross-validation).
- Plan quality is one minus the average distance of the plans generated by a system from the optimal-quality plans.
- Planning efficiency is measured by counting the average number of new nodes generated by each system.
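A hedged sketch of the two evaluation measures; the normalization of "distance from the optimal-quality plan" is an assumption made for illustration, since the slide does not spell it out.

```python
def average_plan_quality(system_qualities, optimal_qualities):
    """One minus the average relative distance from the optimal-quality plans."""
    distances = [(opt - sys_q) / opt
                 for sys_q, opt in zip(system_qualities, optimal_qualities)]
    return 1.0 - sum(distances) / len(distances)

def average_nodes_generated(node_counts):
    """Planning efficiency: average number of new search nodes per problem."""
    return sum(node_counts) / len(node_counts)

# e.g. two test problems, with system plans at 90% and 100% of optimal quality
print(average_plan_quality([0.9, 1.0], [1.0, 1.0]))   # -> 0.95
print(average_nodes_generated([120, 80]))             # -> 100.0
```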

Results
[Figure: results for the Softbot, Logistics, and Process Planning domains.]

Conclusion I
- Both search-control and rewrite rules lead to improvements in plan quality.
- Rewrite rules cost more in lost planning efficiency than search-control rules do; a mechanism is needed to distinguish good rules from bad ones and to forget the bad rules.
- Comparing planning traces seems to be a better technique for learning search-control rules than for learning rewrite rules; alternative strategies for learning rewrite rules need to be explored, such as comparing two completed plans of different quality or static domain analysis.

Empirical Evaluation II: A Study of the Factors Affecting PIP's Learning Performance
- Generated 25 abstract domains varying along a number of seemingly relevant dimensions:
  - instance similarity
  - quality branching factor (the average number of alternative-quality solutions per problem)
  - association between the planner's default bias and the quality bias
- Are there statistically significant differences in PIP's performance as each factor is varied (Student's t-test)?

Results
PIP's learning leads to greater improvements in domains where:
- the quality branching factor is large
- the planner's default biases are negatively correlated with the quality-improving heuristic function
There is no simple relationship between instance similarity and PIP's learning performance.

Conclusion II
- Need to address scale-up issues.
- Need to keep up with advances in AI planning technology: it is arguably more difficult to accelerate a new-generation planner by outfitting it with learning, as the overhead cost of the learning system can overwhelm the gains in search efficiency (Kambhampati 2001). The problem is not the lack of a well-defined task!
- Organize a symposium/special issue on how to efficiently organize, retrieve, and forget learned knowledge?
- Develop open-source planning-and-learning software?