Refinement Planning CSE 574 April 15, 2003 Dan Weld

Planning Applications
- RAX/PS (the NASA Deep Space One planning agent)
- HSTS (Hubble Space Telescope scheduler); similar work in planning earth observation (satellite / plane)
- Spacecraft repair / workflow: Shuttle refurbishment, Optimum AIV system
- Elevator control: Koehler's Miconic domain, fielded in skyscrapers
- Airport ground-traffic control: the company of Wolfgang Hatzack

Planning Applications 2
- Diagnose and reconfigure power distribution systems (Sylvie Thiebaux / EDF)
- Data transformation: VICAR (JPL image-enhancing system); CELWARE (CELCorp)
- Online games & training: "intelligent" characters
- Classics (fielded?): robotics, NLP-to-database interfaces

Planning Applications 3
- Control of an Australian brewery (SIPE)
- Desert Storm logistics planning: "DART saved more $$ during this campaign than the whole DARPA budget for the past 25 years"

More Administrivia
- No class Fri 4/18. But: read pp. 1-30 of Boutilier, Dean & Hanks; no review necessary
- Experiment with at least 1 planner; write a review of the planner
- Plan project: what; who (1-3 person groups)

Project 1: Goal Selection
Input:
- Init state
- Action schemata, each with an associated resource cost = f(act)
- Goals, each with an associated utility = f(g)
- Resource bound
Output:
- Plan maximizing utility subject to the resource bound (see the sketch below)
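To make the objective concrete, here is a minimal sketch of the goal-selection core. All names and numbers are invented, and plan cost is crudely approximated as the sum of independent per-goal costs, which ignores goal interactions; a real solution would estimate the cost of a joint plan for each candidate subset.

```python
from itertools import combinations

def select_goals(goals, utility, cost, bound):
    """Choose the goal subset with maximal total utility whose total
    resource cost stays within `bound`. Brute force over all subsets;
    fine for a handful of goals."""
    best, best_util = frozenset(), 0
    for r in range(1, len(goals) + 1):
        for subset in combinations(goals, r):
            if sum(cost[g] for g in subset) <= bound:
                u = sum(utility[g] for g in subset)
                if u > best_util:
                    best, best_util = frozenset(subset), u
    return best, best_util

# Invented example: three goals, resource bound of 10.
utility = {"g1": 5, "g2": 8, "g3": 4}
cost = {"g1": 6, "g2": 7, "g3": 3}
print(select_goals(list(utility), utility, cost, bound=10))
# (frozenset({'g2', 'g3'}), 12)
```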

Project 2: Embedded Agent
- Implement simulator: takes init state + action schemata as input; communicates with agent (see the sketch below)
- Integrate MDP agent: SPUDD, GPT, or ??
- Extensions: augment to take user-specified goals; incremental policy changes; one-shot vs. recurring reward; real-time issues
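A minimal sketch of the simulator half of the project, assuming ground STRIPS-style schemata given as precondition/add/delete sets and an agent object with a `choose(state)` method; both interfaces are hypothetical, not part of the assignment spec.

```python
def run_episode(init_state, schemata, agent, max_steps=100):
    """Minimal simulator loop: the simulator owns the world state and
    the agent only sees what the simulator sends it. Each schema is a
    dict with "pre", "add", and "del" sets of propositions."""
    state = set(init_state)
    for _ in range(max_steps):
        action = agent.choose(frozenset(state))
        if action is None:          # agent signals it is finished
            break
        schema = schemata[action]
        if schema["pre"] <= state:  # preconditions hold: apply effects
            state = (state - schema["del"]) | schema["add"]
        # else: the action fails silently and the agent must re-plan
    return state
```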

Project 3: Incomplete Info & Time
- Extend SAPA or another temporal planner
- Sensory effects: handle information gathering
- Interleaved execution or conditional plans (likely adopting ideas from Petrick & Bacchus)

A recent (turbulent) history of planning
- 1970s-1995: UCPOP, Zeno [Penberthy & Weld]; IxTeT [Ghallab et al]. The whole world believed in POP and was happy to stack 6 blocks!
- 1995: Advent of the CSP-style compilation approach: Graphplan [Blum & Furst], SATPLAN [Kautz & Selman]. Use of reachability analysis and disjunctive constraints.
- 1997: Domination of the heuristic state-search approach: HSP/R [Bonet & Geffner], UNPOP [McDermott]. "POP is dead!" Importance of good domain-independent heuristics.
- 2000: Hoffmann's FF, a state-search planner, won the AIPS-00 competition! ...but NASA's highly publicized RAX was still a POP dinosaur. POP believed to be a good framework for handling temporal and resource planning [Smith et al, 2000]. RePOP.

In the beginning it was all POP. Then it was cruelly UnPOPped. The good times return with Re(vived)POP.

Too many brands of classical planners
- Planning as search:
  - Search in the space of states (progression, regression, MEA): STRIPS, PRODIGY, TOPI, HSP, HSP-R, UNPOP, FF
  - Search in the space of plans (total order, partial order, protections, MTC): Interplan, SNLP, TOCL, UCPOP, TWEAK
  - Search in the space of task networks (reduction of non-primitive tasks): NOAH, NONLIN, O-Plan, SIPE
- Planning as CSP/ILP/SAT/BDD: Graphplan, IPP, STAN, SATPLAN, BlackBox, GP-CSP, BDDPlan
- Planning as theorem proving: Green's planner
- Planning as model checking

A Unifying View
- PART I: Refinement planning. What are plans? What are refinements? Candidate-set semantics. SEARCH: FSS, BSS, PS.
- PART 2: Disjunctive and conjunctive refinement planning. How are sets of plans represented compactly? How are they refined? How are they searched? Graph-based, SAT, CSP, ILP, BDD.
- PART 3: CONTROL. Heuristics/optimizations: reachability, relevance, relaxed subgoal interactions, directed partial consistency enforcement. Domain customization: hand-coded (HTN schemas, TL formulas, cutting planes), learned (case-based, abstraction-based, failure-based), domain analysis.

Main Points
- Framework
- Types of refinements: presatisfaction, preordering, tractability
- Refinement vs. solution extraction
- Splitting as a way to decrease solution-extraction time... at a cost
- Use of disjunctive representations

Tradeoffs among Basic Strategies
State space: progression/regression must commit to both the position and the relevance of actions (regression can judge relevance, sort of, but handles sets of states).
+ Gives state information (easier plan validation)
- Leads to premature commitment, but better heuristic guidance
- Too many states when actions have durations
Plan space: plan-space refinement (PSR) avoids constraining position.
+ Reduces commitment (large candidate set per branch), but harder to get a heuristic estimate
- Increases plan-validation costs
+ Easily extended to actions with duration

(Dis)advantages of partial-order planning
The heuristic angle: estimating the distance of a partial plan from a flawless solution plan is conceptually harder than estimating the distance of a set of states from the init state, which in turn is harder than estimating the cost of a single state from the goal state.
The commitment angle: progression/regression planners commit to both position and relevance; PS planners commit only to relevance. Unnecessary commitments increase the chance of backtracking, but also make it easier to validate/evaluate the partial plan.
Relevant factors: action position and relevance, branching factor, depth of search tree, maintenance goals, durative actions.

Weaknesses
- Numerous terms, far from first use
- Is this interesting?
    While (I still have candidate plans)
      If I have a solution plan, return it
      Else, improve the existing plans
    EndWhile
- Interleaving different strategies: dynamically determine which strategy to use
- Exploiting learning and symmetry in planning
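The loop being questioned is the generic refinement-search template. A minimal executable sketch; all three callables are problem-specific and the names are hypothetical:

```python
def refinement_planner(initial_partial_plan, is_solution, refine):
    """The loop from the slide, made concrete. A partial plan stands
    for its candidate set; `refine` returns narrower partial plans
    whose candidate sets jointly cover the parent's solutions."""
    candidates = [initial_partial_plan]
    while candidates:                    # "I still have candidate plans"
        plan = candidates.pop()
        if is_solution(plan):            # "If I have a solution plan, return it"
            return plan
        candidates.extend(refine(plan))  # "Else, improve the existing plans"
    return None                          # candidate set exhausted: no solution
```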

Future Work
- Filling in holes: can unsupervised learning be used? For supervised learning, can sufficient training samples be obtained?
- Can one extend refinement strategies, e.g. to planning under uncertainty?

Transition System Perspective
Model agent-environment dynamics as transition systems. A transition system is a 2-tuple ⟨S, A⟩ where:
- S is a set of states
- A is a set of actions, each action a being a subset of S×S
This gives a graph with states as nodes and actions as edges; if transitions are non-deterministic, the edges are "hyper-edges". The agent may know only that its initial state lies in a subset S' of S; if the environment is not fully observable, then |S'| > 1. Designate some subset Sg of S as desirable. Finding a plan is then equivalent to finding a (shortest) path in the graph corresponding to the transition system.
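A minimal sketch of the path-finding view, directly using the slide's explicit representation (each action as a set of (s, s') pairs); the tiny example problem is invented:

```python
from collections import deque

def shortest_plan(A, init, goals):
    """BFS over the explicit transition graph. A maps each action name
    to its transition relation, a set of (s, s2) pairs. Returns a
    shortest action sequence from init to any state in goals."""
    frontier = deque([(init, [])])
    visited = {init}
    while frontier:
        state, plan = frontier.popleft()
        if state in goals:
            return plan
        for name, rel in A.items():
            for s, s2 in rel:
                if s == state and s2 not in visited:
                    visited.add(s2)
                    frontier.append((s2, plan + [name]))
    return None  # no goal state reachable

# Tiny example: states 0..2, two actions.
A = {"a1": {(0, 1)}, "a2": {(1, 2), (0, 0)}}
print(shortest_plan(A, 0, goals={2}))  # ['a1', 'a2']
```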

Transition System Models
A transition system is a 2-tuple ⟨S, A⟩ where:
- S is a set of "states"
- A is a set of "transitions", each transition a being a subset of S×S
If a is a (partial) function, the transition is deterministic; otherwise it is "non-deterministic". It is a stochastic transition if there are probabilities associated with each state a can take s to. Finding plans is equivalent to finding "paths" in the transition system.
Transition-system models are called "explicit state-space" models. In general, we would like to represent transition systems more compactly, e.g. via a state-variable representation of states; such models are called "factored" models. In the explicit model, each action can be represented by an incidence matrix, and the set of all possible transitions is then simply the sum of the individual incidence matrices.
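A small sketch of the incidence-matrix view; the 4-state chain and action names are invented for illustration:

```python
import numpy as np

# States indexed 0..3. An action is a 0/1 incidence matrix M with
# M[i, j] = 1 iff the action can take state i to state j.
move_right = np.array([[0, 1, 0, 0],
                       [0, 0, 1, 0],
                       [0, 0, 0, 1],
                       [0, 0, 0, 0]])
move_left = move_right.T  # the reverse motion

# Deterministic: at most one successor per state (a partial function).
assert (move_right.sum(axis=1) <= 1).all()

# The set of all possible one-step transitions is the (boolean) sum
# of the individual incidence matrices, as on the slide.
all_transitions = (move_right + move_left) > 0
print(all_transitions.astype(int))
```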

Manipulating Transition Systems

MDPs = general transition systems
A Markov Decision Process (MDP) is a general (deterministic or non-deterministic) transition system in which the states have "rewards"; in the general case, every state can have a varying amount of reward. Planning is defined as finding a "policy": a mapping from states to actions that has the maximal expected reward.
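One standard way to compute such a policy is value iteration. A compact sketch under the slide's rewards-on-states convention; the data layout (T, R) is an assumption, and it further assumes every action is applicable in every state:

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    """Compute the policy with maximal expected discounted reward.
    T[s][a] is a list of (probability, next_state) pairs; R[s] is the
    reward of state s."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(sum(p * V[s2] for p, s2 in T[s][a]) for a in actions)
            new_v = R[s] + gamma * best
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < eps:
            break
    # Greedy policy: in each state, pick the action with best expected value.
    return {s: max(actions, key=lambda a: sum(p * V[s2] for p, s2 in T[s][a]))
            for s in states}
```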

Problems with transition systems
Transition systems are a great conceptual tool. However, direct manipulation of transition systems tends to be too cumbersome: the size of the explicit graph corresponding to a transition system is often very large. The remedy is to provide "compact" representations:
- Start by explicating the structure of the "states", e.g. states specified in terms of state variables.
- Represent actions not as incidence matrices but as functions specified directly in terms of the state variables: an action will work in any state where some state variables have certain values, and when it works, it will change the values of certain (other) state variables.
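A minimal sketch of such a factored action: a STRIPS-style blocks-world pickup, invented for illustration. Preconditions name the variable values required, effects name the values changed, and unmentioned variables keep their values:

```python
# Factored action: applicable wherever the named variables have the
# required values; applying it overwrites only the variables in "eff".
pickup_A = {
    "pre": {"handempty": True, "A_on_table": True},
    "eff": {"handempty": False, "A_on_table": False, "holding_A": True},
}

def applicable(state, action):
    return all(state[v] == val for v, val in action["pre"].items())

def apply_action(state, action):
    assert applicable(state, action)
    return {**state, **action["eff"]}  # unmentioned variables persist

state = {"handempty": True, "A_on_table": True, "holding_A": False}
print(apply_action(state, pickup_A))
# {'handempty': False, 'A_on_table': False, 'holding_A': True}
```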

Factoring States
3 propositional variables (P, Q, R) suffice to distinguish 2³ = 8 world states.
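The enumeration itself, as a short sketch:

```python
from itertools import product

variables = ["P", "Q", "R"]
states = [dict(zip(variables, vals))
          for vals in product([True, False], repeat=len(variables))]
assert len(states) == 8  # 2**3 truth assignments
```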

Boolean Functions
A set of world states can be represented as a boolean function {P, Q, R} → {T/F}, e.g. ¬P ∧ Q. BDDs represent such functions compactly. [The slide shows the BDD for ¬P ∧ Q, branching on P and then Q down to the T/F terminals.]
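As a characteristic-function sketch, reusing `states` from the block above: one small function stands for a whole set of states, which is exactly the economy a BDD encodes:

```python
def not_p_and_q(state):
    """Characteristic function of the state set ¬P ∧ Q."""
    return (not state["P"]) and state["Q"]

selected = [s for s in states if not_p_and_q(s)]
assert len(selected) == 2  # R is unconstrained, so two states match
```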