Probabilistic Temporal Planning with Uncertain Durations
Mausam (joint work with Daniel S. Weld)
University of Washington, Seattle

Motivation
Three features of real-world planning domains:
- Concurrency: calibrate an instrument while the rover moves
- Uncertain effects: 'grip a rock' may fail
- Durative actions with uncertain durations: wheels spin, so speed is uncertain

Contributions
Novel challenges:
- Large number of decision epochs: results to manage this blowup in different cases
- Large branching factors: approximation algorithms
Five planning algorithms:
- DUR_prun: optimal
- DUR_samp: near-optimal
- DUR_hyb: anytime with user-defined error
- DUR_exp: super-fast
- DUR_arch: balance between speed and quality
Identify fundamental issues for future research

Outline of the talk
- Background
- Theory
- Algorithms and Experiments
- Summary and Future Work

Outline of the talk
- Background
  - MDP
  - Decision epochs: happenings, pivots
- Theory
- Algorithms and Experiments
- Summary and Future Work

Markov Decision Process
- S: a set of states, factored into Boolean variables
- A: a set of actions, each of unit duration
- Pr: S × A × S → [0, 1]: the transition model
- C: A → R: the cost model
- s_0: the start state
- G: a set of absorbing goals
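To make the components above concrete, here is a minimal container for such an MDP. This is an illustrative sketch only; the class and field names are our assumptions, not code from the talk.

```python
# A minimal sketch of the MDP components listed above (illustrative only).
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set

State = FrozenSet[str]   # a state = the set of Boolean variables that are true
Action = str

@dataclass
class MDP:
    states: Set[State]
    actions: Set[Action]
    # Pr(s, a) -> distribution over successor states
    transition: Callable[[State, Action], Dict[State, float]]
    # C(a) -> cost of executing a (each action takes unit time)
    cost: Callable[[Action], float]
    start: State
    goals: Set[State]    # absorbing goal states
```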

Goal of an MDP
Find a policy π: S → A that minimises the expected cost of reaching a goal, for a fully observable Markov decision process in which the agent executes over an indefinite horizon.
Algorithms: iterative dynamic programming, e.g. value iteration, Real-Time Dynamic Programming (RTDP), etc.
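A minimal value-iteration sketch for a cost-based, goal-directed MDP of the kind just defined; it reuses the hypothetical MDP container above and is not the planners' actual implementation.

```python
def value_iteration(mdp, epsilon=1e-4):
    """Compute J(s), the expected cost-to-goal, then extract a greedy policy."""
    J = {s: 0.0 for s in mdp.states}
    while True:
        residual = 0.0
        for s in mdp.states:
            if s in mdp.goals:
                continue  # absorbing goals incur no further cost
            # Bellman backup: cost of a plus expected cost of its successors
            q = {a: mdp.cost(a) + sum(p * J[t] for t, p in mdp.transition(s, a).items())
                 for a in mdp.actions}
            best = min(q.values())
            residual = max(residual, abs(best - J[s]))
            J[s] = best
        if residual < epsilon:
            break
    policy = {s: min(mdp.actions,
                     key=lambda a: mdp.cost(a) +
                     sum(p * J[t] for t, p in mdp.transition(s, a).items()))
              for s in mdp.states if s not in mdp.goals}
    return J, policy
```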

Definitions (Durative Actions)
Assumption: (probabilistic) TGP action model
- Preconditions must hold until the end of the action.
- Effects are usable only at the end of the action.
Decision epoch: a time point at which a new action may be started.
Happening: a time point at which an executing action actually finishes.
Pivot: a time point at which an executing action could finish.
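As a small illustration of the happening/pivot distinction (a hypothetical example, not taken from the talk): if action a starts at time 0 and its duration is uniform over {1, 2, 3}, then each of the times 1, 2, 3 is a pivot, while the single time at which a is actually observed to finish is the happening.

```python
# Hypothetical illustration of pivots vs. happenings for one executing action.
import random

def pivots(start_time, duration_support):
    """All time points at which the action *could* finish."""
    return sorted(start_time + d for d in duration_support)

def sample_happening(start_time, duration_support):
    """The time point at which the action *actually* finishes, in one execution."""
    return start_time + random.choice(duration_support)

support_a = [1, 2, 3]                   # assumed uniform duration support
print(pivots(0, support_a))             # [1, 2, 3] -> candidate decision epochs
print(sample_happening(0, support_a))   # e.g. 2    -> the observed happening
```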

Outline of the talk
- Background
- Theory
  - Explosion of decision epochs
- Algorithms and Experiments
- Summary and Future Work

Decision Epochs (TGP Action Model)
Deterministic durations [Mausam & Weld '05]: decision epochs = set of happenings
Uncertain durations: non-termination carries information!
Theorem: decision epochs = set of pivots

Illustration: a bimodal distribution
(Figure: the duration distribution of action a, with its expected completion time marked.)

Conjecture
If all actions have
- duration distributions independent of their effects, and
- unimodal duration distributions,
then decision epochs = set of happenings.

Outline of the talk
- Background
- Theory
- Algorithms and Experiments
  - Expected Durations Planner
  - Archetypal Durations Planner
- Summary and Future Work

Planning with Durative Actions
MDP in an augmented state space
X_1: application of b on X
(Figure: a timeline showing actions a, b, c executing from state X.)

Uncertain Durations: Transition Function
action a: duration uniform(1, 2); action b: duration uniform(1, 2)
(Figure: the transition diagram for starting a and b together; each combination of durations has probability 0.25.)

Branching Factor
If there are n actions, each with m possible durations and r probabilistic effects, then the number of potential successors is
(m-1)[(r+1)^n - r^n - 1] + r^n
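A quick numeric check of the formula (illustrative; the function name is ours): with n = 2 concurrent actions, m = 2 possible durations, and r = 2 probabilistic effects each, a state has (2-1)[(2+1)^2 - 2^2 - 1] + 2^2 = 8 potential successors.

```python
def potential_successors(n, m, r):
    """Branching-factor bound from the slide: n actions, m durations, r effects each."""
    return (m - 1) * ((r + 1) ** n - r ** n - 1) + r ** n

print(potential_successors(2, 2, 2))   # 8
print(potential_successors(3, 4, 2))   # branching grows quickly with n and m
```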

Algorithms
Five planning algorithms:
- DUR_prun: optimal
- DUR_samp: near-optimal
- DUR_hyb: anytime with user-defined error
- DUR_exp: super-fast
- DUR_arch: balance between speed and quality

Expected Durations Planner (DUR_exp)
1. Assign each action a deterministic duration equal to the expected value of its distribution.
2. Build a deterministic-duration policy for this domain.
3. Repeat until the goal is reached: execute this policy and wait for an interrupt
   (a) action terminated as expected: do nothing
   (b) action terminated early: replan from this state
   (c) action terminated late: revise the action's deterministic duration and replan for this domain
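A control-loop sketch of this idea, under assumptions of ours: the build_policy and execute_until_interrupt hooks are hypothetical placeholders supplied by the caller, not the paper's code, and revising a late action's duration to the observed value is just one possible choice.

```python
def expected_durations_planner(duration_dists, build_policy, execute_until_interrupt,
                               start, goal_test):
    """Sketch of the DUR_exp loop. build_policy and execute_until_interrupt are
    caller-supplied (hypothetical) hooks for deterministic planning and execution."""
    # 1. Replace every duration distribution by its mean.
    det = {a: d.mean() for a, d in duration_dists.items()}
    policy = build_policy(det, start)
    state = start
    while not goal_test(state):
        # 2. Execute until some action finishes at an unexpected time.
        state, action, observed, expected = execute_until_interrupt(policy, state)
        if observed == expected:
            continue                           # (a) finished as expected: keep going
        if observed < expected:
            policy = build_policy(det, state)  # (b) finished early: replan from here
        else:
            det[action] = observed             # (c) finished late: revise its duration
            policy = build_policy(det, state)  #     (one possible revision) and replan
    return state
```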

Planning Time

Multi-modal Distributions
Recall: the conjecture holds only for unimodal distributions.
Decision epochs = happenings if unimodal, pivots if multi-modal.

Multi-modal Durations: Transition Function
action a: duration uniform(1, 2); action b: duration 1 with probability 50%, 3 with probability 50%
(Figure: the transition diagram for starting a and b together; each combination of durations has probability 0.25.)

Multi-modal Distributions
Expected Durations Planner (DUR_exp): one deterministic duration per action; a big approximation for a multi-modal distribution.
Archetypal Durations Planner (DUR_arch): limited uncertainty in durations; one duration per mode of the distribution.
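A minimal sketch of the archetypal-duration idea, assuming a discrete duration distribution given as duration/probability pairs; the mode-finding rule and the renormalisation below are our illustrative choices, not necessarily the paper's.

```python
def archetypal_durations(dist):
    """Collapse a discrete duration distribution onto one duration per mode.

    dist: {duration: probability}. Returns a smaller {duration: probability}
    keeping only local maxima of the distribution, renormalised.
    """
    durations = sorted(dist)
    modes = [d for i, d in enumerate(durations)
             if (i == 0 or dist[d] >= dist[durations[i - 1]]) and
                (i == len(durations) - 1 or dist[d] >= dist[durations[i + 1]])]
    total = sum(dist[d] for d in modes)
    return {d: dist[d] / total for d in modes}

# Bimodal example: most mass near 1 and near 3.
print(archetypal_durations({1: 0.45, 2: 0.10, 3: 0.45}))   # {1: 0.5, 3: 0.5}
```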

Planning Time (multi-modal)
(Figure: chart of planning times on multi-modal domains.)

Expected Make-span (multi-modal)
(Figure: chart of expected make-spans on multi-modal domains.)

Outline of the talk
- Background
- Theory
- Algorithms and Experiments
- Summary and Future Work
  - Observations on concurrency

Summary
- Large number of decision epochs: results to manage the explosion in specific cases
- Large branching factors: Expected Durations Planner; Archetypal Durations Planner (multi-modal)

Handling Complex Action Models
So far: probabilistic TGP
- Preconditions hold over-all.
- Effects are usable only at the end.
What about probabilistic PDDL2.1?
- Preconditions: at-start, over-all, at-end
- Effects: at-start, at-end
- Decision epochs must be arbitrary time points.

Ramifications
The result is independent of uncertainty!
Existing decision-epoch planners are incomplete: SAPA, Prottle, etc., and all IPC winners.
(Figure: counterexample with actions a and b, preconditions and effects over p, not-q, q, and goal G.)

PDDL2.1 (no uncertainty!)
Theorem: restricting decision epochs to pivots causes incompleteness; a problem may be incorrectly deemed unsolvable.
Exciting future work!
(Figure: the same counterexample with actions a and b over p, not-q, q, and goal G.)

Related Work
- Tempastic (Younes and Simmons '04): generate, test, and debug
- Prottle (Little, Aberdeen, Thiébaux '05): planning-graph-based heuristics
- Uncertain durations without concurrency: Foss and Onder '05; Boyan and Littman '00; Bresina et al. '02; Dearden et al. '03