Decision-Theoretic Planning with Asynchronous Events Håkan L. S. Younes Carnegie Mellon University

2 Introduction
- Asynchronous processes are abundant in the real world
- Discrete-time models are inappropriate for systems with asynchronous events
- Generalized semi-Markov (decision) processes are great for this!

3 Stochastic Processes with Asynchronous Events
[Figure: two machines, m1 and m2; at t = 0 both are up, i.e. the state is (m1 up, m2 up)]

4 Stochastic Processes with Asynchronous Events
[Figure continued: at t = 2.5 the event "m2 crashes" triggers, taking the system from (m1 up, m2 up) to (m1 up, m2 down)]

5 Stochastic Processes with Asynchronous Events
[Figure continued: at t = 3.1 the event "m1 crashes" triggers, taking the system to (m1 down, m2 down)]

6 Stochastic Processes with Asynchronous Events
[Figure continued: at t = 4.9 the event "m2 rebooted" triggers, taking the system to (m1 down, m2 up)]

7 A Model of Stochastic Discrete Event Systems
Generalized semi-Markov process (GSMP) [Matthes 1962]:
- A set of events E
- A set of states S

8 Events
- In a state s, the events E_s ⊆ E are enabled
- With each event e is associated:
  - A distribution G_e governing the time e must remain enabled before it triggers
  - A next-state probability distribution p_e(s′|s)

9 Semantics of GSMP Model
- Associate a real-valued clock t_e with each event e
- For each e ∈ E_s, sample t_e from G_e
- Let e* = argmin_{e ∈ E_s} t_e and t* = min_{e ∈ E_s} t_e
- Sample s′ from p_e*(s′|s)
- For each e ∈ E_s′:
  - t_e′ = t_e − t* if e ∈ E_s \ {e*}
  - sample t_e′ from G_e otherwise
- Repeat with s = s′ and t_e = t_e′
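These sampling semantics amount to a simple discrete-event simulation loop. Below is a minimal Python sketch of that loop; the callbacks `enabled`, `sample_duration`, and `sample_next_state`, and the way states and events are represented, are hypothetical stand-ins for E_s, G_e, and p_e(s′|s), not part of the original formalism.

```python
def simulate_gsmp(s0, enabled, sample_duration, sample_next_state, horizon):
    """Simulate one GSMP trajectory up to the given time horizon.

    enabled(s)              -> set of events enabled in state s (E_s)
    sample_duration(e)      -> a sample from G_e
    sample_next_state(e, s) -> a sample from p_e(.|s)
    """
    s, t = s0, 0.0
    clocks = {e: sample_duration(e) for e in enabled(s)}  # t_e for e in E_s
    trajectory = [(t, s)]
    while t < horizon and clocks:
        # The triggering event e* is the enabled event with the smallest clock.
        e_star = min(clocks, key=clocks.get)
        t_star = clocks[e_star]
        t += t_star
        s_next = sample_next_state(e_star, s)
        new_clocks = {}
        for e in enabled(s_next):
            if e in clocks and e != e_star:
                # Events that remain enabled keep their advanced clocks.
                new_clocks[e] = clocks[e] - t_star
            else:
                # Newly enabled events (and the one that just fired) are resampled.
                new_clocks[e] = sample_duration(e)
        s, clocks = s_next, new_clocks
        trajectory.append((t, s))
    return trajectory
```

Note that clocks of events that stay enabled are merely advanced, not resampled; this is exactly the point made on the "Notes on Semantics" slide below.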

10 Semantics: Example
[Figure: in state (m1 up, m2 up) the sampled clocks are "m2 crashes": 2.5 and "m1 crashes": 3.1; "m2 crashes" triggers first, and in the resulting state (m1 up, m2 down) the clocks are "m1 crashes": 0.6 (= 3.1 − 2.5) and "reboot m2": 2.4]

11 Notes on Semantics
- Events that remain enabled across state transitions without triggering are not rescheduled
  - Asynchronous events!
  - Differs from a semi-Markov process in this respect
- The process is a continuous-time Markov chain if all G_e are exponential distributions

12 General State-Space Markov Chain (GSSMC)
- The model is Markovian if we include the clocks in the state space
  - Extended state space X
  - Next-state distribution f(x′|x) is well-defined
- Clock values are not known to the observer
- The time each event has been enabled is known

13 Observation Model
- An observation o is a state s together with a real value u_e for each event, representing the time e has currently been enabled
- f(x|o) is well-defined

14 Observations: Example
[Figure, actual model (remaining clock times): in (m1 up, m2 up), "m2 crashes": 2.5 and "m1 crashes": 3.1; after "m2 crashes" triggers, in (m1 up, m2 down), "m1 crashes": 0.6 and "reboot m2": 2.4]
[Figure, observed model (time enabled so far): in (m1 up, m2 up), "m2 crashes": 0.0 and "m1 crashes": 0.0; after "m2 crashes" triggers, in (m1 up, m2 down), "m1 crashes": 2.5 and "reboot m2": 0.0]

15 Actions and Policies (GSMDPs)
- Identify a set A ⊆ E of controllable events (actions)
- A policy is a mapping from observations to sets of actions
- The action choice can change at any time in a state s

16 Rewards and Discounts
- Lump-sum reward k(s,e,s′) associated with the transition from s to s′ caused by e
- Continuous reward rate r(s,a) associated with a being enabled in s
- Discount factor α
  - A unit reward earned at time t counts as e^(−αt)

17 Value Function for GSMDPs
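The formula on this slide did not survive in the transcript. For orientation, the LaTeX sketch below gives the standard expected discounted reward criterion implied by the reward model on the previous slide; it is a reconstruction under those definitions, not necessarily the exact expression from the talk.

```latex
% v_pi(o): expected discounted reward of following policy pi from observation o.
% Transitions occur at times t_1 < t_2 < ... (with t_0 = 0), the i-th one taking
% s_{i-1} to s_i via event e_i; a_{i-1} denotes the action(s) enabled during the
% i-th sojourn. k and r are the lump-sum reward and reward rate from slide 16.
v_\pi(o) = \mathrm{E}_\pi\!\left[\,
  \sum_{i=1}^{\infty}\Bigl(
    e^{-\alpha t_i}\, k(s_{i-1}, e_i, s_i)
    + \int_{t_{i-1}}^{t_i} e^{-\alpha t}\, r(s_{i-1}, a_{i-1})\, dt
  \Bigr)
\right]
```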

18 GSMDP Solution Method [Younes & Simmons 2004]
GSMDP → continuous-time MDP (via phase-type distributions) → discrete-time MDP (via uniformization [Jensen 1953])
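Uniformization is the second arrow in the pipeline above: a continuous-time MDP with bounded exit rates is turned into an equivalent discrete-time MDP by normalizing all rates against a single constant. A minimal NumPy sketch follows; the rate-matrix input Q (one generator matrix per action) and the function name are illustrative assumptions, not an interface from the talk.

```python
import numpy as np

def uniformize(Q, alpha):
    """Convert a continuous-time MDP into a discrete-time MDP via uniformization.

    Q     : dict mapping each action a to an (n x n) generator matrix,
            with Q[a][s, s'] >= 0 for s != s' and rows summing to 0.
    alpha : continuous-time discount rate.

    Returns (P, gamma): per-action transition matrices and the discrete-time
    discount factor.
    """
    # Uniformization constant: at least the largest total exit rate.
    q = max(np.max(-np.diag(Qa)) for Qa in Q.values())
    P = {a: np.eye(Qa.shape[0]) + Qa / q for a, Qa in Q.items()}
    gamma = q / (q + alpha)  # discounting per uniformized step
    return P, gamma
```

Reward rates are correspondingly rescaled (divided by q + alpha) in the discretized problem, so that the discrete-time solution matches the continuous-time value function.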

19 Continuous Phase-Type Distributions [Neuts 1981]
Time to absorption in a continuous-time Markov chain with n transient states

20 Exponential Distribution
[Figure: a single transient phase with rate λ, leading directly to absorption]

21 Two-Phase Coxian Distribution
[Figure: phase 1 with rate λ1; on leaving phase 1, the chain is absorbed with probability 1 − p or moves to phase 2 (rate λ2) with probability p and is then absorbed]

22 Generalized Erlang Distribution
[Figure: n phases in series, each with rate λ; on leaving the first phase, the chain is absorbed with probability 1 − p or continues through the remaining n − 1 phases with probability p]

23 Method of Moments
Approximate a general distribution G with a phase-type distribution PH by matching the first n moments

24 Moments of a Distribution
- The i-th moment: μ_i = E[X^i]
- Mean: μ_1
- Variance: σ² = μ_2 − μ_1²
- Coefficient of variation: cv = σ/μ_1

25 Matching One Moment
Exponential distribution with rate λ = 1/μ_1

26 Matching Two Moments
[Figure: the cv² axis from 0 through 1 and beyond; the exponential distribution covers only the single point cv² = 1]

27 Matching Two Moments
[Figure continued: the generalized Erlang distribution covers 0 < cv² ≤ 1]

28 Matching Two Moments
[Figure continued: the two-phase Coxian distribution covers cv² ≥ 1; together the three families cover the full range of cv²]
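As a concrete illustration of two-moment matching, the sketch below fits a distribution with mean μ1 and squared coefficient of variation cv² using one standard recipe (a two-phase Coxian fit, often attributed to Marie, for cv² ≥ 1/2); it shows the flavor of the computation and is not necessarily the exact construction used in the talk. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Coxian2:
    """Two-phase Coxian: leave phase 1 at rate lam1; then with probability p
    enter phase 2 (rate lam2), otherwise be absorbed immediately."""
    lam1: float
    lam2: float
    p: float

def fit_two_moments(mu1, cv2):
    """Fit a phase-type distribution to a given mean and squared coefficient
    of variation: exponential when cv2 == 1, two-phase Coxian when cv2 >= 0.5."""
    if abs(cv2 - 1.0) < 1e-9:
        # p = 0 means absorption right after phase 1, i.e. a plain exponential.
        return Coxian2(lam1=1.0 / mu1, lam2=0.0, p=0.0)
    if cv2 >= 0.5:
        lam1 = 2.0 / mu1
        p = 1.0 / (2.0 * cv2)
        lam2 = lam1 * p  # equals 1 / (mu1 * cv2)
        return Coxian2(lam1=lam1, lam2=lam2, p=p)
    # cv2 < 0.5 calls for a (generalized) Erlang chain, as on slide 22.
    raise NotImplementedError("use a generalized Erlang distribution for cv2 < 0.5")
```

For example, the Weibull W(1,1/2) from the next slide, with μ1 = 2 and cv² = 5, yields lam1 = 1, p = 1/10, lam2 = 1/10, while U(0,1) (cv² = 1/3) falls into the Erlang regime.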

29 Matching Moments: Example 1
Weibull distribution W(1,1/2): μ_1 = 2, cv² = 5
[Figure: CDF F(t) of W(1,1/2) together with its matching phase-type approximation; the 1/10 and 9/10 labels in the original figure appear to be the branch probabilities of the fitted distribution]

30 Matching Moments: Example 2
Uniform distribution U(0,1): μ_1 = 1/2, cv² = 1/3
[Figure: CDF F(t) of U(0,1) together with its matching phase-type approximation]
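The stated moments are easy to check numerically; a small sketch using scipy follows, assuming W(1,1/2) denotes a Weibull with scale 1 and shape 1/2.

```python
from scipy import stats

def moments(dist):
    """Return the mean and squared coefficient of variation of a scipy distribution."""
    mean, var = dist.mean(), dist.var()
    return mean, var / mean**2

print(moments(stats.weibull_min(c=0.5, scale=1.0)))  # -> (2.0, 5.0)
print(moments(stats.uniform(loc=0.0, scale=1.0)))    # -> (0.5, 0.333...)
```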

31 Matching More Moments
- Closed-form solution for matching three moments of positive distributions [Osogami & Harchol-Balter 2003]
- Combination of an Erlang distribution and a two-phase Coxian distribution

32 Approximating GSMDP with Continuous-time MDP
- Each event with a non-exponential distribution is approximated by a set of events with exponential distributions (see the sketch below)
- Phases become part of the state description
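To make the state expansion concrete, the sketch below expands a single event whose delay distribution is approximated by a two-phase Coxian into exponential "phase events"; the continuous-time MDP state then carries the current phase alongside the original GSMP state. It reuses the Coxian2 fields from the earlier sketch, and the function and tuple encoding are illustrative assumptions only.

```python
def expand_coxian_event(event, cox):
    """Replace `event`, approximated by the two-phase Coxian `cox`, with
    exponential phase events over extended states (s, phase).

    Returns (phase, rate, effect) triples, where effect is either
    ("advance", next_phase), meaning move to the next phase in the same GSMP
    state, or ("trigger",), meaning the original event fires and p_e(.|s)
    is applied.
    """
    return [
        # From phase 1: the event triggers with rate lam1 * (1 - p) ...
        (1, cox.lam1 * (1.0 - cox.p), ("trigger",)),
        # ... or the clock advances to phase 2 with rate lam1 * p.
        (1, cox.lam1 * cox.p, ("advance", 2)),
        # From phase 2: the event triggers at rate lam2.
        (2, cox.lam2, ("trigger",)),
    ]
```

Splitting the exit of phase 1 into two competing exponential transitions with rates lam1*p and lam1*(1-p) is equivalent to leaving phase 1 at rate lam1 and then branching with probability p, so the expansion preserves the Coxian approximation.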

33 Policy Execution
- Phases represent a discretization, into random-length intervals, of the time events have been enabled
- Phases are not part of the real model
- Simulate phase transitions during execution

34 The Foreman's Dilemma
When should the "Service" action be enabled in the "Working" state?
[Figure: states Working (reward rate c = 1), Serviced (c = 0.5), and Failed (c = 0); the Service action (Exp(10)) takes Working to Serviced, Return (Exp(1)) takes Serviced back to Working, Fail (general distribution G) takes Working to Failed, and Replace (Exp(1/100)) takes Failed back to Working]

35 The Foreman's Dilemma: Optimal Solution
Find the t_0 (the time at which the Service action is enabled) that maximizes v_0, where Y is the time to failure in the "Working" state
[Formula not recoverable from the transcript]

36 The Foreman's Dilemma: SMDP Solution
Same formulas, but with a restricted choice:
- The action is immediately enabled (t_0 = 0), or
- The action is never enabled (t_0 = ∞)

37 The Foreman's Dilemma: Performance
[Plot: percent of the optimal expected reward as a function of x, for failure-time distribution U(5,x)]

38 The Foreman's Dilemma: Performance
[Plot: percent of the optimal expected reward as a function of x, for failure-time distribution W(1.6x,4.5)]

39 System Administration
- Network of n machines
- Reward rate c(s) = k in states where k machines are up
- One crash event and one reboot action per machine
- At most one action enabled at any time

40 System Administration: Performance
[Plot: expected reward as a function of the number of machines n; reboot-time distribution U(0,1)]

41 System Administration: Performance
[Table: number of states, q, and solution time (s) when matching 1, 2, and 3 moments, as a function of problem size n; the state counts are n·2^n, (n+1)·2^n, and (1.5n+1)·2^n respectively, but the numeric entries are not recoverable from the transcript]

42 The Role of Phases
- Foreman's dilemma: phases permit a delay in enabling actions
- System administration: phases allow us to take into account the time an action has already been enabled

43 Summary
- Generalized semi-Markov (decision) processes allow asynchronous events
- Phase-type distributions can be used to approximate a GSMDP with an MDP
  - Allows us to approximately solve GSMDPs using existing MDP techniques
- Phase does matter!

44 Future Work
- Discrete phase-type distributions
  - Handles deterministic distributions
  - Avoids the uniformization step
- Value function approximation
  - Take advantage of GSMDP structure
- Other optimization criteria
  - Finite horizon, etc.

45 Tempastic-DTP A tool for GSMDP planning: