
Decision-Theoretic Planning with Asynchronous Events Håkan L. S. Younes Carnegie Mellon University.


1 Decision-Theoretic Planning with Asynchronous Events Håkan L. S. Younes Carnegie Mellon University

2 Introduction
- Asynchronous processes are abundant in the real world
- Discrete-time models are inappropriate for systems with asynchronous events
- Generalized semi-Markov (decision) processes are well suited to such systems

3 Stochastic Processes with Asynchronous Events
[Figure: two machines, m1 and m2. At t = 0 both are up: state (m1 up, m2 up).]

4 Stochastic Processes with Asynchronous Events
[Figure: m2 crashes at t = 2.5, moving the system from (m1 up, m2 up) to (m1 up, m2 down).]

5 Stochastic Processes with Asynchronous Events
[Figure: m1 crashes at t = 3.1, moving the system to (m1 down, m2 down).]

6 Stochastic Processes with Asynchronous Events
[Figure: m2 is rebooted at t = 4.9, moving the system to (m1 down, m2 up).]

7 A Model of Stochastic Discrete Event Systems
Generalized semi-Markov process (GSMP) [Matthes 1962]:
- A set of events E
- A set of states S

8 Events
In a state s, the events E_s ⊆ E are enabled. With each event e is associated:
- A distribution G_e governing the time e must remain enabled before it triggers
- A next-state probability distribution p_e(s′|s)

9 Semantics of GSMP Model
- Associate a real-valued clock t_e with each event e
- For each e ∈ E_s, sample t_e from G_e
- Let e* = argmin_{e ∈ E_s} t_e and t* = min_{e ∈ E_s} t_e
- Sample s′ from p_e*(s′|s)
- For each e ∈ E_s′:
  - t_e′ = t_e − t* if e ∈ E_s \ {e*}
  - sample t_e′ from G_e otherwise
- Repeat with s = s′ and t_e = t_e′
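The clock semantics above translate directly into a small simulator. This is an illustrative sketch, not code from the talk; the callback names (`enabled`, `sample_delay`, `sample_next`) are mine:

```python
import random

def simulate_gsmp(enabled, sample_delay, sample_next, s0, steps, seed=0):
    """Simulate a GSMP trajectory under the clock semantics of slide 9.

    enabled(s)             -> set of events enabled in state s
    sample_delay(e, rng)   -> fresh sample from the delay distribution G_e
    sample_next(e, s, rng) -> next state drawn from p_e(.|s)
    """
    rng = random.Random(seed)
    s = s0
    clocks = {e: sample_delay(e, rng) for e in enabled(s)}
    t = 0.0
    trajectory = [(t, s)]
    for _ in range(steps):
        if not clocks:
            break  # no enabled events: the process makes no further moves
        # The event with the smallest clock reading triggers first.
        e_star = min(clocks, key=clocks.get)
        t_star = clocks[e_star]
        t += t_star
        s_next = sample_next(e_star, s, rng)
        new_clocks = {}
        for e in enabled(s_next):
            if e in clocks and e != e_star:
                # Remained enabled without triggering: keep the old clock
                # running (this is what makes the events asynchronous).
                new_clocks[e] = clocks[e] - t_star
            else:
                # Newly enabled (or just triggered): sample a fresh delay.
                new_clocks[e] = sample_delay(e, rng)
        s, clocks = s_next, new_clocks
        trajectory.append((t, s))
    return trajectory
```

The branch that subtracts `t_star` rather than resampling is the key difference from a semi-Markov process, as slide 11 notes.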

10 Semantics: Example
In (m1 up, m2 up), sampled clocks: m2 crashes: 2.5, m1 crashes: 3.1. Event m2 crashes triggers first, after 2.5 time units, leading to (m1 up, m2 down). There the clock for m1 crashes keeps running (3.1 − 2.5 = 0.6 remaining), while the newly enabled reboot m2 gets a fresh sample: 2.4.

11 Notes on Semantics
- Events that remain enabled across state transitions without triggering are not rescheduled: asynchronous events!
- This differs from a semi-Markov process in that respect
- If all G_e are exponential distributions, the process is a continuous-time Markov chain

12 General State-Space Markov Chain (GSSMC)
- The model is Markovian if we include the clocks in the state space
- Extended state space X; the next-state distribution f(x′|x) is well-defined
- Clock values are not known to an observer, but the time each event has been enabled is known

13 Observation Model
- An observation o is a state s together with a real value u_e for each event, representing the time e has currently been enabled
- f(x|o) is well-defined

14 Observations: Example
Actual model: in (m1 up, m2 up), clocks m2 crashes: 2.5, m1 crashes: 3.1; after the transition to (m1 up, m2 down): m1 crashes: 0.6, reboot m2: 2.4.
Observed model: in (m1 up, m2 up), the enabled times u_e are 0.0 for both events; in (m1 up, m2 down), m1 crashes has been enabled for 2.5 time units and reboot m2 for 0.0.

15 Actions and Policies (GSMDPs)
- Identify a set A ⊆ E of controllable events (actions)
- A policy is a mapping from observations to sets of actions
- The action choice can change at any time while in a state s

16 Rewards and Discounts
- Lump-sum reward k(s,e,s′) associated with the transition from s to s′ caused by e
- Continuous reward rate r(s,a) associated with action a being enabled in s
- Discount factor α: a unit reward earned at time t counts as e^(−αt)
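A quick sketch of how these two reward kinds accrue under exponential discounting (helper names are mine): a lump sum k at time t is worth e^(−αt)·k, and a constant rate r over [0, t] integrates to r·(1 − e^(−αt))/α.

```python
import math

def discounted_lump_sum(k, t, alpha):
    # A lump-sum reward k earned at time t counts as e^(-alpha*t) * k.
    return math.exp(-alpha * t) * k

def discounted_rate_reward(r, t, alpha):
    # A constant reward rate r accrued over [0, t] is worth
    # integral_0^t r * e^(-alpha*u) du = r * (1 - e^(-alpha*t)) / alpha.
    return r * (1.0 - math.exp(-alpha * t)) / alpha
```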

17 Value Function for GSMDPs

18 GSMDP Solution Method [Younes & Simmons 2004]
GSMDP → (phase-type distributions) → Continuous-time MDP → (uniformization [Jensen 1953]) → Discrete-time MDP

19 Continuous Phase-Type Distributions [Neuts 1981]
Time to absorption in a continuous-time Markov chain with n transient states

20 Exponential Distribution
[Figure: a single transient state with exit rate λ to absorption.]

21 Two-Phase Coxian Distribution
[Figure: phase 1 has exit rate λ1; with probability p it leads to phase 2 (exit rate λ2), and with probability 1 − p directly to absorption.]

22 Generalized Erlang Distribution
[Figure: a series of n phases, each with rate λ; with probability p the process continues through the chain of phases, and with probability 1 − p it goes directly to absorption.]

23 Method of Moments
Approximate a general distribution G with a phase-type distribution PH by matching the first n moments

24 Moments of a Distribution
- The i-th moment: μ_i = E[X^i]
- Mean: μ_1
- Variance: σ² = μ_2 − μ_1²
- Coefficient of variation: cv = σ/μ_1
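These definitions translate directly into code (the helper names are illustrative):

```python
def cv2_from_moments(mu1, mu2):
    # Squared coefficient of variation: cv^2 = sigma^2 / mu1^2,
    # with variance sigma^2 = mu2 - mu1^2.
    var = mu2 - mu1 ** 2
    return var / mu1 ** 2

def empirical_moments(samples, k=2):
    # Raw moments mu_i = E[X^i], i = 1..k, estimated from data.
    n = len(samples)
    return [sum(x ** i for x in samples) / n for i in range(1, k + 1)]
```

For an exponential distribution, μ_1 = 1/λ and μ_2 = 2/λ², so cv² is always 1, which is why the exponential sits at a single point on the cv² axis in the slides that follow.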

25 Matching One Moment
Exponential distribution with rate λ = 1/μ_1

26 Matching Two Moments
[Figure: the cv² axis, marked at 0 and 1; the exponential distribution sits at cv² = 1.]

27 Matching Two Moments
[Figure: the cv² axis; the exponential distribution at cv² = 1, with the generalized Erlang distribution covering 0 < cv² < 1.]

28 Matching Two Moments
[Figure: the cv² axis; the generalized Erlang distribution covers 0 < cv² < 1, the exponential sits at cv² = 1, and the two-phase Coxian distribution covers cv² > 1.]
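A sketch of a two-moment matcher following the cv² regions above. The two-phase Coxian parameters used here are a standard two-moment fit (valid for cv² ≥ 1/2) and may differ from the construction used in the talk; the Erlang branch shown is only exact when 1/cv² is an integer:

```python
def match_two_moments(mu1, cv2):
    """Pick a phase-type fit by cv^2 (an illustrative sketch).

    cv2 == 1 : exponential with rate 1/mu1
    cv2 >  1 : two-phase Coxian with lam1 = 2/mu1, branch probability
               p = 1/(2*cv2) into a second phase of rate 1/(mu1*cv2)
    cv2 <  1 : plain Erlang with n = round(1/cv2) phases of rate n/mu1
               (exact only when 1/cv2 is an integer)
    """
    if cv2 == 1.0:
        return ('exp', 1.0 / mu1)
    if cv2 > 1.0:
        lam1 = 2.0 / mu1
        lam2 = 1.0 / (mu1 * cv2)
        p = 1.0 / (2.0 * cv2)
        return ('coxian2', lam1, lam2, p)
    n = max(2, round(1.0 / cv2))
    return ('erlang', n, n / mu1)
```

For μ_1 = 2 and cv² = 5 (the Weibull example on slide 29) the Coxian branch yields λ1 = 1, λ2 = 1/10, p = 1/10, which reproduces both target moments exactly.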

29 Matching Moments: Example 1
Weibull distribution W(1,1/2): μ_1 = 2, cv² = 5. Matched by a two-phase Coxian with λ1 = 1 and branch probability 1/10 into a second phase of rate 1/10 (probability 9/10 of absorbing directly). [Figure: the CDF F(t) of W(1,1/2) and of the matching phase-type distribution for t = 1…8.]

30 Matching Moments: Example 2
Uniform distribution U(0,1): μ_1 = 1/2, cv² = 1/3. Matched by a three-phase Erlang distribution with rate 6 in each phase. [Figure: the CDF F(t) of U(0,1) and of the matching phase-type distribution for t = 0…2.]
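The moment values quoted on these two example slides can be checked in closed form. The helper names are mine, and the Erlang(3, λ = 6) reading of the second figure is inferred from the slide; the arithmetic below confirms it matches both moments of U(0,1) exactly:

```python
import math

def weibull_moments(scale, shape):
    # Raw moments of a Weibull(scale, shape): mu_i = scale^i * Gamma(1 + i/shape).
    return [scale ** i * math.gamma(1.0 + i / shape) for i in (1, 2)]

def uniform_moments(a, b):
    # Raw moments of U(a, b): mu_i = (b^(i+1) - a^(i+1)) / ((i+1)(b-a)).
    return [(b ** (i + 1) - a ** (i + 1)) / ((i + 1) * (b - a)) for i in (1, 2)]

def erlang_moments(n, lam):
    # Raw moments of an n-phase Erlang with per-phase rate lam.
    return [n / lam, n * (n + 1) / lam ** 2]

def cv2(mu1, mu2):
    return (mu2 - mu1 ** 2) / mu1 ** 2
```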

31 Matching More Moments
- Closed-form solution for matching three moments of positive distributions [Osogami & Harchol-Balter 2003]
- Combination of an Erlang distribution and a two-phase Coxian distribution

32 Approximating a GSMDP with a Continuous-Time MDP
- Each event with a non-exponential distribution is approximated by a set of events with exponential distributions
- The phases become part of the state description

33 Policy Execution
- Phases represent a discretization, into random-length intervals, of the time events have been enabled
- Phases are not part of the real model
- Simulate phase transitions during execution

34 The Foreman's Dilemma
When should the "Service" action be enabled in the "Working" state?
[Figure: three states. Working (reward rate c = 1), Serviced (c = 0.5), Failed (c = 0). Service, with delay Exp(10), takes Working to Serviced; Return, with delay Exp(1), takes Serviced back to Working; Fail, with delay G, takes Working to Failed; Replace, with delay Exp(1/100), takes Failed back to Working.]

35 The Foreman's Dilemma: Optimal Solution
Find the t_0 that maximizes v_0, where Y is the time to failure in the "Working" state

36 The Foreman's Dilemma: SMDP Solution
Same formulas, but with restricted choice:
- the action is immediately enabled (t_0 = 0), or
- the action is never enabled (t_0 = ∞)

37 The Foreman's Dilemma: Performance
[Figure: percent of optimal (50–100) as a function of x, for failure-time distribution U(5,x), x = 5…50.]

38 The Foreman's Dilemma: Performance
[Figure: percent of optimal (50–100) as a function of x, for failure-time distribution W(1.6x,4.5), x = 0…40.]

39 System Administration
- Network of n machines
- Reward rate c(s) = k in states where k machines are up
- One crash event and one reboot action per machine
- At most one action enabled at any time

40 System Administration: Performance
[Figure: reward (15–50) as a function of the number of machines, n = 1…13, with reboot-time distribution U(0,1).]

41 System Administration: Performance

                1 moment               2 moments                3 moments
  size   states   q  time (s)   states    q  time (s)   states     q  time (s)
     4       16   5      0.36       32    9      3.57      112  21.0     10.30
     5       32   6      0.82       80   10      7.72      272  22.0     22.33
     6       64   7      1.89      192   11     16.24      640  23.0     40.98
     7      128   8      3.65      448   12     28.04     1472  24.0     69.06
     8      256   9      6.98     1024   13     48.11     3328  25.0    114.63
     9      512  10     16.04     2304   14     80.27     7424  26.0    176.93
    10     1024  11     33.58     5120   15    136.41    16384  27.0    291.70
    11     2048  12     66.00    24576   16    264.17    35840  28.0    481.10
    12     4096  13    111.96    53248   17    646.97    77824  29.0   1051.33
    13     8192  14    210.03   114688   18   2588.95   167936  30.0   3238.16
     n      2^n                (n+1)2^n               (1.5n+1)2^n

42 The Role of Phases
- Foreman's dilemma: phases permit a delay in the enabling of actions
- System administration: phases allow us to take into account the time an action has already been enabled

43 Summary
- Generalized semi-Markov (decision) processes allow asynchronous events
- Phase-type distributions can be used to approximate a GSMDP with an MDP
- This allows us to approximately solve GSMDPs using existing MDP techniques
- Phase does matter!

44 Future Work
- Discrete phase-type distributions: handle deterministic distributions and avoid the uniformization step
- Value function approximation: take advantage of GSMDP structure
- Other optimization criteria: finite horizon, etc.

45 Tempastic-DTP
A tool for GSMDP planning: http://www.cs.cmu.edu/~lorens/tempastic-dtp.html

