1 Decision-Theoretic Planning with Asynchronous Events Håkan L. S. Younes, Carnegie Mellon University
2 Introduction Asynchronous processes are abundant in the real world. Discrete-time models are inappropriate for systems with asynchronous events. Generalized semi-Markov (decision) processes are a natural fit for such systems!
3 Stochastic Processes with Asynchronous Events [Timeline for two machines m1 and m2: at t = 0 both m1 and m2 are up.]
4 Stochastic Processes with Asynchronous Events [At t = 2.5 the event "m2 crashes" triggers: m1 up, m2 down.]
5 Stochastic Processes with Asynchronous Events [At t = 3.1 the event "m1 crashes" triggers: m1 down, m2 down.]
6 Stochastic Processes with Asynchronous Events [At t = 4.9 the event "m2 rebooted" triggers: m1 down, m2 up.]
7 A Model of Stochastic Discrete Event Systems Generalized semi-Markov process (GSMP) [Matthes 1962]: a set of events E and a set of states S.
8 Events In a state s, a set of events E_s ⊆ E is enabled. With each event e is associated: a distribution G_e governing the time e must remain enabled before it triggers, and a next-state probability distribution p_e(s′|s).
9 Semantics of GSMP Model Associate a real-valued clock t_e with each event e. For each e ∈ E_s, sample t_e from G_e. Let e* = argmin over e ∈ E_s of t_e, and t* = min over e ∈ E_s of t_e. Sample s′ from p_e*(s′|s). For each e ∈ E_s′: t_e′ = t_e − t* if e ∈ E_s \ {e*}; otherwise sample t_e′ from G_e. Repeat with s = s′ and t_e = t_e′.
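The loop above can be sketched in Python. This is a minimal illustration using a toy two-machine model in which each up machine has an exponentially distributed crash event; the deterministic next-state function is a stand-in for the general distribution p_e(s′|s):

```python
import random

def crash(i):
    """Hypothetical event: machine i crashes. An event is a tuple
    (name, enabled?, sampler for G_e, next-state function)."""
    return ("crash%d" % i,
            lambda s: s[i] == "up",                      # enabled while up
            lambda: random.expovariate(1.0),             # G_e = Exp(1)
            lambda s: tuple("down" if j == i else s[j]   # deterministic p_e
                            for j in range(len(s))))

def simulate_gsmp(events, s, steps):
    """GSMP semantics: sample clocks for enabled events, trigger the event
    with the smallest clock, decrement surviving clocks by t*, and resample
    clocks only for newly enabled events."""
    clocks = {e: e[2]() for e in events if e[1](s)}      # t_e ~ G_e
    t = 0.0
    for _ in range(steps):
        if not clocks:
            break
        e_star = min(clocks, key=clocks.get)             # e* = argmin t_e
        t_star = clocks[e_star]                          # t* = min t_e
        t += t_star
        s = e_star[3](s)                                 # s' from p_e*
        new_clocks = {}
        for e in events:
            if e[1](s):
                if e in clocks and e is not e_star:
                    new_clocks[e] = clocks[e] - t_star   # not rescheduled
                else:
                    new_clocks[e] = e[2]()               # newly enabled
        clocks = new_clocks
    return s, t

random.seed(0)
state, t = simulate_gsmp([crash(0), crash(1)], ("up", "up"), steps=10)
print(state, t)  # both machines end up down
```

Note how surviving clocks are merely decremented, never resampled: that is exactly the asynchrony discussed on the next slide.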
10 Semantics: Example In state (m1 up, m2 up) the clocks are: m2 crashes: 2.5, m1 crashes: 3.1. The event "m2 crashes" triggers first, leading to state (m1 up, m2 down) with clocks m1 crashes: 0.6 (= 3.1 − 2.5) and reboot m2: 2.4 (freshly sampled).
11 Notes on Semantics Events that remain enabled across state transitions without triggering are not rescheduled: asynchronous events! The model differs from a semi-Markov process in this respect. It reduces to a continuous-time Markov chain if all G_e are exponential distributions.
12 General State-Space Markov Chain (GSSMC) The model is Markovian if we include the clocks in the state space. Extended state space X; the next-state distribution f(x′|x) is well-defined. Clock values are not known to an observer, but the time events have been enabled is known.
13 Observation Model An observation o is a state s together with a real value u_e for each event, representing the time e has currently been enabled. f(x|o) is well-defined.
14 Observations: Example Actual model: in (m1 up, m2 up) the clocks are m2 crashes: 2.5, m1 crashes: 3.1; after m2 crashes, m1 crashes: 0.6 and reboot m2: 2.4. Observed model: in (m1 up, m2 up) both events have been enabled for 0.0 time units; after m2 crashes, m1 crashes has been enabled for 2.5 and reboot m2 for 0.0.
15 Actions and Policies (GSMDPs) Identify a set A ⊆ E of controllable events (actions). A policy is a mapping from observations to sets of actions. The action choice can change at any time in a state s.
16 Rewards and Discounts Lump-sum reward k(s,e,s′) associated with the transition from s to s′ caused by e. Continuous reward rate r(s,a) associated with a being enabled in s. Discount factor γ: a unit reward earned at time t counts as e^(−γt).
17 Value Function for GSMDPs
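The value function itself appeared as an equation image on the original slide and did not survive extraction. Based on the reward model of the previous slide (lump-sum rewards k, reward rates r, discount factor γ), the expected discounted reward of a policy can be written along these lines (a plausible reconstruction from the stated reward model, not the original formula):

```latex
v^{\pi}(s) \;=\; \mathrm{E}\!\left[\,\sum_{i=0}^{\infty} e^{-\gamma T_i}\, k(S_i, e_i, S_{i+1})
  \;+\; \int_{0}^{\infty} e^{-\gamma t}\, r(S_t, A_t)\, dt \right]
```

where T_i is the time of the i-th transition, S_i the i-th state, e_i the triggering event, and A_t the action set enabled at time t.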
18 GSMDP Solution Method [Younes & Simmons 2004] GSMDP → continuous-time MDP (via phase-type distributions) → discrete-time MDP (via uniformization [Jensen 1953]).
19 Continuous Phase-Type Distributions [Neuts 1981] Time to absorption in a continuous-time Markov chain with n transient states
20 Exponential Distribution [Phase diagram: a single phase (rate λ) leading directly to absorption.]
21 Two-Phase Coxian Distribution [Phase diagram: from phase 1 (rate λ1), with probability p continue to phase 2 (rate λ2), with probability 1 − p go directly to absorption.]
22 Generalized Erlang Distribution [Phase diagram: a chain of n identical exponential phases n − 1, …, 1, 0; with probability p the remaining chain is entered, with probability 1 − p absorption is immediate.]
23 Method of Moments Approximate a general distribution G with a phase-type distribution PH by matching the first n moments.
24 Moments of a Distribution The i-th moment: μ_i = E[X^i]. Mean: μ1. Variance: σ² = μ2 − μ1². Coefficient of variation: cv = σ/μ1.
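A quick numeric check of these definitions; the exponential distribution with rate λ = 1/2 is an arbitrary illustrative choice (its moments are μ_i = i!/λ^i):

```python
# Moments of Exp(lam) with lam = 0.5: mu_i = i! / lam**i
lam = 0.5
mu1 = 1.0 / lam            # mean mu_1
mu2 = 2.0 / lam**2         # second moment mu_2
var = mu2 - mu1**2         # variance sigma^2 = mu_2 - mu_1^2
cv2 = var / mu1**2         # squared coefficient of variation (sigma/mu_1)^2
print(mu1, var, cv2)       # 2.0 4.0 1.0 -- cv = 1 for any exponential
```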
25 Matching One Moment Exponential distribution: λ = 1/μ1.
26 Matching Two Moments On the cv² axis, the exponential distribution corresponds to the single point cv² = 1.
27 Matching Two Moments The generalized Erlang distribution covers the range 0 < cv² ≤ 1.
28 Matching Two Moments The two-phase Coxian distribution covers cv² ≥ 1, so together the three families span the whole cv² axis.
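A closed-form two-moment fit can be sketched as follows. The Coxian parameterization used here (λ1 = 2/μ1, p = 1/(2cv²), λ2 = 1/(μ1·cv²)) is a standard textbook choice and an assumption, not necessarily the one used in the talk; the generalized Erlang branch for cv² < 1 is omitted:

```python
def fit_two_moments(mu1, cv2):
    """Match mean mu1 and squared coefficient of variation cv2 with a
    phase-type distribution (handles cv2 >= 1 only)."""
    if cv2 == 1.0:
        return ("exponential", {"rate": 1.0 / mu1})
    if cv2 > 1.0:
        # Two-phase Coxian: phase 1 at rate l1; with probability p move
        # to phase 2 at rate l2, otherwise absorb immediately.
        return ("coxian2", {"l1": 2.0 / mu1,
                            "p": 1.0 / (2.0 * cv2),
                            "l2": 1.0 / (mu1 * cv2)})
    raise NotImplementedError("cv2 < 1 needs a generalized Erlang fit")

# Check against the Weibull example W(1,1/2): mu1 = 2, cv2 = 5.
kind, prm = fit_two_moments(2.0, 5.0)
l1, p, l2 = prm["l1"], prm["p"], prm["l2"]
mean = 1.0/l1 + p/l2                                  # E[X] of the Coxian
mu2 = 2.0/l1**2 + 2.0*p/(l1*l2) + 2.0*p/l2**2         # E[X^2]
cv2_fit = mu2/mean**2 - 1.0
print(kind, mean, cv2_fit)  # the fit reproduces mu1 = 2 and cv2 = 5
```

The final three lines recover the Coxian's moments analytically, confirming that the fitted parameters really match the target mean and cv².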
29 Matching Moments: Example 1 Weibull distribution W(1,1/2): μ1 = 2, cv² = 5. [Diagram of the matching phase-type distribution (parameters 1/10, 9/10, 1/10) and a plot of the CDF F(t) of the fit against W(1,1/2) for t from 1 to 8.]
30 Matching Moments: Example 2 Uniform distribution U(0,1): μ1 = 1/2, cv² = 1/3. [Diagram of the matching phase-type distribution and a plot of the CDF F(t) of the fit against U(0,1) for t from 0 to 2.]
31 Matching More Moments Closed-form solution for matching three moments of positive distributions [Osogami & Harchol-Balter 2003]: a combination of an Erlang distribution and a two-phase Coxian distribution.
32 Approximating GSMDP with Continuous-time MDP Each event with a non-exponential distribution is approximated by a set of events with exponential distributions. Phases become part of the state description.
33 Policy Execution Phases represent a discretization of the time events have been enabled into random-length intervals. Phases are not part of the real model; phase transitions are simulated during execution.
34 The Foreman's Dilemma When should the "Service" action be enabled in the "Working" state? States and reward rates: Working (c = 1), Failed (c = 0), Serviced (c = 0.5). Events: Service ~ Exp(10), Return ~ Exp(1), Fail ~ G, Replace ~ Exp(1/100).
35 The Foreman's Dilemma: Optimal Solution Find the t0 that maximizes v0, where Y is the time to failure in the "Working" state.
36 The Foreman's Dilemma: SMDP Solution Same formulas, but a restricted choice: the action is either immediately enabled (t0 = 0) or never enabled (t0 = ∞).
37 The Foreman's Dilemma: Performance Failure-time distribution: U(5,x). [Plot: percent of optimal (50–100%) against x (5–50).]
38 The Foreman's Dilemma: Performance Failure-time distribution: W(1.6x,4.5). [Plot: percent of optimal (50–100%) against x (0–40).]
39 System Administration Network of n machines. Reward rate c(s) = k in states where k machines are up. One crash event and one reboot action per machine. At most one action enabled at any time.
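Under this model, accumulated reward can be estimated by simulation. A sketch with assumed details: crash times ~ Exp(1) (the slide does not state the crash-time distribution), reboot times ~ U(0,1) as in the experiments that follow, and the simple policy of always rebooting the first down machine:

```python
import random

def simulate_sysadmin(n, horizon, seed=0):
    """Estimate accumulated reward for n machines: reward rate k while k
    machines are up. Crash times ~ Exp(1) (assumed), reboot ~ U(0,1);
    at most one reboot runs at a time, aimed at the first down machine."""
    rng = random.Random(seed)
    up = [True] * n
    crash = {i: rng.expovariate(1.0) for i in range(n)}  # clocks of up machines
    reboot = None                                        # (machine, clock)
    t = reward = 0.0
    while t < horizon:
        pending = dict(crash)
        if reboot is not None:
            pending[("reboot", reboot[0])] = reboot[1]
        if not pending:
            break
        key = min(pending, key=pending.get)
        dt = min(pending[key], horizon - t)
        reward += dt * sum(up)                  # continuous reward c(s) = k
        t += dt
        if t >= horizon:
            break
        for i in crash:                         # surviving clocks tick down
            crash[i] -= dt
        if reboot is not None:
            reboot = (reboot[0], reboot[1] - dt)
        if isinstance(key, tuple):              # reboot completes
            m = key[1]
            up[m] = True
            crash[m] = rng.expovariate(1.0)
            reboot = None
        else:                                   # machine `key` crashes
            up[key] = False
            del crash[key]
        if reboot is None:                      # start rebooting a down machine
            down = [i for i in range(n) if not up[i]]
            if down:
                reboot = (down[0], rng.uniform(0.0, 1.0))
    return reward

r = simulate_sysadmin(n=4, horizon=100.0)
print(r)  # total reward over the horizon (at most n * horizon = 400)
```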
40 System Administration: Performance Reboot-time distribution: U(0,1). [Plot: reward (15–50) against the number of machines n (1–13).]
41 System Administration: Performance

   n |  1 moment: states,  q, time (s) |  2 moments: states,  q, time (s) |  3 moments: states,    q, time (s)
   4 |                16,  5,    0.36  |                 32,  9,    3.57  |               112, 21.0,    10.30
   5 |                32,  6,    0.82  |                 80, 10,    7.72  |               272, 22.0,    22.33
   6 |                64,  7,    1.89  |                192, 11,   16.24  |               640, 23.0,    40.98
   7 |               128,  8,    3.65  |                448, 12,   28.04  |              1472, 24.0,    69.06
   8 |               256,  9,    6.98  |               1024, 13,   48.11  |              3328, 25.0,   114.63
   9 |               512, 10,   16.04  |               2304, 14,   80.27  |              7424, 26.0,   176.93
  10 |              1024, 11,   33.58  |               5120, 15,  136.41  |             16384, 27.0,   291.70
  11 |              2048, 12,   66.00  |              24576, 16,  264.17  |             35840, 28.0,   481.10
  12 |              4096, 13,  111.96  |              53248, 17,  646.97  |             77824, 29.0,  1051.33
  13 |              8192, 14,  210.03  |             114688, 18, 2588.95  |            167936, 30.0,  3238.16

State-space sizes: 2^n (1 moment), (n+1)2^n (2 moments), (1.5n+1)2^n (3 moments).
42 The Role of Phases Foreman's dilemma: phases permit a delay in the enabling of actions. System administration: phases allow us to take into account the time an action has already been enabled.
43 Summary Generalized semi-Markov (decision) processes allow asynchronous events. Phase-type distributions can be used to approximate a GSMDP with an MDP, which lets us approximately solve GSMDPs using existing MDP techniques. Phase does matter!
44 Future Work Discrete phase-type distributions (handle deterministic distributions, avoid the uniformization step). Value function approximation (take advantage of GSMDP structure). Other optimization criteria (finite horizon, etc.).
45 Tempastic-DTP A tool for GSMDP planning: http://www.cs.cmu.edu/~lorens/tempastic-dtp.html