Continuous Time Markov Chains

Chapter 8 Continuous Time Markov Chains

Definition: A discrete-state continuous-time stochastic process {X(t), t ≥ 0} is called a Markov chain if, for t0 < t1 < t2 < … < tn < t, the conditional pmf satisfies the relation
$$P[X(t) = x \mid X(t_n) = x_n, X(t_{n-1}) = x_{n-1}, \ldots, X(t_0) = x_0] = P[X(t) = x \mid X(t_n) = x_n].$$
A CTMC is characterized by state changes that can occur at any arbitrary time. The index space is continuous; the state space is discrete valued.

Continuous Time Markov Chain (CTMC)
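A CTMC can be completely described by:
Initial state probability vector for X(t0): $\pi_k(t_0) = P[X(t_0) = k]$, k = 0, 1, 2, …
Transition probabilities: $p_{ij}(v,t) = P[X(t) = j \mid X(v) = i]$, 0 ≤ v ≤ t.
Also, by the theorem of total probability, $\pi_j(t) = P[X(t) = j] = \sum_i p_{ij}(v,t)\,\pi_i(v)$.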

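Homogeneous CTMCs
{X(t), t ≥ 0} is a time-homogeneous CTMC iff the transition probabilities p_ij(v, t) depend only on the time difference t − v. Or, the conditional pmf satisfies: $p_{ij}(t) = P[X(t+u) = j \mid X(u) = i]$ for any u ≥ 0.
A CTMC is said to be irreducible if every state can be reached from every other state with non-zero probability.
A state is said to be absorbing if no other state can be reached from it with non-zero probability.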

CTMC Chapman-Kolmogorov Equation
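The Chapman-Kolmogorov equation is $p_{ij}(v,t) = \sum_k p_{ik}(v,u)\,p_{kj}(u,t)$ for v ≤ u ≤ t. For a homogeneous CTMC it can also be written in differential form as $\frac{d\pi_j(t)}{dt} = \sum_i \pi_i(t)\,q_{ij}$, where q_ij (i ≠ j) is the transition rate from state i to state j and $q_{jj} = -\sum_{i \ne j} q_{ji}$.
In matrix form, $\frac{d\pi(t)}{dt} = \pi(t)\,Q$. The matrix Q = [q_ij] is called the infinitesimal generator matrix (or simply the generator matrix).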

CTMC Steady-state Solution
Steady-state solution of a CTMC: the steady-state probability vector π = [π0, π1, …] satisfies $\pi Q = 0$ together with $\sum_j \pi_j = 1$.
Irreducible CTMCs having positive steady-state values {πj} are called recurrent non-null.
Performance measures may be computed by assigning reward rates r_j to states and computing the expected steady-state reward rate, $\sum_j r_j \pi_j$, or the accumulated reward over an interval of time.
Transient solutions are, in general, rather difficult to obtain.
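As an illustration of the steady-state equations, the following Python sketch solves πQ = 0 with the normalization condition for a small generator matrix and then evaluates an expected steady-state reward rate. The function names, the example generator (a two-state down/up chain with rates 3 and 1) and the reward vector are assumptions chosen for the example; they do not come from the slides.

```python
def ctmc_steady_state(Q):
    """Solve pi Q = 0, sum(pi) = 1 for an irreducible finite CTMC.

    Q is a square list of lists (generator matrix, rows sum to zero).
    """
    n = len(Q)
    # Transpose Q so that each row of A is one balance equation.
    A = [[Q[i][j] for i in range(n)] for j in range(n)]
    b = [0.0] * n
    # The balance equations are linearly dependent, so replace the last
    # one with the normalization condition sum(pi) = 1.
    A[n - 1] = [1.0] * n
    b[n - 1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - s) / A[r][r]
    return pi


def expected_reward_rate(pi, rewards):
    """Expected steady-state reward rate: sum_j r_j * pi_j."""
    return sum(p * r for p, r in zip(pi, rewards))


# Hypothetical example: state 0 = down (repair rate 3), state 1 = up (failure rate 1).
Q = [[-3.0, 3.0],
     [1.0, -1.0]]
pi = ctmc_steady_state(Q)
# Reward 1 in the up state and 0 in the down state gives the availability.
availability = expected_reward_rate(pi, [0.0, 1.0])
```

With reward rates of 1 for up states and 0 for down states, the expected reward rate is the steady-state availability; other reward vectors give other measures from the same π.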

Continuous Time Birth-Death Process
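The CTMC {X(t), t ≥ 0} with state space {0, 1, 2, …} forms a birth-death (B-D) process if there exist constants λi, i = 0, 1, 2, …, and μi, i = 1, 2, …, such that the transition rates are
$$q_{i,i+1} = \lambda_i,\quad q_{i,i-1} = \mu_i,\quad q_{i,i} = -(\lambda_i + \mu_i),\quad q_{i,j} = 0 \text{ for } |i-j| > 1.$$
λi is the birth rate (≥ 0) and μi is the death rate (≥ 0). Note that these values of q_{i,j} imply that the holding time in each state is exponentially distributed.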

Continuous Time Birth-Death Process (contd.)
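In steady state, dπ_k(t)/dt = 0 for every k, so
$$0 = -(\lambda_k + \mu_k)\pi_k + \lambda_{k-1}\pi_{k-1} + \mu_{k+1}\pi_{k+1},\ k \ge 1; \qquad 0 = -\lambda_0\pi_0 + \mu_1\pi_1.$$

Steady State Equations
These are called the balance equations. Rearranging them gives
$$\lambda_k\pi_k - \mu_{k+1}\pi_{k+1} = \lambda_{k-1}\pi_{k-1} - \mu_k\pi_k = \cdots = \lambda_0\pi_0 - \mu_1\pi_1 = 0,$$
so that $\pi_k = \frac{\lambda_{k-1}}{\mu_k}\pi_{k-1} = \pi_0 \prod_{i=0}^{k-1} \frac{\lambda_i}{\mu_{i+1}}$, with π0 fixed by the normalization $\sum_k \pi_k = 1$.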
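The product-form solution of the balance equations can be sketched in a few lines of Python. This is a minimal illustration for a finite (truncated) birth-death chain; the function name and the example rates are assumptions made for the example.

```python
def birth_death_steady_state(birth_rates, death_rates):
    """Steady-state probabilities pi_0..pi_K of a finite birth-death chain.

    birth_rates[k] is lambda_k     (rate from state k to k+1), k = 0..K-1
    death_rates[k] is mu_{k+1}     (rate from state k+1 to k), k = 0..K-1
    """
    if len(birth_rates) != len(death_rates):
        raise ValueError("need one death rate per birth rate")
    # Unnormalized weights: w_0 = 1, w_{k+1} = w_k * lambda_k / mu_{k+1}.
    weights = [1.0]
    for lam, mu in zip(birth_rates, death_rates):
        weights.append(weights[-1] * lam / mu)
    total = sum(weights)  # normalization constant, equal to 1 / pi_0
    return [w / total for w in weights]


# Hypothetical example: 4 states, constant birth rate 1 and death rate 2.
bd_pi = birth_death_steady_state([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
```

Each adjacent pair of states satisfies λk πk = μk+1 πk+1, which is the rearranged balance equation.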

M/M/1 Queue
Arrivals follow a Poisson process with rate λ, i.e., inter-arrival times are i.i.d. EXP(λ). Service times are i.i.d. EXP(μ). N(t), the number of jobs in the system, is a birth-death process with λk = λ and μk = μ. Define ρ = λ/μ (traffic intensity, in Erlangs).

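M/M/1 queue (contd.)
From the balance flow equations, we get $\pi_k = \rho^k \pi_0$ and, by normalization, $\pi_0 = 1 - \rho$, so $\pi_k = (1-\rho)\rho^k$. This requires ρ < 1 (for reasons of stability). Expected number of customers: $E[N] = \sum_{k=0}^{\infty} k\,\pi_k = \frac{\rho}{1-\rho}$.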

M/M/1 queue (contd.)
This measure can be viewed as a weighted average, $E[N] = \sum_k r_k \pi_k$ with weights (reward rates) r_k = k. By choosing suitable weights for the states of a CTMC, we can get most measures of interest; the resulting model is known as a Markov Reward Model (MRM). Other measures: average queue length (E[N]), average (expected) response time, average (expected) wait time, etc.

M/M/1 queue: Little’s formula
Let the random variable R denote the response time (defined as the time elapsed from the instant of job arrival until its completion). Little's law states that E[R] = E[N]/λ, which here gives $E[R] = \frac{1}{\mu(1-\rho)}$.
Response time (R) = wait time (W) + service time (S), so $E[W] = E[R] - E[S] = \frac{1}{\mu(1-\rho)} - \frac{1}{\mu} = \frac{\rho}{\mu(1-\rho)}$.
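The M/M/1 measures above can be computed directly from λ and μ. The following Python sketch is one way to do it; the function name, the dictionary keys and the example rates (λ = 8, μ = 10) are assumptions made for illustration.

```python
def mm1_measures(lam, mu):
    """Steady-state measures of an M/M/1 queue with arrival rate lam, service rate mu."""
    if not 0 < lam < mu:
        raise ValueError("stability requires 0 < lam < mu (rho < 1)")
    rho = lam / mu                       # traffic intensity
    mean_jobs = rho / (1.0 - rho)        # E[N]
    mean_response = mean_jobs / lam      # E[R], by Little's law
    mean_service = 1.0 / mu              # E[S]
    mean_wait = mean_response - mean_service  # E[W] = E[R] - E[S]
    return {
        "rho": rho,
        "mean_jobs": mean_jobs,
        "mean_response": mean_response,
        "mean_wait": mean_wait,
    }


# Hypothetical example: 8 arrivals and 10 service completions per unit time.
mm1 = mm1_measures(8.0, 10.0)
```

For these rates the utilization is 0.8, and the computed values agree with the closed forms 1/(μ(1−ρ)) for E[R] and ρ/(μ(1−ρ)) for E[W].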

Response time distribution (tagged job approach)
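Assume FCFS scheduling and steady-state conditions. If there are already N jobs in the system, the next, (N+1)st, job will experience a response time R = S* + S'1 + S2 + … + SN, where S* is the service time of the (N+1)st job, S'1 is the residual service time of the job currently undergoing service (job 1), and S2, …, SN are the service times of the waiting jobs. Because of the memoryless property, all of these times are EXP(μ). Hence, for N = n, the LST of R is
$$L_{R|N}(s \mid n) = \left(\frac{\mu}{s+\mu}\right)^{n+1}.$$
Therefore, unconditioning with $\pi_n = (1-\rho)\rho^n$,
$$L_R(s) = \sum_{n=0}^{\infty} \left(\frac{\mu}{s+\mu}\right)^{n+1}(1-\rho)\rho^n = \frac{\mu(1-\rho)}{s + \mu(1-\rho)},$$
i.e., R is exponentially distributed with parameter μ(1 − ρ).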

M/M/m queue
m servers service a single queue. Arrivals are Poisson with rate λ; each server has service rate μ.

M/M/m Queue Solution

M/M/m Queue performance measures
Average queue length E[N]: assign reward rate r_k = k, so that $E[N] = \sum_k k\,\pi_k$.

M/M/m Queue performance measures
Server utilization: let the random variable M denote the number of busy servers. If the number of customers is k = 0, 1, …, m−1, then only k servers are busy. If k ≥ m, then all m servers are busy and an arriving customer has to join the queue. Hence $E[M] = \sum_{k=0}^{m-1} k\,\pi_k + m \sum_{k \ge m} \pi_k$, and the utilization of each server is E[M]/m.
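The M/M/m measures can be sketched numerically as follows. The steady-state solution used here is the standard birth-death result with λk = λ and μk = min(k, m)·μ; since the solution slide itself did not survive in the transcript, treat that formula, the function name and the example rates as assumptions of this sketch.

```python
from math import factorial


def mmm_measures(lam, mu, m):
    """Steady-state measures of an M/M/m queue (standard birth-death solution)."""
    a = lam / mu            # offered load
    rho = a / m             # per-server traffic intensity
    if not 0 < rho < 1:
        raise ValueError("stability requires 0 < lam < m * mu")
    # Normalization: 1/pi_0 = sum_{k<m} a^k/k! + a^m / (m! (1 - rho)).
    tail = a ** m / (factorial(m) * (1.0 - rho))
    pi0 = 1.0 / (sum(a ** k / factorial(k) for k in range(m)) + tail)
    # Probability that all m servers are busy, P[N >= m].
    p_all_busy = tail * pi0
    # E[M]: k busy servers in states k < m, m busy servers in states k >= m.
    mean_busy = sum(k * pi0 * a ** k / factorial(k) for k in range(m)) + m * p_all_busy
    # E[N]: busy servers plus the mean number waiting in the queue.
    mean_jobs = mean_busy + p_all_busy * rho / (1.0 - rho)
    return {
        "pi0": pi0,
        "p_all_busy": p_all_busy,
        "mean_busy": mean_busy,
        "utilization": mean_busy / m,
        "mean_jobs": mean_jobs,
    }


# Hypothetical example: two servers, lam = 1, mu = 1.
mm2 = mmm_measures(1.0, 1.0, 2)
```

The computed E[M] equals λ/μ, so the per-server utilization equals λ/(mμ), and with m = 1 the function reduces to the M/M/1 results.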

Poisson stream behavior
M/M/m: input and output both form Poisson streams. Consider the m = 2 case:
Case 1: Two independent queues. Two separate Poisson streams feed two separate M/M/1 queues.
Case 2: M/M/2. Two separate Poisson streams are combined into one Poisson stream that feeds a common queue.

Comparative performance
Case 1: For each M/M/1 queue, Case 2: Common queue M/M/2
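The comparison can be made concrete with a short Python sketch. It assumes, since the transcript does not state the rates, that the total arrival rate λ is split evenly (λ/2 per queue) in Case 1 and that every server has rate μ; the function names and the example values are likewise assumptions.

```python
def response_two_separate_mm1(lam, mu):
    """Case 1: each M/M/1 queue receives a Poisson stream of rate lam / 2 (assumed split)."""
    per_queue_rate = lam / 2.0
    if not 0 < per_queue_rate < mu:
        raise ValueError("each queue needs lam / 2 < mu")
    return 1.0 / (mu - per_queue_rate)       # E[R] = 1 / (mu - arrival rate)


def response_shared_mm2(lam, mu):
    """Case 2: one M/M/2 queue receives the combined Poisson stream of rate lam."""
    rho = lam / (2.0 * mu)
    if not 0 < rho < 1:
        raise ValueError("stability requires lam < 2 * mu")
    # M/M/2: E[N] = 2 rho / (1 - rho^2); Little's law gives E[R] = E[N] / lam.
    return 1.0 / (mu * (1.0 - rho ** 2))


# Hypothetical example: total arrival rate 1, per-server service rate 1.
separate = response_two_separate_mm1(1.0, 1.0)
shared = response_shared_mm2(1.0, 1.0)
```

Under these assumptions the common queue always gives the smaller mean response time, because a job never waits while a server sits idle.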

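M/M/1/n Queue
Finite queue size (finite buffer space) implies a finite state space {0, 1, …, n}. Steady-state solution:
$$\pi_k = \frac{(1-\rho)\rho^k}{1-\rho^{n+1}},\ k = 0, 1, \ldots, n \quad (\rho \ne 1); \qquad \pi_k = \frac{1}{n+1} \quad (\rho = 1).$$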

M/M/1/n Queue Performance Measures
Mean queue length (expected number of jobs in the system): r_k = k.
Loss probability: r_n = 1; r_k = 0, k = 0, 1, …, n−1.
Throughput: r_k = μ, k = 1, 2, …, n; r_0 = 0 (or, r_k = λ, k = 0, 1, 2, …, n−1; r_n = 0).
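The three reward assignments can be evaluated with a few lines of Python. This is a minimal sketch; the function name and the example parameters (λ = μ = 1, n = 3) are assumptions made for illustration.

```python
def mm1n_measures(lam, mu, n):
    """Steady-state measures of an M/M/1/n queue (at most n jobs in the system)."""
    rho = lam / mu
    # Truncated geometric distribution; normalizing numerically also covers rho = 1.
    weights = [rho ** k for k in range(n + 1)]
    total = sum(weights)
    pi = [w / total for w in weights]
    mean_jobs = sum(k * p for k, p in enumerate(pi))   # reward r_k = k
    loss_probability = pi[n]                           # reward r_n = 1, else 0
    throughput = mu * (1.0 - pi[0])                    # reward r_k = mu for k >= 1
    return pi, mean_jobs, loss_probability, throughput


# Hypothetical example: lam = mu = 1 with room for 3 jobs.
mm1n_pi, mm1n_jobs, mm1n_loss, mm1n_tput = mm1n_measures(1.0, 1.0, 3)
```

Both throughput reward assignments give the same value, since μ(1 − π0) = λ(1 − πn): jobs leave at the rate at which accepted jobs arrive.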

M/M/1/n: Response time distribution
Response time distribution: an arriving job may be rejected or accepted, so the distribution can be stated unconditionally or conditionally (conditioned on the job being accepted).
Reward assignment: if the tagged job finds k jobs in the system on arrival and is accepted, its response time is the sum of k + 1 service times, each of which is EXP(μ), i.e., a (k+1)-stage Erlang random variable.

Special cases of Birth-Death Process
Pure birth processes:
Poisson process.
Software Reliability Growth Model (NHPP): the number of software failures occurring in (0, t] is N(t), and N(t) is Poisson with failure intensity $\lambda(t) = ab\,e^{-bt}$ and mean $m(t) = E[N(t)] = a(1 - e^{-bt})$. The instantaneous failure intensity can be written as λ(t) = b[a − m(t)].
The transient solution may be found using Laplace transforms.
Pure death processes:
No repairs.
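The NHPP mean value function and failure intensity are easy to evaluate numerically. The Python sketch below uses hypothetical parameter values (a = 100 expected total faults, b = 0.05 per unit time) chosen only for illustration.

```python
from math import exp


def mean_failures(a, b, t):
    """m(t) = a (1 - e^{-bt}): expected number of failures in (0, t]."""
    return a * (1.0 - exp(-b * t))


def failure_intensity(a, b, t):
    """lambda(t) = a b e^{-bt}: instantaneous failure intensity at time t."""
    return a * b * exp(-b * t)


# Hypothetical parameters: a = 100 expected faults in total, b = 0.05 per unit time.
a_param, b_param = 100.0, 0.05
expected_by_20 = mean_failures(a_param, b_param, 20.0)
intensity_at_20 = failure_intensity(a_param, b_param, 20.0)
```

The two functions satisfy λ(t) = b[a − m(t)], so the intensity is proportional to the expected number of faults still remaining.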

Markov Availability Model

2-State Markov Availability Model
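The model has two states, UP (state 1) and DN (state 0). Write λ for the failure rate (1 → 0) and μ for the repair rate (0 → 1).
1) Steady-state balance equations for each state: rate of flow IN = rate of flow OUT.
State 1: μπ0 = λπ1
State 0: λπ1 = μπ0
2 unknowns, 2 equations, but there is only one independent equation.

2-State Markov Availability Model (Continued)
Need an additional equation: π0 + π1 = 1. This gives $\pi_1 = \frac{\mu}{\lambda+\mu}$ and $\pi_0 = \frac{\lambda}{\lambda+\mu}$.
Downtime in minutes per year = π0 * 8760*60

2) Transient availability for each state: rate of buildup = rate of flow IN − rate of flow OUT, i.e., $\frac{dP_1(t)}{dt} = \mu P_0(t) - \lambda P_1(t)$ with P0(t) = 1 − P1(t). Assuming P1(0) = 1, this equation can be solved to obtain
$$A(t) = P_1(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\,e^{-(\lambda+\mu)t}.$$

3)
4) Steady-state availability: $A_{ss} = \lim_{t\to\infty} A(t) = \frac{\mu}{\lambda+\mu}$.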
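The two-state availability formulas can be checked numerically with the Python sketch below. It takes λ as the failure rate and μ as the repair rate (per hour); the function names and the example rates are assumptions made for illustration.

```python
from math import exp


def steady_state_availability(lam, mu):
    """A_ss = mu / (lam + mu) for the two-state up/down model."""
    return mu / (lam + mu)


def instantaneous_availability(lam, mu, t):
    """A(t) = P1(t), assuming the system is up at time 0 (P1(0) = 1)."""
    total = lam + mu
    return mu / total + (lam / total) * exp(-total * t)


def downtime_minutes_per_year(lam, mu):
    """Expected downtime: steady-state unavailability times 8760 * 60 minutes."""
    return (1.0 - steady_state_availability(lam, mu)) * 8760 * 60


# Hypothetical rates: one failure per 1000 hours, mean repair time 1 hour.
lam_hr, mu_hr = 0.001, 1.0
a_ss = steady_state_availability(lam_hr, mu_hr)
a_10h = instantaneous_availability(lam_hr, mu_hr, 10.0)
downtime = downtime_minutes_per_year(lam_hr, mu_hr)
```

A(t) starts at 1 and decays to the steady-state value at rate λ + μ, so with fast repair the transient phase is short.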

Using SHARPE to Solve the models

Markov availability model
Assume we have a two-component parallel redundant system with repair rate μ. Assume that the failure rate of both components is λ. When both components have failed, the system is considered to have failed.

Markov availability model (Continued)
Let the number of properly functioning components be the state of the system. The state space is {0,1,2} where 0 is the system down state. We wish to examine effects of shared vs. non-shared repair.

[Figure: state diagrams over states 2, 1, 0 for the two cases, non-shared (independent) repair and shared repair.]

Note: Non-shared case can be modeled & solved using a RBD or a FTREE but shared case needs the use of Markov chains.

Steady-state balance equations
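For any state: rate of flow in = rate of flow out. Consider the shared case, with πi the steady-state probability that the system is in state i, λ the failure rate of each component and μ the repair rate:
State 2: 2λπ2 = μπ1
State 1: (λ + μ)π1 = 2λπ2 + μπ0
State 0: μπ0 = λπ1

Steady-state balance equations (Continued)
Hence $\pi_1 = \frac{2\lambda}{\mu}\pi_2$ and $\pi_0 = \frac{\lambda}{\mu}\pi_1 = \frac{2\lambda^2}{\mu^2}\pi_2$. Since π0 + π1 + π2 = 1, we have
$$\pi_2\left(1 + \frac{2\lambda}{\mu} + \frac{2\lambda^2}{\mu^2}\right) = 1, \quad\text{or}\quad \pi_2 = \frac{1}{1 + \frac{2\lambda}{\mu} + \frac{2\lambda^2}{\mu^2}}.$$

Steady-state balance equations (Continued)
Steady-state unavailability = π0 = 1 − A_shared, where $\pi_0 = \frac{2\lambda^2}{\mu^2 + 2\lambda\mu + 2\lambda^2}$.
Similarly, for the non-shared case, steady-state unavailability = 1 − A_non-shared.
Downtime in minutes per year = (1 − A) * 8760*60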
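The effect of shared versus non-shared repair can be compared numerically. The Python sketch below assumes the usual transition rates for this model (2 → 1 at 2λ, 1 → 0 at λ, 1 → 2 at μ, and 0 → 1 at μ for shared repair or 2μ for non-shared repair); the state diagrams are not preserved in the transcript, so these rates, the function names and the example values are assumptions.

```python
def shared_repair_unavailability(lam, mu):
    """pi_0 with a single shared repair facility (0 -> 1 at rate mu)."""
    r = lam / mu
    # pi_1 = 2r pi_2, pi_0 = 2 r^2 pi_2, normalized.
    return 2.0 * r * r / (1.0 + 2.0 * r + 2.0 * r * r)


def nonshared_repair_unavailability(lam, mu):
    """pi_0 with one repair facility per component (0 -> 1 at rate 2 mu).

    The components are then independent, so the system is down with the
    product of the two single-component unavailabilities.
    """
    return (lam / (lam + mu)) ** 2


def downtime_minutes(unavailability):
    """Expected downtime in minutes per year."""
    return unavailability * 8760 * 60


# Hypothetical rates per hour: lam = 0.001, mu = 0.5.
u_shared = shared_repair_unavailability(0.001, 0.5)
u_nonshared = nonshared_repair_unavailability(0.001, 0.5)
```

For any positive rates the shared-repair unavailability is the larger of the two, and when λ is much smaller than μ it is close to twice the non-shared value.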

Steady-state balance equations

Homework: Return to the 2 control and 3 voice channels example and assume that the control channel failure rate is λc and the voice channel failure rate is λv. Repair rates are μc and μv, respectively. Assuming a single shared repair facility, with control channels having preemptive repair priority over voice channels, draw the state diagram of a Markov availability model. Using the SHARPE GUI, solve the Markov chain for steady-state and instantaneous availability.

Markov Reliability Model

Markov reliability model with repair
Consider the 2-component parallel system, but disallow repair from the system down state. Note that state 0 is now an absorbing state. The state diagram is given in the following figure. This reliability model with repair cannot be modeled using a reliability block diagram or a fault tree; we need to resort to Markov chains. (This is a form of dependency, since in order to repair a component you need to know the status of the other component.)

Markov reliability model with repair (Continued)
The Markov chain has an absorbing state (state 0). In the steady state, the system will be in state 0 with probability 1; hence transient analysis is of interest. States 1 and 2 are transient states.

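Assume that the initial state of the Markov chain is 2, that is, P2(0) = 1 and Pk(0) = 0 for k = 0, 1. Then the system of differential equations is written based on: rate of buildup = rate of flow in − rate of flow out, for each state:
$$\frac{dP_2(t)}{dt} = -2\lambda P_2(t) + \mu P_1(t),\qquad \frac{dP_1(t)}{dt} = 2\lambda P_2(t) - (\lambda+\mu)P_1(t),\qquad \frac{dP_0(t)}{dt} = \lambda P_1(t).$$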

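After solving these equations, we get the reliability R(t) = P2(t) + P1(t). Recalling that $MTTF = \int_0^\infty R(t)\,dt$, we get:
$$MTTF = \frac{3}{2\lambda} + \frac{\mu}{2\lambda^2}.$$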

Note that the MTTF of the two-component parallel redundant system, in the absence of a repair facility (i.e., μ = 0), would have been equal to the first term, 3/(2λ), in the above expression. Therefore, the effect of a repair facility is to increase the mean life by μ/(2λ²), i.e., a relative increase of μ/(3λ) over the no-repair value.
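The MTTF expression can be cross-checked without solving the differential equations, by conditioning on the first transition out of each transient state. The Python sketch below does this; the function names and the example rates are assumptions made for illustration.

```python
def mttf_closed_form(lam, mu):
    """MTTF = 3/(2 lam) + mu/(2 lam^2) for the parallel system with repair."""
    return 3.0 / (2.0 * lam) + mu / (2.0 * lam ** 2)


def mttf_first_step(lam, mu):
    """Mean time to absorption in state 0, starting from state 2.

    t2 = 1/(2 lam) + t1                      (state 2 always moves to state 1)
    t1 = 1/(lam + mu) + mu/(lam + mu) * t2   (state 1 moves to 2 or to 0)
    Substituting t1 into the first equation and solving for t2:
    """
    return (1.0 / (2.0 * lam) + 1.0 / (lam + mu)) / (1.0 - mu / (lam + mu))


# Hypothetical rates per hour: lam = 0.001, mu = 0.1.
mttf_repair = mttf_closed_form(0.001, 0.1)
mttf_no_repair = mttf_closed_form(0.001, 0.0)
relative_gain = (mttf_repair - mttf_no_repair) / mttf_no_repair
```

The relative gain equals μ/(3λ), so a repair rate that is large compared with the failure rate lengthens the mean life considerably.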

Markov Reliability Model with Imperfect Coverage

Markov model with imperfect coverage
Next consider a modification of the above example proposed by Arnold as a model of duplex processors of an electronic switching system. We assume that not all faults are recoverable and that c is the coverage factor which denotes the conditional probability that the system recovers given that a fault has occurred. The state diagram is now given by the following picture:

Now allow for Imperfect coverage

Markov model with imperfect coverage (Continued)
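Assume that the initial state is 2, so that P2(0) = 1 and P0(0) = P1(0) = 0. With a covered fault taking the system from state 2 to state 1 at rate 2λc and an uncovered fault taking it directly to state 0 at rate 2λ(1 − c), the system of differential equations is:
$$\frac{dP_2(t)}{dt} = -2\lambda P_2(t) + \mu P_1(t),\qquad \frac{dP_1(t)}{dt} = 2\lambda c\,P_2(t) - (\lambda+\mu)P_1(t),\qquad \frac{dP_0(t)}{dt} = 2\lambda(1-c)P_2(t) + \lambda P_1(t).$$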

Markov model with imperfect coverage (Continued)
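After solving the differential equations we obtain R(t) = P2(t) + P1(t). From R(t), we can obtain the system MTTF:
$$MTTF = \frac{\lambda(1+2c) + \mu}{2\lambda[\lambda + \mu(1-c)]}.$$
For c = 1 this reduces to the perfect-coverage value 3/(2λ) + μ/(2λ²). It should be clear that the system MTTF and system reliability are critically dependent on the coverage factor.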
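The sensitivity of the MTTF to the coverage factor can be sketched in Python. The closed form used below is derived for the model with rates 2λc (2 → 1), 2λ(1 − c) (2 → 0), λ (1 → 0) and μ (1 → 2); the state diagram itself is not preserved in the transcript, so these rates, the function name and the example values are assumptions.

```python
def mttf_with_coverage(lam, mu, c):
    """MTTF = [lam (1 + 2c) + mu] / (2 lam [lam + mu (1 - c)])."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("coverage factor c must lie in [0, 1]")
    return (lam * (1.0 + 2.0 * c) + mu) / (2.0 * lam * (lam + mu * (1.0 - c)))


# Hypothetical rates per hour: lam = 0.001, mu = 0.1.
coverage_values = [0.9, 0.95, 0.99, 1.0]
mttf_by_coverage = [mttf_with_coverage(0.001, 0.1, c) for c in coverage_values]
```

With c = 1 the value matches the perfect-coverage result, and with c = 0 it drops to 1/(2λ), the time to the first component failure. For the rates above, lowering c from 1 to 0.99 already cuts the MTTF by roughly half.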

2-component Availability model with detection delay
Without detection delay, the steady-state availability is Ass = 1 − π0. Here the failure detection stage takes a random time, EXP(δ), and the down states are '0' and '1D', so Ass = 1 − π0 − π1D. Therefore, the steady-state unavailability as a function of δ is U(δ) = π0 + π1D.

2-component availability model with finite coverage
Coverage factor = c (the probability that the fault is covered). The '1C' state is a reboot (down) state.

2-components availability model : delay+finite coverage
The model has both a detection delay and a coverage factor. The down states are '0', '1C' and '1D'.

Preventive Maintenance example
Prolonged usage of a component may lead to an increased failure rate (i.e., an IFR situation). Hence, the lifetime may be modeled by a hypoexponential (HypoEXP) distribution, say 2-stage. The component is inspected randomly: the time between inspections is random, following EXP(λi), and the inspection completion time is EXP(μi).
What does inspection do? In the first stage of life, no action is taken; in the second stage of life, the component is repaired. That is, preventive maintenance.
State = <#stage, faulty>

Performance Models Example: 2-servers with different service times.
State = <n1, n2>. Performance: average number of jobs in the system, E[n1 + n2]. Reward rate: r_{n1,n2} = n1 + n2. Except for <0,0>, in all other states, viz., <k,0> and <k,1>, there are k jobs in the system.

SOURCES OF COVERAGE DATA
Measurement data from an operational system: a large amount of data is needed; improved instrumentation is needed.
Fault/error injection experiments: costly yet badly needed; tools from CMU, Illinois, Toulouse.

SOURCES OF COVERAGE DATA (Continued)
A fault/error handling submodel (FEHM). Phases of FEHM: detection, location, retry, reconfiguration, reboot. Estimate the duration and probability of success of each phase. Examples: IBM (EDFI), HARP (FEHM), Draper (FDIR).

Homework 6: Modify the Markov model with imperfect coverage to allow for finite time to detect as well as imperfect detection. You will need to add an extra state, say D. The rate at which detection occurs is δ. Draw the state diagram and, using the SHARPE GUI, investigate the effects of detection delay on system reliability and mean time to failure.
