1
Markov Method of Availability Analysis, APS Systems, and Availability Simulation Methods
E E 681 - Module 21
W. D. Grover, TRLabs & University of Alberta
© Wayne D. Grover 2002, 2003
2
Markov Techniques for Availability Analysis

Markov chain: a system with discrete states, in statistical equilibrium, with constant memoryless probabilities of making a transition between states.

Example: two-state working-failed system
State 1: working
State 2: failed
$p_{12} = \lambda$, $p_{21} = \mu$, $p_{11} = 1 - \lambda$, $p_{22} = 1 - \mu$
$\lambda = 1/\mathrm{MTTF}$ = failure rate
$\mu = 1/\mathrm{MTTR}$ = repair rate
3
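Markov Techniques for Availability Analysis

Consider that:
$\pi_1(n+1) = \pi_1(n)\,p_{11} + \pi_2(n)\,p_{21}$
where $\pi_1$ = prob. of being in state 1, $n$ = time-step, $p_{ij}$ = state transition probability.

This means that:
$\pi(n+1) = \pi(n)\,P$

Thus, “spinning” the transition matrix:
$\pi(1) = \pi(0)\,P$, $\pi(2) = \pi(1)\,P = \pi(0)\,P^2$, etc.

Once the state probability vector $\pi$ is known, availability is known, or computable by summing over the non-operating states: $A = 1 - \sum_{i \in \text{non-operating}} \pi_i$.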
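Repeated multiplication of the state probability vector by the transition matrix can be sketched in a few lines of Python for the two-state working-failed example. The numerical values of `lam` and `mu` (one-hour time-steps, MTTF = 1000 h, MTTR = 10 h) and the helper names `step` and `spin` are illustrative assumptions, not taken from the slides.

```python
def step(pi, P):
    """One time-step: pi(n+1) = pi(n) P (row vector times matrix)."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def spin(pi0, P, steps):
    """Apply the transition matrix `steps` times, starting from pi0."""
    pi = list(pi0)
    for _ in range(steps):
        pi = step(pi, P)
    return pi


# Assumed illustrative values, per one-hour time-step (not from the slides)
lam = 1.0 / 1000.0   # failure probability per step (MTTF = 1000 h)
mu = 1.0 / 10.0      # repair probability per step (MTTR = 10 h)

# Row i holds the probabilities of moving from state i to each state j
P = [[1 - lam, lam],
     [mu, 1 - mu]]

pi = spin([1.0, 0.0], P, 2000)   # start in state 1 (working)
availability = pi[0]             # state 2 is the only non-operating state
print("availability after 2000 steps:", availability)
```

After enough steps the vector stops changing, and the probability of the working state is the availability.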
4
Markov Techniques for Availability Analysis

A separate computational method is based on observing that the system must reach steady-state probabilities wherein, as every time epoch passes, the state occupancy probabilities remain unchanged, implying that:
$\pi_1(n+1) = \pi_1(n)$
or, more generally, that
$\pi(n+1) = \pi(n)$
is reached in the limit.

That is, the state probability vector does not change under repeated multiplication by the matrix $P$:
$\pi = \pi P$
which can be solved numerically for $\pi$.
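As a worked instance of the steady-state condition, the two-state working-failed system can be solved by hand. The derivation below takes $p_{12} = \lambda = 1/\mathrm{MTTF}$ and $p_{21} = \mu = 1/\mathrm{MTTR}$ and adds the normalisation $\pi_1 + \pi_2 = 1$, which is needed because $\pi = \pi P$ alone fixes $\pi$ only up to a scale factor.

```latex
\begin{align*}
\pi_1 &= \pi_1 (1 - \lambda) + \pi_2\, \mu
  && \text{(first component of } \pi = \pi P\text{)} \\
\lambda\, \pi_1 &= \mu\, \pi_2 \\
\pi_1 + \pi_2 &= 1 \\
\pi_1 &= \frac{\mu}{\lambda + \mu}
      = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
\end{align*}
```

Here $\pi_1$ is the steady-state probability of the working state, i.e. the availability.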
5
Markov Techniques: Model of a 1+1 APS System

[Figure: functional view of 1+1 APS]
[Figure: Markov chain model of 1+1 APS, with the failure state marked]
[Figure: state transition probability matrix]
6
Markov Techniques: Model of 1:N APS Systems

[Figure: functional view of 1:N APS]

K1-K2 byte signalling:
K1 - head end bridge request (channel number)
K2 - head end bridge confirm (channel number)
7
Markov Techniques: Models of APS Systems

[Figure: availability model view of 1:N APS]

$U_w$ = working system unavailability
$U_s$ = spare system unavailability
$U_{b1}$ = head end bridge unavailability, failure mode 1
$U_{b2}$ = head end bridge unavailability, failure mode 2
$U_{t1}$ = tail end transfer unavailability, failure mode 1
$U_{t2}$ = tail end transfer unavailability, failure mode 2
8
“Cutset” (algebraic) model of APS availability

Method: list the outage-causing failure combinations from functional inspection / system knowledge.

Outage if:
(1) Two or more working systems fail.
(2) The head end bridge (HEB) or tail end transfer (TET) fails at any time in mode 2, or the spare span fails, and there are one or more working system failures.
(3) The HEB or TET fails at any time in mode 1.

Terms of the resulting unavailability expression:
- 2 or more working systems
- 1 working system and spare or HEB / TET active-function
- HEB / TET through-function by itself: “series element”
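The three outage rules can be turned directly into a probability calculation. The sketch below is not the slide's algebraic cutset expression (which did not survive extraction); it evaluates the same rules exactly, under the added assumption that all elements and both failure modes fail independently. The function name `outage_probability` and all numerical values are hypothetical.

```python
from math import comb


def outage_probability(n, u_w, u_s, u_b1, u_b2, u_t1, u_t2):
    """P(outage) for a 1:N APS system under the three outage rules.

    Assumes statistical independence of every element and failure mode.
    """
    # Rule 3: a mode-1 HEB or TET failure is a series element
    p_series_ok = (1 - u_b1) * (1 - u_t1)
    # Rule 2: protection works only if the spare and both mode-2 functions are up
    p_protect_ok = (1 - u_s) * (1 - u_b2) * (1 - u_t2)

    def p_k(k):
        # Probability that exactly k of the n working systems are failed
        return comb(n, k) * u_w**k * (1 - u_w) ** (n - k)

    # No outage: series elements up, and either no working failure or
    # exactly one working failure that the protection path can cover
    # (rule 1 excludes two or more working failures)
    p_no_outage = p_series_ok * (p_k(0) + p_k(1) * p_protect_ok)
    return 1 - p_no_outage


# Illustrative unavailabilities only, not taken from the slides
u = outage_probability(n=4, u_w=1e-3, u_s=1e-3,
                       u_b1=1e-6, u_b2=1e-5, u_t1=1e-6, u_t2=1e-5)
print("outage probability:", u)
```

Computing the complement (the no-outage event) keeps the three rules from being double-counted when they overlap.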
9
Markov Techniques: Models of APS Systems

[Figure: Markov chain model of 1:N APS, with the outage-causing states marked]
10
An engineering issue: “exercising the spare”

It is usual to provide redundancy in the form of a “standby” system to enhance availability. It is also easy to overlook the impact of “silent failure” of the protection system.

Ideal view (spare assumed always to be ready):
1) Both working and spare systems up
2) “Working system” down, spare takes over
3) Both working and spare down
[Figure: Markov chain over states 1 to 3; transition labels include λ, μ, and 2μ]

Actual system (spare can have “silent failure”): this can be approximated by the ideal view if and only if the spare unit is routinely “exercised”.
11
Simulation techniques inspired by Markov modelling

Basic idea:
- Represent the system in Markov chain form.
- Simulate random transitions and observe state probabilities (instead of attempting symbolic analysis of the model).

Some advantages:
- Can obtain experimental distributions of frequency / duration data.
- Can model transition rates that are state dependent.
- Can drive the system through transient episodes of non-stationary statistical behaviour.

Limitation:
- Strictly a valid model of real systems only for negative exponential (i.e., memoryless) failure and repair models.
- The reason is that transitions are generated into / out of states; they are not generated based on individual equipment items.
12
Simulation inspired by Markov modelling (2)

Basic simulation engine:

    t := 0; assign initial state S(t = 0);
    repeat
        generate random variable x_i for each transition type exiting the current state;
        t := t + min{x_1, x_2, ..., x_n};
        k := i such that x_i is the minimum  (i.e., k indicates the transition selected);
        next_state := case of {current_state, transition k};
        record / update any relevant statistical data
            (example: time since last entering a failure state, ...)
    until {forever}

General method for generating r.v.s with any desired probability density function:
$t^* = F^{-1}(u^*)$, where $F$ is the desired CDF and $u^*$ is a uniform r.v. on (0,1).

For “memoryless” processes:
$t^* = -\frac{1}{\lambda} \ln(u^*)$
$t^*$ is then distributed as $f(t) = \lambda e^{-\lambda t}$.
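A minimal Python sketch of the simulation engine follows, using inverse-transform sampling for the memoryless case. Two choices are assumptions, not from the slides: the chain is encoded as a dictionary of `(rate, next_state)` pairs, and the loop stops at a finite `horizon` instead of running forever. The names `exp_sample` and `simulate` and the rates are hypothetical.

```python
import math
import random


def exp_sample(rate, rng):
    """Inverse-transform sample t* = -ln(u*) / rate, with u* uniform."""
    u = 1.0 - rng.random()   # rng.random() is in [0, 1); shift to (0, 1]
    return -math.log(u) / rate


def simulate(transitions, start, horizon, rng):
    """Walk the chain up to `horizon`; return the time spent in each state.

    transitions: {state: [(rate, next_state), ...]}
    """
    t = 0.0
    state = start
    time_in = {s: 0.0 for s in transitions}
    while t < horizon:
        # One candidate time per transition leaving the current state
        draws = [(exp_sample(rate, rng), nxt)
                 for rate, nxt in transitions[state]]
        # The smallest draw selects the transition that fires
        dt, nxt = min(draws, key=lambda d: d[0])
        dt = min(dt, horizon - t)   # truncate the last sojourn at the horizon
        time_in[state] += dt        # statistic recorded: state occupancy time
        t += dt
        state = nxt
    return time_in


# Illustrative two-state example (rates per hour; values are assumptions)
lam, mu = 1.0 / 1000.0, 1.0 / 10.0
chain = {"working": [(lam, "failed")],
         "failed": [(mu, "working")]}
horizon = 1_000_000.0
rng = random.Random(1)
time_in = simulate(chain, "working", horizon, rng)
availability = time_in["working"] / horizon
print("simulated availability:", availability)
```

Other statistics, such as the list of individual outage durations, can be collected at the point where `time_in` is updated.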
13
Even more general simulation method...

Method:
- Identify each independent subsystem or component that is subject to failure and repair.
- Specify (in CDF form) the distributions of times between failures and of repair times for that element.
- For each element individually, simulate a time-line history of its failures and repairs.
- Merge the event sequences for all elements into one time line.
- Use the merged time line to drive the walk through the system state model.
- Accumulate all desired statistics, such as total outage time, times between system outages, outage durations, etc.
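The component-level method can be sketched as below. One departure is worth naming plainly: instead of generating each element's full time line first and merging afterwards, this sketch interleaves the per-element events through a priority queue, which yields the same merged, time-ordered sequence. The function `simulate_components`, the two-element example, and the chosen distributions are illustrative assumptions.

```python
import heapq
import random


def simulate_components(components, system_down, horizon, rng):
    """components: {name: (draw_ttf, draw_ttr)}, each a callable rng -> float.
    system_down: function(set of failed names) -> bool.
    Returns (total outage time, list of outage durations)."""
    # The queue holds the next pending event of every element
    events = [(draw_ttf(rng), name, "fail")
              for name, (draw_ttf, _) in components.items()]
    heapq.heapify(events)
    failed = set()
    outage_start = None
    outages = []
    while events:
        t, name, kind = heapq.heappop(events)   # earliest event overall
        if t >= horizon:
            break
        draw_ttf, draw_ttr = components[name]
        if kind == "fail":
            failed.add(name)
            heapq.heappush(events, (t + draw_ttr(rng), name, "repair"))
        else:
            failed.discard(name)
            heapq.heappush(events, (t + draw_ttf(rng), name, "fail"))
        # Drive the system state model from the merged event stream
        down = system_down(failed)
        if down and outage_start is None:
            outage_start = t
        elif not down and outage_start is not None:
            outages.append(t - outage_start)
            outage_start = None
    if outage_start is not None:                 # outage still open at the end
        outages.append(horizon - outage_start)
    return sum(outages), outages


# Illustrative redundant pair: outage only when both elements are failed.
# Exponential times to failure and uniform repair times are assumptions.
pair = {
    "working": (lambda r: r.expovariate(1 / 1000.0), lambda r: r.uniform(4.0, 16.0)),
    "spare": (lambda r: r.expovariate(1 / 1000.0), lambda r: r.uniform(4.0, 16.0)),
}
total, durations = simulate_components(
    pair, lambda failed: len(failed) == 2, 1_000_000.0, random.Random(3))
print("total outage time:", total, "number of outages:", len(durations))
```

Because each element draws from its own distributions, the repair times need not be negative exponential.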
14
Issue (and related project topic)...

The issue is that virtually all analytically obtainable results provide only the mean (not the distribution) of outage durations or times between failures, and simulation based on memorylessness always implies that times between failures and repair durations are both negative exponentially distributed.

[Figure: probability density versus time between failures or time to repair, negative exponential shape]
Believable for times between failures, but certainly not an accurate pdf shape for the distribution of repair times.
[Figure: “Repair times more like” a different pdf shape]

Q. (project): How different is the accurately simulated distribution of outage-time durations from the analytical model?
Q. How much might this affect SLA (Service Level Agreement) policies?
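To give a feel for why the shape of the repair-time distribution matters, the sketch below compares two repair-time models with the same mean: negative exponential and lognormal. The lognormal shape, the 10-hour mean, and the threshold of four times the mean are all assumptions chosen for illustration; the slides do not specify the realistic repair-time pdf. The comparison covers single repair durations, not system outage durations.

```python
import math
import random

rng = random.Random(7)
mean_repair = 10.0     # hours; illustrative value, not from the slides
sigma = 0.5            # lognormal shape parameter; an assumption
# Choose the lognormal location so that its mean equals mean_repair
mu_ln = math.log(mean_repair) - sigma**2 / 2
n = 100_000
threshold = 4 * mean_repair   # hypothetical SLA limit on repair duration

exp_times = [rng.expovariate(1.0 / mean_repair) for _ in range(n)]
logn_times = [rng.lognormvariate(mu_ln, sigma) for _ in range(n)]


def tail_fraction(samples, limit):
    """Fraction of samples that exceed `limit`."""
    return sum(1 for x in samples if x > limit) / len(samples)


frac_exp = tail_fraction(exp_times, threshold)
frac_logn = tail_fraction(logn_times, threshold)
print("mean (exponential):", sum(exp_times) / n)
print("mean (lognormal):  ", sum(logn_times) / n)
print("fraction over limit (exponential):", frac_exp)
print("fraction over limit (lognormal):  ", frac_logn)
```

With these assumed parameters the two models agree on the mean but differ in how often a repair runs past the limit, which is the kind of difference an SLA penalty clause would be sensitive to.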