Dependability Evaluation through Markovian model.

Slides:



Advertisements
Similar presentations
COE 444 – Internetwork Design & Management Dr. Marwan Abu-Amara Computer Engineering Department King Fahd University of Petroleum and Minerals.
Advertisements

11. Practical fault-tolerant system design Reliable System Design 2005 by: Amir M. Rahmani.
LIFE CYCLE MODELS FORMAL TRANSFORMATION
Fault Tree Analysis Part 12 – Redundant Structure and Standby Units.
Chapter 8 Continuous Time Markov Chains. Markov Availability Model.
6. Reliability Modeling Reliable System Design 2010 by: Amir M. Rahmani.
1 Chapter 5 Continuous time Markov Chains Learning objectives : Introduce continuous time Markov Chain Model manufacturing systems using Markov Chain Able.
Markov Analysis Jørn Vatn NTNU.
SMJ 4812 Project Mgmt and Maintenance Eng.
Oct. 2007State-Space ModelingSlide 1 Fault-Tolerant Computing Motivation, Background, and Tools.
Al-Imam Mohammad Ibn Saud University
Reliable System Design 2011 by: Amir M. Rahmani
Lecture 13 – Continuous-Time Markov Chains
Oct State-Space Modeling Slide 1 Fault-Tolerant Computing Motivation, Background, and Tools.
1 Software Testing and Quality Assurance Lecture 36 – Software Quality Assurance.
Oct Combinational Modeling Slide 1 Fault-Tolerant Computing Motivation, Background, and Tools.
A. BobbioBertinoro, March 10-14, Dependability Theory and Methods 5. Markov Models Andrea Bobbio Dipartimento di Informatica Università del Piemonte.
Using Rational Approximations for Evaluating the Reliability of Highly Reliable Systems Z. Koren, J. Rajagopal, C. M. Krishna, and I. Koren Dept. of Elect.
1 Chapter Fault Tolerant Design of Digital Systems.
A. BobbioReggio Emilia, June 17-18, Dependability & Maintainability Theory and Methods Part 2: Repairable systems: Availability Andrea Bobbio Dipartimento.
Dependability Evaluation. Techniques for Dependability Evaluation The dependability evaluation of a system can be carried out either:  experimentally.
1 Software Testing and Quality Assurance Lecture 35 – Software Quality Assurance.
1 Spare part modelling – An introduction Jørn Vatn.
Introduction to Dependability slides made with the collaboration of: Laprie, Kanoon, Romano.
DynaTraffic – Models and mathematical prognosis
PowerPoint presentation to accompany
Introduction to Stochastic Models GSLM 54100
Software Reliability SEG3202 N. El Kadri.
Fundamentals of Hidden Markov Model Mehmet Yunus Dönmez.
Introduction to Dependability. Overview Dependability: "the trustworthiness of a computing system which allows reliance to be justifiably placed on the.
Lecture 2: Combinatorial Modeling CS 7040 Trustworthy System Design, Implementation, and Analysis Spring 2015, Dr. Rozier Adapted from slides by WHS at.
Fault-Tolerant Systems Design Part 1.
Maintenance Policies Corrective maintenance: It is usually referred to as repair. Its purpose is to bring the component back to functioning state as soon.
Lecture 4: State-Based Methods CS 7040 Trustworthy System Design, Implementation, and Analysis Spring 2015, Dr. Rozier Adapted from slides by WHS at UIUC.
Generalized Semi- Markov Processes (GSMP). Summary Some Definitions The Poisson Process Properties of the Poisson Process  Interarrival times  Memoryless.
Chapter 61 Continuous Time Markov Chains Birth and Death Processes,Transition Probability Function, Kolmogorov Equations, Limiting Probabilities, Uniformization.
Fault-Tolerant Computing Systems #4 Reliability and Availability
Reliability and availability considerations for CLIC modulators Daniel Siemaszko OUTLINE : Give a specification on the availability of the powering.
Relevant Subgraph Extraction Longin Jan Latecki Based on : P. Dupont, J. Callut, G. Dooms, J.-N. Monette and Y. Deville. Relevant subgraph extraction from.
CS433 Modeling and Simulation Lecture 07 – Part 01 Continuous Markov Chains Dr. Anis Koubâa 14 Dec 2008 Al-Imam.
TOPIC : Transition Count Compression UNIT 5 : BIST and BIST Architectures Module 5.4 Compression Techniques.
CS433 Modeling and Simulation Lecture 11 Continuous Markov Chains Dr. Anis Koubâa 01 May 2009 Al-Imam Mohammad Ibn Saud University.
Meaning of Markov Chain Markov Chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only.
REPAIRABLE SYSTEMS 1 AVAILABILITY ENGINEERING. A Single Repairable Component With Failure Rate λ and Repair Rate μ The component may exist in one of Two.
1 Section 8.2 Basics of Hypothesis Testing Objective For a population parameter (p, µ, σ) we wish to test whether a predicted value is close to the actual.
Fault Tree Analysis Part 11 – Markov Model. State Space Method Example: parallel structure of two components Possible System States: 0 (both components.
1 Chapter 5 Continuous time Markov Chains Learning objectives : Introduce continuous time Markov Chain Model manufacturing systems using Markov Chain Able.
CS203 – Advanced Computer Architecture Dependability & Reliability.
Reliability Engineering
Prof. Enrico Zio Availability of Systems Prof. Enrico Zio Politecnico di Milano Dipartimento di Energia.
Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.4.1 FAULT TOLERANT SYSTEMS Part 4 – Analysis Methods Chapter 2 – HW Fault Tolerance.
EMIS 7300 SYSTEMS ANALYSIS METHODS FALL 2005
Discrete-time Markov chain (DTMC) State space distribution
ECE 753: FAULT-TOLERANT COMPUTING
Availability Availability - A(t)
Fault-Tolerant Computing Systems #5 Reliability and Availability2
Quantitative evaluation of Dependability
Testing a Claim About a Mean:  Known
Markov Chains Carey Williamson Department of Computer Science
Reliability.
The Exponential Distribution
Quantitative evaluation of Dependability
T305: Digital Communications
Reliability Engineering
Carey Williamson Department of Computer Science University of Calgary
Overview Dependability: "[..] the trustworthiness of a computing system which allows reliance to be justifiably placed on the service it delivers [..]"
Discrete-time markov chain (continuation)
Solutions Markov Chains 6
Quantitative evaluation of dependability
Lecture 11 – Stochastic Processes
Presentation transcript:

Dependability Evaluation through Markovian model

Markovian model The combinatorial methods are unable to: - take care easily of the coverage factor - model the maintenance The Markov model is an alternative to the combinatorial methods. Two main concepts: - state - state transition

State and state transitions State: the state of a system represents all that must be known to describe the system at any given instant of time For the reliability/availability models each state represents a distinct combination of faulty and fault-free components State transitions govern the changes if state that occur within a system For the reliability/availability models each transition take place when one or more components change state for the event of a fault or for a repair action

State and state transitions (cnt.) State transitions are characterized by probabilities, such probability of fault, fault coverage and the probability of repair The probability of being in any given state, s, at some time,t+  t depends both: – the probability that the system was in a state from which it could transit to state state s given that the transition occurs during  t – the probability that the system was in state s at instant t and there was no event in the interval time  t The initial state should be any state, normally it is that representing all fault-free components IMPORTANT: IN A MARKOV CHAIN THE PROBABILITY TRANSITION DEPEND ONLY BY THE ACTUAL STATE, i.e. THE STATE CAPTURES THE HISTORY OF THE OLD TRANSITIONS

TMR reliability evaluation IO C1C1 C2C2 C3C3 r/n There are 4 components (1 voter + computation module), therefore each state is represented by 4 bit: if the component is fault-free then the bit value is 1 otherwise the bit value is 0. For example (1,1,1,1) represents the faut-free state For example (0,0,0,0) represents all components faulty

TMR reliability evaluation: states diagram

Markov chain reliability evaluation methodology State transition probability evaluation: If the fault occurence of a component is exponentially distributed (e - t ) with fault rate equal to ( ), then the probability that the fault-free component at istant t in the interval  t become faulty is equal to: 1 – e -  t

Probability property Prob{there is a fault between t e t+  t} = = Prob{there is a fault before t+  t / the component was fault-free at t} = = Prob{there is a faul before t+  t and the component was fault-free at t} Prob{the component was fault-free at t} = Prob{there is a fault before t+  t}  Prob{there is a fault before t} = Prob{the component was fault-free at t} = (1 – e - (t+  t) )  (1 – e - t )  1 – e - (t+  t)  1 + e - t = e - t e - t

Probability property = e - t – e - (t+  t) = e - t = e - t _ e - (t+  t) = 1  e -  t e - t e - t If we expand the exponential part we have the following series: 1 – e -  t = 1  1 + (  t) + (  t) 2 + … 2!   t  (  t) 2  … 2! For value of  t << 1, we have the following good approximation: 1 – e -  t   t

TMR reliability evaluation: reduced states diagram State (3,1)  (1,1,1,1) State (2,1)  (0,1,1,1) + (1,0,1,1) + (1,1,0,1) State (G)  all the other states Transition probability (in the interval between t and t+  t): · from state (3,1) to state (2,1) -> 3 e  t ; · from state (3,1) to state (G) -> v  t ; from state(2,1) to state (G) -> 2 e  t+ v  t.

TMR reliability evaluation Transition probability (in the interval between t and t+  t): · from state (3,1) to state (2,1) -> 3 e  t ; · from state (3,1) to state (G) -> v  t ; from state(2,1) to state (G) -> 2 e  t+ v  t.

TMR reliability evaluation Given the Markov process properties, i.e. the probability of being in any given state, s, at some time, t+  t depends both: – the probability that the system was in a state from which it could transit to state state s given that the transition occurs during  t – the probability that the system was in state s at instant t and there was no event in the interval time  t we have that P (3,1) (t+  t) = (1  3 e  t  v  t) P (3,1) (t) P (2,1) (t+  t) = 3 e  t P (3,1) (t) + (1  2 e  t  v  t) P (2,1) (t) P (G) (t+  t) = v  t P (3,1) (t) + (2 e  t + v  t) P (2,1) (t) + P (G) (t)

TMR reliability evaluation With algebric operations:  t  0 P (3,1) (t+  t)  P (3,1) (t) =  (3 e + v ) P (3,1) (t) = d P (3,1) (t)  t dt  t  0 P (2,1) (t+  t)  P (2,1) (t) = 3 e P (3,1) (t)  (2 e + v ) P (2,1) (t) = d P (2,1) (t)  t dt  t  0 P (G) (t+  t)  P (G) (t) = v P (3,1) (t) + (2 e + v ) P (2,1) (t) = d P (G) (t)  t dt

TMR reliability evaluation i.e: P' 3,1 (t) =  (3 e + v )P 3,1 (t) P' 2,1 (t) = 3 e P 3,1 (t)  (2 e + v )P 2,1 (t) P' G (t) = v P 3,1 (t) + (2 e + v )P 2,1 (t) That in matrix notation can be expressed as:  (t) =  (t) Q(t) dt (P' 3,1 P' 2,1 P' G ) = (P 3,1 P 2,1 P G ) * Q 

TMR reliability evaluation the reliability is the probability of being in any fault-free state, i.e, in this case of being in state (3,1) or (2,1). R(t) = P 3,1 (t) + P 2,1 (t) = 1  P G (t) with the initial conditionP 3,1 (0) = 1

aaTMR reliability evaluation where:  ( 3 e + v ) 3 e v Q = 0  ( 2 e + v ) (2 e + v ) P = Q + I  Q = P  I 1  ( 3 e + v ) 3 e v P = 0 1  ( 2 e + v ) (2 e + v ) 0 0 1

Properties of Laplace’s transformation

Markov Processes for maintenable systems Two kind of events: - fault of a component (module or voter) - repair of the system (of a module or the voter or both) Hypothesis: the maintenance process is exponentially distributed with repair rate equal to 

Availability evaluation of TMR system P 3,1 (t) + P 2,1 (t) + P G (t) = 1P 3,1 (0) = 1 P’ 3,1 (t) =  ( 3 e + v ) P 3,1 (t) +  P 2,1 (t) +  P G (t) P’ 2,1 (t) = 3 e P 3,1 (t)  (2 e + v +  ) P 2,1 (t) P’ G (t) = v P 3,1 (t) + (2 e + v ) P 2,1 (t)   P G (t) d  (t) =  (t) Q(t) dt i.e. (P' 3,1 P' 2,1 P' G ) = (P 3,1 P 2,1 P G ) * Q

Availability evaluation of TMR system  ( 3 e + v ) 3 e v Q =  ( 2 e + v +  ) (2 e + v )  0    Q = P  I  P = Q + I 1  ( 3 e + v ) 3 e v P =  1  (2 e + v +  ) (2 e + v )  0 1  

Istantaneous Availability evaluation of TMR system the Istantaneous Availability is the probability of being in any fault-free state, i.e, in this case of being in state (3,1) or (2,1). A(t) = P 3,1 (t) + P 2,1 (t) = 1  P G (t) with the initial conditionP 3,1 (0) = 1

Limiting or steady state Availability evaluation of TMR system P 3,1 (t) + P 2,1 (t) + P G (t) = 1P 3,1 (0) = 1 with t   we have that  P’(t) = 0 P’ 3,1 (t) = 0 =  ( 3 e + v ) P 3,1 (t) +  P 2,1 (t) +  P G (t) P’ 2,1 (t) = 0 = 3 e P 3,1 (t)  (2 e + v +  ) P 2,1 (t) P’ G (t) = 0 = v P 3,1 (t) + (2 e + v ) P 2,1 (t)   P G (t)

Limiting or steady state Availability evaluation of TMR system P 3,1 (t) + P 2,1 (t) + P G (t) = 1P 3,1 (0) = 1 with t   we have that  P’(t) = 0 and P(t) = P P’ 3,1 (t) = 0 =  ( 3 e + v ) P 3,1 +  P 2,1 +  P G P’ 2,1 (t) = 0 = 3 e P 3,1  (2 e + v +  ) P 2,1 P’ G (t) = 0 = v P 3,1 + (2 e + v ) P 2,1 (t)   P G

Limiting or steady state Availability evaluation of TMR system P 3,1 + P 2,1 + P G = 1 P 3,1 = P 2,1 = P G =

Safety evaluation Four kind of events: - fault of a component (module or voter) correcttly diagnoticated - fault of a component not detected - correct repair of the system (of a module or the voter or both) - uncorrect repair of the system  fault rate  repair rate C g  fault detection coverage factor C r  correct repair coverage factor

Single component Safety evaluation 0  fault free state GS  safe fault state GI  unsafe fault state Hypothesis: -if a fault is not well diagnosticated then it will never be detected -If a reconfiguration is not wel done then it will be never detected Therefore GI is an absorbing state

Single component Safety evaluation Safety = probability to stay in state 0 or GS P O (t) + P GS (t) = 1  P GI (t)P O (0) = 1 P’ O (t) =  ( (1  C g ) + C g )) P O (t) +  C r P GS (t) P’ GS (t) = C g P O (t)  (  (1  C r )+  C r ) P GS (t) P’ GI (t) = (1  C g ) P O (t) + (  (1  C r )P GS (t)

Single component Safety evaluation  C g (1-C g ) Q = CrCr  (1-C r ) 00 0 d  (t) =  (t) Q(t) dt i.e. (P' 3,1 P' 2,1 P' G ) = (P 3,1 P 2,1 P G ) * Q

Performability Index taking into account even the performance of the system given its state (related to the number of fault-free components) We will discuss it when we will know how evaluate the performance of a system

I O C 21 C 23 C 22 C 111 C 12 C 112 C 11 C1C1 C2C2 R 11 =  R 111. R 112 R 1 = 1 - (1 - R 11 ). (1 - R 12 ) R 2 = 1 - (1 - R 21 ). (1 - R 22 ). (1 - R 23 ) R =  R 1. R 2 Reliability/Availability/Safety evaluation of complex system