
Dependability & Maintainability Theory and Methods
5. Markov Models
Andrea Bobbio, Dipartimento di Informatica, Università del Piemonte Orientale “A. Avogadro”, Alessandria (Italy)
IFOA, Reggio Emilia, June 17-18, 2003

State-Space-Based Models: states and labeled state transitions
The state can keep track of:
– Number of functioning resources of each type
– State of recovery for each failed resource
– Number of tasks of each type waiting at each resource
– Allocation of resources to tasks
A transition:
– Can occur from any state to any other state
– Can represent a simple or a compound event

State-Space-Based Models (Continued)
Transitions between states represent the change of the system state due to the occurrence of an event.
Drawn as a directed graph.
Transition label:
– Probability: homogeneous discrete-time Markov chain (DTMC)
– Rate: homogeneous continuous-time Markov chain (CTMC)
– Time-dependent rate: non-homogeneous CTMC
– Distribution function: semi-Markov process (SMP)

Modeler's Options: Should I Use Markov Models?
State-space-based methods:
+ Model dependencies
+ Model fault-tolerance and recovery/repair
+ Model contention for resources
+ Model concurrency and timeliness
+ Generalize to Markov reward models for modeling degradable performance

Modeler's Options: Should I Use Markov Models? (Continued)
+ Generalize to Markov regenerative models for allowing generally distributed event times
+ Generalize to non-homogeneous Markov chains for allowing Weibull failure distributions
+ Performance, availability and performability modeling possible
- Large (exponential) state space

In order to fulfil our goals:
– Modeling performance, availability and performability
– Modeling complex systems
we need automatic generation and solution of large Markov reward models.

Model-based evaluation
Choice of the model type is dictated by:
– Measures of interest
– Level of detailed system behavior to be represented
– Ease of model specification and solution
– Representation power of the model type
– Access to suitable tools or toolkits

State space models
A transition from state s to state s', labeled with the component x_i, represents the change of state of a single component:
Pr{s → s', Δt} = Pr{Z(t+Δt) = s' | Z(t) = s}
Z(t) is the stochastic process; Pr{Z(t) = s} is the probability of finding Z(t) in state s at time t.

State space models
If s → s' represents a failure event of component i:
Pr{s → s', Δt} = Pr{Z(t+Δt) = s' | Z(t) = s} = λ_i Δt
If s → s' represents a repair event of component i:
Pr{s → s', Δt} = Pr{Z(t+Δt) = s' | Z(t) = s} = μ_i Δt

Markov Process: definition
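For reference, the Markov (memoryless) property of the discrete-state process Z(t) introduced above is commonly stated as follows: the future evolution depends on the past only through the current state,

\[
\Pr\{Z(t_{n+1}) = x_{n+1} \mid Z(t_n) = x_n, \ldots, Z(t_1) = x_1\}
  = \Pr\{Z(t_{n+1}) = x_{n+1} \mid Z(t_n) = x_n\},
\qquad t_1 < t_2 < \cdots < t_{n+1}.
\]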

Transition Probability Matrix

State Probability Vector

Chapman-Kolmogorov Equations
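For a homogeneous DTMC with one-step transition probability matrix P = [p_ij], the Chapman-Kolmogorov equations take the standard form

\[
p_{ij}^{(n+m)} = \sum_{k} p_{ik}^{(n)}\, p_{kj}^{(m)},
\qquad\text{equivalently}\qquad
P^{(n+m)} = P^{(n)} P^{(m)} .
\]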

Time-homogeneous CTMC

The transition rate matrix
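For a CTMC, the transition rate matrix (infinitesimal generator) Q = [q_ij] collects the rates labeling the arcs of the state diagram; the diagonal entries are chosen so that each row sums to zero:

\[
q_{ij} \ge 0 \;\; (i \ne j), \qquad
q_{ii} = -\sum_{j \ne i} q_{ij}, \qquad
\sum_{j} q_{ij} = 0 .
\]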

C-K Equations for CTMC

Solution equations
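With π(t) the state probability vector and Q the rate matrix above, the solution equations can be summarized (standard formulation) as

\[
\frac{d\pi(t)}{dt} = \pi(t)\,Q, \qquad
\pi(t) = \pi(0)\,e^{Qt}, \qquad
\text{steady state: } \pi Q = 0, \;\; \sum_i \pi_i = 1 .
\]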

Transient analysis
Given the initial state of the Markov chain, the system of differential equations is written based on:
rate of buildup = rate of flow in − rate of flow out
for each state (continuity equation).
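Written out per state, this continuity equation reads, for each state j,

\[
\frac{d\pi_j(t)}{dt}
= \sum_{i \ne j} \pi_i(t)\, q_{ij} \;-\; \pi_j(t) \sum_{k \ne j} q_{jk} .
\]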

Steady-state condition
If the process reaches a steady-state condition, then the state probabilities no longer change with time, i.e. dπ(t)/dt = 0.

Steady-state analysis (balance equation)
The steady-state equations can be written as flow balance equations with a normalization condition on the state probabilities. Since the rate of buildup is zero in steady state:
rate of flow in = rate of flow out
for each state (balance equation).
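In symbols, the balance equations together with the normalization condition read, for each state j,

\[
\sum_{i \ne j} \pi_i\, q_{ij} = \pi_j \sum_{k \ne j} q_{jk},
\qquad \sum_j \pi_j = 1 .
\]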

State Classification

2-component system

2-component series system (A1, A2); 2-component parallel system (A1, A2)

2-component stand-by system (A, B)

Markov Models: Repairable systems - Availability

Repairable system: Availability

Repairable system: 2 identical components

2-component Markov availability model
– Assume we have a two-component parallel redundant system with repair rate μ.
– Assume that the failure rate of both components is λ.
– When both components have failed, the system is considered to have failed.

Markov availability model
– Let the number of properly functioning components be the state of the system.
– The state space is {0, 1, 2}, where 0 is the system down state.
– We wish to examine the effects of shared vs. non-shared repair.

Markov availability model
State diagrams for non-shared (independent) repair and for shared repair.

Markov availability model
Note: the non-shared case can be modeled and solved using an RBD or a fault tree (FTREE), but the shared case needs the use of Markov chains.

Steady-state balance equations
For any state: rate of flow in = rate of flow out.
Considering the shared case, let π_i denote the steady-state probability that the system is in state i.

Steady-state balance equations (shared case)
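A sketch of the shared-case derivation, assuming each component fails at rate λ and the single shared repair facility works at rate μ, so that the chain moves 2 → 1 at rate 2λ, 1 → 0 at rate λ, and 1 → 2, 0 → 1 at rate μ:

\[
\mu \pi_1 = 2\lambda \pi_2, \qquad \mu \pi_0 = \lambda \pi_1
\quad\Longrightarrow\quad
\pi_1 = \frac{2\lambda}{\mu}\,\pi_2, \qquad \pi_0 = \frac{2\lambda^2}{\mu^2}\,\pi_2 ,
\]
\[
\pi_2 + \pi_1 + \pi_0 = 1
\quad\Longrightarrow\quad
\pi_2 = \left(1 + \frac{2\lambda}{\mu} + \frac{2\lambda^2}{\mu^2}\right)^{-1},
\qquad A_{\mathrm{shared}} = 1 - \pi_0 .
\]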

Steady-state balance equations (Continued)
Steady-state unavailability for the shared case = π_0 = 1 − A_shared.
Similarly, for the non-shared case, steady-state unavailability = 1 − A_non-shared.
Downtime in minutes per year = (1 − A) × 8760 × 60.
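A small numerical sketch of these formulas in Python; the failure and repair rates below are hypothetical placeholders, not values from the lecture:

```python
# Steady-state availability and downtime of the 2-component parallel system
# with shared repair (states 2, 1, 0; failure rates 2*lam and lam, repair rate mu).
lam = 1.0 / 1000.0   # hypothetical per-component failure rate (per hour)
mu = 1.0 / 10.0      # hypothetical shared repair rate (per hour)

r = lam / mu
pi2 = 1.0 / (1.0 + 2.0 * r + 2.0 * r ** 2)   # from the normalization condition
pi1 = 2.0 * r * pi2
pi0 = 2.0 * r ** 2 * pi2

A_shared = 1.0 - pi0                          # steady-state availability
downtime_min_per_year = (1.0 - A_shared) * 8760 * 60

print(f"A_shared = {A_shared:.8f}")
print(f"downtime = {downtime_min_per_year:.2f} minutes/year")
```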

Steady-state balance equations

Absorbing states and MTTF
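For the same 2-component parallel system, making state 0 absorbing (system failure) and keeping repair only in state 1 leads to a standard closed form for the mean time to failure; this is a reconstruction under those assumptions, not a formula copied from the slides:

\[
\mathrm{MTTF} = \frac{3\lambda + \mu}{2\lambda^2}
= \frac{3}{2\lambda} + \frac{\mu}{2\lambda^2},
\]

which reduces to 3/(2λ) when there is no repair (μ = 0).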

Markov Reliability Model with Imperfect Coverage

Markov model with imperfect coverage
Next consider a modification of the 2-component parallel system proposed by Arnold as a model of duplex processors of an electronic switching system. We assume that not all faults are recoverable and that c is the coverage factor, which denotes the conditional probability that the system recovers given that a fault has occurred. The state diagram is now given by the following picture:

Now allow for imperfect coverage c

Markov model with imperfect coverage
Assume that the initial state is 2, so that P_2(0) = 1 and P_1(0) = P_0(0) = 0. Then the system of differential equations is:
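Assuming the usual transition structure for this duplex model (covered failure 2 → 1 at rate 2λc, uncovered failure 2 → 0 at rate 2λ(1−c), failure 1 → 0 at rate λ, repair 1 → 2 at rate μ), the equations are:

\[
\frac{dP_2(t)}{dt} = -2\lambda P_2(t) + \mu P_1(t), \qquad
\frac{dP_1(t)}{dt} = 2\lambda c\, P_2(t) - (\lambda + \mu) P_1(t),
\]
\[
\frac{dP_0(t)}{dt} = 2\lambda (1-c)\, P_2(t) + \lambda P_1(t).
\]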

Markov model with imperfect coverage (continued)
After solving the differential equations we obtain: R(t) = P_2(t) + P_1(t).
From R(t), we can obtain the system MTTF. It should be clear that the system MTTF and the system reliability are critically dependent on the coverage factor.
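Under the same assumed transition structure, integrating R(t) gives the closed form (a reconstruction, not copied from the slides):

\[
\mathrm{MTTF} = \int_0^\infty R(t)\,dt
= \frac{\lambda(1 + 2c) + \mu}{2\lambda\left(\lambda + (1-c)\mu\right)},
\]

which for perfect coverage (c = 1) reduces to (3λ + μ) / (2λ²), the MTTF of the model with repair given earlier.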

Source of fault coverage data
Measurement data from an operational system
– Large amount of data needed
– Improved instrumentation needed
Fault-injection experiments
– Expensive but badly needed
– Tools from CMU, Illinois, LAAS (Toulouse)
A fault/error handling submodel (FEHM)
– Phases: detection, location, retry, reconfig, reboot
– Estimate duration and probability of success of each phase

Redundant System with Finite Detection/Switchover Time
– Modify the Markov model with imperfect coverage to allow for a finite time to detect as well as imperfect detection.
– You will need to add an extra state, say D.
– The rate at which detection occurs is δ.
– Draw the state diagram and investigate the effects of detection delay on system reliability and mean time to failure.

Redundant System with Finite Detection/Switchover Time
Assumptions:
– The two units have the same MTTF and MTTR;
– Single shared repair person;
– Average detection/switchover time t_sw = 1/δ;
– We need to use a Markov model.

Redundant System with Finite Detection/Switchover Time
(State diagram with states 2, 1D, 1 and 0.)

Redundant System with Finite Detection/Switchover Time
After solving the Markov model, we obtain the steady-state probabilities:

Closed-form

WFS Example

A Workstations-Fileserver Example
Computing system consisting of:
– A file-server
– Two workstations
– A computing network connecting them
System operational as long as:
– One of the workstations, and
– The file-server
are operational. The computer network is assumed to be fault-free.

The WFS Example

Markov Chain for WFS Example
Assuming exponentially distributed times to failure:
– λ_w: failure rate of a workstation
– λ_f: failure rate of the file-server
Assume that components are repairable:
– μ_w: repair rate of a workstation
– μ_f: repair rate of the file-server
The file-server has priority for repair over workstations (such repair priority cannot be captured by non-state-space models).

Markov Availability Model for WFS
(State diagram with states (2,1), (2,0), (1,1), (1,0), (0,1), (0,0) and transitions labeled with the rates λ_w, λ_f, μ_w, μ_f.)
Since all states are reachable from every other state, the CTMC is irreducible. Furthermore, all states are positive recurrent.

Markov Availability Model for WFS (Continued)
In the figure, the label (i, j) of each state is interpreted as follows:
– i represents the number of workstations that are still functioning;
– j is 1 or 0 depending on whether the file-server is up or down, respectively.

Markov Availability Model for WFS (Continued)
For the example problem, with the states ordered as (2,1), (2,0), (1,1), (1,0), (0,1), (0,0), the generator matrix Q is given by:

Markov Model (steady-state)
π: steady-state probability vector.
The equations "rate of flow in = rate of flow out", one for each state, together with the normalization condition are called the steady-state balance equations; after solving them for π we obtain the steady-state availability.

Markov Availability Model
We compute the availability of the system: the system is available as long as it is in state (2,1) or (1,1).
Instantaneous availability of the system: A(t) = P_{2,1}(t) + P_{1,1}(t).
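As an illustration, here is a short Python sketch of how this availability model could be solved numerically. The generator matrix below follows the transition structure described on the preceding slides (file-server repair has priority over workstation repair); the numeric rates are hypothetical placeholders, not the values used in the lecture:

```python
import numpy as np

# Hypothetical rates (per hour); not taken from the slides.
lw, lf = 1.0 / 5000.0, 1.0 / 20000.0   # workstation / file-server failure rates
mw, mf = 1.0, 0.5                      # workstation / file-server repair rates

# States ordered as (2,1), (2,0), (1,1), (1,0), (0,1), (0,0).
S = [(2, 1), (2, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
idx = {s: k for k, s in enumerate(S)}

Q = np.zeros((6, 6))
def rate(src, dst, r):
    Q[idx[src], idx[dst]] = r

rate((2, 1), (1, 1), 2 * lw); rate((2, 1), (2, 0), lf)
rate((2, 0), (2, 1), mf);     rate((2, 0), (1, 0), 2 * lw)
rate((1, 1), (0, 1), lw);     rate((1, 1), (1, 0), lf); rate((1, 1), (2, 1), mw)
rate((1, 0), (1, 1), mf);     rate((1, 0), (0, 0), lw)   # file-server repaired first
rate((0, 1), (1, 1), mw);     rate((0, 1), (0, 0), lf)
rate((0, 0), (0, 1), mf)
np.fill_diagonal(Q, -Q.sum(axis=1))   # rows of a generator sum to zero

# Solve pi Q = 0 with sum(pi) = 1 by replacing one equation with normalization.
A = np.vstack([Q.T[:-1], np.ones(6)])
b = np.zeros(6); b[-1] = 1.0
pi = np.linalg.solve(A, b)

availability = pi[idx[(2, 1)]] + pi[idx[(1, 1)]]
print(f"steady-state availability = {availability:.6f}")
```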

Markov Availability Model (Continued)

Markov Reliability Model with Repair
Assume that the computer system does not recover if both workstations fail, or if the file-server fails.

Markov Reliability Model with Repair
States (0,1), (1,0) and (2,0) become absorbing states, while (2,1) and (1,1) are transient states.
Note: we have made a simplification that, once the CTMC reaches a system failure state, we do not allow any more transitions.

Markov Model with Absorbing States
If we solve for P_{2,1}(t) and P_{1,1}(t), then R(t) = P_{2,1}(t) + P_{1,1}(t).
For a Markov chain with absorbing states:
– A: the set of absorbing states
– B: the set of remaining (transient) states, i.e. the state space minus A
– z_{i,j}: mean time spent in state (i,j) until absorption

Markov Model with Absorbing States (Continued)
The mean time to absorption, MTTA, is given in terms of Q_B, the matrix derived from Q by restricting it to only the states in B.
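In matrix form, with τ the row vector of mean times spent in the transient states of B before absorption and π_B(0) the initial probability vector restricted to B, the usual expressions are:

\[
\tau\, Q_B = -\pi_B(0), \qquad
\mathrm{MTTA} = \sum_{i \in B} \tau_i = -\pi_B(0)\, Q_B^{-1}\, \mathbf{1} .
\]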

Markov Reliability Model with Repair (Continued)

Markov Reliability Model with Repair (Continued)
Mean time to failure is hours.

Markov Reliability Model without Repair
Assume that neither the workstations nor the file-server is repairable.

Markov Reliability Model without Repair (Continued)
States (0,1), (1,0) and (2,0) become absorbing states.
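A short Python sketch of this MTTA computation for the no-repair model, restricting Q to the transient states (2,1) and (1,1). The rates are again hypothetical placeholders, so the number printed is not expected to match the MTTF quoted on the next slide:

```python
import numpy as np

# Hypothetical failure rates (per hour); not the values used in the lecture.
lw, lf = 1.0 / 5000.0, 1.0 / 20000.0   # workstation / file-server failure rate

# Transient states B = {(2,1), (1,1)}; there are no repair transitions.
# From (2,1): rate 2*lw to (1,1), rate lf to absorption.
# From (1,1): rate lw + lf to absorption.
QB = np.array([[-(2 * lw + lf), 2 * lw],
               [0.0,            -(lw + lf)]])

piB0 = np.array([1.0, 0.0])           # initially everything is working
tau = np.linalg.solve(QB.T, -piB0)    # solves tau @ QB = -piB0
mttf = tau.sum()                      # MTTA = MTTF for this model
print(f"MTTF (no repair) = {mttf:.1f} hours")
```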

Markov Reliability Model without Repair (Continued)
Mean time to failure is 9333 hours.