Background on Reliability and Availability Slides prepared by Wayne D. Grover and Matthieu Clouqueur TRLabs & University of Alberta © Wayne D. Grover 2002,

Background on Reliability and Availability
Slides prepared by Wayne D. Grover and Matthieu Clouqueur
TRLabs & University of Alberta
© Wayne D. Grover 2002, 2003
E E Module 2 (Version for book website)

E E 681 Lecture #2 © Wayne D. Grover 2002,
Overview of the lecture
Concept of Reliability
– Reliability function, failure density function, hazard rate
Concept of Availability
– Availability function, unavailability, availability of elements in series/parallel
Methodology for Availability Analysis
– Quick unavailability lower-bound estimation
– Cut sets method
– Tie paths method
Automatic Protection Switching (APS) Systems
– Principle
– Availability analysis of an APS system

Technical meaning of Reliability
Reliability is a mission-oriented question.
In everyday English:
– "My car is very reliable" ⇒ it works well; it starts every time (even at -30°).
Technical meaning:
– Reliability is the probability of a device performing its purpose adequately for the period of time intended under the operating conditions intended.
– Example: the reliability of a fuel pump during a rocket launch.

Reliability
The reliability function R(t):
– $R(t) = \Pr\{\text{no failure in interval } [0,t]\}$, with $R(0) = 1$ and $R(\infty) = 0$.
– R(t) is always a non-increasing function of t.
– $Q(t) = \Pr\{\text{at least one failure in interval } [0,t]\}$
– $Q(t) = 1 - R(t)$
[Figure: R(t) plotted against t, falling from 1 at t = 0]
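As an illustration of R(t) and Q(t), a minimal sketch can estimate both from observed times to failure of a batch of identical, non-repairable units. The failure times and the function name `empirical_reliability` are hypothetical; R(t) is simply the fraction of units with no failure in [0, t].

```python
def empirical_reliability(failure_times, t):
    """Estimate R(t) as the fraction of units with no failure in [0, t]."""
    survivors = sum(1 for ttf in failure_times if ttf > t)
    return survivors / len(failure_times)

# Hypothetical times to failure (years) of 8 identical units
ttf = [0.7, 1.9, 2.4, 3.1, 4.0, 5.5, 6.2, 9.8]

for t in [0, 1, 3, 5, 10]:
    r = empirical_reliability(ttf, t)
    q = 1 - r  # Q(t) = 1 - R(t)
    print(f"t = {t:>2}: R = {r:.3f}, Q = {q:.3f}")
```

The estimate starts at 1, never increases with t, and reaches 0 once every unit has failed, as the slide requires of R(t).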

Reliability
$R(t) = \Pr\{\text{no failure in } [0,t]\}$
Related function: the failure density function, f(t):
$f(t) = \frac{dQ(t)}{dt} = -\frac{dR(t)}{dt}$
f(t) can be seen as the pdf of the time to next failure.

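Reliability
Hazard rate λ(t) (age-specific failure rate): the rate of failure of an element given that this element has survived this long.
$\lambda(t) = \frac{f(t)}{R(t)}$, where f(t) gives the rate of failures and dividing by R(t) conditions on the element having survived to time t.
Since $f(t) = -dR(t)/dt$, integration with $R(0) = 1$ gives $R(t) = \exp\left(-\int_0^t \lambda(\tau)\, d\tau\right)$.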
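The hazard-rate relations can be checked numerically. The sketch below assumes a Weibull-shaped hazard $\lambda(t) = (\beta/\eta)(t/\eta)^{\beta-1}$, an illustrative choice not taken from the slides, with hypothetical parameters β = 2 and η = 10 years. It integrates λ(t) with the trapezoidal rule, compares $\exp(-\int_0^t \lambda)$ with the closed form $R(t) = e^{-(t/\eta)^\beta}$, and forms $f(t) = \lambda(t) R(t)$.

```python
import math

BETA, ETA = 2.0, 10.0  # hypothetical Weibull shape and scale (years)

def hazard(t):
    """Age-specific failure rate lambda(t) of a Weibull element."""
    return (BETA / ETA) * (t / ETA) ** (BETA - 1)

def reliability_from_hazard(t, steps=10_000):
    """R(t) = exp(-integral_0^t lambda(tau) dtau), trapezoidal rule."""
    h = t / steps
    integral = sum(0.5 * h * (hazard(i * h) + hazard((i + 1) * h))
                   for i in range(steps))
    return math.exp(-integral)

def reliability_closed_form(t):
    return math.exp(-((t / ETA) ** BETA))

def failure_density(t):
    """f(t) = lambda(t) * R(t)."""
    return hazard(t) * reliability_closed_form(t)

for t in [1.0, 5.0, 10.0]:
    print(t, reliability_from_hazard(t), reliability_closed_form(t))
```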

Reliability
Expected Time To Failure, or Mean Time To Failure (MTTF):
– It is the expected value of the random variable with pdf f(t):
$\mathrm{MTTF} = \int_0^\infty t\, f(t)\, dt = \int_0^\infty R(t)\, dt$
[Figure: time axis from 0 to the 1st failure, marking the time to failure (TTF)]
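A sketch of the MTTF integral, assuming (for illustration only) a Weibull element with hypothetical parameters β = 2 and η = 10 years: it approximates $\int_0^\infty R(t)\,dt$ numerically, truncating where R(t) is negligible, and compares the result with the known Weibull mean $\eta\,\Gamma(1 + 1/\beta)$.

```python
import math

BETA, ETA = 2.0, 10.0  # hypothetical Weibull parameters (years)

def reliability(t):
    return math.exp(-((t / ETA) ** BETA))

def mttf_numeric(t_max=100.0, steps=100_000):
    """MTTF = integral_0^inf R(t) dt, truncated at t_max (R is ~0 beyond)."""
    h = t_max / steps
    return sum(0.5 * h * (reliability(i * h) + reliability((i + 1) * h))
               for i in range(steps))

mttf_exact = ETA * math.gamma(1 + 1 / BETA)  # Weibull mean
print(mttf_numeric(), mttf_exact)  # both about 8.862 years
```

Integrating R(t) rather than t·f(t) avoids computing the density at all, which is convenient when only survival data are available.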

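Reliability
Special case: constant hazard rate, λ(t) = λ (memoryless)
– In this case we can apply the Poisson distribution:
$\Pr\{k \text{ failures in } [0,t]\} = \frac{(\lambda t)^k e^{-\lambda t}}{k!}$, so that $R(t) = \Pr\{0 \text{ failures in } [0,t]\} = e^{-\lambda t}$.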
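For a constant hazard rate, the general reliability relations reduce to simple closed forms. The short derivation below uses only the hazard-rate integral, the definition of f(t) and the MTTF integral:

```latex
\begin{aligned}
R(t) &= \exp\!\left(-\int_0^t \lambda \, d\tau\right) = e^{-\lambda t}, \\
f(t) &= -\frac{dR(t)}{dt} = \lambda e^{-\lambda t}, \qquad
\lambda(t) = \frac{f(t)}{R(t)} = \lambda, \\
\mathrm{MTTF} &= \int_0^\infty e^{-\lambda t} \, dt = \frac{1}{\lambda}.
\end{aligned}
```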

Reliability
Numerical example:
– Poisson distribution with λ = 1/(5 years)
Probability of exactly 1 failure in the first year: P = 16.4%
Probability of at least one failure in the first year: P = 18.1%
Probability of exactly 1 failure in the first 5 years: P = 36.8%
Probability of at least one failure in the first 5 years: P = 63.2%
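The four percentages can be reproduced directly from the Poisson probabilities. This sketch uses a hypothetical helper `poisson_pmf`; "at least one failure" is computed as 1 − P{0 failures} = 1 − R(t).

```python
import math

def poisson_pmf(k, rate, t):
    """P{exactly k failures in [0, t]} for a constant hazard rate."""
    m = rate * t
    return m ** k * math.exp(-m) / math.factorial(k)

rate = 1 / 5  # lambda = 1 failure per 5 years

for years in (1, 5):
    exactly_one = poisson_pmf(1, rate, years)
    at_least_one = 1 - poisson_pmf(0, rate, years)  # = 1 - R(t)
    print(f"{years} yr: exactly 1 = {exactly_one:.1%}, at least 1 = {at_least_one:.1%}")
```

Running it prints 16.4% and 18.1% for one year, and 36.8% and 63.2% for five years, matching the slide.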

Availability (Repairable systems)
"What is the probability that the engine of a Formula 1 car will work during the whole race?"
– This is a reliability question.
"How often do I hear the dial tone when I pick up the phone?"
– This is an availability question.
Availability is the probability of finding the system in the operating state at any arbitrary time in the future.
Unlike in the context of reliability, we now consider systems that can be repaired.

Availability
Comparison of Availability and Reliability Functions:
[Figure: A(t) and R(t) plotted against t]
– Region 1: R(t) and A(t) are the same.
– Region 2: Repair actions begin to hold up A(t).
– Region 3: A(t) reaches a steady state.
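To make the three regions concrete, the sketch below assumes the simplest repairable model, which is not given on the slide: a single element with constant failure rate λ and constant repair rate μ. For this two-state Markov model the standard result is $A(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu} e^{-(\lambda+\mu)t}$, while $R(t) = e^{-\lambda t}$. The rates are hypothetical and deliberately exaggerated so that all three regions are visible.

```python
import math

LAM, MU = 1.0, 4.0  # hypothetical failure and repair rates (per unit time)

def reliability(t):
    return math.exp(-LAM * t)

def availability(t):
    """Two-state Markov model, element working at t = 0."""
    a_ss = MU / (LAM + MU)  # steady-state availability
    return a_ss + (LAM / (LAM + MU)) * math.exp(-(LAM + MU) * t)

for t in [0.0, 0.05, 0.5, 2.0, 5.0]:
    print(f"t = {t:4}: R = {reliability(t):.3f}, A = {availability(t):.3f}")
```

At small t the two curves nearly coincide (region 1), then A(t) stays above R(t) as repairs take effect (region 2), and finally A(t) settles at μ/(λ+μ) while R(t) keeps falling to 0 (region 3).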

Availability
[Figure: timeline of alternating failures and repairs, marking the Time to Failure, Time to Repair and Time Between Failures]
MTTF: Mean Time To Failure
MTBF: Mean Time Between Failures
MTTR: Mean Time To Repair
On this timeline, MTBF = MTTF + MTTR.

Availability
Unavailability:
$U = 1 - A = \frac{\mathrm{MTTR}}{\mathrm{MTTF} + \mathrm{MTTR}} \approx \frac{\mathrm{MTTR}}{\mathrm{MTTF}}$ (since MTTR ≪ MTTF)
In availability analysis we usually work with unavailability quantities, because of the simplifications that can be made for the unavailability of elements in series and in parallel.
FIT: unit corresponding to 1 failure in $10^9$ hours
– 1 FIT = 1 failure in 114,155 years
– 1 failure/year = 114,155 FITs (high!)
– Typical value for telecom equipment: 1500 FITs (MTTF = 76 years)
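The FIT arithmetic can be sketched as follows, assuming a 365-day year (8760 hours), which is what the slide's 114,155-year figure implies. The 12-hour MTTR is a hypothetical value used only to show the unavailability formula.

```python
HOURS_PER_YEAR = 8760  # assumed 365-day year, consistent with 114,155 years

def fits_to_mttf_years(fits):
    """1 FIT = 1 failure per 1e9 hours, so MTTF = 1e9 / FITs hours."""
    return 1e9 / fits / HOURS_PER_YEAR

def unavailability(mttf_hours, mttr_hours):
    """U = MTTR / (MTTF + MTTR), approximately MTTR / MTTF."""
    return mttr_hours / (mttf_hours + mttr_hours)

print(round(fits_to_mttf_years(1)))     # 114155 years
print(round(fits_to_mttf_years(1500)))  # 76 years

mttf_hours = 1e9 / 1500  # typical telecom equipment, from the slide
mttr_hours = 12.0        # hypothetical repair time
print(unavailability(mttf_hours, mttr_hours))
```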

Availability
Series elements unavailability reduction:
$U_s = 1 - \prod_{i=1}^{n} (1 - U_i) \approx \sum_{i=1}^{n} U_i$ (approximation based on the fact that $U_i \ll 1$)
Parallel elements unavailability reduction:
$U_s = \prod_{i=1}^{n} U_i$
Numerical example: $U_i = 10^{-3}$, $n = 3$ ⇒ series: $U_s \approx 3 \times 10^{-3}$; parallel: $U_s = 10^{-9}$
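The two reductions, and the quality of the series approximation, can be checked with the slide's numbers. This sketch assumes independent element failures; the function names are hypothetical.

```python
import math

def series_unavailability(us, exact=True):
    """Elements in series: the system is down if any element is down."""
    if exact:
        return 1 - math.prod(1 - u for u in us)
    return sum(us)  # first-order approximation, valid when every U_i << 1

def parallel_unavailability(us):
    """Elements in parallel: the system is down only if all elements are down."""
    return math.prod(us)

us = [1e-3] * 3
print(series_unavailability(us))               # exact, about 2.997e-3
print(series_unavailability(us, exact=False))  # approximation, 3e-3
print(parallel_unavailability(us))             # about 1e-9
```

The approximation overstates the exact series value by only about 0.1% here, which is why unavailabilities of series elements are routinely just added.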

Availability Analysis
The reliability engineer can use different techniques to evaluate the availability of a system:
1) Quick estimate of a lower bound for the unavailability
2) Series and parallel unavailability reductions
3) Cut set method
4) Tie paths method
5) Conditional decomposition
The general methodology is explained next…

Availability Analysis
General Methodology:
1) Get the unavailability values of all components and sub-systems.
2) Draw the parallel and series availability relationships.
3) Reduce the system availability model by repeated application of the parallel/series availability simplifications.
4) If the model is not completely reduced, do a quick unavailability lower-bound estimation, or use the tie paths method, the cut sets method or conditional decomposition.
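Step 3 (repeated series/parallel reduction) maps naturally onto recursion. In this sketch a model is a nested structure: a number is an element's unavailability, and a tuple `("series", [...])` or `("parallel", [...])` groups sub-blocks. The representation, the function name and the example values are all hypothetical, and independent failures are assumed.

```python
import math

def reduce_unavailability(block):
    """Reduce a nested series/parallel model to one unavailability value.

    A block is either a number (an element's unavailability) or a tuple
    (kind, [sub-blocks]) with kind "series" or "parallel".
    """
    if isinstance(block, (int, float)):
        return float(block)
    kind, parts = block
    us = [reduce_unavailability(p) for p in parts]
    if kind == "series":
        return 1 - math.prod(1 - u for u in us)
    if kind == "parallel":
        return math.prod(us)
    raise ValueError(f"unknown block kind: {kind!r}")

# Hypothetical system: one element, then a redundant pair, then another element
system = ("series", [1e-4, ("parallel", [1e-3, 1e-3]), 2e-4])
print(reduce_unavailability(system))  # about 3.0e-4
```

Models that are not purely series/parallel (a bridge, for example) cannot be written in this form, which is where step 4 takes over.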

Availability Analysis
Lower bound on unavailability
[Figure: example system made of elements A to H]
Lower bound of $U_s$: $U_A + U_H$
– The contributions of parallel elements to the unavailability are not taken into account.
– In some cases this quick evaluation of a lower bound on U can be enough to conclude that the system does not meet the availability requirements.
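A sketch of the quick test: sum the unavailabilities of the elements that act in series (A and H in the slide's bound) and compare the sum with the requirement. The unavailability values and the requirement are hypothetical.

```python
def unavailability_lower_bound(series_only_us):
    """Quick lower bound: sum the U of elements in series with everything else.

    Parallel parts are ignored (treated as perfect), so, to first order in
    the U values, the true system unavailability cannot be below this sum.
    """
    return sum(series_only_us)

# Hypothetical values for the series elements A and H, and a requirement
U_A, U_H = 2e-4, 3e-4
U_REQUIRED = 4e-4

bound = unavailability_lower_bound([U_A, U_H])
if bound > U_REQUIRED:
    print(f"Lower bound {bound:.1e} already exceeds {U_REQUIRED:.1e}: requirement not met")
```

If the bound alone already exceeds the target, no further analysis of the parallel parts can rescue the design.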

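Availability Analysis
Tie paths method:
[Figure: network of elements between input I and output O]
– We enumerate all the paths from I to O.
– The availability of each path is calculated as the product of the availabilities of its elements: $A_{path} = \prod_{i \in path} A_i$
– The availability of the system is the probability that at least one path is working: $A_{sys} = \Pr\{\bigcup_{paths} \text{path up}\}$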
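As a worked instance of the tie paths method, the sketch uses a hypothetical five-element bridge network (not the slide's figure): elements 1 and 2 leave I, elements 4 and 5 reach O, and element 3 bridges the middle, giving tie paths {1,4}, {2,5}, {1,3,5} and {2,3,4}. The probability that at least one path is up is evaluated exactly by inclusion-exclusion over the paths, assuming independent element failures, and checked against brute-force enumeration of all element states. The availabilities are hypothetical.

```python
import itertools
import math

# Hypothetical bridge network: element availabilities
A = {1: 0.99, 2: 0.98, 3: 0.95, 4: 0.99, 5: 0.97}
PATHS = [{1, 4}, {2, 5}, {1, 3, 5}, {2, 3, 4}]  # tie paths from I to O

def path_availability(path):
    return math.prod(A[e] for e in path)

def tie_path_availability(paths):
    """Exact P{at least one path up} by inclusion-exclusion over the paths."""
    total = 0.0
    for k in range(1, len(paths) + 1):
        for combo in itertools.combinations(paths, k):
            elements = set().union(*combo)  # all elements these paths need
            total += (-1) ** (k + 1) * path_availability(elements)
    return total

def brute_force_availability(paths):
    """Enumerate every up/down state of the elements (check only)."""
    elems = sorted(A)
    total = 0.0
    for states in itertools.product([True, False], repeat=len(elems)):
        up = {e for e, s in zip(elems, states) if s}
        p = math.prod(A[e] if s else 1 - A[e] for e, s in zip(elems, states))
        if any(path <= up for path in paths):
            total += p
    return total

print(tie_path_availability(PATHS), brute_force_availability(PATHS))
```

Inclusion-exclusion grows as 2 to the power of the number of paths, so for larger networks the union is usually bounded or approximated rather than expanded in full.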

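Availability Analysis
Cut sets method:
[Figure: network of elements between input and output]
– Which combinations of element failures can bring the system down?
– The probability of each cut is calculated as the product of the unavailabilities of its elements: $P_{cut} = \prod_{i \in cut} U_i$
– The availability of the system is: $A_{sys} \approx 1 - \sum_{cuts} P_{cut}$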
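A sketch of the cut sets method on a hypothetical five-element bridge network (elements 1 and 2 leave the input, 4 and 5 reach the output, 3 bridges the middle). Its minimal cut sets are {1,2}, {4,5}, {1,3,5} and {2,3,4}. Each cut's probability is the product of its element unavailabilities (independence assumed), and summing them is the usual rare-event approximation. All values are hypothetical.

```python
import math

# Hypothetical bridge network: element unavailabilities U_i
U = {1: 0.01, 2: 0.02, 3: 0.05, 4: 0.01, 5: 0.03}
CUTS = [{1, 2}, {4, 5}, {1, 3, 5}, {2, 3, 4}]  # minimal cut sets

def cut_probability(cut):
    """All elements of the cut are down at the same time."""
    return math.prod(U[e] for e in cut)

u_sys = sum(cut_probability(c) for c in CUTS)  # rare-event approximation
a_sys = 1 - u_sys
print(f"U_sys ~ {u_sys:.3e}, A_sys ~ {a_sys:.6f}")
```

The two-element cuts dominate (2e-4 and 3e-4 against about 1e-5 for the three-element cuts), which is typical: low-order cuts drive the result when every U is small.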

Availability Analysis
Conditional decomposition (high-unavailability elements):
– When some elements have a high U, it becomes less acceptable to sum unavailabilities.
– Solution: conditional decomposition on such an element d, of low availability $A_d$:
[Figure: syst 1, the system with d working; syst 2, the system with d failed]
– The availability of the system is: $A_{sys} = A_d \cdot A_{syst\,1} + (1 - A_d) \cdot A_{syst\,2}$
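A sketch of conditional decomposition on a hypothetical five-element bridge network (elements 1 and 2 leave the input, 4 and 5 reach the output, 3 bridges the middle). Decomposing on the bridge element 3 is an illustrative choice: with 3 working, the network becomes (1 ∥ 2) in series with (4 ∥ 5); with 3 failed, it becomes the series pair 1-4 in parallel with the series pair 2-5. Both conditional systems are then plain series/parallel structures. Availabilities are hypothetical and failures are assumed independent.

```python
# Hypothetical bridge network: element availabilities (element 3 is the bridge)
A1, A2, A3, A4, A5 = 0.99, 0.98, 0.95, 0.99, 0.97

def series(*avails):
    """All blocks must be up."""
    result = 1.0
    for a in avails:
        result *= a
    return result

def parallel(*avails):
    """At least one block up: 1 - product of block unavailabilities."""
    u = 1.0
    for a in avails:
        u *= 1 - a
    return 1 - u

# syst 1: bridge element 3 working -> (1 || 2) in series with (4 || 5)
a_syst1 = series(parallel(A1, A2), parallel(A4, A5))
# syst 2: bridge element 3 failed -> (1 - 4) in parallel with (2 - 5)
a_syst2 = parallel(series(A1, A4), series(A2, A5))

a_sys = A3 * a_syst1 + (1 - A3) * a_syst2
print(a_sys)
```

Unlike the cut set sum, this decomposition is exact, so it stays accurate even when the conditioning element has a fairly high unavailability.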

Automatic Protection Switching (APS) Systems
Basic idea:
– To provide a standby transmission channel that is kept in fully operating condition and used to replace any of the other traffic-bearing channels in the event of their failure.
Characteristics of an APS system:
– Spare-to-working ratio: ‘1-to-1’ or ‘1-to-N’
– Co-routed / diversely routed: ‘1-to-1’ or ‘1-to-1/DP’; ‘1-to-N’ or ‘1-to-N/DP’
– 1+1 or 1:1:
‘1+1’: the signal is always sent on the spare channel as well
‘1:1’: the signal is sent on the spare channel only upon failure of the working channel

Automatic Protection Switching (APS) Systems
1:N APS system:
For the Head End Bridge (HEB) and Tail End Transfer (TET):
– Mode 1 failure: the working signal is not relayed.
– Mode 2 failure: no bridging or transfer to/from the spare channel.

Automatic Protection Switching (APS) Systems
Cut sets approach to 1:N APS availability analysis:
– Combinations creating an outage for a specific channel (cut sets):
Cut set 1: failure of that channel with prior failure of at least one other working channel
Cut set 2: failure of that working channel plus the spare channel, or the head end bridge or tail end transfer in mode 2
Cut set 3: failure of the head end bridge or tail end transfer in mode 1
– The probability of each cut set is:
Cut set 1: $U_w \times (N-1)\,U_w \times 0.5$ (the factor 0.5 accounts for the other channel having failed first)
Cut set 2: $U_w \times (U_s + U_{b2} + U_{t2})$
Cut set 3: $U_{b1} + U_{t1}$
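The three cut-set probabilities can be evaluated side by side to see which one dominates. This sketch implements exactly the slide's expressions; the function name, N = 7 and all unavailability values are hypothetical.

```python
def aps_channel_unavailability(n, u_w, u_s, u_b1, u_b2, u_t1, u_t2):
    """The three cut-set probabilities for one channel of a 1:N APS system."""
    cut1 = u_w * (n - 1) * u_w * 0.5  # another working channel failed first
    cut2 = u_w * (u_s + u_b2 + u_t2)  # working channel + spare/HEB/TET mode 2
    cut3 = u_b1 + u_t1                # HEB or TET mode 1 (series terms)
    return cut1, cut2, cut3

# Hypothetical unavailability values
cuts = aps_channel_unavailability(n=7, u_w=1e-4, u_s=1e-4,
                                  u_b1=1e-6, u_b2=1e-5, u_t1=1e-6, u_t2=1e-5)
print(cuts, sum(cuts))
```

Even with mode 1 failures made 100 times less likely than a working-channel failure, cut set 3 (about 2e-6) outweighs cut sets 1 and 2 (about 3e-8 and 1.2e-8), because it is first order in U while the others are second order.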

Automatic Protection Switching (APS) Systems
1:N APS Unavailability
– The unavailability of a channel is:
$U_{channel} \approx 0.5\,(N-1)\,U_w^2 + U_w\,(U_s + U_{b2} + U_{t2}) + U_{b1} + U_{t1}$
– The term in O(U), $U_{b1} + U_{t1}$, reflects the irreducible series-availability elements: the HEB and the TET in their mode 1 failure.
It is impossible to make a perfectly redundant system. There is always some parallelism-accessing device c that brings a series unavailability contribution:
[Figure: elements A and B in parallel, reached through an access device c at each end]
$U_S = U_A U_B + 2U_c \approx 2U_c$ (the parallel term is $O(U^2)$, while each access device adds an $O(U_c)$ series term)
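A sketch of the access-device effect: elements A and B in parallel, reached through a device c at each end, so that $U_S = U_A U_B + 2U_c$. The values are hypothetical and chosen so that c is 100 times more available than each redundant branch; the series term still dominates.

```python
def redundant_pair_unavailability(u_a, u_b, u_c):
    """A || B reached through an access device c at each end (c in series)."""
    parallel_term = u_a * u_b  # O(U^2): both redundant branches down
    series_term = 2 * u_c      # O(U_c): either access device down
    return parallel_term, series_term

# Hypothetical values: access device c far more available than A or B
par, ser = redundant_pair_unavailability(u_a=1e-3, u_b=1e-3, u_c=1e-5)
print(par, ser, par + ser)
```

Here the parallel term is 1e-6 while the access devices contribute 2e-5, so improving A and B further would barely change $U_S$; only a better access device would.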

Summary
– Reliability is a mission-oriented question for non-repairable systems.
– In telecom engineering we are interested in the availability of the systems we design.
– There are several techniques that can be used for availability analysis. The one we will use in the rest of the course is the algebraic approach (equivalent to cut sets).
– APS is a protection scheme that enhances availability by providing a spare channel for restoration of failed working channels.