Failure Mode Assumptions and Assumption Coverage David Powell.


1 Failure Mode Assumptions and Assumption Coverage David Powell

2 Fault-Tolerance Key questions –How may components fail? → Prevention strategies –At what rate may they fail? → The amount of redundancy needed –What are the important types of faults? → Types of redundancy needed –What is the relation between dependability, redundancy and faults? → General FT design guidelines

3 An F-T Paradox/Dilemma More faulty → More redundancy; More redundancy → More possibility of faults; More possibility of faults → ???

4 Solution- Some Key Steps Classify, quantify and verify the assumptions

5 Type of Failures

6 Overview Single-user service –Service Model –Potential Errors Multiple-user service –Service Model –Potential Errors

7 Single-user Service Model Service items: $s_i$, $i = 1, 2, \ldots$; Values of $s_i$: $vs_i$; Observation time of $s_i$: $ts_i$; Service model: $s_i = \langle vs_i, ts_i \rangle$; An omniscient observer

8 Correctness Model Service item $s_i$ is correct iff $(vs_i \in SV_i) \land (ts_i \in ST_i)$, where $SV_i$ and $ST_i$ are respectively the specified sets of values and times for service item $s_i$

9 Potential Errors Arbitrary value error: $s_i$: $vs_i \notin SV_i$; Noncode error: $s_i$: $vs_i \notin CV$ ($CV$ defines a code); Arbitrary timing error: $s_i$: $ts_i \notin ST_i$; Early timing error: $s_i$: $ts_i < \min(ST_i)$; Late timing error: $s_i$: $ts_i > \max(ST_i)$; Omission error: $s_i$: $ts_i = \infty$; Impromptu error: $s_i$: ($vs_i =$ …) … ($ts_i =$ …)
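The error classes on this slide can be turned into a small classifier. The following is only a sketch under stated assumptions: the specified time set ST_i is modelled as a closed interval, an omitted item is modelled as having an infinite observation time, and the names `ServiceItem`, `classify` and `OMITTED` are hypothetical. Impromptu errors are left out because the slide's formula for them is incomplete.

```python
import math
from dataclasses import dataclass

# Assumption: an omitted service item is modelled as never being observed.
OMITTED = math.inf


@dataclass(frozen=True)
class ServiceItem:
    value: object  # vs_i
    time: float    # ts_i, as seen by the omniscient observer


def classify(item, specified_values, specified_times, code_values):
    """Return the set of error labels that apply to one service item.

    specified_values: SV_i, a set of acceptable values.
    specified_times:  (earliest, latest) bounds of ST_i (interval is an assumption).
    code_values:      CV, the set of valid code words.
    """
    errors = set()
    earliest, latest = specified_times

    if item.value not in specified_values:
        errors.add("arbitrary value")
    if item.value not in code_values:
        errors.add("noncode value")

    if not (earliest <= item.time <= latest):
        errors.add("arbitrary timing")
        if item.time < earliest:
            errors.add("early timing")
        else:
            errors.add("late timing")
            if item.time == OMITTED:
                # An omission is treated here as an infinitely late item.
                errors.add("omission")
    return errors
```

An empty result means the item is correct in both the value and time domains; an item may carry several labels at once, for example a wrong value delivered late.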

10 Multi-user Service Model Service item $s_i = \{s_i(1), s_i(2), \ldots, s_i(n)\}$; Service model: $s_i(u) = \langle vs_i(u), ts_i(u) \rangle$, for all $i, u$; New issue: “consistency”

11 Correctness Model $vs_i(u)$: the value of service item $i$ on process $u$; $vs_i$: the value of service item $i$; $SV_i$: the specified set of values for service item $i$; $ts_i(u)$: the observation time of service item $i$ on process $u$; $ST_i(u)$: the specified range of observation times of service item $i$ on process $u$; uv: the time bound of related occurrences

12 Examples of Potential Errors Consistent value error Consistent timing error Semi-consistent value error

13 Failure Mode Assumptions Attempt to formalize the concept of an assumed failure mode By assertions on the sequences of service items delivered by a component

14 Examples of Value Error Assertions No value errors occur ($V_{none}$): $\forall i,\ vs_i \in SV_i$; The only value errors that occur are noncode value errors ($V_n$): $\forall i,\ (vs_i \notin SV_i) \Rightarrow (vs_i \notin CV)$; Arbitrary value errors can occur ($V_{arb}$): $\forall i,\ (vs_i \in SV_i) \lor (vs_i \notin SV_i)$
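The three value error assertions can be read as predicates over a sequence of delivered values. The sketch below is illustrative only: the function names are hypothetical, and each service item is assumed to come with its own specified value set.

```python
def holds_v_none(values, specified_sets):
    """No value errors occur: every delivered value is in its specified set."""
    return all(v in sv for v, sv in zip(values, specified_sets))


def holds_v_noncode(values, specified_sets, code):
    """The only value errors are noncode errors: a wrong value is never a code word."""
    return all(v in sv or v not in code for v, sv in zip(values, specified_sets))


def holds_v_arb(values, specified_sets):
    """Arbitrary value errors can occur: nothing is ruled out, so this always holds."""
    return True


# Hypothetical trace: the third item should be 3 but 99 was delivered.
trace = [1, 2, 99]
spec = [{1}, {2}, {3}]
code = {1, 2, 3}

print(holds_v_none(trace, spec))           # False: 99 is a value error
print(holds_v_noncode(trace, spec, code))  # True: 99 is not a code word
print(holds_v_arb(trace, spec))            # True
```

Any trace that satisfies the first predicate also satisfies the second, and every trace satisfies the third, which is the sense in which the assertions are ordered by increasing severity.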

15 Examples of Timing Error Assertions No timing error occurs (T none ) The only timing errors are omission errors (T O ) The only timing errors are late timing errors (T L ) The only timing errors are early timing errors (T E ) Arbitrary timing error can occur (T arb ) Permanent omission/crash (T p ) Bounded omission degree (T Bk )

16 Timing Error Implications

17 Failure Mode Assertions (FMA) A complete FMA entails an assertion on errors occurring in both the value and time domains By taking the Cartesian product of the two domains, we get a family of FMAs
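The family of FMAs can be enumerated mechanically. This is a minimal sketch that uses the assertion names from the value and timing slides as plain strings; the variable names are hypothetical, and the resulting list is only as complete as the examples given on those slides.

```python
from itertools import product

# Assertion names as they appear on the value and timing error slides.
VALUE_ASSERTIONS = ["V_none", "V_n", "V_arb"]
TIMING_ASSERTIONS = ["T_none", "T_O", "T_L", "T_E", "T_arb", "T_p", "T_Bk"]

# Each complete FMA pairs one value assertion with one timing assertion.
FMAS = list(product(VALUE_ASSERTIONS, TIMING_ASSERTIONS))

for value_assertion, timing_assertion in FMAS:
    print(f"{value_assertion} & {timing_assertion}")
```

With three value assertions and seven timing assertions this yields 21 candidate FMAs, ranging from the pair that allows no errors at all to the pair that allows arbitrary errors in both domains.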

18 FMA Implication Graph

19 So what? The FMA classification and implication graph can serve as a guideline for designing families of FT algorithms that can process errors of increasing severity!

20 Assumption Coverage Establishing a link between the assumed component failure mode and system dependability (The design of an FT system relies on the assumptions it makes) (The dependability of an FT system is related to the failure modes it assumes)

21 Motivation Components may fail They may fail in a bad way → violation of the system's assumptions The system, in turn, can fail Question: to what degree does a component FMA prove to be true in the real system?

22 The Coverage of the Assumption Definition: $P(X) = \Pr\{X = \text{true} \mid \text{component failed}\}$; $P(V_{arb} \land T_{arb}) = 1$; $P(V_{none} \land T_{none}) = 0$

23 Coverage of an FT system $P_S(X) = \Pr\{\text{correct error processing} \mid X = \text{true}\} \cdot \Pr\{X = \text{true} \mid \text{component failed}\}$
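The product on this slide is easy to evaluate once both conditional probabilities are known. The sketch below uses a hypothetical function name and made-up input numbers purely to show the arithmetic.

```python
def system_coverage(p_processing_given_assumption, p_assumption_given_failure):
    """Multiply the two conditional probabilities from the slide.

    p_processing_given_assumption: Pr{correct error processing | X = true}
    p_assumption_given_failure:    Pr{X = true | component failed}
    """
    for p in (p_processing_given_assumption, p_assumption_given_failure):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_processing_given_assumption * p_assumption_given_failure


# Illustrative numbers only, not taken from the presentation.
weak_assumption = system_coverage(0.999, 1.0)    # arbitrary failures: assumption always holds
strong_assumption = system_coverage(1.0, 0.99)   # stricter FMA: easier to process, may be violated
print(weak_assumption, strong_assumption)
```

The two calls illustrate the trade-off: a weaker assumption is always true after a failure but harder to handle correctly, while a stronger assumption simplifies error processing at the cost of an assumption coverage below 1.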

24 Influence of Assumption Coverage on System Dependability A Case Study

25 The System A system of n processors Connected via unidirectional message-passing bus Each processor carries out the same computation steps The result of each processing step is communicated to all other processors Each processor has a decision function (DF) The DF is applied to the results received from other processors … Each processor and its associated bus is viewed as a single component

26 Fail-Silent Processor-bus A fail-silent processor –Only has semi-consistent value errors –Always produces message on time –Or ceases to produce messages forever –If a message is delivered to a processor, it is to be delivered to all processors with consistent fixed delay

27 Fail-Consistent Processor Bus Only semi-consistent value errors may occur Faulty processors may send erroneous values Consistent timing error may occur

28 Fail-uncontrolled Processor Bus Arbitrary timing error Arbitrary value error

29 Implications of Assumption Coverage Failure mode relations Coverage relations

30 Dependability Expressions From Markov Models $r = e^{-\lambda t}$, $\lambda$ = failure rate

31 A Life-critical Application System reliability objective: $R > 1 - 10^{-9}$ over 10 hours Single processor reliability: –$r = e^{-\lambda t}$ –$1/\lambda = 5$ years
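The numbers on this slide can be checked with a few lines of arithmetic. The sketch below assumes a 365-day year when converting the 5-year mean time to failure into hours; the function and constant names are hypothetical.

```python
import math

HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year


def processor_unreliability(mttf_years, mission_hours):
    """Return 1 - r for a single processor, with r = exp(-lambda * t)."""
    failure_rate = 1.0 / (mttf_years * HOURS_PER_YEAR)  # lambda, per hour
    # expm1 keeps precision when lambda * t is very small.
    return -math.expm1(-failure_rate * mission_hours)


single = processor_unreliability(mttf_years=5, mission_hours=10)
objective = 1e-9  # system unreliability allowed over the 10-hour mission

print(f"single-processor unreliability: {single:.3e}")
print("meets objective alone:", single <= objective)
```

Under these assumptions a single processor's unreliability over 10 hours is on the order of 2e-4, several orders of magnitude above the 1e-9 objective, so one processor alone cannot meet it.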

32

33 A Money-Critical Application It is about availability of the system rather than reliability of the system Please look at the paper for more details

34 Unavailability vs. Coverage

35 Conclusion A formalism for describing component failure modes Multiplicity of value and timing errors The notion of assumption coverage The relation between dependability, availability and assumption coverage

36 Thank you

