Practical Reports on Dependability Manifestation of System Failure Site unavailability System exception /access violation Incorrect result Data loss/corruption.


1

2 Practical Reports on Dependability

3 Manifestation of System Failure
Site unavailability
System exception/access violation
Incorrect result
Data loss/corruption
Slowdown

4 PAGE UNAVAILABLE

5

6 System Exception

7 Performance Slowdown

8 DOWNTIME 15% contribution

9 DOWNTIME Unplanned 20%, planned 80%

10 DOWNTIME

11 UNPLANNED DOWNTIME

12

13

14 Software Errors Triggers
Resource exhaustion
Logical errors
System overload
Recovery code
Failed upgrade

15 Logical Error

16 SYSTEM OVERLOAD

17 Operator Errors Triggers
Configurational
–Incorrect parameter setting
Procedural
–Omitted/incorrect maintenance action
Miscellaneous

18 FAILURE
DURATION
Short (minutes)
Long (weeks)
–Implies large fault chains
FREQUENCY
Permanent (down until problem fixed)
Transient (resolves without intervention)
Intermittent (transient + occasional)
SCOPE
Entire system
Parts of the system
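The three dimensions on this slide (duration, frequency, scope) can be read as independent fields of a failure record. The following is a minimal sketch under that assumption; the class names, the `needs_intervention` helper, and the example failure are hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass
from enum import Enum


class Duration(Enum):
    SHORT = "minutes"
    LONG = "weeks"


class Frequency(Enum):
    PERMANENT = "down until problem fixed"
    TRANSIENT = "resolves without intervention"
    INTERMITTENT = "transient + occasional"


class Scope(Enum):
    ENTIRE_SYSTEM = "entire system"
    PARTIAL = "parts of the system"


@dataclass(frozen=True)
class FailureReport:
    """One observed failure, classified along the slide's three dimensions."""
    description: str
    duration: Duration
    frequency: Frequency
    scope: Scope

    def needs_intervention(self) -> bool:
        # Per the slide, only a permanent failure stays down until the
        # problem is fixed; transient ones resolve on their own.
        return self.frequency is Frequency.PERMANENT


# Hypothetical example, invented purely for illustration.
example = FailureReport(
    description="front-end request timeouts",
    duration=Duration.SHORT,
    frequency=Frequency.TRANSIENT,
    scope=Scope.PARTIAL,
)
```

Keeping the dimensions as separate enums reflects that the slide lists them side by side, so any combination (for example a short, permanent, partial failure) can be expressed.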

19 Fault Chains
“The series of component failures that led up to a user-visible failure”
Uncoupled
–Independent failures
Tightly coupled
–Cascading/correlated failures
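One possible way to make the uncoupled versus tightly coupled distinction concrete is to record, for each component failure in a chain, which earlier failure (if any) triggered it. This is only a sketch: the `ComponentFailure` type, the `caused_by` field, and the extra "mixed" label for chains that are partly cascading are assumptions introduced here, not part of the presentation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComponentFailure:
    component: str
    # Name of the component whose failure triggered this one (None if independent).
    caused_by: Optional[str] = None


def classify_chain(chain: list[ComponentFailure]) -> str:
    """Label a fault chain as 'uncoupled', 'tightly coupled', or 'mixed'."""
    linked = [f for f in chain if f.caused_by is not None]
    if not linked:
        # No failure was triggered by another: independent failures.
        return "uncoupled"
    if len(linked) == len(chain) - 1:
        # Every failure except the root was triggered by another: a cascade.
        return "tightly coupled"
    # Assumption: some links cascade and some do not.
    return "mixed"


# Hypothetical chain: a power unit fault cascades to disks, then to a database.
cascade = [
    ComponentFailure("power unit"),
    ComponentFailure("disk array", caused_by="power unit"),
    ComponentFailure("database", caused_by="disk array"),
]
label = classify_chain(cascade)
```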

20 Non-Malicious Software Failure
Most common causes
–Routine maintenance
–Software upgrade
–System integration
Other causes
–System overload
–Resource exhaustion
–Complex fault-tolerant routines

21 “ROUTINE” MAINTENANCE
Danske Bank 2003
–March 11: routine operation to replace a defective electrical unit in IBM DB2 disk system
–System failure: disks become inaccessible
–6 hours later: system restarted
–March 12: batch systems running incorrectly
–Three more errors discovered:
1. Recovery process on several tables won’t start
2. Recovery jobs won’t run simultaneously
3. Recovery jobs can’t reestablish data in tables
–March 14: all data recovered and system functional

