Published by Hollie Hoover. Modified over 9 years ago.
2
Practical Reports on Dependability
3
Manifestation of System Failure
Site unavailability
System exception / access violation
Incorrect result
Data loss/corruption
Slowdown
4
PAGE UNAVAILABLE
6
System Exception
7
Performance Slowdown
8
DOWNTIME 15% contribution
9
DOWNTIME
Unplanned: 20%
Planned: 80%
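The 20/80 split can be turned into hours with simple arithmetic. The sketch below is illustrative only: the total of 10 downtime hours per year is a made-up figure, not a number from the slides, and the names `split_downtime` and `availability` are hypothetical helpers.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

# Percentages taken from the slide.
UNPLANNED_PCT = 20
PLANNED_PCT = 80


def split_downtime(total_hours: float) -> tuple[float, float]:
    """Return (unplanned_hours, planned_hours) for a given total downtime."""
    unplanned = total_hours * UNPLANNED_PCT / 100
    planned = total_hours * PLANNED_PCT / 100
    return unplanned, planned


def availability(downtime_hours: float) -> float:
    """Fraction of the year the system was up, given hours of downtime."""
    return 1 - downtime_hours / HOURS_PER_YEAR


# ASSUMPTION: 10 hours of total downtime per year (hypothetical input).
total = 10.0
unplanned_hours, planned_hours = split_downtime(total)

print(f"unplanned: {unplanned_hours} h, planned: {planned_hours} h")
print(f"availability counting all downtime:       {availability(total):.5f}")
print(f"availability counting unplanned downtime: {availability(unplanned_hours):.5f}")
```

Counting only unplanned downtime gives a noticeably higher availability figure than counting all downtime, so a report should state which definition it uses.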
10
DOWNTIME
11
UNPLANNED DOWNTIME
14
Software Errors Triggers
Resource exhaustion
Logical errors
System overload
Recovery code
Failed upgrade
15
Logical Error
16
SYSTEM OVERLOAD
17
Operator Errors Triggers
Configurational
– Incorrect parameter setting
Procedural
– Omitted/incorrect maintenance action
Miscellaneous
18
FAILURE
DURATION
Short (minutes)
Long (weeks)
– Implies large fault chains
FREQUENCY
Permanent (down until problem fixed)
Transient (resolves without intervention)
Intermittent (transient + occasional)
SCOPE
Entire system
Parts of the system
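The three classification axes above (duration, frequency, scope) could be recorded in a small data structure when writing up an incident. This is a minimal sketch, not something from the slides: the class names, the `FailureReport` fields, and the example incident are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Duration(Enum):
    SHORT = "minutes"
    LONG = "weeks"  # the slide notes this implies large fault chains


class Frequency(Enum):
    PERMANENT = "down until problem fixed"
    TRANSIENT = "resolves without intervention"
    INTERMITTENT = "transient + occasional"


class Scope(Enum):
    ENTIRE_SYSTEM = "entire system"
    PARTIAL = "parts of the system"


@dataclass(frozen=True)
class FailureReport:
    """One user-visible failure, classified along the three axes."""
    description: str
    duration: Duration
    frequency: Frequency
    scope: Scope

    def stays_down_until_fixed(self) -> bool:
        # Per the slide, only permanent failures stay down until the
        # underlying problem is fixed.
        return self.frequency is Frequency.PERMANENT


# Hypothetical incident, for illustration only.
report = FailureReport(
    description="login page returns errors for all users",
    duration=Duration.SHORT,
    frequency=Frequency.TRANSIENT,
    scope=Scope.ENTIRE_SYSTEM,
)
print(report.stays_down_until_fixed())
```

Using enums rather than free text keeps reports comparable, so incidents can later be counted per category.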
19
Fault Chains
"the series of component failures that led up to a user-visible failure"
Uncoupled
– Independent failures
Tightly Coupled
– Cascading/correlated failure
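One way to make the uncoupled versus tightly coupled distinction concrete is to record, for each component failure in a chain, which earlier failure (if any) triggered it. The sketch below is an assumption about how such a record might look; `ComponentFailure`, `caused_by`, `classify_chain`, and the component names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ComponentFailure:
    component: str
    # Name of the component whose failure triggered this one,
    # or None if this failure arose independently.
    caused_by: Optional[str] = None


def classify_chain(chain: Sequence[ComponentFailure]) -> str:
    """Label a fault chain as 'tightly coupled' or 'uncoupled'.

    A chain counts as tightly coupled if at least one failure was
    triggered by another failure in the same chain (a cascade);
    otherwise all failures are treated as independent.
    """
    failed = {f.component for f in chain}
    cascades = any(f.caused_by in failed for f in chain if f.caused_by is not None)
    return "tightly coupled" if cascades else "uncoupled"


# Hypothetical chains, for illustration only.
cascade = [
    ComponentFailure("disk array"),
    ComponentFailure("database", caused_by="disk array"),
    ComponentFailure("batch jobs", caused_by="database"),
]
independent = [
    ComponentFailure("load balancer"),
    ComponentFailure("mail gateway"),
]

print(classify_chain(cascade))
print(classify_chain(independent))
```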
20
Non-Malicious Software Failure
Most Common Causes
– Routine maintenance
– Software upgrade
– System integration
Other Causes
– System overload
– Resource exhaustion
– Complex fault-tolerant routines
21
"ROUTINE" MAINTENANCE
Danske Bank 2003
– March 11: routine operation to replace a defective electrical unit in IBM DB2 disk system
– System failure: disks become inaccessible
– 6 hours later: system restarted
– March 12: batch systems running incorrectly
– Three more errors discovered:
1. Recovery process on several tables won’t start
2. Recovery jobs won’t run simultaneously
3. Recovery jobs can’t reestablish data in tables
– March 14: all data recovered and system functional