1
Critical Systems Development
IS301 – Software Engineering
Lecture #27
M. E. Kabay, PhD, CISSP
Assoc. Prof. Information Assurance
Division of Business & Management, Norwich University
Copyright © 2004 M. E. Kabay. All rights reserved.
2
First, take a deep breath. You are about to enter the fire-hose zone.
3
Objectives
  To explain how fault tolerance and fault avoidance contribute to the development of dependable systems
  To describe characteristics of dependable software processes
  To introduce programming techniques for fault avoidance
  To describe fault tolerance mechanisms and their use of diversity and redundancy
4
Topics covered
  Dependable processes
  Dependable programming
  Fault tolerance
  Fault tolerant architectures
5
Dependable Software Development
Programming techniques for building dependable software systems.
6
Software Dependability
In general, software customers expect all software to be dependable
For non-critical applications, they may be willing to accept some system failures
Some applications have very high dependability requirements
  Special programming techniques required
7
Dependability Achievement
Fault avoidance
  Software developed so that human error is avoided and system faults are minimized
  Development process organized so that faults in software are detected and repaired before delivery to customer
Fault tolerance
  Software designed so that faults in delivered software do not result in system failure
8
Diversity and Redundancy
Redundancy
  Keep more than one version of a critical component available so that if one fails, a backup is available.
Diversity
  Provide the same functionality in different ways so that they will not fail in the same way.
However, adding diversity and redundancy adds complexity, and this can increase the chances of error.
Some engineers advocate simplicity and extensive verification & validation (V&V) as a more effective route to software dependability.
9
Diversity and Redundancy Examples
Redundancy. Where availability is critical (e.g., in e-commerce systems), companies normally keep backup servers and switch to these automatically if failure occurs.
Diversity. To provide resilience against external attacks, different servers may be implemented using different operating systems (e.g., Windows and Linux).
10
Fault Minimization
Current methods of software engineering now allow for production of fault-free software
Fault-free software means it conforms to its specification
  Does NOT mean software which will always perform correctly
  Why not? Because of specification errors.
11
Cost of Producing Fault-Free Software (1)
Very high
Cost-effective only in exceptional situations
  Which?
May be cheaper to accept software faults
  But who will bear costs? Users? Manufacturers? Both?
  Will the risk-sharing be with full knowledge?
12
Cost of Producing Fault-Free Software (2)
The Pareto Principle
[Figure: curve of costs against total % of errors fixed; labels 20%, 80%, 100%]
If curve really is asymptotic to 100%, cost may approach infinity
13
Cost of Producing Fault-Free Software (3)
Just a different way of looking at it.
[Figure: cost per error detected plotted against number of residual errors (many, few, very few)]
14
Validation activities
Requirements inspections
Requirements management
Model checking
Design and code inspection
Static analysis
Test planning and management
Configuration management, discussed in Chapter 29, is also essential.
15
Safe Programming
Faults in programs are usually a consequence of programmers making mistakes.
These mistakes occur because people lose track of the relationships among program variables.
Some programming constructs are more error-prone than others, so avoiding their use reduces programmer mistakes.
16
Fault-Free Software Development
Needs precise (preferably formal) specification
Requires organizational commitment to quality
Information hiding and encapsulation in software design essential
Use programming language with strict typing and run-time checking
Avoid error-prone constructs
Use dependable and repeatable development process
17
Structured Programming
First discussed in the 1970s
Programming without goto
While loops and if statements as only control statements
Top-down design
Important because it promoted thought and discussion about programming
Programs easier to read and understand than old spaghetti code
18
Error-Prone Constructs (1)
Floating-point numbers
  Inherently imprecise and machine-dependent
  Imprecision may lead to invalid comparisons
Pointers
  Pointers referring to wrong memory areas can corrupt data
  Aliasing can make programs difficult to understand and change
Dynamic memory allocation
  Run-time allocation can cause memory overflow
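The warning about invalid floating-point comparisons can be illustrated with a minimal Python sketch. The helper name `nearly_equal` and the tolerance of 1e-9 are assumptions chosen for illustration; they are not from the lecture, and a real system would pick a tolerance suited to its data.

```python
import math


def nearly_equal(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Compare two floats within a relative tolerance instead of using ==."""
    return math.isclose(a, b, rel_tol=rel_tol)


total = 0.1 + 0.2

# Exact comparison fails because 0.1 and 0.2 have no exact binary representation.
exact_match = (total == 0.3)

# Tolerant comparison treats the tiny representation error as insignificant.
tolerant_match = nearly_equal(total, 0.3)
```

Here `exact_match` is false while `tolerant_match` is true, which is the kind of invalid comparison the slide refers to.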
19
Error-Prone Constructs (2)
Parallelism
  Can result in subtle timing errors (race conditions) because of unforeseen interaction between parallel processes
Recursion
  Errors in recursion can cause memory overflow
Interrupts
  Interrupts can cause critical operation to be terminated and make program difficult to understand
  Similar to goto statements
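One common way to use parallelism "with great care" is to guard shared state with a lock so that updates cannot interleave. The sketch below is illustrative only: the `Counter` class, `run_workers` function, and the thread and iteration counts are hypothetical names and values, not part of the lecture.

```python
import threading


class Counter:
    """Shared counter whose update is protected by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The read-modify-write below is not atomic; the lock ensures that
        # only one thread performs it at a time.
        with self._lock:
            self.value += 1


def run_workers(counter: Counter, n_threads: int = 4, n_increments: int = 10_000) -> int:
    """Start several threads that all increment the same counter."""

    def work() -> None:
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place the final count always equals `n_threads * n_increments`; without it, some increments could be lost, depending on how the threads happen to be scheduled.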
20
Error-Prone Constructs (3)
Inheritance
  Code not localized
  Can result in unexpected behavior when changes made
  Can be hard to understand
  Difficult to debug problems
None of these constructs has to be eliminated absolutely
  But all must be used with great care
21
Reliable Software Processes
Well-defined, repeatable software process:
  Reduces software faults
  Does not depend entirely on individual skills; can be enacted by different people
Process activities should include significant verification and validation
22
Process Validation Activities
Requirements inspections
Requirements management
Model checking
Design and code inspection
Static analysis
Test planning and management
Configuration management also essential
23
Fault Tolerance
Critical software systems must be fault tolerant
  System can continue operating in spite of software failure
Fault tolerance required where there are
  High availability requirements or
  Very high system failure costs
Even “fault-free” systems need fault tolerance
  May be specification errors or
  Validation may be incorrect
24
Fault Tolerance Actions
Fault detection
  Incorrect system state has occurred
Damage assessment
  Identify parts of system state affected by fault
Fault recovery
  Return to known safe state
Fault repair
  Prevent recurrence of fault
  Identify underlying problem
  If not transient*, then fix errors of design, implementation, documentation or training that led to error
* E.g., hardware failure
25
Approaches to Fault Tolerance
Defensive programming
  Programmers assume faults in code
  Check state after modifications to ensure consistency
Fault-tolerant architectures
  HW & SW system architectures support redundancy and fault tolerance
  Controller detects problems and supports fault recovery
Complementary rather than opposing techniques
26
Fault Detection (1)
Strictly-typed languages
  E.g., Java and Ada
  Many errors trapped at compile-time
Some classes of error can only be discovered at run-time
Fault detection:
  Detecting erroneous system state
  Throwing exception to manage detected fault
27
Fault Detection (2)
Preventative fault detection
  Check conditions before making changes
  If bad state detected, don’t make change
Retrospective fault detection
  Check validity after system state has been changed
  Used when
    Incorrect sequence of correct actions leads to erroneous state or
    Preventative fault detection involves too much overhead
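The two detection styles can be contrasted in a short Python sketch. The `Account` class, its method names, and the "balance must never be negative" invariant are assumptions made up for this example; the lecture does not prescribe them.

```python
class Account:
    """Toy account used to contrast the two fault-detection styles."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw_preventative(self, amount: int) -> None:
        # Preventative: check conditions BEFORE changing the state.
        if amount <= 0 or amount > self.balance:
            raise ValueError("withdrawal rejected before any change was made")
        self.balance -= amount

    def withdraw_retrospective(self, amount: int) -> None:
        # Retrospective: make the change, THEN check that the state is valid.
        previous = self.balance
        self.balance -= amount
        if self.balance < 0 or self.balance > previous:
            self.balance = previous  # undo the change
            raise ValueError("invalid state detected after change; change undone")
```

In both cases an exception is thrown to manage the detected fault; the difference is only whether the state was ever modified before the check ran.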
28
Damage Assessment
Analyze system state
Judge extent of corruption caused by system failure
Assess what parts of state space have been affected by failure
Generally based on ‘validity functions’
  Can be applied to state elements
  Assess whether their values are within allowed range
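A validity function of the kind described above might look like the following Python sketch. The state element names and their allowed ranges are invented for illustration; a real system would take them from its specification.

```python
from typing import Dict, List, Tuple

# Hypothetical state elements and their allowed ranges (assumed values).
ALLOWED_RANGES: Dict[str, Tuple[float, float]] = {
    "temperature_c": (-40.0, 125.0),
    "pressure_kpa": (0.0, 500.0),
    "valve_percent": (0.0, 100.0),
}


def assess_damage(state: Dict[str, float]) -> List[str]:
    """Return the names of state elements that are missing or out of range."""
    damaged = []
    for name, (low, high) in ALLOWED_RANGES.items():
        value = state.get(name)
        if value is None or not (low <= value <= high):
            damaged.append(name)
    return damaged
```

For example, `assess_damage({"temperature_c": 20.0, "pressure_kpa": 900.0})` reports `pressure_kpa` (out of range) and `valve_percent` (missing) as the affected parts of the state.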
29
Damage Assessment Techniques
Checksums
  Used for damage assessment in data transmission
  Verify integrity after transmission
Redundant pointers
  Check integrity of data structures
  E.g., databases
Watch-dog timers
  Check for non-terminating processes
  If no response after certain time, there’s a problem
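The checksum technique can be sketched in Python with the standard-library CRC-32 function. The framing used here (payload followed by a 4-byte big-endian checksum) and the function names are assumptions for the example, not a protocol described in the lecture.

```python
import zlib


def attach_checksum(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload before 'transmission'."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify(frame: bytes) -> bool:
    """Recompute the CRC-32 after 'transmission' and compare with the stored one."""
    if len(frame) < 4:
        return False
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received
```

A frame that arrives unchanged passes `verify`, while a frame with a flipped bit fails it. Note that a checksum only detects damage; it does not say how to repair it.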
30
Fault Recovery
Forward recovery
  Apply repairs to corrupted system state
  Domain knowledge required to compute possible state corrections
  Forward recovery usually application specific
Backward recovery
  Restore system state to known safe state
  Simpler than forward recovery
  Details of safe state are maintained and replace corrupted system state
31
Forward Recovery
Data communications
  Add redundancy to coded data
  Use to repair data corrupted during transmission
Redundant pointers
  E.g., doubly-linked lists
  Damaged list / file may be repaired if enough links are still valid
  Often used for database and file system repair
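Adding redundancy to coded data so that corruption can be repaired can be shown with a deliberately simple scheme: a triple-repetition code with majority voting. This particular code is an assumption chosen because it fits in a few lines; the lecture does not name a specific coding scheme, and practical systems use far more efficient error-correcting codes.

```python
from typing import List


def encode(bits: List[int]) -> List[int]:
    """Add redundancy by sending every bit three times."""
    return [b for b in bits for _ in range(3)]


def decode(coded: List[int]) -> List[int]:
    """Repair the data by majority vote within each group of three bits."""
    if len(coded) % 3 != 0:
        raise ValueError("coded data length must be a multiple of 3")
    return [
        1 if sum(coded[i:i + 3]) >= 2 else 0
        for i in range(0, len(coded), 3)
    ]
```

If a single bit in any group of three is flipped in transit, `decode` still recovers the original data, which is forward recovery: the corrupted state is corrected rather than discarded.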
32
Backward Recovery
Transaction processing often uses conservative methods to avoid problems
  Complete computations, then apply changes
  Keep original data in buffers
Periodic checkpoints allow system to 'roll-back' to correct state
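The ideas of completing computations before applying changes and of rolling back to a checkpoint can be sketched as follows. The `CheckpointedStore` class, its method names, and the account-transfer scenario are hypothetical and exist only to illustrate the technique.

```python
import copy
from typing import Dict


class CheckpointedStore:
    """Key-value state with a saved checkpoint that can be restored."""

    def __init__(self, data: Dict[str, int]) -> None:
        self.data = dict(data)
        self._checkpoint = copy.deepcopy(self.data)

    def checkpoint(self) -> None:
        # Record the current state as the known safe state.
        self._checkpoint = copy.deepcopy(self.data)

    def rollback(self) -> None:
        # Backward recovery: replace the current state with the safe state.
        self.data = copy.deepcopy(self._checkpoint)

    def transfer(self, src: str, dst: str, amount: int) -> None:
        # Complete the computation on a working copy, then apply the changes.
        working = dict(self.data)
        working[src] -= amount
        working[dst] += amount
        if working[src] < 0:
            raise ValueError("insufficient funds; original data untouched")
        self.data = working
```

A failed `transfer` never touches the live data, and if the live data is later found to be corrupted, `rollback` restores the last checkpoint.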
33
Recovery Blocks (1)
34
Recovery Blocks (2)
Force different algorithm to be used for each version so as to reduce probability of common errors
However, design of acceptance test difficult as it must be independent of computation used
Problems with approach for real-time systems because of sequential operation of redundant versions
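A recovery block can be sketched in Python as a loop that tries alternate algorithms in sequence until one passes an acceptance test. Everything named here (`recovery_block`, `faulty_sort`, `insertion_sort`, `is_sorted_permutation`) is hypothetical and invented for the example; sorting is used only because its result is easy to check independently of how it was computed.

```python
from collections import Counter
from typing import Callable, List, Sequence


def recovery_block(
    alternates: Sequence[Callable[[List[int]], List[int]]],
    acceptance_test: Callable[[List[int], List[int]], bool],
    data: List[int],
) -> List[int]:
    """Run each alternate in turn; return the first result that is accepted."""
    for algorithm in alternates:
        try:
            result = algorithm(list(data))  # each version gets its own copy
        except Exception:
            continue  # a crashing version counts as a failed version
        if acceptance_test(data, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def faulty_sort(items: List[int]) -> List[int]:
    """Deliberately wrong primary version: it loses the largest element."""
    return sorted(items)[:-1]


def insertion_sort(items: List[int]) -> List[int]:
    """Alternate version using a different algorithm."""
    result: List[int] = []
    for item in items:
        pos = len(result)
        while pos > 0 and result[pos - 1] > item:
            pos -= 1
        result.insert(pos, item)
    return result


def is_sorted_permutation(original: List[int], result: List[int]) -> bool:
    """Acceptance test that does not itself sort: checks order and contents."""
    in_order = all(result[i] <= result[i + 1] for i in range(len(result) - 1))
    return in_order and Counter(original) == Counter(result)
```

Calling `recovery_block([faulty_sort, insertion_sort], is_sorted_permutation, [3, 1, 2])` rejects the faulty primary and returns the alternate's result. Because the versions run one after another, the worst-case time is the sum of all versions, which is the real-time concern the slide raises.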
35
Homework
Study Chapter 18 in detail using SQ3R
Required
  By next Wed 10 Nov 2004
  For 30 points
  20.1, 20.2, 20.4 – 20.6, 20.9 and pay attention to demands for examples
OPTIONAL
  By Wed 17 Nov 2004
  For up to 14 extra points, any or all of – details please
  20.12 – detailed answers to all parts of this question
36
DISCUSSION