Slide 1: Critical Systems Development
IS301 – Software Engineering, Lecture #19 – 2003-10-28
M. E. Kabay, PhD, CISSP
Dept of Computer Information Systems, Norwich University
mkabay@norwich.edu
Slide 2
Copyright © 2003 M. E. Kabay. All rights reserved.
First, take a deep breath. You are about to enter the fire-hose zone.
Slide 3: Acknowledgement
- Most of the material in this presentation is based directly on slides kindly provided by Prof. Ian Sommerville on his Web site at http://www.software-engin.com
- Used with Sommerville's permission, as extended by him, for all non-commercial educational use
- Copyright in Kabay's name applies solely to the appearance of and minor changes to Sommerville's work, or to original materials, and is used solely to prevent commercial exploitation of this material
Slide 4: Dependable Software Development
- Programming techniques for building dependable software systems
Slide 5: Software Dependability
- In general, software customers expect all software to be dependable
- For non-critical applications, they may be willing to accept some system failures
- Some applications have very high dependability requirements, so special programming techniques are required
Slide 6: Dependability Achievement
- Fault avoidance: the software is developed so that human error is avoided and system faults are minimised, and the development process is organised so that faults in the software are detected and repaired before delivery to the customer
- Fault tolerance: the software is designed so that faults in the delivered software do not result in system failure
Slide 7: Fault Minimisation
- Current methods of software engineering now allow the production of fault-free software
- "Fault-free" means that the software conforms to its specification
- It does NOT mean software that will always perform correctly
- Why not? Because of specification errors.
Slide 8: Cost of Producing Fault-Free Software (1)
- Very high; cost-effective only in exceptional situations (which?)
- It may be cheaper to accept software faults
- But who will bear the costs? Users? Manufacturers? Both?
- Will the risk-sharing be done with full knowledge?
Slide 9: Cost of Producing Fault-Free Software (2)
- The Pareto Principle: roughly 20% of the cost fixes the first 80% of the errors
- [Chart: cost vs. total % of errors fixed, rising steeply as the percentage approaches 100%]
- If the curve really is asymptotic to 100%, the cost of removing the last errors may grow without bound
Slide 10: Cost of Producing Fault-Free Software (3)
- [Chart: cost per error detected rises sharply as the number of residual errors falls from many, to few, to very few]
Slide 11: Fault-Free Software Development
- Needs a precise (preferably formal) specification
- Requires organizational commitment to quality
- Information hiding and encapsulation in software design are essential
- Use a programming language with strict typing and run-time checking
- Avoid error-prone constructs
- Use a dependable and repeatable development process
Slide 12: Structured Programming
- First discussed in the 1970s
- Programming without goto: while loops and if statements as the only control statements; top-down design
- Important because it promoted thought and discussion about programming
- Programs are easier to read and understand than old "spaghetti code"
Slide 13: Error-Prone Constructs (1)
- Floating-point numbers: inherently imprecise and machine-dependent; the imprecision may lead to invalid comparisons
- Pointers: pointers referring to the wrong memory area can corrupt data; aliasing can make programs difficult to understand and change
- Dynamic memory allocation: run-time allocation can cause memory overflow
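The floating-point pitfall on this slide can be made concrete with a short sketch (the class and method names here are illustrative, not from the lecture): comparing doubles with `==` silently fails for values that are "obviously" equal in decimal.

```java
// Sketch: why direct floating-point comparison is error-prone.
public class FloatCompare {

    // Error-prone: exact equality fails because 0.1 and 0.2 have no
    // exact binary representation, so 0.1 + 0.2 != 0.3 in a double.
    public static boolean naiveEquals(double a, double b) {
        return a == b;
    }

    // Safer: compare within a tolerance appropriate to the application.
    public static boolean almostEquals(double a, double b, double eps) {
        return Math.abs(a - b) <= eps;
    }

    public static void main(String[] args) {
        System.out.println(naiveEquals(0.1 + 0.2, 0.3));        // false
        System.out.println(almostEquals(0.1 + 0.2, 0.3, 1e-9)); // true
    }
}
```

The tolerance `eps` must itself be chosen with care; a fixed epsilon is only a sketch, and critical systems typically derive it from the problem's numerical analysis.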
Slide 14: Error-Prone Constructs (2)
- Parallelism: can result in subtle timing errors (race conditions) because of unforeseen interactions between parallel processes
- Recursion: errors in recursion can cause memory overflow
- Interrupts: interrupts can cause a critical operation to be terminated and make a program difficult to understand; similar to goto statements
Slide 15: Error-Prone Constructs (3)
- Inheritance: code is not localised; can result in unexpected behaviour when changes are made; can be hard to understand; problems are difficult to debug
- None of these constructs has to be absolutely eliminated, but all must be used with great care
Slide 16: Information Hiding
- Information should be exposed only to those parts of the program which need to access it
- Create objects or abstract data types which maintain state and the operations on that state
- Reduces faults: less accidental corruption of information; 'firewalls' make problems less likely to spread to other parts of the program
- All information is localised: the programmer is less likely to make errors, and reviewers are more likely to find them
Slide 17: Example: Queue Specification in Java (Fig. 18.2, p. 398)

    interface Queue {
        public void put (Object o);
        public void remove (Object o);
        public int size ();
    } // Queue

- Users can put, remove, and query size
- But the implementation of the queue is concealed as private
Slide 18: Example: Signal Declaration in Java (Fig. 18.3, p. 398)
- Define constants as globals once
- Refer to signal.red, signal.green, etc.
- Avoids the risk of accidentally using the wrong value in a parameter
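Fig. 18.3 itself is not reproduced in this transcript, so here is a minimal sketch of the idea it illustrates (the constant names and values are assumptions, not Sommerville's actual figure): declare each signal value once, as a named constant, so callers can never pass a wrong raw literal.

```java
// Sketch: named constants instead of "magic numbers" for signal states.
// The specific names and values are illustrative.
public class Signal {
    public static final int RED   = 1;
    public static final int AMBER = 2;
    public static final int GREEN = 3;

    // Callers write Signal.RED rather than a raw 1, so a misspelled
    // name is a compile-time error, whereas a wrong literal would
    // pass silently.
    public static boolean mayProceed(int state) {
        return state == GREEN;
    }
}
```

In modern Java an `enum Signal { RED, AMBER, GREEN }` is stronger still, since the compiler then also rejects out-of-range integers.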
Slide 19: Reliable Software Processes
- A well-defined, repeatable software process reduces software faults
- It does not depend entirely on individual skills and can be enacted by different people
- Process activities should include significant verification and validation
Slide 20: Process Validation Activities
- Requirements inspections
- Requirements management
- Model checking
- Design and code inspection
- Static analysis
- Test planning and management
- Configuration management is also essential
Slide 21: Fault Tolerance
- Critical software systems must be fault tolerant: the system can continue operating in spite of software failure
- Fault tolerance is required where availability requirements are high or the costs of system failure are very high
- Even "fault-free" systems need fault tolerance: there may be specification errors, or the validation may be incorrect
Slide 22: Fault Tolerance Actions
- Fault detection: detect that an incorrect system state has occurred
- Damage assessment: identify the parts of the system state affected by the fault
- Fault recovery: return to a known safe state
- Fault repair: prevent recurrence of the fault by identifying the underlying problem; if the fault is not transient (e.g., a hardware failure), fix the errors of design, implementation, documentation, or training that led to it
Slide 23: Approaches to Fault Tolerance
- Defensive programming: programmers assume there are faults in the code, and check the state after modifications to ensure consistency
- Fault-tolerant architectures: hardware and software system architectures support redundancy and fault tolerance; a controller detects problems and supports fault recovery
- These are complementary rather than opposing techniques
Slide 24: Fault Detection (1)
- Strictly-typed languages (e.g., Java and Ada) trap many errors at compile-time
- Some classes of error can be discovered only at run-time
- Fault detection means detecting an erroneous system state and throwing an exception to manage the detected fault
Slide 25: Fault Detection (2)
- Preventative fault detection: check conditions before making changes; if a bad state is detected, do not make the change
- Retrospective fault detection: check validity after the system state has been changed
- Retrospective detection is used when an incorrect sequence of correct actions leads to an erroneous state, or when preventative fault detection involves too much overhead
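The preventative/retrospective distinction can be shown with one small sketch (the class and its ordering invariant are hypothetical, chosen only for illustration): the same insertion guarded before the change, and verified after it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a list that must stay in ascending order, guarded two ways.
public class OrderedList {
    private final List<Integer> items = new ArrayList<>();

    // Preventative fault detection: check the condition BEFORE
    // changing state; reject the change if it would break the invariant.
    public void insertChecked(int value) {
        if (!items.isEmpty() && value < items.get(items.size() - 1))
            throw new IllegalArgumentException("would break ordering");
        items.add(value);
    }

    // Retrospective fault detection: change state first, then verify
    // the whole invariant still holds, throwing if it does not.
    public void insertThenVerify(int value) {
        items.add(value);
        for (int i = 1; i < items.size(); i++)
            if (items.get(i - 1) > items.get(i))
                throw new IllegalStateException("ordering invariant violated");
    }

    public int size() { return items.size(); }
}
```

The retrospective check here scans the whole list, which is exactly the kind of overhead the slide mentions; preventative checking is cheaper per operation but only catches faults it anticipates.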
Slide 26: Damage Assessment
- Analyse the system state to judge the extent of corruption caused by the system failure
- Assess which parts of the state space have been affected by the failure
- Generally based on 'validity functions' which can be applied to state elements to assess whether their values are within the allowed range
Slide 27: Damage Assessment Techniques
- Checksums: used for damage assessment in data transmission; verify integrity after transmission
- Redundant pointers: check the integrity of data structures, e.g., databases
- Watch-dog timers: check for non-terminating processes; if there is no response after a certain time, there is a problem
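The checksum technique on this slide can be sketched with Java's standard CRC-32 implementation (`java.util.zip.CRC32`); the message and the simulated one-bit corruption are of course just illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch: damage assessment via a checksum computed before
// transmission and re-checked after.
public class ChecksumCheck {
    public static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static boolean intact(byte[] received, long expected) {
        return checksum(received) == expected;
    }

    public static void main(String[] args) {
        byte[] sent = "critical message".getBytes(StandardCharsets.UTF_8);
        long crc = checksum(sent);        // computed by the sender

        byte[] corrupted = sent.clone();
        corrupted[0] ^= 0x01;             // simulate a one-bit transmission error

        System.out.println(intact(sent, crc));      // true
        System.out.println(intact(corrupted, crc)); // false
    }
}
```

A CRC detects damage but does not repair it; repair belongs to the forward-recovery techniques discussed later in the deck.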
Slide 28: Fault Recovery
- Forward recovery: apply repairs to the corrupted system state; domain knowledge is required to compute possible state corrections; usually application-specific
- Backward recovery: restore the system state to a known safe state; simpler than forward recovery; details of the safe state are maintained and replace the corrupted system state
Slide 29: Forward Recovery
- Data communications: add redundancy to coded data and use it to repair data corrupted during transmission
- Redundant pointers (e.g., doubly-linked lists): a damaged list or file may be repaired if enough links are still valid; often used for database and filesystem repair
Slide 30: Backward Recovery
- Transaction processing often uses conservative methods to avoid problems: complete the computations, then apply the changes, keeping the original data in buffers
- Periodic checkpoints allow the system to 'roll back' to a correct state
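Checkpoint-based backward recovery can be sketched as follows (the `Account` class and its operations are hypothetical, chosen only to make the roll-back visible): save a known-safe copy of the state, mutate freely, and restore the copy if a later check fails.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: backward recovery by checkpoint and rollback.
public class Account {
    private long balance;
    private final Deque<Long> checkpoints = new ArrayDeque<>();

    public Account(long opening) { this.balance = opening; }

    // Save the current, known-safe state.
    public void checkpoint() { checkpoints.push(balance); }

    // An ordinary state change, which may later turn out to be faulty.
    public void apply(long delta) { balance += delta; }

    // Restore the most recent checkpoint, discarding corrupted state.
    public void rollback() {
        if (!checkpoints.isEmpty()) balance = checkpoints.pop();
    }

    public long balance() { return balance; }
}
```

Real transaction systems checkpoint to durable storage and log every change between checkpoints; this in-memory stack only shows the control flow.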
Slide 31: Key Points
- Fault-tolerant software can continue executing in the presence of software faults
- Fault tolerance requires failure detection, damage assessment, recovery, and repair
- The defensive programming approach to fault tolerance relies on the inclusion of redundant checks in a program
- Exception handling facilities simplify the process of defensive programming
Slide 32: Fault Tolerant Architecture
- Defensive programming cannot cope with faults that involve interactions between hardware and software
- Misunderstandings of the requirements may mean that both the checks and the associated code are incorrect
- Where systems have high availability requirements, a specific architecture designed to support fault tolerance may be required
- It must tolerate both hardware and software failure
Slide 33: Hardware Fault Tolerance
- Depends on triple-modular redundancy (TMR): three replicated identical components receive the same input, and their outputs are compared
- If one output differs, it is discarded and a component failure is assumed
- Assumes that most faults result from component failures, that there are few design faults, and that simultaneous component failures have low probability
Slide 34: Hardware Reliability With TMR
- [Diagram: three replicated components feeding an output comparator]
Slide 35: Fault Tolerant Software Architectures
- The assumptions of TMR for hardware do not hold for software: software design flaws are more common than hardware design flaws
- The same component cannot simply be replicated in software, since the copies would share common design faults
- Simultaneous component failure would therefore be virtually inevitable
- Software versions must therefore be diverse
Slide 36: Design Diversity
- Different versions of the system are designed and implemented in different ways, so they ought to have different failure modes
- Different approaches to design (e.g., object-oriented and function-oriented)
- Implementation in different programming languages
- Use of different tools and development environments
- Use of different algorithms in the implementation
Slide 37: Software Analogies to TMR (1)
- N-version programming: the same specification is implemented in a number of different versions by different teams
- All versions compute simultaneously, and the majority output is selected by a voting system
- The most commonly used approach, e.g., in the Airbus 320 control systems
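The majority-voting step of N-version programming (and of TMR generally) reduces to a small 2-out-of-3 vote; this sketch is illustrative only — real voters compare outputs within tolerances rather than for exact equality.

```java
// Sketch: 2-out-of-3 majority voting over the outputs of three
// independently implemented versions.
public class Voter {
    public static int vote(int a, int b, int c) {
        if (a == b || a == c) return a;   // a agrees with at least one other
        if (b == c) return b;             // a is the odd one out; discard it
        // No majority: all three versions disagree, so the voter itself
        // must signal failure rather than guess.
        throw new IllegalStateException("no majority: all versions disagree");
    }
}
```

Note the voter is itself a single point of failure, which is one reason the deck later questions whether redundant software actually improves reliability.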
Slide 38: N-Version Programming (1)
- [Diagram: N versions executing in parallel, their outputs feeding a voter/output comparator]
Slide 39: N-Version Programming (2)
- The different system versions are designed and implemented by different teams, on the assumption that there is a low probability of them making the same mistakes
- The algorithms used should be different
- Problem: the versions may not be different enough; there is some empirical evidence that teams commonly misinterpret specifications in the same way and choose the same algorithms for their systems
Slide 40: Software Analogies to TMR (2)
- Recovery blocks: explicitly different versions of the same specification are written and executed in sequence
- An acceptance test is used to select the output to be transmitted
Slide 41: Recovery Blocks (1)
- [Diagram: primary version runs first; its result passes through an acceptance test; on failure, alternate versions are tried in sequence]
Slide 42: Recovery Blocks (2)
- Force a different algorithm to be used for each version, reducing the probability of common errors
- However, the design of the acceptance test is difficult, as it must be independent of the computation used
- The approach is problematic for real-time systems because the redundant versions execute sequentially
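The recovery-block control flow described above can be sketched generically (the harness below is hypothetical; Sommerville presents the pattern, not this code): run the primary version, apply the acceptance test, and fall back to an alternate only if the test fails.

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntPredicate;

// Sketch of the recovery-block pattern: primary algorithm, acceptance
// test, alternate algorithm tried in sequence on failure.
public class RecoveryBlock {
    public static int run(IntBinaryOperator primary,
                          IntBinaryOperator alternate,
                          IntPredicate acceptanceTest,
                          int x, int y) {
        int result = primary.applyAsInt(x, y);
        if (acceptanceTest.test(result)) return result;

        // Primary failed the acceptance test: try the alternate version.
        result = alternate.applyAsInt(x, y);
        if (acceptanceTest.test(result)) return result;

        throw new IllegalStateException("all versions failed the acceptance test");
    }
}
```

The sequential execution visible here is exactly the real-time problem the slide notes: worst-case latency is the sum of all versions plus all acceptance tests.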
Slide 43: Problems with Design Diversity
- Teams are not culturally diverse, so they tend to tackle problems in the same way
- Characteristic errors: different teams make the same mistakes; some parts of the implementation are more difficult than others, so all teams tend to make mistakes in the same place
- Specification errors: an error in the specification is reflected in all the implementations; this can be addressed to some extent by using multiple specification representations
Slide 44: Specification Dependency
- Both approaches to software redundancy are susceptible to specification errors: if the specification is incorrect, the system could fail
- This is also a problem with hardware, but software specifications are usually more complex than hardware specifications and harder to validate
- It has been addressed in some cases by developing separate software specifications from the same user specification
Slide 45: Is Software Redundancy Needed?
- Software faults are not inevitable, unlike hardware faults, which are an inevitable consequence of the physical world (WHY?)
- Reducing software complexity may improve reliability and availability (WHY?)
- Redundant software is much more complex, leaving scope for a range of additional errors that affect system reliability — ironically, errors caused by the existence of the fault-tolerance controllers themselves
Slide 46: Key Points
- Dependability in a system can be achieved through fault avoidance and fault tolerance
- Some programming language constructs, such as gotos, recursion, and pointers, are inherently error-prone
- Data typing allows many potential faults to be trapped at compile time
Slide 47: Key Points
- Fault-tolerant architectures rely on replicated hardware and software components, and include mechanisms to detect a faulty component and switch it out of the system
- N-version programming and recovery blocks are two different approaches to designing fault-tolerant software architectures
- Design diversity is essential for software redundancy
Slide 48: Homework
- Study Chapter 18 in detail using SQ3R
- For next Tuesday, 4 Nov 2003: complete exercises 18.1, 18.3-18.5, and 18.7-18.9 for 21 points
- For next class (Thursday, 30 Oct 2003): apply S-Q to Chapter 19 on Verification and Validation. This is increasingly hard stuff, so STUDY before coming to class.
- OPTIONAL: For 3 extra points per question, answer any of questions 18.2, 18.6, 18.10, and 18.11 by 11 Nov
Slide 49: DISCUSSION