Slide 1: Critical Systems Development
IS301 – Software Engineering, Lecture #19 – 2003-10-28
M. E. Kabay, PhD, CISSP
Dept of Computer Information Systems, Norwich University
mkabay@norwich.edu
Slide 2
Copyright © 2003 M. E. Kabay. All rights reserved.
First, take a deep breath. You are about to enter the fire-hose zone.
Slide 3: Acknowledgement
- Most of the material in this presentation is based directly on slides kindly provided by Prof. Ian Sommerville on his Web site at http://www.software-engin.com
- Used with Sommerville's permission, as extended by him, for all non-commercial educational use
- Copyright in Kabay's name applies solely to the appearance of and minor changes to Sommerville's work, or to original materials, and is used solely to prevent commercial exploitation of this material
Slide 4: Dependable Software Development
- Programming techniques for building dependable software systems
Slide 5: Software Dependability
- In general, software customers expect all software to be dependable
- For non-critical applications, they may be willing to accept some system failures
- Some applications have very high dependability requirements, so special programming techniques are required
Slide 6: Dependability Achievement
- Fault avoidance: the software is developed so that human error is avoided and system faults are minimised, and the development process is organised so that faults in the software are detected and repaired before delivery to the customer
- Fault tolerance: the software is designed so that faults in the delivered software do not result in system failure
Slide 7: Fault Minimisation
- Current methods of software engineering now allow the production of fault-free software
- "Fault-free" means that the software conforms to its specification
- It does NOT mean software that will always perform correctly
- Why not? Because of specification errors.
Slide 8: Cost of Producing Fault-Free Software (1)
- Very high; cost-effective only in exceptional situations (which?)
- It may be cheaper to accept software faults
- But who will bear the costs? Users? Manufacturers? Both?
- Will the risk-sharing be done with full knowledge?
Slide 9: Cost of Producing Fault-Free Software (2)
- The Pareto Principle: roughly 20% of the cost fixes the first 80% of the errors
- [Chart: cost vs. total % of errors fixed, rising steeply as the percentage approaches 100%]
- If the curve really is asymptotic to 100%, the cost of removing the last errors may grow without bound
Slide 10: Cost of Producing Fault-Free Software (3)
- [Chart: cost per error detected rises sharply as the number of residual errors falls from many, to few, to very few]
Slide 11: Fault-Free Software Development
- Needs a precise (preferably formal) specification
- Requires organizational commitment to quality
- Information hiding and encapsulation in software design are essential
- Use a programming language with strict typing and run-time checking
- Avoid error-prone constructs
- Use a dependable and repeatable development process
Slide 12: Structured Programming
- First discussed in the 1970s
- Programming without goto: while loops and if statements as the only control statements; top-down design
- Important because it promoted thought and discussion about programming
- Programs are easier to read and understand than old "spaghetti code"
Slide 13: Error-Prone Constructs (1)
- Floating-point numbers: inherently imprecise and machine-dependent; the imprecision may lead to invalid comparisons
- Pointers: pointers referring to the wrong memory area can corrupt data; aliasing can make programs difficult to understand and change
- Dynamic memory allocation: run-time allocation can cause memory overflow
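The floating-point pitfall on this slide can be made concrete with a short sketch (the class and method names here are illustrative, not from the lecture): comparing doubles with `==` silently fails for values that are "obviously" equal in decimal.

```java
// Sketch: why direct floating-point comparison is error-prone.
public class FloatCompare {

    // Error-prone: exact equality fails because 0.1 and 0.2 have no
    // exact binary representation, so 0.1 + 0.2 != 0.3 in a double.
    public static boolean naiveEquals(double a, double b) {
        return a == b;
    }

    // Safer: compare within a tolerance appropriate to the application.
    public static boolean almostEquals(double a, double b, double eps) {
        return Math.abs(a - b) <= eps;
    }

    public static void main(String[] args) {
        System.out.println(naiveEquals(0.1 + 0.2, 0.3));        // false
        System.out.println(almostEquals(0.1 + 0.2, 0.3, 1e-9)); // true
    }
}
```

The tolerance `eps` must itself be chosen with care; a fixed epsilon is only a sketch, and critical systems typically derive it from the problem's numerical analysis.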
Slide 14: Error-Prone Constructs (2)
- Parallelism: can result in subtle timing errors (race conditions) because of unforeseen interactions between parallel processes
- Recursion: errors in recursion can cause memory overflow
- Interrupts: interrupts can cause a critical operation to be terminated and make a program difficult to understand; similar to goto statements
Slide 15: Error-Prone Constructs (3)
- Inheritance: code is not localised; can result in unexpected behaviour when changes are made; can be hard to understand; problems are difficult to debug
- None of these constructs has to be absolutely eliminated, but all must be used with great care
Slide 16: Information Hiding
- Information should be exposed only to those parts of the program which need to access it
- Create objects or abstract data types which maintain state and the operations on that state
- Reduces faults: less accidental corruption of information; 'firewalls' make problems less likely to spread to other parts of the program
- All information is localised: the programmer is less likely to make errors, and reviewers are more likely to find them
Slide 17: Example: Queue Specification in Java (Fig. 18.2, p. 398)

    interface Queue {
        public void put (Object o);
        public void remove (Object o);
        public int size ();
    } // Queue

- Users can put, remove, and query size
- But the implementation of the queue is concealed as private
Slide 18: Example: Signal Declaration in Java (Fig. 18.3, p. 398)
- Define constants as globals once
- Refer to signal.red, signal.green, etc.
- Avoids the risk of accidentally using the wrong value in a parameter
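Fig. 18.3 itself is not reproduced in this transcript, so here is a minimal sketch of the idea it illustrates (the constant names and values are assumptions, not Sommerville's actual figure): declare each signal value once, as a named constant, so callers can never pass a wrong raw literal.

```java
// Sketch: named constants instead of "magic numbers" for signal states.
// The specific names and values are illustrative.
public class Signal {
    public static final int RED   = 1;
    public static final int AMBER = 2;
    public static final int GREEN = 3;

    // Callers write Signal.RED rather than a raw 1, so a misspelled
    // name is a compile-time error, whereas a wrong literal would
    // pass silently.
    public static boolean mayProceed(int state) {
        return state == GREEN;
    }
}
```

In modern Java an `enum Signal { RED, AMBER, GREEN }` is stronger still, since the compiler then also rejects out-of-range integers.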
Slide 19: Reliable Software Processes
- A well-defined, repeatable software process reduces software faults
- It does not depend entirely on individual skills and can be enacted by different people
- Process activities should include significant verification and validation
Slide 20: Process Validation Activities
- Requirements inspections
- Requirements management
- Model checking
- Design and code inspection
- Static analysis
- Test planning and management
- Configuration management is also essential
Slide 21: Fault Tolerance
- Critical software systems must be fault tolerant: the system can continue operating in spite of software failure
- Fault tolerance is required where availability requirements are high or the costs of system failure are very high
- Even "fault-free" systems need fault tolerance: there may be specification errors, or the validation may be incorrect
Slide 22: Fault Tolerance Actions
- Fault detection: detect that an incorrect system state has occurred
- Damage assessment: identify the parts of the system state affected by the fault
- Fault recovery: return to a known safe state
- Fault repair: prevent recurrence of the fault by identifying the underlying problem; if the fault is not transient (e.g., a hardware failure), fix the errors of design, implementation, documentation, or training that led to it
Slide 23: Approaches to Fault Tolerance
- Defensive programming: programmers assume there are faults in the code, and check the state after modifications to ensure consistency
- Fault-tolerant architectures: hardware and software system architectures support redundancy and fault tolerance; a controller detects problems and supports fault recovery
- These are complementary rather than opposing techniques
Slide 24: Fault Detection (1)
- Strictly-typed languages (e.g., Java and Ada) trap many errors at compile-time
- Some classes of error can be discovered only at run-time
- Fault detection means detecting an erroneous system state and throwing an exception to manage the detected fault
Slide 25: Fault Detection (2)
- Preventative fault detection: check conditions before making changes; if a bad state is detected, do not make the change
- Retrospective fault detection: check validity after the system state has been changed
- Retrospective detection is used when an incorrect sequence of correct actions leads to an erroneous state, or when preventative fault detection involves too much overhead
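The preventative/retrospective distinction can be shown with one small sketch (the class and its ordering invariant are hypothetical, chosen only for illustration): the same insertion guarded before the change, and verified after it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a list that must stay in ascending order, guarded two ways.
public class OrderedList {
    private final List<Integer> items = new ArrayList<>();

    // Preventative fault detection: check the condition BEFORE
    // changing state; reject the change if it would break the invariant.
    public void insertChecked(int value) {
        if (!items.isEmpty() && value < items.get(items.size() - 1))
            throw new IllegalArgumentException("would break ordering");
        items.add(value);
    }

    // Retrospective fault detection: change state first, then verify
    // the whole invariant still holds, throwing if it does not.
    public void insertThenVerify(int value) {
        items.add(value);
        for (int i = 1; i < items.size(); i++)
            if (items.get(i - 1) > items.get(i))
                throw new IllegalStateException("ordering invariant violated");
    }

    public int size() { return items.size(); }
}
```

The retrospective check here scans the whole list, which is exactly the kind of overhead the slide mentions; preventative checking is cheaper per operation but only catches faults it anticipates.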
Slide 26: Damage Assessment
- Analyse the system state to judge the extent of corruption caused by the system failure
- Assess which parts of the state space have been affected by the failure
- Generally based on 'validity functions' which can be applied to state elements to assess whether their values are within the allowed range
Slide 27: Damage Assessment Techniques
- Checksums: used for damage assessment in data transmission; verify integrity after transmission
- Redundant pointers: check the integrity of data structures, e.g., databases
- Watch-dog timers: check for non-terminating processes; if there is no response after a certain time, there is a problem
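The checksum technique on this slide can be sketched with Java's standard CRC-32 implementation (`java.util.zip.CRC32`); the message and the simulated one-bit corruption are of course just illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch: damage assessment via a checksum computed before
// transmission and re-checked after.
public class ChecksumCheck {
    public static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static boolean intact(byte[] received, long expected) {
        return checksum(received) == expected;
    }

    public static void main(String[] args) {
        byte[] sent = "critical message".getBytes(StandardCharsets.UTF_8);
        long crc = checksum(sent);        // computed by the sender

        byte[] corrupted = sent.clone();
        corrupted[0] ^= 0x01;             // simulate a one-bit transmission error

        System.out.println(intact(sent, crc));      // true
        System.out.println(intact(corrupted, crc)); // false
    }
}
```

A CRC detects damage but does not repair it; repair belongs to the forward-recovery techniques discussed later in the deck.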
Slide 28: Fault Recovery
- Forward recovery: apply repairs to the corrupted system state; domain knowledge is required to compute possible state corrections; usually application-specific
- Backward recovery: restore the system state to a known safe state; simpler than forward recovery; details of the safe state are maintained and replace the corrupted system state
Slide 29: Forward Recovery
- Data communications: add redundancy to coded data and use it to repair data corrupted during transmission
- Redundant pointers (e.g., doubly-linked lists): a damaged list or file may be repaired if enough links are still valid; often used for database and filesystem repair
Slide 30: Backward Recovery
- Transaction processing often uses conservative methods to avoid problems: complete the computations, then apply the changes, keeping the original data in buffers
- Periodic checkpoints allow the system to 'roll back' to a correct state
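Checkpoint-based backward recovery can be sketched as follows (the `Account` class and its operations are hypothetical, chosen only to make the roll-back visible): save a known-safe copy of the state, mutate freely, and restore the copy if a later check fails.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: backward recovery by checkpoint and rollback.
public class Account {
    private long balance;
    private final Deque<Long> checkpoints = new ArrayDeque<>();

    public Account(long opening) { this.balance = opening; }

    // Save the current, known-safe state.
    public void checkpoint() { checkpoints.push(balance); }

    // An ordinary state change, which may later turn out to be faulty.
    public void apply(long delta) { balance += delta; }

    // Restore the most recent checkpoint, discarding corrupted state.
    public void rollback() {
        if (!checkpoints.isEmpty()) balance = checkpoints.pop();
    }

    public long balance() { return balance; }
}
```

Real transaction systems checkpoint to durable storage and log every change between checkpoints; this in-memory stack only shows the control flow.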
Slide 31: Key Points
- Fault-tolerant software can continue executing in the presence of software faults
- Fault tolerance requires failure detection, damage assessment, recovery, and repair
- The defensive programming approach to fault tolerance relies on the inclusion of redundant checks in a program
- Exception handling facilities simplify the process of defensive programming
Slide 32: Fault Tolerant Architecture
- Defensive programming cannot cope with faults that involve interactions between hardware and software
- Misunderstandings of the requirements may mean that both the checks and the associated code are incorrect
- Where systems have high availability requirements, a specific architecture designed to support fault tolerance may be required
- It must tolerate both hardware and software failure
Slide 33: Hardware Fault Tolerance
- Depends on triple-modular redundancy (TMR): three replicated identical components receive the same input, and their outputs are compared
- If one output differs, it is discarded and a component failure is assumed
- Assumes that most faults result from component failures, that there are few design faults, and that simultaneous component failures have low probability
Slide 34: Hardware Reliability With TMR
- [Diagram: three replicated components feeding an output comparator]
Slide 35: Fault Tolerant Software Architectures
- The assumptions of TMR for hardware do not hold for software: software design flaws are more common than hardware design flaws
- The same component cannot simply be replicated in software, since the copies would share common design faults
- Simultaneous component failure would therefore be virtually inevitable
- Software versions must therefore be diverse
Slide 36: Design Diversity
- Different versions of the system are designed and implemented in different ways, so they ought to have different failure modes
- Different approaches to design (e.g., object-oriented and function-oriented)
- Implementation in different programming languages
- Use of different tools and development environments
- Use of different algorithms in the implementation
Slide 37: Software Analogies to TMR (1)
- N-version programming: the same specification is implemented in a number of different versions by different teams
- All versions compute simultaneously, and the majority output is selected by a voting system
- The most commonly used approach, e.g., in the Airbus 320 control systems
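The majority-voting step of N-version programming (and of TMR generally) reduces to a small 2-out-of-3 vote; this sketch is illustrative only — real voters compare outputs within tolerances rather than for exact equality.

```java
// Sketch: 2-out-of-3 majority voting over the outputs of three
// independently implemented versions.
public class Voter {
    public static int vote(int a, int b, int c) {
        if (a == b || a == c) return a;   // a agrees with at least one other
        if (b == c) return b;             // a is the odd one out; discard it
        // No majority: all three versions disagree, so the voter itself
        // must signal failure rather than guess.
        throw new IllegalStateException("no majority: all versions disagree");
    }
}
```

Note the voter is itself a single point of failure, which is one reason the deck later questions whether redundant software actually improves reliability.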
Slide 38: N-Version Programming (1)
- [Diagram: N versions executing in parallel, their outputs feeding a voter/output comparator]
Slide 39: N-Version Programming (2)
- The different system versions are designed and implemented by different teams, on the assumption that there is a low probability of them making the same mistakes
- The algorithms used should be different
- Problem: the versions may not be different enough; there is some empirical evidence that teams commonly misinterpret specifications in the same way and choose the same algorithms for their systems
Slide 40: Software Analogies to TMR (2)
- Recovery blocks: explicitly different versions of the same specification are written and executed in sequence
- An acceptance test is used to select the output to be transmitted
Slide 41: Recovery Blocks (1)
- [Diagram: primary version runs first; its result passes through an acceptance test; on failure, alternate versions are tried in sequence]
Slide 42: Recovery Blocks (2)
- Force a different algorithm to be used for each version, reducing the probability of common errors
- However, the design of the acceptance test is difficult, as it must be independent of the computation used
- The approach is problematic for real-time systems because the redundant versions execute sequentially
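The recovery-block control flow described above can be sketched generically (the harness below is hypothetical; Sommerville presents the pattern, not this code): run the primary version, apply the acceptance test, and fall back to an alternate only if the test fails.

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntPredicate;

// Sketch of the recovery-block pattern: primary algorithm, acceptance
// test, alternate algorithm tried in sequence on failure.
public class RecoveryBlock {
    public static int run(IntBinaryOperator primary,
                          IntBinaryOperator alternate,
                          IntPredicate acceptanceTest,
                          int x, int y) {
        int result = primary.applyAsInt(x, y);
        if (acceptanceTest.test(result)) return result;

        // Primary failed the acceptance test: try the alternate version.
        result = alternate.applyAsInt(x, y);
        if (acceptanceTest.test(result)) return result;

        throw new IllegalStateException("all versions failed the acceptance test");
    }
}
```

The sequential execution visible here is exactly the real-time problem the slide notes: worst-case latency is the sum of all versions plus all acceptance tests.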
Slide 43: Problems with Design Diversity
- Teams are not culturally diverse, so they tend to tackle problems in the same way
- Characteristic errors: different teams make the same mistakes; some parts of the implementation are more difficult than others, so all teams tend to make mistakes in the same place
- Specification errors: an error in the specification is reflected in all the implementations; this can be addressed to some extent by using multiple specification representations
Slide 44: Specification Dependency
- Both approaches to software redundancy are susceptible to specification errors: if the specification is incorrect, the system could fail
- This is also a problem with hardware, but software specifications are usually more complex than hardware specifications and harder to validate
- It has been addressed in some cases by developing separate software specifications from the same user specification
Slide 45: Is Software Redundancy Needed?
- Software faults are not inevitable, unlike hardware faults, which are an inevitable consequence of the physical world (WHY?)
- Reducing software complexity may improve reliability and availability (WHY?)
- Redundant software is much more complex, leaving scope for a range of additional errors that affect system reliability — ironically, errors caused by the existence of the fault-tolerance controllers themselves
Slide 46: Key Points
- Dependability in a system can be achieved through fault avoidance and fault tolerance
- Some programming language constructs, such as gotos, recursion, and pointers, are inherently error-prone
- Data typing allows many potential faults to be trapped at compile time
Slide 47: Key Points
- Fault-tolerant architectures rely on replicated hardware and software components, and include mechanisms to detect a faulty component and switch it out of the system
- N-version programming and recovery blocks are two different approaches to designing fault-tolerant software architectures
- Design diversity is essential for software redundancy
Slide 48: Homework
- Study Chapter 18 in detail using SQ3R
- For next Tuesday, 4 Nov 2003: complete exercises 18.1, 18.3-18.5, and 18.7-18.9 for 21 points
- For next class (Thursday, 30 Oct 2003): apply S-Q to Chapter 19 on Verification and Validation. This is increasingly hard stuff, so STUDY before coming to class.
- OPTIONAL: For 3 extra points per question, answer any of questions 18.2, 18.6, 18.10, and 18.11 by 11 Nov
Slide 49: DISCUSSION