
Dependable Software Development Lecture 7

System dependability
For many computer-based systems, the most important system property is the dependability of the system.
The dependability of a system reflects:
– The user’s degree of trust in that system.
– The extent of the user’s confidence that it will operate as users expect and will not ‘fail’ in normal use.
Dependability covers the related system attributes of reliability, availability and security. These are all interdependent.
Adv Software Engg, by Asst Prof Athar Mohsin, MSCS 19, MCS-NUST

Importance of dependability
System failures may have widespread effects, with large numbers of people affected by the failure.
– Systems that are not dependable and are unreliable, unsafe or insecure may be rejected by their users.
– The costs of system failure may be very high if the failure leads to economic losses or physical damage.
– Undependable systems may cause information loss with a high consequent recovery cost.
Causes of failure:
– Hardware failure: poor design and manufacturing errors.
– Software failure: errors in its specification, design or implementation.
– Operational failure: perhaps the largest single cause of system failures in socio-technical systems.

Principal dependability properties

Principal properties
Availability
– The probability that the system will be up and running and able to deliver useful services to users.
Reliability
– The probability that the system will correctly deliver services as expected by users.
Safety
– A judgment of how likely it is that the system will cause damage to people or its environment.
Security
– A judgment of how likely it is that the system can resist accidental or deliberate intrusions.
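Because availability and reliability are defined above as probabilities, they can be estimated from operational records. The following is a minimal sketch under stated assumptions: the function names and the idea of estimating from logged uptime and request counts are illustrative, not part of the lecture.

```python
def estimate_availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of observed time the system was up and able to deliver service."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observation period must be positive")
    return uptime_hours / total


def estimate_reliability(correct_responses: int, total_requests: int) -> float:
    """Fraction of service requests that were delivered correctly."""
    if total_requests <= 0:
        raise ValueError("at least one request is needed")
    return correct_responses / total_requests


# Hypothetical figures: 1 hour of downtime in 1000 hours of observation,
# and 2 incorrect responses in 10,000 requests.
print(estimate_availability(999, 1))
print(estimate_reliability(9998, 10000))
```

Safety and security are described above as judgments rather than probabilities, so no comparable estimate is sketched for them.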

Other dependability properties
Repairability
– Reflects the extent to which the system can be repaired in the event of a failure.
Maintainability
– Reflects the extent to which the system can be adapted to new requirements.
Survivability
– Reflects the extent to which the system can deliver services whilst under hostile attack.
Error tolerance
– Reflects the extent to which user input errors can be avoided and tolerated.

Software dependability
In general, software customers expect all software to be dependable.
– However, for non-critical applications, they may be willing to accept some system failures.
Some applications have very high dependability requirements, and special software engineering techniques may be used to achieve this.
Dependability achievement
– Fault avoidance: the system is developed in such a way that human error is avoided and thus system faults are minimised. The development process is organised so that faults in the system are detected and repaired before delivery to the customer.
– Fault detection: verification and validation techniques are used to discover and remove faults in a system before it is deployed.
– Fault tolerance: the system is designed so that faults in the delivered software do not result in system failure.
Software fault avoidance approaches include formal or precise specification practices, programming disciplines such as information hiding and encapsulation, extensive and repeated reviews and formal analyses during the development process, and rigorous testing.
Fault detection and removal approaches include verification and validation, software testing, and proof methodology.
Formal methods are fault avoidance techniques that aim to increase dependability by eliminating errors at the requirements specification and design stages of development.
Fault tolerance techniques try to keep the system operational despite the presence of faults. Since complete fault avoidance or elimination is not possible, critical systems employ fault tolerance techniques to achieve high system reliability and availability.

Critical Systems
Software failure is common. Sometimes a failure causes inconvenience but no serious damage; sometimes it causes serious harm, including harm to human life.
– Systems whose failure has such severe consequences are known as “critical systems”.
Three types of critical systems are:
– Safety-critical systems: failure may result in loss of life, injury or damage to the environment. Example: chemical plant protection system.
– Mission-critical systems: failure results in failure of some goal-directed activity. Example: spacecraft navigation system.
– Business-critical systems: failure results in high economic losses. Example: customer accounting system in a bank.
For critical systems, the most important system property is the dependability of the system.

Safety-critical systems
Safety-critical systems:
– Systems whose failure could result in loss of life, significant property damage or damage to the environment.
– These systems must be designed to guarantee system stability in all operational modes; when a fatal fault occurs, the system shuts down safely.
Applications
– Computer-based systems used in avionics, chemical process plants and nuclear power plants. A failure in such a system endangers human lives directly or through environmental pollution, and its effects are felt on a large economic scale.

Safety-Critical Systems - Present
Transportation systems, from aircraft to automobiles
– New airplanes contain advanced avionics such as inertial guidance systems and GPS receivers that also have considerable safety requirements.
– Automobiles, electric vehicles and hybrid vehicles increasingly use embedded systems to maximize efficiency and reduce pollution.
– Other automotive safety systems include anti-lock braking systems, electronic stability control and automatic four-wheel drive.
Medical equipment continues to advance with more embedded systems
– Vital signs monitoring
– Electronic stethoscopes for amplifying sounds
– Various medical imaging techniques for non-invasive internal inspection

Can We Trust the Computer? Case Study: The Therac-25 Based on Article in IEEE-Computer, July 1993.

Opening the case
One of the most widely reported accidents involved the Therac-25
– A radiation therapy machine
– Accidents occurred between June 1985 and January 1987
Six known accidents involving massive overdoses
– Causing deaths and serious injuries
Worst accidents in the 35-year history of medical accelerators
“A significant amount of SW for life-critical systems comes from small firms, especially in the medical industry; firms that fit the profile of those resistant to or uninformed of the principles of either system safety or software engineering.”

Therac-25
Massive overdoses of radiation were given:
– Medical accelerator used to treat tumors
– 6 known accidents resulting in death or serious injury, June 1985 – January 1987
– Caused severe and painful injuries and the death of three patients
Airbag sensory system in automobiles
“--- this thing will probably have to work only once in 10 years, but it better work then, otherwise the result will be catastrophic.”

Background of the case
Medical linear accelerators accelerate electrons to create high-energy beams that can destroy tumors with minimal impact on surrounding healthy tissue.
– Shallow tissue is treated with accelerated electrons.
– Deeper tissue requires converting the electron beam into X-ray photons.
The Therac-25 is a medical linear accelerator.
– A linear accelerator ("linac") is a particle accelerator, a device that increases the energy of electrically charged atomic particles.
– The charged particles are accelerated by an electric field, producing beams of particles which are then focused by magnets.

Case study – Therac-25
Linacs are used to treat cancer patients.
– A patient is exposed to beams of particles, or radiation, in doses designed to kill a tumor.
– Since malignant tissues are more sensitive than normal tissues to radiation exposure, a treatment plan can be developed that permits the absorption of an amount of radiation that is fatal to tumor cells but causes relatively minor damage to normal tissue.
– Shallow tissue is treated with electrons, but to reach deeper tissue, X-ray photons are needed.

Development of Therac-25
Developed from the Therac-6
– A 6-MeV accelerator producing only X-rays
Evolved through the Therac-20
– A 20-MeV dual-mode (X-rays or electrons) accelerator
SW functionality was limited in both machines; it added convenience to existing hardware
– Industry-standard hardware safety features and interlocks were retained
Therac-25
– Dual-mode linear accelerator
– More compact and versatile than the Therac-20
– Takes advantage of computer control from the outset, whereas the Therac-6 and Therac-20 were designed around machines that already had histories of clinical use without computer control
– SW has more responsibility for maintaining safety than the SW in the previous machines

Therac-25's software
One programmer, over several years, revised the Therac-6 software into the Therac-25 software.
– An important difference between the Therac-20 software and the Therac-25 software is the overall role that each plays in the machine.
– In the Therac-20, the role of software is limited: it simply adds convenience to the hardware.
– In the Therac-25, software alone performs many of the critical safety checks of the system. In the Therac-20 these checks are also implemented in hardware, but they were not included in the Therac-25 hardware.

How it Operates
SW is responsible for monitoring machine status. It:
– accepts input about the treatment desired
– sets the machine up for treatment
– turns the beam on when activated by operator command
– turns the beam off when treatment is completed, when the operator commands it, or when a malfunction is detected
The unit has an interlock system designed to remove power to the unit when there is a HW malfunction
The computer monitors the interlock system and provides diagnostic messages
– Depending on the fault, the computer either prevents a treatment from starting or, if treatment is in progress, creates a pause or suspension of treatment

The Safety Analysis Report (before release of product)
Programming errors have been reduced by extensive testing on a HW simulator and under field conditions on teletherapy units.
– Any residual SW errors are not included in the analysis.
– Program SW does not degrade due to wear, fatigue, or reproduction process.
Computer execution errors are caused by faulty HW components and by “soft” (random) errors induced by alpha particles and electromagnetic noise.
The fault tree does include computer failure, but only hardware failures.

Therac-25 SW Testing
Manufacturer said the HW and SW were “tested and exercised separately or together over many years”
In deposition, the QA manager explained that testing was done in two parts
– A “small amount” of SW testing was done on a simulator
– Most testing was done on the system
Reports indicate that unit and SW testing was minimal
Most testing effort was directed to the integrated system test
The same QA manager stated at a Therac-25 users meeting that the SW was tested for 2,700 hours
– Under questioning by users, this was clarified as “2700 hours of use”
The programmer left AECL in 1986; nothing is known about the programmer
– AECL employees could not provide any information about the programmer's educational background or experience

Therac-25
Software was carried over from earlier projects where it had seemingly worked well
– Therac-6, Therac-20: computer control was added to earlier machines, which were still capable of stand-alone (no computer) operation and had all standard hardware safety mechanisms
– Therac-25: software defects in earlier machines were hidden by hardware safeguards
No real software development process
Apparently no serious evaluation of the risks involved in using software in lieu of hardware safeguards
– Single programmer
– Operating system was developed by one programmer using assembly language in the 1970s
– SW “evolved” from the Therac-6 (which was started in 1972)
– Very little SW documentation was produced during development
When designing dependable systems we must deal with dependability issues from the beginning, by addressing fault-tolerance mechanisms within the system design and by employing appropriate fault-avoidance approaches in the design process. Adding dependability later can be expensive and may not be as effective as designing it in from the beginning.
Fault avoidance, fault removal and fault tolerance represent three successive lines of defense against the contingency of faults in software systems and their impact on system reliability.

The Software Errors
Each bug contained in the Therac-25 software was also found in the software of the Therac-20.
– However, the hardware safety interlocks in the Therac-20 prevented any accidents from occurring in that machine.
The Therac-25 software errors that caused radiation overexposures can be reduced to interface errors.

Fault-free software
Fault-free software means software which conforms to its specification.
– It does NOT mean software which will always perform correctly, as there may be specification errors.
Therac-25
– The 1983 safety analysis, in effect, assumed that software had no errors!
“Programming errors have been reduced by extensive testing... Any residual software errors are not included in the analysis.”
“Computer execution errors are caused by faulty hardware components and by ‘soft’ (random) errors induced by alpha particles and electromagnetic noise.”

Diversity and Redundancy
Redundancy - where availability is critical
– Keep more than one version of a critical component available so that if one fails then a backup is available.
– e.g. in e-commerce systems, companies normally keep backup servers and switch to these automatically if failure occurs.
Diversity - to provide resilience against external attacks
– Provide the same functionality in different ways so that the implementations will not fail in the same way.
– Different servers may be implemented using different operating systems (e.g. Windows and Linux).
However, adding diversity and redundancy adds complexity, and this can increase the chances of error.
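The automatic switch to a backup described above can be sketched as a simple failover loop. This is a minimal illustration under stated assumptions: the replica functions `windows_server` and `linux_server` and the exception `AllReplicasFailed` are hypothetical names invented for the example, and real systems would add health checks and timeouts.

```python
from typing import Callable, List, Sequence


class AllReplicasFailed(Exception):
    """Raised when no redundant replica could serve the request."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in order and return the first successful response."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:  # a failed replica triggers the switch to the next
            errors.append(exc)
    raise AllReplicasFailed(errors)


# Hypothetical diverse implementations of the same service.
def windows_server(request: str) -> str:
    raise ConnectionError("primary is down")


def linux_server(request: str) -> str:
    return "handled " + request


print(call_with_failover([windows_server, linux_server], "order-17"))
```

The failover loop itself is extra logic that can contain faults, which is the complexity cost the slide warns about.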


Rigorous Software Development
Addresses quality and productivity by emphasizing the early stages in the development process
– Concentrates on developing an early, precise understanding of the required behavior of the system
– Think carefully about what you want to do and get it right the first time
Underlying the rigorous approach are formal specification languages
– These are mathematically based languages that provide support for abstract and precise descriptions of software systems

Therac-25
Overconfidence in Software
– Safety analysis did not include software, even though it was responsible for the safety of the system
– When problems did occur, they were assumed to be hardware failures
– Software was designed for a small memory footprint
– Self-checks, error detection, error handling and auditing were left out
– Risk assessment did not include software

Static and Dynamic verification
Software inspections (static verification)
– Concerned with analysis of the static system representation to discover problems
– May be supplemented by tool-based document and code analysis
Software testing (dynamic verification)
– Concerned with exercising and observing product behaviour
– The system is executed with test data and its operational behaviour is observed

Therac-25
Who's to blame?
– AECL did not take the appropriate measures to ensure that the Therac-25 would provide the utmost safety precautions for the patients who were being treated with it.
– Insufficient testing, numerous bugs, bad safety design, and poor programming techniques were all contributors to the incidents that injured patients who trusted the Therac-25.

Stages of static analysis
Control flow analysis
– Checks for loops with multiple exit or entry points, finds unreachable code, etc.
Data use analysis
– Detects uninitialized variables, variables written twice without an intervening use, variables which are declared but never used, etc.
Interface analysis
– Checks the consistency of routine and procedure declarations and their use
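As a rough illustration of the data use analysis stage, the sketch below uses Python's standard `ast` module to report names that are assigned but never read. It is a deliberately simplified assumption of how such a check might look: it ignores scopes, attributes and control flow, and the function name `assigned_but_never_used` is hypothetical.

```python
import ast
from typing import Set


def assigned_but_never_used(source: str) -> Set[str]:
    """Return names that are written somewhere in the source but never read."""
    tree = ast.parse(source)
    assigned: Set[str] = set()
    used: Set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)   # the name is a write target
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)       # the name is read
    return assigned - used


sample = "dose = 4\nlimit = 25\nprint(dose)\n"
print(assigned_but_never_used(sample))  # {'limit'}
```

Production static analysers track definitions and uses along each control flow path, which is what lets them also report uninitialized variables.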

Therac-25
The Operator Interface
– At first, the operator needed to enter information at the treatment table and then re-enter it at a console in the control room. Operators complained; the safeguard was removed.
– Error codes are reported on the screen with no English explanation. Example (East Texas Cancer Center): “Malfunction 54” reported, caused by “dose input 2”. An AECL technician testified that “dose input 2” means the dose delivered was either too high or too low (!)
– “Treatment Pause” after a non-critical error, which the operator can override by pressing “P”. This causes operators to become insensitive to errors.

Therac-25
Example Bugs
Data Entry Bug
– Setting the bending magnets takes 8 seconds
– The “Delay” subroutine uses shared memory with the data entry subroutine
– So data changes made within 8 seconds will be wiped out when Delay exits!
– Causes bugs that only show up with proficient users who do data entry in under 8 seconds
Set-Up Test Bug
– On every 256th pass through Set-Up (one-byte counter), the upper collimator is not checked
– Problem if the operator hits “set” exactly when the counter rolls over to 0
These kinds of bugs are notoriously difficult to track down
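The Set-Up Test bug can be reproduced in miniature. The source states only that a one-byte counter causes the upper collimator check to be skipped on every 256th pass; the mechanism below (a counter that is incremented on each pass and read as "nothing to check" when it is zero) is an assumption made for illustration, and the function name is hypothetical.

```python
def passes_with_skipped_check(num_passes: int) -> list:
    """Return the pass numbers on which the collimator check would be skipped."""
    counter = 0  # hypothetical one-byte counter, holds values 0..255
    skipped = []
    for pass_number in range(1, num_passes + 1):
        counter = (counter + 1) & 0xFF  # wraps back to 0 after 255
        if counter == 0:
            # Zero is misread as "no check needed", so the check is skipped.
            skipped.append(pass_number)
    return skipped


print(passes_with_skipped_check(600))  # [256, 512]
```

Under this assumed mechanism, a flag set to a fixed non-zero value on each pass, rather than incremented, could never wrap to zero.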

Level of Concern
For critical systems the major, moderate and minor safety concerns must be identified
Therac-25
– Major: device directly affects the patient or operator and failure could result in death or serious injury
– Moderate: device directly affects the patient and failure could result in non-serious injury
– Minor: failures will not result in injury

Levels of Concern
Does the software
– Control a life support device?
– Control delivery of harmful energy?
– Control treatment delivery?
– Provide diagnosis as a basis for treatment?
– Monitor vital signs?
If the answer to all these questions is no, then the concern is minor
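The screening questions above amount to a simple checklist. The sketch below encodes them under stated assumptions: the class `SoftwareProfile` and its field names are hypothetical, and the function only decides whether the concern is minor, since the slide does not say how to separate moderate from major.

```python
from dataclasses import dataclass, fields


@dataclass
class SoftwareProfile:
    """Answers to the five screening questions (True means yes)."""
    controls_life_support: bool = False
    controls_harmful_energy: bool = False
    controls_treatment_delivery: bool = False
    provides_diagnosis_for_treatment: bool = False
    monitors_vital_signs: bool = False


def is_minor_concern(profile: SoftwareProfile) -> bool:
    """The concern is minor only if every question is answered no."""
    return not any(getattr(profile, f.name) for f in fields(profile))


# A radiation therapy controller delivers harmful energy and treatment.
therapy_controller = SoftwareProfile(controls_harmful_energy=True,
                                     controls_treatment_delivery=True)
print(is_minor_concern(therapy_controller))  # False
print(is_minor_concern(SoftwareProfile()))   # True
```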

Safety
System property that reflects the system’s ability to operate (normally or abnormally) without danger to the system environment
– As more devices become software controlled, safety becomes a greater concern
– Safety requirements are exclusive (they exclude undesirable situations rather than specify required system services)
Safety Criticality
– Primary safety-critical systems: embedded software systems whose failure can cause associated hardware to fail and directly threaten people
– Secondary safety-critical systems: systems whose faults can cause other systems to fail, which can then threaten people

Safety and Reliability
They are related, but not identical
– Reliability is concerned with conformance to a specification and delivery of a service
– Safety is concerned with ensuring that a system cannot cause damage, regardless of its conformance (or nonconformance) to its specification
Safety Achievements
– Hazard avoidance: the system is designed so that some hazards cannot arise
– Hazard detection and removal: the system is designed so that hazards are detected and removed before they result in an accident
– Damage limitation: the system includes protection features that minimize the damage that may result from an accident

Case study: Insulin Pump
The system measures the level of blood sugar every 10 minutes. If this level is above a certain value and is increasing, then the dose of insulin to counteract the increase is computed and injected into the diabetic.
The system can also detect abnormally low levels of blood sugar and, if these occur, an alarm is sounded to warn the diabetic that they should take some action.

Dependability requirements
The system shall be available to deliver insulin when required to do so.
The system shall perform reliably and deliver the correct amount of insulin to counteract the current level of blood sugar.
The essential safety requirement is that excessive doses of insulin should never be delivered, as this is potentially life threatening.

Dependability attributes
Availability
– The pump should have a high level of availability, but the nature of diabetes is such that continuous availability is unnecessary
Reliability
– Intermittent demands for service are made on the system
Safety
– The key safety requirement is that the operation of the system should never result in a very low level of blood sugar. A fail-safe position is for no insulin to be delivered
Security
– Not really applicable in this case

Sample Requirement Specifications

General dependability requirements
– SR1: The system shall not deliver a single dose of insulin that is greater than a specified maximum dose for a system user.
– SR2: The system shall not deliver a daily cumulative dose of insulin that is greater than a specified maximum for a system user.
– SR3: The system shall include a hardware diagnostic facility that should be executed at least 4 times per hour.
– SR4: The system shall include an exception handler for all of the exceptions that are identified in Table …..
– SR5: The audible alarm shall be sounded when any hardware anomaly is discovered and a diagnostic message as defined in Table ……. should be displayed.
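Requirements SR1 and SR2 above can be enforced by a small guard between dose computation and delivery. The following is a sketch under stated assumptions, not the pump's actual design: the class name `DoseLimiter`, the clamping policy (deliver a reduced dose rather than refuse) and the example limits are all hypothetical.

```python
class DoseLimiter:
    """Caps each dose (SR1) and the cumulative daily dose (SR2)."""

    def __init__(self, max_single_dose: int, max_daily_dose: int) -> None:
        self.max_single_dose = max_single_dose
        self.max_daily_dose = max_daily_dose
        self.delivered_today = 0

    def approve(self, requested: int) -> int:
        """Return the dose that may be delivered, never exceeding either limit."""
        remaining_today = self.max_daily_dose - self.delivered_today
        dose = max(0, min(requested, self.max_single_dose, remaining_today))
        self.delivered_today += dose
        return dose

    def start_new_day(self) -> None:
        """Reset the cumulative total at the start of each 24-hour period."""
        self.delivered_today = 0


limiter = DoseLimiter(max_single_dose=4, max_daily_dose=10)  # hypothetical limits
print([limiter.approve(d) for d in (6, 4, 4, 1)])  # [4, 4, 2, 0]
```

Clamping to zero when a limit is reached matches the fail-safe position stated earlier, in which no insulin is delivered.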

Insulin Pump System Design
The important design decisions made during the production of the insulin pump software and the simulator:
– The approach used to produce the insulin pump software was to emulate the hardware organization by producing separate software objects (classes) for each distinguishable hardware object
System Architecture: insulin pump components
– Controller, Clock, Display, Simulator

The software objects
The Controller object carries out the bulk of the computation within the system
– It is within the controller that the dose of insulin to be delivered is computed and where the self-tests are performed
The Clock object works together with the Controller object
– It constantly determines how much time has elapsed since the software was started or the timer was reset (which happens every 24 hours)
– Periodically, at every specified interval, the clock triggers certain events that the system is required to perform

The software objects
The Display object is used to create a graphical user interface (GUI)
– The data is presented to the user via text boxes positioned on the GUI
The remaining software objects model the peripheral hardware units
– The software contained within these objects simply records the current state of the hardware unit and, for the purpose of simulation, provides the functionality to change that state

The software objects
The Simulator software object:
– Provides the user with the functionality to perform a simulation of real-world events that would affect the pump software in differing ways
The simulator facilitates the testing process
– It makes it quicker and easier to perform the testing required to determine whether the insulin pump system is adequately safe

Object Interaction – Object classes

Object Interaction – Sequence Diagrams

Object Interaction – Sequence Diagrams (continued)

Insulin delivery system
[Figure: data flow model of the software-controlled insulin pump]

Concept of operation
Using readings from the embedded sensor, the system automatically measures the level of glucose in the sufferer’s body
–Consecutive readings are compared and, if they indicate that the level of glucose is rising, insulin is injected to counteract this rise
The ideal situation is a consistent level of sugar that is within some ‘safe’ band

Sugar levels
Unsafe
–A very low level of sugar (arbitrarily, below 3 units) is dangerous and can result in hypoglycaemia, which can lead to a diabetic coma and ultimately death
Safe
–Between 3 units and about 7 units, the levels of sugar are ‘safe’ and are comparable to those in people without diabetes. This is the ideal band
Undesirable
–Above 7 units of sugar is undesirable, but high levels are not dangerous in the short term. Continuously high levels, however, can result in long-term side effects
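The three bands can be expressed as a small classification function. This is only a sketch: the names Band, LOW_LIMIT, HIGH_LIMIT and classify are hypothetical, and the treatment of the boundary values (exactly 3 and exactly 7 counted as safe) is an assumption, since the slide gives the limits only approximately.

```python
from enum import Enum


class Band(Enum):
    UNSAFE = "unsafe"
    SAFE = "safe"
    UNDESIRABLE = "undesirable"


LOW_LIMIT = 3   # units; below this the level is unsafe
HIGH_LIMIT = 7  # units; above this the level is undesirable


def classify(level):
    """Map a sugar level (in the slide's arbitrary units) to its band."""
    if level < LOW_LIMIT:
        return Band.UNSAFE
    if level <= HIGH_LIMIT:
        return Band.SAFE
    return Band.UNDESIRABLE
```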

Injection scenarios
Level of sugar is in the unsafe band
–Do not inject insulin
–Initiate warning for the sufferer
Level of sugar is falling
–Do not inject insulin if in safe band
–Inject insulin if rate of change of level is decreasing
Level of sugar is stable
–Do not inject insulin if level is in the safe band
–Inject insulin if level is in the undesirable band to bring down glucose level
–Amount injected should be proportionate to the degree of undesirability, i.e. inject more if level is 20 rather than 10
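As an illustration, the unsafe-band and stable-level scenarios could be coded as below. The function name stable_level_action, the (dose, warn) return shape, and the scaling constant UNITS_PER_EXCESS are assumptions made for this sketch; the slide only requires that the dose grows with the degree of undesirability. The falling-level scenario is left out because it needs more than one reading.

```python
SAFE_LOW = 3          # units; below this the level is unsafe
SAFE_HIGH = 7         # units; above this the level is undesirable
UNITS_PER_EXCESS = 1  # hypothetical: insulin units per unit of sugar above SAFE_HIGH


def stable_level_action(level):
    """Return (dose, warn) for a stable sugar level.

    dose is the amount of insulin to inject; warn is True when the
    sufferer should be warned.
    """
    if level < SAFE_LOW:
        return 0, True    # unsafe band: never inject, raise a warning
    if level <= SAFE_HIGH:
        return 0, False   # safe band: no injection needed
    # Undesirable band: dose proportional to how far the level is above the band.
    return (level - SAFE_HIGH) * UNITS_PER_EXCESS, False
```

With these assumed constants a level of 20 yields a larger dose than a level of 10, as the last scenario requires.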

System availability
In specifying the availability, issues that must be considered are:
–The machine does not have to be continuously available, as failure to deliver insulin on a single occasion is not a problem
–However, no insulin delivery over a few hours would have an effect on the patient’s health
–The machine software can be reset by switching it off and on; hence recovery from software errors is possible without compromising the usefulness of the system
–Hardware failures can only be repaired by return to the manufacturer. This means, in practice, a loss of availability of at least 3 days

Probability of Failure on Demand
The probability that the system will fail when a service request is made
Useful when requests are made on an intermittent or infrequent basis
Appropriate for protection systems, where service requests may be rare and the consequences can be serious if the service is not delivered
Relevant for many safety-critical systems with exception handlers

System failures
Transient failures
–Can be repaired by user actions such as resetting or recalibrating the machine. For these types of failure, a relatively low value of POFOD (0.002) may be acceptable
–This means that one failure may occur in every 500 demands made on the machine. This is approximately once every 3.5 days
Permanent failures
–Require the machine to be repaired by the manufacturer. The probability of this type of failure should be much lower
–Roughly once a year is the minimum figure so POFOD should be no more than
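The arithmetic behind the transient-failure figures can be checked with a few lines. The two input values come from the slide; the derived demand rate (roughly one demand every ten minutes) is not stated there and is only what those two figures jointly imply. The variable names are illustrative.

```python
POFOD_TRANSIENT = 0.002      # from the slide
DAYS_BETWEEN_FAILURES = 3.5  # from the slide

# A POFOD of 0.002 corresponds to one failure per 1 / 0.002 demands.
demands_per_failure = 1 / POFOD_TRANSIENT

# If that many demands are spread over 3.5 days, the implied demand rate is:
demands_per_day = demands_per_failure / DAYS_BETWEEN_FAILURES
minutes_between_demands = 24 * 60 / demands_per_day

print(round(demands_per_failure), round(demands_per_day, 1),
      round(minutes_between_demands, 2))
```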

Safety Processes
Hazard and risk analysis
–Assess the hazards and risks associated with the system
–Startup, alarm, low battery, needle, reservoir
Safety requirements specification
–Specify system safety requirements
–Power off, reset, hardware simulator
Designation of safety-critical systems
–Identify sub-systems whose incorrect operation can compromise entire system safety
–Controller, display, clock, sensor
Safety validation
–Check overall system safety

System hazard analysis
Physical hazards
–Hazards that result from some physical failure of the system
Electrical hazards
–Hazards that result from some electrical failure of the system
Biological hazards
–Hazards that result from some system failure that interferes with biological processes

Insulin system hazards
Insulin overdose or underdose (biological)
Power failure (electrical)
Machine interferes electrically with other medical equipment such as a heart pacemaker (electrical)
Parts of machine break off in patient’s body (physical)
Infection caused by introduction of machine (biological)
Allergic reaction to the materials or insulin used in the machine (biological)

Risk Assessment
Assess the hazard severity, hazard probability, and accident probability
–The outcome of risk assessment is a statement of acceptability
Intolerable (must never occur)
ALARP, as low as reasonably practicable (the risk must be minimised given cost and schedule constraints)
Acceptable (consequences are acceptable and no extra cost should be incurred to reduce the risk further)

Risk analysis example

Fault-tree Analysis
Hazard analysis method that starts with an identified fault and works backwards to the causes of the fault
–Can be used at all stages of hazard analysis
Hazard analysis steps
–Identify hazard
–Identify potential causes of the hazard
–Link combinations of alternative causes using “OR” or “AND” symbols as appropriate
–Continue the process until “root” causes are identified (the result will be an and/or tree or a logic circuit); the causes are the “leaves”
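An and/or tree of this kind can be represented and evaluated with a short recursive function. The sketch below is illustrative only: the Node class, the occurs function, and the example tree (including its cause names) are hypothetical and are not the fault tree from the lecture's figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    gate: str = "LEAF"  # "AND", "OR", or "LEAF" for a root cause
    children: list = field(default_factory=list)


def occurs(node, active_causes):
    """Return True if the event at this node occurs, given the set of
    root causes (leaf names) that are currently present."""
    if node.gate == "LEAF":
        return node.name in active_causes
    results = [occurs(child, active_causes) for child in node.children]
    return all(results) if node.gate == "AND" else any(results)


# Hypothetical tree: the hazard at the top, root causes at the leaves.
tree = Node("incorrect insulin dose", "OR", [
    Node("arithmetic error"),
    Node("algorithmic error"),
    Node("undetected sensor fault", "AND", [
        Node("sensor failure"),
        Node("self-test misses failure"),
    ]),
])

print(occurs(tree, {"arithmetic error"}))  # single cause under an OR gate
print(occurs(tree, {"sensor failure"}))    # only one input of an AND gate
```

An OR gate fires when any one child occurs, while an AND gate needs all of its children, which is why a sensor failure alone does not reach the top of this example tree.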

Insulin pump fault tree

Software problems
Arithmetic error
–The insulin dose is computed incorrectly because of some failure of the computer arithmetic
–Some arithmetic computation causes a representation failure (overflow or underflow)
–The specification may state that arithmetic errors must be detected and an exception handler included for each arithmetic error
–The action to be taken for these errors should be defined
Algorithmic error
–The dose computation algorithm is incorrect
–It is difficult to detect an anomalous situation
–May use ‘realism’ checks on the computed dose of insulin

Arithmetic errors
Use language exception handling mechanisms to trap errors as they arise
Use explicit error checks for all errors which are identified
Avoid error-prone arithmetic operations (multiply and divide)
–Replace with add and subtract
Never use floating-point numbers
Shut down system if exception detected (safe state)
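These guidelines might look as follows in code. This is a sketch under stated assumptions: Python integers do not overflow, so a fixed representation limit (REPRESENTATION_LIMIT, hypothetical) is checked explicitly to stand in for the overflow case, and the names ArithmeticFault, checked_add, compute_dose and safe_dose are invented for illustration. The dose is built up by repeated addition of integers rather than by multiplication, and any detected fault leads to a zero dose and a shutdown state.

```python
REPRESENTATION_LIMIT = 255  # hypothetical largest value the dose register can hold


class ArithmeticFault(Exception):
    """Raised when a result cannot be represented."""


def checked_add(a, b):
    """Integer addition with an explicit range check on the result."""
    total = a + b
    if not 0 <= total <= REPRESENTATION_LIMIT:
        raise ArithmeticFault(f"result {total} out of range")
    return total


def compute_dose(steps, units_per_step):
    """Compute steps * units_per_step using addition only."""
    dose = 0
    for _ in range(steps):
        dose = checked_add(dose, units_per_step)
    return dose


def safe_dose(steps, units_per_step):
    """Return (dose, state); on any arithmetic fault go to the safe state."""
    try:
        return compute_dose(steps, units_per_step), "running"
    except ArithmeticFault:
        return 0, "shutdown"  # safe state: deliver nothing, stop the system
```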

Safety validation
Design validation
–Checking the design to ensure that hazards do not arise or that they can be handled without causing an accident
Code validation
–Testing the system to check the conformance of the code to its specification and to check that the code is a true implementation of the design
Run-time validation
–Designing safety checks that run while the system is in operation to ensure that it does not reach an unsafe state
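A run-time safety check could take the form of a ‘realism’ check applied to every computed dose before it is delivered. The limits MAX_SINGLE_DOSE and MAX_DAILY_DOSE and the function validated_dose below are hypothetical values and names chosen for this sketch; a real pump would take its limits from the safety requirements specification.

```python
MAX_SINGLE_DOSE = 4  # hypothetical limit on one injection, in insulin units
MAX_DAILY_DOSE = 25  # hypothetical limit on the total delivered in 24 hours


def validated_dose(computed, delivered_today):
    """Return the dose that may actually be delivered.

    The computed dose is never trusted as-is: it is forced to be
    non-negative, capped at the single-dose limit, and capped again so
    that the daily total cannot exceed the daily limit.
    """
    if computed < 0:
        return 0
    dose = min(computed, MAX_SINGLE_DOSE)
    remaining_today = max(MAX_DAILY_DOSE - delivered_today, 0)
    return min(dose, remaining_today)
```

Capping rather than rejecting an implausible dose is one possible design choice; halting the system and raising an alarm would be an equally reasonable response to a dose that fails the check.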