EE 585 : FAULT TOLERANT COMPUTING SYSTEMS B.RAM MOHAN


THERAC 25
RAM MOHAN, EE 585: CASE STUDY

Background
- The Therac 25 accidents were described as the most serious computer-related accidents to date (Leveson and Turner, 1993).
- Therac 25 was a medical linear accelerator (linac) developed by Atomic Energy of Canada Ltd (AECL).
- It was a radiotherapy machine used to destroy tumors with high-energy beams.
- 11 Therac 25s were installed: 5 in the US and 6 in Canada.
- For shallow tissue penetration, the electron beam was used directly; to reach deeper tissue, the beam was converted into X-rays.

Background (Contd.)
Therac 25 was derived from two earlier models, Therac 6 and Therac 20.
Differences from Therac 20:
- Uses a double-pass technique, which is absent in the previous versions
- Software is responsible for safety
- Hardware safety interlocks removed
- More compact and more economical

Modes Of Operation

Set Up Of The Machine

General Layout

Therac-25 Turntable
[Figure: turntable assembly. Labels: field-light mirror, counterweight, beam flattener (X-ray mode), turntable, scan magnet (electron mode)]

Accidents
- 3 June 1985: patient at Marietta, GA received an overdose
- 26 July 1985: Hamilton, ONT patient severely burned; died November 1985
- December 1985: patient in Yakima, WA received an overdose
- 21 March 1986: Tyler, TX accident
- 11 April 1986: second Tyler, TX accident
- 17 January 1987: second Yakima, WA accident

Responses
- 3 June 1985, Marietta, GA: not recognised as an overdose until after the Tyler incident
- 26 July 1985, Hamilton, ONT: no dose indication was shown; an overdose was not suspected until the patient returned; a microswitch malfunction was suspected and fixed
- December 1985, Yakima, WA: not ascribed to an overdose until the second incident
- 21 March 1986, Tyler, TX: "Malfunction 54"; operator override; blamed on an "electrical surge"
- 11 April 1986, Tyler, TX: thought to be an editing error; up-arrow key disabled
- 17 January 1987, Yakima, WA: all systems shut down; complete investigation and rework

Manufacturer, government, and user response: On February 3, 1987, after interaction with the FDA and others, including the user group, AECL announced to its customers a new software release to correct both the Tyler and Yakima software problems, a hardware single-pulse shutdown circuit, a turntable potentiometer to independently monitor turntable position, and a hardware turntable interlock circuit.

Why?
- The turntable was in the wrong position, so patients received X-rays without beam scattering
- No hardware safety interlocks
- Non-descriptive error messages
- Error modes that the user could override
- Software designed by only one person

Cost of the Bug
To users (patients):
- Four deaths, two other serious injuries
To developers (AECL):
- One lawsuit, settled out of court
- Time and money to investigate and fix the bugs
To product owners (11 hospitals):
- System downtime

Corrective Action Plan
Summary:
- Numerous hardware and software changes
- All interruptions related to dosimetry are no longer continuable
- Independent hardware and software shutdowns
- Potentiometer on the turntable
- Hardware interlocks
- "Dead man's switch" motion enable
- Fixes to documentation, messages, and user manuals

Details:
- All interruptions related to the dosimetry system will go to a treatment suspend, not a treatment pause. Operators will not be allowed to restart the machine without reentering all parameters.
- A software single-pulse shutdown will be added.
- An independent hardware single-pulse shutdown will be added.
- Monitoring logic for turntable position will be improved to ensure that the turntable is in one of the three legal positions.
- A potentiometer will be added to the turntable. It will provide a visible signal of position that operators will use to monitor exact turntable location.
- Interlocking with the 270-degree bending magnet will be added to ensure that the target and beam flattener are in position if the X-ray mode is selected.
- Beam on will be prevented if the turntable is in the field-light or an intermediate position.
- Cryptic malfunction messages will be replaced with meaningful messages and highlighted dose-rate messages.
- Editing keys will be limited to cursor up, backspace, and return. All other keys will be inoperative.
- A motion-enable foot switch will be added, which the operator must hold closed during movement of certain parts of the machine to prevent unwanted motions when the operator is not in control (a type of "dead man's switch").
- Twenty-three other changes will be made to the software to improve its operation and reliability, including disabling of unused keys, changing the operation of the set and reset commands, preventing copying of the control program on site, changing the way various detected hardware faults are handled, eliminating errors in the software that were detected during the review process, adding several additional software interlocks, disallowing changing to the service mode while a treatment is in progress, and adding meaningful error messages.
- The known software problems associated with the Tyler and Yakima accidents will be fixed.
- The manuals will be fixed to reflect the changes.
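The turntable interlock rule in the plan (beam on only when the turntable position matches the selected mode, never in the field-light or an intermediate position) can be illustrated with a short sketch. This is a hypothetical Python illustration, not AECL's code: the names Mode, Turntable, SAFE_PAIRS, InterlockError, beam_permitted, and beam_on are all assumptions made for this example, and a real machine would enforce the same rule in independent hardware as well.

```python
from enum import Enum


class Mode(Enum):
    ELECTRON = "electron"
    XRAY = "x-ray"


class Turntable(Enum):
    # Three legal positions, plus a catch-all for "in between".
    SCAN_MAGNET = "scan magnet"        # electron-mode position
    BEAM_FLATTENER = "beam flattener"  # X-ray-mode position
    FIELD_LIGHT = "field light"        # setup only, never for treatment
    INTERMEDIATE = "intermediate"      # not at any legal position


# The only (mode, position) pairs for which the beam may be enabled.
# Hypothetical table written for this sketch.
SAFE_PAIRS = {
    (Mode.ELECTRON, Turntable.SCAN_MAGNET),
    (Mode.XRAY, Turntable.BEAM_FLATTENER),
}


class InterlockError(RuntimeError):
    """Raised when the interlock refuses to enable the beam."""


def beam_permitted(mode: Mode, position: Turntable) -> bool:
    """Return True only if the sensed position is valid for the mode."""
    return (mode, position) in SAFE_PAIRS


def beam_on(mode: Mode, position: Turntable) -> str:
    """Enable the beam, or raise with a message that names the cause."""
    if not beam_permitted(mode, position):
        raise InterlockError(
            f"Beam inhibited: turntable at '{position.value}' "
            f"is not valid for {mode.value} mode"
        )
    return f"beam on ({mode.value} mode)"
```

Listing the safe combinations explicitly means any position not in the table, including one the designer never thought of, is rejected by default. The error text also names the cause, in contrast to a cryptic code such as "Malfunction 54".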

Lessons Learned
- For complex interrupt-driven software, timing is of critical importance
- Do not remove standard hardware interlocks when adding computer control
- Revalidate reused software
- Do not over-rely on software

In a 1987 paper, Miller, director of the Division of Standards Enforcement, CDRH, wrote about the lessons learned from the Therac-25 experiences.[6] The first was the importance of safe versus "user-friendly" operator interfaces: making the machine as easy as possible to use may conflict with safety goals. The second is the importance of providing fail-safe designs:

"The second lesson is that for complex interrupt-driven software, timing is of critical importance. In both of these situations, operator action within very narrow time-frame windows was necessary for the accidents to occur. It is unlikely that software testing will discover all possible errors that involve operator intervention at precise time frames during software operation. These machines, for example, have been exercised for thousands of hours in the factory and in the hospitals without accident. Therefore, one must provide for prevention of catastrophic results of failures when they do occur. I, for one, will not be surprised if other software errors appear with this or other equipment in the future."
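The timing lesson can be made concrete with a small sketch of one defensive pattern: record which revision of the operator's parameters the hardware was set up for, and refuse to fire if the parameters changed afterwards. This is a hypothetical Python illustration under stated assumptions, not the fix AECL shipped; the classes Prescription and Machine, their methods, and the revision counter are all invented for this example.

```python
import threading


class Prescription:
    """Operator-entered parameters with a revision counter (hypothetical)."""

    def __init__(self, mode: str, dose: int):
        self._lock = threading.Lock()
        self._mode = mode
        self._dose = dose
        self._revision = 0

    def edit(self, mode=None, dose=None):
        """Apply an operator edit; every edit bumps the revision."""
        with self._lock:
            if mode is not None:
                self._mode = mode
            if dose is not None:
                self._dose = dose
            self._revision += 1

    def snapshot(self):
        """Return a consistent (revision, mode, dose) tuple."""
        with self._lock:
            return (self._revision, self._mode, self._dose)


class Machine:
    """Tracks which snapshot the hardware was configured for."""

    def __init__(self, prescription: Prescription):
        self.prescription = prescription
        self.configured = None

    def set_up(self):
        # The slow hardware setup (magnets, turntable) would run here,
        # using only the values captured in this snapshot.
        self.configured = self.prescription.snapshot()

    def beam_on(self):
        """Fire only if nothing was edited since the last set_up()."""
        if self.configured is None:
            raise RuntimeError("Beam inhibited: machine has not been set up")
        if self.configured != self.prescription.snapshot():
            raise RuntimeError(
                "Beam inhibited: parameters were edited after setup; "
                "repeat setup before treatment"
            )
        return self.configured
```

With this check, an edit that lands in the narrow window between set_up() and beam_on() makes the machine refuse to fire instead of treating with stale settings. A real controller would also have to block edits while the beam is on, and, as the quotation above argues, should still rely on independent hardware protection, because testing is unlikely to exercise every timing window.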

References
Nancy G. Leveson and Clark S. Turner, "An Investigation of the Therac-25 Accidents," IEEE Computer, July 1993. www.bowdoin.edu/~allen/courses/cs260/readings/therac.pdf