Death by Software: The Therac-25 Radio-Therapy Device. Brian MacKay, ESE6361 - Requirements Engineering, Fall 2013.

Presentation transcript:


The Atomic Age
- World War II ushered in the atomic age
- It also started the nuclear arms race
- In many countries, the question was how to harness this power for peaceful purposes

In Canada: AECL
- Atomic Energy of Canada Limited (AECL) is a “Crown Corporation”
- Designed and implemented a heavy-water nuclear reactor, the CANDU system
- It also included AECL-Medical: harnessing the atom for medical purposes

AECL & CGR: Medical Accelerator Technology
- AECL-Medical and the French company la Compagnie Générale de Radiologie (CGR) worked together during the 1970s on using linear accelerators for radio-therapy
- High energy, low dose, delivered as either
  - Electron beams, or
  - A stream of photons in the X-Ray spectrum
- The two companies’ partnership produced
  - The 6 MeV, X-Ray only “Therac-6”
  - The dual mode, 20 MeV “Therac-20”

Therac-6 & Therac-20
- Stand-alone electro-mechanical units
- The operator could
  - Set all settings manually
  - Position beam devices manually
  - Deliver the dose once everything was set and the system was “safe”
- The system had an optional computer that allowed a simpler UI
  - A Digital Equipment PDP-11 with memory measured in kilobytes
  - All assembly code

True Innovation: the Therac-25
- AECL only: the CGR partnership had dissolved
- Used a double-pass accelerator
  - Halved the space that the Therac-6 & Therac-20 had occupied
- Made the computer the primary controller
  - No stand-alone manual mode
- Shipped in 1983
- Still used a DEC PDP-11

It was the best on the market…
- Except…
- It seriously injured 6 patients between 1985 and 1987
- Killing 3 of those patients
- All because of software

Hubris
- When an engineer graduates in Canada, he/she attends The Ritual of the Calling of an Engineer and gets an Iron Ring
- Rudyard Kipling wrote the ceremony
- It instills a sense of professionalism and humility

Supreme Faith in Software
- The device appears to have had rigorous safety engineering on the hardware side
  - Complete hazard analysis with a fault tree
- On the software side, the likelihood of error was described in insanely low terms
  - Fault probabilities were quoted at extremely small orders of magnitude
  - “Software does not degrade due to wear, fatigue or the reproduction process”
- They had no expectation that a bug could cause a problem

Malfunction 54
- When there was a problem, the UI displayed the word “Malfunction” followed by a number from 1 to 64
- The user manual contained NO documentation of what these codes meant
- An internal AECL service manual described #54 as “dose input 2” and noted that the code existed only for internal diagnostic purposes
- Under normal conditions, an operator might see as many as 40 malfunction codes in a day
- The codes were easily dismissed by pressing [P] (for “Proceed”)
- But Malfunction 54 itself was very rare

Electron Mode vs. X-Ray Mode
- In Electron mode, a low power beam is scanned across the patient
- In X-Ray mode, a high power beam is aimed at a target, producing X-Rays, which then irradiate the patient
- The electron scanning mechanism and the X-Ray target were mounted on a turntable
- The turntable position was controlled by the computer

Usability
- The user interface was a VT-100 green screen
- It contained the prescription, entered by the operator
- Originally, on any error, the whole prescription had to be re-entered
- Usability studies changed this near the end of the development cycle
- The change introduced a major error

Operator screen shown on the slide (numeric field values not preserved):

  PATIENT NAME : JOHN DOE
  TREATMENT MODE : FIX    BEAM TYPE: X    ENERGY (MeV): 25

                               ACTUAL    PRESCRIBED
  UNIT RATE/MINUTE
  MONITOR UNITS
  TIME (MIN)
  GANTRY ROTATION (DEG)                                VERIFIED
  COLLIMATOR ROTATION (DEG)                            VERIFIED
  COLLIMATOR X (CM)                                    VERIFIED
  COLLIMATOR Y (CM)                                    VERIFIED
  WEDGE NUMBER                 1         1             VERIFIED
  ACCESSORY NUMBER             0         0             VERIFIED

  DATE   : 84-OCT-26    SYSTEM : BEAM READY     OP.MODE: TREAT  AUTO
  TIME   : 12:55. 8     TREAT  : TREAT PAUSE             X-RAY
  OPR ID : T25VO2-RO3   REASON : OPERATOR       COMMAND:

A Race Condition: UI & Operations Threads
- On the Therac-25, the operator entered
  - The prescription information
  - The Electron/X-Ray mode
  - Then a command to execute
- If the operator
  - Entered an X-Ray command in error
  - Re-edited the page and changed it to Electron
  - Then executed the dose, all within 8 seconds
- Then the patient was given an X-Ray dose directly through the Electron turntable element
- The console displayed “Malfunction 54”
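The sequence on this slide can be modelled with a small, single-threaded simulation. This is only a sketch under stated assumptions, not AECL's code (which was PDP-11 assembly): the names `SETUP_SECONDS`, `simulate` and the `shared` dictionary are hypothetical, and the model assumes just what the slide says, namely that an edit made within 8 seconds of the first entry was not seen by the task that configured the beam.

```python
SETUP_SECONDS = 8  # window quoted on the slide; edits inside it were missed


def simulate(edit_delay_seconds):
    """Toy model: operator enters X-Ray, edits to Electron, then fires."""
    # Hypothetical shared state with no synchronization between tasks.
    shared = {"mode": "X"}

    # The setup task samples the mode once, then is busy for SETUP_SECONDS.
    beam_mode = shared["mode"]

    # The operator corrects the entry after edit_delay_seconds.
    shared["mode"] = "E"

    # Only an edit made after the busy window is seen by the setup task.
    if edit_delay_seconds >= SETUP_SECONDS:
        beam_mode = shared["mode"]

    # The turntable follows whatever the screen currently shows.
    turntable = shared["mode"]

    return {
        "beam_mode": beam_mode,
        "turntable": turntable,
        "hazard": beam_mode == "X" and turntable == "E",
    }


if __name__ == "__main__":
    print(simulate(3))   # fast edit: X-Ray beam with the turntable in Electron position
    print(simulate(10))  # slow edit: beam and turntable agree on Electron mode
```

The point of the model is that the hazard depends only on operator timing, which is why a fast, experienced operator could trigger it while a slower one never would.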

Why Have One Deadly Bug?
- A second deadly bug was eventually found in the Therac-25
- The system periodically tested whether everything was positioned properly and stored the result of the test in a variable
- A zero indicated OK
- Instead of simply setting the value to 1 or 0, the program incremented the value
- And the variable was a single byte
- The result: on every 256th test of the positioning, the value wrapped around to zero and the system falsely indicated that everything was ready to proceed
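The wrap-around arithmetic can be checked with a few lines of Python. This is an illustrative sketch, not the original assembly: the function name `flag_after` and its `buggy` switch are hypothetical, and the one-byte variable is emulated with a modulo-256 increment.

```python
def flag_after(failed_checks, buggy=True):
    """Return the one-byte status flag after N failed position checks.

    A value of 0 is read by the rest of the system as "positioned OK".
    """
    flag = 0
    for _ in range(failed_checks):
        if buggy:
            flag = (flag + 1) % 256  # increment wraps around at one byte
        else:
            flag = 1                 # fix: assign a fixed nonzero value
    return flag


if __name__ == "__main__":
    print(flag_after(255))               # still nonzero: problem reported
    print(flag_after(256))               # wrapped to 0: falsely reads as OK
    print(flag_after(256, buggy=False))  # stays nonzero with the fix
```

Assigning a constant instead of incrementing removes the dependence on how many times the check has run.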

Noteworthy: The Users Found the Bugs
- It’s worth noting that AECL’s initial reaction to the problems was denial
- Eventually, they got to the stage where they did piecemeal fixes
- Without the efforts of the staff at the East Texas Cancer Center in Tyler, AECL might never have acknowledged the first bug
  - After two accidents with the same operator, the staff spent time trying to recreate the race condition
- After the Therac-25, the FDA changed the way it evaluated software (and software engineering) in medical devices

The Scorecard

  Cause                              Total Accidents   Deaths
  Malfunction 54 Race Condition      3                 2
  Incorrect Increment Logic          3                 1*
  Total                              6                 3

* One patient died of cancer, but would have died of radiation poisoning in a few weeks had the cancer not killed him

Not the Bugs: The Software Engineering
- All software systems have bugs
  - Even Knuth hands out the occasional $2.56 check
- AECL coalesced their entire operator interface, control system and safety system into one program
- They apparently had very little in the way of formal requirements gathering, design or development standards
- All of the software was developed by one programmer
- Their reaction to the problems was to fix them one at a time

Software Reuse
- The Therac-20 reused some of the software from the Therac-6
- The Therac-25 reused software from both of the previous models
- But the earlier models had hardware interlocks to prevent over-dosing
- The desire to reuse previous software resulted in
  - A home-made real-time operating system
  - On an expensive, 10 year old computer system
  - Running a program written entirely in assembly language
  - That relied on global variables for inter-task communication, without synchronization
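The last point, global variables shared between tasks without synchronization, is what made the race condition possible. One conventional remedy is sketched below in Python. It is not something AECL implemented, and the class `Prescription`, its revision counter and the helper `safe_to_fire` are all hypothetical names chosen for illustration: the shared state sits behind a lock, and every edit bumps a revision number so a consumer can tell whether its snapshot is stale.

```python
import threading


class Prescription:
    """Shared prescription guarded by a lock, with a revision counter."""

    def __init__(self, mode):
        self._lock = threading.Lock()
        self._mode = mode
        self._revision = 0

    def edit(self, mode):
        # Every operator edit changes the mode and bumps the revision.
        with self._lock:
            self._mode = mode
            self._revision += 1

    def snapshot(self):
        # Mode and revision are read together, so they are always consistent.
        with self._lock:
            return self._mode, self._revision


def safe_to_fire(prescription, configured_mode, configured_revision):
    """Allow the beam only if nothing changed since the hardware was set up."""
    mode, revision = prescription.snapshot()
    return mode == configured_mode and revision == configured_revision


if __name__ == "__main__":
    rx = Prescription("X")
    mode, revision = rx.snapshot()           # hardware set up for this snapshot
    rx.edit("E")                             # operator edits during setup
    print(safe_to_fire(rx, mode, revision))  # stale snapshot is rejected
```

With a check like this, a late edit forces the setup to be repeated instead of being silently ignored.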

No Requirement to Separate Layers
- AECL architected the Therac-25’s software into a single point of failure
- This was far from accepted practice in the early 1980s
- Safety systems were migrating from hardware to software
  - But they were usually separate, simpler systems, e.g. PLCs
- By the early 80s, there were usually three distinct layers
  - Safety and integrity
  - Control and positioning
  - Operator interface and supervisory
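A rough sketch of what a separate safety layer means in code is given below. It is purely illustrative: `SafetyInterlock`, `Controller` and the sensor callback are hypothetical names, and a real interlock of the period would have been hardware or a dedicated PLC, not a Python class. The idea shown is that the safety layer reads the turntable position from its own sensor and can veto the control layer regardless of what the control software believes.

```python
class SafetyInterlock:
    """Independent safety layer: trusts only its own sensor reading."""

    def __init__(self, read_turntable_position):
        # Hypothetical callback returning "X" or "E" from a position sensor.
        self._read_position = read_turntable_position

    def permits(self, beam_mode):
        # The beam is allowed only if the physical position matches the mode.
        return self._read_position() == beam_mode


class Controller:
    """Control layer: must ask the interlock before enabling the beam."""

    def __init__(self, interlock):
        self._interlock = interlock

    def fire(self, beam_mode):
        if not self._interlock.permits(beam_mode):
            return "BLOCKED"
        return "BEAM ON"


if __name__ == "__main__":
    # Turntable is physically in the Electron position.
    controller = Controller(SafetyInterlock(lambda: "E"))
    print(controller.fire("X"))  # mismatch between beam and turntable
    print(controller.fire("E"))  # consistent configuration
```

Because the interlock does not share state with the controller, a bug in the control or interface layer cannot by itself turn the beam on in an inconsistent configuration.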

Testability and Auditing
- AECL’s task architecture and real-time OS made adequate testing nearly impossible
  - Look at the deadly errors: neither is readily discoverable through routine testing
- No auditing of operations or failures was included in the system
- After all the issues with the Therac-25, a check was done on the Therac-20 system and the same bugs were found
  - But, because that system had mechanical interlocks, no injuries resulted

References
- “Medical Devices – The Therac-25”, Leveson, Nancy.
- “An Investigation of the Therac-25 Accidents”, Leveson, Nancy and Turner, Clark S., IEEE Computer, Vol. 26, No. 7, July 1993.
- “Fatal Dose - Radiation Deaths linked to AECL Computer Errors”, Rose, Barbara Wade, Saturday Night (magazine), June.
- “Safety-Critical Computing: Hazards, Practices, Standards, and Regulation”, Jacky, Jonathan.
- “Therac-25”, Wikipedia
- “PDP-11”, Wikipedia
- “PDP-11 architecture”, Wikipedia