
Errors, Failures and Risks CS4020

Overview
Failures and Errors in Computer Systems
Case Study: The Therac-25
Increasing Reliability and Safety
Dependence, Risk, and Progress

Failures and Errors in Computer Systems
Most computer applications are so complex that it is virtually impossible to produce programs with no errors
A failure often has more than one cause
Computer professionals must study failures to learn how to avoid them
Computer professionals must study failures to understand the impacts of poor work

An Example – Billing Errors
Cause: Inaccurate and misinterpreted data in databases
–Large population where people may share names
–Automated processing may not be able to recognize special cases
–Overconfidence in the accuracy of data
–Errors in data entry
–Lack of accountability for errors
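One way to act on "overconfidence in the accuracy of data" is to route implausible bills to a person before they are sent. The following is a minimal sketch under stated assumptions, not a description of any real billing system: the function name `needs_review`, the `factor` threshold, and the median-based rule are all hypothetical.

```python
from statistics import median


def needs_review(new_amount, past_amounts, factor=10.0):
    """Return True when a bill should be checked by a person before sending.

    Hypothetical rule: flag non-positive amounts, customers with no billing
    history to compare against, and amounts more than `factor` times the
    customer's median past bill.
    """
    if new_amount <= 0 or not past_amounts:
        return True
    return new_amount > factor * median(past_amounts)
```

A bill of 6300.00 for a customer whose past bills are near 60.00 would be flagged, while a bill of 63.00 would pass. The threshold is a judgment call: too low and reviewers are flooded, too high and data-entry errors slip through.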

Some more examples of failure
AT&T, Amtrak, and NASDAQ systems have all had failures
–AT&T Wireless Services Inc. executives said that a massive software failure in November left the company unable to sign up several hundred thousand new subscribers
–NASDAQ had power failures, system software failures in reporting closing prices, etc.
–Amtrak had a system error that prevented passengers from buying tickets online and at station kiosks across the country for a weekend
Voting system in the 2000 presidential election
–Irregularities in Florida: a wide range of errors, including inadequate resources, caused a significant breakdown in the state’s plan and a variety of problems that permeated the election process. Large numbers of Florida voters experienced frustration and anger on Election Day as they endured excessive delays, misinformation, and confusion, which resulted in the denial of their right to vote or to have their vote counted.

Some more examples of failure
Denver Airport
–Mid-1990s: the software that controls its automated baggage system malfunctioned. Scheduled to open by Halloween, the airport's grand opening was postponed until December to allow BAE Automated Systems time to flush the gremlins out of its $193-million system. December yielded to March. March slipped to May. In June the airport's planners, their bond rating demoted to junk and their budget hemorrhaging red ink at the rate of $1.1 million a day in interest and operating costs, conceded that they could not predict when the baggage system would stabilize enough for the airport to open.
Ariane 5 Rocket
–Ariane 5's first test flight (Ariane 5 Flight 501) on 4 June 1996 failed: the rocket self-destructed 37 seconds after launch because of a malfunction in the control software, arguably one of the most expensive computer bugs in history.

What was the problem?
Denver Airport: the baggage system failed because of real-world problems, problems in other systems, and software errors
Main causes:
–Time allowed for development was insufficient
–Denver made significant changes in specifications after the project began

High-level Causes of Computer-System Failures
Lack of clear, well-thought-out goals and specifications
Poor management and poor communication among customers, designers, programmers, etc.
Pressures that encourage unrealistically low bids, low budget requests, and underestimates of time requirements
Use of very new technology, with unknown reliability and problems
Refusal to recognize or admit a project is in trouble

Safety-Critical Applications
We need to be especially careful when creating applications dealing with health and safety.
Example: A-320: "fly-by-wire" airplanes (many systems are controlled by computers and not directly by the pilots)
–Between four planes crashed
Air traffic control is extremely complex, and includes computers on the ground at airports, devices in thousands of airplanes, radar, databases, communications, and so on, all of which must work in real time, tracking airplanes that move very fast
In spite of problems, computers and other technologies have made air travel safer

Case Study: The Therac-25
Therac-25 Radiation Overdoses:
Radiation therapy machine produced by Atomic Energy of Canada Limited (AECL) and CGR MeV of France
Massive overdoses of radiation were given; the machine reported that no dose had been administered at all
Caused severe and painful injuries and the deaths of three patients (+)
Important to study to avoid repeating errors
The manufacturer, the computer programmer, and the hospitals/clinics all bear some responsibility

Case Study: The Therac-25
Software and Design Problems:
Re-used software from older systems, unaware of bugs in previous software
Weaknesses in design of operator interface
Inadequate test plan
Bugs in software
–Allowed beam to deploy when table not in proper position
–Ignored changes and corrections operators made at console
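The two bugs listed above share a pattern: the beam was enabled on the basis of stale or unverified state. The following is a minimal sketch of a software interlock that re-checks that state at the moment of firing. It is an illustration only and does not reflect how the Therac-25 software was written; `MachineState`, its fields, `InterlockError`, and `fire_beam` are all hypothetical names.

```python
from dataclasses import dataclass


class InterlockError(Exception):
    """Raised when a safety check fails and the beam must not fire."""


@dataclass
class MachineState:
    # All fields are hypothetical and exist only to illustrate the idea.
    prescribed_mode: str   # mode on the operator console after the last edit
    configured_mode: str   # mode the hardware was actually set up for
    table_position: str    # position reported by a sensor at firing time


def fire_beam(state: MachineState) -> str:
    """Enable the beam only if the current state passes every check."""
    # Operator corrections must have reached the hardware setup.
    if state.configured_mode != state.prescribed_mode:
        raise InterlockError("operator edits were not applied to the setup")
    # The table must be in the position that matches the prescribed mode.
    if state.table_position != state.prescribed_mode:
        raise InterlockError("table is not in position for the prescribed mode")
    return f"beam on: {state.prescribed_mode}"
```

The design choice here is to fail closed: any disagreement between what the operator entered, what the hardware is set up for, and what the sensor reports raises an error instead of firing. A software check like this would complement, not replace, independent hardware interlocks.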

Case Study: The Therac-25
Why So Many Incidents?
Hospitals had never seen such massive overdoses before and were unsure of the cause
The manufacturer said the machine could not have caused the overdoses and that no other incidents had been reported (which was untrue)
After the second accident, the manufacturer made changes to the turntable and claimed they had improved safety; the changes did not correct any of the causes identified later

Case Study: The Therac-25
Why So Many Incidents? (cont.)
Recommendations were made for further changes to enhance safety; the manufacturer did not implement them
The FDA declared the machine defective after the fifth accident
The sixth accident occurred while the FDA was negotiating with the manufacturer on what changes were needed

Case Study: The Therac-25
Observations and Perspective:
Minor design and implementation errors usually occur in complex systems; they are to be expected
The problems in the Therac-25 case were not minor and suggest irresponsibility
Accidents occurred on other radiation treatment equipment without computer controls when the technicians:
–Left a patient after treatment started to attend a party
–Did not properly measure the radioactive drugs
–Confused micro-curies and milli-curies

Discussion Question
If you were a judge who had to assign responsibility in this case, how much responsibility would you assign to the programmer, the manufacturer, and the hospital or clinic using the machine?
Post your answers to the Discussion Board

Increasing Reliability and Safety
What Goes Wrong?
Design and development problems
Management and use problems
Misrepresentation, hiding problems, and inadequate response to reported problems
Insufficient market or legal incentives to do a better job
Re-use of software without sufficiently understanding the code and testing it
Failure to update or maintain a database

Making it Better: Professional Techniques
Importance of good software engineering and professional responsibility
User interfaces and human factors
–Feedback
–Should behave as an experienced user expects
–Workload that is too low can lead to mistakes
Redundancy and self-checking
Testing
–Include real world testing with real users
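"Redundancy and self-checking" can be illustrated with a small sketch: take the same value from several independent sources, accept it only when a strict majority agree, and then check that the agreed value falls in a plausible range. This is one common pattern, shown under stated assumptions; the function names `majority_vote` and `checked_value` and the range limits are hypothetical.

```python
from collections import Counter


def majority_vote(readings):
    """Return the value reported by a strict majority of redundant sources.

    Raises ValueError when there are no readings or no value has a majority.
    """
    if not readings:
        raise ValueError("no readings")
    value, count = Counter(readings).most_common(1)[0]
    if count * 2 <= len(readings):
        raise ValueError("no majority among redundant readings")
    return value


def checked_value(readings, low, high):
    """Self-check: the agreed value must also lie within [low, high]."""
    value = majority_vote(readings)
    if not (low <= value <= high):
        raise ValueError(f"value {value} outside plausible range")
    return value
```

Voting guards against one faulty source; the range check guards against all sources agreeing on an implausible value. Neither helps if the sources share the same bug, which is why redundant components are often developed independently.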

Law, Regulation and Markets
Make it better by:
Criminal and civil penalties
–Should penalize problems and provide incentives to produce good systems, without inhibiting innovation
Warranties for consumer software
–Most software is sold ‘as-is’
Regulation for safety-critical applications
–Hard to do, but could prevent failures
Professional licensing
–Arguments for and against
Taking responsibility

Dependence, Risk, and Progress
Are We Too Dependent on Computers?
–Computers are tools
–They are not the only dependence (for example, electricity)
Risk and Progress
–Many new technologies were not very safe when they were first developed
–We develop and improve new technologies in response to accidents and disasters
–We should compare the risks of using computers with the risks of other methods and the benefits to be gained

Discussion Questions
Err.3) Do you believe we are too dependent on computers? Why or why not?
Err.4) In what ways are we safer due to new technologies?
Post your answers to the discussion board.