Chapter 8: Errors, Failures, and Risk Zach Archer Daniel O’Hara Eric Strittmatter
Overview Errors and Failures – Problems for Individuals – Failures that Affect Populations – Problems in Safety-Critical Applications Therac-25: A Case Study Increasing Reliability and Safety – Reuse of Software – Failure to Update – Professional Techniques – Law, Regulation, and Markets Discussion
Errors and Failures Many factors – Faulty Design – Sloppy Implementation – Careless/Insufficiently Trained Users – Poor User Interface
Categorization Problems for Individuals – Generally as consumers Failures that Affect Populations – Cost large amounts of money Problems in Safety-Critical Applications – May injure or kill
Problems for Individuals Billing Errors – Programming based Limits Variable representation Database Inaccuracy – Management based Updating data Poor data consistency
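The "variable representation" point can be made concrete with a small sketch. It assumes a billing routine that sums prices stored as binary floating-point numbers; the function names and sample prices are hypothetical, chosen only for illustration. It shows how a float total can miss an exact cent value while a decimal total does not.

```python
from decimal import Decimal

def total_float(prices):
    """Sum prices stored as binary floats (hypothetical billing routine)."""
    total = 0.0
    for price in prices:
        total += price
    return total

def total_decimal(prices):
    """Sum prices given as decimal strings, which keeps cents exact."""
    return sum((Decimal(price) for price in prices), Decimal("0"))

# Three items at 10 cents each.
float_total = total_float([0.10, 0.10, 0.10])
exact_total = total_decimal(["0.10", "0.10", "0.10"])

# The float sum is not exactly 0.30, so an equality check on the bill fails.
print(float_total == 0.30)
# The decimal sum matches the expected amount exactly.
print(exact_total == Decimal("0.30"))
```

A single mismatch like this is tiny, but a billing system that compares or rounds such totals many times a day can accumulate visible errors.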
Failures that Affect Populations Communications – Software Updates – Device Dependency Business – Data Loss – Inverse Effects – Marketing Dishonesty “The honest computing professional will not make deliberately false or deceptive claims about a system or system design”
Failures that Affect Populations (cont’d) Voting Systems – Data Leaks – Verification – System Hacking Airports – Large Complexity – Overconfidence
Abandoned and Legacy Systems Abandoned Systems – Cost roughly 40 million to 4 billion dollars – “Hopelessly Inadequate” – 5% to 15% of projects abandoned, per 1 trillion dollars spent Legacy Systems – Old Software and Hardware on New Systems – Generally Creates Problems Original Programmers Gone Documentation Lost Obscure variable names Extinct Language
Problems in Safety-Critical Applications Air Traffic Control – Automated Airplanes Unexpected behavior Pilots vs. system
Trust and Acceptance Computers Do Help… – Ground-Proximity Warning System – Traffic Collision Avoidance System Drawing the Line – Error Ratio Computer creates 10,000 incorrect checks per day – Higher When Dealing with Human Life? – “Well-intended actions, including those that accomplish assigned duties, may lead to harm unexpectedly. In such an event the responsible person or persons are obligated to undo or mitigate the negative consequences as much as possible.”
Therac-25 Case Study
Therac-25 Classic case study – Deadly software failure – Radiation treatment machine Software controlled Cancer treatment
Therac-25 Manufacturer – Atomic Energy of Canada Limited (AECL) Government corporation
Therac-25 – 4 different medical centers – Massive overdoses of radiation to 6 patients 13,000 to 25,000 rads given; 100 to 200 intended Multiple doses due to display error 3 dead
Therac-25 Factors – Safety design – Insufficient testing – Bugs – Inadequate reporting and investigating
Therac-25 Design Flaws – Manufacturer oversight – Malfunctioned frequently – Generally underdoses – Operators were used to errors – Overlooked – Operator Interface – Number of issues
Therac-25 Design Flaws – Earlier versions (Therac-6, Therac-20) Hardware safety mechanisms – Independent of computer – First fully computer controlled Used same software as the earlier models – Assumed to be safe Frequent shutdowns, blown fuses – Some bugs
Therac-25 Why Study? – Avoid repeating history Panama (2000) – Different machine, similar issue – 28 overdoses, several deaths – Risk Assessment and Ethical Questions
Therac-25 Stakeholders?
Therac-25 Manufacturer Government Agencies Hospitals/Physicians Patients Family
Therac-25 Responsibility?
Therac-25 Software Developers System Engineers Physicians AECL Government Agencies
Therac-25 Risk?
8.3 Increasing Reliability and Safety What Goes Wrong – Two general reasons Job is too difficult Job is done poorly – Now interact with the real world Complex communications Unpredictable humans Numerous features
Overconfidence Developers and users need to appreciate the risk Backing up files A320 airplane Two programming teams Unrealistic reliability or safety estimates Carelessness
The Reuse of software Ariane 5 rocket – Veered off course – Rocket and satellites destroyed – 500 million dollars “No Fly” list
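The Ariane 5 inquiry attributed the failure to software reused from Ariane 4 that converted a 64-bit floating-point value into a 16-bit signed integer, where the new rocket produced a value too large to fit. The sketch below is only an illustration of that general hazard, not the flight code: the function names and input values are hypothetical, and the unchecked version wraps silently, whereas the real system raised an unhandled error.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_unchecked(value: float) -> int:
    """Keep only the low 16 bits, as a careless narrowing conversion might."""
    raw = int(value) & 0xFFFF
    # Reinterpret the 16-bit pattern as a signed number.
    return raw - 0x10000 if raw >= 0x8000 else raw

def to_int16_checked(value: float) -> int:
    """Refuse values outside the 16-bit signed range instead of corrupting them."""
    number = int(value)
    if not INT16_MIN <= number <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return number

# A value that was in range for the old system converts cleanly.
print(to_int16_checked(1000.0))

# A larger value from the new system silently becomes a wrong negative number.
print(to_int16_unchecked(40000.0))

# The checked version reports the problem so the caller can handle it.
try:
    to_int16_checked(40000.0)
except OverflowError as error:
    print("rejected:", error)
```

The point for reuse is that a range assumption that held in the original system has to be re-examined, and ideally enforced in code, when the software moves to a new one.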
Failure to Update Failure to update information in databases – FBI database Does not indicate whether a suspect was convicted – Foreign visitors databases Screening for terrorists Visitors stay longer than legally permitted No way for visitors to check out
User interface and human factors Good interfaces help avoid common problems Word processor User interfaces need clear instructions and error messages American Airlines Flight 965 Autopilot interface – Feedback needed – Behave like the user – Low workload is dangerous
Testing Well-planned testing of software is the most important thing Challenger space shuttle NASA called for independent Beta testing
Law, Regulation, and Markets Criminal and Civil penalties – Therac-25 Victims sued and settled out of court “catastrophic” financial system Credit reports – Limited to the money paid for the software – Many liability laws and criminal laws Help to produce good systems
Warranties for consumer software “shrink-wrap” or “click-on” Take software as-is No guarantee
Taking Responsibility Many companies pay customers for problems United Airlines Consumers can protect themselves Reviews
What are your thoughts? Are we too dependent on computers? Should there be mandatory licensing of software developers? Should software have warranties?
References Baase, S. (2008), A Gift of Fire, 3rd Edition, Pearson Education Inc. ACM (1992), ACM Code of Ethics and Professional Conduct, Accessed: 4/16/2012. Coker, R. (2012), Google Chrome and SE Linux, chrome-and-se-linux/, Accessed: 4/16/2012.
Thank You Any Questions?