Cmpt. 490 - System Error
March 30, 2001

System Error: The ValuJet Disaster
– On May 11, 1996, Flight 592, a DC-9 owned by ValuJet, took off from Miami destined for Atlanta with 110 people on board.
– Ten minutes after take-off it crashed into a swamp in the Everglades, killing everybody aboard.
– This is a classic example of a crash caused by system error; we will analyze it and try to draw lessons for information systems in safety-critical applications.
– A great Atlantic Monthly article: William Langewiesche, “The Lessons of ValuJet 592”, 281, 3, March 1998 (also available online).
– Also read: Charles Perrow, “Normal Accidents: Living with High-Risk Technologies”, 1984; Scott Sagan, “The Limits of Safety: Organizations, Accidents, and Nuclear Weapons”, 1993.

Chronology of Events
– ValuJet contracted out a maintenance job on three of its MD-80s to SabreTech, a maintenance firm based at Miami airport.
– One of the jobs was to replace the chemical oxygen generators on board, which were at the end of their licensed lifetime. (When the lanyard is pulled, two chemicals combine, producing heat and oxygen that is channeled to a mask so passengers can breathe during a depressurization.)
– The removed generators were stacked in unmarked cardboard boxes with their lanyards cut off, but without the required safety caps over their firing pins (the caps were not available).
– Nevertheless, SabreTech mechanics signed paperwork certifying that they had capped the generators (among many other things); supervisors also signed off on the work.
– Many weeks passed. Eventually, in early May, a SabreTech manager ordered a shipping clerk to clean up the area in preparation for an inspection by Continental Airlines, a potential customer for SabreTech.

Chronology (continued)
– The shipping clerk redistributed the canisters, added bubble wrap, sealed the boxes, addressed them to ValuJet headquarters in Atlanta, and labelled them “aircraft parts”; he then added three tires to the cargo to be shipped the next day.
– He asked a co-worker to add the notations “Oxy Canisters” and “Empty” to the boxes.
– On May 11 the SabreTech driver delivered the cargo to the ValuJet area, where the ramp agent accepted it for shipment on Flight 592 even though shipping such material was forbidden.
– The ramp agent and 592’s co-pilot discussed weight distribution and where to put the cargo, and eventually put it in the forward cargo hold.
– During or shortly after take-off one of the canisters ignited. A few minutes later fire engulfed the cargo hold and the cabin, smoke filled the cockpit, and the plane spiralled into the swamp, killing everybody aboard.

Factors in the Accident
– De-regulation in the industry generated intense price competition, rapid growth, lots of new employees, much contracting out, and low salaries.
– The FAA inspection regime couldn’t keep up: only three inspectors were assigned to ValuJet, yet they did identify serious problems caused by too-rapid growth. A more serious inspection was underway at the time of the accident, but it came too late.
– SabreTech mechanics were ignorant of how the oxygen generators functioned and didn’t realize how important the shipping caps were to safety: the manuals were written in “engineerspeak”, and in any case the mechanics were highly stressed with other duties and not focussed on what happened after they removed the generators.
– The caps were not available, and the mechanics signed off anyway; both are examples of a collective relaxation that sociologist Diane Vaughan calls “the normalization of deviance”. The real Murphy’s Law is “what can go wrong usually goes right”.

Factors (continued)
– There were disconnects among the various people: the mechanics never thought the canisters would be shipped; the co-pilot and ramp agent did not realize they were not capped.
– The shipping clerk was chosen to “clean up” the area, and he naturally thought in terms of shipping the cargo rather than other alternatives.
– Communication problems: the shipping clerk thought that being out of service meant the canisters were “Empty”; the mechanics did not realize that “expired” canisters were not “expended”, so warnings in the manuals about unexpended canisters were meaningless to them (the sketch after this slide makes the distinction explicit).
– Multitudinous procedure manuals, rules, and regulations for all concerned: hard to read, impossible to remember, often mismatched with the realities of day-to-day job pressures and necessities; they breed resentment and rebellion among employees.
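
The “Empty”/“expired”/“expended” confusion is exactly the kind of ambiguity that explicit state modelling removes in software. Here is a minimal, hypothetical Python sketch (the states and function names are illustrative assumptions, not from the slides): an enumerated type makes “expired but still dangerous” impossible to mistake for “safe”.

```python
from enum import Enum

class CanisterState(Enum):
    SERVICEABLE = "serviceable"  # within licensed lifetime
    EXPIRED = "expired"          # past licensed lifetime, but can still fire
    EXPENDED = "expended"        # chemically spent: genuinely inert

# "Empty" is not a state at all; only EXPENDED canisters are inert.
HAZARDOUS = {CanisterState.SERVICEABLE, CanisterState.EXPIRED}

def safe_to_ship(state: CanisterState) -> bool:
    """The check a ramp agent could not make from a hand-written 'Empty' label."""
    return state not in HAZARDOUS

assert safe_to_ship(CanisterState.EXPENDED)
assert not safe_to_ship(CanisterState.EXPIRED)  # the fatal conflation on Flight 592
```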

Analysis of Errors
– Three kinds of errors:
  – procedural error (e.g. pilot error)
  – engineered error (e.g. mechanical failure)
  – system error (as in ValuJet)
– System errors: “control and operation of some of the riskiest technologies require organizations so complex that serious failures are virtually guaranteed to occur” (paraphrased from Perrow, 1984). “Safety ultimately involves a blizzard of small judgments.”
– Two ingredients of system accidents:
  – interactive complexity: “many elements … linked in multiple and often unpredictable ways … cascading failures can accelerate out of control”
  – tight coupling: a “lack of slack”
– Many examples: Chernobyl, Three Mile Island, close calls from SAC false positives, the Cuban missile crisis, and many other disasters. A toy simulation of the two ingredients follows below.
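
To make the two ingredients concrete, here is a toy Python simulation (an illustration under assumed parameter values, not anything from the slides or from Perrow): each component is linked to a few random others (interactive complexity), and a failure jumps across a link with high probability when coupling is tight.

```python
import random

def system_loss_rate(n=50, links_per_node=4, coupling=0.9,
                     p_fault=0.02, trials=2000):
    """Fraction of trials in which a small initial fault cascades
    into the failure of more than half the system."""
    losses = 0
    for _ in range(trials):
        # Interactive complexity: each component touches several others
        # in ways no one planned or can fully foresee.
        links = {i: random.sample(range(n), links_per_node) for i in range(n)}
        failed = {i for i in range(n) if random.random() < p_fault}
        frontier = list(failed)
        while frontier:  # Tight coupling: failures propagate before anyone reacts.
            node = frontier.pop()
            for neighbour in links[node]:
                if neighbour not in failed and random.random() < coupling:
                    failed.add(neighbour)
                    frontier.append(neighbour)
        if len(failed) > n // 2:
            losses += 1
    return losses / trials

print("tight coupling :", system_loss_rate(coupling=0.9))  # most faults cascade
print("loose coupling :", system_loss_rate(coupling=0.1))  # faults stay local
```

Lowering either ingredient, fewer unplanned links or looser coupling, sharply cuts the rate of system accidents, which is exactly Perrow’s point.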

Lessons for Information Technology
– Big software is highly complex: a definitional example of interactive complexity and tight coupling.
– IT is embedded in bigger social or technical systems which are themselves complex; there are even more complex interactions between the IT and the system in which it is embedded.
– Fallible humans are “in the loop” everywhere: in design, implementation, testing, and after deployment.
– IT is brittle, yet so fast that humans cannot easily intervene when it goes wrong; the sketch below shows one way software can buy back some slack.
– IT-based systems are constantly evolving and changing, adding still further to the complexity and diminishing the ability of humans to understand and monitor them.
– Perhaps error is inevitable, regardless of the controls and regimes that attempt to eliminate it: sometimes the control regimes themselves cause the errors.
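
One standard software response is to loosen the coupling on purpose: put slack (fallbacks, buffers, timeouts) between components so a single failure degrades the output instead of cascading. A deliberately simplified Python sketch, with hypothetical stage functions:

```python
# Tightly coupled: stages call one another directly, so one exception
# anywhere aborts the whole chain instantly, too fast to intervene.
def run_tight(stages, data):
    for stage in stages:
        data = stage(data)
    return data

# Loosely coupled: a per-stage fallback is a software analogue of
# Perrow's "slack": the system degrades instead of cascading.
def run_with_slack(stages, data, fallback):
    for stage in stages:
        try:
            data = stage(data)
        except Exception:
            data = fallback(data)  # degraded but survivable
    return data

stages = [lambda x: x + 1, lambda x: x / 0, lambda x: x * 2]  # middle stage fails
print(run_with_slack(stages, 1, fallback=lambda x: x))  # prints 4
# run_tight(stages, 1) would raise ZeroDivisionError and halt everything.
```

Of course, a real safety-critical system cannot always substitute a harmless fallback; the point is only that slack is a design choice, not an accident.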