Safety in large technology systems, October 1999

Technology failure
- Why do large, complex systems sometimes fail so spectacularly?
- Do the easy explanations of “operator error,” “faulty technology,” or “complexity” suffice?
- Are there managerial causes of technology failure?
- Are there design principles and engineering protocols that can enhance large system safety?
- What is the role of software in safety and failure?

Decision-making about complex technology systems
- the scientific basis of technology systems
- How do managers make intelligent decisions about complex technologies?
- managers, scientists, citizens
- example: Star Wars anti-missile systems
- Note: the technical specialist often does not make the decision, so the persuasive power of good scientific communication is critical.

Goal for technology management
- A central problem for designers, policy makers, and citizens, then, is how to avoid large-scale failures when possible, through appropriate design, and how to plan for minimizing the consequences of those failures which will inevitably occur.

Information and decision-making
- Information flow and management of complex technology systems
- complex organizations pursue multiple objectives simultaneously
- complex organizations pursue the same objective along different and conflicting paths

Surprising failures
- the Franco-Prussian war; the Israeli intelligence failure in the Yom Kippur war
- the Mercedes A-Class sedan and the “moose test”
- the Chernobyl nuclear power plant meltdown

Therac-25
- high energies
- computer control rather than electro-mechanical control
- positioning the turntable: x-ray beam flattener
- 15,000 rad administered rather than 200 rad

Technology failure
- sources of failure
  - management failures
  - design failures
  - proliferating random failures
  - “storming” the system
- design for “soft landings”
- crisis management

Causes of failure
- complexity and multiple causal pathways and relations
- defective procedures
- defective training systems
- “human” error
- faulty design

Causes of failure
- “The causes of accidents are frequently, if not almost always, rooted in the organization--its culture, management, and structure. These factors are all critical to the eventual safety of the engineered system” (Leveson, 47).

Varieties of failure
- routine failures, stochastic failures, design failures, systemic failures, interactive failures, “horseshoe nail” failures
- vulnerability of modern technologies to software failure: the Euro conversion, the Year 2000 bug, air traffic control failures

Sources of potential failure
- hardware interlocks replaced with software checks on turntable position
- cryptic malfunction codes; frequent messages
- excessive operator confidence in safety systems
- lack of an effective mechanism for reporting and investigating failures
- poor software engineering practices

Organizational factors
- “Large-scale engineered systems are more than just a collection of technological artifacts: They are a reflection of the structure, management, procedures, and culture of the engineering organization that created them, and they are also, usually, a reflection of the society in which they were created” (Leveson, 47).

Design for safety
- hazard elimination
- hazard reduction
- hazard control
- damage reduction

Aspects of design
- the technology -- machine, vehicle, software system, airport
- the management structure -- locus of decision-making
- the communications system -- transmission of critical and routine information within the organization
- training of workers for the task -- performance skills, safety procedures

Information and decision-making
- Information flow and management of complex technology systems
- complex organizations pursue multiple objectives simultaneously
- complex organizations pursue the same objective along different and conflicting paths

System safety
- builds safety in, rather than adding it on to a completed design
- deals with systems as a whole rather than subsystems or components
- takes a larger view of hazards than just failures
- emphasizes analysis rather than past experience and standards

System safety (2)
- emphasizes qualitative rather than quantitative approaches
- recognizes the importance of tradeoffs and conflicts in system design
- more than just system engineering

Hazard analysis
- development: identify and assess potential hazards
- operations: examine an existing system to improve its safety
- licensing: examine a planned system to demonstrate acceptable safety to a regulatory authority

Hazard analysis (2)
- construct an exhaustive inventory of hazards early in design
- classify by severity and probability (see the sketch below)
- construct causal pathways that lead to hazards
- design so as to eliminate, reduce, control, or ameliorate
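As a minimal illustration of the inventory and classification steps, the sketch below ranks hazards by severity and probability; the hazard names, category scales, and scoring rule are hypothetical, not taken from the slides.

```python
# Sketch: rank a hazard inventory by severity and probability
# (hazard names, category scales, and scoring rule are hypothetical).
from dataclasses import dataclass

SEVERITY = ["negligible", "marginal", "critical", "catastrophic"]
PROBABILITY = ["improbable", "remote", "occasional", "frequent"]

@dataclass
class Hazard:
    name: str
    severity: str      # one of SEVERITY
    probability: str   # one of PROBABILITY

def risk_index(h: Hazard) -> int:
    # Higher index = higher priority for elimination, reduction, or control.
    return SEVERITY.index(h.severity) + PROBABILITY.index(h.probability)

inventory = [
    Hazard("beam on with flattener out of position", "catastrophic", "remote"),
    Hazard("spurious alarm floods the operator console", "marginal", "occasional"),
]

for h in sorted(inventory, key=risk_index, reverse=True):
    print(f"{h.name}: severity={h.severity}, probability={h.probability}")
```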

Better software design
- design for the worst case
- avoid “single point of failure” designs
- design “defensively”
- investigate failures carefully and extensively
- look for “root cause,” not symptom or specific transient cause
- embed audit trails; design for simplicity (see the sketch below)
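A compact sketch combining two of these ideas, with hypothetical function names: a hazardous action that requires agreement between independent checks rather than relying on a single point of failure, and an append-only audit trail recording every decision.

```python
# Sketch (hypothetical function names): a hazardous action requires agreement
# between independent checks, and every decision is appended to an audit trail.
import datetime

def software_dose_check(dose: float, limit: float) -> bool:
    return dose <= limit

def hardware_interlock_ok() -> bool:
    return True  # placeholder for reading an independent hardware interlock

def audit(event: str) -> None:
    with open("audit.log", "a") as log:  # append-only record of every decision
        log.write(f"{datetime.datetime.now().isoformat()} {event}\n")

def permit_beam(dose: float, limit: float) -> bool:
    ok = software_dose_check(dose, limit) and hardware_interlock_ok()
    audit(f"beam request dose={dose} limit={limit} permitted={ok}")
    return ok

print(permit_beam(dose=180.0, limit=200.0))
```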

Safe software design
- control software should be designed with maximum simplicity (408)
- design should be testable; limited number of states
- avoid multitasking; use polling rather than interrupts (see the sketch below)
- design should be easily readable and understood
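A minimal sketch of a controller written in this style, assuming hypothetical read_sensor and command_actuator stand-ins for real I/O: an explicitly enumerated set of states, and inputs polled on a fixed cycle rather than handled by interrupts.

```python
# Sketch of a polled controller with an explicit, finite set of states.
# read_sensor() and command_actuator() are hypothetical stand-ins for real I/O.
import enum
import time

class State(enum.Enum):
    IDLE = 1
    RUNNING = 2
    SHUTDOWN = 3

def read_sensor() -> float:
    return 0.0  # placeholder for a real measurement

def command_actuator(enable: bool) -> None:
    pass  # placeholder for a real output

def control_loop(limit: float, cycles: int, cycle_s: float = 0.1) -> State:
    state = State.IDLE
    for _ in range(cycles):          # bounded loop keeps worst-case timing reviewable
        value = read_sensor()        # poll every cycle; no interrupt handlers
        if value > limit:
            command_actuator(False)  # fail to the safe side
            state = State.SHUTDOWN
            break
        command_actuator(True)
        state = State.RUNNING
        time.sleep(cycle_s)          # fixed cycle time
    return state

print(control_loop(limit=100.0, cycles=3))
```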

Safe software (2)
- interactions between components should be limited and straightforward
- worst-case timing should be determinable by review of the code
- code should include only the minimum features and capabilities required by the system; no unnecessary or undocumented features

Safe software (3)
- critical decisions (e.g., launch a missile) should not be made on values often taken by failed components -- 0 or 1 (see the sketch below)
- messages should be designed to eliminate the possibility of computer hardware failures having hazardous consequences (missile launch example)
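One common way to respect the first rule, sketched here with an arbitrary constant rather than any real protocol: require a distinctive multi-bit arming pattern for the hazardous command, so that cleared memory or a stuck bus cannot be read as an authorization.

```python
# Sketch: the hazardous command requires a distinctive bit pattern, not a value
# (0, 1, all-zeros, all-ones) that failed hardware or cleared memory often takes.
ARM_PATTERN = 0xA5C3  # arbitrary illustrative constant, not any real protocol

def launch_authorized(arm_word: int) -> bool:
    return arm_word == ARM_PATTERN

assert not launch_authorized(0x0000)  # cleared memory is not an authorization
assert not launch_authorized(0xFFFF)  # a stuck-high bus is not an authorization
assert launch_authorized(0xA5C3)
```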

Safe software (4)
- strive for maximal decoupling of parts of a software control system
- accidents in tightly coupled systems are a result of unplanned interactions
- the flexibility of software encourages coupling and multiple functions; important to resist this impulse

Safe software (5)
- “Adding computers to potentially dangerous systems is likely to increase accidents unless extra care is put into system design” (411).

Scope and limits of simulations
- Computer simulations permit “experiments” on different scenarios presented to complex systems
- Simulations are not reality
- Simulations represent some factors and exclude others
- Simulations rely on a mathematization of the process that may be approximate or even false

Human interface considerations
- unambiguous error messages (Therac-25; see the sketch below)
- the operator needs extensive knowledge about the “theory” of the system
- alarms need to be comprehensible (Three Mile Island); spurious alarms minimized
- the operator needs knowledge about the timing and sequencing of events
- design of the control board is critical
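As a small illustration of the first point, a sketch in which the code number and message wording are invented, not the actual Therac-25 text: an error report should say what happened and what the operator should do, rather than emit only a numeric code.

```python
# Sketch: replace a cryptic numeric code with an unambiguous, actionable message
# (the code number and wording here are invented for illustration).
ERROR_TEXT = {
    54: "Delivered dose differs from prescribed dose. Do NOT resume treatment; "
        "take the machine out of service and report the fault.",
}

def describe(code: int) -> str:
    return ERROR_TEXT.get(code, f"Unknown malfunction code {code}: stop and report.")

print(describe(54))
```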

Control panel anomalies

Risk assessment and prediction
- What is involved in assessing risk?
  - probability of failure
  - prediction of consequences of failure
  - failure pathways

Reasoning about risk
- How should we reason about risk?
- Expected utility: probability of outcome × utility of outcome (see the worked example below)
- probability and science
- How to anticipate failure scenarios?
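A worked example of the expected-utility calculation, with purely illustrative probabilities and harms:

```python
# Expected harm = sum over outcomes of (probability of outcome x harm of outcome).
scenarios = {
    "no failure":        (0.9990, 0),       # (probability, harm in arbitrary units)
    "contained failure": (0.0009, 10),
    "major accident":    (0.0001, 10_000),
}

expected_harm = sum(p * harm for p, harm in scenarios.values())
print(expected_harm)  # 0.0009*10 + 0.0001*10000 = 1.009
```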

Compare scenarios
- nuclear power vs. coal power
- automated highway system vs. routine traffic accidents

Ordinary reasoning and judgment
- well-known “fallacies” of ordinary reasoning:
  - time preference
  - framing
  - risk aversion

Large risks and small risks
- the decision-theory approach: minimize expected harms
- the decision-making reality: large harms are more difficult to absorb, even if smaller in overall consequence (see the sketch below)
- example: JR West railway
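A small numerical sketch of the contrast, with figures that are illustrative only: two options can have the same expected harm while one concentrates it in a single event that is far harder for an organization or a society to absorb.

```python
# Two illustrative options with equal expected harm but very different profiles.
many_small_accidents = 1000 * 0.001 * 1   # 1000 exposures, p=0.001 each, harm 1
one_large_accident = 1 * 0.001 * 1000     # 1 exposure, p=0.001, harm 1000

print(many_small_accidents, one_large_accident)  # both 1.0 in expectation
```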

The end