TEKNILLINEN KORKEAKOULU HELSINKI UNIVERSITY OF TECHNOLOGY

Thu, 08 Jan 87 11:29: […] I question whether assigning a monetary value to human life would provide additional insight into the management of risks. I am not convinced that we know how to predict risks, particularly unlikely ones, with any degree of confidence. I would hate to see a $500K engineering change traded off against a loss of 400 $1M with a 10E-9 expected probability. I'm afraid reducing the problem to dollars could tend to obscure the real issues. Moreover, even if the analyses were performed correctly, the results could be socially unacceptable. I suspect that in the case of a spacecraft, or even a military aircraft, the monetary value of the crew's lives would be insignificant in comparison with other program costs, even with a relatively high hazard probability. In the case of automobile recalls, where the sample size is much larger, the manufacturers may already be trading off the cost of a recall against the expected cost of resulting lawsuits, although I hope not.

Making it work

Failures
Catastrophic
– Serious consequences
Major
– Incorrect operation
– Possibly recoverable
Minor
– Inconvenience
Not noticed

Fault Tolerance Steps 1/3
Fault Detection
– The process of determining that a fault has occurred
Diagnosis
– The process of determining what caused the fault, or exactly which subsystem or component is faulty
Containment
– The process that prevents faults from propagating from their origin at one point in a system to a point where they can affect the service to the user
Source:

Fault Tolerance Steps 2/3
Masking
– The process of ensuring that only correct values get passed to the system boundary in spite of a failed component
Compensation
– If a fault occurs and is confined to a subsystem, it may be necessary for the system to provide a response to compensate for the output of the faulty subsystem
Source:
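
Compensation can be illustrated with a small Python sketch. It is not from the slides: the function name `read_with_compensation`, the idea of falling back to the last known good value, and the default value are all assumptions chosen for illustration. The fault is confined to the sensor read, and the caller still receives a usable response plus a flag saying it was substituted.

```python
from typing import Callable, Optional, Tuple


def read_with_compensation(
    read_sensor: Callable[[], float],
    last_good: Optional[float],
    default: float,
) -> Tuple[float, bool]:
    """Return (value, compensated).

    Hypothetical helper: if the sensor read fails, substitute the last
    known good value, or a fixed default if none has been seen yet.
    """
    try:
        value = read_sensor()
    except Exception:
        # The fault stays inside this function; the caller gets a
        # substitute value and is told that it is a substitute.
        fallback = last_good if last_good is not None else default
        return fallback, True
    return value, False
```

Returning the flag alongside the value matters: a compensated value hides the fault from the user, but the system itself should still know that repair is pending.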

Fault Tolerance Steps 3/3
Repair
– The process in which faults are removed from a system. In well-designed fault-tolerant systems, faults are contained before they propagate to the extent that the delivery of system service is affected. This leaves a portion of the system unusable because of residual faults. If subsequent faults occur, the system may be unable to cope because of this loss of resources, unless these resources are reclaimed through a recovery process which ensures that no faults remain in system resources or in the system state.
Source:

Buzzwords
Fault Tolerance
Robust Computing
Fail-Safe
Intrinsically safe

Mechanisms
Defensive Design
– Prevent faults in the first place
Fault tolerance/Robustness
– Can operate in an imperfect situation
Fail-Safe
– Limit the consequences of a failure

Redundancy
Design the system with multiple instances of critical units in such a manner that the failure of some of these units does not directly fail the entire system.
– No single point of failure
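
One common way to use redundant units is majority voting over their outputs. The slides do not name a voting scheme, so the sketch below is an assumption: `majority_vote` is a hypothetical helper that takes the outputs of several replicated units and returns the value that more than half of them agree on. It compares values for exact equality, which suits discrete outputs; analog readings would more likely need a tolerance or a median.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(readings: Sequence[Hashable]) -> Hashable:
    """Return the value reported by a strict majority of redundant units.

    Raises RuntimeError when no value has a strict majority, so the
    caller can fall back to a fail-safe action instead of guessing.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    value, count = Counter(readings).most_common(1)[0]
    if count * 2 <= len(readings):
        raise RuntimeError("no majority among redundant units")
    return value
```

With three units, `majority_vote([1, 1, 7])` returns `1`, so one failed unit does not fail the system. Note that the voter itself becomes a single point of failure unless it is replicated too.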

Limits
When a range of values is physically possible, use a subset for safety
– Soft: an indicator when recommended values are exceeded
– Hard: for use when exceeding the limits would damage the system
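
The difference between soft and hard limits can be sketched as follows. Everything here is illustrative: the names `LimitStatus`, `check_limits` and `apply_command` are hypothetical, and only upper limits are handled. A soft limit only produces an indication, while a hard limit actually restricts the value that is applied.

```python
from enum import Enum
from typing import Tuple


class LimitStatus(Enum):
    OK = "ok"
    SOFT_EXCEEDED = "soft"   # warn the operator, keep running
    HARD_EXCEEDED = "hard"   # value would damage the system


def check_limits(value: float, soft_max: float, hard_max: float) -> LimitStatus:
    """Classify a value against an upper soft limit and an upper hard limit."""
    if soft_max > hard_max:
        raise ValueError("soft limit must not exceed hard limit")
    if value > hard_max:
        return LimitStatus.HARD_EXCEEDED
    if value > soft_max:
        return LimitStatus.SOFT_EXCEEDED
    return LimitStatus.OK


def apply_command(requested: float, soft_max: float, hard_max: float) -> Tuple[float, LimitStatus]:
    """Clamp the requested value to the hard limit and report the status."""
    status = check_limits(requested, soft_max, hard_max)
    applied = min(requested, hard_max)
    return applied, status
```

For example, with `soft_max=80` and `hard_max=100`, a request of 90 is applied unchanged but flagged, while a request of 120 is clamped to 100.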

Interlocks
Mechanical
– One part cannot move until another does
Software
– Semaphores
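
A software interlock built on a semaphore might look like the sketch below. The `Interlock` class and the subsystem names in the usage note are hypothetical; only `threading.Semaphore` and its non-blocking `acquire` come from the Python standard library. The semaphore has a single permit, so a second subsystem cannot engage until the first one releases.

```python
import threading
from typing import Optional


class Interlock:
    """Allow only one named subsystem to be engaged at a time."""

    def __init__(self) -> None:
        self._permit = threading.Semaphore(1)  # one permit: mutual exclusion
        self._holder: Optional[str] = None

    def engage(self, name: str) -> bool:
        """Try to engage; return False immediately if another holder exists."""
        if self._permit.acquire(blocking=False):
            self._holder = name
            return True
        return False

    def release(self, name: str) -> None:
        """Release the interlock; only the current holder may do so."""
        if self._holder != name:
            raise RuntimeError(f"{name!r} does not hold the interlock")
        self._holder = None
        self._permit.release()
```

In use, `engage("door")` succeeds, a following `engage("beam")` returns `False`, and `"beam"` can engage only after `release("door")`. Refusing instead of blocking is a design choice here: the caller learns at once that the action is not permitted.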

Sanity checks
A mechanism for the system to ensure correct operation
Related to Interlocks and Limits
"Does this make sense here?"
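
As an illustration, a sanity check on a temperature reading could ask whether the value is a number at all, whether it is physically plausible, and whether it is consistent with the previous reading. The function name `is_sane_temperature` and all thresholds below are assumptions made up for this sketch, not values from the slides.

```python
import math
from typing import Optional


def is_sane_temperature(
    reading_c: float,
    previous_c: Optional[float],
    low_c: float = -40.0,      # assumed plausible range of the sensor
    high_c: float = 125.0,
    max_step_c: float = 5.0,   # assumed largest believable change per sample
) -> bool:
    """Answer 'does this reading make sense here?' for one sample."""
    if math.isnan(reading_c):
        return False
    if not (low_c <= reading_c <= high_c):
        return False
    if previous_c is not None and abs(reading_c - previous_c) > max_step_c:
        return False
    return True
```

A reading that fails the check is not necessarily wrong, but it should be treated as suspect rather than acted on directly.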

Safe start-up and shutdown
When electronic devices are activated, they are by nature in a random state until forced into a desired state
Be proactive: make sure things are as needed instead of just assuming so
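
A start-up routine in this spirit writes every output to a known safe value and then reads it back rather than assuming the write took effect. The sketch below is hypothetical throughout: `FakeHardware` stands in for real outputs, and the output names in `SAFE_STATE` are invented for illustration.

```python
from typing import Dict

# Assumed safe values for two invented outputs.
SAFE_STATE: Dict[str, bool] = {"heater": False, "valve_open": False}


class FakeHardware:
    """Stand-in for real outputs; it may power up in any state."""

    def __init__(self, initial: Dict[str, bool]) -> None:
        self._outputs = dict(initial)

    def write(self, name: str, value: bool) -> None:
        self._outputs[name] = value

    def read(self, name: str) -> bool:
        return self._outputs[name]


def safe_startup(hw: FakeHardware, safe_state: Dict[str, bool] = SAFE_STATE) -> None:
    """Force every output into its safe state, then verify by reading back."""
    for name, value in safe_state.items():
        hw.write(name, value)
    for name, value in safe_state.items():
        if hw.read(name) != value:
            raise RuntimeError(f"output {name!r} did not reach its safe state")
```

The same routine can be reused for shutdown, so that the device is left in the state it will be forced into at the next start-up.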

Calibration
Factory calibration is useful only for a limited time
Instruments drift due to:
– temperature
– loading
– pressure
– age
Self calibration
– useful in a controlled fashion
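
One way to read "in a controlled fashion" is that a self-calibration result is only accepted when the correction is small enough to be explained by drift. The sketch below assumes a linear instrument calibrated against two known references; the function names and the acceptance thresholds are hypothetical.

```python
from typing import Tuple


def two_point_calibration(
    raw_low: float, raw_high: float, ref_low: float, ref_high: float
) -> Tuple[float, float]:
    """Return (gain, offset) such that corrected = gain * raw + offset."""
    if raw_high == raw_low:
        raise ValueError("the two raw readings must differ")
    gain = (ref_high - ref_low) / (raw_high - raw_low)
    offset = ref_low - gain * raw_low
    return gain, offset


def accept_calibration(
    gain: float,
    offset: float,
    max_gain_error: float = 0.05,  # assumed: at most 5 % gain drift
    max_offset: float = 2.0,       # assumed: at most 2 units of offset drift
) -> bool:
    """Reject corrections too large to be plausible drift."""
    return abs(gain - 1.0) <= max_gain_error and abs(offset) <= max_offset
```

If the raw readings at references 0 and 100 are 1 and 101, the result is a gain of 1.0 and an offset of -1.0, which is accepted. A computed gain of 2.0 would be rejected, since it more likely points to a faulty sensor or reference than to drift.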

Testing
You cannot test enough
You can test too much
You can test wrong
You can think wrong
But you must test