CSE 466 – Fall 2000 - Introduction - 1 Safety  Example  Terms and Concepts  Safety Architectures  Safe Design Process  Software Specific Stuff  Sources.

Slides:



Advertisements
Similar presentations
Integra Consult A/S Safety Assessment. Integra Consult A/S SAFETY ASSESSMENT Objective Objective –Demonstrate that an acceptable level of safety will.
Advertisements

Development of Tools for Risk Assessment and Risk Communication for Hydrogen Applications By Angunn Engebø and Espen Funnemark, DNV ICHS, Pisa 09. September.
Fault-Tolerant Systems Design Part 1.
EECE499 Computers and Nuclear Energy Electrical and Computer Eng Howard University Dr. Charles Kim Fall 2013 Webpage:
Chapter 21: Product Issues Design of Biomedical Devices and Systems By: Paul H. King Richard C. Fries.
Reliability and Safety Lessons Learned. Ways to Prevent Problems Good computer systems Good computer systems Good training Good training Accountability.
SWE Introduction to Software Engineering
CSE 466 – Fall Introduction - 1 Design for Safety 1.Hazard Identification and Fault Tree Analysis 2.Risk Assessment 3.Define Safety Measures 4.Create.
1 Software Testing and Quality Assurance Lecture 37 – Software Quality Assurance.
1 Software Testing and Quality Assurance Lecture 34 – Software Quality Assurance.
SWE Introduction to Software Engineering
CSE 466 – Fall Introduction - 1 Safety  Examples  Terms and Concepts  Safety Architectures  Safe Design Process  Software Specific Stuff 
Page 1 Copyright © Alexander Allister Shvartsman CSE 6510 (461) Fall 2010 Selected Notes on Fault-Tolerance (12) Alexander A. Shvartsman Computer.
Hazards Analysis & Risks Assessment By Sebastien A. Daleyden Vincent M. Goussen.
T. Bajd, M. Mihelj, J. Lenarčič, A. Stanovnik, M. Munih, Robotics, Springer, 2010 SAFETY IN INDUSTRIAL ROBOTICS R. Kamnik, T. Bajd and M. Mihelj.
Software Dependability CIS 376 Bruce R. Maxim UM-Dearborn.
Pipeline Qra Seminar Title slide Title slide.
Software faults & reliability Presented by: Presented by: Pooja Jain Pooja Jain.
Scheduled Versus Event Driven Testing of Distribution Protection IEDs Dr. Alexander Apostolov, Benton Vandiver, Will Knapek, OMICRON electronics.
Design for Safety Injury, Hazards, Conditional Circumstances Legal Responsibilities Guidelines for Safe Products/systems Safety Hierarchy, Safe Design.
Risk Analysis for Engineering Design J. M. McCarthy Fall 2003 Definitions Hazard Analysis Hazard Analysis Report Example for Mini Baja Nationally Recognized.
Reliability and Fault Tolerance Setha Pan-ngum. Introduction From the survey by American Society for Quality Control [1]. Ten most important product attributes.
2. Fault Tolerance. 2 Fault - Error - Failure Fault = physical defect or flow occurring in some component (hardware or software) Error = incorrect behavior.
Levels of safety Priorities for eliminating hazards in the workplace Eliminate the hazard through the machine design stage Apply safeguarding technology.
1 Fault Tolerance in the Nonstop Cyclone System By Scott Chan Robert Jardine Presented by Phuc Nguyen.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 9 Slide 1 Critical Systems Specification 2.
High Performance Embedded Computing © 2007 Elsevier Chapter 1, part 4: Embedded Computing High Performance Embedded Computing Wayne Wolf.
ERT 312 SAFETY & LOSS PREVENTION IN BIOPROCESS RISK ASSESSMENT Prepared by: Miss Hairul Nazirah Abdul Halim.
ERT 322 SAFETY AND LOSS PREVENTION RISK ASSESSMENT
High Performance Embedded Computing © 2007 Elsevier Lecture 5: Embedded Systems Issues Embedded Computing Systems Mikko Lipasti, adapted from M. Schulte.
Software availability –the probability that a program is operating according to requirements at a given point in time. Availability = (MTTF/MTBF) x 100.
CSE 466 – Fall Introduction - 1 Physical Layer API (same as Lab)  enq(), deq(), scan()  Periodically  if not currently master select non-empty.
Need for protection Power system must be kept in operation continuously without major breakdowns This can be achieved in two ways: 1.Implement a system.
Fault-Tolerant Systems Design Part 1.
Safety-Critical Systems T Ilkka Herttua. Safety Context Diagram HUMANPROCESS SYSTEM - Hardware - Software - Operating Rules.
Building Dependable Distributed Systems Chapter 1 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Software Testing and Quality Assurance Software Quality Assurance 1.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 20 Slide 1 Critical systems development 3.
1 ITGD 2202 Supervision:- Assistant Professor Dr. Sana’a Wafa Al-Sayegh Dr. Sana’a Wafa Al-SayeghStudent: Anwaar Ahmed Abu-AlQumboz.
Safety-Critical Systems 7 Summary T V - Lifecycle model System Acceptance System Integration & Test Module Integration & Test Requirements Analysis.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 9 Slide 1 Critical Systems Specification 1.
CMSC 345 Fall 2000 Requirements Overview. Work with customers to elicit requirements by asking questions, demonstrating similar systems, developing prototypes,
Computer security By Isabelle Cooper.
RELIABILITY ENGINEERING 28 March 2013 William W. McMillan.
Ensure that the right functions are performed Ensure that the these functions are performed right and are reliable.
06/01/20161 Benny Hoff TÜV NORD Sweden AB AFS 2002:1 Use of pressure equipment.
CSE 466 – Fall Introduction - 1 Safety  Terms and Concepts  Safety Architectures  Safe Design Process  Software Specific Stuff  Sources  Hard.
CSE 466 – Spring Introduction - 1 The Final  Hardware: probably something on memory mapped I/O (HW and SW)  OS: Probably a task diagram of some.
Risk and Precautions needed when installing hardware - P2
COMPUTER HARDWARE SERVICING
1 Software Testing and Quality Assurance Lecture 38 – Software Quality Assurance.
Process Safety Management Soft Skills Programme Nexus Alliance Ltd.
October 22, 2005 Parvaiz Ahmed Khand An Overview of Software Safety.
An overview of I&C Systems in APR 1400 Parvaiz Ahmed Khand December 28, 2007.
Can We Trust the Computer? FIRE, Chapter 4. What Can Go Wrong? What are the risks and reasons for computer failures? How much risk must or should we accept?
Chapter 6 - Modern Concepts of Accident Prevention
BASIC PROFESSIONAL TRAINING COURSE Module V Safety classification of structures, systems and components Case Studies Version 1.0, May 2015.
ATTRACT TWD Symposium, Barcelona, Spain, 1st July 2016
Safety and Risk.
Failure and Design Jaime Baber October 12, 2000
Reliability and Fault Tolerance
Fault Tolerance Distributed Web-based Systems
Hazards Analysis & Risks Assessment
Computer in Safety-Critical Systems
Operation of Target Safety System (TSS)
Definitions Cumulative time to failure (T): Mean life:
Mikael Olsson Control Engineer
Seminar on Enterprise Software
Presentation transcript:

CSE 466 – Fall Introduction - 1 Safety  Example  Terms and Concepts  Safety Architectures  Safe Design Process  Software Specific Stuff  Sources  Hard Time, by Bruce Powell Douglass, which references to Safeware

CSE 466 – Fall Introduction - 2 What is a Safe System? Brake Pedal Sensor ProcessorBus Brake Engine Is it safe? What does “safe” mean? How can we make it safe? Add electronic watch dog between brake and bus Add mechanical linkage from separate brake pedal directly to brake Add a third mechanical linkage….

CSE 466 – Fall Introduction - 3  Reliability of component i can be expressed as the probability that component i is still functioning at some time t.  Is system reliability P s (t) =  P i (t) ?  Assuming that all components have the same component reliability, Is a system w/ fewer components always more reliable?  Does component failure  system failure burn in period Terms and Concepts time Probability of Continued Operation Low failure rate means nearly constant probability 1/(failure rate) = MTBF Pi(t)

CSE 466 – Fall Introduction - 4 A Safety System  A system is safe if it’s deployment involves assuming an acceptable amount of risk…acceptable to whom?  Risk factors  Probability of something bad happing  Consequences of something bad happening (Severity)  Example  Airplane Travel – high severity, low probability  Electric shock from battery powered devices – hi probability, low severity severity probability danger zone (we don’t all have the same risk tolerance!) autopilot mp3 player PC

CSE 466 – Fall Introduction - 5 More Precise Terminology  Accident or Mishap: (unintended) Damage to property or harm to persons. Economic impact of failure to meet warranted performance is outside of the scope of safety.  Hazard: A state of the the system that will inevitably lead to an accident or mishap  Release of Energy  Release of Toxins  Interference with life support functions  Supplying misleading information to safety personnel or control systems. This is the desktop PC nightmare scenario. Bad information  Failure to alarm when hazardous conditions exist

CSE 466 – Fall Introduction - 6 Faults  A fault is an “unsatisfactory system condition or state”. A fault is not necessarily a hazard. In fact, assessments of safety are based on the notion of fault tolerance.  Systemic faults  Design Errors (includes process errors such as failure to test or failure to apply safety design process)  Faults due to software bugs are always systemic  Random Faults  Random events that can cause permanent or temporary damage to the system. Includes EMI, radiation, component failure, power supply problems, security breech, etc.

CSE 466 – Fall Introduction - 7 Component v. System  Reliability is a component issue  Safety and Availability are system issues  If a system has lots of redundancy the likelihood of a component failure (a fault) increases, but so may the safety and availability of that system.  Safety and Availability are different and sometimes at odds. Safety may require the shutdown of a system that may still be able to perform its function.  A backup system that can fully operate a nuclear power plant might always shut it down in the event of failure of the primary system.

CSE 466 – Fall Introduction - 8 Single Fault Tolerance (for safety)  The existence of any single fault does not result in a hazard  Single fault tolerant systems are generally considered to be safe, but more stringent requirements may apply to high risk cases…airplanes, power plants, etc. Backup H2 Valve Control Main H2 Valve Control watchdog protocol If the handshake fails, then either one or both can shut off the gas supply. Is this a single fault tolerant system?

CSE 466 – Fall Introduction - 9 Think Again? Backup H2 Valve Control Main H2 Valve Control watchdog handshake common mode failures

CSE 466 – Fall Introduction - 10 Now Safe? Backup H2 Valve Control Main H2 Valve Control watchdog handshake Separate Clock Source Power Fail-Safe What about power spike? Does it ever end? Is there another approach?

CSE 466 – Fall Introduction - 11 Time is a Factor  The TUV Fault Assessment Flow Chart  T0: Fault tolerance time of the first failure  T1: Time after which a second fault is likely  Captures time, and the notion of “latent faults”  Built-in fault diagnosis and corrective action must occur within specific time frames. T act < T0 < T1

CSE 466 – Fall Introduction - 12 Latent Faults  Any fault this is not detectable by the system during operation has a probability of 1 – doesn’t count in single fault tolerance assessment Backup H2 Valve Control Main H2 Valve Control watchdog handshake stuck valves could be latent if the controllers cannot check their state.

CSE 466 – Fall Introduction - 13 Fail Safe  On reset process checks status. If bad, enter “safe mode”  power off  reduced/altered functionality  alarm  restart  Safe mode is application dependent Processor Watchdog protocol failure reset status

CSE 466 – Fall Introduction - 14 Safety Architectures  Single Channel Protected Design  Redundancy  Diversity or Heterogeneity  Watchdog Brake Pedal Sensor Computer Bus Brake Engine Control watchdog embed checks into the channel

CSE 466 – Fall Introduction - 15 Single Channel Protection  How to protect against corrupt code due to EMI?  perform period checksums on the code space  How long does this take?  Is T0<T1?  Feasibility of SCPD  Fault Tolerance Time  Speed of the processor  Amount of ROM/RAM  Redundant Memory Subsystems: special hw  Recurring cost v. Development cost tradeoff Computer (code corruption) Computer Bus Brake Engine Control parity/crc on the bus

CSE 466 – Fall Introduction - 16 Multi-Channel Protection  Homogeneous Redundancy  Low development cost…just duplicate  High recurring cost  No protection against systemic faults Computer (code corruption) Brake Engine Control Computer Voting Bus could be implemented same as collision detect in I2C

CSE 466 – Fall Introduction - 17 Multi-Channel Protection  Heterogeneous Redundancy  Protects against random and some systemic faults.  Best if implementation teams are kept separated Proc/SW 1 Brake Engine Control Proc/SW 2 Voting Bus

CSE 466 – Fall Introduction - 18 Design for Safety 1.Hazard Identification and Fault Tree Analysis 2.Risk Assessment 3.Define Safety Measures 4.Create Safe Requirements 5.Create Safe Designs 6.Implement Safety 7.Assure Safety Process 8.Test,Test,Test

CSE 466 – Fall Introduction - 19 Hazard Identification – Ventilator Example HazardSeverityTolerance Time FaultLikelihoodDetection Time MechanismExposure Time Hypo- ventilation Severe5 min.Vent FailsRare30secIndep. pressure sensor w/ alarm 40sec Esophageal intubation Medium30secC02 sensor alarm 40sec User mis- attaches breathing hoses neverN/ADifferent mechanic al fittings for intake and exhaust N/A Over- pressuriza tion Sever0.05secRelease valve failure Rare0.01secSecondary valve opens 0.01sec

CSE 466 – Fall Introduction - 20 Fault Tree Analysis  Pacemaker Example

CSE 466 – Fall Introduction - 21 System Risk Assessment  According to TUV: Risk factors are  Severity of the hazard  Duration of the period of exposure to the hazard  Prevention of the hazard  Probability of occurrence

CSE 466 – Fall Introduction - 22 Define the Safety Measures  Obviation: Make it physically impossible (mechanical hookups, etc).  Education: Educate users to prevent misuse or hazardous cases  Alarming: Inform the users/operators of hazardous conditions  Interlocks: Take steps to eliminate the hazard when conditions exist  Restriction of Access. High voltage sources should be in compartments that require tools to access, w/ proper labels.  Labeling  Consider  Tolerance time  Supervision of the system: constant, occasional, unattended. Airport People movers have to be design to a much higher level of safety than attended trains even if they both have fully automated control

CSE 466 – Fall Introduction - 23 Create Save Requirements: Specifications  Document the safety functionality