CSE 466 – Fall 2000 – Safety

Presentation transcript:

CSE 466 – Fall Introduction

Safety

• Terms and Concepts
• Safety Architectures
• Safe Design Process
• Software Specific Stuff
• Sources: Doing Hard Time by Bruce Powell Douglass, which references Safeware by Nancy Leveson

What is a Safe System?

[Diagram: brake pedal sensor, processor, bus, brake w/ local controller, engine w/ local controller]

• Is it safe?
• What does “safe” mean?
• How can we make it safe?
  • Add an electronic watchdog between brake and bus
  • Add a mechanical linkage from a separate brake pedal directly to the brake
  • Add a third mechanical linkage…

Terms and Concepts

• Reliability of component i can be expressed as the probability that component i is still functioning at some time t.
• Is system reliability $P_s(t) = \prod_i P_i(t)$?
• Assuming that all components have the same component reliability, is a system w/ fewer components always more reliable?
• Does component failure ⇒ system failure?

[Figure: $P_i(t)$, the probability of being operational at time t, plotted against time, with the burn-in period marked. A low failure rate means nearly constant probability. 1/(failure rate) = MTBF.]
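The relation between failure rate and MTBF can be made concrete with a worked equation. This is a sketch under an assumption the slide does not state: after burn-in the failure rate $\lambda$ is constant, so the exponential model applies. The numbers are illustrative only.

```latex
% Assumption: constant failure rate \lambda after the burn-in period
\[
P_i(t) = e^{-\lambda t}, \qquad \mathrm{MTBF} = \frac{1}{\lambda}
\]
% Illustrative values, not taken from the slides
\[
\lambda = 10^{-4}\ \mathrm{h}^{-1}
\;\Rightarrow\;
\mathrm{MTBF} = 10^{4}\ \mathrm{h},
\qquad
P_i(1000\ \mathrm{h}) = e^{-0.1} \approx 0.905
\]
```

Under this model, a component with a ten-thousand-hour MTBF still has roughly a one-in-ten chance of having failed after a thousand hours of operation.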

A Safety System

• A system is safe if its deployment involves assuming an acceptable amount of risk… acceptable to whom?
• Risk factors
  • Probability of something bad happening
  • Consequences of something bad happening (severity)
• Examples
  • Airplane travel: high severity, low probability
  • Electric shock from battery-powered devices: high probability, low severity

[Figure: severity vs. probability plot with a danger zone and a safe zone; plotted items: airplane autopilot, mp3 player, PC. Note: we don’t all have the same risk tolerance!]

More Precise Terminology

• Accident or mishap: (unintended) damage to property or harm to persons. The economic impact of failure to meet warranted performance is outside the scope of safety.
• Hazard: a state of the system that will inevitably lead to an accident or mishap
  • Release of energy
  • Release of toxins
  • Interference with life support functions
  • Supplying misleading information to safety personnel or control systems. This is the desktop PC nightmare scenario: bad information.
  • Failure to alarm when hazardous conditions exist

Faults

• A fault is an “unsatisfactory system condition or state”. A fault is not necessarily a hazard. In fact, assessments of safety are based on the notion of fault tolerance.
• Systemic faults
  • Design errors (includes process errors such as failure to test or failure to apply a safety design process)
  • Faults due to software bugs are systemic
  • Security breach
• Random faults
  • Random events that can cause permanent or temporary damage to the system. Includes EMI and radiation, component failure, power supply problems, wear and tear.

Component v. System

• Reliability is a component issue
• Safety and availability are system issues
• A system can be safe even if it is unreliable!
• If a system has lots of redundancy, the likelihood of a component failure (a fault) increases, but so may the safety and availability of that system.
• Safety and availability are different and sometimes at odds. Safety may require the shutdown of a system that may still be able to perform its function.
  • A backup system that can fully operate a nuclear power plant might always shut it down in the event of failure of the primary system.
  • The plant could remain available, but it is unsafe to continue operation.

Single Fault Tolerance (for safety)

• The existence of any single fault does not result in a hazard
• Single fault tolerant systems are generally considered to be safe, but more stringent requirements may apply to high-risk cases… airplanes, power plants, etc.

[Diagram: Main H2 Valve Control and Backup H2 Valve Control, linked by a watchdog protocol]

If the handshake fails, then either one or both can shut off the gas supply. Is this a single fault tolerant system?
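One way to picture the handshake between the two valve controllers is a heartbeat check. The following is a minimal sketch, not the protocol from the lecture: it assumes each controller publishes a counter that it increments periodically, and that the peer closes its valve when that counter stops changing for longer than a timeout. The names `struct watchdog`, `watchdog_init` and `watchdog_peer_alive` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heartbeat monitor kept by one controller about its peer. */
struct watchdog {
    uint32_t last_seen_count;  /* last heartbeat counter value observed */
    uint32_t last_change_ms;   /* time at which the counter last changed */
    uint32_t timeout_ms;       /* maximum allowed silence (assumed value) */
};

void watchdog_init(struct watchdog *w, uint32_t peer_count,
                   uint32_t now_ms, uint32_t timeout_ms)
{
    w->last_seen_count = peer_count;
    w->last_change_ms = now_ms;
    w->timeout_ms = timeout_ms;
}

/* Returns true while the peer's heartbeat counter keeps advancing.
 * A false result means the handshake failed and the caller should
 * shut off the gas supply. */
bool watchdog_peer_alive(struct watchdog *w, uint32_t peer_count,
                         uint32_t now_ms)
{
    if (peer_count != w->last_seen_count) {
        w->last_seen_count = peer_count;
        w->last_change_ms = now_ms;
        return true;
    }
    /* Unsigned subtraction stays correct across timer wrap-around. */
    return (uint32_t)(now_ms - w->last_change_ms) < w->timeout_ms;
}
```

Note that this sketch shares the weakness the slide is probing: if both controllers run from the same clock or power supply, a single fault can silence both sides at once.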

Next Week

• Project presentations/demos
• Wednesday
  • TinyOS (one day)
    • Structure of the OS: how does it work?
    • Applications: what is it like to write code? Event handling, communications, examples
• Monday demos
  • Lake Analysis (forget about funding from the new EPA)
  • Modem
  • Gravity Mouse
  • MBOX Alarm

The Final

• Something on networking, related to the stack that we built in lecture
  • Assumes a working understanding of I2C
• A safety question
• Something from previous sections

Terms

• Safety: assuming acceptable risk
• Accident: unintended damage
• Hazard: dangerous system state; accident is inevitable
• Fault: conditions that lead to hazards
  • Systemic (design) faults
  • Random faults
• Reliability
  • System is functioning if all components are functioning: $P_s(t) = \prod_i P_i(t)$
  • System is functioning if any component is functioning (redundancy): $P_s(t) = 1 - \prod_i F_i(t)$, where $F_i(t) = 1 - P_i(t)$ is the probability of component failure
  • Example: let $P_1(T) = P_2(T) = 0.9$; then $F_1(T) = F_2(T) = 0.1$, so $F_s(T) = 0.1 \times 0.1 = 0.01$ and $P_s(T) = 1 - F_s(T) = 0.99$
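The two reliability formulas can be sketched in a few lines of C. This assumes component failures are independent, which is what multiplying the probabilities implies; the function names are illustrative.

```c
#include <stddef.h>

/* Series system: functions only if every component functions.
 * Assumes independent failures. */
double series_reliability(const double *p, size_t n)
{
    double ps = 1.0;
    for (size_t i = 0; i < n; i++)
        ps *= p[i];
    return ps;
}

/* Redundant system: functions if at least one component functions.
 * Assumes independent failures. */
double parallel_reliability(const double *p, size_t n)
{
    double fs = 1.0;  /* probability that every component has failed */
    for (size_t i = 0; i < n; i++)
        fs *= 1.0 - p[i];
    return 1.0 - fs;
}
```

With two components at 0.9 each, the redundant arrangement gives approximately 0.99, matching the slide's example, while the series arrangement drops to approximately 0.81.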

Terms (cont.)

• Latent fault: a fault that does not in itself lead to a hazard, but which cannot be detected. Must assume that the probability of this fault = 1
• Safety architectures
  • Single channel protection
  • Redundancy
  • Diversity
• Time equation
  • Time to Eliminate Hazard < Tolerance Time of Hazard < Time to Next Fault
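The time equation is a pair of inequalities that a design review can check mechanically. A minimal sketch, assuming all three times are expressed in the same unit; the function and parameter names are illustrative.

```c
#include <stdbool.h>

/* Checks: time to eliminate hazard < tolerance time < time to next fault.
 * All arguments must use the same unit (for example, seconds). */
bool timing_is_safe(double time_to_eliminate_hazard,
                    double hazard_tolerance_time,
                    double time_to_next_fault)
{
    return time_to_eliminate_hazard < hazard_tolerance_time &&
           hazard_tolerance_time < time_to_next_fault;
}
```

For instance, a hazard that is removed 40 seconds after it appears satisfies the first inequality if the tolerance time is 300 seconds, and violates it if the tolerance time is 30 seconds.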

Single Fault Tolerance (for safety)

• The existence of any single fault does not result in a hazard
• Single fault tolerant systems are generally considered to be safe, but more stringent requirements may apply to high-risk cases… airplanes, power plants, etc.

[Diagram: Main H2 Valve Control and Backup H2 Valve Control, linked by a watchdog protocol; annotation: error]

If the handshake fails, then either one or both can shut off the gas supply. Is this a single fault tolerant system?

Is This?

[Diagram: Main H2 Valve Control and Backup H2 Valve Control, linked by a watchdog handshake; annotation: common mode failures]

Now Safe?

[Diagram: Main H2 Valve Control and Backup H2 Valve Control, linked by a watchdog handshake; labels: Separate Clock Source, Power, Fail-Safe (non-latching) Valves]

• What about a power spike that confuses both processors at the same time?
• Maybe the watchdog can’t be software based.
• Does it ever end?
• $T_{test} < T_0 < T_1$: detection time < single fault tolerance time < time to second failure. Have we solved this?
• Latent faults: P = 1

Safety Architectures

• Self checking (single channel protected design)
• Redundancy
• Diversity or heterogeneity

[Diagram: brake pedal sensor, computer, bus, brake, engine control; annotations: watchdog protocol, parity/CRC, periodic internal CRC/checksum computation (code/data corruption)]

Single Channel Protection

• Self checking
  • Perform periodic checksums on code and data
  • How long does this take?
  • Is $T_{test} < T_0 < T_1$?
  • No protection against systemic faults
• Feasibility of single channel protection
  • Fault tolerance time
  • Speed of the processor
  • Amount of ROM/RAM
  • Special hardware
  • Recurring cost v. development cost tradeoff

[Diagram: computer (code corruption), bus, brake, engine control; annotation: parity/CRC on the bus]
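A periodic self-check of code or data can be sketched as a CRC over a memory region compared against a stored reference. The slides do not name a particular checksum; this example assumes CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF), and the function names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
 * no reflection, no final XOR. Chosen here as an assumption. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Compares the region's current CRC with a reference computed when the
 * image was known to be good. Returns false on corruption. */
bool region_is_intact(const uint8_t *region, size_t len,
                      uint16_t expected_crc)
{
    return crc16_ccitt(region, len) == expected_crc;
}
```

The bit-by-bit loop costs eight iterations per byte, which is exactly the concern behind "How long does this take?": on a slow processor with a large ROM, a full pass may not fit inside the fault tolerance time, and a table-driven or hardware CRC may be needed.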

Redundancy

• Homogeneous redundancy
  • Low development cost… just duplicate
  • High recurring cost
  • No protection against systemic faults

[Diagram: two computers (code corruption), voting bus, brake, engine control]

• The voting bus could be implemented similarly to collision detection
• What happens if you have an even number of computers?
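The voting question can be explored with a small majority voter. This is a sketch of software voting over integer outputs, not the bus-level scheme hinted at on the slide; the function name and the choice to report "no majority" as a failure are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Finds the value produced by strictly more than half of the channels.
 * Returns true and stores the value in *result if such a majority exists;
 * returns false on a tie or when no value dominates. */
bool majority_vote(const int *outputs, size_t n, int *result)
{
    int candidate = 0;
    size_t count = 0;

    /* Pass 1: Boyer-Moore candidate selection. */
    for (size_t i = 0; i < n; i++) {
        if (count == 0) {
            candidate = outputs[i];
            count = 1;
        } else if (outputs[i] == candidate) {
            count++;
        } else {
            count--;
        }
    }

    /* Pass 2: confirm the candidate really holds a strict majority. */
    size_t agree = 0;
    for (size_t i = 0; i < n; i++)
        if (outputs[i] == candidate)
            agree++;

    if (n > 0 && agree * 2 > n) {
        *result = candidate;
        return true;
    }
    return false;
}
```

With two channels that disagree, the voter can detect the fault but cannot tell which channel is right, so the only safe response is to fall back to a safe state. An odd channel count avoids that particular tie.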

Diversity

• Heterogeneous redundancy
  • Protects against random and some systemic faults
  • Best if implementation teams are kept separate
  • Space shuttle: five computers, four the same and one different

[Diagram: Proc/SW 1 and Proc/SW 2, voting bus, brake, engine control]

Design Process

1. Hazard Identification and Fault Tree Analysis
2. Risk Assessment
3. Define Safety Measures
4. Create Safe Requirements
5. Implement Safety
6. Test, Test, Test, Test, Test

Hazard Analysis – Working forward from hazards

Human in Loop Ventilator Example

Hazard | Severity | Tolerance Time | Fault Example | Likelihood | Detection Time | Mechanism | Exposure Time
Hypoventilation | Severe | 5 min | Motor too slow | Rare | 30 sec | Indep. pressure sensor w/ alarm | 40 sec
 | | | Esophageal intubation | Medium | 30 sec | CO2 sensor alarm | 40 sec
 | | | User mis-attaches breathing hoses | Never | N/A | Different mechanical fittings for intake and exhaust | N/A
Over-pressurization | Severe | 0.05 sec | Release valve stuck closed | Rare | 0.01 sec | Secondary valve opens | 0.01 sec

Fault Tree Analysis

Satisfiability analysis: what combinations of inputs produce the hazard?

Explosion hazard: (SystemOn * FanFailure * PlumbingLeak) + (SystemOff * MainH2Stuck * PlumbingLeak)

Let PlumbingLeak = 1 (there is always some level of leakage):
(SystemOn * FanFailure) + (SystemOff * MainH2Stuck)

Let Tdetect(FanFailure) < ToleranceTime. Then (MainH2Stuck * SystemOff) is our biggest concern.

Mitigation: open a valve from the internal H2 plumbing when off?? Why not? Proper installation required!
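The satisfiability analysis can be sketched by encoding the fault tree as a boolean function and enumerating every input combination. The struct and function names below are illustrative; the expression itself is the one given for the explosion hazard, with `*` read as AND and `+` as OR.

```c
#include <stdbool.h>

struct plant_state {
    bool system_on;
    bool fan_failure;
    bool plumbing_leak;
    bool main_h2_stuck;
};

/* (SystemOn * FanFailure * PlumbingLeak) +
 * (SystemOff * MainH2Stuck * PlumbingLeak) */
bool explosion_hazard(const struct plant_state *s)
{
    bool system_off = !s->system_on;
    return (s->system_on && s->fan_failure && s->plumbing_leak) ||
           (system_off && s->main_h2_stuck && s->plumbing_leak);
}

/* Brute-force satisfiability: count the input combinations (out of 16)
 * that produce the hazard. */
int count_hazardous_combinations(void)
{
    int count = 0;
    for (unsigned bits = 0; bits < 16; bits++) {
        struct plant_state s = {
            .system_on     = (bits & 1u) != 0,
            .fan_failure   = (bits & 2u) != 0,
            .plumbing_leak = (bits & 4u) != 0,
            .main_h2_stuck = (bits & 8u) != 0,
        };
        if (explosion_hazard(&s))
            count++;
    }
    return count;
}
```

Exhaustive enumeration is only practical for small trees like this one; the number of combinations doubles with every added input.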

FMEA: Same as Hazard Analysis, but Start w/ Faults

• Failure mode: how a device can fail
  • Battery: never a voltage spike, only low voltage
  • Valve: stuck open? Stuck closed?
  • Motor or motor controller: stuck fast, stuck slow?
  • Hydrogen sensor: will it be latent or mimic the presence of hydrogen?
• Failure Modes and Effects Analysis
  • Great for single fault tolerant systems
  • For each system:
    • Identify all failure modes and likelihoods
    • Identify the hazard that is produced by each failure
    • Determine the tolerance time for each potential hazard
  • Design considerations
    • Mitigation
    • Detection
    • Response
    • What to do: shutdown, alarm, disable certain features, etc.
  • Search space can be quite large

Risk Assessment

• Risk is orthogonal to hazard analysis
• Determine how risky your system is

Risk parameters:
• S, extent of damage: slight injury, single death, several deaths, catastrophe
• E, exposure time: infrequent, continuous
• G, preventability: possible, impossible
• W, probability: low, medium, high

[Figure: risk graph branching on S1–S4, E1–E2 and G1–G2, with columns W3, W2, W1]

Example Risk Assessment

Device | Hazard | Extent of Damage | Exposure Time | Hazard Prevention | Probability | TUV Risk Level
Microwave oven | Irradiation | S2 | E2 | G2 | W3 | 5
Pacemaker | Pacing too slowly; pacing too fast | S2 | E2 | G2 | W3 | 5
Power station burner control | Explosion | S3 | E1 | -- | W3 | 6
Airliner | Crash | S4 | E2 | G2 | W2 | 8

Define the Safety Measures

• Obviation: make it physically impossible (mechanical hookups, etc.)
• Education: educate users to prevent misuse or dangerous use
• Alarming: inform the users/operators or higher-level automatic monitors of hazardous conditions
• Interlocks: take steps to eliminate the hazard when conditions exist (shut off power, fuel supply, explode, etc.)
• Restrict access: high voltage sources should be in compartments that require tools to access, w/ proper labels
• Labeling
• Consider
  • Tolerance time
  • Supervision of the system: constant, occasional, unattended. Airport people movers have to be designed to a much higher level of safety than attended trains, even if they both have fully automated control.

Create Safe Requirements: Specifications

• Document the safety functionality
  • e.g. The system shall NOT pass more than 10 mA through the ECG lead.
  • Typically the use of NOT implies a much more general requirement about functionality… in ALL CASES
• Create safe designs
  • Start w/ a safe architecture
  • Keep hazard/risk analysis up to date
  • Search for common mode failures
  • Assign responsibility for safe design… hire a safety engineer
  • Design systems that check for latent faults
  • Use safe design practices… this is very domain specific; we will talk about software

Implement Safety – Safe Software

• Language features
  • Type and range safe systems
  • Exception handling
  • Re-use, encapsulation
  • Objects
• Operating systems
• Protocols
• Testing
  • Regression testing
  • Exception testing (fault seeding)

What happens if

    void *a[SZ];              // data structure definition
    a[i] = (void *) x;        // range violation?
    x = (myType *) a[i];      // range and data type violation?

Ideal Error Checking Hierarchy

• Automatic: compile-time checking (static) is better than run-time checking (dynamic)
  • Data types for assignments
  • Range
  • Uninitialized
  • Out of memory… etc.
• Programmer: semantic error conditions (e.g. array not sorted, too many users, etc.)

    if (i < SZ) a[i] = (void *) x; else what??        // range violation?
    if (i < SZ) x = (myType *) a[i]; else what??      // range and data type violation?

Four Main Problems in C

1. Static analysis is not defined by the language: a[5] means *(a+5), not “element 5 of the array a”.
2. There is no run-time checking. The OS checks to make sure you stay in your space.
3. Exception flow is indistinguishable from normal flow, and exception handling is voluntary.
4. The onus for semantic checking is on the user of the data structure.
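One possible answer to the "else what??" question is to hide the array behind accessor functions that refuse out-of-range indices and report the refusal to the caller. This is a sketch only: the value of `SZ` and the names `slot_put` and `slot_get` are assumptions, and it addresses the range violation but not the data type violation, since a `void *` carries no type information.

```c
#include <stdbool.h>
#include <stddef.h>

#define SZ 8  /* assumed size; the slide leaves SZ unspecified */

static void *a[SZ];

/* Stores x in slot i. Returns false instead of writing out of range. */
bool slot_put(size_t i, void *x)
{
    if (i >= SZ)
        return false;
    a[i] = x;
    return true;
}

/* Reads slot i into *out. Returns false instead of reading out of range. */
bool slot_get(size_t i, void **out)
{
    if (i >= SZ || out == NULL)
        return false;
    *out = a[i];
    return true;
}
```

The check is still voluntary in the sense of problem 3: nothing forces the caller to look at the returned flag.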

Language Definition

• Static analysis is up to the compiler
  • Define the semantics of the language to include all compile-time checks that cannot be caught at run time
    • Uninitialized variables
    • Type mismatch
• The run-time environment performs dynamic checks that cannot be caught at compile time, mainly to make sure that you never access memory the wrong way
  • Null pointer access
  • Array out of bounds
  • Type mismatch even when casting
  • Memory management and garbage collection

    a[i] = (void *) x;       // raise an exception
    x = (myType *) a[i];     // raise an exception

• What happens in the event of an exception?

Exception Handling

• It’s NOT okay to just let the system crash if some operation fails! You must, at least, get into safe mode.
• In C it is up to the designer to perform error checking on the values returned by f1 and f2. This is easily put off or ignored. You can’t distinguish error handling from normal program flow, and there is no guarantee that all errors are handled gracefully.
• Typical C approach:

    a = f1(b, c);
    if (a) switch (a) {
        case 1: /* handle exception 1 */ break;
        case 2: /* handle exception 2 */ break;
        …
    }
    b = f2(e, f);
    if (b) switch (b) {
        case 1: /* handle exception 1 */ break;
        case 2: /* handle exception 2 */ break;
        …
    }

In C, the exception flow is the same as the normal flow. Use negative numbers for exceptions?!
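A common way to soften this problem in C is to return a status code separately from the result, and to make the caller's default branch drop into safe mode. The sketch below is one such convention, not a language feature; every name in it (`status_t`, `safe_divide`, `compute_step`, `MODE_SAFE`) is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

typedef enum {
    STATUS_OK = 0,
    STATUS_NULL_ARG,
    STATUS_DIV_BY_ZERO,
    STATUS_OVERFLOW
} status_t;

typedef enum { MODE_NORMAL, MODE_SAFE } system_mode_t;

/* The result travels through *quotient; the return value is only ever a
 * status, so an error can never be mistaken for data. */
status_t safe_divide(int num, int den, int *quotient)
{
    if (quotient == NULL)
        return STATUS_NULL_ARG;
    if (den == 0)
        return STATUS_DIV_BY_ZERO;
    if (num == INT_MIN && den == -1)
        return STATUS_OVERFLOW;  /* result is not representable as int */
    *quotient = num / den;
    return STATUS_OK;
}

/* Caller: anything other than STATUS_OK, including a status added later,
 * ends up in safe mode through the default branch. */
system_mode_t compute_step(int num, int den, int *out)
{
    switch (safe_divide(num, den, out)) {
    case STATUS_OK:
        return MODE_NORMAL;
    default:
        return MODE_SAFE;
    }
}
```

This keeps errors out of the value range, which avoids the "negative numbers for exceptions" trick, but the compiler still will not complain if a caller ignores the returned status.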

Exception Handling in Java

    void myMethod() throws FatalException {
        try {
            // functional code
            a = x.f1(b, c);
            b = x.f2(e, f);
            if (a) … // handle all functional outcomes here!
        } catch (IOException e) {
            // exception handling: recover and continue if that’s okay
        } catch (ArrayIndexOutOfBoundsException e) {
            // not recoverable
            throw new FatalException("I'm Dead");
        } finally {
            // finish up and exit
        }
    }

• All checked exceptions must be either handled or declared as thrown.
• Exceptions are subclassed so that you can have very general or very specific exception handlers.
• Separates throwing exceptions, functional code, and exception handling.

Encapsulation: Semantic Checking

• In C:

    while (item != tail) {
        process(item);
        if (item->next == NULL) return -1;   // exception?
        item = item->next;
    }

• In Java:

    while ((item = mylist.next()) != null) {   // inside mylist is not my problem
        process(item);
    }

    class list {
        Object next() throws CorruptListException {
            if (current == tail) return null;
            current = current.next;   // private field access okay
            if (current == null) throw new CorruptListException(this.toString());
            return current.value;
        }
    }
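Some of the same encapsulation can be approximated in C by moving the traversal and the corruption check behind an iterator, so that the caller only sees a status. This is a sketch under assumptions: the list stores `int` values, runs from `head` to `tail` inclusive, and all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct node { int value; struct node *next; };
struct list { struct node *head; struct node *tail; };

typedef enum { LIST_OK, LIST_END, LIST_CORRUPT } list_status_t;

struct list_iter {
    const struct node *next;  /* node to hand out on the next call */
    const struct node *tail;  /* last valid node of the list */
    bool done;
};

void list_iter_init(struct list_iter *it, const struct list *l)
{
    it->next = l->head;
    it->tail = l->tail;
    it->done = (l->head == NULL && l->tail == NULL);  /* empty list */
}

/* Hands out the next value. LIST_CORRUPT means the chain ended before
 * reaching the tail, which the caller should treat as an exception. */
list_status_t list_next(struct list_iter *it, int *value)
{
    if (it->done)
        return LIST_END;
    if (it->next == NULL)
        return LIST_CORRUPT;
    *value = it->next->value;
    if (it->next == it->tail)
        it->done = true;
    else
        it->next = it->next->next;
    return LIST_OK;
}
```

Unlike the Java version, C does not stop the caller from reaching into `struct list` directly; the encapsulation here is by convention only.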

More Language Features

• Garbage collection
  • What is this for?
  • Is it good or bad for embedded systems?
• Inheritance
  • Means that type-safe systems can still have functions that operate on generic objects
  • Means that we can re-use commonalities between objects
• Re-use
  • Use trusted systems that have been thoroughly tested
    • OS
    • Networking
    • etc.

Java for Embedded Systems

• Why not Java for embedded systems?
  • It’s slower
  • Code bloat
  • Garbage collection may not be interruptible (latency, predictability)
  • Time resolution: run-time support for multithreading and synchronization must be optimized. Java assumes the existence of a basic operating system.
  • Hardware access: interrupt handlers, event handlers
• TinyOS
  • A component model that seems to be good for “reactive” systems. Probably does a good job of addressing the four major issues listed here.

Testing

• Regression test
• Fault seeding

Safe Design Process

• Mainly, the hazard/risk/FMEA analysis is a process, not an event!
• How you do things is as important as what you do.
• Standards for specification, documentation, design, review, and test
• ISO 9000 defines a quality process… one quality level is stable and predictable.