RELIABILITY ENGINEERING 28 March 2013 William W. McMillan.

1 RELIABILITY ENGINEERING 28 March 2013 William W. McMillan

2 In non-functional requirements, what are some of the reliability targets that might be defined?

3 General Approaches
- Avoiding faults
  - Develop in a way to prevent faults.
  - Careful specification and programming.
- Detecting faults
  - Formal verification
  - Extensive testing
- Tolerating faults
  - Run-time response to faults
  - Recover and proceed

4 Diminishing Returns
- Cost to catch each error goes up dramatically as more and more are caught.
- Considered impossible to catch all errors.
  - Especially in systems with complex interactions among modules, with hardware, or between threads.
- “Six Sigma” aims at 3.4 defects per 1 million items.
  - From Motorola, used by GE and others.
  - Spec limit is 6 SDs away from the mean of the measure.
    E.g., spec is 1000 ± 0.6; if the mean is 1000, SD ≤ 0.1.
- Still not perfect!
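The Six Sigma arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch: the helper name `upper_tail` is made up for illustration, and the 1.5 SD drift of the mean is the conventional Six Sigma bookkeeping assumption, not something stated on the slide. Under that assumption the nearer spec limit sits 4.5 SDs from the shifted mean, which is where the 3.4-per-million figure comes from.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Slide example: spec is 1000 +/- 0.6, so six SDs must fit inside 0.6.
tolerance = 0.6
max_sd = tolerance / 6  # largest SD that still meets the spec

# Assumption (not on the slide): the mean may drift by 1.5 SD,
# leaving 4.5 SD between the shifted mean and the nearer limit.
dpm_shifted = upper_tail(4.5) * 1e6

# With a perfectly centered mean, both tails lie 6 SD away.
dpm_centered = 2 * upper_tail(6.0) * 1e6

print(f"max SD: {max_sd:.3f}")
print(f"defects per million, 1.5 SD shift: {dpm_shifted:.2f}")
print(f"defects per million, centered mean: {dpm_centered:.5f}")
```

Either way the defect rate is small but never zero, which is the slide's closing point.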

5 What Six Sigma goal could be defined for software reliability?

6 Redundancy
- Multiple versions of the software.
  - N-version programming
  - Different developers
  - Different languages and libraries
- Installations on multiple hardware platforms.
- Multiple methods to verify software.
- Multiple sets of eyes on code.
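N-version programming is often paired with a voter that compares the outputs of the independent versions. The sketch below is one possible shape for such a voter, not a method from the slides: `majority_vote` and the three toy square-root versions are hypothetical names, and real versions would come from separate teams or toolchains.

```python
import math
from collections import Counter
from typing import Callable, Sequence

def majority_vote(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version on x and return the strict-majority answer."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            continue  # a version that crashes simply loses its vote
    if not results:
        raise RuntimeError("every version failed")
    value, count = Counter(results).most_common(1)[0]
    # Require more than half of ALL versions, not just of the survivors.
    if count * 2 <= len(versions):
        raise RuntimeError("no majority among versions")
    return value

# Three hypothetical, independently written integer square roots.
def version_a(x: int) -> int:
    return math.isqrt(x)

def version_b(x: int) -> int:
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r

def version_c(x: int) -> int:
    return x // 7 + 1  # deliberately faulty for the demonstration

print(majority_vote([version_a, version_b, version_c], 49))
```

Counting the majority against all versions, including crashed ones, is a conservative design choice: two agreeing survivors out of five do not win.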

7 How would you use redundancy in creating software to set off water sprinklers for fire suppression?

8 Observation
- Process
  - Documented, archived, standardized
- Monitoring at runtime
  - Performance: time, space, transmission rates
  - Inconsistencies between versions or measures
  - Deadlocks
  - Memory access problems
  - Failure of assertions
  - State of hardware
- Keep a trace.

9 Runtime Recovery
- Exception handling is critical.
  - Record state and problem.
  - Run diagnostic routines.
  - Reset hardware.
  - Return to functional state.
- Might have different versions “vote.”
- Can sometimes reduce performance and still do job.
  - Slow down data transmission.
  - Throw away some packets.
  - Disable some functions.
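As a rough illustration of recording the problem and then continuing at reduced performance, the sketch below retries a transmission at progressively lower rates. Everything here is hypothetical: `LinkError`, `send_with_degradation`, and the rate values are invented for the example, and the `send` callable stands in for whatever transport the real system uses.

```python
import logging
from typing import Callable, Sequence

log = logging.getLogger("recovery")

class LinkError(Exception):
    """Raised by the (hypothetical) transport when a send fails."""

def send_with_degradation(
    send: Callable[[bytes, int], None],
    payload: bytes,
    rates: Sequence[int] = (100, 50, 10),
) -> int:
    """Try each rate from fastest to slowest; return the rate that worked."""
    for rate in rates:
        try:
            send(payload, rate)
            return rate
        except LinkError as err:
            # Record state and problem before falling back to a slower rate.
            log.warning("send failed at rate %d: %s", rate, err)
    raise LinkError("send failed at every configured rate")

# Stand-in transport that only copes with rates of 50 or less.
def congested_link(payload: bytes, rate: int) -> None:
    if rate > 50:
        raise LinkError(f"link congested at rate {rate}")

print(send_with_degradation(congested_link, b"status"))
```

Only the specific `LinkError` is caught, so unrelated bugs still surface instead of being silently retried.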

10 Backup or Protection System
- Runs in parallel with primary system.
- Simpler than primary system.
- Monitors sensors (possibly alternate ones), performance, etc.
- Can intervene to:
  - Shut something down.
  - Start emergency actions (fire suppression, brakes, alarms…).
  - Take control from primary to get into safe state.
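A protection system is meant to be much simpler than the system it guards. The sketch below shows that idea with a single temperature limit; the class name, the limit, and the `enter_safe_state` callback are all assumptions made for illustration, and a real protection system would typically run on separate hardware.

```python
from typing import Callable, Optional

class ProtectionSystem:
    """Independent monitor that forces a safe state; deliberately simple."""

    def __init__(self, limit_c: float, enter_safe_state: Callable[[str], None]):
        self.limit_c = limit_c
        self.enter_safe_state = enter_safe_state
        self.tripped = False

    def check(self, reading_c: Optional[float]) -> bool:
        """Return True once the system has tripped; the trip is latched."""
        if self.tripped:
            return True
        if reading_c is None:
            reason = "sensor reading missing"
        elif reading_c > self.limit_c:
            reason = f"temperature {reading_c} exceeds limit {self.limit_c}"
        else:
            return False
        self.tripped = True
        self.enter_safe_state(reason)  # e.g. shut down, sound alarm
        return True

actions = []
monitor = ProtectionSystem(limit_c=90.0, enter_safe_state=actions.append)
for reading in (70.0, 85.0, 95.0, 60.0):
    monitor.check(reading)
print(actions)
```

A missing reading is treated as a fault, and the trip stays latched even after readings return to normal, so the primary system cannot quietly resume without an explicit reset.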

11 What kinds of systems could not function well with degraded performance?

12 Programming Practices
- Validate data.
  - Range checks
  - Consistency checks
    E.g., car in “park” is not going 50 mph.
- Encapsulate.
  - Use good languages
  - Object-oriented design or similar
  - Private data
  - Simple interfaces
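The range and consistency checks on this slide might look like the following sketch. The `VehicleState` type, the gear names, and the 0 to 200 mph range are assumptions chosen for the example; only the "park but moving" rule comes from the slide.

```python
from dataclasses import dataclass
from typing import List

GEARS = {"park", "reverse", "neutral", "drive"}

@dataclass(frozen=True)  # frozen: state cannot be altered after validation
class VehicleState:
    gear: str
    speed_mph: float

def validate(state: VehicleState) -> List[str]:
    """Return a list of problems; an empty list means the state is plausible."""
    problems = []
    # Range checks: each field on its own.
    if state.gear not in GEARS:
        problems.append(f"unknown gear: {state.gear!r}")
    if not 0 <= state.speed_mph <= 200:  # also rejects NaN
        problems.append(f"speed out of range: {state.speed_mph}")
    # Consistency check: fields against each other.
    if state.gear == "park" and state.speed_mph != 0:
        problems.append("car in park cannot be moving")
    return problems

print(validate(VehicleState("park", 50.0)))
print(validate(VehicleState("drive", 50.0)))
```

Returning all problems at once, rather than stopping at the first, gives the caller a fuller record to log.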

13 Programming Practices
- Control memory access
  - Array bounds
  - Pointers
- Handle exceptions
  - Throw specific exception types and info.
- Use assertions
  - Throw exception when one fails.
- Time out when waiting for resource.
- Install switches for debug mode, audit trails.
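Two of these practices, timing out on a resource and throwing a specific exception that carries information, can be combined as in the sketch below. `ResourceTimeout` and `run_with_lock` are hypothetical names; the only library behavior relied on is that `threading.Lock.acquire(timeout=...)` returns `False` when the wait expires.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

class ResourceTimeout(Exception):
    """Specific exception type that records which resource and how long."""

    def __init__(self, resource: str, waited_s: float):
        super().__init__(f"timed out after {waited_s}s waiting for {resource}")
        self.resource = resource
        self.waited_s = waited_s

def run_with_lock(lock: threading.Lock, resource: str,
                  action: Callable[[], T], timeout_s: float = 0.5) -> T:
    """Run action while holding lock, but never wait forever for it."""
    if not lock.acquire(timeout=timeout_s):
        raise ResourceTimeout(resource, timeout_s)
    try:
        return action()
    finally:
        lock.release()  # released even if action raises

port_lock = threading.Lock()
print(run_with_lock(port_lock, "serial port", lambda: "wrote 3 bytes"))
```

A caller can catch `ResourceTimeout` separately from other failures and read `resource` and `waited_s` for its audit trail.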

14 Programming Practices
- Check versions of other components.
- Define hierarchy of hardware needed.
  - Alternate ports, sensors, actuators,…
  - Alternate storage devices
  - Move to another if there’s a problem.
- Make UI bulletproof
  - Consistency
  - Data types and ranges
  - Keep in sandbox
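A hardware hierarchy with "move to another if there's a problem" can be expressed as an ordered list of alternatives. In this sketch the sensor names and reader functions are hypothetical, and `OSError` is assumed to be how a device failure shows up.

```python
from typing import Callable, List, Sequence, Tuple

def read_first_available(
    sensors: Sequence[Tuple[str, Callable[[], float]]]
) -> Tuple[str, float]:
    """Try sensors in priority order; return (name, value) from the first that works."""
    errors: List[str] = []
    for name, read in sensors:
        try:
            return name, read()
        except OSError as err:
            errors.append(f"{name}: {err}")  # remember why each one failed
    raise OSError("all sensors failed: " + "; ".join(errors))

# Stand-ins for real device reads.
def primary_sensor() -> float:
    raise OSError("port not responding")

def backup_sensor() -> float:
    return 21.5

print(read_first_available([("primary", primary_sensor), ("backup", backup_sensor)]))
```

Returning the name along with the value lets the caller log that it is running on a backup device.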

15 Programming Practices
- Beware of recursion.
  - Can be inefficient.
  - Can blow the stack.
- Beware of interrupts.
  - Device might send interrupt and halt a time-critical operation.
- Program should have a plan for full data structure.
  - Buffer
  - Disk file
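One possible "plan for a full data structure" is a bounded buffer with an explicit overflow policy. The sketch below drops the oldest item and counts the loss; that policy and the `BoundedBuffer` name are assumptions for illustration, and other plans (blocking the producer, spilling to disk) are equally valid.

```python
from collections import deque
from typing import Any, List

class BoundedBuffer:
    """Fixed-capacity buffer that drops the oldest item when full."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.dropped = 0  # how many items were lost to overflow
        self._items: deque = deque()

    def put(self, item: Any) -> None:
        if len(self._items) >= self.capacity:
            self._items.popleft()  # the plan: sacrifice the oldest item
            self.dropped += 1
        self._items.append(item)

    def drain(self) -> List[Any]:
        """Return all buffered items, oldest first, and empty the buffer."""
        items = list(self._items)
        self._items.clear()
        return items

buf = BoundedBuffer(capacity=3)
for sample in range(5):
    buf.put(sample)
print(buf.drain(), "dropped:", buf.dropped)
```

Keeping a `dropped` counter means the overflow is observable rather than silent.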

16 Think of a language that would not support these programming practices well. How would you use that language so as to overcome its deficiencies?

17 Measures of Reliability
- Mean time between failures
- Probability of failure on demand
  - When service requested, how often given?
- Percent time available
  - E.g., web services
- Percent of completed operations
  - Initiated by the program, e.g., step of motor, writing to port, saving data item,…
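The first three measures reduce to simple ratios. The sketch below uses the usual textbook definitions; the function names and the sample numbers are made up for illustration.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures: operating time divided by failure count."""
    if failures == 0:
        return float("inf")  # no failures observed yet
    return operating_hours / failures

def pofod(failed_requests: int, total_requests: int) -> float:
    """Probability of failure on demand: fraction of requests not served."""
    if total_requests <= 0:
        raise ValueError("need at least one request")
    return failed_requests / total_requests

def availability_percent(uptime_hours: float, downtime_hours: float) -> float:
    """Percent of total time the service was available."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("need a positive observation period")
    return 100.0 * uptime_hours / total

# Hypothetical observations.
print(mtbf(operating_hours=1000, failures=4))
print(pofod(failed_requests=2, total_requests=1000))
print(availability_percent(uptime_hours=999, downtime_hours=1))
```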

18 Measures of Reliability
- Percent of data acquired
  - E.g., reading from stream, how many values lost?
- Average quality of data
  - E.g., video
- Percent time that status bits are not as expected.
- …?
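Percent of data acquired can be estimated when each value in the stream carries a sequence number. That numbering is an assumption of this sketch, as is the `percent_acquired` name; gaps in the sequence are counted as lost values.

```python
from typing import Iterable

def percent_acquired(sequence_numbers: Iterable[int]) -> float:
    """Percent of expected values received, judged by gaps in sequence numbers."""
    seen = set(sequence_numbers)  # duplicates count once
    if not seen:
        raise ValueError("no data received")
    expected = max(seen) - min(seen) + 1
    return 100.0 * len(seen) / expected

# Value 3 never arrived.
print(percent_acquired([1, 2, 4, 5]))
```

Values lost before the first or after the last received number are invisible to this estimate, so it is an upper bound on the true acquisition rate.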

19 Think of some other reliability measures that might be useful.
