Published by Shawn Peters. Modified over 9 years ago.
1
RELIABILITY ENGINEERING
28 March 2013
William W. McMillan
2
In non-functional requirements, what are some of the reliability targets that might be defined?
3
General Approaches
- Avoiding faults
  - Develop in a way to prevent faults.
  - Careful specification and programming.
- Detecting faults
  - Formal verification
  - Extensive testing
- Tolerating faults
  - Run-time response to faults
  - Recover and proceed
4
Diminishing Returns
- Cost to catch each error goes up dramatically as more and more are caught.
- Considered impossible to catch all errors.
  - Especially in systems with complex interactions among modules, with hardware, or between threads.
- “Six Sigma” aims at 3.4 defects per million opportunities.
  - From Motorola; used by GE and others.
  - Spec limit is 6 SDs away from mean of measure.
  - E.g., spec is 1000 ± 0.6; if mean is 1000, SD ≤ 0.1.
  - The 3.4 figure allows for the mean drifting up to 1.5 SDs over time, leaving 4.5 SDs to the nearer limit.
  - Still not perfect!
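The arithmetic behind the Six Sigma numbers can be checked with the normal distribution. The following is a minimal sketch that assumes the measured quantity is normally distributed. The function names `tail_beyond` and `defects_per_million` are illustrative and do not come from the slides. It reuses the slide's spec of 1000 ± 0.6 with SD 0.1.

```python
import math

def tail_beyond(z: float) -> float:
    """Probability that a standard normal value exceeds z (one tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def defects_per_million(mean: float, sd: float, lower: float, upper: float) -> float:
    """Expected out-of-spec items per million, assuming a normally distributed measure."""
    z_upper = (upper - mean) / sd
    z_lower = (mean - lower) / sd
    return 1_000_000 * (tail_beyond(z_upper) + tail_beyond(z_lower))

# Slide example: spec 1000 ± 0.6, mean 1000, SD 0.1, so each limit is 6 SD away.
centered = defects_per_million(1000.0, 0.1, 999.4, 1000.6)
# Same process after the conventional 1.5 SD drift of the mean toward the upper limit.
shifted = defects_per_million(1000.0 + 1.5 * 0.1, 0.1, 999.4, 1000.6)
print(f"centered: {centered:.4f} defects per million")
print(f"shifted:  {shifted:.2f} defects per million")
```

With the mean centered, the result is about 0.002 defects per million. After the 1.5 SD drift, the nearer limit is only 4.5 SD away, and the result is about 3.4 per million. That drift is the origin of the quoted Six Sigma figure.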
5
What Six Sigma goal could be defined for software reliability?
6
Redundancy
- Multiple versions of the software.
  - N-version programming
    - Different developers
    - Different languages and libraries
  - Installations on multiple hardware platforms.
- Multiple methods to verify software.
- Multiple sets of eyes on code.
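N-version programming only pays off if something compares the versions' outputs. Below is a minimal sketch of a majority voter. The three "versions" are hypothetical stand-ins for independently developed implementations of one function, and `version_c` contains a deliberate fault. The sketch compares integer results by exact equality. Real outputs, such as floating-point sensor values, would usually need a tolerance-based comparison instead.

```python
from collections import Counter
from typing import Callable, Sequence

def vote(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run each version on the same input and return the majority result.

    A version that raises an exception casts no vote.
    """
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            continue  # a crashed version counts as a missing vote, not a wrong one
    if not results:
        raise RuntimeError("all versions failed")
    value, count = Counter(results).most_common(1)[0]
    if count <= len(versions) // 2:
        raise RuntimeError(f"no majority among results {results}")
    return value

# Hypothetical independently written versions of "square a number".
def version_a(x: int) -> int:
    return x * x

def version_b(x: int) -> int:
    return x ** 2

def version_c(x: int) -> int:
    return x * 2  # deliberate fault: doubles instead of squaring

print(vote([version_a, version_b, version_c], 3))  # prints 9: the faulty version is outvoted
```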
7
How would you use redundancy in creating software to set off water sprinklers for fire suppression?
8
Observation
- Process
  - Documented, archived, standardized
- Monitoring at runtime
  - Performance: time, space, transmission rates
  - Inconsistencies between versions or measures
  - Deadlocks
  - Memory access problems
  - Failure of assertions
  - State of hardware
- Keep a trace.
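Runtime monitoring and a trace can be combined in one wrapper. The following is a minimal sketch, assuming a time budget per call is a meaningful performance check for the operation. `monitored`, `TRACE`, `read_sensor`, and `write_port` are hypothetical names used only for illustration. The trace is bounded so that monitoring itself cannot exhaust memory.

```python
import functools
import time
from collections import deque

TRACE = deque(maxlen=1000)  # bounded trace: the oldest entries are discarded first

def monitored(budget_s: float):
    """Decorator: time each call, record it in TRACE, and flag calls over budget."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                status = f"error: {exc!r}"
                raise  # record the failure, but let the caller handle it
            finally:
                elapsed = time.perf_counter() - start
                if status == "ok" and elapsed > budget_s:
                    status = "over budget"
                TRACE.append((fn.__name__, round(elapsed, 6), status))
        return inner
    return wrap

@monitored(budget_s=0.5)
def read_sensor() -> int:
    return 42  # hypothetical sensor read

@monitored(budget_s=0.5)
def write_port() -> None:
    raise OSError("port not responding")  # hypothetical failure

read_sensor()
try:
    write_port()
except OSError:
    pass
for entry in TRACE:
    print(entry)
```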
9
Runtime Recovery
- Exception handling is critical.
  - Record state and problem.
  - Run diagnostic routines.
  - Reset hardware.
  - Return to functional state.
- Might have different versions “vote.”
- Can sometimes reduce performance and still do job.
  - Slow down data transmission.
  - Throw away some packets.
  - Disable some functions.
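One way to "reduce performance and still do the job" is to retry at a lower rate after an exception. The following is a minimal sketch, assuming the send operation raises a specific, recoverable error when the link cannot keep up. `LinkError`, `send_with_degradation`, and `flaky_send` are hypothetical, and halving the rate is an illustrative policy rather than a recommendation. The state and problem are logged before anything changes.

```python
import logging

log = logging.getLogger("recovery")

class LinkError(Exception):
    """Hypothetical error raised by a data link that cannot keep up."""

def send_with_degradation(send, packets, rate_hz: float, min_rate_hz: float = 10.0) -> float:
    """Try to send; on LinkError, record state, halve the rate, and retry.

    Returns the rate that finally worked. Re-raises once the rate would fall below min_rate_hz.
    """
    while True:
        try:
            send(packets, rate_hz)
            return rate_hz
        except LinkError as exc:
            # Record state and problem before changing anything.
            log.warning("send of %d packets failed at %.1f Hz: %s", len(packets), rate_hz, exc)
            rate_hz /= 2  # degraded performance, but the job still gets done
            if rate_hz < min_rate_hz:
                raise

def flaky_send(packets, rate_hz):
    """Hypothetical link that overruns its buffer above 40 Hz."""
    if rate_hz > 40:
        raise LinkError("buffer overrun")

print(send_with_degradation(flaky_send, [b"a", b"b", b"c"], rate_hz=100.0))  # prints 25.0
```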
10
Backup or Protection System
- Runs in parallel with primary system.
- Simpler than primary system.
- Monitors sensors (possibly alternate ones), performance, etc.
- Can intervene to:
  - Shut something down.
  - Start emergency actions (fire suppression, brakes, alarms…).
  - Take control from primary to get into safe state.
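The point of a protection system is that its logic stays small enough to verify. The following is a minimal sketch of such a rule set, assuming the backup has its own temperature sensor and receives a heartbeat from the primary. The field names, action strings, and thresholds (90 °C, 2 s) are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackupReading:
    temperature_c: float      # from the backup system's own (alternate) sensor
    primary_silence_s: float  # seconds since the primary system last sent a heartbeat

def protection_action(r: BackupReading,
                      max_temp_c: float = 90.0,
                      heartbeat_timeout_s: float = 2.0) -> str:
    """Deliberately simple rules, evaluated independently of the primary system."""
    if r.temperature_c > max_temp_c:
        return "start_fire_suppression"      # emergency action
    if r.primary_silence_s > heartbeat_timeout_s:
        return "take_control_to_safe_state"  # primary appears hung or dead
    return "no_action"

print(protection_action(BackupReading(temperature_c=120.0, primary_silence_s=0.1)))
print(protection_action(BackupReading(temperature_c=25.0, primary_silence_s=5.0)))
```

The emergency check comes first so that a hot sensor triggers suppression even while the primary is still responding.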
11
What kinds of systems could not function well with degraded performance?
12
Programming Practices
- Validate data.
  - Range checks
  - Consistency checks
    - E.g., Car in “park” is not going 50 mph.
- Encapsulate.
  - Use good languages
  - Object-oriented design or similar
  - Private data
  - Simple interfaces
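Range checks and consistency checks differ in scope. A range check looks at one value, while a consistency check looks at how values relate. The following is a minimal sketch built on the slide's car example. `CarState`, the gear names, and `MAX_SPEED_MPH` are hypothetical. The frozen dataclass is one way to keep the validated state from being changed afterwards.

```python
from dataclasses import dataclass

GEARS = {"park", "reverse", "neutral", "drive"}
MAX_SPEED_MPH = 200.0  # hypothetical upper bound for a plausible speed reading

@dataclass(frozen=True)
class CarState:
    gear: str
    speed_mph: float  # assumed to be a magnitude, so never negative

def validate(state: CarState) -> list[str]:
    """Return a list of problems; an empty list means the state looks valid."""
    problems = []
    # Range checks: is each value plausible on its own?
    if not 0.0 <= state.speed_mph <= MAX_SPEED_MPH:
        problems.append(f"speed out of range: {state.speed_mph}")
    if state.gear not in GEARS:
        problems.append(f"unknown gear: {state.gear!r}")
    # Consistency check: do the values make sense together?
    if state.gear == "park" and state.speed_mph > 0.0:
        problems.append("car in park but moving")
    return problems

print(validate(CarState(gear="park", speed_mph=50.0)))  # ['car in park but moving']
```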
13
Programming Practices
- Control memory access.
  - Array bounds
  - Pointers
- Handle exceptions.
  - Throw specific exception types and info.
- Use assertions.
  - Throw exception when one fails.
- Time out when waiting for resource.
- Install switches for debug mode, audit trails.
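Three of these practices fit in a few lines: a specific exception type that carries details, an assertion that throws, and a timeout on a shared resource. The following is a minimal sketch. `ResourceTimeout`, `CheckFailed`, `check`, `port_lock`, `MAX_PACKET`, and `write_port` are hypothetical names. Python's own `assert` statement is removed when the interpreter runs with `-O`, which is why this sketch raises a dedicated exception instead.

```python
import threading

class ResourceTimeout(Exception):
    """Specific exception type that carries details about what timed out."""
    def __init__(self, resource: str, timeout_s: float):
        super().__init__(f"timed out after {timeout_s} s waiting for {resource}")
        self.resource = resource
        self.timeout_s = timeout_s

class CheckFailed(Exception):
    """Raised by check(); unlike a bare assert, it is not removed by `python -O`."""

def check(condition: bool, message: str) -> None:
    """Assertion that throws a specific exception when it fails."""
    if not condition:
        raise CheckFailed(message)

MAX_PACKET = 64               # hypothetical packet size limit
port_lock = threading.Lock()  # hypothetical shared resource guarding a port

def write_port(data: bytes, timeout_s: float = 0.5) -> int:
    check(0 < len(data) <= MAX_PACKET, f"bad packet length: {len(data)} bytes")
    # Time out instead of waiting forever for the resource.
    if not port_lock.acquire(timeout=timeout_s):
        raise ResourceTimeout("port_lock", timeout_s)
    try:
        return len(data)  # stand-in for the real write; returns bytes written
    finally:
        port_lock.release()

print(write_port(b"status"))  # prints 6
```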
14
Programming Practices
- Check versions of other components.
- Define hierarchy of hardware needed.
  - Alternate ports, sensors, actuators,…
  - Alternate storage devices
  - Move to another if there’s a problem.
- Make UI bulletproof.
  - Consistency
  - Data types and ranges
  - Keep in sandbox
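A version check and a hardware hierarchy can both be expressed as small, explicit functions. The following is a minimal sketch, assuming versions are dotted integers and that each storage device signals failure with `OSError`. The component name `sensor-driver`, the device functions, and `backup_store` are hypothetical. The devices are tried strictly in priority order.

```python
from typing import Callable, Sequence

def require_version(component: str, found: str, minimum: tuple[int, ...]) -> None:
    """Refuse to run against a component older than the version we were tested with."""
    parsed = tuple(int(part) for part in found.split("."))
    if parsed < minimum:
        wanted = ".".join(map(str, minimum))
        raise RuntimeError(f"{component} {found} is older than required {wanted}")

def write_with_failover(data: bytes,
                        devices: Sequence[tuple[str, Callable[[bytes], None]]]) -> str:
    """Try storage devices in priority order; return the name of the first that works."""
    errors = []
    for name, write in devices:
        try:
            write(data)
            return name
        except OSError as exc:
            errors.append(f"{name}: {exc}")  # note the problem, move to the next device
    raise OSError("all storage devices failed: " + "; ".join(errors))

# Hypothetical devices: the primary disk is full, the backup works.
backup_store: list[bytes] = []

def primary_disk(data: bytes) -> None:
    raise OSError("disk full")

def backup_disk(data: bytes) -> None:
    backup_store.append(data)

require_version("sensor-driver", "2.4.1", (2, 3))
print(write_with_failover(b"reading=17", [("primary", primary_disk), ("backup", backup_disk)]))
```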
15
Programming Practices
- Beware of recursion.
  - Can be inefficient.
  - Can blow the stack.
- Beware of interrupts.
  - Device might send interrupt and halt a time-critical operation.
- Program should have a plan for full data structure.
  - Buffer
  - Disk file
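The recursion and full-buffer warnings can both be shown concretely. In the following minimal sketch, the recursive sum uses one stack frame per element, so a long list exceeds CPython's default recursion limit of 1000, while the loop runs in constant stack depth. `BoundedBuffer` is a hypothetical class whose plan for a full buffer is to discard the oldest item and count the loss. That policy is one choice among several; blocking the producer or spilling to a disk file are others.

```python
from collections import deque

def total_recursive(values, i=0):
    # One stack frame per element: a long list exceeds Python's recursion limit.
    if i == len(values):
        return 0
    return values[i] + total_recursive(values, i + 1)

def total_iterative(values):
    # Constant stack depth regardless of input size.
    result = 0
    for v in values:
        result += v
    return result

class BoundedBuffer:
    """Fixed-capacity buffer with an explicit plan for when it is full:
    the oldest item is discarded and the loss is counted."""
    def __init__(self, capacity: int):
        self.items = deque(maxlen=capacity)
        self.dropped = 0

    def put(self, item) -> None:
        if len(self.items) == self.items.maxlen:
            self.dropped += 1  # the deque evicts the oldest item on append
        self.items.append(item)

values = list(range(10_000))
print(total_iterative(values))
try:
    total_recursive(values)
except RecursionError:
    print("recursive version hit Python's recursion limit")

buf = BoundedBuffer(capacity=3)
for reading in range(5):
    buf.put(reading)
print(list(buf.items), buf.dropped)  # [2, 3, 4] 2
```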
16
Think of a language that would not support these programming practices well. How would you use that language so as to overcome its deficiencies?
17
Measures of Reliability
- Mean time between failures
- Probability of failure on demand
  - When service is requested, how often is it not provided?
- Percent time available
  - E.g., web services
- Percent of completed operations
  - Initiated by the program, e.g., step of motor, writing to port, saving data item,…
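Each of these measures reduces to a simple ratio over logged counts. The following is a minimal sketch. The function names are hypothetical, and the numbers in the usage lines are made-up illustrative counts, not measurements. MTBF is computed here as operating time divided by the failure count, and it is reported as infinite when no failures were observed.

```python
def mtbf_hours(operating_hours: float, failures: int) -> float:
    """Mean time between failures: total operating time divided by the failure count."""
    return operating_hours / failures if failures else float("inf")

def pofod(requests: int, failed_requests: int) -> float:
    """Probability of failure on demand: fraction of requests that were not served."""
    return failed_requests / requests

def availability_percent(uptime_s: float, downtime_s: float) -> float:
    """Percent of time the service was available."""
    return 100.0 * uptime_s / (uptime_s + downtime_s)

def completed_percent(attempted: int, completed: int) -> float:
    """Percent of program-initiated operations (motor steps, port writes, saves) that completed."""
    return 100.0 * completed / attempted

# Hypothetical counts from one 30-day period, for illustration only.
print(mtbf_hours(operating_hours=720.0, failures=3))
print(pofod(requests=20_000, failed_requests=5))
print(availability_percent(uptime_s=2_589_408, downtime_s=2_592))
print(completed_percent(attempted=50_000, completed=49_990))
```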
18
Measures of Reliability
- Percent of data acquired
  - E.g., reading from stream, how many values lost?
- Average quality of data
  - E.g., video
- Percent time that status bits are not as expected.
- …?
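Two of these measures can be computed directly from what a program already receives. The following is a minimal sketch with two assumptions. First, every value in the stream carries an increasing sequence number, so gaps reveal lost values. Second, status words are sampled at a fixed interval, so the percent of samples approximates the percent of time. Both function names and the sequence-number framing are hypothetical.

```python
def percent_acquired(seq_numbers: list[int]) -> float:
    """Share of a stream that arrived, judged from the sequence numbers received.

    Assumes every value carries an increasing sequence number and that the
    first and last values of the window were received.
    """
    received = set(seq_numbers)  # duplicates do not count twice
    if not received:
        return 0.0
    expected = max(received) - min(received) + 1
    return 100.0 * len(received) / expected

def percent_status_unexpected(samples: list[int], expected: int, mask: int = 0xFF) -> float:
    """Percent of status samples whose masked bits differ from the expected pattern.

    With samples taken at a fixed interval, this approximates percent of time.
    """
    if not samples:
        return 0.0
    bad = sum(1 for s in samples if (s & mask) != (expected & mask))
    return 100.0 * bad / len(samples)

print(percent_acquired([1, 2, 3, 5, 6, 9, 10]))  # 70.0: values 4, 7, 8 were lost
print(percent_status_unexpected([0b01, 0b01, 0b11, 0b01], expected=0b01))  # 25.0
```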
19
Think of some other reliability measures that might be useful.