Error Detection in Software
David Rigler
Overview
  - Types of error detection
  - Fault/error classification
  - Description of selected SW error detection techniques
  - Evaluation (coverage / overhead)
  - Conclusion
Runtime Failure Detection (in Software)
  - Software diversity / N-version programming
  - Defensive programming
  - Assertions
  - Bound/range checking
  - Control flow checking
  - Block entry/exit checking
  - Error capturing instructions
  - Advanced techniques …
  - Redundant data/code
  - Failure sources: HW failures, SW failures
Classification of Transient Hardware Errors
  - Data errors
  - Code errors
  - Statement types
      - Type S1: statements affecting data only
      - Type S2: statements affecting the execution flow
  - Code error types
      - Type E1: errors changing the operation (not the control flow)
      - Type E2: errors changing the statement type (S1 <-> S2)
Data Errors (Executable Assertions)
  - Generic
      - Bound
      - Integrity
      - For SW and HW errors
  - Non-generic
      - Value range
      - Approximate (false alarms possible)
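The assertion kinds listed above can be sketched in C as small predicate functions. This is only an illustration under stated assumptions: the temperature limits, the maximum step size, and all function names are hypothetical and do not come from the slides. The "approximate" check is read here as a plausibility test on the rate of change, which is one possible interpretation and is the kind of check that can raise false alarms.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application limits, chosen only for illustration. */
#define TEMP_MIN_C      (-40)
#define TEMP_MAX_C      125
#define TEMP_MAX_STEP_C 5    /* largest plausible change between two samples */

/* Generic bound assertion: the index must lie inside the array. */
bool index_in_bounds(size_t index, size_t length)
{
    return index < length;
}

/* Non-generic value-range assertion: the sample must be physically plausible. */
bool temp_in_range(int32_t temp_c)
{
    return temp_c >= TEMP_MIN_C && temp_c <= TEMP_MAX_C;
}

/* Approximate assertion: consecutive samples should not differ too much.
 * A real but unusually fast change would be reported as an error,
 * which is a false alarm. */
bool temp_step_plausible(int32_t prev_c, int32_t curr_c)
{
    /* Widen to 64 bits so the subtraction cannot overflow. */
    int64_t delta = (int64_t)curr_c - (int64_t)prev_c;
    if (delta < 0) {
        delta = -delta;
    }
    return delta <= TEMP_MAX_STEP_C;
}
```

A caller would evaluate these predicates at the points where the data is produced or consumed and hand control to an error handler when one of them returns false.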
Data Errors (Systematic Data Redundancy)
  - Rules
      - Duplicate every variable: x -> (x1 and x2)
      - Perform every write operation on both x1 and x2
      - On every read of x, check x1 and x2 for consistency
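The three rules above can be sketched by hand for a single integer variable. The type and function names (`dup_int32`, `dup_write`, `dup_read`) are hypothetical; the slides describe the rules only, not an API. The copies are declared `volatile` here as one possible way to keep a compiler from merging them, which is an assumption of this sketch rather than something the slides prescribe.

```c
#include <stdbool.h>
#include <stdint.h>

/* A variable x stored as two copies, x1 and x2. */
typedef struct {
    volatile int32_t x1;
    volatile int32_t x2;
} dup_int32;

/* Rule: every write goes to both copies. */
void dup_write(dup_int32 *v, int32_t value)
{
    v->x1 = value;
    v->x2 = value;
}

/* Rule: every read checks the copies for consistency.
 * Returns false (and leaves *out untouched) if the copies differ. */
bool dup_read(const dup_int32 *v, int32_t *out)
{
    int32_t a = v->x1;
    int32_t b = v->x2;
    if (a != b) {
        return false;   /* mismatch: a data error was detected */
    }
    *out = a;
    return true;
}
```

A single bit flip in either copy makes the two values differ, so the next read reports it. A flip that is overwritten before the next read goes unnoticed, but it also has no visible effect.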
Data Errors (Systematic Data Redundancy)
  - Generic approach
  - Apply the rules with a pre-processor on the high-level language source
  - Compiler optimisations may be a problem
  - All (visible) single bit-flip errors in data memory can be detected
Control Flow Errors
  - Block Entry Exit Checking (BEEC)
      - Unique signatures for basic blocks
      - Assign at entry
      - Compare at exit
  - Problems
      - Jumps within a block
      - Granularity
      - Jumps to unused areas
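The assign-at-entry, compare-at-exit scheme above can be sketched as follows. The signature constants, the global names, and the example function are hypothetical; in practice the entry and exit code would typically be inserted by a tool rather than written by hand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unique, arbitrarily chosen signatures for two basic blocks. */
#define SIG_BLOCK_A 0x5A5A0001u
#define SIG_BLOCK_B 0xA5A50002u

uint32_t g_signature = 0u;   /* run-time signature variable */
bool     g_cf_error  = false; /* set when a control flow error is detected */

/* Entry: assign the signature of the block being entered. */
void block_enter(uint32_t sig)
{
    g_signature = sig;
}

/* Exit: compare against the signature of the block being left. */
void block_exit(uint32_t sig)
{
    if (g_signature != sig) {
        g_cf_error = true;
    }
}

/* Example function made of two instrumented basic blocks. */
int32_t scale_and_offset(int32_t x)
{
    int32_t y;

    block_enter(SIG_BLOCK_A);
    y = x * 2;
    block_exit(SIG_BLOCK_A);

    block_enter(SIG_BLOCK_B);
    y = y + 1;
    block_exit(SIG_BLOCK_B);

    return y;
}
```

An erroneous jump from block A into the middle of block B skips `block_enter(SIG_BLOCK_B)`, so the exit check of B sees the signature of A and flags the error. A jump that stays inside one block is not caught by this scheme, which is the first problem the slide lists.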
Control Flow Errors
  - Duplicate condition checks
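The slide gives only the name of this technique. One common reading is that the condition of a branch is evaluated again inside each branch target, so that a corrupted branch decision is noticed. The sketch below follows that reading; the function name, the error flag, and the use of `volatile` to keep the repeated comparison from being optimised away are assumptions of this example.

```c
#include <stdbool.h>
#include <stdint.h>

bool g_cf_error = false;   /* set when a branch decision looks inconsistent */

/* Returns value limited to at most `limit`, re-checking the branch
 * condition inside both branches. */
int32_t clamp_to_limit(int32_t value, int32_t limit)
{
    /* volatile forces the compiler to re-read and re-compare the operands. */
    volatile int32_t v = value;
    volatile int32_t l = limit;
    int32_t result;

    if (v > l) {
        if (!(v > l)) {        /* duplicate check in the "then" branch */
            g_cf_error = true;
        }
        result = l;
    } else {
        if (v > l) {           /* duplicate check in the "else" branch */
            g_cf_error = true;
        }
        result = v;
    }
    return result;
}
```

In a fault-free run the duplicate checks never fire. They cost one extra comparison per branch.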
Control Flow Errors
  - Error Capturing Instructions
      - Special or unused instructions (trap, SWI, …)
      - Spread over unused memory
          - Program memory
          - Data memory
      - Call an error handling function
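The idea of spreading error capturing instructions over unused memory can be illustrated with a small host-side simulation. The opcode value `0xDE` is a placeholder, not the encoding of any real processor, and both function names are hypothetical. On a real target the fill is usually done when the memory image is built, and the trap is taken by the CPU itself rather than checked in software.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder one-byte trap opcode; real encodings are processor specific. */
#define TRAP_OPCODE 0xDEu

/* Fill the unused part [used, size) of a memory image with trap opcodes. */
void fill_unused_with_traps(uint8_t *image, size_t used, size_t size)
{
    for (size_t i = used; i < size; ++i) {
        image[i] = TRAP_OPCODE;
    }
}

/* Simulation stand-in for an instruction fetch: reports whether execution
 * starting at `address` would immediately hit a trap. */
bool fetch_hits_trap(const uint8_t *image, size_t size, size_t address)
{
    return address < size && image[address] == TRAP_OPCODE;
}
```

With the unused area filled this way, an erroneous jump into it executes a trap and ends up in the error handling function instead of running arbitrary bytes as code.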
Control Flow Errors
  - Watchdog Timer
      - Timer is reset periodically by the program
      - Action is taken when the timer reaches a specific value
      - Needs hardware support
      - Common in embedded controllers
      - Detects infinite-loop errors
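The watchdog behaviour described above can be modelled in plain C to show the logic. A real watchdog is a hardware peripheral with device-specific registers; the struct and the functions `wdt_init`, `wdt_kick`, and `wdt_tick` are hypothetical names for this simulation only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of a watchdog timer. */
typedef struct {
    uint32_t counter;   /* ticks since the last reset */
    uint32_t timeout;   /* timer value at which action is taken */
    bool     expired;   /* latched once the timeout is reached */
} sw_watchdog;

void wdt_init(sw_watchdog *w, uint32_t timeout_ticks)
{
    w->counter = 0u;
    w->timeout = timeout_ticks;
    w->expired = false;
}

/* Called periodically by a healthy program to reset the timer. */
void wdt_kick(sw_watchdog *w)
{
    w->counter = 0u;
}

/* Called on every timer tick; stands in for the hardware counter. */
void wdt_tick(sw_watchdog *w)
{
    w->counter++;
    if (w->counter >= w->timeout) {
        w->expired = true;   /* real hardware would reset or interrupt here */
    }
}
```

A program stuck in an infinite loop never reaches its `wdt_kick` call, so the counter runs up to the timeout and the watchdog takes its action.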
Coverage Example 1
  - Techniques: BEEC, duplicate condition checks, systematic data redundancy
  - Simulated bit-flip errors in memory
  - ~5x performance slowdown
  - ~2x size
  - No silent violations (data)
  - High coverage even for errors in the code area
Coverage Example 2
  - Physical fault injection
      - Heavy-ion radiation
      - Power-supply disturbances
  - Hardware WDT
  - Effect of additional SW: 60% / 85%
Improving Coverage
  - Separate basic blocks for redundant variables
      - Separated in memory
      - No single bit flip can jump between them
  - Use cumulative signatures
      - Detect jumps within a block
  - Avoid signature aliasing
      - Hamming distance
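Two of the points above lend themselves to a short sketch: a cumulative signature that is updated at several points inside a block, and a Hamming-distance check on the chosen signature values. The XOR update rule, the step constants, and all names are assumptions of this example; the slides do not specify how the signature is accumulated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of differing bits between two signatures. */
unsigned hamming_distance(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    unsigned count = 0u;
    while (diff != 0u) {
        diff &= diff - 1u;   /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* True if every pair of signatures differs in at least min_distance bits,
 * so that a few bit flips cannot turn one signature into another. */
bool signatures_well_separated(const uint32_t *sigs, size_t n,
                               unsigned min_distance)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            if (hamming_distance(sigs[i], sigs[j]) < min_distance) {
                return false;
            }
        }
    }
    return true;
}

/* Cumulative signature: starts at a base value and is updated at each
 * checkpoint inside the block (here by XOR with a step constant). */
uint32_t g_cum_sig = 0u;

void cum_begin(uint32_t base) { g_cum_sig = base; }
void cum_step(uint32_t step)  { g_cum_sig ^= step; }
bool cum_end(uint32_t expected) { return g_cum_sig == expected; }
```

With a base value `B` and non-zero step constants `S1` and `S2`, the expected exit value is `B ^ S1 ^ S2`. A jump that skips the first part of the block also skips `cum_step(S1)`, so the exit comparison fails even though control never left the block.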
100% Coverage
  - Only for a simple failure model
      - Single bit flip
      - Data and code memory, registers
      - Hidden registers not included (branch buffer, cache tags, etc.)
  - High overhead
      - ~4x memory usage
      - >3x time
Conclusion: Error Detection in SW
  - Pure SW: high coverage only for simple failure models
  - Useful as an addition to HW error detection
  - Trade-off: overhead vs. coverage
      - Fine tuning possible
      - Use available resources (time, memory)