Fault Tolerant Systems in a Space Environment Archana EE585: Fault Tolerance Computing
Overview
Introduction
Error Detection Techniques
* Watchdog Processor
* Control Flow Error Detection
* Types of Signatures
Fault Injection
Conclusion

Introduction
* The experiment was carried out by the Stanford Center for Reliable Computing (CRC) on the Advanced Research and Global Observation Satellite (ARGOS).
* The approach targets space missions whose equipment combines the two basic approaches, fault avoidance and fault tolerance.
* It relies mainly on software techniques for detecting errors.

Error Detection Techniques
Watchdog Processor
A watchdog processor is a small processor that sits on the system buses, passively observes the bus transactions generated by the main processor, and detects errors by monitoring those transactions.

[Figure: Watchdog Processor]

Control Flow Error Detection
* The main goal is to check that instructions execute in the correct sequence.
* This is done by signature analysis: a signature is associated with each block of instructions and saved at compile time.
* At run time, the generated signature is compared with the saved one; a mismatch indicates an error.

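The compile-time/run-time comparison described above can be sketched as follows. This is a minimal illustration, not the scheme used on ARGOS: the choice of CRC-32 as the signature function, the block names `B0` and `B1`, and the instruction words are all assumptions made for the example.

```python
import zlib

def block_signature(instructions):
    """Signature of a block: CRC-32 over its instruction words (an assumed choice)."""
    data = b"".join(word.to_bytes(4, "big") for word in instructions)
    return zlib.crc32(data)

# "Compile time": compute and save one signature per block.
# The block names and instruction words are hypothetical.
program = {
    "B0": [0x8C010000, 0x8C020004, 0x00221820],
    "B1": [0x00612022, 0xAC040008],
}
saved = {name: block_signature(words) for name, words in program.items()}

def check_block(name, executed_words):
    """Run-time check: True if the observed instruction stream matches the saved signature."""
    return block_signature(executed_words) == saved[name]

# Fault-free execution of B0 matches the saved signature.
ok = check_block("B0", program["B0"])

# A single bit flip in one fetched instruction changes the signature.
corrupted = list(program["B0"])
corrupted[1] ^= 1 << 7
bad = check_block("B0", corrupted)

print(ok, bad)
```

A mismatch only shows that the executed block differs from the compiled one; it does not say which instruction was affected.
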
Types of Signatures
1. Path Signature Analysis
* Signatures are computed for sequences of nodes, i.e., paths, rather than for single nodes.
* Two bits are used to differentiate signatures.
* A special tag signals when to compare the computed signature with the embedded one.
2. Signature Instruction Streams (SIS)

Types of Signatures (contd.)
Paths are grouped into sets, and each set has a signature, called the justifying signature.
[Figure: Control flow diagram of three basic blocks]

2. Signature Instruction Streams (SIS)

Signature Instruction Streams (contd.)
To reduce the number of signatures embedded in the code, branch address hashing is used.

[Figure: Branch Address Hashing]

Stutter Step Mode (SSM)
Each group of instructions is executed two or more times and the results are compared. SSM detects errors missed by the other techniques.
Disadvantages:
* Lower performance.
* Memory overhead.

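The execute-and-compare idea can be sketched in software as follows. This is an illustrative sketch under stated assumptions: the helper names (`stutter_step`, `MismatchError`, `add_group`), the dictionary used as a register file, and the simulated transient fault are hypothetical and are not taken from the ARGOS implementation.

```python
class MismatchError(Exception):
    """Raised when repeated executions of the same group disagree."""

def stutter_step(group, state, repeats=2):
    """Execute `group` on fresh copies of `state` `repeats` times and compare the results."""
    results = [group(dict(state)) for _ in range(repeats)]
    if any(r != results[0] for r in results[1:]):
        raise MismatchError(f"results differ: {results}")
    return results[0]

def add_group(regs):
    """A one-instruction group: A = B + C."""
    regs["A"] = regs["B"] + regs["C"]
    return regs

# Fault-free case: both executions agree.
final = stutter_step(add_group, {"B": 10, "C": 7})

def make_transient_fault_group():
    """Same group, but the first execution suffers a simulated bit flip in A."""
    calls = {"n": 0}
    def group(regs):
        calls["n"] += 1
        regs["A"] = regs["B"] + regs["C"]
        if calls["n"] == 1:
            regs["A"] ^= 1  # injected transient fault, first execution only
        return regs
    return group

try:
    stutter_step(make_transient_fault_group(), {"B": 10, "C": 7})
    detected = False
except MismatchError:
    detected = True

print(final, detected)
```

Every group runs `repeats` times, which is where the performance penalty listed above comes from.
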
Application of SSM to One Instruction
Overhead is 300%.

Stutter Step Mode (contd.)
The overhead is reduced by extending the duplication from a single instruction to a basic block.

Error Masking in SSM

Error Masking in SSM (contd.)
* Assume the register values B = 10 and C = 7, so that A = 17 and D = 3. (Integer division of any number from 15 to 19 by 5 gives 3.)
* If an error makes A = 18 instead of 17, D is still 3, so the error is not detected.
* Therefore, the error detection technique must be selected with care.

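The masking effect can be reproduced in a few lines. The operations A = B + C and D = A / 5 (integer division) are inferred from the values on the slide, and the function name `compute` and its `a_error` parameter are hypothetical.

```python
def compute(b, c, a_error=0):
    """Return (A, D) with A = B + C and D = A // 5; `a_error` models an error in A."""
    a = b + c + a_error
    d = a // 5
    return a, d

a_good, d_good = compute(10, 7)            # fault-free: A = 17, D = 3
a_bad, d_bad = compute(10, 7, a_error=1)   # faulty:     A = 18, D = 3

# A comparison on D alone misses the error; a comparison on A catches it.
masked_on_d = d_good == d_bad
caught_on_a = a_good != a_bad

print(a_good, d_good, a_bad, d_bad, masked_on_d, caught_on_a)
```

The example suggests that where the comparison is placed matters as much as how often the code is re-executed.
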
Fault Injection
Fault injection is one way to validate fault tolerance mechanisms.
Advantages:
1. Flexibility
2. Controllability
3. Predictability
Disadvantages:
1. It is questionable whether the injected faults are a good representation of faults in the real environment.

Fault Injection (contd.)
In ARGOS, the system is tested in a space environment.
Different approaches to fault injection in electronic systems:
1. Disturbing the signals on the device pins.
2. Radiation.
3. Power supply disturbance.
4. Logic simulation.

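As a small illustration of the simulation-style approach, the sketch below injects single-bit flips into a simulated register and counts how many are caught by a check on the final result. It is a toy model under stated assumptions: the computation (A = B + C, D = A // 5), the 8-bit register width, and the names `flip_bit` and `run_campaign` are all hypothetical, and nothing here models the ARGOS hardware.

```python
import random

def flip_bit(value, bit):
    """Return `value` with one bit inverted (the injected fault)."""
    return value ^ (1 << bit)

def run_campaign(num_faults, seed=0, width=8):
    """Inject `num_faults` single-bit flips into A and count detections by a check on D."""
    rng = random.Random(seed)  # fixed seed: the campaign is repeatable
    detected = masked = 0
    for _ in range(num_faults):
        b, c = rng.randrange(100), rng.randrange(100)
        golden_d = (b + c) // 5                      # fault-free reference run
        faulty_a = flip_bit(b + c, rng.randrange(width))
        faulty_d = faulty_a // 5                     # run with the injected fault
        if faulty_d != golden_d:
            detected += 1
        else:
            masked += 1
    return detected, masked

first = run_campaign(1000, seed=42)
second = run_campaign(1000, seed=42)
print(first, first == second)
```

Because the seed fixes which faults are injected, the same campaign can be replayed exactly, which corresponds to the controllability and predictability advantages of fault injection.
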
Conclusion
* The work determines the tradeoffs between fault tolerance and fault avoidance techniques in order to arrive at an efficient blend of suitable techniques.
* Both hardware and software fault tolerance techniques are studied.

References
Philip P. Shirvani and Edward J. McCluskey (Stanford University), "Fault Tolerant Systems in a Space Environment." http://www-crc.stanford.edu/crc_papers/CRC-TR-98-2.pdf

Queries?