Embedded Systems Laboratory, Informatics Institute, Federal University of Rio Grande do Sul, Porto Alegre – RS – Brazil. SRC TechCon 2005, Portland, Oregon, USA. Dealing with Multiple Simultaneous Faults in Future Technologies. Carlos A. L. Lisbôa, Erik Schüler, Luigi Carro.
Carlos A. L. Lisbôa SRC TechCon October, 26, Paper # Why Multiple Simultaneous Faults? Future technologies (2010 and beyond) will use very small transistors, with fewer electrons to form the channel (SETs). Transient pulses caused by radiation strikes will last longer than the propagation delays of gates. Devices will be more sensitive to the effects of electromagnetic noise, neutrons and alpha particles.
Single Event Upset Origin
Why Should One Study Multiple Faults? Change in paradigm: gates will behave statistically, producing correct outputs only a fraction of the time.
How to Deal with Multiple Faults? New paradigm: multiple simultaneous faults. New fault tolerance techniques will be required, since triple modular redundancy (TMR) will no longer provide enough protection. How to deal with this problem? Either new materials and manufacturing technologies must be developed, or new design approaches must be taken. Our bet: new design approaches.
Research Approaches: use of stochastic operators; use of bit stream operators; ensuring voter reliability in order to use n-MR while dealing with multiple simultaneous faults. Next steps: time frame.
Research Evolution (diagram): Stochastic Operators (OK for some DSP applications; looking for more speed); Analog Voter (small footprint and fast; tolerant to multiple faults in n-MR solutions); Bit Stream Operators (looking for a tolerant converter).
Using Stochastic Operators: SEU-induced transient errors are random in nature. Stochastic operators rely on randomness to produce approximate results. The injection of random faults into the input signals processed by stochastic operators did not impact the precision of the results. [Chart: % errors in 1,000 additions, stochastic adder vs. conventional adder, with 0, 2, 4 and 8 faults injected.] Several application areas (such as DSP) can deal with approximate values and still produce acceptable results (outputs).
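The tolerance to random faults claimed above can be illustrated with a minimal sketch. It assumes the textbook unipolar stochastic encoding, in which a value in [0, 1] is the fraction of 1's in a random bit stream, an AND gate multiplies two streams and a multiplexer computes a scaled sum. The circuits shown on the posters may differ, and all function names here are hypothetical.

```python
import random

def encode(p, n, rng):
    """Encode a value p in [0, 1] as n random bits with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def decode(bits):
    """Recover the encoded value as the fraction of 1's in the stream."""
    return sum(bits) / len(bits)

def stochastic_multiply(a, b):
    """Bitwise AND of two independent streams approximates the product."""
    return [x & y for x, y in zip(a, b)]

def stochastic_scaled_add(a, b, select):
    """Multiplexer: with a p = 0.5 select stream, approximates (a + b) / 2."""
    return [x if s else y for x, y, s in zip(a, b, select)]

def inject_faults(bits, k, rng):
    """Return a copy of the stream with k randomly chosen bits flipped."""
    flipped = list(bits)
    for i in rng.sample(range(len(bits)), k):
        flipped[i] ^= 1
    return flipped

if __name__ == "__main__":
    rng = random.Random(2005)  # fixed seed, chosen arbitrarily
    n = 10_000                 # stream length (assumption)
    a, b = encode(0.5, n, rng), encode(0.4, n, rng)
    for k in (0, 2, 4, 8):
        product = stochastic_multiply(inject_faults(a, k, rng),
                                      inject_faults(b, k, rng))
        print(k, "faults per input ->", decode(product))
```

Because every bit carries the same weight, flipping k bits in each input stream can change the decoded product by at most 2k/n, whereas a single flip in the most significant bit of a binary-coded operand changes its value by half of the full range.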
Using Stochastic Operators. Benefit: reduced area of the operators. [Figures: stochastic multiplier circuit; stochastic adder circuit, with signals S1, S2, S3 and Sum.]
Using Stochastic Operators: How does it work? Come and see the posters! No free drinks, but the answer to this question is guaranteed!
Using Bit Stream Operators: Computation principles are similar to those of the stochastic adder and multiplier. Operators can produce bit streams which represent the exact results of the operation. [Figure: proposed multiplication algorithm, F1 x F2; the bit stream product b48..b0 is built from the partial products of the bits of F1 and F2, in segments b48..b33, b32..b17, b16..b5, b4..b1 and b0, and the count of 1's in the stream is equal to the product value.] Redundancy is added to the bit streams so that they withstand multiple bit flips. [Figure: adding robustness to the bit stream through redundancy; each bit from b48 to b0 is repeated 8 times, plus 4, so the total count of 1's = 8 * product + 4.] Conversion of bit streams to binary coded values is delayed as much as possible, and conversion circuits must use TMR or n-MR for protection against faults. Issues to be further investigated: size of the bit streams and area of the conversion circuits.
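The following sketch shows one way a bit stream with these properties could be built. It rests on several assumptions that the slides do not confirm: the factors are taken to be 3 bits wide (inferred from the 49-bit stream b48..b0, since 7 x 7 = 49), each partial product of bit i of F1 and bit j of F2 is replicated 2^(i+j) times, and the "+ 4" is modelled as four 1's appended to the stream so that an integer division by 8 rounds to the nearest product. Function names are hypothetical.

```python
def bit_stream_product(f1, f2, width=3):
    """Build a stream whose count of 1's equals f1 * f2.

    Assumption: the AND of bit i of f1 and bit j of f2 is repeated
    2**(i + j) times, giving (2**width - 1)**2 bits in total.
    """
    stream = []
    for i in range(width):
        for j in range(width):
            bit = (f1 >> i) & (f2 >> j) & 1
            stream.extend([bit] * (1 << (i + j)))
    return stream

def add_redundancy(stream, copies=8, offset=4):
    """Repeat every bit `copies` times and append `offset` 1's (assumed)."""
    return [b for b in stream for _ in range(copies)] + [1] * offset

def decode(redundant, copies=8):
    """Convert the redundant stream back to a binary coded value."""
    return sum(redundant) // copies

if __name__ == "__main__":
    redundant = add_redundancy(bit_stream_product(5, 6))
    for position in (0, 17, 200):   # flip three arbitrary bits
        redundant[position] ^= 1
    print(decode(redundant))
```

Under these assumptions the count of 1's can drift by up to 4 downwards or 3 upwards before the decoded product changes, so any three bit flips in the redundant stream are absorbed.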
Using Bit Stream Operators: How does it work? Come and see the posters! No free food, but some more info on this subject will be provided!
What is Wrong with TMR? [Diagram: Module 1, Module 2 and Module 3 feed a voter, which generates the output.] TMR protects only against single faults in one of the modules: when one module produces a wrong output, the voter still produces the correct output. TMR does not protect against double faults in different modules: when two modules produce wrong outputs, the voter output is wrong. When a single fault occurs in the voter circuit, the voter output may be wrong.
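The three cases above can be reproduced with a small behavioural model of a digital majority voter. This is only an illustrative sketch for single-bit module outputs; the function name is hypothetical.

```python
def majority_vote(outputs):
    """Digital majority voter over an odd number of single-bit outputs."""
    return 1 if sum(outputs) > len(outputs) // 2 else 0

if __name__ == "__main__":
    correct = 1
    cases = {
        "no fault": [1, 1, 1],
        "single fault in one module": [1, 0, 1],
        "double fault in different modules": [0, 1, 0],
    }
    for name, outputs in cases.items():
        voted = majority_vote(outputs)
        print(name, "->", "correct" if voted == correct else "wrong")
    # A fault inside the voter itself is modelled as a flipped voter output.
    print("fault in the voter ->", majority_vote([1, 1, 1]) ^ 1)
```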
Making TMR (n-MR) More Reliable: Known solutions imply area, performance and/or power penalties. Deadlock: how to protect the output generator? Proposed solution: use TMR to cope with single faults in the modules, and replace the digital voter by an analog voter that uses a comparator to generate the output. The analog voter can tolerate some noise and still produce the correct result.
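A behavioural sketch of the idea follows. The slides only state that the analog voter uses a comparator, so the rest is assumed: the module outputs are averaged as voltages, the comparator threshold is half the supply voltage, and the supply is 3.3 V. The function name and all numeric values are hypothetical.

```python
def analog_vote(voltages, vdd=3.3):
    """Average the module output voltages and compare against vdd / 2.

    Assumptions: ideal averaging of the inputs and an ideal comparator;
    vdd = 3.3 V is an illustrative value, not taken from the slides.
    """
    mean = sum(voltages) / len(voltages)
    return 1 if mean > vdd / 2 else 0

if __name__ == "__main__":
    print(analog_vote([3.3, 3.3, 0.0]))            # one faulty module
    print(analog_vote([3.0, 3.5, 0.4]))            # one fault plus noise
    print(analog_vote([3.3, 3.3, 3.3, 0.0, 0.0]))  # 5 inputs, two faults
```

In this model, moderate noise on the inputs shifts the average only slightly, so the comparator decision is unchanged as long as the majority of the inputs stay close to the correct level.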
The Analog Voter
Injection of Faults in the Comparator (*): minimum area comparator. (*) Using CMOS 0.35 µm.
Electrical Simulation: Multiple Faults (SPICE and CMOS 0.35 µm)
Dealing with Multiple Simultaneous Faults: n-MR. The analog voter with 5 inputs (for 5-MR). Simulations with injection of 2 simultaneous faults also succeeded.
The Analog Voter... Oops! Does this work?
The Analog Voter: Let's see the posters!
Future Work - Short Term ( ): use of signal redundancy with other number representation forms, such as Sigma-Delta; use of the analog voter as an efficient way to implement robust n-MR circuits; investigation of the application of statistical methods and neural networks to the design of fault tolerant circuits with minimum redundancy.
Future Work - Long Term ( ): use of logic properties to develop low-cost signal redundancy; application of the developed techniques to actual processors with DSP and VLIW architectures; discussion of the architectural impact of new technologies together with fault tolerance.
Research Evolution (diagram): previous work ( ): Stochastic Operators, Bit Stream Operators, Analog Voter. Next: Sigma Delta; Logic Properties (low cost redundancy); DSP / VLIW (application to actual DSP and VLIW processors).
Questions? Looking forward to answering them at the poster booth (# 20.4)! Contact: Thank You! No free anything, but a nice chat about these matters will be a pleasure!