Programa de Pós-Graduação em Computação Instituto de Informática Universidade Federal do Rio Grande do Sul Porto Alegre – RS – Brazil Semana Acadêmica.


1 Dealing with Multiple Simultaneous Faults in Future Technologies. PhD candidate: Carlos Arthur Lang Lisbôa. Advisor: Luigi Carro. Programa de Pós-Graduação em Computação (PPGC), Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil. Semana Acadêmica PPGC/UFRGS, 17/10/2006.

2 Carlos A. L. Lisbôa, Semana Acadêmica PPGC/UFRGS, 17/10/2006. Why Multiple Simultaneous Faults? Future technologies (2010 and beyond): very small transistors, with fewer electrons to form the channel (SETs); transient pulses due to radiation attack will last longer than the propagation delays of gates and cycle times; devices will be more sensitive to the effects of electromagnetic noise, neutrons and alpha particles.

3 Single Event Upset Origin (figure: bit values affected by an upset).

4 Why Should One Study Multiple Faults? Changes in paradigm: gates will behave statistically, producing correct outputs only a fraction of the time; devices will be faster, with cycle times shorter than the duration of transient pulses.

5 How to Deal with Multiple Faults? New paradigm: multiple simultaneous faults. New fault tolerance techniques will be required (TMR will no longer provide enough protection).

6 How to Deal with Multiple Faults? New paradigm: multiple simultaneous faults. New fault tolerance techniques will be required (TMR will no longer provide enough protection). How to deal with this problem? New materials and manufacturing technologies must be developed OR new design approaches must be taken.

7 How to Deal with Multiple Faults? New paradigm: multiple simultaneous faults. New fault tolerance techniques will be required (TMR will no longer provide enough protection). How to deal with this problem? New design approaches must be taken (our bet!).

8 Research Evolution - Overview (timeline, 2004 to 2007)
2004 / 2005: Stochastic Operators (IOLTS 04); Bit Stream Operators (DFT 04, WDES 04); TMR and Analog Voter (ETS 05, SBCCI 05); Research Report SRC 2005 TechCon
2006 / 2007: Research Report DATE 06 PhD Forum; MemProc (LATW 06, ETS 06); Majority Logic (DFT 06); Online Hardening (DFT 06); Low cost redundancy and Statistical Computation (VTS 07, submitted)

9 Published Papers
Lisbôa, C. and Carro, L., Arithmetic Operators Robust to Multiple Simultaneous Upsets, 10th IEEE International Online Test Symposium - IOLTS 2004, IEEE Computer Society, Funchal, Madeira Island, Portugal, July 2004.
Lisbôa, C. and Carro, L., Highly Reliable Arithmetic Multipliers for Future Technologies, in Proceedings of the International Workshop on Dependable Embedded Systems - WDES 2004 - in conjunction with the 23rd International Symposium on Reliable Distributed Systems - SRDS 2004, pp. 13-18. Edited by Becker, L. B. and Kaiser, J., Florianópolis, October 17, 2004.
Lisbôa, C. and Carro, L., Arithmetic Operators Robust to Multiple Simultaneous Upsets, in Proceedings of the 19th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems - DFT 2004, pp. 289-297, ISBN 0-7695-2241-6. IEEE Computer Society, New York, October 2004.

10 Published Papers
Lisbôa, C. A. L., Carro, L. and Cota, E., RobOps - Arithmetic Operators for Future Technologies, 10th European Test Symposium - ETS 2005, Tallinn, Estonia, May 2005.
Lisbôa, C. A. L., Schüler, E. and Carro, L., Going Beyond TMR for Protection Against Multiple Faults, in Proceedings of the 18th Symposium on Integrated Circuits and Systems Design - SBCCI 2005, September 2005.
Rhod, E., Lisbôa, C. A. L. and Carro, L., Using Memory to Cope with Simultaneous Transient Faults, in Proceedings of the 7th Latin-American Test Workshop - LATW 2006, pp. 151-156, IEEE Computer Society, New York, March 2006.

11 Published Papers
Rhod, E., Lisbôa, C. A. L., Michels, Á. and Carro, L., Fault Tolerance Against Multiple SEUs using Memory-Based Circuits to Improve the Architectural Vulnerability Factor, in Informal Digest of Papers of the 11th IEEE European Test Symposium - ETS 2006, pp. 229-234, IEEE Computer Society, New York, May 2006.
Michels, Á., Petroli, L., Lisbôa, C. A. L., Kastensmidt, F. and Carro, L., SET Fault Tolerant Combinational Circuits Based on Majority Logic, in Proceedings of the 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems - DFT 2006, pp. 345-352, IEEE Computer Society, Los Alamitos, CA, October 2006.
Lisbôa, C. A. L., Carro, L., Sonza Reorda, M. and Violante, M., Online Hardening of Programs against SEUs and SETs, in Proceedings of the 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems - DFT 2006, pp. 280-288, IEEE Computer Society, Los Alamitos, CA, October 2006.

12 Research Approaches - 2004 / 2005: use of stochastic operators; use of bit stream operators; ensuring voter reliability in order to use n-MR while dealing with multiple simultaneous faults.

13 Research Evolution - 2004 / 2005: Stochastic Operators (IOLTS 2004).

14 Research Evolution - 2004 / 2005: Stochastic Operators (IOLTS 2004), OK for some DSP applications.

15 Research Evolution - 2004 / 2005: Stochastic Operators; looking for more speed: Bit Stream Operators (DFT 2004, WDES 2004).

16 Research Evolution - 2004 / 2005: Stochastic Operators; looking for more speed: Bit Stream Operators (DFT 2004, WDES 2004), small footprint and fast.

17 Research Evolution - 2004 / 2005: Stochastic Operators; looking for more speed: Bit Stream Operators; looking for tolerant converter: Analog Voter (ETS 2005, SBCCI 2005).

18 Research Evolution - 2004 / 2005: Stochastic Operators; looking for more speed: Bit Stream Operators; looking for tolerant converter: TMR and Analog Voter (ETS 2005, SBCCI 2005), tolerant to multiple faults in n-MR solutions.

19 Research Evolution - 2004 / 2005: Stochastic Operators; looking for more speed: Bit Stream Operators; looking for tolerant converter: TMR and Analog Voter; Research Report SRC 2005 TechCon.

20 Research approach - 2006 / 2007: cooperation with peers; use of memory for computation; analog voter + majority logic; use of an I-IP to harden instructions.

21 Research approach - 2006 / 2007: cooperation with peers; use of memory for computation; analog voter + majority logic; use of an I-IP to harden instructions; low cost redundancy using statistical parallel computation.

22 Research Evolution - 2006 / 2007: Research Report DATE 06 PhD Forum.

23 Research Evolution - 2006 / 2007: Research Report DATE 06 PhD Forum; MemProc (LATW 06, ETS 06).

24 Research Evolution - 2006 / 2007: Research Report DATE 06 PhD Forum; MemProc (LATW 06, ETS 06); Majority Logic (DFT 06).

25 Research Evolution - 2006 / 2007: Research Report DATE 06 PhD Forum; MemProc (LATW 06, ETS 06); Majority Logic (DFT 06); Low cost redundancy.

26 Research Evolution - 2006 / 2007: Research Report DATE 06 PhD Forum; MemProc (LATW 06, ETS 06); Majority Logic (DFT 06); Online Hardening (DFT 06); Low cost redundancy.

27 Research Evolution - 2006 / 2007: Research Report DATE 06 PhD Forum; MemProc (LATW 06, ETS 06); Majority Logic (DFT 06); Online Hardening (DFT 06); Low cost redundancy and Statistical Computation (VTS 07, submitted).

28 Current research - motivation: future technologies bring faster devices; transient pulse duration does not scale in proportion to speed; transient pulses will last longer than one cycle.

29 Current research - motivation: future technologies bring faster devices; transient pulse duration does not scale in proportion to speed; transient pulses will last longer than one cycle; techniques relying on time redundancy will fail.

30 Current research - motivation: alternative approach: space redundancy. Current solutions: area overhead 100%. Small granularity does not provide low overhead (what can one do with 50% of a MOSFET?).

31 Current research - motivation: alternative approach: space redundancy. Current solutions: area overhead 100%. Small granularity does not provide low overhead (what can one do with 50% of a MOSFET?). Proposed solution: fingerprinting; parallel processing on a subset of the possible inputs; small transient fault probability (desired: 0%).

32 Current research - focus: use of low cost redundancy and statistical computation to cope with transient faults. Diagram labels: inputs, main circuit, random checker, output, error.

33 Sample application. Freivalds: matrix multiplication correctness. Given matrices A and B, both n x n, and given one algorithm that calculates C = A x B; goal: check whether the algorithm performs correctly by executing thousands of multiplications and comparing the results. Naive solution: calculate again and compare, O(n^3).

34 Sample application: Freivalds technique
1. generate a random vector r, with values from {0,1}
2. compute vector Cr = C x r, O(n^2)
3. compute vector ABr = A x (B x r), O(n^2)
4. if C ≠ A x B, then Pr[ABr = Cr] ≤ 1/2
After k independent repetitions of steps 1, 2 and 3: Pr[ABr = Cr] ≤ 1/2^k
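The steps of the Freivalds technique can be sketched in a few lines of Python. This is only an illustrative software model, not the checker hardware discussed in the talk; the names `mat_vec` and `freivalds_check`, the default `k`, and the 2 x 2 example matrices are assumptions made for the example.

```python
import random

def mat_vec(M, v):
    """Multiply an n x n matrix (list of rows) by a vector in O(n^2)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def freivalds_check(A, B, C, k=20, rng=None):
    """Return False as soon as one round exposes C != A x B, True otherwise."""
    rng = rng or random.Random()
    n = len(A)
    for _ in range(k):
        r = [rng.randint(0, 1) for _ in range(n)]  # step 1: random 0/1 vector
        Cr = mat_vec(C, r)                         # step 2: C x r
        ABr = mat_vec(A, mat_vec(B, r))            # step 3: A x (B x r)
        if ABr != Cr:                              # a mismatch proves C is wrong
            return False
    # Either C is correct, or an error escaped with probability <= 1/2**k.
    return True

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]   # the true product A x B
C_bad = [[19, 22], [43, 51]]    # same matrix with one corrupted element

print(freivalds_check(A, B, C_good))
print(freivalds_check(A, B, C_bad))
```

Each round costs three matrix-vector products, O(n^2), instead of the O(n^3) recomputation of the naive check.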

35 Sample application: our extension of Freivalds technique
1. generate a random vector r, with values from {0,1}
2. generate a vector rc with rc_i = not(r_i) for i = 1:n
3. compute Cr = C x r and Crc = C x rc
4. compute ABr = A x (B x r) and ABrc = A x (B x rc)
5. if ABr ≠ Cr OR ABrc ≠ Crc, then Pr[ABr ≠ Cr] = 1
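A small Python sketch may help show why checking both r and its complement rc is attractive: every column index is selected by exactly one of the two vectors, so a single corrupted element of C is caught whatever r is drawn. The function names and test matrices below are assumptions for illustration, and the exhaustive loop over all r is only feasible because n = 2 here.

```python
from itertools import product

def mat_vec(M, v):
    """Multiply an n x n matrix (list of rows) by a vector in O(n^2)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def complement_check(A, B, C, r):
    """Return True if C passes the check for both r and its complement rc."""
    rc = [1 - x for x in r]                        # step 2: rc_i = not(r_i)
    ok_r = mat_vec(A, mat_vec(B, r)) == mat_vec(C, r)
    ok_rc = mat_vec(A, mat_vec(B, rc)) == mat_vec(C, rc)
    return ok_r and ok_rc

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]   # the true product A x B
C_bad = [[19, 22], [43, 51]]    # one corrupted element

# Try every possible r in {0,1}^2: the single error is detected each time.
for r in product([0, 1], repeat=2):
    print(r, complement_check(A, B, C_good, list(r)), complement_check(A, B, C_bad, list(r)))
```

Note that this guarantee covers a single wrong element. Two errors in the same row with opposite signs (for example +1 and -1) can still cancel for some choices of r, so the sketch should not be read as a proof for arbitrary multiple errors.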

36 Sample Implementation: matrix multiplier with checker (application of Freivalds technique). Diagram labels: inputs (A, B); C = A * B; output (C); Cr = C * r; ABr = A*(B*r); error.

37 Sample Implementation: area overhead (# of gates).

38 Sample implementation: time overhead (# of instructions).

39 Sample implementation: fault injection results.

40 PhD program requirements: 36 credits; qualifying examination; proficiency exam in 2 foreign languages; academic week seminar; thesis proposal, February 2007; thesis presentation, December 2007.

41 Questions?

42 Using Stochastic Operators. SEU-induced transient errors are of random nature. Stochastic operators rely on randomness to produce approximate results. The injection of random faults in the input signals processed by stochastic operators did not impact the precision of the results.
% errors in 1,000 additions:
                   0 faults   2 faults   4 faults   8 faults
Stochastic Adder   0.1412     0.2580     0.1768     0.2196
Conventional       0.0000

43 Using Stochastic Operators. SEU-induced transient errors are of random nature. Stochastic operators rely on randomness to produce approximate results. The injection of random faults in the input signals processed by stochastic operators did not impact the precision of the results. Several application areas (DSP) can deal with approximate values and still produce acceptable results (outputs).

44 Using Stochastic Operators. Benefit: reduced area of the operators. Stochastic multiplier circuit (figure), with bit streams 1000100110011010, 1001000100001011 and 1000000100001010. Stochastic adder circuit (figure), with signals S1, S2, S3 and Sum and bit streams 01100010101, 010111011001, 01010101101 and 0010100110101.
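In the multiplier streams shown on this slide, the third stream is the bitwise AND of the first two, which matches the usual stochastic multiplier built from a single AND gate. The Python sketch below reproduces that, and adds a multiplexer-based scaled adder. The adder model is an assumption taken from the general stochastic computing literature, since the adder figure itself is not recoverable from the transcript; all function names are hypothetical.

```python
import random

def stream_value(bits):
    """Value encoded by a stochastic stream: the fraction of 1s."""
    return bits.count("1") / len(bits)

def stochastic_multiply(s1, s2):
    """Bitwise AND of two equal-length streams (one AND gate in hardware)."""
    return "".join("1" if a == "1" and b == "1" else "0" for a, b in zip(s1, s2))

def stochastic_add(s1, s2, select):
    """Assumed mux adder: take the s1 bit where select is '1', else the s2 bit.
    With a select stream of density 0.5 the output encodes about (v1 + v2) / 2."""
    return "".join(a if s == "1" else b for a, b, s in zip(s1, s2, select))

def random_stream(p, n, rng):
    """Random stream of n bits in which each bit is 1 with probability p."""
    return "".join("1" if rng.random() < p else "0" for _ in range(n))

# Streams taken from the multiplier figure on the slide.
print(stochastic_multiply("1000100110011010", "1001000100001011"))

# Longer random streams give better approximations of the exact results.
rng = random.Random(1)
a = random_stream(0.25, 20000, rng)
b = random_stream(0.75, 20000, rng)
sel = random_stream(0.5, 20000, rng)
print(stream_value(stochastic_multiply(a, b)))   # close to 0.25 * 0.75
print(stream_value(stochastic_add(a, b, sel)))   # close to (0.25 + 0.75) / 2
```

Because the value is carried by the density of 1s rather than by bit positions, a few flipped bits shift the result only slightly, which is consistent with the fault injection observations reported for the stochastic adder.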

45 Using Bit Stream Operators. Computation principles similar to those of the stochastic adder and multiplier. Operators can produce bit streams which represent the exact results of the operation. Proposed multiplication algorithm: bit stream product (the count of 1s in the stream is equal to the product value). For factors F1 (bits 2, 1, 0) and F2 (bits 2, 1, 0), the partial products are F2_0.F1_2, F2_0.F1_1, F2_0.F1_0, F2_1.F1_2, F2_1.F1_1, F2_1.F1_0, F2_2.F1_2, F2_2.F1_1, F2_2.F1_0, and the product stream is divided into the segments b48..b33, b32..b17, b16..b5, b4..b1, b0.
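One way to read the bit stream product is that each partial product F1_i AND F2_j is copied 2^(i+j) times into the stream, so that the number of 1s equals F1 x F2. For 3-bit factors this gives 16 + 16 + 12 + 4 + 1 = 49 bits, which agrees with the segment sizes b48..b33, b32..b17, b16..b5, b4..b1 and b0. The sketch below follows that reading; the replication rule, the ordering inside each segment, and the function name are assumptions, not details confirmed by the slide.

```python
def bit_stream_product(f1, f2, width=3):
    """Return a list of bits whose count of 1s equals f1 * f2.

    Assumed layout: segments ordered from the highest weight 2**(i+j) down to
    weight 1, each partial product bit repeated 2**(i+j) times.
    """
    bits = []
    for weight in range(2 * (width - 1), -1, -1):
        for i in range(width - 1, -1, -1):
            j = weight - i
            if 0 <= j < width:
                partial = (f1 >> i) & (f2 >> j) & 1   # F1_i AND F2_j
                bits.extend([partial] * (1 << weight))
    return bits

stream = bit_stream_product(5, 6)
print(len(stream), sum(stream))   # stream length and count of 1s
```

Unlike the stochastic multiplier, the count of 1s here is exact rather than approximate, at the price of a stream whose length grows with the square of the largest operand.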

46 Using Bit Stream Operators. Computation principles similar to those of the stochastic adder and multiplier. Operators can produce bit streams which represent the exact results of the operation. Redundancy is added to the bit streams in order to withstand multiple bit flips. Adding robustness to the bit stream through redundancy: each bit b48, b47, ..., b0 is repeated 8 times, followed by extra bits 1 1 1 1 0 0 0 (+4), so that total count of 1s = 8 * product + 4.

47 Using Bit Stream Operators. Computation principles similar to those of the stochastic adder and multiplier. Operators can produce bit streams which represent the exact results of the operation. Redundancy is added to the bit streams in order to withstand multiple bit flips. Conversion of bit streams to binary coded values is delayed as much as possible, and conversion circuits must use TMR or n-MR for protection against faults.

48 Using Bit Stream Operators. Computation principles similar to those of the stochastic adder and multiplier. Operators can produce bit streams which represent the exact results of the operation. Redundancy is added to the bit streams in order to withstand multiple bit flips. Conversion of bit streams to binary coded values is delayed as much as possible, and conversion circuits must use TMR or n-MR for protection against faults. Issues to be further investigated: size of the bit streams and area of the conversion circuits.
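The redundancy rule "total count of 1s = 8 * product + 4" suggests a simple decoder: divide the count of 1s by 8 and discard the remainder. Under that assumption, the +4 offset centres the count inside its interval, so a net change of up to 4 lost ones or 3 spurious ones still decodes to the same product. The sketch below models this; the decoder, the function names, and the way the 4 extra ones are appended are assumptions made for illustration.

```python
import random

def add_redundancy(stream, copies=8, offset=4):
    """Repeat every bit `copies` times, then append `offset` extra 1s."""
    out = []
    for bit in stream:
        out.extend([bit] * copies)
    out.extend([1] * offset)
    return out

def decode(redundant, copies=8):
    """Assumed decoder: integer division of the count of 1s by `copies`."""
    return sum(redundant) // copies

product_value = 30
stream = [1] * product_value + [0] * (49 - product_value)  # count of 1s = product
protected = add_redundancy(stream)

# Inject 3 bit flips at random positions; the decoded value is unchanged.
rng = random.Random(0)
for pos in rng.sample(range(len(protected)), 3):
    protected[pos] ^= 1

print(decode(protected))
```

The protected stream is 8 times longer than the original, which relates to the open issue noted on the slide about the size of the bit streams.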

49 What is Wrong with TMR? TMR protects only against single faults in one of the modules. Diagram: Module 1, Module 2 and Module 3 (correct output) feed the VOTER (correct output).

50 What is Wrong with TMR? TMR protects only against single faults in one of the modules. Diagram: Module 1, Module 2 (wrong output) and Module 3 feed the VOTER (correct output).

51 What is Wrong with TMR? TMR does not protect against double faults in different modules. Diagram: Module 1, Module 2 (correct output) and Module 3 feed the VOTER (wrong output).

52 What is Wrong with TMR? When a single fault occurs in the voter circuit, the voter output may be wrong. Diagram: Module 1, Module 2 and Module 3 (correct output) feed the VOTER (correct output).

53 What is Wrong with TMR? When a single fault occurs in the voter circuit, the voter output may be wrong. Diagram: Module 1, Module 2 and Module 3 (correct output) feed the VOTER (correct output?).
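The three TMR cases above (single module fault, double module fault, voter fault) can be reproduced with a small bit-level model. This is a minimal sketch for illustration: the function names are hypothetical, and a voter fault is modelled as the worst case in which the voter output is inverted, whereas the slide only says the output may be wrong.

```python
def majority(a, b, c):
    """Digital majority voter for three 1-bit inputs."""
    return (a & b) | (a & c) | (b & c)

def tmr_output(correct_bit, faulty_modules=(), voter_fault=False):
    """Output of a TMR system with the given faulty modules (indices 0..2)."""
    outputs = [correct_bit ^ 1 if m in faulty_modules else correct_bit
               for m in range(3)]
    voted = majority(*outputs)
    # Assumed worst case: a faulty voter inverts its output.
    return voted ^ 1 if voter_fault else voted

print(tmr_output(1))                              # no fault
print(tmr_output(1, faulty_modules=(1,)))         # single fault: masked
print(tmr_output(1, faulty_modules=(0, 1)))       # double fault: not masked
print(tmr_output(1, voter_fault=True))            # voter fault: not masked
```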

54 Making TMR (n-MR) more reliable. Known solutions imply area, performance and/or power penalties. Deadlock: how to protect the output generator?

55 Making TMR (n-MR) more reliable. Known solutions imply area, performance and/or power penalties. Deadlock: how to protect the output generator? Proposed solution: use TMR to cope with single faults in the modules.

56 Making TMR (n-MR) more reliable. Known solutions imply area, performance and/or power penalties. Deadlock: how to protect the output generator? Proposed solution: use TMR to cope with single faults in the modules; replace the digital voter by an analog voter that uses a comparator to generate the output.

57 Making TMR (n-MR) more reliable. Known solutions imply area, performance and/or power penalties. Deadlock: how to protect the output generator? Proposed solution: use TMR to cope with single faults in the modules; replace the digital voter by an analog voter that uses a comparator to generate the output; the analog voter can tolerate some noise and still produce the correct result.

58 The Analog Voter (figure).

59 Injection of faults in the comparator (*). Minimum Area Comparator. (*) using CMOS 0.35 µm.

60 Electrical Simulation: Multiple Faults (SPICE and CMOS 0.35 µm).

61 Dealing with Multiple Simultaneous Faults: n-MR. The Analog Voter with 5 Inputs (for 5-MR).

62 Dealing with Multiple Simultaneous Faults: n-MR. The Analog Voter with 5 Inputs (for 5-MR). Simulations with injection of 2 simultaneous faults also succeeded.
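The voter circuit itself is not recoverable from the transcript, so the following is only an idealized behavioural model of an analog voter, under the assumption that the module outputs are averaged and the average is compared with half the supply voltage. The supply value of 3.3 V, the threshold, and the function name are all assumptions; the model ignores comparator faults and every electrical effect that the SPICE simulations mentioned on the slides would capture.

```python
VDD = 3.3  # assumed supply voltage in volts

def analog_vote(voltages, vdd=VDD):
    """Idealized analog voter: average the inputs, compare with vdd / 2."""
    average = sum(voltages) / len(voltages)
    return 1 if average > vdd / 2 else 0

# 5-MR, correct value 1, two modules stuck at the wrong level.
print(analog_vote([3.3, 3.3, 3.3, 0.0, 0.0]))

# Same case with some noise on every input.
print(analog_vote([3.0, 3.6, 3.1, 0.3, 0.2]))

# 5-MR, correct value 0, two faulty modules driving a high level.
print(analog_vote([0.0, 0.0, 0.0, 3.3, 3.3]))
```

In this simplified view, noise on the inputs only matters when it pushes the average across the threshold, which is one possible reading of the claim that the analog voter can tolerate some noise.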

63 The Analog Voter... Oops! Does this work???

