Spring 07, Apr 17, 19 ELEC 7770: Advanced VLSI Design (Agrawal): Soft Errors and Fault-Tolerant Design


1 ELEC 7770 Advanced VLSI Design, Spring 2007
Soft Errors and Fault-Tolerant Design
Vishwani D. Agrawal, James J. Danaher Professor
ECE Department, Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07

2 Soft Errors
• Soft errors are errors caused by the operating environment.
• They are not due to a permanent hardware fault.
• Soft errors are intermittent or random, which makes testing for them unreliable.
• One way to deal with soft errors is to make hardware robust:
    • Capable of detecting soft errors
    • Capable of correcting soft errors
• Both measures are probabilistic.

3 Some Early References
• J. von Neumann, “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components,” pp. 329-378, 1959, in A. H. Taub, editor, John von Neumann: Collected Works, Volume V: Design of Computers, Theory of Automata and Numerical Analysis, Oxford University Press, 1963.
• M. A. Breuer, “Testing for Intermittent Faults in Digital Circuits,” IEEE Trans. Computers, vol. C-22, no. 3, pp. 241-246, March 1973.
• T. C. May and M. H. Woods, “Alpha-Particle-Induced Soft Errors in Dynamic Memories,” IEEE Trans. Electron Devices, vol. ED-26, no. 1, pp. 2-9, 1979.

4 Causes of Soft Errors
• Interconnect coupling (crosstalk).
• Power supply noise: IR-drop, delta-I.
• Effects generally attributed to alpha-particles:
    • Charged particles: electrons, protons, ions.
    • Radiation (photons): X-rays, gamma-rays, ultra-violet light.

5 Sources of Alpha-Particles
• Radioactive contamination in VLSI packaging material.
• Ionosphere, magnetosphere and solar radiation.
• Other electromagnetic radiation.

6 Alpha-Particle
• Helium nucleus: two protons and two neutrons, mass = $6.65 \times 10^{-27}$ kg, charge = $+2e$ ($e = 1.6 \times 10^{-19}$ C).
• Rest-mass energy ($mc^2$) = 3.73 GeV
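The 3.73 GeV figure is consistent with the rest-mass energy $E = mc^2$ of the particle. A minimal check, using the mass and elementary charge quoted on the slide; the speed of light is an assumed constant that the slide does not give:

```python
# Rest-mass energy of an alpha particle, E = m * c^2, expressed in GeV.
ALPHA_MASS_KG = 6.65e-27        # mass quoted on the slide
ELEMENTARY_CHARGE_C = 1.6e-19   # e quoted on the slide; 1 eV = e joules
SPEED_OF_LIGHT_M_S = 2.998e8    # assumed value, not given on the slide


def rest_energy_gev(mass_kg: float) -> float:
    """Convert a rest mass in kg to its rest-mass energy in GeV."""
    joules = mass_kg * SPEED_OF_LIGHT_M_S ** 2
    electron_volts = joules / ELEMENTARY_CHARGE_C
    return electron_volts / 1e9


print(f"Alpha rest-mass energy: {rest_energy_gev(ALPHA_MASS_KG):.2f} GeV")
```

With these rounded constants the result lands within about 0.01 GeV of the quoted value.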

7 Soft Error Rate (SER)
• Failures in time (FIT): One FIT is 1 error per billion hours of operation.
• Alternative unit is mean time between failures (MTBF). 1 year MTBF = $10^9/(365 \times 24)$ = 114,155 FIT
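The FIT and MTBF units convert into each other through the same constant. A small sketch of the arithmetic; the function names are illustrative, not from the lecture:

```python
HOURS_PER_YEAR = 365 * 24   # 8,760 hours, as used on the slide
FIT_BASE_HOURS = 1e9        # one FIT = 1 error per 10^9 device-hours


def mtbf_years_to_fit(mtbf_years: float) -> float:
    """Failure rate in FIT for a given MTBF in years."""
    return FIT_BASE_HOURS / (mtbf_years * HOURS_PER_YEAR)


def fit_to_mtbf_years(fit: float) -> float:
    """MTBF in years for a given failure rate in FIT."""
    return FIT_BASE_HOURS / (fit * HOURS_PER_YEAR)


print(round(mtbf_years_to_fit(1.0)))   # 114155
```

The relation is reciprocal: a part rated at 1,000 FIT has an MTBF of roughly 114 years.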

8 Particle Strike
[Figure: an ion or charged particle strikes an n region in a p-substrate; + and − charges are marked along the particle track.]

9 Induced Current
[Figure: induced current versus time.]
$I(t) = I_0 \left( e^{-t/a} - e^{-t/b} \right)$, $a \gg b$

10 Voltage Induced at a Node
$V = Q/C$
where $Q = \int I(t)\,dt$ and $C$ = node capacitance.
Smaller node capacitance will result in larger voltage swing.
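For the double-exponential pulse $I(t) = I_0 (e^{-t/a} - e^{-t/b})$, integrating from 0 to infinity gives $Q = I_0 (a - b)$, so $V = I_0 (a - b)/C$. The sketch below checks that closed form numerically and shows the effect of node capacitance. The values of I0, a, b and C are hypothetical, chosen only for illustration, and the model is linear: it ignores clamping by the supply rails and any restoring current from the driving gate.

```python
import math


def induced_current(t, i0, a, b):
    """I(t) = I0 * (exp(-t/a) - exp(-t/b)), with a >> b."""
    return i0 * (math.exp(-t / a) - math.exp(-t / b))


def collected_charge(i0, a, b):
    """Closed-form integral of I(t) from 0 to infinity: Q = I0 * (a - b)."""
    return i0 * (a - b)


def numeric_charge(i0, a, b, steps=20_000):
    """Trapezoidal check of the closed form, integrating out to t = 40a."""
    dt = 40 * a / steps
    total = 0.0
    for k in range(steps):
        left = induced_current(k * dt, i0, a, b)
        right = induced_current((k + 1) * dt, i0, a, b)
        total += 0.5 * (left + right) * dt
    return total


def voltage_swing(i0, a, b, c_node):
    """V = Q / C for the whole collected charge (no clamping modeled)."""
    return collected_charge(i0, a, b) / c_node


# Hypothetical illustration values: 100 uA peak scale, 200 ps and 50 ps.
I0, A, B = 100e-6, 200e-12, 50e-12
for c_node in (20e-15, 10e-15):
    volts = voltage_swing(I0, A, B, c_node)
    print(f"C = {c_node * 1e15:.0f} fF -> V = {volts:.2f} V")
```

Halving the node capacitance doubles the swing, which is the point made on the slide.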

11 Effect on Digital Circuit
[Figure: clocked circuit with IN, OUT, CK and a combinational logic block; charged particles strike it at two locations.]

12 An SRAM Cell
[Figure: SRAM cell schematic with VDD, word line (WL), bit lines (labeled BL and bit), and internal nodes storing 0 and 1.]

13 SRAM Cell Struck by Alpha-Particle: Single-Event Upset (SEU)
[Figure: the same SRAM cell (VDD, WL, BL, bit) hit by charged particles; the stored values flip, 0→1 and 1→0.]

14 D-Latch
[Figure: D-latch with input D, CK = 0, output Q, and internal nodes holding 1 and 0.]

15 SEU in D-Latch
[Figure: the same D-latch (D, CK = 0, Q) hit by charged particles; the internal nodes flip, 1→0 and 0→1.]

16 Single Event Transients in Combinational Logic
[Figure: combinational circuit clocked by CK, with logic values marked on its nodes, struck by charged particles.]

17 Effects of Transients
• Error correcting effects
    • Transient pulse is filtered by gate inertia
    • Transient is blocked by an unsensitized path
    • Transient is blocked by an inactive clock
• Error enhancing effects
    • Large number of gates can produce multiple pulses
    • Fanouts can multiply error pulses

18 SEUs in FPGA
• Parts that can be affected
    • Look-up table (LUT)
    • Configuration memory cell
    • Flip-flop
    • Block RAM

19 LUT
[Figure: look-up table with inputs F1, F2, F3, F4 and output out.]
Memory cells: 1 0 1 1 0 1 1 0 0 0 0 0 1 1 1 0

20 SEU in LUT
[Figure: the same look-up table (inputs F1, F2, F3, F4, output out) with one memory cell hit by a charged particle.]
Memory cells: 1 0 1 0 0 1 1 0 0 0 0 0 1 1 1 0
Charged particle: 1 changed to 0
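The upset can be modeled as a single bit flip in the 16-entry table. The sketch below uses the memory-cell contents listed on the two LUT slides. The addressing order (F1 as the most significant address bit, cells listed from address 0 upward) is an assumption, since the transcript does not preserve the figure's layout.

```python
# Memory-cell contents as listed on the LUT slide (before the upset).
LUT_BITS = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0]


def lut_read(bits, f1, f2, f3, f4):
    """Read the LUT output. Assumed addressing: F1 is the MSB, F4 the LSB."""
    address = (f1 << 3) | (f2 << 2) | (f3 << 1) | f4
    return bits[address]


def inject_seu(bits, index):
    """Return a copy of the LUT contents with one memory cell flipped."""
    upset = list(bits)
    upset[index] ^= 1
    return upset


# Flip the fourth listed cell (index 3), which changes from 1 to 0.
upset_bits = inject_seu(LUT_BITS, 3)

# Every input combination whose output differs after the upset.
changed = [addr for addr in range(16) if LUT_BITS[addr] != upset_bits[addr]]
print(changed)   # [3]
```

Only one input combination produces a wrong output, but the error persists until the cell is rewritten, which is why reconfiguration appears among the FPGA-specific mitigation methods.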

21 Four Types of SEU in FPGA
[Figure: FPGA logic block showing a LUT with inputs F1, F2, F3, F4, a flip-flop (FF), configuration memory cells (M) and a block RAM, with upset locations labeled Type 1, Type 2, Type 3 and Type 4.]

22 SEU Detection Methods
• Hardware redundancy
• Time redundancy
• Error detection codes (EDC)
• Self-checker techniques

23 SEU Mitigation Techniques
• Triple modular redundancy (TMR)
• Multiple redundancy with voting
• Error detection and correction codes (EDAC)
• Hardened memory cells
• FPGA-specific methods
    • Reconfiguration
    • Partial configuration
    • Rerouting design

24 Hardware Redundancy for Detection
[Figure: the inputs drive a combinational logic block and a duplicated copy; the two outputs are compared, and logic 1 indicates error.]
• Hardware overhead is high, ~100%.
• Performance penalty is negligible.
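Duplication with comparison can be sketched as two evaluations of the same function followed by an XOR. The `logic` function below is a hypothetical stand-in for the combinational block, and the fault is injected by inverting the second copy's output.

```python
def logic(a, b, c):
    """Hypothetical combinational block, used only for illustration."""
    return (a & b) ^ c


def duplicated_eval(inputs, fault_in_copy=False):
    """Evaluate the block and its duplicate; return (output, error flag)."""
    out_original = logic(*inputs)
    out_duplicate = logic(*inputs)
    if fault_in_copy:
        out_duplicate ^= 1          # model a soft error in the duplicate
    error = out_original ^ out_duplicate   # logic 1 indicates error
    return out_original, error


print(duplicated_eval((1, 1, 0)))                      # (1, 0)
print(duplicated_eval((1, 1, 0), fault_in_copy=True))  # (1, 1)
```

The comparison detects a mismatch but cannot tell which copy is wrong, which is why correction needs a third copy and a voter.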

25 Time Redundancy for Detection
[Figure: the combinational logic output is captured by a D flip-flop (D, Q) using clocks CK and CK + d; the two samples are compared, and logic 1 indicates error.]
• Hardware overhead is low.
• Performance penalty (~ d) = maximum detectable pulse width.
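The role of the delay d can be illustrated with a simple timing model: the output is sampled at CK and again at CK + d, and the two samples are XORed. The model below is a sketch under stated assumptions: the transient is an ideal rectangular pulse that inverts the output, and all times are in arbitrary units.

```python
def output_at(t, correct_value, pulse_start, pulse_width):
    """Output value at time t; an ideal transient inverts it while active."""
    in_pulse = pulse_start <= t < pulse_start + pulse_width
    return correct_value ^ 1 if in_pulse else correct_value


def time_redundant_check(correct_value, ck_time, d, pulse_start, pulse_width):
    """Sample at CK and CK + d; return 1 if the two samples disagree."""
    sample_ck = output_at(ck_time, correct_value, pulse_start, pulse_width)
    sample_delayed = output_at(ck_time + d, correct_value,
                               pulse_start, pulse_width)
    return sample_ck ^ sample_delayed


# Pulse narrower than d corrupts only the first sample: detected.
print(time_redundant_check(1, ck_time=10.0, d=2.0,
                           pulse_start=9.5, pulse_width=1.0))   # 1
# Pulse wider than d corrupts both samples: not detected.
print(time_redundant_check(1, ck_time=10.0, d=2.0,
                           pulse_start=9.5, pulse_width=3.0))   # 0
```

A pulse that spans both sampling instants escapes detection, so d bounds the detectable pulse width while also being added to the cycle time.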

26 Repeat on Error Detection
[Figure: the combinational logic output is sampled using clocks CK and CK + d (D flip-flop, D, Q) and passed through a C-element (C) to the output; logic 1 indicates error.]
Operation: If an error is detected, the output retains its previous value. Repeating the computation can produce the correct result.

27 Muller C-Element
[Figure: C-element symbol (C) with inputs A, B and an output, and an implementation using an SR latch (S, R, Q) driven from A and B.]

A  B  output
0  0  0
0  1  Old output
1  0  Old output
1  1  1
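The truth table translates directly into a small state-holding model: the output follows the inputs when they agree and keeps its old value when they differ. A minimal behavioral sketch (the class name and the initial output value are illustrative choices):

```python
class CElement:
    """Behavioral model of a Muller C-element."""

    def __init__(self, initial_output=0):
        self.output = initial_output

    def update(self, a, b):
        """Apply inputs A and B and return the resulting output."""
        if a == b:
            self.output = a   # inputs agree: output follows them
        return self.output    # inputs differ: old output is retained


c = CElement(initial_output=0)
print(c.update(1, 1))   # 1
print(c.update(0, 1))   # 1 (inputs disagree, old output kept)
print(c.update(0, 0))   # 0
```

Feeding it the output sampled at CK and at CK + d means a transient seen in only one sample cannot change the output.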

28 Triple Modular Redundancy (TMR)
[Figure: the inputs drive three copies of the combinational logic (copy 1, copy 2, copy 3); a majority voter combines their outputs into the final output.]
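A behavioral sketch of TMR: three copies of the same function feed a majority voter, and a soft error is modeled by inverting the output of a chosen copy. The `logic` function is a hypothetical stand-in for the combinational block.

```python
def majority(a, b, c):
    """Majority of three bits."""
    return (a & b) | (b & c) | (a & c)


def logic(x, y):
    """Hypothetical combinational block, used only for illustration."""
    return x ^ y


def tmr_eval(inputs, faulty_copies=()):
    """Evaluate three copies, invert the faulty ones, and vote."""
    outputs = []
    for copy_index in range(3):
        out = logic(*inputs)
        if copy_index in faulty_copies:
            out ^= 1              # model a soft error in this copy
        outputs.append(out)
    return majority(*outputs)


print(tmr_eval((1, 0)))                        # 1, fault-free
print(tmr_eval((1, 0), faulty_copies=(1,)))    # 1, single error masked
print(tmr_eval((1, 0), faulty_copies=(0, 2)))  # 0, double error not masked
```

The voter masks any single faulty copy; two simultaneous errors outvote the correct copy, and the voter itself remains a single point of failure in this form.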

29 Majority Voter Circuit
[Figure: majority voter block with inputs A, B, C and an output, and its gate-level circuit.]

A  B  C  output
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  1
1  0  0  0
1  0  1  1
1  1  0  1
1  1  1  1
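The table is the function AB + BC + AC. A short sketch that regenerates it and reads off the output column as a bit string, which is the form a look-up table implementation would store (the variable names are illustrative):

```python
from itertools import product


def majority(a, b, c):
    """Majority voter: output = AB + BC + AC."""
    return (a & b) | (b & c) | (a & c)


# Rows in the same order as the truth table: ABC = 000, 001, ..., 111.
truth_table = [(a, b, c, majority(a, b, c))
               for a, b, c in product((0, 1), repeat=3)]

for a, b, c, out in truth_table:
    print(a, b, c, out)

# Output column read top to bottom.
lut_contents = "".join(str(row[3]) for row in truth_table)
print(lut_contents)   # 00010111
```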

30 Alternative Implementations of Voter
[Figure: two voter implementations with inputs A, B, C: a LUT with contents 0001011100010111 driving the output, and a second circuit connected to VDD.]

31 Triple Modular Redundancy (TMR)
[Figure: time-redundant TMR. A single combinational logic block is driven by the inputs; D flip-flops (D, Q) and a majority voter produce the output, with clock labels CK, CK + d, CK + 2d and CK + 3d.]

32 TMR for Memory Cells
[Figure: combinational logic with inputs and output, D flip-flops (D, Q) clocked by CK, and a majority voter.]
Problems:
1. Accumulation of errors in flip-flops.
2. Voter is not protected.

33 FF Refresh and TMR for Memory Cells
[Figure: D flip-flops (D, Q) clocked by CK, four majority voters, signals r1, r2, r3, and the output.]

34 A Resistor Hardened SRAM Cell
[Figure: SRAM cell schematic with VDD, word line (WL), bit lines (labeled BL and bit), and internal nodes storing 0 and 1.]

35 References
• F. L. Kastensmidt, L. Carro and R. Reis, Fault-Tolerant Techniques for SRAM-Based FPGAs, Springer, 2006.
• S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, February 2005.

36 Summary of Topics Covered (1)
• Nanotechnology devices
• Moore’s law
• System level design for testability and test scheduling problem
• Verification
• Logic equivalence
• Binary decision diagrams
• Power consumption and low-power concepts
• Multi-core parallelism
• Microprocessors
• Memories

37 Summary of Topics Covered (2)
• Timing
• Timing verification
• Timing simulation
• Static timing analysis
• Timing optimization
• Linear programming and clock constraints
• Clock skew problem
• Zero skew design
• Retiming, constraint graph and performance optimization
• Soft errors and fault-tolerant design

