1
Spring 07, Apr 17, 19: ELEC 7770: Advanced VLSI Design (Agrawal)
ELEC 7770 Advanced VLSI Design, Spring 2007
Soft Errors and Fault-Tolerant Design
Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07
2
Soft Errors
- Soft errors are errors caused by the operating environment. They are not due to a permanent hardware fault.
- Soft errors are intermittent or random, which makes testing for them unreliable.
- One way to deal with soft errors is to make the hardware robust:
  - Capable of detecting soft errors
  - Capable of correcting soft errors
- Both measures are probabilistic.
3
Some Early References
- J. von Neumann, “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components,” pp. 329-378, 1959, in A. H. Taub, editor, John von Neumann: Collected Works, Volume V: Design of Computers, Theory of Automata and Numerical Analysis, Oxford University Press, 1963.
- M. A. Breuer, “Testing for Intermittent Faults in Digital Circuits,” IEEE Trans. Computers, vol. C-22, no. 3, pp. 241-246, March 1973.
- T. C. May and M. H. Woods, “Alpha-Particle-Induced Soft Errors in Dynamic Memories,” IEEE Trans. Electron Devices, vol. ED-26, no. 1, pp. 2-9, 1979.
4
Causes of Soft Errors
- Interconnect coupling (crosstalk).
- Power supply noise: IR-drop, delta-I.
- Effects generally attributed to alpha-particles:
  - Charged particles: electrons, protons, ions.
  - Radiation (photons): X-rays, gamma-rays, ultra-violet light.
5
Sources of Alpha-Particles
- Radioactive contamination in VLSI packaging material.
- Ionosphere, magnetosphere and solar radiation.
- Other electromagnetic radiation.
6
Alpha-Particle
- Helium nucleus: two protons and two neutrons, mass = $6.65 \times 10^{-27}$ kg, charge = $+2e$ ($e = 1.6 \times 10^{-19}$ C).
- Energy = 3.73 GeV
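The 3.73 GeV figure is consistent with the rest-mass energy of the particle, as the short check below suggests. The speed of light and the more precise value of $e$ are constants added here; they do not appear on the slide.

```latex
\[
E = mc^2 \approx (6.65 \times 10^{-27}\,\mathrm{kg}) \times (2.998 \times 10^{8}\,\mathrm{m/s})^2
  \approx 5.98 \times 10^{-10}\,\mathrm{J}
\]
\[
E \approx \frac{5.98 \times 10^{-10}\,\mathrm{J}}{1.602 \times 10^{-19}\,\mathrm{J/eV}}
  \approx 3.73 \times 10^{9}\,\mathrm{eV} = 3.73\,\mathrm{GeV}
\]
```

The kinetic energy of an alpha particle emitted by radioactive decay is far smaller than this, typically a few MeV.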
7
Soft Error Rate (SER)
- Failures in time (FIT): One FIT is 1 error per billion hours of operation.
- Alternative unit is mean time between failures (MTBF).
- 1 year MTBF = $10^9 / (365 \times 24)$ = 114,155 FIT
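The FIT and MTBF conversion above can be sketched in a few lines of Python. The function names are illustrative, and a year is taken as 365 days, as on the slide.

```python
HOURS_PER_YEAR = 365 * 24   # 8,760 hours, ignoring leap years
FIT_HOURS = 1e9             # one FIT = one error per 10^9 hours of operation


def mtbf_years_to_fit(mtbf_years: float) -> float:
    """Convert a mean time between failures (in years) to a rate in FIT."""
    return FIT_HOURS / (mtbf_years * HOURS_PER_YEAR)


def fit_to_mtbf_years(fit: float) -> float:
    """Convert a rate in FIT back to a mean time between failures in years."""
    return FIT_HOURS / (fit * HOURS_PER_YEAR)


print(round(mtbf_years_to_fit(1.0)))  # 114155
```

Because the two quantities are reciprocal, a tenfold longer MTBF corresponds to a tenfold lower FIT rate.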
8
Particle Strike
[Figure: an ion or charged particle strikes an n region in a p-substrate; + and - charges are marked along its path.]
9
Induced Current
[Figure: plot of current versus time.]
$I(t) = I_0 \left( e^{-t/a} - e^{-t/b} \right)$, with $a \gg b$
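A minimal sketch of the double-exponential pulse follows. The numeric values of I0, a and b are illustrative assumptions, not values from the slides. The peak time comes from setting dI/dt = 0, which gives t = (ab/(a - b)) ln(a/b).

```python
import math


def induced_current(t: float, i0: float, a: float, b: float) -> float:
    """I(t) = I0 * (exp(-t/a) - exp(-t/b)) for t >= 0, with a >> b."""
    return i0 * (math.exp(-t / a) - math.exp(-t / b))


def peak_time(a: float, b: float) -> float:
    """Time at which dI/dt = 0, where the pulse reaches its maximum."""
    return (a * b / (a - b)) * math.log(a / b)


# Illustrative values only (assumed, not from the slides).
I0 = 1e-4      # amperes
A = 200e-12    # slow (decay) time constant, seconds
B = 20e-12     # fast (rise) time constant, seconds

t_pk = peak_time(A, B)
i_pk = induced_current(t_pk, I0, A, B)
print(f"peak at {t_pk * 1e12:.1f} ps, I = {i_pk * 1e6:.1f} uA")
```

The fast time constant b sets how quickly the current rises, and the slow constant a sets how long the tail lasts.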
10
Voltage Induced at a Node
$V = Q/C$, where $Q = \int I(t)\,dt$ and $C$ is the node capacitance.
A smaller node capacitance results in a larger voltage swing.
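For a double-exponential current pulse I(t) = I0 (exp(-t/a) - exp(-t/b)), the charge integral has the closed form Q = I0 (a - b). The sketch below checks that against a numerical integral and then evaluates V = Q/C for two capacitances. All numeric values are illustrative assumptions, and the model is idealized: it ignores the clamping and restoring currents that limit the swing in a real circuit.

```python
import math


def collected_charge(i0: float, a: float, b: float) -> float:
    """Closed form of Q = integral from 0 to infinity of I(t) dt."""
    return i0 * (a - b)


def numeric_charge(i0: float, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule check of the same integral, truncated at t = 40*a."""
    dt = 40 * a / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += i0 * (math.exp(-t / a) - math.exp(-t / b)) * dt
    return total


def node_voltage(q: float, c: float) -> float:
    """Idealized voltage swing V = Q / C at a node of capacitance C."""
    return q / c


# Illustrative values only (assumed, not from the slides).
I0, A, B = 1e-4, 200e-12, 20e-12
q = collected_charge(I0, A, B)
for c in (50e-15, 10e-15):
    print(f"C = {c * 1e15:.0f} fF -> V = {node_voltage(q, c):.2f} V")
```

The smaller capacitance gives the larger swing, which is why scaled-down nodes are more vulnerable.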
11
Effect on Digital Circuit
[Figure: clocked circuit with labels IN, OUT, CK, and a combinational logic block; charged particles are shown striking at two points.]
12
An SRAM Cell
[Figure: SRAM cell schematic with labels bit, VDD, WL, BL; the internal nodes hold 0 and 1.]
13
SRAM Cell Struck by Alpha-Particle
Single-Event Upset (SEU)
[Figure: the SRAM cell (bit, VDD, WL, BL) struck by charged particles; the internal nodes flip 0→1 and 1→0.]
14
D-Latch
[Figure: D-latch with labels D, CK = 0, Q; the internal nodes hold 1 and 0.]
15
SEU in D-Latch
[Figure: the D-latch (D, CK = 0, Q) struck by charged particles; the internal nodes flip 1→0 and 0→1.]
16
Single Event Transients in Combinational Logic
[Figure: clocked (CK) logic circuit annotated with signal values (1111, 0, 1, 0, 1); charged particles strike the combinational logic.]
17
Effects of Transients
- Error correcting effects
  - Transient pulse is filtered by gate inertia
  - Transient is blocked by an unsensitized path
  - Transient is blocked by an inactive clock
- Error enhancing effects
  - Large number of gates can produce multiple pulses
  - Fanouts can multiply error pulses
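Two of the effects listed above, blocking by an unsensitized path and multiplication through fanout, can be illustrated with a tiny logic model. The circuit here is hypothetical: one net x fans out to an AND gate and an OR gate, each with its own side input.

```python
def outputs(x: int, s_and: int, s_or: int) -> tuple:
    """Hypothetical circuit: net x fans out to an AND gate and an OR gate."""
    return (x & s_and, x | s_or)


def flipped_outputs(x: int, s_and: int, s_or: int) -> int:
    """Count the outputs that change when a transient inverts net x."""
    good = outputs(x, s_and, s_or)
    bad = outputs(x ^ 1, s_and, s_or)
    return sum(g != b for g, b in zip(good, bad))


# Side inputs at controlling values (AND side = 0, OR side = 1): both paths blocked.
print(flipped_outputs(1, s_and=0, s_or=1))  # 0
# Side inputs at non-controlling values: the single transient reaches both outputs.
print(flipped_outputs(1, s_and=1, s_or=0))  # 2
```

This model is purely logical. It does not capture filtering by gate inertia or by an inactive clock, which depend on pulse width and timing.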
18
SEUs in FPGA
Parts that can be affected:
- Look-up table (LUT)
- Configuration memory cell
- Flip-flop
- Block RAM
19
LUT
[Figure: look-up table with inputs F1, F2, F3, F4 and output out. Memory cells hold: 1 0 1 1 0 1 1 0 0 0 0 0 1 1 1 0.]
20
SEU in LUT
[Figure: the same look-up table (inputs F1, F2, F3, F4, output out) after a charged particle strike. Memory cells hold: 1 0 1 0 0 1 1 0 0 0 0 0 1 1 1 0. One cell has changed from 1 to 0.]
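The effect of the upset can be sketched by modeling the LUT as a 16-entry memory addressed by its four inputs. The cell contents are the ones listed on the two LUT slides; the addressing order (first listed cell is address 0, F1 is the most significant address bit) is an assumption, since the figure itself is not recoverable.

```python
# Memory cell contents before and after the upset, as listed on the slides.
GOOD = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0]
UPSET = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0]


def lut_out(cells, f1: int, f2: int, f3: int, f4: int) -> int:
    """Read the cell selected by the inputs (assumed order: F1 is the MSB)."""
    addr = (f1 << 3) | (f2 << 2) | (f3 << 1) | f4
    return cells[addr]


# Input combinations for which the upset LUT gives a wrong output.
wrong = [
    (f1, f2, f3, f4)
    for f1 in (0, 1) for f2 in (0, 1) for f3 in (0, 1) for f4 in (0, 1)
    if lut_out(GOOD, f1, f2, f3, f4) != lut_out(UPSET, f1, f2, f3, f4)
]
print(wrong)  # [(0, 0, 1, 1)] under the assumed addressing
```

A single flipped cell corrupts the function for exactly one input combination, and the error persists until the cell is rewritten.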
21
Four Types of SEU in FPGA
[Figure: FPGA logic with a LUT (inputs F1 to F4), a flip-flop (FF), configuration memory cells (M), and block RAM; upset locations are labeled Type 1, Type 2, Type 3, and Type 4.]
22
SEU Detection Methods
- Hardware redundancy
- Time redundancy
- Error detection codes (EDC)
- Self-checker techniques
23
SEU Mitigation Techniques
- Triple modular redundancy (TMR)
- Multiple redundancy with voting
- Error detection and correction codes (EDAC)
- Hardened memory cells
- FPGA-specific methods
  - Reconfiguration
  - Partial configuration
  - Rerouting design
24
Hardware Redundancy for Detection
[Figure: the inputs drive the combinational logic and a duplicated copy; the two outputs are compared, and logic 1 indicates an error.]
- Hardware overhead is high, about 100%.
- Performance penalty is negligible.
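A minimal sketch of duplication with comparison follows. The combinational function is a stand-in (three-input parity) chosen for illustration, and the fault is injected by inverting the output of the second copy.

```python
def logic(a: int, b: int, c: int) -> int:
    """Stand-in combinational function (hypothetical): three-input parity."""
    return a ^ b ^ c


def duplicate_and_compare(a: int, b: int, c: int, upset_copy2: bool = False):
    """Run two copies on the same inputs; error = 1 when their outputs differ."""
    out1 = logic(a, b, c)
    out2 = logic(a, b, c) ^ int(upset_copy2)  # optional transient in copy 2
    error = out1 ^ out2                       # logic 1 indicates error
    return out1, error


print(duplicate_and_compare(1, 0, 1))                    # (0, 0)
print(duplicate_and_compare(1, 0, 1, upset_copy2=True))  # (0, 1)
```

The comparison only detects a disagreement. It cannot tell which copy is wrong, which is why correction needs a third copy and a voter.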
25
Time Redundancy for Detection
[Figure: combinational logic with inputs and output, a D flip-flop, and clocks CK and CK + d; the comparison of the two samples gives an error signal, where logic 1 indicates an error.]
- Hardware overhead is low.
- Performance penalty (~ d) = maximum detectable pulse width.
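The relation between the delay d and the detectable pulse width can be sketched with a simple timing model. The model is an assumption made for illustration: the logic output holds its correct value except during one transient pulse, and it is sampled once at CK and once at CK + d.

```python
def logic_output(t: float, correct: int, pulse_start: float, pulse_width: float) -> int:
    """Output value at time t: correct, but inverted while the transient lasts."""
    in_pulse = pulse_start <= t < pulse_start + pulse_width
    return correct ^ int(in_pulse)


def sample_twice(ck: float, d: float, correct: int, pulse_start: float, pulse_width: float):
    """Sample at CK and CK + d; error = 1 when the two samples disagree."""
    first = logic_output(ck, correct, pulse_start, pulse_width)
    second = logic_output(ck + d, correct, pulse_start, pulse_width)
    return first, second, first ^ second


# Pulse narrower than d: only the first sample is corrupted, so the error is flagged.
print(sample_twice(ck=10.0, d=2.0, correct=1, pulse_start=9.5, pulse_width=1.0))  # (0, 1, 1)
# Pulse wider than d: both samples are corrupted and agree, so the error is missed.
print(sample_twice(ck=10.0, d=2.0, correct=1, pulse_start=9.5, pulse_width=3.0))  # (0, 0, 0)
```

Choosing a larger d catches wider pulses but delays the result by the same amount, which is the trade-off the slide states.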
26
Repeat on Error Detection
[Figure: combinational logic with inputs, a D flip-flop, clocks CK and CK + d, and a C-element (C) driving the output; logic 1 indicates an error.]
Operation: If an error is detected, the output retains its previous value. Repeating the computation can produce the correct result.
27
Muller C-Element
A  B  output
0  0  0
0  1  old output
1  0  old output
1  1  1
[Figure: C-element symbol with inputs A and B, and an implementation using an SR latch (S, R, Q).]
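The C-element truth table can be modeled behaviorally as a small state-holding class. This is a sketch of the behavior only, not of the SR-latch implementation in the figure; the class and method names are illustrative.

```python
class MullerC:
    """Behavioral Muller C-element: follows the inputs only when they agree."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.out = a   # 00 -> 0, 11 -> 1
        return self.out    # 01 or 10 -> old output is held


c = MullerC(initial=0)
print(c.update(1, 1))  # 1
print(c.update(0, 1))  # 1 (inputs disagree, old output held)
print(c.update(0, 0))  # 0
```

When A and B are two samples of the same signal, a disagreement between them leaves the output at its previous value, so a mismatch never propagates a wrong value.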
28
Triple Modular Redundancy (TMR)
[Figure: the inputs drive three copies of the combinational logic (copy 1, copy 2, copy 3); a majority voter combines their outputs to produce the output.]
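A minimal TMR sketch: three copies of a stand-in function feed a majority voter, and an upset is injected by inverting the output of one copy. The stand-in function (XOR of two inputs) and the fault-injection parameter are assumptions made for illustration.

```python
def majority(a: int, b: int, c: int) -> int:
    """Two-out-of-three majority vote."""
    return (a & b) | (b & c) | (a & c)


def logic(x: int, y: int) -> int:
    """Stand-in combinational function (hypothetical)."""
    return x ^ y


def tmr(x: int, y: int, upset_copy=None) -> int:
    """Evaluate three copies, optionally invert one copy's output, then vote."""
    copies = [logic(x, y) for _ in range(3)]
    if upset_copy is not None:
        copies[upset_copy] ^= 1   # single-event upset in one copy
    return majority(*copies)


# An upset in any single copy is masked by the voter.
for copy in (0, 1, 2):
    print(tmr(1, 0, upset_copy=copy) == logic(1, 0))  # True
```

Masking holds only for a single faulty copy. If two copies are wrong at the same time, the voter outputs the wrong value.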
29
Majority Voter Circuit
A  B  C  output
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  1
1  0  0  0
1  0  1  1
1  1  0  1
1  1  1  1
[Figure: majority voter block and its gate-level circuit, with inputs A, B, C and one output.]
30
Alternative Implementations of Voter
[Figure: two voter implementations with inputs A, B, C: a LUT holding 0001011100010111, and a second circuit whose labels include VDD.]
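The LUT contents on this slide can be checked against the majority function. The addressing is an assumption: the leftmost listed bit is taken as address 0, with A, B, C forming the low three address bits and a fourth LUT input (called d here, a hypothetical name) as the most significant bit.

```python
LUT_BITS = "0001011100010111"  # contents as listed on the slide


def majority(a: int, b: int, c: int) -> int:
    """Reference two-out-of-three majority function."""
    return (a & b) | (b & c) | (a & c)


def lut_voter(a: int, b: int, c: int, d: int = 1) -> int:
    """Read the LUT cell at address (d, a, b, c); leftmost bit is address 0 (assumed)."""
    addr = (d << 3) | (a << 2) | (b << 1) | c
    return int(LUT_BITS[addr])


matches = all(
    lut_voter(a, b, c, d) == majority(a, b, c)
    for d in (0, 1) for a in (0, 1) for b in (0, 1) for c in (0, 1)
)
print(matches)  # True
```

Because the eight-bit pattern 00010111 appears twice, the output does not depend on the fourth input under this reading.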
31
Triple Modular Redundancy (TMR)
[Figure: TMR using time redundancy. A single combinational logic block with inputs and output, D flip-flops, a majority voter, and clocks CK, CK + d, CK + 2d, and CK + 3d.]
32
TMR for Memory Cells
[Figure: combinational logic with inputs and output, D flip-flops clocked by CK, and a majority voter.]
Problems:
1. Accumulation of errors in flip-flops.
2. Voter is not protected.
33
FF Refresh and TMR for Memory Cells
[Figure: D flip-flops clocked by CK, four majority voters, signals r1, r2, r3, and the output.]
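The benefit of refreshing the flip-flops can be sketched with a small simulation. The model is an assumption rather than a transcription of the figure: three flip-flops hold the same bit, an upset inverts one of them in a given cycle, and with refresh enabled every flip-flop reloads the voted value at each clock.

```python
def majority(a: int, b: int, c: int) -> int:
    """Two-out-of-three majority vote."""
    return (a & b) | (b & c) | (a & c)


def run(upsets: dict, cycles: int, refresh: bool, stored: int = 1) -> int:
    """Simulate three flip-flops; upsets maps a cycle number to the FF index hit."""
    ffs = [stored, stored, stored]
    for cycle in range(cycles):
        if cycle in upsets:
            ffs[upsets[cycle]] ^= 1      # SEU inverts one stored bit
        if refresh:
            voted = majority(*ffs)
            ffs = [voted, voted, voted]  # each FF reloads the voted value
    return majority(*ffs)


two_hits = {1: 0, 3: 1}  # FF 0 hit in cycle 1, FF 1 hit in cycle 3
print(run(two_hits, cycles=5, refresh=False))  # 0: errors accumulate, vote is wrong
print(run(two_hits, cycles=5, refresh=True))   # 1: each upset is repaired before the next
```

Refresh removes the accumulation problem as long as upsets arrive more than one clock apart. The sketch treats the voters as fault-free, which a real design cannot assume.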
34
A Resistor Hardened SRAM Cell
[Figure: SRAM cell schematic with labels bit, VDD, WL, BL; the internal nodes hold 0 and 1.]
35
References
- F. L. Kastensmidt, L. Carro and R. Reis, Fault-Tolerant Techniques for SRAM-Based FPGAs, Springer, 2006.
- S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, February 2005.
36
Summary of Topics Covered (1)
- Nanotechnology devices
- Moore’s law
- System level design for testability and test scheduling problem
- Verification
- Logic equivalence
- Binary decision diagrams
- Power consumption and low-power concepts
- Multi-core parallelism
- Microprocessors
- Memories
37
Summary of Topics Covered (2)
- Timing
- Timing verification
- Timing simulation
- Static timing analysis
- Timing optimization
- Linear programming and clock constraints
- Clock skew problem
- Zero skew design
- Retiming, constraint graph and performance optimization
- Soft errors and fault-tolerant design