Published by Ann Barker. Modified over 9 years ago.
1
Reliability in Nanometer Technologies – Problems and Solutions
Dr.-Ing. Frank Sill
Department of Electrical Engineering, Federal University of Minas Gerais
Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil
PPGEE ’08
2
Agenda
  Motivation
  Failures in Nanometer Technologies
  Techniques to Increase Reliability
  Shadow Transistors
PPGEE‘08, Reliability
3
Motivation
Reliability is important for:
  Normal users
  Companies
  Medical applications
  Cars
  Air / space
  Environment
  …
4
Motivation
Probability of failures increases due to:
  Increasing transistor count
  Shrinking technology
Figure: transistor count per processor: Northwood 55 million, Prescott 125 million, Yonah 151 million, Wolfdale 410 million.
5
Dimensions
Figure: length scale from 1 m down to 100 nm (1 m, 10 cm, 1 cm, 1 mm, 100 µm, 10 µm, 100 nm), ending at a „65 nm“ transistor. Sources: „Spektrum der Wissenschaften“; Intel
6
Failures in Nanometer Technologies
7
Process Failures
Occur at production phase
Based on:
  Process variations
  Particles
  …
Source: Mak
8
Sub-wavelength Lithography
Figure: lithography wavelength [nm] (365 nm, 248 nm, 193 nm, 13 nm EUV) and technology generation [µm] (180 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm) over the years 1980 to 2020, with the gap between the two curves marked. Source: Mark Bohr, Intel
9
Field-dependent Aberrations
Figure: lens above the wafer plane. Center: minimal aberrations. Edge: high aberrations. Source: R. Pack, Cadence
10
Varying Line Width
Figure: line width [nm] (1.8 to 2.3) over wafer position (Wafer X, Wafer Y). Source: Zhou, 2001
11
Random Dopant Fluctuations
Cause Vth variations
Figure: uniform vs. non-uniform dopant distribution. Source: Borkar, Intel
12
Power Density
Figure: power density (W/cm², 1 to 10000) versus year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, P4, and Prescott, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the sun’s surface. Source: Moore, ISSCC 2003
13
Temperature Variation
Figure: power map and on-die temperature.
Power density is not uniformly distributed across the chip
Silicon is not a good heat conductor
Max junction temperature is determined by hot spots
Impact on packaging, cooling
Source: Borkar, Intel
14
Temperature Variation cont’d
Figure: Power4 server chip. Source: Devgan, ICCAD’03
15
Temperature Variation cont’d
Figure: drain current IDS [pA] and delay [s] versus temperature [°C].
Threshold voltage Vth changes with temperature → drain-source current changes → delay changes
Source: Burleson, UMASS, 2007
16
Supply Voltage Drop
Source: Trester, 2005
17
Failures Through Increasing Delay
Normal operation: data are processed before the clock phase is over
Logic too slow! → Data processing takes longer than the clock phase → Wrong data in the next clock phase!
Figure: clock (Clk) timing diagram.
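The timing condition on this slide can be written as a one-line check. The following is a minimal sketch, assuming a single register-to-register path; the function name, the optional setup time, and all numbers are illustrative and not taken from the presentation.

```python
def meets_timing(logic_delay_ns: float, clock_period_ns: float,
                 setup_ns: float = 0.0) -> bool:
    """Return True if the logic output settles before the capturing clock edge."""
    return logic_delay_ns + setup_ns <= clock_period_ns


# Nominal path delay of 0.8 ns fits into a 1.0 ns clock phase.
nominal_ok = meets_timing(0.8, 1.0)

# The same path slowed down by 30 % (assumed, e.g. by temperature or
# supply voltage drop) no longer fits: wrong data in the next phase.
degraded_ok = meets_timing(0.8 * 1.3, 1.0)
```

Any effect from the preceding slides that increases the delay (temperature, voltage drop, process variation) moves a path from the first case towards the second.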
18
Soft Errors
In the 1970s it was observed that DRAMs occasionally flip bits for no apparent reason
Ultimately linked to alpha particles and cosmic rays
Collisions with particles create electron-hole pairs in the substrate
These carriers are collected on dynamic nodes, disturbing the voltage
Source: Automotive 7-8, 2004
19
Soft Errors cont’d
Internal state of a node flips briefly
If the error isn’t masked by:
  Logic: wrong input doesn’t lead to wrong output
  Electrical: pulse is attenuated by following gates
  Timing: data based on the pulse reach the flip-flop after the clock transition
→ wrong data
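Logic masking can be illustrated with a tiny gate-level example. The sketch below assumes a hypothetical circuit `out = (a AND b) OR c` and flips the internal node `n = a AND b`; the circuit and the function names are chosen for illustration only. It covers logic masking only, not electrical or timing masking.

```python
from itertools import product


def circuit(a: int, b: int, c: int, flip_n: bool = False) -> int:
    """Evaluate out = (a AND b) OR c, optionally flipping the internal node n."""
    n = a & b
    if flip_n:
        n ^= 1  # transient bit flip on the internal node
    return n | c


def masked_fraction() -> float:
    """Fraction of input vectors for which the flip does not reach the output."""
    vectors = list(product((0, 1), repeat=3))
    masked = sum(
        1 for a, b, c in vectors
        if circuit(a, b, c) == circuit(a, b, c, flip_n=True)
    )
    return masked / len(vectors)
```

In this circuit the flip is masked whenever `c` is 1, because the OR gate output is then 1 regardless of `n`.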
20
Electromigration
Transport of material caused by the gradual movement of ions in a conductor
One of the major failure mechanisms in interconnects
Lifetime (mean time to failure) is proportional to the width and thickness of the metal lines
Lifetime is inversely proportional to the current density
Figure: top view and cross-section view of Metal 1 and Metal 2 lines showing a void, a whisker, and a hillock. Source: Plusquellic, UMBC
21
Electromigration cont’d
Figures: void in a 0.45 µm Al-0.5%Cu line (source: IMM-Bologna); whiskers in Sn (source: EPA Centre); hillocks in ZnSn (source: Ku & Lin, 2007)
22
Time-Dependent Dielectric Breakdown (TDDB)
Tunneling currents → wear-out of gate oxide
Creation of a conducting path between gate and substrate, drain, or source
Depends on the electrical field over the gate oxide, temperature (exp.), and gate oxide thickness (exp.)
Also: abrupt damage due to extreme overvoltage (e.g. electrostatic discharge)
Source: Pey & Tung
23
Variability Trends
Source: Burleson, UMASS, 2007
24
Variability Trends cont’d
Figure: soft errors per chip (logic and memory) versus technology [nm]. Source: Borkar, Intel
25
Variability Trends cont’d
Frequency and sub-threshold leakage variations
Figure: normalized frequency (0.9 to 1.4) versus normalized leakage Isub (1 to 5) for ~1000 samples at 130 nm; frequency spread ~30%, leakage power spread ~5-10X. Source: Borkar, Intel
26
Variability Trends cont’d
Increasing probability of gate-oxide breakdown
High-k?
Sources: Borkar, Intel; Kauerauf, EDL, 2002
27
Future Designs
100 billion transistors (100 BT) integration capacity:
  Intermittent failures
  Billions unusable (variations)
  Some will fail over time
Source: Borkar, Intel
28
Approaches to Increase Reliability
29
Failure Measurement
Reliability R(t): probability that a system performs as desired until time t
  Example: R(tx) = 0.8 → 80 % chance that the system is still running at time tx
Mean Time To Failure (MTTF): average time that a system runs until it fails
Failure rate λ: probability that the system fails in a given time interval
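The three quantities can be tied together under one common simplifying assumption that the slide does not state: a constant failure rate λ, which gives R(t) = exp(-λt) and MTTF = 1/λ. The sketch below uses that model; the function names and the example rate are illustrative only.

```python
import math


def reliability(t_hours: float, failure_rate_per_hour: float) -> float:
    """R(t) = exp(-lambda * t), valid only for a constant failure rate."""
    return math.exp(-failure_rate_per_hour * t_hours)


def mttf(failure_rate_per_hour: float) -> float:
    """MTTF = 1 / lambda for a constant failure rate."""
    return 1.0 / failure_rate_per_hour


def time_for_reliability(target_r: float, failure_rate_per_hour: float) -> float:
    """Time t_x at which R(t_x) equals target_r (inverse of reliability())."""
    return -math.log(target_r) / failure_rate_per_hour


lam = 1e-5                            # assumed: one failure per 100,000 hours
t_x = time_for_reliability(0.8, lam)  # time at which R(t_x) = 0.8
r_at_mttf = reliability(mttf(lam), lam)
```

Under this model the reliability at t = MTTF is exp(-1), roughly 0.37, so most systems have already failed by the time the mean time to failure is reached.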
30
Bathtub Failure Model
Infant mortality: declining failure rate, based on latent reliability defects
Normal lifetime: constant failure rate, based on TDDB, EM, hot electrons …
Wearout period: increasing failure rate, based on TDDB, EM, etc.
Figure: failure rate versus time, with 1-40 weeks and 7-15 years marked on the time axis.
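One common way to reproduce the three regions is the Weibull hazard function, whose shape parameter selects a declining, constant, or increasing failure rate. This model is not named in the presentation; the sketch below uses it as an assumption, with illustrative parameter values.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull failure rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# shape < 1: declining failure rate (infant mortality)
infant = [weibull_hazard(t, shape=0.5, scale=1.0) for t in (1.0, 2.0, 4.0)]

# shape == 1: constant failure rate (normal lifetime)
normal = [weibull_hazard(t, shape=1.0, scale=2.0) for t in (1.0, 2.0, 4.0)]

# shape > 1: increasing failure rate (wearout)
wearout = [weibull_hazard(t, shape=3.0, scale=1.0) for t in (1.0, 2.0, 4.0)]
```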
31
Classification
Figure: failure classification tree. Failures are either permanent or temporary; temporary failures are either transient or intermittent. Causes named in the tree: defects, wearout, and out-of-range parameters (EM, TDDB, ...); process variations, infant mortality, random dopant fluctuation, ...; radiation (soft errors); non-radiation (power supply, coupling); operation peaks. Source: Mitra, 2007
32
The Whole System Counts!
33
Triple Module Redundancy (TMR)
Figure: the input feeds logic L and two copies of logic L; their outputs A, B, and C go to a voter, which drives the output.
34
Triple Module Redundancy: Voter
Hardware realization of a 1-bit majority voter with inputs A, B, C:
OUT = AB + AC + BC
Requires 2 gate delays
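The voter equation maps directly onto bitwise operators. The following sketch is a software model of the equation above, not of the gate-level circuit; the function name is illustrative. Because the operators work bit by bit, the same function also votes on whole words.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote: OUT = AB + AC + BC."""
    return (a & b) | (a & c) | (b & c)


# One faulty module (c) is outvoted by the two correct ones.
good = 0b1010
faulty = 0b0110
voted = majority(good, good, faulty)
```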
35
Triple Module Redundancy cont’d
Figure: reliability (0.5 to 1.0) versus time for TMR and simplex (only 1 module). Note: for a constant module failure rate.
After a certain time, the reliability of the TMR system is lower than that of the simplex system
Why: after some time, the probability that 2 modules are wrong is higher than the probability that 2 modules are working!
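The crossover can be computed explicitly. The sketch below assumes independent module failures, a perfect voter, and a constant module failure rate λ, so that the module reliability is R = exp(-λt) and the TMR system works when at least 2 of 3 modules work: R_TMR = R³ + 3R²(1 - R) = 3R² - 2R³. The function names and the example rate are illustrative.

```python
import math


def simplex_reliability(t: float, lam: float) -> float:
    """Reliability of a single module with constant failure rate lam."""
    return math.exp(-lam * t)


def tmr_reliability(t: float, lam: float) -> float:
    """Probability that at least 2 of 3 modules work (perfect voter assumed)."""
    r = simplex_reliability(t, lam)
    return 3 * r**2 - 2 * r**3


def crossover_time(lam: float) -> float:
    """Time at which R_TMR equals R, i.e. module reliability R = 0.5."""
    return math.log(2) / lam


lam = 1e-4                 # assumed failure rate per hour
t_c = crossover_time(lam)  # about 6931 hours for this rate
early = (tmr_reliability(0.5 * t_c, lam), simplex_reliability(0.5 * t_c, lam))
late = (tmr_reliability(2.0 * t_c, lam), simplex_reliability(2.0 * t_c, lam))
```

Before the crossover TMR is more reliable than a single module; afterwards each module is more likely to be wrong than right, and requiring two working modules becomes a disadvantage.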
36
Self Adaptive Design
Extend the idea of clock domains to adaptive power domains
Tackle static process variations and slowly varying timing variations
Control VDD, Vth (indirectly by body bias), and fclk by calibration at power-on
Figure: test module exchanging test inputs and responses with a module and setting its VDD, VBB, and fclk.
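A power-on calibration loop for the body bias could look like the following sketch. It is not the method from the presentation: the sweep order, the sign convention (negative values for reverse body bias, positive for forward body bias), and the linear frequency model of the toy module are assumptions; only the ±0.5 V range and the 32 mV step are borrowed from the example slide of this deck.

```python
from typing import Callable, Optional


def calibrate_body_bias(passes_test: Callable[[int], bool],
                        max_bias_mv: int = 500,
                        step_mv: int = 32) -> Optional[int]:
    """Return the lowest body bias (in mV) at which the module passes its test.

    The sweep starts at the strongest reverse bias (slowest, least leakage)
    and moves towards forward bias, so the first passing setting is the one
    with the least forward bias. Returns None if no setting passes.
    """
    steps = max_bias_mv // step_mv
    for k in range(-steps, steps + 1):
        vbb_mv = k * step_mv
        if passes_test(vbb_mv):
            return vbb_mv
    return None


def toy_module(f_nominal_ghz: float, f_target_ghz: float = 1.0) -> Callable[[int], bool]:
    """Hypothetical module whose maximum frequency grows linearly with bias."""
    def passes(vbb_mv: int) -> bool:
        return f_nominal_ghz * (1 + 0.0004 * vbb_mv) >= f_target_ghz
    return passes


slow_die = calibrate_body_bias(toy_module(0.95))   # needs forward bias
fast_die = calibrate_body_bias(toy_module(1.10))   # tolerates reverse bias
dead_die = calibrate_body_bias(toy_module(0.70))   # cannot reach the target
```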
37
Self Adaptive Design: Example
21 submodules per die
Applying 0.5 V forward/reverse body biasing (FBB/RBB) in steps of 32 mV, respectively
Figure: accepted dies (0% to 100%) per frequency bin for no body bias (noBB), adaptive body bias (ABB), and within-die ABB.
For a given frequency and power density:
  100% yield with ABB
  97% in the highest frequency bin with ABB for within-die variability
Source: Borkar, Intel
38
Razor Flip-Flop
For uncertainty- and variation-tolerant design
Razor methodology:
  Voltage-scaling methodology based on real-time detection and correction of circuit timing errors
  Use the actual hardware to check for errors
  Latch the input data twice: once on the clock edge, and then a little later
  If the data is not the same, you are going too fast
Source: Austin, Computer Magazine, 2004
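The double-sampling idea can be modeled behaviorally. The sketch below is a simplified model, not the Razor circuit itself: it assumes the data signal switches once from an old to a new value at `arrival_ns` and has settled before the delayed sample is taken. All names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class RazorSample:
    main: int       # value captured on the clock edge
    shadow: int     # value captured a little later
    error: bool     # True if the two samples differ
    corrected: int  # value used after error recovery


def razor_sample(old_value: int, new_value: int, arrival_ns: float,
                 clock_edge_ns: float, shadow_delay_ns: float) -> RazorSample:
    """Sample a signal twice and flag a timing error if the samples differ."""
    def value_at(t_ns: float) -> int:
        return new_value if arrival_ns <= t_ns else old_value

    main = value_at(clock_edge_ns)
    shadow = value_at(clock_edge_ns + shadow_delay_ns)
    # The later sample is assumed to be correct and replaces the early one.
    return RazorSample(main, shadow, main != shadow, shadow)


on_time = razor_sample(0, 1, arrival_ns=0.9, clock_edge_ns=1.0, shadow_delay_ns=0.2)
too_late = razor_sample(0, 1, arrival_ns=1.1, clock_edge_ns=1.0, shadow_delay_ns=0.2)
```

In the second case the main sample still holds the old value, the error flag is raised, and the later sample supplies the correct data.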
39
Razor Flip-Flop cont’d
Source: Austin, 2004
40
Shadow Transistor Approach
41
TDDB Model
TDDB between gate and channel
Model: W = W1 + W2
Figure: Vout/VDD and relative delay versus RGC [kΩ] for an inverter, 65nm-BPTM.
Based on: Segura et al., “A Detailed Analysis of GOS Defects in MOS Transistors: Testing Implications at Circuit Level”, 1995.
42
TDDB Model cont’d
TDDB between gate and source/drain
Figure: model schematic and Vout/VDD versus RGC [kΩ] for an inverter, 65nm-BPTM.
Based on: Segura et al., “A Detailed Analysis of GOS Defects in MOS Transistors: Testing Implications at Circuit Level”, 1995.
43
Shadow Transistors
1. Insertion of additional transistors in parallel to vulnerable transistors → shadow transistors (ST)
Figure: two plots versus RGC [kΩ], each comparing without ST (wo/ ST) and with ST (w/ ST), for an inverter, 65nm-BPTM.
44
Shadow Transistors cont’d
2. Application of H-Vt/To transistors with:
  Higher threshold voltage
  Thicker gate oxide
  → Less vulnerable to TDDB
MTTF – Mean Time To Failure
Sources: Srinivasan, “RAMP: A Model for Reliability Aware Microprocessor Design”; Stathis, J., “Reliability Limits for the Gate Insulator in CMOS Technology”
45
Shadow Transistors cont’d
3. Selective insertion of shadow transistors in parallel to vulnerable transistors:
Component reliability depends on activity, state, temperature, size, fabrication …
Most vulnerable components can be identified
Shadow transistors are only added in parallel to the most vulnerable devices → netlist modification
46
Shadow Transistors cont’d
3. Selective insertion of shadow transistors in parallel to vulnerable transistors:
Component reliability depends on activity, state, temperature, size, fabrication …
Most vulnerable components can be identified
New approach:
  Estimation of stress factors
  Determination of component reliability
  Adding redundancy only at the most vulnerable components
Advantage: lower area, power, and delay penalty compared to complete redundancy or random insertion [Sri04]
Shadow transistors are only added in parallel to the most vulnerable devices → netlist modification
Source: [Sri04] Sirisantana, D&T, 2004
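The selection step can be sketched as a simple ranking. The code below is not the algorithm from the presentation: it assumes that a per-transistor MTTF estimate already exists (here a hypothetical dictionary) and that redundancy is limited by a transistor budget; the names and numbers are illustrative.

```python
from typing import Dict, List


def select_for_shadowing(estimated_mttf: Dict[str, float],
                         budget_fraction: float) -> List[str]:
    """Pick the devices with the lowest estimated MTTF, up to the given budget."""
    count = int(len(estimated_mttf) * budget_fraction)
    ranked = sorted(estimated_mttf, key=lambda name: estimated_mttf[name])
    return ranked[:count]


# Hypothetical relative MTTF estimates for four transistors of a netlist.
devices = {"M1": 9.0, "M2": 3.5, "M3": 12.0, "M4": 4.1}

# Allow shadow transistors for at most half of the devices.
to_shadow = select_for_shadowing(devices, budget_fraction=0.5)
```

Only the devices returned by the selection would receive a parallel shadow transistor in the modified netlist; all others stay unchanged, which is where the lower area, power, and delay penalty comes from.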
47
Shadow Transistors cont’d
Advantages:
  Increased reliability with respect to TDDB
  H-Vt/To: reliability increases by ~5x (for Δtox = 0.15 nm)
  Remarkable increase of system lifetime
Drawbacks:
  Higher input capacitance → higher delay and dynamic power dissipation
  Area increase
Remarks:
  Only slight improvements for gate-drain/source breakdown
  H-Vt/To has to be supported by the technology
48
ST – Improvement MTTF
Figure: MTTF improvement with shadow transistors. Labels recovered from the chart: ≈ 23 % additional transistors, 13.9 %, 8.8 %.
49
ST – Improvement MTTF (H-Vt/To)
Figure: average changes [%] in MTTF, delay, dynamic power (Pdyn), and transistor count (Trans).
50
Take Home Messages
Integrated circuits face several kinds of failures
Decreasing structure sizes create more failure sources
Future designs should (have to) be failure tolerant
Possible approaches:
  Triple Module Redundancy (TMR)
  Self-Adapting Designs
  Razor Flip-Flops
  Shadow Transistors
There’s still a lot to do!
51
Thank you! franksill@ufmg.br