Single Event Upset: An Embedded Tutorial. Fan Wang and Vishwani D. Agrawal, Department of Electrical and Computer Engineering. VLSI Design 2008, January 4-8, 2008.

Presentation transcript:

Single Event Upset: An Embedded Tutorial
Fan Wang, Vishwani D. Agrawal
Department of Electrical and Computer Engineering, Auburn University, AL, USA
21st International Conference on VLSI Design, Hyderabad, India, January 4-8, 2008

Motivation for This Work
- With the continued downscaling of CMOS technologies, device reliability has become a major bottleneck.
- The sensitivity of electronic systems to radiation can become a major cause of soft (non-permanent) failures.
- Both circuit designers and test engineers need basic knowledge of the radiation mechanisms that cause soft errors and of soft error mitigation techniques.

Outline
- Introduction to Soft Errors
- What is Soft Error?
- Historical notes
- Basic radiation mechanisms in silicon
- Soft error resilience techniques
- A case study
- Conclusion

Introduction to SEU
- Some behaviors of state-of-the-art electronic circuits are caused by random factors.
- A single event upset (SEU) is a non-permanent, non-functional error.
- Definition from the NASA Thesaurus: “Single Event Upset (SEU): Radiation-induced errors in microelectronic circuits caused when charged particles (usually from the radiation belts or from cosmic rays) lose energy by ionizing the medium through which they pass, leaving behind a wake of electron-hole pairs”.

What Is a Soft Error?
- A “fault” is the cause of errors.
- A non-permanent fault is a non-destructive fault and falls into one of two categories:
  - Transient faults, caused by environmental conditions such as temperature, humidity, pressure, voltage, power supply, vibrations, fluctuations, electromagnetic interference, ground loops, cosmic rays and alpha particles.
  - Intermittent faults, caused by non-environmental conditions such as loose connections, aging components, critical timing, resistive or capacitive variations and noise in the system.
- With advances in manufacturing, “soft errors” caused by cosmic rays and alpha particles have become a dominant cause of failures in electronic systems.

Historical Notes
- From 1954 through 1957, failures in digital electronics were reported during above-ground nuclear bomb tests.
- In 1962, Wallmark and Marcus predicted that cosmic rays would start upsetting microcircuits through heavy ionized particle strikes once feature sizes became small enough.
- In the 1970s and early 1980s, the effects of radiation received attention and more researchers examined the physics of these phenomena. Fault-tolerant computing theory received similar attention in the same period.
- In 1978, May and Woods of Intel Corporation determined that these errors were caused by alpha particles emitted in the radioactive decay of uranium and thorium, present at levels of just a few parts per million in package materials.
- In 1979, Guenzer and Wolicki reported that the error-causing particles came not only from uranium and thorium decay but also from nuclear reactions involving high-energy neutrons and protons. The term “SEU” has been in use since this paper.
- In 1979, Ziegler and Lanford of IBM predicted that cosmic rays could cause the same upset phenomenon in electronics (not only memories) even at sea level.

Soft Error Rate of Specific Applications
- Figures of merit:
  1. Failures In Time (FIT): the number of failures per 10^9 device hours.
  2. Mean Time To Failure (MTTF).
  An MTTF of 1 year corresponds to 10^9/(24 × 365) FIT ≈ 114,155 FIT.
- SER of contemporary commercial chips is controlled to within 100~1000 FIT.
- Most hard failure mechanisms produce error rates on the order of 1~100 FIT.
- Programmable logic SER is almost 100 times larger than that of combinational logic.
- Soft error rate for SRAM-based FPGAs:
  - Smaller design rules and lower supply voltages
  - A radiation chamber was used to calculate SEU frequency at an altitude of 10 km at 60°N (Sweden)

FPGA           XC4010E         XC4010XL
Process        0.60 µm         0.35 µm
Vcc            5 V             3.3 V
1 SEU every    1×10^6 hours    2.8×10^5 hours

Projecting this for 3 design rule shrinks and 2 voltage reductions gives ≈1 SEU every 28.2 hours.

Sources:
M. Ohlsson, P. Dyreklev, K. Johansson and P. Alfke, “Neutron Single Event Upsets in SRAM-Based FPGAs”, Proc. IEEE Nuclear & Space Radiation Effects Conference.
Chuck Stroud, “FPGA Architectures and Operation for Tolerating SEUs”, Electrical Engineering VLSI design and test seminar, Spring 2007, Auburn University.
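The FIT and MTTF figures above are reciprocals of one another up to the 10^9-hour scale factor. A minimal Python sketch of the conversion follows; the function names are illustrative and not from the slides, and a 365-day year is assumed as on the slide.

```python
# FIT <-> MTTF conversion, assuming a constant failure rate.
# Function names are hypothetical helpers, not part of any library.

HOURS_PER_YEAR = 24 * 365  # 8760 hours, the same year length the slide uses


def mttf_hours_to_fit(mttf_hours: float) -> float:
    """FIT is failures per 1e9 device-hours, i.e. 1e9 / MTTF in hours."""
    return 1e9 / mttf_hours


def fit_to_mttf_hours(fit: float) -> float:
    """Inverse conversion: mean time to failure in hours for a given FIT."""
    return 1e9 / fit


one_year_fit = mttf_hours_to_fit(HOURS_PER_YEAR)
print(round(one_year_fit))            # 114155, matching the slide
print(mttf_hours_to_fit(1e6))         # XC4010E: one SEU every 1e6 hours
print(mttf_hours_to_fit(2.8e5))       # XC4010XL: one SEU every 2.8e5 hours
print(mttf_hours_to_fit(28.2))        # projected: one SEU every 28.2 hours
```

Expressed this way, the measured XC4010E rate is 1000 FIT, while the projected rate of one upset every 28.2 hours is several orders of magnitude higher.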

Example: SRAM-Based FPGA System* (table continued)
[Table: SRAM-based FPGA system examples]
* Notes:
1. Example (1) is tested at Denver, using SpaceRad 4.5 (a radiation effects prediction software program). Source: Actel.
2. All systems are without any protection.

Radiation Mechanisms for Silicon (1)
1. Alpha particles are emitted when the nucleus of an unstable isotope decays to a lower energy state. (This was the dominant soft error cause for DRAM in the 1970s.)
- Uranium and thorium have the highest activity among naturally occurring radioactive materials.
- In the terrestrial environment, major sources of radioactive impurities are lead-based isotopes in the solder bumps of flip-chip technology, gold used for bond wires and lid plating, aluminum in ceramic packages, lead-frame alloys and interconnect metallization.
** With carefully selected materials, the effect of this mechanism can be greatly reduced.

Radiation Mechanisms for Silicon (2)
2. High-energy (> 1 MeV*) neutrons from cosmic radiation induce soft errors in semiconductor devices via secondary ions produced by neutron reactions with silicon nuclei.
- Cosmic rays of galactic origin react with the Earth’s atmosphere to produce complex cascades of secondary particles.
- Neutrons are the cosmic radiation source most likely to cause SEUs in deep-submicron semiconductors at terrestrial altitudes. The neutron flux depends on altitude above sea level: flux density increases with altitude.
** Nowadays, neutrons are the major cause among all failure mechanisms.
* MeV: million electron volts

Radiation Mechanisms for Silicon (3)
3. Secondary radiation induced by the interaction of cosmic ray neutrons with boron is the third significant source of ionizing particles in electronic systems.
- Low-energy cosmic neutrons interact with the isotope boron-10 (10B). 10B is commonly used as a p-type dopant for junction formation and in the IC package.
** This mechanism can be greatly reduced or eliminated by removing the source of 10B.
Baumann et al., IEEE Trans. Device and Materials Reliability, vol. 1, no. 1, pp. 17–22, 2001.

Single Event Transient (SET)
- An SET is caused by the generation of charge when a high-energy particle passes through a sensitive node.
- Each SET has its own characteristics (polarity, waveform, amplitude, duration, etc.), which depend on particle impact location, particle energy, device technology, device supply voltage and output load.
- Off transistors struck in the junction area by a heavy ion with high enough LET* are the most sensitive to SEU.
- Specifically, the channel region of the off-NMOS transistor and the drain region of the off-PMOS transistor.
* Linear Energy Transfer (LET) is a measure of the energy transferred to the device per unit length as an ionizing particle travels through a material.

More Details of SET Generation
(a) Along the path it traverses, the particle produces a dense radial distribution of electron-hole pairs.
(b) Outside the depletion region, the non-equilibrium charge distribution induces a temporary funnel-shaped potential distortion along the trajectory of the event (drift component).
(c) The funnel collapses; the diffusion component then dominates the collection process until all excess carriers have been collected, recombined, or diffused away from the junction area.
(d) Current vs. time plot illustrating charge collection and SET generation.

Analytical Model of SET
- The time constants depend strongly on the type of ion, its initial energy and the properties of the specific technology.
- An approximate analytical model for ion-track charge collection has a double-exponential form. It gives an induced current with a rapid rise time but a more gradual fall time:
  I(t) = Q / (τα − τβ) × (e^(−t/τα) − e^(−t/τβ))
  where Q is the collected charge, τα is the collection time constant of the junction and τβ is the ion-track establishment time constant.
* Typical values are approximately 1.64 × 10^-10 s for τα and 5 × 10^-11 s for τβ.
* Experimental results from NASA JPL.
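The double-exponential model can be explored numerically. The sketch below assumes the commonly used form I(t) = Q/(τα − τβ) × (e^(−t/τα) − e^(−t/τβ)), with τα = 1.64 × 10^-10 s and τβ = 5 × 10^-11 s taken as illustrative typical values. The function names and the 0.65 pC example charge are assumptions chosen for the illustration, not values tied to a particular device.

```python
import math

# Assumed typical time constants (illustrative, technology dependent).
TAU_ALPHA = 1.64e-10  # s, collection time constant of the junction
TAU_BETA = 5.0e-11    # s, ion-track establishment time constant


def set_current(t, q_coll, tau_a=TAU_ALPHA, tau_b=TAU_BETA):
    """Double-exponential SET current (A) at time t (s) for charge q_coll (C)."""
    if t < 0:
        return 0.0
    return q_coll / (tau_a - tau_b) * (math.exp(-t / tau_a) - math.exp(-t / tau_b))


def peak_time(tau_a=TAU_ALPHA, tau_b=TAU_BETA):
    """Time of peak current, found by setting dI/dt = 0."""
    return math.log(tau_a / tau_b) * tau_a * tau_b / (tau_a - tau_b)


def collected_charge(q_coll, t_end, steps=20000):
    """Trapezoidal integral of the current from 0 to t_end (C)."""
    dt = t_end / steps
    total = 0.0
    for i in range(steps):
        left = set_current(i * dt, q_coll)
        right = set_current((i + 1) * dt, q_coll)
        total += 0.5 * (left + right) * dt
    return total


q = 0.65e-12  # C, example collected charge (assumption)
t_pk = peak_time()
print(f"peak at {t_pk:.3e} s, I_peak = {set_current(t_pk, q):.3e} A")
print(f"integrated charge = {collected_charge(q, 5e-9):.3e} C")
```

The prefactor Q/(τα − τβ) normalizes the pulse so that its time integral equals the collected charge Q, which the numerical integration confirms.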

SET in CMOS Inverter
* For example, in ami12 technology, when the output load capacitance is 100 fF and the cumulative collected charge is 0.65 pC, the first-order amplitude of the voltage pulse is Q/C = (0.65 × 10^-12 C)/(100 × 10^-15 F) = 6.5 V. This simple estimate ignores the restoring current of the on transistor and clamping at the supply rails.
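The amplitude estimate is a direct application of V = Q/C. The short Python sketch below evaluates it for the stated load and charge, and also computes the charge that would produce a given amplitude. The function names are hypothetical, and the model is first order only: it ignores the restoring current and rail clamping.

```python
def set_pulse_amplitude(q_coll: float, c_load: float) -> float:
    """First-order SET amplitude in volts: collected charge (C) over load (F)."""
    return q_coll / c_load


def charge_for_amplitude(v_pulse: float, c_load: float) -> float:
    """Charge in coulombs needed to move a node of capacitance c_load by v_pulse."""
    return v_pulse * c_load


v = set_pulse_amplitude(0.65e-12, 100e-15)
print(f"{v:.2f} V")  # 6.50 V for 0.65 pC on 100 fF

q_needed = charge_for_amplitude(0.65, 100e-15)
print(f"{q_needed * 1e15:.1f} fC")  # 65.0 fC gives 0.65 V on 100 fF
```

With the stated values, Q/C evaluates to 6.5 V; a 0.65 V pulse on the same 100 fF load corresponds to about 65 fC of collected charge.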

Soft Error Mitigation Techniques
- Soft error tolerance techniques can be classified into two types: recovery and prevention.
  - Recovery: recover from an error after it occurs. Includes on-line recovery mechanisms, fault-tolerant computing, ECC/parity checking, redundancy, etc.
  - Prevention: methods that protect microchips from soft errors before they occur.
- The need for a recovery mechanism stems from the fact that prevention techniques may not be enough for contemporary microchips.
- Soft errors are not the only reason computer systems need to resort to a recovery procedure. Random errors due to noise, unreliable components, and coupling effects may also require a recovery mechanism.

Some Mitigation Techniques
- Prevention techniques
  1. Purify the fabrication material:
    - Uranium and thorium impurities have been reduced below one hundred parts per trillion for high reliability.
    - To eliminate 10B, alternative insulators that do not contain boron are used.
  2. Radiation-hardened process technologies:
    - SER performance can be greatly improved by adapting the process technology either to reduce the collected charge or to increase the critical charge.
    - Specific methods: use additional well isolation; replace bulk silicon with SOI. A 10x reduction in SER over conventional bulk devices is achieved when a fully depleted SOI substrate is used. However, SOI is more expensive, and parasitic bipolar action limits further reduction of SER.

Selected Mitigation Techniques
- Recovery techniques
  1. Redundancy
    - Gains higher system reliability by sacrificing minimality in time, space, or both.
    - Classic design: Triple Modular Redundancy (TMR) with a majority voter.
    - New design*: time redundancy based on a C-element gate that compares two samples of the combinational primary outputs, taken at t0 and t0 + d.
  2. Error Detection and Correction Code (EDAC)
    - Simple solution for memory: add a parity bit to each memory word.
    - In most situations, it must be combined with a system-level approach for error recovery.
* S. Mitra, M. Zhang, S. Waqas, N. Seifert, B. Gill, and K. S. Kim, “Combinational Logic Soft Error Correction,” in Proc. International Test Conference, 2006, pp. 1–9.
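Two of the techniques named above, TMR majority voting and a per-word parity bit, can be sketched in a few lines of Python. This is a software illustration of the logic only, not a hardware implementation; the function names and the 8-bit example word are assumptions made for the example.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority, as computed by a TMR voter on three copies."""
    return (a & b) | (a & c) | (b & c)


def even_parity_bit(word: int) -> int:
    """Parity bit that makes the total number of 1s (word + parity) even."""
    return bin(word).count("1") % 2


def parity_ok(word: int, parity: int) -> bool:
    """True if the stored parity bit still matches the word."""
    return even_parity_bit(word) == parity


stored = 0b10110010                 # example 8-bit memory word (assumption)
parity = even_parity_bit(stored)    # computed when the word is written

upset = stored ^ (1 << 3)           # single event upset flips bit 3

# Parity detects the single-bit flip but cannot say which bit changed.
print(parity_ok(stored, parity))    # True
print(parity_ok(upset, parity))     # False

# TMR masks the upset as long as only one of the three copies is corrupted.
print(majority_vote(stored, upset, stored) == stored)  # True
```

The contrast shows why parity alone needs a system-level recovery path: it only detects the error, whereas TMR corrects it in place at roughly three times the area.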

A Case Study: IBM eServer z990 System
- z990 configuration:
  1. The z990 contains 4 pluggable nodes connected through a planar board.
  2. Each node contains up to 64 GB of physical memory and 32 MB of L2 cache, for a system capacity of 256 GB of memory and 128 MB of L2 cache.
- Error tolerance techniques used:
  1. Extensive use of ECC and parity with retry on data and controls
  2. Full SRAM ECC and parity protection
  3. Microprocessor mirroring

Conclusion
- SER in logic and memory chips will continue to increase as devices become more sensitive to soft errors at sea level.
- Open soft error issues:
  1. How should EDA tools handle soft error hardening?
  2. Analysis of radiation mechanisms (too complex to be comprehensive)
  3. Soft error rate analysis for logic
  4. Error mitigation methods

Useful References and Further Readings
1. “Single Event Phenomena” (Messenger and Ash, 1993)
2. “Ionizing Radiation Effects in MOS Devices and Circuits” (Ma and Dressendorfer, 1989)
3. “Handbook of Radiation Effects” (A. Holmes-Siedle and L. Adams, 1993)
4. “Fault-Tolerance Techniques for SRAM-Based FPGAs” (Kastensmidt, Fernanda Lima; Carro, Luigi; Reis, Ricardo, 2006)
5. Test methods and standards: JEDEC89, JEDEC89A, JEDEC
6. Journals: IEEE Trans. on Nuclear Science, IEEE Trans. on Reliability
7. NASA Goddard’s test group
8. NASA Space Environment and Effects Program …

Thank You...