EE141 System-on-Chip Test Architectures Ch. 8 – Physical Failures - P. 1 1 Chapter 8 Coping with Physical Failures, Soft Errors, and Reliability Issues.

Presentation transcript:

What is this chapter about?
- Gives an overview of the causes of manufacturing defects and soft errors, and of promising solutions to them
- Focus on
  - Signal integrity
  - Defect-based tests
  - Process sensors and adaptive design
  - Soft errors
    - BISER
    - Circuit-level approaches
  - Defect and error tolerance

Coping with Physical Failures, Soft Errors, and Reliability Issues
- Introduction
- Signal Integrity
- Manufacturing Defects, Process Variations, and Reliability
- Soft Errors
- Defect and Error Tolerance
- Concluding Remarks

Introduction
- Defects
  - Random defects: caused by manufacturing imperfections; they occur in random places
  - Systematic defects: caused by process or manufacturing variations
- Defect level (DL) is a function of process yield (Y) and fault coverage (FC)
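
The slide names the dependence of DL on Y and FC without writing it out. One widely used form is the Williams-Brown model, DL = 1 - Y^(1 - FC). The sketch below assumes that model; the function name and the sample numbers are illustrative and do not come from the slides.

```python
def defect_level(process_yield: float, fault_coverage: float) -> float:
    """Williams-Brown estimate DL = 1 - Y**(1 - FC), with Y and FC as fractions."""
    if not (0.0 < process_yield <= 1.0 and 0.0 <= fault_coverage <= 1.0):
        raise ValueError("yield must be in (0, 1] and fault coverage in [0, 1]")
    return 1.0 - process_yield ** (1.0 - fault_coverage)


if __name__ == "__main__":
    # Hypothetical process with 50% yield, tested at increasing fault coverage.
    for fc in (0.90, 0.99, 0.999):
        dl = defect_level(0.5, fc)
        print(f"FC = {fc:.3f} -> DL = {dl * 1e6:.0f} defective parts per million")
```

Under this model, DL falls to zero only when fault coverage reaches 100% or the yield is perfect, which is why high coverage matters most for low-yield processes.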

Concept of Signal Integrity
- Signal integrity is the ability of a signal to generate correct responses in a circuit.
- A signal with good integrity stays within safe margins for its voltage amplitude and transition time.

Basic Concept of Integrity Loss
- Integrity loss: any portion of a signal that exceeds the amplitude-safe and time-safe margins.
- Here V_i is one of the acceptable amplitude levels, and the time frame is the interval during which integrity loss occurs.

Sources of Integrity Loss
- Interconnects
- Power supply noise
- Process variations

Integrity Loss Sensors/Monitors (1)
- Current sensor
  - Current sensors are often used to detect the completion of asynchronous circuits.

Integrity Loss Sensors/Monitors (2)
- Power supply noise sensor
  - The sensor's voltage depends on the power/ground bounce: the higher the power supply noise (PSN), the longer the propagation delay and the higher the voltage.

Integrity Loss Sensors/Monitors (3)
- Noise detector (ND) sensor
  - The ND sensor is designed to detect integrity loss due to voltage violations.

Integrity Loss Sensors/Monitors (4)
- Integrity loss sensor (ILS)
  - The integrity loss sensor is a delay violation sensor.

Integrity Loss Sensors/Monitors (5)
- Jitter monitor
  - Jitter is often defined as the time deviation of a signal from its ideal location in time.

Integrity Loss Sensors/Monitors (6)
- A ring oscillator can work as a process variation (PV) sensor.
- A delay variation caused by PV-faults in any of the inverters in the loop shifts the frequency of the oscillator, and this shift can be detected.
- The oscillation frequency is f = 1 / (2 N t_d), where N is an odd number of inverters and t_d is the delay of one inverter.
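
As a rough illustration of how a delay shift shows up as a frequency shift, the sketch below evaluates the standard ring-oscillator relation f = 1 / (2 N t_d). The function names, the 11-stage ring, and the 20 ps stage delay are assumptions chosen for the example, not values from the slides.

```python
def ring_oscillator_frequency(num_inverters: int, inverter_delay_s: float) -> float:
    """f = 1 / (2 * N * t_d) for a ring of N (odd) identical inverters."""
    if num_inverters < 1 or num_inverters % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    if inverter_delay_s <= 0:
        raise ValueError("inverter delay must be positive")
    return 1.0 / (2.0 * num_inverters * inverter_delay_s)


def relative_frequency_shift(measured_hz: float, nominal_hz: float) -> float:
    """Signed fractional deviation of the measured frequency from nominal."""
    return (measured_hz - nominal_hz) / nominal_hz


if __name__ == "__main__":
    nominal = ring_oscillator_frequency(11, 20e-12)   # hypothetical nominal die
    slow = ring_oscillator_frequency(11, 22e-12)      # every stage 10% slower
    shift = relative_frequency_shift(slow, nominal)
    print(f"nominal {nominal / 1e9:.3f} GHz, slow die {slow / 1e9:.3f} GHz, shift {shift:+.1%}")
```

A 10% increase in stage delay lowers the frequency by about 9%, so a counter that samples the oscillator over a fixed window can flag the deviation.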

Readout Architectures (1)
- BIST-based architecture
  - When a noise or delay violation occurs (flag = 1), the contents of all scan cells are scanned out through Sout for further reliability and diagnosis analysis.
Figure panels: BIST architecture; readout circuitry

Readout Architectures (2)
- Scan-based architecture
  - At the driving side of an interconnect, a pattern generation boundary scan cell (PGBSC) generates test patterns.
  - At the receiving side of the interconnect, an observation boundary scan cell (OBSC) detects integrity loss.

Readout Architectures (3)
- Basic concept of the PV-test architecture
  - On-chip ring oscillators (ROs) with counters, embedded in a test chip, detect process variation by measuring the ROs' frequency shifts.

Manufacturing Defects, Process Variations, and Reliability
- 100% single stuck-at fault coverage cannot guarantee perfect product quality, because there are remaining defects that are:
  - Timing-dependent
  - Sequence-dependent
  - Attributed to timing-dependent, non-single-stuck-at faults

Structural Tests
- A defect-based test architecture

Defect-Based Tests
- Small delay defect tests
- Bridge defect tests
- N-detect tests
- Tests
- VLV (very-low-voltage) tests

Reliability Stress
- Concept of infant mortality
- Methods to screen infant mortality
  - Method I: Burn-in. ttf = C exp(E_a / (kT)), where ttf is the time to failure, C is a constant, E_a is the activation energy (eV), k is Boltzmann's constant, and T is the absolute temperature.
  - Method II: Elevated voltage stress
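
Burn-in works because the Arrhenius relation ttf = C exp(E_a / (kT)) makes the time to failure drop steeply as temperature rises. The sketch below computes the resulting acceleration factor between a use temperature and a burn-in temperature. The 0.7 eV activation energy and both temperatures are placeholder assumptions, and the function names are illustrative.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV/K


def time_to_failure(c: float, activation_energy_ev: float, temperature_k: float) -> float:
    """Arrhenius model: ttf = C * exp(Ea / (k * T)), with T in kelvin."""
    if temperature_k <= 0:
        raise ValueError("absolute temperature must be positive")
    return c * math.exp(activation_energy_ev / (BOLTZMANN_EV_PER_K * temperature_k))


def acceleration_factor(activation_energy_ev: float, use_temp_k: float, stress_temp_k: float) -> float:
    """Ratio ttf(use) / ttf(stress); the constant C cancels out."""
    return (time_to_failure(1.0, activation_energy_ev, use_temp_k)
            / time_to_failure(1.0, activation_energy_ev, stress_temp_k))


if __name__ == "__main__":
    # Assumed values: Ea = 0.7 eV, 55 C use temperature, 125 C burn-in temperature.
    af = acceleration_factor(0.7, 55.0 + 273.15, 125.0 + 273.15)
    print(f"one hour of burn-in ages the part like about {af:.0f} hours of normal use")
```

Because C cancels in the ratio, only the activation energy and the two temperatures are needed to compare stress time with field time.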

Redundancy and Memory Repair
- Redundancy: spare rows, columns, or blocks
- Repair schemes:
  - Pellston technology [Wuu 2005]: if repeated errors are detected, disable the cache line (set its "not to use" bit)
  - Perform memory BIST at the new operating conditions; exclude failing cells and resize the cache (the cache can grow or shrink, depending on whether the new conditions are more favourable or worse)

Process Sensors and Adaptive Design
- In contrast to traditional test structures placed on the scribe lines, additional process sensors are embedded on-chip.
- On-chip process sensors:
  - Process variation sensor
  - Thermal sensor
  - Dynamic voltage scaling

Process Variation Sensor
- Ring oscillators: many factors can affect the frequency of a ring oscillator, such as process variation, temperature, and voltage.
- Analog process variation sensor: the analog circuit is sensitive to different process parameters.
- Neither can report the process variation at a specific spot on the die, and neither is likely to allow the data to be extracted and analyzed in real time.

Thermal Sensor
- On-chip thermal sensors are the last line of defence against a system crash or permanent damage to the chip.
- Figure 8.14: Thermal sensor example

Dynamic Voltage Scaling
- DVS
- Figure 8.15: Dynamic voltage scaling scheme

Dynamic Voltage Scaling (cont'd)
- Use sleep transistors and dynamic biasing to save power
- Use the adaptive test method for smart binning

Soft Errors
- Introduction
- Sources of Soft Errors and SER Trends
- Coping with Soft Errors

Introduction
- Soft errors
  - Soft errors are transient single-event upsets (SEUs) caused by various types of radiation.
  - Cosmic radiation is the major source of soft errors, especially in memories.
  - Terrestrial radiation is another source of soft errors.

Sources of Soft Errors and SER Trends
- If a glitch is induced at the junction (marked in red in the figure) of a memory element, the stored state can be reversed.

Sources of Soft Errors and SER Trends
- Logic circuits are less susceptible to these glitches than memories, for the following reasons:
  - The glitch must be strong enough to propagate from the location of the strike.
  - The glitch needs a functionally sensitized path in order to be latched.
  - The glitch must arrive at a latch during its latching window.

Coping with Soft Errors
- Because chips are susceptible to soft errors, many soft error protection schemes targeting chip designs have been proposed:
  - Fault tolerance
  - Error-resilient microarchitectures
  - Soft error mitigation

Fault Tolerance
- Removing the source of soft errors to improve the reliability of a chip.
- Three fundamental fault tolerance schemes:
  - Hardware (spatial) redundancy: assumes that defects and radiation particles will hit only one specific device and not another
  - Time (temporal) redundancy: assumes that a radiation strike will not hit the same circuitry again at a slightly later time
  - Information redundancy: uses an error-detecting code or error-correcting code to represent information contents
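
Information redundancy can be illustrated with a small single-error-correcting code. The slides do not name a specific code; Hamming(7,4) is used below only because it is a familiar textbook example, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def hamming74_encode(data: Sequence[int]) -> List[int]:
    """Encode 4 data bits into 7 bits; parity bits sit at positions 1, 2, and 4."""
    if len(data) != 4 or any(bit not in (0, 1) for bit in data):
        raise ValueError("expected exactly four bits")
    c = [0] * 8                       # index 0 unused so indices match positions
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]         # covers positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]         # covers positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]         # covers positions 4, 5, 6, 7
    return c[1:]


def hamming74_decode(word: Sequence[int]) -> Tuple[List[int], int]:
    """Return (data bits, syndrome); a nonzero syndrome is the corrected position."""
    if len(word) != 7:
        raise ValueError("expected exactly seven bits")
    c = [0] + list(word)
    syndrome = ((c[1] ^ c[3] ^ c[5] ^ c[7])
                + 2 * (c[2] ^ c[3] ^ c[6] ^ c[7])
                + 4 * (c[4] ^ c[5] ^ c[6] ^ c[7]))
    if syndrome:
        c[syndrome] ^= 1              # flip the single bit the syndrome points to
    return [c[3], c[5], c[6], c[7]], syndrome


if __name__ == "__main__":
    stored = hamming74_encode([1, 0, 1, 1])
    stored[4] ^= 1                    # model a soft error at position 5
    print(hamming74_decode(stored))
```

The code corrects any single flipped bit in a word; two simultaneous flips would be miscorrected, which is why memories often add an extra parity bit for double-error detection.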

Fault Tolerance
- Common fault tolerance schemes used in high-reliability systems:
  - Duplicate and compare: used in mainframes and high-end servers
  - Triple modular redundancy: used for systems that cannot fail
  - Redundant multithreading: runs redundant copies of a thread and compares their results
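
The difference between detecting and masking an error can be shown with two small functions: duplicate-and-compare only detects a mismatch, while triple modular redundancy masks a single faulty copy with a bitwise majority vote. This is a minimal software sketch with illustrative names, not a description of any particular hardware implementation.

```python
from typing import Tuple


def duplicate_and_compare(copy_a: int, copy_b: int) -> Tuple[int, bool]:
    """Return (value, error_detected); a mismatch is detected but not corrected."""
    return copy_a, copy_a != copy_b


def tmr_vote(copy_a: int, copy_b: int, copy_c: int) -> int:
    """Bitwise majority of three copies; masks an error confined to one copy."""
    return (copy_a & copy_b) | (copy_a & copy_c) | (copy_b & copy_c)


if __name__ == "__main__":
    good = 0b1010
    upset = good ^ 0b1100             # one copy with two flipped bits
    print(duplicate_and_compare(good, upset))
    print(bin(tmr_vote(good, upset, good)))
```

The voter itself is a single point of failure in this sketch; real designs often triplicate or harden the voter as well.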

Error-Resilient Microarchitectures
- Two representative error-resilient processor microarchitectures:
  - DIVA
  - Razor
- DIVA: Dynamic Implementation Verification Architecture
  - DIVA checker: a smaller and simpler shadow processor that contains a functional checker stage (CHK), a commit stage (CT), and a watchdog timer (WT)
  - DIVA core: the main processor, which fetches, decodes, and executes instructions, holding their speculative results in the reorder buffer (ROB)

Error-Resilient Microarchitectures
- Razor
  - Dynamic voltage scaling (DVS) is one of the most effective and widely used methods for power-aware computing.
  - The key idea of Razor is to tune the supply voltage by monitoring the error rate during circuit operation. This is accomplished with a shadow unit that has been pushed all the way down into a Razor flip-flop.
  - The Razor flip-flop is shown in Figure 8.21a.
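
The feedback idea behind Razor can be sketched as a simple control step: compare the main flip-flop against its shadow copy to flag timing errors, then nudge the supply voltage up or down according to the observed error rate. The target error rate, step size, and voltage limits below are hypothetical parameters chosen for illustration, and the control law is a simplification rather than the published Razor controller.

```python
def razor_error(main_ff_value: int, shadow_value: int) -> bool:
    """A timing error is flagged when the main flip-flop and its shadow disagree."""
    return main_ff_value != shadow_value


def next_supply_voltage(vdd: float, error_rate: float,
                        target_error_rate: float = 0.001, step: float = 0.01,
                        vmin: float = 0.8, vmax: float = 1.2) -> float:
    """Raise Vdd when errors exceed the target, otherwise lower it to save power."""
    if error_rate > target_error_rate:
        vdd += step
    else:
        vdd -= step
    return min(vmax, max(vmin, vdd))  # clamp to the assumed safe operating range


if __name__ == "__main__":
    samples = [(1, 1), (0, 1), (1, 1), (0, 0)]   # (main, shadow) pairs, made up
    rate = sum(razor_error(m, s) for m, s in samples) / len(samples)
    print(f"error rate {rate:.2f}, next Vdd {next_supply_voltage(1.00, rate):.2f} V")
```

Operating slightly below the error-free voltage is acceptable in this scheme because each flagged error is recovered from the shadow value.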

Error-Resilient Microarchitectures
- Razor
  - A reduced-overhead Razor flip-flop with the metastability detection circuit is illustrated in Figure 8.21b.

Soft Error Mitigation
- Soft error mitigation techniques give a design partial immunity to potential soft errors at a significantly lower cost than fault tolerance schemes.
- There are three soft error mitigation methods:
  - (1) Built-In Soft-Error Resilience (BISER): BISER, proposed in [Mitra 2005], can be used to allow the scan design to protect a device from soft errors during normal operation.

Soft Error Mitigation
- Figure 8.22 shows the BISER scan cell design, which reduces the impact of soft errors affecting storage elements by more than 20 times.

Soft Error Mitigation
- Circuit-level approaches
  - (2) Gate resizing for soft error mitigation [Zhou 2006] is based on physical-level design modifications.
  - Figure 8.23 illustrates the effect of gate resizing on the amplitude and width of a 0-to-1 transient at the output of a gate.

Soft Error Mitigation
- Circuit-level approaches
  - (3) Netlist transformation for soft error mitigation [Almukhaizim 2006] is based on logic-level design modifications.

Defect and Error Tolerance
- Defect tolerance
  - Insert redundant circuitry in a circuit under test
  - The circuit can continue correct operation in the presence of defects
- Error tolerance
  - Allow the circuit to continue acceptable operation in the presence of errors

Random Spot Defects
- Assume a design consists of N submodules.
- Each submodule has n unique positions where a defect would cause it to fail its tests.
- D defects are uniformly distributed over the submodules.
- The number of defects in any submodule is independent of the number of defects in other submodules.

Defect Probability
- The probability that an arbitrary position on a submodule is associated with a defect is p = D / (nN).
- The probability of having d defects in a given submodule is P(d) = C(n,d) p^d (1-p)^(n-d), where C(n,d) = n! / (d!(n-d)!).

Poisson Distribution
- Because P(d) is binomially distributed, the average number of defects in an arbitrary submodule is E(d) = λ = np = D / N.
- For large n and small p, the binomial distribution can be approximated by the Poisson distribution.
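
The quality of the approximation is easy to check numerically. The sketch below compares the binomial P(d) = C(n,d) p^d (1-p)^(n-d) with the Poisson form P(d) ≈ e^(-λ) λ^d / d!, using λ = np. The values of D, N, and n are made-up assumptions for the example.

```python
import math


def binomial_pmf(d: int, n: int, p: float) -> float:
    """Exact probability of d defects among n positions, each defective with probability p."""
    return math.comb(n, d) * p ** d * (1.0 - p) ** (n - d)


def poisson_pmf(d: int, lam: float) -> float:
    """Poisson approximation with mean lam = n * p."""
    return math.exp(-lam) * lam ** d / math.factorial(d)


if __name__ == "__main__":
    D, N, n = 50, 100, 1000           # hypothetical defect count, submodules, positions
    p = D / (n * N)
    lam = n * p                        # equals D / N
    for d in range(4):
        print(f"d = {d}: binomial {binomial_pmf(d, n, p):.5f}, Poisson {poisson_pmf(d, lam):.5f}")
```

With n = 1000 and p = 0.0005 the two distributions agree to about four decimal places, which is why the Poisson form is normally used for yield calculations.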

Example
- Assume a submodule is equally likely to be defect-free or defective, so that P(0) = e^(-λ) = 0.5.
- Thus, λ = ln 2 ≈ 0.693.
- Effective yield can increase significantly if the system can accept some defective submodules.

Table: Probability of having exactly d defects in a submodule as a function of yield (Y) for various values of failure rate λ (nine λ and Y columns; numeric entries not preserved)
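
A table of this kind can be regenerated from the Poisson model, assuming that the yield of a submodule is the probability of zero defects, Y = P(0) = e^(-λ), so that λ = -ln Y. The yield values below are placeholders and are not the ones used on the original slide.

```python
import math
from typing import List, Sequence


def failure_rate_from_yield(submodule_yield: float) -> float:
    """Invert Y = exp(-lambda) to get lambda = -ln(Y)."""
    if not 0.0 < submodule_yield <= 1.0:
        raise ValueError("yield must be in (0, 1]")
    return -math.log(submodule_yield)


def prob_exact_defects(d: int, lam: float) -> float:
    """Poisson probability of exactly d defects in a submodule."""
    return math.exp(-lam) * lam ** d / math.factorial(d)


def defect_table(yields: Sequence[float], max_defects: int) -> List[List[float]]:
    """Row d holds P(d) for every yield in the given order."""
    lams = [failure_rate_from_yield(y) for y in yields]
    return [[prob_exact_defects(d, lam) for lam in lams] for d in range(max_defects + 1)]


if __name__ == "__main__":
    yields = (0.9, 0.7, 0.5, 0.3, 0.1)          # placeholder yields
    print("d  " + "  ".join(f"Y={y:.1f}" for y in yields))
    for d, row in enumerate(defect_table(yields, 4)):
        print(f"{d}  " + "  ".join(f"{value:.3f}" for value in row))
```

The d = 0 row reproduces the yields themselves, and at low yield a noticeable share of the defective submodules carries only one or two defects.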

Defect Tolerance
- Used to be called redundancy repair
- A typical defect-tolerant design (see figure) has:
  - Two spares (identical modules)
  - A switch used to select one module
Figure: three identical modules (M) connected to a switch
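
The benefit of spares can be estimated with a simple probability argument: the group works if at least one of its identical modules is defect-free. The sketch below assumes independent module defects and a defect-free switch, neither of which is stated on the slide; the function name is illustrative.

```python
def group_yield(module_yield: float, spares: int) -> float:
    """Probability that at least one of (1 + spares) identical modules is good.

    Assumes independent defects per module and a perfect selection switch.
    """
    if not 0.0 <= module_yield <= 1.0 or spares < 0:
        raise ValueError("module_yield must be in [0, 1] and spares non-negative")
    return 1.0 - (1.0 - module_yield) ** (1 + spares)


if __name__ == "__main__":
    for spares in range(3):
        print(f"{spares} spare(s): group yield {group_yield(0.5, spares):.3f}")
```

For a module yield of 0.5, two spares raise the group yield to 0.875, at the cost of tripling the module area and adding the switch.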

Error Tolerance
- The main objective of error tolerance is to increase the effective yield of a process by identifying defective but acceptable chips.
- This relies on the development of:
  - An accurate method to estimate error rate
  - An effective method to predict yield

Fault-Oriented Test Methodology
- Enhance effective yield based on error-rate analysis
- Estimate the error rate of each modeled fault
- Identify a set of acceptable faults based on their error rates
Figure labels: IC Fabrication, Fault Ranking, Testing, Acceptable Chips, Unacceptable Chips

Error-Oriented Test Methodology
- Focus on errors produced by defective chips rather than on modeled faults
- Estimate the error rates of these chips
- Determine the acceptability of the faulty chips from the estimated results
Figure labels: IC Fabrication, Testing, Good Chips, Bad Chips, Error-Rate Estimation, Estimated Error Rate, Classification Based on Estimated Error Rate, Acceptable Chip Set 1, Acceptable Chip Set 2, ..., Unacceptable Chips
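
The classification step can be sketched as follows. Error rate is taken here to be the fraction of applied test vectors whose response differs from the fault-free response, which is one common definition and an assumption in this example. The two thresholds and the bin names are hypothetical.

```python
from typing import Sequence


def estimate_error_rate(observed: Sequence[int], expected: Sequence[int]) -> float:
    """Fraction of responses that differ from the fault-free responses."""
    if len(observed) != len(expected) or not expected:
        raise ValueError("need two non-empty response lists of equal length")
    mismatches = sum(1 for o, e in zip(observed, expected) if o != e)
    return mismatches / len(expected)


def classify_chip(error_rate: float, thresholds: Sequence[float] = (0.001, 0.01)) -> str:
    """Map an estimated error rate to a bin; thresholds must be ascending."""
    if error_rate == 0.0:
        return "good"
    for index, limit in enumerate(thresholds, start=1):
        if error_rate <= limit:
            return f"acceptable set {index}"
    return "unacceptable"


if __name__ == "__main__":
    expected = [1, 0, 1, 1, 0, 0, 1, 0]
    observed = [1, 0, 1, 0, 0, 0, 1, 0]          # one erroneous response, made up
    rate = estimate_error_rate(observed, expected)
    print(rate, classify_chip(rate, thresholds=(0.05, 0.2)))
```

In practice the thresholds would come from the application, since the error rate that counts as acceptable differs between, for example, a video decoder and a control processor.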

Concluding Remarks
- Circuit errors can be caused by manufacturing defects and soft errors.
- Design for Manufacturability (DFM): fault avoidance schemes to cope with physical failures caused by signal integrity problems, defects, and process variations during manufacturing.
- Design for Reliability (DFR): error resilience and defect tolerance circuitry embedded on-chip to tolerate soft errors and manufacturing defects.