1
EE141 System-on-Chip Test Architectures Ch. 8 – Physical Failures - P. 1 1 Chapter 8 Coping with Physical Failures, Soft Errors, and Reliability Issues
2
What is this chapter about?
Gives an overview of the causes of manufacturing defects and soft errors, and of promising solutions to them.
Focus on:
  Signal integrity
  Defect-based tests
  Process sensors and adaptive design
  Soft errors
    – BISER
    – Circuit-level approaches
  Defect and error tolerance
3
Coping with Physical Failures, Soft Errors, and Reliability Issues
  Introduction
  Signal Integrity
  Manufacturing Defects, Process Variations, and Reliability
  Soft Errors
  Defect and Error Tolerance
  Concluding Remarks
4
Introduction
Defects
  Random defects
    – Caused by manufacturing imperfections; they occur in random places
  Systematic defects
    – Caused by process or manufacturing variations
Defect level (DL) is a function of process yield (Y) and fault coverage (FC).
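A common way to express this dependence is the Williams-Brown model, $DL = 1 - Y^{(1 - FC)}$. Assuming that is the relationship intended here, the sketch below evaluates it for a few fault coverages. The function name and the sample yield of 0.5 are illustrative choices, not values from the slides.

```python
def defect_level(process_yield: float, fault_coverage: float) -> float:
    """Williams-Brown estimate of defect level: DL = 1 - Y ** (1 - FC)."""
    if not 0.0 < process_yield <= 1.0:
        raise ValueError("process_yield must be in (0, 1]")
    if not 0.0 <= fault_coverage <= 1.0:
        raise ValueError("fault_coverage must be in [0, 1]")
    return 1.0 - process_yield ** (1.0 - fault_coverage)


# Illustrative sweep: a 50% yield process at three fault coverages.
for fc in (0.90, 0.99, 1.00):
    dl = defect_level(0.5, fc)
    print(f"Y=0.50 FC={fc:.2f} -> DL={dl:.5f} ({dl * 1e6:.0f} defective parts per million)")
```

Under this model the defect level drops to zero only at 100% fault coverage, and higher yield lowers DL for any fixed coverage.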
5
Concept of Signal Integrity
Signal integrity is the ability of a signal to generate correct responses in a circuit.
A signal with good integrity stays within safe margins for its voltage amplitude and transition time.
6
Basic Concept of Integrity Loss
Integrity loss: any portion of a signal that exceeds the amplitude-safe and time-safe margins, where Vi is one of the acceptable amplitude levels, taken over a time frame during which integrity loss occurs.
7
Sources of Integrity Loss
  Interconnects
  Power supply noise
  Process variations
8
Integrity Loss Sensors/Monitors (1)
Current Sensor
Current sensors are often used to detect the completion of asynchronous circuits.
9
Integrity Loss Sensors/Monitors (2)
Power Supply Noise Sensor
The voltage depends on the power/ground bounce: the higher the power supply noise (PSN), the longer the propagation delay and the higher the voltage.
10
Integrity Loss Sensors/Monitors (3)
Noise Detector (ND) Sensor
The ND sensor is designed to detect integrity loss due to voltage violations.
11
Integrity Loss Sensors/Monitors (4)
Integrity Loss Sensor (ILS)
The integrity loss sensor is a delay violation sensor.
12
Integrity Loss Sensors/Monitors (5)
Jitter Monitor
Jitter is often defined as the time deviation of a signal from its ideal location in time.
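Taking that definition literally, jitter can be computed as the difference between each measured edge time and the corresponding edge of an ideal clock. The sketch below does this for a short list of made-up edge timestamps; the function names, the 1 ns period, and the sample data are all assumptions for illustration and do not describe the monitor circuit itself.

```python
def edge_jitter(measured_edges, period, first_edge=0.0):
    """Deviation of each measured edge from the ideal grid first_edge + k * period."""
    return [t - (first_edge + k * period) for k, t in enumerate(measured_edges)]


def peak_to_peak(values):
    """Peak-to-peak spread of a list of deviations."""
    return max(values) - min(values)


# Hypothetical rising-edge timestamps in nanoseconds for a nominal 1 ns clock.
edges_ns = [0.00, 1.02, 1.99, 3.03, 4.00]
jitter_ns = edge_jitter(edges_ns, period=1.0)
print("per-edge jitter (ns):", [round(j, 3) for j in jitter_ns])
print("peak-to-peak jitter (ns):", round(peak_to_peak(jitter_ns), 3))
```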
13
Integrity Loss Sensors/Monitors (6)
A ring oscillator can work as a process variation (PV) sensor.
The variation in delay caused by PV faults in any of the inverters in the loop shifts the frequency of the oscillator, and this shift can be detected.
The oscillation frequency is $f = 1/(2N\tau)$, where $N$ is an odd number of inverters and $\tau$ is the delay of one inverter.
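For a ring of N identical inverters with delay τ each, the standard expression for the oscillation frequency is f = 1/(2Nτ); with unequal stages the period becomes twice the sum of the stage delays. The sketch below uses that more general form to show how slowing a single inverter shifts the frequency. The 5-stage ring and the picosecond delays are invented numbers.

```python
def ro_frequency(stage_delays_s):
    """Ring-oscillator frequency: one period is two trips around the loop."""
    n = len(stage_delays_s)
    if n < 3 or n % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters (>= 3)")
    return 1.0 / (2.0 * sum(stage_delays_s))


# Hypothetical 5-stage ring, 20 ps per inverter (assumed values).
nominal = [20e-12] * 5
# Same ring with one inverter slowed to 26 ps by a process-variation fault.
slowed = list(nominal)
slowed[2] = 26e-12

f_nom = ro_frequency(nominal)
f_slow = ro_frequency(slowed)
relative_shift = (f_slow - f_nom) / f_nom
print(f"nominal {f_nom / 1e9:.3f} GHz, slowed {f_slow / 1e9:.3f} GHz, shift {relative_shift:+.2%}")
```

Because the period depends on the sum of all stage delays, a fault in any single inverter is visible at the output, which is what makes the structure usable as a sensor.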
14
Readout Architectures (1)
BIST-Based Architecture
When a noise or delay violation occurs (flag=1), the contents of all scan cells are scanned out through Sout for further reliability and diagnosis analysis.
Figure panels: BIST Architecture; Readout Circuitry
15
Readout Architectures (2)
Scan-Based Architecture
At the driving side of an interconnect, a pattern-generation boundary-scan cell (PGBSC) is used to generate test patterns.
At the receiving side of the interconnect, an observation boundary-scan cell (OBSC) is used to detect integrity loss.
16
Readout Architectures (3)
Basic Concept of PV-Test Architecture
On-chip ring oscillators (ROs) with counters, embedded in a test chip, are used to detect process variation by measuring the ROs' frequency shifts.
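One simple way to turn a counter reading into a frequency shift is to count RO cycles over a fixed reference window and compare the implied frequency with the nominal one. The sketch below assumes exactly that scheme; the window length, nominal frequency, and 5% tolerance are hypothetical parameters, not values from the chapter.

```python
def measured_frequency(cycle_count: int, window_s: float) -> float:
    """Frequency implied by counting RO cycles during a fixed reference window."""
    return cycle_count / window_s


def pv_flag(cycle_count: int, window_s: float, nominal_hz: float, tolerance: float = 0.05):
    """Return (flagged, relative_shift); flagged is True when |shift| exceeds tolerance."""
    shift = (measured_frequency(cycle_count, window_s) - nominal_hz) / nominal_hz
    return abs(shift) > tolerance, shift


# Hypothetical readings: a 1 microsecond window and a 5 GHz nominal RO.
for count in (4900, 4700):
    flagged, shift = pv_flag(count, window_s=1e-6, nominal_hz=5e9)
    print(f"count={count} shift={shift:+.1%} flagged={flagged}")
```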
17
Manufacturing Defects, Process Variations, and Reliability
100% single stuck-at fault coverage cannot guarantee perfect product quality, because the remaining defects are:
  Timing-dependent
  Sequence-dependent
  Attributed to timing-dependent, non-single-stuck-at faults
18
Structural Tests
Figure: A Defect-Based Test Architecture
19
Defect-Based Tests
  Small Delay Defect Tests
  Bridge Defect Tests
  N-Detect Tests
  Tests
  VLV (very-low-voltage) Tests
20
Reliability Stress
Concept of infant mortality
Methods to screen infant mortality:
  Method I: Burn-in
    $t_{tf} = C \cdot e^{E_a / (kT)}$
    where $t_{tf}$ is the time to failure, $C$ is a constant, $E_a$ is the activation energy (eV), $k$ is Boltzmann's constant, and $T$ is the absolute temperature.
  Method II: Elevated voltage stress
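With the Arrhenius form ttf = C · exp(Ea / (kT)), the constant C cancels when two temperatures are compared, which gives the burn-in acceleration factor AF = exp((Ea / k) · (1/T_use − 1/T_stress)). The sketch below evaluates both expressions. The 0.7 eV activation energy and the 55 °C / 125 °C temperatures are assumed example values, not figures from the chapter.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV/K


def time_to_failure(c: float, ea_ev: float, temp_k: float) -> float:
    """Arrhenius model: ttf = C * exp(Ea / (k * T))."""
    return c * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


def acceleration_factor(ea_ev: float, t_use_k: float, t_stress_k: float) -> float:
    """Ratio ttf(T_use) / ttf(T_stress); the constant C cancels out."""
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_stress_k))


# Assumed example: Ea = 0.7 eV, use at 55 C, burn-in at 125 C.
t_use = 55.0 + 273.15
t_stress = 125.0 + 273.15
af = acceleration_factor(0.7, t_use, t_stress)
print(f"one hour of burn-in is equivalent to about {af:.0f} hours at use conditions")
```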
21
Redundancy and Memory Repair
Redundancy: spare rows, columns, or blocks
Repair schemes:
  Pellston Technology [Wuu 2005]: if repeated errors are detected, disable the cache line (set its "not to use" bit)
  Perform memory BIST at the new operating conditions, exclude failing cells, and resize the cache (the cache can grow or shrink, depending on whether the new conditions are more or less favourable)
22
Process Sensors and Adaptive Design
Alongside traditional test structures placed on the scribe lines, additional process sensors are embedded on-chip.
On-chip process sensors:
  Process variation sensor
  Thermal sensor
  Dynamic voltage scaling
23
Process Variation Sensor
Ring oscillators: many factors can affect the frequency of a ring oscillator, such as process variation, temperature, and voltage.
Analog process variation sensor: the analog circuit is sensitive to different process parameters.
Neither can report the process variation at a specific spot on the die, and neither is likely to allow the data to be extracted and analyzed in real time.
24
Thermal Sensor
On-chip thermal sensors are the last defence to prevent a system crash or permanent damage to the chip.
Figure 8.14: Thermal sensor example
25
Dynamic Voltage Scaling (DVS)
Figure 8.15: Dynamic voltage scaling scheme
26
Dynamic Voltage Scaling (cont'd)
  Use sleep transistors and dynamic biasing to save power
  Use the adaptive test method for smart binning
27
Soft Errors
  Introduction
  Sources of Soft Errors and SER Trends
  Coping with Soft Errors
28
Introduction
Soft errors
  Soft errors are transient single-event upsets (SEUs) caused by various types of radiation.
  Cosmic radiation is the major source of soft errors, especially in memories.
  Terrestrial radiation is another source of soft errors.
29
Sources of Soft Errors and SER Trends
If a glitch is induced at the junction (red label) in a memory element, its state can be reversed.
30
Sources of Soft Errors and SER Trends
Logic circuits are less susceptible to these glitches than memories for the following reasons:
  The glitch must be of sufficient strength to propagate from the location of the strike.
  The glitch needs a functionally sensitized path to be latched.
  The glitch must arrive at a latch during its latching window.
31
Coping with Soft Errors
As chips are susceptible to soft errors, many soft error protection schemes targeting chip designs have been proposed:
  Fault tolerance
  Error-resilient microarchitectures
  Soft error mitigation
32
Fault Tolerance
Removing the source of soft errors to improve the reliability of a chip.
Three fundamental fault tolerance schemes:
  Hardware (spatial) redundancy
    – assumes that defects and radiation particles will hit only one specific device and not another
  Time (temporal) redundancy
    – assumes that a radiation strike will not hit the same circuitry again at a slightly later time
  Information redundancy
    – uses error-detecting or error-correcting codes to represent the information content
33
Fault Tolerance
Common fault tolerance schemes used in high-reliability systems:
  Duplicate and compare
    – used in mainframes and high-end servers
  Triple modular redundancy
    – used for systems that cannot fail
  Redundant multithreading
    – runs redundant copies of a thread and compares their results
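The first two schemes can be illustrated at the bit level: duplicate-and-compare can only detect a mismatch between two copies, while triple modular redundancy can mask a single faulty copy with a 2-of-3 majority vote. The sketch below models each module output as an integer word; the function names and the 8-bit test word are illustrative.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two of the inputs."""
    return (a & b) | (a & c) | (b & c)


def duplicate_and_compare(a: int, b: int) -> bool:
    """Return True when the two copies disagree (error detected, not corrected)."""
    return a != b


golden = 0b10110010          # fault-free module output
upset = golden ^ (1 << 3)    # same word with bit 3 flipped by a single upset

voted = majority_vote(golden, upset, golden)
print(f"TMR output matches golden value: {voted == golden}")
print(f"duplicate-and-compare flags the upset: {duplicate_and_compare(golden, upset)}")
```

The voter masks the upset only under the spatial-redundancy assumption that a strike affects one copy; two copies corrupted in the same bit position would outvote the good one.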
34
Error-Resilient Microarchitectures
Two representative error-resilient processor microarchitectures: DIVA and Razor
Dynamic Implementation Verification Architecture (DIVA)
  DIVA checker
    – a smaller and simpler shadow processor
    – contains a functional checker stage (CHK), a commit stage (CT), and a watchdog timer (WT)
  DIVA core
    – the main processor that fetches, decodes, and executes instructions, holding their speculative results in the reorder buffer (ROB)
35
Error-Resilient Microarchitectures
Razor
Dynamic voltage scaling (DVS) is one of the most effective and widely used methods for power-aware computing.
The key idea of Razor is to tune the supply voltage by monitoring the error rate during circuit operation. This is accomplished with a shadow unit that has been pushed all the way down into a Razor flip-flop.
The Razor flip-flop is shown in Figure 8.21a.
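The feedback idea can be sketched in software: an error is flagged when the main flip-flop and its shadow copy disagree, and a controller nudges the supply voltage up when the observed error rate exceeds a target and down otherwise. Everything concrete below (the target rate, the 10 mV step, the voltage limits, and the function names) is a hypothetical choice for illustration and not the controller used in Razor.

```python
def razor_error(main_sample: int, shadow_sample: int) -> bool:
    """A timing error is flagged when the main and shadow samples differ."""
    return main_sample != shadow_sample


def next_supply_mv(current_mv: int, error_count: int, cycles: int,
                   target_rate: float = 0.001, step_mv: int = 10,
                   vmin_mv: int = 700, vmax_mv: int = 1200) -> int:
    """One controller step: raise the supply if errors exceed the target, else lower it."""
    rate = error_count / cycles
    proposed = current_mv + step_mv if rate > target_rate else current_mv - step_mv
    return max(vmin_mv, min(vmax_mv, proposed))


# Hypothetical monitoring windows of 10,000 cycles each.
supply = 1000
for errors in (0, 0, 50, 5):
    supply = next_supply_mv(supply, errors, cycles=10_000)
    print(f"errors={errors:3d} -> supply {supply} mV")
```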
36
Error-Resilient Microarchitectures
37
Error-Resilient Microarchitectures
Razor
A reduced-overhead Razor flip-flop with the metastability detection circuit is illustrated in Figure 8.21b.
38
Soft Error Mitigation
Soft error mitigation techniques aim to give a design partial immunity to potential soft errors at a significantly lower cost than fault tolerance schemes.
There are three soft error mitigation methods:
(1) Built-In Soft-Error Resilience (BISER)
BISER, proposed in [Mitra 2005], allows the scan design to be reused to protect a device from soft errors during normal operation.
39
Soft Error Mitigation
Figure 8.22 shows the BISER scan cell design, which reduces the impact of soft errors affecting storage elements by more than a factor of 20.
40
Soft Error Mitigation
Circuit-level approaches
(2) Gate resizing for soft error mitigation [Zhou 2006] is based on physical-level design modifications.
Figure 8.23 illustrates the effect of gate resizing on the amplitude and width of a 0-to-1 transient at the output of a gate.
41
Soft Error Mitigation
Circuit-level approaches
(3) Netlist transformation for soft error mitigation [Almukhaizim 2006] is based on logic-level design modifications.
42
Defect and Error Tolerance
Defect tolerance
  Insert redundant circuitry in the circuit under test.
  The circuit can continue correct operation in the presence of defects.
Error tolerance
  Allow the circuit to continue acceptable operation in the presence of errors.
43
Random Spot Defects
Assume a design consists of N submodules.
Each submodule has n unique positions where a defect would cause it to fail its tests.
D defects are uniformly distributed over the submodules.
The number of defects in any submodule is independent of the number of defects in the other submodules.
44
Defect Probability
The probability that an arbitrary position on a submodule is associated with a defect is:
  $p = D / (nN)$
The probability of having d defects in a given submodule is:
  $P(d) = C(n,d)\, p^{d} (1-p)^{n-d}$, where $C(n,d) = n! / (d!\,(n-d)!)$
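The two expressions on this slide, p = D / (nN) and the binomial P(d) = C(n,d) p^d (1−p)^(n−d), translate directly into code. The sketch below evaluates them for an invented design with D = 50 defects, n = 1000 positions per submodule, and N = 100 submodules; those numbers are assumptions chosen only to give a readable result.

```python
from math import comb


def defect_position_probability(total_defects: int, positions_per_module: int,
                                num_modules: int) -> float:
    """p = D / (n * N): chance that an arbitrary position carries a defect."""
    return total_defects / (positions_per_module * num_modules)


def prob_d_defects(d: int, n: int, p: float) -> float:
    """Binomial probability of exactly d defects among n positions."""
    return comb(n, d) * p ** d * (1.0 - p) ** (n - d)


# Invented example values: D = 50, n = 1000, N = 100.
n = 1000
p = defect_position_probability(50, n, 100)
for d in range(4):
    print(f"P({d}) = {prob_d_defects(d, n, p):.4f}")
```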
45
Poisson Distribution
Since P(d) is binomially distributed, the average number of defects in an arbitrary submodule is:
  $E(d) = \lambda = np = D / N$
For large n and small p, the binomial distribution can be approximated by the Poisson distribution:
  $P(d) \approx e^{-\lambda} \lambda^{d} / d!$
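The quality of the approximation is easy to check numerically by comparing the binomial probability with the Poisson form e^(−λ) λ^d / d! at the same mean λ = np. The values n = 10000 and λ = 0.5 below are assumed for the comparison.

```python
from math import comb, exp, factorial


def binomial_pmf(d: int, n: int, p: float) -> float:
    """Exact probability of d defects among n positions."""
    return comb(n, d) * p ** d * (1.0 - p) ** (n - d)


def poisson_pmf(d: int, lam: float) -> float:
    """Poisson approximation with mean lam = n * p."""
    return exp(-lam) * lam ** d / factorial(d)


# Assumed example: n = 10000 positions and an average of 0.5 defects per submodule.
n, lam = 10_000, 0.5
p = lam / n
for d in range(4):
    b, q = binomial_pmf(d, n, p), poisson_pmf(d, lam)
    print(f"d={d}: binomial={b:.6f} poisson={q:.6f} diff={abs(b - q):.1e}")
```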
46
Example
Assume a submodule is equally likely to be defect-free or defective: $P(0) = e^{-\lambda} = 0.5$.
Thus, $\lambda = \ln 2 = 0.693$.
Effective yield can increase significantly if the system can accept some defective submodules.
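To make the last point concrete, suppose (as an assumption for illustration) that a system of N independent submodules is still usable when at most m of them are defective. With per-submodule yield Y = e^(−λ), the effective yield is then a binomial sum over the number of defective submodules. The acceptance rule and the choice of N = 4 are hypothetical.

```python
from math import comb, exp, log


def effective_yield(num_modules: int, max_defective: int, lam: float) -> float:
    """Probability that at most max_defective of num_modules submodules are defective."""
    y = exp(-lam)  # probability that one submodule is defect-free
    return sum(comb(num_modules, k) * (1.0 - y) ** k * y ** (num_modules - k)
               for k in range(max_defective + 1))


# lambda = ln 2 makes each submodule defect-free with probability 0.5.
lam = log(2)
for m in range(3):
    print(f"accept up to {m} defective of 4: effective yield = {effective_yield(4, m, lam):.4f}")
```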
47
Probability of Having Exactly d Defects in a Submodule as a Function of Yield (Y) for Various Values of Failure Rate λ

Y     λ      d=1   d=2   d=3   d=4   d=5   d=6   d=7
0.90  0.105  0.09
0.80  0.223  0.18  0.02
0.70  0.357  0.25  0.04  0.01
0.60  0.511  0.31  0.08  0.01
0.50  0.693  0.35  0.12  0.03
0.40  0.916  0.37  0.17  0.05  0.01
0.30  1.204  0.36  0.22  0.09  0.03  0.01
0.20  1.609  0.32  0.26  0.14  0.06  0.02
0.10  2.303  0.23  0.27  0.20  0.12  0.05  0.02  0.01

(The d = 0 column is omitted because P(0) = Y; blank entries are below 0.005.)
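The table entries are consistent with the Poisson expression P(d) = e^(−λ) λ^d / d! evaluated at λ = −ln Y. The sketch below regenerates a row on that assumption, rounding to two decimals as in the table; the function name is illustrative.

```python
from math import exp, factorial, log


def poisson_row(y: float, max_d: int = 7):
    """P(d) for d = 1..max_d with lambda = -ln(Y), rounded to two decimals."""
    lam = -log(y)
    return [round(exp(-lam) * lam ** d / factorial(d), 2) for d in range(1, max_d + 1)]


for y in (0.9, 0.5, 0.1):
    print(f"Y={y:.2f} lambda={-log(y):.3f} P(1..7)={poisson_row(y)}")
```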
48
Defect Tolerance
Used to be called redundancy repair.
A typical defect-tolerant design (figure: three identical modules M connected to a switch) has:
  Two spares (identical modules)
  A switch used to select one module
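For a design of this shape, a first-order yield estimate is that the block works if at least one of the identical modules is defect-free and the switch itself works. The sketch below encodes that estimate under the assumptions that module failures are independent and that the switch yield is a separate factor (taken as 1.0 by default); neither assumption comes from the slides.

```python
def yield_with_spares(module_yield: float, num_spares: int, switch_yield: float = 1.0) -> float:
    """Yield of one module plus num_spares identical spares behind a selector switch."""
    copies = num_spares + 1
    at_least_one_good = 1.0 - (1.0 - module_yield) ** copies
    return switch_yield * at_least_one_good


# One module and two spares, each with an assumed 50% yield.
for spares in range(3):
    print(f"{spares} spare(s): yield = {yield_with_spares(0.5, spares):.3f}")
```

A non-ideal switch caps the achievable yield, which is why the redundancy circuitry itself has to be kept small and robust.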
49
Error Tolerance
The main objective of error tolerance is to increase the effective yield of a process by identifying defective but acceptable chips.
This relies on the development of:
  An accurate method to estimate error rate
  An effective method to predict yield
50
Fault-Oriented Test Methodology
  Enhance effective yield based on error-rate analysis
  Estimate the error rate of each modeled fault
  Identify a set of acceptable faults based on their error rates
Figure (flow diagram) labels: IC Fabrication; Fault Ranking; Testing; Acceptable Chips; Unacceptable Chips
51
Error-Oriented Test Methodology
  Focus on the errors produced by defective chips rather than on modeled faults
  Estimate the error rates of these chips
  Determine the acceptability of the faulty chips from the estimated results
Figure (flow diagram) labels: IC Fabrication; Testing; Good Chips; Bad Chips; Error-Rate Estimation; Estimated Error Rate; Classification Based on Estimated Error Rate; Acceptable Chip Set 1; Acceptable Chip Set 2; …; Unacceptable Chips
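The classification step can be sketched as a simple binning rule: estimate a chip's error rate as the fraction of outputs that differ from the fault-free response, then assign the chip to the first acceptable set whose threshold it meets. The estimator, the thresholds, and the set labels below are hypothetical and stand in for whatever the application defines as acceptable.

```python
def estimate_error_rate(observed, golden) -> float:
    """Fraction of observed outputs that differ from the fault-free (golden) outputs."""
    if len(observed) != len(golden) or not golden:
        raise ValueError("observed and golden must be non-empty and of equal length")
    mismatches = sum(o != g for o, g in zip(observed, golden))
    return mismatches / len(golden)


def classify_chip(estimated_error_rate: float, thresholds) -> str:
    """Assign a chip to the first acceptable set whose limit it meets (ascending limits)."""
    for index, limit in enumerate(thresholds, start=1):
        if estimated_error_rate <= limit:
            return f"acceptable set {index}"
    return "unacceptable"


# Hypothetical limits: set 1 tolerates 0.1% errors, set 2 tolerates 1%.
limits = [0.001, 0.01]
for rate in (0.0005, 0.005, 0.05):
    print(f"error rate {rate:.4f} -> {classify_chip(rate, limits)}")
```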
52
Concluding Remarks
Circuit errors can be caused by manufacturing defects and soft errors.
Design for Manufacturability (DFM)
  – Fault avoidance schemes to cope with physical failures caused by signal integrity, defects, and process variations during manufacturing.
Design for Reliability (DFR)
  – Embedded error resilience and defect tolerance circuitry on-chip to tolerate soft errors and manufacturing defects.