Statistical Diagnosis for Intermittent Scan Chain Hold-Time Fault
Yu Huang, Wu-Tung Cheng, S. M. Reddy, Cheng-Ju Hsieh, Yu-Ting Hung, ITC 2003
Laboratory for Reliable Computing (LaRC), Electrical Engineering Department, National Tsing Hua University
2 References
[1] Ruifeng Guo, Srikanth Venkataraman, "A Technique for Fault Diagnosis of Defects in Scan Chains," ITC 2001.
[2] Yu Huang, Wu-Tung Cheng, Cheng-Ju Hsieh, "Efficient Diagnosis for Multiple Intermittent Scan Chain Hold-Time Faults," ATS 2003.
3 Outline
- Introduction
- Fault model
- Hold-time faults
- Upper bound and lower bound calculation
- Statistical diagnosis algorithm
- Experimental results
- Conclusion
4 Introduction
- Signal integrity (SI) and design integrity (DI) issues:
  - SI issues: crosstalk, IR drop, power and ground bounce
  - DI issues: electromigration, hot electrons, wire self-heating
- Intermittent faults are caused by unpredictable disturbances:
  - Internal signal changes
  - External noise
- They are observed stochastically and are difficult to model
5 Introduction
- Scan designs are susceptible to hold-time violations
  - Wire delay is difficult to calculate accurately
  - Inserted delay elements for fixing hold time?
  - A small hold-time margin may lead to a hold-time error
- Statistical diagnosis
  - A permanent fault is a special type of intermittent fault
6 Fault Model of Transition Faults
- Slow-to-rise
- Slow-to-fall
- Fast-to-rise
- Fast-to-fall
(Figure: waveform illustration for each fault type)
7 Timing Diagram For a Single Flip-Flop
8 Timing Diagram For a Scan Chain (figure; delays labeled t_d)
9 Hold-Time Fault Types
- Type-I: the faulty cell captures incorrect data iff there is a “0 → 1” transition at its input
- Type-II: the faulty cell captures incorrect data iff there is a “1 → 0” transition at its input
- Type-III: the faulty cell captures incorrect data whenever there is any transition at its input
10 Fault Triggered With a Probability
- Ex. Type-I hold-time fault (figure)
- The fault is triggered only with a probability Prob
- The diagnosis may not point to the exact faulty scan cell
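The three fault types and the trigger probability can be sketched as a small shift simulation. This is an illustrative model, not the authors' simulator: it assumes a hold-time violation makes the faulty cell capture the new value at its input instead of the old one, and it numbers list positions from the scan-in side purely for convenience. The names `shift_once` and `load_chain` are hypothetical.

```python
import random

def shift_once(chain, scan_in_bit, faulty_pos, fault_type, prob, rng):
    """One shift clock. chain[0] is the cell nearest scan-in (illustrative convention)."""
    new_chain = [scan_in_bit] + chain[:-1]        # fault-free shift
    if faulty_pos > 0:
        old_in = chain[faulty_pos - 1]            # value the faulty cell should capture
        new_in = new_chain[faulty_pos - 1]        # value its input changes to on this clock
        rising = old_in == 0 and new_in == 1
        falling = old_in == 1 and new_in == 0
        sensitive = {"I": rising, "II": falling, "III": rising or falling}[fault_type]
        # Assumed fault effect: the new data races through with probability prob.
        if sensitive and rng.random() < prob:
            new_chain[faulty_pos] = new_in
    return new_chain

def load_chain(pattern, length, faulty_pos, fault_type, prob, seed=0):
    """Shift a pattern into an all-zero chain, one bit per clock."""
    rng = random.Random(seed)
    chain = [0] * length
    for bit in pattern:
        chain = shift_once(chain, bit, faulty_pos, fault_type, prob, rng)
    return chain

good = load_chain([1, 0, 1, 1], 4, faulty_pos=2, fault_type="III", prob=0.0)
bad = load_chain([1, 0, 1, 1], 4, faulty_pos=2, fault_type="III", prob=1.0)
print(good, bad)
```

With `prob` strictly between 0 and 1, repeated loads of the same pattern can give different chain contents, which is why a single failing pattern may not pinpoint the faulty cell.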
11 Assumptions of the Statistical Diagnosis Algorithm
- Flush patterns are used to identify the faulty chains and fault types
- Other types of scan chain faults can be diagnosed by this method with simple modifications
- Hold-time faults can only occur during scan chain loading/unloading; capture is fault-free
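12 Upper Bound Calculation
(Figure: example chain contents across the stages Shift in, Capture, Shift out; extracted bit strings, in order of appearance: X10XXXXXXX10XXXX, X11XXXXXXX10XXXX, X11XXXXXXX11XXXX)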
13 Set Constraint Effectively
- Step 1: reduce the faulty masks of the candidate set, based on flush pattern responses
  - Ex. the candidate set includes scan cells (14, 11, 8, 3) on the chain
- Step 2: reduce the set by identifying scan cells that cannot be corrupted
  - Sub-Step 1: to check whether scan cell s is impossible to corrupt, set it to “X”
    - Ex. load X001 into the faulty scan chain and perform logic simulation
14 Set Constraint Effectively
- Sub-Step 2: all scan cells and POs that captured “X”s are grouped into a set, say G_s; all scan cells or POs in G_s should be observed by the tester to fail for this pattern (per-pattern total match condition)
- Sub-Step 3: if more than one scan cell has a corrupted value, fault masking might occur. Conditions:
  - The value is observed as correct by the tester
  - There are multiple fan-ins from faulty chains
  - At least two of these fan-ins from faulty chains have sensitive transitions
15 Set Constraint Effectively
- Sub-Step 4: G_s is empty
  - The fault effect is not propagated anywhere else
  - Still treat s as a possibly corrupted cell
- Step 3: re-simulate the pattern set for multiple iterations. The upper bound might be updated towards downstream cells
(Figure: two Shift in / Capture / Shift out examples; extracted bit strings, in order of appearance: 1X01X11X, X10XX11XXX10X00X, X11XX11XXX10X00X; 1001X11X, X10XX110XX10X00X, X10XX111XX10X00X)
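The total match check of Sub-Steps 2 and 4 can be sketched as a set comparison. This is a simplified reading of the slides: it assumes G_s has already been obtained by logic simulation, and it deliberately ignores the fault masking case of Sub-Step 3. The function name, the observation-point labels, and the example G_s sets are hypothetical; only the candidate cells (14, 11, 8, 3) come from the slide.

```python
def classify_candidates(g_sets, observed_failures):
    """Classify each candidate cell for one pattern.

    g_sets maps a candidate scan cell s to G_s, the scan cells / POs that
    captured X when s was set to X. observed_failures is the set of
    observation points the tester reported as failing for this pattern.
    """
    observed = set(observed_failures)
    status = {}
    for cell, g_s in g_sets.items():
        g_s = set(g_s)
        if not g_s:
            # Sub-Step 4: nothing propagates, so s cannot be ruled out.
            status[cell] = "possible"
        elif g_s <= observed:
            # Sub-Step 2: every point in G_s failed (total match).
            status[cell] = "possible"
        else:
            # Some point in G_s passed; masking (Sub-Step 3) is NOT modeled here.
            status[cell] = "impossible"
    return status

g_sets = {14: {"A3", "PO1"}, 11: {"A5"}, 8: set(), 3: {"A7", "A9"}}
status = classify_candidates(g_sets, {"A3", "PO1", "A7"})
print(status)
```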
16 Lower Bound Calculation
- Suppose erroneous values are observed at A_i (1 ≤ i ≤ k), where A_i is a scan cell on a good chain
- The faulty behavior at A_i must originate at one of the cells {B_i1, B_i2, …, B_i,s_i}, where each B_ij (1 ≤ j ≤ s_i) satisfies:
  - It is on the faulty chain
  - It is within the fan-in cone of A_i
  - It has a sensitive transition during scan loading
  - It is a possibly corrupted scan cell
- The faulty cell must be among the cells upstream of the most downstream cell in {B_i1, B_i2, …, B_i,s_i}, i.e., the cell with index Min_{j=1..s_i}(B_ij)
17 Lower Bound Calculation
Lower_Bound ≥ Max_{i=1..k} ( Min_{j=1..s_i} (B_ij) )
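The max-of-min bound can be computed directly once the sets {B_ij} are known. A minimal sketch, assuming each failing good-chain cell A_i comes with the list of its qualifying faulty-chain cell indices; the function name and the example indices are hypothetical.

```python
def lower_bound(fanin_candidates):
    """Max over failing cells A_i of the smallest qualifying index B_ij.

    fanin_candidates maps each failing good-chain cell A_i to the indices
    of the faulty-chain cells B_ij that satisfy the four conditions.
    Returns 0 when there is no usable observation.
    """
    return max(
        (min(cells) for cells in fanin_candidates.values() if cells),
        default=0,
    )

example = {"A1": [40, 25, 31], "A2": [60, 52], "A3": [18]}
print(lower_bound(example))  # per-cell minima are 25, 52, 18
```

Each failing A_i contributes one constraint, and the tightest of them (the largest minimum) is kept.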
18 Calculate Lower Bound For Multiple Faulty Chains
- Step 1: if Min_ij ≥ the upper bound for faulty chain j, use “—” to replace Min_ij
- Step 2: if A_i has only one item that is not “—”, chain j must be responsible for the fault observed at A_i. If the lower bound is less than Min_ij, update the lower bound to Min_ij
- Step 3: apply more patterns to add more columns to the dependency table, and repeat Steps 1 and 2 until there is no further update for a specified number of patterns

Dependency table:
               | A_1    | A_2    | …… | A_k    | Lower Bound
Faulty Chain 1 | Min_11 | Min_21 | …… | Min_k1 | 0
Faulty Chain 2 | Min_12 | Min_22 | …… | Min_k2 | 0
……             |        |        |    |        | 0
Faulty Chain n | Min_1n | Min_2n | …… | Min_kn | 0
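Steps 1 and 2 on the dependency table can be sketched as follows. This is an illustrative layout, not the authors' data structure: the table is stored with one row per failing cell A_i and one entry per faulty chain j (the transpose of the slide's table), `None` plays the role of “—”, and all numbers are hypothetical.

```python
def update_lower_bounds(min_table, upper_bounds, lower_bounds):
    """Apply Steps 1 and 2 once over all columns A_i.

    min_table[i][j] is Min_ij for failing cell A_i and faulty chain j,
    or None if chain j has no qualifying cell for A_i.
    """
    new_lb = list(lower_bounds)
    for row in min_table:
        # Step 1: blank out entries at or beyond the chain's upper bound.
        live = [
            (j, m) for j, m in enumerate(row)
            if m is not None and m < upper_bounds[j]
        ]
        # Step 2: a single surviving chain must explain the failure at A_i.
        if len(live) == 1:
            j, m = live[0]
            if new_lb[j] < m:
                new_lb[j] = m
    return new_lb

upper_bounds = [100, 50]
min_table = [
    [40, 70],   # A_1: chain 2 entry is blanked (70 >= 50)
    [20, 10],   # A_2: both chains remain, so no update
    [60, 50],   # A_3: chain 2 entry is blanked (50 >= 50)
]
print(update_lower_bounds(min_table, upper_bounds, [0, 0]))
```

Step 3 would call this again as new patterns add rows, stopping once the bounds stay unchanged for the chosen number of patterns.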
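19 Statistical Diagnosis
- Ranking: calculate the probability of each candidate faulty scan cell
- Bayes' theorem: if [X_1, X_2, …, X_n] is a partition of the set of all n possible outcomes of an event X, then for an observation Y
  P(X_k | Y) = P(Y | X_k) P(X_k) / Σ_{i=1..n} P(Y | X_i) P(X_i)
- ∵ P(X_k) = 1/Length(j), the prior is uniform and cancels: P(X_k | Y) = P(Y | X_k) / Σ_{i=1..n} P(Y | X_i)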
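The ranking step amounts to normalizing the likelihoods under a uniform prior. A minimal sketch, assuming the likelihood P(Y|X_k) of every cell on the chain has already been computed; the function name and the example values are hypothetical.

```python
def posterior(likelihoods):
    """Bayes' theorem with a uniform prior 1/Length(j) over the chain's cells.

    likelihoods maps each cell index k on the chain to P(Y | X_k).
    """
    prior = 1.0 / len(likelihoods)
    evidence = sum(p * prior for p in likelihoods.values())
    if evidence == 0:
        raise ValueError("no candidate cell explains the observation")
    return {k: p * prior / evidence for k, p in likelihoods.items()}

post = posterior({10: 0.0, 11: 0.2, 12: 0.2, 13: 0.1})
print(post)
```

Because the prior is the same for every cell, the ranking by posterior is the same as the ranking by likelihood.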
20 Assumption
- UP_j and LO_j are the upper bound and lower bound for the fault site on scan chain j
- If k is outside the range [UP_j, LO_j], P(Y|X_k) = 0
(Figure: scan chain from Scan_in to Scan_out with UP_j and LO_j marked; example contents 0X10XX100)
21 Assumption
- For all sensitive transitions captured downstream of scan cell UP_j, no failures were observed
(Figure: scan chain from Scan_in to Scan_out with UP_j and LO_j marked; example contents 0X10XX100)
22 Assumption
- When any sensitive transition is loaded into [UP_j, LO_j], we know deterministically whether this transition can or cannot be corrupted
23 An Example for Statistical Diagnosis
- P_1(Y|X_k) = (1 − Prob)^u if k ∈ U_Section[u]; = 0 otherwise
- P_2(Y|X_k) = Prob · (1 − Prob)^l if k ∈ L_Section[l]; = 0 otherwise
- Prob = (faults observed) / (total sensitive transitions)
- P(Y|X_k) = P_1(Y|X_k) · P_2(Y|X_k)
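The likelihood formulas can be evaluated per cell as below. This sketch assumes the sections are supplied as lookups from a cell index k to its exponent (u or l), which inverts the slide's U_Section[u] / L_Section[l] indexing for convenience; the function names and all numbers are hypothetical.

```python
def estimate_prob(faults_observed, total_sensitive_transitions):
    """Prob = faults observed / total sensitive transitions."""
    return faults_observed / total_sensitive_transitions

def likelihood(k, prob, u_of_cell, l_of_cell):
    """P(Y|X_k) = P_1 * P_2; zero if k belongs to no section."""
    if k not in u_of_cell or k not in l_of_cell:
        return 0.0
    p1 = (1 - prob) ** u_of_cell[k]
    p2 = prob * (1 - prob) ** l_of_cell[k]
    return p1 * p2

prob = estimate_prob(3, 4)               # 0.75
u_of_cell = {10: 2, 11: 1, 12: 0}        # cell -> u, i.e. k in U_Section[u]
l_of_cell = {10: 1, 11: 0, 12: 0}        # cell -> l, i.e. k in L_Section[l]
scores = {k: likelihood(k, prob, u_of_cell, l_of_cell) for k in (10, 11, 12, 13)}
print(scores)
```

Feeding these scores into a uniform-prior Bayes normalization gives the per-cell probabilities used for ranking.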
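24 Information of The Test Cases
Table columns: Designs | # of simulated gates | # of PIs | # of POs | # of scan chains | Length of the longest scan chain | # of faulty chains | Real faulty sites
Surviving entries (the other cell values are missing from the extracted text):
- Real faulty sites, first design: Faulty Chain I: (301, 407); Faulty Chain II: (57)
- Remaining row fragment: F M —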
25 Experimental Results
Designs | # of applied scan patterns | Upper bound | Lower bound | Cells with the highest probability
F1 | 3 | Faulty Chain I: 301 | Faulty Chain I: 177 | (299, 300, 301)
   |   | Faulty Chain II: 57 | Faulty Chain II: 0 | (57)
F2 | 16 | 13 | 8 | (10, 11, 12, 13)
Diagnosis resolution = (# of candidate faulty cells with the highest probability) − 1
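The resolution metric can be read off a ranked candidate list. A small sketch, assuming per-cell probabilities are available; the function name and the probability values are hypothetical, chosen only so that cells 299 to 301 tie for the top as in the F1 row.

```python
def diagnosis_resolution(probabilities, tol=1e-12):
    """Return (top cells, resolution), where resolution = len(top cells) - 1."""
    best = max(probabilities.values())
    top = sorted(k for k, p in probabilities.items() if abs(p - best) <= tol)
    return top, len(top) - 1

top_cells, resolution = diagnosis_resolution(
    {298: 0.10, 299: 0.30, 300: 0.30, 301: 0.30}
)
print(top_cells, resolution)
```

A resolution of 0 means a single cell has the highest probability.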
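26 Experimental Results
(Entries shown as "?" are missing from the extracted text.)
Injected faulty site | # of patterns for diagnosis | Probability of triggering fault | Diagnosis resolution
Site ? | ? | ?% | 18
       |   | 80% | 13
       |   | 100% | ?
       | ? | ?% | 7
       |   | 80% | 4
       |   | 100% | ?
       | ? | ?% | 7
       |   | 80% | 4
       |   | 100% | 4
Site ? | ? | ?% | 22
       |   | 80% | 17
       |   | 100% | ?
       | ? | ?% | 8
       |   | 80% | 5
       |   | 100% | ?
       | ? | ?% | 7
       |   | 80% | 3
       |   | 100% | 1
Site ? | ? | ?% | 15
       |   | 80% | 15
       |   | 100% | ?
       | ? | ?% | 6
       |   | 80% | 6
       |   | 100% | ?
       | ? | ?% | 2
       |   | 80% | 1
       |   | 100% | 1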
27 Conclusion
- Proposed a method to calculate upper and lower bounds on the candidate faulty cells
- The root causes of intermittent scan chain hold-time faults are very difficult to model
- Diagnosing this problem helps reduce the cost of silicon debug and improve yield
- The method is efficient and effective for large industrial designs with multiple faulty scan chains