At-Speed Test Considering Deep Submicron Effects

At-Speed Test Considering Deep Submicron Effects
D. M. H. Walker, Dept. of Computer Science, Texas A&M University, walker@cs.tamu.edu

Life as a DFT Engineer: Test Cost, Quality, Yield

Outline: Introduction, KLPG, Results on Silicon, Supply Noise & Power Model, Conclusions

Test Cost Must Fall Fast: test cost per transistor must follow Moore's Law, which means a 100x reduction in test cost per transistor is needed (ITRS 2005, cost-performance MPU).

But test cell cost is not following Moore's Law: designs already use a DFT tester or old ATE, handler and probe card costs are not scaling, and high-speed I/Os cost more. Test time must be reduced, and parallel testing is running out of gas, so test time per transistor must fall to hold test time per chip constant.

Reducing Test Time Per Transistor: less time and more transistors means more wires must be wiggled in less time, which raises power dissipation. Worse, the number of tests per transistor is rising to cope with deep submicron (DSM) effects, raising power dissipation even further.

But … Max Power/Transistor is Falling ITRS2005, Cost-Perf MPU

Is Future Digital Test All About Power? The fraction of the chip that can be fired up at one time is decreasing. Mission-mode power constraints, test supply noise, and test thermal limits all restrict intra-die test parallelism. The question becomes: how to screen the most defects per Joule?

Eliminate Wasted Energy: useless transitions, scan power, and unnecessary capture power (the subject of much research and commercial activity), plus low-odds test patterns, whether by luck (the tails of BIST, WRP) or by shotgun blasts (N-detect, DOREME, TARO, ...).

Squeeze the Chip Harder Instead: IDDQ, MINVDD, ..., small delay defect testing, KLPG.

Our Delay Test Research: defect-based delay test ATPG considering resistive shorts and opens, process variation, capacitive crosstalk, temperature gradients, power supply noise, and power dissipation.

Kitchen Sink Fault Model: have we forgotten anything? (Diagram relating crosstalk, die-to-die and intra-die process variation, supply noise, temperature, litho, and spot defects to local, global, and combined delay faults, functional failures, and reliability hazards.)

Target Realistic Defects: resistive shorts (Stanojevic et al.) and resistive opens (Madge et al.).

But the fault population is too large, fault model accuracy is limited, fab data is limited, and calibration time and cost are limited. The fault model must be abstract enough to allow fortuitous detection of unmodeled faults: "The vectors do the work, not the fault model" – J. H. Patel.

Our Approach: test the K longest rising/falling paths through each gate/line (KLPG). This targets resistive opens, and targets resistive shorts by sensitizing opposing values on coupled lines; only a few bridges per line have the largest critical area [Tripp]. A larger K deals with delay uncertainty from supply noise, process variation, delay modeling errors, and crosstalk, in analogy to N-detect.

Outline: Introduction, KLPG, Results on Silicon, Supply Noise & Power Model, Conclusions

K Longest Paths Per Gate (KLPG): the CodGen ATPG and the CodSim fault simulator were developed at Texas A&M. They test the K longest paths through each gate/line, detect small defects on each gate/line, and cover all transition faults. The approach needs SDF timing and may produce more patterns than a transition fault (TF) test.

Test Generation Algorithm: the search space is bounded by scan cells, with constraints imposed from outside the search space (diagram).

KLPG Test Generation Flow: repeatedly extend the partial path with the longest potential delay, apply side inputs and perform direct implications, and apply heuristics to avoid false paths. If a conflict arises, the partial path is abandoned; if the path is not yet complete, it is inserted back into the partial path store and extension continues; when a path is complete, final justification is performed.
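
A minimal Python sketch of this flow, assuming a best-first search over partial paths ordered by an upper bound on potential delay. The helpers (extend_candidates, imply_and_check, is_complete, justify, potential_delay) are hypothetical stand-ins for CodGen's actual procedures, and the sketch finds k complete testable paths overall, omitting the per-gate/line bookkeeping:

```python
import heapq, itertools

def klpg_generate(launch_points, k, extend_candidates, imply_and_check,
                  is_complete, justify, potential_delay):
    """Best-first KLPG path generation sketch (not the actual CodGen code)."""
    counter = itertools.count()            # tie-breaker so the heap never compares path objects
    store = [(-potential_delay(p), next(counter), p) for p in launch_points]
    heapq.heapify(store)                   # partial path store, longest potential delay first
    found = []
    while store and len(found) < k:
        _, _, path = heapq.heappop(store)  # partial path with longest potential delay
        for ext in extend_candidates(path):        # extend by one gate along each fanout
            if not imply_and_check(ext):           # side inputs, direct implications, false-path heuristics
                continue                           # conflict: abandon this extension
            if is_complete(ext):
                if justify(ext):                   # final justification back to scan cells
                    found.append(ext)
            else:
                heapq.heappush(store, (-potential_delay(ext), next(counter), ext))
    return found
```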

Applying KLPG to Industrial Designs: the KLPG test generator takes the same inputs as TF test generation (TetraMAX/FastScan dofile/procfile/library, hierarchical Verilog design, plus SDF and K). CPU time is roughly 3x that of TF test generation and memory is about 400 MB per 1M gates. The resulting test data and test sequence (load pattern, pulse clock) go to the tester; only the 0's and 1's differ from TF ATPG outputs.

Chips are slower using the KLPG test: transition fault test at 10 ns, KLPG-1 test at 11 ns (180 nm, 40k gates, full scan, 2.3k scan cells).

Cleaner Shmoo: shmoo plots for the transition fault test and the KLPG test (figure).

KLPG Silicon Experiment: a TI ASIC design with 738K gates (597K gates in the 250 MHz clock domain), 130 nm technology, 5 clock domains (highest 250 MHz), 8 scan chains, and 14,963 muxed-D flip-flops in the 250 MHz domain. 24 devices marginally pass the regular TF test.

Test Size Comparison

Test             # Patterns   Comments
Path delay test       744     Tests 2,137 critical paths
Regular TF          1,445     Dynamic compaction
Randomized TF       1,471
KLPG-1             12,579     Static compaction

KLPG Test Results: up to 3% delay decrease seen in KLPG-1; KLPG unique detects (figure).

KLPG with Bridge Faults: KLPG-1 targets resistive opens, and SAF, N-detect, and KLPG-1 tests already have good coverage of resistive shorts (Sar-Dessai and Walker, ITC'99; Qiu, Walker et al., TECHCON'03, VTS'04). Since sensitization is much easier than propagation, propagate first and then sensitize, ignoring input-dependent gate strength and the opposing transition.

Bridge Fault ATPG Approach: generate the longest path through the bridge site and set don't-care bits to sensitize the opposing value on the bridged line (e.g. 0 opposing). This requires no extra uncompacted patterns, since resistive opens must be tested anyway. Otherwise, set the opposing value first and then generate the path; these are "top-off" patterns, but they may compact.

Bridge Fault Robust LOC Results

                               Robust KLPG-1 with shorts    Robust KLPG-1 w/o shorts
Circuit   # Lines   # Shorts   # Patterns    CPU (m:s)      # Patterns    CPU (m:s)
s13207     13,207    26,414       1,006        2:30             909         2:25
s15850     15,850    31,700         520        2:45             472         2:35
s35932     35,932    71,864          43       15:51              36        14:31
s38417     38,417    76,834       1,061       15:03             949        14:21
s38584     38,584    77,168         589       12:00             526        11:20

Two random non-feedback bridges are assigned to each line (equal to the TF count). Shorts between gate inputs and shorts to power/ground are excluded.

Bridge ATPG Results: a modest cost increase; pattern count increases by 4.7-19.4% and ATPG time by less than 9.2%. Less impact is expected on large designs due to lower care-bit density.

KLPG Improvements: compaction, coverage metric, crosstalk.

Test Set Compaction: static compaction is performed after test generation, while dynamic compaction is performed during test generation. The classic dynamic compaction method works well for stuck-at tests but is not suitable for path delay tests, so we develop dynamic compaction for KLPG tests.

Dynamic Compaction Approach: when a path is generated, the set of necessary assignments (NAs) needed to sensitize the path and propagate the transition along it is identified. In the example (figure), vector pair V1 tests Path1 (a falling transition through line A), and vector pair V3 is the only pair that can test Path2 (a rising transition through line B). V2 can also detect Path1, but the PODEM-like final justification procedure cannot generate it on its own. If V1 and V3 are generated first, they can never be compacted together because the first bit of input I2 conflicts, yet Path1 and Path2 could potentially share a vector since V2 is compatible with V3. The solution is to combine the two sets of necessary assignments and run final justification on the union, producing a single vector pair V4 that tests Path1 and Path2 at the same time.

Dynamic Compaction Algorithm definitions: a vector is the output for the ATE; a pattern is a set of necessary assignments associated with one or more paths; the POOL is a data structure that stores patterns. The necessary assignments of each new path are checked for compatibility against the patterns in the POOL, and generation of the final test vectors is postponed until test generation is finished.

Dynamic Compaction Flow: start with a new pattern F. If the POOL is empty, insert F into the POOL and finish. Otherwise set P to the first pattern in the POOL and check for conflicts between the necessary assignments of F and P. On a conflict, move to the next pattern in the POOL (inserting F as a new pattern if the end of the POOL is reached). If there is no conflict, combine the necessary assignments of F and P and run final justification; if justification passes, update P with F and reorder P in the POOL, otherwise continue to the next pattern.
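
A minimal Python sketch of this pool-based merging loop, under the assumption that a pattern is represented as a dict mapping scan-cell bits to required values and that final_justify is a hypothetical stand-in for the PODEM-like final justification step; the reordering heuristic (least-specified pattern first) is also an assumption:

```python
def compatible(na_a, na_b):
    """Necessary-assignment sets conflict if any bit requires opposite values."""
    return all(na_b.get(bit, val) == val for bit, val in na_a.items())

def dynamic_compaction(new_pattern, pool, final_justify):
    """Try to merge new_pattern (dict of necessary assignments) into an
    existing POOL pattern; append it as a new pattern if no merge succeeds."""
    for i, p in enumerate(pool):
        if not compatible(new_pattern, p):
            continue                       # conflict: try the next pool pattern
        merged = {**p, **new_pattern}      # union of necessary assignments
        if final_justify(merged):          # PODEM-like justification of the union
            pool[i] = merged               # update P with F
            pool.sort(key=len)             # reorder POOL (least-specified first; an assumption)
            return
    pool.append(new_pattern)               # no compatible pattern: start a new one
```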

Dynamic Compaction Experiments: generation of the K longest robustly testable paths through each line (K=1), with launch-on-shift and launch-on-capture. We compare against static compaction, study the influence of POOL size on vector count, and compare KLPG-1 against a transition fault test.

Circuits: ISCAS'89 benchmark circuits (full scan, unit delay model); Chip1 (44K gates, partial scan, embedded memories); Chip2a (22K gates, SDF delays).

Robust test (launch-on-capture): chart of vector count and vector reduction rate per circuit; dynamic compaction reduces vector count by 21% to 60% (per-circuit reduction rates of 60%, 60%, 48%, 21%, 55%, 43%, 39%, 53%, 37%, 23%, and 26%).

POOL Size Influence (LOC robust): chart of vector count versus POOL size.

KLPG-1 Test Set Construction: a robust test, a non-robust test, and a long transition fault test, where a long transition fault test exercises longer paths than a regular transition fault test.

Test Size (KLPG-1 vs. Transition Fault), one row per circuit (the circuits include chip1, chip2a, and chip3):

Robust   Non-robust   Long TF   Total KLPG-1   Commercial TF
  289          6          7           302            231
   24          4          -            28             68
  425         41          1           467            365
  249        134         70           453            528
1,192        452        103         1,747          1,900
  619        687        493         1,799          2,537
4,406      1,688        550         6,644          1,445

The Robust, Non-robust, and Long TF columns give the vector counts of the robust test, the top-off non-robust test, and the top-off long transition fault test; many non-robust and long-TF paths are also compacted into previously generated robust vectors, so these columns count only the new vectors generated. The last column is the number of transition fault vectors generated by a commercial tool. The KLPG-1 test size is comparable to the transition fault test and in several cases smaller. For chip3 (about 600K gates) the KLPG-1 test is roughly 5x the commercial tool's, but the commercial tool generated a very small vector count for that design, indicating that many transition faults in chip3 are easy to detect while testing them through the longest paths adds many more necessary assignments and lowers the compaction rate. Intuitively the KLPG-1 test should be several times larger, since more constraints are imposed; given the higher quality of the KLPG-1 test, this result is very promising.

Dynamic Compaction Results: for KLPG tests, up to a 3x reduction in vector count, about a 2x CPU time increase, small additional memory consumption, and a KLPG-1 test size comparable to the commercial transition fault test.

Dynamic Compaction Future Work: heuristics to accelerate dynamic compaction, more advanced algorithms for better results, dynamic compaction for more complicated industrial designs, and constraints for power supply noise and temperature.

Delay Fault Coverage Metric: the VTS'04 metric is not constructive for delay test quality, since accurately computing it requires the longest path through each line, which means running KLPG; SDQM has the same problem. The metric must account for die-to-die and intra-die process variation, and die-to-die is currently handled as a post-process, which is wasteful. Simple bounds on when to stop path generation trade coverage against pattern count.

Fault Coverage vs. K (c7552): drop a fault when the upper-bound/lower-bound coverage falls off; most sites need only a few paths (chart).

Ideal K in c5315 with Die-to-Die Variation: most sites need only 1 or 2 paths, and most paths at many-path sites are about the same length, so most can be dropped without much coverage loss.

Capacitive Crosstalk: crosstalk affects near-critical paths. Paths that become near-critical only because of a spot defect are not a concern, since the failure probability is dominated by the defect itself; consider case (b) in the figure.

Capacitive Crosstalk handling: filter out couplings based on arrival time, then use a simple greedy algorithm that considers couplings in order of delay increase and sensitizes opposing transitions one at a time. This may miss the case of many small couplings. Should timing alignment be treated probabilistically? Compaction impact: more care bits.
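
An illustrative Python sketch of the greedy aggressor-selection step described above; the Coupling record, the delay_increase estimate, and try_sensitize_opposing are hypothetical stand-ins for the ATPG's internal structures, not the actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Coupling:
    aggressor: str          # aggressor net name
    delay_increase: float   # estimated victim delay increase if it switches opposingly
    arrival_ok: bool        # aggressor transition can align with the victim in time

def pick_crosstalk_aggressors(couplings, try_sensitize_opposing):
    """Greedy selection: try the largest-impact, time-aligned couplings first,
    keeping each one only if the opposing transition can still be sensitized."""
    chosen = []
    candidates = [c for c in couplings if c.arrival_ok]          # filter by arrival time
    for c in sorted(candidates, key=lambda c: c.delay_increase, reverse=True):
        if try_sensitize_opposing(c.aggressor):                  # one aggressor at a time
            chosen.append(c)
    return chosen
```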

Crosstalk Alignment: the path from the primary inputs to the crosstalk site must have the correct timing, so the KLPG ATPG algorithm uses min/max delay constraints. The targets are opposing transitions within a timing window, and the constraints narrow as the path is built; if a potential alignment or transition cannot be realized, the target is dropped. Timing is updated after each crosstalk site is handled, since it could set other crosstalk sites to help or oppose.

Outline: Introduction, KLPG, Results on Silicon, Supply Noise & Power Model, Conclusions

Supply Noise: supply noise significantly impacts the timing performance of DSM designs. As technology advances into the DSM regime, operating frequency and gate density increase, which means more simultaneous switching activity per unit area and thus higher power density; at the same time the supply voltage decreases and gate delay becomes more sensitive to voltage variation. These trends make the impact of supply noise on delay increasingly significant. Compared with functional operation, excessive supply noise in delay testing comes from two sources. First, delay test patterns generated by ATPG usually have a low fill rate, and random fill of the don't-care bits, applied in industry to increase fortuitous detection of non-target defects, has been shown by industry data to produce excessive supply noise. Second, highly compacted test patterns may generate excessive noise as well. Excessive noise causes unexpectedly long delay, and the result is noise-induced overkill.

Concept: Effective Region. The circuit is extracted as an RC network. When a current impulse occurs somewhere in the network, nearby capacitors begin to discharge first, producing a localized voltage drop; a capacitor far enough away will not discharge within the current clock cycle and is irrelevant to the noise analysis for that cycle. The effective region of a switching device is therefore the largest area centered on the device whose RC time constant is less than the clock cycle: in the current cycle, the switching current for the device is supplied only by the capacitors in its effective region. To simplify, all capacitors in the region are assumed equally effective regardless of position; for example, a capacitor A near the center and a capacitor B near the border deliver the same charge to the switching device as long as their capacitances are equal.

Finding the Effective Region for a Device: consider circular regions of radius r centered on the device and grow the radius from small to maximum, stopping once the RC time constant of the region exceeds the clock cycle time; a practical improvement is to binary-search the radius. Because the resistance and decoupling capacitance are static, and the parasitic circuit capacitance is much smaller and varies little from pattern to pattern, the effective region is essentially static and the search needs to be performed only once per design.
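
A small Python sketch of the radius search, assuming a caller-supplied rc_time_constant(radius) function that returns the RC time constant of the circular region of that radius around the device (an assumption; the actual extraction is design-specific):

```python
def effective_region_radius(rc_time_constant, clock_period, r_max, tol=1e-6):
    """Binary-search the largest radius whose RC time constant stays below
    the clock period; rc_time_constant is assumed monotonically increasing."""
    lo, hi = 0.0, r_max
    if rc_time_constant(r_max) < clock_period:
        return r_max                     # the whole extracted network responds within one cycle
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rc_time_constant(mid) < clock_period:
            lo = mid                     # region still fast enough: grow it
        else:
            hi = mid                     # too slow: shrink it
    return lo
```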

Concept: Grid. The effective region defines locality: two switching devices that are close enough share the same effective region and the same voltage level, so it saves time to analyze them together. The grid is therefore the smallest unit of analysis: the circuit is divided into n x m grids, each containing decoupling capacitance, parasitic capacitance, and a set of switching devices. The RC time constant of a grid is small compared with the clock cycle, so all switching devices in a grid can safely be approximated as having the same voltage and the same effective region. Each grid is thus associated with one effective region, and an effective region is viewed as a set of grids rather than an area of devices and capacitors; this makes the computation much simpler with only a slight impact on accuracy.

Grid Noise Model: each grid is modeled by its decoupling capacitance Cd, its parasitic capacitance Cp, and the switching currents of its devices (figure). During the beginning of the clock cycle, when most switching activity occurs, the power pads cannot supply current immediately because the off-chip inductance prevents the supply current from rising quickly, so most of the charge demanded by the switching devices comes from on-chip capacitance in the effective region. Two assumptions follow: off-chip current is ignored during the launch cycle, since it has little impact on propagation delay (most transitions complete before the off-chip current rises appreciably), and the switching charge of a grid is provided equally by all grids in its effective region. Grids are not independent: each grid draws charge from the grids in its effective region and is itself discharged by the other grids whose effective regions contain it.

Grid Noise Model Vmax = ( ( i · Qi )) / ( Cd + Cp ) Iswitching_1 Iswitching_n Switching Devices Vmax = ( ( i · Qi )) / ( Cd + Cp ) Grid i: a grid whose effective region covers current grid Qi: switching charge of Grid i i: fraction of Qi provided by current grid Based on these model, we are now analyzing every grid to calculate its worst-case voltage drop. As I mentioned in the previous slide, a grid should provide charge to those grids if it belongs to their effective region. So the worst-case voltage drop here is the total charge provided by this grid, divided by the grid capacitance. Here Grid I is a grid whose effective region covers current grid. Qi is the total switching charge demanded by Grid I. Qi is shared by all grids in grid I’s effective region. so alpha I is the fraction of Qi that comes current grid. Using this function, we can find out maximum voltage drop for all the grids on the circuit.

Switching Current Model: the switching current drawn from the supply network in CMOS circuits consists mainly of the dynamic charging current into the output capacitive load and the short-circuit current, with the dynamic charging current usually dominating. The dynamic charging current waveform is modeled as a triangle; a look-up table, built by simulation for each cell, gives the peak current Ipeak and the output transition time for different output loads and input slopes, and the total charge is the area of the triangle, Q = 0.5 · Ipeak · (tend – tbegin). The short-circuit charge, usually insignificant by comparison, is computed from an empirical function of the saturation current and the wire and device capacitance.

Delay Model: nominal delay and output transition time are modeled as look-up-table functions of input slope and output capacitive load, Delay = f(tin, Cout) and Out_slew = g(tin, Cout), built by simulation for each library cell. Delay and slew are then scaled linearly with supply voltage, with the linear factor also obtained from cell simulation. In practice, the voltage drop is taken as half of the worst case and the delay model is applied to compute the noise-aware delay.
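
A toy Python sketch of this two-step delay evaluation (table lookup at nominal voltage, then linear voltage scaling); the bilinear-interpolation helper and the per-cell sensitivity factor k_v are illustrative assumptions, not the actual characterization data:

```python
import bisect

def interp2(xs, ys, table, x, y):
    """Bilinear interpolation in a small characterization table (xs, ys sorted ascending)."""
    i = max(1, min(len(xs) - 1, bisect.bisect_left(xs, x)))
    j = max(1, min(len(ys) - 1, bisect.bisect_left(ys, y)))
    tx = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    ty = (y - ys[j - 1]) / (ys[j] - ys[j - 1])
    a = table[i - 1][j - 1] * (1 - ty) + table[i - 1][j] * ty
    b = table[i][j - 1] * (1 - ty) + table[i][j] * ty
    return a * (1 - tx) + b * tx

def noise_aware_delay(t_in, c_out, v_drop, cell):
    """cell = dict with 'slopes', 'loads', 'delay_table', and 'k_v' (fractional delay
    increase per volt of supply drop, from simulation -- an assumed representation)."""
    d_nom = interp2(cell["slopes"], cell["loads"], cell["delay_table"], t_in, c_out)
    return d_nom * (1.0 + cell["k_v"] * (0.5 * v_drop))   # use half of the worst-case drop
```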

Supply Noise Analysis Flow: the effective region of each grid is found once per design, before the first test pattern, and this step is skipped for subsequent patterns. For each test pattern, the flow is: load the vector, run logic simulation, assign the switching charge to the relevant grids, compute the voltage drop of each grid, and compute the noise-aware delay. The complexity is O(cell_count + grid_count²); in practice the grid count is chosen near the square root of the cell count (so grid_count² < cell_count), which is sufficient for accuracy, and the actual complexity is linear in cell count, the same as logic simulation.

Experimental Design: experiments were performed on an NXP design, a 130 nm DSP-like core with over 1M transistors. LOC path delay patterns with many don't-care ("X") bits were generated for statically sensitized paths, which ensures that transitions propagate along the target path. The filling strategy randomly sets each X bit to 1 with a specified rate (and the rest to 0), and several batches of filled patterns were generated with various fill rates.

Experimental Measurements: applying the supply noise analysis to these test patterns and comparing against tester measurements shows that the path delay predicted by analysis is correlated with measurement, with a correlation of 0.83. The offset is discussed below.

Noisy patterns cause a significant delay increase. In the figure, patterns are ordered by fill rate on the x-axis: the bottom blue line is the nominal delay from the delay model, the yellow points are the noise-aware analysis based on the delay model, and the top purple points are the tester measurements. The noise analysis clearly predicts a similar trend to the tester measurements; the measured offset comes from delay model characterization, since there is a large mismatch between nominal delay and measurement even when noise is small.

Supply Noise Future Work: refine the supply noise model (off-chip dI/dt current, array-bond chips, ground bounce), improve activity estimation, focus effort on noisy patterns, and develop incremental estimation for ATPG that avoids full logic simulation.

Constant Power Dissipation: constant power gives a linear temperature rise, which is easy to characterize, so the temperature is known for each pattern and the capture clock timing can be adjusted for the longer delay as temperature rises (a 35-55% delay increase for a 100°C rise in 65 nm). Patterns are reordered for constant power dissipation, considering groups of 10 patterns: a ~1°C rise takes 1-10 ms, while a 200-bit scan chain at 100 MHz needs 2 µs per pattern, so 10 patterns take 20 µs, far less than 1 ms.
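
The timing claim can be checked with a quick back-of-the-envelope calculation in Python, using only the numbers from the slide:

```python
# Scan shift time per pattern for a 200-bit chain at a 100 MHz shift clock
shift_bits, shift_clk = 200, 100e6
t_pattern = shift_bits / shift_clk          # 2e-6 s = 2 us per pattern
t_group = 10 * t_pattern                    # 20 us for a group of 10 patterns
print(t_pattern, t_group, t_group < 1e-3)   # 2e-06 2e-05 True -> well under the ~1 ms per-degree-C timescale
```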

Constant Power Flow: dynamic compaction, then Mentor Preferred Fill to reduce capture power, adjacent fill to reduce average power, and pattern reordering to minimize power variation. Issues: a fast power model is needed, patterns are not independent, and power is due to both scan-in and scan-out switching.

Power Modeling: prior work by Touba et al. indicated that weighted switching activity (WSA) is proportional to scan chain switching. This is improved by using a scan-chain WSA: a scan cell feeding more gates is likely to cause more circuit switching, and most circuit switching during scan happens in the first few levels of logic. Experiments showed almost no difference in pattern reordering results between the model and exact (simulation) results.
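
A small Python sketch of the kind of fanout-weighted scan-chain switching proxy described here; weighting each scan-cell toggle by that cell's fanout follows the slide's observation, while the exact weighting and bit-string representation are assumptions:

```python
def shift_power_estimate(load_bits, unload_bits, fanout):
    """Estimate scan shift power for one pattern from scan-chain switching.

    load_bits / unload_bits: bit strings shifted in and out (same length as the chain)
    fanout[i]              : number of gates fed by scan cell i (weight)
    Adjacent-bit transitions in the shifted stream act as toggles that ripple down
    the chain; each is weighted by the cell fanout, and scan-in and scan-out
    contributions are summed, since both matter per the slide.
    """
    def chain_toggles(bits):
        return sum(fanout[i] for i in range(1, len(bits)) if bits[i] != bits[i - 1])
    return chain_toggles(load_bits) + chain_toggles(unload_bits)
```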

Power Model Results (figure).

Constant Power Algorithm

    Compute shift power of each pattern;                      /* power model */
    Group patterns in order using specified group size;
    Compute total power P[i] of each group i;
    Compute average power ave of all groups;
    while iteration count not exceeded, do
        for each group i, do
            if P[i] > (1+pvb)*ave then                        /* pvb = power variance bound = 0.05 here */
                Find pattern Pn with lowest power in group j with lowest total power;
                Select pattern Pm with highest power in group i and swap with Pn;
            else if P[i] < (1-pvb)*ave then
                Find pattern Pn with highest power in group j with highest total power;
                Select pattern Pm with lowest power in group i and swap with Pn;
            else
                continue to next group;
            Re-compute shift power for Pn-1, Pn, Pm-1, Pm;    /* power model */
            Re-compute total shift power for groups i, j;
            Update ave;

s38417 Results (figure).

Constant Power Results: fast (under 1 minute on ISCAS'89), and the standard deviation/average power ratio drops by 2.5-6x, to about 3% on the ISCAS'89 circuits. The remaining variation is mostly due to high-power patterns; the solution is to veto high-power patterns during compaction.

Conclusions: demonstrated KLPG on industrial designs with a modest test data volume increase and an affordable ATPG time increase; demonstrated the noise model on an industrial design; demonstrated constant power reordering.

Future Work: demonstrate on industrial data; improve the fault coverage metric (drop faults detected with high probability, exploit spatial and structural correlation); maximize coupling capacitance; use the supply noise model in compaction and filling; dI/dt model and multi-cycle launch.

Acknowledgements: current students Zheng Wang, Zhongwei Jiang, and Shiva Ganesan; former students Jing Wang (AMD), Lei Wu (TI), and Wangqi Qiu (Pextra); colleagues at TI and NXP; sponsors NSF and SRC.

Needs: SRC task 1618 liaison; design and test data.

More Information http://faculty.cs.tamu.edu/walker http://research.cs.tamu.edu/eda walker@cs.tamu.edu

Questions?