ELEC516/10 Lecture 10
ELEC 516 VLSI System Design and Design Automation, Spring 2010
Lecture 10 – Design for Testability
Reading Assignment: Kang – CMOS Digital Integrated Circuits: Analysis and Design, Chapter 15
Testing your prototype
Test is time-consuming and test equipment is very expensive.
Test cost contributes greatly to the cost of the system (20-30% of the chip cost).
You must think about test during the design:
–Otherwise you end up with an untestable chip
–Test functionality as well as performance
If you don’t test it, it won’t work!
[Figure: Prototype, Specification, ?]
Introduction
Testing is important, probably as important as the design process.
Testing a chip to make sure it is fully functional is highly complex and time-consuming.
The cost of chip debugging is much higher than that of board-level debugging, which is in turn much higher than that of system-level debugging.
In a production environment, many chips must be tested within a short time for timely delivery to customers.
Design for testability therefore becomes critical.
Testing Classification
Diagnostic test
–Used in chip/board-level debugging
–Defect localization
“Go/no go” or production test
–Used in chip production
Parametric test
–Voltage and current tests, instead of logic tests
–Checks other parameters such as noise margin (NM), threshold voltage (Vt), delay time (tp) and temperature (T)
Chip Debugging
Design errors or fabrication defects?
Micro-probing the die
E-beam
Single-die repair
Testing is Expensive
A VLSI tester costs several million dollars (US).
Volume manufacturing requires a large number of testers, plus maintenance.
Design companies often cannot afford this, so a rental model is commonly used. Rent is charged by usage time.
Tester time costs are in $/sec.
Test cost contributes 20-30% of total chip cost.
Types of Testing
Step                 Error Source                      Test Type
Design               Design flaws                      Design Verification
Prototype            Design flaws / Prototype flaws    Functional Test
Manufacture          Physical defects                  Manufacture Test
Shipping             Manu. test, transport
System Integration   Same                              Functional Test
Service              Stress, age                       Diagnosis
Manufacturing defects
Errors can occur at different stages in the lifetime of a chip.
During manufacturing: misalignment, dust and other particles, “stacking faults”, defects in the dielectric, mask scratches and thickness variation, leading to layer-to-layer shorts, discontinuous wires (opens) and circuit sensitivities (Vth, Lchannel). Found during wafer probe of test structures.
During packaging: defects from scratching in handling, damage during bonding, misalignment (always check the wire bonding), and other defects undetected during wafer probe. Found during test of packaged parts.
During mounting: defects from damage during board insertion (thermal, ESD), infant mortality (manufacturing defects that show up after a few hours of use), noise problems, susceptibility to latch-up. Found during testing/mounting on board.
Long term: defects that appear after months or years of use (metal migration, oxide damage during manufacture, impurities). Found by the customer.
Testers for volume manufacturing
Each pin on the chip is driven/observed by a separate set of circuitry, which typically can either:
–drive the pin to one data value per cycle
–or observe the value of the pin at a particular point in a clock cycle.
Timing of input transitions and sampling is controlled by high-resolution timing generators associated with each pin.
The device under test (DUT) is mounted on the test head.
Test Strategy
Testing with the tester is done in several steps:
–Supply a set of test vectors that specify an input or output value for every pin on every cycle.
–The tester loads the program into the pin cards.
–Run the program and report any miscompares between an output value and the expected value.
Testers for volume manufacturing
[Figure labels: Specification, Behavioral model, Design cycle, Test patterns, I/O vectors, Memory, Vcompare, error, Force/Compare]
How many test vectors do we need?
For an exhaustive test of a digital circuit with 25 inputs and 50 state bits, $2^{75}$ cycles are required. Assuming 1 us/cycle, the test time is more than $10^{9}$ years.
Exhaustive test is impractical and unnecessary.
–We only need to verify that no faults are present, which may take far fewer vectors.
–In fact, many vectors can test the same fault.
Combinational circuit with n inputs: $2^{n}$ input patterns are required to test it exhaustively.
Sequential circuit with n inputs and m state bits: $2^{n+m}$ input patterns are required to test it exhaustively.
Fault Types and Models
Testing goal: to detect faults in fabrication and design, and failures due to stressful operating conditions and reliability problems.
Test process:
–Apply test vectors to the device under test (DUT) as stimuli
–Compare the measured outputs with the expected correct responses to determine correctness
–Difficulty: only the system input and output pins are accessible
–Another difficulty: generating correct test vectors that detect all modeled faults and design errors
–Test pattern generation, whether manual or automatic (ATPG), becomes a difficult task.
Defect causes
Physical defects:
–Defects in silicon substrate
–Photolithographic defects
–Mask contamination and scratches
–Process variations and abnormalities
–Oxide defects
Physical defects -> electrical faults
–Shorts (bridging faults) or opens
–Transistor stuck-on, stuck-open
–Resistive shorts and opens
–Excessive change in threshold voltage and current
Electrical faults -> logical faults
–Logical stuck-at-0 or stuck-at-1
–Slow transition (delay fault)
–AND-bridging, OR-bridging
Fault models
Traditional models, first developed for board-level tests, assume that a node gets “stuck” at “0” or “1”, presumably by shorting to GND or Vdd.
If the output is faulty, the entire gate is “stuck”.
There are also cases that correspond to a transistor stuck-on or stuck-off.
F=(A+B)’
What about Fx (F with a stuck-off fault)?
Fault Models
Most popular: the “stuck-at” model
Sa0 (output stuck at 0)
Sa1 (input stuck at 1)
Covers almost all (other) occurring faults, such as opens and shorts.
[Figure: circuit with signals x, y, w, z and a table relating test vectors to the faults they detect (x sa1; y sa0 or x sa0; z sa1)]
Another example
Stuck-at fault
Single stuck-at fault models are used frequently:
–Complexity of test generation is greatly reduced
–A single stuck-at fault is independent of technology and design style
–Single stuck-at tests cover a large percentage of multiple stuck-at faults
–Single stuck-at tests cover a large percentage of unmodeled physical defects
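To make the single stuck-at model concrete, the sketch below enumerates the six single stuck-at faults of the two-input NOR gate F=(A+B)’ used earlier and lists the input vectors that expose each one. The function and node names are illustrative assumptions.

```python
from itertools import product

def nor(a, b):
    """Fault-free two-input NOR gate."""
    return int(not (a or b))

def faulty_nor(a, b, node, stuck):
    """NOR gate with one node ('A', 'B' or 'F') stuck at the value `stuck`."""
    if node == "A":
        a = stuck
    if node == "B":
        b = stuck
    out = nor(a, b)
    if node == "F":
        out = stuck
    return out

def detecting_vectors(node, stuck):
    """All (A, B) vectors whose faulty output differs from the good output."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if nor(a, b) != faulty_nor(a, b, node, stuck)]

table = {(node, stuck): detecting_vectors(node, stuck)
         for node in "ABF" for stuck in (0, 1)}
for fault, vectors in table.items():
    print(fault, vectors)
```

In this small case three of the four possible vectors, (0,0), (0,1) and (1,0), are enough to detect all six faults, which illustrates why a fault-oriented test set can be smaller than an exhaustive one.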
Delay fault
Causes timing failures at the target speed.
Reasons for delay faults:
–Improper estimation of on-chip interconnect delay and other timing considerations
–Excessive variation in the fabrication process -> variations in circuit delay and clock skew
–Open in a metal line connecting parallel transistors
–Aging effects such as hot-carrier-induced delay increase
Detecting delay faults is even more subtle than detecting functional faults in steady state.
Problem with stuck-at model: CMOS open fault
Sequential effect: two vectors are needed to ensure detection.
[Figure: CMOS gate with inputs x, y and output z, and its truth table with an open fault; for one input combination the output keeps its previous value z(n-1)]
Other options: use stuck-open or stuck-short models.
This requires fault simulation and analysis at the switch or transistor level, which is very expensive.
Problem with stuck-at model: CMOS short fault
The fault causes a short circuit between Vdd and GND for A = C = 0 and D = 1.
Possible approach:
–Supply current measurement (IDDQ)
–Not applicable for gigascale integration
[Figure: CMOS gate with inputs A, B, C, D]
Design for Testability
Combinational function: $2^{n}$ input patterns are required to test the circuit exhaustively.
Sequential function: $2^{n+m}$ input patterns are required to test the circuit exhaustively.
Exhaustive test is impossible or impractical.
We need to find meaningful vectors that test for possible faults.
–Not easy because of limited I/O and increased complexity
–Key concepts: controllability and observability
Controllability and Observability
Controllability: a measure of how easily the controller (test engineer) can establish a specific signal value at each node by setting values at the circuit inputs.
Observability: a measure of how easily the controller (test engineer) can determine the signal value at any logic node by controlling values at the primary inputs and observing the primary outputs.
The degree of controllability and observability (testability) can be measured with respect to whether the test vectors are generated deterministically or randomly.
Path Sensitization
Step 1: Sensitize the circuit. Find input values that produce a value on the faulty node that is different from the value forced by the fault.
For our S-A-1 fault example, we want the output of the OR gate to be 0.
Is this always possible? What would it mean if no such input values exist?
Is the set of sensitizing input values unique?
What’s left to do?
Error Propagation
Step 2: Fault propagation. Select a path that propagates the faulty value to an observed output (Z in our example).
Step 3: Backtracking. Find a set of input values that enables the selected path.
Is this always possible? What would it mean if no such input values exist?
Testability
Example of a non-testable error:
For x=1 we need both a and =1.
Whatever the value of c, one of the three outputs is 1: problem!
Two possible propagations of “1”
Problem: fault propagation
Controllability and Observability
Line 7 cannot be tested at the primary output, so this circuit is not fully testable.
Reason: reconvergent fanout of line 7.
Fault test pattern generation:
–Fault sensitization: an input vector that sensitizes the fault
–Fault propagation: conditions that propagate the fault to an output so that it can be observed
Controllability and Observability
Circuits with poor controllability
–Circuits with feedback
–Decoders and clock generators
Circuits with poor observability
–Sequential circuits with long feedback loops
–Circuits with reconvergent fanouts
–Redundant nodes
–Embedded memories such as RAM, ROM, PLA
Use self test for circuits with poor observability.
Generating and Validating Test Vectors
Automatic test-pattern generation (ATPG)
–For a given fault, determine an excitation vector (called a test vector) that will propagate the error to a primary (observable) output
–Majority of available tools: combinational networks only
–Sequential ATPG available from academic research
Fault simulation
–Determine fault coverage of a proposed test-vector set
–Simulate the correct network in parallel with faulty networks
Both require adequate models of faults in CMOS ICs.
ATPG Process
Fault Selection
Fault Observe Point Assessment
Fault Excitation
Vector Generation
Fault Simulation
Fault Dropping
ATG for fanout-free combinational circuits
Two steps:
–Activate (excite) the fault from the primary inputs
For signal l with a stuck-at-v fault, set the primary input values such that signal l equals v’.
This is called the justification problem: find an assignment of PI values that results in a desired value on a specified signal in the circuit.
–Propagate the resulting error to a primary output
Composite logic values are written v/v_f, where v and v_f are the values of the same signal in N and N_f, the fault-free and the faulty circuit respectively.
The composite logic values 1/0 (denoted by D) and 0/1 (denoted by D’) represent errors.
We have this logic behavior:
–D+D’=1, D.D’=0, D+D=D.D=D, D’+D’=D’.D’=D’, D+0=D, D’+0=D’, …
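The composite-value identities listed above follow directly if each value is stored as a (fault-free, faulty) pair and gates are applied to both components. A minimal sketch, assuming that tuple encoding; the names ZERO, ONE, D, DBAR, c_or, c_and and c_not are illustrative, not from the lecture:

```python
# Composite values as (value in fault-free circuit N, value in faulty circuit N_f)
ZERO, ONE = (0, 0), (1, 1)
D, DBAR = (1, 0), (0, 1)   # D = 1/0, D' = 0/1

def c_or(x, y):
    """OR applied independently in the good and the faulty circuit."""
    return (x[0] | y[0], x[1] | y[1])

def c_and(x, y):
    """AND applied independently in the good and the faulty circuit."""
    return (x[0] & y[0], x[1] & y[1])

def c_not(x):
    """Inversion applied independently in the good and the faulty circuit."""
    return (1 - x[0], 1 - x[1])

# A few of the identities from the slide
print(c_or(D, DBAR) == ONE)    # D + D' = 1
print(c_and(D, DBAR) == ZERO)  # D . D' = 0
print(c_or(D, ZERO) == D)      # D + 0 = D
```

An error survives a gate only when the result is still D or D’; for example D AND 0 collapses to 0, which is why the side inputs of an AND gate on the propagation path must be set to 1.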
Test generation for the fault l stuck-at-v in a fanout-free circuit
begin
    set all values to x
    Justify(l, v’)
    if v = 0 then propagate(l, D)
    else propagate(l, D’)
end
Example
[Figure: fanout-free circuit with signals a through j and a stuck-at-0 fault, shown before and after test generation; the annotated signal values are x, 0, D, 0, 1, D, D]
Circuits with Fanout
Two basic goals: fault activation and error propagation.
Fanout: there are several ways to propagate an error to a PO.
Fundamental difficulty:
–Reconvergent fanout: the resulting line justification problems are no longer independent
Example
[Figure: circuit with inputs a, b, c, d, e, signals f1, f2, gates G1 to G6 and an s-a-1 fault]
The only vector that can test the fault is 111x0.
Another Example
[Figure: circuit with signals a through s and an s-a-1 fault]
Fault Simulation
Applying a set of vectors to a structural (netlist) description of a design and determining how many and which faults are detected out of the total set of available faults.
Concurrent fault simulation
–Applies the vectors to many copies of the netlist at the same time.
–Each copy contains one or more faults.
–Each of these simulations is run concurrently with a good-circuit simulation.
–If a difference is observed at a legal observation point between the good circuit and any faulty-circuit simulation, the fault is listed as detected.
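The idea of comparing a good circuit against faulty copies can be sketched with a serial fault simulator, which is simpler than the concurrent scheme described above: it simulates one faulty copy at a time instead of many at once. The three-gate netlist z = (a AND b) OR (NOT c) and all names below are hypothetical, chosen only for illustration.

```python
# Hypothetical netlist: (output node, gate type, input nodes), in topological order
NETLIST = [("n1", "AND", ("a", "b")),
           ("n2", "NOT", ("c",)),
           ("z", "OR", ("n1", "n2"))]
INPUTS = ("a", "b", "c")
OUTPUTS = ("z",)

OPS = {"AND": lambda v: int(all(v)),
       "OR": lambda v: int(any(v)),
       "NOT": lambda v: 1 - v[0]}

def simulate(vector, fault=None):
    """Evaluate the netlist; `fault` is (node, stuck_value) or None."""
    values = {}

    def put(node, value):
        # A stuck-at fault overrides whatever value the node would take
        if fault is not None and fault[0] == node:
            value = fault[1]
        values[node] = value

    for name, bit in zip(INPUTS, vector):
        put(name, bit)
    for out, op, ins in NETLIST:
        put(out, OPS[op]([values[i] for i in ins]))
    return tuple(values[o] for o in OUTPUTS)

def fault_coverage(vectors):
    """Return (set of detected faults, fraction of all single stuck-at faults)."""
    nodes = list(INPUTS) + [gate[0] for gate in NETLIST]
    faults = [(n, s) for n in nodes for s in (0, 1)]
    detected = set()
    for vec in vectors:
        good = simulate(vec)
        for f in faults:
            # Fault dropping: skip faults already detected by an earlier vector
            if f not in detected and simulate(vec, f) != good:
                detected.add(f)
    return detected, len(detected) / len(faults)

_, coverage = fault_coverage([(1, 1, 1), (0, 1, 1)])
print(coverage)
```

For this netlist, adding the vectors (1, 0, 1) and (0, 0, 0) to the two above brings the coverage of the twelve single stuck-at faults to 100%.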
What can we do to increase testability?
Increase observability:
–Add more pins (can be a problem)
–Add a small “probe” bus and selectively enable different values onto the bus
–Compress a sequence of values (for example, the value of a bus over many clock cycles) into a small number of bits for later read-out
Increase controllability:
–Use multiplexers to isolate sub-modules and select sources of test data as inputs
–Provide easy setup of internal states
Test approaches
Scan-based testing
Built-in self test
Scan-based technique
Minimize the use of additional I/O pins for testing.
Use scan registers with both shift and parallel-load capabilities.
Storage cells in registers act as observation points, control points, or both.
Reduce testing of a sequential circuit to that of a combinational circuit.
Scan: the idea
Two modes of operation: normal mode, and a test mode in which all registers are chained into one long shift register that can be loaded and read out serially.
[Figure: scan-based structure; registers between blocks of combinational logic are chained from Scan-in to Scan-out]
Scan-based structure
Scan-path register
[Figure: scan-path register with signals scanin, scan, in, out, scanout, keep, load and two clock phases]
Scan-based Test Operation
[Figure: four scan latches with inputs In0 to In3 and outputs Out0 to Out3, chained from scanin to scanout and controlled by the test signal and two clock phases]
N cycles of scan-in, 1 cycle of evaluation, N cycles of scan-out.
Testing time per test pattern increases due to the shifting time in a long register.
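The cycle counts above give a quick estimate of scan test time. The sketch below assumes a single scan chain of N cells; the `overlap` option, where the scan-out of one pattern is shifted out while the next pattern is shifted in, is a common practice but an assumption here, since the slide does not mention it.

```python
def scan_test_cycles(chain_length, n_patterns, overlap=False):
    """Clock cycles needed to apply n_patterns through one scan chain."""
    if overlap:
        # Each pattern: N shift cycles + 1 evaluation; one final N-cycle scan-out
        return n_patterns * (chain_length + 1) + chain_length
    # Each pattern: N scan-in + 1 evaluation + N scan-out
    return n_patterns * (2 * chain_length + 1)

print(scan_test_cycles(1000, 100))                # 200100
print(scan_test_cycles(1000, 100, overlap=True))  # 101100
```

Either way the cost grows linearly with chain length, which is the reason long chains are often split into several shorter ones driven in parallel.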
Scan-path Testing
[Figure: datapath with registers (Reg), an adder (+) and a comparator (>), inputs in0 and in1, output out, and a scan path from scanin to scanout]
JTAG – Boundary Scan
Used for testing PCBs and multichip modules with many pins.
Shift registers are placed in each chip close to the I/O pins in order to form a chain around the board for testing.
[Figure: PCB with the scan path through the chips]
Built-In Self Test – The idea
Problem: the scan-based approach is very useful for testing combinational logic but can be impractical for testing memory blocks, because of the number of separate test values required to get adequate fault coverage.
Solution: use on-chip circuitry to generate test data and check the results. It can be run at every power-on to verify correct operation.
Generate pseudo-random data for most circuits using, for example, a linear feedback shift register (LFSR).
For the pseudo-random input data, compute the output values and compare them against an expected “signature” at the end of the test.
Built-in self test
Parts of the circuit are used to test the circuit itself.
Essential circuit modules:
–Pseudo-random pattern generator (PRPG)
–Output response analyzer (ORA)
PRPG using LFSR
LFSR: linear feedback shift register
[Figure: LFSR with stages Q0, Q1, Q2]
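A pseudo-random pattern generator of this kind can be sketched in a few lines. The feedback taps of the LFSR in the figure are not recoverable from the text, so the 3-bit register below, with the new Q0 formed as Q1 XOR Q2, is an assumed configuration chosen because it cycles through all seven non-zero states; the function names are likewise illustrative.

```python
from itertools import islice

def lfsr_states(seed, taps):
    """Yield successive states (Q0, Q1, ...) of a shift register with XOR feedback."""
    state = tuple(seed)
    while True:
        yield state
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        # Shift right by one stage and insert the feedback bit at Q0
        state = (feedback,) + state[:-1]

def period(seed, taps):
    """Number of clock cycles until the register returns to its seed."""
    states = lfsr_states(seed, taps)
    first = next(states)
    for count, state in enumerate(states, start=1):
        if state == first:
            return count

patterns = list(islice(lfsr_states((1, 0, 0), (1, 2)), 7))
print(patterns)
print(period((1, 0, 0), (1, 2)))  # 7 = 2**3 - 1
```

The all-zero state maps to itself, so the register must be seeded with a non-zero value before it is used as a pattern source.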
Signature Analysis
To reduce chip area, data compression schemes are used so that compacted test responses are compared instead of the entire raw test data.
Signature analysis is based on cyclic redundancy checking.
It uses polynomial division: the polynomial representation of the test output data is divided by a characteristic polynomial, and the remainder is taken as the signature.
The signature is then compared with the expected signature to determine whether the device is faulty or not.
Sometimes a fault goes undetected because the faulty response produces the same signature (aliasing).
ORA by LFSR
The signature is the content of this register after the last input bit has been sampled.
The input sequence {an} is represented by G(x) and the output sequence by Q(x):
$G(x) = Q(x)P(x) + R(x)$
where P(x) is the characteristic polynomial of the LFSR and R(x) is the remainder.
For the above LFSR we have $P(x) = 1 + x^{2} + x^{4} + x^{5}$.
For the input sequence [ ], $G(x) = x^{7} + x^{6} + x^{5} + x^{4} + x^{2} + 1$ and the remainder term is $R(x) = x^{4} + x^{2}$, which corresponds to the register content [00101].
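The remainder quoted above can be reproduced by carrying out the polynomial division over GF(2), where addition and subtraction are both XOR. In this sketch polynomials are stored as Python integers with bit i holding the coefficient of x^i; that encoding and the helper names are assumptions made for illustration.

```python
def poly_from_exponents(exponents):
    """Build a GF(2) polynomial; bit i of the result is the coefficient of x^i."""
    value = 0
    for e in exponents:
        value ^= 1 << e
    return value

def gf2_divmod(g, p):
    """Return (quotient, remainder) of g(x) / p(x) with coefficients in GF(2)."""
    quotient = 0
    deg_p = p.bit_length() - 1
    while g.bit_length() - 1 >= deg_p:
        shift = (g.bit_length() - 1) - deg_p
        quotient ^= 1 << shift
        g ^= p << shift      # subtract (XOR) the shifted divisor
    return quotient, g

P = poly_from_exponents([0, 2, 4, 5])        # 1 + x^2 + x^4 + x^5
G = poly_from_exponents([7, 6, 5, 4, 2, 0])  # x^7 + x^6 + x^5 + x^4 + x^2 + 1
Q, R = gf2_divmod(G, P)

# Coefficients of x^0 .. x^4, in the same order as the register content
signature = [(R >> i) & 1 for i in range(5)]
print(signature)  # [0, 0, 1, 0, 1]
```

The quotient comes out as x^2 + 1 and the remainder as x^4 + x^2, matching the register content [00101] when the bits are read from x^0 to x^4.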
ORA
On-chip storage of a fault dictionary containing all test inputs with the corresponding outputs is too expensive.
A simple alternative is to compare the outputs of two identical circuits for the same input.
–Cannot detect the case where both circuits have the same fault
Self-checking design: detect faults autonomously during on-line operation.
–A checker circuit is inserted that generates and sends out a signal when on-line faults occur.
Built-in Logic Block Observer (BILBO)
A form of ORA, used in each cluster of partitioned registers.
Allows monitoring of circuit operation through exclusive-ORing into the LFSR at multiple points, which corresponds to a signature analyzer with multiple inputs.
C0   C1   Mode
0    0    linear shift
1    0    signature analysis
1    1    data latch
0    1    reset
BILBO Application
[Figure: registers BILBO-A and BILBO-B placed around blocks of combinational logic, with signals In, Out, scanIn and scanOut]
Memory Self-Test
[Figure: an FSM drives Data-in and Address & R/W control of the memory under test; Data-out goes to signature analysis]
Patterns:
–Writing/reading 0s, 1s
–Walking 0s, 1s
–Galloping 0s, 1s
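As an illustration of the pattern families listed above, the sketch below generates walking-1 and walking-0 data words and runs them against a toy memory model. It is a simplified word-level version (the patterns walk across the bits of one word rather than across every cell of the array), and the Memory class with its optional stuck bit is a hypothetical stand-in for the memory under test.

```python
def walking_ones(width):
    """A single 1 moving through a background of 0s."""
    return [1 << i for i in range(width)]

def walking_zeros(width):
    """A single 0 moving through a background of 1s."""
    mask = (1 << width) - 1
    return [mask ^ (1 << i) for i in range(width)]

class Memory:
    """Toy memory; `stuck` = (address, bit, value) models one stuck-at cell."""
    def __init__(self, n_words, stuck=None):
        self.cells = [0] * n_words
        self.stuck = stuck

    def write(self, addr, data):
        self.cells[addr] = data

    def read(self, addr):
        data = self.cells[addr]
        if self.stuck and self.stuck[0] == addr:
            _, bit, value = self.stuck
            data = (data & ~(1 << bit)) | (value << bit)
        return data

def run_memory_test(memory, n_words, width):
    """Write and read back every pattern at every address; True means pass."""
    for pattern in walking_ones(width) + walking_zeros(width):
        for addr in range(n_words):
            memory.write(addr, pattern)
            if memory.read(addr) != pattern:
                return False
    return True

print(run_memory_test(Memory(4), 4, 8))                   # True
print(run_memory_test(Memory(4, stuck=(2, 3, 0)), 4, 8))  # False
```

In a real memory BIST the FSM would generate these addresses and data on chip, and the read data would typically be compacted into a signature rather than compared word by word.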
Current monitoring: IDDQ – The idea
CMOS logic should draw no current when it’s not switching.
So after initializing the circuit and disabling pseudo-NMOS gates, the power supply current should be zero once all signals have settled.
Good for detecting short faults.
Several different circuit states need to be tried to ensure all parts of the chip have been observed.
Current Monitoring: IDDQ Test
Under a bridging fault, the static current drawn from the power supply is much larger than the leakage current.
Tests different situations:
–Gate oxide short
–Channel punch-through
–P-n diode leakage
–Transmission-gate defect
The IDDQ test only needs sensitization, not propagation.
It is less effective for open-drain and open-gate defects.