A Highly Testable Pass Transistor Based Structured ASIC Design Methodology
Kanupriya Gulati, Nikhil Jayakumar, Sunil P. Khatri


Motivation for Structured ASICs

| Process (microns) | 2.0 | 0.8 | 0.6 | 0.35 | 0.25 | 0.18 | 0.13 | 0.1 |
|---|---|---|---|---|---|---|---|---|
| Single mask cost ($K) | 1.5 | 1.5 | 2.5 | 4.5 | 7.5 | 12 | 40 | 60 |
| # of masks | 12 | 12 | 12 | 16 | 20 | 26 | 30 | 34 |
| Mask set cost ($K) | 18 | 18 | 30 | 72 | 150 | 312 | 1000 | 2000 |

- A full set of lithography masks can cost between $1M and $3M.
- Roughly 25% reduction in ASIC design starts in the past 7 years. [Sematech Annual Report 2002], [A. Sangiovanni-Vincentelli, "The Tides of EDA", keynote talk, DAC 2003]

Our Solution

- Use a regular array of pass transistor logic based if-then-else (ITE) cells, with flip-flops along the edges of the die, as the underlying circuit structure.
- Stock such arrays pre-processed up until the metallization step.
- Or, use previously generated masks for all other layers and use new masks for only the METAL and VIA layers.
- To create an ASIC for a given design, technology-map this design to the smallest available array.
- Only METAL and VIA masks require changes.

Advantages

- Can share masks for several layers.
  - Reduces NRE.
- No need for the designer to worry about DFM issues.
  - Improved yield.
- New designs can be implemented faster.
- Task of engineering change simplified: a design modification requires only METAL and VIA mask changes.
- Generating test patterns for such a design is easy.
  - 100% test coverage in time linear in the size of the network.
  - No redundant faults in the design.

The Gap between FPGA and ASIC

FPGA
- Low speed
- High power
- Cost-effective for low volume products

ASIC
- High speed
- Low power
- Cost-effective for high volume products
- Necessary for products requiring high performance or low power

What bridges the gap?

Taxonomy of Regular Logic Fabrics

"Exploring Regular Fabrics to Optimize the Performance-Cost Trade-off", L. Pileggi et al.

- As we move further away from standard cell (ASIC), we lose
  - Area
  - Speed
  - Power
- As we move closer to FPGAs, we gain
  - Flexibility
  - Lower NRE

(Figure label: "Our Approach")

Overview

- Convert a logic netlist to a partitioned Reduced Ordered Binary Decision Diagram (ROBDD).
- Each ROBDD node is implemented as an ITE cell.
- Place these ITE cells in an area- and delay-efficient manner on a pre-fabricated array of ITE cells.
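The mapping from ROBDD nodes to ITE cells can be illustrated with a minimal functional sketch. The class name `IteCell`, its field names and the example function `z = x1 AND x2` are hypothetical and chosen only for illustration. The sketch models the logic of a cell (output = T if the select variable is 1, else E), not the pass-transistor circuit itself.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class IteCell:
    """One ROBDD node viewed as a 2:1 multiplexer (hypothetical model)."""
    var: str                            # select variable i
    then_in: Union["IteCell", bool]     # T input, taken when var is 1
    else_in: Union["IteCell", bool]     # E input, taken when var is 0


def evaluate(node: Union[IteCell, bool], inputs: Dict[str, bool]) -> bool:
    """Follow the selected T/E input from the root until a constant is reached."""
    while isinstance(node, IteCell):
        node = node.then_in if inputs[node.var] else node.else_in
    return node


# ROBDD of z = x1 AND x2 under the variable order x1 < x2: two ITE cells.
x2_cell = IteCell("x2", True, False)
z = IteCell("x1", x2_cell, False)
```

Calling `evaluate(z, {"x1": True, "x2": True})` walks both cells and returns `True`; any assignment with `x1` false never reaches the `x2` cell.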

ITE Cell Structure

- Used an NMOS pass-gate based structure.
- Each ITE cell generates a buffered output and its complement.
- The delay of the NMOS pass-gate ITE cell was found to be similar to that of a CMOS pass-gate based ITE cell, with a smaller area.
  - Probably due to the increased diffusion capacitance in CMOS pass-gates.

(Figure: ITE cell schematic; labels T, E, i, i, out.)

ITE Cell Design

- MUX control signals run along the length of the cell.
- Each ITE cell has 3 variable signals and 3 complemented variable signals running horizontally in metal 3.
- Appropriate placement of stacked vias at the horizontal metal 3 wires allows the ITE cell to be connected to any one of the 3 variables in the corresponding row of the array.
- Metal layers 1 and 2 are used for most of the layout; metal layer 3 is used to route variables and their complements.

(Figure: ITE cell layout; labels i1, i2, i3 in pairs, VDD, GND.)

Synthesis – Partitioned ROBDD

- Synthesis of the logic netlist into a partitioned ROBDD structure is done in VIS.
- Primary input variables are ordered using a DFS ordering.
- Enable dynamic variable ordering before building ROBDDs.
- Do bottom-up construction of ROBDDs.
  - Let the set of variables in the ROBDD manager be V (initially the PIs).
  - If the size of any ROBDD > user-specified threshold B:
    - Introduce a new variable v (intermediate ROBDD variable) and continue building ROBDDs on the set of variables V ∪ {v}.
- Results in a series of ROBDDs.
  - Size of each ROBDD is bounded by B.
  - The output of each of these ROBDDs represents either a primary output or an intermediate ROBDD variable.
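The threshold-based partitioning above can be sketched with a toy ROBDD manager. This is not the VIS implementation: the `BDD` class, `build_partitioned`, `simulate` and the four-gate `NETLIST` are all hypothetical, the variable order is fixed (no dynamic reordering), gates have two inputs, and the cut rule is an assumption (when a gate's ROBDD exceeds B, its non-variable fanins become new intermediate variables and the gate is rebuilt). With B of at least 3 this keeps every stored ROBDD within the bound.

```python
from itertools import product


class BDD:
    """Toy ROBDD manager with a fixed variable order (illustrative only)."""

    def __init__(self):
        self.order = []    # variable names; list position = level
        self.nodes = {}    # node id -> (level, low child, high child)
        self.unique = {}   # (level, low, high) -> node id
        self.cache = {}    # memo table for ite()

    def var(self, name):
        if name not in self.order:
            self.order.append(name)
        return self.mk(self.order.index(name), 0, 1)

    def mk(self, level, lo, hi):
        if lo == hi:                       # redundant test: no node needed
            return lo
        key = (level, lo, hi)
        if key not in self.unique:         # share isomorphic subgraphs
            self.unique[key] = len(self.nodes) + 2   # ids 0, 1 are terminals
            self.nodes[self.unique[key]] = key
        return self.unique[key]

    def cofactor(self, u, level, value):
        if u > 1 and self.nodes[u][0] == level:
            return self.nodes[u][2] if value else self.nodes[u][1]
        return u

    def ite(self, f, g, h):
        if f == 1:
            return g
        if f == 0:
            return h
        if g == h:
            return g
        key = (f, g, h)
        if key not in self.cache:
            top = min(self.nodes[u][0] for u in (f, g, h) if u > 1)
            hi = self.ite(*(self.cofactor(u, top, 1) for u in (f, g, h)))
            lo = self.ite(*(self.cofactor(u, top, 0) for u in (f, g, h)))
            self.cache[key] = self.mk(top, lo, hi)
        return self.cache[key]

    def size(self, u):
        """Number of internal nodes reachable from u."""
        seen, stack = set(), [u]
        while stack:
            n = stack.pop()
            if n > 1 and n not in seen:
                seen.add(n)
                stack.extend(self.nodes[n][1:])
        return len(seen)

    def evaluate(self, u, env):
        while u > 1:
            level, lo, hi = self.nodes[u]
            u = hi if env[self.order[level]] else lo
        return bool(u)


def gate(bdd, op, a, b):
    if op == "and":
        return bdd.ite(a, b, 0)
    if op == "or":
        return bdd.ite(a, 1, b)
    if op == "xor":
        return bdd.ite(a, bdd.ite(b, 0, 1), b)
    raise ValueError(op)


def build_partitioned(netlist, primary_inputs, B):
    """netlist: topologically ordered (output, op, (fanin, fanin)) tuples."""
    bdd = BDD()
    func = {pi: bdd.var(pi) for pi in primary_inputs}
    partitions = {}                        # name -> ROBDD root, in build order
    for out, op, (x, y) in netlist:
        f = gate(bdd, op, func[x], func[y])
        if bdd.size(f) > B:                # too large: cut at the fanins
            for name in (x, y):
                if name not in bdd.order:  # not yet a variable
                    partitions[name] = func[name]
                    func[name] = bdd.var(name)
            f = gate(bdd, op, func[x], func[y])
        func[out] = f
    last = netlist[-1][0]                  # assumption: last gate is the PO
    partitions[last] = func[last]
    return bdd, partitions


def simulate(bdd, partitions, env):
    """Evaluate the partitions in build order, feeding outputs forward."""
    env = dict(env)
    for name, root in partitions.items():
        env[name] = bdd.evaluate(root, env)
    return env


NETLIST = [("n1", "and", ("x1", "x2")), ("n2", "or", ("x3", "x4")),
           ("n3", "xor", ("n1", "n2")), ("z", "and", ("n3", "x1"))]
```

With `B = 3`, `build_partitioned(NETLIST, ["x1", "x2", "x3", "x4"], 3)` cuts at `n1`, `n2` and `n3`, so the design becomes four small ROBDDs; with `B = 1000` no cut occurs and `z` is a single ROBDD over the primary inputs.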

Example

- Given a multi-level logic network with primary inputs {x1, x2, x3, x4}.
- As bottom-up ROBDD construction proceeds, new variables y1 and y2 are created.
- z is built in terms of {y1, y2}.

(Figure: the network over x1, x2, x3, x4 and the resulting ROBDDs for y1, y2 and z.)

Placement

- First, replicate ITE cells whose outputs are heavily loaded, in order to limit fanout.
  - These correspond to ROBDD nodes with high in-degrees.
  - If the in-degree of an ROBDD node is k, the node is replicated a number of times that depends on k and a fanout limit K (the expression is missing from this transcript).
  - We use K = 3.
- Compute an initial estimate of the number of ITE cells n in any row of the ITE array and the number of rows m of the ITE array (the formulas are missing from this transcript), where
  - x = width of each ITE cell
  - y = height of each ITE cell
  - N = total number of ITE cells

Placement

- Sort the N ITE cells in increasing order of their ROBDD variable index.
  - The variable index is a measure of the closeness of a variable to the root of the ROBDD.
  - A variable closer to the root has a smaller index than one further from the root.
- Assign ITE cells to rows of the ITE array.

Assigning ITE cells to rows

- If there are n_j ITE cells with variable index v_j such that n_j > n (n = number of ITE cells that can fit in one row):
  - The ITE cells need to span rows.
  - Sort these n_j cells in decreasing order of a cost C (the cost expression is missing from this transcript), where
    - c_i = children of node c
    - c_j = parents of node c
  - Helps keep routes short.

(Figure: nodes a and b in an ROBDD spanning levels 2 to 6. Cost(a) = 5 – 2 = 3, Cost(b) = 3 – 3 = 0.)

Assigning ITE cells to rows

- If there are n_j ITE cells with variable index v_j such that n_j < n:
  - Attempt to populate the corresponding row of the ITE array with additional ITE cells with variable index v_{j+1}.
  - If the row is still not full, add ITE cells with variable index v_{j+2} as well.
  - Each row can hold ITE cells which depend on at most 3 variables, since the number of variables that can be routed over any ITE cell is 3.
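The row-filling rules on the placement slides can be approximated by a simple greedy pass. The function `assign_rows` and its input format are hypothetical. The sketch assumes cells are taken in variable-index order and a new row is opened whenever the current row is full or a fourth distinct variable would be needed. The cost-based ordering of cells that spill across rows is not modelled, since its formula is not available here.

```python
from typing import List, Tuple


def assign_rows(cells: List[Tuple[str, int]], n: int,
                max_vars: int = 3) -> List[List[str]]:
    """Greedy row assignment sketch.

    cells: (cell name, ROBDD variable index) pairs.
    n: number of ITE cells that fit in one row.
    max_vars: variables routable over a row (3 in the slides).
    """
    rows: List[List[str]] = []
    row: List[str] = []
    row_vars = set()
    # sorted() is stable, so cells with equal index keep their input order.
    for name, var_index in sorted(cells, key=lambda c: c[1]):
        row_full = len(row) == n
        too_many_vars = var_index not in row_vars and len(row_vars) == max_vars
        if row_full or too_many_vars:
            rows.append(row)
            row, row_vars = [], set()
        row.append(name)
        row_vars.add(var_index)
    if row:
        rows.append(row)
    return rows
```

For example, seven cells with variable indices 0, 1, 1, 2, 2, 2, 3 and `n = 4` fill the first row with the index-0, index-1 and first index-2 cells, and the remaining index-2 cells share the second row with the index-3 cell.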

Placement of ITE cells within rows

- ITE cells are arranged within rows to reduce crossings in the induced circuit graph (after planarization of the array of ITE cells).
- Use DOT (graphviz.org) to do this.
  - DOT only re-arranges cells in each ITE row, in a manner that minimizes graph crossings.
  - DOT is not allowed to modify the assignment of ITE cells to rows.
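One way to hand a fixed row assignment to DOT is to emit each row as a `rank=same` subgraph, which is a standard Graphviz attribute; DOT then chooses the left-to-right order within each rank while reducing edge crossings. How the authors actually constrained DOT is not stated in the slides, so the helper `rows_to_dot` below is only a hypothetical sketch of that idea.

```python
from typing import List, Tuple


def rows_to_dot(rows: List[List[str]], edges: List[Tuple[str, str]]) -> str:
    """Emit a DOT graph in which every ITE row is one rank=same group."""
    lines = ["digraph ite_array {", "  rankdir=TB;"]
    for row in rows:
        members = " ".join(f'"{cell}";' for cell in row)
        lines.append("  { rank=same; " + members + " }")
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_dot([["a", "b"], ["c"]], [("a", "c"), ("b", "c")]))
```

The resulting text can be fed to the `dot` program (for instance with the `-Tplain` output format) to read back the horizontal order it chose for each row.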

Implementing Sequential Designs

- Each row of ITE cells has a bank of 3 flip-flops.
- Outputs of the flops can drive one of the inputs by means of a METAL and VIA mask change.

Route

- Use WROUTE (in Cadence's Silicon Ensemble for DSM) to route the ITE cell array.
- Use 4 metal layers for the route.

Example: alu2

Summary of Design Flow

- Convert the netlist to a partitioned ROBDD in VIS.
- Perform cell replication if required to limit fanout.
- Perform ITE cell assignment to rows.
- Re-arrange ITE cells within rows using DOT to minimize crossings in the graph induced by the interconnections among the ITE cells.
- Use the result of DOT as the final placement and perform routing using WROUTE (or any other routing tool).

Ease of Testability

- In traditional scanned standard-cell based circuits:
  - The ATPG problem is NP-complete.
- In our scanned ITE cell based approach:
  - In functional mode
    - Partitioned ROBDD outputs are regular inputs to other partitions.
  - In test mode
    - Primary inputs and the outputs of each partition are scanned in, to allow independent testability of the different partitions.

Abstract View of Partitioned ROBDDs

(Figure: a chain of partitioned ROBDDs between the PIs and the PO z; the intermediate variables, such as y1 and y2, are additional scan-able nodes.)

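Ease of Testability - Excitation

(Figure: ROBDD of a function with a path marked between two nodes; the function and node labels are not preserved in this transcript.)

- Linear-time BDD operation.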

Ease of Testability - Propagation

(Figure: ROBDD of a function with a path marked between two nodes; the function and node labels are not preserved in this transcript.)

- Again a linear-time BDD operation.
- Support variables for both conditions are non-overlapping.
  - Circuit is guaranteed irredundant.
  - 100% stuck fault coverage guaranteed in time linear in the size of the circuit.

Experiments

- To compare with standard-cell based design, the circuits were mapped to a library of 20 gates.
  - Used SIS for optimization (script.rugged) and mapping.
  - Placement and routing done using SEDSM, with a 0.1um process and 4 metal layers.
- Delay of standard-cell based designs:
  - Pre-characterized the library using SPICE (0.1um BPTM).
  - Used the sense package in SIS.
  - "sense" returns the longest sensitizable path (false paths are implicitly ignored).

Experiments

- Partitioned ROBDD construction done using the "frontier method" in VIS.
- Tried the following partitioning threshold values (B):
  - 5, 10, 15, 20 and 1000.
- For each circuit, the result that yielded the smallest number of ROBDD nodes was selected.
- This partitioned ROBDD structure was then taken through our design flow.

Experiments

- Delay of ITE cell array:
  - Found by traversing the longest topological path (in terms of number of ITE cells) between any circuit PI and PO.
  - Delay at each ITE cell is given by:
    - If the variable is a primary input:
      D(cell) = MAX[D(leftchild), D(rightchild)] + D(ITE block)
    - If the variable is an internal node:
      D(cell) = MAX[D(variable), D(leftchild), D(rightchild)] + D(ITE block)
  - D(ITE block) found from SPICE simulations (0.1um BPTM).
  - Assumed that the ITE cell drove the maximum load allowed; hence delay estimates are conservative.
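The delay recurrence above can be sketched as a memoized traversal. The function `array_delay`, the dictionary layout and the example value of 100 for D(ITE block) are hypothetical. The sketch also assumes that primary inputs and constant leaves arrive at time 0, which the slides do not state.

```python
from functools import lru_cache
from typing import Dict, Optional, Set, Tuple

# cell name -> (select variable, left child, right child);
# a child of None stands for a constant leaf.
Cell = Tuple[str, Optional[str], Optional[str]]


def array_delay(cells: Dict[str, Cell], primary_inputs: Set[str],
                d_ite: float) -> float:
    """Worst-case delay over all cells using the slide's recurrence."""

    @lru_cache(maxsize=None)
    def d(name: Optional[str]) -> float:
        if name is None or name in primary_inputs:
            return 0.0                      # assumption: arrives at time 0
        var, left, right = cells[name]
        arrivals = [d(left), d(right)]
        if var not in primary_inputs:       # select driven by an internal node
            arrivals.append(d(var))
        return max(arrivals) + d_ite

    return max(d(name) for name in cells)


example = {
    "c1": ("x1", None, None),   # select is a PI, both inputs constant
    "c2": ("x2", "c1", None),   # one input comes from c1
    "c3": ("c2", None, None),   # select is the output of c2
}
```

With `array_delay(example, {"x1", "x2"}, 100.0)` the longest chain is c1, c2, c3, giving three ITE block delays.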

Results (Combinational designs)

- Delay penalty is ~2X.
- Area penalty is ~6X.
- FPGAs typically have a 25X delay penalty and a 10X area penalty.

| Ckt. | Delay: StdCell | Delay: ITE | Delay: Ovh | Area: StdCell | Area: ITE | Area: Ovh |
|---|---|---|---|---|---|---|
| alu2 | 770 | 500 | 0.65 | 1314.1 | 2560 | 1.95 |
| alu4 | 1020 | 527 | 0.52 | 2500 | 5068.8 | 2.03 |
| apex6 | 500 | 1310 | 2.57 | 2678.1 | 14585.6 | 5.45 |
| apex7 | 440 | 1030 | 2.34 | 885.1 | 4608 | 5.21 |
| C1908 | 880 | 2590 | 2.91 | 1827.6 | 8288 | 4.53 |
| C3540 | 1250 | 3050 | 2.44 | 4323.1 | 29491.2 | 6.82 |
| C432 | 930 | 3070 | 3.3 | 715.6 | 4640 | 6.48 |
| C499 | 600 | 1070 | 1.78 | 1827.6 | 3974.4 | 2.17 |
| C880 | 1210 | 2750 | 2.27 | 1463.1 | 8985.6 | 6.14 |
| dalu | 1110 | 2460 | 2.22 | 3164.1 | 39916.8 | 12.62 |
| frg2 | 810 | 1700 | 2.1 | 2575.6 | 24441.6 | 9.49 |
| i8 | 880 | 1560 | 1.77 | 4064.1 | 40320 | 9.92 |
| i9 | 850 | 810 | 0.95 | 2383.2 | 14035.2 | 5.89 |
| t481 | 720 | 600 | 0.83 | 2626.6 | 6080 | 2.31 |
| term1 | 320 | 730 | 2.28 | 663.1 | 2355.2 | 3.55 |
| too_large | 510 | 1550 | 3.04 | 1105.6 | 10560 | 9.55 |
| vda | 650 | 600 | 0.92 | 1508.03 | 6080 | 4.03 |
| x1 | 380 | 950 | 2.5 | 1105.6 | 9625.6 | 8.71 |
| x3 | 510 | 1660 | 3.25 | 2756.25 | 16844.8 | 6.11 |
| x4 | 440 | 650 | 1.48 | 1314.1 | 11264 | 8.57 |
| Avg | | | 2.01 | | | 6.08 |

Results (Sequential designs)

- Delay penalty is ~1.6X.
- Area penalty is ~3.4X.
- FPGAs typically have a 25X delay penalty and a 10X area penalty.

| Ckt. | Delay: StdCell | Delay: ITE | Delay: Ovh | Area: StdCell | Area: ITE | Area: Ovh |
|---|---|---|---|---|---|---|
| s1488 | 630 | 650 | 1.03 | 3277.6 | 6240 | 1.9 |
| s1494 | 650 | 600 | 0.92 | 3108.1 | 6400 | 2.06 |
| s208 | 270 | 550 | 2.04 | 105.1 | 1459.2 | 13.88 |
| s344 | 390 | 650 | 1.67 | 715.6 | 2649.6 | 3.7 |
| s349 | 410 | 650 | 1.59 | 742.6 | 2649.6 | 3.57 |
| s386 | 290 | 550 | 1.9 | 885.1 | 2060.8 | 2.33 |
| s444 | 380 | 700 | 1.84 | 1105.6 | 2880 | 2.6 |
| s510 | 390 | 400 | 1.03 | 1105.6 | 3161.6 | 2.86 |
| s526 | 330 | 700 | 2.12 | 1314.1 | 2355.2 | 1.79 |
| s526n | 330 | 700 | 2.12 | 1314.1 | 2457.6 | 1.87 |
| s820 | 560 | 650 | 1.16 | 1827.6 | 3968 | 2.17 |
| s832 | 570 | 650 | 1.14 | 1827.6 | 3968 | 2.17 |
| Avg | | | 1.55 | | | 3.41 |

Speed-up of ATPG

- ATPG is about 30X faster for ITE cell based circuits.
- ITE based circuits are guaranteed irredundant and 100% testable in linear time.

| Ckt | Regular ATPG (SIS) | ATPG for ITE | Improve |
|---|---|---|---|
| C1908 | 0.78 | 0.02 | 39.00 |
| C3540 | 4.84 | 0.02 | 242.00 |
| C432 | 0.1 | 0.52 | 0.19 |
| C499 | 0.32 | 0.01 | 32.00 |
| C880 | 0.16 | 0.01 | 16.00 |
| frg2 | 17.21 | 0.45 | 38.24 |
| i8 | 16.26 | 0.16 | 101.63 |
| i9 | 0.6 | 0.03 | 20.00 |
| apex7 | 0.05 | 0.04 | 1.25 |
| x3 | 1.95 | 0.19 | 10.26 |
| apex6 | 0.94 | 0.27 | 3.48 |
| term1 | 0.56 | 0.02 | 28.00 |
| alu2 | 0.3 | 0.02 | 15.00 |
| alu4 | 1.47 | 0.47 | 3.13 |
| too_large | 8.83 | 0.41 | 21.54 |
| vda | 3.42 | 4.37 | 0.78 |
| x1 | 0.26 | 0.43 | 0.60 |
| x4 | 0.32 | 0.28 | 1.14 |
| Avg. | | | 31.90 |

Conclusions

- We have a method that can implement circuits quicker and with NRE amortized over a large number of designs.
  - Strikes a reasonable compromise between ASICs and FPGAs.
- An ITE cell based design is easily testable.
  - 100% testable in linear time.
  - Guaranteed irredundant.
- Testability gains arise from the use of a partitioned ROBDD based PTL design approach.
  - The same gains can be reaped in a regular PTL design approach.
  - Can be modified to efficiently test for other faults.
    - Delay faults, stuck-open faults, etc.

Questions ?
