1
1 A Deep Sub-Micron VLSI Design Flow using Layout Fabrics Sunil P. Khatri University of Colorado, Boulder Amit Mehrotra University of Illinois, Urbana-Champaign Robert K Brayton Alberto L Sangiovanni-Vincentelli University of California, Berkeley
2
2 Our VLSI Design Flow Logic netlist → Logic Optimization → Optimized logic netlist → Technology Mapping → Placement → Routing → Layout
3
3 Motivation Modern IC processes Feature size well below 1 micron Certain electrical effects increasingly important Cross-talk Electromigration Self Heat Statistical variations Logic abstraction eroded Existing design paradigms need to be rethought
4
4 The cross-talk issue Research Focus Tackled in an ad-hoc manner Increases turn-around time Verified cross-talk trends Accurate 3-D capacitance extraction Delay variation 2.47:1 (200 µm wires, 10X drivers, 0.1 µm technology) [Figure: wire arrangements labelled a and v with coupling capacitances C1 and C2]
5
5 Outline Previous Approaches New idea: The Fabric Approach Fabric1 (in DAC-1999) Standard-cell based design Fabric3 (in ICCAD-2000) Network of PLA based design Further Tasks Summary
6
6 Previous Approaches [ALPHA 97] : Metal layers 3 and 6 dedicated to power Not viable in future processes [Rubio 94]: Functional analysis based on layout Post-layout methods don’t scale [Kirkpatrick 94, 96] : Concept of digital sensitivity Requires don’t-care and image computations
7
7 Solution: Layout Fabrics Repeating dense wiring fabric (DWF) pattern at minimum pitch We handle cross-talk by design A new layout and design paradigm [Figure: repeating pattern of wires labelled V, S and G]
8
8 Research Contribution Verify cross-talk trends Fabric1 [KMBSO99] (in DAC) Incorporated into traditional design flow Fabric3 [KBS00] (in ICCAD-00) Network of PLAs Detailed electrical characterization Synthesis, wire removal algorithms Both utilize DWF pattern 1.02:1 cross-talk delay variation
9
9 Layout Fabrics Advantages Pre-characterized parasitics Uniform, low cross-coupling capacitance 40X lower, 2% delay variation Uniform, low signal inductance Automatic power and ground routing Uniform, low power and ground resistance Can effectively implement regular structures Disadvantages 5% increase in total capacitance Area penalty Power increase
10
10 Capacitance in DWF Experimental setup “Strawman” process model, copper wires, low-K dielectric Capacitances from 3-D field solver (space3d) Simulated three wires in spice 0.1 micron process, Metal2 wires Length 200 microns, 10x minimum drivers Non-DWF Delay variation 2.47:1 Signal integrity problems for fast slew rates With DWF 40X reduction in cross-coupling capacitance Delay variation 1.02:1, no signal integrity problem
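The link between cross-coupling capacitance and delay variation can be illustrated with a first-order switching-factor (Miller) model. This is only a sketch under stated assumptions: the slides obtained their numbers from a 3-D field solver and SPICE, not from this formula, and the function name and the capacitance values used below are hypothetical.

```python
def delay_variation(c_ground, c_couple, neighbours=2):
    """Worst-case to best-case delay ratio of a victim wire.

    Assumption (not from the slides): delay is proportional to the
    effective load C_ground + k * C_couple summed over the neighbours,
    where the switching factor k is 0 when a neighbour switches with
    the victim and 2 when it switches against the victim.
    """
    best = c_ground                                   # k = 0 for every neighbour
    worst = c_ground + 2.0 * neighbours * c_couple    # k = 2 for every neighbour
    return worst / best


# Hypothetical capacitances in arbitrary units, for illustration only.
unshielded = delay_variation(c_ground=1.0, c_couple=0.4)
shielded = delay_variation(c_ground=1.0, c_couple=0.4 / 40)  # 40X lower coupling
```

Under this model the ratio approaches 1:1 as the coupling term shrinks relative to the capacitance to ground, which is the qualitative trend the slide reports for the DWF.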
11
11 Inductance in the DWF Low and uniform in DWF Current return path is at minimum spacing In regular layout style, varies greatly Problems reported for clock signals Compared inductance of Metal8 trace Verified using ASITIC Inductance (nH / micron)
12
12 VDD/GND Resistance in DWF Check resistance at various points in DWF Compare with standard cell case Varies greatly Measured at end of row L/W = 1000/8 VDD/GND resistance (ohms)
13
13 Buffer Insertion in DWF Easily performed VDD and GND available all over routing area
14
14 Fabric1 - Introduction DWF pattern utilized chip-wide Library cells implemented in this pattern Synthesis, placement and routing use standard cell methodology [Figure: Std Cell and Fabric Cell]
15
15 Fabric1 - Results
16
16 Fabric1 - Results
17
17 Fabric3 Network of Programmable Logic Arrays Combine many logic nodes into a PLA Routing area utilizes DWF pattern PLA implements a multi-output function example : f = a b + c ; g = a b + c [Figure: PLA with inputs a, b, c and outputs f, g, showing AND plane and OR plane]
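How a PLA implements a multi-output function can be sketched in a few lines. The AND plane forms product terms from the inputs and the OR plane combines those terms into outputs, so a term can be shared between outputs. The data layout, the name `eval_pla`, and the two functions below are hypothetical illustrations, not the slide's example.

```python
def eval_pla(and_plane, or_plane, inputs):
    """Evaluate a PLA described by its two planes.

    and_plane: list of product terms, each a dict {input_name: required_value}
    or_plane:  dict {output_name: list of product-term indices}
    inputs:    dict {input_name: 0 or 1}
    """
    # AND plane: a term is true when every literal in it is satisfied.
    terms = [all(inputs[var] == val for var, val in cube.items())
             for cube in and_plane]
    # OR plane: an output is true when any of its selected terms is true.
    return {out: any(terms[i] for i in rows) for out, rows in or_plane.items()}


# Hypothetical functions: f = a.b + c and g = a.b + (not a).(not c).
# The product term a.b is built once and shared by both outputs.
AND_PLANE = [{"a": 1, "b": 1}, {"c": 1}, {"a": 0, "c": 0}]
OR_PLANE = {"f": [0, 1], "g": [0, 2]}

result = eval_pla(AND_PLANE, OR_PLANE, {"a": 1, "b": 1, "c": 0})
```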
18
18 Fabric3 PLA Core Layout [Figure: PLA core layout with signals a, b, f, g and clk]
19
19 PLAs v/s Standard Cells PLAs are dense and fast [Figure: PLA and Standard Cell]
20
20 PLA Characteristics Why is the PLA area and delay so low? Wiring localized within PLA PLA core transistor sizes are minimum No p-transistor to n-transistor diffusion spacing “Gigahertz” chip utilized pre-charged PLAs High performance Quick implementation Didn’t use a network of PLAs
21
21 Network of PLAs PLAs are pre-charged Inputs to all PLAs must settle before evaluation begins [Figure: example network with nodes a through g]
22
22 Network of PLAs For correct operation: PLA dependency graph must be acyclic Evaluation of PLA i after completion of slowest PLA j in its “fanin” Self-timed design style Each PLA generates a completion signal Overhead of one wordline, one output Delay formula to find slowest PLA j
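The two correctness conditions above can be sketched as a scheduling computation: reject a cyclic dependency graph, and start each PLA only when the slowest PLA in its fanin has completed. The function name, the data layout, and the per-PLA delay values are assumptions for illustration; the slides refer to a delay formula that is not reproduced here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def completion_times(fanin, delay):
    """Return the time at which each PLA raises its completion signal.

    fanin: dict {pla: set of PLAs whose outputs it reads}
    delay: dict {pla: evaluation delay of that PLA} (hypothetical values)

    static_order() raises graphlib.CycleError when the dependency graph
    is cyclic, which mirrors the acyclicity requirement.
    """
    done = {}
    for pla in TopologicalSorter(fanin).static_order():
        # Evaluation starts when the slowest fanin PLA has completed;
        # PLAs fed only by primary inputs start at time 0.
        start = max((done[j] for j in fanin.get(pla, ())), default=0.0)
        done[pla] = start + delay[pla]
    return done


times = completion_times(
    {"p1": set(), "p2": set(), "p3": {"p1", "p2"}},
    {"p1": 1.0, "p2": 2.5, "p3": 1.0},
)
```

In this example `p3` starts only after `p2`, the slower of its two fanin PLAs, has completed.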
23
23 Decomposition Algorithm collapses wiring into PLAs Input: multi-level combinational network, W bound, H bound Output: Correct network of PLAs Our algorithm greedily grows a PLA until either bound is violated Attempt to reduce wires by selecting fanouts for inclusion in the PLA being grown
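The greedy growth step can be sketched as follows. This is a simplified stand-in, not the authors' algorithm: the real flow collapses nodes and minimizes the result (espresso and folding are mentioned on the next slide), whereas this sketch only estimates size. The width estimate (external inputs plus outputs) and height estimate (sum of per-node cube counts) are assumptions, as are all names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def decompose(fanin, cubes, w_bound, h_bound):
    """Greedily cluster logic nodes into PLAs.

    fanin: dict {node: set of fanin signals}, one key per logic node
    cubes: dict {node: cube count}; signals absent here are primary inputs
    """
    order = list(TopologicalSorter(fanin).static_order())
    nodes = [n for n in order if n in cubes]
    fanout = {n: [] for n in order}
    for n in nodes:
        for i in fanin[n]:
            fanout[i].append(n)

    def width(cluster):  # assumed estimate: external inputs + outputs
        ext_in = {i for n in cluster for i in fanin[n]} - cluster
        outs = {n for n in cluster
                if not fanout[n] or any(f not in cluster for f in fanout[n])}
        return len(ext_in) + len(outs)

    def height(cluster):  # assumed estimate: no minimization applied
        return sum(cubes[n] for n in cluster)

    assigned, plas = set(), []
    for seed in nodes:  # topological order keeps the PLA graph acyclic
        if seed in assigned:
            continue
        cluster, grew = {seed}, True  # a seed is kept even if it exceeds a bound
        while grew:
            grew = False
            for n in nodes:
                if n in assigned or n in cluster:
                    continue
                feeds = any(i in cluster for i in fanin[n])  # fanout of the PLA
                ready = all(i in cluster or i in assigned or i not in cubes
                            for i in fanin[n])
                trial = cluster | {n}
                if (feeds and ready and width(trial) <= w_bound
                        and height(trial) <= h_bound):
                    cluster, grew = trial, True
        assigned |= cluster
        plas.append(cluster)
    return plas


# Hypothetical chain n1 -> n2 -> n3 over primary inputs a, b, c, d.
FANIN = {"n1": {"a", "b"}, "n2": {"n1", "c"}, "n3": {"n2", "d"}}
CUBES = {"n1": 2, "n2": 2, "n3": 2}
plas = decompose(FANIN, CUBES, w_bound=10, h_bound=4)
```

A fanout is absorbed only when all of its fanins are already placed, so every PLA depends only on PLAs created before it and the resulting network stays acyclic.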
24
24 Choice of W, H Choice of W Driven by synthesis constraints Large W means larger runtimes espresso and folding done in inner loop Used W between 25 and 50 Choice of H Driven by power considerations Large H also affects synthesis runtimes Used H between 15 and 40
25
25 Fabric3 - Decomposition [Figure: step-by-step decomposition of the example network with nodes a through g]
26
26 Place/Route Flow PLA generation using perl script Layout generated on the fly 2 Layer experiments: Placement using vpr FPGA placement tool All PLAs have approximately same size Routing using wolfe interface to TimberWolfSC and yacr 3-6 Layer experiments: Placement using CADENCE qplace Routing using CADENCE router
27
27 Fabric3 - Area Results
28
28 Fabric3 - Timing Results
29
29 Fabric3 - Results Timing results essentially unchanged For C3540, delay variation due to cross-talk is 3.45:1 (Stdcell) versus 1.07:1 (Fabric3)
30
30 Fabric3 layout (2 Layer)
31
31 Future Tasks Better algorithms: Better ways of decomposing original netlist Refining the fabric: Alternative denser fabrics Encoding PLA inputs [Schmookler80] Connecting gates to PLA outputs Alternative implementation of logic blocks: Different PLA styles Alternative circuits
32
32 Summary Layout fabrics to eliminate cross-talk in DSM VLSI design New layout and design paradigm Fix cross-talk by design Highly regular and predictable Network of PLA based design flow PLA decomposition algorithms Minimal area penalty 15% timing improvement
33
33 Thank you!!