1 A Deep Sub-Micron VLSI Design Flow using Layout Fabrics Sunil P. Khatri University of Colorado, Boulder Amit Mehrotra University of Illinois, Urbana-Champaign Robert K Brayton Alberto L Sangiovanni-Vincentelli University of California, Berkeley

2 Our VLSI Design Flow  Logic netlist -> Logic Optimization -> Optimized logic netlist -> Technology Mapping -> Placement -> Routing -> Layout

3 Motivation  Modern IC processes  Feature size well below 1 micron  Certain electrical effects increasingly important  Cross-talk  Electromigration  Self Heat  Statistical variations  Logic abstraction eroded  Existing design paradigms need to be rethought

4 Research Focus  The cross-talk issue  Tackled in an ad-hoc manner  Increases turn-around time  Verified cross-talk trends  Accurate 3-D capacitance extraction  Delay variation 2.47:1 (200 µm wires, 10X drivers, 0.1 µm technology)  (Figure: neighboring wires labeled a and v, with coupling capacitances C1 and C2)

5 Outline  Previous Approaches  New idea: The Fabric Approach  Fabric1 (in DAC-1999)  Standard-cell based design  Fabric3 (in ICCAD-2000)  Network of PLA based design  Further Tasks  Summary

6 Previous Approaches  [ALPHA 97] :  Metal layers 3 and 6 dedicated to power  Not viable in future processes  [Rubio 94]:  Functional analysis based on layout  Post-layout methods don’t scale  [Kirkpatrick 94, 96] :  Concept of digital sensitivity  Requires don’t-care and image computations

7 Solution: Layout Fabrics  Repeating dense wiring fabric (DWF) pattern at minimum pitch  We handle cross-talk by design  A new layout and design paradigm  (Figure: DWF track pattern; S = signal, V = VDD, G = GND)

8 Research Contribution  Verify cross-talk trends  Fabric1 [KMBSO99] (in DAC)  Incorporated into traditional design flow  Fabric3 [KBS00] (in ICCAD-00)  Network of PLAs  Detailed electrical characterization  Synthesis, wire removal algorithms  Both utilize DWF pattern  1.02:1 cross-talk delay variation

9 Layout Fabrics  Advantages  Pre-characterized parasitics  Uniform, low cross-coupling capacitance  40X lower, 2% delay variation  Uniform, low signal inductance  Automatic power and ground routing  Uniform, low power and ground resistance  Can effectively implement regular structures  Disadvantages  5% increase in total capacitance  Area penalty  Power increase

10 Capacitance in DWF  Experimental setup  “Strawman” process model, copper wires, low-K dielectric  Capacitances from 3-D field solver (space3d)  Simulated three wires in spice  0.1 micron process, Metal2 wires  Length 200 microns, 10x minimum drivers  Non-DWF  Delay variation 2.47:1  Signal integrity problems for fast slew rates  With DWF  40X reduction in cross-coupling capacitance  Delay variation 1.02:1, no signal integrity problem
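The delay spread quoted on this slide can be motivated with a first-order Miller-coupling model. The sketch below is not the spice / space3d experiment from the talk: it assumes delay is proportional to effective capacitance, and the capacitance values are hypothetical, chosen only to show the trend. The function name `delay_ratio` is likewise an illustration.

```python
def delay_ratio(c_ground, c_couple, neighbors=2):
    """Worst-case to best-case delay ratio of a victim wire (lumped model).

    Assumption: delay scales with effective capacitance, and each switching
    neighbor contributes k * c_couple, with Miller factor k = 0 when it
    switches with the victim and k = 2 when it switches against it.
    """
    best = c_ground                               # all neighbors switch with the victim
    worst = c_ground + 2 * neighbors * c_couple   # all neighbors switch against it
    return worst / best


# Hypothetical per-wire capacitances in fF (not taken from the talk).
C_GROUND, C_COUPLE = 10.0, 5.0

# Conventional routing: both neighbors are switching signals.
unshielded = delay_ratio(C_GROUND, C_COUPLE)

# DWF: both neighbors are static VDD/GND wires, so their coupling behaves
# like ground capacitance; the residual signal-to-signal coupling is taken
# as 40X smaller, following the reduction quoted on the slide.
shielded = delay_ratio(C_GROUND + 2 * C_COUPLE, C_COUPLE / 40)

print(f"unshielded {unshielded:.2f}:1, shielded {shielded:.2f}:1")
```

Under these assumptions the shielded ratio collapses toward 1:1 because the switching-dependent term becomes a small fraction of a larger, fixed capacitance. The same arithmetic shows the cost: total capacitance per wire does not shrink, consistent with the capacitance and power penalties listed among the disadvantages.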

11 Inductance in the DWF  Low and uniform in DWF  Current return path is at minimum spacing  In regular layout style, varies greatly  Problems reported for clock signals  Compared inductance of Metal8 trace  Verified using ASITIC  (Figure: inductance, in nH/micron)

12 VDD/GND Resistance in DWF  Check resistance at various points in DWF  Compare with standard cell case  Varies greatly  Measured at end of row  L/W = 1000/8  (Figure: VDD/GND resistance, in ohms)

13 Buffer Insertion in DWF  Easily performed  VDD and GND available all over routing area

14 Fabric1 - Introduction  DWF pattern utilized chip-wide  Library cells implemented in this pattern  Synthesis, placement and routing use standard cell methodology  (Figure: Std Cell and Fabric Cell layouts)

15 Fabric1 - Results

16 Fabric1 - Results

17 Fabric3  Network of Programmable Logic Arrays  Combine many logic nodes into a PLA  Routing area utilizes DWF pattern  PLA implements a multi-output function  example: f = a b + c ; g = a b + c  (Figure: PLA with inputs a, b, c and outputs f, g, showing the AND plane and the OR plane)
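The AND-plane / OR-plane structure can be modeled logically as two small tables. The sketch below is a purely functional model, not the pre-charged circuit from the talk. It uses f = ab + c as on the slide; the second output, g = ab + c', is a hypothetical choice made here so that the two outputs share one product term. The function name `eval_pla` and the table encoding are also assumptions.

```python
def eval_pla(and_plane, or_plane, inputs):
    """Evaluate a PLA described by two personality tables.

    and_plane: one row per product term; each entry is 1 (input used true),
               0 (input used complemented), or None (input not used).
    or_plane:  one row per output, listing the indices of the product
               terms that the output ORs together.
    """
    terms = [
        all(lit is None or bool(x) == bool(lit) for lit, x in zip(row, inputs))
        for row in and_plane
    ]
    return [any(terms[i] for i in row) for row in or_plane]


# Inputs are ordered (a, b, c).
AND_PLANE = [
    (1, 1, None),     # term 0: a b
    (None, None, 1),  # term 1: c
    (None, None, 0),  # term 2: c'  (hypothetical, not from the slide)
]
OR_PLANE = [
    [0, 1],  # f = a b + c
    [0, 2],  # g = a b + c'  (hypothetical second output)
]

for a, b, c in [(1, 1, 0), (0, 0, 1), (0, 0, 0)]:
    f, g = eval_pla(AND_PLANE, OR_PLANE, (a, b, c))
    print(a, b, c, "->", int(f), int(g))
```

Sharing term 0 between both outputs is what makes a multi-output PLA compact: the row is built once in the AND plane and tapped twice in the OR plane.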

18 Fabric3 PLA Core Layout  (Figure: PLA core layout with signals a, b, f, g, and clk)

19 PLAs vs. Standard Cells  PLAs are dense and fast  (Figure: PLA compared with Standard Cell)

20 PLA Characteristics  Why is the PLA area and delay so low?  Wiring localized within PLA  PLA core transistor sizes are minimum  No p-transistor to n-transistor diffusion spacing  “Gigahertz” chip utilized pre-charged PLAs  High performance  Quick implementation  Didn’t use a network of PLAs

21 Network of PLAs  PLAs are pre-charged  Inputs to all PLAs must settle before evaluation begins  (Figure: example network with nodes a, b, c, d, e, f, g)

22 Network of PLAs  For correct operation:  PLA dependency graph must be acyclic  Evaluation of PLA i after completion of slowest PLA j in its “fanin”  Self-timed design style  Each PLA generates a completion signal  Overhead of one wordline, one output  Delay formula to find slowest PLA j
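The two correctness conditions on this slide map onto a topological traversal: the dependency graph must be acyclic, and each PLA starts evaluating when its slowest fanin PLA completes. The sketch below illustrates that rule under simplifying assumptions: each PLA has a single fixed evaluation delay (the talk's delay formula is not reproduced), primary inputs are ready at time 0, and the PLA names and delay values are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+


def completion_times(fanin, delay):
    """Return the time at which each PLA raises its completion signal.

    fanin: dict mapping each PLA to the set of PLAs that drive it.
    delay: dict mapping each PLA to its evaluation delay (assumed fixed).
    Raises CycleError if the PLA dependency graph is cyclic.
    """
    done = {}
    for pla in TopologicalSorter(fanin).static_order():
        # Evaluation starts once the slowest fanin PLA has completed.
        start = max((done[j] for j in fanin.get(pla, ())), default=0.0)
        done[pla] = start + delay[pla]
    return done


# Hypothetical network: P3 depends on P1 and P2, P4 depends on P3.
fanin = {"P1": set(), "P2": set(), "P3": {"P1", "P2"}, "P4": {"P3"}}
delay = {"P1": 1.0, "P2": 1.5, "P3": 1.0, "P4": 0.5}
print(completion_times(fanin, delay))

try:
    completion_times({"A": {"B"}, "B": {"A"}}, {"A": 1.0, "B": 1.0})
except CycleError:
    print("cyclic PLA dependency graph rejected")
```

In the self-timed style described on the slide, no global schedule like this exists at run time: each completion signal plays the role of `done[j]`, and the delay formula is only needed to decide which fanin PLA's completion signal to use.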

23 Decomposition  Algorithm collapses wiring into PLAs  Input: multi-level combinational network, W bound, H bound (bounds on PLA width and height)  Output: Correct network of PLAs  Our algorithm greedily grows a PLA until either bound is violated  Attempt to reduce wires by selecting fanouts for inclusion in the PLA being grown
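The greedy growth step can be sketched as a clustering loop. This is a simplified illustration, not the authors' algorithm: the width and height estimates below are a hypothetical cost model (the talk runs espresso and folding in the inner loop instead), every node in a PLA is treated as an output, and acyclicity is kept by the conservative rule that a fanout is absorbed only when all of its fanins are already assigned. A seed is always accepted, even if it alone exceeds a bound.

```python
def greedy_decompose(nodes, w_bound, h_bound):
    """Group logic nodes into PLAs.

    nodes: dict name -> (fanin_names, cube_count), in topological order.
           Fanin names that are not keys of `nodes` are primary inputs.
    Returns a list of PLAs, each a list of node names.
    """
    fanouts = {n: [] for n in nodes}
    for n, (fanins, _) in nodes.items():
        for f in fanins:
            if f in fanouts:
                fanouts[f].append(n)

    def fits(cluster):
        inside = set(cluster)
        ext = {f for n in cluster for f in nodes[n][0] if f not in inside}
        width = 2 * len(ext) + len(cluster)          # assumed: 2 columns per input, 1 per output
        height = sum(nodes[n][1] for n in cluster)   # assumed: rows add up, no re-minimization
        return width <= w_bound and height <= h_bound

    assigned, plas = set(), []
    for seed in nodes:
        if seed in assigned:
            continue
        cluster = [seed]
        assigned.add(seed)
        grew = True
        while grew:
            grew = False
            for n in list(cluster):
                for cand in fanouts[n]:
                    if cand in assigned:
                        continue
                    # Absorb a fanout only if its other fanins are already placed.
                    ready = all(f in assigned or f not in nodes for f in nodes[cand][0])
                    if ready and fits(cluster + [cand]):
                        cluster.append(cand)
                        assigned.add(cand)
                        grew = True
        plas.append(cluster)
    return plas


# Hypothetical four-node network over primary inputs x1..x4.
network = {
    "n1": (("x1", "x2"), 2),
    "n2": (("x3", "x4"), 2),
    "n3": (("n1", "n2"), 2),
    "n4": (("n3", "x1"), 3),
}
print(greedy_decompose(network, w_bound=8, h_bound=6))
```

Absorbing n3 into the PLA that holds n2 removes the n2-to-n3 wire from the routing area, which is the wire-reduction effect the slide describes. Because a node only joins the current or a later PLA than its fanins, PLA dependencies always point from earlier to later clusters, so the resulting network is acyclic.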

24 Choice of W, H  Choice of W  Driven by synthesis constraints  Large W means larger runtimes  espresso and folding done in inner loop  Use W between 25 and 50  Choice of H  Driven by power considerations  Large H also affects synthesis runtimes  Used H between 15 and 40

25 Fabric3 - Decomposition  (Figure: decomposition steps on an example network with nodes a, b, c, d, e, f, g)

26 Place/Route Flow  PLA generation using perl script  Layout generated on the fly  2 Layer experiments:  Placement using vpr  FPGA placement tool  All PLAs have approximately the same size  Routing using wolfe  interface to TimberWolfSC and yacr  3-6 Layer experiments:  Placement using CADENCE qplace  Routing using CADENCE router

27 Fabric3 - Area Results

28 Fabric3 - Timing Results

29 Fabric3 - Results  Timing results essentially unchanged  For C3540, delay variation due to cross-talk is 3.45:1 (Stdcell) versus 1.07:1 (Fabric3)

30 Fabric3 layout (2 Layer)

31 Future Tasks  Better algorithms:  Better ways of decomposing original netlist  Refining the fabric:  Alternative denser fabrics  Encoding PLA inputs [Schmookler80]  Connecting gates to PLA outputs  Alternative implementation of logic blocks:  Different PLA styles  Alternative circuits

32 Summary  Layout fabrics to eliminate cross-talk in DSM VLSI design  New layout and design paradigm  Fix cross-talk by design  Highly regular and predictable  Network of PLA based design flow  PLA decomposition algorithms  Minimal area penalty  15% timing improvement

33 Thank you!!
