Interconnect Driver Design for Long Wires in FPGAs Edmund Lee, Guy Lemieux & Shahriar Mirabbasi University of British Columbia, Canada Electrical & Computer Engineering FPT 2006 Presentation
2 What this talk is about… □ Investigate the circuit design of switch drivers for long wires in FPGAs □ Topics considered ■ Driver circuit design ■ Wirelengths in FPGA architecture ■ Midpoint delays
3 Outline □ Motivation □ Problem Description □ Background □ Driver Design Approaches ■ Method 1: Elmore-based ■ Method 2: SPICE-based □ CAD Modeling, VPR Results □ Summary
4 Motivation □ Deep submicron interconnect delay is increasing □ Interconnect delay is a large component of FPGA delay □ Evolution of FPGA Switch Drivers ■ Bidirectional → Unidirectional routing [Figure: bidirectional interconnect vs. unidirectional interconnect]
5 Motivation … □ Only part of a wire is used in FPGAs □ Critical sink locations are unknown □ Can we improve all midpoint delays? [Figure: wire with sinks labeled Sink 1 to Sink 4]
6 Problem Description Given: Wire RC, total wire length Find: Buffer sizes, buffer locations, # of buffers
7 Background
8 Method 1: Elmore-based Design □ Provide circuit design solution □ Elmore delay model □ Multidimensional sweep ■ Determine optimal wirelengths and buffer sizes ■ Fix B1 to minimum size [Figure: 3-stage distributed design]
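The Elmore-based sweep can be illustrated with a minimal Python sketch. It is not the authors' tool: the technology constants, the linear buffer model (output resistance R_MIN/size, capacitances scaling with size), the grid resolution and all function names are assumptions made for illustration. Only the structure follows the slide: three buffers B1, B2, B3, each followed by a wire segment, with B1 fixed to minimum size.

```python
from itertools import product

# Assumed, illustrative constants (not taken from the talk).
R_MIN = 10e3        # output resistance of a minimum-size buffer (ohm)
C_IN_MIN = 2e-15    # input capacitance of a minimum-size buffer (F)
C_PAR_MIN = 2e-15   # output parasitic capacitance of a minimum-size buffer (F)
R_WIRE = 100.0      # wire resistance per mm (ohm/mm)
C_WIRE = 200e-15    # wire capacitance per mm (F/mm)
C_LOAD = 2e-15      # load at the far end of the wire (F)


def stage_delay(size, length_mm, c_next):
    """Elmore delay of one buffer driving a distributed RC wire and a load."""
    r_drv = R_MIN / size
    r_wire = R_WIRE * length_mm
    c_wire = C_WIRE * length_mm
    # The driver sees all capacitance; the wire resistance sees half of the
    # wire capacitance (distributed line) plus the downstream load.
    return (r_drv * (C_PAR_MIN * size + c_wire + c_next)
            + r_wire * (c_wire / 2 + c_next))


def chain_delay(sizes, lengths, c_load=C_LOAD):
    """Sum of stage delays; each stage is loaded by the next buffer's input."""
    total = 0.0
    for i, (size, length) in enumerate(zip(sizes, lengths)):
        c_next = C_IN_MIN * sizes[i + 1] if i + 1 < len(sizes) else c_load
        total += stage_delay(size, length, c_next)
    return total


def sweep_three_stage(total_mm, size_grid, steps=10):
    """Exhaustive sweep over B2, B3 sizes and the L1/L2/L3 split (B1 = 1)."""
    best = None
    for b2, b3 in product(size_grid, repeat=2):
        for i in range(steps + 1):
            for j in range(steps + 1 - i):
                lengths = (total_mm * i / steps,
                           total_mm * j / steps,
                           total_mm * (steps - i - j) / steps)
                sizes = (1, b2, b3)
                delay = chain_delay(sizes, lengths)
                if best is None or delay < best[0]:
                    best = (delay, sizes, lengths)
    return best


if __name__ == "__main__":
    delay, sizes, lengths = sweep_three_stage(4.0, [1, 2, 4, 8, 16, 32, 64])
    print(f"delay={delay * 1e12:.1f} ps sizes={sizes} lengths_mm={lengths}")
```

The cost of the sweep grows with every added stage (two more nested dimensions per buffer), which is the kind of complexity growth an exhaustive approach runs into.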
9 Elmore-based Design Results [Figure: optimal driver configuration for each wirelength (1mm, 2mm, 4mm, 8mm); percentage labels in figure: 100%, 50%, 45%, 55%] * Buffer 1 is fixed to minimum size
10 Elmore-based Design Results □ Results ■ Distributed buffering is best with wires > 2mm ■ For all wirelengths, L1 = 0 ■ Delay is tolerant to shifts in buffer placement □ Limitations ■ Complexity related to number of stages ■ RC based Elmore approach: difficult to model multiplexer circuits; limited accuracy (delay and determining sizes)
11 Method 2: Spice-Based Design □ Stage types: multiplexed (mux), distributed (distrib) □ Characterization: design(wirelength) → buffer sizes and delays □ Designs with best delay/mm □ Divide, characterize and combine…
12 Buffer-Wire Pre-Characterization [Figure: distributed (distrib) stage characterization; axes: buffer size, wirelength]
13 Delay Concatenation □ Sum delays of each stage together ■ Fast to compute ■ Accurate (within 4% of HSPICE) □ $\text{Delay} = \text{Mux stage delay} + (N-1) \times \text{Distributed stage delay}$
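Delay concatenation amounts to one multiply-add over pre-characterized stage delays. The short Python sketch below shows the arithmetic; the function name and the example delay values are hypothetical and are not results from the talk.

```python
def concatenated_delay(mux_stage_delay_ps, distrib_stage_delay_ps, n_stages):
    """Total delay of 1 multiplexed stage followed by N-1 distributed stages."""
    if n_stages < 1:
        raise ValueError("a driver needs at least the multiplexed stage")
    return mux_stage_delay_ps + (n_stages - 1) * distrib_stage_delay_ps


if __name__ == "__main__":
    # Hypothetical stage delays in picoseconds, for illustration only.
    total = concatenated_delay(150.0, 90.0, n_stages=4)
    print(total)  # 150 + 3 * 90 = 420.0
```

Because each stage delay is looked up rather than simulated, evaluating a candidate design costs almost nothing compared with a full circuit simulation.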
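14 L0-Sweep □ Remaining unknowns: ■ L0 (mux stage length) ■ L1 (distributed stage length) □ Length = L0 + L1*(N – 1) □ Sweep L0 for a fixed N [Figure: wire of given Length, stage lengths unknown]
15 L0-Sweep [Figure: 2-stage design (N=2); mux stage of length L0 followed by distributed stage of length L1]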
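A minimal Python sketch of the L0 sweep follows. For a fixed N, each candidate L0 determines L1 through Length = L0 + L1*(N – 1), and the total delay is obtained by concatenating stage delays. The two stage-delay functions are hypothetical stand-ins for SPICE pre-characterization data, and their coefficients are invented for illustration.

```python
def mux_stage_delay_ps(l0_mm):
    """Hypothetical stand-in for the pre-characterized mux stage delay."""
    return 120.0 + 40.0 * l0_mm + 25.0 * l0_mm ** 2


def distrib_stage_delay_ps(l1_mm):
    """Hypothetical stand-in for the pre-characterized distributed stage delay."""
    return 60.0 + 30.0 * l1_mm + 20.0 * l1_mm ** 2


def l0_sweep(length_mm, n_stages, steps=100):
    """Sweep L0 for a fixed N; return (L0, L1, delay) with the lowest delay."""
    if n_stages < 2:
        raise ValueError("the sweep needs at least one distributed stage")
    best = None
    for i in range(1, steps):
        l0 = length_mm * i / steps
        # L1 follows from Length = L0 + L1 * (N - 1).
        l1 = (length_mm - l0) / (n_stages - 1)
        delay = (mux_stage_delay_ps(l0)
                 + (n_stages - 1) * distrib_stage_delay_ps(l1))
        if best is None or delay < best[2]:
            best = (l0, l1, delay)
    return best


if __name__ == "__main__":
    l0, l1, delay = l0_sweep(length_mm=4.0, n_stages=2)
    print(f"L0={l0:.2f} mm, L1={l1:.2f} mm, delay={delay:.1f} ps")
```

Repeating the sweep for several values of N and keeping the best delay/mm would give one design per wirelength.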
16 Spice-Based Design Results [Table columns: Wirelength (mm) | # of Stages (N) | 1 Multiplexed stage: Driver size B0 (x min.), Length L0 (mm) | N-1 Distributed stages: Driver size B1 (x min.), Length L1 (mm) | Delay Distrib / Lumped (ps/mm)] Distributed design results (180nm, 1x spacing, 1x width): / / / / / 191 Distributed design results (90nm, 2x spacing, 2x width): / / / 125
17 Spice-Based Design Conclusions □ Distributed designs improve over lumped designs on wires longer than 2mm □ Longer wires achieve lower delay/mm ■ In an FPGA [Figure label: Multiplexing Interval]
18 Multiplexing Interval
19 What about Early Turns? □ Path delay profiles show potential improvement of the proposed circuit designs [Figure: path delay profile, lumped driver]
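One way to picture a path delay profile is to compute the delay seen at each tap point along a wire. The Python sketch below does this for a lumped driver using the standard Elmore result for a distributed RC line. It is only an approximation for illustration and not the SPICE-based profile from the talk; the function name and all parameter values are assumptions.

```python
def lumped_delay_profile(r_drv, r_per_mm, c_per_mm, length_mm, taps_mm):
    """Elmore delay at each tap position x on a wire driven from x = 0.

    For a distributed RC line of length L behind a driver resistance R_drv:
        T(x) = R_drv * c * L + r * c * (L * x - x**2 / 2)
    Loads at the taps are ignored in this simplified model.
    """
    profile = []
    for x in taps_mm:
        if not 0.0 <= x <= length_mm:
            raise ValueError("tap position must lie on the wire")
        delay = (r_drv * c_per_mm * length_mm
                 + r_per_mm * c_per_mm * (length_mm * x - x * x / 2))
        profile.append(delay)
    return profile


if __name__ == "__main__":
    # Assumed values: 500 ohm driver, 100 ohm/mm, 200 fF/mm, 4 mm wire.
    taps = [0.0, 1.0, 2.0, 3.0, 4.0]
    for x, d in zip(taps, lumped_delay_profile(500.0, 100.0, 200e-15, 4.0, taps)):
        print(f"x={x:.1f} mm  delay={d * 1e12:.1f} ps")
```

In this model even a tap at the start of the wire pays for charging the whole wire through the driver, which is why signals that turn early still see a substantial delay with a lumped driver.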
20 VPR Modifications □ Assess the benefits of distributed buffering design on FPGAs □ Early Turn Model ■ Can compute a path delay profile for VPR □ Fast Path modeling
21 VPR Results [Chart: MCNC benchmarks; series: Prior FPT04 Design, Lumped driver, Lumped + ETM, Distributed, Distributed + Fast]
22 Summary □ Developed interconnect driver design methodology for FPGAs ■ Accounts for multiplexers ■ Examined early turns ■ Identified that longer wires can improve delay efficiency in FPGAs □ Results from VPR ■ Early turn modeling (5-10%) ■ Distributed buffers (2-3%) ■ Fast path (4-9%)
23 Future Work □ Circuit design ■ Advanced Circuits ■ Noise Modeling ■ Power and Area modeling □ CAD ■ Area Modeling ■ Heterogeneous Wiring ■ Detailed Turn Analysis ■ Embedding Delay Concatenation into VPR ■ Runtime Improvements for VPR