Interconnect Driver Design for Long Wires in FPGAs
Edmund Lee
University of British Columbia
Electrical & Computer Engineering
MASc Thesis Presentation
2 Contributions
□ First attempt to combine repeater insertion with FPGA interconnect design
□ Produced three interconnect driver design methodologies for FPGAs
  ■ Lumped driver design
  ■ Distributed design (Elmore-based and HSPICE-based)
□ Quantified the significance of Early Turn Modeling and Fast Paths
□ Paper submitted to FPT 2006
3 Outline
□ Motivation and Background
□ Problem Description and Goals
□ Driver Design Approaches
  ■ Method 1: Elmore-based
  ■ Method 2: SPICE-based
□ CAD Modeling, VPR Results
□ Summary
4 Motivation
□ Interconnect delay is increasing in deep-submicron processes
□ Interconnect delay is a large component of FPGA delay
□ In FPGAs, a net often uses only part of a wire
  ■ Critical sink locations are unknown
  ■ Need to improve the delay to all midpoints
[Figure: one wire with Sink 1, Sink 2, Sink 3 and Sink 4 along its length]
5 Problem Description
Given: wire RC, total wire length
Find: buffer sizes, buffer locations, number of buffers
6 Background
7 Method 1: Elmore-based Design
□ Provides a circuit design solution
□ Uses the Elmore delay model
□ Multidimensional sweep
  ■ Determine optimal wirelengths and buffer sizes
  ■ Fix B1 to minimum size
[Figure: 3-stage distributed design]
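A minimal sketch of the multidimensional sweep, assuming a chain in which buffer i drives wire segment i, the last segment ends in a fixed load, and the objective is end-to-end Elmore delay (the thesis may weight midpoint delays differently). All parameter values and function names (`stage_delay`, `chain_delay`, `sweep_three_stage`) are hypothetical placeholders, not values or code from the thesis.

```python
import itertools

# Illustrative placeholder parameters (hypothetical, not taken from the thesis).
R_WIRE = 300.0     # wire resistance per mm (ohm/mm)
C_WIRE = 200e-15   # wire capacitance per mm (F/mm)
R_MIN = 5000.0     # output resistance of a minimum-size buffer (ohm)
C_MIN = 2e-15      # input capacitance of a minimum-size buffer (F)
C_LOAD = 10e-15    # load at the far end of the wire (F)


def stage_delay(size, length_mm, next_cap):
    """Elmore delay (s) of one buffer driving a uniform RC wire and a load."""
    r_drv = R_MIN / size
    r_w = R_WIRE * length_mm
    c_w = C_WIRE * length_mm
    # The driver resistance sees all downstream capacitance; the distributed
    # wire resistance sees half its own capacitance plus the load.
    return r_drv * (c_w + next_cap) + r_w * (c_w / 2 + next_cap)


def chain_delay(sizes, lengths):
    """End-to-end Elmore delay: buffer sizes[i] drives wire segment lengths[i]."""
    total = 0.0
    for i, (size, length) in enumerate(zip(sizes, lengths)):
        # Each segment is loaded by the next buffer's input, or by C_LOAD at the end.
        next_cap = C_MIN * sizes[i + 1] if i + 1 < len(sizes) else C_LOAD
        total += stage_delay(size, length, next_cap)
    return total


def sweep_three_stage(total_mm, step_mm=0.25, size_choices=(1, 2, 4, 8, 16, 32)):
    """Exhaustive sweep of segment lengths and sizes with B1 fixed to minimum size."""
    n_steps = round(total_mm / step_mm)
    best = None
    for i, j in itertools.product(range(n_steps + 1), repeat=2):
        if i + j > n_steps:
            continue  # segment lengths must sum to the total wire length
        lengths = (i * step_mm, j * step_mm, (n_steps - i - j) * step_mm)
        for s2, s3 in itertools.product(size_choices, repeat=2):
            sizes = (1, s2, s3)  # B1 is fixed to minimum size
            delay = chain_delay(sizes, lengths)
            if best is None or delay < best[0]:
                best = (delay, sizes, lengths)
    return best


delay, sizes, lengths = sweep_three_stage(4.0)
print(f"{delay * 1e12:.1f} ps, sizes={sizes}, lengths(mm)={lengths}")
```

The nested loops make the cost of the sweep explicit: every added stage multiplies the search space by another length axis and another size axis, which is the complexity limitation noted for this method.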
8 Elmore-based Design Results
[Figure: optimal buffer configuration vs. wirelength (1mm, 2mm, 4mm, 8mm), annotated 100%, 50%, 45%/55%]
* Buffer 1 is fixed to minimum size
9 Elmore-based Design Results
□ Results
  ■ Distributed buffering is best for wires longer than 2mm
  ■ For all wirelengths, L1 = 0
  ■ Delay is tolerant to shifts in buffer placement
□ Limitations
  ■ Complexity grows with the number of stages
  ■ RC-based Elmore approach: difficult to model multiplexer circuits; limited accuracy in both delay and buffer sizing
10 Method 2: SPICE-based Design
□ Two stage types: multiplexed (mux) and distributed (distrib)
□ Keep the designs with the best delay/mm
□ Characterization: design(wirelength) → buffer sizes and delays
□ Divide, characterize, and combine…
11 Buffer-Wire Pre-Characterization
[Figure: characterization over buffer size and wirelength for distributed (distrib) and multiplexed (mux) stages]
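The characterization step can be sketched as a table that maps each wirelength to the buffer size with the lowest stage delay. The thesis characterizes stages with HSPICE; here a hypothetical `simulate_stage` stands in for that run, using an Elmore estimate plus a fixed placeholder mux penalty `T_MUX`, and assuming each stage drives a next-stage buffer of the same size. All numbers are illustrative only.

```python
# Hypothetical stand-in for an HSPICE run: Elmore delay plus a fixed mux penalty.
R_WIRE = 300.0     # ohm/mm (placeholder)
C_WIRE = 200e-15   # F/mm (placeholder)
R_MIN = 5000.0     # ohm, minimum-size buffer output resistance (placeholder)
C_MIN = 2e-15      # F, minimum-size buffer input capacitance (placeholder)
T_MUX = 50e-12     # s, extra delay of the routing multiplexer (placeholder)


def simulate_stage(kind, size, length_mm):
    """Delay (s) of one stage: a buffer of `size` driving length_mm of wire
    into the next stage's buffer, assumed to be the same size."""
    r_drv = R_MIN / size
    r_w = R_WIRE * length_mm
    c_w = C_WIRE * length_mm
    load = C_MIN * size
    delay = r_drv * (c_w + load) + r_w * (c_w / 2 + load)
    return delay + (T_MUX if kind == "mux" else 0.0)


def characterize(kind, lengths_mm, sizes):
    """Build design(wirelength) -> (best buffer size, stage delay in s)."""
    table = {}
    for length in lengths_mm:
        best = min(sizes, key=lambda s: simulate_stage(kind, s, length))
        table[length] = (best, simulate_stage(kind, best, length))
    return table


LENGTHS = (0.5, 1.0, 2.0, 4.0)
SIZES = (1, 2, 4, 8, 16, 32)
mux_table = characterize("mux", LENGTHS, SIZES)
distrib_table = characterize("distrib", LENGTHS, SIZES)
for length in LENGTHS:
    size, delay = distrib_table[length]
    print(f"{length} mm: B={size}x, {delay * 1e12 / length:.1f} ps/mm")
```

Because characterization runs once per (stage type, wirelength) pair, the expensive simulations are paid up front and later design steps only read the table.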
12 Delay Concatenation
□ Sum the delays of each stage
  ■ Fast to compute
  ■ Accurate (within 4% of HSPICE)
□ Calculation can be embedded into VPR
$\text{Delay} = \text{Delay}_{\text{mux}} + (N-1) \times \text{Delay}_{\text{distrib}}$
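Delay concatenation reduces to table lookups plus one multiply-add. The sketch below assumes the pre-characterized results are stored as wirelength → stage delay tables and uses linear interpolation between characterized lengths; the table contents and the names `lookup` and `concatenated_delay` are hypothetical placeholders, not thesis data or code.

```python
from bisect import bisect_left

# Hypothetical pre-characterized tables: wirelength (mm) -> stage delay (ps).
# Placeholder numbers for illustration only, NOT results from the thesis.
MUX_TABLE = {0.5: 120.0, 1.0: 160.0, 2.0: 260.0}
DISTRIB_TABLE = {0.5: 60.0, 1.0: 95.0, 2.0: 190.0}


def lookup(table, length_mm):
    """Linearly interpolate a characterized stage delay at length_mm."""
    xs = sorted(table)
    if not xs[0] <= length_mm <= xs[-1]:
        raise ValueError("length outside the characterized range")
    i = bisect_left(xs, length_mm)
    if xs[i] == length_mm:
        return table[length_mm]
    x0, x1 = xs[i - 1], xs[i]
    t = (length_mm - x0) / (x1 - x0)
    return table[x0] + t * (table[x1] - table[x0])


def concatenated_delay(n_stages, l0_mm, l1_mm):
    """Delay = mux stage delay + (N - 1) * distributed stage delay."""
    return lookup(MUX_TABLE, l0_mm) + (n_stages - 1) * lookup(DISTRIB_TABLE, l1_mm)


print(concatenated_delay(3, 1.0, 1.5))
```

Since each evaluation is a few dictionary reads, this form is cheap enough to run inside a router's inner loop, which is what makes embedding it into VPR plausible.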
13 L0-Sweep
□ Remaining unknowns: L0 and L1
□ $\text{Length} = L_0 + (N-1)\,L_1$
□ Sweep L0 for a fixed N; L1 then follows from the total length
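For a fixed N and total length, the constraint leaves a single free variable: choosing L0 fixes L1 = (Length − L0)/(N − 1). A minimal sketch of the sweep, assuming hypothetical closed-form stand-ins `mux_delay_ps` and `distrib_delay_ps` in place of the characterized HSPICE tables (their coefficients are placeholders, not thesis data):

```python
# Hypothetical stand-ins for characterized stage delays (ps), not thesis data.
def mux_delay_ps(length_mm):
    return 80.0 + 60.0 * length_mm + 20.0 * length_mm ** 2


def distrib_delay_ps(length_mm):
    return 30.0 + 50.0 * length_mm + 20.0 * length_mm ** 2


def l0_sweep(total_mm, n_stages, step_mm=0.05):
    """Sweep L0 for fixed N; return (best delay in ps, L0, L1)."""
    if n_stages < 2:
        raise ValueError("need at least one distributed stage")
    best = None
    for k in range(round(total_mm / step_mm) + 1):
        l0 = k * step_mm
        l1 = (total_mm - l0) / (n_stages - 1)  # from Length = L0 + (N-1) * L1
        delay = mux_delay_ps(l0) + (n_stages - 1) * distrib_delay_ps(l1)
        if best is None or delay < best[0]:
            best = (delay, l0, l1)
    return best


for n in (2, 3, 4):
    print(n, l0_sweep(4.0, n))
```

Repeating the sweep for each candidate N and keeping the lowest delay gives the number of stages as well as the stage lengths.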
14 L0-Sweep
[Figure: 2-stage (N=2) design: a mux stage of length L0 followed by a distributed stage of length L1]
15 SPICE-based Design Results
Table columns: wirelength (mm); number of stages N (1 multiplexed stage + N-1 distributed stages); delay, distributed / lumped (ps/mm); driver size B0 (× min.); length L0 (mm); driver size B1 (× min.); length L1 (mm)
[Table: distributed design results (180nm, 1x spacing, 1x width); only the value 191 is recoverable]
[Table: distributed design results (90nm, 2x spacing, 2x width); only the value 125 is recoverable]
16 SPICE-based Design Conclusions
□ Distributed designs improve over lumped designs on wires longer than 2mm
□ Longer wires achieve lower delay/mm
  ■ In an FPGA [Figure: multiplexing interval]
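One way to see how longer wires can reach lower delay/mm: the mux stage overhead is paid once per wire, so adding distributed stages amortizes it over more millimetres. The sketch below assumes hypothetical stage-delay stand-ins and a simplified design in which every stage is `stage_mm` long; it illustrates the amortization effect only and does not reproduce the thesis designs.

```python
# Hypothetical stand-ins for characterized stage delays (ps), not thesis data.
def mux_delay_ps(length_mm):
    return 80.0 + 60.0 * length_mm + 20.0 * length_mm ** 2


def distrib_delay_ps(length_mm):
    return 30.0 + 50.0 * length_mm + 20.0 * length_mm ** 2


def delay_per_mm(n_stages, stage_mm=1.0):
    """Delay/mm of a wire with one mux stage and n_stages - 1 distributed
    stages, all of length stage_mm (a simplifying assumption)."""
    total_mm = n_stages * stage_mm
    delay = mux_delay_ps(stage_mm) + (n_stages - 1) * distrib_delay_ps(stage_mm)
    return delay / total_mm


for n in (1, 2, 4, 8):
    print(f"{n} mm wire: {delay_per_mm(n):.1f} ps/mm")
```

With these stand-ins, delay/mm falls from 160 ps/mm for a 1mm wire toward the 100 ps/mm of a distributed stage as the wire gets longer.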
17 Multiplexing Interval
18 What about Early Turns?
□ Path delay profiles show the potential improvement of the proposed circuit designs
[Figure: path delay profile, lumped driver]
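A path delay profile gives the delay from a wire's driver to every point where a net might leave it. A minimal Elmore-based sketch comparing a lumped driver with distributed buffers; parameter values, buffer sizes and the functions `point_delay` and `delay_profile` are hypothetical placeholders, and the loading of the turn multiplexers themselves is ignored.

```python
# Illustrative placeholder parameters (hypothetical, not from the thesis).
R_WIRE = 300.0     # ohm/mm
C_WIRE = 200e-15   # F/mm
R_MIN = 5000.0     # ohm, minimum-size buffer output resistance
C_MIN = 2e-15      # F, minimum-size buffer input capacitance
C_LOAD = 10e-15    # F, load at the far end of the wire


def point_delay(r_drv, seg_mm, c_next, y_mm):
    """Elmore delay (s) from a driver to distance y_mm along its wire segment."""
    c_total = C_WIRE * seg_mm + c_next  # all capacitance the driver sees
    r_y = R_WIRE * y_mm                 # wire resistance up to the tap
    # Wire resistance at position u sees the capacitance beyond u; integrating
    # from 0 to y gives r_y * c_total - r * c * y^2 / 2.
    return r_drv * c_total + r_y * c_total - r_y * C_WIRE * y_mm / 2


def delay_profile(sizes, seg_mm, taps_mm):
    """Delay (ps) from the wire's driver to each tap position (mm)."""
    profile = []
    for x in taps_mm:
        t, start = 0.0, 0.0
        for i, (size, length) in enumerate(zip(sizes, seg_mm)):
            c_next = C_MIN * sizes[i + 1] if i + 1 < len(sizes) else C_LOAD
            if x <= start + length or i == len(sizes) - 1:
                t += point_delay(R_MIN / size, length, c_next, x - start)
                break
            t += point_delay(R_MIN / size, length, c_next, length)  # full stage
            start += length
        profile.append(t * 1e12)
    return profile


taps = [0.5 * k for k in range(1, 17)]                       # 0.5 mm .. 8 mm
lumped = delay_profile((8,), (8.0,), taps)                   # one 8x driver
distributed = delay_profile((8, 8, 8, 8), (2.0,) * 4, taps)  # four 8x buffers
for x, a, b in zip(taps, lumped, distributed):
    print(f"{x:4.1f} mm  lumped {a:7.1f} ps  distributed {b:7.1f} ps")
```

With these placeholder values the lumped driver is slow even at nearby taps, because its driver must charge the capacitance of the entire wire before any point settles.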
19 VPR Modifications
□ Assess the benefits of distributed buffer designs in FPGAs
□ Early Turn Model (ETM)
  ■ Can compute a path delay profile for VPR
□ Fast Path modeling
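The early turn model can be sketched as a lookup into a wire's path delay profile. This assumes that, without the model, a connection is charged the wire's full delay wherever it turns off, and that the ETM uses the delay at the first profiled point at or beyond the turn. The function `turn_delay` and the profile values are hypothetical and are not VPR's actual interface.

```python
from bisect import bisect_left


def turn_delay(profile, turn_mm, early_turn_model=True):
    """Delay (ps) charged to a connection that leaves a wire at turn_mm.

    profile: (position_mm, delay_ps) pairs along one wire, sorted by position.
    Without the early turn model, the full-wire delay is charged wherever the
    net turns; with it, the delay at the first profile point at or beyond the
    turn is used (a conservative lookup).
    """
    if not early_turn_model:
        return profile[-1][1]
    positions = [pos for pos, _ in profile]
    i = bisect_left(positions, turn_mm)
    if i == len(profile):
        raise ValueError("turn point lies beyond the end of the wire")
    return profile[i][1]


# Placeholder profile for illustration only (not measured data).
PROFILE = [(1.0, 100.0), (2.0, 180.0), (3.0, 250.0), (4.0, 310.0)]
print(turn_delay(PROFILE, 1.5), turn_delay(PROFILE, 1.5, early_turn_model=False))
```

Rounding the turn point up to the next profiled position keeps the estimate pessimistic, so the model never reports a delay lower than the wire can actually achieve at that point.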
20 VPR Results
[Chart: MCNC benchmarks; series: Prior FPT04 Design, Lumped driver, Lumped + ETM, Distributed, Distributed + Fast]
21 VPR Turn Data
□ Normal turns decreased…
  ■ Are normal turns not important?
22 Summary
□ Developed an interconnect driver design methodology for FPGAs
  ■ Identified that longer wires can improve delay efficiency in FPGAs
□ Results from VPR
  ■ Early turn modeling (5-10%)
  ■ Distributed buffers (2-3%)
  ■ Fast path (4-9%)
23 Contributions
□ First attempt to combine repeater insertion with FPGA interconnect design
□ Produced three interconnect driver design methodologies for FPGAs
  ■ Lumped driver design
  ■ Distributed design (Elmore-based and HSPICE-based)
□ Quantified the significance of Early Turn Modeling and Fast Paths
□ Paper submitted to FPT 2006
24 Future Work
□ Circuit design
  ■ Advanced Circuits
  ■ Noise Modeling
  ■ Power and Area modeling
□ CAD
  ■ Area Modeling
  ■ Heterogeneous Wiring
  ■ Detailed Turn Analysis
  ■ Embedding Delay Concatenation into VPR
  ■ Runtime Improvements for VPR