Interconnect Driver Design for Long Wires in FPGAs Edmund Lee University of British Columbia Electrical & Computer Engineering MASc Thesis Presentation.

Presentation transcript:

Interconnect Driver Design for Long Wires in FPGAs Edmund Lee University of British Columbia Electrical & Computer Engineering MASc Thesis Presentation

2 Contributions
□ First attempt to combine repeater insertion with FPGA interconnect design
□ Produced 3 interconnect driver design methodologies for FPGAs
  ■ Lumped driver design
  ■ Distributed design, Elmore-based
  ■ Distributed design, HSPICE-based
□ Quantified significance of Early Turn Modeling and Fast Paths
□ Paper submitted to FPT 2006

3 Outline
□ Motivation and Background
□ Problem Description and Goals
□ Driver Design Approaches
  ■ Method 1: Elmore-based
  ■ Method 2: SPICE-based
□ CAD Modeling, VPR Results
□ Summary

4 Motivation
□ Deep submicron interconnect delay is increasing
□ Interconnect delay is a large component of FPGA delay
□ Only part of a wire is used in FPGAs
  ■ Critical sink locations unknown
  ■ Improve all midpoint delays
[Figure labels: Sink 1, Sink 2, Sink 3, Sink 4]

5 Problem Description
Given: wire RC, total wire length
Find: buffer sizes, buffer locations, number of buffers

6 Background

7 Method 1: Elmore-based Design
□ Provide circuit design solution
□ Elmore delay model
□ Multidimensional sweep
  ■ Determine optimal wirelengths and buffer sizes
  ■ Fix B1 to minimum size
[Figure: 3-stage distributed design]
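The sweep on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis code: it reduces the design to two stages rather than three, uses the textbook Elmore expression for a driver and a distributed RC wire, and every constant and function name (`R_MIN`, `stage_delay`, `sweep_two_stage`, and so on) is hypothetical.

```python
from itertools import product

# Hypothetical technology constants, for illustration only (not from the slides).
R_MIN = 10e3            # output resistance of a minimum-size buffer, ohms
C_IN_MIN = 2e-15        # input capacitance of a minimum-size buffer, farads
C_PAR_MIN = 2e-15       # self-loading capacitance of a minimum-size buffer, farads
R_WIRE_PER_MM = 100.0   # wire resistance, ohms per mm
C_WIRE_PER_MM = 200e-15 # wire capacitance, farads per mm


def stage_delay(size, length_mm, c_load):
    """Elmore time constant of one buffer driving a wire segment and a load."""
    r_drv = R_MIN / size          # a larger buffer has a lower output resistance
    c_par = C_PAR_MIN * size      # and a larger self-loading capacitance
    r_w = R_WIRE_PER_MM * length_mm
    c_w = C_WIRE_PER_MM * length_mm
    # The driver charges everything; the wire resistance sees half its own
    # capacitance (distributed RC) plus the full load.
    return r_drv * (c_par + c_w + c_load) + r_w * (c_w / 2.0 + c_load)


def chain_delay(sizes, lengths_mm, c_final):
    """Sum the stage delays of a buffer chain; each stage is loaded by the next buffer."""
    total = 0.0
    for i, (size, length) in enumerate(zip(sizes, lengths_mm)):
        if i + 1 < len(sizes):
            c_load = C_IN_MIN * sizes[i + 1]
        else:
            c_load = c_final
        total += stage_delay(size, length, c_load)
    return total


def sweep_two_stage(total_mm, size_grid, split_grid):
    """Grid-search the second buffer size and the wire split; first buffer is minimum size."""
    best = None
    for size, frac in product(size_grid, split_grid):
        lengths = (frac * total_mm, (1.0 - frac) * total_mm)
        delay = chain_delay((1.0, size), lengths, C_IN_MIN)
        if best is None or delay < best[0]:
            best = (delay, size, frac)
    return best  # (delay in seconds, second buffer size, fraction of wire on stage 1)
```

A call such as `sweep_two_stage(4.0, [1, 2, 4, 8, 16, 32], [i / 10 for i in range(11)])` returns the best grid point. Adding a third stage adds two more swept dimensions, which is the complexity growth that makes this approach expensive as the stage count rises.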

8 Elmore-based Design Results
[Figure: optimal buffer configuration for wirelengths of 1 mm, 2 mm, 4 mm, and 8 mm; labels 100%, 50%, 45%, 55%]
* Buffer 1 is fixed to minimum size

9 Elmore-based Design Results
□ Results
  ■ Distributed buffering is best with wires > 2 mm
  ■ For all wirelengths, L1 = 0
  ■ Delay is tolerant to shifts in buffer placement
□ Limitations
  ■ Complexity related to number of stages
  ■ RC-based Elmore approach: difficult to model multiplexer circuits; accuracy (delay and determining sizes)

10 Method 2: Spice-Based Design
Stage types: multiplexed (mux), distributed (distrib)
Designs with best delay/mm
Characterization: design(wirelength) → buffer sizes and delays
Divide, characterize and combine…

11 Buffer-Wire Pre-Characterization
[Figure: pre-characterization over buffer size and wirelength, for distributed (distrib) and multiplexed (mux) stages]

12 Delay Concatenation
□ Sum delays of each stage together
  ■ Fast to compute
  ■ Accurate (within 4% of HSPICE)
□ Calculation can be embedded into VPR
Delay = Mux stage delay + (N - 1) x Distributed stage delay
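As a small sketch of the concatenation formula, the function below adds one multiplexed stage delay to N - 1 identical distributed stage delays. The function and parameter names are hypothetical, and the stage delays are assumed to come from a separate pre-characterization step.

```python
def concatenated_delay(mux_stage_delay_ps, distrib_stage_delay_ps, n_stages):
    """Total wire delay: one multiplexed stage plus (N - 1) distributed stages."""
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    return mux_stage_delay_ps + (n_stages - 1) * distrib_stage_delay_ps
```

With illustrative inputs of 100 ps for the mux stage and 50 ps per distributed stage, a 3-stage wire evaluates to 100 + 2 x 50 = 200 ps. The cost is one multiply and one add, which is what makes the calculation cheap enough to embed in a router.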

13 L0-Sweep
□ Remaining unknowns: L0 and L1
□ Length = L0 + L1*(N - 1)
□ Sweep L0 for a fixed N
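The sweep can be sketched as follows: for a fixed N, each candidate L0 determines L1 through the length constraint, and the total delay is one mux stage plus N - 1 distributed stages. This is only an illustration. The two delay callables stand in for the pre-characterized SPICE data, and the toy quadratic curves and all names here are assumptions.

```python
def sweep_l0(total_mm, n_stages, mux_delay, distrib_delay, steps=100):
    """Sweep L0 over [0, total_mm] for a fixed N and return the lowest-delay split."""
    if n_stages < 2:
        raise ValueError("need at least one distributed stage (N >= 2)")
    best = None
    for i in range(steps + 1):
        l0 = total_mm * i / steps
        # Length = L0 + L1 * (N - 1), solved for L1
        l1 = (total_mm - l0) / (n_stages - 1)
        delay = mux_delay(l0) + (n_stages - 1) * distrib_delay(l1)
        if best is None or delay < best[0]:
            best = (delay, l0, l1)
    return best  # (delay, L0, L1)


# Toy stand-ins for the pre-characterized curves (hypothetical, in ps and mm).
def toy_mux_delay(length_mm):
    return 80.0 + 60.0 * length_mm ** 2


def toy_distrib_delay(length_mm):
    return 40.0 + 30.0 * length_mm ** 2
```

A call such as `sweep_l0(4.0, 2, toy_mux_delay, toy_distrib_delay)` returns a tuple whose L0 and L1 satisfy the length constraint. Repeating the sweep for several values of N and keeping the overall minimum would give a complete design for a given wirelength.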

14 L0-Sweep
[Figure: 2-stage (N = 2) design; Mux stage with length L0, Dist stage with length L1]

15 Spice-Based Design Results
Table columns: Wirelength (mm) | # of Stages (N) | 1 Multiplexed stage: Driver size B0 (x min.), Length L0 (mm) | N-1 Distributed stages: Driver size B1 (x min.), Length L1 (mm) | Delay Distrib / Lumped (ps/mm)
Distributed design results (180nm, 1x spacing, 1x width): / / / / / 191
Distributed design results (90nm, 2x spacing, 2x width): / / / 125

16 Spice-Based Design Conclusions
□ Distributed designs improve over lumped designs on wires longer than 2 mm
□ Longer wires achieve lower delay/mm
  ■ In an FPGA → Multiplexing Interval

17 Multiplexing Interval

18 What about Early Turns?
□ Path Delay Profiles show potential improvement of the proposed circuit designs
[Figure label: Lumped driver]

19 VPR Modifications
□ Assess the benefits of distributed buffering design on FPGAs
□ Early Turn Model
  ■ Can compute a path delay profile for VPR
□ Fast Path modeling

20 VPR Results
[Figure: MCNC Benchmarks; series: Prior FPT04 Design, Lumped driver, Lumped + ETM, Distributed, Distributed + Fast]

21 VPR Turn Data
□ Normal turns went down…
  ■ Are normal turns not important?

22 Summary
□ Developed interconnect driver design methodology for FPGAs
  ■ Identified that longer wires can improve delay efficiency in FPGAs
□ Results from VPR
  ■ Early turn modeling (5-10%)
  ■ Distributed buffers (2-3%)
  ■ Fast path (4-9%)

23 Contributions
□ First attempt to combine repeater insertion with FPGA interconnect design
□ Produced 3 interconnect driver design methodologies for FPGAs
  ■ Lumped driver design
  ■ Distributed design, Elmore-based
  ■ Distributed design, HSPICE-based
□ Quantified significance of Early Turn Modeling and Fast Paths
□ Paper submitted to FPT 2006

24 Future Work
□ Circuit design
  ■ Advanced Circuits
  ■ Noise Modeling
  ■ Power and Area modeling
□ CAD
  ■ Area Modeling
  ■ Heterogeneous Wiring
  ■ Detailed Turn Analysis
  ■ Embedding Delay Concatenation into VPR
  ■ Runtime Improvements for VPR