FPGA Global Routing Architecture. Dr. Philip Brisk, Department of Computer Science and Engineering, University of California, Riverside. CS 223.


Effect of the Prefabricated Routing Track Distribution on FPGA Area-Efficiency. V. Betz and J. Rose, IEEE Trans. VLSI 6(3), Sep. 1998

Directional Bias and Non-uniformity (figure panels: Directional Bias; Non-uniformity)

FPGA Aspect Ratio
Rectangular architectures increase the device perimeter, which in turn increases the I/O-to-logic ratio

Logic Pin Positions (figure panels: Full Perimeter; Top-Bottom)

CAD Flow
Vary the channel width via binary search
Determine the minimum channel width that yields a legal routing solution
For directional bias and non-uniformity, maintain the correct ratios throughout the search
Report averages over multiple benchmark circuits
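The binary search in this flow can be sketched as follows. This is a minimal illustration, not the tool's actual code: `is_routable` is a hypothetical callback standing in for a full place-and-route attempt at a given channel width, and the sketch assumes routability is monotonic (if width W routes, every larger width routes too).

```python
from typing import Callable


def min_channel_width(is_routable: Callable[[int], bool], w_max: int = 512) -> int:
    """Return the smallest W in [1, w_max] for which is_routable(W) is True.

    Assumption: is_routable is monotonic in W. is_routable is a hypothetical
    stand-in for running the router at channel width W.
    """
    if not is_routable(w_max):
        raise ValueError("no legal routing found up to w_max")
    lo, hi = 1, w_max
    while lo < hi:
        mid = (lo + hi) // 2
        if is_routable(mid):
            hi = mid        # mid routes, so the minimum is mid or smaller
        else:
            lo = mid + 1    # mid fails, so the minimum is larger than mid
    return lo


def toy_router(width: int) -> bool:
    """Toy stand-in for a router: this made-up circuit needs 37 tracks."""
    return width >= 37
```

For a directionally biased or non-uniform architecture, the searched value would presumably be a single scale factor from which `is_routable` derives every channel width, so the ratios stay fixed during the search.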

Directional Bias / Square FPGA
Optimal directional bias for full-perimeter pins is square (1:1)
Optimal directional bias for top/bottom pins is 2:1
(figure panels: Full-Perimeter; Top-Bottom; annotation: 8%)

Area Efficiency vs. Aspect Ratio (w/ Full-perimeter pins)
Square is most area-efficient
The most area-efficient directional bias increases as the aspect ratio of the FPGA increases

Area Efficiency vs. Aspect Ratio
As long as horizontal and vertical channel widths are appropriately balanced, aspect ratios (I/O counts) can be increased with minimal impact on core area

Extra-wide Center Channels
$R_W = W_{center} / W_{edge}$
$R_C$: ratio of the number of channels having width $W_{center}$ to the number having width $W_{edge}$

Effect of $R_W$ and $R_C$ on Area Efficiency
Greatest area efficiency for (near-)uniform architectures

Are FPGAs More Congested Near the Center? Not significantly!

One Extra-Wide Center Channel? (figure panels: Placement Objective #1; Placement Objective #2)
That looks like a pretty good design point!

I/O Channels
$R_{I/O} = W_{I/O} / W_{Logic}$

Routability vs. $R_{I/O}$ (Overly constrained placer)
(figure annotation: Avg. 12%)
Favors a uniform allocation of resources across the chip

Conclusion
Highest area-efficiency achieved with completely uniform channel capacities across the chip
– Reason: Circuits tend to have routing demands that are spread uniformly across the chip
Pin placement on logic blocks should match channel capacity distribution
Caveat: Results are specific to THIS CAD flow, e.g., placement and routing algorithms, objectives, etc.

FPGA Routing Architecture: Segmentation and Buffering to Optimize Speed and Density. V. Betz and J. Rose, International Symposium on FPGAs, 1999

FPGA Routing Architecture

Wire Length Tradeoff
Too many short wires?
– Long connections will use many short wires
– Switches connect wires: increase delay; increase power/energy
Too many long wires?
– Short connections will use long wires: degrade speed, waste area
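The tradeoff above can be made concrete with a little arithmetic. The sketch below is a simplified model, not taken from the paper: it assumes a straight connection spanning `distance` logic blocks, built only from wires of one length joined end to end, and it ignores the connection-block switches at the source and sink. All function names are illustrative.

```python
import math


def segments_needed(distance: int, wire_length: int) -> int:
    """Number of wires of the given length needed to span the distance."""
    return math.ceil(distance / wire_length)


def series_switches(distance: int, wire_length: int) -> int:
    """Routing switches between consecutive wires (one fewer than the wires)."""
    return segments_needed(distance, wire_length) - 1


def wasted_wire(distance: int, wire_length: int) -> int:
    """Logic-block spans of metal driven beyond what the connection needs."""
    return segments_needed(distance, wire_length) * wire_length - distance
```

Under these assumptions, an 8-block connection made of length-1 wires passes through 7 switches in series, while a 1-block connection placed on a length-8 wire drives 7 blocks of unneeded metal.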

Pass Transistors vs. Tristate Buffers
Pass transistors: less area; fast for short connections
Tristate buffers: better for connections that pass through many switches in series

CAD Flow

Switch Options

“End” vs. “Internal” Switches

Uniform Wire Segment Length
Long connections must pass through too many buffers
Short connections must use long wires
For long connections metal resistance degrades speed
Longer wires are less flexible; more tracks per channel needed to route

Varying Wire Lengths “[L]ength 4 wires provide an efficient way to make both long and short connections!”

Heterogeneous Routing Architecture
50% of routing tracks are length-4 and are connected by buffered switches
50% have other lengths and are connected by pass transistors
(figure annotations: Best for area; Best for speed; Sweet spot?)

Heterogeneous Routing Architecture
X% of routing tracks are length-4 and are connected by buffered switches
(100 – X)% have other lengths and are connected by pass transistors
To increase speed, make 17-83% of routing tracks pass-transistor-switched wires
Increasing the fraction of routing tracks using length 2, 4, or 8 pass-transistor wires improves FPGA area efficiency up to ~83%

More Observations (no Charts)
The best area/delay result is when the pass-transistor-switched wires have length 4 or 8
The best architectures contain 50-80% pass-transistor-switched routing tracks
– The 50% pass-transistor architectures give the best speed
– The 83% pass-transistor architecture yields the best area efficiency

Long Wires / Switch Block Population

Lots of Data

Conclusion
FPGAs should contain wires of moderate length
– 4 to 8 logic blocks
Mix of tri-state buffers and pass transistors is beneficial
– The router (CAD tool) needs to know the difference
Reducing switch-block internal population reduces area
– 2.5% to 7.5%
Significant overall improvements compared to Xilinx XC4000X
– In retrospect: that architecture died a long time ago

Should FPGAs Abandon the Pass-Gate? C. Chiasson and V. Betz, International Conference on Field Programmable Logic and Applications (FPL), 2013

Key Issues
It isn’t 1999 anymore
– Pass transistor performance and reliability have degraded as technology has scaled
Transmission gates
– Larger, but more robust, than pass transistors

Pass Transistor

Transmission Gate
Gate Boosting: $V_{SRAM+} > V_{DD}$

6-LUT w/ Internal Rebuffering

Gate Boosting (Switch Block Mux)

CAD Flow

FPGA Tile Area, Avg. Critical Path Delay, and Power (VTR Benchmarks) (figure panels: Tile Area; Avg. Critical Path Delay; Avg. Power)

Critical Path Delay and Dynamic Power with Decoupled $V_{DD}$ and $V_G$

Power-Delay Product with Decoupled $V_{DD}$ and $V_G$

Tile Area and Critical Path Delay (figure panels: Tile Area; Critical Path)

Conclusion
Transmission gate vs. pass-transistor FPGAs
– 15% larger
– 10-25% faster, depending on “gate boosting”
Transmission gate with a separate power supply for gate terminal (decoupled results)
– 50% power reduction with good delay

Directional and Single-Driver Wires in FPGA Interconnect. G. Lemieux, et al., International Conference on Field Programmable Technology (ICFPT), 2004

Uni- and Bi-directional Wires

Switch Block (Length-1 Wires)

Directional Switch Block (Length-3 Wires)

Uni- and Bidirectional CLB Outputs

HSPICE Models: tri-state and single-driver switching elements

Area Overhead
bidir: bi-directional wires, tri-state switches
dir-tri: directional wires, tri-state switches
dir: directional wires, single-driver switches
Area savings (15-34%, per benchmark) increase as channel width increases

Channel Width (Normalized to bidir)
dir-tri requires up to 20% more tracks per channel than bidir
17% fewer tracks for spla
dir requires fewer tracks than dir-tri
Better CLB output connectivity

Transistor Count (Normalized to bidir)
dir-tri yields 20% area savings
Reducing transistor count reduces CLB area, which shrinks tile length (average length shrink is 14%)
dir reduces wire capacitance by 37% by eliminating tri-state drivers

Critical Path Delay (Normalized to bidir)
dir-tri increases delay by 3% on average
– Fanout degradation
dir reduces delay by 9% on average
– dir connects to equal # of tracks per direction (no fanout degradation)
– Lower capacitance due to length shrinkage

Conclusion
Directional, single-driver wiring yields:
– 25% area savings (15-34% for individual circuits)
– 9% delay reduction (4-16% for individual circuits)
– 32% area-delay product reduction (23-45% for individual …)
– 37% capacitance reduction
No impact on channel width
Minimal advantage to mixing uni- and bi-directional wires in the same device
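The headline numbers above are mutually consistent, which a short calculation shows. This is only a sanity check on the averages quoted in the slide: the paper's 32% figure is presumably averaged per circuit, so a product of averages need not match it exactly. The function name is illustrative.

```python
def combined_reduction(area_saving: float, delay_reduction: float) -> float:
    """Fractional reduction in area-delay product.

    The product scales by (1 - area_saving) * (1 - delay_reduction),
    so the reduction is one minus that factor.
    """
    return 1.0 - (1.0 - area_saving) * (1.0 - delay_reduction)


# 25% area savings and 9% delay reduction, as quoted in the slide.
area_delay_reduction = combined_reduction(0.25, 0.09)  # 1 - 0.75 * 0.91
```

Since 0.75 × 0.91 = 0.6825, the combined reduction is about 31.75%, in line with the 32% reported.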

Automatic Generation of FPGA Routing Architectures from High-Level Descriptions. V. Betz and J. Rose, International Symposium on FPGAs, 2000

Parameters Number of logic block input and output pins

Parameters Sides of the logic block from which each I/O pin is accessible

Parameters Number of I/O pads per row/column

Parameters Switch Block topology (next lecture)

Parameters Percentage of tracks to which each CLB input connects ($F_{c,in}$)

Parameters Percentage of tracks to which each CLB output connects ($F_{c,out}$)

Parameters $F_c$ values for I/O pads ($F_{c,pad}$)

Parameters
Wire segment types
– Length
– % of tracks per channel of this type
– Switch type (pass-transistor, tri-state buffer)
– Switch block and connection block internal population density
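The parameters listed across these slides could be gathered into a small record like the one below. This is a sketch only: every class, field name, and example value is hypothetical and does not reproduce the tool's real description format. It also checks two basic consistency conditions that follow from the definitions (track fractions cover the whole channel; each $F_c$ is a fraction of the tracks).

```python
from dataclasses import dataclass, field
from typing import List

SWITCH_TYPES = ("pass_transistor", "tristate_buffer")


@dataclass
class SegmentType:
    length: int                  # logic blocks spanned by the wire
    track_fraction: float        # share of tracks per channel of this type
    switch_type: str             # one of SWITCH_TYPES
    sb_population: float = 1.0   # switch block internal population density
    cb_population: float = 1.0   # connection block internal population density


@dataclass
class ArchParams:
    num_inputs: int              # logic block input pins
    num_outputs: int             # logic block output pins
    io_pads_per_row_col: int
    switch_block: str            # switch block topology name
    fc_in: float                 # fraction of tracks each CLB input connects to
    fc_out: float                # fraction of tracks each CLB output connects to
    fc_pad: float                # the same, for I/O pads
    segments: List[SegmentType] = field(default_factory=list)

    def validate(self) -> None:
        total = sum(s.track_fraction for s in self.segments)
        if abs(total - 1.0) > 1e-9:
            raise ValueError("segment track fractions must sum to 1")
        for seg in self.segments:
            if seg.switch_type not in SWITCH_TYPES:
                raise ValueError(f"unknown switch type: {seg.switch_type}")
        for name in ("fc_in", "fc_out", "fc_pad"):
            value = getattr(self, name)
            if not 0.0 < value <= 1.0:
                raise ValueError(f"{name} must be in (0, 1]")


# Illustrative values only: half the tracks length-4 buffered, half pass-transistor.
example = ArchParams(
    num_inputs=4, num_outputs=1, io_pads_per_row_col=2, switch_block="disjoint",
    fc_in=0.5, fc_out=0.25, fc_pad=1.0,
    segments=[SegmentType(4, 0.5, "tristate_buffer"),
              SegmentType(8, 0.5, "pass_transistor")],
)
example.validate()
```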

Parameters for Delay Extraction
I/O capacitance, equivalent resistance, and intrinsic delay for each switch type
Capacitance and resistance of each wire segment type
Delays of all combinational and sequential elements in a logic block
I/O pad delay

Routing Resource Graph (RRG) (Needed by the Router)

Challenges
Many FPGA architectures may satisfy the parameters
– We want a GOOD architecture that satisfies them
Satisfying all parameters may be difficult or impossible
– E.g., $F_{c,in}$ = 100% AND C-block population = 40%

Approach
1. Generate C Block for all 4 sides of each CLB
2. Generate I/O C Block
3. Generate S Block
4. Replicate each pattern and stitch them together to form the 2D array (FPGA)

C Block Generation Challenges
Each of the W tracks in a channel should be connected to approximately the same number of CLB input and output pins
Each pin should connect to a mix of different wire types (e.g., wires of different lengths)
Pins that appear on multiple sides of the CLB should connect to different tracks on each side
Logically equivalent pins connect to different tracks
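The first goal, balanced track usage, can be illustrated with a simple spreading rule. The sketch below is one possible assignment, not the algorithm from the paper: it interleaves all pin-to-track connections and spreads them evenly over the W tracks, so any two tracks differ by at most one connected pin and each pin's tracks are distinct. It does not address the wire-type mix or the multi-side requirements. Function and parameter names are illustrative.

```python
from collections import Counter
from typing import Dict, List


def connection_block(num_pins: int, num_tracks: int, fc: float) -> Dict[int, List[int]]:
    """Map each pin to the tracks it connects to.

    Each pin gets n = round(fc * num_tracks) connections (at least one).
    Connection k of pin p is given global index i = k * num_pins + p, and
    index i is mapped to track floor(i * W / (n * num_pins)).
    """
    n = max(1, round(fc * num_tracks))
    total = n * num_pins
    return {
        pin: [((k * num_pins + pin) * num_tracks) // total for k in range(n)]
        for pin in range(num_pins)
    }


def track_load(pattern: Dict[int, List[int]], num_tracks: int) -> List[int]:
    """Number of pins connected to each track."""
    counts = Counter(t for tracks in pattern.values() for t in tracks)
    return [counts.get(t, 0) for t in range(num_tracks)]
```

With 4 pins, 8 tracks, and `fc = 0.5`, pins 0 and 1 land on the even tracks and pins 2 and 3 on the odd tracks, so every track serves exactly two pins.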

Pathological Switch Topologies
Nets starting at out1 can only reach in1
Nets starting at out2 can only reach in2

More Routable Topology
Nets starting at either output can reach either input
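The difference between the two topologies is a reachability property of the routing resource graph. The sketch below builds two tiny graphs and checks which inputs each output can reach; the pin names `out1`, `out2`, `in1`, `in2` come from the slides, while the track nodes and the exact edges are assumptions made for illustration, since the figures are not in the transcript.

```python
from collections import deque
from typing import Dict, List, Set


def reachable(graph: Dict[str, List[str]], source: str) -> Set[str]:
    """Breadth-first search over a directed routing resource graph."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reachable_inputs(graph: Dict[str, List[str]], source: str) -> List[str]:
    """Input pins (nodes named 'in*') reachable from the given output pin."""
    return sorted(n for n in reachable(graph, source) if n.startswith("in"))


# Assumed pathological pattern: each output sits on its own track, and each
# track feeds only one input, so the two nets can never cross over.
pathological = {
    "out1": ["track0"], "out2": ["track1"],
    "track0": ["in1"], "track1": ["in2"],
}

# Assumed better pattern: each track now feeds both inputs.
more_routable = {
    "out1": ["track0"], "out2": ["track1"],
    "track0": ["in1", "in2"], "track1": ["in1", "in2"],
}
```

A generator could run a check like this on every output pin of a candidate pattern and reject patterns in which some input is unreachable.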

Unsatisfiable Topology
1. W = 3 tracks per channel
2. All wires have length L = 3
3. Each wire has internal switch population of 50%
4. Disjoint switch box topology
5. Routing switches can only connect to the end of a wire segment

Adjust the Segment Start Points
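One common way to adjust start points is to stagger them, so that wires on different tracks begin at different columns. The sketch below assumes the simplest such rule, offsetting track t by t mod L; the slide's own figure is not in the transcript, so the rule and the function name are assumptions. Truncated wires at the array edge are ignored.

```python
from typing import List


def segment_starts(track: int, length: int, num_columns: int) -> List[int]:
    """Columns at which a new full-length wire begins on the given track.

    Assumed rule: track t is offset by t mod length, so the start points of
    neighbouring tracks are shifted by one column relative to each other.
    """
    offset = track % length
    return [x for x in range(num_columns) if (x - offset) % length == 0]
```

With W = 3 tracks and L = 3, track 0 starts wires at columns 0, 3, 6, track 1 at 1, 4, 7, and track 2 at 2, 5, 8, so every column has exactly one wire end where a switch can be placed.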

Single Layout Tile

Example Architecture Description

Entire FPGA (Left) / Close-up (Right)

Segment Distribution

Complex Routing Architecture

Conclusion
Parameterized architecture generation yields efficient design space exploration
– Vaughn Betz and colleagues formed RightTrack CAD Corp., which was bought by Altera
– RightTrack’s software was then used to design the Stratix II (killing the Stratix in the process)
– Stratix III, IV, V are clear evolutions of the Stratix II