ECE 697F Reconfigurable Computing Lecture 5 FPGA Routing

Slides:



Advertisements
Similar presentations
Problem solving with graph search
Advertisements

ECE 667 Synthesis and Verification of Digital Circuits
CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 22: December 2, 2005 Routing 2 (Pathfinder)
FPGA-Based System Design: Chapter 4 Copyright  2004 Prentice Hall PTR Topics n Logic synthesis. n Placement and routing.
VLSI Routing. Routing Problem  Given a placement, and a fixed number of metal layers, find a valid pattern of horizontal and vertical wires that connect.
Routing 1 Outline –What is Routing? –Why Routing? –Routing Algorithms Overview –Global Routing –Detail Routing –Shortest Path Algorithms Goal –Understand.
Reconfigurable Computing (EN2911X, Fall07)
EDA (CS286.5b) Day 14 Routing (Pathfind, netflow).
Lecture 9: Multi-FPGA System Software October 3, 2013 ECE 636 Reconfigurable Computing Lecture 9 Multi-FPGA System Software.
Lecture 5: FPGA Routing September 17, 2013 ECE 636 Reconfigurable Computing Lecture 5 FPGA Routing.
ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement.
ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement.
Register-Transfer (RT) Synthesis Greg Stitt ECE Department University of Florida.
Introduction to Routing. The Routing Problem Apply after placement Input: –Netlist –Timing budget for, typically, critical nets –Locations of blocks and.
Escape Routing For Dense Pin Clusters In Integrated Circuits Mustafa Ozdal, Design Automation Conference, 2007 Mustafa Ozdal, IEEE Trans. on CAD, 2009.
Global Routing. Global routing:  To route all the nets, should consider capacities  Sequential −One net at a time  Concurrent −Order-independent 2.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 15: February 12, 2003 Interconnect 5: Meshes.
ESE Spring DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes.
Un/DoPack: Re-Clustering of Large System-on-Chip Designs with Interconnect Variation for Low-Cost FPGAs Marvin Tom* Xilinx Inc.
Topology aggregation and Multi-constraint QoS routing Presented by Almas Ansari.
Solving Hard Instances of FPGA Routing with a Congestion-Optimal Restrained-Norm Path Search Space Keith So School of Computer Science and Engineering.
Placement. Physical Design Cycle Partitioning Placement/ Floorplanning Placement/ Floorplanning Routing Break the circuit up into smaller segments Place.
CSE 589 Part VI. Reading Skiena, Sections 5.5 and 6.8 CLR, chapter 37.
1 A Min-Cost Flow Based Detailed Router for FPGAs Seokjin Lee *, Yongseok Cheon *, D. F. Wong + * The University of Texas at Austin + University of Illinois.
Timing-Driven Routing for FPGAs Based on Lagrangian Relaxation
Graphs A ‘Graph’ is a diagram that shows how things are connected together. It makes no attempt to draw actual paths or routes and scale is generally inconsequential.
FPGA CAD 10-MAR-2003.
Interconnect Driver Design for Long Wires in FPGAs Edmund Lee University of British Columbia Electrical & Computer Engineering MASc Thesis Presentation.
FPGA Routing Pathfinder [Ebeling, et al., 1995] Introduced negotiated congestion During each routing iteration, route nets using shortest.
Register-Transfer (RT) Synthesis Greg Stitt ECE Department University of Florida.
Interconnect Driver Design for Long Wires in FPGAs Edmund Lee, Guy Lemieux & Shahriar Mirabbasi University of British Columbia, Canada Electrical & Computer.
Placement and Routing Algorithms. 2 FPGA Placement & Routing.
1 Chapter 13 Mathematical models of networks give us algorithms so computationally efficient that we can employ them to evaluate problems too big to be.
A Study of the Scalability of On-Chip Routing for Just-in-Time FPGA Compilation Roman Lysecky a, Frank Vahid a*, Sheldon X.-D. Tan b a Department of Computer.
Oleg Petelin and Vaughn Betz FPL 2016
VLSI Physical Design Automation
ECE 636 Reconfigurable Computing Lecture 1 Course Introduction Prof
Partial Reconfigurable Designs
ESE534: Computer Organization
Adapted by Dr. Adel Ammar
Give qualifications of instructors: DAP
HeAP: Heterogeneous Analytical Placement for FPGAs
Department of Computer Science
Number Systems Give qualifications of instructors:
Give qualifications of instructors: DAP
Incremental Placement Algorithm for Field Programmable Gate Arrays
ISP and Egress Path Selection for Multihomed Networks
Greedy Algorithms / Interval Scheduling Yin Tat Lee
Objective of This Course
Haim Kaplan and Uri Zwick
EA C461 – Artificial Intelligence
ساختمان داده ها لیستهای پیوندی
Topics Logic synthesis. Placement and routing..
Dynamic FPGA Routing for Just-in-Time Compilation
Routing Algorithms.
Register-Transfer (RT) Synthesis
ECE 697F Reconfigurable Computing Lecture 4 FPGA Placement
Chapter 6 Network Flow Models.
CS137: Electronic Design Automation
Give qualifications of instructors: DAP
NAND and XOR Implementation
CS184a: Computer Architecture (Structure and Organization)
Fast Min-Register Retiming Through Binary Max-Flow
Circuit Analysis Procedure by Dr. M
ICS 252 Introduction to Computer Design
Chapter 3b Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Prof. Lei He Electrical Engineering Department.
Under a Concurrent and Hierarchical Scheme
Reconfigurable Computing (EN2911X, Fall07)
CS 151 Digital Systems Design Lecture 1 Course Overview
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

ECE 697F Reconfigurable Computing Lecture 5 FPGA Routing Give qualifications of instructors: DAP teaching computer architecture at Berkeley since 1977 Co-athor of textbook used in class Best known for being one of pioneers of RISC currently author of article on future of microprocessors in SciAm Sept 1995 RY took 152 as student, TAed 152,instructor in 152 undergrad and grad work at Berkeley joined NextGen to design fact 80x86 microprocessors one of architects of UltraSPARC fastest SPARC mper shipping this Fall

Overview Routability Estimation 1 step routing 2 step routing Pathfinder (and Pathfinder derivatives)

Island-Style Devices Two dimensional problem (X+Y)!/(X!Y!) possible paths Restricted within bounding box

FPGA issues Often want a fast answer. May be willing to accept lower quality result for less place/route time. May be interested in knowing wirability without needing the final configuration. Fast placement: constructive placement, iterative improvement through simulated annealing.

Will find shortest path for a single wire, if such a path exists. Maze routing Will find shortest path for a single wire, if such a path exists. Two phases: Label nodes with distance, radiating from source. Use distances to trace from sink to source, choosing a path that always decreases distance to source.

Lee (wavefront) router

Maze Routing L C S Evaluate shortest feasible paths based on a cost function Like row-based device global route allocates channel bandwidth not specific solutions. Formulate cost function as needed to address desired goal.

Two main type of FPGA routers Two phase routers Global router assigns net connections to specific FPGA channels Detailed router “expands” choices in channel to select available wires One phase routers Channel assignment and wire selection happens in one routing pass Two phase routers were initially popular Simpler to write and faster to execute More closely models ASIC routing techniques One phase routers shown to give MUCH better performance

Consider This Example L Connection B 1 2 3 Connection A Connection C Limited resources cause dependencies Need to enumerate possible routes Prioritize based on cost factors.

Graph Expansion 2,2 2,3 3,3 3,4 4,4 L C S 1 1 1 1 1 1 1 1 Enumerate possible expansions Limited by switchbox and connection box.

Detailed graph expansion Consider all possible routes within selected channels Limited by connection block and switch block connectivity

Detailed Routing: Lemieux and Brown Issues to consider Some wires span multiple blocks Need to allocate long connections to long wires Step 1: For each connection, locate all possible routes Step 2: Use complicated cost function to route connections Many limitations No seamless way to deal with nets with high fanout What happens if we run out of tracks? Doesn’t support multiple iterations.

Weighting Graph Costs Select paths that have fewest negative effects on others. Identify paths that are essential for a connection. Sharing factor to enhance high-fanout Selecting longer segments for nets to reduce wire delay Weighting these factors is not an easy task

VPR: One-step FPGA Router Combined global and detailed routing together Uses “expansion wavefront” starting at net source Expansion performed based on node cost

Routing Tradeoffs Bias router to find first, best route. Vary number of node expansions using: pcosti = (1 – a) x pcosti-1 + ncosti + a x disti

VPR versus Sega VPR allows full search of implementation space. Effective for multi-fanout nets 40% better than Sega

Making One-Step Routing Faster How should net fanouts be orders? Does this reduce area? What is the effect on performance? Relatively straightforward algorithm

Making One-Step Routing Faster What happens to the expansion list if there are many sinks? Binning helps speed up router run time. Many high fanout nets now handled by special global routing network.

Dealing with Multi-Fanout Nets Difficult to estimate routing area associated with multi-fanout nets Need to modify cost estimation

Dealing with Multi-Fanout Nets Use the cost factor to help scale bounding box costs associated with placement and routing estimation Very useful in determining likely routing success

Dealing with Multi-Fanout Nets Routing becomes much easier as the amount of routing resources increases

Linear growth of routing time? In the absence of congestion, the routing problem becomes greedy In general wiring should increase at a more than linear rate with circuit size Greedy allocation of resources may obscure this.

Routing Estimation for Island-style Devices Develop approach to use wire length to determine routability of device. Westimate = total wirelength/2*Nblocks* U Difficult to determine U, dependent on Fc and Fs Real difficulty in evaluating “wire length”

Estimating Wirelength Multi-fanout nets present a challenge. Use cost correction factor to adjust bounding box value (e.g. F=4, C=1.08) Not clear what to do for timing driven 84% accuracy for 15 benchmarks (Rose/Swartz) Even works for segmented devices.

Track count estimation Use to determine if device will route successfully Less of an issue for today’s commercial FPGAs Not really helpful for timing-driven compilation

Architectural Limitation Routing architecture necessitates domain selection. Bigger effect for multi-fanout nets

Domain Negotiation Evaluate occupancy of domains adjacent to outputs prior to net routing Rank domains by routability Penalize domains with little or no likelihood of successful completion. Add domain rank to node cost function at net output. pcosti = (1 – a) x pcosti-1 + ncosti + a x disti + rd

Results: Fast Routing – FFT16 Domain negotiation has larger effect as track width is reduced.

Routing Results Summary Average Route Time (s) Tracks : Wmin Tracks: Wmin+40% BFS 709 269 DFS- noneg 647 20 DFS-neg 333 18 Low-stress case shows order of magnitude speedup Routing proportional to wire length Domain negotiation reduces difficult route time by a factor of 2.

Difficult Routing Case Results Circuit Source Logic Blocks DFS-neg Min. Tracks BFS Min. Tracks fft16 RAW 11860 12 ssp96 12041 13 spm16 6632 11 bubble 8453 8 frisc MCNC 3556 15 s38417 6406 S38584.1 6447 clma 8383 16 elliptic NCNC 3849 pdc 4631 21 total 131 Depth-first case needs same track count as minimum breadth-first track count.

Pathfinder Use a non-decreasing history value to represent congestion. Similarities to multi-commodity flow Can be implemented efficiently but does require substantial run time Only update after an interation. ci = (1 + hn * hfac) * (1 + pn * pfac) + bn, n-1

Timing-Driven Routing Add delay cost component to routing. Represent delay along path as RC chain. Buffering important here. Note that timing driven routing selects most distant point for first route. Sets upper bound on delay. Need for combined breadth-first congestion and timing-driven route.

Timing-Driven Routing Difficult to estimate remaining timing along a path Difficult to balance costs for each critical net Some routers attempt to “look-ahead” to anticipate congested or time-critical areas Optimal approaches have generally failed.

Combined Placement and Routing Used depth-first route to select initial connections Swap blocks and rip up attached nets Bias nets that span the bulk of device onto long-line resources. Took 16X longer than place and route 8% to 15% improvement.

Summary Routing a difficult problem based on device size, complexity Early 2-step approaches became obsolete as device sizes grew. Pathfinder extensively used to determine best, shortest paths. Most routers are highly advanced.