Un/DoPack: Re-Clustering of Large System-on-Chip Designs with Interconnect Variation for Low-Cost FPGAs
Marvin Tom*, Xilinx Inc., San Jose, CA, USA
David Leong, University of British Columbia, Vancouver, BC, Canada
Guy Lemieux, University of British Columbia, Vancouver, BC, Canada
*Work performed at University of British Columbia
2 Overview
Introduction, Goals and Motivation
–Reduce channel width, lower cost, make circuits “routable”
Benchmark Circuits
–Varying amount of interconnect variation
Un/DoPack CAD Tool
–Iterative channel width reduction by whitespace insertion
Results
Conclusion
4 Mesh-Based FPGA Architecture
9 logic blocks, 4 wires per channel: 3*4=12 total horizontal tracks
16 logic blocks, 4 wires per channel: 4*4=16 total horizontal tracks
[Figure: 3 x 3 and 4 x 4 arrays of logic blocks (L)]
Larger FPGAs have more “aggregate” interconnect
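The slide's arithmetic can be checked with a minimal sketch. It assumes a square array and counts tracks the way the slide does (rows of logic blocks times wires per channel); the function name is hypothetical.

```python
import math


def total_horizontal_tracks(num_logic_blocks: int, wires_per_channel: int) -> int:
    """Count horizontal tracks as on the slide: rows * wires per channel."""
    rows = math.isqrt(num_logic_blocks)
    if rows * rows != num_logic_blocks:
        raise ValueError("this sketch assumes a square array of logic blocks")
    return rows * wires_per_channel


small = total_horizontal_tracks(9, 4)    # 3 rows * 4 wires
large = total_horizontal_tracks(16, 4)   # 4 rows * 4 wires
print(small, large)
```

With the channel width held at 4, moving from 9 to 16 logic blocks raises the aggregate count from 12 to 16 tracks, which is the sense in which larger devices carry more interconnect.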
5 Motivation: Area of FPGA Devices
Total Layout AREA = SIZE of Layout Tile * Number of Layout Tiles
[Figure: MCNC Circuits Mapped onto an FPGA]
6 Motivation: Channel Width Demand
Logic Range: User buys bigger device.
Interconnect Range: User has no choice!
–Devices built for worst-case channel width (fixed width)
–Interconnect dominates area (>70%)
[Figure: MCNC Circuits Mapped onto an FPGA]
7 Goal: Reduce Channel Width
Altera Cyclone: Channel width constraint of 80 routing tracks
Constrained FPGA: Channel width constraint of 60 routing tracks
–Smaller area, lower cost for low-channel-width circuits
–But { apex4, elliptic, frisc, ex1010, spla, pdc } are unroutable…
Can we make them routable in a Constrained FPGA?
8 Possible Solution
Trade off logic utilization for channel width
–User can always buy more logic… (not more wires)
[Figure: FPGA 1 and FPGA 2, arrays of logic blocks (L)]
Trade-off: CLB count for channel width
What about area?
9 Features and Costs of Two FPGA Families
Sample Benchmark Circuit
–10,000 LEs
–150 Routing Tracks
–No Multipliers
–100 K Memory
Sample Benchmark Circuit
–20,000 LEs
–75 Routing Tracks

Altera Device   LEs      Memory    Mult.    Routing   Cost
Cyclone 1C12    12,060   239,616   0        80        $56
Stratix 1S10    10,570   920,      (lost)   (lost)    $190
Cyclone 1C20    20,060   294,912   0        80        $100
Stratix 1S20    18,460   1,669,    (lost)   (lost)    $350
(The Stratix Memory values are truncated and the Stratix Mult. and Routing values are missing in the source.)
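The trade-off on this slide can be illustrated with a small device-selection sketch. It uses only the two Cyclone rows, because the Stratix routing and multiplier values are not legible in the source. The names `Device` and `cheapest_fit` are hypothetical, "100 K Memory" is read as 100,000 in the table's memory unit, and the second sample circuit is assumed to keep the first circuit's memory and multiplier needs; all of these are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Device:
    name: str
    les: int
    memory: int
    multipliers: int
    routing_tracks: int
    cost_usd: int


# Only the rows that are complete in the slide's table.
CYCLONE = [
    Device("Cyclone 1C12", 12_060, 239_616, 0, 80, 56),
    Device("Cyclone 1C20", 20_060, 294_912, 0, 80, 100),
]


def cheapest_fit(devices: List[Device], les: int, tracks: int,
                 memory: int = 0, multipliers: int = 0) -> Optional[Device]:
    """Return the cheapest device meeting every requirement, or None."""
    fits = [d for d in devices
            if d.les >= les and d.routing_tracks >= tracks
            and d.memory >= memory and d.multipliers >= multipliers]
    return min(fits, key=lambda d: d.cost_usd, default=None)


# Densely packed version: fewer LEs but 150 routing tracks.
dense = cheapest_fit(CYCLONE, les=10_000, tracks=150, memory=100_000)
# Depopulated version: twice the LEs but only 75 routing tracks.
sparse = cheapest_fit(CYCLONE, les=20_000, tracks=75, memory=100_000)
print(dense, sparse)
```

Under these assumptions the 150-track circuit fits no Cyclone device, while the 20,000-LE, 75-track version fits the Cyclone 1C20.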
11 GNL Circuit Benchmark Suite
Create benchmark circuits with variation
–SoC: Randomly integrate/stitch together “IP Blocks”
–IP Blocks have varied interconnect needs
Generate Netlist (GNL), Ghent University
–Synthetic benchmark generator
GNL circuits generated hierarchically
–Root: # I/Os, # IP blocks
–Second Level: 20 IP blocks, # LEs, Rent parameter
12 Rent Linear Interpolation
7 benchmark circuits
Average Rent = 0.62, Stdev Rent = 0 /120 primary inputs/outputs
14 Un/DoPack Flow
Iterative non-uniform cluster depopulation tool
Step 1: Traditional SIS/VPR
Step 2: UnPack
–Congestion Calculator
Step 3: DoPack
–Incremental Re-Cluster
Step 4,5: Fast Place/Route
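The five steps can be read as a loop that repeats until the channel width constraint is met. The following is a minimal sketch of that control flow only: every callable (`pack`, `place_route`, `unpack`, `dopack`) is a hypothetical stand-in for the corresponding tool, and the stopping rule and iteration cap are assumptions rather than the authors' implementation.

```python
from typing import Any, Callable, Tuple


def un_do_pack(netlist: Any,
               target_cw: int,
               pack: Callable[[Any], Any],
               place_route: Callable[[Any], Tuple[Any, int]],
               unpack: Callable[[Any], Any],
               dopack: Callable[[Any, Any], Any],
               max_iters: int = 20) -> Tuple[Any, int, int]:
    """Iterate UnPack/DoPack until the routed channel width meets the target."""
    clustering = pack(netlist)                    # Step 1: traditional clustering
    layout, cw = place_route(clustering)          # Step 1: place and route
    iterations = 0
    while cw > target_cw and iterations < max_iters:
        region = unpack(layout)                   # Step 2: locate congestion
        clustering = dopack(clustering, region)   # Step 3: incremental re-cluster
        layout, cw = place_route(clustering)      # Steps 4,5: fast place/route
        iterations += 1
    return clustering, cw, iterations
```

The callables are passed in so that the sketch stays independent of any particular packer, placer, or router.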
15 Un/DoPack Flow: SIS/VPR Step 1: Traditional SIS/VPR
18 Un/DoPack Flow: UnPack Step 2: UnPack: –Congestion Calculator
19 Un/DoPack Flow: UnPack
Step 2: UnPack
–Generate Congestion Map
–CLB Label = Largest channel width occupancy (CW occ) in the 4 adjacent channels
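The labeling rule can be sketched as follows. The indexing is an assumption made for illustration: for an M x M CLB array, `h_occ` holds M+1 rows of horizontal channel occupancies and `v_occ` holds M rows of M+1 vertical channel occupancies, so the CLB at (r, c) touches `h_occ[r][c]`, `h_occ[r+1][c]`, `v_occ[r][c]` and `v_occ[r][c+1]`. The function name is hypothetical.

```python
from typing import List


def clb_labels(h_occ: List[List[int]], v_occ: List[List[int]]) -> List[List[int]]:
    """Label each CLB with the largest occupancy among its 4 adjacent channels."""
    m = len(v_occ)  # M rows of CLBs
    return [
        [max(h_occ[r][c],        # channel above
             h_occ[r + 1][c],    # channel below
             v_occ[r][c],        # channel to the left
             v_occ[r][c + 1])    # channel to the right
         for c in range(m)]
        for r in range(m)
    ]


# 2 x 2 CLB array: 3 rows of horizontal channels, 3 columns of vertical ones.
h = [[1, 2], [5, 3], [0, 4]]
v = [[2, 7, 1], [3, 1, 6]]
print(clb_labels(h, v))  # [[7, 7], [5, 6]]
```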
20 Un/DoPack Flow: UnPack
Step 2: UnPack
–Depop Center = CLB with the largest label
[Figure: M x M array]
21 Un/DoPack Flow: UnPack
Step 2: UnPack
–Option 1, Coarse Grain: Depop Radius = M/4; Depop Amt: 1 new row/col in array
[Figure: M x M array]
22 Un/DoPack Flow: UnPack
Step 2: UnPack
–Option 2, Fine Grain: Depop Radius = M/4, M/5, M/6, M/8; Depop Amt: 1 new row/col in region
[Figure: M x M array]
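Choosing the depopulation center and region can be sketched as below. The slides give the radius as a fraction of the array size M but do not define the region's shape, so this sketch assumes a square window (Chebyshev distance) clipped to the array, integer division for the radius, and a minimum radius of 1; the function names are hypothetical. The amount of whitespace inserted ("1 new row/col") is not modeled.

```python
from typing import List, Tuple


def depop_center(labels: List[List[int]]) -> Tuple[int, int]:
    """Return (row, col) of the CLB with the largest congestion label."""
    m = len(labels)
    return max(((r, c) for r in range(m) for c in range(m)),
               key=lambda rc: labels[rc[0]][rc[1]])


def depop_region(labels: List[List[int]], divisor: int = 4) -> List[Tuple[int, int]]:
    """CLBs within radius M // divisor of the center (square window, assumed)."""
    m = len(labels)
    radius = max(1, m // divisor)
    cr, cc = depop_center(labels)
    return [(r, c)
            for r in range(max(0, cr - radius), min(m, cr + radius + 1))
            for c in range(max(0, cc - radius), min(m, cc + radius + 1))]


labels = [[0] * 8 for _ in range(8)]
labels[5][2] = 9                               # single congestion peak
coarse = depop_region(labels, divisor=4)       # radius M/4 = 2
fine = depop_region(labels, divisor=8)         # radius M/8 = 1
print(depop_center(labels), len(coarse), len(fine))
```

A smaller radius touches fewer CLBs per iteration, which is the intent behind the fine-grain option.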
23 Un/DoPack Flow: DoPack Step 3: DoPack: –Incremental Re-Cluster
25 Un/DoPack Flow: Fast P&R
Step 4,5: Fast Place/Route
Fast Placement
–UBC Incremental Placer (under development)
–VPR -fast
Fast Router
–Use illegal PathFinder solution from first iterations: unsuccessful so far
–Use full routed solution: slow but reliable
27 Un/DoPack: Baseline Flow
UnPack: Coarse-grained congestion calculator
DoPack: iRAC replica
Fast Place: UBC Incremental Placer
Fast Route: None
FPGA Architecture:
–LUT size (k) = 6
–Cluster size (N) = 16
–Inputs per cluster (I) = 51
–Wires of length (L) = 4
28 Area of GNL Benchmarks
29 Interconnect Variation: Impact on FPGA Architecture Design High Variation Circuits Require Wide Channel Width
30 Critical Path of GNL Benchmarks
31 Un/DoPack Congestion Map
[Figure: congestion map before and after Un/DoPack]
32 Multi-Region UnPack
Depopulate multiple regions at once
–Depopulate each region separately
–Smaller radius = M/10
Handle overlapping regions
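One way to pick several regions per iteration is sketched below. The slide does not say how overlapping regions are handled, so this sketch simply skips any candidate center whose window would overlap one already chosen; that policy, the square-window shape, the minimum radius of 1, and the function name are all assumptions.

```python
from typing import List, Tuple


def multi_region_centers(labels: List[List[int]], max_regions: int,
                         divisor: int = 10) -> Tuple[List[Tuple[int, int]], int]:
    """Greedily pick non-overlapping depopulation centers, most congested first."""
    m = len(labels)
    radius = max(1, m // divisor)
    cells = sorted(((r, c) for r in range(m) for c in range(m)),
                   key=lambda rc: labels[rc[0]][rc[1]], reverse=True)
    chosen: List[Tuple[int, int]] = []
    for r, c in cells:
        if len(chosen) == max_regions:
            break
        # Two square windows of the same radius overlap when the
        # Chebyshev distance between their centers is <= 2 * radius.
        if all(max(abs(r - pr), abs(c - pc)) > 2 * radius for pr, pc in chosen):
            chosen.append((r, c))
    return chosen, radius


labels = [[0] * 10 for _ in range(10)]
labels[2][2], labels[2][3], labels[7][7] = 9, 8, 7
centers, radius = multi_region_centers(labels, max_regions=2)
print(centers, radius)
```

Here the second-highest label at (2, 3) is skipped because its window would overlap the one around (2, 2), so the next region is centered on (7, 7).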
33 Normalized Area
34 Normalized Critical Path
35 Run-Time Comparisons
36 Conclusion
Un/DoPack: FPGA CAD flow
–Find “local” congestion, depopulate, reduce interconnect demand
FPGA benchmark circuit “suite”
–Stdev: Used to vary interconnect demand
Discoveries…
–“Non-uniform” depopulation limits area inflation
–“Interconnect variation” is important for area inflation and FPGA architecture design
–“Routing closure” achieved by re-clustering and incremental place & route
UNROUTABLE circuits made ROUTABLE: buy an FPGA with MORE LOGIC!!!
End of Talk