Published by Shon Black. Modified over 9 years ago.
1
Channel Width Reduction Techniques for System-on-Chip Circuits in Field-Programmable Gate Arrays
Marvin Tom
University of British Columbia
Department of Electrical and Computer Engineering
Vancouver, BC, Canada
2
Contributions
Two new FPGA benchmark circuit “suites”
–Meta Circuit: mimics “System-on-Chip” design by randomly “stitching” real designs
–Stdev: synthetic clones of Meta Circuit, used to vary interconnect demand
Two new FPGA CAD flows
–DHPack: Design Hierarchy Packing
  Identify congested IP blocks → depopulate → reduced interconnect demand
  Conference paper: “Logic Block Clustering…”, published at DAC 2005
–Un/DoPack: UnPack and DoPack
  Find “local” interconnect congestion → depopulate → reduced interconnect demand
  Conference paper, submitted to DAC 2006
Discoveries…
–“Non-uniform” depopulation limits area inflation
–“BLE limiting” gives better interconnect controllability than “Input limiting”
–“Interconnect variation” is important for area inflation and FPGA architecture design
–“Routing closure” achieved by re-clustering and incremental place & route
  UNROUTABLE circuits made ROUTABLE → buy an FPGA with MORE LOGIC!!!
3
Mesh-Based FPGA Architecture
9 logic blocks, 4 wires per channel: 3*4=12 total horizontal tracks
16 logic blocks, 4 wires per channel: 4*4=16 total horizontal tracks
Larger FPGAs have more “aggregate” interconnect
[Figure: two mesh arrays of logic blocks (L), one with 9 blocks and one with 16]
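The track arithmetic on this slide can be reproduced with a small sketch. It assumes a square array and follows the slide's own counting (one horizontal channel per row of logic blocks); the function name is hypothetical.

```python
import math

def total_horizontal_tracks(num_logic_blocks: int, wires_per_channel: int) -> int:
    """Aggregate horizontal tracks, counting one channel per row as the slide does."""
    side = math.isqrt(num_logic_blocks)
    if side * side != num_logic_blocks:
        raise ValueError("this sketch assumes a square array of logic blocks")
    return side * wires_per_channel

print(total_horizontal_tracks(9, 4))   # 12
print(total_horizontal_tracks(16, 4))  # 16
```

With the channel width held at 4, growing the array from 9 to 16 logic blocks raises the aggregate track count from 12 to 16.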
4
Logic Utilization vs. Channel Width
Trade off logic utilization for channel width
–User can always buy more logic… (not more wires)
Trade-off: CLB count for channel width
But… can we achieve lower Total Area? ( = SIZE * CLB Count)
(No! But we can break even!)
[Figure: FPGA 1 and FPGA 2, two arrays of logic blocks (L)]
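To make the break-even question concrete, here is a minimal sketch of Total Area = SIZE * CLB Count. The tile-size model (a fixed logic area plus a routing area proportional to channel width) and both constants are assumptions chosen for illustration; they are not taken from the talk.

```python
def tile_area(channel_width: int, logic_area: float = 100.0,
              area_per_track: float = 2.0) -> float:
    """Hypothetical tile size: fixed logic area plus routing area that grows with channel width."""
    return logic_area + area_per_track * channel_width

def total_area(clb_count: float, channel_width: int) -> float:
    """Total Area = SIZE * CLB Count, with SIZE given by the assumed tile model."""
    return clb_count * tile_area(channel_width)

def break_even_clb_count(clb_count: int, old_width: int, new_width: int) -> float:
    """CLB count at the narrower width that gives the same total area as before."""
    return total_area(clb_count, old_width) / tile_area(new_width)

# With these assumed constants, 1000 CLBs at width 95 cost as much as 1160 CLBs at width 75.
print(break_even_clb_count(1000, 95, 75))
```

Under this model a depopulated design breaks even as long as its CLB count stays at or below the value returned by `break_even_clb_count`.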
5
Logic Element: BLE and CLB
Basic Logic Element (BLE)
–‘k’-input LUT + FF
Configurable Logic Block (CLB)
–‘N’ BLEs, ‘N’ outputs
–‘I’ shared inputs
Note: I < k*N
[Figure: a CLB with ‘I’ inputs and ‘N’ outputs containing BLE #1 to BLE #5]
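The CLB parameters can be captured in a small record that enforces the slide's note I < k*N. The class and field names are hypothetical. N=16 and I=51 match the values quoted on the results slide; k=6 is an assumption, since the talk does not state the LUT size.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CLBArchitecture:
    k: int  # inputs per LUT (one LUT + FF per BLE)
    n: int  # BLEs per CLB, which is also the number of CLB outputs
    i: int  # shared CLB inputs

    def __post_init__(self):
        # The slide's note: the shared inputs are fewer than the total LUT input pins.
        if not self.i < self.k * self.n:
            raise ValueError("expected I < k*N")

    @property
    def total_lut_pins(self) -> int:
        return self.k * self.n

arch = CLBArchitecture(k=6, n=16, i=51)  # k=6 is assumed
print(arch.total_lut_pins)  # 96
```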
6
CLB Depopulation
General Approach
–Use existing clustering tools
–Do not fill CLB while clustering
1. Input-Limited
  E.g. maximum 67% input utilization per CLB
  Might use all BLEs
2. BLE-Limited
  E.g. maximum 60% BLE utilization per CLB
  Might use all inputs
[Figure: a CLB with ‘I’ inputs and ‘N’ outputs containing BLE #1 to BLE #5]
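As a rough illustration, the two limits translate into per-CLB caps that a clustering tool would be asked to respect. The helper names are hypothetical, rounding down is an assumption, and the sizes (16 BLEs, 51 inputs) are the ones quoted on the results slide.

```python
import math

def input_limited_cap(num_inputs: int, max_utilization: float) -> int:
    """Most CLB inputs a cluster may use under input limiting (rounded down by assumption)."""
    return math.floor(num_inputs * max_utilization)

def ble_limited_cap(num_bles: int, max_utilization: float) -> int:
    """Most BLEs a cluster may hold under BLE limiting (rounded down by assumption)."""
    return math.floor(num_bles * max_utilization)

print(input_limited_cap(51, 0.67))  # 34 of 51 inputs; all 16 BLEs may still be used
print(ble_limited_cap(16, 0.60))    # 9 of 16 BLEs; all 51 inputs may still be used
```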
7
Reducing Channel Width
Results (max cluster size 16, max num inputs 51)
Input-Limited
–No channel width control
BLE-Limited
–(Almost) monotonically increasing
–Good channel width control
8
Meta Benchmark Circuit Creation
Mimic process of creating large designs
–“IP Blocks”: MCNC circuits
–SoC: randomly integrate/stitch together “IP Blocks”
–IP Blocks have varied interconnect needs
Considered 3 stitching schemes…
–Independent
  IP blocks are not connected to each other
–Pipeline
  Outputs of one IP block connected to inputs of next IP block
–Clique
  Outputs of each IP block are uniformly distributed to inputs of all other IP blocks
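The three stitching schemes can be sketched at the block level as lists of (source, destination) pairs. This is only an illustration of the connectivity pattern, not the tool used in the talk; the function name is hypothetical, and the block names are MCNC circuits mentioned elsewhere in the talk.

```python
def stitch(blocks: list[str], scheme: str) -> list[tuple[str, str]]:
    """Return block-level connections (driver, receiver) for a stitching scheme."""
    if scheme == "independent":
        return []  # IP blocks are not connected to each other
    if scheme == "pipeline":
        # outputs of each block feed the inputs of the next block in order
        return [(blocks[i], blocks[i + 1]) for i in range(len(blocks) - 1)]
    if scheme == "clique":
        # outputs of each block are spread over the inputs of all other blocks
        return [(a, b) for a in blocks for b in blocks if a != b]
    raise ValueError(f"unknown scheme: {scheme}")

ip_blocks = ["pdc", "clma", "ex1010"]
print(stitch(ip_blocks, "pipeline"))     # [('pdc', 'clma'), ('clma', 'ex1010')]
print(len(stitch(ip_blocks, "clique")))  # 6
```

A fuller version would also split each block's output pins evenly across its receivers and could shuffle the block order to mimic the random integration step.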
9
DHPack: Meta Circuit P&R
Use VPR FPGA tools from University of Toronto
Observation 1
–VPR placer successfully groups IP blocks from random initial placement
Observation 2
–VPR router confirms channel width of MetaCircuit is dominated by a few IP blocks { pdc, clma, ex1010 }
10
DHPack: Meta Circuit P&R Results
Clique MetaCircuit
–P&R channel width results closely match “constraints”
Shrink Channel Width
–by ~20% (from 95 to 75), NO AREA INCREASE
–by ~50% (from 95 to 50), 1.7x area increase
[Figure: Normalized Area vs. Channel Width Constraint, and Routed Channel Width vs. Channel Width Constraint]
11
Meta Circuits vs. Stdev Circuits
Meta Circuit Drawbacks
–Design hierarchy boundaries not well-defined
–Coarse-grained IP block boundary
–Stitching unrealistic
  Flip-flop placed at every output
  Connections only have FO1
Stdev Circuits (created using GNL)
–Synthetic clone of Meta circuits
–Hierarchical: specify Rent parameter of each partition
  Root: # I/Os, # IP blocks
  Second level: 20 IP blocks, # LEs, Rent parameter
12
Stdev Circuits: Rent Parameters
7 benchmark circuits
240/120 primary inputs/outputs, approx. 52,000 CLBs
Rent parameter: average 0.62, Stdev varied from 0.0 to 0.12
13
Un/DoPack Flow
Iterative non-uniform cluster depopulation tool
Step 1: Traditional SIS/VPR
Step 2: UnPack
–Congestion Calculator
Step 3: DoPack
–Incremental Re-Cluster
Step 4,5: Fast Place/Route
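The iterative structure of the flow can be sketched as a loop. Everything below is hypothetical scaffolding: the four stage callables, the stopping test against a target channel width, and the toy stand-ins (which simply add one CLB per pass and lower the width by 5 tracks) are assumptions used only to make the control flow runnable.

```python
from types import SimpleNamespace

def un_do_pack(netlist, target_width, pack, place_and_route, unpack, dopack,
               max_iterations=20):
    clustering = pack(netlist)                     # Step 1: traditional clustering
    result = place_and_route(clustering)           #         followed by full P&R
    for _ in range(max_iterations):
        if result.channel_width <= target_width:
            break                                  # routable at the target width
        region = unpack(result)                    # Step 2: locate congestion
        clustering = dopack(clustering, region)    # Step 3: incremental re-cluster
        result = place_and_route(clustering)       # Steps 4, 5: fast place/route
    return clustering, result

# Toy stand-ins: the "clustering" is just a CLB count.
def toy_pack(netlist):
    return 10

def toy_place_and_route(clb_count):
    return SimpleNamespace(channel_width=100 - 5 * (clb_count - 10))

def toy_unpack(result):
    return "hot region"

def toy_dopack(clb_count, region):
    return clb_count + 1

clbs, final = un_do_pack(None, 80, toy_pack, toy_place_and_route, toy_unpack, toy_dopack)
print(clbs, final.channel_width)  # 14 80
```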
14
Un/DoPack Flow: SIS/VPR
Step 1: Traditional SIS/VPR
17
Un/DoPack Flow: UnPack
Step 2: UnPack
–Generate Congestion Map
–CLB Label = Largest CW occupancy in 4 adjacent channels
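A minimal sketch of the labelling rule follows. The data layout is an assumption: for an M x M array, `horiz` holds the occupancy of the M+1 horizontal channel rows and `vert` holds the occupancy of the M+1 vertical channel columns, so the CLB at (r, c) touches `horiz[r][c]`, `horiz[r+1][c]`, `vert[r][c]` and `vert[r][c+1]`.

```python
def clb_labels(horiz: list[list[int]], vert: list[list[int]]) -> list[list[int]]:
    """Label each CLB with the largest occupancy among its 4 adjacent channel segments."""
    m = len(vert)  # number of CLB rows (and columns) in the assumed layout
    return [
        [max(horiz[r][c], horiz[r + 1][c], vert[r][c], vert[r][c + 1])
         for c in range(m)]
        for r in range(m)
    ]

# 2 x 2 example: horiz is 3 x 2, vert is 2 x 3
horiz = [[1, 2], [3, 9], [4, 1]]
vert = [[2, 5, 1], [0, 3, 7]]
print(clb_labels(horiz, vert))  # [[5, 9], [4, 9]]
```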
18
Un/DoPack Flow: UnPack
Step 2: UnPack
–Depop Center = Largest CLB label
[Figure: M x M array]
19
Un/DoPack Flow: UnPack
Step 2: UnPack
–Depop Radius = M/4
–Depop Amt: 1 new row/col in array
[Figure: M x M array]
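The center and radius rules can be combined into a short sketch that picks the depopulation region from a label map. The square (Chebyshev) neighbourhood, integer division for M/4, and first-found tie-breaking are assumptions; the slides do not specify them. The last function reads "1 new row/col" as growing the array from M x M to (M+1) x (M+1), which is also an interpretation.

```python
def depop_region(labels: list[list[int]]):
    """Return (center, radius, CLBs within the radius) for an M x M label map."""
    m = len(labels)
    cells = [(r, c) for r in range(m) for c in range(m)]
    center = max(cells, key=lambda rc: labels[rc[0]][rc[1]])  # largest CLB label
    radius = m // 4                                           # Depop Radius = M/4
    region = [(r, c) for (r, c) in cells
              if max(abs(r - center[0]), abs(c - center[1])) <= radius]
    return center, radius, region

def new_clb_slots(m: int) -> int:
    """CLB slots gained by adding one row and one column to an M x M array."""
    return (m + 1) ** 2 - m ** 2

labels = [[0] * 8 for _ in range(8)]
labels[2][5] = 120  # a single congestion hot spot
center, radius, region = depop_region(labels)
print(center, radius, len(region))  # (2, 5) 2 25
print(new_clb_slots(8))             # 17
```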
20
Un/DoPack Flow: DoPack
Step 3: DoPack
–Incremental Re-Cluster
21
Un/DoPack Flow: Fast P&R
Step 4,5: Fast Place/Route
Fast Placement
–UBC Incremental Placer (under development)
–VPR “–fast” option
Router
–Use full routed solution
  Slow but reliable
22
Peak / Avg / Stddev
Before: 120 / 79 / 27
After: 100 / 79 / 20
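Summary figures of this kind (peak, average, standard deviation) can be computed from a list of per-channel occupancies as sketched below. The sample values are made up for illustration, and the use of the population standard deviation is an assumption, since the talk does not say which estimator was used.

```python
import statistics

def occupancy_summary(occupancies: list[int]) -> tuple[int, float, float]:
    """Return (peak, average, population standard deviation) of the occupancies."""
    return (max(occupancies),
            statistics.mean(occupancies),
            statistics.pstdev(occupancies))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up occupancies, not data from the talk
peak, avg, stddev = occupancy_summary(sample)
print(peak, avg, stddev)
```

In the slide's numbers the peak drops from 120 to 100 and the standard deviation from 27 to 20 while the average stays at 79.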
23
Normalized Area of GNL Benchmarks
24
Absolute Area of GNL Benchmarks
25
Interconnect Variation: Impact on FPGA Architecture Design
High-variation circuits require wide channel width
27
End of Talk