Routing Wire Optimization through Generic Synthesis on FPGA Carry
Hadi P. Afshar
Joint work with: Grace Zgheib, Philip Brisk and Paolo Ienne
FPGAs and ASICs Gaps*
– Performance ratio: 3-4
– Area ratio:
– Power ratio: 7-15
Routing resources consume ≈60-80% of the chip area and are significant contributors to circuit delay.
How to narrow the gap?
– Specialized (DSP) blocks
– Coarser grained logic blocks
– Hard-wired connections
Concerns:
✘ Lack of generality and flexibility
✘ Underutilization
✘ Change in routing structure
*I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, February 2007, pp. 203 –
Carry Chains
[Figure: CLB with 8 inputs; labels: 4-LUT, +, +]
Motivation Example
Problem Definition
LUT Mapped Flow Graph
Step 1: Logic Matching
Step 2: Chaining
Logic Matching
Step 1: Enumeration of the programmable part
Step 2: Identifying regular and independent segments
Step 3: Developing the alphabet library of the macro cell
Step 4: Mask division and library matching
[Figure: macro cell with a LUT and an adder (+); labels: A, B, C in, C out]
Logic Matching (Example)
Step 1: Enumeration

i3 i2 i1 i0   LUT 1   LUT 2
0  0  0  0    A0      B0
0  0  0  1    A0      B1
0  0  1  0    A1      B0
0  0  1  1    A1      B1
0  1  0  0    A2      B2
0  1  0  1    A2      B3
0  1  1  0    A3      B2
0  1  1  1    A3      B3
1  0  0  0    A4      B4
1  0  0  1    A4      B5
1  0  1  0    A5      B4
1  0  1  1    A5      B5
1  1  0  0    A6      B6
1  1  0  1    A6      B7
1  1  1  0    A7      B6
1  1  1  1    A7      B7
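The enumeration table is consistent with LUT 1 being addressed by (i3, i2, i1) and LUT 2 by (i3, i2, i0). That wiring is inferred from the table rather than stated on the slide, and the function name below is hypothetical. Under that assumption, a minimal sketch that regenerates the rows:

```python
def enumerate_rows():
    """List (input pattern, LUT 1 bit, LUT 2 bit) for all 16 input patterns.

    Assumption (inferred from the table, not stated in the slides):
    LUT 1 is addressed by (i3, i2, i1) and LUT 2 by (i3, i2, i0).
    """
    rows = []
    for x in range(16):
        i3, i2, i1, i0 = (x >> 3) & 1, (x >> 2) & 1, (x >> 1) & 1, x & 1
        a_index = (i3 << 2) | (i2 << 1) | i1  # configuration bit read from LUT 1
        b_index = (i3 << 2) | (i2 << 1) | i0  # configuration bit read from LUT 2
        rows.append((format(x, "04b"), f"A{a_index}", f"B{b_index}"))
    return rows


if __name__ == "__main__":
    for pattern, a, b in enumerate_rows():
        print(pattern, a, b)
```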
Logic Matching (Example)
Step 2: Regular and Independent Segments

i3 i2 i1 i0   LUT 1   LUT 2
0  0  0  0    A0      B0
0  0  0  1    A0      B1
0  0  1  0    A1      B0
0  0  1  1    A1      B1

0  1  0  0    A2      B2
0  1  0  1    A2      B3
0  1  1  0    A3      B2
0  1  1  1    A3      B3

1  0  0  0    A4      B4
1  0  0  1    A4      B5
1  0  1  0    A5      B4
1  0  1  1    A5      B5

1  1  0  0    A6      B6
1  1  0  1    A6      B7
1  1  1  0    A7      B6
1  1  1  1    A7      B7
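One way to find independent segments is to merge input rows that share any configuration bit; rows left in different groups can then be configured independently. The sketch below is an illustration, not necessarily the authors' procedure. It assumes LUT 1 is addressed by (i3, i2, i1) and LUT 2 by (i3, i2, i0), which is inferred from the table, and all names are hypothetical.

```python
def row_bits(x):
    """Configuration bits touched by 4-bit input pattern x (assumed LUT wiring)."""
    i3, i2, i1, i0 = (x >> 3) & 1, (x >> 2) & 1, (x >> 1) & 1, x & 1
    hi = (i3 << 2) | (i2 << 1)
    return {f"A{hi | i1}", f"B{hi | i0}"}


def independent_segments(n_rows=16):
    """Group rows so that no configuration bit is shared between two groups."""
    segments = []  # list of (rows, bits) pairs
    for x in range(n_rows):
        rows, bits = [x], row_bits(x)
        untouched = []
        for seg_rows, seg_bits in segments:
            if seg_bits & bits:
                # This row shares a bit with an existing segment: merge them.
                rows = seg_rows + rows
                bits = seg_bits | bits
            else:
                untouched.append((seg_rows, seg_bits))
        untouched.append((rows, bits))
        segments = untouched
    return segments


if __name__ == "__main__":
    for rows, bits in independent_segments():
        print(rows, sorted(bits))
```

For the table above this yields four segments of four rows each, every segment touching two A bits and two B bits.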
Logic Matching (Example)
Step 3: Alphabet library of the cell
[Figure: macro cell with LUT 1, LUT 2 and C in; 8-bit alphabets of the configuration mask dictionary]
Segment rows (shown twice on the slide): A0 B0 …, A0 B1 …, A1 B0 …, A1 B1 …
Enumerated assignments:
A0 A1 B0 B1
0  0  0  0
1  0  0  0
0  1  0  0
1  1  0  0
0  0  1  0
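A minimal sketch of how such an alphabet library could be built for one segment. Several things here are assumptions, not taken from the slides: the cell output is modelled as the sum bit LUT1 XOR LUT2 XOR Cin, each alphabet lists the four segment rows for Cin = 0 followed by the same rows for Cin = 1, and the function names are hypothetical.

```python
from itertools import product


def cell_sum(a, b, cin):
    """Assumed cell model: sum bit of a full adder fed by the two LUT outputs."""
    return a ^ b ^ cin


def build_alphabet_library():
    """Map each 8-bit alphabet to the (A0, A1, B0, B1) assignments producing it."""
    library = {}
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        a, b = (a0, a1), (b0, b1)
        bits = []
        for cin in (0, 1):          # assumed bit order: Cin, then i1, then i0
            for i1 in (0, 1):       # i1 selects A0 or A1 within the segment
                for i0 in (0, 1):   # i0 selects B0 or B1 within the segment
                    bits.append(cell_sum(a[i1], b[i0], cin))
        alphabet = "".join(str(bit) for bit in bits)
        library.setdefault(alphabet, []).append((a0, a1, b0, b1))
    return library


if __name__ == "__main__":
    for alphabet, configs in sorted(build_alphabet_library().items()):
        print(alphabet, configs)
```

Under this particular cell model, complementing all four bits leaves the output unchanged, so the 16 assignments collapse into 8 distinct alphabets. A different cell model would give a different library.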
Logic Matching (Example)
Step 4: Mask segmented matching
[Figure: mask segments matched against the 8-bit library]
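Segmented matching can be sketched as splitting the function's mask into fixed-width pieces and looking each piece up in the alphabet library. The sketch below assumes the mask is a bit string whose segments are contiguous 8-bit slices and that the library is a dictionary from alphabet to a configuration; the function name and the toy library in the usage lines are hypothetical.

```python
def match_mask(mask, library, seg_bits=8):
    """Return one library entry per segment, or None if any segment is unknown.

    mask:    bit string such as "0101...", length a multiple of seg_bits
    library: dict mapping each seg_bits-wide alphabet to a configuration
    """
    if len(mask) % seg_bits != 0:
        raise ValueError("mask length must be a multiple of the segment width")
    configs = []
    for start in range(0, len(mask), seg_bits):
        alphabet = mask[start:start + seg_bits]
        if alphabet not in library:
            return None  # this function is not mappable to the macro cell
        configs.append(library[alphabet])
    return configs


if __name__ == "__main__":
    toy_library = {"00001111": "cfg0", "11110000": "cfg1"}  # illustrative only
    print(match_mask("00001111" * 2 + "11110000" * 2, toy_library))
    print(match_mask("0" * 32, toy_library))
```

Each segment costs a single dictionary lookup, so a 32-bit mask is matched with four lookups instead of a comparison against every complete mask.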
How much do we gain?
Assume that the mask is 32-bit
– N segments
– M patterns in each segment
– Our library size = [expression not recovered] bits
– Number of all configurations = [expression not recovered]
Orders of magnitude less memory
Orders of magnitude fewer comparisons
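As a rough illustration of the scale involved, the sketch below compares a segmented library with a table of all complete masks. The counting model is an assumption, not the slide's own expressions: the library is taken to hold M alphabets of 32/N bits for each of the N segments, and the exhaustive table is taken to hold M^N complete 32-bit masks. The values N = 4 and M = 16 are illustrative.

```python
def storage_bits(mask_bits=32, n_segments=4, patterns_per_segment=16):
    """Compare a segmented library with an exhaustive table of complete masks.

    Assumed counting model (not taken from the slides):
      library    = N segments * M alphabets * (mask_bits / N) bits each
      exhaustive = M**N complete masks * mask_bits bits each
    """
    seg_bits = mask_bits // n_segments
    library_bits = n_segments * patterns_per_segment * seg_bits
    full_configs = patterns_per_segment ** n_segments
    full_bits = full_configs * mask_bits
    return library_bits, full_configs, full_bits


if __name__ == "__main__":
    library_bits, full_configs, full_bits = storage_bits()
    print(library_bits, full_configs, full_bits, full_bits // library_bits)
```

With these illustrative numbers the library needs 512 bits against 2,097,152 bits for the exhaustive table, a factor of 4096.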
Chaining Heuristic
[Figure: three blocks, each labeled Input and Output]
We need to find chains of functions that are mappable to the macro cell, so that they can be placed on the carry chains.
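The slides do not spell out the heuristic itself. As a stand-in, the sketch below uses a simple greedy scheme: repeatedly take the longest path through the remaining chainable nodes of the DAG and stop once the longest path falls below a minimum length. The graph encoding, function names, and default threshold are assumptions for illustration.

```python
def longest_path(succ, nodes):
    """Longest path in a DAG restricted to `nodes` (memoized depth-first search)."""
    best = {}

    def visit(n):
        if n in best:
            return best[n]
        tail = []
        for m in succ.get(n, ()):
            if m in nodes:
                candidate = visit(m)
                if len(candidate) > len(tail):
                    tail = candidate
        best[n] = [n] + tail
        return best[n]

    path = []
    for n in sorted(nodes):  # sorted for a deterministic result
        candidate = visit(n)
        if len(candidate) > len(path):
            path = candidate
    return path


def extract_chains(succ, chainable, min_len=4):
    """Greedily peel node-disjoint chains of at least min_len chainable nodes."""
    remaining = set(chainable)
    chains = []
    while True:
        path = longest_path(succ, remaining)
        if len(path) < min_len:
            break
        chains.append(path)
        remaining -= set(path)
    return chains


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "x": ["y"], "y": ["c"]}
    print(extract_chains(graph, set("abcdexy")))
```

Nodes not selected for any chain are simply left in place, which matches the idea that non-chained functions stay unchanged.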
Synthesis and Chaining Results

Benchmark   Chainable   Chained   Max Chain Length   Average Chain Length
alu4        74%         39%       4                  3.5
pdc         69%         35%       6                  3.9
misex3      68%         42%       4                  3.1
ex1010      71%         41%       5                  3.4
ex5p        72%         40%       4                  3.5
des*        65%         31%       3                  3.0
apex2       73%         42%       4                  3.6
apex4       75%         39%       4                  3.7
spla        72%         43%       6                  4.2
seq         69%         38%       4                  3.4
Average     70%         39%

* The minimum threshold for the chain length is 4, except for "des", for which it is 3.
Experimental Methodology
Goal: Extract chains of eligible functions from the synthesized netlist in order to place them on the logic chains; the non-chained ones remain unchanged.
Flow:
– Quartus-II LUT Mapping & Syn
– Our Synthesis Engine: VQM Parser, DAG Generation, Logic Matching, Chaining Heuristic, Netlist Generation
– Quartus-II Place & Route
Local Routing Wires
26% saving in the number of local wires
Total Wire Lengths
9% saving in total wire length
Delay
3% delay penalty due to the large input-to-output delay of the adder
Conclusion
– Narrow the FPGA and ASIC gaps
– Lighten the stress on routing resources
– Hardwired connections + dedicated logic
– Improved routability with a lighter network
Thanks for your attention.