Published by Harvey Ford. Modified over 9 years ago.
1
Improving Synthesis of Compressor Trees on FPGAs via Integer Linear Programming
Philip Brisk (2), Paolo Ienne (2), Hadi Parandeh-Afshar (1,2)
1: University of Tehran, ECE Department
2: EPFL, School of Computer and Communication Sciences
2
March 14, 2008
Outline
  Motivation
  Generalized Parallel Counters
  ILP Formulation
  Experimental Results
  Conclusion
4
Motivation: Why is multi-input addition important?
  Partial product reduction in parallel multiplication
    Wallace and Dadda in the 1960s
  Multi-input addition occurs in many multimedia and signal processing applications
    H.264/AVC Variable Block Size Motion Estimation
    FIR Filters
    3G Wireless Base Station Channel Cards
  Flow graph transformations expose opportunities to use compressor trees in high-level synthesis [Verma and Ienne, ICCAD 2004]
5
Multi-Input Addition Implementation
  ASIC
    Compressor Trees + Final Adder
    Counters are the basic blocks
    Wallace/Dadda/3-Greedy
  FPGA
    Adder Trees
      Full Adder Implemented in CLB Structure
      Fast Carry-Chain (Xilinx and Altera)
      Reduces Routing Delay
    Compressor Trees have poor performance
      Fast carry chains cannot be used
      Counters are inflexible
  GOAL: Better implementation of compressor trees on FPGAs
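As a rough illustration of the ASIC-style structure named on this slide (compressor tree plus final adder), the sketch below reduces a list of non-negative integers with bitwise (3; 2) counters until two operands remain, then performs one ordinary addition. The function names are hypothetical, and the grouping of operands is arbitrary: it is not the Wallace, Dadda, or 3-Greedy schedule.

```python
def counters_3to2(a, b, c):
    """Apply a (3; 2) counter to every bit position of three integers.

    Returns (sum_word, carry_word) with a + b + c == sum_word + carry_word.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # carries have rank + 1
    return sum_word, carry_word


def compressor_tree_sum(operands):
    """Sum non-negative integers: 3-to-2 reduction levels, then one final adder."""
    ops = list(operands)
    while len(ops) > 2:
        next_level = []
        # Compress complete groups of three operands into two.
        for i in range(0, len(ops) - 2, 3):
            next_level.extend(counters_3to2(ops[i], ops[i + 1], ops[i + 2]))
        # Operands that did not fill a group pass through unchanged.
        next_level.extend(ops[len(ops) - len(ops) % 3:])
        ops = next_level
    return sum(ops)  # the final carry-propagate adder


print(compressor_tree_sum([5, 9, 3, 12, 7]))  # 36
```

No carry propagates inside the loop. Only the last addition needs a carry chain, which is why the tree depth, not the operand width, dominates the delay of the reduction stage.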
7
Generalized Parallel Counters (GPCs)
  Parallel Counter: Sum bits with the same rank
  Generalized Parallel Counter: Sum bits having different ranks
  Example
    [Figure: a (3; 2) Counter and a (3, 3; 4) GPC]
  GPCs are more flexible and reduce the number of logic levels
  GPCs are more complex, but the additional complexity is absorbed in LUTs!
  GPCs are perfect building blocks to create better compressors out of FPGA LUTs
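The difference between a parallel counter and a GPC can be made concrete with a small sketch. It assumes the usual reading of the notation, where (3, 3; 4) means three input bits of rank 1, three input bits of rank 0, and four output bits. The function name `gpc` and the list layout (least significant rank first) are illustrative choices, not taken from the slides.

```python
def gpc(inputs_by_rank, num_outputs):
    """Evaluate a generalized parallel counter.

    inputs_by_rank[r] is the list of input bits of relative rank r
    (index 0 is the least significant rank). The result is the list of
    output bits, least significant first.
    """
    total = sum(sum(bits) << rank for rank, bits in enumerate(inputs_by_rank))
    if total >= 1 << num_outputs:
        raise ValueError("not enough output bits for this input pattern")
    return [(total >> k) & 1 for k in range(num_outputs)]


# (3; 2) counter: three bits of the same rank, two output bits.
print(gpc([[1, 1, 0]], 2))             # [0, 1]  (value 2)

# (3, 3; 4) GPC: three bits of rank 0 and three bits of rank 1.
print(gpc([[1, 1, 1], [1, 1, 0]], 4))  # [1, 1, 1, 0]  (3 + 2*2 = 7)
```

The (3, 3; 4) GPC consumes six bits and produces four in a single logic level. The largest value it can see is 3 + 2*3 = 9, which fits in four output bits.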
8
GPC Implementation
  [Figure: a GPC implemented with K-input LUTs (K-LUT); labels N and K]
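One way to read this slide is that every output bit of a GPC is a Boolean function of all the GPC's inputs, so a GPC with at most K inputs needs one K-input LUT per output bit. That reading is an assumption, since the figure did not survive extraction. Under it, the LUT contents can be enumerated as follows; the helper name `gpc_lut_tables` is hypothetical.

```python
from itertools import product


def gpc_lut_tables(rank_counts, num_outputs):
    """Build one truth table per GPC output bit.

    rank_counts[r] is the number of input bits of relative rank r.
    Each table has 2**N entries, where N is the total number of inputs,
    ordered like itertools.product((0, 1), repeat=N).
    """
    # One weight per input bit: a bit of rank r contributes 2**r.
    weights = [1 << rank for rank, n in enumerate(rank_counts) for _ in range(n)]
    tables = [[] for _ in range(num_outputs)]
    for bits in product((0, 1), repeat=len(weights)):
        total = sum(b * w for b, w in zip(bits, weights))
        for k in range(num_outputs):
            tables[k].append((total >> k) & 1)
    return tables


tables = gpc_lut_tables([3, 3], 4)     # the (3, 3; 4) GPC
print(len(tables), len(tables[0]))     # 4 64
```

For the (3, 3; 4) GPC this gives four tables of 64 entries each, which is the size of a 6-input LUT.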
9
Goal
  How to best select GPC types and connect them to build a compressor tree
  [Figure: diagram with a Rank axis labeled 0, 1, 2, 3]
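The selection problem can be pictured in terms of column heights, that is, the number of bits waiting at each rank. The sketch below is only a bookkeeping aid under that assumption: given the current heights and a list of GPC placements for one level, it returns the heights seen by the next level. The name `apply_gpcs` and the placement tuple layout are hypothetical. It does not choose the GPCs; that choice is what the ILP is for.

```python
def apply_gpcs(heights, placements):
    """Return column heights after one level of GPCs.

    heights[r] is the number of bits of rank r.
    Each placement is (base_rank, rank_counts, num_outputs), where
    rank_counts[i] bits are consumed at rank base_rank + i and one output
    bit is produced at each of the ranks base_rank .. base_rank + num_outputs - 1.
    """
    consumed, produced = {}, {}
    for base, rank_counts, num_outputs in placements:
        for i, n in enumerate(rank_counts):
            consumed[base + i] = consumed.get(base + i, 0) + n
        for k in range(num_outputs):
            produced[base + k] = produced.get(base + k, 0) + 1
    # A level cannot consume more bits than a column currently holds.
    for rank, n in consumed.items():
        available = heights[rank] if rank < len(heights) else 0
        if n > available:
            raise ValueError(f"rank {rank}: {n} bits requested, {available} available")
    width = max([len(heights)] + [rank + 1 for rank in produced])
    return [
        (heights[r] if r < len(heights) else 0) - consumed.get(r, 0) + produced.get(r, 0)
        for r in range(width)
    ]


# Two columns of six bits; one (3, 3; 4) GPC and one (3; 2) counter at rank 0.
print(apply_gpcs([6, 6], [(0, [3, 3], 4), (0, [3], 2)]))  # [2, 5, 1, 1]
```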
15
ILP Formulation
  Objective Function
    Minimizing Levels of GPCs
  GPC Representation in ILP
    [Figure: a GPC with ports labeled ki = 0, 1 and kj = 0, 1, 2]
16
ILP Formulation
  Variables
    p_{m,i,k_i} ∈ {0, 1}: true if there is a connection between the m-th input bit and an input of rank k_i of GPC i.
  [Figure: example network with input bits m0 to m3, GPC 0, GPC 1, GPC 2, and output bits n0 to n3; labeled connections p_{0,0,0}, p_{1,0,1}, p_{2,1,0}, e_{1,2,0,1}, e_{0,2,1,0}, q_{0,0,0}, q_{2,1,1}, q_{1,2,2}, D_{3,3}]
17
ILP Formulation
  Variables
    q_{i,k_i,m} ∈ {0, 1}: true if there is a connection between the k_i-th output of GPC i and an output bit of rank m.
18
ILP Formulation
  Variables
    e_{i,j,k_i,k_j} ∈ {0, 1}: true if there is a connection from the k_i-th output of GPC i to an input of rank k_j of GPC j.
19
ILP Formulation
  Variables
    D_{i,j} ∈ {0, 1}: true if there is a direct connection from the i-th input bit to an output bit of rank j.
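The objective named earlier, minimizing the number of GPC levels, can be evaluated for any fixed assignment of the e variables. In the sketch below, the e variables that equal 1 are stored as a set of (i, j, k_i, k_j) tuples, and the number of levels is the longest chain of GPC-to-GPC connections. This is an illustration of what the objective measures, not the ILP encoding itself; the function name `gpc_levels` is hypothetical and the connection graph is assumed to be acyclic.

```python
from functools import lru_cache


def gpc_levels(num_gpcs, e):
    """Number of GPC levels implied by the set of active e variables.

    e is a set of (i, j, ki, kj) tuples, one per variable e_{i,j,ki,kj} = 1.
    """
    predecessors = {j: set() for j in range(num_gpcs)}
    for i, j, _ki, _kj in e:
        predecessors[j].add(i)

    @lru_cache(maxsize=None)
    def level(j):
        # A GPC fed only by circuit inputs sits on level 1.
        return 1 + max((level(i) for i in predecessors[j]), default=0)

    return max((level(j) for j in range(num_gpcs)), default=0)


# Connections labeled on the slides: e_{1,2,0,1} and e_{0,2,1,0}.
print(gpc_levels(3, {(1, 2, 0, 1), (0, 2, 1, 0)}))  # 2
```

GPC 0 and GPC 1 are fed only by circuit inputs, and both drive GPC 2, so the example network has two levels.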
20
ILP Formulation
  Connection rules
    Circuit I/Os
      Each circuit input should be connected to either a GPC or the final adder
      Each output rank should be driven at most K times (K = 3, since the final adder is a ternary adder)
    GPC I/Os
      Respect the number of allowable I/Os of each GPC, considering input ranks
    Wires
      Respect the rank constraints of the source and destination of each wire
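The two circuit I/O rules can be checked on a candidate solution with a few lines of code. The sketch assumes that "connected to either a GPC or the final adder" means exactly one connection per input bit, and that each output rank may be driven at most K times. The active variables are stored as sets of index tuples, and the name `check_circuit_io` is hypothetical. The GPC I/O and wire rank rules are not covered here, because they need per-GPC information that the slides do not give.

```python
from collections import Counter


def check_circuit_io(num_inputs, p, D, q, K=3):
    """Check the circuit I/O rules for a candidate assignment.

    p: set of (m, i, ki)  input bit m feeds a rank-ki input of GPC i
    D: set of (m, rank)   input bit m feeds the final adder directly at that rank
    q: set of (i, ki, rank)  output ki of GPC i feeds the final adder at that rank
    """
    # Rule 1: every input bit is used exactly once, by a GPC or by the adder.
    uses = Counter(m for m, _i, _ki in p) + Counter(m for m, _rank in D)
    inputs_ok = all(uses[m] == 1 for m in range(num_inputs))
    # Rule 2: no output rank receives more than K bits.
    drivers = Counter(rank for _i, _ki, rank in q) + Counter(rank for _m, rank in D)
    outputs_ok = all(count <= K for count in drivers.values())
    return inputs_ok and outputs_ok


# Index values taken from the labels on the variable slides.
p = {(0, 0, 0), (1, 0, 1), (2, 1, 0)}
D = {(3, 3)}
q = {(0, 0, 0), (2, 1, 1), (1, 2, 2)}
print(check_circuit_io(4, p, D, q))  # True
```

In the ILP itself these checks become linear constraints over the same index sets: a sum of p and D variables equal to 1 for each input bit, and a sum of q and D variables bounded by K for each output rank.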
21
ILP Formulation
  ILP Improvement
    Use the heuristic of [Parandeh-Afshar et al., ASP-DAC 2008] to estimate the maximum number of GPCs at each level
    A GPC on level L can only connect to inputs of GPCs on levels L+1 and L+2
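The level restriction shrinks the problem because e variables only need to exist for GPC pairs whose levels differ by one or two. A minimal sketch of that pruning follows; it assumes each candidate GPC has already been assigned a level (for example from the heuristic's per-level estimate), and the name `allowed_gpc_pairs` is hypothetical.

```python
def allowed_gpc_pairs(level_of):
    """Return the (source, destination) GPC pairs that may be connected.

    level_of maps a GPC index to its level. A GPC on level L may drive
    only GPCs on levels L+1 and L+2.
    """
    return {
        (i, j)
        for i, level_i in level_of.items()
        for j, level_j in level_of.items()
        if 1 <= level_j - level_i <= 2
    }


levels = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3}
pairs = allowed_gpc_pairs(levels)
print(len(pairs))  # 7, compared with 20 ordered pairs without the restriction
```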
23
Experimental Methodology
  CPLEX ILP Solver
  Altera Stratix-II
    90 nm CMOS Technology
  Implementations of multi-input addition
    Adder Tree: ternary adder tree (state of the art for FPGAs)
    Heuristic: mapping heuristic described in [13]
    ILP: ILP formulation described here
24
Experimental Results (Delay)
  ILP on average is:
    32% faster than Adder Tree
    5% faster than the Heuristic
25
Experimental Results (Area)
  ILP on average consumes:
    3% less resources than Adder Tree
    13% less resources than the Heuristic
27
Conclusion
  Conventional wisdom has held that adder trees outperform compressor trees on FPGAs
    Ternary adder trees were a major selling point of the Altera Stratix II architecture
  Conventional wisdom is wrong!
    GPCs map nicely onto LUTs
    Compressor trees on FPGAs are faster than adder trees when built from GPCs