Published by Emil Hood. Modified over 8 years ago.
1
Reconfigurable Computing - Options in Circuit Design
John Morris
Chung-Ang University
The University of Auckland
[Photo: ‘Iolanthe’ at 13 knots on Cockburn Sound, Western Australia]
2
Design Options – so far
‘Structural Options’
1. Bit serial
   Most space efficient
   Slow: one bit of result produced per cycle
   Sometimes this isn’t a problem
   Examples: small efficient adder, very small multiplier
3
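Serial Circuits
Bit serial adder

    LIBRARY ieee;
    USE ieee.std_logic_1164.ALL;

    ENTITY serial_add IS
      PORT( a, b, clk : IN std_logic;
            sum, cout : OUT std_logic );
    END ENTITY serial_add;

    ARCHITECTURE df OF serial_add IS
      SIGNAL cint : std_logic;  -- carry held from one clock cycle to the next
    BEGIN
      PROCESS( clk )
      BEGIN
        IF clk'EVENT AND clk = '1' THEN
          sum  <= a XOR b XOR cint;
          cint <= (a AND b) OR (b AND cint) OR (a AND cint);
        END IF;
      END PROCESS;
      cout <= cint;
    END ARCHITECTURE df;

[Figure: full adder (FA) with inputs a, b and c_in; its sum and c_out outputs are held in a 2-bit register driven by clock]
Note: The synthesizer will insert the register on the signals assigned inside the clocked process (sum and the internal signal cint)!
Note: Reset or clear needed to frame operands! The code above has none, so cint must be cleared before each new pair of operands.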
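The behaviour of the bit-serial adder can be checked with a small software model. The sketch below is a Python model, not VHDL, and rests on assumptions that are not in the slides: operands arrive least significant bit first, and the carry register is cleared before the first bit. The helper names `to_bits` and `from_bits` are made up for the illustration.

```python
def serial_add(a_bits, b_bits):
    """Model the bit-serial adder: one sum bit per clock, LSB first."""
    carry = 0  # plays the role of cint; assumed cleared before the operands arrive
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Both updates use the carry from the previous cycle, as in the clocked process.
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (b & carry) | (a & carry)
    return sum_bits, carry


def to_bits(value, width):
    """Hypothetical helper: integer to a list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Hypothetical helper: list of bits (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    s, c = serial_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(s), c)  # 11 + 6 = 17: low four bits give 1, carry out is 1
```

A 4-bit addition takes four clock cycles here, which is the space-for-time trade the slide describes.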
4
Design Options – so far
‘Structural Options’
1. Bit serial
   Most space efficient
2. Sequential
   Combinatorial / bit-parallel block + register
   Example: sequential multiplier (adder + shifter + register)
6
Design Options – so far
‘Structural Options’
1. Bit serial
2. Sequential
3. Pipelined
   High throughput
   High latency too though!
   Need to achieve pipeline balance: every stage should have similar propagation delay
   More later!
   Example: pipelined multiplier
4. Examine communication patterns
   Example: eliminate horizontal carry chains in parallel array multiplier
7
Design Options – so far
‘Structural Options’
1. Bit serial
2. Sequential
3. Pipelined
4. Examine communication patterns
   Example: eliminate horizontal carry chains in parallel array multiplier
8
Multipliers
We can add the partial products with FA blocks
[Figure: array of FA blocks adding the partial-product rows for b0, b1, b2, … across a0 … a3, with 0 as the first carry in and product bits p0, p1, … as outputs. Figure label: Carry select adder]
Try to use a more efficient adder in each row?
A simpler scheme uses a ‘carry save’ adder, which pushes the carry outs down to the next row!
Note that an extra adder is needed below the last row to add the last partial products and the carries from the row above!
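The carry-save idea can be illustrated at word level. The sketch below is a Python model under stated assumptions, not a description of the circuit on the slide: each row is treated as a 3:2 compressor over whole integers, and the function names are hypothetical. It shows that no carry propagates sideways within a row, and that one ordinary adder at the end finishes the product.

```python
def carry_save_add(x, y, z):
    """3:2 compression: three operands in, a sum word and a carry word out.

    Every bit position is handled independently, so no carry ripples along the row.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (y & z) | (x & z)) << 1  # carries move to the next column
    return sum_word, carry_word


def multiply_carry_save(a, b, width=4):
    """Accumulate the partial products in carry-save form, then add once."""
    sum_word, carry_word = 0, 0
    for i in range(width):
        partial = (a << i) if (b >> i) & 1 else 0  # row i of partial products
        sum_word, carry_word = carry_save_add(sum_word, carry_word, partial)
    # The extra adder below the last row: the only carry-propagate addition.
    return sum_word + carry_word


if __name__ == "__main__":
    print(multiply_carry_save(13, 11))  # 143
```

The invariant is that `sum_word + carry_word` always equals the running total, which is why a single final adder is enough.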
9
Design Options – so far
‘Structural Options’
1. Bit serial
2. Sequential
3. Pipelined
4. Examine communication patterns
5. Tree structures
   Example: combine carries in level below
   Wallace Tree multiplier
10
Signed digit arithmetic – Avoiding the carries!
If we use more than one bit to represent each bit of an operand …
In binary, the partial products are trivial – if multiplier bit = 1, copy the multiplicand else 0
Use an ‘and’ gate!
11
Residue Arithmetic
Residue Number Systems
A verse by the Chinese scholar, Sun Tsu, over 1500 years ago posed this problem:
  What number has remainders 2, 3 and 2 when divided by the numbers 7, 5 and 3, respectively?
This is probably the first documented use of number representations using multiple residues.
In a residue number system, a number, $x$, is represented by the list of its residues (remainders) with respect to $k$ relatively prime moduli, $m_{k-1}, m_{k-2}, \ldots, m_0$.
Thus $x$ is represented by $(x_{k-1}, x_{k-2}, \ldots, x_0)$, where $x_i = x \bmod m_i$.
So the puzzle may be re-written: What is the decimal representation of (2,3,2) in RNS(7,5,3)?
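The definition translates directly into code. The sketch below is a minimal Python illustration with hypothetical function names; it answers the puzzle by exhaustive search over the dynamic range, which is only sensible for a range this small.

```python
def to_rns(x, moduli):
    """Represent x by its residues with respect to each modulus."""
    return tuple(x % m for m in moduli)


def from_rns_by_search(residues, moduli):
    """Find the x in [0, M) with the given residues by trying every candidate."""
    dynamic_range = 1
    for m in moduli:
        dynamic_range *= m
    for x in range(dynamic_range):
        if to_rns(x, moduli) == tuple(residues):
            return x
    raise ValueError("no value has these residues; are the moduli relatively prime?")


if __name__ == "__main__":
    print(from_rns_by_search((2, 3, 2), (7, 5, 3)))  # 23
```

So (2,3,2) in RNS(7,5,3) is 23, since 23 leaves remainders 2, 3 and 2 on division by 7, 5 and 3.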
12
Residue Number Systems
The dynamic range of an RNS is $M = m_{k-1} \times m_{k-2} \times \cdots \times m_0$
For example, in the system RNS(8,7,5,3), $M = 8 \times 7 \times 5 \times 3 = 840$
Thus we have:

  Decimal                    RNS(8,7,5,3)
  0 or 840 or -840 or …      (0,0,0,0)
  1 or 841 or -839 or …      (1,1,1,1)
  2 or 842 or …              (2,2,2,2)
  8 or 848 or …              (0,1,3,2)

Any RNS can be viewed as a weighted representation
In RNS(8,7,5,3), the weights are: 105, 120, 336, 280
Thus (1,2,4,0) represents $(105 \times 1 + 120 \times 2 + 336 \times 4 + 280 \times 0) \bmod 840 = 1689 \bmod 840 = 9$
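The slide gives the weights without saying how they are obtained. One standard way, assumed here and based on the Chinese remainder theorem, is to take $M / m_i$ and scale it by its multiplicative inverse modulo $m_i$, so that each weight is 1 in its own position and 0 in all others. The Python sketch below uses hypothetical function names and needs Python 3.8 or later for `pow(x, -1, m)` and `math.prod`.

```python
from math import prod


def rns_weights(moduli):
    """Weight i is congruent to 1 modulo moduli[i] and to 0 modulo every other modulus."""
    dynamic_range = prod(moduli)
    weights = []
    for m in moduli:
        partial = dynamic_range // m
        weights.append(partial * pow(partial, -1, m))  # pow(..., -1, m): modular inverse
    return weights


def from_rns(residues, moduli):
    """Decode by the weighted sum of the residues, reduced modulo the dynamic range."""
    dynamic_range = prod(moduli)
    total = sum(w * r for w, r in zip(rns_weights(moduli), residues))
    return total % dynamic_range


if __name__ == "__main__":
    print(rns_weights((8, 7, 5, 3)))             # [105, 120, 336, 280]
    print(from_rns((1, 2, 4, 0), (8, 7, 5, 3)))  # 9
```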
13
Residue Number Systems - Operations
Complement
  To find $-x$, complement each of the digits with respect to the modulus for that digit
  $21 = (5,0,1,0)$ so $-21 = (8-5,\ 0,\ 5-1,\ 0) = (3,0,4,0)$
Addition or subtraction is performed on each digit
  $(5, 5, 0, 2)_{RNS} = 5_{10}$
  $(7, 6, 4, 2)_{RNS} = -1_{10}$
  $((5+7) \bmod 8 = 4,\ (5+6) \bmod 7 = 4,\ (0+4) \bmod 5 = 4,\ (2+2) \bmod 3 = 1)_{RNS} = 4_{10}$
  $(4, 4, 4, 1)_{RNS} = 4_{10}$
Multiplication is also achieved by operations on each digit
  $(5, 5, 0, 2)_{RNS} = 5_{10}$
  $(7, 6, 4, 2)_{RNS} = -1_{10}$
  $((5 \times 7) \bmod 8 = 3,\ (5 \times 6) \bmod 7 = 2,\ (0 \times 4) \bmod 5 = 0,\ (2 \times 2) \bmod 3 = 1)_{RNS} = -5_{10}$
  $(3, 2, 0, 1)_{RNS} = -5_{10}$
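The digit-wise operations can be reproduced with a few lines of Python. This is a sketch with hypothetical function names, fixed to RNS(8,7,5,3); it relies on Python's `%` returning a non-negative result for a positive modulus, so negative numbers map to their complemented form automatically.

```python
MODULI = (8, 7, 5, 3)


def to_rns(x):
    """Residues of x; a negative x comes out in complemented form."""
    return tuple(x % m for m in MODULI)


def rns_negate(x):
    """Complement each digit with respect to its own modulus."""
    return tuple((m - d) % m for d, m in zip(x, MODULI))


def rns_add(x, y):
    """Add digit by digit; no carry crosses from one digit to another."""
    return tuple((a + b) % m for a, b, m in zip(x, y, MODULI))


def rns_mul(x, y):
    """Multiply digit by digit, each product reduced by its own modulus."""
    return tuple((a * b) % m for a, b, m in zip(x, y, MODULI))


if __name__ == "__main__":
    five, minus_one = to_rns(5), to_rns(-1)
    print(rns_negate(to_rns(21)))    # (3, 0, 4, 0)
    print(rns_add(five, minus_one))  # (4, 4, 4, 1)
    print(rns_mul(five, minus_one))  # (3, 2, 0, 1)
```

Each digit position is independent of the others, which is what allows the per-modulus circuits to run in parallel.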
14
Residue Arithmetic - Advantages
Parallel independent operations on small numbers of digits
  Significant speed ups, especially for multiplication!
  A 4 bit x 4 bit multiplier (moduli up to 15) is much simpler than a 16 bit x 16 bit one
Carries are strictly confined to small numbers of bits
  Each modulus is only a small number of bits
Can be implemented in Look Up Tables (LUTs)
  6 bit residues (moduli up to 63): 64 x 64 x 6 bits required (<4 Kbytes)
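The look-up table sizing can be checked with a short calculation. The Python sketch below uses hypothetical names and models a modular multiplier as a table indexed by two 6-bit operands; filling the entries for operand values at or above the modulus is an assumption made for simplicity, since a real design would never address them.

```python
def build_mul_table(modulus, bits=6):
    """Table of (a * b) mod modulus for every pair of 'bits'-wide operands."""
    size = 1 << bits
    if modulus > size:
        raise ValueError("modulus does not fit in the residue width")
    return [[(a * b) % modulus for b in range(size)] for a in range(size)]


def table_size_in_bits(bits=6):
    """2^bits rows x 2^bits columns, each entry 'bits' wide."""
    return (1 << bits) * (1 << bits) * bits


if __name__ == "__main__":
    table = build_mul_table(61)
    print(table[60][60])                 # (60 * 60) mod 61 = 1
    print(table_size_in_bits(6) // 8)    # 3072 bytes, under 4 Kbytes
```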
15
Residue Arithmetic – Choosing the moduli
The largest modulus determines the overall speed, so try to make it as small as possible
Simple strategy
  Choose a sequence of prime numbers until the dynamic range, M, becomes large enough
  eg Application requires a range of at least $10^5$, ie $M \ge 10^5$
  For RNS(13,11,7,5,3,2), M = 30,030
  Range is too low, so add one more modulus: RNS(17,13,11,7,5,3,2), M = 510,510
  Now each modulus requires a separate circuit and our range is ~5 times as large as needed, so remove 5: RNS(17,13,11,7,3,2), M = 102,102
  Six residues, requiring 5 + 4 + 4 + 3 + 2 + 1 = 19 bits
  The largest modulus (17, requiring 5 bits) determines the speed, so …
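The simple strategy can be sketched in Python. The function names are hypothetical, and the rule used for the removal step (drop the largest modulus whose removal still leaves enough range) is an assumption that happens to reproduce the slide's choice of 5; the slide does not state a general rule.

```python
def primes():
    """Yield 2, 3, 5, 7, ... by trial division (fine for a handful of moduli)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p != 0 for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def primes_until_range(target):
    """Take successive primes until their product reaches the target range."""
    moduli, dynamic_range = [], 1
    for p in primes():
        moduli.append(p)
        dynamic_range *= p
        if dynamic_range >= target:
            return moduli, dynamic_range


def drop_largest_redundant(moduli, target):
    """Assumed rule: remove the largest modulus that the range can spare."""
    dynamic_range = 1
    for m in moduli:
        dynamic_range *= m
    for m in sorted(moduli, reverse=True):
        if dynamic_range // m >= target:
            return [x for x in moduli if x != m]
    return list(moduli)


def total_bits(moduli):
    """Bits needed to hold residues 0 .. m-1 for every modulus."""
    return sum((m - 1).bit_length() for m in moduli)


if __name__ == "__main__":
    moduli, dynamic_range = primes_until_range(10**5)
    print(moduli, dynamic_range)   # [2, 3, 5, 7, 11, 13, 17] 510510
    trimmed = drop_largest_redundant(moduli, 10**5)
    print(trimmed, total_bits(trimmed))  # [2, 3, 7, 11, 13, 17] 19
```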
16
Residue Arithmetic – Choosing the moduli
Application requires a range of at least $10^5$, ie $M \ge 10^5$
…
RNS(17,13,11,7,3,2), M = 102,102
Six residues, requiring 5 + 4 + 4 + 3 + 2 + 1 = 19 bits
The largest modulus (17, requiring 5 bits) determines the speed, so combine some of the smaller moduli (Remember the requirement is that they be relatively prime!)
Try to produce the largest modulus using only 5 bits: pair 2 and 13, 3 and 7
RNS(26,21,17,11), M = 102,102
Four residues, requiring 5 + 5 + 5 + 4 = 19 bits (no improvement in total bit count, but 2 fewer ALUs!)
Better …?
17
Residue Arithmetic – Choosing the moduli
Application requires a range of at least $10^5$, ie $M \ge 10^5$
…
RNS(26,21,17,11), M = 102,102
Four residues, requiring 5 + 5 + 5 + 4 = 19 bits (no improvement in total bit count, but 2 fewer ALUs!)
Include powers of smaller primes before primes, starting with RNS(3,2), M = 6
Note that $2^2$ is smaller than the next prime, 5, so move to RNS($2^2$,3), M = 12 (trying to minimize the size of the largest modulus)
After including 5 and 7, note that $2^3$ and $3^2$ are smaller than 11: RNS($3^2$,$2^3$,7,5), M = 2,520
Add 11: RNS(11,$3^2$,$2^3$,7,5), M = 27,720
Add 13: RNS(13,11,$3^2$,$2^3$,7,5), M = 360,360
18
Residue Arithmetic – Choosing the moduli
Application requires a range of at least $10^5$, ie $M \ge 10^5$
…
Add 13: RNS(13,11,$3^2$,$2^3$,7,5), M = 360,360
M is now more than 3 times larger than needed, so replace 9 with 3, then combine 5 and 3
RNS(15,13,11,$2^3$,7), M = 120,120
5 moduli, 4 + 4 + 4 + 3 + 3 = 18 bits, largest modulus has 4 bits
You can actually do somewhat better than this!
Reference: B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, 2000
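The candidate moduli sets discussed on these slides can be compared mechanically. The Python sketch below uses a hypothetical function, `describe`, that checks the moduli are pairwise relatively prime and reports the dynamic range, the total bit count and the width of the largest modulus. It needs Python 3.8 or later for `math.prod`.

```python
from itertools import combinations
from math import gcd, prod


def describe(moduli):
    """Return (dynamic range, total residue bits, widest residue in bits)."""
    for a, b in combinations(moduli, 2):
        if gcd(a, b) != 1:
            raise ValueError(f"{a} and {b} are not relatively prime")
    widths = [(m - 1).bit_length() for m in moduli]  # bits to hold 0 .. m-1
    return prod(moduli), sum(widths), max(widths)


if __name__ == "__main__":
    for candidate in [(17, 13, 11, 7, 3, 2),
                      (26, 21, 17, 11),
                      (13, 11, 9, 8, 7, 5),
                      (15, 13, 11, 8, 7)]:
        print(candidate, describe(candidate))
```

For the target of $10^5$, the last set gives a range of 15 × 13 × 11 × 8 × 7 = 120,120 in 18 bits with a 4-bit largest modulus, against 19 bits and a 5-bit largest modulus for the first two sets.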
19
Residue Numbers - Conversion
Inputs and outputs will invariably be in standard binary or decimal representations, so conversion to and from them is required
Conversion from binary | decimal to RNS
Problem: Given a number, $y$, find its residues wrt moduli, $m_i$
Divisions would be too time-consuming!
Use this equality:
  $(y_{k-1} y_{k-2} \ldots y_1 y_0)_2 \bmod m_i = \big( (2^{k-1} y_{k-1} \bmod m_i) + \cdots + (2 y_1 \bmod m_i) + (y_0 \bmod m_i) \big) \bmod m_i$
So we only need to precompute the residues $2^j \bmod m_i$ for each of the moduli, $m_i$, used by the RNS
20
Residue Numbers - Conversion
For RNS(8,7,5,3): the residue modulo 8 is trivially calculated (3 LSB bits)
For 7, 5 and 3, we need the powers of 2 modulo 7, 5 and 3

  j    2^j    2^j mod 7    2^j mod 5    2^j mod 3
  0      1        1            1            1
  1      2        2            2            2
  2      4        4            4            1
  3      8        1            3            2
  4     16        2            1            1
  5     32        4            2            2
  6     64        1            4            1
  7    128        2            3            2
  8    256        4            1            1
  9    512        1            2            2
21
Residue Numbers - Conversion
Find $164_{10} = 1010\,0100_2 = 2^7 + 2^5 + 2^2$ in RNS(8,7,5,3), using the powers-of-2 residues:
  $164 \bmod 8$ is $100_2 = 4_{10}$ (the 3 LSBs)
  $164 \bmod 7 = (2^7 \bmod 7 + 2^5 \bmod 7 + 2^2 \bmod 7) \bmod 7 = (2 + 4 + 4) \bmod 7 = 3$
Note that the additions are done in a modular adder!
Worst case: k additions for each residue for a k-bit number
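The table-driven conversion can be sketched in Python. The function names are hypothetical; the table of $2^j \bmod m$ is built by repeated doubling rather than stored, and each accumulation step stands in for one pass through a modular adder.

```python
def power_residues(modulus, width):
    """Precompute 2^j mod modulus for j = 0 .. width-1 by repeated doubling."""
    table = [1 % modulus]
    for _ in range(width - 1):
        table.append((table[-1] * 2) % modulus)
    return table


def binary_to_rns(y, moduli, width):
    """Convert a width-bit number to RNS without dividing y itself."""
    residues = []
    for m in moduli:
        table = power_residues(m, width)
        acc = 0
        for j in range(width):
            if (y >> j) & 1:
                acc = (acc + table[j]) % m  # one modular addition per set bit
        residues.append(acc)
    return tuple(residues)


if __name__ == "__main__":
    print(power_residues(7, 10))                 # [1, 2, 4, 1, 2, 4, 1, 2, 4, 1]
    print(binary_to_rns(164, (8, 7, 5, 3), 8))   # (4, 3, 4, 2)
```

So 164 is (4,3,4,2) in RNS(8,7,5,3). The residues modulo 5 and 3 come from the same procedure: (3 + 2 + 4) mod 5 = 4 and (2 + 2 + 1) mod 3 = 2.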
22
Residue Numbers - Conversion
23
Residue Arithmetic - Disadvantages
Range is limited
Division is hard!
Comparison, sign (<0?) are hard
Still suitable for some DSP applications
  Only use +, x
  Range is limited
  Result range is known
  Examples: digital filters, Fourier transforms
24
Multipliers
‘Long’ multiplication
[Figure: long multiplication of a3 a2 a1 a0 by b3 b2 b1 b0, with rows of partial products (x) below; each multiplier bit b0 … b3 is combined with a0 … a3, and the row for b0 is marked as the first row of partial products]
In binary, the partial products are trivial – if multiplier bit = 1, copy the multiplicand else 0
Use an ‘and’ gate!
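The partial-product rule can be written out bit by bit. The Python sketch below, with hypothetical function names, forms each partial-product bit with a single AND, as the slide suggests, and then sums the rows with their shifts. It models the arithmetic only, not the adder array.

```python
def partial_products(a, b, width=4):
    """Row i holds a_j AND b_i for every bit j of the multiplicand."""
    rows = []
    for i in range(width):
        b_i = (b >> i) & 1
        rows.append([((a >> j) & 1) & b_i for j in range(width)])
    return rows


def multiply(a, b, width=4):
    """Sum the rows, each shifted left by its row index."""
    total = 0
    for i, row in enumerate(partial_products(a, b, width)):
        row_value = sum(bit << j for j, bit in enumerate(row))
        total += row_value << i
    return total


if __name__ == "__main__":
    print(partial_products(0b1011, 0b0101)[0])  # [1, 1, 0, 1]: row for b0 = 1 copies a (LSB first)
    print(multiply(11, 5))                      # 55
```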
25
Multipliers
We can add the partial products with FA blocks
[Figure: array of FA blocks adding the partial-product rows for b0, b1, b2, … across a0 … a3, with 0 as the first carry in and product bits p0, p1, … as outputs]
26
Parallel Array Adder
We can build this adder in VHDL with two GENERATE loops

    rows : FOR j IN 0 TO n-1 GENERATE       -- For each row
      cols : FOR k IN 0 TO n-1 GENERATE     -- Generate a row
        pjk : full_adder PORT MAP( … );
      END GENERATE cols;
    END GENERATE rows;

This part is straight-forward!

    -- VHDL needs named array types; these type names are illustrative
    TYPE bit_row  IS ARRAY( 0 TO n-1 ) OF std_logic;
    TYPE bit_grid IS ARRAY( 0 TO n-1 ) OF bit_row;
    SIGNAL pa, pb, cout : bit_grid;

… but you need to fill in the PORT MAP using internal signals!
27
Multipliers
We can add the partial products with FA blocks
[Figure: array of FA blocks adding the partial-product rows for b0, b1, b2, … across a0 … a3, with product bits p0, p1, … as outputs; one row of FAs is singled out]
Optimization 1: Replace this row of FAs
Time? What’s the worst case propagation delay?