CSE 8351 Computer Arithmetic Fall 2005 Instructors: Peter-Michael Seidel
II. Simple Algorithms for Arithmetic Unit Design in Hardware: Addition / Multiplication / SRT Division / Square Root / Reciprocal Approximation
Arithmetic Unit Design in Hardware
- Input interface: n-bit operands A, B
- Output interface: n-bit result C
- Arithmetic function: $f: B^n \times B^n \to B^n$
- What is different from other ("non-arithmetic") hardware units?
[Figure: arithmetic hardware unit with n-bit operands A and B and n-bit result C]
Arithmetic Unit Design in Hardware
- Specification: truth tables are not feasible ($2^{2n} \times n$ entries); for n = 64 that is $2^{134} \approx 2.2 \times 10^{40}$ entries
- Complexity: there are $2^{(2^{2n} \times n)}$ different functions; for n = 64 that is far more than a googol
- Only a handful of these functions are interesting or used!
- Other forms of specification are possible
Arithmetic functions supported
- Functions are interesting because of specific properties that arise at the operand level:
-> use mathematical formalism to specify functionality, e.g. for defined values $a, b, c \in \mathbb{N}$: $c = f(a, b) = a + b$
-> this does not directly help in implementing or testing
-> use tools from the (well-established) science of mathematics to transform the equation and extract local properties
-> these help to exploit limited global influence, local computation, reuse, recurrence, …
Notations, Representations, Values
- Bit strings: sequences of bits (concatenation is also written (.., .., ..)): a = 0011010 = (001, 10, 10)
- For a bit x and a natural number n: $x^n$ is the string consisting of n copies of x
- Bits of strings are indexed from right (0) to left (n-1): $a = (a[n-1], \ldots, a[0])$ or $a = a[n-1:0]$
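Binary representation
- Natural number with binary representation $a \in \{0,1\}^n$: $\langle a \rangle = \sum_{i=0}^{n-1} a[i] \cdot 2^i$
- Range of numbers which have a binary representation of length n: $\{0, \ldots, 2^n - 1\}$
- n-bit binary representation of a natural number x in this range: the string $a \in \{0,1\}^n$ with $\langle a \rangle = x$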
Two's complement representation
- Integer with two's complement representation $a \in \{0,1\}^n$: $[a] = -a[n-1] \cdot 2^{n-1} + \sum_{i=0}^{n-2} a[i] \cdot 2^i$
- Range of numbers with a two's complement representation of length n: $\{-2^{n-1}, \ldots, 2^{n-1} - 1\}$
- n-bit two's complement representation of an integer x in this range: the string $a \in \{0,1\}^n$ with $[a] = x$
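The two representations can be tried out with a small sketch. The function names (`bin_value`, `twoc_value`, `bin_repr`, `twoc_repr`) are chosen here for illustration and are not from the slides; bit strings are Python strings whose rightmost character is bit 0.

```python
def bin_value(a: str) -> int:
    """Value of bit string a read as a binary number (rightmost character is bit 0)."""
    n = len(a)
    return sum(int(a[n - 1 - i]) << i for i in range(n))


def twoc_value(a: str) -> int:
    """Value of bit string a read as two's complement: the sign bit weighs -2^(n-1)."""
    n = len(a)
    return -int(a[0]) * (1 << (n - 1)) + bin_value(a[1:])


def bin_repr(x: int, n: int) -> str:
    """n-bit binary representation of x, defined for 0 <= x <= 2^n - 1."""
    if not 0 <= x < (1 << n):
        raise ValueError("x has no n-bit binary representation")
    return format(x, f"0{n}b")


def twoc_repr(x: int, n: int) -> str:
    """n-bit two's complement representation of x, defined for -2^(n-1) <= x <= 2^(n-1) - 1."""
    if not -(1 << (n - 1)) <= x < (1 << (n - 1)):
        raise ValueError("x has no n-bit two's complement representation")
    return format(x % (1 << n), f"0{n}b")
```

The same bit string generally has different values under the two readings, which is why the representation must always be stated alongside the bits.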
Binary Addition
- Binary addition (specification):
- Coping with complexity: simple for n=1:
Addition (n=1)
- Half adder: adding two bits, sum represented by obvious equations:
- Full adder: adding three bits, obvious equations:
Addition (n=1)
- Half adder & full adder implementations:
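The equations themselves did not survive in this transcript. As a sketch, the textbook gate-level equations for both cells look as follows (the function names are illustrative; building the full adder from two half adders and an OR gate is one common implementation, not necessarily the one drawn on the slide):

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits; returns (sum, carry) with a + b = 2*carry + sum."""
    return a ^ b, a & b


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Add three bits; returns (sum, carry) with a + b + c = 2*carry + sum."""
    s1, c1 = half_adder(a, b)   # add the first two bits
    s, c2 = half_adder(s1, c)   # add the third bit to the partial sum
    return s, c1 | c2           # at most one of the two carries can be set
```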
Binary Addition
- Greedy approach (right to left) -> ripple carry adder
- Development/verification based on equivalence transformation of the specification
Ripple Carry Adder
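A minimal software model of the greedy right-to-left scheme, assuming bit vectors are Python lists with index 0 as the least significant bit (the helper names `to_bits` and `from_bits` are illustrative):

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Returns (sum, carry) with a + b + c = 2*carry + sum."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def ripple_carry_add(a: list[int], b: list[int], cin: int = 0) -> tuple[list[int], int]:
    """Add two equally long bit vectors (index 0 = least significant bit).

    The carry ripples from position 0 up to position n-1, so position i
    cannot finish before position i-1: the delay grows linearly with n.
    """
    assert len(a) == len(b)
    carry = cin
    total = []
    for ai, bi in zip(a, b):
        si, carry = full_adder(ai, bi, carry)
        total.append(si)
    return total, carry


def to_bits(x: int, n: int) -> list[int]:
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))
```

For n-bit inputs the n sum bits together with the carry-out represent the exact sum, which needs n+1 bits.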
Basic properties (1)
For a bit string a of length n:
- leading zeros do not change the value of a binary representation
- binary representations can be split at each bit position
- two's complement representations have a sign bit a[n-1]
- construct a two's complement representation from a binary representation; note that the two's complement representation is longer by one bit
Basic properties (2)
For a bit string a of length n:
- sign extension does not change the value
- negation of a number in two's complement representation: basis for the subtraction algorithm!
- congruence properties (modular arithmetic)
Basic properties (3)
- Two's complement addition based on binary addition
- The result of the n-bit binary addition is useful for n-bit two's complement addition
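The formulas for these properties are not preserved in this transcript. The sketch below states them in their usual textbook form and checks them exhaustively for a small n; the exact formulation on the original slides may differ, and all function names are illustrative. Bit strings are Python strings with the sign bit first.

```python
from itertools import product


def bin_value(a: str) -> int:
    return int(a, 2) if a else 0


def twoc_value(a: str) -> int:
    return bin_value(a[1:]) - int(a[0]) * (1 << (len(a) - 1))


def invert(a: str) -> str:
    return "".join("1" if bit == "0" else "0" for bit in a)


def check_properties(n: int) -> bool:
    for bits in product("01", repeat=n):
        a = "".join(bits)
        # leading zeros do not change the binary value
        assert bin_value("000" + a) == bin_value(a)
        # splitting at position k: value = upper part * 2^k + lower part
        for k in range(n + 1):
            assert bin_value(a) == bin_value(a[:n - k]) * 2**k + bin_value(a[n - k:])
        # the sign bit decides the sign
        assert (twoc_value(a) < 0) == (a[0] == "1")
        # binary -> two's complement costs one extra bit
        assert twoc_value("0" + a) == bin_value(a)
        # sign extension does not change the value
        assert twoc_value(a[0] + a) == twoc_value(a)
        # negation: invert all bits and add one
        assert -twoc_value(a) == twoc_value(invert(a)) + 1
        # both readings agree modulo 2^n
        assert (twoc_value(a) - bin_value(a)) % 2**n == 0
    return True


def check_twoc_addition(n: int) -> bool:
    """n-bit binary addition yields the two's complement sum whenever that sum is representable."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    for abits in product("01", repeat=n):
        for bbits in product("01", repeat=n):
            a, b = "".join(abits), "".join(bbits)
            s = format((bin_value(a) + bin_value(b)) % 2**n, f"0{n}b")
            exact = twoc_value(a) + twoc_value(b)
            if lo <= exact <= hi:
                assert twoc_value(s) == exact
    return True
```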
Binary Addition
- Complexity: delay, cost, power, …
- Lower bounds?
- What computational model?
- What assumptions?
Faster Addition
Challenge (KP95):
- numbers are given as stacks of digits
- it takes 1 second to add two digits and put the result digit on the result stack
- one person can add two 5000-digit numbers in 5000 seconds. How?
- Can two people add two 5000-digit numbers in less than an hour?
Observation (notion of carries):
- limited carry propagation -> pre-computing the upper sums for both cases: c[k]=1 and c[k]=0
- divide and conquer; but the ripple-carry approach is also divide and conquer
Conditional Sum Adder
- Main observation: limited carry propagation -> pre-computing the upper sums for both cases: c[k]=1 and c[k]=0
- Assume n is a power of 2:
Conditional Sum Adder
- A full adder implements the adder for n=1: CSA(1) = FA
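The recursive construction can be sketched in software as follows, assuming n is a power of 2 and bit vectors are lists with index 0 as the least significant bit. The function name `cond_sum_add` is illustrative. In hardware the three recursive sub-adders work in parallel and the final selection is a multiplexer; the sequential calls here only model the function, not the timing.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def cond_sum_add(a: list[int], b: list[int], cin: int) -> list[int]:
    """Conditional sum adder; returns n+1 bits (n sum bits, then the carry-out)."""
    n = len(a)
    assert n == len(b) and n & (n - 1) == 0, "n must be a power of 2"
    if n == 1:
        s, c = full_adder(a[0], b[0], cin)   # CSA(1) = FA
        return [s, c]
    h = n // 2
    low = cond_sum_add(a[:h], b[:h], cin)    # lower half with the real carry-in
    high0 = cond_sum_add(a[h:], b[h:], 0)    # upper sum assuming c[h] = 0
    high1 = cond_sum_add(a[h:], b[h:], 1)    # upper sum assuming c[h] = 1
    carry = low[h]                           # actual carry out of the lower half
    return low[:h] + (high1 if carry else high0)   # multiplexer
```

Because the upper sums are ready before the lower carry is known, only a multiplexer delay is added per recursion level, which is what gives logarithmic depth.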
Binary Multiplication
- Binary multiplication (specification):
- Remember binary addition (specification):
- The product is representable with 2n bits: $(2^n - 1)^2 = 2^{2n} - 2^{n+1} + 1 \le 2^{2n} - 1$
[Figure: binary multiplier with n-bit operands A and B and 2n-bit result P]
Implementation – to cope with Complexity
Strategies that worked for binary addition:
- consideration of small n
- property extraction from specification
- greedy approach
- divide & conquer
Strategies for binary multiplication:
- consideration of small n
- reduction approach
- divide & conquer
- reduction to binary addition
- rewriting of specification
- considering logarithms (… European logarithmic processor)
Consideration of small n
- Binary multiplication is even simpler than addition for n=1:
- This also works for n x 1-bit multiplication:
- Consider = 2 n ?
Reduction Approach
- Reduction n -> n-1: (n-1)-bit multiplication, (n-1)-bit AND & additions, 1-bit AND & addition (carry-in)
- Implementation, complexity?
Multiplication Reduction – in Sums
- Definition: partial products (simple to compute in binary) (not affected by remaining sum)
Binary Multiplication
- Implementations similar to the grade-school algorithm:
      0010   (multiplicand)
    x 1011   (multiplier)
- Negative numbers: convert and multiply
  – better technique: using Booth recoding
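A sketch of the grade-school scheme for unsigned n-bit operands, using the slide's example operands. The function names are illustrative; Python integers stand in for bit vectors.

```python
def partial_products(a: int, b: int, n: int) -> list[int]:
    """Partial product i is the multiplicand shifted left by i if multiplier bit i is 1, else 0."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def multiply(a: int, b: int, n: int) -> int:
    """Unsigned n-bit multiplication as a sum of n partial products; the result fits in 2n bits."""
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    product = 0
    for pp in partial_products(a, b, n):
        product += pp   # one addition per multiplier bit
    return product
```

For the operands above, `multiply(0b0010, 0b1011, 4)` sums the partial products 2, 4, 0 and 16.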
Multiplier Implementation
- Stage i accumulates $A \cdot 2^i$ if $B_i = 1$
- What are the boxes?
- How much hardware for an n-bit multiplier?
[Figure: 4-bit array multiplier with inputs A0..A3 and B0..B3, initial accumulator value 0000, and product bits P0..P7]
Multiplication Complexity
- So far: $Delay(n) = D_{AND} + n \cdot D_{FA} = O(n)$ and $Cost(n) = n^2 \cdot (C_{AND} + C_{FA}) = O(n^2)$
- Inherent problem: adding n partial products (n-bit numbers)
- Addition (can be done in delay O(log(n)))
Parallel Multiplication (PP adder tree)
- Partial product generation and additions can be done in parallel
- Fanout? Precisions? Delay? Cost?
[Figure: operands A and B feed a partial product generator (PPG), followed by a tree of binary adders that produces product C]
Redundant Adder Tree
- Redundant addition (carry-save adders): compression of 3 binary operands to 2 by the use of a line of full adders
[Figure: line of four full adders over the 4-bit vectors a, b, c, x, y (bits 3..0)]
Redundant Adder Tree
- Redundant addition of 3 partial products to 2:
- Redundant addition of 4 partial products to 2:
- Redundant addition of 4 internal partial products to 2:
- Tree structure of redundant compressors: cost? delay? See Wallace tree designs.
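A sketch of 3-to-2 compression and of a simple level-by-level reduction built from it. Python integers stand in for bit vectors, and the grouping of operands into triples is one straightforward choice made here for illustration; it is not claimed to match the exact tree drawn on the slides or a specific Wallace tree layout.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression: one full adder per bit position, no carry propagation.

    Returns (s, c) with x + y + z == s + c.
    """
    s = x ^ y ^ z                                # per-position sum bits
    c = ((x & y) | (x & z) | (y & z)) << 1       # per-position carries, weighted by 2
    return s, c


def reduce_to_two(operands: list[int]) -> tuple[int, int]:
    """Compress any number of operands to two, one level of 3:2 compressors at a time."""
    ops = list(operands)
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3           # operands that form complete triples
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[full:])                   # leftovers pass to the next level
        ops = nxt
    while len(ops) < 2:
        ops.append(0)
    return ops[0], ops[1]


def multiply_with_csa_tree(a: int, b: int, n: int) -> int:
    """Unsigned multiplication: partial products, redundant reduction, one final addition."""
    pps = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    s, c = reduce_to_two(pps)
    return s + c                                 # the only carry-propagating addition
```

Each level removes roughly one third of the operands at the cost of a single full-adder delay, so only the final two-operand addition needs a carry-propagating adder.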
(Modified) Booth Recoding
- Operand recoding to:
  – allow for signed multiplication
  – reduce the number of partial products
- Popular recoding choice based on:
- Implementation of recoding:
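The recoding formula on the slide is not preserved in this transcript. The sketch below uses the widely taught radix-4 (modified) Booth scheme, which is an assumption about what the slide showed: digit j is b[2j-1] + b[2j] - 2·b[2j+1] with b[-1] = 0, giving digits in {-2, ..., 2}. The function names are illustrative.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Radix-4 Booth digits (least significant first) of an n-bit two's complement multiplier.

    Requires n even and -2^(n-1) <= b <= 2^(n-1) - 1.
    The digits satisfy b == sum(d_j * 4**j) with each d_j in {-2, -1, 0, 1, 2}.
    """
    assert n % 2 == 0 and -(1 << (n - 1)) <= b < (1 << (n - 1))
    pattern = b % (1 << n)                       # n-bit two's complement bit pattern

    def bit(i: int) -> int:
        return 0 if i < 0 else (pattern >> i) & 1

    return [bit(2 * j - 1) + bit(2 * j) - 2 * bit(2 * j + 1) for j in range(n // 2)]


def booth_multiply(a: int, b: int, n: int) -> int:
    """Signed multiplication with n/2 partial products instead of n."""
    digits = booth_radix4_digits(b, n)
    # each partial product is 0, +-a or +-2a (a shift and/or a negation), shifted by 2j
    return sum((d * a) << (2 * j) for j, d in enumerate(digits))
```

The number of partial products is halved, and the same recoding handles negative multipliers without a separate sign correction.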
Recursive Multiplication
- What does the implementation require?
- Is it better than previous designs?
- Do the improvements in asymptotic complexity by Karatsuba (1962) help?
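The decomposition on the slide is not preserved here. A plain divide-and-conquer sketch, assuming n is a power of 2 and that both operands are split into upper and lower halves, needs four half-width multiplications per level (the function name is illustrative):

```python
def recursive_multiply(a: int, b: int, n: int) -> int:
    """Unsigned n-bit multiplication by splitting both operands into halves (n a power of 2)."""
    if n == 1:
        return a & b                             # 1-bit multiplication is an AND gate
    h = n // 2
    mask = (1 << h) - 1
    a_hi, a_lo = a >> h, a & mask
    b_hi, b_lo = b >> h, b & mask
    hh = recursive_multiply(a_hi, b_hi, h)
    hl = recursive_multiply(a_hi, b_lo, h)
    lh = recursive_multiply(a_lo, b_hi, h)
    ll = recursive_multiply(a_lo, b_lo, h)
    # a*b = hh * 2^n + (hl + lh) * 2^h + ll
    return (hh << n) + ((hl + lh) << h) + ll
```

Karatsuba's idea replaces the two mixed products by one multiplication of the half sums, at the price of extra additions and slightly wider operands; whether that pays off in hardware is exactly the question the slide raises.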
Division
- Multiplication specification:
- For division: inputs, output
- Not always a solution! Consider instead a quotient together with a remainder
Division
- Two simple approaches: reduce to simpler operations
  – subtractions
  – multiplications
Subtractive Division
- Dividing using subtractions!
- Starting from the left or from the right?
- Considering ranges:
SRT Division
- Consider:
- Choose largest k with: b[n-1:k+1] = ?
- Recurrence i, radix 2: with q[i-1] = (1:0 ?) implies
SRT Division
- Recurrence:
- Implementation:
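The recurrence on the slide is not preserved in this transcript. The sketch below follows the common textbook formulation of radix-2 SRT division, which is an assumption here: the divisor is normalized to [1/2, 1), the partial remainder obeys w_{j+1} = 2·w_j - q_{j+1}·d, and the quotient digit in {-1, 0, 1} is chosen by comparing 2·w_j with the constants ±1/2 only. Exact fractions are used so the invariants can be checked without rounding effects; the function name is illustrative.

```python
from fractions import Fraction


def srt_radix2(x, d, steps: int):
    """Radix-2 SRT division sketch.

    Requires 1/2 <= d < 1 and |x| <= d.
    Returns (q, w, digits) with x == q*d + w * 2**(-steps) and |w| <= d.
    """
    x, d = Fraction(x), Fraction(d)
    half = Fraction(1, 2)
    assert half <= d < 1 and abs(x) <= d
    w = x                       # partial remainder
    q = Fraction(0)             # accumulated quotient
    digits = []
    for j in range(1, steps + 1):
        t = 2 * w
        # digit selection needs only a comparison with constants, not with d
        if t >= half:
            digit = 1
        elif t < -half:
            digit = -1
        else:
            digit = 0
        w = t - digit * d
        q += Fraction(digit, 2**j)
        digits.append(digit)
    return q, w, digits
```

Because the digit set is redundant, a coarse comparison on a few leading bits of the partial remainder suffices in hardware, which is what makes the selection fast.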
Multiplicative Division
- Approximation of A/B:
- Contemporary microprocessors implement multiplicative division with
  – Newton-Raphson's algorithm (e.g. Intel IA-64)
  – Goldschmidt's algorithm (e.g. AMD K7)
- Steps in multiplicative division:
  1. Rough approximation of 1/B
  2. Iterative improvement of the approximation accuracy of A/B or 1/B by multiplications, complementations, and shifts
  3. (Multiplication with A if 1/B was approximated in step 2)
Newton's Algorithm
- Newton-Raphson approximation of 1/B:
- Initialization: with relative error:
- k iterations:
- Scaling with A:
- Quadratic convergence of the one-sided relative approximation error
- Each iteration i:
  – requires two dependent multiplications
  – squares the relative approximation error
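The iteration formula itself is missing from this transcript. A sketch using the standard Newton-Raphson reciprocal iteration x_{i+1} = x_i·(2 - B·x_i), with exact fractions so that the error squaring is visible without rounding; the names `newton_reciprocal`, `x0` and `k` are illustrative:

```python
from fractions import Fraction


def newton_reciprocal(b, x0, k: int) -> Fraction:
    """Approximate 1/b from the start value x0 with k Newton-Raphson iterations (exact arithmetic)."""
    b, x = Fraction(b), Fraction(x0)
    for _ in range(k):
        t = 2 - b * x      # first multiplication, followed by a complementation
        x = x * t          # second multiplication, depends on the first
    return x


def relative_error(b, x) -> Fraction:
    """One-sided relative error 1 - b*x of x as an approximation of 1/b."""
    return 1 - Fraction(b) * Fraction(x)
```

With b = 3/4 and x0 = 1 the initial relative error is 1/4, and each iteration squares it: 1/16, 1/256, 1/65536.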
Goldschmidt's Algorithm
- Goldschmidt's approximation of A/B:
- Initialization:
- Iteration i for :
- Approximation of A/B by after k iterations
- Computation of like Newton iteration with B=1 => converges quadratically to 1
- From initialization: => converges quadratically to A/B
- 2 independent multiplications per iteration
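The formulas are missing from this transcript. A sketch of the usual formulation of Goldschmidt's algorithm, with hypothetical names `n` and `d` for the numerator and denominator sequences: both start as A and B scaled by the initial reciprocal approximation `x0`, and both are multiplied by the same factor 2 - d in every iteration.

```python
from fractions import Fraction


def goldschmidt_divide(a, b, x0, k: int) -> tuple[Fraction, Fraction]:
    """Goldschmidt iteration in exact arithmetic.

    Returns (n, d) after k iterations; d approaches 1 and n approaches a/b.
    """
    n = Fraction(a) * Fraction(x0)
    d = Fraction(b) * Fraction(x0)
    for _ in range(k):
        f = 2 - d          # complementation only, no multiplication
        n = n * f          # these two multiplications do not depend on
        d = d * f          # each other and can be issued in parallel
    return n, d
```

In exact arithmetic the ratio n/d stays equal to a/b in every step, so driving d to 1 drives n to the quotient. That invariant is what gets lost once intermediate results are rounded.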
Multiplication Scheduling
- Newton-Raphson vs. Goldschmidt-Powers
- For both: 2k+1 multiplications in total, but:
  – Newton: 2k+1 multiplications on the critical path
  – Goldschmidt: k+1 multiplications on the critical path
[Figure: multiplication schedules for Newton-Raphson and Goldschmidt-Powers, each with inputs A and B]
Quadratic Convergence for exact computation
- Newton-Raphson:
- Goldschmidt-Powers:
- Iteration i: relative approximation error for exact computation
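The error formulas are missing from this transcript. A short derivation under the standard formulations of both algorithms, where the symbols $x_i$, $e_i$, $N_i$, $D_i$ and $F_i$ are notation assumed here and not taken from the slides:

```latex
% Newton-Raphson for 1/B, exact arithmetic
x_{i+1} = x_i\,(2 - B x_i), \qquad e_i := 1 - B x_i
\;\Longrightarrow\;
e_{i+1} = 1 - B x_i\,(2 - B x_i) = (1 - B x_i)^2 = e_i^2 = e_0^{2^{i+1}}

% Goldschmidt, exact arithmetic
N_0 = A x_0,\quad D_0 = B x_0,\quad F_i = 2 - D_i,\quad
N_{i+1} = N_i F_i,\quad D_{i+1} = D_i F_i

1 - D_{i+1} = 1 - D_i\,(2 - D_i) = (1 - D_i)^2,
\qquad \frac{N_i}{D_i} = \frac{A}{B}
\;\Longrightarrow\;
N_i = \frac{A}{B}\,\bigl(1 - e_0^{2^i}\bigr)
```

Both methods therefore have the same relative error $e_0^{2^i}$ after i iterations as long as no intermediate value is rounded.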
Precision Problems for exact computations
Example with a = bit-width( ) = 8, p = 64
Iteration i   Goldschmidt-Powers (2 mults with)
0             a x p                 8 x 64 bits
1             (a+p) x (a+p)         72 x 72 bits
2             2(a+p) x 2(a+p)       144 x 144 bits
3             4(a+p) x 4(a+p)       288 x 288 bits
=> Rounding of intermediate values required
Problems and State of the Art
- Newton-Raphson is self-correcting,
  – i.e. it converges to 1/B even with rounded intermediate results
  – the correction factor moves any rounded intermediate approximation in the direction of 1/B
  – rounding can even be chosen to maintain quadratic convergence, e.g. [Cook]
- Goldschmidt-Powers is not self-correcting,
  – i.e. convergence to A/B is not guaranteed with rounded intermediate results, because the following does not hold anymore
  – quadratic convergence cannot be achieved with rounded intermediate results
  – error analysis is more complicated than for Newton-Raphson
Problems and State of the Art
- R. Goldschmidt (1964):
  – presentation of the algorithm for exact computations
  – implementation (IBM) with rough error analysis (absolute errors)
- E. Krishnamurthy (1970):
  – Goldschmidt's algorithm is NOT self-correcting
- O. Spaniol in his book "Computer Arithmetic" (1982):
  – claims that Goldschmidt's algorithm is self-correcting
- R. Golliver, Intel IA-64 (1999):
  – Intel implements Newton-Raphson for simpler error analysis (and for a smaller multiplier)
- S. Oberman, AMD K7 (1999):
  – AMD uses a 76x76 multiplier for Goldschmidt division (68 bit), because a mechanically checked correctness proof exists
  – consideration of absolute errors
Problems and State of the Art
- Most work only considers variations of Newton-Raphson, for
  – simpler error analysis
  – quadratic convergence
  – no interest in constant factors
- Practical implementations
  – constant acceleration through Goldschmidt's algorithm is interesting
- Previous error analysis is rough and limited to special cases
- General, precise error analysis is important for cost, power and delay optimizations in practical implementations