CSE 8351 Computer Arithmetic Fall 2005 Instructors: Peter-Michael Seidel.

II. Simple Algorithms for Arithmetic Unit Design in Hardware: Addition / Multiplication / SRT Division / Square Root / Reciprocal Approximation

Arithmetic Unit Design in Hardware
- Input interface: n-bit operands A, B
- Output interface: n-bit result C
- Arithmetic function: $f: \mathbb{B}^n \times \mathbb{B}^n \to \mathbb{B}^n$
- What is different from other (“non-arithmetic”) hardware units?
[Figure: arithmetic hardware unit with n-bit operands A and B and n-bit result C]

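Arithmetic Unit Design in Hardware
- Specification: truth tables are not feasible ($2^{2n} \cdot n$ entries); for n = 64 that is $2^{134} \approx 2.2 \times 10^{40}$ entries, far beyond any storage, though still short of a googol ($10^{100}$)
- Complexity: there are $2^{2^{2n} \cdot n}$ different functions; for n = 64 this number has about $6.6 \times 10^{39}$ decimal digits, though it is still short of a googolplex ($10^{10^{100}}$)
- Only a handful of these functions are interesting or used!
- Other forms of specification are possible
[Figure: arithmetic hardware unit with n-bit operands A and B and n-bit result C]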
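The two counts on this slide are easy to evaluate directly. The following is a minimal sketch; the helper names `truth_table_entries` and `function_count_digits` are illustrative and not part of the lecture material.

```python
import math

def truth_table_entries(n: int) -> int:
    """Entries in the truth table of f: B^n x B^n -> B^n
    (2^(2n) input combinations, n output bits each)."""
    return (2 ** (2 * n)) * n

def function_count_digits(n: int) -> int:
    """Number of decimal digits of 2^(entries), the count of distinct
    functions, computed via logarithms because the number itself is huge."""
    return math.floor(truth_table_entries(n) * math.log10(2)) + 1

for n in (1, 8, 64):
    print(n, f"{truth_table_entries(n):.3e}", f"{function_count_digits(n):.3e}")
```

Even for small n the table outgrows anything that could be stored, which is why the slides move on to mathematical specifications.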

Arithmetic functions supported
Functions are interesting because of specific properties that arise at operand level:
-> use mathematical formalism to specify functionality, e.g. for defined values $\langle a \rangle, \langle b \rangle, \langle c \rangle \in \mathbb{N}$: $\langle c \rangle = f(\langle a \rangle, \langle b \rangle) = \langle a \rangle + \langle b \rangle$
-> this does not directly help in implementing or testing
-> use tools from the (well established) science of mathematics to transform the equation and extract local properties
-> these help to exploit limited global influence, local computation, reuse, recurrence, …

Notations, Representations, Values
- Bit strings: sequences of bits (concatenation is also written as (..,..,..)), e.g. a = 0011010 = (001,10,10)
- For a bit $x \in \{0,1\}$ and a natural number n: $x^n$ is the string consisting of n copies of x
- Bits of strings are indexed from right (0) to left (n-1): $a = a[n-1:0] = a[n-1] \ldots a[0]$ or $a = a_{n-1} \ldots a_0$

Binary representation
- Natural number with binary representation $a[n-1:0]$: $\langle a \rangle = \sum_{i=0}^{n-1} a[i] \cdot 2^i$
- Range of numbers which have a binary representation of length n: $\{0, \ldots, 2^n - 1\}$
- n-bit binary representation of a natural number x in that range: the string $a[n-1:0]$ with $\langle a \rangle = x$

Two’s complement representation
- Integer with two’s complement representation $a[n-1:0]$: $[a] = -a[n-1] \cdot 2^{n-1} + \langle a[n-2:0] \rangle$
- Range of numbers with a two’s complement representation of length n: $\{-2^{n-1}, \ldots, 2^{n-1} - 1\}$
- n-bit two’s complement representation of an integer x in that range: the string $a[n-1:0]$ with $[a] = x$

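Binary Addition
- Binary addition (specification): for operands $a[n-1:0]$, $b[n-1:0]$ and carry-in $c_{in}$, compute sum bits $s[n-1:0]$ and carry-out $c_{out}$ with $\langle c_{out}, s[n-1:0] \rangle = \langle a \rangle + \langle b \rangle + c_{in}$
- Coping with complexity: simple for n=1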

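Addition (n=1)
- Half adder: adding two bits a, b; the sum $a + b = 2c + s$ is represented by the obvious equations $s = a \oplus b$, $c = a \wedge b$
- Full adder: adding three bits a, b, $c_{in}$ with $a + b + c_{in} = 2 c_{out} + s$; obvious equations: $s = a \oplus b \oplus c_{in}$, $c_{out} = (a \wedge b) \vee ((a \oplus b) \wedge c_{in})$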
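The half adder and full adder can be modelled on single bits and checked exhaustively against the arithmetic specification. This is a minimal sketch; building the full adder from two half adders and an OR gate is one common construction, assumed here rather than taken from the slides.

```python
from itertools import product

def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (carry, sum) with 2*carry + sum == a + b."""
    return a & b, a ^ b

def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (carry_out, sum) with 2*carry_out + sum == a + b + cin.
    Built from two half adders and an OR gate."""
    c1, s1 = half_adder(a, b)
    c2, s = half_adder(s1, cin)
    return c1 | c2, s

# Exhaustive check of all 8 input combinations against the specification.
for a, b, cin in product((0, 1), repeat=3):
    cout, s = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

For n=1 the truth table has only eight rows, so exhaustive checking is still feasible; this is exactly what stops working for larger n.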

Addition (n=1)
Half adder and full adder implementations (circuit diagrams not preserved in this transcript)

Binary Addition
- Greedy approach (right to left) -> ripple carry adder
- Development and verification are based on equivalence transforms of the specification

Ripple Carry Adder
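The greedy right-to-left approach chains one full adder per bit position, passing each carry to the next position. A minimal sketch follows; bit lists are stored least significant bit first, and `to_bits` and `from_bits` are illustrative helpers, not names from the slides.

```python
def full_adder(a, b, cin):
    """Return (carry_out, sum) for three input bits."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return cout, s

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equally long bit lists (index 0 = least significant bit).
    Returns (sum_bits, carry_out)."""
    assert len(a_bits) == len(b_bits)
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # The carry ripples from position i to position i+1.
        carry, s = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

def to_bits(x, n):
    """n-bit binary representation of x, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """Value of a bit list given least significant bit first."""
    return sum(bit << i for i, bit in enumerate(bits))

s, c = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), c)
```

The loop makes the linear delay visible: the carry into position i is only known after all lower positions have been processed.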

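Basic properties (1)
For $a \in \{0,1\}^n$:
- leading zeros do not change the value of a binary representation: $\langle 0^k a \rangle = \langle a \rangle$
- binary representations can be split for each $k \in \{1, \ldots, n-1\}$: $\langle a[n-1:0] \rangle = \langle a[n-1:k] \rangle \cdot 2^k + \langle a[k-1:0] \rangle$
- two’s complement representations have a sign bit a[n-1]: $[a] < 0 \Leftrightarrow a[n-1] = 1$
- construct a two’s complement representation from a binary representation: $[0a] = \langle a \rangle$; note that the two’s complement representation is longer by one bit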

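Basic properties (2)
For $a \in \{0,1\}^n$:
- sign extension does not change the value: $[a[n-1]\,a] = [a]$
- negation of a number in two’s complement representation: $-[a] = [\overline{a}] + 1$ (basis for the subtraction algorithm!)
- congruences modulo $2^n$ and $2^{n-1}$: $[a] \equiv \langle a \rangle \pmod{2^n}$ and $[a] \equiv \langle a[n-2:0] \rangle \pmod{2^{n-1}}$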
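The basic properties of binary and two's complement representations can be checked exhaustively for small n. The sketch below assumes the standard definitions (binary value as a weighted bit sum, two's complement value with a negatively weighted sign bit); the function names are illustrative, and bit lists are stored least significant bit first.

```python
from itertools import product

def bin_value(bits):
    """Binary value of a bit list (index 0 = least significant bit)."""
    return sum(bit << i for i, bit in enumerate(bits))

def twos_value(bits):
    """Two's complement value: the top bit has weight -2^(n-1)."""
    n = len(bits)
    return bin_value(bits[:n - 1]) - (bits[n - 1] << (n - 1))

def check_properties(n):
    """Check the properties for every n-bit string; return True if all hold."""
    for bits in product((0, 1), repeat=n):
        a = list(bits)
        # Sign extension: repeating the sign bit keeps the value.
        assert twos_value(a + [a[-1]]) == twos_value(a)
        # Negation: complement every bit, then add one.
        complement = [1 - bit for bit in a]
        assert -twos_value(a) == twos_value(complement) + 1
        # Both interpretations agree modulo 2^n.
        assert (twos_value(a) - bin_value(a)) % (1 << n) == 0
        # A leading zero turns a binary into a two's complement representation.
        assert twos_value(a + [0]) == bin_value(a)
    return True

print(check_properties(6))
```

The negation check compares values, so it also covers the most negative number, whose negation does not fit into n bits.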

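Basic properties (3)
Two’s complement addition based on binary addition: for $a, b \in \{0,1\}^n$ and carry-in $c_{in}$, the result $s[n-1:0]$ of the n-bit binary addition is also useful for n-bit two’s complement addition: $[s] = [a] + [b] + c_{in}$ whenever that sum lies in the two’s complement range $\{-2^{n-1}, \ldots, 2^{n-1} - 1\}$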

Binary Addition
- Complexity: delay, cost, power, …
- Lower bounds?
- What computational model? What assumptions?

Faster Addition
Challenge (KP95):
- numbers are given as stacks of digits
- it takes 1 second to add two digits and put the result digit on the result stack
- one person can add two 5000-digit numbers in 5000 seconds. How?
- Can two people add two 5000-digit numbers in less than an hour?
Observation (notion of carries):
- limited carry propagation -> pre-compute the upper sums for both cases: c[k]=1 and c[k]=0
- divide and conquer (but the ripple-carry approach is also divide and conquer)

Conditional Sum Adder
- Main observation: limited carry propagation
- Main principle: pre-compute the upper sums for both cases: c[k]=1 and c[k]=0
- Assume n is a power of 2

Conditional Sum Adder
- The full adder implements the adder for n=1: CSA(1) = FA
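The recursion behind the conditional sum adder can be sketched in software: the lower half is added with the actual carry-in, the upper half is added for both possible carries, and the carry out of the lower half selects one of the two results. This is a minimal model assuming n is a power of 2 and bit lists stored least significant bit first; it mirrors the structure, not the gate-level design on the slides.

```python
def full_adder(a, b, cin):
    """Return (carry_out, sum) for three input bits."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return cout, s

def conditional_sum_add(a_bits, b_bits, cin):
    """Return n sum bits followed by the carry-out (n+1 bits, LSB first)."""
    n = len(a_bits)
    if n == 1:
        # Base case: CSA(1) is a full adder.
        cout, s = full_adder(a_bits[0], b_bits[0], cin)
        return [s, cout]
    k = n // 2
    low = conditional_sum_add(a_bits[:k], b_bits[:k], cin)
    # Upper sums are pre-computed for both possible carries c[k].
    high0 = conditional_sum_add(a_bits[k:], b_bits[k:], 0)
    high1 = conditional_sum_add(a_bits[k:], b_bits[k:], 1)
    carry_k = low[k]
    high = high1 if carry_k else high0  # multiplexer controlled by c[k]
    return low[:k] + high

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

print(from_bits(conditional_sum_add(to_bits(200, 8), to_bits(100, 8), 1)))
```

In hardware the three recursive calls run in parallel, so the delay grows with the recursion depth log2(n) plus one multiplexer per level, at the price of duplicated upper adders.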

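Binary Multiplication
- Binary multiplication (specification): for operands $a[n-1:0]$, $b[n-1:0]$ compute the product $p[2n-1:0]$ with $\langle p \rangle = \langle a \rangle \cdot \langle b \rangle$
- Remember binary addition (specification): $\langle c_{out}, s \rangle = \langle a \rangle + \langle b \rangle + c_{in}$
- The product is representable with 2n bits, because $(2^n - 1)^2 = 2^{2n} - 2^{n+1} + 1 \le 2^{2n} - 1$
[Figure: binary multiplier with n-bit operands A and B and 2n-bit result P]

Bits needed (floor(log2(x)) + 1) for the largest 2n-bit number and for the largest product:

   n    2^(2n) - 1    2^(2n) - 2^(n+1) + 1
   1         2                  1
   2         4                  4
   4         8                  8
   8        16                 16
  16        32                 32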

Implementation – to cope with Complexity
Strategies that worked for binary addition:
- consideration of small n
- property extraction from specification
- greedy approach
- divide & conquer
Strategies for binary multiplication:
- consideration of small n
- reduction approach
- divide & conquer
- reduction to binary addition
- rewriting of specification
- considering logarithms (… European logarithmic processor)

Consideration of small n
- Binary multiplication is even simpler than addition for n=1: $p = a \cdot b = a \wedge b$
- This also works for n x 1-bit multiplication: $\langle a[n-1:0] \rangle \cdot b = \langle a[n-1] \wedge b, \ldots, a[0] \wedge b \rangle$
- Consider a factor of $2^n$?

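Reduction Approach
Reduction n -> n-1, splitting off the leading bits a[n-1] and b[n-1]:
$\langle a \rangle \cdot \langle b \rangle = \langle a[n-2:0] \rangle \cdot \langle b[n-2:0] \rangle + 2^{n-1} \cdot (a[n-1] \cdot \langle b[n-2:0] \rangle + b[n-1] \cdot \langle a[n-2:0] \rangle) + 2^{2n-2} \cdot a[n-1] \cdot b[n-1]$
- first term: (n-1)-bit multiplication
- second term: (n-1)-bit AND & additions
- third term: 1-bit AND & addition (carry-in)
Implementation? Complexity?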

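Multiplication Reduction – in Sums
$\langle a \rangle \cdot \langle b \rangle = \sum_{i=0}^{n-1} \langle a \rangle \cdot b[i] \cdot 2^i$
Definition, partial products: $PP_i = \langle a \rangle \cdot b[i] \cdot 2^i$
- simple to compute in binary
- not affected by the remaining sum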

Binary Multiplication
Implementations similar to the grade school algorithm:

      0010   (multiplicand)
  x   1011   (multiplier)
  --------
      0010
     0010
    0000
   0010
  --------
  00010110

Negative numbers: convert and multiply
- better technique: using Booth recoding

Multiplier Implementation
- Stage i accumulates $A \cdot 2^i$ if $B_i = 1$
- What are the boxes?
- How much hardware for an n-bit multiplier?
[Figure: 4-bit array multiplier; each stage takes A3..A0 and one multiplier bit B0..B3, the first stage adds to 0000, outputs P7..P0]
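The stage-by-stage accumulation can be written as a shift-and-add loop. This is a minimal software sketch of the grade-school scheme on unsigned n-bit operands; the function name is illustrative.

```python
def shift_add_multiply(a: int, b: int, n: int) -> int:
    """Multiply two unsigned n-bit numbers by accumulating partial products.
    Stage i adds a * 2^i to the running sum if bit i of b is set."""
    assert 0 <= a < 2 ** n and 0 <= b < 2 ** n
    product = 0
    for i in range(n):
        if (b >> i) & 1:
            product += a << i  # partial product PP_i = a * b[i] * 2^i
    return product

# The example from the slides: 0010 x 1011.
print(format(shift_add_multiply(0b0010, 0b1011, 4), "08b"))
```

Each loop iteration corresponds to one row of AND gates plus one adder row in the array multiplier, which is where the quadratic cost comes from.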

Multiplication Complexity
So far:
- Delay(n) = $D_{AND} + n \cdot D_{FA} = O(n)$
- Cost(n) = $n^2 \cdot (C_{AND} + C_{FA}) = O(n^2)$
Inherent problem: adding n partial products (n-bit numbers), although a single addition can be done in delay O(log(n))

Parallel Multiplication (PP adder tree)
Partial product generation and the additions can be done in parallel.
[Figure: operands A and B feed a partial product generator (PPG), followed by a tree of binary adders that produces product C]
Fanout? Precisions? Delay? Cost?

Redundant Adder Tree
Redundant addition (carry-save adders): compression of 3 binary operands to 2 by the use of a line of full adders.
[Figure: 4-bit carry-save adder with inputs a3..a0, b3..b0, c3..c0 and outputs x3..x0, y3..y0]
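A line of full adders without carry propagation can be modelled with bitwise operations on whole words: each bit position produces one sum bit and one carry bit, and the carries are collected in a second word shifted left by one. A minimal sketch, with the function name chosen for illustration:

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression: return (s, c) with s + c == x + y + z.
    Every bit position is an independent full adder, so no carry ripples."""
    s = x ^ y ^ z                               # per-position sum bits
    c = ((x & y) | (x & z) | (y & z)) << 1      # per-position carries, weight 2
    return s, c

s, c = carry_save_add(0b1011, 0b0110, 0b1101)
print(bin(s), bin(c), s + c)
```

The delay of this step is one full adder regardless of the word width; a carry-propagate addition is needed only once, at the very end.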

Redundant Adder Tree
Redundant addition of 3 partial products to 2 (figure not preserved)

Redundant Adder Tree
Redundant addition of 4 partial products to 2 (figure not preserved)

Redundant Adder Tree
Redundant addition of 4 internal partial products to 2 (figure not preserved)

Redundant Adder Tree
Tree structure of redundant compressors: Cost? Delay? See Wallace tree designs.
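To get a feel for the delay question, the sketch below reduces a list of partial products with layers of 3:2 compressors until two operands remain and counts the layers. It is a simplified Wallace-style model working on whole words; the grouping strategy and the names are assumptions made for illustration.

```python
def carry_save_add(x, y, z):
    """3:2 compression with s + c == x + y + z."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def reduce_to_two(operands):
    """Apply layers of 3:2 compressors until at most two operands remain.
    Returns (s, c, levels) with s + c == sum(operands)."""
    ops = list(operands)
    levels = 0
    while len(ops) > 2:
        full_groups = len(ops) // 3
        nxt = []
        for g in range(full_groups):
            s, c = carry_save_add(*ops[3 * g:3 * g + 3])
            nxt.extend((s, c))
        nxt.extend(ops[3 * full_groups:])  # leftovers pass through unchanged
        ops = nxt
        levels += 1
    while len(ops) < 2:
        ops.append(0)
    return ops[0], ops[1], levels

a, b, n = 183, 211, 8
partial_products = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
s, c, levels = reduce_to_two(partial_products)
print(s + c == a * b, levels)
```

Every layer shrinks the operand count by roughly a factor of 2/3, so the number of layers grows logarithmically in the number of partial products.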

(Modified) Booth Recoding
Operand recoding to:
- allow for signed multiplication
- reduce the number of partial products
Popular recoding choice based on: (formula not preserved)

(Modified) Booth Recoding
Implementation of recoding:
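The recoding formulas on these slides were not preserved, so the following sketch uses the standard textbook radix-4 (modified) Booth recoding as an assumption: overlapping bit triples of the two's complement multiplier are mapped to digits in {-2, -1, 0, 1, 2}, which halves the number of partial products. The function names are illustrative.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Recode an n-bit two's complement multiplier b (n even) into n/2
    radix-4 digits d_j = b[2j-1] + b[2j] - 2*b[2j+1], with b[-1] = 0."""
    assert n % 2 == 0 and -(1 << (n - 1)) <= b < (1 << (n - 1))
    pattern = b & ((1 << n) - 1)  # n-bit two's complement bit pattern

    def bit(i):
        return 0 if i < 0 else (pattern >> i) & 1

    return [bit(2 * j - 1) + bit(2 * j) - 2 * bit(2 * j + 1)
            for j in range(n // 2)]

def booth_multiply(a: int, b: int, n: int) -> int:
    """Signed product as a sum of n/2 partial products d_j * a * 4^j."""
    return sum(d * a * 4 ** j for j, d in enumerate(booth_radix4_digits(b, n)))

print(booth_radix4_digits(7, 4), booth_multiply(-3, 7, 4))
```

Each digit only requires selecting 0, ±a, or ±2a, i.e. a shift and an optional negation, so partial product generation stays cheap.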

Recursive Multiplication
- What does the implementation require?
- Is it better than previous designs?
- Do the improvements by Karatsuba (1962) in asymptotic complexity help?
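For reference, Karatsuba's idea replaces the four half-size products of a straightforward recursive split by three, at the cost of extra additions and subtractions. The sketch below works on non-negative Python integers; the base-case threshold of 16 is an arbitrary choice for illustration.

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply non-negative integers with three recursive half-size products."""
    if x < 16 or y < 16:
        return x * y  # small base case
    k = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << k) - 1
    x_hi, x_lo = x >> k, x & mask
    y_hi, y_lo = y >> k, y & mask
    hi = karatsuba(x_hi, y_hi)
    lo = karatsuba(x_lo, y_lo)
    # (x_hi + x_lo)(y_hi + y_lo) - hi - lo == x_hi*y_lo + x_lo*y_hi
    cross = karatsuba(x_hi + x_lo, y_hi + y_lo) - hi - lo
    return (hi << (2 * k)) + (cross << k) + lo

print(karatsuba(12345678, 87654321))
```

Whether the saved multiplication pays off in hardware depends on the operand width, since the extra adders and the irregular wiring add cost and delay of their own.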

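Division
- Multiplication specification: the product of the two input operands is the output
- For division: the product and one factor are the inputs, the other factor is the output
- Not always a solution! Consider instead a quotient Q and a remainder R so that $A = Q \cdot B + R$ with remainder $0 \le R < B$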

Division
Two simple approaches: reduce to simpler operations
- subtractions
- multiplications

Subtractive Division
- Dividing using subtractions!
- Starting from the left or from the right?
- Considering ranges: (formula not preserved)

SRT Division
- Consider: (formula not preserved). Choose the largest k with (condition not preserved): b[n-1:k+1] = ?
- Recurrence i, radix 2: (formula not preserved) with q[i-1] = (1:0 ?) implies (formula not preserved)

SRT Division
Recurrence and implementation (formulas and figure not preserved)
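Since the recurrence on these slides was not preserved, the sketch below shows the common textbook form of radix-2 SRT division as an assumption: the divisor d is normalized to [1/2, 1), the partial remainder w is doubled in every step, and the quotient digit is chosen from the redundant set {-1, 0, 1} by comparing 2w with ±1/2 only. Exact fractions are used so that the invariants can be checked; all names are illustrative.

```python
from fractions import Fraction

def srt_radix2(x: Fraction, d: Fraction, n: int):
    """Radix-2 SRT division of x by d with 1/2 <= d < 1 and |x| <= d.
    Returns (digits, quotient, w) with x == quotient*d + w / 2^n and |w| <= d."""
    half = Fraction(1, 2)
    assert half <= d < 1 and abs(x) <= d
    w = x
    digits = []
    for _ in range(n):
        w2 = 2 * w
        # Digit selection only compares against the constants +1/2 and -1/2.
        if w2 >= half:
            q = 1
        elif w2 < -half:
            q = -1
        else:
            q = 0
        w = w2 - q * d  # recurrence: w_(j+1) = 2*w_j - q_(j+1)*d
        digits.append(q)
    quotient = sum(Fraction(q, 2 ** (j + 1)) for j, q in enumerate(digits))
    return digits, quotient, w

digits, quotient, w = srt_radix2(Fraction(1, 3), Fraction(5, 8), 10)
print(digits, float(quotient), float(Fraction(1, 3) / Fraction(5, 8)))
```

The redundant digit set is what makes the selection tolerant: the comparison does not depend on the exact divisor, so in hardware it can be made on a few leading bits of the partial remainder.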

Multiplicative Division
Approximation of A/B. Contemporary microprocessors implement multiplicative division with
- Newton-Raphson’s algorithm (e.g. Intel IA-64)
- Goldschmidt’s algorithm (e.g. AMD K7)
Steps in multiplicative division:
1. Rough approximation of 1/B
2. Iterative improvement of the approximation accuracy of A/B or 1/B by multiplications, complementations, and shifts
3. (Multiplication with A if 1/B was approximated in step 2)

Newton’s Algorithm
Newton-Raphson approximation of 1/B:
- Initialization: $x_0 \approx 1/B$ with relative error $\varepsilon_0 = 1 - B \cdot x_0$
- k iterations: $x_{i+1} = x_i \cdot (2 - B \cdot x_i)$
- Scaling with A: $A/B \approx A \cdot x_k$
- Quadratic convergence of the one-sided relative approximation error: $\varepsilon_{i+1} = \varepsilon_i^2 \ge 0$
Each iteration i:
- requires two dependent multiplications
- squares the relative approximation error
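The error squaring can be observed directly with exact rational arithmetic. The sketch below assumes the usual Newton-Raphson reciprocal iteration x <- x * (2 - B * x); the variable names and the starting value are illustrative.

```python
from fractions import Fraction

def newton_reciprocal(b: Fraction, x0: Fraction, k: int) -> Fraction:
    """Approximate 1/b with k Newton-Raphson iterations starting from x0."""
    x = x0
    for _ in range(k):
        # Two dependent multiplications: b*x first, then x*(2 - b*x).
        x = x * (2 - b * x)
    return x

b = Fraction(3, 4)
x0 = Fraction(5, 4)  # rough approximation of 1/b = 4/3
for k in range(4):
    # Relative error 1 - b*x_k of the k-th approximation.
    print(k, 1 - b * newton_reciprocal(b, x0, k))
```

With exact computation the relative error after k iterations is the initial error raised to the power 2^k, so the number of correct bits doubles in every step.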

Goldschmidt’s Algorithm
Goldschmidt’s approximation of A/B:
- Initialization: $N_0 = A \cdot x_0$, $D_0 = B \cdot x_0$ with $x_0 \approx 1/B$
- Iteration i for $0 \le i < k$: $F_i = 2 - D_i$, $N_{i+1} = N_i \cdot F_i$, $D_{i+1} = D_i \cdot F_i$
- Approximation of A/B by $N_k$ after k iterations
- Computation of $D_i$ is like the Newton iteration with B=1 => $D_i$ converges quadratically to 1
- From the initialization: $N_i / D_i = A/B$ => $N_i$ converges quadratically to A/B
- 2 independent multiplications per iteration
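A minimal sketch of Goldschmidt's iteration with exact rational arithmetic, assuming the common formulation in which numerator and denominator are both multiplied by the same factor 2 - D in every step; the names `n`, `d`, `f` and the starting value are illustrative.

```python
from fractions import Fraction

def goldschmidt_divide(a: Fraction, b: Fraction, x0: Fraction, k: int):
    """Approximate a/b with k Goldschmidt iterations.
    Returns (n, d): n approximates a/b while d converges to 1."""
    n = a * x0
    d = b * x0
    for _ in range(k):
        f = 2 - d   # complementation, no multiplication needed
        n = n * f   # these two multiplications are independent
        d = d * f   # of each other and can run in parallel
    return n, d

a, b, x0 = Fraction(5, 8), Fraction(3, 4), Fraction(5, 4)
for k in range(4):
    n, d = goldschmidt_divide(a, b, x0, k)
    print(k, float(n), 1 - d)
```

With exact arithmetic the ratio n/d stays equal to a/b throughout, so n inherits the quadratic convergence of d towards 1.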

Multiplication Scheduling
[Figure: multiplication dependency graphs for Newton-Raphson and Goldschmidt-Powers, each with inputs A and B]
For both: 2k+1 multiplications in total, but:
- Newton: 2k+1 multiplications on the critical path
- Goldschmidt: k+1 multiplications on the critical path

Quadratic Convergence for exact computation
With exact computation, both Newton-Raphson and Goldschmidt-Powers square the relative approximation error in every iteration: an initial relative error $\varepsilon_0$ becomes $\varepsilon_0^{2^i}$ after iteration i.

Precision Problems for exact computations
Example with a = bit-width of the initial approximation = 8, p = 64:

  Iteration i    Goldschmidt-Powers (2 multiplications with)
  0              a x p                     8 x 64 bits
  1              (a+p) x (a+p)            72 x 72 bits
  2              2(a+p) x 2(a+p)         144 x 144 bits
  3              4(a+p) x 4(a+p)         288 x 288 bits

=> Rounding of intermediate values is required

Problems and State of the Art
Newton-Raphson is self-correcting,
- i.e. it converges to 1/B even with rounded intermediate results: the correction factor moves any rounded intermediate approximation in the direction of 1/B
- rounding can even be chosen to maintain quadratic convergence, e.g. [Cook]
Goldschmidt-Powers is not self-correcting,
- i.e. convergence to A/B is not guaranteed with rounded intermediate results, because the ratio of the two iterated values no longer equals A/B exactly
- quadratic convergence cannot be achieved with rounded intermediate results
- error analysis is more complicated than for Newton-Raphson

Problems and State of the Art
R. Goldschmidt (1964):
- presentation of the algorithm for exact computations
- implementation (IBM) with rough error analysis (absolute errors)
E. Krishnamurthy (1970):
- Goldschmidt’s algorithm is NOT self-correcting
O. Spaniol in his book “Computer Arithmetic” (1982):
- claims that Goldschmidt’s algorithm is self-correcting
R. Golliver, Intel IA-64 (1999):
- Intel implements Newton-Raphson for simpler error analysis (and for a smaller multiplier)
S. Oberman, AMD K7 (1999):
- AMD uses a 76x76 multiplier for Goldschmidt division (68 bit), because a mechanically checked correctness proof exists
- consideration of absolute errors

Problems and State of the Art
Most work only considers variations of Newton-Raphson because of
- simpler error analysis
- quadratic convergence
- no interest in constant factors
Practical implementations:
- constant acceleration through Goldschmidt’s algorithm is interesting
Previous error analysis is rough and limited to special cases.
General precise error analysis is important for cost, power and delay optimizations in practical implementations.
