CSE 8351 Computer Arithmetic Fall 2005 Instructors: Peter-Michael Seidel.


1 CSE 8351 Computer Arithmetic Fall 2005 Instructors: Peter-Michael Seidel

2 II. Simple Algorithms for Arithmetic Unit Design in Hardware: Addition / Multiplication / SRT Division / Square Root / Reciprocal Approximation

3 Arithmetic Unit Design in Hardware. Input interface: n-bit operands A, B. Output interface: n-bit result C. Arithmetic function: $f: \mathbb{B}^n \times \mathbb{B}^n \to \mathbb{B}^n$. What is different from other (“non-arithmetic”) hardware units? [Figure: arithmetic hardware unit with n-bit inputs Operand A and Operand B and n-bit output Result C]

4 Arithmetic Unit Design in Hardware. Specification: truth tables are not feasible ($2^{2n} \times n$ entries); for n = 64 that is $2^{134} \approx 2.2 \times 10^{40}$ entries. Complexity: there are $2^{2^{2n} \cdot n}$ different functions; for n = 64 that is far more than a googol ($10^{100}$). Only a handful of these functions are interesting or used! Other forms of specification are possible. [Figure: arithmetic hardware unit with n-bit inputs Operand A and Operand B and n-bit output Result C]

5 Arithmetic functions supported. Functions are interesting because of specific properties that arise at operand level: -> use mathematical formalism to specify functionality, e.g. for defined values $\langle A \rangle, \langle B \rangle, \langle C \rangle \in \mathbb{N}$: $\langle C \rangle = f(\langle A \rangle, \langle B \rangle) = \langle A \rangle + \langle B \rangle$ -> this does not directly help in implementing or testing -> use tools from the (well established) science of mathematics to transform the equation and extract local properties -> these help make use of limited global influence, local computation, reuse, recurrence …

6 Notations, Representations, Values. Bit strings: sequences of bits (concatenation also written as (..,..,..)): a = 0011010 = (001,10,10). For a bit $x$ and a natural number $n$: $x^n$ is the string consisting of n copies of x. Bits of strings are indexed from right (0) to left (n-1): $a = (a_{n-1}, \ldots, a_0)$ or $a = a[n-1:0]$

7 Binary representation. Natural number with binary representation $a \in \{0,1\}^n$: $\langle a \rangle = \sum_{i=0}^{n-1} a[i] \cdot 2^i$. Range of numbers which have a binary representation of length n: $\{0, \ldots, 2^n - 1\}$. n-bit binary representation of a natural number $x$ in this range: the string $a \in \{0,1\}^n$ with $\langle a \rangle = x$

8 Two’s complement representation. Integer with two’s complement representation $a \in \{0,1\}^n$: $[a] = -a[n-1] \cdot 2^{n-1} + \langle a[n-2:0] \rangle$. Range of numbers with a two’s complement representation of length n: $\{-2^{n-1}, \ldots, 2^{n-1} - 1\}$. n-bit two’s complement representation of an integer $x$ in this range: the string $a \in \{0,1\}^n$ with $[a] = x$

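9 Binary Addition. Binary addition (specification): for $a, b \in \{0,1\}^n$ and $c_{in} \in \{0,1\}$, compute $s \in \{0,1\}^n$ and $c_{out} \in \{0,1\}$ with $\langle c_{out}, s \rangle = \langle a \rangle + \langle b \rangle + c_{in}$. Coping with complexity: simple for n=1: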

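10 Addition (n=1). Half adder: adding two bits $a, b$, with the sum represented by two bits $(c, s)$; obvious equations: $s = a \oplus b$, $c = a \wedge b$. Full adder: adding three bits $a, b, c_{in}$; obvious equations: $s = a \oplus b \oplus c_{in}$, $c_{out} = (a \wedge b) \vee (a \wedge c_{in}) \vee (b \wedge c_{in})$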
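The two cells can be sketched in Python as a sanity check. This is only an illustration: the function names and the (carry, sum) return order are assumptions, and the full adder is built here from two half adders, which is one common construction and not necessarily the one drawn on the slides.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits; return (carry, sum) with 2*carry + sum == a + b."""
    return a & b, a ^ b


def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Add three bits; return (carry_out, sum)."""
    c1, s1 = half_adder(a, b)      # add the two operand bits
    c2, s = half_adder(s1, c_in)   # add the incoming carry
    return c1 | c2, s              # at most one of c1, c2 can be 1


# Exhaustive check against the arithmetic specification.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            c_out, s = full_adder(a, b, c)
            assert 2 * c_out + s == a + b + c
```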

11 Addition (n=1). Half adder and full adder implementations: [Figure: half adder and full adder circuits]

12 Binary Addition. Greedy approach (right to left) -> Ripple Carry Adder. Development/verification based on equivalence transformation of the specification

13 Ripple Carry Adder
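A minimal Python sketch of the greedy right-to-left scheme follows. The bit-list convention (index 0 is the least significant bit) and the helper names `to_bits` and `value` are assumptions made for this illustration only.

```python
def ripple_carry_add(a: list[int], b: list[int], c_in: int = 0) -> tuple[list[int], int]:
    """Add two equally long bit lists (index 0 = least significant bit).

    Returns (sum bits, carry out); the carry ripples from right to left.
    """
    assert len(a) == len(b)
    s, c = [], c_in
    for ai, bi in zip(a, b):
        s.append(ai ^ bi ^ c)                 # full adder sum bit
        c = (ai & bi) | (ai & c) | (bi & c)   # full adder carry bit
    return s, c


def to_bits(x: int, n: int) -> list[int]:
    """n-bit binary representation of x, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]


def value(bits: list[int]) -> int:
    """Value of a bit list interpreted as a binary number."""
    return sum(bit << i for i, bit in enumerate(bits))


s, c_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
assert value(s) + (c_out << 4) == 11 + 6
```

The loop makes the O(n) delay visible: sum bit i cannot be produced before the carry from position i-1 is known.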

14 Basic properties (1). For $a \in \{0,1\}^n$: leading zeros do not change the value of a binary representation: $\langle 0^k a \rangle = \langle a \rangle$; binary representations can be split: for each $k$ with $0 < k < n$: $\langle a \rangle = \langle a[n-1:k] \rangle \cdot 2^k + \langle a[k-1:0] \rangle$; two’s complement representations have a sign bit $a[n-1]$: $[a] < 0 \iff a[n-1] = 1$; construct a two’s complement representation from a binary representation: $[0a] = \langle a \rangle$; note that the two’s complement representation is longer by one bit

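15 Basic properties (2). For $a \in \{0,1\}^n$: sign extension does not change the value: $[a[n-1]\,a] = [a]$; negation of a number in two’s complement representation: $-[a] = [\overline{a}] + 1$ (basis for the subtraction algorithm!); congruences modulo $2^n$ and $2^{n-1}$: $[a] \equiv \langle a \rangle \pmod{2^n}$, $[a] \equiv \langle a[n-2:0] \rangle \pmod{2^{n-1}}$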
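The sign extension and negation properties can be checked exhaustively for a small width. The sketch below uses Python strings as bit strings (leftmost character is the sign bit); the helper names `twoc_value` and `invert` are hypothetical and chosen only for this example.

```python
def twoc_value(bits: str) -> int:
    """Value of a two's complement bit string (leftmost char = sign bit)."""
    n = len(bits)
    # Binary value minus 2^n if the sign bit is set.
    return int(bits, 2) - (int(bits[0]) << n)


def invert(bits: str) -> str:
    """Bitwise complement of a bit string."""
    return "".join("1" if ch == "0" else "0" for ch in bits)


n = 4
for x in range(2 ** n):
    a = format(x, f"0{n}b")
    # Sign extension: prepending a copy of the sign bit keeps the value.
    assert twoc_value(a[0] + a) == twoc_value(a)
    # Negation: invert all bits and add 1.
    assert -twoc_value(a) == twoc_value(invert(a)) + 1
```

The negation identity is what lets a subtractor reuse an adder: invert the second operand and set the carry-in to 1.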

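16 Basic properties (3). Two’s complement addition based on binary addition: for $a, b \in \{0,1\}^n$ and $c_{in} \in \{0,1\}$, the result $s$ of the n-bit binary addition is useful for n-bit two’s complement addition: $[a] + [b] + c_{in} \equiv [s] \pmod{2^n}$, with equality whenever the exact sum lies in the n-bit two’s complement range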
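As an illustration, the sketch below feeds two's complement bit patterns through a plain unsigned n-bit addition and checks the result. The function names are assumptions, and the overflow test (carry into the sign position differs from carry out of it) is the standard textbook rule, added here because the transcript does not preserve the slide's own formula.

```python
def twoc_value(x: int, n: int) -> int:
    """Interpret the n-bit pattern x as a two's complement number."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def add_twoc(a: int, b: int, n: int, c_in: int = 0) -> tuple[int, bool]:
    """Add two n-bit patterns with a binary adder; return (sum pattern, overflow)."""
    mask = (1 << n) - 1
    low = mask >> 1                                   # mask for bits n-2..0
    total = a + b + c_in
    s = total & mask                                  # n-bit binary sum
    c_n = total >> n                                  # carry out of the sign position
    c_n_1 = ((a & low) + (b & low) + c_in) >> (n - 1)  # carry into the sign position
    return s, c_n != c_n_1


s, overflow = add_twoc(0b1011, 0b0011, 4)   # (-5) + 3
assert twoc_value(s, 4) == -2 and not overflow
```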

17-24 Ripple Carry Adder [figure-only slides]

25 Binary Addition Complexity: Delay, Cost, Power … Lower bounds? What computational model? What assumptions?

26 Faster Addition. Challenge (KP95): numbers are given as stacks of digits; it takes 1 second to add two digits and put the result digit on the result stack; one person can add two 5000-digit numbers in 5000 seconds?! How? Can two people add two 5000-digit numbers in less than an hour? Observation (notion of carries): limited carry propagation -> pre-computing the upper sums for both cases, c[k]=1 and c[k]=0. This is divide and conquer, but the ripple-carry approach is also divide and conquer

27 Conditional Sum Adder. Main observation: limited carry propagation -> pre-computing upper sums for both cases, c[k]=1 and c[k]=0. Assume n is a power of 2.

28-33 Conditional Sum Adder. Main principle: pre-computing upper sums for the cases c[k]=1 and c[k]=0. Assume n is a power of 2. [figure build-up over six slides]

34 Conditional Sum Adder
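The recursive structure of the conditional sum adder can be sketched in Python as follows. This is a software model under stated assumptions (bit lists with index 0 as least significant bit, n a power of 2, hypothetical helper names); in hardware the three recursive sub-adders work in parallel and a multiplexer selects the upper half.

```python
def conditional_sum_add(a: list[int], b: list[int], c_in: int) -> list[int]:
    """Return the n+1 result bits (LSB first) of a + b + c_in; len(a) is a power of 2."""
    n = len(a)
    if n == 1:
        total = a[0] + b[0] + c_in            # base case: a single full adder
        return [total & 1, total >> 1]
    k = n // 2
    low = conditional_sum_add(a[:k], b[:k], c_in)
    # Both upper sums are computed without waiting for the lower carry c[k].
    high0 = conditional_sum_add(a[k:], b[k:], 0)
    high1 = conditional_sum_add(a[k:], b[k:], 1)
    carry = low[k]                            # c[k] selects the matching upper sum
    return low[:k] + (high1 if carry else high0)


def to_bits(x: int, n: int) -> list[int]:
    return [(x >> i) & 1 for i in range(n)]


def value(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


assert value(conditional_sum_add(to_bits(200, 8), to_bits(100, 8), 1)) == 301
```

Because the recursion depth is log2(n) and each level only adds a selection step, the delay grows logarithmically, at the price of duplicated upper adders.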

35-45 Conditional Sum Adder. The full adder implements the adder for n=1: CSA(1) = FA! [figure build-up over eleven slides]

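46 Binary Multiplication. Binary multiplication (specification): $\langle p \rangle = \langle a \rangle \cdot \langle b \rangle$. Remember binary addition (specification). [Figure: binary multiplier with n-bit inputs Operand A and Operand B and 2n-bit output Result P] The product is representable with 2n bits!
n | $\log_2(2^{2n} - 1)$ | $\log_2(2^{2n} - 2^{n+1} + 1)$
1 | 2 | 1
2 | 4 | 4
4 | 8 | 8
(remaining table entries as transcribed: 816 32)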

47 Implementation: to cope with complexity. Strategies that worked for binary addition: - consideration of small n - property extraction from specification - greedy approach - divide & conquer. Strategies for binary multiplication: - consideration of small n - reduction approach - divide & conquer - reduction to binary addition - rewriting of specification - considering logarithms (… European logarithmic processor)

48 Consideration of small n. Binary multiplication is even simpler than addition for n=1: the product of two bits is their AND, $a \cdot b = a \wedge b$. This also works for n x 1-bit multiplication: AND each bit of the n-bit operand with the single bit. Consider = $2^n$ ?

49 Reduction Approach. Reduction n -> n-1: (n-1)-bit multiplication; (n-1)-bit AND & additions; 1-bit AND & addition (carry-in). Implementation, complexity?

50 Multiplication Reduction in Sums. Definition: partial products (simple to compute in binary) (not affected by remaining sum)

51 Binary Multiplication. Implementations similar to the grade school algorithm:
        0010   (multiplicand)
      x 1011   (multiplier)
      ------
        0010
       0010
      0000
     0010
    --------
    00010110
Negative numbers: convert and multiply; better technique: using Booth recoding
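The grade school scheme can be sketched in Python as follows, reusing the slide's example operands. The function names are assumptions for illustration; each partial product is the multiplicand ANDed with one multiplier bit and shifted to that bit's position.

```python
def partial_products(a: int, b: int, n: int) -> list[int]:
    """Partial product i is (a AND b[i]) shifted left by i positions."""
    return [(a if (b >> i) & 1 else 0) << i for i in range(n)]


def multiply(a: int, b: int, n: int) -> int:
    """Sum all n partial products of two n-bit natural numbers."""
    p = 0
    for pp in partial_products(a, b, n):
        p += pp
    return p


pps = partial_products(0b0010, 0b1011, 4)
print([format(pp, "08b") for pp in pps])
print(format(multiply(0b0010, 0b1011, 4), "08b"))  # prints 00010110
```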

52 Multiplier Implementation. Stage i accumulates $A \cdot 2^i$ if $B_i = 1$. What are the boxes? How much hardware for an n-bit multiplier? [Figure: 4x4 array multiplier; each of the four stages takes A0..A3 and one multiplier bit B0..B3, the first stage starts from 0000, outputs are P0..P7]

53 Multiplication Complexity. So far: $Delay(n) = D_{AND} + n \cdot D_{FA} = O(n)$; $Cost(n) = n^2 (C_{AND} + C_{FA}) = O(n^2)$. Inherent problem: adding n partial products (n-bit numbers). Addition can be done in delay $O(\log n)$

54 Parallel Multiplication (PP adder tree). Partial product generation and additions can be done in parallel. [Figure: partial product generation (PPG) from Operand A and Operand B feeding a tree of binary adders that produces Product C] Fanout? Precisions? Delay? Cost?

55 Redundant Adder Tree. Redundant addition (carry-save adders): compression of 3 binary operands to 2 by the use of a line of full adders. [Figure: line of four full adders with inputs a3..a0, b3..b0, c3..c0 and outputs x3..x0, y3..y0]

56 Redundant Adder Tree. Redundant addition of 3 partial products to 2:

57 Redundant Adder Tree. Redundant addition of 4 partial products to 2:

58 Redundant Adder Tree. Redundant addition of 4 internal partial products to 2:

59 Redundant Adder Tree. Tree structure of redundant compressors: Cost? Delay? See Wallace tree designs
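The 3-to-2 compression and its use in a reduction tree can be modeled with Python integers. This is a sketch under assumptions: the names are hypothetical, the grouping into triples is one simple layering and not a specific Wallace tree layout, and Python's unbounded integers hide the bit-width bookkeeping a hardware design needs.

```python
def carry_save(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression: x + y + z == s + c, with no carry propagation along the word."""
    s = x ^ y ^ z                              # per-position full adder sum bits
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-position carry bits, shifted by one
    return s, c


def reduce_operands(ops: list[int]) -> tuple[int, int]:
    """Compress operands in layers of 3:2 compressors until two remain."""
    ops = list(ops)
    while len(ops) > 2:
        nxt = []
        for i in range(0, len(ops) - 2, 3):
            nxt.extend(carry_save(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[len(ops) - len(ops) % 3:])  # leftover operands pass through
        ops = nxt
    while len(ops) < 2:
        ops.append(0)
    return ops[0], ops[1]


s, c = reduce_operands([2, 4, 0, 16])
assert s + c == 22   # one final carry-propagate addition yields the product
```

Each layer has constant delay, so the number of layers, which grows logarithmically in the number of operands, determines the tree delay before the final adder.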

60 (Modified) Booth Recoding. Operand recoding to: allow for signed multiplication; reduce the number of partial products. Popular recoding choice based on:

61 (Modified) Booth Recoding

62 (Modified) Booth Recoding

63 (Modified) Booth Recoding. Implementation of recoding:
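Since the transcript does not preserve the slide's recoding formula, the sketch below uses the common radix-4 (modified) Booth recoding as an assumption: digit j is $-2 b[2j+1] + b[2j] + b[2j-1]$ with $b[-1] = 0$, giving n/2 digits in {-2, ..., 2}. The function names are hypothetical.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Recode the n-bit two's complement pattern b (n even) into n/2 digits in {-2..2}."""
    assert n % 2 == 0

    def bit(i: int) -> int:
        return (b >> i) & 1 if i >= 0 else 0   # b[-1] is defined as 0

    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1) for j in range(n // 2)]


def booth_multiply(a: int, b: int, n: int) -> int:
    """Multiply integer a by the two's complement value of pattern b using n/2 partial products."""
    return sum(d * a * 4 ** j for j, d in enumerate(booth_radix4_digits(b, n)))


# Pattern 1011 is -5 in 4-bit two's complement; digits are least significant first.
assert booth_radix4_digits(0b1011, 4) == [-1, -1]
assert booth_multiply(3, 0b1011, 4) == -15
```

Each digit only asks for 0, ±a or ±2a, all of which are cheap (shift and negate), which is why the number of partial products halves without extra multipliers.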

64 Recursive Multiplication. What does the implementation require? Is it better than previous designs? Do improvements by Karatsuba (1962) in asymptotic complexity help?
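For reference, Karatsuba's idea replaces the four half-size products of the straightforward recursion by three, at the cost of extra additions and subtractions. The Python sketch below is a software illustration only; the base-case threshold of 16 and the split at half the bit length are arbitrary choices made for this example.

```python
def karatsuba(a: int, b: int) -> int:
    """Multiply two natural numbers using three half-size products per level."""
    if a < 16 or b < 16:
        return a * b                           # small base case, stands in for a direct multiplier
    k = max(a.bit_length(), b.bit_length()) // 2
    mask = (1 << k) - 1
    a_hi, a_lo = a >> k, a & mask
    b_hi, b_lo = b >> k, b & mask
    hi = karatsuba(a_hi, b_hi)
    lo = karatsuba(a_lo, b_lo)
    # (a_hi + a_lo)(b_hi + b_lo) - hi - lo == a_hi*b_lo + a_lo*b_hi
    mid = karatsuba(a_hi + a_lo, b_hi + b_lo) - hi - lo
    return (hi << (2 * k)) + (mid << k) + lo


assert karatsuba(123456789, 987654321) == 123456789 * 987654321
```

Note that the middle product takes operands one bit wider than the halves, and the subtractions sit on the critical path; both points matter when judging whether the asymptotic gain helps at typical hardware operand widths.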

65 Division. Starting point: the multiplication specification. For division, the product and one factor are the inputs and the other factor is the output. Not always a solution! Consider instead a quotient together with a remainder

66 Division. Two simple approaches: reduce to simpler operations: - subtractions - multiplications

67 Subtractive Division. Dividing using subtractions! Starting from the left or from the right? Considering ranges:
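One simple subtractive scheme, shown here as an assumed illustration and not necessarily the variant developed on the slides, is restoring division. It works from the left (most significant bit first) and produces quotient q and remainder r with a = q*b + r and 0 <= r < b.

```python
def restoring_divide(a: int, b: int, n: int) -> tuple[int, int]:
    """Divide the n-bit natural number a by b > 0; return (quotient, remainder)."""
    assert b > 0 and 0 <= a < (1 << n)
    q, r = 0, 0
    for i in range(n - 1, -1, -1):
        r = (r << 1) | ((a >> i) & 1)   # bring down the next dividend bit
        q <<= 1
        if r >= b:                      # trial subtraction succeeds: quotient bit is 1
            r -= b
            q |= 1
    return q, r


assert restoring_divide(22, 5, 8) == (4, 2)
```

Starting from the left keeps the partial remainder below b after every step, so each step decides exactly one quotient bit.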

68 SRT Division. Consider: Choose largest k with : b[n-1:k+1] = ? Recurrence i, radix-2: with q[i-1] = (1:0 ?) implies

69 SRT Division. Recurrence: Implementation:
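Because the slide's recurrence is not preserved in the transcript, the following sketch uses the textbook radix-2 SRT recurrence as an assumption: $w_{j+1} = 2 w_j - q_{j+1} d$ with quotient digits in {-1, 0, 1}, a divisor normalized to $1/2 \le d < 1$, and $|w_0| \le d$. Exact fractions are compared here; a hardware implementation would inspect only a few leading bits of a redundant remainder.

```python
from fractions import Fraction


def srt_radix2(x: Fraction, d: Fraction, n: int) -> tuple[list[int], Fraction]:
    """Radix-2 SRT sketch; returns digits q_1..q_n in {-1, 0, 1} and the final remainder w_n."""
    assert Fraction(1, 2) <= d < 1 and abs(x) <= d
    half = Fraction(1, 2)
    w, digits = x, []
    for _ in range(n):
        shifted = 2 * w
        if shifted >= half:
            q = 1
        elif shifted < -half:
            q = -1
        else:
            q = 0                      # no subtraction needed in this step
        w = shifted - q * d            # keeps |w| <= d
        digits.append(q)
    return digits, w


def quotient_value(digits: list[int]) -> Fraction:
    """Value of the signed-digit quotient: sum of q_j * 2^-j."""
    return sum((Fraction(q, 2 ** (j + 1)) for j, q in enumerate(digits)), Fraction(0))


x, d = Fraction(3, 8), Fraction(5, 8)
digits, w = srt_radix2(x, d, 10)
assert abs(x / d - quotient_value(digits)) <= Fraction(1, 2 ** 10)
```

The redundant digit set is what makes the selection tolerant: the thresholds ±1/2 do not depend on d, so the digit can be chosen from a coarse estimate of the remainder.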

70 Multiplicative Division. Approximation of A/B: contemporary microprocessors implement multiplicative division with - Newton-Raphson’s algorithm (e.g. Intel IA-64) - Goldschmidt’s algorithm (e.g. AMD K7). Steps in multiplicative division: 1. Rough approximation of 1/B. 2. Iterative improvement of the approximation accuracy of A/B or 1/B by multiplications, complementations, and shifts. 3. (Multiplication with A if 1/B was approximated in step 2.)

71 Newton’s Algorithm. Newton-Raphson approximation of 1/B: Initialization: $x_0 \approx 1/B$ with relative error $e_0 = 1 - B x_0$. k iterations: $x_{i+1} = x_i (2 - B x_i)$. Scaling with A: $A \cdot x_k \approx A/B$. Quadratic convergence of the one-sided relative approximation error. Each iteration i: - requires two dependent multiplications - squares the relative approximation error
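The error squaring can be observed directly with exact rational arithmetic. The sketch below uses the standard reciprocal iteration $x_{i+1} = x_i (2 - B x_i)$; the symbol names, the starting value, and the helper name are assumptions for this illustration.

```python
from fractions import Fraction


def newton_reciprocal(b: Fraction, x0: Fraction, k: int) -> list[Fraction]:
    """Return x_0..x_k of the Newton-Raphson iteration for 1/b (exact arithmetic)."""
    xs = [x0]
    for _ in range(k):
        x = xs[-1]
        t = b * x                  # first multiplication
        xs.append(x * (2 - t))     # second multiplication depends on the first
    return xs


b, x0 = Fraction(3, 4), Fraction(5, 4)
for x in newton_reciprocal(b, x0, 3):
    print(1 - b * x)               # relative error: 1/16, 1/256, 1/65536, ...
```

The error stays non-negative after the first step because it is a square, which is the one-sided behavior mentioned above.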

72 Goldschmidt’s Algorithm. Goldschmidt’s approximation of A/B: Initialization: $N_0 = A \cdot x_0$, $D_0 = B \cdot x_0$ with $x_0 \approx 1/B$. Iteration i for $0 \le i < k$: $F_i = 2 - D_i$, $N_{i+1} = N_i \cdot F_i$, $D_{i+1} = D_i \cdot F_i$. Approximation of A/B by $N_k$ after k iterations. Computation of $D_i$ is like a Newton iteration with B=1 => $D_i$ converges quadratically to 1. From the initialization: $N_i / D_i = A/B$ => $N_i$ converges quadratically to A/B. 2 independent multiplications per iteration
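A minimal exact-arithmetic sketch of the scheme follows, assuming the usual formulation with a numerator N, a denominator D and a correction factor F = 2 - D (the names are assumptions, since the transcript drops the slide's symbols). The two multiplications per iteration share the factor F but do not depend on each other.

```python
from fractions import Fraction


def goldschmidt_divide(a: Fraction, b: Fraction, x0: Fraction, k: int) -> tuple[Fraction, Fraction]:
    """Return (N_k, D_k); N_k approximates a/b and D_k approaches 1 (exact arithmetic)."""
    n, d = a * x0, b * x0
    for _ in range(k):
        f = 2 - d                  # complementation, no multiplication needed
        n, d = n * f, d * f        # two independent multiplications
    return n, d


a, b, x0 = Fraction(5, 8), Fraction(3, 4), Fraction(5, 4)
n, d = goldschmidt_divide(a, b, x0, 3)
assert n / d == a / b              # invariant holds for exact computation
assert 1 - d == Fraction(1, 16) ** 8
```

With rounded intermediate values the invariant N/D = A/B is no longer exact, which is the root of the error analysis issues raised later in the deck.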

73 Multiplication Scheduling: Newton-Raphson vs. Goldschmidt-Powers. For both: 2k+1 multiplications in total, but Newton: 2k+1 multiplications on the critical path; Goldschmidt: k+1 multiplications on the critical path. [Figure: multiplication schedules of both algorithms, each with inputs A and B]

74 Quadratic Convergence for exact computation. [Table: iteration i versus relative approximation error for exact computation, for Newton-Raphson and Goldschmidt-Powers]

75 Precision Problems for exact computations. Example with a = bit-width( ) = 8, p = 64:
Iteration i | Goldschmidt-Powers (2 mults with) | Example
0 | a x p | 8 x 64 bits
1 | (a+p) x (a+p) | 72 x 72 bits
2 | 2(a+p) x 2(a+p) | 144 x 144 bits
3 | 4(a+p) x 4(a+p) | 288 x 288 bits
=> Rounding of intermediate values required

76 Problems and State of the Art. Newton-Raphson is self-correcting, - i.e. it converges to 1/B even with rounded intermediate results: the correction factor moves any rounded intermediate approximation in the direction of 1/B - rounding can even be chosen to maintain quadratic convergence, e.g. [Cook]. Goldschmidt-Powers is not self-correcting, - i.e. convergence to A/B is not guaranteed with rounded intermediate results, because the invariant from the initialization (the ratio of the two iterates equals A/B) no longer holds - quadratic convergence cannot be achieved with rounded intermediate results - error analysis is more complicated than for Newton-Raphson

77 Problems and State of the Art. R. Goldschmidt (1964): - presentation of the algorithm for exact computations - implementation (IBM) with rough error analysis (absolute errors). E. Krishnamurthy (1970): - Goldschmidt’s algorithm is NOT self-correcting. O. Spaniol in his book “Computer Arithmetic” (1982): - claims that Goldschmidt’s algorithm is self-correcting. R. Golliver, Intel IA-64 (1999): - Intel implements Newton-Raphson for simpler error analysis (and for a smaller multiplier). S. Oberman, AMD K7 (1999): - AMD uses a 76x76 multiplier for Goldschmidt division (68 bit), because a mechanically checked correctness proof exists - consideration of absolute errors

78 Problems and State of the Art. Most work only considers variations of Newton-Raphson for - simpler error analysis - quadratic convergence - no interest in constant factors. Practical implementations: - constant acceleration through Goldschmidt’s algorithm is interesting. Previous error analysis is rough and limited to special cases. General, precise error analysis is important for cost, power, and delay optimizations in practical implementations

