1 Lecture 7: Multiplication and Floating Point EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr. Rozier (UM)

2 LAB 2

3 Lab Phases: Recursive Phase 1 – Factorial Phase 2 – Fibonacci

4 Lab Phases: Arrays Phase 4 – Sum Array Phase 5 – Find Item Phase 6 – Bubble Sort

5 Lab Phases: Trees Phase 7 – Tree Height Phase 8 – Tree Traversal Array representations: [1,2,3,4,5,6,7,0,0,0,0,0,0,0,0] and [1,2,5,0,0,4,0,0,3,6,0,0,7,0,0] [Figure: binary tree with nodes 1–7]

6 INTEGER ARITHMETIC

7 Addition and Subtraction

8 Half adder
   Inputs    Outputs
   A  B      C  S
   0  0      0  0
   1  0      0  1
   0  1      0  1
   1  1      1  0

9 Full Adder

10
   Inputs         Outputs
   A  B  Cin      Cout  S
   0  0  0        0     0
   1  0  0        0     1
   0  1  0        0     1
   1  1  0        1     0
   0  0  1        0     1
   1  0  1        1     0
   0  1  1        1     0
   1  1  1        1     1

11 In Groups, Implement a 1-bit Full Adder
   Inputs         Outputs
   A  B  Cin      Cout  S
   0  0  0        0     0
   1  0  0        0     1
   0  1  0        0     1
   1  1  0        1     0
   0  0  1        0     1
   1  0  1        1     0
   0  1  1        1     0
   1  1  1        1     1
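To check the truth table above, here is a small C model of a 1-bit full adder; the function name and the test harness are just for illustration, not part of the lab code.

#include <stdio.h>

static void full_adder(int a, int b, int cin, int *sum, int *cout) {
    *sum  = a ^ b ^ cin;                     /* S = A xor B xor Cin                */
    *cout = (a & b) | (cin & (a ^ b));       /* carry out of the two half adders   */
}

int main(void) {
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            for (int cin = 0; cin <= 1; cin++) {
                int s, cout;
                full_adder(a, b, cin, &s, &cout);
                printf("%d %d %d -> Cout=%d S=%d\n", a, b, cin, cout, s);
            }
    return 0;
}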

12 Getting a Full Adder [Figures: half adder and full adder circuits]

13 Putting Together Multiple Bits
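As a software sketch of putting multiple bits together (a ripple-carry adder), the loop below chains four full-adder stages, feeding each stage's carry-out into the next stage's carry-in; the 4-bit width and the helper name are assumptions made only for this example.

#include <stdio.h>

static unsigned ripple_add4(unsigned a, unsigned b, unsigned *carry_out) {
    unsigned sum = 0, carry = 0;
    for (int i = 0; i < 4; i++) {
        unsigned ai = (a >> i) & 1, bi = (b >> i) & 1;
        unsigned s  = ai ^ bi ^ carry;                  /* full-adder sum bit   */
        carry       = (ai & bi) | (carry & (ai ^ bi));  /* full-adder carry out */
        sum |= s << i;
    }
    *carry_out = carry;
    return sum;                                         /* 4-bit result, mod 16 */
}

int main(void) {
    unsigned c;
    unsigned s = ripple_add4(0xB, 0x6, &c);   /* 11 + 6 = 17 = carry out + 0001 */
    printf("sum=%u carry=%u\n", s, c);        /* prints sum=1 carry=1           */
    return 0;
}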

14 Making it Faster Carry Look Ahead Adder

15 Making it Even Faster Carry-Select Adder Kogge-Stone Adder

16 How do we get subtraction?
   X      B2T(X)  B2U(X)
   0000      0       0
   0001      1       1
   0010      2       2
   0011      3       3
   0100      4       4
   0101      5       5
   0110      6       6
   0111      7       7
   1000     –8       8
   1001     –7       9
   1010     –6      10
   1011     –5      11
   1100     –4      12
   1101     –3      13
   1110     –2      14
   1111     –1      15

17 How do we get subtraction? (same 4-bit B2T/B2U table as slide 16) Since x + ~x = 1111…1 = –1, negation is –x = ~x + 1. Example: x = 10010111, ~x = 01101000, x + ~x = 11111111.
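A quick C check of the identity above, using the slide's bit pattern x = 10010111; the uint8_t type is used only to keep the example at 8 bits, and the second test shows how subtraction reuses the adder as a + ~b + 1.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t x  = 0x97;                     /* 10010111 from the slide              */
    uint8_t nx = (uint8_t)~x;              /* 01101000                             */
    printf("%02X\n", (uint8_t)(x + nx));   /* FF = 11111111 = -1 (mod 2^8)         */
    printf("%d\n", (uint8_t)(nx + 1) == (uint8_t)-x);   /* 1: -x equals ~x + 1     */

    uint8_t a = 0x2C, b = 0x15;            /* arbitrary values                     */
    printf("%d\n", (uint8_t)(a - b) == (uint8_t)(a + (uint8_t)~b + 1));  /* 1      */
    return 0;
}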

18 Multiplication
Start with long-multiplication approach:
   multiplicand       1000
   multiplier       × 1001
                      1000
                     0000
                    0000
                   1000
   product          1001000
Length of product is the sum of operand lengths

19 Multiplication Hardware [Figure: multiplier datapath; product register initially 0]

20 Optimized Multiplier Perform steps in parallel: add/shift One cycle per partial-product addition That’s ok, if frequency of multiplications is low
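The optimized multiplier's add/shift behaviour can be mimicked in C; this is only a software sketch (the function name is made up), not the hardware datapath, but it performs one addition per set multiplier bit just like the slide describes.

#include <stdio.h>
#include <stdint.h>

static uint64_t shift_add_mul(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = 0;                  /* product register, initially 0      */
    uint64_t mcand   = multiplicand;       /* widened so it can shift left       */
    while (multiplier != 0) {
        if (multiplier & 1)                /* low multiplier bit selects an add  */
            product += mcand;
        mcand <<= 1;                       /* shift multiplicand left            */
        multiplier >>= 1;                  /* shift multiplier right             */
    }
    return product;                        /* up to 2w = 64 bits wide            */
}

int main(void) {
    printf("%llu\n", (unsigned long long)shift_add_mul(8, 9));  /* 1000 x 1001 = 72 */
    return 0;
}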

21 Faster Multiplier Uses multiple adders – Cost/performance tradeoff Can be pipelined – Several multiplications performed in parallel

22 Multiplication Computing Exact Product of w-bit numbers x, y – Either signed or unsigned Ranges – Unsigned: 0 ≤ x·y ≤ (2^w – 1)^2 = 2^(2w) – 2^(w+1) + 1 (up to 2w bits) – Two's complement min: x·y ≥ (–2^(w–1))·(2^(w–1) – 1) = –2^(2w–2) + 2^(w–1) (up to 2w–1 bits) – Two's complement max: x·y ≤ (–2^(w–1))^2 = 2^(2w–2) (up to 2w bits, but only for (TMin_w)^2) Maintaining Exact Results – Would need to keep expanding word size with each product computed – Done in software by "arbitrary precision" arithmetic packages

23 Unsigned Multiplication in C Standard Multiplication Function – Ignores high order w bits Implements Modular Arithmetic: UMult_w(u, v) = u · v mod 2^w [Figure: w-bit operands u, v; true product u · v is 2w bits; discarding the high w bits leaves UMult_w(u, v)]
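A short C demonstration of the modular behaviour: the 64-bit cast shows the true 2w-bit product, while plain uint32_t multiplication keeps only the low w = 32 bits. The operand values are arbitrary and chosen only to make the wraparound visible.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t u = 0x10000001u, v = 0x1000u;     /* (2^28 + 1) * 2^12               */
    uint64_t true_product = (uint64_t)u * v;   /* 2w-bit product: 2^40 + 2^12     */
    uint32_t umult        = u * v;             /* low w bits only: 2^12 = 4096    */
    printf("true=%llu wrapped=%u\n",
           (unsigned long long)true_product, umult);
    return 0;
}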

24 Code Security Example #2 SUN XDR library – Widely used library for transferring data between machines void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size); [Figure: ele_src array of ele_cnt element pointers; destination buffer allocated with malloc(ele_cnt * ele_size)]

25 XDR Code
void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) {
    /*
     * Allocate buffer for ele_cnt objects, each of ele_size bytes
     * and copy from locations designated by ele_src
     */
    void *result = malloc(ele_cnt * ele_size);
    if (result == NULL)
        /* malloc failed */
        return NULL;
    void *next = result;
    int i;
    for (i = 0; i < ele_cnt; i++) {
        /* Copy object i to destination */
        memcpy(next, ele_src[i], ele_size);
        /* Move pointer to next memory region */
        next += ele_size;
    }
    return result;
}

26 XDR Vulnerability What if: – ele_cnt = 2^20 + 1 – ele_size = 4096 = 2^12 – Allocation = ?? How can I make this function secure? malloc(ele_cnt * ele_size)
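On a machine with a 32-bit size_t, the requested allocation is (2^20 + 1) · 2^12 = 2^32 + 2^12, which wraps around to 4096 bytes, so the copy loop writes far past the buffer. One possible hardening, sketched below (this is not the actual XDR patch), validates the count and checks for multiplication overflow before calling malloc.

#include <stdlib.h>
#include <string.h>
#include <stdint.h>

void *copy_elements_safe(void *ele_src[], int ele_cnt, size_t ele_size) {
    if (ele_cnt <= 0 || ele_size == 0)
        return NULL;
    /* Reject counts that would make ele_cnt * ele_size overflow size_t. */
    if ((size_t)ele_cnt > SIZE_MAX / ele_size)
        return NULL;
    void *result = malloc((size_t)ele_cnt * ele_size);
    if (result == NULL)
        return NULL;
    char *next = result;                    /* char* so pointer arithmetic is well defined */
    for (int i = 0; i < ele_cnt; i++) {
        memcpy(next, ele_src[i], ele_size);
        next += ele_size;
    }
    return result;
}

With the attack values above, (size_t)ele_cnt exceeds SIZE_MAX / 4096 on a 32-bit size_t, so the request is rejected instead of silently shrinking to 4 KB.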

27 Signed Multiplication in C Standard Multiplication Function – Ignores high order w bits – Some of which are different for signed vs. unsigned multiplication – Lower bits are the same [Figure: w-bit operands u, v; true product u · v is 2w bits; discarding the high w bits leaves TMult_w(u, v)]

28 Power-of-2 Multiply with Shift Operation – u << k gives u * 2^k – Both signed and unsigned Examples – u << 3 == u * 8 – (u << 5) - (u << 3) == u * 24 – Most machines shift and add faster than multiply Compiler generates this code automatically [Figure: w-bit operand u times 2^k (binary 0…010…0 with k trailing zeros); true product u · 2^k is w+k bits; discarding the high k bits leaves UMult_w(u, 2^k) / TMult_w(u, 2^k)]
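A one-line check of the shift identity from the example, with the parentheses that C operator precedence requires; the value of u is arbitrary.

#include <stdio.h>

int main(void) {
    unsigned u = 12345u;
    printf("%d\n", (u << 5) - (u << 3) == u * 24);   /* prints 1 */
    return 0;
}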

29 Multiply on ARM
   MUL{cond}{S} Rd, Rm, Rs      ; Rd = Rm * Rs
   MLA{cond}{S} Rd, Rm, Rs, Rn  ; Rd = Rm * Rs + Rn

30 Division Check for 0 divisor Long division approach – If divisor ≤ dividend bits: 1 bit in quotient, subtract – Otherwise: 0 bit in quotient, bring down next dividend bit Restoring division – Do the subtract, and if remainder goes < 0, add divisor back Signed division – Divide using absolute values – Adjust sign of quotient and remainder as required
                   1001      quotient
   divisor 1000 )  1001010   dividend
                  -1000
                     10
                     101
                     1010
                    -1000
                       10    remainder
n-bit operands yield n-bit quotient and remainder

31 Division Hardware [Figure: division datapath; remainder register initially holds the dividend, divisor register initially holds the divisor in its left half]

32 Optimized Divider One cycle per partial-remainder subtraction Looks a lot like a multiplier! – Same hardware can be used for both

33 Faster Division Can’t use parallel hardware as in multiplier – Subtraction is conditional on sign of remainder Faster dividers (e.g. SRT division) generate multiple quotient bits per step – Still require multiple steps

34 Division in ARM ARMv6 has no DIV instruction.

35 Division in ARM ARMv6 has no DIV instruction. Find Q and R such that N = D × Q + R with 0 <= |R| < |D|, i.e. N/D = Q with remainder R.

36 An Algorithm for Division
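As a software sketch of restoring division for unsigned 32-bit operands (the function name is invented for this example, and a non-zero divisor is assumed): subtract the divisor from the partial remainder when it fits and record a quotient bit of 1, otherwise leave the remainder alone and record 0.

#include <stdio.h>
#include <stdint.h>

static void divide_restoring(uint32_t dividend, uint32_t divisor,
                             uint32_t *quotient, uint32_t *remainder) {
    uint64_t rem = 0;                              /* partial remainder            */
    uint32_t quo = 0;
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1);  /* bring down next dividend bit */
        if (rem >= divisor) {                      /* does the divisor fit?        */
            rem -= divisor;                        /* subtract, quotient bit = 1   */
            quo |= 1u << i;
        }                                          /* else quotient bit stays 0
                                                      (hardware would subtract,
                                                       then restore)               */
    }
    *quotient  = quo;
    *remainder = (uint32_t)rem;
}

int main(void) {
    uint32_t q, r;
    divide_restoring(74, 8, &q, &r);   /* 1001010 / 1000 from the earlier slide */
    printf("q=%u r=%u\n", q, r);       /* q=9 r=2, i.e. 1001 remainder 10       */
    return 0;
}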

37

38

39 FLOATING POINT

40 Fractional binary numbers What is 1011.101₂?

41 Fractional Binary Numbers Representation – Bits b_i b_{i-1} … b_2 b_1 b_0 . b_{-1} b_{-2} b_{-3} … b_{-j} with place values 2^i, 2^(i-1), …, 4, 2, 1, 1/2, 1/4, 1/8, …, 2^(-j) – Bits to right of “binary point” represent fractional powers of 2 – Represents the rational number Σ (k = −j … i) b_k · 2^k

42 Fractional Binary Numbers: Examples
   Value     Representation
   5 3/4     101.11₂
   2 7/8     010.111₂
   1 7/16    001.0111₂
Observations – Divide by 2 by shifting right – Multiply by 2 by shifting left – Numbers of form 0.111111…₂ are just below 1.0 – 1/2 + 1/4 + 1/8 + … + 1/2^i + … → 1.0 – Use notation 1.0 – ε
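To double-check the table entries, here is a small C helper (the name is made up for this example) that evaluates a fractional binary string by summing b_k · 2^k; it assumes the string contains a binary point.

#include <stdio.h>
#include <string.h>
#include <math.h>

static double frac_bin(const char *s) {
    double value = 0.0;
    int exp = (int)(strchr(s, '.') - s) - 1;      /* weight of the leftmost bit   */
    for (; *s; s++) {
        if (*s == '.') continue;                  /* skip the binary point        */
        if (*s == '1') value += ldexp(1.0, exp);  /* add 2^exp for each 1 bit     */
        exp--;
    }
    return value;
}

int main(void) {
    printf("%g %g %g\n", frac_bin("101.11"), frac_bin("10.111"), frac_bin("1.0111"));
    /* prints 5.75 2.875 1.4375 */
    return 0;
}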

43 Representable Numbers Limitation – Can only exactly represent numbers of the form x/2^k – Other rational numbers have repeating bit representations Value / Representation – 1/3: 0.0101010101[01]…₂ – 1/5: 0.001100110011[0011]…₂ – 1/10: 0.0001100110011[0011]…₂
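Because 1/10 has a repeating binary expansion, it is rounded when stored in a double, and the error becomes visible after only a few additions; a quick check one can run:

#include <stdio.h>

int main(void) {
    double d = 0.0;
    for (int i = 0; i < 10; i++)
        d += 0.1;                          /* ten copies of the rounded 1/10   */
    printf("%.17f\n", d);                  /* 0.99999999999999989, not 1       */
    printf("%s\n", d == 1.0 ? "equal" : "not equal");   /* prints "not equal"  */
    return 0;
}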

44 Floating Point Standard Defined by IEEE Std 754-1985 Developed in response to divergence of representations – Portability issues for scientific code Now almost universally adopted Two representations – Single precision (32-bit) – Double precision (64-bit)

45 IEEE Floating-Point Format [Fields: S | Exponent | Fraction — single: 1 + 8 + 23 bits; double: 1 + 11 + 52 bits] S: sign bit (0 ⇒ non-negative, 1 ⇒ negative) Normalize significand: 1.0 ≤ |significand| < 2.0 – Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit) – Significand is Fraction with the “1.” restored Exponent: excess representation: actual exponent + Bias – Ensures exponent is unsigned – Single: Bias = 127; Double: Bias = 1023
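A sketch that unpacks the three fields of a 32-bit float and rebuilds the value as (−1)^S · (1 + Fraction/2^23) · 2^(Exponent − 127); it assumes a normalized input and ignores denormals, infinities, and NaN.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(void) {
    float f = -6.25f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* reinterpret the 32 bits       */

    uint32_t sign = bits >> 31;
    uint32_t exp  = (bits >> 23) & 0xFF;       /* 8-bit biased exponent         */
    uint32_t frac = bits & 0x7FFFFF;           /* 23-bit fraction               */

    double value = (sign ? -1.0 : 1.0)
                 * (1.0 + frac / 8388608.0)    /* restore the hidden 1. (2^23)  */
                 * ldexp(1.0, (int)exp - 127); /* remove the bias of 127        */
    printf("S=%u exp=%u frac=0x%06X value=%g\n",
           (unsigned)sign, (unsigned)exp, (unsigned)frac, value);
    return 0;
}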

46 Floating-Point Addition Consider a 4-digit decimal example – 9.999 × 10^1 + 1.610 × 10^–1 1. Align decimal points – Shift number with smaller exponent – 9.999 × 10^1 + 0.016 × 10^1 2. Add significands – 9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1 3. Normalize result & check for over/underflow – 1.0015 × 10^2 4. Round and renormalize if necessary – 1.002 × 10^2

47 Floating-Point Addition Now consider a 4-digit binary example – 1.000₂ × 2^–1 + –1.110₂ × 2^–2 (0.5 + –0.4375) 1. Align binary points – Shift number with smaller exponent – 1.000₂ × 2^–1 + –0.111₂ × 2^–1 2. Add significands – 1.000₂ × 2^–1 + –0.111₂ × 2^–1 = 0.001₂ × 2^–1 3. Normalize result & check for over/underflow – 1.000₂ × 2^–4, with no over/underflow 4. Round and renormalize if necessary – 1.000₂ × 2^–4 (no change) = 0.0625
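The same four steps on the slide's numbers, written as a toy C program; the 4-bit significand encoding value = sig · 2^(exp−3), with |sig| between 8 and 15 standing for 1.000₂ to 1.111₂, is invented just for this sketch and handles only this example.

#include <stdio.h>
#include <math.h>

int main(void) {
    /*  1.000_2 x 2^-1  ->  sig = 8,   exp = -1   (0.5)      */
    /* -1.110_2 x 2^-2  ->  sig = -14, exp = -2   (-0.4375)  */
    int sig_a = 8,   exp_a = -1;
    int sig_b = -14, exp_b = -2;

    /* Step 1: align binary points – halve the smaller-exponent operand. */
    while (exp_b < exp_a) { sig_b /= 2; exp_b++; }   /* -14 -> -7: -0.111_2 x 2^-1  */

    /* Step 2: add significands. */
    int sig = sig_a + sig_b;                         /* 8 + (-7) = 1: 0.001_2 x 2^-1 */
    int exp = exp_a;

    /* Step 3: normalize – shift left until the leading 1 is back in place. */
    while (sig != 0 && sig > -8 && sig < 8) { sig *= 2; exp--; }  /* 1.000_2 x 2^-4 */

    /* Step 4: rounding – nothing to do here, no bits were lost. */
    printf("%g\n", sig / 8.0 * ldexp(1.0, exp));     /* prints 0.0625 */
    return 0;
}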

48 FP Adder Hardware Much more complex than integer adder Doing it in one clock cycle would take too long – Much longer than integer operations – Slower clock would penalize all instructions FP adder usually takes several cycles – Can be pipelined

49 FP Adder Hardware [Figure: FP adder datapath, Steps 1–4]

50 FP Arithmetic Hardware FP multiplier is of similar complexity to FP adder – But uses a multiplier for significands instead of an adder FP arithmetic hardware usually does – Addition, subtraction, multiplication, division, reciprocal, square-root – FP ↔ integer conversion Operations usually take several cycles – Can be pipelined

51 Floating Point Floating point is handled by an FPU (floating-point unit).

52 Pentium FDIV Bug Intel’s Pentium (P5) – Professor Thomas Nicely noticed inconsistencies in calculations when adding Pentiums to his cluster – Floating-point division operations didn’t quite come out right: off by 61 parts per million

53 Pentium FDIV Bug Intel acknowledged the flaw, but claimed it wasn’t serious. Wouldn’t affect most users. Byte magazine estimated only 1 in 9 billion floating point operations would suffer the error.

54 Pentium FDIV Bug Total cost to Intel? $450 million

55 WRAP UP

56 For next time Read the rest of Sections 4.1–4.4 Midterm 1 Approaching! – February 13th

