Published by Reynold McDaniel. Modified over 9 years ago.
1
Lecture 7: Multiplication and Floating Point EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr. Rozier (UM)
2
LAB 2
3
Lab Phases: Recursive
– Phase 1: Factorial
– Phase 2: Fibonacci
4
Lab Phases: Arrays
– Phase 4: Sum Array
– Phase 5: Find Item
– Phase 6: Bubble Sort
5
Lab Phases: Trees
Array representation: [1,2,3,4,5,6,7,0,0,0,0,0,0,0,0]
– Phase 7: Tree Height
– Phase 8: Tree Traversal
[1,2,5,0,0,4,0,0,3,6,0,0,7,0,0]
[Figure: binary tree with nodes 1 through 7]
6
INTEGER ARITHMETIC
7
Addition and Subtraction
8
Half adder
Inputs | Outputs
A  B   | C  S
0  0   | 0  0
1  0   | 0  1
0  1   | 0  1
1  1   | 1  0
9
Full Adder
10
Inputs     | Outputs
A  B  Cin  | Cout  S
0  0  0    | 0     0
1  0  0    | 0     1
0  1  0    | 0     1
1  1  0    | 1     0
0  0  1    | 0     1
1  0  1    | 1     0
0  1  1    | 1     0
1  1  1    | 1     1
11
In Groups, Implement a 1-bit Full Adder
Inputs     | Outputs
A  B  Cin  | Cout  S
0  0  0    | 0     0
1  0  0    | 0     1
0  1  0    | 0     1
1  1  0    | 1     0
0  0  1    | 0     1
1  0  1    | 1     0
0  1  1    | 1     0
1  1  1    | 1     1
12
Getting a Full Adder
[Figure: half adder and full adder]
13
Putting Together Multiple Bits
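Chaining one-bit full adders, with each carry-out feeding the next carry-in, gives a ripple-carry adder. The following is a minimal sketch in C that mirrors the full-adder truth table bit by bit; the function names `full_adder` and `ripple_add8` and the 8-bit width are assumptions made for illustration, not part of the lecture.

```c
#include <stdint.h>

/* One-bit full adder: sum = a XOR b XOR cin, cout = 1 when at least
   two of the three inputs are 1. */
void full_adder(unsigned a, unsigned b, unsigned cin,
                unsigned *sum, unsigned *cout)
{
    *sum  = a ^ b ^ cin;
    *cout = (a & b) | (cin & (a ^ b));
}

/* Ripple-carry adder (hypothetical 8-bit example): the carry-out of
   bit i becomes the carry-in of bit i + 1. */
uint8_t ripple_add8(uint8_t x, uint8_t y, unsigned *carry_out)
{
    unsigned carry = 0;
    uint8_t result = 0;
    for (int i = 0; i < 8; i++) {
        unsigned s;
        full_adder((x >> i) & 1u, (y >> i) & 1u, carry, &s, &carry);
        result |= (uint8_t)(s << i);
    }
    *carry_out = carry;
    return result;
}
```

Because each bit waits for the carry from the bit below it, the delay grows with the word size, which is the motivation for faster adder designs.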
14
Making it Faster
Carry Look Ahead Adder
15
Making it Even Faster
– Carry-Select Adder
– Kogge-Stone Adder
16
How do we get subtraction?
X     B2U(X)  B2T(X)
0000   0       0
0001   1       1
0010   2       2
0011   3       3
0100   4       4
0101   5       5
0110   6       6
0111   7       7
1000   8      –8
1001   9      –7
1010  10      –6
1011  11      –5
1100  12      –4
1101  13      –3
1110  14      –2
1111  15      –1
17
How do we get subtraction?
      x = 10010111
     ~x = 01101000
 x + ~x = 11111111 (all ones, which is –1 in two's complement)
So –x = ~x + 1, and a – b can be computed as a + ~b + 1.
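The slide's example can be checked in C. This is a small sketch, assuming 8-bit operands; `negate8` and `sub8` are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Negate by inverting every bit and adding 1: -x == ~x + 1 (mod 2^8). */
uint8_t negate8(uint8_t x)
{
    return (uint8_t)(~x + 1u);
}

/* Subtract using only an adder: a - b == a + (~b + 1) (mod 2^8). */
uint8_t sub8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + negate8(b));
}
```

In hardware the same idea needs no separate subtractor: the adder is fed the inverted second operand and a carry-in of 1.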
18
Multiplication
Start with long-multiplication approach
      1000    multiplicand
   ×  1001    multiplier
      ----
      1000
     0000
    0000
   1000
   -------
   1001000    product
Length of product is the sum of operand lengths
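The long-multiplication approach translates directly into a shift-and-add loop: for each multiplier bit, add the (shifted) multiplicand if the bit is 1. The sketch below assumes 32-bit operands and a 64-bit product; `shift_add_multiply` is a hypothetical name, and the loop is a software illustration rather than a description of any particular hardware.

```c
#include <stdint.h>

/* Shift-and-add multiplication of two 32-bit operands.
   The product can need up to 64 bits (sum of operand lengths). */
uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier)
{
    uint64_t product = 0;              /* product starts at 0 */
    uint64_t shifted = multiplicand;   /* multiplicand, shifted left each step */
    for (int step = 0; step < 32; step++) {
        if (multiplier & 1u)           /* low multiplier bit is 1: add */
            product += shifted;
        shifted <<= 1;                 /* next partial product is one place left */
        multiplier >>= 1;              /* move to the next multiplier bit */
    }
    return product;
}
```

For the slide's operands, `shift_add_multiply(8, 9)` adds the multiplicand at bit positions 0 and 3 and returns 72, which is 1001000 in binary.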
19
Multiplication Hardware
[Figure: multiplier datapath; the product register is initially 0]
20
Optimized Multiplier
– Perform steps in parallel: add/shift
– One cycle per partial-product addition
  – That's OK if the frequency of multiplications is low
21
Faster Multiplier
– Uses multiple adders
  – Cost/performance tradeoff
– Can be pipelined
  – Several multiplications performed in parallel
22
Multiplication
Computing the exact product of w-bit numbers x, y
– Either signed or unsigned
Ranges
– Unsigned: $0 \le x \cdot y \le (2^w - 1)^2 = 2^{2w} - 2^{w+1} + 1$
  – Up to 2w bits
– Two's complement min: $x \cdot y \ge (-2^{w-1})(2^{w-1} - 1) = -2^{2w-2} + 2^{w-1}$
  – Up to 2w-1 bits
– Two's complement max: $x \cdot y \le (-2^{w-1})^2 = 2^{2w-2}$
  – Up to 2w bits, but only for $(TMin_w)^2$
Maintaining exact results
– Would need to keep expanding word size with each product computed
– Done in software by "arbitrary precision" arithmetic packages
23
Unsigned Multiplication in C
– Operands u, v: w bits each
– True product u · v: 2w bits
– Discard the high-order w bits: the result $UMult_w(u, v)$ has w bits
Standard multiplication function
– Ignores high-order w bits
Implements modular arithmetic
– $UMult_w(u, v) = u \cdot v \bmod 2^w$
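To see the truncation concretely, the sketch below compares the full 64-bit product of two 32-bit operands with what a 32-bit unsigned multiply returns. It assumes w = 32; `true_product` and `umult32` are hypothetical helper names.

```c
#include <stdint.h>

/* True product of two 32-bit operands: needs up to 64 bits. */
uint64_t true_product(uint32_t u, uint32_t v)
{
    return (uint64_t)u * v;
}

/* What C's 32-bit unsigned multiply yields: u * v mod 2^32,
   i.e. the low-order 32 bits of the true product. */
uint32_t umult32(uint32_t u, uint32_t v)
{
    return u * v;
}
```

For example, with u = v = 2^16 the true product is 2^32, but `umult32` returns 0 because the only set bit lies in the discarded high half.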
24
Code Security Example #2
SUN XDR library
– Widely used library for transferring data between machines
void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size);
[Figure: elements pointed to by ele_src are copied into one buffer obtained from malloc(ele_cnt * ele_size)]
25
XDR Code
void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size)
{
    /*
     * Allocate buffer for ele_cnt objects, each of ele_size bytes
     * and copy from locations designated by ele_src
     */
    void *result = malloc(ele_cnt * ele_size);
    if (result == NULL)
        /* malloc failed */
        return NULL;
    void *next = result;
    int i;
    for (i = 0; i < ele_cnt; i++) {
        /* Copy object i to destination */
        memcpy(next, ele_src[i], ele_size);
        /* Move pointer to next memory region */
        next += ele_size;
    }
    return result;
}
26
XDR Vulnerability
malloc(ele_cnt * ele_size)
What if:
– ele_cnt = $2^{20} + 1$
– ele_size = 4096 = $2^{12}$
– Allocation = ??
How can I make this function secure?
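With a 32-bit `size_t`, the product $(2^{20} + 1) \cdot 2^{12} = 2^{32} + 2^{12}$ wraps to 4096, so the buffer is far smaller than the data the loop then copies into it. One possible fix, sketched below, is to reject any count and size whose product would not fit in `size_t`. The name `copy_elements_checked` is hypothetical, and this is one way to harden the function, not necessarily how the XDR library was patched.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Overflow-checked variant of copy_elements (hypothetical name). */
void *copy_elements_checked(void *ele_src[], int ele_cnt, size_t ele_size)
{
    if (ele_cnt < 0)
        return NULL;
    size_t count = (size_t)ele_cnt;
    /* Reject sizes whose product would wrap around in size_t. */
    if (ele_size != 0 && count > SIZE_MAX / ele_size)
        return NULL;
    size_t total = count * ele_size;
    /* Request at least one byte so a NULL return always means failure. */
    unsigned char *result = malloc(total != 0 ? total : 1);
    if (result == NULL)
        return NULL;
    unsigned char *next = result;
    for (size_t i = 0; i < count; i++) {
        memcpy(next, ele_src[i], ele_size);  /* copy object i */
        next += ele_size;                    /* advance to the next slot */
    }
    return result;
}
```

The check divides instead of multiplying, so the test itself cannot overflow. Using `unsigned char *` for `next` also avoids arithmetic on `void *`, which standard C does not define.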
27
Signed Multiplication in C
– Operands u, v: w bits each
– True product u · v: 2w bits
– Discard the high-order w bits: the result $TMult_w(u, v)$ has w bits
Standard multiplication function
– Ignores high-order w bits
– Some of which are different for signed vs. unsigned multiplication
– Lower bits are the same
28
Power-of-2 Multiply with Shift
– Operands: w bits
– True product $u \cdot 2^k$: w+k bits, with the low k bits equal to 0
– Discard the high k bits: the results $UMult_w(u, 2^k)$ and $TMult_w(u, 2^k)$ have w bits
Operation
– u << k gives $u \cdot 2^k$
– Both signed and unsigned
Examples
– u << 3 == u * 8
– (u << 5) - (u << 3) == u * 24
– Most machines shift and add faster than multiply
  – Compiler generates this code automatically
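The two examples can be written as C functions. Note the parentheses in the second one: in C, `-` binds tighter than `<<`, so `u << 5 - u << 3` would not compute u * 24. The names `times8` and `times24` are hypothetical, and 32-bit unsigned operands are assumed.

```c
#include <stdint.h>

/* u * 8 as a single left shift by 3. */
uint32_t times8(uint32_t u)
{
    return u << 3;
}

/* u * 24 = u * 32 - u * 8, using two shifts and one subtraction. */
uint32_t times24(uint32_t u)
{
    return (u << 5) - (u << 3);
}
```

Both functions agree with ordinary unsigned multiplication modulo 2^32, including when the true product overflows.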
29
Multiply on ARM
MUL{<cond>}{S} Rd, Rm, Rs        Rd = Rm * Rs
MLA{<cond>}{S} Rd, Rm, Rs, Rn    Rd = Rm * Rs + Rn
30
Division
– Check for 0 divisor
– Long division approach
  – If divisor ≤ dividend bits: 1 bit in quotient, subtract
  – Otherwise: 0 bit in quotient, bring down next dividend bit
– Restoring division
  – Do the subtract, and if remainder goes < 0, add divisor back
– Signed division
  – Divide using absolute values
  – Adjust sign of quotient and remainder as required
Example (divisor 1000, dividend 1001010):
            1001       quotient
  1000 ) 1001010       dividend
        -1000
            10
            101
            1010
           -1000
              10       remainder
n-bit operands yield n-bit quotient and remainder
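The long-division steps above can be sketched in C for unsigned operands. This version compares before subtracting, which has the same effect as subtracting and then restoring when the remainder would go negative. The function name `restoring_divide`, its error convention, and the 32-bit width are assumptions for illustration.

```c
#include <stdint.h>

/* Bit-at-a-time unsigned division.
   Returns 0 on success, -1 if the divisor is 0. */
int restoring_divide(uint32_t dividend, uint32_t divisor,
                     uint32_t *quotient, uint32_t *remainder)
{
    if (divisor == 0)
        return -1;                       /* check for 0 divisor */
    uint64_t rem = 0;                    /* partial remainder */
    uint32_t quo = 0;
    for (int bit = 31; bit >= 0; bit--) {
        /* Bring down the next dividend bit. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        quo <<= 1;
        if (rem >= divisor) {            /* subtraction would not go negative */
            rem -= divisor;
            quo |= 1u;                   /* 1 bit in quotient */
        }
        /* Otherwise: 0 bit in quotient, remainder left unchanged. */
    }
    *quotient = quo;
    *remainder = (uint32_t)rem;
    return 0;
}
```

Called with the slide's operands, 1001010 (74) divided by 1000 (8), it produces quotient 1001 (9) and remainder 10 (2).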
31
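Division Hardware
[Figure: divider datapath; the remainder register initially holds the dividend, and the divisor register initially holds the divisor in its left half]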
32
Optimized Divider
– One cycle per partial-remainder subtraction
– Looks a lot like a multiplier!
  – Same hardware can be used for both
33
Faster Division
– Can't use parallel hardware as in multiplier
  – Subtraction is conditional on sign of remainder
– Faster dividers (e.g. SRT division) generate multiple quotient bits per step
  – Still require multiple steps
34
Division in ARM ARMv6 has no DIV instruction.
35
Division in ARM
ARMv6 has no DIV instruction.
$N = D \times Q + R$ with $0 \le |R| < |D|$
$N / D = Q + R / D$
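Without a hardware divide, signed division is typically built on an unsigned routine: divide the absolute values, then fix up the signs so that $N = D \times Q + R$ still holds. The sketch below assumes D is nonzero and that neither operand is `INT_MIN`; `signed_divide` and `struct div_result` are hypothetical names, and C's unsigned `/` and `%` stand in for a software divide routine.

```c
#include <stdlib.h>

struct div_result {
    int q;   /* quotient */
    int r;   /* remainder, same sign as the dividend */
};

/* Signed division via absolute values.
   Assumes d != 0 and that neither n nor d is INT_MIN. */
struct div_result signed_divide(int n, int d)
{
    unsigned un = (unsigned)abs(n);
    unsigned ud = (unsigned)abs(d);
    unsigned uq = un / ud;   /* stand-in for an unsigned divide routine */
    unsigned ur = un % ud;
    struct div_result out;
    /* Quotient is negative exactly when the operand signs differ. */
    out.q = ((n < 0) != (d < 0)) ? -(int)uq : (int)uq;
    /* Remainder takes the sign of the dividend. */
    out.r = (n < 0) ? -(int)ur : (int)ur;
    return out;
}
```

For N = -7 and D = 2 this gives Q = -3 and R = -1, and indeed 2 × (-3) + (-1) = -7.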
36
An Algorithm for Division
39
FLOATING POINT
40
Carnegie Mellon
Fractional binary numbers
What is $1011.101_2$?
41
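Fractional Binary Numbers
Bit positions: $b_i\, b_{i-1} \cdots b_2\, b_1\, b_0 \,.\, b_{-1}\, b_{-2}\, b_{-3} \cdots b_{-j}$
Bit weights: $2^i,\ 2^{i-1},\ \ldots,\ 4,\ 2,\ 1,\ 1/2,\ 1/4,\ 1/8,\ \ldots,\ 2^{-j}$
Representation
– Bits to right of "binary point" represent fractional powers of 2
– Represents rational number: $\sum_{k=-j}^{i} b_k \times 2^k$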
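The weighted-sum definition can be evaluated mechanically. The sketch below parses a string of binary digits with an optional binary point; `binary_fraction_value` is a hypothetical helper, and it assumes the input contains only the characters 0, 1, and at most one '.'.

```c
#include <stddef.h>

/* Evaluate a string such as "1011.101" as the sum of b_k * 2^k.
   Assumes only '0', '1', and at most one '.' appear in s. */
double binary_fraction_value(const char *s)
{
    double value = 0.0;
    size_t i = 0;
    /* Integer part: each new bit doubles the running value. */
    for (; s[i] != '\0' && s[i] != '.'; i++)
        value = value * 2.0 + (s[i] - '0');
    if (s[i] == '.') {
        double weight = 0.5;             /* 2^-1 */
        for (i++; s[i] != '\0'; i++) {
            value += (s[i] - '0') * weight;
            weight /= 2.0;               /* next bit is worth half as much */
        }
    }
    return value;
}
```

For "1011.101" the sum is 8 + 2 + 1 + 1/2 + 1/8 = 11.625.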
42
Fractional Binary Numbers: Examples
Value     Representation
5 3/4     $101.11_2$
2 7/8     $010.111_2$
1 7/16    $001.0111_2$
Observations
– Divide by 2 by shifting right
– Multiply by 2 by shifting left
– Numbers of the form $0.111111\ldots_2$ are just below 1.0
  – $1/2 + 1/4 + 1/8 + \cdots + 1/2^i + \cdots \to 1.0$
  – Use notation $1.0 - \varepsilon$
43
Representable Numbers
Limitation
– Can only exactly represent numbers of the form $x / 2^k$
– Other rational numbers have repeating bit representations
Value     Representation
1/3       $0.0101010101[01]\ldots_2$
1/5       $0.001100110011[0011]\ldots_2$
1/10      $0.0001100110011[0011]\ldots_2$
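The repeating patterns can be generated by repeated doubling: multiply the fraction by 2, emit a 1 if the result reaches the denominator, subtract, and repeat. A minimal sketch follows; `fraction_bits` is a hypothetical helper that assumes 0 <= num < den and a small denominator so that doubling cannot overflow.

```c
/* Write the first n fractional bits of num/den into out.
   Assumes 0 <= num < den and that out has room for n + 1 chars. */
void fraction_bits(unsigned num, unsigned den, int n, char *out)
{
    for (int i = 0; i < n; i++) {
        num *= 2;                  /* shift left by one binary place */
        if (num >= den) {          /* the next bit is 1 */
            out[i] = '1';
            num -= den;
        } else {                   /* the next bit is 0 */
            out[i] = '0';
        }
    }
    out[n] = '\0';
}
```

Asking for 13 bits of 1/10 produces 0001100110011; the remainder never reaches zero, so the expansion never terminates and no finite binary fraction equals 1/10 exactly.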
44
Floating Point Standard
– Defined by IEEE Std 754-1985
– Developed in response to divergence of representations
  – Portability issues for scientific code
– Now almost universally adopted
– Two representations
  – Single precision (32-bit)
  – Double precision (64-bit)
45
IEEE Floating-Point Format
Fields: S | Exponent (single: 8 bits, double: 11 bits) | Fraction (single: 23 bits, double: 52 bits)
$x = (-1)^S \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$
– S: sign bit (0 non-negative, 1 negative)
– Normalize significand: 1.0 ≤ |significand| < 2.0
  – Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  – Significand is Fraction with the "1." restored
– Exponent: excess representation: actual exponent + Bias
  – Ensures exponent is unsigned
  – Single: Bias = 127; Double: Bias = 1023
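The three fields of a single-precision value can be pulled apart with shifts and masks. The sketch below assumes that `float` is the IEEE 754 32-bit format on the target machine, which is common but not guaranteed by the C standard; `struct float_fields` and `decode_float` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

struct float_fields {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;   /* 23 bits; the hidden leading 1 is not stored */
};

/* Split a float into its sign, biased exponent, and fraction fields.
   Assumes float is IEEE 754 single precision. */
struct float_fields decode_float(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* copy the raw bit pattern */
    struct float_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For -0.75, which is $-1.1_2 \times 2^{-1}$, the fields are sign 1, exponent 126 (that is, -1 + 127), and a fraction whose top bit is set (0x400000).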
46
Floating-Point Addition
Consider a 4-digit decimal example
– $9.999 \times 10^{1} + 1.610 \times 10^{-1}$
1. Align decimal points
– Shift number with smaller exponent
– $9.999 \times 10^{1} + 0.016 \times 10^{1}$
2. Add significands
– $9.999 \times 10^{1} + 0.016 \times 10^{1} = 10.015 \times 10^{1}$
3. Normalize result & check for over/underflow
– $1.0015 \times 10^{2}$
4. Round and renormalize if necessary
– $1.002 \times 10^{2}$
47
Floating-Point Addition
Now consider a 4-digit binary example
– $1.000_2 \times 2^{-1} + (-1.110_2 \times 2^{-2})$ (0.5 + –0.4375)
1. Align binary points
– Shift number with smaller exponent
– $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1})$
2. Add significands
– $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1}) = 0.001_2 \times 2^{-1}$
3. Normalize result & check for over/underflow
– $1.000_2 \times 2^{-4}$, with no over/underflow
4. Round and renormalize if necessary
– $1.000_2 \times 2^{-4}$ (no change) = 0.0625
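The align, add, and normalize steps can be traced with a toy format that keeps a 4-bit significand as a small signed integer. This is only an illustrative sketch: `struct toy_fp` and `toy_fp_add` are hypothetical, there are no guard bits, and step 4 (rounding) is replaced by simple truncation, so it is not how IEEE 754 hardware behaves in general.

```c
#include <stdlib.h>

struct toy_fp {
    int sig;   /* signed significand scaled by 8: 1.000 is 8, 1.111 is 15 */
    int exp;   /* unbiased exponent; value = sig * 2^(exp - 3) */
};

/* Add two toy values using the align / add / normalize steps. */
struct toy_fp toy_fp_add(struct toy_fp a, struct toy_fp b)
{
    /* Step 1: align binary points by shifting the operand with the
       smaller exponent to the right (bits shifted out are dropped). */
    while (a.exp < b.exp) { a.sig /= 2; a.exp++; }
    while (b.exp < a.exp) { b.sig /= 2; b.exp++; }

    /* Step 2: add significands. */
    struct toy_fp r = { a.sig + b.sig, a.exp };

    /* Step 3: normalize so that 8 <= |sig| <= 15, i.e. 1.000 to 1.111. */
    while (abs(r.sig) >= 16) { r.sig /= 2; r.exp++; }
    while (r.sig != 0 && abs(r.sig) < 8) { r.sig *= 2; r.exp--; }

    /* Step 4 (rounding) is omitted: this sketch truncates instead. */
    return r;
}
```

With a = {8, -1} (1.000 × 2^-1) and b = {-14, -2} (-1.110 × 2^-2), alignment turns b into {-7, -1}, the sum is {1, -1}, and normalization yields {8, -4}, that is 1.000 × 2^-4 = 0.0625.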
48
FP Adder Hardware
– Much more complex than integer adder
– Doing it in one clock cycle would take too long
  – Much longer than integer operations
  – Slower clock would penalize all instructions
– FP adder usually takes several cycles
  – Can be pipelined
49
FP Adder Hardware
[Figure: FP adder datapath, annotated with Step 1 through Step 4]
50
FP Arithmetic Hardware
– FP multiplier is of similar complexity to FP adder
  – But uses a multiplier for significands instead of an adder
– FP arithmetic hardware usually does
  – Addition, subtraction, multiplication, division, reciprocal, square-root
  – Conversion between FP and integer
– Operations usually take several cycles
  – Can be pipelined
51
Floating Point
Floating point is handled by an FPU (floating-point unit).
52
Pentium FDIV Bug
Intel's original Pentium (P5)
– Professor Thomas Nicely noticed inconsistencies in calculations when adding Pentiums to his cluster
– Floating-point division operations didn't quite come out right: off by as much as 61 parts per million
53
Pentium FDIV Bug
– Intel acknowledged the flaw, but claimed it wasn't serious and wouldn't affect most users.
– Byte magazine estimated that only 1 in 9 billion floating-point divides with random operands would produce an inaccurate result.
54
Pentium FDIV Bug
Total cost to Intel? A pre-tax charge of about $475 million
55
WRAP UP
56
For next time
– Read Rest of Chapter 4.1-4.4
– Midterm 1 Approaching!
  – February 13th