Lecture 12: Integer Arithmetic and Floating Point CS 2011 Fall 2014, Dr. Rozier
FULL ADDER SOLUTIONS
INTEGER ARITHMETIC
Putting Together Multiple Bits
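Putting multiple bits together means chaining full adders so that each bit's carry-out feeds the next bit's carry-in, giving a ripple-carry adder. The C sketch below simulates that chaining at the bit level purely to illustrate the structure; the names full_adder and ripple_add4 are made up for this example, not from the lecture.

```c
#include <stdio.h>

/* One full adder: sum bit and carry-out from two input bits and a carry-in. */
static void full_adder(unsigned a, unsigned b, unsigned cin,
                       unsigned *sum, unsigned *cout) {
    *sum  = a ^ b ^ cin;                 /* sum bit */
    *cout = (a & b) | (cin & (a ^ b));   /* carry generated or propagated */
}

/* 4-bit ripple-carry adder: four chained full adders, the carry ripples upward. */
static unsigned ripple_add4(unsigned x, unsigned y, unsigned *carry_out) {
    unsigned sum = 0, carry = 0;
    for (int i = 0; i < 4; i++) {
        unsigned s;
        full_adder((x >> i) & 1, (y >> i) & 1, carry, &s, &carry);
        sum |= s << i;
    }
    *carry_out = carry;
    return sum;
}

int main(void) {
    unsigned cout;
    unsigned s = ripple_add4(0xB, 0x6, &cout);   /* 11 + 6 = 17: 4-bit sum 0x1, carry 1 */
    printf("sum=0x%X carry=%u\n", s, cout);
    return 0;
}
```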
Making it Faster: the Carry-Lookahead Adder
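A carry-lookahead adder avoids the ripple by computing every carry directly from per-bit generate (gᵢ = aᵢ·bᵢ) and propagate (pᵢ = aᵢ⊕bᵢ) signals. Below is a minimal 4-bit C sketch of those lookahead equations, written as software only to show the logic; the function name cla_add4 is illustrative.

```c
#include <stdio.h>

/* 4-bit carry-lookahead addition: carries computed from generate/propagate,
   not rippled. Each of c1..c4 depends only on g, p, and the initial carry c0. */
static unsigned cla_add4(unsigned a, unsigned b, unsigned c0, unsigned *c4) {
    unsigned g[4], p[4], c[5];
    for (int i = 0; i < 4; i++) {
        g[i] = (a >> i) & (b >> i) & 1;     /* generate: both input bits are 1 */
        p[i] = ((a >> i) ^ (b >> i)) & 1;   /* propagate: exactly one input bit is 1 */
    }
    c[0] = c0 & 1;
    c[1] = g[0] | (p[0] & c[0]);
    c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
    c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c[0]);
    c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c[0]);
    unsigned sum = 0;
    for (int i = 0; i < 4; i++)
        sum |= (p[i] ^ c[i]) << i;          /* sum bit i = p_i XOR c_i */
    *c4 = c[4];
    return sum;
}

int main(void) {
    unsigned cout;
    unsigned s = cla_add4(0x9, 0x7, 0, &cout);   /* 9 + 7 = 16: sum 0x0, carry 1 */
    printf("sum=0x%X carry=%u\n", s, cout);
    return 0;
}
```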
Making it Even Faster: Carry-Select Adder, Kogge-Stone Adder
How do we get subtraction?
The same 4-bit pattern X has two readings, two's complement B2T(X) and unsigned B2U(X):

  X      B2T(X)   B2U(X)
  1000     –8        8
  1001     –7        9
  1010     –6       10
  1011     –5       11
  1100     –4       12
  1101     –3       13
  1110     –2       14
  1111     –1       15

For any x, x + ~x is all ones (–1), so –x = ~x + 1: complement and increment. Subtraction then reuses the adder: x – y = x + (~y + 1).
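This can be checked directly in C: the complement-and-increment rule turns subtraction into addition, and masking shows the two readings of the same bit pattern. A small sketch, assuming the 4-bit width used in the table above:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Subtraction reuses the adder: x - y == x + (~y + 1). */
    int32_t x = 12, y = 7;
    printf("%d - %d = %d\n", x, y, x + (~y + 1));           /* prints 5 */

    /* Same 4-bit pattern, two readings: B2U(1101) = 13, B2T(1101) = 13 - 16 = -3. */
    unsigned bits = 0xD;
    int b2u = (int)bits;
    int b2t = (bits & 0x8) ? (int)bits - 16 : (int)bits;    /* subtract 2^4 if the sign bit is set */
    printf("bits=1101: B2U=%d  B2T=%d\n", b2u, b2t);
    return 0;
}
```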
FLOATING POINT
Fractional Binary Numbers
What does a bit pattern with a "binary point" represent?

Representation
– Bit pattern: b_i b_(i–1) … b_2 b_1 b_0 . b_(–1) b_(–2) b_(–3) … b_(–j)
– Bit weights run …, 2^i, …, 4, 2, 1 to the left of the binary point and 1/2, 1/4, 1/8, …, 2^(–j) to the right
– Bits to the right of the "binary point" represent fractional powers of 2
– Represents the rational number  Σ (k = –j to i) b_k · 2^k
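Since each bit k contributes b_k · 2^k, the value of a pattern with a binary point can be accumulated bit by bit. A minimal C sketch that evaluates such a string this way; frac_bin_value is a made-up helper, not something from the slides.

```c
#include <stdio.h>
#include <string.h>

/* Evaluate a binary string with an optional binary point, e.g. "101.11" -> 5.75. */
static double frac_bin_value(const char *s) {
    const char *dot = strchr(s, '.');
    size_t int_len = dot ? (size_t)(dot - s) : strlen(s);
    double value = 0.0;
    double weight = 1.0;
    /* Bits left of the point, scanned right to left: weights 1, 2, 4, ... */
    for (size_t k = int_len; k > 0; k--, weight *= 2.0)
        if (s[k - 1] == '1') value += weight;
    /* Bits right of the point: weights 1/2, 1/4, 1/8, ... */
    if (dot) {
        weight = 0.5;
        for (const char *p = dot + 1; *p; p++, weight /= 2.0)
            if (*p == '1') value += weight;
    }
    return value;
}

int main(void) {
    printf("%g\n", frac_bin_value("101.11"));   /* 5.75 = 5 3/4 */
    return 0;
}
```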
Fractional Binary Numbers: Examples

  Value     Representation
  5 3/4     101.11₂
  2 7/8     10.111₂
  1 7/16    1.0111₂

Observations
– Divide by 2 by shifting right
– Multiply by 2 by shifting left
– Numbers of the form 0.111111…₂ are just below 1.0
  1/2 + 1/4 + 1/8 + … + 1/2^i + … ➙ 1.0
  Use the notation 1.0 – ε
Representable Numbers
Limitation
– Can only exactly represent numbers of the form x/2^k
– Other rational numbers have repeating bit representations

  Value    Representation
  1/3      0.0101010101[01]…₂
  1/5      0.001100110011[0011]…₂
  1/10     0.0001100110011[0011]…₂
Floating Point Standard
Defined by IEEE Std 754
Developed in response to divergence of representations
– Portability issues for scientific code
Now almost universally adopted
Two representations
– Single precision (32-bit)
– Double precision (64-bit)
IEEE Floating-Point Format

  | S | Exponent | Fraction |
  Exponent: single 8 bits, double 11 bits; Fraction: single 23 bits, double 52 bits

  x = (–1)^S × (1 + Fraction) × 2^(Exponent – Bias)

S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
Normalized significand: 1.0 ≤ |significand| < 2.0
– Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
– Significand is the Fraction with the "1." restored
Exponent: excess (biased) representation: stored exponent = actual exponent + Bias
– Ensures the stored exponent is unsigned
– Single: Bias = 127;  Double: Bias = 1023
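These fields can be checked by pulling apart the bits of a C float: 1 sign bit, 8 exponent bits stored with bias 127, and 23 fraction bits with the hidden leading 1 restored. A minimal sketch, assuming a normalized single-precision value (so the hidden-bit and bias rules apply without the special cases for zero, denormals, infinity, or NaN):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(void) {
    float f = -6.25f;                      /* -1.5625 x 2^2 */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);        /* reinterpret the 32-bit pattern */

    uint32_t sign = bits >> 31;            /* 1 sign bit */
    uint32_t exp  = (bits >> 23) & 0xFF;   /* 8 exponent bits, excess-127 */
    uint32_t frac = bits & 0x7FFFFF;       /* 23 fraction bits */

    double significand = 1.0 + frac / 8388608.0;   /* restore the hidden "1." (2^23 = 8388608) */
    int actual_exp = (int)exp - 127;               /* remove the bias */
    double value = (sign ? -1.0 : 1.0) * ldexp(significand, actual_exp);

    printf("sign=%u  stored exp=%u (actual %d)  frac=0x%06X\n", sign, exp, actual_exp, frac);
    printf("reconstructed value = %g\n", value);   /* prints -6.25 */
    return 0;
}
```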
Floating-Point Addition
Consider a 4-digit decimal example
– 9.999 × 10¹ + 1.610 × 10⁻¹
1. Align decimal points
– Shift the number with the smaller exponent
– 9.999 × 10¹ + 0.016 × 10¹
2. Add significands
– 9.999 × 10¹ + 0.016 × 10¹ = 10.015 × 10¹
3. Normalize result & check for over/underflow
– 1.0015 × 10²
4. Round and renormalize if necessary
– 1.002 × 10²
Floating-Point Addition
Now consider a 4-digit binary example
– 1.000₂ × 2⁻¹ + –1.110₂ × 2⁻² (0.5 + –0.4375)
1. Align binary points
– Shift the number with the smaller exponent
– 1.000₂ × 2⁻¹ + –0.111₂ × 2⁻¹
2. Add significands
– 1.000₂ × 2⁻¹ + –0.111₂ × 2⁻¹ = 0.001₂ × 2⁻¹
3. Normalize result & check for over/underflow
– 1.000₂ × 2⁻⁴, with no over/underflow
4. Round and renormalize if necessary
– 1.000₂ × 2⁻⁴ (no change) = 0.0625
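The four steps above (align, add significands, normalize, round) can be traced in software with small integer significands. The sketch below is a toy model of the slide's 4-bit example, not IEEE arithmetic: a value is sign · (sig/8) · 2^exp, with sig holding the 1.xxx pattern as an integer from 8 to 15, and rounding is omitted because this example loses no bits (the type and function names are made up).

```c
#include <stdio.h>

/* Toy format: value = sign * (sig / 8.0) * 2^exp, where sig is the 4-bit
   significand 1.xxx stored as an integer in [8, 15]. */
typedef struct { int sign; int sig; int exp; } Toy;

static double toy_value(Toy t) {
    double v = t.sign * (t.sig / 8.0);
    return t.exp >= 0 ? v * (1 << t.exp) : v / (1 << -t.exp);
}

static Toy toy_add(Toy a, Toy b) {
    /* 1. Align binary points: shift the significand with the smaller exponent. */
    while (a.exp > b.exp) { b.sig >>= 1; b.exp++; }
    while (b.exp > a.exp) { a.sig >>= 1; a.exp++; }
    /* 2. Add the signed significands. */
    int s = a.sign * a.sig + b.sign * b.sig;
    Toy r = { s < 0 ? -1 : 1, s < 0 ? -s : s, a.exp };
    /* 3. Normalize: bring sig back into [8, 16), adjusting the exponent. */
    while (r.sig != 0 && r.sig < 8) { r.sig <<= 1; r.exp--; }
    while (r.sig >= 16)             { r.sig >>= 1; r.exp++; }
    /* 4. Round: skipped in this sketch. */
    return r;
}

int main(void) {
    Toy a = { +1, 0x8, -1 };   /*  1.000_2 x 2^-1 =  0.5    */
    Toy b = { -1, 0xE, -2 };   /* -1.110_2 x 2^-2 = -0.4375 */
    Toy r = toy_add(a, b);
    printf("sig=0x%X exp=%d -> %g\n", r.sig, r.exp, toy_value(r));  /* 1.000_2 x 2^-4 = 0.0625 */
    return 0;
}
```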
FP Adder Hardware Much more complex than integer adder Doing it in one clock cycle would take too long – Much longer than integer operations – Slower clock would penalize all instructions FP adder usually takes several cycles – Can be pipelined
FP Adder Hardware – block diagram of the datapath, annotated with Steps 1–4 (align, add significands, normalize, round)
FP Arithmetic Hardware FP multiplier is of similar complexity to FP adder – But uses a multiplier for significands instead of an adder FP arithmetic hardware usually does – Addition, subtraction, multiplication, division, reciprocal, square root – FP ↔ integer conversion Operations usually take several cycles – Can be pipelined
Floating Point Floating point is handled by an FPU, the floating-point unit.
Pentium FDIV Bug Intel’s original Pentium (P5) – Professor Thomas Nicely noticed inconsistencies in calculations when adding Pentiums to his cluster – Floating-point division operations didn’t quite come out right: off by 61 parts per million
Pentium FDIV Bug Intel acknowledged the flaw but claimed it wasn’t serious and wouldn’t affect most users. Byte magazine estimated that only 1 in 9 billion floating-point divides would suffer the error.
Pentium FDIV Bug Total cost to Intel? $450 million
WRAP UP
For next time Read Chapter 3 Sections 3.1 – 3.5