CSC 4250 Computer Architectures September 5, 2006 Appendix H. Computer Arithmetic
Integers Using an 8-bit pattern, how many different integers can we represent? 2 8 Correct? How can we represent negative integers? Use a sign bit How many different integers are there?
Small Numbers How can we represent numbers that are smaller than 1?
Fixed Point Data Type Bit pattern: Binary point just to the right of sign bit (in text) Can be anywhere inside bit pattern Binary point to far right of bit pattern What do we get? Integer Value of bit pattern as integer: Value of bit patt. as fixed point:
Blocked Floating Point Fixed point → Low cost floating point Exponent kept in separate variable Exponent shared by a set of fixed point variables Applications? Overflow?
Saturating Arithmetic In DSP, if result is too large to be represented, it is set to the largest representable number with the appropriate sign What happens when we get a floating-point overflow?
IEEE 754 Floating-Point (32 bit) Value=(–1) s (1+f) 2 (e–127) Significand =1+f Compare: scientific notation, say 3.45 × 10 6 Base=2 So, 13 is represented by1.101 × 2 3 withs=0, e= , f=10100…00 Can you give one advantage of the implicit 1? Can you give one disadvantage of the implicit 1? … … se: exponentf: fraction 1 bit 8 bits 23 bits
IEEE 754 Floating-Point (64 bit) 1 bit 11 bits 52 bits Value=(–1) s (1+f) 2 (e–1023) Significand =1+f … … se : exponentf : fraction
Base IEEE:2 Other bases:16IBM 8Burroughs Normalize such that the first digit is nonzero For base 16, the first digit could be 1, 2, …, 14, 15 Need new symbols to represent 10,11, 12, 13, 14, 15 What is base 16 called?
Hexadecimal What is hexa? Greek for six What is decimal? Latin for ten What is Latin prefix for six? Sexa What is old name for base 16? Sexidecimal Which company changed the name?
Hexadecimal System For base 16, we have 1 = 1/16 × 16 1 withf = …00, 2= 2/16 × 16 1 withf = …00, 4 = 4/16 × 16 1 withf = …00, 8 = 8/16 × 16 1 withf = …00, 16= 1/16 × 16 2 withf = …00, Disadvantage:As many as three leading zero bits Advantage:Base larger → Range larger 32 bit FP rep.:1 sign bit, 7 exp. bits, 24 fraction bits (exponent bias = 64).
Floating Point Representation of 1 IEEE: s = 0, e = , f = 00000…00; Value=(1) 2 (127–127) = 1. IBM Hexadecimal: s = 0, e = , f = …00; Value=(1/16) 16 (65–64) = 1.
IEEE FP Representation of One to Sixteen Numbersef
Integer Comparisons of FP Number Ease to use Sorting Sign bit is most significant: Easy to check if number is greater than, less than, or equal to zero Exponent before fraction: Larger exponent → larger number Use bias such that exponent values ≥ 0 1 ≤ e ≤ 254 → –126 ≤ e – bias ≤ 127 e = 0 and e = 255?
Representation of Zero All bits (except sign bit) equal zero e = 0 → zero exponent field Zero fraction field (no implicit 1) Plus and minus zero
Denormal Numbers e = 0 → zero exponent field f ≠ 0 → no implicit 1 Value = (–1) s *f *2 –126 Gradual underflow
IEEE Representation Fill in the blanks Numbersef –1 2 –2 2 –125 2 –126 2 –127 2 –128 2 –129 2 –149
Underflow to Zero In old days, a FP number that underflowed could be set to zero An old “fast” test for equality: If a − b = 0, then a = b Test would fail if a − b underflowed
Infinity e = 11…1 → maximum exponent field Zero fraction field Plus and minus infinity 1/0 = ∞; 1+∞ = ∞ Useful in trigonometry: sin(tan –1 ∞) = sin π/2 = 1
NaN Not a Number e = 11…1 → maximum exponent field Nonzero fraction field Can you give an operation that generates NaN? What is 1+NaN?
Examples Largest positive number: s = 0, e = , f = 11111…11; Value=(1+2 –1 +…+2 –23 ) 2 (254–127) =(2−2 –23 ) ≈ ≈ 2.56×10 38 One: s = 0, e = , f = 00000…00; Value=(1) 2 (127–127) = 1. Smallest positive normal number: s = 0, e = , f = 00000…00; Value=(1) 2 (1–127) =2 –126 ≈ 1.6×10 –38
Examples (2) Largest positive denormal number: s = 0, e = , f = 11111…11; Value=(2 –1 +…+2 –23 ) 2 –126 =(1−2 –23 ) 2 –126 ≈ 2 –126 Differs from smallest normal by 2 –149 Smallest positive denormal number: s = 0, e = , f = 00000…01; Value=(2 –23 ) 2 –126 = 2 –149 ≈ 2×10 –45 Plus zero: s = 0, e = , f = 00000…00.