Floating Points & IEEE 754
Column Pattern
  What goes to the right of the 1's column? Negative powers of two:
  2^-3 = 1/2^3 = 1/8 = 0.125
  Column:  2^3  2^2  2^1  2^0  2^-1  2^-2  2^-3   2^-4
  Value:   8    4    2    1    0.5   0.25  0.125  0.0625
Binary Decimal
  Number: 10.11 (base 2) = 2 + 0.5 + 0.25 = 2.75 (base 10)
  Column:  2^3  2^2  2^1  2^0  2^-1  2^-2  2^-3   2^-4
  Value:   8    4    2    1    0.5   0.25  0.125  0.0625
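The column pattern above can be checked mechanically. The following is a minimal sketch in Python; the function name `binary_to_decimal` is made up for this illustration and is not part of the slides.

```python
def binary_to_decimal(text: str) -> float:
    """Convert a binary string such as '10.11' to its decimal value."""
    whole, _, frac = text.partition(".")
    if not set(whole + frac) <= {"0", "1"}:
        raise ValueError("only the digits 0 and 1 are allowed")
    value = 0.0
    # Digits left of the point use columns 2^0, 2^1, 2^2, ...
    for power, digit in enumerate(reversed(whole)):
        value += int(digit) * 2.0 ** power
    # Digits right of the point use columns 2^-1, 2^-2, 2^-3, ...
    for position, digit in enumerate(frac, start=1):
        value += int(digit) * 2.0 ** -position
    return value


print(binary_to_decimal("10.11"))  # 2 + 0.5 + 0.25 = 2.75
print(binary_to_decimal("0.001"))  # 1/8 = 0.125
```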
Fixed Decimal Problems
  Fixed decimal points waste space:
    400,000,000,000,000,000 vs 4.0 x 10^17
    0.000000000000025 vs 2.5 x 10^-14
  In computers, space is precious.
  Computers use a floating decimal point (like scientific notation).
Floating Point
  Bits are used to represent 3 parts: sign, exponent, and fraction (or mantissa).
  Layout: Sign | Exponent | Mantissa
Sign
  0 = positive, 1 = negative
Exponent
  Binary integer in excess notation (excess 4 for these 3 bits).
  Gives the power of 2 to multiply by: 100 = 0, so 2^0.
  Binary  Value
  000     -4
  001     -3
  010     -2
  011     -1
  100      0
  101      1
  110      2
  111      3
Mantissa
  Fractional value: the bits always sit to the right of the binary point.
  1000 = 0.5
  Column:  2^-1  2^-2  2^-3   2^-4
  Value:   0.5   0.25  0.125  0.0625
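Putting the three parts together, a small decoder can make the scheme concrete. This sketch assumes an 8-bit teaching format with 1 sign bit, 3 exponent bits in excess-4 notation, and 4 mantissa bits read as 0.XXXX, which is what the preceding slides appear to describe; the name `decode_toy_float` is hypothetical.

```python
def decode_toy_float(bits: str) -> float:
    """Decode an 8-bit pattern: 1 sign bit, 3 exponent bits, 4 mantissa bits.

    Assumed layout (from the slides): exponent in excess-4 notation,
    mantissa interpreted as the pure fraction 0.XXXX.
    """
    if len(bits) != 8 or not set(bits) <= {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:4], 2) - 4   # excess-4: 100 means 0
    mantissa = int(bits[4:], 2) / 16   # 1000 means 0.5
    return sign * mantissa * 2.0 ** exponent


# 0 110 1011: positive, exponent 2, mantissa 0.1011 -> 10.11 in binary
print(decode_toy_float("01101011"))  # 2.75
# 1 100 1000: negative, exponent 0, mantissa 0.1000
print(decode_toy_float("11001000"))  # -0.5
```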
IEEE 754
  Standards for 32-bit and 64-bit floats:
    32 bit: float or single
    64 bit: double
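The Range
  The floating point number range:
  [Figure: number line running from NaN through the negative normalized values, the negative denormalized values, -0 and +0, then the positive counterparts out to NaN.]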
Sign
  Sign bit: 1 = negative, 0 = positive
Exponent
  X bits = 2^X different values; 8 bits = 256 values = 0-255.
  Stored as a biased value: the bias is subtracted to get the exponent.
  IEEE 32-bit float uses a bias of 127: Exponent = value - 127
  Binary      Value  Exponent
  0000 0001   1      -126
  1000 0000   128    1
  1000 1001   137    10
  1111 1111   255    (special pattern)
Mantissa
  Normalized: assumed 1.XXXXX, value in range [1, 2)
  Binary          Represents                 Value
  0000 0000 …     1 + 0 + 0 + 0…             1
  1000 0000 …     1 + 0.5 + 0 + 0…           1.5
  0100 0000 …     1 + 0 + 0.25 + 0…          1.25
  0110 0000 …     1 + 0 + 0.25 + 0.125…      1.375
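The sign, biased exponent, and normalized mantissa can be pulled out of a real 32-bit float with Python's standard `struct` module. This is a sketch rather than a reference decoder; the helper names `float32_fields` and `normalized_value` are invented for the example, and it only handles the normalized case (stored exponent 1 through 254).

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign bit, stored exponent, 23-bit fraction) of x as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    stored_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, stored_exponent, fraction


def normalized_value(sign: int, stored_exponent: int, fraction: int) -> float:
    """Rebuild the value of a normalized float from its three fields."""
    if not 1 <= stored_exponent <= 254:
        raise ValueError("stored exponents 0 and 255 are special patterns")
    mantissa = 1 + fraction / 2**23     # assumed leading 1: 1.XXXXX
    exponent = stored_exponent - 127    # remove the bias
    return (-1) ** sign * mantissa * 2.0 ** exponent


# 1408 = 1.375 * 2^10: stored exponent 137, mantissa bits 0110 0000 ...
sign, stored, fraction = float32_fields(1408.0)
print(sign, stored, f"{fraction:023b}")
print(normalized_value(sign, stored, fraction))  # 1408.0
```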
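Special Patterns
  Exponent of all 1's/0's is special:
  Exponent   Mantissa   Meaning
  all 0's    zero       +0 or -0
  all 0's    nonzero    denormalized number
  all 1's    zero       +infinity or -infinity
  all 1's    nonzero    NaN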
Mantissa
  If the exponent bits are all 0, the mantissa is denormalized: assumed 0.XXXXX, value in range [0, 1)
  Binary          Represents                 Value
  0000 0000 …     0 + 0 + 0 + 0…             0
  1000 0000 …     0 + 0.5 + 0 + 0…           0.5
  0100 0000 …     0 + 0 + 0.25 + 0…          0.25
  0110 0000 …     0 + 0 + 0.25 + 0.125…      0.375
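To see how the all-0's and all-1's exponents behave in practice, the sketch below classifies raw 32-bit patterns and decodes a denormalized one. The function names are hypothetical. One detail not stated on the slides but part of IEEE 754: denormalized 32-bit values use a fixed exponent of -126 (the same as the smallest normalized exponent), not -127.

```python
import math
import struct


def classify_float32(bits: int) -> str:
    """Name the kind of value a raw 32-bit float pattern encodes."""
    stored_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if stored_exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if stored_exponent == 0xFF:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"


def denormal_value(bits: int) -> float:
    """Decode a denormalized pattern: 0.XXXXX times 2^-126."""
    sign = -1.0 if bits >> 31 else 1.0
    fraction = bits & 0x7FFFFF
    return sign * (fraction / 2**23) * 2.0 ** -126


def bits_to_float32(bits: int) -> float:
    """Let the hardware format interpret the same 32 bits."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


for pattern in (0x00000000, 0x80000000, 0x00400000, 0x3F800000, 0x7F800000, 0x7FC00000):
    print(f"{pattern:08X}", classify_float32(pattern), bits_to_float32(pattern))

# Mantissa bits 1000 0000 ... with exponent bits all 0: 0.5 * 2^-126
print(denormal_value(0x00400000) == bits_to_float32(0x00400000))  # True
```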
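Issues
  Can't count on absolute precision: most decimal fractions (such as 0.1) have no exact binary representation, so stored values and results are rounded.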
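A classic illustration of this issue, sketched in Python (whose `float` is a 64-bit IEEE 754 double): 0.1 cannot be stored exactly, so a sum that looks exact on paper is not.

```python
from decimal import Decimal

total = 0.1 + 0.2
print(total)         # 0.30000000000000004
print(total == 0.3)  # False

# Decimal(float) shows the exact value actually stored for the literal 0.1.
print(Decimal(0.1))
print(Decimal(0.1) == Decimal("0.1"))  # False: the stored value is only close to 0.1
```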
Issues
  Representable values are spaced more closely at small magnitudes than at large ones.
  Accuracy is expressed in significant digits, not decimal places:
    32 bit: 7-8 decimal digits
    64 bit: 15-16 digits
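The uneven spacing can be observed directly. This sketch uses `math.ulp` (available in Python 3.9 and later) to print the gap between neighbouring 64-bit values, and a hypothetical helper `to_float32` that rounds a value through the 32-bit format.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Gap to the next representable 64-bit value grows with magnitude.
for x in (1.0, 1_000.0, 1e16):
    print(f"gap after {x:g} is {math.ulp(x):g}")

# At 1e16 the gap is 2.0, so adding 1.0 is lost to rounding.
print(1e16 + 1.0 == 1e16)         # True

# A 32-bit float holds 24 significant bits: 2^24 + 1 does not fit.
print(to_float32(16_777_217.0))   # 16777216.0
# 0.1 keeps only about 7 correct decimal digits in 32 bits.
print(to_float32(0.1))
```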
Issues
  Can't count on absolute precision when comparing values.
  The proper epsilon (comparison tolerance) depends on the magnitude of x.
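One common way to make the tolerance follow the magnitude is a relative comparison. The sketch below is one possible approach, not the only one; `nearly_equal`, `FIXED_EPS`, and the sample values are chosen for illustration, and `math.nextafter` needs Python 3.9 or later.

```python
import math


def nearly_equal(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Compare with a tolerance scaled by the larger magnitude."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))


FIXED_EPS = 1e-9  # an arbitrary fixed tolerance, for contrast

# Two adjacent doubles near 1e17: as close as two distinct values can be.
big = 1e17
big_next = math.nextafter(big, math.inf)
print(abs(big - big_next) < FIXED_EPS)  # False: fixed epsilon is too strict here
print(nearly_equal(big, big_next))      # True

# Two tiny values that differ by a factor of two.
tiny_a, tiny_b = 1e-12, 2e-12
print(abs(tiny_a - tiny_b) < FIXED_EPS)  # True: fixed epsilon is too loose here
print(nearly_equal(tiny_a, tiny_b))      # False
```

The standard library's `math.isclose` implements the same idea and adds an optional absolute tolerance, which is needed when comparing against zero.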
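Issues
  Floating point math is not associative: a*(b*c) and (a*b)*c can give different results.
  A single addition or multiplication is still commutative (a*b equals b*a), but regrouping a chain of operations changes where rounding happens.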
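A few concrete values show the grouping effect. The numbers below are picked for illustration; any 64-bit IEEE 754 implementation should behave the same way.

```python
# Addition: the grouping changes which intermediate result gets rounded.
a, b, c = 0.1, 0.2, 0.3
left_sum = (a + b) + c
right_sum = a + (b + c)
print(left_sum, right_sum)    # 0.6000000000000001 0.6

# Multiplication: one grouping overflows to infinity, the other does not.
x, y, z = 1e308, 10.0, 0.1
left_product = (x * y) * z    # x * y overflows first
right_product = x * (y * z)   # y * z rounds to 1.0, so no overflow
print(left_product, right_product)  # inf 1e+308

# Swapping the operands of a single operation makes no difference.
print(a * b == b * a, a + b == b + a)  # True True
```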
Issues
  Errors compound with repeated calculations
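Repeated addition is an easy way to watch the error grow. In this sketch the helper `running_total` is a made-up name; `math.fsum` is the standard library's error-compensated summation, shown as one way to limit the drift.

```python
import math


def running_total(step: float, count: int) -> float:
    """Add step to a total count times, rounding after every addition."""
    total = 0.0
    for _ in range(count):
        total += step
    return total


ten = running_total(0.1, 10)
print(ten)                      # 0.9999999999999999
print(math.fsum([0.1] * 10))    # 1.0

# The drift after many more additions is typically much larger.
million = running_total(0.1, 1_000_000)
print(abs(ten - 1.0), abs(million - 100_000.0))
```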
Floating Point ARM
  The VFP (floating point) unit is optional in lower-end chips
Floating Point ARM
  Special registers for the coprocessor:
  Thirty-two 32-bit registers, s0-s31
  Pairs can be used as 64-bit registers (d0-d15)
Floating Point Instructions
  Can declare float/double data values
  VLDR loads a VFP register from the address held in an ARM register
Moving and Converting
  VMOV can move bits from regular to VFP registers
  Special instructions to:
    Convert between float and word (integer)
    Convert between float and double
x86
  x86 processors had an optional floating point coprocessor (x87)
  All x87 floating point functions are stack based
SSE
  SSE: Intel extension to Pentium chips
  Added addressable registers for floating point values
Integer vs FP Performance
  Intel Haswell architecture:
Awesomeness
  Fast Inverse Square Root
  http://en.wikipedia.org/wiki/Fast_inverse_square_root
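For readers who want to experiment, here is a Python transliteration of the widely circulated routine described at that link. It is a sketch for exploring the bit trick, not a performance technique in Python: the function name is invented, the magic constant 0x5F3759DF is the one commonly quoted for the original C code, and the input is assumed to be a positive, normalized 32-bit float.

```python
import math
import struct


def fast_inverse_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) by treating the float's bits as an integer."""
    # Reinterpret the 32-bit float pattern as an unsigned integer.
    (i,) = struct.unpack("<I", struct.pack("<f", x))
    # Halving the bit pattern roughly halves the exponent; subtracting it
    # from the magic constant roughly negates it.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer as a float again: a rough first guess.
    (y,) = struct.unpack("<f", struct.pack("<I", i))
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)


for value in (0.25, 1.0, 4.0, 100.0):
    approx = fast_inverse_sqrt(value)
    exact = 1.0 / math.sqrt(value)
    print(f"x={value:g}  approx={approx:.6f}  exact={exact:.6f}")
```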