Floating Points & IEEE 754.

Column Pattern
What goes to the right of the 1’s column?

Column:  2^3  2^2  2^1  2^0
Value:   8    4    2    1

Column Pattern
Negative powers of two: 2^-3 = 1/2^3 = 1/8 = 0.125

Column:  2^3  2^2  2^1  2^0  2^-1  2^-2  2^-3   2^-4
Value:   8    4    2    1    0.5   0.25  0.125  0.0625

Binary decimal
10.11 (base 2) = 2 + 0.5 + 0.25 = 2.75 (base 10)

Column:  2^3  2^2  2^1  2^0  2^-1  2^-2  2^-3   2^-4
Value:   8    4    2    1    0.5   0.25  0.125  0.0625
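The column pattern above can be checked with a short sketch. The function name binary_to_decimal is made up for this illustration; it simply adds up the column value for every 1 bit on either side of the point.

```python
def binary_to_decimal(text: str) -> float:
    """Convert a binary string such as '10.11' to its decimal value."""
    whole, _, frac = text.partition(".")
    value = 0.0
    # Columns left of the point are 2^0, 2^1, 2^2, ... reading right to left.
    for power, bit in enumerate(reversed(whole)):
        value += int(bit) * 2 ** power
    # Columns right of the point are 2^-1, 2^-2, ... reading left to right.
    for power, bit in enumerate(frac, start=1):
        value += int(bit) * 2 ** -power
    return value


print(binary_to_decimal("10.11"))  # 2 + 0.5 + 0.25 = 2.75
```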

Fixed Decimal Problems
Fixed decimal points waste space: 400,000,000,000,000, vs … x 10^17 vs … x 10^-14
In computers, space is precious
Computers use a floating decimal point (like scientific notation)

Floating Point
Bits used to represent 3 parts:
Sign
Exponent
Fraction (or Mantissa)

Layout: Sign | Exponent | Mantissa

Sign
0 = positive, 1 = negative

Exponent
Binary integer in excess notation
Gives the power of 2 to multiply by
100 = 0, so 2^0

Binary  Value
000     -4
001     -3
010     -2
011     -1
100     0
101     1
110     2
111     3
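A minimal sketch of reading an excess-notation exponent follows. It assumes the excess is 2^(n-1) for an n-bit field, which matches the 3-bit table above (excess 4). The name decode_excess is made up for the example.

```python
def decode_excess(bits: str) -> int:
    """Decode a binary string stored in excess notation."""
    # Assumption: with n bits the excess is 2^(n-1), so 3 bits use excess 4.
    excess = 2 ** (len(bits) - 1)
    return int(bits, 2) - excess


for pattern in ["000", "011", "100", "101", "111"]:
    print(pattern, decode_excess(pattern))
```

Note that IEEE 754 uses a slightly different bias (127 for an 8-bit field rather than 128), so this sketch covers only the toy format.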

Mantissa
Fractional value, always a decimal: 1000 = 0.5

Column:  2^-1  2^-2  2^-3   2^-4
Value:   0.5   0.25  0.125  0.0625
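Putting the three parts together, here is a sketch of decoding a toy 8-bit float. The layout is an assumption inferred from the tables above: 1 sign bit, a 3-bit excess-4 exponent, and a 4-bit mantissa read as a pure fraction 0.xxxx. The function name decode_toy_float is hypothetical.

```python
def decode_toy_float(bits: str) -> float:
    """Decode an 8-bit string laid out as sign | exponent | mantissa."""
    sign_bit, exp_bits, man_bits = bits[0], bits[1:4], bits[4:8]
    sign = -1.0 if sign_bit == "1" else 1.0
    exponent = int(exp_bits, 2) - 4      # excess-4 exponent
    mantissa = int(man_bits, 2) / 16     # 4 bits read as the fraction 0.xxxx
    return sign * mantissa * 2.0 ** exponent


print(decode_toy_float("01011000"))  # +0.5 * 2^1 = 1.0
print(decode_toy_float("11101011"))  # -0.6875 * 2^2 = -2.75
```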

IEEE 754
Standards for 32-bit and 64-bit floats
32 bit: float or single
64 bit: double

The Range
The floating point number range, from one end of the number line to the other:
NaN, -infinity, -Normalized, -Denorm, -0, +0, +Denorm, +Normalized, +infinity, NaN

Sign
Sign bit: 1 = negative, 0 = positive

Exponent
X bits = 2^X different values
8 bits = 256 values = 0-255
Stored as a biased value: the bias is subtracted to get the exponent
IEEE 32-bit float uses a bias of 127: Exponent = value - 127

Binary    Value  Exponent
00000001  1      -126
10000000  128    1
10001001  137    10
11111111  255    (special pattern)
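The bias arithmetic can be checked against real 32-bit floats with Python's standard struct module. This is a sketch only: the helper name float32_exponent is made up, and the subtraction is meaningful only for normalized values (stored values 1 through 254).

```python
import struct


def float32_exponent(x: float) -> tuple:
    """Return (stored value, actual exponent) for x encoded as a 32-bit float."""
    # Pack as a big-endian float, then reread the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    stored = (bits >> 23) & 0xFF   # the 8 exponent bits sit below the sign bit
    return stored, stored - 127    # subtract the bias of 127


print(float32_exponent(2.0))           # (128, 1)
print(float32_exponent(1024.0))        # (137, 10)
print(float32_exponent(2.0 ** -126))   # (1, -126)
```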

Mantissa
Mantissa is normalized: assumed 1.XXXXX
Value in range [1, 2)

Binary  Represents  Value
…       …           1
…       …           1.5
…       …           1.25
…       …           1.375
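To see the assumed leading 1 in practice, the sketch below splits a 32-bit float into its three fields and rebuilds the 1.XXXXX mantissa. The function name decode_normalized and the sample inputs are illustrative choices, and the code assumes the input is a normalized value.

```python
import struct


def decode_normalized(x: float) -> tuple:
    """Split x (as a 32-bit float) into sign, exponent, and 1.xxx mantissa."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                        # top bit
    exponent = ((bits >> 23) & 0xFF) - 127   # next 8 bits, minus the bias
    fraction = bits & 0x7FFFFF               # low 23 bits
    mantissa = 1 + fraction / 2 ** 23        # assumed leading 1, then the fraction
    return sign, exponent, mantissa


print(decode_normalized(1.5))    # (0, 0, 1.5)
print(decode_normalized(-5.0))   # (1, 2, 1.25)   since 5 = 1.25 * 2^2
print(decode_normalized(11.0))   # (0, 3, 1.375)  since 11 = 1.375 * 2^3
```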

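Special Patterns
Exponent of all 1's or all 0's is special:
All 0's: zero (mantissa all 0's) or a denormalized value
All 1's: infinity (mantissa all 0's) or NaN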

Mantissa
If the exponent is 0, the mantissa is denormalized: assumed 0.XXXXX
Value in range [0, 1)

Binary  Represents  Value
…       …           …
…       …           0.5
…       …           0.25
…       …           0.375
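A sketch of how the exponent field selects the interpretation of a 32-bit pattern, using the standard IEEE 754 single-precision rules: an all-0 exponent means zero or a denormalized value scaled by 2^-126, and an all-1 exponent means infinity or NaN. Both function names are hypothetical.

```python
import struct


def classify_float32(bits: int) -> str:
    """Name the category of a 32-bit pattern from its exponent and fraction."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"


def decode_denormalized(bits: int) -> float:
    """Value of a pattern whose exponent field is 0: 0.fraction * 2^-126."""
    sign = -1.0 if bits >> 31 else 1.0
    fraction = bits & 0x7FFFFF
    return sign * (fraction / 2 ** 23) * 2.0 ** -126   # no assumed leading 1


pattern = 0x00400000   # exponent field 0, mantissa 100...0 (represents 0.5)
print(classify_float32(pattern))
# Cross-check against the value the hardware format itself decodes to.
(hardware,) = struct.unpack(">f", struct.pack(">I", pattern))
print(decode_denormalized(pattern) == hardware)
```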

Issues
Small values are spaced closer together than large values
Accuracy is expressed in significant digits, not decimal places
32 bit: 7-8 decimal digits
64 bit: … digits
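The uneven spacing can be measured directly. This sketch (the helper name float32_spacing is made up) finds the gap to the next larger 32-bit float by adding 1 to the raw bit pattern. It assumes a positive, finite input.

```python
import struct


def float32_spacing(x: float) -> float:
    """Gap between x (rounded to a 32-bit float) and the next larger float32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    (here,) = struct.unpack(">f", struct.pack(">I", bits))
    # For positive finite values, the next bit pattern is the next float.
    (after,) = struct.unpack(">f", struct.pack(">I", bits + 1))
    return after - here


print(float32_spacing(1.0))          # 2^-23, about 1.19e-07
print(float32_spacing(1000000.0))    # 0.0625
```

Near 1.0 the neighbors are about a ten-millionth apart. Near one million they are a sixteenth apart, so the same 7 or so significant digits cover far fewer decimal places.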

Issues
Can't count on absolute precision:
Proper epsilon depends on the magnitude of x
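One common way to make the epsilon track the magnitude is a relative tolerance. The sketch below is illustrative only: nearly_equal and its default rel_eps are choices made for the example, while math.isclose is the standard-library function that does the same job.

```python
import math


def nearly_equal(a: float, b: float, rel_eps: float = 1e-9) -> bool:
    """Compare with a tolerance scaled to the larger magnitude."""
    return abs(a - b) <= rel_eps * max(abs(a), abs(b))


print(0.1 + 0.2 == 0.3)               # False
print(nearly_equal(0.1 + 0.2, 0.3))   # True
print(math.isclose(0.1 + 0.2, 0.3))   # True

# A fixed epsilon that works near 1 is far too strict near 1e20,
# where neighboring doubles are thousands apart.
big, neighbor = 1e20, 1e20 + 16384
print(abs(big - neighbor) <= 1e-9)    # False
print(nearly_equal(big, neighbor))    # True
```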

Issues
No associative property in floating point math
a*(b*c) and (a*b)*c can give different results
(A single a*b still equals b*a; it is the regrouping of a chain of operations that changes the result)
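A small demonstration with values chosen for this example: grouping one way overflows to infinity, grouping the other way stays finite. The addition case shows the same effect through rounding alone.

```python
a, b, c = 1e200, 1e200, 1e-200

left = (a * b) * c    # a * b overflows to infinity before c can scale it back
right = a * (b * c)   # b * c is about 1, so the product stays finite
print(left, right, left == right)

x, y, z = 0.1, 0.2, 0.3
print((x + y) + z == x + (y + z))   # False: the two groupings round differently
print(x * y == y * x)               # True: swapping two operands is safe
```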

Issues
Errors compound with repeated calculations
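A short illustration of compounding: 0.1 has no exact binary representation, so adding it ten times drifts away from 1.0. math.fsum from the standard library is shown as one way to limit the drift. It is not the only remedy.

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1          # each addition rounds, and the rounding errors add up

print(total)              # 0.9999999999999999
print(total == 1.0)       # False
print(math.fsum([0.1] * 10) == 1.0)   # True: fsum tracks the lost low-order bits
```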

Floating Point ARM
VFP unit is optional in lower-end chips

Floating Point ARM
Special registers for the coprocessor
32 32-bit registers: s0-s31
Pairs can be used as 64-bit registers

Floating Point Instructions
Can declare float/double data values
VLDR loads VFP registers from an address held in an ARM register

Moving and Converting
VMOV can move bits from regular to VFP registers
Special instructions to:
Convert between float and word
Convert between float and double

x86
x86 processors had an optional floating point coprocessor (x87)
All floating point functions are stack based:

SSE
SSE: Intel extension to Pentium chips
Added addressable registers for floating point values

Integer vs FP Performance
Intel Haswell architecture:

Awesomeness
Fast Inverse Square Root
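The slide only names the trick, so what follows is a hedged Python rendering of the widely circulated version rather than anything taken from the deck. It reinterprets the float's bits as an integer, subtracts half of that integer from the well-known constant 0x5F3759DF to get a first guess, then applies one Newton-Raphson step. The function name is hypothetical, and the sketch assumes a positive, normalized input.

```python
import struct


def fast_inverse_sqrt(x: float) -> float:
    """Approximate 1 / sqrt(x) using the bit-level trick plus one Newton step."""
    # Reinterpret the 32-bit float's bytes as an unsigned integer.
    (i,) = struct.unpack("<I", struct.pack("<f", x))
    # Halving the integer roughly halves the exponent; the constant corrects it.
    i = 0x5F3759DF - (i >> 1)
    (y,) = struct.unpack("<f", struct.pack("<I", i))
    # One Newton-Raphson iteration sharpens the initial guess.
    return y * (1.5 - 0.5 * x * y * y)


print(fast_inverse_sqrt(4.0))   # close to 0.5
```

The trick works because the exponent field makes a float's bit pattern behave roughly like a scaled logarithm of its value.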
