IEEE floating point format

V1.0 (22/10/2005)

Most computers use a standard format known as the IEEE floating-point format, defined by the IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754).

Single Precision

The IEEE single precision floating point standard representation requires a 32 bit word:

S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
Sign (1 bit), Exponent (8 bits), Fraction (23 bits)

A sign bit of 0 indicates a positive number, and a 1 indicates a negative number.
The exponent field is represented in "excess 127" notation.
The 23 fraction bits actually represent 24 bits of precision, as a leading 1 in front of the binary point is implied (hidden bit).
In the 32 bit IEEE format, 1 bit is allocated as the sign bit, the next 8 bits are allocated as the exponent field, and the last 23 bits are the fractional part of the normalized number.

The value may be determined as follows.

For E > 0 and E < 255:

V = (-1)^S * 1.F * 2^(E-127)

S = sign bit
E = exponent in excess 127 representation
F = fractional part in binary notation

"1.F" represents the binary number created by prefixing F with an implicit leading 1 (the hidden bit) and a binary point.
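The formula for normalized numbers can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name decode_normalized_single is hypothetical, and the result is cross-checked against the standard struct module, which reinterprets the same 32 bits as a hardware float.

```python
import struct

def decode_normalized_single(bits: int) -> float:
    """Apply V = (-1)^S * 1.F * 2^(E-127) to a 32-bit pattern with 0 < E < 255."""
    s = bits >> 31              # sign bit
    e = (bits >> 23) & 0xFF     # 8-bit exponent field (excess 127)
    f = bits & 0x7FFFFF         # 23-bit fraction field
    if not 0 < e < 255:
        raise ValueError("only normalized numbers (0 < E < 255) are handled here")
    significand = 1 + f / 2**23  # "1.F": hidden bit plus the 23 fraction bits
    return (-1) ** s * significand * 2.0 ** (e - 127)

bits = 0x40D00000  # 0 10000001 10100000000000000000000
value = decode_normalized_single(bits)
reference = struct.unpack(">f", struct.pack(">I", bits))[0]
print(value, reference)  # both are 6.5
```

The division by 2**23 turns the integer fraction field into the binary fraction 0.F, so adding 1 restores the hidden bit.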

There are some exceptions:

E = 255: Specials (Not a Number, +Infinity, -Infinity)
If F <> 0, then V = NaN ("Not a Number"), which signals an invalid or undefined result.
If F = 0 and S is 1, then V = -Infinity (for example, on overflow).
If F = 0 and S is 0, then V = +Infinity (for example, on overflow).

E = 0: Zeros and denormals
If F = 0 and S is 1, then V = -0.
If F = 0 and S is 0, then V = 0.
If F <> 0, then V is a denormalized number, a tiny value smaller in magnitude than the smallest normalized value:

V = (-1)^S * 0.F * 2^(-126)
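The special cases can be folded into one decoder. The sketch below is an illustration under stated assumptions: decode_single is a hypothetical helper name, and it returns a Python float (a double), which can hold every single-precision value exactly.

```python
import math

def decode_single(bits: int) -> float:
    """Decode a 32-bit IEEE pattern, including the E = 255 and E = 0 cases."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if e == 255:
        # F <> 0 is NaN; F = 0 is +Infinity or -Infinity depending on S
        return math.nan if f else sign * math.inf
    if e == 0:
        # 0.F * 2^(-126); this also yields +0 or -0 when F = 0
        return sign * (f / 2**23) * 2.0 ** -126
    return sign * (1 + f / 2**23) * 2.0 ** (e - 127)

print(decode_single(0x7F800000))  # inf
print(decode_single(0xFF800000))  # -inf
print(decode_single(0x7FC00000))  # nan
print(decode_single(0x00000001) == 2.0 ** -149)  # True: smallest denormal
```

Note that the denormal branch uses the exponent -126, not -127, so that the denormals join the normalized range without a gap.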

The range

As exponent field values 0 and 255 are reserved, the magnitude of normalized numbers is restricted to the range from 2^(-126) up to (2 - 2^(-23)) * 2^127, which is just under 2^128.
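The limits of the normalized range can be checked by building the two extreme bit patterns directly. This is a small sketch; from_bits is a hypothetical helper that relies on the standard struct module to reinterpret a 32-bit integer as a single-precision value.

```python
import struct

def from_bits(bits: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest_normal = from_bits(0x00800000)  # E = 1, F = 0
largest_finite = from_bits(0x7F7FFFFF)   # E = 254, F = all ones

print(smallest_normal == 2.0 ** -126)                       # True
print(largest_finite == (2 - 2.0 ** -23) * 2.0 ** 127)      # True
```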

Example

Finding the IEEE 32 bit floating point representation :
For the decimal number [missing]:

STEPS:
1. Convert to binary: [missing]
2. Convert to normalized binary scientific notation (shift the number to the form 1.F * 2^E): [missing] * 2^3
3. Use only the fractional part (remove the "1." preceding the fractional part: hidden bit): F = [missing]
4. Add 127 (excess 127 code) to the exponent and convert to binary: 3 + 127 = 130, E = 10000010
5. Determine the sign bit (negative number: set to 1; otherwise set to 0): S = 1
6. Assemble the 32 bits (S & E & F): V = 1 10000010 [missing]

Finding the IEEE 32 bit floating point representation :
For the decimal number 9+97/128:

STEPS:
1. Convert to binary: 9+97/128 = 1001.1100001
2. Convert to normalized binary scientific notation (shift the number to the form 1.F * 2^E): 1001.1100001 = 1.0011100001 * 2^3
3. Use only the fractional part (remove the "1." preceding the fractional part: hidden bit): F = 00111000010000000000000
4. Add 127 (excess 127 code) to the exponent and convert to binary: 3 + 127 = 130, E = 10000010
5. Determine the sign bit (negative number: set to 1; otherwise set to 0): S = 0
6. Assemble the 32 bits (S & E & F): V = 0 10000010 00111000010000000000000
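The steps above can be sketched in Python. This is an illustrative helper, not a general converter: encode_single is a hypothetical name, and it assumes a nonzero input that is exactly representable as a normalized single-precision number (no rounding, no denormals, no exponent range check). Exact rational arithmetic from the standard fractions module keeps the shifts free of rounding error.

```python
from fractions import Fraction

def encode_single(x: Fraction) -> str:
    """Return the 'S EEEEEEEE F...' bit string for an exactly representable value."""
    sign = 1 if x < 0 else 0           # step 5: sign bit
    m = abs(x)
    exp = 0
    while m >= 2:                      # step 2: shift until the form 1.F * 2^exp
        m /= 2
        exp += 1
    while m < 1:
        m *= 2
        exp -= 1
    frac = (m - 1) * 2**23             # step 3: drop the hidden bit, keep 23 bits
    if frac.denominator != 1:
        raise ValueError("value needs rounding; not handled in this sketch")
    # step 4: excess 127 exponent; step 6: assemble S, E and F
    return f"{sign} {exp + 127:08b} {int(frac):023b}"

print(encode_single(Fraction(9) + Fraction(97, 128)))
# 0 10000010 00111000010000000000000
```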

IEEE floating point format example
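0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0
0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity
0 11111111 00000100000000000000000 = NaN (any nonzero F)
1 11111111 00100010001001010101010 = NaN (any nonzero F)
0 10000000 00000000000000000000000 = +1 * 2^(128-127) * 1.0 = 2
0 10000001 10100000000000000000000 = +1 * 2^(129-127) * 1.101 = 6.5
1 10000001 10100000000000000000000 = -1 * 2^(129-127) * 1.101 = -6.5
0 00000001 00000000000000000000000 = +1 * 2^(1-127) * 1.0 = 2^(-126)
0 00000000 10000000000000000000000 = +1 * 2^(-126) * 0.1 = 2^(-127)
0 00000000 00000000000000000000001 = +1 * 2^(-126) * 0.00000000000000000000001 = 2^(-149) (smallest positive value)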

Precision

Fixed-point representations: the number of digits before and after the decimal point is fixed.
Floating point: there is no fixed number of digits before and after the decimal point; that is, the decimal point can float.
Floating-point representations can be slower and, for a given word size, less precise than fixed-point representations, but they can handle a much larger range of numbers.
As computers are integer machines, complex codes are used to represent real numbers.

Floating-point numbers are just approximations
Small discrepancies in the approximations can return meaningless results.
One of the challenges in programming with floating-point values is ensuring that the approximations lead to reasonable results.
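A short Python illustration of such discrepancies follows. It assumes Python floats are IEEE double precision, which holds on common platforms; math.isclose is the standard library's tolerance-based comparison.

```python
import math

total = 0.1 + 0.2
print(total == 0.3)              # False: neither 0.1, 0.2 nor 0.3 is exact in binary
print(math.isclose(total, 0.3))  # True: compare within a relative tolerance instead

# Errors can also build up over repeated operations.
accumulated = sum([0.1] * 10)
print(accumulated == 1.0)        # False
```

Comparing with a tolerance rather than testing for exact equality is one common way to keep such approximations from producing unreasonable results.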

Precision: the number of bits used to hold the fractional part
Floating-point numbers are often classified as single precision or double precision.
A floating-point number that has more precision than a single-precision number requires more digits to the right of the binary point.
The more the precision, the more exactly the number can represent fractional quantities.
A double-precision number uses twice as many bits as a single-precision value, so it can represent fractional quantities much more exactly. For example, if a single-precision number requires 32 bits, its double-precision counterpart will be 64 bits long.
The extra bits increase not only the precision but also the range of magnitudes that can be represented.
The exact amount by which the precision and range of magnitudes are increased depends on what format the program is using to represent floating-point values.
The term double precision is something of a misnomer because the precision is not really double: in the IEEE formats it grows from 24 to 53 significant bits.
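The difference in precision can be observed by rounding a double to single precision and back. This is a sketch: to_single is a hypothetical helper, and it assumes Python floats are IEEE doubles; the struct format "f" stores the value in 32 bits.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

x = 0.1
print(f"{x:.20f}")             # 0.1 as stored in double precision
print(f"{to_single(x):.20f}")  # 0.1 as stored in single precision
print(abs(to_single(x) - x))   # error introduced by the shorter fraction field
print(to_single(0.5) == 0.5)   # True: 0.5 is exact in both formats
```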

Double Precision

The IEEE double precision floating point standard representation requires a 64 bit word:

S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Sign (1 bit), Exponent (11 bits), Fraction (52 bits)

A sign bit of 0 indicates a positive number, and a 1 indicates a negative number.
The exponent field is represented in "excess 1023" notation.
The 52 fraction bits actually represent 53 bits of precision, as a leading 1 in front of the binary point is implied (hidden bit).
In the 64 bit IEEE format, 1 bit is allocated as the sign bit, the next 11 bits are allocated as the exponent field, and the last 52 bits are the fractional part of the normalized number.

The value may be determined as follows.

For E > 0 and E < 2047:

V = (-1)^S * 1.F * 2^(E-1023)

S = sign bit
E = exponent in excess 1023 representation
F = fractional part in binary notation

"1.F" represents the binary number created by prefixing F with an implicit leading 1 (the hidden bit) and a binary point.

There are some exceptions:

E = 2047: Specials (Not a Number, +Infinity, -Infinity)
If F <> 0, then V = NaN ("Not a Number"), which signals an invalid or undefined result.
If F = 0 and S is 1, then V = -Infinity (for example, on overflow).
If F = 0 and S is 0, then V = +Infinity (for example, on overflow).

E = 0: Zeros and denormals
If F = 0 and S is 1, then V = -0.
If F = 0 and S is 0, then V = 0.
If F <> 0, then V is a denormalized number, a tiny value smaller in magnitude than the smallest normalized value:

V = (-1)^S * 0.F * 2^(-1022)
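The double-precision fields can be pulled out of a Python float for inspection. The sketch below assumes Python floats are IEEE doubles (true on common platforms); double_fields is a hypothetical helper name.

```python
import struct

def double_fields(x: float) -> tuple[int, int, int]:
    """Split a double into its sign bit, 11-bit exponent and 52-bit fraction."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = bits >> 63
    e = (bits >> 52) & 0x7FF
    f = bits & ((1 << 52) - 1)
    return s, e, f

s, e, f = double_fields(-6.5)
print(s, e, f"{f:052b}")  # sign 1, exponent 1025, fraction 101 followed by zeros

# Rebuild the value with V = (-1)^S * 1.F * 2^(E-1023)
rebuilt = (-1) ** s * (1 + f / 2**52) * 2.0 ** (e - 1023)
print(rebuilt)  # -6.5
```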

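The range

As exponent field values 0 and 2047 are reserved, the magnitude of normalized numbers is restricted to the range from 2^(-1022) up to (2 - 2^(-52)) * 2^1023, which is just under 2^1024.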
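On platforms where Python floats are IEEE doubles (an assumption, though a very common one), the standard sys.float_info structure reports these limits, so the range can be checked directly:

```python
import sys

smallest_normal = sys.float_info.min  # smallest positive normalized double
largest_finite = sys.float_info.max   # largest finite double

print(smallest_normal == 2.0 ** -1022)                    # True
print(largest_finite == (2 - 2.0 ** -52) * 2.0 ** 1023)   # True
print(sys.float_info.mant_dig)                            # 53 bits of precision
```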

Reference: http://www.psc.edu/general/software/packages/ieee/ieee.html
