Floating Point Representations

Name: Floating Point Representations
Uploaded: 2017-12-23T01:01:18+00:00
Duration: PTM44S36
Channel: Shannon Harvey
Description: Floating Point Representations

Updated: 06/03/2010

Decimal Floating Point Numbers
Example: 1.5
Decimal fractions: ¾ -> 0.75, 1/100 -> 0.01

Scientific Notation
How can we represent these values more compactly?
1,000,000,000 -> 1.0 x 10^9
0.000025 -> 2.5 x 10^-5
What are the two parts of a scientific notation value called? The mantissa and the exponent.
Normalize the mantissa so that it has one non-zero digit to the left of the decimal point.

Floating point numbers
Floating point numbers such as 7.519, -0.01, and 4.3 x 10^8 are represented using the IEEE 754 standard format.
A floating point value is represented using a mantissa and an exponent.
Example: 7.51 x 2^5
The mantissa is 7.51
The exponent is 5
Note: the exponent is a power of 2.
A set number of bits is assigned to represent the sign, the exponent, and the mantissa:
Format                     Sign bit   Exponent   Mantissa
32-bit single precision    1 bit      8 bits     23 bits
64-bit double precision    1 bit      11 bits    52 bits
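The field layout above can be inspected directly. The following is a minimal sketch, assuming Python's standard struct module; the helper name float32_fields is hypothetical and not part of the slides. It rounds a value to single precision and returns the raw contents of the three fields.

```python
import struct

def float32_fields(x):
    """Split x (rounded to single precision) into its raw sign, exponent, and mantissa fields."""
    # Pack as a big-endian 32-bit float, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored exactly as encoded
    mantissa = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, mantissa

sign, exponent, mantissa = float32_fields(-0.01)
print(sign, format(exponent, "08b"), format(mantissa, "023b"))
```

The three numbers returned are the stored bit fields, not the decoded mantissa and exponent values.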

Rounding
Not every floating point value can be represented exactly in binary using a finite number of bits.
Question: What are some examples? 1/3 = 0.333..., PI = 3.141...
In these cases, we must round to the nearest number that can be represented.
If a number is halfway between two representable values, round to the one whose least-significant digit is even.

Examples of Rounding
Round each of these numbers to two significant digits.
A value nearer to 1.3 than to 1.4 -> choose 1.3
A value nearer to 79 than to 78 -> choose 79
12.5 is halfway between 12 and 13 -> choose 12 since its least significant digit is even
13.5 is halfway between 13 and 14 -> choose 14 since its least significant digit is even
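The round-half-to-even rule can be tried out with Python's decimal module. This is only an illustrative sketch; the helper name round_half_even and its places argument are assumptions made for this example.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(text, places="1"):
    """Round the decimal string `text` to the digit position given by `places`."""
    return Decimal(text).quantize(Decimal(places), rounding=ROUND_HALF_EVEN)

print(round_half_even("12.5"))  # 12 (tie, 12 has an even last digit)
print(round_half_even("13.5"))  # 14 (tie, 14 has an even last digit)
```

Values are passed as strings so that the halfway cases are exact decimal numbers rather than binary approximations.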

Fractional binary numbers
Fractional binary numbers use the familiar decimal place-value representation, but with a base of 2 instead of 10.
Example:
11.101b = 1x2^1 + 1x2^0 + 1x2^-1 + 0x2^-2 + 1x2^-3
        = 2 + 1 + 1/2 + 0/4 + 1/8
        = 3.625

Exercises
Convert the binary fraction 101.011b into decimal.
Answer: 5 + 0*1/2 + 1*1/4 + 1*1/8 = 5.375
Express the decimal value 6.5 as a binary fraction.
Answer: 110.1
Express the decimal value ... as a binary fraction.
Answer: ...
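Both directions of the conversion can be sketched in a few lines. The function names below are hypothetical, and the second function assumes a non-negative input and stops after max_frac_bits fraction bits.

```python
from fractions import Fraction

def binary_fraction_to_decimal(text):
    """Evaluate a string such as '11.101' using base-2 place values."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    for position, bit in enumerate(frac, start=1):
        value += Fraction(int(bit), 2 ** position)  # bit * 2^-position
    return value

def decimal_to_binary_fraction(value, max_frac_bits=23):
    """Convert a non-negative value to a binary fraction string by repeated doubling."""
    value = Fraction(value)
    whole = int(value)
    frac = value - whole
    bits = []
    while frac and len(bits) < max_frac_bits:
        frac *= 2
        bit = int(frac)       # the integer part that appears is the next bit
        bits.append(str(bit))
        frac -= bit
    return format(whole, "b") + ("." + "".join(bits) if bits else "")

print(float(binary_fraction_to_decimal("101.011")))  # 5.375
print(decimal_to_binary_fraction(6.5))               # 110.1
```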

Normalized Mantissa for Scientific Notation
Scientific notation expresses the mantissa with one non-zero digit to the left of the decimal point.
Given your original number, shift the decimal point left or right until one non-zero digit is to the left of the decimal point.
For each shift left, increase the power-of-ten exponent by 1.
For each shift right, decrease the power-of-ten exponent by 1.
Examples:
102.5 x 10^4 = 1.025 x 10^6
7589 x 10^5 = 7.589 x 10^8
0.0045 x 10^0 = 4.5 x 10^-3

Normalized Binary Mantissa
Binary Fraction    Normalized IEEE 754 Mantissa
...                ... (shift binary point left 1 place)
...                ... (no shift)
...                ... (shift binary point left 2 places)
...                ... (shift binary point right 3 places)
Question: What do all of the normalized binary mantissas have in common?

Fractional representation of mantissa
What do all of the normalized binary mantissas have in common?
The one bit to the left of the binary point is always a 1.
So if we use 23 bits for the single-precision mantissa, we can “save” a bit by not storing this leading 1.
So simply discard the leading 1 after normalizing the binary mantissa.

Mantissa Example
What is the binary representation of the mantissa in IEEE 754 for 6.25?
Solution:
6.25 = 1x2^2 + 1x2^1 + 0x2^0 + 0x2^-1 + 1x2^-2 = 110.01b
Shift the binary point to the left until only a single 1 bit remains to the left of the binary point.
110.01b --> 1.1001b (shift left by 2 places)
This shift gives us the assumed 1 bit in the integer part of the mantissa, which effectively gains one additional bit of representation.
The mantissa field encodes only the bits to the right of the binary point: 1001b

Mantissa Example Continued...
What is the binary representation of the mantissa in IEEE 754 for 6.25?
Solution (continued): Keeping only the bits to the right of the binary point gives 1001b.
Extend these 4 bits to 23 bits for single precision. Because this is a binary fraction, the extra zero bits are appended on the right (zero padding, not sign extension):
10010000000000000000000
The imaginary binary point sits immediately to the left of the first stored bit.

Updating the Exponent
6.25 = 110.01b x 2^0, that is, an implied exponent of 0.
Following the IEEE 754 convention, shifting the binary point to the left (in this case by 2 positions) has the effect of updating the exponent.
110.01b --> 1.1001b (after shifting the binary point left by 2 positions)
For each left shift of the binary point, add 1 to the binary exponent.
6.25 = 1.1001b x 2^2
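Normalization can be checked with math.frexp from the Python standard library. Note that frexp returns a mantissa in the range [0.5, 1), so this sketch rescales it to the [1, 2) convention used here; the helper name normalize_binary is an assumption for illustration.

```python
import math

def normalize_binary(x):
    """Return (mantissa, exponent) with 1 <= |mantissa| < 2 and x == mantissa * 2**exponent."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    m, e = math.frexp(x)   # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1    # move the binary point one place to the right

print(normalize_binary(6.25))  # (1.5625, 2), and 1.5625 is 1.1001b
```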

Updating the Binary Exponent
Binary Fraction    Normalized Binary Exponent
1.01               x 2^0
...                x 2^1
...                x 2^3
...                x 2^-3
...                x 2^-5

Representing the exponent in IEEE 754
The exponent is represented as a biased integer.
For single precision, add 127 to the value of the normalized exponent (written as a base ten integer).
For double precision, add 1023 to the value of the normalized exponent.
Example: How would the exponent values -45 and 123 be represented in the 8-bit biased format for single precision?
Answer:
-45 + 127 = 82 = 01010010b
123 + 127 = 250 = 11111010b

Encoding the Biased Binary Exponent
Binary Fraction    Normalized Exponent    Biased Exponent
1.01               x 2^0                  0 + 127 = 127
...                x 2^1                  1 + 127 = 128
...                x 2^3                  3 + 127 = 130
...                x 2^-3                 -3 + 127 = 124
...                x 2^-5                 -5 + 127 = 122
Encode each biased exponent as an unsigned 8-bit number.
Encode each (unbiased) exponent in 8-bit two’s complement.
Suppose you had to rapidly sort by exponents; which format would be more efficient?
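One way to explore the sorting question is to compare the two encodings as plain unsigned integers. This is a small sketch using the five exponents from the table; the helper names biased and twos_complement are hypothetical.

```python
BIAS = 127  # single-precision bias

def biased(exponent):
    """8-bit biased encoding of an exponent."""
    return format(exponent + BIAS, "08b")

def twos_complement(exponent):
    """8-bit two's complement encoding of an exponent."""
    return format(exponent & 0xFF, "08b")

exponents = [0, 1, 3, -3, -5]
# Sort by treating each 8-bit pattern as an unsigned integer.
by_biased = sorted(exponents, key=lambda e: int(biased(e), 2))
by_twos = sorted(exponents, key=lambda e: int(twos_complement(e), 2))
print(by_biased)  # [-5, -3, 0, 1, 3]
print(by_twos)    # [0, 1, 3, -5, -3]
```

With the biased encoding, an unsigned comparison of the bit patterns already gives the numeric order; with two's complement, the negative exponents sort after the positive ones unless the comparison is sign-aware.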

Floating Point Example #1
Recall that 6.25 = 1.1001b x 2^2
Encode 6.25 as a 32-bit single precision binary number.
Sign bit = 0
Mantissa = 10010000000000000000000 (encoding omits the assumed leading 1)
Exponent = 2 + 127 = 129 = 10000001b
Encoded using the 32-bit single precision binary format:
Sign bit   Exponent   Mantissa
0          10000001   10010000000000000000000
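The encoding steps can be sketched by hand and compared against the bits that struct produces. The function encode_float32 below is a hypothetical helper. It assumes a non-zero value in the normalized range, and it truncates rather than rounds, so it is only exact for values that fit in 23 fraction bits.

```python
import math
import struct

def encode_float32(x):
    """Encode a normalized, exactly representable value as 'sign exponent mantissa' bit strings."""
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))               # abs(x) == m * 2**e with 0.5 <= m < 1
    mantissa, exponent = m * 2, e - 1       # rescale so that 1 <= mantissa < 2
    fraction = int((mantissa - 1) * 2**23)  # drop the assumed leading 1, keep 23 bits
    return f"{sign} {exponent + 127:08b} {fraction:023b}"

print(encode_float32(6.25))

# Cross-check against the platform's own single-precision encoding.
(bits,) = struct.unpack(">I", struct.pack(">f", 6.25))
print(format(bits, "032b") == encode_float32(6.25).replace(" ", ""))  # True
```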

Floating point example #2
What is the value of the single-precision floating-point number represented by the following 32-bit binary encoding?
0 10000000 11000000000000000000000
Sign bit = 0
Encoded Exponent = 10000000b = 128
Encoded Mantissa = 11000000000000000000000
Subtract the added bias of 127 to reveal an exponent of 128 - 127 = 1
Mantissa = .11
Mantissa = 1.11 (restore the assumed 1 before the binary point)
Mantissa = 1.11b = 1x2^0 + 1x2^-1 + 1x2^-2 = 1.75
Value = 1.75 x 2^1 = 3.5

Floating Point Example #3
-6.25 = -1.1001b x 2^2
Encode -6.25 as a 32-bit single precision binary number.
Sign bit = 1 (the mantissa uses signed magnitude)
Mantissa = 10010000000000000000000 (encoding omits the assumed leading 1)
Exponent = 2 + 127 = 129 = 10000001b
Encoded using the 32-bit single precision binary format:
Sign bit   Exponent   Mantissa
1          10000001   10010000000000000000000

Exercise
Exercise 2.18 (a) on page 42 of Computer Architecture by N. Carter
What value is represented by this IEEE single precision value?
1 01111010 10000000000000000000000

Exercise: Solution
What value is represented by this IEEE single precision value?
Sign bit = 1
Encoded Exponent = 01111010b = 122
Encoded Mantissa = 10000000000000000000000
Subtract the added bias of 127 from the encoded exponent: 122 - 127 = -5, so the actual exponent is -5
Mantissa = .1
Mantissa = 1.1 (add back the assumed 1 before the binary point)
Mantissa = -(1x2^0 + 1x2^-1) = -1.5 (negative because the sign bit is 1)
Value = -1.5 x 2^-5 = -1.5 x (1/32) = -0.046875
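The decoding steps (remove the bias, restore the assumed 1, apply the sign) can be sketched as follows. The function decode_float32 is a hypothetical helper that handles normalized values only; exponent fields of all 0s or all 1s are special cases and are not covered here.

```python
def decode_float32(bit_string):
    """Decode a normalized single-precision value given as 'sign exponent mantissa' bits."""
    bits = bit_string.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2) - 127       # subtract the added bias
    mantissa = 1 + int(bits[9:], 2) / 2**23  # restore the assumed leading 1
    return sign * mantissa * 2.0**exponent

print(decode_float32("0 10000000 " + "11" + "0" * 21))  # 3.5
print(decode_float32("1 01111010 " + "1" + "0" * 22))   # -0.046875
```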

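IEEE 754 Single Precision Range
Smallest positive normalized number: 1.00000000000000000000000b x 2^-126 (about 1.18 x 10^-38)
Largest normalized number: 1.11111111111111111111111b x 2^127 (about 3.40 x 10^38)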
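The two limits of the normalized range can be confirmed by building their bit patterns directly. This sketch assumes the struct module; from_bits is a hypothetical helper name.

```python
import struct

def from_bits(bits):
    """Reinterpret a 32-bit unsigned integer as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest_normal = from_bits(0x00800000)  # exponent field 1, mantissa field all 0s
largest_normal = from_bits(0x7F7FFFFF)   # exponent field 254, mantissa field all 1s

print(smallest_normal == 2.0**-126)                 # True
print(largest_normal == (2 - 2.0**-23) * 2.0**127)  # True
```

Exponent fields 0 and 255 are left out of the normalized range because they are reserved for special values.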

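Representing 1.0
1.0 = 1.0b x 2^0
Sign bit = 0, Exponent = 0 + 127 = 127 = 01111111b, Mantissa = 00000000000000000000000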

Representing 0.0
The assumed 1 bit in the mantissa gains an extra bit of precision.
But with an assumed leading 1, zero cannot be represented as a normalized value, since a mantissa field of 0 is interpreted as 1.0.
The IEEE 754 standard therefore specifies that zero is represented using an exponent field of all 0s together with a mantissa field of all 0s.

NaN
NaN = Not a Number
A special value used to represent the result of an invalid operation, such as 0/0, infinity - infinity, or the square root of a negative number. (Overflow, and division of a non-zero value by zero, produce infinity rather than NaN.)
NaN is represented by all 1’s in the exponent field and a non-zero mantissa field.
Any math operation using NaN results in NaN.
Example: x + NaN = NaN for any value x

Infinity
IEEE 754 represents infinity using all 1’s in the exponent field and a fraction field of 0.
The sign bit designates positive or negative infinity.
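The special encodings for zero, infinity, and NaN can be printed field by field. This is a sketch assuming the struct and math modules; float32_bits is a hypothetical helper. The exact mantissa bits of a NaN vary, so only the "non-zero" property is relied on.

```python
import math
import struct

def float32_bits(x):
    """Return the single-precision encoding of x as 'sign exponent mantissa' bit strings."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{bits >> 31:01b} {(bits >> 23) & 0xFF:08b} {bits & 0x7FFFFF:023b}"

print(float32_bits(0.0))        # all fields 0
print(float32_bits(math.inf))   # exponent all 1s, mantissa 0
print(float32_bits(-math.inf))  # same, with sign bit 1
print(float32_bits(math.nan))   # exponent all 1s, mantissa non-zero

print(math.nan + 1.0)           # nan
print(math.nan == math.nan)     # False: NaN compares unequal even to itself
```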

Floating Point Addition (Decimal Example)
Example: 9.999 x 10^1 + 1.610 x 10^-1
Step 1: Shift the decimal point of the smaller number to the left until its updated exponent matches the exponent of the larger number
    1.610 x 10^-1 -> 0.01610 x 10^1, kept as 0.016 x 10^1 when only four digits are available
Step 2: Add the mantissas (assume only 4 significant digits)
      9.999 x 10^1
    + 0.016 x 10^1
    = 10.015 x 10^1
Step 3: Re-normalize to get one non-zero digit to the left of the decimal point
    10.015 x 10^1 -> 1.0015 x 10^2
Step 4: Round the mantissa to 4 significant digits
    1.0015 x 10^2 -> 1.002 x 10^2
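The four-digit decimal addition can be reproduced with the decimal module by limiting the working precision to 4 significant digits. One caveat: decimal computes the exact sum and rounds once at the end, instead of trimming the shifted operand first, so this is a check of the final answer rather than a model of each step.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# A context that keeps only 4 significant digits, like the example's mantissa.
four_digits = Context(prec=4, rounding=ROUND_HALF_EVEN)

total = four_digits.add(Decimal("9.999E+1"), Decimal("1.610E-1"))
print(total)  # 100.2, that is, 1.002 x 10^2
```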

Floating Point Addition Example
Use single-precision floating point to compute 0.25 + 1.5
0.25 (base 10) = (1/4) = 0.01b = 1.0 x 2^-2
1.5 (base 10) = 1 + (1/2) = 1.1b = 1.1 x 2^0
Shift the binary point of the smaller number to the left so the exponents match
1.0 x 2^-2 -> 0.01 x 2^0

Floating Point Addition Example (continued)
Next, add the mantissas, both with an exponent of 0
      0.01 x 2^0
    + 1.10 x 2^0
    = 1.11 x 2^0
Encode the result using 32-bit single precision
Sign bit = 0
Mantissa = 11000000000000000000000 (23 bits)
Exponent = 0 + 127 = 127 = 01111111b
The 32-bit single precision encoding is 0 01111111 11000000000000000000000

Floating Point Addition Exercise: Solution
2.20 (b) Use single precision to compute 147.5 + 0.25
See Computer Architecture by N. Carter, page 43
147.5 (base 10) = 128 + 16 + 2 + 1 + (1/2) = 10010011.1b
Convert to normalized mantissa format
10010011.1 x 2^0 -> 1.00100111 x 2^7 (shifted binary point 7 places to the left)

0.25 (base 10) = (1/4) = 0.01b
Convert to normalized mantissa format
0.01 x 2^0 -> 1.0 x 2^-2 (shift binary point 2 places to the right)

Compute 1.00100111 x 2^7 + 1.0 x 2^-2
Shift the binary point of the smaller number to the left to match the exponent (7) of the larger number
1.0 x 2^-2 -> 0.000000001 x 2^7 (shift binary point 9 places to the left to go from an exponent of -2 to 7)

Add the mantissas, both expressed with exponent 7
      1.001001110 x 2^7
    + 0.000000001 x 2^7
    = 1.001001111 x 2^7

Encode the result 1.001001111 x 2^7 in single precision
Sign bit = 0 since the result is positive
Mantissa = 00100111100000000000000 (23 bits)
Exponent = 7 + 127 = 134 = 10000110b
The 32-bit single precision encoding is 0 10000110 00100111100000000000000

Addition with Negative Values
If a value is negative, you must first convert the negative value into two’s complement.
Example: -0.111
Convert 0.111 to two’s complement by...
    inverting all bits: 1.000
    adding 1 in the least significant position: 1.001
Use the two’s complement version of the value when adding the mantissas.
Discard the carry overflow bit.

Addition with Negative Value(s)
       1.000 x 2^0  (1.0 in base ten)
    + -1.000 x 2^-1 (-0.5 in base ten)
Move the binary point of the smaller number so the exponents match
      -0.100 x 2^0  (-0.5 in base ten)
Convert the mantissa of -0.5 into two’s complement, then add
       1.000 x 2^0  (1.0 in base ten)
    +  1.100 x 2^0  (-0.5 in base ten, in two’s complement)
    = 10.100 x 2^0
Two’s complement addition discards the carry overflow bit
The sum is 0.100 x 2^0
Normalize to get a sum of 1.000 x 2^-1 (0.5 base ten)

Floating Point Addition (page 282 of H&P)
Example: Compute 0.5 + (-0.4375) (base 10) using binary arithmetic.
0.5 (base 10) = 0.1 x 2^0
Normalize to get 1 to the left of the binary point
0.5 = 0.1 x 2^0 = 1.0 x 2^-1
-0.4375 = -((1/4) + (1/8) + (1/16)) = -0.0111 x 2^0
Normalize to get 1 to the left of the binary point
-0.0111 x 2^0 -> -1.11 x 2^-2

Compute 0.5 + (-0.4375), where 0.5 = 1.0 x 2^-1 and -0.4375 = -1.11 x 2^-2
Step 1: Shift the binary point of the smaller number to the left until its updated exponent matches the exponent of the larger number
    -1.11 x 2^-2 -> -0.111 x 2^-1
Step 2: Add the mantissas. Convert the negative value to two’s complement, then add, and discard the carry overflow bit
      1.000  (1.0 decimal)
    + 1.001  (-0.875 decimal, in two’s complement)
    = 0.001  (0.125 decimal)
    The sum is 0.001 x 2^-1

Step 3: Normalize to get 1 to the left of the binary point
    0.001 x 2^-1 -> 1.0 x 2^-4
    The exponent of -4 lies between 127 and -126 (the range of single precision exponents), therefore there is no overflow or underflow
    Express the exponent in biased notation by adding 127
    Encoded exponent = -4 + 127 = 123
Step 4: Round to 23 binary digits of mantissa precision
    1.0 x 2^-4 (no rounding needed) = 0.0625 (base 10)
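The align, add, and normalize steps can be sketched with exact fractions. Two simplifications are assumed here and are not how hardware works: negative mantissas are held as signed Fraction values instead of two's complement, and the rounding step is omitted because Fraction arithmetic is exact. The names fp_add and normalize are hypothetical.

```python
from fractions import Fraction

def normalize(mantissa, exponent):
    """Shift until 1 <= |mantissa| < 2, adjusting the exponent to compensate."""
    if mantissa == 0:
        return mantissa, 0
    while abs(mantissa) >= 2:
        mantissa /= 2
        exponent += 1
    while abs(mantissa) < 1:
        mantissa *= 2
        exponent -= 1
    return mantissa, exponent

def fp_add(m1, e1, m2, e2):
    """Add m1 * 2**e1 and m2 * 2**e2; return a normalized (mantissa, exponent) pair."""
    m1, m2 = Fraction(m1), Fraction(m2)
    if e1 < e2:                      # make operand 1 the one with the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 = m2 / 2 ** (e1 - e2)         # Step 1: align the smaller operand
    total = m1 + m2                  # Step 2: add the mantissas
    return normalize(total, e1)      # Step 3: normalize

# 0.5 + (-0.4375): 1.0 x 2^-1 plus -1.11b x 2^-2, where 1.11b = 7/4
print(fp_add(1, -1, Fraction(-7, 4), -2))  # (Fraction(1, 1), -4)
```

The result, mantissa 1 with exponent -4, is 1.0 x 2^-4 = 0.0625.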

Floating point multiplication
Multiply the mantissas and add the exponents.
Result = (mantissa1 x mantissa2) x 2^(exp1 + exp2)
Example (in decimal): 5 x 10^3 x 2 x 10^6 = 10 x 10^9
If the mantissa is >= 10, then shift the mantissa down 1 place (divide by 10) and increment the result exponent.
10 x 10^9 = 1 x 10^10

Floating point multiplication
Since IEEE 754 uses biased integers to represent the exponent, the bias must be considered when adding the exponents.
Add the two biased integer exponents, then subtract the bias value from the result.
Example: Add the biased (+127) exponents 150 and 45
Break down the exponents to see the bias values of 127:
150 = (23 + 127)
45 = (-82 + 127)
Add the biased exponents: 150 + 45 = 195
Subtract the bias of 127: 195 – 127 = 68, the biased exponent of the result
Check it: 68 – 127 = -59, the actual exponent, and 23 + (-82) = -59

Floating Point Multiplication: Example
Exercise 2.20 (a) Use IEEE single precision to compute 32 x 16.
See Computer Architecture by N. Carter, page 43
32 (base 10) = 100000 x 2^0 (binary)
Convert to normalized binary mantissa format
100000 x 2^0 -> 1.0 x 2^5 (shift binary point 5 places to the left)
Exponent = 5 + 127 = 132
16 (base 10) = 10000 x 2^0 (binary)
10000 x 2^0 -> 1.0 x 2^4 (shift binary point 4 places to the left)
Exponent = 4 + 127 = 131

Multiply mantissas:
      1.0
    x 1.0
    -----
      0 0
    1 0
    -----
    1.0 0
Count the number of bits to the right of the binary point in the two operands (1 + 1 = 2), then place the binary point two places from the right of the product: 1.00
Add the +127 biased exponents: 132 + 131 – 127 = 136
Actual unbiased exponent = 136 – 127 = 9
Product = 1.0 x 2^9 = 512

Sign Bit = 0
Mantissa = 00000000000000000000000
Exponent = 136 = 10001000b
The encoded IEEE 754 single precision number is 0 10001000 00000000000000000000000

Floating Point Multiplication Exercise: Solution
2.20 (c) Compute 0.125 x 8 using single-precision binary.
See Computer Architecture by N. Carter, page 43
0.125 (base 10) = 0.001 x 2^0 = 1.0 x 2^-3 (normalized binary mantissa)
8 (base 10) = 1000 x 2^0 = 1.0 x 2^3 (normalized binary mantissa)
1.0 x 2^-3: Biased exponent = -3 + 127 = 124
1.0 x 2^3: Biased exponent = 3 + 127 = 130
Multiply mantissas: 1.0 x 1.0 = 1.0 (binary)
Add biased exponents: 124 + 130 – 127 = 127
Actual exponent: 127 – 127 = 0
Product = 1.0 x 2^0 = 1.0
Sign Bit = 0
Mantissa = 00000000000000000000000
Exponent = 127 = 01111111b
0 01111111 00000000000000000000000 is the encoded binary number

Floating Point Multiplication Exercise
Multiply 0.75 x 32 using IEEE 754 single-precision format
0.75 = 0.11 x 2^0, normalized 1.1 x 2^-1, biased exponent = -1 + 127 = 126
32 = 100000 x 2^0, normalized 1.0 x 2^5, biased exponent = 5 + 127 = 132
Multiply the mantissas
      1.1
    x 1.0
    -----
      0 0
  + 1 1
    -----
    1.1 0
To place the binary point, count the number of bits to the right of the binary points of the two operands 1.1 and 1.0. The total is 2 places, so place the binary point two places from the right in the product.
Add the biased exponents: 126 + 132 – 127 = 131 (unbiased exponent is 4)
The product 1.10 x 2^4 (24 in base ten) is already normalized
Encode the product using IEEE 32-bit format
Sign bit = 0
Exponent = 131 = 10000011b
Mantissa = 10000000000000000000000
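The multiplication recipe (multiply the mantissas, add the biased exponents, subtract one bias, renormalize) can be sketched as below. The function fp_multiply is a hypothetical helper that assumes positive, normalized mantissas with 1 <= m < 2 and ignores sign handling, rounding, and overflow.

```python
from fractions import Fraction

BIAS = 127  # single-precision bias

def fp_multiply(m1, biased_e1, m2, biased_e2):
    """Multiply two normalized values given as (mantissa, biased exponent) pairs."""
    mantissa = Fraction(m1) * Fraction(m2)
    # Each input exponent carries one bias, so the raw sum carries two: remove one.
    biased_exponent = biased_e1 + biased_e2 - BIAS
    while mantissa >= 2:             # the product of two values in [1, 2) lies in [1, 4)
        mantissa /= 2
        biased_exponent += 1
    return mantissa, biased_exponent

print(fp_multiply(1, 132, 1, 131))               # 32 x 16   -> (Fraction(1, 1), 136)
print(fp_multiply(Fraction(3, 2), 126, 1, 132))  # 0.75 x 32 -> (Fraction(3, 2), 131)
```

A mantissa of 3/2 is 1.1b, so the second result is 1.1b x 2^(131 - 127) = 1.5 x 16 = 24.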

Rounding of Floating Point Numbers
Accurate rounding requires the hardware to use a few extra bits to hold intermediate results.
These extra bits are used to decide how to round when the final result is stored in the 32-bit single precision or 64-bit double precision format.
The IEEE 754 standard uses up to three additional bits, called the guard, round, and sticky bits, to assist in accurate rounding.
See pages ... of Computer Organization and Design

Compute this base ten addition, rounding all intermediate values to three significant digits
      2.56 x 10^0
    + 2.34 x 10^2
First shift the decimal point of the top number to align the exponents
      0.02 x 10^2
    + 2.34 x 10^2
    = 2.36 x 10^2
Rounding to three digits loses information.

Compute this base ten addition using intermediate values that keep an extra two digits
      2.56 x 10^0
    + 2.34 x 10^2
First shift the decimal point of the top number to align the exponents
      0.0256 x 10^2  (intermediate values use two extra digits)
    + 2.3400 x 10^2
    = 2.3656 x 10^2
Use the extra two digits to round the result to three significant digits: 2.37 x 10^2
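The effect of the extra digits can be reproduced with the decimal module. In this sketch the digits shifted out during alignment are simply dropped (ROUND_DOWN), which is an assumption chosen to match the 0.02 shown above; the helper name add_at_exponent_2 and its kept_step parameter are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

def add_at_exponent_2(small_mantissa, large_mantissa, kept_step):
    """Add two mantissas that are both expressed at exponent 10^2.

    kept_step controls how many digits of the shifted operand survive alignment:
    "0.01" keeps no extra digits, "0.0001" keeps two extra digits.
    """
    aligned = Decimal(small_mantissa).quantize(Decimal(kept_step), rounding=ROUND_DOWN)
    total = aligned + Decimal(large_mantissa)
    # The final result is stored with three significant digits (two after the point).
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

print(add_at_exponent_2("0.0256", "2.34", "0.01"))    # 2.36 (no extra digits)
print(add_at_exponent_2("0.0256", "2.34", "0.0001"))  # 2.37 (two extra digits)
```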

Java Applets for IEEE Floating Point
A Java applet that converts decimal numbers to IEEE single or double precision encodings can be found at...
This applet may be used to make up your own sample problems that convert between decimal and IEEE format, and to check the results of other calculations in IEEE floating point format.
Interactive floating point addition demo
