Floating Point Representations
Updated: 06/03/2010
Decimal Floating Point Numbers
Examples: 1.5, 102.450
Decimal fractions: 3/4 -> 0.75, 1/100 -> 0.01
Scientific Notation
How can we more compactly represent these values?
1,000,000,000 -> 1.0 x 10^9
0.000025 -> 2.5 x 10^-5
What are the two parts of a scientific notation value called? The mantissa and the exponent.
Normalize the mantissa so it has one digit to the left of the decimal point.
Floating Point Numbers
Floating point numbers such as 7.519, -0.01, and 4.3 x 10^8 are represented using the IEEE 754 standard format.
A floating point number is represented using a mantissa and an exponent.
Example: 7.51 x 2^5
The mantissa is 7.51
The exponent is 5 --> Note: the exponent is a power of 2
A set number of bits is assigned to represent the mantissa and exponent:
32-bit single precision: 1 sign bit | 8-bit exponent | 23-bit mantissa
64-bit double precision: 1 sign bit | 11-bit exponent | 52-bit mantissa
Rounding
Not every floating point value can be represented exactly in binary using a finite number of bits.
Question: What are some examples? 1/3 = 0.3333..., pi = 3.141...
In these cases, we must round to the nearest number that can be represented.
If a number is halfway between two representable values, round to the one whose least-significant digit is even.
Examples of Rounding
Round each of these numbers to two significant digits:
1.345 --> 1.3 (1.345 is nearer to 1.3 than to 1.4)
78.953 --> 79 (78.953 is nearer to 79 than to 78)
12.5 --> 12 (12.5 is halfway between 12 and 13; choose 12 since its least significant digit is even)
13.5 --> 14 (13.5 is halfway between 13 and 14; choose 14 since its least significant digit is even)
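The round-half-to-even rule above is the same rule Python's built-in round() implements, so the whole-number examples can be checked directly (a quick sketch; the two-significant-digit cases are expressed here as rounding to integers):

```python
# Python's round() uses round-half-to-even ("banker's rounding"),
# the same default rounding rule as IEEE 754.
print(round(78.953))  # nearer to 79 than to 78
print(round(12.5))    # halfway: round to the even neighbor -> 12
print(round(13.5))    # halfway: round to the even neighbor -> 14
```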
Fractional Binary Numbers
Fractional binary numbers use the familiar place-value representation, but with a base of 2 instead of 10.
Example:
11.101b = 1x2^1 + 1x2^0 + 1x2^-1 + 0x2^-2 + 1x2^-3
        = 2 + 1 + 1/2 + 0/4 + 1/8
        = 3 + 0.5 + 0.0 + 0.125
        = 3.625
Exercises
Convert this binary fraction into decimal: 101.011
Answer: 5 + 0x(1/2) + 1x(1/4) + 1x(1/8) = 5.375
Express the decimal value 6.5 as a binary fraction
Answer: 110.1
Express the decimal value 11.75 as a binary fraction
Answer: 1011.11
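These conversions can be checked with a small helper that applies the place-value expansion directly (bin_frac_to_decimal is an illustrative name, not a library function):

```python
def bin_frac_to_decimal(s: str) -> float:
    """Evaluate a binary fraction string such as '101.011' as a decimal value."""
    int_part, _, frac_part = s.partition('.')
    value = int(int_part, 2) if int_part else 0
    for i, bit in enumerate(frac_part, start=1):
        value += int(bit) * 2 ** -i   # fractional bit i contributes bit * 2^-i
    return value

print(bin_frac_to_decimal('101.011'))   # 5.375
print(bin_frac_to_decimal('110.1'))     # 6.5
print(bin_frac_to_decimal('1011.11'))   # 11.75
```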
Normalized Mantissa for Scientific Notation
Scientific notation expresses the mantissa with one digit to the left of the decimal point.
Given your original number:
Shift the decimal point left or right until one non-zero digit is to the left of the decimal point.
For each shift left, increase the power-of-ten exponent by 1.
For each shift right, decrease the power-of-ten exponent by 1.
Examples:
102.5 x 10^4 = 1.025 x 10^6
7589 x 10^5 = 7.589 x 10^8
0.0045 x 10^0 = 4.5 x 10^-3
Normalized Binary Mantissa
Binary Fraction    Normalized IEEE 754 Mantissa
11.110          -> 1.1110  (shift binary point left 1 place)
1.01            -> 1.01    (no shift)
101.01          -> 1.0101  (shift binary point left 2)
0.001011        -> 1.011   (shift binary point right 3)
Question: What do all of the normalized binary mantissas have in common?
Fractional Representation of the Mantissa
What do all of the normalized binary mantissas have in common?
The one bit to the left of the binary point is always a 1.
So if we use 23 bits for the single-precision mantissa, we can "save" a bit by not storing this leading 1.
Simply discard the leading 1 after normalizing the binary mantissa.
Mantissa Example
What is the binary representation of the mantissa in IEEE 754 for 6.25?
Solution:
6.25 = 1x2^2 + 1x2^1 + 0x2^0 + 0x2^-1 + 1x2^-2 = 110.01b
Shift the binary point to the left until a single 1 bit remains to the left of the binary point:
110.01b --> 1.1001b (shift left by 2 places)
This shift gives us the assumed 1 bit in the integer part of the mantissa; the fractional representation effectively gains one additional bit of precision.
The mantissa encodes only the bits to the right of the binary point: 1001b
Mantissa Example, Continued
What is the binary representation of the mantissa in IEEE 754 for 6.25?
Solution, continued:
Keep only the bits to the right of the binary point: 1001b
Pad the 4 bits out to the 23 bits of the single-precision mantissa by appending zeros on the right (it is a binary fraction, so the extra bits go on the right):
1001 0000 0000 0000 0000 000
(with an imaginary binary point to the left)
Updating the Exponent
6.25 = 110.01b has an implied exponent of 2^0.
Following the IEEE 754 convention of shifting the binary point to the left, in this case by 2 positions, updates the exponent:
110.01b --> 1.1001b (after shifting the binary point left by 2 positions)
For each left shift of the binary point, add 1 to the binary exponent:
6.25 = 1.1001b x 2^2
Updating the Binary Exponent
Binary Fraction      Normalized
1.01 x 2^0        -> 1.01 x 2^0
11.110 x 2^0      -> 1.1110 x 2^1
1101.01 x 2^0     -> 1.10101 x 2^3
0.001011 x 2^0    -> 1.011 x 2^-3
0.0000101 x 2^0   -> 1.01 x 2^-5
Representing the Exponent in IEEE 754
The exponent is represented as a biased integer.
For single precision, add 127 to the value of the normalized base-ten integer exponent.
For double precision, add 1023 to the value of the normalized base-ten integer exponent.
Representing the Exponent in IEEE 754
The exponent is represented as a biased integer.
For single precision, add 127 to the value of the exponent.
For double precision, add 1023 to the value of the exponent.
Example: How would the values -45 and 123 be represented in the 8-bit biased format for single precision?
Answer:
-45 + 127 = 82 = 01010010b
123 + 127 = 250 = 11111010b
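The biased encoding can be sketched in a couple of lines (BIAS and encode_exponent are illustrative names, not part of any standard API):

```python
BIAS = 127  # single precision; double precision uses 1023

def encode_exponent(e: int) -> str:
    """Return the 8-bit biased encoding of a single-precision exponent."""
    return format(e + BIAS, '08b')

print(encode_exponent(-45))  # 01010010  (-45 + 127 = 82)
print(encode_exponent(123))  # 11111010  (123 + 127 = 250)
```

Because the bias maps every legal exponent to an unsigned value, bit patterns sort in the same order as the exponents they encode.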
Encoding the Biased Binary Exponent
Binary Fraction      Normalized           Biased Exponent
1.01 x 2^0        -> 1.01 x 2^0            0 + 127 = 127
11.110 x 2^0      -> 1.1110 x 2^1          1 + 127 = 128
1101.01 x 2^0     -> 1.10101 x 2^3         3 + 127 = 130
0.001011 x 2^0    -> 1.011 x 2^-3         -3 + 127 = 124
0.0000101 x 2^0   -> 1.01 x 2^-5          -5 + 127 = 122
Encode each biased exponent as an unsigned 8-bit number.
Encode each biased exponent in 8-bit two's complement.
Suppose you had to rapidly sort by exponents: which format would be more efficient?
Floating Point Example #1
Recall that 6.25 = 1.1001b x 2^2
Encode 6.25 as a 32-bit single precision binary number:
Sign bit = 0
Mantissa = 1.1001 (the encoding omits the assumed leading 1)
Exponent = 2 + 127 = 129 = 10000001
The 32-bit single precision encoding is:
0 10000001 10010000000000000000000
(sign bit | exponent | mantissa)
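The hand encoding can be checked against the machine's own IEEE 754 representation using Python's struct module ('>f' packs a big-endian 32-bit float, '>I' reinterprets the same four bytes as an unsigned integer):

```python
import struct

# Reinterpret the 32-bit single-precision pattern of 6.25 as an unsigned int
bits = struct.unpack('>I', struct.pack('>f', 6.25))[0]
b = format(bits, '032b')
print(b[0], b[1:9], b[9:])   # sign, exponent, mantissa fields
# 0 10000001 10010000000000000000000
```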
Floating Point Example #2
What is the value of the single-precision floating-point number represented by the following 32-bit binary encoding?
0 10000000 110 0000 0000 0000 0000 0000
Sign bit = 0
Encoded exponent = 10000000 = 128
Encoded mantissa = 110 0000 0000 0000 0000 0000
Subtract the added bias of 127 to reveal an exponent of 1.
Mantissa = .110 0000 0000 0000 0000 0000
Mantissa = 1.11 (replace the assumed 1 before the binary point)
Mantissa = 1.11 = 1x2^0 + 1x2^-1 + 1x2^-2 = 1.75
Value = 1.75 x 2^1 = 3.5
Floating Point Example #3
-6.25 = -1.1001b x 2^2
Encode -6.25 as a 32-bit single precision binary number:
Sign bit = 1 (the mantissa uses signed magnitude)
Mantissa = 1.1001 (the encoding omits the assumed leading 1)
Exponent = 2 + 127 = 129 = 10000001
The 32-bit single precision encoding is:
1 10000001 10010000000000000000000
(sign bit | exponent | mantissa)
Exercise
Exercise 2.18 (a) on page 42 of Computer Architecture by N. Carter:
What value is represented by this IEEE single precision value?
1 01111010 100 0000 0000 0000 0000 0000
Exercise: Solution
What value is represented by this IEEE single precision value?
1 01111010 100 0000 0000 0000 0000 0000
Sign bit = 1
Encoded exponent = 01111010 = 122
Encoded mantissa = 100 0000 0000 0000 0000 0000
Subtract the added bias of 127 from the encoded exponent: the actual exponent is -5.
Mantissa = .100 0000 0000 0000 0000 0000 = .1
Mantissa = 1.1 (add back the assumed 1 before the binary point)
Mantissa with sign applied = -(1x2^0 + 1x2^-1) = -1.5
Value = -1.5 x 2^-5 = -1.5 x (1/32) = -0.046875
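The decoding procedure above can be sketched as a function (decode_single is an illustrative name; for simplicity it handles normalized values only, not zero, denormals, infinity, or NaN):

```python
def decode_single(pattern: str) -> float:
    """Decode a 32-bit IEEE 754 single-precision bit pattern (normalized values only)."""
    bits = int(pattern.replace(' ', ''), 2)
    sign = -1.0 if bits >> 31 else 1.0
    biased_exp = (bits >> 23) & 0xFF      # 8-bit encoded exponent
    fraction = bits & 0x7FFFFF            # 23-bit encoded mantissa
    mantissa = 1 + fraction / 2**23       # add back the assumed leading 1
    return sign * mantissa * 2.0 ** (biased_exp - 127)

print(decode_single('1 01111010 10000000000000000000000'))  # -0.046875
print(decode_single('0 10000001 10010000000000000000000'))  # 6.25
```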
IEEE 754 Single Precision Range
Smallest positive normalized number:
1.00000000000000000000000 x 2^-126
Largest normalized number:
1.11111111111111111111111 x 2^127
Representing 1.0
1.0 = 1.0 x 2^0
Sign bit = 0
Mantissa = all zeros (the leading 1 is assumed)
Exponent = 0 + 127 = 127 = 01111111
0 01111111 00000000000000000000000
Representing 0.0
The assumed 1 bit in the mantissa gains an extra bit of precision.
But zero cannot be represented exactly this way, since a mantissa of 0 is interpreted as 1.0.
The IEEE 754 standard therefore specifies that zero is represented using an exponent of 0 with a mantissa of 0.
NaN
NaN = Not a Number
A special value used to represent the result of an invalid operation, such as 0/0, infinity - infinity, or the square root of a negative number. (Overflow and division of a nonzero value by zero instead produce infinity.)
NaN is represented by all 1's in the exponent field and a non-zero mantissa field.
Any math operation using NaN results in NaN.
Example: NaN + 4.5 = NaN
Infinity
IEEE 754 represents infinity using all 1's in the exponent field and a fraction field of 0.
The sign bit designates positive or negative infinity.
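Both special encodings can be inspected with Python's math and struct modules (bits_of is an illustrative helper; the exact NaN mantissa bits may vary by platform, but the exponent field is always all 1's with a non-zero mantissa):

```python
import math
import struct

def bits_of(x: float) -> str:
    """Return the 32-bit single-precision pattern of x as sign|exponent|mantissa."""
    b = format(struct.unpack('>I', struct.pack('>f', x))[0], '032b')
    return f'{b[0]} {b[1:9]} {b[9:]}'

print(bits_of(math.inf))    # exponent all 1's, mantissa all 0's
print(bits_of(-math.inf))   # same, but with sign bit 1
print(bits_of(math.nan))    # exponent all 1's, non-zero mantissa
print(math.nan + 4.5)       # nan -- NaN propagates through arithmetic
```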
Floating Point Addition (Decimal Example)
Example: 9.999 x 10^1 + 1.610 x 10^-1
Step 1: Shift the decimal point of the smaller number to the left until its updated exponent matches the exponent of the larger number:
1.610 x 10^-1 --> 0.01610 x 10^1
Step 2: Add the mantissas (assume only 4 significant digits):
 9.999 x 10^1
+0.016 x 10^1
10.015 x 10^1
Step 3: Re-normalize to get one non-zero digit left of the decimal point:
10.015 x 10^1 --> 1.0015 x 10^2
Step 4: Round the mantissa to 4 significant digits:
1.0015 x 10^2 --> 1.002 x 10^2
Floating Point Addition Example
Use single-precision floating point to compute 0.25 + 1.5.
0.25 (base 10) = 1/4 = 0.01b = 1.0 x 2^-2
1.5 (base 10) = 1 + 1/2 = 1.1b = 1.1 x 2^0
Shift the binary point of the smaller number to the left so the exponents match:
1.0 x 2^-2 --> 0.01 x 2^0
Floating Point Addition Example (Continued)
Use single-precision floating point to compute 0.25 + 1.5.
Next, add the mantissas, both with an exponent of 0:
 0.01 x 2^0
+1.10 x 2^0
 1.11 x 2^0
Floating Point Addition Example (Continued)
Use single-precision floating point to compute 0.25 + 1.5.
 0.01 x 2^0
+1.10 x 2^0
 1.11 x 2^0
Encode the result using 32-bit single precision:
Sign bit = 0
Mantissa = 11000000000000000000000 (23 bits)
Exponent = 0 + 127 = 127 = 01111111
The 32-bit single precision encoding is:
0 01111111 11000000000000000000000
Floating Point Addition Exercise: Solution 2.20 (b)
Use single precision to compute 147.5 + 0.25.
147.5 (base 10) = 128 + 16 + 2 + 1 + 1/2 = 10010011.1b
Convert to normalized mantissa format:
10010011.1 x 2^0 --> 1.00100111 x 2^7 (shifted the binary point 7 places to the left)
See Computer Architecture by N. Carter, page 43
Floating Point Addition Exercise: Solution 2.20 (b)
Use single precision to compute 147.5 + 0.25.
0.25 (base 10) = 1/4 = 0.01b
Convert to normalized mantissa format:
0.01 x 2^0 --> 1.0 x 2^-2 (shift the binary point 2 places to the right)
Floating Point Addition Exercise: Solution 2.20 (b)
Use single precision to compute 147.5 + 0.25:
1.00100111 x 2^7 + 1.0 x 2^-2
Shift the binary point of the smaller number to the left to match the exponent (7) of the larger number:
1.0 x 2^-2 --> 0.000000001 x 2^7 (shift the binary point 9 places to the left to go from an exponent of -2 to 7)
Floating Point Addition Exercise: Solution 2.20 (b)
Use single precision to compute 147.5 + 0.25.
Add the mantissas, both expressed with exponent 7:
 1.001001110 x 2^7
+0.000000001 x 2^7
 1.001001111 x 2^7
Floating Point Addition Exercise: Solution 2.20 (b)
Use single precision to compute 147.5 + 0.25.
Encode the result 1.001001111 x 2^7 in single precision:
Sign bit = 0 since the result is positive
Mantissa = 00100111100000000000000 (23 bits)
Exponent = 7 + 127 = 134 = 10000110
The 32-bit single precision encoding is:
0 10000110 00100111100000000000000
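The hand-worked encoding can be checked by letting the hardware perform the same addition and dumping the resulting bit pattern (both operands and the sum here are exactly representable, so no rounding intrudes):

```python
import struct

total = 147.5 + 0.25   # = 147.75 = 1.001001111b x 2^7, exactly representable
b = format(struct.unpack('>I', struct.pack('>f', total))[0], '032b')
print(b[0], b[1:9], b[9:])
# 0 10000110 00100111100000000000000
```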
Addition with Negative Values
If a value is negative, you must first convert the negative value into two's complement.
Example: -0.111
Convert to two's complement by...
 1.000   inverting all bits
+0.001   adding 1
 1.001
Use the two's complement version of the value when adding the mantissas. Discard the carry overflow bit.
Addition with Negative Value(s)
 1.000 x 2^0   (1.0 in base ten)
-1.000 x 2^-1  (-0.5 in base ten)
Move the binary point of the smaller number so the exponents match:
-1.000 x 2^-1 --> -0.100 x 2^0 (-0.5 in base ten)
Convert the mantissa of -0.5 into two's complement, then add:
 1.000 x 2^0
+1.100 x 2^0   (-0.5 in two's complement)
10.100 x 2^0
Two's complement addition discards the carry overflow bit, so the sum is 0.100 x 2^0.
Normalize the exponent to get a sum of 1.000 x 2^-1 (0.5 base ten).
Floating Point Addition (page 282 of H&P)
Example: Compute 0.5 + -0.4375 (base 10) using binary arithmetic.
0.5 (base 10) = 0.1b x 2^0
Normalize to get a 1 to the left of the binary point:
0.5 = 0.1 x 2^0 = 1.0 x 2^-1
-0.4375 = -0.0111b = -((1/4) + (1/8) + (1/16))
Normalize to get a 1 to the left of the binary point:
-0.0111 --> -1.11 x 2^-2
Floating Point Addition (page 282 of H&P)
Compute 0.5 = 1.0 x 2^-1 plus -0.4375 = -1.11 x 2^-2.
Step 1: Shift the binary point of the smaller number to the left until its updated exponent matches the exponent of the larger number:
-1.11 x 2^-2 --> -0.111 x 2^-1
Step 2: Add the mantissas (convert the negative value to two's complement, then add, discarding the carry overflow bit):
 1.000   (1.0 decimal)
-0.111   (-0.875 decimal)
 0.001   (0.125 decimal)
Result: 0.001 x 2^-1
Floating Point Addition (page 282 of H&P)
Step 3: Normalize to get a 1 to the left of the binary point:
0.001 x 2^-1 --> 1.0 x 2^-4
The exponent of -4 lies between -126 and 127 (the range of single precision exponents), so there is no overflow or underflow.
Express the exponent in biased notation by adding 127:
Encoded exponent = -4 + 127 = 123
Step 4: Round to 23 binary digits of mantissa precision:
1.0 x 2^-4 (no rounding needed)
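This result, too, can be cross-checked against the hardware: 0.5 - 0.4375 = 0.0625 = 1.0 x 2^-4 is exactly representable, so the exponent field of the machine's answer should be 123.

```python
import struct

result = 0.5 - 0.4375   # 0.0625 = 1.0 x 2^-4, exactly representable
b = format(struct.unpack('>I', struct.pack('>f', result))[0], '032b')
print(b[1:9])                 # 01111011, i.e. 123 = -4 + 127
print(int(b[1:9], 2) - 127)   # -4
```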
Floating Point Multiplication
Multiply the mantissas and add the exponents:
Result = (mantissa1 x mantissa2) x 2^(exp1 + exp2)
Example (in decimal): 5 x 10^3 times 2 x 10^6 = 10 x 10^9
If the mantissa is >= 10, shift the mantissa down one place (divide by 10) and increment the result exponent:
10 x 10^9 = 1 x 10^10
Floating Point Multiplication
Since IEEE 754 uses biased integers to represent the exponents, the bias must be considered when adding them:
Add the two biased integer exponents, then subtract the bias value from the result.
Example: Add the biased (+127) exponents 150 and 45.
Break down the exponents to see the bias values of 127:
150 = 23 + 127
45 = -82 + 127
Add the biased exponents: 150 + 45 = 195
Subtract the bias of 127: 150 + 45 - 127 = 68, the result's biased exponent
Check it: 68 - 127 = actual exponent of -59 = 23 + -82
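The biased-exponent bookkeeping above can be sketched directly (mul_biased_exponents is an illustrative name):

```python
BIAS = 127

def mul_biased_exponents(e1: int, e2: int) -> int:
    """Biased exponent of a product: add the biased exponents, subtract one bias."""
    return e1 + e2 - BIAS

biased = mul_biased_exponents(150, 45)
print(biased)           # 68
print(biased - BIAS)    # -59, which equals 23 + (-82)
```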
Floating Point Multiplication: Example Exercise 2.20 (a)
Use IEEE single precision to compute 32 x 16.
32 (base 10) = 100000.0b x 2^0
Convert to normalized binary mantissa format:
100000.0 x 2^0 --> 1.0 x 2^5 (shift the binary point 5 places to the left)
Biased exponent = 5 + 127 = 132
16 (base 10) = 10000.0b x 2^0
10000.0 x 2^0 --> 1.0 x 2^4 (shift the binary point 4 places to the left)
Biased exponent = 4 + 127 = 131
See Computer Architecture by N. Carter, page 43
Floating Point Multiplication: Example Exercise 2.20 (a)
Use IEEE single precision to compute 32 x 16.
1.0 x 2^5, biased exponent = 5 + 127 = 132
1.0 x 2^4, biased exponent = 4 + 127 = 131
Multiply the mantissas:
  1.0
x 1.0
-----
  0 0
+ 1 0
-----
 1.0 0
Count the number of bits to the right of the binary points of the operands (1 + 1 = 2) and place the binary point that many places from the right in the product: 1.00
Add the +127 biased exponents: 132 + 131 - 127 = 136
Actual unbiased exponent = 136 - 127 = 9
Product = 1.0 x 2^9 = 512
Floating Point Multiplication: Example Exercise 2.20 (a)
Use IEEE single precision to compute 32 x 16.
1.0 x 2^5, biased exponent = 5 + 127 = 132
1.0 x 2^4, biased exponent = 4 + 127 = 131
Multiply the mantissas: 1.0 x 1.0 = 1.0 (binary)
Add the +127 biased exponents: 132 + 131 - 127 = 136
Actual unbiased exponent = 136 - 127 = 9
Product = 1.0 x 2^9 = 512
Sign bit = 0
Mantissa = 1.00000000000000000000000
Exponent = 136 = 10001000
The encoded IEEE 754 single precision number is:
0 10001000 00000000000000000000000
Floating Point Multiplication Exercise: Solution 2.20 (c)
Compute 0.125 x 8 using single-precision binary.
0.125 (base 10) = 0.001b x 2^0 = 1.0 x 2^-3 (normalized binary mantissa)
8 (base 10) = 1000.0b x 2^0 = 1.0 x 2^3 (normalized binary mantissa)
See Computer Architecture by N. Carter, page 43
Floating Point Multiplication Exercise: Solution 2.20 (c)
Compute 0.125 x 8 using single-precision binary.
1.0 x 2^-3, biased exponent = -3 + 127 = 124
1.0 x 2^3, biased exponent = 3 + 127 = 130
Multiply the mantissas: 1.0 x 1.0 = 1.0 (binary)
Add the biased exponents: 124 + 130 - 127 = 127
Actual exponent: 127 - 127 = 0
Sign bit = 0
Mantissa = 1.00000000000000000000000
Exponent = 127 = 01111111
0 01111111 00000000000000000000000 is the encoded binary number
Floating Point Multiplication Exercise
Multiply 0.75 x 32 using IEEE 754 single-precision format.
0.75 = 0.11b x 2^0, normalized: 1.1 x 2^-1, biased exponent = -1 + 127 = 126
32 = 100000.0b x 2^0, normalized: 1.0 x 2^5, biased exponent = 5 + 127 = 132
Multiply the mantissas:
  1.1
x 1.0
-----
  0 0
+ 1 1
-----
 1.1 0
To place the binary point, count the bits to the right of the binary points of the two operands 1.1 and 1.0: a total of 2 places, so place the binary point two places from the right in the product.
Floating Point Multiplication Exercise
Multiply 0.75 x 32 using IEEE 754 single-precision format.
Multiply the mantissas:
  1.1
x 1.0
-----
  0 0
+ 1 1
-----
 1.1 0
Add the biased exponents: 126 + 132 - 127 = 131 (the unbiased exponent is 4)
Floating Point Multiplication Exercise
Multiply 0.75 x 32 using IEEE 754 single-precision format.
Product of the mantissas: 1.10
Add the biased exponents: 126 + 132 - 127 = 131 (the unbiased exponent is 4)
The product is already normalized.
Encode the product using the IEEE 32-bit format:
Sign bit = 0
Exponent = 131 = 10000011
Mantissa = 10000000000000000000000
0 10000011 10000000000000000000000
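As with the addition exercises, the encoding can be verified by letting the machine multiply and dumping the bit pattern of the product (0.75 x 32 = 24 = 1.1b x 2^4, exactly representable):

```python
import struct

product = 0.75 * 32.0   # 24.0 = 1.1b x 2^4
b = format(struct.unpack('>I', struct.pack('>f', product))[0], '032b')
print(b[0], b[1:9], b[9:])
# 0 10000011 10000000000000000000000
```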
Rounding of Floating Point Numbers
Accurate rounding requires the hardware to use a few extra bits to hold intermediate results.
These extra bits are used to decide how to round when the final result is stored in the 32-bit single precision or 64-bit double precision format.
The IEEE 754 standard uses up to three additional bits, called the guard, round, and sticky bits, to assist in accurate rounding.
See pages 297-298 of Computer Organization and Design
Rounding of Floating Point Numbers
Compute this base-ten addition, rounding all intermediate values to three significant digits:
 2.56 x 10^0
+2.34 x 10^2
First shift the decimal point of the top number to align the exponents:
2.56 x 10^0 --> 0.02 x 10^2 (rounding to three digits loses information)
 0.02 x 10^2
+2.34 x 10^2
 2.36 x 10^2
Rounding of Floating Point Numbers
Now compute the same base-ten addition using intermediate values that keep an extra two digits:
 2.56 x 10^0
+2.34 x 10^2
First shift the decimal point of the top number to align the exponents:
 0.0256 x 10^2   (intermediate values keep two extra digits)
+2.3400 x 10^2
 2.3656 x 10^2
Use the extra two digits to round the result down to three significant digits: 2.37 x 10^2
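Python's decimal module can reproduce both outcomes: its context keeps exact intermediate digits and rounds only each operation's result to prec significant digits, which mirrors the guard-digit computation; pre-rounding the aligned operand reproduces the lossy version.

```python
from decimal import Decimal, getcontext

getcontext().prec = 3  # three significant digits per operation

# Guard-digit behavior: the sum is formed exactly, then rounded once.
print(Decimal('2.56') + Decimal('2.34E+2'))    # 237, i.e. 2.37 x 10^2

# Lossy behavior: the aligned operand is rounded to 0.02 x 10^2 first.
print(Decimal('0.02E+2') + Decimal('2.34E+2'))  # 236, i.e. 2.36 x 10^2
```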
Java Applets for IEEE Floating Point
A Java applet that converts decimal numbers to IEEE single or double precision encodings can be found at:
http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html
This applet may be used to make up your own sample problems, to convert between decimal and IEEE format, and to check the results of other calculations in IEEE floating point format.
Interactive floating point addition demo:
http://tima-cmp.imag.fr/~guyot/Cours/Oparithm/english/Flottan.htm