IEEE floating point format

IEEE floating point format (V1.0, 22/10/2005) Most computers use a standard format known as the IEEE floating-point format, defined by the IEEE 754-1985 standard for binary floating-point arithmetic.

Single Precision The IEEE single precision floating point standard representation requires a 32-bit word:

  S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
  bit 0: Sign   bits 1-8: Exponent   bits 9-31: Fraction

A sign bit of 0 indicates a positive number; a 1 indicates a negative number. The exponent field is represented in "excess-127" notation. The 23 fraction bits actually represent 24 bits of precision, because a leading 1 in front of the binary point is implied (the hidden bit). In the 32-bit IEEE format, 1 bit is allocated as the sign bit, the next 8 bits as the exponent field, and the last 23 bits as the fractional part of the normalized number.
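To make the layout concrete, here is a minimal C sketch (not part of the original slides) that pulls the three fields out of a float with shifts and masks. The slide numbers bits from the most significant end (bit 0 = sign), so in the code the sign is the top bit of the 32-bit word.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float x = -6.5f;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret the float's bytes */

    uint32_t sign     = bits >> 31;              /* 1 bit                 */
    uint32_t exponent = (bits >> 23) & 0xFF;     /* 8 bits, excess-127    */
    uint32_t fraction = bits & 0x7FFFFF;         /* 23 bits               */

    printf("S=%u  E=%u (unbiased %d)  F=0x%06X\n",
           (unsigned)sign, (unsigned)exponent,
           (int)exponent - 127, (unsigned)fraction);
    return 0;
}
```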

The value may be determined as follows. For 0 < E < 255:

  V = (-1)^S * 1.F * 2^(E-127)

where
S = sign bit
E = exponent in excess-127 representation
F = fractional part in binary notation
"1.F" represents the binary number created by prefixing F with an implicit leading 1 (the hidden bit) and a binary point.
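A small C sketch of the same formula, assuming a normal number (0 < E < 255); the helper name decode_normal is ours, not from the slides.

```c
#include <math.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Rebuild V = (-1)^S * 1.F * 2^(E-127) from the extracted fields. */
static double decode_normal(uint32_t bits) {
    uint32_t s = bits >> 31;
    uint32_t e = (bits >> 23) & 0xFF;
    uint32_t f = bits & 0x7FFFFF;
    double significand = 1.0 + f / 8388608.0;        /* 1.F = 1 + F / 2^23   */
    double value = ldexp(significand, (int)e - 127); /* scale by 2^(E-127)   */
    return s ? -value : value;
}

int main(void) {
    float x = 6.5f;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    printf("decoded = %g\n", decode_normal(bits));   /* prints 6.5 */
    return 0;
}
```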

There are some exceptions

E = 255: Specials (Not a Number, +Infinity, -Infinity)
If F <> 0, then V = NaN ("Not a Number"): overflow, error, etc.
If F = 0 and S is 1, then V = -Infinity.
If F = 0 and S is 0, then V = +Infinity.

E = 0: Zeros and denormals
If F = 0 and S is 1, then V = -0.
If F = 0 and S is 0, then V = +0.
If F <> 0, then V is a denormalized number: a tiny value smaller than the smallest normalized number. Its value is V = (-1)^S * 0.F * 2^(-126).
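The two reserved exponent values can be checked directly from the bit pattern; the classify helper below is an illustrative sketch, not code from the slides.

```c
#include <stdio.h>
#include <stdint.h>

static const char *classify(uint32_t bits) {
    uint32_t e = (bits >> 23) & 0xFF;
    uint32_t f = bits & 0x7FFFFF;
    if (e == 255) return f ? "NaN" : "infinity (sign gives +/-)";
    if (e == 0)   return f ? "denormalized (tiny)" : "zero (+0 or -0)";
    return "normalized";
}

int main(void) {
    printf("%s\n", classify(0x7F800000)); /* E=255, F=0  -> infinity */
    printf("%s\n", classify(0x7FC00000)); /* E=255, F!=0 -> NaN      */
    printf("%s\n", classify(0x00000001)); /* E=0,   F!=0 -> denormal */
    printf("%s\n", classify(0x80000000)); /* E=0,   F=0  -> -0       */
    return 0;
}
```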

The range As the exponent field values 0 and 255 are reserved, the scale factor of a normalized number is restricted to the range 2^(-126) to 2^(127); the largest finite value is just under 2^128.
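These limits are also exposed by the standard C header <float.h>; the short check below (a sketch) prints them.

```c
#include <stdio.h>
#include <float.h>

int main(void) {
    printf("FLT_MIN = %e\n", FLT_MIN);   /* smallest normalized value, 2^-126, about 1.175494e-38 */
    printf("FLT_MAX = %e\n", FLT_MAX);   /* largest finite value, about 3.402823e+38 */
    return 0;
}
```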

Example

Finding the IEEE 32-bit floating point representation of the decimal number -11.5:
1. Convert to binary: -11.5 (decimal) = -1011.1 (binary)
2. Convert to normalized binary scientific notation (shift the number into the form 1.F * 2^E): -1011.1 = -1.0111 * 2^3
3. Use only the fractional part (remove the "1." preceding it: the hidden bit): F = 01110000000000000000000
4. Add 127 (excess-127 code) to the exponent and convert to binary: 3 + 127 = 130 = 10000010, so E = 10000010
5. Determine the sign bit (negative number: set to 1; otherwise set to 0): S = 1
6. Assemble the 32 bits (S & E & F): 1 10000010 01110000000000000000000, so V = 11000001001110000000000000000000
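As a quick sanity check (not part of the original walk-through), the same bit pattern can be obtained by copying the bytes of -11.5f into an integer:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float x = -11.5f;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    /* 1 10000010 01110000000000000000000 regrouped as hex is 0xC1380000 */
    printf("bits of -11.5f = 0x%08X\n", (unsigned)bits);
    return 0;
}
```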

Finding the IEEE 32-bit floating point representation of the decimal number 9 + 97/128:
1. Convert to binary: 9 + 97/128 (decimal) = 1001.1100001 (binary)
2. Convert to normalized binary scientific notation (shift the number into the form 1.F * 2^E): 1001.1100001 = 1.0011100001 * 2^3
3. Use only the fractional part (remove the "1." preceding it: the hidden bit): F = 00111000010000000000000
4. Add 127 (excess-127 code) to the exponent and convert to binary: 3 + 127 = 130 = 10000010, so E = 10000010
5. Determine the sign bit (negative number: set to 1; otherwise set to 0): S = 0
6. Assemble the 32 bits (S & E & F): 0 10000010 00111000010000000000000, so V = 01000001000111000010000000000000
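Going the other way (again a sketch, not from the slides), the hand-computed fields can be packed into a 32-bit word and reinterpreted as a float:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    /* S = 0, E = 10000010 (130 = 0x82), F = 00111000010...0 (0x1C2000) */
    uint32_t s = 0, e = 0x82, f = 0x1C2000;
    uint32_t bits = (s << 31) | (e << 23) | f;   /* 0x411C2000 */
    float x;
    memcpy(&x, &bits, sizeof x);
    printf("value = %.7f\n", x);                 /* expected: 9.7578125 = 9 + 97/128 */
    return 0;
}
```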

IEEE floating point format examples
0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0
0 11111111 00000000000000000000000 = +Infinity
1 11111111 00000000000000000000000 = -Infinity
0 11111111 00000100000000000000000 = NaN
1 11111111 01100010001001010001010 = NaN
0 10000000 00000000000000000000000 = +1.0 * 2^(128-127) = 2
0 10000001 10100000000000000000000 = +1.101 * 2^(129-127) = 6.5
1 10000001 10100000000000000000000 = -1.101 * 2^(129-127) = -6.5
0 00000001 00000000000000000000000 = +1.0 * 2^(1-127) = 2^(-126)
0 00000000 10000000000000000000000 = +0.1 * 2^(-126) = 2^(-127)
0 00000000 00000000000000000000001 = +0.00000000000000000000001 * 2^(-126) = 2^(-149) (smallest positive value)
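The last row can be reproduced directly; the short sketch below (not from the slides) builds the pattern 0x00000001 and compares it with 2^(-149).

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(void) {
    uint32_t bits = 0x00000001;        /* 0 00000000 00000000000000000000001 */
    float x;
    memcpy(&x, &bits, sizeof x);
    printf("%e\n", x);                         /* about 1.401298e-45          */
    printf("%d\n", x == ldexpf(1.0f, -149));   /* prints 1: equals 2^(-149)   */
    return 0;
}
```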

Precision In fixed-point representations, the number of digits before and after the radix point is set. In floating point, there is no fixed number of digits before and after the radix point; that is, the point can "float". Floating-point arithmetic is generally slower than fixed-point (integer) arithmetic and its results are approximations, but it can handle a much larger range of numbers. Because computers are fundamentally integer machines, encodings such as IEEE 754 are used to represent real numbers.

Floating-point numbers are only approximations, and small discrepancies in the approximations can produce meaningless results. One of the challenges in programming with floating-point values is ensuring that the approximations still lead to reasonable results.
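A classic illustration of such a discrepancy (a sketch, not from the slides): 0.1 has no exact binary representation, so adding it ten times does not give exactly 1.0.

```c
#include <stdio.h>

int main(void) {
    double sum = 0.0;
    for (int i = 0; i < 10; i++)
        sum += 0.1;                          /* each 0.1 is only an approximation */
    printf("sum = %.17g\n", sum);            /* slightly less than 1.0            */
    printf("sum == 1.0 ? %d\n", sum == 1.0); /* prints 0                          */
    return 0;
}
```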

Precision: the number of bits used to hold the fractional part. The more precision, the more exactly a fractional quantity can be represented. Floating-point numbers are often classified as single precision or double precision; a floating-point number with more precision than a single-precision number carries more digits to the right of the radix point. A double-precision number uses twice as many bits as a single-precision value, so it can represent fractional quantities much more exactly. For example, if a single-precision number requires 32 bits, its double-precision counterpart is 64 bits long. The extra bits increase not only the precision but also the range of magnitudes that can be represented; the exact amount by which precision and range increase depends on the format used to represent floating-point values. The term "double precision" is something of a misnomer, because the precision is not exactly doubled: the extra bits are divided between the fraction and the exponent.
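The difference in precision is easy to see by printing the same fraction stored in a float and in a double (an illustrative sketch, not from the slides):

```c
#include <stdio.h>

int main(void) {
    float  f = 1.0f / 3.0f;   /* about 7 significant decimal digits  */
    double d = 1.0  / 3.0;    /* about 16 significant decimal digits */
    printf("float : %.20f\n", f);
    printf("double: %.20f\n", d);
    return 0;
}
```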

Double Precision The IEEE double precision floating point standard representation requires a 64-bit word:

  S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
  bit 0: Sign   bits 1-11: Exponent   bits 12-63: Fraction

A sign bit of 0 indicates a positive number; a 1 indicates a negative number. The exponent field is represented in "excess-1023" notation. The 52 fraction bits actually represent 53 bits of precision, because a leading 1 in front of the binary point is implied (the hidden bit). In the 64-bit IEEE format, 1 bit is allocated as the sign bit, the next 11 bits as the exponent field, and the last 52 bits as the fractional part of the normalized number.

The value may be determined as follows. For 0 < E < 2047:

  V = (-1)^S * 1.F * 2^(E-1023)

where
S = sign bit
E = exponent in excess-1023 representation
F = fractional part in binary notation
"1.F" represents the binary number created by prefixing F with an implicit leading 1 (the hidden bit) and a binary point.
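For completeness, a sketch of the double-precision decode in C, mirroring the single-precision example earlier; the field widths and bias follow the layout above.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(void) {
    double x = -6.5;
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);             /* reinterpret the double's bytes */

    uint64_t s = bits >> 63;                    /* 1 bit                  */
    uint64_t e = (bits >> 52) & 0x7FF;          /* 11 bits, excess-1023   */
    uint64_t f = bits & 0xFFFFFFFFFFFFFULL;     /* 52 bits                */

    /* V = (-1)^S * 1.F * 2^(E-1023); 2^52 = 4503599627370496 */
    double value = ldexp(1.0 + (double)f / 4503599627370496.0, (int)e - 1023);
    if (s) value = -value;

    printf("S=%u E=%u F=0x%013llX  decoded=%g\n",
           (unsigned)s, (unsigned)e, (unsigned long long)f, value); /* -6.5 */
    return 0;
}
```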

There are some exceptions

E = 2047: Specials (Not a Number, +Infinity, -Infinity)
If F <> 0, then V = NaN ("Not a Number"): overflow, error, etc.
If F = 0 and S is 1, then V = -Infinity.
If F = 0 and S is 0, then V = +Infinity.

E = 0: Zeros and denormals
If F = 0 and S is 1, then V = -0.
If F = 0 and S is 0, then V = +0.
If F <> 0, then V is a denormalized number: a tiny value smaller than the smallest normalized number. Its value is V = (-1)^S * 0.F * 2^(-1022).

The range As the exponent field values 0 and 2047 are reserved, the scale factor of a normalized number is restricted to the range 2^(-1022) to 2^(1023); the largest finite value is just under 2^1024.
