Floating Point Numbers

Presentation transcript:

Floating Point Numbers (Chapters 4 and 6)

Floating Point Numbers
Registers for real numbers usually contain 32 or 64 bits, allowing 2^32 or 2^64 numbers to be represented.
Which reals to represent? There are infinitely many between two adjacent integers (or between any two reals!).
Which bit patterns for reals are selected? Answer: use scientific notation.
Consider A x 10^B, where A is one digit:
A        B      A x 10^B
0        any    0
1 .. 9   0      1 .. 9
1 .. 9   1      10 .. 90
1 .. 9   2      100 .. 900
1 .. 9   -1     0.1 .. 0.9
1 .. 9   -2     0.01 .. 0.09
Scientific notation in binary. Standard: IEEE 754 Floating-Point.

Floating Point Representation: S E F
S is one bit representing the sign of the number.
E is an 8-bit biased integer representing the exponent.
F is an unsigned integer.
The true value represented is (-1)^S x f x 2^e, where e = E - bias and f = F/2^n + 1.
For single precision numbers, n = 23 and bias = 127.
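The value formula above can be checked with a small sketch. The function name decode_single and the sample pattern 0xC1200000 are illustrative choices, not from the slides, and the sketch only handles normalized numbers (0 < E < 255).

```python
import struct

def decode_single(bits):
    """Split a 32-bit pattern into S, E, F and apply (-1)^S x f x 2^e.

    Assumes a normalized number (0 < E < 255); special values are not handled.
    """
    n, bias = 23, 127
    S = (bits >> 31) & 0x1          # 1 sign bit
    E = (bits >> n) & 0xFF          # 8-bit biased exponent
    F = bits & ((1 << n) - 1)       # 23-bit fraction field
    e = E - bias                    # true exponent
    f = F / 2**n + 1                # significand with the implied leading 1
    value = (-1) ** S * f * 2.0**e
    return S, E, F, value

# Hypothetical sample pattern: 1 10000010 0100000... = -1.25 x 2^3
S, E, F, value = decode_single(0xC1200000)

# Cross-check against the machine's own interpretation of the same bits
hardware = struct.unpack(">f", struct.pack(">I", 0xC1200000))[0]
```

Here S = 1, E = 130 and f = 1.25, so the decoded value is -10.0, matching the hardware interpretation.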

Floating Point
S, E, F all represent fields within a representation. Each is just a bunch of bits.
S is the sign bit: (-1)^S, so (-1)^0 = +1 and (-1)^1 = -1. It is just a sign bit for signed magnitude.
E is the exponent field. The E field is a biased-127 representation; the true exponent is (E - bias).
The base (radix) is always 2 (implied). Some early machines used radix 4 or 16 (IBM).
F (or M) is the fractional or mantissa field. It is in a strange form: there are 23 bits for F, but a normalized FP number always has a leading 1, so there is no need to store the one; just assume it. This MSB is called the HIDDEN BIT.

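Floating Point: 64.2 into IEEE SP
Get a binary representation for 64.2.
Binary left of the radix point: 64 is 1000000
Binary right of the radix point, by repeated doubling:
.2 x 2 = 0.4    0
.4 x 2 = 0.8    0
.8 x 2 = 1.6    1
.6 x 2 = 1.2    1
Binary for .2: 0.0011 0011 0011 ... (repeating)
64.2 is: 1000000.0011 0011 0011 ...
Normalize the binary form. Produces: 1.000000 0011 0011 0011 ... x 2^6
Turn the true exponent into bias-127: 6 + 127 = 133 = 10000101
Put it together:
23-bit F is: 00000000110011001100110
S E F is: 0 10000101 00000000110011001100110
In hex: 0x4280 6666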
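The hand conversion of 64.2 can be sketched in code as a check. This is a minimal sketch under stated assumptions: the name encode_single_truncating is hypothetical, the input is positive and at least 1, and extra bits are truncated rather than rounded (for 64.2 the two happen to agree).

```python
import struct
from fractions import Fraction

def encode_single_truncating(text):
    """Hand-style conversion of a positive decimal >= 1 to single-precision bits.

    Assumption: bits beyond the 23-bit F field are simply dropped (no rounding).
    """
    x = Fraction(text)                  # exact decimal value, e.g. 321/5
    whole = int(x)
    frac = x - whole
    int_bits = bin(whole)[2:]           # bits left of the radix point
    frac_bits = ""
    # Repeated doubling: each step yields the next bit right of the radix point
    while len(int_bits) + len(frac_bits) < 24:   # hidden bit + 23 F bits
        frac *= 2
        bit = int(frac)
        frac_bits += str(bit)
        frac -= bit
    e = len(int_bits) - 1               # normalize: radix point moves left e places
    E = e + 127                         # bias-127 exponent
    F = (int_bits + frac_bits)[1:24]    # drop the hidden leading 1
    return (E << 23) | int(F, 2)        # S = 0 for a positive number

by_hand = encode_single_truncating("64.2")
by_machine = struct.unpack(">I", struct.pack(">f", 64.2))[0]
```

Both by_hand and by_machine should come out as 0x42806666 for this input.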

Floating Point
Since floating point numbers are always stored in normal form, how do we represent 0?
0x0000 0000 and 0x8000 0000 represent 0.
What numbers cannot be represented because of this?

Floating Point
Other special values:
+5 / 0 = +∞    0 11111111 00000… (0x7f80 0000)
-7 / 0 = -∞    1 11111111 00000… (0xff80 0000)
0/0 or +∞ + -∞ = NaN (Not a Number)    ? 11111111 ?????… (S is either 0 or 1, E = 0xff, and F is anything but all zeroes)
Also de-normalized numbers (beyond scope).
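The special bit patterns can be inspected directly. The helper bits_to_float is a hypothetical name, and 0x7FC00000 is just one of the many valid NaN patterns. Note that Python itself raises ZeroDivisionError for 5 / 0 instead of returning infinity, so the sketch builds the values from their bits.

```python
import math
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

pos_inf = bits_to_float(0x7F800000)    # S=0, E=0xff, F=0
neg_inf = bits_to_float(0xFF800000)    # S=1, E=0xff, F=0
nan = bits_to_float(0x7FC00000)        # E=0xff, F nonzero (one possible NaN)
neg_zero = bits_to_float(0x80000000)   # the second pattern for zero

inf_minus_inf = pos_inf + neg_inf      # +inf + -inf is NaN
```

A NaN never compares equal to anything, including itself, so math.isnan is the reliable test.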

Floating Point
What is 0x4228 0000 in SP FP notation?

Floating Point
What is 47.625 as an SP FP number?

What do floating-point numbers represent?
Rational numbers with non-repeating expansions in the given base within the specified exponent range.
They do not represent repeating rational or irrational numbers, or any number too small (0 < |x| < B^emin, underflow) or too large (|x| > B^emax, overflow).
IEEE Double Precision is similar to SP:
52-bit M (53 bits of precision with hidden bit)
11-bit E, excess 1023, representing -1022:1023
One sign bit
Always use DP unless memory/file size is important.
SP ~ 10^-38 … 10^38
DP ~ 10^-308 … 10^308
Be very careful of these ranges in numeric computation.
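A short sketch of these limits, assuming Python floats are IEEE double precision (true on common platforms). The variable names are illustrative.

```python
import sys
from fractions import Fraction

# 0.1 has a repeating binary expansion, so the stored value is only a nearby rational
tenth_is_exact = Fraction(0.1) == Fraction(1, 10)

# 0.625 = 0.101 in binary terminates, so it is stored exactly
five_eighths_is_exact = Fraction(0.625) == Fraction(5, 8)

precision_bits = sys.float_info.mant_dig    # 53: 52-bit M plus the hidden bit
largest = sys.float_info.max                # about 1.8 x 10^308
smallest_normal = sys.float_info.min        # about 2.2 x 10^-308

overflowed = largest * 10                   # too large: becomes infinity
underflowed = smallest_normal / 2**60       # too small even for denormals: becomes 0
```

Neither the overflow nor the underflow raises an error here, which is one reason to be careful about these ranges.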

Floating Point: Chapter 6
Floating Point operations include ?
They are complicated because…

Floating Point Addition
9.997 x 10^2 + 4.631 x 10^-1
Align decimal points: 9.997 x 10^2 + 0.004631 x 10^2
Add: 10.001631 x 10^2
Normalize the result (often already normalized; otherwise move one digit): 1.0001631 x 10^3
Round result: 1.000 x 10^3
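The decimal example can be reproduced with Python's decimal module, which performs the align, add, normalize and round steps internally. The four-digit precision matches the example; the round-half-even mode is an assumption, since the slide does not name a rounding mode.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

a = Decimal("9.997E2")
b = Decimal("4.631E-1")

# Exact sum under the default (28-digit) context: 1000.1631
exact = a + b

# The same addition limited to 4 significant digits, as in the worked example
four_digits = Context(prec=4, rounding=ROUND_HALF_EVEN)
rounded = four_digits.add(a, b)
```

The rounded result equals 1.000 x 10^3; the contribution of the smaller operand is lost entirely.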

Floating Point Addition
.25 = 0 01111101 00000000000000000000000
100 = 0 10000101 10010000000000000000000
Or with hidden bit:
.25 = 0 01111101 1 00000000000000000000000
100 = 0 10000101 1 10010000000000000000000
1. Align radix points
Shifting F left by 1 bit means decreasing e by 1.
Shifting F right by 1 bit means increasing e by 1.
Shift the mantissa right so that the least significant bits fall off.
Which should we shift?

Floating Point Addition
Shift the .25 to increase its exponent.
01111101 - 1111111 (127) = -2
10000101 - 1111111 (127) = 6
Shift smaller by: 6 - (-2) = 8
Bias cancels with subtraction, so
  10000101
- 01111101
  00001000
Carefully shifting,
0 01111101 1 00000000000000000000000 (original value)
0 01111110 0 10000000000000000000000 (shifted by 1)
0 01111111 0 01000000000000000000000 (shifted by 2)
0 10000000 0 00100000000000000000000 (shifted by 3)
0 10000001 0 00010000000000000000000 (shifted by 4)
0 10000010 0 00001000000000000000000 (shifted by 5)
0 10000011 0 00000100000000000000000 (shifted by 6)
0 10000100 0 00000010000000000000000 (shifted by 7)
0 10000101 0 00000001000000000000000 (shifted by 8)

Floating Point Addition
2. Add with hidden bit
  0 10000101 1 10010000000000000000000 (100)
+ 0 10000101 0 00000001000000000000000 (.25)
  0 10000101 1 10010001000000000000000
3. Normalize the result
Get a '1' back in the hidden bit. Here the sum is already normalized.
4. Round result
Renormalize if needed.
0 10000101 10010001000000000000000
Post-normalization example:
  s exp h  frtn
  0 011 1  1100
+ 0 011 1  1011
  0 011 11 0111
  0 100 1  1011   (1 discarded)
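The addition steps can be sketched on the E and F fields directly. The function fp_add_positive is a hypothetical helper: it assumes both operands are positive and normalized, and it truncates bits that fall off instead of rounding them, as the post-normalization example does.

```python
def fp_add_positive(E1, F1, E2, F2, n=23):
    """Add two positive normalized numbers given as (E, F) fields.

    Assumptions: no special values, and shifted-out bits are discarded.
    """
    m1 = (1 << n) | F1                  # restore the hidden bit
    m2 = (1 << n) | F2
    if E1 < E2:                         # make operand 1 the one with the larger exponent
        E1, m1, E2, m2 = E2, m2, E1, m1
    m2 >>= E1 - E2                      # 1. align: shift the smaller right
    total = m1 + m2                     # 2. add with hidden bit
    E = E1
    if total >> (n + 1):                # 3. carry past the hidden bit: post-normalize
        total >>= 1                     #    low bit is discarded
        E += 1
    return E, total & ((1 << n) - 1)    # drop the hidden bit again

# .25 + 100 in single precision
E, F = fp_add_positive(0b01111101, 0, 0b10000101, 0b10010000000000000000000)

# The 3-bit-exponent, 4-bit-fraction post-normalization example
E_small, F_small = fp_add_positive(0b011, 0b1100, 0b011, 0b1011, n=4)
```

The first call returns E = 10000101 and F = 10010001000000000000000 (100.25); the second returns exponent 100 and fraction 1011.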

Floating Point Subtraction
Mantissas are sign-magnitude.
Watch out when the numbers are close:
  1.23456 x 10^2
- 1.23455 x 10^2
  0.00001 x 10^2 = 1.00000 x 10^-3
A many-digit normalization is possible. This is why FP addition is in many ways more difficult than FP multiplication.

Floating Point Subtraction
Align radix points
Perform sign-magnitude operand swap: compare magnitudes (with hidden bit) and change the sign bit if the order of operands is changed.
Subtract
Normalize
Round
Renormalize if needed
  s exp h frtn
  0 011 1 1011   smaller
- 0 011 1 1101   bigger
Switch and make the difference negative:
  0 011 1 1101   bigger
- 0 011 1 1011   smaller
  1 011 0 0010   switch sign
  1 000 1 0000   normalized
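A minimal sketch of the subtraction steps for the toy format in the example (4-bit fraction). The name fp_sub_positive is hypothetical; the sketch assumes positive normalized operands, truncates on alignment, and does not guard against the exponent running below the format's range.

```python
def fp_sub_positive(E1, F1, E2, F2, n=4):
    """Compute x1 - x2; returns (s, exp, h, frtn) like the columns in the example.

    Assumptions: both inputs positive and normalized, no rounding, no range checks.
    """
    m1 = (1 << n) | F1                  # restore hidden bits
    m2 = (1 << n) | F2
    if E1 >= E2:                        # align radix points
        m2 >>= E1 - E2
        E = E1
    else:
        m1 >>= E2 - E1
        E = E2
    s = 0
    if m1 < m2:                         # operand swap: subtract smaller from bigger
        m1, m2 = m2, m1
        s = 1                           # and switch the sign of the result
    diff = m1 - m2
    if diff == 0:
        return 0, 0, 0, 0
    while not (diff >> n) & 1:          # normalize: shift left until the hidden bit is 1
        diff <<= 1
        E -= 1
    return s, E, 1, diff & ((1 << n) - 1)

result = fp_sub_positive(0b011, 0b1011, 0b011, 0b1101)
```

For the operands in the example this gives s = 1, exp = 000, h = 1, frtn = 0000, after a three-bit normalization shift.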

Floating Point Multiplication
Multiplication example: 3.0 x 10^1 x 5.0 x 10^2
Multiply mantissas: 3.0 x 5.0 = 15.00
Add exponents: 1 + 2 = 3, giving 15.00 x 10^3
Normalize if needed: 1.50 x 10^4

Floating Point Multiplication
Multiplication in binary (4-bit F):
0 10000100 0100 x 1 00111100 1100
Multiply mantissas:
        1.0100
      x 1.1100
         00000
        00000
       10100
      10100
     10100
    1000110000
= 10.00110000 (8 fraction bits)

Floating Point Multiplication
Add exponents, subtract extra bias.
Biased: 10000100 + 00111100 = 11000000; subtracting the bias 01111111 gives 01000001.
Check by converting to true exponents and adding: 5 + (-67) = -62, and -62 + 127 = 65 = 01000001.
Renormalize, correcting exponent:
1 01000001 10.00110000
becomes
1 01000010 1.000110000
Drop the hidden bit:
1 01000010 000110000
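The multiplication steps can be sketched for the 4-bit-F format used in the example. The name fp_mul is hypothetical, and the sketch truncates the product back to 4 fraction bits, whereas the worked example keeps all 9 product bits after the radix point.

```python
def fp_mul(S1, E1, F1, S2, E2, F2, n=4, bias=127):
    """Multiply two normalized numbers given as (S, E, F) fields.

    Assumptions: no special values, and the product fraction is truncated to n bits.
    """
    S = S1 ^ S2                         # signs: differing signs give a negative product
    m1 = (1 << n) | F1                  # restore hidden bits
    m2 = (1 << n) | F2
    product = m1 * m2                   # 2n fraction bits, value in [1, 4)
    E = E1 + E2 - bias                  # add exponents, subtract the extra bias
    if product >> (2 * n + 1):          # product >= 2: renormalize
        E += 1
        shift = n + 1
    else:
        shift = n
    F = (product >> shift) & ((1 << n) - 1)   # drop hidden bit and extra low bits
    return S, E, F

S, E, F = fp_mul(0, 0b10000100, 0b0100, 1, 0b00111100, 0b1100)
```

This yields S = 1, E = 01000010 and F = 0001, the first four bits of the fraction 000110000 from the worked example.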

Floating Point Division
True division:
Unsigned, full-precision division on mantissas. This is much more costly (e.g. 4x) than multiplication.
Subtract exponents.
Faster division:
Newton's method to find the reciprocal.
Multiply the dividend by the reciprocal of the divisor.
May not yield an exact result without some work.
Similar speed to multiplication.
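A sketch of the faster approach, using the Newton iteration x <- x * (2 - d * x) for the reciprocal. The function names, the iteration count and the linear starting guess are assumptions for illustration, not taken from the slides, and the sketch only handles positive divisors.

```python
import math

def reciprocal_newton(d, iterations=5):
    """Approximate 1/d for d > 0 using only multiplies and subtracts."""
    m, e = math.frexp(d)                # d = m * 2**e with 0.5 <= m < 1
    x = 48 / 17 - 32 / 17 * m           # a common linear first guess for 1/m
    for _ in range(iterations):
        x = x * (2 - m * x)             # each step roughly doubles the correct bits
    return math.ldexp(x, -e)            # undo the scaling: 1/d = (1/m) * 2**-e

def divide_newton(a, d):
    """Multiply the dividend by the reciprocal of the divisor."""
    return a * reciprocal_newton(d)

quotient = divide_newton(7.0, 3.0)
```

The result agrees with 7.0 / 3.0 to within rounding error but is not guaranteed to match it bit for bit, which is the "some work" the slide refers to.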