Floating Point Arithmetic


Hardware vs. Software
The ALU (Arithmetic Logic Unit) can be built to perform floating point arithmetic directly in hardware. This is faster but more expensive, although the cost is less of an issue as technology improves.
Alternatively, floating point operations can be simulated with multiple integer operations, with the code supplied by the compiler. This is slower but allows cheaper hardware.

IEEE Floating Point Layout
Single precision (32 bits): the leftmost bit is the sign bit, the next 8 bits are the exponent, and the last 23 bits are the mantissa.
Double precision (64 bits): the leftmost bit is the sign bit, the next 11 bits are the exponent, and the last 52 bits are the mantissa.
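
The layout can be inspected from software. The following is a minimal sketch, not part of the original slides: the function names float32_fields and float64_fields are made up for illustration, and the exponent returned is the raw stored field, not the real exponent.

```python
import struct

def float32_fields(x):
    """Return (sign, exponent field, mantissa field) of x as a 32-bit float."""
    # Pack x as a single-precision float, then reread the same 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # leftmost bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits
    mantissa = bits & 0x7FFFFF       # last 23 bits
    return sign, exponent, mantissa

def float64_fields(x):
    """Return (sign, exponent field, mantissa field) of x as a 64-bit float."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # leftmost bit
    exponent = (bits >> 52) & 0x7FF      # next 11 bits
    mantissa = bits & ((1 << 52) - 1)    # last 52 bits
    return sign, exponent, mantissa

# -6.5 is -1.101 (binary) times 2**2
print(float32_fields(-6.5))
print(float64_fields(-6.5))
```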

Floating Point Addition
Addition is performed in several steps:
1. Line up the binary points, so that the exponents are the same.
2. Add the mantissas. The exponent of the result is the same as the exponent of the operands.
3. Normalize if necessary, placing the result in proper scientific notation.

Equalizing Exponents
In math, we could shift the value with the larger exponent left, decreasing its exponent until the exponents are equal. But the hardware has no place to shift the value left into: the leftmost bit already sits at the implied binary point.
Instead, we must shift the value with the smaller exponent right and increase its exponent. The bits shifted off the right end are lost. They are the insignificant low order bits, so they won't affect the answer much. Some hardware has extra bits used just for the computation, not for the answer.

Adding
Now the bits of the mantissas can be added, just like adding integers (but with fewer than 32 bits). The exponent of the answer is the same as the exponent of the operands.

Normalizing
In scientific notation, the mantissa of each operand is between 1 and 2. After the exponents are made equal, the shifted mantissa is between 0 and 2, so the sum is between 1 and 4. If the operands have opposite signs, the mantissas are effectively subtracted and the result can be anywhere between 0 and 2 in absolute value.
We may need to shift the result left to get a 1 bit into the leftmost bit of the answer, or shift the result right to bring it back into the proper range. Either shift requires a matching adjustment of the exponent.
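
The three steps of addition (equalize the exponents, add, normalize) can be sketched in software. This is a simplified illustration under stated assumptions, not the circuit itself: it handles positive normalized operands only, uses real (unbiased) exponents, truncates the bits that are shifted off, and ignores overflow. The names fp_add, to_float, FRACTION_BITS and HIDDEN_ONE are invented for the example.

```python
FRACTION_BITS = 23                 # mantissa width of single precision
HIDDEN_ONE = 1 << FRACTION_BITS    # the implied leading 1, as an integer

def fp_add(exp_a, sig_a, exp_b, sig_b):
    """Add two positive values of the form sig * 2**(exp - FRACTION_BITS).

    Each sig is a 24-bit integer that includes the leading 1,
    so HIDDEN_ONE <= sig < 2 * HIDDEN_ONE.
    """
    # Make operand a the one with the larger exponent.
    if exp_a < exp_b:
        exp_a, sig_a, exp_b, sig_b = exp_b, sig_b, exp_a, sig_a
    # Equalize: shift the smaller operand right; its low-order bits are lost.
    sig_b >>= exp_a - exp_b
    # Add the mantissas like integers; the exponent is that of the operands.
    exp, sig = exp_a, sig_a + sig_b
    # Normalize: the sum is in [1, 4), so at most one right shift is needed.
    if sig >= 2 * HIDDEN_ONE:
        sig >>= 1
        exp += 1
    return exp, sig

def to_float(exp, sig):
    """Convert the (exp, sig) pair back to a Python float for checking."""
    return sig * 2.0 ** (exp - FRACTION_BITS)

# 12.0 is 1.5 * 2**3 and 2.5 is 1.25 * 2**1
twelve = (3, 3 * HIDDEN_ONE // 2)
two_and_a_half = (1, 5 * HIDDEN_ONE // 4)
print(to_float(*fp_add(*twelve, *two_and_a_half)))
```

Operands of opposite sign would also need the left-shift case described above, which this sketch leaves out.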

Correct Results
What happens when we add two values of very different magnitude? We must shift one of the values many places, and the rightmost bits “fall off” the end. The answer will not be exact, but it will be very close.
When would this matter? Consider summing many, many values with Sum = Sum + A[I]. Sum can get so big compared to A[I] that adding A[I] does not change Sum at all.
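
A small experiment shows the effect. This sketch uses Python floats, which are IEEE double precision; the values are chosen for illustration. Near 1e16 adjacent doubles are 2 apart, so adding 1.0 is lost every time, while adding the small values together first keeps them.

```python
big = 1e16
values = [1.0] * 1000

# Sum = Sum + A[I] with a Sum that is already huge: every 1.0 falls off the end.
total = big
for v in values:
    total = total + v

# Adding the small values first lets them accumulate before meeting the big one.
small_first = 0.0
for v in values:
    small_first = small_first + v
small_first = small_first + big

print(total == big)         # the running sum never changed
print(small_first - big)    # the 1000 small additions survived
```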

Multiplication
Multiplication is actually a little easier than addition:
1. Do unsigned multiplication with the mantissas.
2. Add the exponents.
3. Normalize the result.
4. Set the sign bit of the result.

Multiplication Details
We have already done unsigned multiplication. To add the exponents, we need to look at the notation. The exponents use excess 127 notation, so each stored exponent is the real exponent plus 127:
fpe1 = reale1 + 127
fpe2 = reale2 + 127
fpe1 + fpe2 = reale1 + reale2 + 127 + 127
The sum contains the excess twice, so we need to subtract 127 from it to get the correct stored exponent of the result.

Sign
The sign of the result depends on the signs of the operands. If both operands have the same sign, the result is positive; otherwise the result is negative.
S1  S2  R
0   0   0
0   1   1
1   0   1
1   1   0
This is the XOR function. Of course, we must still normalize the result, which may take many more shifts.
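
The multiplication steps can be put together in a short sketch. It is an illustration under assumptions rather than a full implementation: operands are normalized single-precision style values, the product is truncated instead of rounded, and exponent overflow and underflow are ignored. The name fp_multiply and the constants are invented for the example.

```python
BIAS = 127                         # excess 127 notation
FRACTION_BITS = 23
HIDDEN_ONE = 1 << FRACTION_BITS    # the implied leading 1, as an integer

def fp_multiply(sign_a, fpe_a, sig_a, sign_b, fpe_b, sig_b):
    """Multiply two values given as (sign bit, stored exponent, 24-bit mantissa)."""
    # Unsigned multiplication of the mantissas, hidden bits included.
    product = sig_a * sig_b
    # Add the stored exponents and remove the extra copy of the excess.
    fpe = fpe_a + fpe_b - BIAS
    # Drop the extra 23 fraction bits of the double-width product (truncation).
    sig = product >> FRACTION_BITS
    # Normalize: the product of two mantissas in [1, 2) lies in [1, 4).
    if sig >= 2 * HIDDEN_ONE:
        sig >>= 1
        fpe += 1
    # The sign of the result is the XOR of the operand signs.
    sign = sign_a ^ sign_b
    return sign, fpe, sig

# 3.0 is +1.5 * 2**1 and -6.0 is -1.5 * 2**2; the product is -1.125 * 2**4
three = (0, 1 + BIAS, 3 * HIDDEN_ONE // 2)
minus_six = (1, 2 + BIAS, 3 * HIDDEN_ONE // 2)
print(fp_multiply(*three, *minus_six))
```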

True Division
1. Do unsigned division on the mantissas, as discussed with integers.
2. Subtract the exponents. The excesses cancel, so we now need to add 127 to get the correct representation of the value.
3. Normalize the result, the same as in the previous methods.
4. Set the sign, the same as with multiplication.
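
Division follows the same pattern, as the sketch below suggests. The same assumptions apply as for any simplified model: normalized operands, a truncated quotient, and no handling of zero, overflow or underflow. The name fp_divide is invented, and computing one extra quotient bit before normalizing is a choice of this sketch, not something the slides specify.

```python
BIAS = 127                         # excess 127 notation
FRACTION_BITS = 23
HIDDEN_ONE = 1 << FRACTION_BITS    # the implied leading 1, as an integer

def fp_divide(sign_a, fpe_a, sig_a, sign_b, fpe_b, sig_b):
    """Divide two values given as (sign bit, stored exponent, 24-bit mantissa)."""
    # Unsigned division of the mantissas, keeping 24 fraction bits.
    # The quotient of two mantissas in [1, 2) lies in (0.5, 2).
    quotient = (sig_a << (FRACTION_BITS + 1)) // sig_b
    # Subtracting the stored exponents cancels the excess, so add it back.
    fpe = fpe_a - fpe_b + BIAS
    if quotient >= 2 * HIDDEN_ONE:
        # Quotient in [1, 2): drop the extra bit, exponent unchanged.
        quotient >>= 1
    else:
        # Quotient in (0.5, 1): the extra bit acts as a left shift by one.
        fpe -= 1
    sign = sign_a ^ sign_b             # same rule as multiplication
    return sign, fpe, quotient

# 3.0 / -6.0 is -0.5, which is -1.0 * 2**-1
three = (0, 1 + BIAS, 3 * HIDDEN_ONE // 2)
minus_six = (1, 2 + BIAS, 3 * HIDDEN_ONE // 2)
print(fp_divide(*three, *minus_six))
```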

Division by Reciprocal
Calculate a/b as a * (1/b). This is useful only if we can compute 1/b without using division, which we can do with a Newton-Raphson technique (discussed in CSCI 381):
Repeat r = r * (2 - r*b) until r does not change.
r starts as a first guess at the reciprocal and gets closer with each iteration.
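
The iteration is short enough to try directly. The slides do not say how the first guess is chosen, so the guess below is an assumption of this sketch: it takes a power of two within a factor of 2 of 1/b, which is close enough for the iteration to converge. The names reciprocal and divide and the iteration cap are also invented, and only b > 0 is handled.

```python
import math

def reciprocal(b, max_iterations=50):
    """Approximate 1/b for b > 0 using only multiplication and subtraction."""
    # First guess (assumed): b = m * 2**e with 0.5 <= m < 1, so 2**-e is
    # between 0.5/b and 1/b.
    _, e = math.frexp(b)
    r = math.ldexp(1.0, -e)
    for _ in range(max_iterations):
        next_r = r * (2.0 - r * b)
        if next_r == r:        # r no longer changes
            break
        r = next_r
    return r

def divide(a, b):
    """Compute a/b as a * (1/b)."""
    return a * reciprocal(b)

print(reciprocal(3.0))
print(divide(7.0, 2.0))
```

The iteration cap guards against the case where the last bit keeps flipping between two neighbouring values.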

Errors
Floating point numbers are not exact. Do NOT compare floating point numbers for equality: for example, adding 0.1 ten times does not give exactly 1. Instead of using “if (a == b)” when a and b are floating point, use “if (abs(a-b) < .0001)” or some other reasonable measure of “close enough”.
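
A quick check in Python (whose floats are IEEE doubles) illustrates the point. The helper name close_enough is made up for this example, and the tolerance is the one suggested above; a suitable tolerance depends on the magnitudes involved.

```python
def close_enough(a, b, tolerance=0.0001):
    """Compare two floats using an absolute tolerance instead of ==."""
    return abs(a - b) < tolerance

# Add 0.1 ten times; 0.1 has no exact binary representation.
total = 0.0
for _ in range(10):
    total = total + 0.1

print(total == 1.0)              # False
print(close_enough(total, 1.0))  # True
```

Python's standard library also provides math.isclose, which compares with a relative tolerance by default.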

Rounding in Base 2
Round to nearest: ties are broken so that the least significant bit of the result is 0.
Round towards 0: truncation.
Round towards positive infinity: round up (careful with negative values).
Round towards negative infinity: round down.
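
The four modes can be illustrated with Python's built-in rounding functions. This is only an analogy: the functions below round to a whole number rather than to the last mantissa bit, but the tie-breaking and direction rules are the same.

```python
import math

# Round to nearest, ties going to the even neighbour.
nearest = [round(x) for x in (0.5, 1.5, 2.5, 3.5)]

# Round towards 0: truncation.
toward_zero = [math.trunc(x) for x in (2.7, -2.7)]

# Round towards positive infinity: note what happens to the negative value.
toward_pos_inf = [math.ceil(x) for x in (2.1, -2.9)]

# Round towards negative infinity.
toward_neg_inf = [math.floor(x) for x in (2.9, -2.1)]

print(nearest)          # [0, 2, 2, 4]
print(toward_zero)      # [2, -2]
print(toward_pos_inf)   # [3, -2]
print(toward_neg_inf)   # [2, -3]
```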

Overflow and Underflow
Overflow for integers occurs when the result is too big to be held in the number of bits allocated. The same is true for floating point, but there it is determined more by the size of the exponent field than by the size of the mantissa field.
Underflow occurs when a value becomes so small in magnitude that it becomes 0. Again, this is related to the exponent field, but with negative exponents.
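
Both effects are easy to provoke with Python floats (IEEE doubles). The sketch below uses the limits reported by sys.float_info; the variable names are chosen for illustration.

```python
import math
import sys

largest = sys.float_info.max           # largest finite double
smallest_normal = sys.float_info.min   # smallest positive normalized double

# Overflow: the exponent needed is too large, so the result becomes infinity.
overflowed = largest * 2.0

# Underflow: the exponent needed is too negative, so the result becomes 0.
underflowed = smallest_normal * smallest_normal

print(overflowed)
print(underflowed)
```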