Floating Point Arithmetic


Floating Point Arithmetic
The goal of floating point representation is to represent a large range of numbers.
Important Terms
Given the number -123.154 x 10^5:
  Sign = negative
  Mantissa = 123.154
  Exponent = 5

IEEE Binary Floating-Point Representation

Storage of Floating Point Binary Numbers
Short real (single precision, 32 bits):
  Bit 31    Bits 30-23    Bits 22-0
  1         11111111      11111111111111111111111
  Sign      Exponent      Mantissa
Long real (double precision, 64 bits): 1 bit for sign, 11 bits for exponent, 52 bits for mantissa.

Storage Components
The Sign
  The sign is positive (a 0 bit) or negative (a 1 bit).
The Mantissa (Significand)
  The bits to the right of the binary point are the mantissa, or significand.
  The numeral to the left of the binary point is ALWAYS 1 (normalized notation).
The Exponent
  The exponent can be either positive or negative.
  The stored exponent is biased by +127 (single precision).

The Significand (Positional Notation)

The Significand Must be Normalized
Decimal example: 1234.567 = 1.234567 x 10^3
Numbers are normalized by moving the point so that only one nonzero digit appears to its left.
  1101.101 = 1.101101 x 2^3    (exponent = 3)
  0.00101  = 1.01 x 2^-3       (exponent = -3)
Note that the leading 1 is omitted from storage.

IEEE Bit Representation

The Exponent is Biased by +127

Exponent Encoding
Exponent encoding is bias 127. To get the encoding, take the exponent and add 127 to it.
  If exponent is -1, then exponent field = -1 + 127 = 126 = 7Eh
  If exponent is 10, then exponent field = 10 + 127 = 137 = 89h
Smallest allowed exponent is -126, largest allowed exponent is +127. This leaves the encodings 00h and FFh unused for normal numbers.
BR 6/00
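The bias-127 rule can be sketched in a few lines of Python. The helper names `encode_exponent` and `exponent_field` are illustrative and do not come from the slides; the second helper assumes Python's `struct` module packs the `f` format as IEEE single precision, which is the case on common platforms.

```python
import struct

BIAS = 127  # single-precision exponent bias

def encode_exponent(exponent):
    """Return the 8-bit exponent field for a normal number."""
    if not -126 <= exponent <= 127:
        raise ValueError("exponent outside the normal range -126..+127")
    return exponent + BIAS

def exponent_field(value):
    """Extract the stored exponent field (bits 30-23) of a value packed as single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return (bits >> 23) & 0xFF

print(hex(encode_exponent(-1)))   # 0x7e
print(hex(encode_exponent(10)))   # 0x89
print(exponent_field(0.5))        # 126, because 0.5 = 1.0 x 2^-1
```

Reading the field back from 0.5 gives the same 126 that the slide computes for an exponent of -1.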

Floating Point Encoding
The number of bits allocated for the exponent determines the range of the floating point numbers: from 1.0 x 2^-max (small number) to 1.0 x 2^+max (large number).
The number of bits allocated for the significand determines the precision of the floating point number.
The sign needs only one bit (negative: 1, positive: 0).

Convert Floating Point Binary Format to Decimal
  1 10000001 01000000000000000000000
What is the number shown?
Sign bit = 1, so negative.
Exponent field = 81h = 129. Actual exponent = exponent field - 127 = 129 - 127 = 2.
Number is:
  -1.01000...000 x 2^2
  = -(1 + 0 x 2^-1 + 1 x 2^-2 + 0 x 2^-3 + ... + 0) x 4
  = -(1 + 0 + 0.25 + 0 + ... + 0) x 4
  = -1.25 x 4
  = -5.0
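The same decoding steps can be written out as a short Python sketch. The function name `decode_single` is an assumption made for illustration, and the sketch only handles normal numbers (exponent field 1 to 254), which is all the worked example needs.

```python
def decode_single(bit_string):
    """Decode a 32-bit pattern of a normal number, given as '0'/'1' characters (spaces allowed)."""
    bits = bit_string.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:9], 2) - 127          # remove the bias
    # Significand = implied leading 1 plus fraction bits weighted 2^-1, 2^-2, ...
    significand = 1.0 + sum(2.0 ** -(i + 1) for i, b in enumerate(bits[9:]) if b == "1")
    return sign * significand * 2.0 ** exponent

print(decode_single("1 10000001 01000000000000000000000"))  # -5.0
```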

Convert FP Decimal to Binary Encoding
What is the number -28.75 in Single Precision Floating Point?
1. Ignore the sign, and convert the integer and fractional parts to binary first:
   a. 28 = 1Ch = 0001 1100
   b. .75 = .5 + .25 = 2^-1 + 2^-2 = .11
   -28.75 in binary is -00011100.11 (ignore leading zeros)
2. Now NORMALIZE the number to the format 1.mmmm x 2^exp.
   Normalize by shifting. Each shift right adds one to the exponent; each shift left subtracts one from the exponent:
   -11100.11 x 2^0 = -1110.011 x 2^1 = -111.0011 x 2^2 = -11.10011 x 2^3 = -1.110011 x 2^4

Convert Decimal FP to Binary Encoding (cont)
Normalized number is: -1.110011 x 2^4
Sign bit = 1
Significand field = 110011000...000
Exponent field = 4 + 127 = 131 = 83h = 1000 0011
Complete 32-bit number is:
  1       10000011    110011000...000
  Sign    Exponent    Mantissa
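One way to check a hand encoding is to let Python produce the bit pattern and split it into the three fields. This is a sketch, not part of the original slides: `encode_single` is a made-up helper name, and it assumes `struct` packs `f` as IEEE single precision.

```python
import struct

def encode_single(value):
    """Return (sign, exponent, mantissa) bit strings of value packed as IEEE single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    pattern = format(bits, "032b")
    return pattern[0], pattern[1:9], pattern[9:]

sign, exponent, mantissa = encode_single(-28.75)
print(sign, exponent, mantissa)  # sign 1, exponent 10000011, mantissa 110011 then 17 zeros
```

The three fields agree with the sign bit, exponent field (83h) and significand field worked out by hand for -28.75.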

Algorithm for Converting Fractional Decimal to Binary
An algorithm for converting any fractional decimal number to its binary representation is successive multiplication by two (equivalent to shifting left). It determines the bits from MSB to LSB.
  a. Multiply the fraction by 2.
  b. If the number >= 1.0, then the current bit = 1, else the current bit = 0.
  c. Take the fractional part of the number and go to 'a'. Continue until the fractional number is 0 or the desired precision is reached.
Example: Convert .5625 to binary
  .5625 x 2 = 1.125   (>= 1.0, so MSB bit = 1)
  .125 x 2 = .25      (< 1.0, so bit = 0)
  .25 x 2 = .5        (< 1.0, so bit = 0)
  .5 x 2 = 1.0        (>= 1.0, so bit = 1), finished.
  .5625 = .1001b
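The successive-multiplication algorithm translates almost directly into code. The sketch below is one possible rendering in Python; the name `fraction_to_binary` and the `max_bits` cutoff (standing in for "desired precision") are assumptions for illustration.

```python
def fraction_to_binary(fraction, max_bits=23):
    """Convert a fraction in [0, 1) to a string of binary digits, MSB first."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    bits = []
    while fraction != 0.0 and len(bits) < max_bits:
        fraction *= 2.0                 # multiply the fraction by two
        if fraction >= 1.0:             # the integer part is the next bit
            bits.append("1")
            fraction -= 1.0             # keep only the fractional part
        else:
            bits.append("0")
    return "." + "".join(bits)

print(fraction_to_binary(0.5625))  # .1001
```

A fraction such as 0.1 never reaches zero, so the loop stops at `max_bits` and the result is only an approximation.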

Overflow/Underflow, Double Precision
Overflow in floating point means producing a number whose magnitude is too big to represent; underflow means producing one whose magnitude is too small. Both depend on the exponent size.
For single precision, normal numbers range from 2^-126 to 2^+127, which is roughly 10^-38 to 10^+38.
To increase the range, the number of bits in the exponent field must increase.
Double precision numbers are 64 bits: 1 sign bit, 11 bits for the exponent, 52 bits for the significand.
Extra bits in the significand give more precision, not extended range.
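The single-precision limits quoted above can be computed and cross-checked against the corresponding bit patterns. This is a sketch under the assumption that `struct` uses IEEE single precision for `f`; `bits_to_single` is an illustrative helper name.

```python
import struct

def bits_to_single(hex_pattern):
    """Interpret an 8-digit hex pattern as an IEEE single-precision value."""
    return struct.unpack(">f", bytes.fromhex(hex_pattern))[0]

smallest_normal = 2.0 ** -126                       # exponent field = 1, fraction = 0
largest_normal = (2.0 - 2.0 ** -23) * 2.0 ** 127    # exponent field = 254, fraction all ones

print(smallest_normal)  # about 1.18e-38
print(largest_normal)   # about 3.40e+38
print(bits_to_single("00800000") == smallest_normal)  # True
print(bits_to_single("7F7FFFFF") == largest_normal)   # True
```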

Special Numbers
Min/max exponents are 2^-126 to 2^+127. This corresponds to exponent field values of 1 to 254.
The exponent field values 0 and 255 are reserved for special numbers.
Special numbers are zero, +/- infinity, and NaN (not a number).
  Zero: exponent field = 0 and significand = 0 (the sign bit may be 0 or 1).
  +/- Infinity: exponent field = 255 = FFh, significand = 0. Produced by a nonzero number divided by 0.
  NaN (Not a Number): exponent field = 255 = FFh, significand = nonzero. Produced by invalid operations like zero divided by zero, or infinity - infinity.
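The reserved exponent field values can be tested with two masks. The following Python sketch classifies a 32-bit pattern; the function names are illustrative, and the "denormal" branch covers a case (exponent field 0 with a nonzero significand) that the slides do not discuss.

```python
import struct

def classify_single(bits):
    """Classify a 32-bit single-precision pattern given as an integer."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    if exponent == 0:
        # A nonzero fraction here is a denormalized number, which the slides do not cover.
        return "zero" if fraction == 0 else "denormal"
    return "normal"

def pattern_of(value):
    """Return the single-precision bit pattern of a Python float as an integer."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

print(classify_single(pattern_of(float("inf"))))   # infinity
print(classify_single(pattern_of(float("nan"))))   # NaN
print(classify_single(pattern_of(0.0)))            # zero
print(classify_single(pattern_of(-5.0)))           # normal
```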

Comments on IEEE Format
The sign bit is placed in the MSB for a reason: floating point numbers can be sorted by sign with a quick test of the MSB.
If the sign bits are the same, extracting and comparing the exponent fields can be used to sort floating point numbers. A larger exponent field means a larger magnitude, since the 'bias' encoding is used.
Nearly all microprocessors that support floating point use the IEEE 754 standard. Only a few supercomputers still use different formats.
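A consequence of the biased exponent sitting above the significand is that positive values sort correctly when their bit patterns are compared as unsigned integers. The Python sketch below checks this for a handful of sample values chosen for illustration; `pattern_of` is a hypothetical helper. For negative values the order is reversed, because the format is sign-magnitude.

```python
import struct

def pattern_of(value):
    """Return the single-precision bit pattern of a Python float as an integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits

values = [3.5, 0.25, 1024.0, 1.0, 0.75]
by_bits = sorted(values, key=pattern_of)   # compare as plain integers
print(by_bits == sorted(values))           # True
```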

Assigning Storage for Large Numbers
DD (define doubleword): 4-byte storage. A real number stored as a doubleword is called a short real.
  DD 12345.678
  DD +1.5E+02
  DD 2.56E+38      ; largest positive exponent
  DD 3.3455E-39    ; largest negative exponent
DQ (define quadword): 8-byte storage. Long real number (double in C,C++ and Visual)
  DQ 2.56E+307     ; largest exponent
JM 11/02

Floating Point Architecture (8087 Coprocessor)
So far we have only dealt with integers.
The 8087 was the math coprocessor for the original PC.
With the 486, the FPU (floating point unit) became part of the CPU chip.
We will only look at the instruction set of the original 8087 chip.
It handles both integer and floating point calculations.

Floating Point Registers
80-bit Registers: ST(0) = ST, ST(1), ST(2), ST(3), ST(4), ST(5), ST(6), ST(7)
32-bit Registers: Instruction Pointer, Operand Pointer
16-bit Registers: Control Word, Status Word, Tag Word

Floating Point Unit (Coprocessor)
Data Registers
  8 individually addressable 80-bit registers (ST(0), ST(1), ST(2) ... ST(7))
  Arranged in stack format; ST(0) = ST is the top of the stack
Control Registers
  3 16-bit registers (control, status, tag)
  2 32-bit registers (instruction pointer, operand pointer)

Floating Point Data Register Stack


Transfer of Data
Data must be in memory to be sent to the coprocessor (not in a CPU register).
The coprocessor loads the number from memory into its register stack, performs an arithmetic operation, stores the result in memory, and signals the CPU that it has finished.

Instruction Formats
Every mnemonic begins with the letter F (to distinguish it from CPU instructions).
The 2nd letter indicates the operand type:
  B = binary coded decimal operand
  I = binary integer operand
  neither = real number format is assumed
Examples:
  FBLD - load BCD number
  FILD - load integer number
  FMUL - real number multiply
CPU registers (such as AX, BX) cannot be used as operands.

Floating Point Operations
  Add     Add source to destination
  Sub     Subtract source from destination
  Subr    Subtract destination from source
  Mul     Multiply source by destination
  Div     Divide destination by source
  Divr    Divide source by destination

Basic Arithmetic Instructions
Instruction Form              Mnemonic Form    Operands (Dest, Source)    Example
Classical Stack               Fop              {ST(1), ST}                FADD
Classical Stack, Extra Pop    FopP             {ST(1), ST}                FSUBP
Register                      Fop              ST(n), ST or ST, ST(n)     FMUL ST(1),ST / FDIV ST,ST(3)
Register, Pop                 FopP             ST(n), ST                  FADDP ST(2),ST
Real Memory                   Fop              {ST}, memReal              FDIVR
Integer Memory                FIop             {ST}, memInt               FISUBR hours

Instruction Forms: Classical Stack
No explicit operands needed (ST is the source; ST(1) is the destination).
  FADD    ; ST(1) = ST(1) + ST; pop ST
  FSUB    ; ST(1) = ST(1) - ST; pop ST
Stack before and after FADD:
           Before    After
  ST       100.0     120.0
  ST(1)    20.0

Instruction Forms: Register
Uses coprocessor registers as ordinary operands (one must be ST).
  FADD st, st(1)     ; st = st + st(1)
  FDIVR st, st(3)    ; st = st(3) / st
  FMUL st(2), st     ; st(2) = st(2) * st

Instruction Forms: Register Pop
Identical to the register form except that ST is popped at the end.
  FADDP st(1), st    ; ST(1) = ST(1) + ST; pop ST, so the sum ends up in ST(0)
           Before    Intermediate    After
  ST       200.0     200.0           232.0
  ST(1)    32.0      232.0

Instruction Forms: Real Memory and Integer Memory
Have an implied first operand, ST.
The second operand, explicit, is an integer or real memory operand.
  FADD Myreal_op        ; st = st + myreal_op
  FIADD MyInteger_op    ; st = st + myinteger_op

Initialize Instruction
finit - initialize floating point processor
  Should come first in code.
  Resets the control, status, and tag words, which marks all data registers as empty.

Load Instructions: fld, fild
fld - load a real memory operand into ST(0)
fild - load an integer memory operand into ST(0)
  .data
  op1 dd 6.0      ; floating point value
  op2 dw 3        ; integer value
  .code
  finit
  fld op1         ; ST = 6.0
  fild op2        ; integer operand needs fild: ST = 3.0, ST(1) = 6.0
Stack contents:
           After fld op1    After fild op2
  ST       6.0              3.0
  ST(1)    ??               6.0

Store Instructions: fst, fstp
fst mem_location (float store)
  Store the value in ST(0) into memory.
fstp mem_location (float store and pop)
  Store the value in ST(0) into memory and then pop the stack.

Reverse Polish Notation (operands are keyed in before their operators)
Evaluating a postfix expression: 6 2 * 5 +
When reading an operand from input:
  push it on the stack.
When reading an operator from input:
  pop the two operands located at the top of the stack,
  perform the selected operation on the operands,
  push the result back on the stack.
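The postfix evaluation rule can be sketched with an ordinary list used as a stack. This Python version is only an illustration of the rule; `eval_postfix` and the `OPERATORS` table are names made up for the example, and tokens are assumed to be separated by spaces.

```python
import operator

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def eval_postfix(expression):
    """Evaluate a space-separated postfix expression and return the result."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            right = stack.pop()   # the top of the stack is the second operand
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))   # operand: push it
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

print(eval_postfix("6 2 * 5 +"))  # 17.0
```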

TITLE FPU Expression Evaluation (Expr.asm)
; Implementation of the following expression:
;   (6.0 * 2.0) + (4.5 * 3.2)
; FPU instructions used.
; Last update: 10/8/01
INCLUDE Irvine32.inc        ; 32-bit Protected mode program.
.data
array      REAL4 6.0, 2.0, 4.5, 3.2
dotProduct REAL4 ?
.code
main PROC
    finit               ; initialize FPU
    fld  array          ; push 6.0 onto the stack
    fmul array+4        ; ST(0) = 6.0 * 2.0
    fld  array+8        ; push 4.5 onto the stack
    fmul array+12       ; ST(0) = 4.5 * 3.2
    fadd                ; ST(0) = ST(0) + ST(1)
    fstp dotProduct     ; pop stack into memory operand
    exit
main ENDP
END main

Register Stack Example
Instruction    Register Stack
fld op1        ST = 6.0
fld op2        ST = 2.0, ST(1) = 6.0
fmul           ST = 12.0
fld op3        ST = 5.0, ST(1) = 12.0
fsub           ST = 7.0
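The register stack trace can be reproduced with a toy model. The Python class below is a simplified sketch, not a description of the real hardware: `FpuStack` is a hypothetical name, only the classical stack forms of `fmul` and `fsub` are modeled, and operand values are passed directly instead of being read from memory.

```python
class FpuStack:
    """Toy model of the x87 register stack; index 0 plays the role of ST."""

    def __init__(self):
        self.regs = []            # regs[0] is ST, regs[1] is ST(1), ...

    def fld(self, value):
        if len(self.regs) == 8:
            raise OverflowError("all eight registers are in use")
        self.regs.insert(0, float(value))    # push: old ST becomes ST(1)

    def fmul(self):
        st = self.regs.pop(0)                # classical stack form pops ST ...
        self.regs[0] = self.regs[0] * st     # ... and leaves ST(1) * ST on top

    def fsub(self):
        st = self.regs.pop(0)
        self.regs[0] = self.regs[0] - st     # ST(1) - ST

fpu = FpuStack()
fpu.fld(6.0)     # op1
fpu.fld(2.0)     # op2
fpu.fmul()       # ST = 12.0
fpu.fld(5.0)     # op3
fpu.fsub()       # ST = 7.0
print(fpu.regs)  # [7.0]
```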

Other Instructions
  fmul     ; st(1) = st(1) * st(0), pop
  fdiv     ; st(1) = st(1) / st(0), pop
  fdivr    ; st(1) = st(0) / st(1), pop
  fsqrt    ; st(0) = square root(st(0))
  fsin     ; st(0) = sine(st(0))
  fcos     ; st(0) = cosine(st(0))