The IEEE Floating Point Standard and execution units for it


The IEEE Floating Point Standard and execution units for it 1/8/2007 - L24 IEEE Floating Point Basics Copyright 2006 - Joanne DeGroat, ECE, OSU

Lecture overview
- The standard
- Floating point basics
- A floating point adder design

The floating point standard: single precision
Value v of the bits stored in the representation (s = sign bit, e = 8-bit exponent field, f = 23-bit fraction field):
- If e = 255 and f ≠ 0, then v is NaN regardless of s
- If e = 255 and f = 0, then v = (-1)^s × ∞
- If 0 < e < 255, then v = (-1)^s × 2^(e-127) × (1.f) (normalized number)
- If e = 0 and f ≠ 0, then v = (-1)^s × 2^(-126) × (0.f) (denormalized number, allows for graceful underflow)
- If e = 0 and f = 0, then v = (-1)^s × 0 (zero)

The floating point standard: double precision
Value v of the bits in the word representation (s = sign bit, e = 11-bit exponent field, f = 52-bit fraction field):
- If e = 2047 and f ≠ 0, then v is NaN regardless of s
- If e = 2047 and f = 0, then v = (-1)^s × ∞
- If 0 < e < 2047, then v = (-1)^s × 2^(e-1023) × (1.f) (normalized number)
- If e = 0 and f ≠ 0, then v = (-1)^s × 2^(-1022) × (0.f) (denormalized number, allows for graceful underflow)
- If e = 0 and f = 0, then v = (-1)^s × 0 (zero)
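
The case analysis on the single and double precision slides differs only in the field widths, so it can be sketched as one routine. The following is a minimal sketch, not part of the lecture: the function name `decode_ieee` and its parameters are hypothetical, and the field widths (8/23 for single, 11/52 for double) are passed in by the caller.

```python
import math

def decode_ieee(bits, exp_bits, frac_bits):
    """Decode a raw bit pattern using the five cases listed on the slides."""
    bias = (1 << (exp_bits - 1)) - 1          # 127 for single, 1023 for double
    e_max = (1 << exp_bits) - 1               # 255 for single, 2047 for double
    s = bits >> (exp_bits + frac_bits)        # sign bit
    e = (bits >> frac_bits) & e_max           # biased exponent field
    f = bits & ((1 << frac_bits) - 1)         # stored fraction field
    sign = -1.0 if s else 1.0
    if e == e_max:
        # all-ones exponent: NaN if the fraction is nonzero, else infinity
        return math.nan if f != 0 else sign * math.inf
    if e == 0:
        # zero or denormalized: no hidden 1, exponent fixed at 1 - bias
        return sign * math.ldexp(f, 1 - bias - frac_bits)
    # normalized: restore the hidden leading 1 before scaling
    return sign * math.ldexp((1 << frac_bits) | f, e - bias - frac_bits)

print(decode_ieee(0x42C80000, 8, 23))            # 100.0
print(decode_ieee(0x4059000000000000, 11, 52))   # 100.0
```

Both calls decode the value 100, once from its single precision pattern and once from its double precision pattern.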

The floating point standard: notes on single and double precision
- The leading 1 of the mantissa (the 1 in 1.f) is not stored for normalized numbers
- The representation allows for +0 and -0, indicating the direction from which 0 was reached (this can matter when a result was rounded to 0)
- Denormalized numbers allow graceful underflow towards 0
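
These notes can be observed directly from bit patterns. The sketch below is an illustration rather than lecture material: the helper `single_from_bits` is a hypothetical name, and it relies on Python's standard `struct` module to reinterpret a 32-bit pattern as a single precision value.

```python
import math
import struct

def single_from_bits(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

pos_zero = single_from_bits(0x00000000)
neg_zero = single_from_bits(0x80000000)
print(pos_zero == neg_zero)              # True: the two zeros compare equal
print(math.copysign(1.0, neg_zero))      # -1.0: the sign bit is still observable

smallest_normal = single_from_bits(0x00800000)    # e = 1, f = 0
largest_denormal = single_from_bits(0x007FFFFF)   # e = 0, f all ones
smallest_denormal = single_from_bits(0x00000001)  # e = 0, f = 1

# Denormals fill the gap between 0 and the smallest normalized number
# in equal steps, which is what "graceful underflow" refers to.
print(smallest_normal - largest_denormal == smallest_denormal)
```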

Conversion examples
Converting from base 10 to the representation
Single precision example: convert 100 (base 10)
- Step 1: convert to binary: 0110 0100
- In the binary form 1.xxx this is 0110 0100 = 1.100100 × 2^6

Conversion example continued
- 1.1001 × 2^6 is binary for 100
- Thus the exponent is 6
- Biased exponent is 6 + 127 = 133 = 1000 0101
- Sign is 0 for positive
- Stored fractional part f is 1001 followed by zeros
- Thus we have s | e | f = 0 | 1000 0101 | 1001 0000 ...
- Regrouped into 4-bit digits: 0100 0010 1100 1000 0000 ... = 4 2 C 8 0 0 0 0 in hexadecimal
- $42C8 0000 is the representation for 100

Another example
Representation for -175
- 175 = 128 + 32 + 8 + 4 + 2 + 1 = 1010 1111, or 1.0101111 × 2^7
- s = 1
- Exponent is 7 + 127 = 134 = 1000 0110
- Fractional part f = 0101111
- Representation: 1100 0011 0010 1111 0000 ..., or in hex $C32F 0000
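
The hand conversion used for 100 and -175 can be sketched in a few lines. This is an illustrative sketch only: `encode_single` is a hypothetical helper, and it assumes a nonzero integer whose magnitude is below 2^24, so that no rounding is needed.

```python
import struct

def encode_single(n):
    """Hand-encode a nonzero integer with |n| < 2**24 as a single precision pattern."""
    sign = 1 if n < 0 else 0
    mag = abs(n)
    exponent = mag.bit_length() - 1     # position of the leading 1 (6 for 100)
    biased = exponent + 127             # add the single precision bias
    frac = mag ^ (1 << exponent)        # drop the leading 1, which is not stored
    frac <<= 23 - exponent              # left-justify in the 23-bit fraction field
    return (sign << 31) | (biased << 23) | frac

print(f"{encode_single(100):08X}")      # 42C80000
print(f"{encode_single(-175):08X}")     # C32F0000

# Cross-check against the platform's own encoding via the struct module.
print(encode_single(100) == struct.unpack(">I", struct.pack(">f", 100.0))[0])
```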

Converting back
Convert $C32F 0000 into decimal
- Extract components from 1100 0011 0010 1111 ...
- s = 1
- Exponent = 1000 0110 = 128 + 4 + 2 = 134; unbiased, 134 - 127 = 7
- f = 0101111, so the mantissa is 1.0101111
- Adjust by the exponent: 1010 1111 (move the binary point 7 places)
- Or 128 + 32 + 15 = 175
- Sign is negative, so -175

Another example
Convert $41C8 0000 to decimal
- 0100 0001 1100 1000 0000 ...
- s is 0, so the number is positive
- Exponent 1000 0011 = 128 + 3 = 131; unbiased, 131 - 127 = 4
- f = 1001, so the mantissa is 1.1001
- Moving the binary point 4 places gives 11001 as the final number, or decimal 25

Arithmetic with floating point numbers
Add op1 $42C8 0000 and op2 $41C8 0000
First divide into component parts
- Op1 $42C8 0000 = 0100 0010 1100 1000 0000 ...
  S = 0, E = 1000 0101 = 133, unbiased 133 - 127 = 6, Mop1 = 1.10010000...
- Op2 $41C8 0000 = 0100 0001 1100 1000 0000 ...
  S = 0, E = 1000 0011 = 131, unbiased 131 - 127 = 4, Mop2 = 1.10010000...

Now add the mantissas
But first align the mantissas
- Op1 1.1001000...
- Op2 1.1001000...
- Op2 has the smaller exponent, so it is the operand that needs to be aligned
- Exponent difference between op1 and op2 is 2
- So shift op2 right by 2 binary places: op2 becomes 0.0110010000...

Add
Add the op1 mantissa to the aligned op2 mantissa
    1.1001000000...
  + 0.0110010000...
  = 1.1111010000...
- Result exponent is 6
- Value is 1111101, or 64 + 32 + 16 + 8 + 4 + 1 = 125
- Values added were 100 and 25

Constructing result value
- Sign: 0
- Exponent: 6, so E = 6 + 127 = 133 = 1000 0101
- Mantissa of result: 1.1111010000
- Fractional part: 1111010000...
- Constructed value: 0100 0010 1111 1010 0000 0000 0000 0000 = $42FA 0000 (125)

Floating point representation of 125
- Positive, so s is 0
- Exponent is 6 + 127 = 133 = 1000 0101
- Fractional part from the mantissa 1.111101 is 111101
- Constructed value: 0 | 1000 0101 | 111101 00000000000000000 = $42FA 0000
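
The align, add, and reconstruct steps of the addition example can be sketched with integer arithmetic on the bit fields. This is a simplified illustration, not the adder design from the lecture: the names `split_single` and `add_positive_singles` are hypothetical, both operands are assumed to be positive and normalized, bits shifted out during alignment are simply truncated, and exponent overflow is not checked.

```python
def split_single(bits):
    """Return (sign, biased exponent, 24-bit mantissa with the hidden 1 restored)."""
    sign = bits >> 31
    e = (bits >> 23) & 0xFF
    mant = (bits & 0x7FFFFF) | 0x800000     # assumes a normalized input
    return sign, e, mant

def add_positive_singles(a_bits, b_bits):
    """Add two positive normalized singles by aligning and adding mantissas."""
    _, ea, ma = split_single(a_bits)
    _, eb, mb = split_single(b_bits)
    if ea < eb:                             # make operand a the one with the larger exponent
        ea, ma, eb, mb = eb, mb, ea, ma
    mb >>= ea - eb                          # align the smaller operand (2 places for 100 + 25)
    total = ma + mb
    if total & 0x1000000:                   # carry out of the hidden-1 position: renormalize
        total >>= 1
        ea += 1
    return (ea << 23) | (total & 0x7FFFFF)  # drop the hidden 1 and repack

print(f"{add_positive_singles(0x42C80000, 0x41C80000):08X}")   # 42FA0000
```

The printed pattern matches the $42FA 0000 constructed by hand for 100 + 25 = 125.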

Multiplication example
Multiply op1 $42C8 0000 and op2 $41C8 0000
First divide into component parts
- Op1 $42C8 0000 = 0100 0010 1100 1000 0000 ...
  S = 0, E = 1000 0101 = 133, unbiased 133 - 127 = 6, Mop1 = 1.10010000...
- Op2 $41C8 0000 = 0100 0001 1100 1000 0000 ...
  S = 0, E = 1000 0011 = 131, unbiased 131 - 127 = 4, Mop2 = 1.10010000...

Multiplication basics
- Base 10 example: (3 × 10^2) × (1.1 × 10^2) = 3.3 × 10^4
- Have 2 numbers, A × 2^ea and B × 2^eb
- Multiply and get result = (A × B) × 2^(ea+eb)

So here
- The signs of both operands are +, so the result is +
- Exponent addition
  - Both exponents are biased as stored
  - If you add the stored binary exponents, you need to subtract the extra bias of 127
  - Or, using pencil and paper (or PowerPoint), you can just add the unbiased exponent of one operand to the biased exponent of the other
  - Here we have 133 + 4 = 137 = 1000 1001

The mantissas
Do a binary multiplication of 1.1001 by 1.1001, treating both as the integer 11001
      0000011001   (11001 × multiplier bit 0)
      0011001000   (11001 × multiplier bit 3)
    + 0110010000   (11001 × multiplier bit 4)
    = 1001110001
Each operand has 4 fraction bits, so the product has 8: adjusting for the binary point gives 10.01110001

Final result
- Biased exponent is 137, an unbiased exponent of 10
- Mantissa is 10.01110001
- Adjusted for the exponent (binary point moved 10 places): 1001 1100 0100
- Value is 2048 + 256 + 128 + 64 + 4, or 2304 + 128 + 68 = 2432 + 68 = 2500
- And we were multiplying 100 × 25
- For storage the mantissa is normalized to 1.001110001 with an unbiased exponent of 11 (biased 138), giving $451C 4000

More examples
- A = 100 = $42C8 0000 = 0100 0010 1100 1000 0000 0000 0000 0000
  S = 0, E = 1000 0101 = 133, unbiased 133 - 127 = 6, F = 1001 0000 ..., ManA = 1.100100000
- B = 25 = $41C8 0000 = 0100 0001 1100 1000 0000 0000 0000 0000
  S = 0, E = 1000 0011 = 131, unbiased 131 - 127 = 4, F = 1001 0000 ..., ManB = 1.100100000

Example continued
For A + B, the binary point must be aligned by 2 places
- ManA = 1.10010000000
- ShfManB = 0.01100100000
- Sum is 1.1111010000 with a binary exponent of 6
- Result: 0100 0010 1111 1010 0000 0000 ... = $42FA 0000

Subtraction example
- B = -25 = $C1C8 0000 = 1100 0001 1100 1000 0000 0000 0000 0000
  S = 1, E = 1000 0011 = 131, unbiased 131 - 127 = 4, F = 1001 0000 ..., ManB = 1.100100000
- C = 10 = $4120 0000 = 0100 0001 0010 0000 0000 0000 0000 0000
  S = 0, E = 1000 0010 = 130, unbiased 130 - 127 = 3, F = 0100 0000 ..., ManC = 1.010000000...

Subtraction example concluded
For B + C, subtract the aligned mantissa of C from the mantissa of B
- ManB = 1.1001000000
- ManCshftd = 0.1010000000
- Result is 0.1111000000 with an exponent of 4
- Normalized mantissa is 1.111 with an exponent of 3
- Result sign = 1
- Exp = 3 + 127 = 130 = 1000 0010
- Result Man = 1.111000000...
- Result: 1100 0001 0111 0000 0000 0000 0000 0000 = $C170 0000 (-15)

Specification of a FPA
Floating point add/subtract unit specification
- Inputs in IEEE 754 double precision
- Must perform both addition and subtraction
- Must handle the full floating point standard
  - Normalized numbers
  - Not-a-Number values (NaNs)
  - +/- infinity
  - Denormalized numbers

Specifications continued
- Result will be an IEEE 754 double precision representation
- Unit will correctly handle the invalid operation of adding +∞ and -∞, which yields NaN per the standard
- Unit latches its inputs into registers from parallel 64-bit data buses
- There is a separate signal line that indicates the operation, add or subtract
9/25/08 - ECE764 L2a IEEE Floating Point Basics Copyright 2008 - Joanne DeGroat, ECE, OSU
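
The operand classes the unit must handle, and the invalid addition of opposite infinities, can be illustrated in software. This is only a sketch and says nothing about how the hardware unit is built: `classify_double` and `double_bits` are hypothetical helpers, and the example assumes Python floats are IEEE 754 doubles, which holds on common platforms.

```python
import math
import struct

def classify_double(bits):
    """Name the operand class of a 64-bit double precision pattern."""
    e = (bits >> 52) & 0x7FF                 # 11-bit biased exponent field
    f = bits & ((1 << 52) - 1)               # 52-bit fraction field
    if e == 0x7FF:
        return "nan" if f else "infinity"
    if e == 0:
        return "denormalized" if f else "zero"
    return "normalized"

def double_bits(x):
    """Raw 64-bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

invalid = math.inf + (-math.inf)             # the invalid addition named in the specification
print(classify_double(double_bits(invalid)))    # nan
print(classify_double(double_bits(5e-324)))     # denormalized
```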

Specifications continued
Outputs
- The correctly represented result
- Flags that are output:
  - Zero result
  - Overflow to infinity from normalized numbers as inputs
  - NaN result
  - Overshift (result is the larger of the two operands)
  - Denormalized result
  - Inexact (result was rounded)
  - Invalid operation for addition

High level block diagram
Basic architecture interface
- Data: 64-bit A, B, and C buses
- Control signals: Latch, Add/Sub, Asel, Drive
- Condition flags output: 7 flag signals
- Clocks: Phi1 and Phi2 (a 2-phase clocked architecture)

Start the VHDL
- The entity interface
- VHDL code covered in next lecture