ECE/CS 552: Floating Point


ECE/CS 552: Floating Point © Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith

Basic Arithmetic and the ALU. Now: floating point representation; floating point addition and multiplication. These are not crucial for the project.

Floating Point. Want to represent a larger range of numbers. Fixed point (integer): $-2^{n-1}$ … $(2^{n-1} - 1)$. How? Sacrifice precision for range by providing an exponent to shift the relative weight of each bit position. Similar to scientific notation: $3.14159 \times 10^{23}$. Cannot specify every discrete value in the range, but can span a much larger range.

Floating Point. Still use a fixed number of bits (IEEE 754 standard): sign bit S, exponent E, significand F. Value: $(-1)^S \times F \times 2^E$
IEEE 754 standard field order: S | E | F
                    Size   Exponent   Significand   Range
Single precision    32b    8b         23b           $2 \times 10^{\pm 38}$
Double precision    64b    11b        52b           $2 \times 10^{\pm 308}$
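To make the single-precision layout concrete, here is a minimal Python sketch that splits a value into its S, E, and F fields. The helper name `fields_single` is an assumption made for illustration; it relies only on the standard `struct` module to obtain the 32-bit pattern.

```python
import struct

def fields_single(x: float) -> tuple[int, int, int]:
    """Round x to single precision and return its (S, E, F) bit fields."""
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, exactly as stored
    fraction = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, fraction

print(fields_single(1.0))        # (0, 127, 0)
print(fields_single(-0.15625))   # (1, 124, 2097152)
```

The exponent printed here is the raw stored field, not the power of two itself.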

Floating Point Exponent. The exponent is specified in biased (excess) notation. Why? To simplify sorting; the sign bit is the MSB for the same reason. With a 2's complement exponent, large numbers have a positive exponent and small numbers have a negative exponent, so the negative exponents would look larger as unsigned bit patterns and sorting would not follow naturally.

Excess or Biased Exponent
Exponent   2's Compl    Excess-127
-127       1000 0001    0000 0000
-126       1000 0010    0000 0001
…
+127       0111 1111    1111 1110
Value: $(-1)^S \times F \times 2^{(E - \text{bias})}$. SP: bias is 127. DP: bias is 1023.
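The sorting argument can be checked with a short Python sketch. The helper names `encode_exponent` and `bits_single` are assumptions for illustration. The comparison is restricted to non-negative values, since the format is sign-magnitude and negative numbers order in reverse.

```python
import struct

def encode_exponent(e: int, bias: int = 127) -> int:
    """Stored (excess) exponent field for true exponent e."""
    return e + bias

def bits_single(x: float) -> int:
    """32-bit pattern of x rounded to single precision, as an unsigned integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

print(format(encode_exponent(-126), "08b"))   # 00000001
print(format(encode_exponent(127), "08b"))    # 11111110

# For non-negative values, sorting by raw bit pattern matches sorting by value.
values = [1.5, 0.001, 1024.0, 0.75, 3.0e10]
by_bits = sorted(values, key=bits_single)
print(by_bits == sorted(values))              # True
```

An integer comparator is therefore enough to order non-negative floating point values, which is the point of the biased encoding.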

Floating Point Normalization. The S,E,F representation allows more than one representation for a particular value, e.g. $1.0 \times 10^5 = 0.1 \times 10^6 = 10.0 \times 10^4$. This makes comparison operations difficult, so we prefer a single representation. Hence, normalize by convention: exactly one nonzero digit to the left of the point. In binary, that digit must be a 1. Since the leading '1' is implicit, there is no need to store it. Hence, we obtain one extra bit of precision for free.
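As a sketch of how the implicit leading 1 enters the value, the following Python function rebuilds a normalized single-precision number from its fields. The name `value_from_fields` is hypothetical, and the function assumes a stored exponent in the normalized range 1 to 254.

```python
def value_from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Value of a normalized single-precision number from its S, E, F fields."""
    assert 1 <= exponent <= 254, "sketch covers normalized numbers only"
    # The stored 23-bit fraction sits to the right of an implicit leading 1.
    significand = 1 + fraction / 2**23
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

print(value_from_fields(0, 127, 0))          # 1.0
print(value_from_fields(1, 124, 0x200000))   # -0.15625
print(value_from_fields(0, 128, 0x400000))   # 3.0
```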

FP Overflow/Underflow. FP overflow is analogous to integer overflow: the result is too big to represent, which means the exponent is too big. FP underflow: the result is too small to represent, which means the exponent is too small (too negative). Both can raise an exception under IEEE 754.
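Both conditions are easy to provoke in double precision. The sketch below uses Python floats, which are IEEE 754 doubles on common platforms; for these two operations Python returns infinity or zero rather than trapping, so the exception is not visible here.

```python
import math
import sys

big = sys.float_info.max     # largest finite double
tiny = sys.float_info.min    # smallest normalized double, 2**-1022

overflowed = big * 2.0            # exponent too big to represent
underflowed = tiny / 2.0**60      # exponent too small, even for a denormalized value

print(math.isinf(overflowed))     # True
print(underflowed == 0.0)         # True
```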

IEEE754 Special Cases
Single Precision          Double Precision          Value
Exponent  Significand     Exponent  Significand
0         0               0         0               0
0         nonzero         0         nonzero         denormalized
1-254     anything        1-2046    anything        fp number
255       0               2047      0               infinity
255       nonzero         2047      nonzero         NaN (Not a Number)
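The special cases amount to a small decision on the exponent and significand fields. The following is a minimal single-precision sketch, assuming the usual IEEE 754 encoding in which the all-zeros and all-ones exponents are reserved; `classify_single` is a hypothetical helper name.

```python
import struct

def classify_single(x: float) -> str:
    """Classify x (rounded to single precision) by its exponent and significand fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "fp number"   # exponent 1-254, any significand

for x in (0.0, 1e-45, 1.0, float("inf"), float("nan")):
    print(x, classify_single(x))
```

For double precision the same structure applies with an 11-bit exponent (reserved values 0 and 2047) and a 52-bit significand field.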

FP Rounding. Rounding is important: small errors accumulate over billions of ops. FP rounding hardware helps. Compute an extra guard bit beyond the 23/52 fraction bits. Further, compute an additional round bit beyond that. A multiply may produce a leading 0 bit; normalization then shifts the guard bit into the product, leaving the round bit for rounding. Finally, keep a sticky bit that is set whenever '1' bits are "lost" to the right. The sticky bit differentiates between 0.5 and 0.500000000001.
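A sketch of how the three extra bits drive the rounding decision, written in Python on integer significands. It assumes round-to-nearest with ties to even, which the slide does not name explicitly; the function name `round_with_grs` is hypothetical.

```python
def round_with_grs(sig: int, extra: int) -> int:
    """Drop the `extra` low-order bits of sig using guard, round, and sticky bits."""
    assert extra >= 2
    kept = sig >> extra
    guard = (sig >> (extra - 1)) & 1                    # first bit beyond the kept bits
    round_bit = (sig >> (extra - 2)) & 1                # second bit beyond the kept bits
    sticky = int(sig & ((1 << (extra - 2)) - 1) != 0)   # OR of everything further right
    above_half = guard and (round_bit or sticky)        # e.g. 0.500000000001
    tie = guard and not (round_bit or sticky)           # exactly 0.5
    if above_half or (tie and kept & 1):
        kept += 1   # may carry out of the top bit, requiring renormalization
    return kept

print(round_with_grs(0b1010_100, 3))   # 10 (tie, already even)
print(round_with_grs(0b1011_100, 3))   # 12 (tie, rounds up to even)
print(round_with_grs(0b1010_101, 3))   # 11 (sticky bit breaks the tie)
```

Without the sticky bit, the last case would be indistinguishable from an exact tie.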

Floating Point Addition. Just like grade school: first, align decimal points; then, add significands; finally, normalize the result. Example:
             $9.997 \times 10^{2}$     $9.997000 \times 10^{2}$
             $4.631 \times 10^{-1}$    $0.004631 \times 10^{2}$
Sum                                    $10.001631 \times 10^{2}$
Normalized                             $1.0001631 \times 10^{3}$
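The align, add, normalize sequence can be sketched in Python with exact rational arithmetic. This is an illustration of the steps only: it works in base 10 like the slide's example and does not model a fixed significand width or rounding. The name `fp_add` is hypothetical.

```python
from fractions import Fraction

def fp_add(sig_a: Fraction, exp_a: int, sig_b: Fraction, exp_b: int, base: int = 10):
    """Add sig_a * base**exp_a and sig_b * base**exp_b; return a normalized (sig, exp)."""
    # Step 1: align. Make operand a the one with the larger exponent,
    # then shift operand b's significand right by the exponent difference.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = sig_b / Fraction(base) ** (exp_a - exp_b)
    # Step 2: add significands.
    total, exp = sig_a + sig_b, exp_a
    # Step 3: normalize to one nonzero digit left of the point.
    while abs(total) >= base:
        total /= base
        exp += 1
    while total != 0 and abs(total) < 1:
        total *= base
        exp -= 1
    return total, exp

sig, exp = fp_add(Fraction("9.997"), 2, Fraction("4.631"), -1)
print(float(sig), exp)   # 1.0001631 3
```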

FP Adder

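FP Multiplication. Sign: $P_S = A_S \oplus B_S$. Exponent: $P_E = A_E + B_E$, but due to bias/excess, must subtract the bias. With true exponents $e_1, e_2$ and stored exponents $E_1, E_2$ (double precision, bias 1023): $e = e_1 + e_2$; $E = e + 1023 = e_1 + e_2 + 1023$; $E = (E_1 - 1023) + (E_2 - 1023) + 1023$; $E = E_1 + E_2 - 1023$. Significand: $P_F = A_F \times B_F$. Standard integer multiply (23b or 52b fraction plus the implicit leading 1, + g/r/s bits). Use a Wallace tree of CSAs to sum the partial products.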
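The sign and exponent rules can be checked against real doubles with a few lines of Python. The helper names are assumptions for illustration, and the example operands are powers of two so that the significand product is exactly 1.0 and no normalization shift is needed.

```python
import struct

def sign_and_exponent(x: float) -> tuple[int, int]:
    """Sign bit and stored (biased) exponent of a double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF

def product_sign_exponent(s1: int, E1: int, s2: int, E2: int, bias: int = 1023):
    """Sign and stored exponent of a product, before any normalization."""
    return s1 ^ s2, E1 + E2 - bias

a, b = -2.0, 8.0
predicted = product_sign_exponent(*sign_and_exponent(a), *sign_and_exponent(b))
print(predicted)                    # (1, 1027)
print(sign_and_exponent(a * b))     # (1, 1027)
```

Adding the two stored exponents directly would count the bias twice, which is why one bias is subtracted.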

FP Multiplication. Compute sign, exponent, significand. Normalize (shift left or right by 1). Check for overflow, underflow. Round. Normalize again (if necessary).
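The steps can be put together in a software sketch for single precision. This is an illustration under stated assumptions, not the hardware datapath: it handles normalized operands only (no zeros, denormals, infinities, or NaNs), rounds to nearest with ties to even, and checks overflow/underflow once at the end rather than between steps. All function names are hypothetical.

```python
import struct

BIAS = 127

def to_bits(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(b: int) -> float:
    return struct.unpack(">f", struct.pack(">I", b))[0]

def fp_mul_bits(a: int, b: int) -> int:
    """Multiply two normalized single-precision bit patterns."""
    sa, ea, fa = a >> 31, (a >> 23) & 0xFF, a & 0x7FFFFF
    sb, eb, fb = b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF
    # Compute sign, exponent, significand (24-bit operands with the implicit 1).
    sign = sa ^ sb
    exp = ea + eb - BIAS
    prod = ((1 << 23) | fa) * ((1 << 23) | fb)   # 48-bit product, value in [1, 4)
    # Normalize: if the product is >= 2, shift right by one more position.
    shift = 23
    if prod >> 47:
        shift = 24
        exp += 1
    # Round to nearest even using the discarded low-order bits.
    kept = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1
    # Normalize again if rounding carried out of the top bit.
    if kept >> 24:
        kept >>= 1
        exp += 1
    # Check for overflow and underflow of the stored exponent.
    if exp >= 255:
        raise OverflowError("result too big to represent")
    if exp <= 0:
        raise ArithmeticError("underflow: result too small for a normalized number")
    return (sign << 31) | (exp << 23) | (kept & 0x7FFFFF)

print(from_bits(fp_mul_bits(to_bits(1.5), to_bits(2.5))))   # 3.75
print(from_bits(fp_mul_bits(to_bits(-3.0), to_bits(3.0))))  # -9.0
```

The integer multiply on the significands stands in for the Wallace tree; a hardware implementation would keep only guard, round, and sticky bits instead of the full low half of the product.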

Summary. Floating point representation: normalization; overflow, underflow; rounding. Floating point add. Floating point multiply.