Floating Point Numbers: Representation, Operations, and Accuracy
CS223 Digital Design
Floating Point
Representation for non-integer numbers, including very small and very large numbers, like scientific notation:
- -2.34 × 10^56 (normalized)
- +0.002 × 10^-4 (not normalized)
- +987.02 × 10^9 (not normalized)
In binary: ±1.xxxxxxx_2 × 2^yyyy
Types float and double in C
Floating Point Standard
Defined by IEEE Std 754
- Developed in response to divergence of representations: portability issues for scientific code
- Now almost universally adopted
Two representations:
- Single precision (32-bit)
- Double precision (64-bit)
IEEE Floating-Point Format (Single Precision)
Field layout: S (1 bit) | Exponent (8 bits single, 11 bits double) | Fraction (23 bits single, 52 bits double)
- S: sign bit (0 = non-negative, 1 = negative)
- Normalized significand: 1.0 ≤ |significand| < 2.0
  - Always has a leading pre-binary-point 1 bit, so there is no need to represent it explicitly (hidden bit)
  - The significand is the Fraction with the "1." restored
- Exponent: excess representation: actual exponent + Bias
  - Ensures the stored exponent is unsigned
  - Single: Bias = 127; Double: Bias = 1023
Single-Precision Range
Exponents 00000000 and 11111111 are reserved
Smallest normalized value:
- Exponent: 00000001 → actual exponent = 1 - 127 = -126
- Fraction: 000...00 → significand = 1.0
- ±1.0 × 2^-126 ≈ ±1.2 × 10^-38
Largest value:
- Exponent: 11111110 → actual exponent = 254 - 127 = +127
- Fraction: 111...11 → significand ≈ 2.0
- ±2.0 × 2^127 ≈ ±3.4 × 10^38
IEEE 754 Double Precision
Double-precision number represented in 64 bits:
x = (-1)^S × (1 + Fraction) × 2^(Exponent - Bias)
- Sign: S
- Exponent E: biased by 1023, unsigned binary integer, 0 < E < 2047
- Significand: magnitude, normalized binary significand with hidden bit (1): 1.F
Field layout: S (1 bit) | E (11 bits) | F (52 bits, split 20 + 32 across two 32-bit words)
Double-Precision Range
Exponents 0000...00 and 1111...11 are reserved
Smallest normalized value:
- Exponent: 00000000001 → actual exponent = 1 - 1023 = -1022
- Fraction: 000...00 → significand = 1.0
- ±1.0 × 2^-1022 ≈ ±2.2 × 10^-308
Largest value:
- Exponent: 11111111110 → actual exponent = 2046 - 1023 = +1023
- Fraction: 111...11 → significand ≈ 2.0
- ±2.0 × 2^1023 ≈ ±1.8 × 10^308
IEEE 754 FP Standard Encoding
Special encodings are used to represent unusual events:
- ±infinity for division by zero
- NaN (not a number) for the results of invalid operations such as 0/0
- True zero is the all-zero bit string

Single E (8 bits)          | Double E (11 bits)          | F (23 / 52 bits) | Object represented
0000...0000                | 0000...0000                 | zero             | true zero (0)
0000...0000                | 0000...0000                 | nonzero          | ± denormalized number
0000...0001 to 1111...1110 | 0000...0001 to 1111...1110  | anything         | ± floating-point number
1111...1111                | 1111...1111                 | zero             | ± infinity
1111...1111                | 1111...1111                 | nonzero          | not a number (NaN)
Floating-Point Precision
Relative precision: all fraction bits are significant
- Single: approx 2^-23; equivalent to 23 × log10(2) ≈ 23 × 0.3 ≈ 7 decimal digits of precision
- Double: approx 2^-52; equivalent to 52 × log10(2) ≈ 52 × 0.3 ≈ 16 decimal digits of precision
Floating-Point Example
What number is represented by the single-precision float
1 10000001 01000...00?
- S = 1
- Fraction = 01000...00_2
- Exponent = 10000001_2 = 129
x = (-1)^1 × (1 + 0.01_2) × 2^(129 - 127)
  = (-1) × 1.25 × 2^2
  = -5.0
Floating-Point Addition
Consider a 4-digit decimal example: 9.999 × 10^1 + 1.610 × 10^-1
1. Align decimal points: shift the number with the smaller exponent
   9.999 × 10^1 + 0.016 × 10^1
2. Add significands
   9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
3. Normalize result & check for over/underflow
   1.0015 × 10^2
4. Round and renormalize if necessary
   1.002 × 10^2
Floating-Point Addition
Now consider a 4-digit binary example: 1.000_2 × 2^-1 + (-1.110_2 × 2^-2)   (0.5 + -0.4375)
1. Align binary points: compare exponents, shift the number with the smaller one
   1.000_2 × 2^-1 + (-0.111_2 × 2^-1)
2. Add significands
   1.000_2 × 2^-1 + (-0.111_2 × 2^-1) = 0.001_2 × 2^-1
3. Normalize result & check for over/underflow
   1.000_2 × 2^-4, with no over/underflow
4. Round and renormalize if necessary
   1.000_2 × 2^-4 (no change) = 0.0625
Floating-Point Multiplication
Consider a 4-digit decimal example: 1.110 × 10^10 × 9.200 × 10^-5
1. Add exponents
   For biased exponents, subtract the bias from the sum
   New exponent = 10 + -5 = 5
2. Multiply significands
   1.110 × 9.200 = 10.212 → 10.212 × 10^5
3. Normalize result & check for over/underflow
   1.0212 × 10^6
4. Round and renormalize if necessary
   1.021 × 10^6
5. Determine sign of result from signs of operands
   +1.021 × 10^6
Floating-Point Multiplication
Now consider a 4-digit binary example: 1.000_2 × 2^-1 × (-1.110_2 × 2^-2)   (0.5 × -0.4375)
1. Add exponents
   Unbiased: -1 + -2 = -3
   Biased: (-1 + 127) + (-2 + 127) = -3 + 254; subtracting the bias gives -3 + 254 - 127 = -3 + 127
2. Multiply significands
   1.000_2 × 1.110_2 = 1.110_2 → 1.110_2 × 2^-3
3. Normalize result & check for over/underflow
   1.110_2 × 2^-3 (no change), with no over/underflow
4. Round and renormalize if necessary
   1.110_2 × 2^-3 (no change)
5. Determine sign: if signs agree, +; else, -
   -1.110_2 × 2^-3 = -0.21875
Accurate Arithmetic
IEEE Std 754 specifies additional rounding control:
- Extra bits of precision (Guard, Round, Sticky)
- Choice of rounding modes
- Allows the programmer to fine-tune the numerical behavior of a computation
Not all FP units implement all options
- Most programming languages and FP libraries just use the defaults
Trade-off between hardware complexity, performance, and market requirements
Support for Accurate Arithmetic
IEEE 754 FP rounding modes:
- Always round up (toward +∞)
- Always round down (toward -∞)
- Truncate (toward 0)
- Round to nearest even (when the Guard, Round, and Sticky bits are 100, i.e., exactly halfway) – always leaves a 0 in the least significant (kept) bit of F
Rounding (except for truncation) requires the hardware to keep extra F bits during calculations:
- Guard bit (G) – provides one extra F bit when shifting left to normalize a result (e.g., when normalizing F after division or subtraction)
- Round bit (R) – used to improve rounding accuracy
- Sticky bit (S) – supports round to nearest even; set to 1 whenever a 1 bit shifts (right) through it (e.g., when aligning F during addition/subtraction)
F = 1.xxxxxxxxxxxxxxxxxxxxxxx G R S
Associativity
Parallel programs may interleave operations in unexpected orders
- Assumptions of associativity may fail, since FP operations are not associative!
- Need to validate parallel programs under varying degrees of parallelism