Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania 18042 ECE 313 - Computer Organization Lecture 9 - Floating.

Prof. John Nestor
ECE Department, Lafayette College
Easton, Pennsylvania 18042
nestorj@lafayette.edu

ECE 313 - Computer Organization
Lecture 9 - Floating Point
Fall 2004

Reading: 3.6-3.9
HW Due Friday 10/1: 3.7, 3.9, 3.10, 3.14
EXAM 1: Monday 10/4

Why did the Ariane 5 Explode? (image source: java.sun.com)

Portions of these slides are derived from:
- Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
- Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
- Dave Patterson’s CS 152 Slides - Fall 1997 © UCB
- Rob Rutenbar’s 18-347 Slides - Fall 1999 CMU
- other sources as noted

Outline - Floating Point
- Motivation and Key Ideas (current section)
- IEEE 754 Floating Point Format
- Range and precision
- Floating Point Arithmetic
- MIPS Floating Point Instructions
- Rounding & Errors
- Summary

Floating Point - Motivation
- Review: n-bit integer representations
  - Unsigned: $0$ to $2^{n} - 1$
  - Signed Two’s Complement: $-2^{n-1}$ to $2^{n-1} - 1$
  - Biased (excess-b): $-b$ to $2^{n} - 1 - b$
- Problem: how do we represent:
  - Very large numbers: 9,345,524,282,135,672, $2^{354}$
  - Very small numbers: 0.00000000000000005216, $2^{-100}$
  - Rational numbers: 2/3
  - Irrational numbers: sqrt(2)
  - Transcendental numbers: e, π

Fixed Point Representation
- Idea: fixed-point numbers with fractions
  - Decimal point (binary point) marks start of fraction
  - Decimal: $1.2503 = 1 \times 10^{0} + 2 \times 10^{-1} + 5 \times 10^{-2} + 3 \times 10^{-4}$
  - Binary: $1.0100001_{two} = 1 \times 2^{0} + 1 \times 2^{-2} + 1 \times 2^{-7}$
- Problems
  - Limited locations for the “decimal point” (“binary point”)
  - Won’t work for very small or very large numbers
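The positional expansions on this slide can be checked mechanically. The following is a minimal sketch, not part of the original slides; the helper name `fixed_point_value` is hypothetical, and exact rational arithmetic is used so that no rounding enters the check.

```python
from fractions import Fraction


def fixed_point_value(digits: str, base: int = 2) -> Fraction:
    """Evaluate a fixed-point numeral such as '1.0100001' digit by digit."""
    whole, _, frac = digits.partition(".")
    value = Fraction(0)
    # Integer digits carry weights base^0, base^1, ... from right to left.
    for power, digit in enumerate(reversed(whole)):
        value += int(digit, base) * Fraction(base) ** power
    # Fraction digits carry weights base^-1, base^-2, ... from left to right.
    for power, digit in enumerate(frac, start=1):
        value += int(digit, base) * Fraction(base) ** -power
    return value


print(fixed_point_value("1.0100001"))        # binary example: 1 + 2^-2 + 2^-7
print(fixed_point_value("1.2503", base=10))  # decimal example from the slide
```

Because the position of the point is fixed by the format, the smallest and largest weights are fixed too, which is the limitation the slide points out.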

Another Approach: Scientific Notation
- Represent a number as a combination of
  - Mantissa (significand): normalized number, AND
  - Exponent (base 10)
- Example: $6.02 \times 10^{23}$
  - Significand (mantissa): 6.02
  - Radix (base): 10
  - Exponent: 23

Floating Point
- Key idea: adapt scientific notation to binary
  - Fixed-width binary number for significand
  - Fixed-width binary number for exponent (base 2)
- Idea: represent a number as $1.xxxxxxx_{two} \times 2^{yyyy}$
  - Leading ‘1’: implicit
  - Significand (mantissa): 1.xxxxxxx
  - Radix: 2
  - Exponent: yyyy
- Important points:
  - This is a tradeoff between precision and range
  - Arithmetic is approximate - error is inevitable!

Outline - Floating Point (current section: IEEE 754 Floating Point Format)

IEEE 754 Floating Point
- Single precision (C/C++/Java float type)
  - Fields: S (1 bit), E (8 bits), F (23 bits)
  - Value $N = (-1)^{S} \times 1.F \times 2^{E-127}$ (bias = 127)
- Double precision (C/C++/Java double type)
  - Fields: S (1 bit), E (11 bits), F (52 bits)
  - Value $N = (-1)^{S} \times 1.F \times 2^{E-1023}$ (bias = 1023)

Floating Point Examples
- $8.75_{ten} = 1 \times 2^{3} + 1 \times 2^{-1} + 1 \times 2^{-2} = 1.00011_{two} \times 2^{3}$
- Single Precision:
  - Significand: 1.00011000… (note leading 1 is implied)
  - Exponent: $3 + 127 = 130 = 10000010_{two}$
- Double Precision:
  - Significand: 1.00011000…
  - Exponent: $3 + 1023 = 1026 = 10000000010_{two}$

Floating Point Examples
- $-0.375_{ten} = -(1 \times 2^{-2} + 1 \times 2^{-3}) = -1.1_{two} \times 2^{-2}$ (sign bit S = 1)
- Single Precision:
  - Significand: 1.1000…
  - Exponent: $-2 + 127 = 125 = 01111101_{two}$
- Double Precision:
  - Significand: 1.1000…
  - Exponent: $-2 + 1023 = 1021 = 01111111101_{two}$
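The two worked encodings (8.75 and -0.375) can be cross-checked against the host machine's single-precision format. This is a small sketch, not from the slides; `float32_fields` is a hypothetical helper built on Python's standard `struct` module.

```python
import struct


def float32_fields(x: float):
    """Return (sign, biased exponent, fraction) of x encoded as an IEEE 754 single."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF      # 23 bits, leading 1 not stored
    return sign, exponent, fraction


for x in (8.75, -0.375):
    s, e, f = float32_fields(x)
    print(f"{x}: S={s}  E={e:08b} ({e})  F={f:023b}")
```

The printed exponent fields should match the biased values 130 and 125 derived on the slides, and the fraction fields should start with 00011 and 1 respectively.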

Floating Point Examples
- Q: What is the value of the following single-precision word?
  [Figure: 32-bit single-precision bit pattern, not reproduced in this transcript]
- Significand $= 1 + 2^{-1} + 2^{-4} + 2^{-8} + 2^{-10} + 2^{-12}$
- Exponent $= 8 - 127 = -119$
- Final Result $= (1 + 2^{-1} + 2^{-4} + 2^{-8} + 2^{-10} + 2^{-12}) \times 2^{-119} \approx 2.36 \times 10^{-36}$
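The decoding steps in this example can be sketched directly from the formula $N = (-1)^{S} \times 1.F \times 2^{E-127}$. The bit pattern itself is missing from the transcript, so the sketch below assumes a sign bit of 0 (the stated result is positive) and rebuilds the fraction field from the listed powers of two; `decode_single` is a hypothetical helper and handles normalized values only.

```python
import struct


def decode_single(sign: int, exponent: int, fraction: int) -> float:
    """Normalized case only (0 < exponent < 255): (-1)^S * 1.F * 2^(E - 127)."""
    significand = 1 + fraction / 2**23
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


# Fraction bits for 2^-1, 2^-4, 2^-8, 2^-10, 2^-12 (bit 22 of F is the 2^-1 place).
frac = sum(1 << (23 - k) for k in (1, 4, 8, 10, 12))
value = decode_single(0, 8, frac)  # sign = 0 is an assumption

# Cross-check: assemble the 32-bit word and let struct interpret it as a float.
word = (0 << 31) | (8 << 23) | frac
(check,) = struct.unpack(">f", struct.pack(">I", word))
print(value, check)
```

Both printed values should agree and round to the slide's answer of about 2.36 X 10^-36.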

Special Values in IEEE Floating Point
- 00000000 exponent (all zeros) - reserved for
  - Zero value (all bits zero)
  - “Denormalized numbers” - drop the “1.”
    - Used for “very small” numbers … “gradual underflow”
    - Smallest denormalized number (single precision): $0.00000000000000000000001_{two} \times 2^{-126} = 2^{-149}$
- 11111111 exponent (all ones)
  - Infinity - 11111111 exponent, zero significand
  - NaN (Not a Number) - 11111111 exponent, nonzero significand
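The reserved exponent patterns translate into a short classification routine. The following is an illustrative sketch for single precision only; the function names `classify_single` and `bits_to_float` are hypothetical.

```python
import struct


def classify_single(bits: int) -> str:
    """Classify a 32-bit single-precision word by its exponent and fraction fields."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0x00:  # all-zero exponent: zero or denormalized
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:  # all-one exponent: infinity or NaN
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit word as an IEEE 754 single."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


for word in (0x00000000, 0x00000001, 0x3F800000, 0x7F800000, 0x7FC00000):
    print(f"{word:#010x}  {classify_single(word):13s} {bits_to_float(word)}")
```

The word 0x00000001 is the smallest denormalized number from the slide, so `bits_to_float(0x00000001)` should equal 2^-149.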

Outline - Floating Point (current section: Range and precision)

Floating Point Range and Precision
- The tradeoff: range in exchange for uniformity
- “Tiny” example: floating point with:
  - 3 exponent bits
  - 2 significand bits
  - Word layout: s (bit 5) | exp (bits 4-2) | significand (bits 1-0)
[Figure: number lines of the representable values, one spanning -10 to +10 (ending in -∞ and +∞) and one zoomed in on -1 to +1 around -0 and +0, with denormalized, normalized, and infinity values marked]
Graphic and Example Source: R. Bryant and D. O’Halloran, Computer Systems: A Programmer’s Perspective, © Prentice Hall, 2002

Visualizing Floating Point - “Small” FP Representation
- 8-bit Floating Point Representation
  - the sign bit is in the most significant bit
  - the next four bits are the exponent, with a bias of 7
  - the last three bits are the frac
  - Word layout: s (bit 7) | exp (bits 6-3) | significand (bits 2-0)
- Same General Form as IEEE Format
  - normalized, denormalized
  - representation of 0, NaN, infinity
Example Source: R. Bryant and D. O’Halloran, Computer Systems: A Programmer’s Perspective, © Prentice Hall, 2002

Small FP - Values Related to Exponent

Exp   exp    E     $2^{E}$
 0    0000   -6    1/64    (denorms)
 1    0001   -6    1/64
 2    0010   -5    1/32
 3    0011   -4    1/16
 4    0100   -3    1/8
 5    0101   -2    1/4
 6    0110   -1    1/2
 7    0111    0    1
 8    1000   +1    2
 9    1001   +2    4
10    1010   +3    8
11    1011   +4    16
12    1100   +5    32
13    1101   +6    64
14    1110   +7    128
15    1111   n/a           (inf, NaN)

Small FP Example - Dynamic Range

s  exp   frac   E     Value
Denormalized numbers
0  0000  000    -6    0
0  0000  001    -6    1/8 * 1/64 = 1/512     closest to zero
0  0000  010    -6    2/8 * 1/64 = 2/512
…
0  0000  110    -6    6/8 * 1/64 = 6/512
0  0000  111    -6    7/8 * 1/64 = 7/512     largest denorm
Normalized numbers
0  0001  000    -6    8/8 * 1/64 = 8/512     smallest norm
0  0001  001    -6    9/8 * 1/64 = 9/512
…
0  0110  110    -1    14/8 * 1/2 = 14/16
0  0110  111    -1    15/8 * 1/2 = 15/16     closest to 1 below
0  0111  000     0    8/8 * 1 = 1
0  0111  001     0    9/8 * 1 = 9/8          closest to 1 above
0  0111  010     0    10/8 * 1 = 10/8
…
0  1110  110     7    14/8 * 128 = 224
0  1110  111     7    15/8 * 128 = 240       largest norm
0  1111  000    n/a   inf
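The rows of the dynamic-range table can be regenerated with a short decoder for the 8-bit “small” format (1 sign bit, 4 exponent bits with bias 7, 3 fraction bits). This is a sketch written for illustration; `decode_small_fp` is a hypothetical name, and every finite value of this format is exactly representable as a Python float.

```python
import math


def decode_small_fp(byte: int) -> float:
    """Decode the 8-bit format: s (bit 7) | exp (bits 6-3, bias 7) | frac (bits 2-0)."""
    sign = (byte >> 7) & 1
    exp = (byte >> 3) & 0xF
    frac = byte & 0x7
    if exp == 0xF:  # all ones: infinity or NaN
        if frac != 0:
            return math.nan
        return -math.inf if sign else math.inf
    if exp == 0:    # denormalized: no implicit leading 1, E fixed at 1 - bias = -6
        significand, e = frac / 8, 1 - 7
    else:           # normalized: implicit leading 1, E = exp - bias
        significand, e = 1 + frac / 8, exp - 7
    value = significand * 2.0 ** e
    return -value if sign else value


for pattern in (0b0_0000_001, 0b0_0000_111, 0b0_0001_000,
                0b0_0110_111, 0b0_0111_000, 0b0_0111_001, 0b0_1110_111):
    print(f"{pattern:08b}  {decode_small_fp(pattern)}")
```

Note how the largest denormalized value (7/512) and the smallest normalized value (8/512) are adjacent: using E = -6 for both exponent codes 0000 and 0001 is what makes the transition gradual.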

Learning from Tiny & Small FP
- Non-uniform spacing of numbers
  - very small spacing for large negative exponents
  - very large spacing for large positive exponents
- Exact representation: sums of powers of 2
- Approximate representation: everything else
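The same two lessons hold for full-size doubles. The sketch below, an illustration rather than slide material, uses `math.ulp` (available in Python 3.9 and later) to show the spacing between neighboring values, and two comparisons to contrast exact and approximate representation.

```python
import math

# Spacing between adjacent doubles (one unit in the last place) grows with the exponent.
for x in (2.0 ** -60, 1.0, 2.0 ** 60):
    print(f"x = {x:.3e}   spacing to next value = {math.ulp(x):.3e}")

# Sums of a few powers of 2 are stored exactly; most decimal fractions are not.
exact = (0.5 + 0.25 == 0.75)   # all three values are sums of powers of 2
approx = (0.1 + 0.2 == 0.3)    # 0.1, 0.2, and 0.3 are each stored as approximations
print(exact, approx)
```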

Summary: IEEE Floating Point Values
[Table not reproduced in this transcript]
Source: book p. 301

IEEE Floating Point - Interesting Numbers

Description              exp     frac    Numeric Value
Zero                     00…00   00…00   0.0
Smallest Pos. Denorm.    00…00   00…01   $2^{-\{23,52\}} \times 2^{-\{126,1022\}}$
                                         Single ≈ $1.4 \times 10^{-45}$
                                         Double ≈ $4.9 \times 10^{-324}$
Largest Denormalized     00…00   11…11   $(1.0 - \epsilon) \times 2^{-\{126,1022\}}$
                                         Single ≈ $1.18 \times 10^{-38}$
                                         Double ≈ $2.2 \times 10^{-308}$
Smallest Pos. Normalized 00…01   00…00   $1.0 \times 2^{-\{126,1022\}}$
                                         Just larger than largest denormalized
One                      01…11   00…00   1.0
Largest Normalized       11…10   11…11   $(2.0 - \epsilon) \times 2^{\{127,1023\}}$
                                         Single ≈ $3.4 \times 10^{38}$
                                         Double ≈ $1.8 \times 10^{308}$
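The table entries can be reproduced from their bit patterns. The sketch below is illustrative only: `single_from_bits` is a hypothetical helper, the single-precision rows are built from 32-bit words, and the double-precision rows are read from Python's `sys.float_info`.

```python
import math
import struct
import sys


def single_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit word as an IEEE 754 single."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


# Single precision: (exp, frac) patterns taken from the table.
smallest_denorm = single_from_bits(0x00000001)  # exp 00…00, frac 00…01
largest_denorm = single_from_bits(0x007FFFFF)   # exp 00…00, frac 11…11
smallest_norm = single_from_bits(0x00800000)    # exp 00…01, frac 00…00
one = single_from_bits(0x3F800000)              # exp 01…11, frac 00…00
largest_norm = single_from_bits(0x7F7FFFFF)     # exp 11…10, frac 11…11

# Double precision: Python floats are IEEE 754 doubles on common platforms.
double_smallest_denorm = math.ldexp(1.0, -1074)  # 2^-52 * 2^-1022
double_smallest_norm = sys.float_info.min
double_largest_norm = sys.float_info.max

print(smallest_denorm, largest_denorm, smallest_norm, one, largest_norm)
print(double_smallest_denorm, double_smallest_norm, double_largest_norm)
```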

Outline - Floating Point (current section: Floating Point Arithmetic)

Floating Point Addition (Fig. 3.16)
1. Align binary point to number with larger exponent
2. Add significands
3. Normalize result and adjust exponent
4. If overflow/underflow, throw exception
5. Round result (go to 3 if normalization needed again)

Example:
- A $= 1.11_{two} \times 2^{0}$ (1.75)
- B $= 1.00_{two} \times 2^{-2}$, aligned to $0.01_{two} \times 2^{0}$ (0.25)
- A + B $= 10.00_{two} \times 2^{0}$
- Normalize: $1.00_{two} \times 2^{1}$ (2.00)

Hardware - Fig. 3.17, p. 201
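Steps 1 through 3 can be traced in a few lines of code. This is a simplified sketch under stated assumptions, not the hardware algorithm of Fig. 3.17: operands are positive, each value is a hypothetical `(significand, exponent)` pair whose significand is an integer with `frac_bits` bits after the binary point, bits shifted out during alignment are simply truncated, and the exception and rounding steps (4 and 5) are omitted.

```python
def fp_add(a, b, frac_bits=2):
    """Add two positive normalized values given as (significand, exponent) pairs."""
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    # Step 1: align the operand with the smaller exponent to the larger exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b >>= exp_a - exp_b  # shifted-out bits are dropped (no guard/round/sticky)
    # Step 2: add significands.
    total = sig_a + sig_b
    exp = exp_a
    # Step 3: normalize so the significand is back in the range 1.xx.
    while total >= 1 << (frac_bits + 1):
        total >>= 1
        exp += 1
    return total, exp


def to_float(value, frac_bits=2):
    """Convert a (significand, exponent) pair to a Python float for display."""
    sig, exp = value
    return sig / (1 << frac_bits) * 2.0 ** exp


a = (0b111, 0)    # 1.11 x 2^0  = 1.75
b = (0b100, -2)   # 1.00 x 2^-2 = 0.25
result = fp_add(a, b)
print(result, to_float(result))
```

With the slide's operands, alignment turns B into 0.01 x 2^0, the sum is 10.00 x 2^0, and normalization yields 1.00 x 2^1.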

Floating Point Multiplication (Fig. 3.18)
1. Add the two exponents together to get the new exponent (subtract 127 to get the proper biased value)
2. Multiply significands
3. Normalize result if necessary (shift right) & adjust exponent
4. If overflow/underflow, throw exception
5. Round result (go to 3 if normalization needed again)
6. Set sign of result using sign of X, Y
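The multiplication steps can be sketched on single-precision fields. The code below is an illustration under assumptions, not the textbook hardware: each operand is a hypothetical `(sign, biased exponent, significand)` triple, the significand is a 24-bit integer that includes the leading 1, the low product bits are truncated instead of rounded (step 5 is omitted), and only normalized operands are handled.

```python
BIAS = 127
FRAC_BITS = 23


def fp_mul(x, y):
    """Multiply two normalized singles given as (sign, biased exponent, significand)."""
    sign_x, exp_x, sig_x = x
    sign_y, exp_y, sig_y = y
    # Step 1: add the biased exponents, then subtract one bias.
    exp = exp_x + exp_y - BIAS
    # Step 2: multiply significands; drop the low 23 bits of the 48-bit product.
    sig = (sig_x * sig_y) >> FRAC_BITS
    # Step 3: the product of two values in [1, 2) lies in [1, 4); shift right if >= 2.
    if sig >= 1 << (FRAC_BITS + 1):
        sig >>= 1
        exp += 1
    # Step 4: report exponent overflow/underflow.
    if not 0 < exp < 255:
        raise OverflowError("exponent out of range for a normalized single")
    # Step 6: the result is negative exactly when the operand signs differ.
    sign = sign_x ^ sign_y
    return sign, exp, sig


def to_float(value):
    """Convert a (sign, biased exponent, significand) triple to a Python float."""
    sign, exp, sig = value
    return (-1) ** sign * sig / (1 << FRAC_BITS) * 2.0 ** (exp - BIAS)


three = (0, 128, 0xC00000)       # +1.1 x 2^1 = 3.0
minus_four = (1, 129, 0x800000)  # -1.0 x 2^2 = -4.0
print(to_float(fp_mul(three, minus_four)))
```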

Outline - Floating Point (current section: MIPS Floating Point Instructions)

MIPS Floating Point Instructions
- Organized as a coprocessor
  - Separate registers $f0-$f31
  - Separate operations
  - Separate data transfer (to same memory)
- Basic operations
  - add.s - single, add.d - double
  - sub.s - single, sub.d - double
  - mul.s - single, mul.d - double
  - div.s - single, div.d - double

MIPS Floating Point Instructions (cont’d)
- Data transfer
  - lwc1, swc1 (l.s, s.s) - load/store float to fp reg
  - l.d, s.d - load/store double to fp reg pair
- Testing / branching
  - c.lt.s, c.lt.d, c.eq.s, c.eq.d, … - compare and set condition bit if true
  - bc1t - branch if condition true
  - bc1f - branch if condition false

Outline - Floating Point (current section: Rounding & Errors)

Rounding
- Extra bits allow rounding after computation
  - Guard digit (may shift into number during normalization)
  - Round digit - used to round when the guard bit is shifted into the number during normalization
  - Sticky bit - used when there are 1’s to the right of the round digit, e.g., “0.010000001” (round to nearest even)
- IEEE 754 supports four rounding modes
  - Always round up (toward +∞)
  - Always round down (toward -∞)
  - Truncate (toward 0)
  - Round to nearest even (most common)
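Round to nearest even can be sketched on an integer significand that carries extra low-order bits. This is an illustration, not the textbook datapath: `round_nearest_even` is a hypothetical helper, it assumes at least one extra bit, and it looks at all dropped bits at once, which gives the same decision that the guard, round, and sticky bits encode in hardware.

```python
def round_nearest_even(sig: int, extra_bits: int) -> int:
    """Drop extra_bits low-order bits of sig, rounding to nearest with ties to even."""
    kept = sig >> extra_bits
    dropped = sig & ((1 << extra_bits) - 1)
    half = 1 << (extra_bits - 1)  # value of the dropped bits at exactly halfway
    if dropped > half:
        kept += 1                 # more than halfway: round up
    elif dropped == half and kept & 1:
        kept += 1                 # exact tie: round up only if that makes the result even
    return kept


print(bin(round_nearest_even(0b1010_011, 3)))  # below halfway
print(bin(round_nearest_even(0b1010_100, 3)))  # tie, kept part already even
print(bin(round_nearest_even(0b1011_100, 3)))  # tie, kept part odd
print(bin(round_nearest_even(0b1010_101, 3)))  # above halfway
```

Rounding up can carry out of the significand (for example 1.11…1 becoming 10.00…0), which is why the addition and multiplication procedures return to the normalization step after rounding.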

Limitations on Floating-Point Math
- Most numbers are approximate
- Roundoff error is inevitable
- Range (and accuracy) vary depending on exponent
- “Normal” math properties not guaranteed:
  - Inverse: (1/r)*r ≠ 1
  - Associative: (A+B) + C ≠ A + (B+C), (A*B) * C ≠ A * (B*C)
  - Distributive: (A+B) * C ≠ A*C + B*C
- Scientific calculations require error management - take a numerical analysis course for more info
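Two of these failures are easy to reproduce with ordinary doubles. The values below are chosen for illustration and do not come from the slides.

```python
# Associativity: the grouping decides which intermediate result gets rounded.
a, b, c = 1e20, -1e20, 1.0
left = (a + b) + c   # a + b is exactly 0, so the result is 1.0
right = a + (b + c)  # c is far below the spacing of doubles near 1e20 and is lost

# Inverse: find small integers n for which (1/n)*n does not round back to 1.
failures = [n for n in range(1, 101) if (1.0 / n) * n != 1.0]

print(left, right)
print(failures)
```

Most values of n do round back to exactly 1.0, which is part of what makes these errors easy to miss in testing.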

IEEE Floating Point - Special Properties
- Floating Point 0 same as Integer 0
  - All bits = 0
- Can (almost) use unsigned integer comparison
  - A > B if: A.EXP > B.EXP, or A.EXP = B.EXP and A.SIG > B.SIG
  - This is equivalent to unsigned comparison!
  - But, must first compare sign bits
  - Must consider -0 == 0
  - NaNs problematic
    - Will be greater than any other values
    - What should comparison yield?
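These properties can be observed by comparing the raw 32-bit patterns as unsigned integers. The following sketch is illustrative; `single_bits` is a hypothetical helper, and the sample values are arbitrary non-negative numbers.

```python
import struct


def single_bits(x: float) -> int:
    """Return the 32-bit pattern of x encoded as an IEEE 754 single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


# For non-negative values, numeric order and unsigned bit-pattern order agree.
values = [0.0, 1e-40, 1.5, 2.0, 1e30, float("inf")]
patterns = [single_bits(v) for v in values]
same_order = patterns == sorted(patterns)

# The exceptions listed on the slide:
neg_zero_equal = (-0.0 == 0.0)                          # equal as numbers
neg_zero_bits = single_bits(-0.0)                       # but the sign bit is set
nan_magnitude = single_bits(float("nan")) & 0x7FFFFFFF  # ignore the sign bit
nan_above_inf = nan_magnitude > single_bits(float("inf"))

print(same_order, neg_zero_equal, hex(neg_zero_bits), nan_above_inf)
```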

Addendum - Why Did the Ariane 5 Explode?
- In 1996 Ariane 5 Flight 501 exploded after launch.
- Estimated cost of accident: $500 million

Addendum - Why Did the Ariane 5 Explode?
- The cause was traced to the Inertial Reference System (SRI).
- Both the main and backup SRI failed.
- Both units failed due to an out-of-range conversion
  - Input: double precision floating point
  - Output: 16-bit integer for “horizontal bias” (BH)
- Careful analysis during software design had indicated that BH would “fit” in 16 bits
- So, why didn’t it fit?

Addendum - Why Did the Ariane 5 Explode?
- Careful analysis during software design had indicated that BH would “fit” in 16 bits
- BUT, all analysis had been done for the Ariane 4, the predecessor of Ariane 5 - software was reused
- Since Ariane 5 was a larger rocket, the values for BH were higher than anticipated
- AND, there was no handler to deal with the exception!
- For more information:
  - http://www.ima.umn.edu/~arnold/disasters/ariane.html
  - Or, Google “Ariane 5”
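The failure mode, a double that no longer fits in a signed 16-bit integer, can be illustrated in a few lines. This is a generic sketch and not the flight software; the function names and the sample values are hypothetical. It contrasts a conversion that raises an exception on overflow with one that saturates at the limits of the 16-bit range.

```python
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1


def to_int16_checked(x: float) -> int:
    """Convert to a signed 16-bit integer, raising if the value is out of range."""
    n = int(x)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in a signed 16-bit integer")
    return n


def to_int16_saturating(x: float) -> int:
    """Convert to a signed 16-bit integer, clamping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(x)))


for sample in (1234.7, 40000.0):  # hypothetical in-range and out-of-range inputs
    try:
        print(sample, "->", to_int16_checked(sample))
    except OverflowError as err:
        # With no handler like this one, the exception would propagate unhandled.
        print("unhandled in the original scenario:", err)
    print(sample, "-> saturated:", to_int16_saturating(sample))
```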

Summary - Chapter 3
- Important Topics
  - Signed & Unsigned Numbers (3.2)
  - Addition and Subtraction (3.3)
  - Carry Lookahead (B.6)
  - Constructing an ALU (B.5)
  - Multiplication and Division (3.4, 4.5)
  - Floating Point (3.6)
- Coming Up:
  - Performance (Chapter 4)
