CHAPTER 3 Floating Point.

Slides:

Advertisements

Similar presentations

Spring 2013 Advising Starts this week! CS2710 Computer Organization1.

Advertisements

Fixed Point Numbers The binary integer arithmetic you are used to is known by the more general term of Fixed Point arithmetic. Fixed Point means that we.

Computer Organization CS224 Fall 2012 Lesson 19. Floating-Point Example  What number is represented by the single-precision float …00 

Lecture 16: Computer Arithmetic Today’s topic –Floating point numbers –IEEE 754 representations –FP arithmetic Reminder –HW 4 due Monday 1.

Floating Point Numbers

1 CS61C L9 Fl. Pt. © UC Regents CS61C - Machine Structures Lecture 9 - Floating Point, Part I September 27, 2000 David Patterson

Integer Multiplication (1/3)

CS61C L11 Floating Point II (1) Garcia, Fall 2005 © UCB Lecturer PSOE, new dad Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C.

1 Lecture 9: Floating Point Today’s topics:  Division  IEEE 754 representations  FP arithmetic Reminder: assignment 4 will be posted later today.

Inst.eecs.berkeley.edu/~cs61c CS61CL : Machine Structures Lecture #6 – Number Representation, IEEE FP Greet class Jeremy Huddleston.

CS 61C L16 : Floating Point II (1) Kusalo, Spring 2005 © UCB Lecturer NSOE Steven Kusalo inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture.

CS 61C L16 : Floating Point II (1) Garcia, Spring 2004 © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C.

CS61C L11 Floating Point I (1) Chae, Summer 2008 © UCB Albert Chae, Instructor inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #11 – Floating.

CS61C L14 MIPS Instruction Representation II (1) Garcia, Spring 2007 © UCB Google takes on Office!  Google Apps: premium “services” ( , instant messaging,

CS61C L15 Floating Point I (1) Garcia, Fall 2011 © UCB Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine.

CS 61C L15 Floating Point I (1) Krause, Spring 2005 © UCB This day in history… TA Danny Krause inst.eecs.berkeley.edu/~cs61c-td inst.eecs.berkeley.edu/~cs61c.

CS61C L16 Floating Point II (1) Garcia, Fall 2006 © UCB Prof Smoot given Nobel!!  Prof George Smoot joins the ranks of the most distinguished faculty.

CS61C L16 Floating Point II (1)Mowery, Spring 2008 © UCB TA Oddinaire Keaton Mowery inst.eecs.berkeley.edu/~cs61c UC Berkeley.

CS 61C L15 Floating Point I (1) Garcia, Fall 2004 © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine.

CS61C L11 Floating Point II (1) Beamer, Summer 2007 © UCB Scott Beamer, Instructor inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #11.

COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Hao Ji.

CS 61C L10 Floating Point (1) A Carle, Summer 2005 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #10: Floating Point

CS61C L15 Floating Point I (1) Jacobs, Spring 2008 © UCB TA Ordinaire Dave Jacobs inst.eecs.berkeley.edu/~cs61c CS61C : Machine.

CS 61C L16 : Floating Point II (1) Garcia, Fall 2004 © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C.

IT 251 Computer Organization and Architecture Introduction to Floating Point Numbers Chia-Chi Teng.

Computer Organization and Architecture Computer Arithmetic Chapter 9.

Computer Arithmetic Nizamettin AYDIN

Computer Arithmetic. Instruction Formats Layout of bits in an instruction Includes opcode Includes (implicit or explicit) operand(s) Usually more than.

Csci 136 Computer Architecture II – Floating-Point Arithmetic

CEN 316 Computer Organization and Design Computer Arithmetic Floating Point Dr. Mansour AL Zuair.

CS61C L11 Floating Point (1) Pearce, Summer 2010 © UCB inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 11 Floating Point

Computing Systems Basic arithmetic for computers.

Floating Point (a brief look) We need a way to represent –numbers with fractions, e.g., –very small numbers, e.g., –very large numbers,

CPS3340 COMPUTER ARCHITECTURE Fall Semester, /14/2013 Lecture 16: Floating Point Instructor: Ashraf Yaseen DEPARTMENT OF MATH & COMPUTER SCIENCE.

CH09 Computer Arithmetic  CPU combines of ALU and Control Unit, this chapter discusses ALU The Arithmetic and Logic Unit (ALU) Number Systems Integer.

Lecture 9: Floating Point

Floating Point Representation for non-integral numbers – Including very small and very large numbers Like scientific notation – –2.34 × –

Computer Arithmetic Floating Point. We need a way to represent –numbers with fractions, e.g., –very small numbers, e.g., –very large.

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

FLOATING POINT ARITHMETIC P&H Slides ECE 411 – Spring 2005.

CS 61C L3.2.1 Floating Point 1 (1) K. Meinz, Summer 2004 © UCB CS61C : Machine Structures Lecture Floating Point Kurt Meinz inst.eecs.berkeley.edu/~cs61c.

CDA 3101 Fall 2013 Introduction to Computer Organization

Floating Point Numbers Representation, Operations, and Accuracy CS223 Digital Design.

10/7/2004Comp 120 Fall October 7 Read 5.1 through 5.3 Register! Questions? Chapter 4 – Floating Point.

Computer Architecture & Operations I

Integer Division.

Topics IEEE Floating Point Standard Rounding Floating Point Operations

CS/COE0447 Computer Organization & Assembly Language

April 2006 Saeid Nooshabadi

Floating Point Arithmetics

Arithmetic for Computers

inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures

CS 61C: Great Ideas in Computer Architecture Floating Point Arithmetic

Computer Arithmetic Multiplication, Floating Point

Hello to Robb Neuschwander listening from Louisville, KY!

October 17 Chapter 4 – Floating Point Read 5.1 through 5.3 1/16/2019

cnn.com/2004/ALLPOLITICS/10/05/debate.main/

Floating Point Numbers

CDA 3101 Spring 2016 Introduction to Computer Organization

Presentation transcript:

CHAPTER 3 Floating Point

Representation of Fractions (1/2) With base 10, we have a decimal point to separate integer and fraction parts to a number. 20.4005 = 2x101 + 4x10-1 + 5x10-4 xx.yyyy 101 100 10-1 10-2 10-3 10-4

Representation of Fractions (2/2) “Binary Point” like decimal point signifies boundary between integer and fractional parts: xx.yyyy 21 20 2-1 2-2 2-3 2-4 Example 6-bit representation: 10.10102 = 1x21 + 1x2-1 + 1x2-3 = 2.62510 If we assume “fixed binary point”, range of 6-bit representations with this format: 0 to 3.9375 (almost 4)

Fractional Powers of 2 i 2-i 0 1.0 1 0.5 1/2 0.25 1/4 0.125 1/8 0 1.0 1 0.5 1/2 0.25 1/4 0.125 1/8 0.0625 1/16 0.03125 1/32 0.015625 0.0078125 0.00390625 0.001953125 0.0009765625 0.00048828125 0.000244140625 0.0001220703125 0.00006103515625 0.000030517578125

Representation of Fractions with Fixed Pt. What about addition and multiplication? Addition is straightforward: 01.100 1.510 00.100 0.510 10.000 2.010 Multiplication a bit more complex: 01.100 1.510 00.100 0.510 00 000 000 00 0110 0 00000 0000110000 Whole Fraction Where’s the answer, 0.11? (need to remember where point is)

Representation of Fractions So far, in our examples we used a “fixed” binary point what we really want is to “float” the binary point. Why? Floating binary point most effective use of our limited bits (and thus more accuracy in our number representation): example: put 0.1640625 into binary. Represent with 5-bits, choosing where to put the binary point. … 000000.001010100000… Store these bits and keep track of the binary point 2 places to the left of the MSB Any other solution would lose accuracy! With floating point rep., each number carries an exponent field recording the location of its binary point. The binary point is not stored, so very large and very small numbers can be represented.

Scientific Notation (in Decimal) significand exponent 6.0210 x 1023 decimal point radix (base) Normalized form: no leadings 0s (exactly one non-zero digit to left of decimal point) Alternatives to representing 1/1,000,000,000 Normalized: 1.0 x 10-9 Not normalized: 0.1 x 10-8,10.0 x 10-10

Scientific Notation (in Binary) significand exponent 1.02 x 2-1 “binary point” radix (base) Computer arithmetic that supports binary scientific notation is called floating point. Corresponds to C++ data types float, double

Floating Point Representation (1/2) Normalized: s1.xxxx…xxx2*2yyy…y2 Word Sizes (32 or 64 bits) 31 S Exponent 30 23 22 Significand 1 bit 8 bits 23 bits S represents sign binary Exponent yyyy binary significand/fraction -- xxxx Represent numbers as small as 2.0 x 10-38 to as large as 2.0 x 1038

Floating Point Representation (2/2) What if result too large? (> 2.0x1038 , < -2.0x1038 ) Overflow!  Positive exponent larger than represented in 8-bit exponent field What if result too small? (>0 & < 2.0x10-38 , <0 & > - 2.0x10-38 ) Underflow!  Negative exponent larger than represented in 8-bit exponent field What would help reduce chances of overflow and/or underflow? (more bits) overflow underflow overflow 2x10-38 2x1038 1 -1 -2x10-38 -2x1038

Double Precision Representation Next Multiple of Word Size (64 bits) 31 S Exponent 30 20 19 Significand 1 bit 11 bits 20 bits Significand (cont’d) 32 bits Double Precision (vs. Single Precision) C++ variable declared as double Represent numbers almost as small as 2.0 x 10-308 to almost as large as 2.0 x 10308 But primary advantage is greater accuracy (#digits) due to larger significand

IEEE 754 Floating Point Standard (1/2) Single Precision (DP similar): Sign bit: 1 means negative, 0 means positive Significand: To pack more bits, leading 1 implied for normalized numbers 1 + 23 bits single, 1 + 52 bits double always true when normalized: 0 < significand < 1 Note: zero has no leading 1, so reserve exponent value 0 just for number ZERO 31 S Exponent 30 23 22 Significand 1 bit 8 bits 23 bits

IEEE 754 Floating Point Standard (2/2) Biased Exponent Notation: bias is subtracted to get real exponent value Uses bias of 127 for single prec. Subtract 127 from Exponent field to get actual value Uses 1023 bias for double precision Bias = 2(#expbits – 1) – 1 31 S Exponent 30 23 22 Significand 1 bit 8 bits 23 bits Value = (-1)S x (1 + Significand) x 2(Exponent-127)

Example: Converting Binary FP to Decimal 0110 1000 101 0101 0100 0011 0100 0010 Sign: 0 => positive Exponent: 0110 1000two = 104ten Bias adjustment: 104 - 127 = -23 Significand: 1 + 1x2-1+ 0x2-2 + 1x2-3 + 0x2-4 + 1x2-5 +... =1+2-1+2-3 +2-5 +2-7 +2-9 +2-14 +2-15 +2-17 +2-22 = 1.0 + 0.666115 Represents: 1.66611510*2-23 ~ 1.986*10-7 (about 2/10,000,000)

Converting Decimal Fraction to FP (1/5) Simple Case: If denominator of fraction is a power of 2 (2, 4, 8, 16, etc.), then it’s easy. Show MIPS representation of -0.75 -0.75 = -3/4 -112/1002 = -0.11two Normalized to -1.12 x 2-1 (-1)S x (1 + significand) x 2(Exponent-127) (-1)1 x (1 + .100 0000 ... 0000) x 2(-1) 1 0111 1110 100 0000 0000 0000 0000 0000

Converting Decimal to FP (2/5) Given decimal fraction F Algorithm to convert F to binary fraction. for (k=1 to #bitsneeded) { dk = int ( 2 x F); F = fraction (2 x F) } F = 0.845  0.11012 k=1: F = 0.845 2 x F = 1.690 d1 = 1 F = 0.690 k=4: F = 0.760 2 x F = 1.520 d4 = 1 F = 0.520 k=2: F = 0.690 2 x F = 1.380 d2 = 1 F = 0.380 k=3: F = 0.380 2 x F = 0.760 d3 = 0 F = 0.760

Converting Decimal to FP (3/5) The number of binary digits may exceed the #bits in the significand The binary fraction is truncated  loss of accuracy Now, to normalize: Now, determine exponent: Finally, represent the number: 0.11012  1.10102 x 2-1 bias = 127 expon = 127 + -1 = 126 = 011111102 0111 1110 101 0000 0000 0000 0000 0000

Converting Decimal to FP (4/5) Fact: All rational numbers have a repeating pattern when written out in decimal. Fact: This still applies in binary. To finish conversion: Write out binary number with repeating pattern. Cut it off after correct number of bits (different for single v. double precision). Derive Sign, Exponent and Significand fields.

Converting Decimal to FP (5/5) -2.340625 x 101 Denormalize: -23.40625 Convert integer part: 23 = 16 + ( 7 = 4 + ( 3 = 2 + ( 1 ) ) ) = 101112 Convert fractional part: .40625 = .25 + ( .15625 = .125 + ( .03125 ) ) = .011012 Put parts together and normalize: 10111.01101 = 1.011101101 x 24 Convert exponent: 127 + 4 = 100000112 1 1000 0011 011 1011 0100 0000 0000 0000

Understanding the Significand (1/2) Method 1 (Fractions): In decimal: 0.34010  34010/100010  3410/10010 In binary: 0.1102  1102/10002 = 610/810  112/1002 = 310/410 Advantage: less purely numerical, more thought oriented; this method usually helps people understand the meaning of the significand better

Understanding the Significand (2/2) Method 2 (Place Values): Convert from scientific notation In decimal: 1.6732 = (1x100) + (6x10-1) + (7x10-2) + (3x10-3) + (2x10-4) In binary: 1.1001 = (1x20) + (1x2-1) + (0x2-2) + (0x2-3) + (1x2-4) Interpretation of value in each position extends beyond the decimal/binary point Advantage: good for quickly calculating significand value; use this method for translating FP numbers

Peer to Peer Exercise What is the decimal equivalent of: 1 1000 0001 111 0000 0000 0000 0000 0000 S Exponent Significand (-1)S x (1 + Significand) x 2(Exponent-127) (-1)1 x (1 + .111) x 2(129-127) -1 x (1.111) x 2(2) -111.12 -7.510

Peer to Peer Exercise What is the decimal equivalent of: 1 1000 0001 111 0000 0000 0000 0000 0000 S Exponent Significand (-1)S x (1 + Significand) x 2(Exponent-127) (-1)1 x (1 + .111) x 2(129-127) -1 x (1.111) x 2(2) -111.1 -7.5ten 1: -3.5 2: -7 3: -7.5 4: -7 * 2^129 5: -129 * 2^7

Precision and Accuracy Don’t confuse these two terms! Precision is the number of bits in a computer word used to represent a value. Accuracy is a measure of the difference in value between the actual (decimal) value and its computer representation. High precision permits high accuracy but doesn’t guarantee it. It is possible to have high precision but low accuracy. Example: float pi = 3.14; pi will be represented using all 24 bits of the significant (highly precise), but is only an approximation (not accurate). pi is actually some infinitely non-repeating number (transcendental), not 3.14

Representation for 0 Represent 0? Why two zeroes? exponent all zeroes significand all zeroes too What about sign? +0: 0 00000000 00000000000000000000000 -0: 1 00000000 00000000000000000000000 Why two zeroes? Helps in some limit comparisons Ask math majors

Representation for ± ∞ In FP, divide by 0 should produce ± ∞, not overflow. Why? OK to do further computations with ∞ e.g., X/0 > Y may be a valid comparison Ask math majors IEEE 754 represents ± ∞ Most positive exponent reserved for ∞ Significand all zeroes

Special Numbers What have we defined so far? (Single Precision) Exponent Significand Object 0 0 0 0 nonzero ??? 1-254 anything +/- fl. pt. # 255 0 +/- ∞ 255 nonzero ???

Representation for Not a Number What is sqrt(-4.0)or 0/0? If ∞ not an error, these shouldn’t be either. Called Not a Number (NaN) Exponent = 255, Significand nonzero Why is this useful? Symbol for typical invalid operations Hope NaNs help with debugging? Programmers may postpone tests or decisions to a later time They contaminate: op(NaN, X) = NaN

Representation for Denorms (1/2) Problem: There’s a gap between 0 and the smallest representable FP number These numbers can not be normalized doing so overflows the exponent Smallest representable pos num: a = 1.0… 2 * 2-126 = 2-126 Second smallest representable pos num: b = 1.000……1 2 * 2-126 = 2-126 + 2-149 b a + - Gaps!

Representation for Denorms (2/2) Solution: Consider: Exponent = 0, Significand != 0 Denormalized number: no leading 1, implicit exponent = -126. Smallest representable pos num: a = 2-149 = 0.000000...1 x 2-126 + -

Summary of Special Numbers Reserve exponents, significands: Exponent Significand Object 0 0 0 0 nonzero Denorm 1-254 anything +/- fl. pt. # 255 0 +/- ∞ 255 nonzero NaN

Rounding Math on real numbers  we worry about rounding to fit result in the significant field. FP hardware carries 2 extra bits of precision, and rounds for proper value Rounding occurs when… converting double to single precision converting floating point # to an integer Intermediate step if necessary

IEEE Four Rounding Modes Round towards + ∞ ALWAYS round “up”: 2.1  3, -2.1  -2 Round towards - ∞ ALWAYS round “down”: 1.9  1, -1.9  -2 Round towards 0 (i.e., truncate) Just drop the last bits Round to (nearest) even (default) Normal rounding, almost: 2.5  2, 3.5  4 Insures fairness on calculation Half the time we round up, other half down Also called unbiased

Floating Point Addition & Subtraction Algorithm: 1. Shift smaller number right to match exponents 2. Add significands 3. Normalize result, checking for over/underflow 4. Round result significand 5. Normalize result again, checking for over/underflow Subtraction essentially the same

Floating Point Addition & Subtraction Example (4 decimal digit significand): 9.234  10-1 + 7.656  10-2 = 9.234  10-1 + 0.7656  10-1 = 9.9996  10-1 (Extra position to round accurately) = 10.00  10-1 = 1.000  100

Floating Point Multiplication Algorithm: 1. Multiply significands (unsigned integer mult & fix sign) 2. Add exponents (signed biased integer add) 3. Normalize result, checking for over/underflow 4. Round result significand 5. Normalize result again, checking for over/underflow

Floating Point Multiplication Example (4 decimal digit significand): 3.501  10-3  -4.002  106 = -14.016002  103 = -1.4016002  104 = -1.402  104

Floating Point Division Algorithm: 1. Divide significands (unsigned integer div & fix sign) 2. Subtract exponents (signed biased integer sub) 3. Normalize result, checking for over/underflow 4. Round result significand 5. Normalize result again, checking for over/underflow

Floating Point Division Example (4 decimal digit significand): 4.010  10-3 / -4.012  106 = 4.010/-4.012 * 10-9 = -0.999501  10-9 = -9.99501  10-10 = -10.00  10-10 = -1.000  10-9

MIPS Floating Point Architecture (1/4) Separate floating point instructions: Single Precision: add.s, sub.s, mul.s, div.s Double Precision: add.d, sub.d, mul.d, div.d These are far more complicated than their integer counterparts so they can take much longer to execute

MIPS Floating Point Architecture (2/4) Problems: Inefficient to have different instructions take vastly differing amounts of time. Generally, a particular piece of data will not change from float to int, or vice versa, within a program. Only one type of instruction will be used on it. Some programs do no floating point calculations It takes lots of hardware relative to integers to do floating point fast

MIPS Floating Point Architecture (3/4) 1990 Solution: Make a completely separate chip that handles only FP. Coprocessor 1: FP chip contains 32 32-bit registers: $f0, $f1, … most of the registers specified in .s and .d instruction refer to this set separate load and store: lwc1 and swc1 (“load word coprocessor 1”, “store …”) Double Precision: by convention, even/odd pair contain one DP FP number: $f0/$f1, $f2/$f3, … , $f30/$f31 Even register is the name

MIPS Floating Point Architecture (4/4) 1990 Computer actually contains multiple separate chips: Processor: handles all the normal stuff Coprocessor 1: handles FP and only FP; more coprocessors?… Yes, later Today, FP coprocessor integrated with CPU, or cheap chips may leave out FP HW Instructions to move data between main processor and coprocessors: mfc0, mtc0, mfc1, mtc1, etc. Appendix A contains many more FP ops

Things to Remember Floating Point numbers approximate values that we want to use. IEEE 754 Floating Point Standard is most widely accepted standard for FP numbers New MIPS registers($f0-$f31), instructions: – Single Precision (32 bits, 2x10-38… 2x1038): add.s, sub.s, mul.s, div.s –Double Precision (64 bits , 2x10-308…2x10308): add.d, sub.d, mul.d, div.d Type is not associated with data, bits have no meaning unless given in context (instruction)