Computer Arithmetic: Floating-point. CS 3339 Lecture 5. Apan Qasem, Texas State University, Spring 2015. *Some slides adapted from P&H.


Review
- Signed binary representation: sign-and-magnitude, 2's complement
- Overflow detection
- Implementation of binary multiplication and division in the microarchitecture
- Performance issues?

Representing Big (and Small) Numbers
- No matter what our word size is, there will always be a need to represent bigger numbers: MAXINT + 1, MININT - 1
- There is also a need to represent really small numbers: a nanosecond is 0.000000001 seconds
- Also, there are numbers that are not big or small but still require a lot of space: PI = 3.141..., e = 2.718...
- Floating-point values allow us to represent large, small, and real numbers
- Here we talk about the hardware; the software issues are different

Scientific Notation
- The idea of floating-point representation is not new; it is borrowed from the convention used in scientific journals
- For example, 0.000000001 would typically be written as 1.0 × 10^-9
- And 602,214,150,000,000,000,000,000 would typically be written as 6.0221415 × 10^23

Scientific Notation
- General format: (-1)^sign × F × 10^E, where F = fractional part and E = exponent
- The fractional part is usually normalized (e.g., 1.xxxx)
- For binary, this becomes (-1)^sign × F × 2^E

Binary Representation of Floating-point Values
- Allocate some bits for the exponent, some for the fraction, and one for the sign
- Still have to fit everything in 32 bits (single precision)
- What is the trade-off? Range vs. precision: more bits in the fraction (F) gives more accuracy; more bits in the exponent (E) increases the range of numbers that can be represented

  Layout: | S | Exponent | Fraction |   (bit 31 down to bit 0)

Floating-point Standard
- Defined by IEEE Std 754-1985, revised in 2008
- Developed in response to divergence of representations, which caused portability issues for scientific code
- Now almost universally adopted; enforced at the language level
- Two representations: single precision (32-bit) and double precision (64-bit)

IEEE Floating-point Format: Sign
- Field layout: | S | Exponent | Fraction |; the exponent is 8 bits (single) or 11 bits (double), the fraction is 23 bits (single) or 52 bits (double)
- FP interpretation: (-1)^S × (1 + Fraction) × 2^(Exponent - Bias)
- To store: 0 for a non-negative value and 1 for a negative value (sign-and-magnitude representation)
- To interpret: 0 implies non-negative, 1 implies negative

IEEE Floating-point Format: Fraction
- To store: normalize the fraction to a value in the range 1.0 ≤ x < 2.0
  If the value is 4.56 × 2^10, the normalized value is 1.14 × 2^12
- Store only the bits past the binary point (the .14); all normalized values have a 1 to the left of the binary point, so there is no need to represent it explicitly (hidden bit)
- The significand is the Fraction with the "1." restored
- To interpret: extract the bits from the fraction field and add 1

IEEE Floating-point Format: Exponent
- To store: add the bias to the exponent
  If the exponent is 3, store 3 + 127 = 130
- This excess representation ensures the stored exponent is unsigned
- Bias: 127 for single precision, 1023 for double precision
- To interpret: extract the exponent bits and subtract the bias
  If the exponent bits represent 17, interpret as 17 - 127 = -110
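The three storage rules above can be checked directly in software. Below is a short Python sketch (Python stands in here for the course's C/MIPS examples) that unpacks the raw bit pattern of a single-precision value into its sign, biased exponent, and fraction fields; the helper name decode_single is ours, not part of any standard API.

```python
import struct

# Extract the three single-precision fields: 1 sign bit, 8 exponent bits
# (biased by 127), 23 fraction bits (hidden leading 1 not stored).
def decode_single(x):
    """Return (sign, biased_exponent, fraction) of x as an IEEE 754 single."""
    bits, = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = bits >> 31                    # bit 31
    exponent = (bits >> 23) & 0xFF       # bits 30..23
    fraction = bits & 0x7FFFFF           # bits 22..0
    return sign, exponent, fraction

# -0.75 = -1.1_2 x 2^-1: sign 1, stored exponent -1 + 127 = 126,
# fraction field 100...0 (i.e. 0x400000 = 4194304)
print(decode_single(-0.75))   # -> (1, 126, 4194304)
```

The struct round-trip reinterprets the bytes without any numeric conversion, which is exactly what a load into an integer register would see.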

Single-Precision Range
- Exponent fields 00000000 and 11111111 are reserved
- Smallest value: smallest stored exponent is 00000001, so the actual exponent is 1 - 127 = -126; fraction 000...00 gives significand 1.0
  ±1.0 × 2^-126 ≈ ±1.2 × 10^-38
- Largest value: largest stored exponent is 11111110, so the actual exponent is 254 - 127 = +127; fraction 111...11 gives significand ≈ 2.0
  ±2.0 × 2^+127 ≈ ±3.4 × 10^+38
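These limits can be reproduced by assembling the bit patterns directly. A small sketch, assuming the field layout above (the helper name single_from_fields is ours):

```python
import struct

# Build a single-precision float from its (sign, exponent, fraction) fields.
def single_from_fields(sign, exponent, fraction):
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest = single_from_fields(0, 1, 0)           # actual exponent 1 - 127 = -126
largest = single_from_fields(0, 254, 0x7FFFFF)   # actual exponent 254 - 127 = +127

print(smallest)   # -> 1.1754943508222875e-38 (about 1.2e-38)
print(largest)    # -> 3.4028234663852886e+38 (about 3.4e+38)
```

These are exactly C's FLT_MIN and FLT_MAX, i.e. 2^-126 and (2 - 2^-23) × 2^127.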

Double-Precision Range
- Exponent fields 00000000000 and 11111111111 are reserved
- Smallest value: stored exponent 00000000001, so the actual exponent is 1 - 1023 = -1022; fraction 000...00 gives significand 1.0
  ±1.0 × 2^-1022 ≈ ±2.2 × 10^-308
- Largest value: stored exponent 11111111110, so the actual exponent is 2046 - 1023 = +1023; fraction 111...11 gives significand ≈ 2.0
  ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308

IEEE 754 FP Standard: Examples
- Smallest positive value: exponent field 00000001, fraction all zeros = 1.0 × 2^-126
- Zero: all bits zero = true 0
- Largest positive value: exponent field 11111110, fraction all ones ≈ 2.0 × 2^+127

FP Conversion: 0.75 × 2^4
1. Convert to a fraction with a power-of-2 denominator: 0.75 × 2^4 = 3/4 × 2^4 = 3/2^2 × 2^4
2. Convert the numerator to binary: 11_2/2^2 × 2^4
3. Determine the place for the binary point: 0.11_2 × 2^4
4. Normalize: 1.1_2 × 2^3
5. Adjust for the IEEE standard: the stored form is 1 + Fraction, so the Fraction field is 0.1; bias: 3 + 127 = 130, so the Exponent field is 130
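The worked conversion can be cross-checked in a few lines: 0.75 × 2^4 is 12.0, and its stored single-precision exponent field should come out as 130 with fraction bits 100...0.

```python
import struct

# 0.75 x 2^4 = 12.0 = 1.1_2 x 2^3: stored exponent = 3 + 127 = 130,
# stored fraction = .100...0 (the leading 1 is hidden)
bits, = struct.unpack(">I", struct.pack(">f", 12.0))
exponent = (bits >> 23) & 0xFF
fraction = bits & 0x7FFFFF
print(exponent, hex(fraction))   # -> 130 0x400000
```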


Floating-point Addition in Base 10
To add F1 × 10^E1 and F2 × 10^E2:
1. Divide/multiply one of the numbers to make the exponents the same
2. Factor out the exponent
3. Add the fractions
For example: 3.2 × 10^3 + F2 × 10^4 = 0.32 × 10^4 + F2 × 10^4 = (0.32 + F2) × 10^4

Floating-point Addition in Binary
Addition (and subtraction): (±F1 × 2^E1) + (±F2 × 2^E2) = ±F3 × 2^E3
- Step 0: Restore the hidden bit in F1 and in F2
- Step 1: Align the fractions by right-shifting F2 by (E1 - E2) positions (assuming E1 ≥ E2)
  Dividing by 2 ≈ a right shift: 8 × 2^3 = 8/2 × 2^4 = 4 × 2^4
- Step 2: Add the resulting F2 to F1 to form F3
- Step 3: Normalize F3 (so it is in the form 1.XXXXX...)
  If E1 and E2 have the same sign, check for overflow; if they have different signs, check for underflow (number too small!)
- Step 4: Round F3, and possibly normalize F3 again if F3 has more bits than we have room for
- Step 5: Re-hide the most significant bit of F3 before storing the result
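As a rough software analogue of the steps above (a sketch only, not the hardware algorithm): Python's math.frexp splits a float into significand and exponent, normalizing to [0.5, 1) rather than the 1.xxxx form used here, but the align-then-add structure is the same. The function name fp_add_sketch is ours; rounding and renormalization fall out of Python's own float arithmetic.

```python
import math

# Align-and-add sketch: split both operands, shift the smaller-exponent
# significand right, add, then reattach the common exponent.
def fp_add_sketch(x, y):
    fx, ex = math.frexp(x)        # x = fx * 2**ex, |fx| in [0.5, 1)
    fy, ey = math.frexp(y)
    if ex < ey:                   # make x the operand with the larger exponent
        fx, ex, fy, ey = fy, ey, fx, ex
    fy = math.ldexp(fy, ey - ex)  # Step 1: shift fy right by (ex - ey) places
    fs = fx + fy                  # Step 2: add the aligned significands
    return math.ldexp(fs, ex)     # Steps 3-5 happen implicitly on store

print(fp_add_sketch(0.5, -0.4375))   # -> 0.0625, i.e. 1.000_2 x 2^-4
```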

Floating Point Addition Example
Addition (assume 4 bits for the fraction, 3 for the exponent, bias = 4):
(0.5 = 1.000 × 2^-1) + (-0.4375 = -1.110 × 2^-2)
- Step 0: Hidden bits restored in the representation above (can't do 2's complement math here)
- Step 1: Right-shift the significand with the smaller exponent until its exponent matches the larger exponent (just once in this example): 1.000 × 2^-1 and -0.111 × 2^-1
- Step 2: Add the significands: 1.000 + (-0.111) = 0.001, so the sum is 0.001 × 2^-1
- Step 3: Normalize the sum, checking for exponent over/underflow: 0.001 × 2^-1 = 0.010 × 2^-2 = 0.100 × 2^-3 = 1.000 × 2^-4
- Step 4: No need to round; the result fits in the 4-bit fraction field
- Step 5: Re-hide the hidden bit before storing

Floating Point Multiplication
Multiplication: (±F1 × 2^E1) × (±F2 × 2^E2) = ±F3 × 2^E3
- Step 0: Restore the hidden bit in F1 and in F2
- Step 1: Add the two exponents and adjust for the bias: E3 = E1 + E2 - bias
  (Subtract the bias twice, add it once: if E1 = 5, E2 = 4, bias = 3, then E3 = ((5 - 3) + (4 - 3)) + 3 = 6)
  Also determine the sign of the product (could we use a basic logic operation to determine the sign?)
- Step 2: Multiply F1 by F2 to form a double-width F3
- Step 3: Normalize F3 (so it is in the form 1.XXXXX...): 1-bit right shift of F3 and increment E3; check for overflow/underflow
- Step 4: Round F3 and possibly normalize F3 again
- Step 5: Re-hide the most significant bit of F3 before storing the result
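The multiply steps can be sketched the same way as addition (our own helper, not the hardware algorithm): significands multiply, exponents add. Because math.frexp returns unbiased exponents, no bias adjustment is needed in the sketch; the biased bookkeeping from Step 1 is shown separately. In hardware the result sign is simply sign1 XOR sign2, the "basic logic operation" asked about above; here the signed significands carry the sign for us.

```python
import math

# Multiply sketch: F3 = F1 * F2, E3 = E1 + E2 (unbiased exponents).
def fp_mul_sketch(x, y):
    fx, ex = math.frexp(x)                 # x = fx * 2**ex, |fx| in [0.5, 1)
    fy, ey = math.frexp(y)
    return math.ldexp(fx * fy, ex + ey)

# With stored (biased) exponents, the bookkeeping is E3 = E1 + E2 - bias:
E1, E2, bias = 126, 125, 127               # actual exponents -1 and -2
print(E1 + E2 - bias)                      # -> 124, i.e. actual exponent -3
print(fp_mul_sketch(0.5, -0.4375))         # -> -0.21875
```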

Floating Point Multiplication Example
Multiply (assume 32-bit single-precision values):
(0.5 = 1.000 × 2^-1) × (-0.4375 = -1.110 × 2^-2)
- Step 0: Hidden bits restored in the representation above
- Step 1: Add the exponents: E1 = -1 + 127 = 126, E2 = -2 + 127 = 125; E3 = (126 + 125) - 127 = 124 (actual exponent -3)
- Step 2: Multiply the significands: 1.000 × 1.110 = 1.110000, so the product is 1.110 × 2^-3
- Step 3: Normalize the product, checking for exponent over/underflow: 1.110 × 2^-3 is already normalized
- Step 4: No need to round
- Step 5: Re-hide the hidden bit before storing; the signs differ, so the result is negative: -1.110 × 2^-3 = -0.21875

MIPS Floating Point Instructions
- MIPS has a separate floating point register file ($f0, $f1, ..., $f31) with special instructions to load to and store from them:
  lwc1 $f1,16($s2) #$f1 = Memory[$s2+16]
  swc1 $f1,24($s4) #Memory[$s4+24] = $f1
- The registers are used in pairs for double precision values
- Supports IEEE 754 single and double-precision operations:
  add.s $f2,$f4,$f6 #$f2 = $f4 + $f6
  add.d $f2,$f4,$f6 #$f2||$f3 = $f4||$f5 + $f6||$f7
- Similarly, there are instructions for sub.s, sub.d, mul.s, mul.d, div.s, div.d

MIPS Floating Point Instructions
- Floating point single precision comparison operations:
  c.x.s $f2,$f4 #if($f2 < $f4) cond=1; else cond=0
  where x may be eq, neq, lt, le, gt, ge
- Double-precision comparison operations:
  c.x.d $f2,$f4 #if($f2||$f3 < $f4||$f5) cond=1; else cond=0
- Floating point branch operations:
  bc1t 25 #if(cond==1) go to PC+4+25
  bc1f 25 #if(cond==0) go to PC+4+25

Frequency of Common MIPS Instructions
(only instructions with frequency > 3% for SPECint or > 1% for SPECfp are included)

  Instruction  SPECint  SPECfp
  addu          5.2%     3.5%
  addiu         9.0%     7.2%
  or            4.0%     1.2%
  sll           4.4%     1.9%
  lui           3.3%     0.5%
  lw           18.6%     5.8%
  sw            7.6%     2.0%
  lbu           3.7%     0.1%
  lhu           1.3%     0.0%
  beq           8.6%     2.2%
  bne           8.4%     1.4%
  slt           9.9%     2.3%
  slti          3.1%     0.3%
  sltu          3.4%     0.8%
  add.d         0.0%    10.6%
  sub.d         0.0%     4.9%
  mul.d         0.0%    15.0%
  add.s         0.0%     1.5%
  sub.s         0.0%     1.8%
  mul.s         0.0%     2.4%
  l.d           0.0%    17.5%
  s.d           0.0%     4.9%
  l.s           0.0%     4.2%
  s.s           0.0%     1.1%

Associativity
- Associativity matters in floating-point operations: because of rounding, (a + b) + c does not always equal a + (b + c)
- What are the implications? Need to be especially careful in parallelizing FP applications
- Reordering can produce different (incorrect) results, even if there is no dependence!
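A concrete illustration with ordinary IEEE doubles (any FP width behaves the same way): adding a small value to a large one can lose it entirely, so the grouping of a sum changes the answer.

```python
# (a + b) + c versus a + (b + c): 1.0 is far below the rounding
# granularity at 1e20, so it vanishes when added to b first.
a, b, c = 1.0e20, -1.0e20, 1.0

left = (a + b) + c    # a + b cancels exactly to 0.0, then + c gives 1.0
right = a + (b + c)   # b + c rounds back to -1e20; the 1.0 is lost

print(left, right)    # -> 1.0 0.0
```

This is exactly why a parallel reduction that regroups the terms of a sum can give a different result from the sequential loop.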

x86 FP Architecture
- Originally based on the 8087 FP coprocessor: 8 × 80-bit extended-precision registers, used as a push-down stack
- Registers indexed from the top of stack: ST(0), ST(1), ...
- FP values are 32-bit or 64-bit in memory; converted on load/store of a memory operand
- Integer operands can also be converted on load/store
- Very difficult to generate and optimize code; result: poor FP performance

x86 FP Instructions
Optional variations: I = integer operand, P = pop operand from stack, R = reverse operand order (but not all combinations are allowed)

  Data transfer:   FILD mem/ST(i), FISTP mem/ST(i), FLDPI, FLD1, FLDZ
  Arithmetic:      FIADDP mem/ST(i), FISUBRP mem/ST(i), FIMULP mem/ST(i), FIDIVRP mem/ST(i), FSQRT, FABS, FRNDINT
  Compare:         FICOMP, FIUCOMP, FSTSW AX/mem
  Transcendental:  FPATAN, F2XM1, FCOS, FPTAN, FPREM, FPSIN, FYL2X

Streaming SIMD Extension 2 (SSE2)
- Adds 4 × 128-bit registers; extended to 8 registers in AMD64/EM64T
- Each register can hold multiple FP operands: 2 × 64-bit double precision or 4 × 32-bit single precision
- Instructions operate on all of them simultaneously: Single-Instruction Multiple-Data

Accurate Arithmetic
- IEEE Standard 754 specifies additional rounding control: extra bits of precision (guard, round, sticky) and a choice of rounding modes
- Allows the programmer to fine-tune the numerical behavior of a computation
- Not all FP units implement all options; most programming languages and FP libraries just use the defaults
- Trade-off between hardware complexity, performance, and market requirements
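Python's binary floats do not expose IEEE rounding-mode control, but the decimal module implements the same concept and makes the effect easy to see; this sketch illustrates rounding modes in general, not the binary FP hardware itself.

```python
from decimal import Decimal, localcontext, ROUND_FLOOR, ROUND_CEILING, ROUND_HALF_EVEN

# The same operation rounded under three modes at 4 digits of precision:
# toward -infinity, toward +infinity, and to nearest (ties to even).
def one_third(mode):
    with localcontext() as ctx:
        ctx.prec = 4
        ctx.rounding = mode
        return Decimal(1) / Decimal(3)

print(one_third(ROUND_FLOOR))      # -> 0.3333
print(one_third(ROUND_CEILING))    # -> 0.3334
print(one_third(ROUND_HALF_EVEN))  # -> 0.3333
```

The same value rounds differently depending on the mode, which is exactly the control IEEE 754 gives the programmer for binary arithmetic.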

Who Cares About FP Accuracy?
- Important for scientific code
- But for everyday consumer use? "My bank balance is off by 0.0002¢!"
- The Intel Pentium FDIV bug: the market expects accuracy whether it needs it or not
- Q: Why didn't Intel call the Pentium the 586? A: Because they added 486 and 100 on the first Pentium and got 585.999983605

Summary
- ISAs support arithmetic on signed and unsigned integers
- Floating-point is an approximation to the reals, with bounded range and precision
- Operations can overflow and underflow