Computer Organization CS224 Fall 2012 Lesson 21. FP Example: °F to °C  C code: float f2c (float fahr) { return ((5.0/9.0)*(fahr - 32.0)); } fahr in $f12,

Slides:



Advertisements
Similar presentations
Chapter 3 Arithmetic for Computers. Chapter 3 — Arithmetic for Computers — 2 Arithmetic for Computers Operations on integers Addition and subtraction.
Advertisements

Chapter Three.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 3 Arithmetic for Computers.
Computer Organization CS224 Fall 2012 Lesson 19. Floating-Point Example  What number is represented by the single-precision float …00 
Arithmetic in Computers Chapter 4 Arithmetic in Computers2 Outline Data representation integers Unsigned integers Signed integers Floating-points.
CS/ECE 3330 Computer Architecture Chapter 3 Arithmetic.
Chapter 3 Arithmetic for Computers. Exam 1 CSCE
Lecture 16: Computer Arithmetic Today’s topic –Floating point numbers –IEEE 754 representations –FP arithmetic Reminder –HW 4 due Monday 1.
Computer Arithmetic : Floating- point CS 3339 Lecture 5 Apan Qasem Texas State University Spring 2015 *some slides adopted from P&H.
1 Lecture 9: Floating Point Today’s topics:  Division  IEEE 754 representations  FP arithmetic Reminder: assignment 4 will be posted later today.
Chapter 3 Arithmetic for Computers. Chapter 3 — Arithmetic for Computers — 2 Arithmetic for Computers Operations on integers Addition and subtraction.
ECE 15B Computer Organization Spring 2010 Dmitri Strukov Lecture 11: Floating Point Partially adapted from Computer Organization and Design, 4 th edition,
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Hao Ji.
Morgan Kaufmann Publishers
Systems Architecture Lecture 14: Floating Point Arithmetic
Morgan Kaufmann Publishers Arithmetic for Computers
CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.
Chapter 3 -Arithmetic for Computers Operations on integers – Addition and subtraction – Multiplication and division – Dealing with overflow Floating-point.
Computer Architecture ALU Design : Division and Floating Point
Computing Systems Basic arithmetic for computers.
CPS3340 COMPUTER ARCHITECTURE Fall Semester, /14/2013 Lecture 16: Floating Point Instructor: Ashraf Yaseen DEPARTMENT OF MATH & COMPUTER SCIENCE.
Chapter 3 Arithmetic for Computers. Chapter 3 — Arithmetic for Computers — 2 Arithmetic for Computers Operations on integers Addition and subtraction.
CSF 2009 Arithmetic for Computers Chapter 3. Arithmetic for Computers Operations on integers Addition and subtraction Multiplication and division Dealing.
Lecture 9: Floating Point
Floating Point Representation for non-integral numbers – Including very small and very large numbers Like scientific notation – –2.34 × –
Chapter 3 Arithmetic for Computers. Chapter 3 — Arithmetic for Computers — 2 Arithmetic for Computers Operations on integers Addition and subtraction.
The x87 FPU Lecture 19 Fri, Mar 26, 2004.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 3 Arithmetic for Computers.
Chapter 3 Arithmetic for Computers. Chapter 3 — Arithmetic for Computers — 2 Arithmetic for Computers Operations on integers Addition and subtraction.
Morgan Kaufmann Publishers Arithmetic for Computers
CDA 3101 Fall 2013 Introduction to Computer Organization
Chapter 3 Arithmetic for Computers. Chapter 3 — Arithmetic for Computers — 2 Arithmetic for Computers Operations on integers Addition and subtraction.
Chapter 3 Arithmetic for Computers. Chapter 3 — Arithmetic for Computers — 2 Arithmetic for Computers Operations on integers Addition and subtraction.
Floating Point Numbers Representation, Operations, and Accuracy CS223 Digital Design.
Chapter 3 Arithmetic for Computers CprE 381 Computer Organization and Assembly Level Programming, Fall 2012 Revised from original slides provided by MKP.
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell CS352H: Computer Systems Architecture Lecture 6: MIPS Floating.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 3 Arithmetic for Computers.
Chapter 3 Arithmetic for Computers. Chapter 3 — Arithmetic for Computers — 2 Arithmetic for Computers Operations on integers Addition and subtraction.
Morgan Kaufmann Publishers Arithmetic for Computers
10/7/2004Comp 120 Fall October 7 Read 5.1 through 5.3 Register! Questions? Chapter 4 – Floating Point.
COMPUTER ORGANIZATION ARITHMETIC YASSER MOHAMMAD.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 3 Arithmetic for Computers.
Subword Parallellism Graphics and audio applications can take advantage of performing simultaneous operations on short vectors – Example: 128-bit adder:
Chapter 3 — Arithmetic for Computers — 1 Arithmetic for Computers Operations on integers Addition and subtraction Multiplication and division Dealing with.
Modified by S. J. Fritz Spring 2009 (1) Based on slides from D. Patterson and www-inst.eecs.berkeley.edu/~cs152/ COM 249 – Computer Organization and Assembly.
Announcements Our Midterm will be on November 23 rd 14:00 Our Midterm will be between November 8 th – 24 th (most probably November 20 th evening) MIPS.
/ Computer Architecture and Design
Morgan Kaufmann Publishers Arithmetic for Computers
Morgan Kaufmann Publishers Arithmetic for Computers
Morgan Kaufmann Publishers Arithmetic for Computers
Computer Architecture & Operations I
Morgan Kaufmann Publishers Arithmetic for Computers
Morgan Kaufmann Publishers Arithmetic for Computers
/ Computer Architecture and Design
Arithmetic for Computers
Floating Point ICS 233 Computer Architecture & Assembly Language
Morgan Kaufmann Publishers Arithmetic for Computers
Morgan Kaufmann Publishers Arithmetic for Computers
Computer Architecture Computer Science & Engineering
The University of Adelaide, School of Computer Science
Morgan Kaufmann Publishers Arithmetic for Computers
Morgan Kaufmann Publishers Arithmetic for Computers
Chapter 3 Arithmetic for Computers
Morgan Kaufmann Publishers Arithmetic for Computers
Morgan Kaufmann Publishers Arithmetic for Computers
Floating Point Faculty of Information Technology University of Petra
Number Representation
CDA 3101 Spring 2016 Introduction to Computer Organization
Presentation transcript:

Computer Organization CS224 Fall 2012 Lesson 21

FP Example: °F to °C  C code: float f2c (float fahr) { return ((5.0/9.0)*(fahr )); } fahr in $f12, result in $f0, literals in global memory space  Compiled MIPS code: f2c: lwc1 $f16, const5($gp) lwc1 $f18, const9($gp) div.s $f16, $f16, $f18 lwc1 $f18, const32($gp) sub.s $f18, $f12, $f18 mul.s $f0, $f16, $f18 jr $ra

FP Example: Array Multiplication  X = X + Y × Z l All 32 × 32 matrices, 64-bit double-precision elements  C code: void mm (double x[][], double y[][], double z[][]) { int i, j, k; for (i = 0; i! = 32; i = i + 1) for (j = 0; j! = 32; j = j + 1) for (k = 0; k! = 32; k = k + 1) x[i][j] = x[i][j] + y[i][k] * z[k][j]; } Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2

FP Example: Array Multiplication MIPS code: li $t1, 32 # $t1 = 32 (row size/loop end) li $s0, 0 # i = 0; initialize 1st for loop L1: li $s1, 0 # j = 0; restart 2nd for loop L2: li $s2, 0 # k = 0; restart 3rd for loop sll $t2, $s0, 5 # $t2 = i * 32 (size of row of x) addu $t2, $t2, $s1 # $t2 = i * size(row) + j sll $t2, $t2, 3 # $t2 = byte offset of [i][j] addu $t2, $a0, $t2 # $t2 = byte address of x[i][j] l.d $f4, 0($t2) # $f4 = 8 bytes of x[i][j] L3: sll $t0, $s2, 5 # $t0 = k * 32 (size of row of z) addu $t0, $t0, $s1 # $t0 = k * size(row) + j sll $t0, $t0, 3 # $t0 = byte offset of [k][j] addu $t0, $a2, $t0 # $t0 = byte address of z[k][j] l.d $f16, 0($t0) # $f16 = 8 bytes of z[k][j] …

FP Example: Array Multiplication … sll $t0, $s0, 5 # $t0 = i*32 (size of row of y) addu $t0, $t0, $s2 # $t0 = i*size(row) + k sll $t0, $t0, 3 # $t0 = byte offset of [i][k] addu $t0, $a1, $t0 # $t0 = byte address of y[i][k] l.d $f18, 0($t0) # $f18 = 8 bytes of y[i][k] mul.d $f16, $f18, $f16 # $f16 = y[i][k] * z[k][j] add.d $f4, $f4, $f16 # f4=x[i][j] + y[i][k]*z[k][j] addiu $s2, $s2, 1 # $k k + 1 bne $s2, $t1, L3 # if (k != 32) go to L3 s.d $f4, 0($t2) # x[i][j] = $f4 addiu $s1, $s1, 1 # $j = j + 1 bne $s1, $t1, L2 # if (j != 32) go to L2 addiu $s0, $s0, 1 # $i = i + 1 bne $s0, $t1, L1 # if (i != 32) go to L1

Accurate Arithmetic  IEEE Std 754 specifies additional rounding control l Extra bits of precision (guard, round, sticky) l Choice of rounding modes l Allows programmer to fine-tune numerical behavior of a computation  Not all FP units implement all options l Most programming languages and FP libraries just use defaults  Trade-off between hardware complexity, performance, and market requirements

Support for Accurate Arithmetic  IEEE 754 FP rounding modes Always round up (toward +∞) Always round down (toward -∞) Truncate (toward 0) Round to nearest even (when the Guard || Round || Sticky are 100) – always creates a 0 in the least significant (kept ) bit of F  Rounding (except for truncation) requires the hardware to include extra F bits during calculations Guard bit – used to provide one F bit when shifting left to normalize a result (e.g., when normalizing F after division or subtraction) G Round bit – used to improve rounding accuracy R Sticky bit – used to support Round to nearest even; is set to a 1 whenever a 1 bit shifts (right) through it (e.g., when aligning F during addition/subtraction) S F = 1. xxxxxxxxxxxxxxxxxxxxxxx G R S

Associativity  Parallel programs may interleave operations in unexpected orders §3.6 Parallelism and Computer Arithmetic: Associativity Assumptions of associativity may fail, since FP operations are not associative ! Need to validate parallel programs under varying degrees of parallelism

x86 FP Architecture  Originally based on 8087 FP coprocessor l 8 × 80-bit extended-precision registers l Used as a push-down stack l Registers indexed from TOS: ST(0), ST(1), …  FP values are 32-bit or 64-bit in memory l Converted on load/store of memory operand l Integer operands can also be converted on load/store  Very difficult to generate and optimize code l Result: poor FP performance §3.7 Real Stuff: Floating Point in the x86

x86 FP Instructions  Optional variations I : integer operand P : pop operand from stack R : reverse operand order l But not all combinations allowed Data transferArithmeticCompareTranscendental FILD mem/ST(i) FISTP mem/ST(i) FLDPI FLD1 FLDZ FIADDP mem/ST(i) FISUBRP mem/ST(i) FIMULP mem/ST(i) FIDIVRP mem/ST(i) FSQRT FABS FRNDINT FICOMP FIUCOMP FSTSW AX/mem FPATAN F2XMI FCOS FPTAN FPREM FPSIN FYL2X

Streaming SIMD Extension 2 (SSE2)  Adds 4 × 128-bit registers l Extended to 8 registers in AMD64/EM64T  Can be used for multiple FP operands l 2 × 64-bit double precision l 4 × 32-bit single precision l Instructions operate on them simultaneously -Single-Instruction Multiple-Data

Concluding Remarks  ISAs support arithmetic l Signed and unsigned integers l Floating-point approximation to reals  Bounded range and precision l Operations can overflow and underflow  MIPS ISA l Core instructions: 31 integer + 23 arithmetic l 54 most frequently used –Figure % of SPECINT (p. 282) - 97% of SPECFP ( “ “) l Other instructions: much less frequent: Figure 3.25  See Fig 2.45 and Fig 3.26 §3.9 Concluding Remarks