CMPUT 229 - Computer Organization and Architecture I. CMPUT229 - Fall 2003. Topic 7: Floating Point. José Nelson Amaral.


Reading Assignment

Representing Large and Small Numbers
How would you represent a number such as 6.023 × 10^23 in binary? The range of this number (10^23) is greater than the range of the 32-bit representation that we have used for integers (2^31 ≈ 2.14 × 10^9). However, the precision of this number (6023) is quite small and can be expressed in a small number of bits. From: Patt and Patel, p. 32
The solution is to use a floating point representation. A floating point representation allocates some bits for the range of the value, some bits for its precision, and one bit for the sign.
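A small C sketch of the range-versus-precision point above, assuming the host `float` is IEEE 754 single precision and using `int32_t` for the 32-bit integer representation; the helper names `fits_in_int32` and `to_single` are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: is v inside the range of a 32-bit two's complement int? */
static int fits_in_int32(double v) {
    return v >= (double)INT32_MIN && v <= (double)INT32_MAX;
}

/* Rounds v to the nearest single-precision value: 8 exponent bits buy the
   range, 23 fraction bits (plus the hidden 1) buy the precision. */
static float to_single(double v) {
    return (float)v;
}
```

Here `fits_in_int32(6.023e23)` is false, while `to_single(6.023e23)` stays within a few parts in 10^8 of the original, since a single-precision significand carries about 7 significant decimal digits, more than the 4 digits of 6023.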

Floating Point Representation
Most standard floating point representations use 1 bit for the sign (positive or negative), 8 bits for the range (the exponent field), and 23 bits for the precision (the fraction field):
    S (1 bit) | exponent (8 bits) | fraction (23 bits)
From: Patt and Patel, p. 33
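The field layout above can be checked with a short C sketch, assuming the C `float` type is IEEE 754 single precision (true on mainstream hardware, though not guaranteed by the C standard). `float_fields` and `split_float` are illustrative names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 single-precision value. */
typedef struct {
    uint32_t sign;      /* 1 bit   (bit 31)      */
    uint32_t exponent;  /* 8 bits  (bits 30..23) */
    uint32_t fraction;  /* 23 bits (bits 22..0)  */
} float_fields;

/* Copies the bit pattern of f into an integer (memcpy avoids the undefined
   behaviour of pointer type punning) and masks out each field. */
static float_fields split_float(float f) {
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    float_fields r;
    r.sign     = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.fraction = bits & 0x7FFFFFu;
    return r;
}
```

For example, `split_float(1.0f)` yields sign 0, exponent field 127 and fraction 0.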

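Floating Point Representation (example)
    S (1 bit) | exponent (8 bits) | fraction (23 bits)
For a normalized number (exponent field between 1 and 254), the exponent field holds the actual exponent plus a bias of 127. Thus the value is given by:
$$N = (-1)^{S} \times 1.\text{fraction} \times 2^{\text{exponent} - 127}$$
From: Patt and Patel, p. 34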
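The formula can be evaluated directly from the three fields. This sketch (the name `float_value` is ours) handles only normalized numbers, and builds the power of two by repeated doubling or halving, which is exact in `double`:

```c
#include <assert.h>
#include <stdint.h>

/* Value of a normalized IEEE 754 single-precision number from its fields:
   (-1)^S * 1.fraction * 2^(exponent - 127), valid for 1 <= exponent <= 254. */
static double float_value(uint32_t s, uint32_t exponent, uint32_t fraction) {
    assert(s <= 1 && exponent >= 1 && exponent <= 254 && fraction < (1u << 23));
    double significand = 1.0 + (double)fraction / (double)(1u << 23); /* 1.fraction */
    double scale = 1.0;                      /* 2^(exponent - 127), built exactly */
    for (int e = (int)exponent - 127; e > 0; e--) scale *= 2.0;
    for (int e = (int)exponent - 127; e < 0; e++) scale /= 2.0;
    double v = significand * scale;
    return s ? -v : v;
}
```

With fraction bits 00101000...0 (0x140000) and exponent field 131 it returns 18.5, that is 1.00101 (binary) × 2^4; with the sign bit set and exponent field 130 it returns -9.25.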

Floating Point Representation (example)
What is the decimal value of the following floating point number? The exponent field is 01111011 = (128 - 8) + 3 = 120 + 3 = 123, so the actual exponent is 123 - 127 = -4. From: Patt and Patel, p. 34

Floating Point Representation (example)
What is the decimal value of the following floating point number? The exponent field is 10000011 = 131, so the actual exponent is 131 - 127 = 4. From: Patt and Patel, p. 35

Floating Point Representation (example)
What is the decimal value of the following floating point number? The exponent field is 10000010 = 128 + 2 = 130, so the actual exponent is 130 - 127 = 3. From: Patt and Patel, p. 35

Floating Point
    S (1 bit) | exponent (8 bits) | fraction (23 bits)
What is the largest number that can be represented in 32-bit floating point using the IEEE 754 format above? The largest exponent field of a normal number is 11111110 = 254 (255 is reserved for special values), so the actual exponent is 254 - 127 = 127. With the fraction field all ones, the largest value is:
$$(2 - 2^{-23}) \times 2^{127} \approx 3.40 \times 10^{38}$$
From: Patt and Patel, p. 35

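Floating Point
    S (1 bit) | exponent (8 bits) | fraction (23 bits)
What is the smallest number (closest to zero) that can be represented in 32-bit floating point using the IEEE 754 format above? When the exponent field is 00000000, the number is not normalized: its value is $(-1)^{S} \times 0.\text{fraction} \times 2^{-126}$, so the actual exponent is 0 - 126 = -126. The smallest positive value therefore has only the last fraction bit set:
$$2^{-23} \times 2^{-126} = 2^{-149} \approx 1.4 \times 10^{-45}$$
(The smallest normalized value, with exponent field 1, is $2^{-126} \approx 1.18 \times 10^{-38}$.) From: Patt and Patel, p. 35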
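The extreme values discussed above can be confirmed against `<float.h>`, again assuming IEEE 754 `float`. `float_from_bits` and the three constant names are hypothetical:

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterprets a 32-bit pattern as a float (assumes IEEE 754 binary32). */
static float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Bit patterns of the extreme positive values. */
static const uint32_t LARGEST_BITS           = 0x7F7FFFFFu; /* exp 254, fraction all 1s */
static const uint32_t SMALLEST_NORMAL_BITS   = 0x00800000u; /* exp 1, fraction 0        */
static const uint32_t SMALLEST_DENORMAL_BITS = 0x00000001u; /* exp 0, last bit set      */
```

`float_from_bits(LARGEST_BITS)` equals `FLT_MAX`, `float_from_bits(SMALLEST_NORMAL_BITS)` equals `FLT_MIN`, and the denormal pattern is 2^-149, which is `FLT_MIN` divided by 2^23.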

Special Floating Point Representations
The 8-bit exponent field can represent numbers from 0 to 255. We have studied how to read numbers with exponent fields from 0 to 254. What value is represented when the exponent field is 255 (i.e., 11111111)? An exponent field equal to 255 indicates a special value. When the exponent field is 255 and the fraction is 0, the value represented is ±infinity (depending on the sign bit). When the exponent field is 255 and the fraction is non-zero, the value represented is Not a Number (NaN). Hen/Patt, p. 301
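A sketch of the rule above as a classifier on raw 32-bit patterns; `classify_bits` and the `fp_kind` names are hypothetical (on `float` values, the standard C macros `isinf` and `isnan` serve the same purpose):

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    FP_KIND_ZERO_OR_DENORMAL,  /* exponent field 0          */
    FP_KIND_NORMAL,            /* exponent field 1..254     */
    FP_KIND_INFINITY,          /* exponent 255, fraction 0  */
    FP_KIND_NAN                /* exponent 255, fraction != 0 */
} fp_kind;

/* Classifies an IEEE 754 single-precision bit pattern by its fields. */
static fp_kind classify_bits(uint32_t bits) {
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;
    if (exponent == 255) return fraction == 0 ? FP_KIND_INFINITY : FP_KIND_NAN;
    if (exponent == 0) return FP_KIND_ZERO_OR_DENORMAL;
    return FP_KIND_NORMAL;
}
```

Both 0x7F800000 (+infinity) and 0xFF800000 (-infinity) classify as infinity; the sign bit is ignored.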

Double Precision
The 32-bit floating point representation is usually called the single precision representation. A double precision floating point representation requires 64 bits, divided as follows: 1 sign bit, 11 bits for the exponent, and 52 bits for the fraction (also called the significand).
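The 64-bit layout can be unpacked the same way as the 32-bit one, with the fields at bit 63, bits 62..52 and bits 51..0. The slide does not state the bias; in IEEE 754 double precision it is 1023, so 1.0 has exponent field 1023. The names are illustrative, and `double` is assumed to be IEEE 754 binary64:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 double-precision value. */
typedef struct {
    uint64_t sign;      /* 1 bit   (bit 63)      */
    uint64_t exponent;  /* 11 bits (bits 62..52) */
    uint64_t fraction;  /* 52 bits (bits 51..0)  */
} double_fields;

static double_fields split_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);          /* reinterpret the 64-bit pattern */
    double_fields r;
    r.sign     = bits >> 63;
    r.exponent = (bits >> 52) & 0x7FFu;
    r.fraction = bits & 0xFFFFFFFFFFFFFull;  /* low 52 bits */
    return r;
}
```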

Floating Point Addition (Decimal)
How do we add two decimal floating point numbers?
Step 1: Align the decimal point of the number with the smaller exponent, shifting its significand right until both exponents are equal (notice the loss of precision).
Step 2: Add the significands.
Step 3: Renormalize the result.
Step 4: Round off the result to the representation available.
Hen/Patt, p. 281
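The four steps can be simulated with a toy decimal format. This C sketch assumes four significant digits (as on the multiplication slide), positive operands only, and truncation of the digits shifted out during alignment; `dec4` and `dec4_add` are hypothetical names, and the operands 9.999 × 10^1 and 1.610 × 10^-1 are our own illustrative choice:

```c
#include <assert.h>

/* Toy decimal floating point (hypothetical): sig in 1000..9999 means
   sig / 1000 (d.ddd), times 10^exp. Positive values only. */
typedef struct { int sig; int exp; } dec4;

static dec4 dec4_add(dec4 a, dec4 b) {
    assert(a.sig >= 1000 && a.sig <= 9999 && b.sig >= 1000 && b.sig <= 9999);
    if (a.exp < b.exp) { dec4 t = a; a = b; b = t; }  /* a has the larger exponent */

    /* Step 1: align. Shift the smaller number's significand right one digit
       per unit of exponent difference; digits past the 4th are lost. */
    for (int d = a.exp - b.exp; d > 0; d--) b.sig /= 10;

    /* Step 2: add the significands (the sum may need 5 digits). */
    int sum = a.sig + b.sig;
    int exp = a.exp;

    /* Steps 3 and 4: renormalize to d.dddd, then round back to 4 digits
       (round half up on the dropped digit). */
    if (sum >= 10000) {
        exp += 1;
        sum = (sum + 5) / 10;
    }
    return (dec4){ sum, exp };
}
```

`dec4_add((dec4){9999, 1}, (dec4){1610, -1})` aligns 1.610 × 10^-1 to 0.016 × 10^1, adds to get 10.015 × 10^1, renormalizes to 1.0015 × 10^2, and rounds to `{1002, 2}`, i.e. 1.002 × 10^2.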

Floating Point Addition (Example)
Convert the two numbers to binary floating point representation, and then perform the binary floating point addition of these numbers. Which number should have its significand adjusted? Hen/Patt, p. 283

Floating Point Multiplication (Decimal)
Assume that we can store only four digits of the significand and two digits of the exponent in a decimal floating point representation. How would you multiply two numbers in this representation?
Step 1: Add the exponents (here the new exponent is 5).
Step 2: Multiply the significands.
Step 3: Normalize the product (here the result becomes a multiple of 10^6).
Step 4: Round off the product to four digits.
Hen/Patt, p. 286
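The same toy format can sketch the multiplication steps. Again `dec4` and `dec4_mul` are hypothetical, only positive operands are handled, the two-digit exponent limit is not checked, and the operands 1.110 × 10^10 and 9.200 × 10^-5 are our own choice (their exponents add to 5, as on the slide):

```c
#include <assert.h>

/* Toy decimal floating point (hypothetical): sig in 1000..9999 means
   sig / 1000 (d.ddd), times 10^exp. Positive values only. */
typedef struct { int sig; int exp; } dec4;

static dec4 dec4_mul(dec4 a, dec4 b) {
    assert(a.sig >= 1000 && a.sig <= 9999 && b.sig >= 1000 && b.sig <= 9999);
    int exp = a.exp + b.exp;              /* Step 1: add the exponents        */
    long prod = (long)a.sig * b.sig;      /* Step 2: value is prod / 10^6     */
    long div = 1000;                      /* keep 4 digits: prod/1000 = d.ddd */
    if (prod >= 10000000L) {              /* Step 3: product >= 10, normalize */
        exp += 1;
        div = 10000;
    }
    long sig = (prod + div / 2) / div;    /* Step 4: round half up            */
    if (sig >= 10000) {                   /* rounding produced 10.00          */
        sig /= 10;
        exp += 1;
    }
    return (dec4){ (int)sig, exp };
}
```

Here 1.110 × 9.200 = 10.212, so the product is renormalized to 1.0212 × 10^6 and rounded to 1.021 × 10^6.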

MIPS Coprocessors (figure: Hen/Patt, p. A-50)

Floating Point in MIPS
MIPS supports the IEEE 754 single precision and double precision formats. MIPS has a separate set of registers to store floating point operands: $f0, $f1, $f2, ... In single precision, each individual register $f0, $f1, $f2, ... contains one single precision (32-bit) value. In double precision, each pair of registers $f0-$f1, $f2-$f3, ... contains one double precision (64-bit) value. Hen/Patt, p. 288

Floating Point in MIPS
To load a value into a floating point register, MIPS offers the load word to coprocessor z instructions, lwcz. Because the floating point coprocessor is coprocessor number 1, the instruction is lwc1. Similarly, to store the value of a floating point register into memory, MIPS offers store word from coprocessor 1, swc1. Hen/Patt, p. 288

Floating Point Instructions in MIPS
What does the following assembly code do?
        lwc1  $f4, 4($sp)
        lwc1  $f6, 8($sp)
        add.s $f2, $f4, $f6
        swc1  $f2, 12($sp)
It reads two single precision floating point values from the stack, adds them, and stores the result on the stack. Hen/Patt, p. 288

Floating Point (example)
    void mm(double x[][32], double y[][32], double z[][32])
    {
        int i, j, k;
        for (i = 0; i != 32; i = i + 1)
            for (j = 0; j != 32; j = j + 1) {
                x[i][j] = 0.0;
                for (k = 0; k != 32; k = k + 1)
                    x[i][j] = x[i][j] + y[i][k] * z[k][j];
            }
    }
Parameter passing convention: base of x[][] in $a0, base of y[][] in $a1, base of z[][] in $a2.
Assumption: i in $s0, j in $s1, k in $s2.
Hen/Patt, p. 294

Control-flow graph of mm:
    i ← 0
    while i ≠ 32:
        j ← 0
        while j ≠ 32:
            x[i][j] ← 0.0
            k ← 0
            while k ≠ 32:
                load x[i][j]; load y[i][k]; load z[k][j]
                d1 ← y[i][k] × z[k][j]
                d1 ← d1 + x[i][j]
                x[i][j] ← d1
                k ← k + 1
            j ← j + 1
        i ← i + 1
    return
Do we need to load and store x[i][j] in every iteration of loop k?

i0i0 i  32 j0j0 j  32 d2  0.0 k  0 k  32 load y[i][k] load z[k][j] d1  y[i][k]*z[k][j] d2  d2+ d1 k  k+1 x[i][j]  d2 j  j+1 i  i+1 Parameter Passing Convention base of x[ ]  $a0 base of y[ ]  $a1 base of z[ ]  $a2 Assumption i  $s0 j  $s1 k  $s2 return void mm ( double x[ ][ ], double y[ ][ ], double z[ ][ ]) { int i, j, k; for( i=0 ; i != 32 ; i=i+1 ) for( j=0 ; j != 32 ; j=j+1 ) { x[i][j] = 0; for( k=0 ; k != 32 ; k=k+1 ) x[i][j] = x[i][j] + y[i][k] * z[k][j] }

i0i0 i  32 j0j0 j  32 d2  0.0 k  0 k  32 load y[i][k] load z[k][j] d1  y[i][k]*z[k][j] d2  d2+ d1 k  k+1 x[i][j]  d2 j  j+1 i  i+1 Parameter Passing Convention base of x[ ]  $a0 base of y[ ]  $a1 base of z[ ]  $a2 Assumption i  $s0 j  $s1 k  $s2 MIPS assembly: li$t1, 32 # t1  32 li$s0, 0# i  0 L1:beq$s0, $t1, D1 li$s1, 0# j  0 L2:beq$s1, $t1, D2 $f4  0.0 li$s2, 0# k  0 L3:beq$s2,$t1, D3 addiu$s2, $s2, 1# k  k+1 jL3 D3:x[i][j]  $f4 addiu$s1, $s1, 1# j  j+1 jL2 D2:addiu$s0, $s0, 1# i  i+1 jL1 D1: return

i0i0 k  k+1 return i  32 j  32 k  32 load y[i][k] load z[k][j] d1  y[i][k]*z[k][j] d2  d2+ d1 j0j0 d2  0.0 k  0 x[i][j]  d2 j  j+1 i  i+1 void mm ( double x[ ][ ], double y[ ][ ], double z[ ][ ]) { int i, j, k; for( i=0 ; i != 32 ; i=i+1 ) for( j=0 ; j != 32 ; j=j+1 ) { x[i][j] = 0; for( k=0 ; k != 32 ; k=k+1 ) x[i][j] = x[i][j] + y[i][k] * z[k][j] }

i0i0 k  k+1 Parameter Passing Convention base of x[ ][ ]  $a0 base of y[ ][ ]  $a1 base of z[ ][ ]  $a2 Assumption i  $s0 j  $s1 k  $s2 MIPS assembly: li$t1, 32 # t1  32 li$s0, 0 # i  0 L1:li$s1, 0 # j  0 L2:$f4  0.0 li$s2, 0 # k  0 L3: addiu$s2, $s2, 1 # k  k+1 bne$s2, $t1, L3 x[i][j]  $f4 addiu$s1, $s1, 1 # j  j+1 bne$s1, $t1, L2 addiu$s0, $s0, 1 # i  i+1 bne$s0, $t1, L1 return i  32 j  32 k  32 load y[i][k] load z[k][j] d1  y[i][k]*z[k][j] d2  d2+ d1 j0j0 d2  0.0 k  0 x[i][j]  d2 j  j+1 i  i+1

The loop body
The body of loop k must load y[i][k], load z[k][j], compute d1 ← y[i][k] × z[k][j], and compute d2 ← d2 + d1.
How do we load y[i][k] into a floating point register? First we have to consider how a two-dimensional matrix of doubles is stored in memory. C stores it row by row (row-major order): y[0][0], y[0][1], y[0][2], ..., y[0][31], then y[1][0], ..., y[1][31], and so on up to y[31][31]. y[0][0] is at the base of y[][], y[0][1] is at base + 8, and y[1][0] is at base + 8 × 32. In general, the address of y[i][k] is given by:
$$\text{addr}(y[i][k]) = \text{base}_y + (32 \times i + k) \times 8$$

MIPS assembly to load y[i][k] into $f16:
    L3: sll   $t2, $s0, 5        # $t2 ← 32 × i
        addu  $t2, $t2, $s2      # $t2 ← 32 × i + k
        sll   $t2, $t2, 3        # $t2 ← (32 × i + k) × 8
        addu  $t2, $a1, $t2      # $t2 ← Addr(y[i][k])
        l.d   $f16, 0($t2)       # $f16 ← y[i][k]
Write the code to load z[k][j] into $f18.
MIPS assembly to load z[k][j]:
        sll   $t2, $s2, 5        # $t2 ← 32 × k
        addu  $t2, $t2, $s1      # $t2 ← 32 × k + j
        sll   $t2, $t2, 3        # $t2 ← (32 × k + j) × 8
        addu  $t2, $a2, $t2      # $t2 ← Addr(z[k][j])
        l.d   $f18, 0($t2)       # $f18 ← z[k][j]
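The shift sequence can be checked against C's own address arithmetic. This sketch (`element_addr` is a hypothetical name) mirrors the sll/addu/sll/addu steps, assuming `sizeof(double) == 8` and a row length of 32:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the MIPS sequence for Addr(y[i][k]):
   i << 5 multiplies by the row length 32, << 3 multiplies by sizeof(double). */
static uintptr_t element_addr(uintptr_t base, uint32_t i, uint32_t k) {
    uint32_t t2 = i << 5;   /* sll  $t2, $s0, 5   : 32 * i           */
    t2 = t2 + k;            /* addu $t2, $t2, $s2 : 32 * i + k       */
    t2 = t2 << 3;           /* sll  $t2, $t2, 3   : (32 * i + k) * 8 */
    return base + t2;       /* addu $t2, $a1, $t2 : Addr(y[i][k])    */
}
```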

The loop body (cont.)
Once we have loaded y[i][k] into $f16 and z[k][j] into $f18, we can proceed to perform the multiply and the add:
        mul.d $f16, $f18, $f16   # $f16 ← y[i][k] × z[k][j]
        add.d $f4, $f4, $f16     # $f4 ← $f4 + $f16

Initializing and Storing $f4
How can we initialize $f4? Because the accumulator is a double, it occupies the register pair $f4-$f5. Copying the integer zero into both halves makes all 64 bits zero, which represents 0.0. MIPS assembly to initialize $f4:
        mtc1  $zero, $f4
        mtc1  $zero, $f5
Warning: in your textbook, page A-69, mtcz is specified as follows: "Move to coprocessor z: mtcz rd, rt. Move CPU register rt to coprocessor z's register rd." Note that the code above writes the CPU register first and the coprocessor register second, as in mtc1 $zero, $f4.

Initializing and Storing $f4
As before, $f4 is initialized with mtc1 $zero, $f4 and mtc1 $zero, $f5.
How can we store $f4 in x[i][j]? The store comes after loop k ends (after bne $s2, $t1, L3). MIPS assembly to store $f4 in x[i][j]:
        sll   $t2, $s0, 5        # $t2 ← 32 × i
        addu  $t2, $t2, $s1      # $t2 ← 32 × i + j
        sll   $t2, $t2, 3        # $t2 ← (32 × i + j) × 8
        addu  $t2, $a0, $t2      # $t2 ← Addr(x[i][j])
        swc1  $f4, 0($t2)        # x[i][j] ← $f4-$f5
        swc1  $f5, 4($t2)

MIPS assembly for the complete loop nest:
        li    $t1, 32            # $t1 ← 32
        li    $s0, 0             # i ← 0
    L1: li    $s1, 0             # j ← 0
    L2: mtc1  $zero, $f4         # $f4-$f5 ← 0.0
        mtc1  $zero, $f5
        li    $s2, 0             # k ← 0
    L3: sll   $t2, $s0, 5        # $t2 ← 32 × i
        addu  $t2, $t2, $s2      # $t2 ← 32 × i + k
        sll   $t2, $t2, 3        # $t2 ← (32 × i + k) × 8
        addu  $t2, $a1, $t2      # $t2 ← Addr(y[i][k])
        l.d   $f16, 0($t2)       # $f16 ← y[i][k]
        sll   $t2, $s2, 5        # $t2 ← 32 × k
        addu  $t2, $t2, $s1      # $t2 ← 32 × k + j
        sll   $t2, $t2, 3        # $t2 ← (32 × k + j) × 8
        addu  $t2, $a2, $t2      # $t2 ← Addr(z[k][j])
        l.d   $f18, 0($t2)       # $f18 ← z[k][j]
        mul.d $f16, $f18, $f16   # $f16 ← y[i][k] × z[k][j]
        add.d $f4, $f4, $f16     # $f4 ← $f4 + $f16
        addiu $s2, $s2, 1        # k ← k + 1
        bne   $s2, $t1, L3
        sll   $t2, $s0, 5        # $t2 ← 32 × i
        addu  $t2, $t2, $s1      # $t2 ← 32 × i + j
        sll   $t2, $t2, 3        # $t2 ← (32 × i + j) × 8
        addu  $t2, $a0, $t2      # $t2 ← Addr(x[i][j])
        swc1  $f4, 0($t2)        # x[i][j] ← $f4-$f5
        swc1  $f5, 4($t2)
        addiu $s1, $s1, 1        # j ← j + 1
        bne   $s1, $t1, L2
        addiu $s0, $s0, 1        # i ← i + 1
        bne   $s0, $t1, L1

In the listing above, the three address-computation sequences implement: load y[i][k] into $f16, load z[k][j] into $f18, and store $f4 into x[i][j].

Write the code to save and restore the registers that need to be saved on the stack.

MIPS stack-saving assembly for foo:
        addi  $sp, $sp, -36
        sw    $s0, 32($sp)
        sw    $s1, 28($sp)
        sw    $s2, 24($sp)
        swc1  $f4, 20($sp)
        swc1  $f5, 16($sp)
        swc1  $f16, 12($sp)
        swc1  $f17, 8($sp)
        swc1  $f18, 4($sp)
        swc1  $f19, 0($sp)
MIPS stack-restoring assembly for foo:
        lwc1  $f19, 0($sp)
        lwc1  $f18, 4($sp)
        lwc1  $f17, 8($sp)
        lwc1  $f16, 12($sp)
        lwc1  $f5, 16($sp)
        lwc1  $f4, 20($sp)
        lw    $s2, 24($sp)
        lw    $s1, 28($sp)
        lw    $s0, 32($sp)
        addi  $sp, $sp, 36
(This code conservatively saves the floating point registers as well. Under the common MIPS o32 convention, $f4-$f19 are caller-saved temporaries, so only $s0-$s2 strictly have to be saved by the callee.)

Suppose that we classify the instructions of this program into: integer logic and arithmetic; 32-bit loads/stores; conditional branches; FP additions; FP multiplications; moves to/from the coprocessor. How many instructions of each class are executed?

First we have to examine the pseudoinstructions. For instance, li $t1, 32 is translated to ori $t1, $zero, 32, and l.d $f16, 0($t2) is translated to lwc1 $f16, 0($t2) followed by lwc1 $f17, 4($t2).

How many times is each loop executed? The code outside the loops executes once; the L1 loop body executes 32 times; the L2 loop body executes 32 × 32 = 1024 times; and the L3 loop body executes 32 × 32 × 32 = 32,768 times.

Complete the table below with the number of instructions of each type executed in each region of the program (L1: 32 times; L2: 32 × 32 = 1024 times; L3: 32 × 32 × 32 = 32,768 times).
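As a cross-check for filling in the table, this C sketch tallies dynamic instruction counts from our own reading of the listing. It assumes li expands to one ori and each l.d to two lwc1, counts sll/addu/addiu/ori as integer logic and arithmetic, and excludes the register save/restore code and the return. The per-region numbers are our count, not the slide's table:

```c
#include <assert.h>

/* Instruction classes from the slide. */
enum { INT_ALU, LOAD_STORE, BRANCH, FP_ADD, FP_MUL, MOVE_COP, NCLASSES };

/* Static instructions per region (our own count from the listing). */
static const long per_region[4][NCLASSES] = {
    /* outside: li $t1, li $s0                     */ {2, 0, 0, 0, 0, 0},
    /* L1 only: li $s1, addiu $s0, bne             */ {2, 0, 1, 0, 0, 0},
    /* L2 only: 2 mtc1, li $s2, 4 address ops,
       2 swc1, addiu $s1, bne                      */ {6, 2, 1, 0, 0, 2},
    /* L3: 8 address ops, addiu $s2, 2 l.d
       (= 4 lwc1), mul.d, add.d, bne               */ {9, 4, 1, 1, 1, 0},
};

/* Executions of each region: 32 iterations per loop level. */
static const long region_runs[4] = {1, 32, 32 * 32, 32 * 32 * 32};

/* Dynamic count of instructions of class c executed by the loop nest. */
static long executed(int c) {
    long total = 0;
    for (int r = 0; r < 4; r++) total += per_region[r][c] * region_runs[r];
    return total;
}
```

With these inputs the loop nest executes 301,122 integer, 133,120 load/store, 33,824 branch, 32,768 FP add, 32,768 FP multiply and 2,048 move instructions, 535,650 in total.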



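Computing CPI
Suppose each of the following types of instructions takes the indicated number of clock cycles to execute. How would you compute the CPI for this machine? Weight the cycle count of each class i by the number of instructions IC_i of that class that were executed:
$$\text{CPI} = \frac{\sum_i IC_i \times \text{CPI}_i}{\sum_i IC_i}$$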
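A C sketch of the weighted-average formula; the counts and cycle values passed in are whatever the table gives, and `cpi` is a hypothetical helper name:

```c
#include <assert.h>

/* CPI = sum(count[i] * cycles[i]) / sum(count[i]) over n instruction classes. */
static double cpi(const long count[], const double cycles[], int n) {
    double total_cycles = 0.0;
    long total_instr = 0;
    for (int i = 0; i < n; i++) {
        total_cycles += count[i] * cycles[i];
        total_instr  += count[i];
    }
    assert(total_instr > 0);
    return total_cycles / total_instr;
}
```

For example, three 1-cycle instructions and one 5-cycle instruction take 8 cycles in total, for a CPI of 2.0.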

Computing CPI (cont.)

Computing Execution Time
If the machine that we are using has a processor that operates at 1.3 GHz, how long does it take to execute foo( )? The execution time is the total number of cycles divided by the clock rate:
$$T = \frac{IC \times \text{CPI}}{1.3 \times 10^{9}\ \text{Hz}}$$
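The formula as a one-line C sketch; `exec_time_seconds` is a hypothetical name, and the instruction count and CPI are whatever the previous steps produce:

```c
#include <assert.h>

/* Execution time in seconds = instructions * CPI / clock rate (in Hz). */
static double exec_time_seconds(long long instructions, double cpi, double clock_hz) {
    assert(clock_hz > 0.0);
    return (double)instructions * cpi / clock_hz;
}
```

At 1.3 GHz, for instance, 1.3 × 10^9 instructions at a CPI of 2 take 2 seconds.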

In preparation for the midterm...
Write a code segment that reads a byte B from the address 0x and:
a) writes 0x FF in address 0x if bit 5 of B is 1;
b) writes 0xFFFF FFFF FFFF FF00 in address 0x otherwise.

In preparation for the midterm...
Write a minimal instruction sequence that inverts all the bits in the exponent field of the number stored in register $f2.
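One possible approach, assuming $f2 holds a single precision value: copy it to an integer register, XOR it with the mask 0x7F800000 (bits 30..23), and copy it back, for example mfc1 $t0, $f2; lui $t1, 0x7F80; xor $t0, $t0, $t1; mtc1 $t0, $f2. The same idea in C, with `flip_exponent` as a hypothetical name and IEEE 754 `float` assumed:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Inverts the 8 exponent bits (bits 30..23) of a single-precision value;
   the sign bit and the fraction are left unchanged. */
static float flip_exponent(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* like mfc1: FP register -> integer */
    bits ^= 0x7F800000u;              /* like lui + xor with the mask      */
    memcpy(&f, &bits, sizeof f);      /* like mtc1: integer -> FP register */
    return f;
}
```

For 1.0 (exponent field 01111111) the result has exponent field 10000000, which is 2.0; applying the function twice restores the original value.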