
1 Answers to Test 1, Question 1
The function determines whether a positive number is prime. It first checks whether the number is lower than 4; if so, it is considered prime. It then divides the number by every integer from 2 up to n/2. Note the return convention: 0 means prime, 1 means not prime.

int prime(int n){
    int i = 2;
    if(n < 4)
        return 0;
    while(i <= n/2){
        if(n % i == 0)
            return 1;
        i++;
    }
    return 0;
}

2 Question 2
[Diagram: a 1-bit ALU cell, shown twice — inputs a, b, and CarryIn; an Operation select; outputs Result and CarryOut.]

3 Question 3
ET_A = 135M × 3.1 × 2ns = 837M ns = 0.837 sec
ET_B = 125M × 3.2 × (1/450M) sec = 888M ns = 0.888 sec
A is faster than B by 6% (888/837).
Computer A: FE = 1.00 (all the instructions are affected), SE = 135/127 = 1.06
(1.00/1.06) × 0.837 = 0.79 sec
Computer B: FE = 0.25, SE = 3.2/2.6 = 1.23
(0.75 + 0.25/1.23) × 0.888 = 0.85 sec
Now B is faster than A by 0.1%.

4 Question 4
Mapping an address into a direct-mapped cache is done by:
1. Computing the block address: dividing the address by the number of words in each block.
2. Finding the cache location: the block address modulo (%) the number of blocks in the cache.
3. The offset of the word within the block is the remainder of step 1.
To get the block address we divide each address by the number of words in a block, which here is 1, so the block address equals the address. This number modulo 16 is the cache block the address maps to. The only hits are the last accesses to 5, 9, and 17. All the other accesses are misses, so the hit rate is 3/16 ≈ 19%.

5 Question 4 (cont)
If the block size is 4, we must divide each address by 4 to get the block address and then compute modulo 4 (there are now 4 blocks). The sequence becomes: 1(m), 4(m), 8(m), 5(h), 20(m), 17(m), 19(h), 56(m), 9(m), 11(h), 4(m), 43(m), 5(h), 6(h), 9(m), 17(h). The hit rate is now 6/16 ≈ 37%.
For the 3rd case the mapping is block address modulo the number of sets, so each address has to be divided by 8 and then taken modulo 2: 1(m), 4(m), 8(m), 5(h), 20(m), 17(m), 19(h), 56(m), 9(h), 11(h), 4(m), 43(m), 5(h), 6(h), 9(h), 17(h). The hit ratio is now 8/16 = 50%.

6 Question 5
If the loop body is within the conditional-branch range (the 16-bit signed word offset reaches about 32K instructions, i.e. 128KB), a bne can be used:
bne $a0,$a1,End
…
End:
If the loop body is larger than that, a j must be used:
beq $a0,$a1,L1
j End
L1:
…
End:

7 Question 5 (cont)
If the loop spans a 256MB boundary, j can't be used; only jr can be used. The problem is getting the address into a register. Using lui, or li with a shift, solves the problem:
lui  $t0,0xFF30       # or: li  $t0,0xFF30
                      #     sll $t0,$t0,16
addi $t0,$t0,0x800
beq  $a0,$a1,L1
jr   $t0
L1:
…
End:

8 Question 6
AMAT = 0.90×2 + 0.10×(0.90×6 + 0.10×13) = 2.47 cycles
(L1 hit ratio 0.90 with a 2-cycle hit time; L2 hit ratio 0.90 with a 6-cycle hit time and a 13-cycle miss penalty.)
Widening the bus between memory and L2 can cut down the miss penalty, as we can transfer more data per cycle. On the other hand, more bus lines are needed, which can be expensive.
Longer blocks in L2 can take advantage of spatial locality and raise the hit ratio. On the other hand, not all of the block may be used, and it may also affect the miss penalty, as more data has to be brought from main memory.
Going to direct mapped might reduce the hit time, but might reduce the hit ratio as well.

9 Question 6 (cont)
Let's assume that widening the bus reduces the miss penalty to 10 cycles:
AMAT = 0.90×2 + 0.10×(0.90×6 + 0.10×10) = 2.44
Let's assume that raising the L2 block size raises the hit ratio to 0.95 but also raises the miss penalty to 15 cycles:
AMAT = 0.90×2 + 0.10×(0.95×6 + 0.05×15) = 2.445
Let's assume that going from 4-way set associative to direct mapped reduces the hit time to 1 cycle and the hit ratio to 80%:
AMAT = 0.80×1 + 0.20×(0.90×6 + 0.10×13) = 2.14

10 Question 7
swap:
    lw $t0,0($a0)   # temp = *a
    lw $t1,0($a1)   # $t1 = *b
    sw $t1,0($a0)   # *a = *b
    sw $t0,0($a1)   # *b = temp
    jr $ra          # return

11 Question 8
What is reduced is the number of bits for the opcode; this reduces the number of possible instructions.
RISC processors have instructions that are all the same length (1 word), as opposed to general-purpose-register (GPR) architectures, which have variable-length instructions.
RISC processors have 3-operand instructions as opposed to 2-operand instructions.
RISC processors access memory only through load/store instructions, as opposed to GPR architectures, where all instructions (even ALU instructions) can have memory operands.
RISC programs might have more instructions.