COSC 3330/6308 Solutions to Second Problem Set Jehan-François Pâris October 2012.


First problem
For each of the following four MIPS instructions, detail which actions are taken at each of its five steps. Do not forget to mention how and during which steps each instruction updates the program counter. (4×10 points)

jalr $s0, $s1
1. Fetch instruction and add 4 to PC
2. Read $s0 and save "somewhere" current value of PC
3. Transmit value of $s0 to PC either directly or through adder
   – This data path does not exist in the toy MIPS architecture we studied
4. Write saved PC value into register $s1
   – This data path does not exist in the toy MIPS architecture we studied

jalr $s0, $s1 (other good answer)
This instruction cannot be implemented on the toy MIPS architecture because
– There is no data path going from a read register line to the PC
   – Cannot set new value of PC to contents of $s0
– There is no data path going from the PC to the write register line
   – Cannot save "old" value of PC into register $s1

sw $s1, 24($t0)
1. Fetch instruction and add 4 to PC
2. Read registers $s1 and $t0 and sign extend contents of displacement field of instruction
3. Compute memory address a by adding contents of register $t0 to sign-extended displacement
4. Store contents of register $s1 into memory address a
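The address computation in steps 2 and 3 can be sketched in Python. This is only an illustration, assuming a 16-bit displacement field and 32-bit addresses as in standard MIPS; the function names and the sample contents of $t0 are hypothetical.

```python
def sign_extend_16(field):
    """Interpret the low 16 bits of `field` as a two's-complement value."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def effective_address(base, displacement_field):
    """Add the sign-extended displacement to the base register, modulo 2**32."""
    return (base + sign_extend_16(displacement_field)) & 0xFFFFFFFF


t0 = 0x10010000  # hypothetical contents of $t0
print(hex(effective_address(t0, 24)))      # address a for sw $s1, 24($t0)
print(hex(effective_address(t0, 0xFFFC)))  # a displacement field encoding -4
```

The second call shows why the sign extension matters: a displacement field with its top bit set must move the address backwards rather than forwards.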

slt $t0, $s3, $s4
1. Fetch instruction and add 4 to PC
2. Read registers $s3 and $s4
3. Compare values of $s3 and $s4 using ALU
4. Store comparison result into register $t0

jal
1. Fetch instruction and add 4 to PC
2. Sign extend contents of displacement field of instruction
3. Save "somewhere" contents of PC
4. Multiply by four sign-extended contents of displacement field of instruction and replace 28 LSB of PC with new value
5. Write saved PC value into register $31
   – This data path does not exist in the toy MIPS architecture we studied

jal (other good answer)
This instruction cannot be implemented on the toy MIPS architecture because
– There is no data path going from the PC to the write register line
   – Cannot save "old" value of PC into register $31

Second problem
Consider these two potential additions to the MIPS instruction set and explain how they would restrict pipelining. (2×5 easy points)
– cp d1(r1), d2(r2)
– incr d2(r2)

cp d1(r1), d2(r2)
Copies the contents of the word at the address given by the contents of r2 plus offset d2 into the address given by the contents of r1 plus displacement d1.

Answer (I)
Let us look at the steps the instruction will have to take:
1. Instruction fetch
2. Instruction decode and read register r2
3. Use arithmetic unit to compute d2+[r2]
4. Access memory to read word at address d2+[r2]

Answer (II)
And it continues:
5. Write somewhere the value v that was just read
6. Read register r1
7. Use arithmetic unit to compute d1+[r1]
8. Access memory to write value v at address d1+[r1]
The instruction reads a register twice and accesses the ALU twice

incr d2(r2)
Adds one to the contents of the word at the address given by the contents of r2 plus offset d2

Answer (I)
Let us look at the steps the instruction will have to take:
1. Instruction fetch
2. Instruction decode and read register r2
3. Use arithmetic unit to compute d2+[r2]
   – Store the address somewhere
4. Access memory to read word at address d2+[r2]

Answer (II)
And it continues:
5. Use arithmetic unit to increment by 1 the value that was just read
6. Access memory to write the incremented value at the previously saved address d2+[r2]
The instruction accesses the ALU twice

Third problem
Explain how you would pipeline the following four pairs of statements. (4×5 points)

Part A
add $t0, $s0, $s1
beq $s1, $s2, 300
Stages by cycle, in order:
add: IF | ID/RR | ALU | WB
beq: IF | ID/RR | ALU | WB
No data hazard!

Part A (with special unit)
Stages by cycle, in order:
add: IF | ID/RR | ALU | WB
beq: IF | ID/RR | WB
Both solutions will get full credit
It can be done because this step uses a different path than the previous instruction

Part B
add $t2, $t0, $t1
sw $t3, 36($t2)
Stages by cycle, in order:
add: IF | ID/RR | ALU | WB
sw: IF | ID/RR | ALU | MEM
Data hazard is avoided thanks to forwarding unit

Part B (without forwarding unit)
add $t2, $t0, $t1
sw $t3, 36($t2)
Stages by cycle, in order:
add: IF | ID/RR | ALU | WB
sw: IF | ID/RR | ALU
Two cycles are lost (60% CREDIT)

Part C
add $t0, $s0, $s1
beq $t0, $s2, 300
Stages by cycle, in order:
add: IF | ID/RR | ALU | WB
beq: IF | ID/RR | ALU | WB
Data hazard is avoided thanks to forwarding unit

Part C (with special unit)
add $t0, $s0, $s1
beq $t0, $s2, 300
Stages by cycle, in order:
add: IF | ID/RR | ALU | WB
beq: IF | ID/RR | WB
It can be done because this step uses a different path than the previous instruction

Part C (without forwarding unit)
add $t0, $s0, $s1
beq $t0, $s2, 300
Stages by cycle, in order:
add: IF | ID/RR | ALU | WB
beq: IF | ID/RR | ALU
Two cycles are lost (60% CREDIT)

Part D
lw $t0, 24($t1)
sub $s2, $t0, $t1
Stages by cycle, in order:
lw: IF | ID/RR | ALU | MEM | WB
sub: IF | ID/RR | ALU | WB
Data hazard is reduced by forwarding unit

Part D (without forwarding unit)
lw $t0, 24($t1)
sub $s2, $t0, $t1
Stages by cycle, in order:
lw: IF | ID/RR | ALU | MEM | WB
sub: IF | ID/RR
Three cycles are lost (STILL 60% CREDIT)
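The lost-cycle counts quoted for the cases without a forwarding unit can be reproduced with a small Python sketch. It rests on assumptions that are not stated in the slides and were chosen only because they match the quoted numbers: the first instruction is fetched in cycle 1, an add writes back in cycle 4 (IF, ID/RR, ALU, WB), a lw writes back in cycle 5, and the dependent instruction can only read the register in the cycle after that write-back. The function and dictionary names are hypothetical.

```python
# Assumed write-back cycle of the producing instruction, fetched in cycle 1.
WRITE_BACK_CYCLE = {"add": 4, "lw": 5}

# The dependent instruction is fetched in cycle 2, so without any hazard
# it would read its registers (ID/RR) in cycle 3.
NORMAL_READ_CYCLE = 3


def lost_cycles_without_forwarding(producer):
    """Cycles the dependent instruction must wait before its ID/RR step."""
    earliest_read_cycle = WRITE_BACK_CYCLE[producer] + 1  # assumption: read after WB
    return max(0, earliest_read_cycle - NORMAL_READ_CYCLE)


print(lost_cycles_without_forwarding("add"))  # Parts B and C
print(lost_cycles_without_forwarding("lw"))   # Part D
```

Under these assumptions the sketch gives two lost cycles after an add and three after a lw, in line with the answers above.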

Fourth problem
A computer system has a two-level memory cache hierarchy.
– L1 cache has a zero hit penalty, a miss penalty of 5 ns and a hit rate of 95 percent
– L2 cache has a miss penalty of 100 ns and a hit rate of 90 percent.

Part A
How many cycles are lost by each instruction accessing the memory if the CPU clock rate is 2 GHz?

Answer (I)
Let P1 and P2 be respectively the miss penalties of caches L1 and L2
Duration of clock cycle: 1/(2 GHz) = 0.5×10^-9 s = 0.5 ns
Cache miss penalties:
– P1 = 2 × 5 = 10 cycles
– P2 = 2 × 100 = 200 cycles

Answer (II)
Recall P1 = 10 cycles and P2 = 200 cycles
Let M1 and M2 be respectively the miss rates of caches L1 and L2
We have M1 = 0.05 and M2 = 0.10
Number of lost cycles/instruction:
M1 × P1 + M1 × M2 × P2 = 0.05 × 10 + 0.05 × 0.10 × 200 = 0.5 + 1 = 1.5

Hint
Use fractions to reduce the risk of error
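Following the hint, the Part A computation can be checked with exact fractions. The Python sketch below is only an illustration; the function name and its parameters are hypothetical, and it assumes, as in the answer, that an L2 miss can only occur after an L1 miss.

```python
from fractions import Fraction


def lost_cycles(m1, p1_ns, m2, p2_ns, clock_ghz):
    """Average cycles lost per memory access: M1*P1 + M1*M2*P2."""
    cycles_per_ns = Fraction(clock_ghz)  # 2 GHz means 2 cycles per ns
    p1 = p1_ns * cycles_per_ns           # L1 miss penalty in cycles
    p2 = p2_ns * cycles_per_ns           # L2 miss penalty in cycles
    return m1 * p1 + m1 * m2 * p2


part_a = lost_cycles(Fraction(5, 100), 5, Fraction(10, 100), 100, 2)
print(part_a)         # 3/2
print(float(part_a))  # 1.5
```

Using `Fraction` keeps every intermediate value exact, so the result is 3/2 with no floating-point rounding.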

Part B
We can either increase the hit rate of the topmost cache to 98 percent or increase the hit rate of the second cache to 95 percent. Which improvement would have more impact? (10 points)

Better L1 cache
Recall P1 = 10 cycles and P2 = 200 cycles
We now have M1 = 0.02 and M2 = 0.10
Number of lost cycles/instruction:
M1 × P1 + M1 × M2 × P2 = 0.02 × 10 + 0.02 × 0.10 × 200 = 0.2 + 0.4 = 0.6

Better L2 cache
Recall P1 = 10 cycles and P2 = 200 cycles
We now have M1 = 0.05 and M2 = 0.05
Number of lost cycles/instruction:
M1 × P1 + M1 × M2 × P2 = 0.05 × 10 + 0.05 × 0.05 × 200 = 0.5 + 0.5 = 1

Answer
For the values of M1, P1, M2 and P2 we considered
– Improving the hit ratio of the L1 cache provides the best speedup
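The comparison can be reproduced with a few lines of Python. This is a sketch under the same model as the answer (penalties of 10 and 200 cycles, L2 consulted only on an L1 miss); the function name and the dictionary labels are hypothetical.

```python
from fractions import Fraction

P1, P2 = 10, 200  # miss penalties in cycles, from Part A


def lost_cycles(m1, m2):
    """Average cycles lost per memory access: M1*P1 + M1*M2*P2."""
    return m1 * P1 + m1 * m2 * P2


options = {
    "baseline": lost_cycles(Fraction(5, 100), Fraction(10, 100)),
    "better L1": lost_cycles(Fraction(2, 100), Fraction(10, 100)),
    "better L2": lost_cycles(Fraction(5, 100), Fraction(5, 100)),
}

for name, cycles in options.items():
    print(f"{name}: {float(cycles)} cycles lost")

best = min(options, key=options.get)
print("Fewest lost cycles:", best)
```

The L1 improvement wins here because the L1 miss rate multiplies both terms of the sum, while the L2 miss rate only affects the second term.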

Fifth problem
A virtual memory system has
– A virtual address space of 4 Gigabytes
– A page size of 8 Kilobytes.
Each page table entry occupies 4 bytes.

Part A
How many bits remain unchanged during the address translation? (5 points)

Answer
– Page size is 8 KB = 2^3 × 2^10 = 2^13 bytes
– The last 13 bits of each address will remain unchanged during the address translation

Part B
How many bits are used for the page number? (5 points)

Answer
– Address space is 4 GB = 2^2 × 2^30 = 2^32 bytes
– Will have 32-bit addresses
– The page number will occupy the 32 – 13 = 19 most significant bits of the address

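Reminder
[Figure: a virtual address (32 or 64 bits) is split into a page number and an offset. The page number is used to find the right page frame number; the offset is copied unmodified. The translated address consists of the page frame number followed by the offset.]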
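The translation pictured in the reminder can be sketched in Python for the 13-bit offset of this problem. The page table contents, the sample address and the function names are hypothetical; a real page table entry would also carry status bits that are ignored here.

```python
OFFSET_BITS = 13  # 8 KB pages, from Part A
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def split(virtual_address):
    """Return (page number, offset) for a virtual address."""
    return virtual_address >> OFFSET_BITS, virtual_address & OFFSET_MASK


def translate(virtual_address, page_table):
    """Replace the page number by its page frame number; copy the offset."""
    page_number, offset = split(virtual_address)
    frame_number = page_table[page_number]
    return (frame_number << OFFSET_BITS) | offset


page_table = {2: 7}  # hypothetical mapping: page 2 lives in frame 7
print(split(0x4ABC))                       # (2, 2748)
print(hex(translate(0x4ABC, page_table)))  # 0xeabc
```

Only the upper bits change between the two addresses; the low 13 bits (0xABC in this example) pass through untouched.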

Part C
What is the maximum number of page table entries in a page table? (5 points)
– Page number occupies 19 bits
– Can have 2^19 pages in a process
– Page tables will have 2^19 entries
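The bit counts of Parts A to C follow directly from the two sizes given in the problem. The short Python check below is only a sketch; the variable names are hypothetical, and it relies on both sizes being powers of two.

```python
address_space = 4 * 2**30  # 4 GB virtual address space, in bytes
page_size = 8 * 2**10      # 8 KB pages, in bytes

# For a power of two n, n.bit_length() - 1 is log2(n).
offset_bits = page_size.bit_length() - 1           # Part A
address_bits = address_space.bit_length() - 1
page_number_bits = address_bits - offset_bits      # Part B
max_page_table_entries = 2**page_number_bits       # Part C

print(offset_bits, page_number_bits, max_page_table_entries)  # 13 19 524288
```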