COSC 3330/6308 Second Review Session Fall 2012

Presentation transcript:

COSC 3330/6308 Second Review Session Fall 2012

Instruction Timings For each of the following MIPS instructions, check the pipeline cycles that the instruction does not skip. (4×5 points for each correct line)

Instruction     | IF | ID/RR | ALU | MEM | WB
add r1, r2, r3  |    |       |     |     |
slt r1, r2, r3  |    |       |     |     |
ld  r1, d(r2)   |    |       |     |     |
st  r1, d(r2)   |    |       |     |     |

Instruction Timings (answer) The add and slt instructions skip the MEM cycle, the ld instruction goes through all five cycles, and the st instruction skips the WB cycle.

Instruction     | IF | ID/RR | ALU | MEM | WB
add r1, r2, r3  | X  | X     | X   |     | X
slt r1, r2, r3  | X  | X     | X   |     | X
ld  r1, d(r2)   | X  | X     | X   | X   | X
st  r1, d(r2)   | X  | X     | X   | X   |

Conditional branch What is missing in the following diagram sketching the datapaths of the non-pipelined version of the conditional branch instruction? (2×5 points)

Conditional Branch The two missing elements are the "Shift left 2" unit and the "Add" unit, which together compute the branch target address from the sign-extended offset and the incremented program counter.
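
As a sanity check, here is a minimal C sketch of what those two units compute, assuming the standard MIPS branch-target calculation (the function name is mine):

```c
#include <stdint.h>
#include <stdio.h>

/* Branch-target computation done by the two missing units:
 * sign-extend the 16-bit offset, shift it left by 2 (word offset
 * to byte offset), and add it to PC + 4. */
uint32_t branch_target(uint32_t pc, int16_t offset) {
    int32_t extended = offset;                    /* sign extension */
    return (pc + 4) + ((uint32_t)extended << 2);  /* shift left 2, then add */
}

int main(void) {
    printf("0x%08x\n", branch_target(0x00400000u, 5)); /* 0x00400018 */
    return 0;
}
```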

Immediate instructions Remember that the MIPS instruction set has a variety of immediate instructions such as addi r1, r2, im, which stores into r1 the sum of the contents of register r2 and the immediate value im. Show on the following diagram what the datapaths for that instruction would be. (3×5 points)

addi r1, r2, im The datapath goes from the register file (to read r2) and the sign-extended immediate field to the ALU, which computes the sum that is then written back into r1.

Pipelining Consider the following pair of MIPS instructions:
  sub r3, r1, r2
  add r4, r3, r6
Show how the second instruction will proceed when bypassing is not implemented. (5 points)

Pipelining w/o bypassing Steps:
  sub r3, r1, r2:  IF  ID/RR  ALU  WB
  add r4, r3, r6:      IF     (stalls)  ID/RR  ALU  WB
The add instruction cannot perform its register read before it can read the new value of register r3, which becomes available only after the WB step of sub.

Pipelining Show how the second instruction will proceed if bypassing is implemented.

Pipelining with bypassing Steps:
  sub r3, r1, r2:  IF  ID/RR  ALU  WB
  add r4, r3, r6:      IF     ID/RR  ALU  WB
The result of sub is forwarded directly from the ALU output to the ALU input of add, so no stall is needed.

More pipelining Consider the following pair of MIPS instructions:
  lw  r3, d(r1)
  add r4, r3, r6
Show how the second instruction will proceed when bypassing is not implemented. (5 points)

Without bypassing Steps:
  lw  r3, d(r1):   IF  ID/RR  ALU  MEM  WB
  add r4, r3, r6:      IF     (stalls)   ID/RR  ALU  WB
The add instruction cannot perform its register read before it can read the new value of register r3, which becomes available only after the WB step of lw.

More pipelining Show how the second instruction will proceed if bypassing is implemented.

With bypassing Steps:
  lw  r3, d(r1):   IF  ID/RR  ALU  MEM  WB
  add r4, r3, r6:      IF     ID/RR  (one-cycle stall)  ALU  WB
Even with bypassing, the new value of r3 is not available before the end of the MEM step of lw, so add must still stall for one cycle before its ALU step.

A last word about data hazards Which single MIPS instruction can cause the worst data hazards? (5 points)

A last word about data hazards Which single MIPS instruction can cause the worst data hazards? (5 points) lw (load word into register): it goes through all cycles before updating its register.
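
The stall rules illustrated by the previous slides can be summarized in a small C sketch (a rough model of hazard detection under the assumptions above, not the actual MIPS control logic; all names are illustrative):

```c
#include <stdio.h>

/* Sketch of the data-hazard rules illustrated above. Register
 * numbers are plain ints; $0 never causes a hazard because it is
 * hard-wired to zero. This models the decision, not the timing. */
int raw_hazard(int prod_dest, int src1, int src2) {
    return prod_dest != 0 && (prod_dest == src1 || prod_dest == src2);
}

int must_stall(int prod_is_load, int prod_dest, int src1, int src2,
               int bypassing) {
    if (!raw_hazard(prod_dest, src1, src2))
        return 0;            /* independent instructions                */
    if (!bypassing)
        return 1;            /* must wait for the producer's WB step    */
    return prod_is_load;     /* load-use: value only after the MEM step */
}

int main(void) {
    /* sub r3, r1, r2 followed by add r4, r3, r6 */
    printf("sub->add, no bypass: %d\n", must_stall(0, 3, 3, 6, 0)); /* 1 */
    printf("sub->add, bypass:    %d\n", must_stall(0, 3, 3, 6, 1)); /* 0 */
    /* lw r3, d(r1) followed by add r4, r3, r6 */
    printf("lw ->add, bypass:    %d\n", must_stall(1, 3, 3, 6, 1)); /* 1 */
    return 0;
}
```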

The comparator The MIPS architecture we have discussed in class includes a small comparator that checks whether the two register read outputs are equal or not.
- Which MIPS instructions use this comparator? (5 points)
- Why do they use this comparator instead of the ALU? (5 points)
- How is this comparator implemented? (5 points)

The comparator
- The comparator is used by the beq and bne instructions.
- They use it so that the branch decision can be made one step earlier than with the ALU.
- It XORs the two 32-bit values, then ORs together all the bits of the result: the outcome is 0 if and only if the two values are equal.
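
A minimal C model of that comparator logic (the hardware evaluates the XOR and the OR-reduction in parallel on all 32 bit positions; this sketch only mirrors the function):

```c
#include <stdint.h>
#include <stdio.h>

/* Equality comparator sketch: XOR the two 32-bit values, then
 * OR-reduce the result. The OR of all bits is 0 iff every bit
 * position matched, i.e., iff the two values are equal. */
int values_equal(uint32_t a, uint32_t b) {
    uint32_t diff = a ^ b;   /* bit i is 1 wherever a and b differ     */
    return diff == 0;        /* OR-reduction: any 1 bit means unequal  */
}

int main(void) {
    printf("%d\n", values_equal(42, 42));  /* prints 1 */
    printf("%d\n", values_equal(42, 43));  /* prints 0 */
    return 0;
}
```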

Without special unit
  beq:   IF  ID/RR  ALU  MEM  WB
  next:  IF  ID/RR  ABORT
  next:      IF     ABORT
  dest:             IF  ID/RR  ALU  ...
We must wait until the end of the ALU step of beq to know whether we will branch or not.

With special unit
  beq:   IF  ID/RR  ALU  MEM  WB
  next:  IF  ABORT
  dest:      IF  ID/RR  ALU  ...
Since the special unit is very fast, we know whether we will branch or not by the end of the ID/RR step of beq.

Disk reliability What do we mean when we say that disk failure rates follow a bathtub curve? (5 points)

Disk reliability What do we mean when we say that disk failure rates follow a bathtub curve? (5 points) Disk failure rates are higher:
- for new disks (infant mortality), and
- as disks wear down at the end of their useful lifetime.

Caching A small direct-mapped cache has 2,048 entries, with each entry containing four words. The computer memory is byte-addressable and all addresses are 32-bit addresses. (4×5 points) What is the cache size (tags excluded) in bytes?

The cache Each of the 2,048 lines of the cache contains a tag, a valid bit, and 4 words = 4 × 4 bytes of data.

Answer 2,048 × 4 × 4 = 32 KB.

Caching What is the tag size?

Answer 32 - 4 - 11 = 17 bits:
- remove log2(16) = 4 bits since each entry is 16 bytes long;
- remove log2(2,048) = 11 bits that are given by the position of the entry in the cache.

Caching How could we increase the hit ratio of the cache without increasing its size?

Answer Replace it with a set-associative cache that could store 1,024 pairs of four-word entries.

Caching What would be the main disadvantage of that solution?

Answer Set-associative caches are slower than direct-mapped caches.
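
The two computations can be checked with a short C sketch (variable names are illustrative):

```c
#include <stdio.h>

/* log2 of a power of two, computed by counting shifts. */
static int log2_int(unsigned n) {
    int bits = 0;
    while (n > 1) { n >>= 1; bits++; }
    return bits;
}

int main(void) {
    int entries         = 2048;  /* cache lines          */
    int words_per_entry = 4;
    int bytes_per_word  = 4;
    int address_bits    = 32;

    int entry_bytes = words_per_entry * bytes_per_word;        /* 16    */
    int cache_bytes = entries * entry_bytes;                   /* 32 KB */
    int offset_bits = log2_int(entry_bytes);                   /* 4     */
    int index_bits  = log2_int(entries);                       /* 11    */
    int tag_bits    = address_bits - offset_bits - index_bits; /* 17    */

    printf("cache size: %d bytes, tag size: %d bits\n",
           cache_bytes, tag_bits);
    return 0;
}
```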

Main memory organization Assuming that a main memory access takes
- 1 bus clock cycle to send the address,
- 16 bus clock cycles to initiate a read,
- 1 bus clock cycle to send a word of data,
how many clock cycles would it take to transfer 16 bytes to the cache if
- the data are stored in a single bank of memory? (5 points)
- the data are stored in a four-way interleaved memory? (5 points)

Single bank memory 1 + 4 × (16 + 1) = 69 cycles. All operations are done sequentially: one cycle to send the address, then, for each of the four words, 16 cycles to initiate the read and 1 cycle to transfer the word.

Four-way interleaved memory 1 + 16 + 4 × 1 = 21 cycles. The reads, but not the data transfers, are now performed in parallel.
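
Both cycle counts follow from the same three parameters, as this small C sketch shows (names are illustrative):

```c
#include <stdio.h>

int main(void) {
    int addr_cycles = 1;   /* send the address              */
    int read_cycles = 16;  /* initiate a read               */
    int xfer_cycles = 1;   /* send one word of data         */
    int words = 4;         /* 16 bytes = 4 four-byte words  */

    /* Single bank: address once, then each word pays the full
     * read latency plus its transfer, strictly in sequence.   */
    int single = addr_cycles + words * (read_cycles + xfer_cycles);

    /* Four-way interleaved: the four reads overlap, so the read
     * latency is paid once; only the transfers stay sequential. */
    int interleaved = addr_cycles + read_cycles + words * xfer_cycles;

    printf("single bank: %d cycles, interleaved: %d cycles\n",
           single, interleaved);   /* 69 and 21 */
    return 0;
}
```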

Protecting page tables How can we prevent user programs from modifying their own page tables? (5 points)

Protecting page tables How can we prevent user programs from modifying their own page tables? (5 points)  We must store page tables in the protected area of the operating system.

Caches and virtual memory What would be a reasonable page size for a virtual memory system? Justify your answer in a few words. Would that be a reasonable block size for a cache? Justify your answer in a few words.

Caches and virtual memory A reasonable page size would be 4 KB: because page faults are very costly, the system should try to bring in as much useful data as possible. That would NOT be a reasonable block size for a cache: cache block sizes are much smaller, and 64 bytes is a good choice because larger block sizes create too many collisions.

Page table size How can we limit the size of page tables to 512KB in a 32-bit virtual system?

Answer We do all the computations in reverse:
- Desired page table size: 512 KB.
- Number of page table entries: 512 KB / 4 bytes = 128K, since each page table entry occupies four bytes.
- Number of bits occupied by the page number: log2(128K) = log2(2^17) = 17 bits.
- Number of bits occupied by the byte offset: 32 - 17 = 15 bits.
- Page size: 2^15 bytes = 32 KB.
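
The same reverse computation, written out as a short C sketch (names are illustrative):

```c
#include <stdio.h>

int main(void) {
    long pte_bytes    = 4;           /* one page table entry    */
    long table_bytes  = 512 * 1024;  /* desired page table size */
    long address_bits = 32;

    long entries = table_bytes / pte_bytes;          /* 128K entries */
    int  page_number_bits = 0;
    while ((1L << page_number_bits) < entries)
        page_number_bits++;                          /* 17 bits      */
    long offset_bits = address_bits - page_number_bits; /* 15 bits   */
    long page_bytes  = 1L << offset_bits;               /* 32 KB     */

    printf("page size: %ld bytes (%ld KB)\n",
           page_bytes, page_bytes / 1024);
    return 0;
}
```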

TLB misses When comparing the hit ratios of two translation look-aside buffers, which question should we ask first?

Answer Are TLB misses handled by the firmware or by the OS?
- If TLB misses are handled by the firmware, the cost of a TLB miss is one extra memory reference.
- If TLB misses are handled by the OS, the cost of a TLB miss is two context switches.

The dirty bit What is the purpose of the dirty bit?

Answer The dirty bit tells whether a page has been modified since the last time it was brought into main memory. It is used whenever a page must be expelled from main memory:
- if its dirty bit is ON, the page must be saved to disk before being expelled;
- if its dirty bit is OFF, there already is an exact copy of the page on disk.

Page table organization What is the main advantage of hashed page tables?

Answer Hashed page tables only keep track of the pages that are actually in main memory. Their size is proportional to the size of the physical memory, instead of the size of the virtual address space.
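
A minimal C sketch of a hashed page table lookup, assuming one hash bucket per physical frame with chained collisions (all names and sizes are mine, for illustration only):

```c
#include <stdio.h>
#include <stdlib.h>

#define FRAMES 1024   /* sized by physical frames, NOT virtual pages */

typedef struct entry {
    unsigned long vpn;      /* virtual page number stored here */
    int           frame;    /* physical frame it maps to       */
    struct entry *next;     /* collision chain                 */
} entry;

static entry *buckets[FRAMES];

/* Returns the frame holding the page, or -1 on a page fault. */
static int lookup(unsigned long vpn) {
    for (entry *e = buckets[vpn % FRAMES]; e; e = e->next)
        if (e->vpn == vpn)
            return e->frame;
    return -1;
}

static void insert(unsigned long vpn, int frame) {
    entry *e = malloc(sizeof *e);
    e->vpn = vpn; e->frame = frame;
    e->next = buckets[vpn % FRAMES];
    buckets[vpn % FRAMES] = e;
}

int main(void) {
    insert(0x12345, 7);
    printf("frame of page 0x12345: %d\n", lookup(0x12345)); /* 7  */
    printf("frame of page 0x99999: %d\n", lookup(0x99999)); /* -1 */
    return 0;
}
```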

ALWAYS REMEMBER
One KILO is 2^10
One MEGA is 2^20
One GIGA is 2^30
In binary, 2^n is 1 followed by n zeroes.
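
These values are easy to check in C, since shifting 1 left by n positions yields 2^n:

```c
#include <stdio.h>

int main(void) {
    /* 2^n is a 1 shifted left by n positions. */
    printf("KILO = %u\n", 1u << 10);   /* 1024       */
    printf("MEGA = %u\n", 1u << 20);   /* 1048576    */
    printf("GIGA = %u\n", 1u << 30);   /* 1073741824 */
    return 0;
}
```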