The Memory Hierarchy, Lecture 32, CA&O, Engr. Umbreen Sabir, 21/05/2009

Translation-Lookaside Buffer (TLB)
- To speed up address translation and reduce memory access time, the TLB is a cache that holds recently used page table mappings.
- Each TLB tag holds a virtual page number; its data field holds the corresponding physical page number.
- The TLB also holds the reference bit, valid bit, and dirty bit.
- A TLB miss means either that the page is in the page table and its translation is loaded by the CPU (the much more frequent case), or that the page is not in the page table, which raises a page fault exception.
- On a miss, the CPU selects which TLB entry to replace; the victim's reference and dirty bits are then written back into the page table.
- TLB miss rates and miss penalties are far smaller than those of page faults.

Example: page table size
- Consider a virtual memory system with a 40-bit virtual byte address, 16 KB pages, and a 36-bit physical byte address.
- What is the total size of the page table for each process on this machine, assuming that the valid, protection, dirty, and use bits take a total of 4 bits and that all the virtual pages are in use?
- Assume that disk addresses are not stored in the page table.
- Page table size = number of entries × entry size
- Number of entries = number of virtual pages = 2^40 bytes / 2^14 bytes per page = 2^26 entries (16 KB = 2^4 × 2^10 = 2^14 bytes)
- The width of each entry is 36 + 4 = 40 bits
- Thus the size of the page table is 2^26 × 40 bits = 5 × 2^26 bytes ≈ 335 MB
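The calculation in the example above can be checked with a few lines of code (the variable names are illustrative, not from the slides):

```python
# Page table size for the example: 40-bit virtual byte address,
# 16 KB (2^14-byte) pages, and 40-bit-wide entries (36-bit physical
# address + 4 status bits), with all virtual pages in use.
virtual_addr_bits = 40
page_offset_bits = 14          # 16 KB = 2^14 bytes
entry_bits = 36 + 4            # physical address bits + status bits

num_entries = 2 ** (virtual_addr_bits - page_offset_bits)  # 2^26 pages
table_bytes = num_entries * entry_bits // 8                # 5 bytes/entry

print(num_entries)             # 67108864 entries
print(table_bytes)             # 335544320 bytes, i.e. about 335 MB
```

Note that this is the cost per process; the slide's point is that a flat page table for a 40-bit address space is far too large to keep one per process.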

TLB and cache working together (Intrinsity FastMATH processor)
- 4 KB pages; the TLB has 16 entries and is fully associative, so all entries must be compared in parallel. Each entry is 64 bits: 20 tag bits (the virtual page number), 20 data bits (the physical page number), plus valid, reference, dirty, and other status bits.
- One of the extra bits is a write-access bit. It prevents programs from writing into pages for which they have only read access, as part of the protection mechanism.
- Three kinds of misses can occur: a cache miss, a TLB miss, and a page fault.
- A TLB miss in this design takes 16 cycles on average.
- On a page fault, the CPU saves the process state, gives control of the CPU to another process, and then brings the page in from disk.
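With 4 KB pages, a 32-bit virtual address splits into a 20-bit virtual page number and a 12-bit page offset, which is where the 20 tag bits above come from. A minimal sketch of that split (the helper name is illustrative):

```python
# Split a 32-bit virtual address into virtual page number and offset,
# assuming the FastMATH-style 4 KB page size from the slide.
PAGE_OFFSET_BITS = 12            # 4 KB = 2^12 bytes

def split_address(vaddr):
    vpn = vaddr >> PAGE_OFFSET_BITS               # upper 20 bits
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)  # lower 12 bits
    return vpn, offset

vpn, offset = split_address(0x12345678)
print(hex(vpn), hex(offset))     # 0x12345 0x678
```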


How are TLB misses and page faults handled?
- TLB miss: no entry in the TLB matches the virtual address. If the page is in memory (as indicated by the page table), its translation is placed in the TLB.
- In this scheme the TLB miss is handled by the OS in software. Once the translation is in the TLB, the instruction that caused the TLB miss is re-executed.
- If the valid bit of the retrieved page table entry is 0, a page fault occurs.
- When a page fault occurs, the OS takes control, saves the state of the process that caused the fault, and stores the address of the faulting instruction in the EPC (exception program counter).

How are TLB misses and page faults handled? (continued)
- The OS finds a place for the page by discarding an old one (if the victim is dirty, it must first be written back to disk).
- The OS then starts the transfer of the needed page from disk and gives control of the CPU to another process, since the transfer takes millions of cycles.
- Once the page has been transferred, the OS reads the EPC and returns control to the faulting process so its instruction can complete.
- If the instruction that caused the page fault was a sw (store word), the write control line for data memory is de-asserted to prevent the sw from completing before the fault is serviced.
- When an exception occurs, the processor sets a bit that disables further exceptions, so that a subsequent exception cannot overwrite the EPC.
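The miss/fault decision chain on the two slides above can be sketched as follows. This is a hedged illustration only: the dictionaries and helper names are made up, not a real OS or MIPS interface.

```python
# Illustrative TLB-miss / page-fault flow.
tlb = {}          # virtual page number -> physical page number
page_table = {}   # virtual page number -> (valid_bit, physical page number)

def translate(vpn):
    """Return the physical page number for a virtual page number."""
    if vpn in tlb:                       # TLB hit
        return tlb[vpn]
    # TLB miss: consult the page table.
    valid, ppn = page_table.get(vpn, (0, None))
    if valid:                            # page is in memory
        tlb[vpn] = ppn                   # refill TLB, then re-execute
        return ppn
    # Valid bit is 0: page fault. The OS would save state in the EPC,
    # evict a victim page (writing it back if dirty), read the page
    # from disk, update the page table, and restart the instruction.
    raise RuntimeError("page fault: OS must fetch page from disk")
```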

The influence of block size
- In general, a larger block size takes advantage of spatial locality, BUT:
- A larger block size means a larger miss penalty: it takes longer to fill the block.
- If the block size is too big relative to the cache size, the miss rate goes up, because the cache holds too few blocks and temporal locality is compromised.
- In general: average access time = hit time × (1 − miss rate) + miss penalty × miss rate
- (Figures: miss rate vs. block size, first falling as spatial locality is exploited and then rising when too few blocks remain; average access time vs. block size, rising at large blocks due to the increased miss penalty and miss rate.)
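The slide's average-access-time formula is easy to turn into a helper; the example numbers below are made up for illustration:

```python
# AMAT as given on the slide:
# hit_time * (1 - miss_rate) + miss_penalty * miss_rate
def average_access_time(hit_time, miss_rate, miss_penalty):
    return hit_time * (1 - miss_rate) + miss_penalty * miss_rate

# e.g. 1-cycle hits, 5% miss rate, 50-cycle miss penalty:
print(average_access_time(1, 0.05, 50))   # 0.95 + 2.5 = 3.45 cycles
```

Plotting this for several block sizes (each with its own miss rate and miss penalty) reproduces the U-shaped curve the figure describes.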

The influence of associativity
- Every change that improves the miss rate can also negatively affect overall performance.
- Example: we can reduce the miss rate by increasing associativity (roughly a 30% gain for small caches when going from direct-mapped to two-way set-associative).
- But high associativity does not make sense for modern caches, which are large: the hardware costs more (more comparators) and the access time grows.
- While full associativity does not pay for caches, it is worthwhile for paged memory, because misses (page faults) are very expensive. A large page size also keeps the page table small.

The influence of associativity (SPEC2000)
- (Figure: miss rate vs. associativity for SPEC2000 benchmarks, shown separately for small caches and large caches.)

Memory write options
- There are two options: write-through (used for caches) and write-back (used for paged memory).
- With write-back, pages are written to disk only if they were modified prior to being replaced.
- The advantages of write-back: multiple writes to a given page require only one write to disk, and that write can use the disk's high bandwidth rather than going one word at a time.
- Individual words can be written into a page much faster (at cache rates) than if they were written through to disk.
- The advantage of write-through is that misses are simpler to handle and easier to implement (using a write buffer).
- In the future more caches will use write-back, because of the growing CPU-memory gap.
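The write-back bookkeeping described above amounts to a dirty bit per block: writes are fast and local, and the slow write to the lower level happens only at eviction. A minimal sketch (the class and method names are illustrative):

```python
# Write-back bookkeeping: a write only sets the dirty bit; the lower
# level is updated once, at eviction time, no matter how many writes
# the block absorbed.
class WriteBackBlock:
    def __init__(self, data):
        self.data = data
        self.dirty = False

    def write(self, data):
        self.data = data
        self.dirty = True        # fast: no lower-level traffic yet

    def evict(self, lower_level, tag):
        if self.dirty:           # one write-back covers many writes
            lower_level[tag] = self.data
        self.dirty = False
```

A write-through design would instead update `lower_level` inside `write` on every call, which is simpler but generates far more traffic.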

Processor-DRAM memory gap (latency)
- (Figure: processor performance diverging from DRAM performance, which improves only about 7% per year.)
- Solutions to reduce the gap:
  - Add an L3 cache.
  - Have the L2 and L3 caches do useful work while idle.

Sources of (cache) misses
- Compulsory (cold-start, process-migration, or first-reference) misses: the first access to a block.
  - A "cold" fact of life: there is not a whole lot you can do about it.
  - Note: if you run billions of instructions, compulsory misses are insignificant.
- Conflict (collision) misses: multiple memory blocks map to the same cache location.
  - Solution 1: increase the cache size.
  - Solution 2: increase the associativity.
- Capacity misses: the cache cannot contain all the blocks accessed by the program.
  - Solution: increase the cache size.
- Invalidation misses: another process (e.g., I/O) updates memory.
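Conflict misses are easy to demonstrate with a tiny direct-mapped cache model; the parameters and the reference trace below are made up for illustration:

```python
# Count misses in a tiny direct-mapped cache (one block per set).
def count_misses(trace, num_sets):
    cache = [None] * num_sets
    misses = 0
    for block in trace:
        index = block % num_sets       # direct-mapped placement
        if cache[index] != block:      # compulsory, conflict, or capacity miss
            misses += 1
            cache[index] = block
    return misses

# Blocks 0 and 4 map to the same set of a 4-set cache, so alternating
# between them misses every single time:
print(count_misses([0, 4, 0, 4, 0, 4], 4))   # 6 misses
```

With two-way associativity (two blocks per set), the same trace would miss only twice, which is exactly the conflict-miss reduction the slide describes.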

Total miss rate vs. cache type and size
- (Figure: miss rate broken down by cause. Going from two-way to one-way (direct-mapped) associativity adds conflict misses, as does going from four-way to two-way; capacity misses shrink as caches grow larger.)

Design alternatives
- Increase cache size: decreases capacity misses; may increase access time.
- Increase associativity: decreases the conflict miss rate; may increase access time.
- Increase block size: decreases the miss rate thanks to spatial locality, but increases the miss penalty; very large blocks may increase the miss rate for small caches.
- So the design of memory hierarchies is interesting.

Processor-DRAM memory gap for multi-cores
- (Figure: as the number of cores grows, memory-intensive applications suffer performance degradation because the cores contend for the same memory system.)