Virtual Memory
Topics:
- Virtual memory access
- Page table, TLB
- Programming for locality
- Memory Mountain revisited

Memory Hierarchy
Smaller, faster, and costlier per byte at the top; larger, slower, and cheaper per byte at the bottom:
- registers
- on-chip L1 cache (SRAM)
- on-chip L2 cache (SRAM)
- main memory (DRAM)
- local secondary storage (local disks)
- remote secondary storage (tapes, distributed file systems, Web servers)

Why Caches Work
Temporal locality: recently referenced items are likely to be referenced again in the near future.
Spatial locality: items with nearby addresses tend to be referenced close together in time, which caches exploit by fetching a whole block at a time.
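As a concrete illustration (a minimal sketch in C, not from the original slides), the loop below exhibits both kinds of locality: sum is reused on every iteration (temporal), and the array is traversed in address order, so each fetched block is fully used (spatial).

```c
#include <stdio.h>

/* Sum an array with good locality:
 * - temporal: "sum" is referenced on every iteration
 * - spatial:  a[i] is accessed in sequential address order,
 *             so each cache block fetched is fully used   */
int main(void) {
    int a[1024];
    for (int i = 0; i < 1024; i++) a[i] = i;

    long sum = 0;
    for (int i = 0; i < 1024; i++)
        sum += a[i];

    printf("sum = %ld\n", sum);
    return 0;
}
```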

Cache (L1 and L2) Performance Metrics
Miss Rate: fraction of memory references not found in the cache (misses / accesses) = 1 – hit rate. Typical numbers (in percentages): 3-10% for L1; can be quite small (e.g., < 1%) for L2, depending on size, etc.
Hit Time: time to deliver a block in the cache to the processor, including the time to determine whether the line is in the cache. Typical numbers: 1-3 clock cycles for L1, roughly 5-20 clock cycles for L2.
Miss Penalty: additional time required because of a miss, typically 50-200 cycles for main memory.

Let's Think About Those Numbers
Huge difference between a hit and a miss: could be 100x, if just L1 and main memory. Would you believe 99% hits is twice as good as 97%?
Consider: cache hit time of 1 cycle, miss penalty of 100 cycles.
Average access time:
97% hits: 0.97 * 1 cycle + 0.03 * 100 cycles = 3.97 cycles
99% hits: 0.99 * 1 cycle + 0.01 * 100 cycles = 1.99 cycles
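The same arithmetic, packaged as a small helper (a sketch; the function name amat and its signature are our own, not from the slides):

```c
#include <stdio.h>

/* Average memory access time (AMAT) for a single cache level:
 * hit_rate * hit_time + miss_rate * miss_penalty */
static double amat(double hit_rate, double hit_time, double miss_penalty) {
    return hit_rate * hit_time + (1.0 - hit_rate) * miss_penalty;
}

int main(void) {
    printf("97%% hits: %.2f cycles\n", amat(0.97, 1.0, 100.0)); /* 3.97 */
    printf("99%% hits: %.2f cycles\n", amat(0.99, 1.0, 100.0)); /* 1.99 */
    return 0;
}
```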

Types of Cache Misses
Cold (compulsory) miss: occurs on the first access to a block. Spatial locality of access helps (also prefetching---more later).
Conflict miss: multiple data objects all map to the same slot (as in hashing). E.g., if block i must be placed in cache entry/slot i mod 8, replacing the block already in that slot, then referencing blocks 0, 8, 0, 8, ... would miss every time (see the sketch below). Conflict misses are less of a problem these days: set-associative caches with 8 or 16 lines per set help.
Capacity miss: occurs when the set of active cache blocks (the working set) is larger than the cache. This is where to focus nowadays.
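Here is a minimal sketch of the mod-8 mapping just described (the 8-slot direct-mapped cache is illustrative, not any real cache's geometry): blocks 0 and 8 map to the same slot and evict each other on every access.

```c
#include <stdio.h>

#define NUM_SLOTS 8           /* direct-mapped cache with 8 slots */

int main(void) {
    int slot_tag[NUM_SLOTS];  /* which block each slot currently holds */
    for (int i = 0; i < NUM_SLOTS; i++) slot_tag[i] = -1;

    int refs[] = {0, 8, 0, 8, 0, 8};  /* blocks 0 and 8 both map to slot 0 */
    for (int i = 0; i < 6; i++) {
        int block = refs[i];
        int slot  = block % NUM_SLOTS;
        if (slot_tag[slot] == block) {
            printf("block %d: hit\n", block);
        } else {
            if (slot_tag[slot] == -1)
                printf("block %d: cold miss\n", block);
            else
                printf("block %d: conflict miss (evicts block %d)\n",
                       block, slot_tag[slot]);
            slot_tag[slot] = block;  /* replace the block in that slot */
        }
    }
    return 0;
}
```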

What about writes?
Multiple copies of data exist: L1, L2, main memory, disk.
What to do on a write-hit? Write-back: defer the write to memory until the line is replaced. Needs a dirty bit (is the line different from memory or not?).
What to do on a write-miss? Write-allocate: load the line into the cache, then update it in the cache.
Typical: write-back + write-allocate. Rare: write-through (write immediately to memory, usually for I/O).
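A toy model of the typical write-back + write-allocate policy (entirely illustrative; the structure and array names are assumptions, not a real hardware interface):

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LINE_SIZE 8
#define MEM_LINES 16

/* Toy backing store: MEM_LINES blocks of LINE_SIZE bytes. */
static unsigned char memory[MEM_LINES][LINE_SIZE];

struct cache_line {
    bool valid;
    bool dirty;              /* line differs from memory? (write-back) */
    unsigned tag;
    unsigned char data[LINE_SIZE];
};

/* Write-back + write-allocate policy for a single cache line. */
static void cache_write(struct cache_line *line, unsigned tag,
                        int offset, unsigned char byte) {
    if (!line->valid || line->tag != tag) {          /* write-miss */
        if (line->valid && line->dirty)              /* write back dirty victim */
            memcpy(memory[line->tag], line->data, LINE_SIZE);
        memcpy(line->data, memory[tag], LINE_SIZE);  /* write-allocate: fetch */
        line->valid = true;
        line->tag = tag;
        line->dirty = false;
    }
    line->data[offset] = byte;                       /* update line in cache */
    line->dirty = true;                              /* defer write to memory */
}

int main(void) {
    struct cache_line line = {0};
    cache_write(&line, 3, 0, 0xAB);   /* miss: allocate block 3, then write */
    cache_write(&line, 3, 1, 0xCD);   /* hit: write in cache only */
    cache_write(&line, 5, 0, 0xEF);   /* miss: block 3 written back first */
    printf("memory[3][0] = 0x%X\n", memory[3][0]);  /* 0xAB, after write-back */
    return 0;
}
```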

Main Memory Is Something Like a Cache (for Disk)
Driven by an enormous miss penalty: disk is about 10,000x slower than DRAM.
DRAM design: large page (block) size, typically 4KB.

Virtual Memory
Programs refer to virtual memory addresses: conceptually a very large array of bytes (4GB for IA32, 16 exabytes for 64 bits), where each byte has its own address. The system provides an address space private to each process.
Allocation: handled by the compiler and run-time system; all allocation happens within a single virtual address space.

Virtual Addressing
MMU = Memory Management Unit. The MMU keeps the mapping of VAs -> PAs in a "page table".
[Diagram: on the CPU chip, the CPU issues a virtual address (VA) to the MMU; the MMU sends the translated physical address (PA) to main memory, which returns the data word to the CPU.]

MMU Needs a Table of Translations
The MMU keeps the mapping of VAs -> PAs in a "page table".
[Diagram: as above, with the page table shown feeding the MMU's VA -> PA translation.]

Where is the page table kept?
In main memory – it can be cached, e.g., in L2 (like data).
[Diagram: as above, with the page table residing in main memory.]
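A minimal sketch of what the MMU computes on each access, assuming a flat single-level page table and 4KB pages (all names and sizes are our own illustration):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u               /* 4KB pages, as on the DRAM slide */
#define NUM_PAGES 8                   /* toy address space: 8 virtual pages */

/* Flat page table: virtual page number -> physical page number. */
static uint32_t page_table[NUM_PAGES] = {3, 7, 1, 0, 2, 6, 4, 5};

/* Translate a virtual address to a physical address:
 * split the VA into (virtual page number, page offset),
 * look up the physical page number, keep the offset. */
static uint32_t translate(uint32_t va) {
    uint32_t vpn    = va / PAGE_SIZE;
    uint32_t offset = va % PAGE_SIZE;
    uint32_t ppn    = page_table[vpn];   /* the MMU reads this from memory */
    return ppn * PAGE_SIZE + offset;
}

int main(void) {
    uint32_t va = 2 * PAGE_SIZE + 42;    /* byte 42 of virtual page 2 */
    printf("VA 0x%x -> PA 0x%x\n", va, translate(va));
    return 0;
}
```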

Speeding up Translation with a TLB
Translation Lookaside Buffer (TLB): a small hardware cache for the page table, inside the MMU. It caches page table entries for a number of pages (e.g., 256 entries).

TLB Hit
A TLB hit saves you from accessing memory for the page table.
[Diagram: the CPU sends a VA to the MMU; the TLB inside the MMU returns the PTE directly; the MMU then sends the PA to memory, which returns the data.]

TLB Miss
A TLB miss incurs an additional memory access (to read the page table).
[Diagram: the CPU sends a VA to the MMU; the TLB misses, so the MMU issues a PTE request to the page table in main memory; once the PTE arrives, the MMU sends the PA to memory, which returns the data.]
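Extending the translation sketch above with a tiny TLB (again purely illustrative; a real TLB is looked up by hardware in parallel, not by software):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 8
#define TLB_ENTRIES 2                 /* tiny, to make misses easy to see */

static uint32_t page_table[NUM_PAGES] = {3, 7, 1, 0, 2, 6, 4, 5};

struct tlb_entry { int valid; uint32_t vpn, ppn; };
static struct tlb_entry tlb[TLB_ENTRIES];

static uint32_t translate(uint32_t va) {
    uint32_t vpn = va / PAGE_SIZE, offset = va % PAGE_SIZE;

    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];  /* direct-mapped TLB */
    if (e->valid && e->vpn == vpn) {
        printf("vpn %u: TLB hit\n", vpn);           /* no page-table access */
    } else {
        printf("vpn %u: TLB miss\n", vpn);          /* extra memory access */
        e->valid = 1;
        e->vpn = vpn;
        e->ppn = page_table[vpn];                   /* fetch PTE from memory */
    }
    return e->ppn * PAGE_SIZE + offset;
}

int main(void) {
    translate(2 * PAGE_SIZE);   /* miss (cold) */
    translate(2 * PAGE_SIZE);   /* hit */
    translate(4 * PAGE_SIZE);   /* miss: evicts vpn 2 (same TLB slot) */
    translate(2 * PAGE_SIZE);   /* miss again: conflict in the tiny TLB */
    return 0;
}
```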

How to Program for Virtual Memory
At any point in time, programs tend to access a set of active virtual pages called the working set. Programs with better temporal locality have smaller working sets.
If (working set size) > (main memory size): thrashing, a performance meltdown where pages are swapped (copied) in and out continuously.
If (# working set pages) > (# TLB entries): the program will suffer TLB misses. Not as bad as page thrashing, but still worth avoiding (see the sketch below).
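A hedged illustration of the TLB point (the 4KB page size and array sizes are assumptions): touching one byte per page across more pages than the TLB holds makes nearly every access a TLB miss, while the same number of accesses confined to a few pages does not.

```c
#include <stdlib.h>

#define PAGE 4096

int main(void) {
    /* 1024 pages -- far more than a typical (e.g., 256-entry) TLB covers. */
    size_t n = 1024 * PAGE;
    volatile char *a = malloc(n);
    if (!a) return 1;

    /* Bad: working set of 1024 pages; nearly every access is a TLB miss. */
    for (size_t rep = 0; rep < 100; rep++)
        for (size_t i = 0; i < n; i += PAGE)
            a[i]++;

    /* Better: same number of accesses, working set of only 4 pages. */
    for (size_t rep = 0; rep < 100 * 1024 / 4; rep++)
        for (size_t i = 0; i < 4 * PAGE; i += PAGE)
            a[i]++;

    free((void *)a);
    return 0;
}
```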

More on TLBs
Assume a 256-entry TLB and 4KB pages. Then you can only get TLB hits for 1MB of data (256 * 4KB = 1MB). This is called the "TLB reach": the amount of memory the TLB can cover. A typical L2 cache is 6MB, so perhaps you should consider TLB reach before L2 size when tiling?
Real CPUs have second-level TLBs (like an L2 for the TLB). This is getting complicated to reason about! You will likely have to experiment to find the best tile size (a tiling sketch follows).
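For reference, the kind of tiling being discussed, as a generic blocked-traversal sketch (TILE is a tunable to experiment with, not a recommendation):

```c
#include <stdio.h>

#define N 2048
#define TILE 64   /* tunable: e.g., sized so a tile's pages fit in TLB reach */

static double src[N][N], dst[N][N];

/* Blocked (tiled) transpose: process the matrix in TILE x TILE blocks
 * so each block's working set of pages (and cache lines) stays small. */
static void transpose_tiled(void) {
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int i = ii; i < ii + TILE; i++)
                for (int j = jj; j < jj + TILE; j++)
                    dst[j][i] = src[i][j];
}

int main(void) {
    src[1][2] = 42.0;
    transpose_tiled();
    printf("dst[2][1] = %g\n", dst[2][1]);  /* 42 */
    return 0;
}
```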

Memory Optimization: Summary
Caches:
- Conflict misses: not much of a concern (set-associative caches).
- Cache capacity: keep the working set within on-chip cache capacity; fit it in L1 or L2 depending on working-set size.
Virtual memory:
- Page misses: keep the page-level working set within main memory capacity.
- TLB misses: may want to keep the working set's #pages < the TLB's #entries.

IA32 Linux Memory Layout
Stack: runtime stack (8MB limit).
Data: statically allocated data, e.g., arrays & strings declared in code.
Heap: dynamically allocated storage, created when you call malloc(), calloc(), new().
Text: executable machine instructions; read-only.
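One hedged way to see this layout on a Linux machine (exact addresses vary with the system and ASLR; the variable names are ours):

```c
#include <stdio.h>
#include <stdlib.h>

int global_data = 1;                  /* data: statically allocated */

int main(void) {                      /* text: the code of main itself */
    int local = 2;                    /* stack: automatic storage */
    int *heap = malloc(sizeof *heap); /* heap: dynamic storage */

    printf("text  (main):        %p\n", (void *)main);
    printf("data  (global_data): %p\n", (void *)&global_data);
    printf("heap  (malloc'd):    %p\n", (void *)heap);
    printf("stack (local):       %p\n", (void *)&local);

    free(heap);
    return 0;
}
```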