Module: Virtual Memory

Topical Outline
- Review of virtual memory management: motivation and operation
- Speeding up address translation via concurrency
- Speeding up address translation via virtually addressed caches

The Memory Hierarchy
[Figure: the hierarchy from registers through cache and main memory to disk/tape, feeding the ALU]
- Registers are managed by the compiler, the cache by the hardware, and main memory and disk by the operating system
- Caching is the mode of operation at each level of the hierarchy: data moves from tape -> disk -> memory -> cache -> registers
- Control of movement is performed by hardware or software, guided by static vs. run-time management of resources
- Time scales of operation span nanoseconds (register access) to seconds (tapes)

Caching Mechanisms
- Registers: static decisions made by compiler software
- TLB/hardware cache: run-time decisions made on tags, possibly many tags compared in parallel
- VM as cache: mapping (with hardware acceleration) followed by software-managed refill on a miss
- File system cache: software managed
- Web page cache: software managed

Caching From Disk
[Figure: transfers between disk and main memory are controlled by software; blocks move between memory, cache, and registers]
- Programmers have long managed the use of smaller, faster memories, e.g., application-controlled overlays
- Disks are mechanical devices, so operational speeds are on the order of milliseconds, i.e., on the order of 100,000s of instructions
- Hence, management is done in software
- The unit of management is large blocks, to amortize disk overhead

Virtual Memory Management
- Follows the same basic principles as cache management
- The unit of management is a page, typically 4 KB to 32 KB
- The program sees a virtual address space partitioned into virtual pages; the virtual address space typically exceeds the physical memory size
- The program resides on disk
- Physical memory is partitioned into physical pages

Management Policies
- Demand-driven operation: pages are brought into memory when referenced
- Placement policy: fully associative placement
- Replacement policy: approximations to least recently used (LRU) are the most common
- Update policy: at current disk latencies, write-through is infeasible, so a write-back update policy is employed

Address Translation: Concepts
[Figure: a virtual page on disk is moved into a physical memory page; an address translation data structure maps each VPN to a PPN]
- Offsets within the virtual page and the corresponding physical page are the same
- We only need to translate the virtual page number (VPN) to the corresponding physical page number (PPN), as the sketch below illustrates
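A minimal sketch in C of this split, assuming 32-bit virtual addresses and 16 KB pages (14 offset bits); lookup_ppn is a hypothetical stand-in for whatever structure performs the VPN-to-PPN mapping:

    #include <stdint.h>

    #define PAGE_SHIFT  14                      /* 16 KB pages */
    #define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

    /* Hypothetical VPN -> PPN mapping (e.g., a page table). */
    extern uint32_t lookup_ppn(uint32_t vpn);

    uint32_t translate_va(uint32_t vaddr)
    {
        uint32_t vpn    = vaddr >> PAGE_SHIFT;  /* the only part translated */
        uint32_t offset = vaddr & OFFSET_MASK;  /* passes through unchanged */
        return (lookup_ppn(vpn) << PAGE_SHIFT) | offset;
    }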

Address Translation Implementation: The Page Table
[Figure: the VPN indexes the page table, located via the page table base register; each entry holds a valid bit, state bits, and a PPN or disk address; the PPN concatenated with the offset forms the physical address sent to the cache]
- Translate the virtual page address to the physical page address
- Keep state information on each page: modified/not modified, access rights, in memory or on disk, and caching policies
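A minimal sketch of a single-level page table lookup under the same assumptions as above; the PTE layout, field widths, and page_fault handler are illustrative, not any particular machine's format:

    #include <stdint.h>

    #define PAGE_SHIFT  14
    #define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

    /* Hypothetical PTE layout: valid bit, state, and PPN. */
    typedef struct {
        uint32_t valid    : 1;
        uint32_t modified : 1;
        uint32_t rights   : 2;
        uint32_t ppn      : 18;   /* holds a disk address when !valid */
    } pte_t;

    extern pte_t *page_table_base;             /* page table base register */
    extern uint32_t page_fault(uint32_t vpn);  /* software handler; returns
                                                  the PPN after paging in */

    uint32_t pt_translate(uint32_t vaddr)
    {
        uint32_t vpn = vaddr >> PAGE_SHIFT;
        pte_t pte = page_table_base[vpn];      /* index with the VPN */
        uint32_t ppn = pte.valid ? pte.ppn : page_fault(vpn);
        return (ppn << PAGE_SHIFT) | (vaddr & OFFSET_MASK);
    }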

Translation Mechanism
- Address translation requires looking up a page table entry (PTE) in a page table (PT)
- The page table is stored in memory; PT miss handling (i.e., a page fault) is done in software
- Translation facilitates code relocation
- Many page table organizations are possible: forward-walked multilevel tables, "backward"-walked (recursive) multilevel tables, and hashed "inverted" tables; a forward walk is sketched below
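A sketch of a forward-walked two-level table, again under assumed parameters (32-bit addresses, 16 KB pages, and the 18-bit VPN split into two 9-bit indices); validity checks and fault handling are omitted to keep the walk itself visible:

    #include <stdint.h>

    #define PAGE_SHIFT  14
    #define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)
    #define L1_SHIFT    (PAGE_SHIFT + 9)   /* top 9 bits index level 1 */
    #define IDX_MASK    0x1FFu             /* 9-bit level-2 index */

    extern uint32_t **l1_table;            /* level-1 table of pointers */

    uint32_t walk_two_level(uint32_t vaddr)
    {
        uint32_t *l2 = l1_table[vaddr >> L1_SHIFT];           /* level 1 */
        uint32_t ppn = l2[(vaddr >> PAGE_SHIFT) & IDX_MASK];  /* level 2 */
        return (ppn << PAGE_SHIFT) | (vaddr & OFFSET_MASK);
    }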

Caching PTEs: The Translation Lookaside Buffer (TLB)
[Figure: a four-entry TLB; each entry holds a VPN tag, a PPN, a valid bit, and state bits]
- Keep a cache of the most recently used PTEs
- Each PTE corresponds to a relatively large part of memory; for example, a 16 KB page holds 4K instructions
- A small set of PTEs can therefore cover a large code segment; for example, 8 PTEs with 16 KB pages cover a program of 32K instructions
- The TLB access time is comparable to or better than the cache access time
- The TLB typically operates as a fully associative cache, but can be implemented as a set associative cache
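A minimal sketch of a fully associative lookup; the four-entry size matches the figure, while the entry layout and function names are assumptions. Hardware compares all tags at once; the loop merely models that comparison:

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 4

    typedef struct {
        bool     valid;
        uint32_t vpn;    /* tag */
        uint32_t ppn;
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    bool tlb_lookup(uint32_t vpn, uint32_t *ppn)
    {
        for (int i = 0; i < TLB_ENTRIES; i++) {  /* parallel in hardware */
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                *ppn = tlb[i].ppn;
                return true;                     /* TLB hit */
            }
        }
        return false;    /* TLB miss: walk the page table, then refill */
    }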

TLB Operation
[Figure: the TLB translates the virtual address to a physical address before the cache is accessed; on a TLB miss, the page table is walked and the TLB is updated]
- TLB size is typically a function of the target domain; high-end machines will have large, fully associative TLBs
- PTE entries are replaced on a demand-driven basis
- The TLB is in the critical path

The Memory Access Path
[Figure: every reference flows from the virtual address through the TLB to a physical address, and only then to the cache and memory]
- Virtual-to-physical address translation occurs on every access, adding to the latency of every memory access
- How can we optimize the critical path?

Optimizing the Critical Path: Principles
- Concurrency between address translation and cache access: overlap the cache access and the TLB translation
- Making translation the exception: address the cache with the virtual address

Overlapping Cache and TLB Access
[Figure: the 18-bit VPN accesses the TLB while the 14-bit page offset (an 8-bit line index plus a 6-bit byte offset) accesses the cache; the PPN delivered by the TLB is then compared against the cache tag]
- Example: a direct mapped, 16 KB cache with 64-byte lines and 16 KB pages
- The TLB is accessed with the VPN while the cache is accessed with the line address, which here comes entirely from the untranslated offset bits

Overlapping Cache and TLB Access (cont.)
- The virtual address offset field is not translated, and is the same in the physical address
- If all bits of the cache line address come from the offset field, the TLB and cache can be accessed in parallel
- What if the cache size is doubled? The line address grows to 9 bits, so its most significant bit now comes from the translated page number; that bit may be different in the physical address, and concurrent access is no longer possible (the check below makes the condition explicit)
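The condition reduces to simple arithmetic: overlap works when the cache index and byte-offset bits fit within the page offset. A small helper, worked through with the sizes from the example (the names are ours):

    #include <stdbool.h>
    #include <stddef.h>

    /* Concurrent TLB/cache access is possible when all cache-indexing
     * bits lie in the untranslated page offset, i.e. when
     * cache_size / associativity <= page_size (all sizes in bytes). */
    bool can_overlap(size_t cache_size, size_t assoc, size_t page_size)
    {
        return cache_size / assoc <= page_size;
    }

    /* 16 KB direct mapped cache, 16 KB pages: overlap works.
     * 32 KB direct mapped cache, 16 KB pages: it does not.
     * 32 KB 2-way cache, 16 KB pages: overlap works again. */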

Ensuring Concurrency
- Increase the associativity of the cache, reducing the number of bits in the line address
- Increase the page size, increasing the number of low-order bits of the virtual address that are not translated
- Constrain the placement policy so that the line address can be known from the virtual address: in the preceding example, make sure the least significant bit of the virtual page number and the physical page number are equal (this constraint is commonly called page coloring)

Making Address Translation the Exception
[Figure: the cache is indexed directly with the virtual address; only on a miss is the address translated to access main memory]
- The cache is addressed using virtual addresses
- When there is a miss in the cache, address translation takes place, producing the physical address used to access main memory
- Page table entries are also updated

Design Space
- The relationship between virtual-to-physical address translation and the indexing of the cache defines four cache organizations
- Goal: speed up the critical path by making the common case fast

                     Tag: virtual    Tag: physical
    Index: virtual   VIVT            VIPT
    Index: physical  PIVT            PIPT (the conventional cache)

Physically Indexed, Physically Tagged (PIPT)
[Figure: the VPN is first translated to a PPN; the resulting physical address supplies both the cache index and the tag]
- No concurrency between address translation and the cache index operation

Virtually Indexed, Physically Tagged (VIPT)
[Figure: the untranslated offset bits index the cache while the VPN is translated; the resulting PPN is compared against the physical tag]
- Some concurrency between address translation and the cache index operation

Physically Indexed, Virtually Tagged (PIVT)
[Figure: translation must complete before the cache can be indexed, while the tag check uses the virtual address]
- Generally not used

Virtually Indexed, Virtually Tagged (VIVT)
[Figure: the virtual address supplies both the cache index and the tag]
- Translations are checked only on a cache miss
- What are the issues with this type of cache?

Issues for Virtually Addressed Caches: Protection
[Figure: the page table entry records a valid bit, access rights, and a PPN/disk address]
- Each access to a page can normally be checked against the permissions recorded in the PTE during translation, but a virtually addressed cache hit bypasses translation
- Solution: mechanisms must be provided for checking access rights on hits as well

Issues (cont.): False Hits in a Multiprocess System
- Multiple processes typically have the same virtual address spaces; the same virtual address in two distinct address spaces refers to two distinct physical addresses
- Solutions:
  - Flush the cache on a context swap; this is a problem in SMT machines
  - Add an address space identifier (ASID) or process ID (PID) to the tags, so that all processes effectively have distinct address spaces (see the sketch below)
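The TLB sketch from earlier extends naturally with an ASID field, and the same tagging idea applies to a virtually tagged cache; the field widths and names are again assumptions:

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 4

    typedef struct {
        bool     valid;
        uint8_t  asid;   /* address space identifier */
        uint32_t vpn;    /* tag */
        uint32_t ppn;
    } asid_tlb_entry_t;

    static asid_tlb_entry_t tlb[TLB_ENTRIES];

    /* An entry matches only when both the VPN and the current process's
     * ASID agree, so no flush is needed on a context swap. */
    bool asid_tlb_lookup(uint8_t cur_asid, uint32_t vpn, uint32_t *ppn)
    {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].asid == cur_asid &&
                tlb[i].vpn == vpn) {
                *ppn = tlb[i].ppn;
                return true;
            }
        }
        return false;
    }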

Issues (cont.): The Synonym (Aliasing) Problem
- Multiple copies of the same physical memory line can exist at distinct locations in the cache
- This can occur when two processes share a physical page that is mapped to different parts of their individual virtual address spaces (so one shared physical location has two virtual addresses) and the page size is smaller than the cache
- Solution: placement constraints on shared pages

Solving Synonyms
[Figure: the page tables of processes A and B both map a virtual page to the same shared physical page in main memory]
- How do we ensure that any line in the shared (physical) page maps to exactly one line in the cache?
- Constrain the locations of the shared page in the address spaces of A and B, ensuring that all low-order bits required for addressing the cache are identical (the check below captures the constraint)
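A small helper that tests whether two virtual aliases of the same physical page satisfy this constraint for a virtually indexed cache; the parameter names are ours:

    #include <stdbool.h>
    #include <stdint.h>

    /* Two virtual aliases are synonym-free if every bit used to address
     * the cache (byte offset plus set index) agrees between them. */
    bool aliases_agree(uint32_t va1, uint32_t va2,
                       uint32_t cache_size, uint32_t assoc)
    {
        uint32_t index_span = cache_size / assoc;   /* bytes per way */
        return (va1 & (index_span - 1)) == (va2 & (index_span - 1));
    }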

Address Spaces, PIDs, DLLs and SMT*
[Figure: processes A (PID 00) and B (PID 01) map a shared DLL at different virtual addresses in main memory]
- Assume the page size equals the cache size and that synonyms have been avoided through the use of PIDs
- A shared DLL will then lead to DLL thrashing in direct mapped caches and DLL duplication in set associative caches

*F. Mohamood, M. Ghosh, and H.-H. S. Lee, "DLL-Conscious Instruction Fetch Optimization for SMT Processors," in Proceedings of the 2nd Watson Conference on Interaction between Architecture, Circuits, and Compilers (P=AC2), pp. 143-152, Yorktown Heights, NY, September 2005.

Alpha Memory Management
- Paged, segmented address space with 8 KB pages; each page table fits in a page
- The L1 and L2 tables use physical addresses; the L3 table is virtually mapped
- 64-bit PTEs with a 32-bit frame address and status/protection bits
- Protection: the page tables themselves are protected

DEC Alpha TLB
[Figure: each TLB entry holds an 8-bit address space number (ASN), a 35-bit VPN tag, a 31-bit PPN, a valid bit, and state bits]
- The TLB is not flushed on a context switch, since the ASN distinguishes entries; the TLB is flushed only when ASNs are recycled
- The actual address size is a function of the processor mode

Some Example Memory Hierarchies

DEC Alpha 21264:
- Split I + D L1 caches (64K + 64K), 2-way set associative, two-cycle L1 access
- Split I + D TLBs (128 entries each)
- 8-entry victim cache
- L2 up to 16 MB; L2 access takes 12 cycles
- Aside: 6 FUs organized in clusters, 4-wide fetch, 20 + 15 entry issue queues

Intel P6:
- Core of the PPro, PII, and PIII
- Split I + D L1 caches (8K + 8K)
- Split I + D TLBs (32 + 64 entries)
- L2 originally on an MCM; the PIII has 2x the cache + L2; the "Xeon" has a big L2
- Aside: note 5 FUs, a 3-wide fetch unit, and a 40-entry ROB

Study Guide
- Differences between SRAM and DRAM in operation and performance
- Given a memory organization, determine the miss penalty in cycles
- Cache basics: mappings from main memory to locations in the cache hierarchy
- Computation of the CPI impact of miss penalties, miss rates, and hit times
- Computation of the CPI impact of update strategies
- Choosing between word-interleaved and independent memory bank organizations

Study Guide (cont.)
- Find a skewing scheme to ensure concurrent accesses to a given data structure, for example, the diagonals or sub-blocks of a matrix
- Evaluate the CPI impact of various optimizations
- Relate the mapping of data structures (such as matrices) to main memory to cache behavior and the behavior of optimizations

Study Guide (cont.)
- Be able to trace through the page table and cache data structures on a memory reference (see the sample problems)
- Understand how to allocate virtual pages to page frames to minimize conflicts in the cache
- Relationships between address translation, page size, and cache size: for example, when is concurrency possible between cache indexing and address translation?
- Given a memory system design (page sizes, virtual and physical address spaces, cache parameters), how would you modify the design to improve performance, for example, the address translation time, or paging the page table?

Study Guide (cont.)
- Virtually indexed caches: problems and solutions
- Using 32-bit virtual addresses, construct an instance of synonyms
- Using 32-bit virtual addresses, show how a false hit can take place on a context swap if the cache is not flushed or PIDs are not used
- Trace a memory reference through the complete Alpha memory hierarchy
- Given a cache design, virtual address space, and page size, determine by VPN the pages that conflict in the cache