Computer Architecture Virtual Memory (VM)


Computer Architecture Virtual Memory (VM) By Dan Tsafrir, 23/5/2011 Presentation based on slides by Lihu Rappoport

http://www.youtube.com/watch?v=3ye2OXj32DM (funny beginning)

DRAM (dynamic random-access memory)
Corsair 1333 MHz DDR3 laptop memory
Price (at amazon.com): $43 for 4 GB, $79 for 8 GB
"The physical memory"

VM – motivation
Provides isolation between processes
- Processes can run concurrently on a single machine
- VM prevents them from accessing one another's memory
- (But still allows for convenient sharing when required)
Provides the illusion of large memory
- VM size can be bigger than physical memory size
- VM decouples the program from the real memory size (which can differ across machines)
Provides the illusion of contiguous memory
- Programmers need not worry about where data is placed exactly
Allows for dynamic memory growth
- Can add memory to processes at runtime as needed
Allows for memory overcommitment
- The sum of VM spaces (across all processes) can be >= physical memory
- DRAM is often one of the most costly parts of the system

VM – terminology
Virtual address space
- The space used by the programmer
- "Ideal" = contiguous & as big as you'd like
Physical address
- The real, underlying physical memory address
- Completely abstracted away by the OS/HW

VM – basic idea
Divide memory (virtual & physical) into fixed-size blocks
- "page" = chunk of contiguous data in the virtual space
- "frame" = physical memory exactly enough to hold one page
- |page| = |frame| (= size)
- page size = power of 2 = 2^k bytes; by default k = 12 almost always => page size is 4KB
While the virtual address space is contiguous
- Pages can be mapped into arbitrary frames
- Pages can reside in memory or on disk (hence, overcommitment)
All programs are written using the VM address space
- HW does on-the-fly translation from virtual to physical addresses
- Uses a page table to translate between virtual and physical addresses
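As a rough illustration (not from the slides), splitting a virtual address into page number and offset is just a shift and a mask; the constants below assume the usual k = 12:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                      /* k = 12                  */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)    /* 2^12 = 4096 bytes = 4KB */

int main(void) {
    uint64_t vaddr  = 0x7f3a12345678ULL;           /* arbitrary example   */
    uint64_t vpn    = vaddr >> PAGE_SHIFT;         /* virtual page number */
    uint64_t offset = vaddr & (PAGE_SIZE - 1);     /* offset within page  */
    printf("vpn = %#llx, offset = %#llx\n",
           (unsigned long long)vpn, (unsigned long long)offset);
    return 0;
}
```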

VM – simplistic illustration
[Figure: pages in the virtual space are mapped, via address translation, to frames in DRAM or to locations on disk]
Memory acts as a cache for the secondary storage (disk)
Immediate advantages
- Illusion of contiguity & of having more physical memory
- The program's actual location is unimportant
- Dynamic growth, isolation, & sharing are easy to obtain

Translation – use a "page table"
[Figure: a 64-bit virtual address = virtual page number (52 bits) | page offset (12 bits); the virtual page number is mapped to a physical frame number (20 bits), which is concatenated with the unchanged page offset (12 bits) to form a 32-bit physical address]
(page size is typically 2^12 bytes = 4KB)

Translation – use a "page table"
[Figure: the page table base register points to the page table; each entry holds V (valid bit), D (dirty bit), AC (access control), and the frame number]
(page size is typically 2^12 bytes = 4KB)

Translation – use a "page table"
Each page-table row — V | D | AC | frame number — is called a "PTE" (page table entry)
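A hedged sketch of such a PTE as a C bit-field (the 20-bit frame number matches the slides; the overall layout is illustrative, not that of any particular architecture):

```c
#include <stdint.h>

/* Illustrative PTE: real layouts (e.g., x86-64) differ. */
typedef struct {
    uint64_t valid  : 1;   /* V:  page is present in DRAM               */
    uint64_t dirty  : 1;   /* D:  page was written since being loaded   */
    uint64_t ac     : 3;   /* AC: access control (R / R+W / X)          */
    uint64_t frame  : 20;  /* physical frame number                     */
    uint64_t unused : 39;  /* OS-private (e.g., disk location when V=0) */
} pte_t;
```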

Page tables
[Figure: the page table, indexed by virtual page number, points to a memory frame (valid = 1) or to a disk address (valid = 0); some virtual pages reside in physical memory, others on disk]

Checks
If (valid == 1), the page is in main memory at the frame address stored in the table
- => data is readily available (e.g., can copy it to the cache)
else /* page fault */, need to fetch the page from disk
- => causes a trap, usually accompanied by a context switch: the current process is suspended while the page is fetched from disk
Access control: R = read-only, R/W = read/write, X = execute
If (access type is incompatible with the specified access rights)
- => protection violation fault => traps to the fault handler
Demand paging
- Pages are fetched from secondary memory only upon the first fault
- Rather than, e.g., upon file open
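These checks, written out as a minimal C sketch (page_table, the trap functions, and the AC encoding are assumptions for illustration, using the pte_t above):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t valid:1, dirty:1, ac:3, frame:20, unused:39;
} pte_t;

#define AC_W 0x2                            /* illustrative write bit      */

extern pte_t *page_table;                   /* hypothetical: table base    */
extern void   trap_page_fault(uint64_t vpn);    /* hypothetical trap       */
extern void   trap_protection(uint64_t vpn);    /* hypothetical trap       */

uint64_t translate(uint64_t vaddr, bool is_write) {
    uint64_t vpn    = vaddr >> 12;
    uint64_t offset = vaddr & 0xfff;
    pte_t   *pte    = &page_table[vpn];

    if (!pte->valid)                        /* page fault: page on disk    */
        trap_page_fault(vpn);
    if (is_write && !(pte->ac & AC_W))      /* protection violation fault  */
        trap_protection(vpn);

    return ((uint64_t)pte->frame << 12) | offset;
}
```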

Page replacement
Page replacement policy
- Decides which page to evict to disk
LRU (least recently used)
- Typically too wasteful (must be updated upon each memory reference)
FIFO (first in, first out)
- Simplest: no need to update upon references, but ignores usage
Second-chance
- Set a per-page "was it referenced?" bit (can be done by HW or SW)
- Swap out the first page with bit = 0, in FIFO order
- When traversing, if bit = 1, set it to 0 and push the associated page to the end of the list (in FIFO terms, the page becomes newest)
Clock (see the sketch after this slide)
- More efficient variant of second-chance
- Pages are cyclically ordered (no FIFO list); search clockwise for the first page with bit = 0, setting bit = 0 for pages that have bit = 1
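A compact sketch of the clock variant in C (the frame count, reference-bit array, and hand index are illustrative assumptions):

```c
#include <stdbool.h>
#include <stddef.h>

#define NFRAMES 256

static bool   ref_bit[NFRAMES];   /* per-frame "was it referenced?" bit */
static size_t hand = 0;           /* clock hand: next frame to inspect  */

/* Sweep clockwise; referenced frames get a second chance (clear the
 * bit and advance), the first frame found with bit == 0 is evicted. */
size_t clock_pick_victim(void) {
    for (;;) {
        if (!ref_bit[hand]) {
            size_t victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        ref_bit[hand] = false;             /* give a second chance */
        hand = (hand + 1) % NFRAMES;
    }
}
```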

Page replacement – cont.
NRU (not recently used)
- A more sophisticated LRU approximation
- HW or SW maintains per-page 'referenced' & 'modified' bits
- Periodically (on a clock interrupt), SW turns 'referenced' off
Replacement algorithm partitions pages into
- Class 0: not referenced, not modified
- Class 1: not referenced, modified
- Class 2: referenced, not modified
- Class 3: referenced, modified
Choose at random a page from the lowest nonempty class for removal
Underlying principles (order is important):
- Prefer keeping referenced over unreferenced
- Prefer keeping modified over unmodified
Can a page be modified but not referenced? (Yes — the periodic clearing of 'referenced' makes class 1 possible)
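The class number is simply the two status bits concatenated; a one-line hedged sketch in C:

```c
#include <stdbool.h>

/* NRU class: 'referenced' is the high bit, 'modified' the low bit,
 * so class 0 = neither set ... class 3 = both set. */
int nru_class(bool referenced, bool modified) {
    return ((int)referenced << 1) | (int)modified;
}
```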

Page replacement – advanced
ARC (adaptive replacement cache)
- Factors in not only recency (time of latest access) but also frequency (how many times accessed)
- The user determines which factor carries more weight
- Better (but more wasteful) than LRU
- Developed by IBM: Nimrod Megiddo & Dharmendra Modha
- Details: http://www.usenix.org/events/fast03/tech/full_papers/megiddo/megiddo.pdf
CAR (clock with adaptive replacement)
- Similar to ARC, and comparable in performance
- But, unlike ARC, doesn't require user-specified parameters
- Likewise developed by IBM: Sorav Bansal & Dharmendra Modha
- Details: http://www.usenix.org/events/fast04/tech/full_papers/bansal/bansal.pdf

Page faults
Page fault: the data is not in memory => retrieve it from disk
- The CPU detects the situation (valid = 0)
- But it cannot remedy the situation (it doesn't know about the disk; that's the OS's job)
- Thus, it must trap to the OS
- The OS loads the page from disk
  - Possibly writing a victim page to disk (if there is no room & the victim is dirty)
  - Possibly avoiding the disk read thanks to the OS "buffer cache"
- The OS updates the page table (valid = 1)
- The OS resumes the process; now the HW will retry & succeed!
A page fault incurs a significant penalty
- "Major" page fault = must go get the page from disk
- "Minor" page fault = the page already resides in the OS buffer cache
  - Possible only for files; not for "anonymous" spaces like the stack
- => pages shouldn't be too small (as noted, typically 4KB)
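The OS side of a major fault, as a hedged sketch (every helper below — find_free_frame, the clock victim picker from the earlier sketch, the disk I/O routines, and resume — is a hypothetical placeholder):

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint64_t valid:1, dirty:1, ac:3, frame:20, unused:39;
} pte_t;

extern pte_t   *page_table;
extern int      find_free_frame(void);          /* -1 if memory is full  */
extern size_t   clock_pick_victim(void);        /* from the clock sketch */
extern uint64_t frame_to_vpn(size_t frame);     /* reverse mapping       */
extern void     disk_write(size_t frame);       /* write frame to swap   */
extern void     disk_read(uint64_t vpn, size_t frame); /* fetch the page */
extern void     resume_current_process(void);

void handle_page_fault(uint64_t vpn) {
    int frame = find_free_frame();
    if (frame < 0) {                      /* no room: evict a victim     */
        frame = (int)clock_pick_victim();
        pte_t *victim = &page_table[frame_to_vpn((size_t)frame)];
        if (victim->dirty)                /* write back only if modified */
            disk_write((size_t)frame);
        victim->valid = 0;
    }
    disk_read(vpn, (size_t)frame);        /* fetch the missing page      */
    page_table[vpn].frame = (uint64_t)frame;
    page_table[vpn].valid = 1;            /* HW retry will now succeed   */
    resume_current_process();
}
```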

Page size
Smaller page size (typically 4KB)
- PROS: minimizes internal fragmentation
- CONS: increases the size of the page table
Bigger page size (called "superpages" if > 4KB)
- PROS: amortizes disk access cost; may prefetch useful data; may discard useless data early
- CONS: increased fragmentation; might transfer unnecessary info at the expense of useful info
Lots of work to increase page size beyond 4KB
- HW has supported it for years; the OS is the "bottleneck"
- Attractive because: bigger DRAMs, increasing memory/disk performance gap
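The page-table-size side of the tradeoff, as quick arithmetic in C (a flat single-level table over a 32-bit space and 8-byte PTEs are simplifying assumptions):

```c
#include <stdio.h>

int main(void) {
    unsigned long long vspace   = 1ULL << 32;   /* 32-bit virtual space */
    unsigned long long pte_size = 8;            /* assumed bytes per PTE */
    unsigned long long sizes[]  = { 4096, 2 * 1024 * 1024 }; /* 4KB, 2MB */

    for (int i = 0; i < 2; i++) {
        unsigned long long entries = vspace / sizes[i];
        printf("page size %llu -> %llu PTEs -> %llu KB of page table\n",
               sizes[i], entries, entries * pte_size / 1024);
    }
    return 0;
}
```

With 4KB pages the flat table needs 2^20 entries (8MB); with 2MB superpages, only 2^11 entries (16KB) — the tradeoff above in numbers.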

TLB (translation lookaside buffer)
The page table resides in memory
- Each translation requires a memory access
- Might be required for each load/store!
TLB
- Caches recently used PTEs to speed up translation
- Typically 128 to 256 entries
- Usually 4- to 8-way associative
- TLB access time is comparable to L1 cache access time
[Figure: on a TLB hit the physical address is produced directly; on a TLB miss the page table is accessed]
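A toy fully-associative TLB lookup in C (the structure, size, and linear search are illustrative; real TLBs are set-associative hardware):

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 128

typedef struct {
    bool     valid;
    uint64_t vpn;      /* tag: virtual page number       */
    uint64_t frame;    /* cached translation (PTE frame) */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Returns true on a hit and fills *frame; on a miss the caller
 * must walk the page table and refill the TLB. */
bool tlb_lookup(uint64_t vpn, uint64_t *frame) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *frame = tlb[i].frame;
            return true;               /* TLB hit                     */
        }
    }
    return false;                      /* TLB miss: access page table */
}
```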

Making address translation fast
The TLB is a cache for recent address translations:
[Figure: each TLB entry holds a valid bit, a tag (virtual page number), and a physical page; on a TLB miss, the page table supplies the physical page or disk address]

TLB access
[Figure: the virtual page number is split into tag and set; the set bits select a TLB row, the tag is compared against every way in parallel, and a way mux outputs the matching PTE along with hit/miss]

Unified L2
L2 is unified (no separation of data/instructions) – like main memory
- On a miss in any of d-L1, i-L1, d-TLB, or i-TLB => try to get the missed data from L2
- PTEs can and do reside in L2
[Figure: the instruction TLB and L1 instruction cache, and the data TLB and L1 data cache, all back onto a unified L2 cache, which backs onto memory]

VM & cache
[Figure: the virtual address is looked up in the TLB (backed by the in-memory page table); the resulting physical address accesses the L1 cache, then L2 and memory on misses]
TLB access is serial with cache access => its performance is crucial!
Page table entries can be cached in the L2 cache (as data)

Overlapped TLB & cache access
VM view of a physical address: physical page number = bits [29:12], page offset = bits [11:0]
Cache view of a physical address: tag = bits [29:14], set = bits [13:6], disp = bits [5:0]
The set number is not contained within the page offset
- The set number is not known until the physical page number is known
- The cache can be accessed only after address translation is done

Overlapped TLB & cache access (cont)
VM view of a physical address: physical page number = bits [29:12], page offset = bits [11:0]
Cache view of a physical address: tag = bits [29:12], set = bits [11:6], disp = bits [5:0]
In the above example the set number is contained within the page offset
- The set number is known immediately
- The cache can be accessed in parallel with address translation
- Once translation is done, match the upper bits with the tags
Limitation: cache size ≤ (page size × associativity)

Overlapped TLB & cache access (cont)
[Figure: the TLB set/tag lookup and the cache set lookup proceed in parallel from the virtual address; the physical page number emerging from the TLB's way mux is then compared against the cache tags, and a way mux selects the data on a hit]

Overlapped TLB & cache access (cont)
Assume the cache is 32KB, 2-way set-associative, 64 bytes/line
- (2^15 / 2 ways) / (2^6 bytes/line) = 2^(15-1-6) = 2^8 = 256 sets
In order to still allow overlap between set access and TLB access
- Take the upper two bits of the set number from bits [1:0] of the VPN
- physical_addr[13:12] may differ from virtual_addr[13:12]
- The tag comprises bits [31:12] of the physical address
  - The tag may thus mismatch bits [13:12] of the physical address
- On a cache miss, allocate the missing line according to its virtual set address and physical tag
[Figure: physical page number = bits [29:12], page offset = bits [11:0]; tag, set (with its top two bits taken from VPN[1:0]), and disp = bits [13:6] and [5:0]]
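The set-count arithmetic and the overlap constraint, checked in a few lines of C (sizes taken from the slide):

```c
#include <stdio.h>

int main(void) {
    unsigned cache_size = 32 * 1024;   /* 32KB                   */
    unsigned ways       = 2;           /* 2-way set-associative  */
    unsigned line_size  = 64;          /* 64 bytes/line          */
    unsigned page_size  = 4096;        /* 4KB pages              */

    unsigned sets = cache_size / ways / line_size;      /* = 256 */
    printf("sets = %u\n", sets);

    /* Fully overlapped access requires cache <= page * associativity. */
    if (cache_size <= page_size * ways)
        printf("set index fits within the page offset\n");
    else
        printf("set index spills into the VPN: %u sets vs %u indexable from the offset\n",
               sets, page_size / line_size);
    return 0;
}
```

Here 32KB > 4KB × 2, so two set-index bits spill beyond the page offset — exactly why the slide borrows them from VPN[1:0].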

Swap & DMA (direct memory access)
DMA copies the page to the disk controller
- Accesses memory without requiring CPU involvement
- Reads each line:
  - Executes a snoop-invalidate for each line in the cache (both L1 and L2)
  - If the line resides in the cache: if it is modified, reads the line from the cache into memory; then invalidates the line
  - Writes the line to the disk controller
This means that when a page is swapped out of memory
- All data in the caches belonging to that page is invalidated
- The page on the disk is up to date
The TLB is snooped
- If the TLB hits for the swapped-out page, the TLB entry is invalidated
In the page table
- Assign 0 to the valid bit in the PTE of swapped-out pages
- The rest of the PTE bits may be used by the OS for keeping the location of the page on disk

Context switch
Each process has its own address space
- Akin to saying "each process has its own page table"
- The OS allocates frames for a process => updates its page table
- If only one PTE points to a frame throughout the system, only the associated process can access the corresponding frame
- Shared memory: two PTEs of two processes point to the same frame
Upon context switching
- Save the current architectural state to memory
  - Architectural registers
  - The register that holds the page table base address in memory
- Flush the TLB
  - The same virtual addresses are routinely reused across processes
- Load the new architectural state from memory
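The switch sequence, as a hedged sketch (all helper names are hypothetical; on x86, for instance, loading the page-table base register CR3 flushes non-global TLB entries as a side effect):

```c
#include <stdint.h>

typedef struct {
    uint64_t regs[16];        /* architectural registers             */
    uint64_t page_table_base; /* register holding the page table base */
} arch_state_t;

extern void save_registers(arch_state_t *s);       /* hypothetical */
extern void load_registers(const arch_state_t *s); /* hypothetical */
extern void set_page_table_base(uint64_t base);    /* hypothetical */
extern void tlb_flush_all(void);                   /* hypothetical */

void context_switch(arch_state_t *prev, const arch_state_t *next) {
    save_registers(prev);                     /* save current state     */
    set_page_table_base(next->page_table_base);
    tlb_flush_all();                          /* same virtual addresses
                                                 are reused by the next
                                                 process                */
    load_registers(next);                     /* resume the new process */
}
```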

Virtually-addressed cache
The cache uses virtual addresses (tags are virtual)
- Address translation is required only on a cache miss
- The TLB is not in the path to a cache hit!
But… aliasing: 2 virtual addresses mapped to the same physical address
- => 2 cache lines holding data of the same physical address
- => must update all cache entries with the same physical address
[Figure: the CPU issues a virtual address; on a hit the cache answers directly, and only on a miss does translation produce a physical address for main memory]

Virtually-addressed cache – cont.
The cache must be flushed at a task switch
- Possible solution: include a unique process ID (PID) in the tag
How to share & synchronize memory among processes?
- As noted, must permit multiple virtual pages to refer to the same physical frame
- Problem: incoherence if the aliases land in different cache locations
- Solution: require sufficiently many common virtual LSBs
- With a direct-mapped cache, this guarantees that they all map to the same cache line