1
Computer Architecture Virtual Memory (VM)
By Dan Tsafrir, 23/5/2011. Presentation based on slides by Lihu Rappoport.
2
http://www.youtube.com/watch?v=3ye2OXj32DM (funny beginning)
3
DRAM (dynamic random-access memory)
Corsair 1333 MHz DDR3 laptop memory
Price (at amazon.com): $43 for 4 GB, $79 for 8 GB
"The physical memory"
4
VM – motivation
Provides isolation between processes
  Processes can concurrently run on a single machine
  VM prevents them from accessing the memory of one another
  (But still allows for convenient sharing when required)
Provides illusion of large memory
  VM size can be bigger than physical memory size
  VM decouples program from real memory size (which can differ across machines)
Provides illusion of contiguous memory
  Programmers need not worry about where data is placed exactly
Allows for dynamic memory growth
  Can add memory to processes at runtime as needed
Allows for memory overcommitment
  Sum of VM spaces (across all processes) can be >= physical memory
  DRAM is often one of the most costly parts in the system
5
VM – terminology
Virtual address space
  Space used by the programmer
  "Ideal" = contiguous & as big as you'd like
Physical address
  The real, underlying physical memory address
  Completely abstracted away by OS/HW
6
VM – basic idea
Divide memory (virtual & physical) into fixed-size blocks
  "page" = chunk of contiguous data in virtual space
  "frame" = physical memory exactly enough to hold one page
  |page| = |frame| (= size)
  page size = power of 2 = 2^k bytes
  By default, k=12 almost always => page size is 4KB
While virtual address space is contiguous
  Pages can be mapped into arbitrary frames
  Pages can reside in memory or on disk (hence, overcommitment)
All programs are written using the VM address space
  HW does on-the-fly translation from virtual to physical addresses
  Uses a page table to translate between virtual and physical addresses
7
VM – simplistic illustration
[Figure: pages (virtual space) -> address translation -> frames (DRAM), with some pages residing on disk]
Memory acts as a cache for the secondary storage (disk)
Immediate advantages
  Illusion of contiguity & of having more physical memory
  Program's actual location is unimportant
  Dynamic growth, isolation, & sharing are easy to obtain
8
Translation – use a “page table”
virtual address (64 bit): bits 63..12 = virtual page number (52 bit); bits 11..0 = page offset (12 bit)
physical address (32 bit): bits 31..12 = physical frame number (20 bit); bits 11..0 = page offset (12 bit)
How to map the virtual page number to a physical frame number?
(page size is typically 2^12 bytes = 4KB)
9
Translation – use a “page table”
[Figure: the page table base register points to the page table; each entry holds a valid bit (V), a dirty bit (D), access control bits (AC), and a frame number]
(page size is typically 2^12 bytes = 4KB)
10
Translation – use a “page table”
[Figure: the full translation -- the virtual page number (bits 63..12) indexes the page table, starting from the page table base register; the selected entry supplies the 20-bit physical frame number, which is concatenated with the 12-bit page offset to form the 32-bit physical address]
(page size is typically 2^12 bytes = 4KB)
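As a concrete rendering of this translation, here is a minimal C sketch, assuming the 64-bit virtual / 32-bit physical split shown above and a hypothetical flat table frame_of (real page tables are multi-level):

```c
#include <stdint.h>

#define PAGE_SHIFT 12                               /* page size = 2^12 = 4 KB */
#define PAGE_OFFSET_MASK ((1ULL << PAGE_SHIFT) - 1)

/* frame_of is a hypothetical single-level table indexed by the VPN;
   real page tables are multi-level for exactly this reason. */
uint32_t translate(uint64_t vaddr, const uint32_t *frame_of)
{
    uint64_t vpn    = vaddr >> PAGE_SHIFT;          /* bits 63..12: virtual page number */
    uint32_t offset = vaddr & PAGE_OFFSET_MASK;     /* bits 11..0: page offset          */
    uint32_t pfn    = frame_of[vpn];                /* the page table lookup            */
    return (pfn << PAGE_SHIFT) | offset;            /* 20-bit frame + 12-bit offset     */
}
```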
11
Translation – use a “page table”
"PTE" (page table entry) = one row of the page table: valid bit (V), dirty bit (D), access control (AC), frame number
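A hypothetical C rendering of such a PTE; field names and widths follow this slide, not any real architecture:

```c
#include <stdint.h>

/* Hypothetical field layout matching the names on this slide; real PTE
   formats (e.g., x86) order and size the bits differently. */
typedef struct {
    uint32_t valid : 1;   /* V  - is the page present in a DRAM frame? */
    uint32_t dirty : 1;   /* D  - was the page modified since load?    */
    uint32_t ac    : 2;   /* AC - access control (R / RW / X)          */
    uint32_t frame : 20;  /* physical frame number (reusable by the OS
                             as a disk location when valid == 0)       */
} pte_t;
```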
12
Page tables
Each page table entry points to a memory frame or to a disk address
[Figure: the virtual page number indexes the page table; entries with valid=1 point into physical memory, entries with valid=0 point to the page's location on disk]
13
Checks
If (valid == 1)
  page is in main memory at the frame address stored in the table
  data is readily available (e.g., can copy it to the cache)
else /* page fault */
  need to fetch the page from disk
  causes a trap, usually accompanied by a context switch:
  current process suspended while the page is fetched from disk
Access control
  R=read-only, R/W=read/write, X=execute
  If (access type incompatible with specified access rights)
    protection violation fault; traps to the fault handler
Demand paging
  Pages are fetched from secondary memory only upon the first fault
  Rather than, e.g., upon file open
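The checks above, rendered as a C sketch; page_fault, protection_fault, and allowed are hypothetical stand-ins for the OS trap handlers, not real APIs:

```c
#include <stdint.h>

typedef struct { uint32_t valid:1, dirty:1, ac:2, frame:20; } pte_t;
enum access { READ, WRITE, EXEC };

/* Hypothetical OS/HW hooks standing in for the traps named above. */
void page_fault(pte_t *pte);        /* fetch page from disk, set valid=1 */
void protection_fault(pte_t *pte);  /* deliver the fault to the process  */
int  allowed(uint32_t ac, enum access type);

uint32_t access_page(pte_t *pte, enum access type)
{
    if (!pte->valid)
        page_fault(pte);            /* trap: demand-fetch from disk      */
    if (!allowed(pte->ac, type))
        protection_fault(pte);      /* trap: access-rights violation     */
    if (type == WRITE)
        pte->dirty = 1;             /* frame now differs from disk copy  */
    return pte->frame;              /* translation can proceed           */
}
```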
14
Page replacement
Page replacement policy: decides which page to evict to disk
LRU (least recently used)
  Typically too wasteful (updated upon each memory reference)
FIFO (first in, first out)
  Simplest: no need to update upon references, but ignores usage
Second-chance
  Set a per-page "was it referenced?" bit (can be done by HW or SW)
  Swap out the first page with bit = 0, in FIFO order
  When traversed, if bit = 1, set it to 0 and push the associated page to the end of the list (in FIFO terms, the page becomes newest)
Clock
  More efficient variant of second-chance
  Pages are cyclically ordered (no FIFO); search clockwise for the first page with bit=0; set bit=0 for pages that have bit=1
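A minimal C sketch of the clock variant, assuming a per-page referenced bit; all names are illustrative:

```c
#include <stddef.h>

/* 'ref' is the per-page referenced bit, set by HW (or by SW on a
   soft fault) whenever the page is accessed. Names are illustrative. */
struct page { int ref; /* ... other bookkeeping ... */ };

static struct page *frames;    /* the resident pages, cyclically ordered */
static size_t nframes;
static size_t hand;            /* the clock hand                         */

struct page *clock_pick_victim(void)
{
    for (;;) {
        struct page *p = &frames[hand];
        hand = (hand + 1) % nframes;
        if (p->ref == 0)
            return p;          /* not referenced since last pass: evict  */
        p->ref = 0;            /* referenced: clear the bit, move on     */
    }
}
```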
15
Page replacement – cont.
NRU (not recently used)
  A more sophisticated LRU approximation
  HW or SW maintains per-page 'referenced' & 'modified' bits
  Periodically (on a clock interrupt), SW turns 'referenced' off
  Replacement algorithm partitions pages into
    Class 0: not referenced, not modified
    Class 1: not referenced, modified
    Class 2: referenced, not modified
    Class 3: referenced, modified
  Choose at random a page from the lowest nonempty class for removal
Underlying principles (order is important):
  Prefer keeping referenced over unreferenced
  Prefer keeping modified over unmodified
Can a page be modified but not referenced?
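(The answer to the closing question is yes: the periodic clearing of 'referenced' leaves 'modified' set, which is exactly class 1.) The class encoding, as a one-line C sketch with illustrative names:

```c
/* Class number as defined above: 'referenced' is the high bit,
   'modified' the low bit; evict from the lowest nonempty class. */
static inline int nru_class(int referenced, int modified)
{
    return (referenced << 1) | modified;   /* 0, 1, 2, or 3 */
}
```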
16
Page replacement – advanced
ARC (adaptive replacement cache)
  Factors in not only recency (when last accessed), but also frequency (how many times accessed)
  User determines which factor has more weight
  Better (but more wasteful) than LRU
  Developed by IBM: Nimrod Megiddo & Dharmendra Modha
CAR (clock with adaptive replacement)
  Similar to ARC, and comparable in performance
  But, unlike ARC, doesn't require user-specified parameters
  Likewise developed by IBM: Sorav Bansal & Dharmendra Modha
17
Page faults
Page fault: the data is not in memory; retrieve it from disk
  CPU detects the situation (valid=0)
  But it cannot remedy the situation (it doesn't know about the disk; that's the OS's job)
  Thus, it must trap to the OS
  OS loads the page from disk
    Possibly writing a victim page to disk (if no room & if dirty)
    Possibly avoiding the disk read thanks to the OS "buffer cache"
  OS updates the page table (valid=1)
  OS resumes the process; now the HW will retry & succeed!
Page fault incurs a significant penalty
  "Major" page fault = must go get the page from disk
  "Minor" page fault = the page already resides in the OS buffer cache
    Possible only for files; not for "anonymous" spaces like the stack
  => pages shouldn't be too small (as noted, typically 4KB)
18
Page size
Smaller page size (typically 4KB)
  PROS: minimizes internal fragmentation
  CONS: increases the size of the page table
Bigger page size (called "superpages" if > 4K)
  PROS:
    Amortizes disk access cost
    May prefetch useful data
    May discard useless data early
  CONS:
    Increased fragmentation
    Might transfer unnecessary info at the expense of useful info
Lots of work to increase page size beyond 4K
  HW has supported it for years; the OS is the "bottleneck"
  Attractive because: bigger DRAMs, increasing memory/disk performance gap
19
TLB (translation lookaside buffer)
Page table resides in memory
  Each translation requires a memory access
  Might be required for each load/store!
TLB
  Caches recently used PTEs to speed up translation
  Typically 128 to 256 entries
  Usually 4 to 8 way associative
  TLB access time is comparable to L1 cache access time
[Figure: virtual address -> TLB access -> on a hit, the physical address; on a miss, access the page table]
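A C sketch of the lookup, assuming a hypothetical 128-entry, 4-way organization (matching the typical figures above); the entry layout is illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_WAYS 4
#define TLB_SETS (128 / TLB_WAYS)

struct tlb_entry { bool valid; uint64_t tag; uint32_t pfn; };
static struct tlb_entry tlb[TLB_SETS][TLB_WAYS];

bool tlb_lookup(uint64_t vpn, uint32_t *pfn)
{
    uint64_t set = vpn % TLB_SETS;       /* low VPN bits pick the set   */
    uint64_t tag = vpn / TLB_SETS;       /* remaining bits form the tag */
    for (int w = 0; w < TLB_WAYS; w++) {
        if (tlb[set][w].valid && tlb[set][w].tag == tag) {
            *pfn = tlb[set][w].pfn;      /* hit: translation found      */
            return true;
        }
    }
    return false;                        /* miss: walk the page table   */
}
```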
20
Making Address Translation Fast
TLB is a cache for recent address translations:
[Figure: each TLB entry holds a valid bit, a tag (virtual page number), and a physical page number; the full page table holds, per virtual page, a valid bit and a physical page or disk address]
21
TLB Access
[Figure: the virtual page number is split into tag and set; the set bits select a TLB set, the tags of all ways (e.g., way 0 and way 1) are compared in parallel, and on a match the way MUX outputs the hit PTE]
22
Unified L2
L2 is unified (no separation between data/instructions) – like main memory
In case of a miss in any of d-L1, i-L1, d-TLB, or i-TLB => try to get the missed data from L2
PTEs can and do reside in L2
[Figure: the data TLB and L1 data cache, and the instruction TLB and L1 instruction cache, all fetch translations and data from the unified L2 cache, which in turn fetches from memory]
23
VM & cache
[Figure: virtual address -> TLB access -> on a miss, access the page table in memory; the resulting physical address probes the L1 cache, then L2, then memory]
TLB access is serial with cache access => performance is crucial!
Page table entries can be cached in the L2 cache (as data)
24
Overlapped TLB & cache access
VM view of a physical address: bits 29..12 = physical page number; bits 11..0 = page offset
Cache view of a physical address: bits 29..14 = tag; bits 13..6 = set; bits 5..0 = disp
The set is not contained within the page offset
  The set is not known until the physical page number is known
  The cache can be accessed only after address translation is done
25
Overlapped TLB & cache access (cont)
VM view of a physical address: bits 29..12 = physical page number; bits 11..0 = page offset
Cache view of a physical address: bits 29..12 = tag; bits 11..6 = set; bits 5..0 = disp
In the above example the set is contained within the page offset
  The set is known immediately
  The cache can be accessed in parallel with address translation
  Once translation is done, match the upper bits with the tags
Limitation: cache size ≤ (page size × associativity)
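The limitation, as a small C helper (illustrative, not from the slides): overlap works exactly when the set and line-offset bits all fall inside the page offset, i.e., when one cache way is no larger than a page.

```c
/* Overlap of cache indexing and TLB lookup is possible iff
   cache_size / associativity <= page_size. Names are illustrative. */
int can_overlap(unsigned cache_bytes, unsigned ways, unsigned page_bytes)
{
    return cache_bytes / ways <= page_bytes;
}
/* e.g., can_overlap(16*1024, 4, 4096) == 1,
         can_overlap(32*1024, 2, 4096) == 0 (the case two slides ahead) */
```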
26
Overlapped TLB & cache access (cont)
[Figure: the page offset supplies the set and disp bits, so the cache set is read in parallel with the TLB lookup of the virtual page number; the TLB's way MUX produces the physical page number, which is compared against the tags of the selected set, and the cache's way MUX outputs the data on a hit]
27
Overlapped TLB & cache access (cont)
Assume the cache is 32 KB, 2-way set-associative, with 64 bytes/line
  (2^15 bytes / 2 ways) / (2^6 bytes/line) = 2^8 = 256 sets
In order to still allow overlap between set access and TLB access
  Take the upper two bits of the set number from bits [1:0] of the VPN
  Physical_addr[13:12] may differ from virtual_addr[13:12]
  The tag comprises bits [31:12] of the physical address
  The tag may mismatch bits [13:12] of the physical address
  Cache miss: allocate the missing line according to its virtual set address and physical tag
[Figure: physical address = physical page number (bits 29..12) + page offset (bits 11..0); the 8 set bits are addr[13:6], with addr[13:12] taken from VPN[1:0]; disp = addr[5:0]]
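The slide's arithmetic as a C sketch; the constants follow this example (32 KB, 2-way, 64 B lines), and the names are illustrative:

```c
#include <stdint.h>

/* 32 KB / 2 ways / 64 B lines = 256 sets, so the set index is addr[13:6].
   Bits [13:12] lie above the 4 KB page offset, so here they come from the
   *virtual* address -- i.e., from VPN[1:0]. */
#define LINE_SHIFT 6                    /* 2^6 = 64-byte lines */
#define SET_BITS   8                    /* 2^8 = 256 sets      */

unsigned virtual_set_index(uint64_t vaddr)
{
    return (vaddr >> LINE_SHIFT) & ((1u << SET_BITS) - 1);  /* bits [13:6] */
}
```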
28
Swap & DMA (direct memory access)
DMA copies the page to the disk controller
  Accesses memory without requiring CPU involvement
  Reads each line:
    Executes a snoop-invalidate for each line in the cache (both L1 and L2)
    If the line resides in the cache:
      If it is modified, reads the line from the cache into memory
      Invalidates the line
    Writes the line to the disk controller
This means that when a page is swapped out of memory
  All data in the caches which belongs to that page is invalidated
  The page on disk is up to date
The TLB is snooped
  If the TLB hits for the swapped-out page, the TLB entry is invalidated
In the page table
  Assign 0 to the valid bit in the PTE of swapped-out pages
  The rest of the PTE bits may be used by the OS for keeping the location of the page on disk
29
Context switch
Each process has its own address space
  Akin to saying "each process has its own page table"
  OS allocates frames for a process => updates its page table
  If only one PTE points to a frame throughout the system
    Only the associated process can access the corresponding frame
  Shared memory
    Two PTEs of two processes point to the same frame
Upon context switching
  Save the current architectural state to memory
    Architectural registers
    The register that holds the page table base address in memory
  Flush the TLB
    The same virtual addresses are routinely reused
  Load the new architectural state from memory
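A hedged C sketch of these steps; all primitives are hypothetical (on x86, for instance, the page table base register is CR3, and reloading it flushes the non-global TLB entries):

```c
#include <stdint.h>

struct proc { uint64_t regs[16]; uint64_t pt_base; };

/* Hypothetical HW-access primitives, standing in for real mechanisms. */
void save_registers(uint64_t *r);
void restore_registers(const uint64_t *r);
uint64_t read_page_table_base(void);
void load_page_table_base(uint64_t base);
void flush_tlb(void);

void context_switch(struct proc *prev, struct proc *next)
{
    save_registers(prev->regs);              /* architectural state -> memory */
    prev->pt_base = read_page_table_base();  /* incl. the page table base reg */
    load_page_table_base(next->pt_base);     /* switch address spaces         */
    flush_tlb();                             /* same VAs now map differently  */
    restore_registers(next->regs);           /* resume the new process        */
}
```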
30
Virtually-addressed cache
Cache uses virtual addresses (tags are virtual)
  Address translation is required only on a cache miss
  TLB is not in the path to a cache hit!
But… aliasing: 2 virtual addresses mapped to the same physical address
  => 2 cache lines holding data of the same physical address
  => Must update all cache entries holding that physical address's data
[Figure: the CPU sends the virtual address directly to the cache; on a hit, no translation; on a miss, the address is translated and main memory is accessed with the physical address]
31
Virtually-addressed cache
Cache must be flushed at task switch
  Possible solution: include a unique process ID (PID) in the tag
How to share & synchronize memory among processes?
  As noted, must permit multiple virtual pages to refer to the same physical frame
  Problem: incoherence if the aliases map to different cache lines
  Solution: require sufficiently many common virtual LSBs
    With a direct-mapped cache, this guarantees that they all map to the same cache line
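A small C sketch of the resulting page-coloring condition; the names and formulation are illustrative, assuming the low log2(cache size / associativity) bits of the address form the set index and line offset:

```c
#include <stdint.h>

/* Two virtual aliases of the same frame stay coherent in a
   virtually-indexed cache only if they agree on all index + line-offset
   bits, i.e., the low log2(cache_bytes / ways) bits of the address. */
int aliases_share_cache_line(uint64_t va1, uint64_t va2,
                             unsigned cache_bytes, unsigned ways)
{
    uint64_t mask = (uint64_t)cache_bytes / ways - 1;
    return (va1 & mask) == (va2 & mask);
}
```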