Lecture 24 – Paging implementation 2017.07.18 “If the dream is a translation of waking life, waking life is also a translation of the dream.” - Rene Magritte © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Paging implementation Paging adds an extra level of indirection to every main memory access Hardware support in the form of a Memory Management Unit (MMU) is used to reduce the access-time overhead Typically, access to configure the MMU is restricted to the operating system The operating system runs without memory mapping The processor switches to virtual (mapped) mode when running an application © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Where are page tables stored? In main memory, above the operating system Today, a Memory Management Unit (MMU) on a chip external to the processor is common © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Addressing and translation [Figure: the CPU's address bus (one-way) and data bus (bidirectional) carry virtual addresses to the L1, L2, … caches; a virtual-address bus connects the caches to the Memory Management Unit (MMU), which contains the Page Table Register and the Translation Look-aside Buffer (TLB); the MMU drives the physical-address bus to DRAM, which holds page tables 0 through n, and to the I/O devices] © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Virtually-addressed caches (L1, L2, L3) Typically, the MMU sits below the caches (L1, L2, L3) in the memory hierarchy This means the addresses used by the cache hardware are virtual Every process uses the same set of virtual addresses, e.g., 0x00000000 to 0xFFFFFFFF How do we distinguish a block of one process from a block of another? © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Disambiguating cache blocks Two approaches for distinguishing cache blocks that belong to different processes but have identical virtual addresses: The operating system invalidates all cache blocks when switching execution from one process to another The cache hardware is augmented to include a process ID alongside the tag © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
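As a minimal sketch of the second approach (the structure and field names here are hypothetical, not from the lecture), a cache hit check that also compares an address-space identifier (ASID) might look like this in C:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical cache line metadata: tag plus the owning process's ID (ASID). */
    struct cache_line {
        uint32_t tag;    /* tag bits of the virtual address      */
        uint16_t asid;   /* address-space (process) identifier   */
        bool     valid;
    };

    /* A hit now requires matching both the tag and the current process's ASID. */
    static bool cache_hit(const struct cache_line *line, uint32_t tag, uint16_t current_asid)
    {
        return line->valid && line->tag == tag && line->asid == current_asid;
    }

With the ASID in the comparison, two processes can cache blocks with the same virtual address without the OS having to invalidate the cache on every context switch.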
Fast address translation is essential Address translations occur for each memory access that goes outside the processor chip Today, typically, this is any memory access that misses in the last (lowest) cache level Translation is performed by the MMU Translation time contributes to the von Neumann bottleneck A page table lookup requires a slow DRAM access A faster way: cache recent translations © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Translation lookaside buffer A translation cache, invented by IBM A size-N content-addressable memory (CAM) stores recent translations: (virtual page number, hardware page frame) pairs When a translation is needed, the MMU queries the CAM at the same time it starts the page table lookup, so translation time = min(CAM, table lookup) Program locality means the CAM has the needed translation about 90% of the time for reasonable N Intel calls it a cache: the i7 TLB is 4-way set associative with 64 entries (4K pages) for data; for instructions it is 6-way set associative with 1536 entries © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
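A rough back-of-the-envelope sketch of why this helps (the 90% hit rate is from the slide; the 1 ns CAM time and 100 ns page-walk time are assumed, illustrative numbers):

    #include <stdio.h>

    int main(void)
    {
        /* Hit rate from the slide; the latencies below are assumed, illustrative values. */
        double hit_rate = 0.90;
        double tlb_ns   = 1.0;    /* CAM (TLB) lookup time              */
        double walk_ns  = 100.0;  /* full page table lookup in DRAM     */

        /* TLB query and page walk start together; on a hit the walk result is discarded. */
        double effective = hit_rate * tlb_ns + (1.0 - hit_rate) * walk_ns;
        printf("effective translation time = %.1f ns\n", effective);  /* ~10.9 ns */
        return 0;
    }

Even a modest hit rate turns a DRAM-speed page walk into a translation cost close to the CAM lookup time.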
Memory Management Unit (MMU) Page Table Register Holds a pointer to the current page table, the one to use for translation for the current process This register is updated with every context switch from one process to the next One page table in main memory per process Page table Holds virtual page to physical page frame mapping Holds valid bit, dirty bit, permission bits and other metadata for each page © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
MMU part 2 Virtual-to-physical translation using the page table means reading main memory (slow) Improve the performance of virtual memory by caching the N most recent translations in an N-entry CAM called the Translation Look-aside Buffer (TLB) When the lookup finds that the page is not resident, the MMU signals a page fault The page table lookup is called a page walk © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Practical page tables The page table maps every virtual page; for a large virtual address space (desirable) this implies a large table Example, 40-bit byte-addressed memory: 40-bit virtual address & 16 Kbyte pages (2^14 bytes) 2^40 / 2^14 = 2^26 pages, each with, say, a 4-byte table entry = 0.25 GB page table for each process Most programs use a small portion of the full virtual address space, so most page table entries are marked "invalid" and are, thus, wasted main memory Reduce main memory waste by using a multi-level page table © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
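A quick sanity check of the arithmetic on this slide (pure calculation, no assumptions beyond the slide's numbers):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t va_bits     = 40;                              /* 40-bit virtual address  */
        uint64_t page_bytes  = 1ULL << 14;                      /* 16 Kbyte pages          */
        uint64_t num_pages   = (1ULL << va_bits) / page_bytes;  /* 2^26 pages              */
        uint64_t entry_bytes = 4;                               /* 4-byte page table entry */
        uint64_t table_bytes = num_pages * entry_bytes;         /* 2^28 bytes              */

        printf("pages = %llu, page table size = %llu MB per process\n",
               (unsigned long long)num_pages,
               (unsigned long long)(table_bytes >> 20));        /* prints 256 MB = 0.25 GB */
        return 0;
    }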
Two-level page table Virtual page number field split into two parts: first-level and second-level page numbers Every process has a first-level table Second-level tables are allocated only when they will contain a valid entry; no more vast extents of table holding translations marked invalid © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Two-Level page tables The page number is divided into two parts: the first-level page number and the second-level page number First-level index (10 bits) | Second-level index (10 bits) | Offset (12 bits): the 1024 first-level slots point to second-level tables that each hold 1024 more entries Example: VM address 0x00402657 Offset = 0x657 (last 3 hex digits), 1st-level index = 0x001, 2nd-level index = 0x002 In binary: 00 0000 0001 | 00 0000 0010 | 0110 0101 0111 (first level | second level | offset) © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
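A small sketch of the index extraction for this example (field widths from the slide; the variable names are mine):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t vaddr = 0x00402657;

        uint32_t first  = (vaddr >> 22) & 0x3FF;   /* top 10 bits  -> 0x001 */
        uint32_t second = (vaddr >> 12) & 0x3FF;   /* next 10 bits -> 0x002 */
        uint32_t offset =  vaddr        & 0xFFF;   /* low 12 bits  -> 0x657 */

        printf("first = 0x%03X, second = 0x%03X, offset = 0x%03X\n",
               first, second, offset);
        return 0;
    }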
Address Translation [Figure: a 32-bit virtual address is split into a 1st-level index (bits 31-22, 10 bits), a 2nd-level index (bits 21-12, 10 bits), and an offset (bits 11-0, 12 bits). The First Level Page Table (one for each process, entries 0 to 2^10-1) holds pointers to second-level page tables, which exist ONLY as needed; their entries hold page frame numbers in physical memory (example frames shown: 0x45000, 0x65000, 0x70000). Red = various translations in the tables] © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Address Translation [Figure: translation of virtual address 0x00402657 = (0x001, 0x002, 0x657). The 1st-level index 0x001 selects entry 1 of the First Level Page Table (one for each process), which points to a second-level page table; the 2nd-level index 0x002 selects entry 2 of that table, which holds the page frame number in physical memory; the 12-bit offset 0x657 is appended to the frame number to form the physical address. Blue = translation of 0x00402657, Black = other translations in the tables (frames 0x45000, 0x65000, 0x70000)] © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Page metadata In the example, each table entry uses 20 bits to store the page frame number (2^32 / 2^12 = 2^20 pages) Given a 32-bit memory word, the other 12 bits are used to store metadata Valid bit, dirty bit Permission bits: read, write, execute Bits to support the page replacement algorithm © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
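One way such a 32-bit page table entry could be laid out is sketched below; the exact bit positions are illustrative assumptions, not from the lecture:

    #include <stdint.h>

    /* Hypothetical 32-bit page table entry: 20-bit frame number + 12 metadata bits. */
    #define PTE_FRAME_SHIFT  12
    #define PTE_FRAME_MASK   0xFFFFF000u   /* bits 31..12: physical page frame number */
    #define PTE_VALID        (1u << 0)
    #define PTE_DIRTY        (1u << 1)
    #define PTE_READ         (1u << 2)
    #define PTE_WRITE        (1u << 3)
    #define PTE_EXEC         (1u << 4)
    #define PTE_ACCESSED     (1u << 5)     /* for the page replacement algorithm */

    static inline uint32_t pte_frame(uint32_t pte) { return pte >> PTE_FRAME_SHIFT; }
    static inline int      pte_valid(uint32_t pte) { return (pte & PTE_VALID) != 0; }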
Types of page fault The CPU tries to access a page not resident in main memory (the metadata valid bit is marked "invalid"): the MMU generates a signal to the operating system kernel to load the page from disk The CPU tries a memory access that violates one or more permission bits: the MMU generates a signal to the operating system, which stops the process with a SEGV or other signal © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Processing a Page Fault
1. A program tries to read/write a location in memory that is in a non-resident page. This could happen when fetching the next instruction to execute or when reading/writing data that is not resident in RAM
2. The MMU looks up the VM address and, using the resident bit, finds that the page is not resident. The MMU then generates a page fault, that is, an interrupt from the MMU
3. Save the return address and registers on the stack. Restarting instructions can be tricky: restart from the beginning? Side effects (autoincrementing addressing modes) may require hardware support for tracking side effects and rollback
© 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Processing a Page Fault
4. The CPU looks up the interrupt handler that corresponds to the page fault in the interrupt vector and jumps to this interrupt handler
5. In the page fault handler: if the VM address corresponds to a page that is not valid for this process, generate a SEGV signal to the process (the default behavior for SEGV is to kill the process and dump core). Otherwise, if the VM address is in a valid page, the page has to be loaded from disk
© 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Processing a Page Fault
6. Find a free page frame in physical memory. If there are no free frames, reclaim one that is in use, writing it to disk first if it has been modified
7. Load the page from disk and update the page table: the faulting page's entry gets the new frame address, and the entry of any replaced page is marked invalid. Also, clear the modified and accessed bits
8. Restore the registers, return, and retry the offending instruction
Thrashing: the working set exceeds the capacity of physical memory, causing continuous swapping
© 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Processing a Page Fault The page fault handler retries the offending instruction at the end of the page fault The page fault is completely transparent to the program, that is, the program will have no knowledge that the page fault occurred. © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
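The steps above can be summarized in a simplified, OS-agnostic simulation like the one below; every structure, constant, and helper here is invented for illustration and is far simpler than any real kernel:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Toy simulation of steps 5-8 of page fault handling. */
    #define NPAGES 4

    struct pte {
        uint32_t frame;   /* physical frame number            */
        bool valid;       /* page belongs to this process     */
        bool resident;    /* page is currently in RAM         */
        bool dirty;       /* page modified since it was loaded */
    };

    static struct pte page_table[NPAGES];
    static uint32_t next_free_frame = 100;   /* pretend frames 100, 101, ... are free */

    static void handle_page_fault(uint32_t vpage)
    {
        if (vpage >= NPAGES || !page_table[vpage].valid) {   /* step 5: invalid -> SEGV */
            printf("SEGV: page %u not valid for this process\n", vpage);
            return;
        }
        uint32_t frame = next_free_frame++;   /* step 6: grab a free frame (no eviction here) */
        /* step 7: "load the page from disk" and update the page table entry */
        page_table[vpage].frame    = frame;
        page_table[vpage].resident = true;
        page_table[vpage].dirty    = false;
        printf("loaded page %u into frame %u\n", vpage, frame);
        /* step 8: the faulting instruction would now be retried by the hardware */
    }

    int main(void)
    {
        page_table[2].valid = true;   /* page 2 is valid for the process but not yet resident */
        handle_page_fault(2);         /* resident-bit fault: load it                          */
        handle_page_fault(3);         /* invalid page: SEGV                                   */
        return 0;
    }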
Using mmap The mmap() function establishes a mapping between a process's address space and a file or a shared memory object #include <sys/mman.h> void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off); mmap() returns the address of the memory mapping, which is always aligned to the page size (addr % PageSize == 0) The data in the file can be read/written as if it were memory Flags: MAP_ANONYMOUS, MAP_FIXED, etc. © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
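A minimal, self-contained usage sketch; the file name "data.txt" and the error handling style are mine, not from the slides:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.txt", O_RDONLY);          /* hypothetical input file */
        if (fd < 0) { perror("open"); exit(1); }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); exit(1); }

        /* Map the whole file read-only; the kernel picks the (page-aligned) address. */
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); exit(1); }

        /* The file's bytes can now be read as ordinary memory. */
        if (st.st_size > 0)
            printf("first byte: %c\n", p[0]);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }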
Using mmap ptr = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0) [Figure: a file on disk is mapped into the process's memory at ptr = 0x00020000, within the 0x00000000-0xFFFFFFFF address space] Updates are shared with other processes and flushed to the underlying file intermittently (msync()) © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Virtual memory for speed VM also speeds up the execution of programs: Mmap the text segment of an executable or shared library Mmap the data segment of a program Use of VM during fork to copy the memory of the parent into the child Allocate zero-initialized memory; it is used to allocate space for bss, the stack, and sbrk() Shared memory © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
1. Mmap the text segment of an executable or a shared library Initially, mmap does not read any pages; pages are loaded on demand when they are accessed Startup time is fast because only the pages needed are loaded, instead of the entire program It also saves RAM because only the portions of the program that are needed will be in RAM © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
1. Mmap the text segment of an executable or a shared library The physical pages where the text segment is stored are shared by multiple instances of the same program Protections: PROT_READ|PROT_EXEC Flags: MAP_PRIVATE Implements copy-on-write if the protections allow © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
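A sketch of the kind of call a loader might make for the text segment; the function name and the fd, text_offset, and text_len parameters are hypothetical placeholders that a real loader would fill in from the executable's headers:

    #include <sys/types.h>
    #include <sys/mman.h>

    /* Map an executable's text segment: read + execute, private (copy-on-write) mapping.
       Returns the mapped address, or MAP_FAILED on error. */
    void *map_text_segment(int fd, off_t text_offset, size_t text_len)
    {
        return mmap(NULL, text_len, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, text_offset);
    }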
1. Mmap the text segment of an executable or a shared library [Figure: the text section of the executable file on disk is mmapped at 0x00020000 in the process's virtual memory, within the 0x00000000-0xFFFFFFFF address space] © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
1. Mmap the text segment of an executable or a shared library Physical pages of the text section are shared across multiple processes running the same program/shared library [Figure: the text regions in Process 1's and Process 2's virtual memories both map to the same text pages in physical memory] © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
2. Mmap the data segment of a program During the loading of a program, the OS mmaps the data segment of the program The data segment contains initialized global variables. Multiple instances of the same program will share the same physical memory pages where the data segment is mapped as long as the page is not modified If a page is modified, the OS will create a copy of the page and make the change in the copy. This is called "copy on write" © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
2. Mmap the data segment of a program Processes running the same program will share the same unmodified physical pages of the data segment [Figure: data pages A, B, and C in Process 1's and Process 2's virtual memories all map to the same physical pages A, B, and C] © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
2. Mmap the data segment of a program When a process modifies a page, it creates a private copy (A*). This is called copy-on-write [Figure: Process 1's data page A now maps to its private copy A* in physical memory, while Process 2 still maps to the original page A; pages B and C remain shared] © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
3. Use of VM during fork to copy memory of the parent into the child After forking, the child gets a copy of the memory of the parent Both parent and child share the same RAM pages (physical memory) as long as they are not modified When a page is modified by either parent or child, the OS will create a copy of the page in RAM and will do the modifications on the copy © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
3. Use of VM during fork to copy memory of the parent into the child The copy-on-write in fork is accomplished by marking the common pages read-only. The OS catches a modification attempt via the resulting page fault, creates a copy, and updates the page table of the writing process. Then it retries the modifying instruction. © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
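A small user-space demonstration of these semantics (the variable and its values are arbitrary; this shows the effect of copy-on-write, not its kernel implementation):

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int shared_value = 42;   /* lives in a data page shared copy-on-write after fork */

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0) {                 /* child: the write triggers a private copy of the page */
            shared_value = 99;
            printf("child sees %d\n", shared_value);   /* 99 */
            return 0;
        }
        waitpid(pid, NULL, 0);
        printf("parent sees %d\n", shared_value);      /* still 42: parent's page untouched */
        return 0;
    }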
3. Use of VM during fork to copy memory of the parent into the child After fork(), both parent and child will use the same pages [Figure: pages A, B, and C in the parent's and the child's virtual memories all map to the same physical pages A, B, and C] © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
3. Use of VM during fork to copy memory of the parent into the child When the child or parent modifies a page, the OS creates a private copy (A*) for the process. This is called copy-on-write [Figure: the writing process's page A now maps to its private copy A* in physical memory, while the other process still maps to the original page A; pages B and C remain shared] © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
4. Allocate zero-initialized memory It is used to allocate space for bss and sbrk() When allocating memory using sbrk or mmap with the MAP_ANONYMOUS flag, all the VM pages in the mapping initially map to a single read-only page in RAM that contains zeroes When a page is modified, the OS creates a copy of the page (copy-on-write) and retries the modifying instruction This allows fast allocation: no RAM is initialized to zeroes until a page is modified This also saves RAM: only modified pages use RAM Used initially by the stack, but as execution continues its contents are not guaranteed to be 0 © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
4. Allocate zero-initialized memory. This is implemented by making the page table entries all point to a single page filled with 0s and marking the pages read-only. An instruction that tries to modify a page gets a page fault. The page fault handler allocates another physical page of 0's and updates the page table to point to it. The instruction is retried and the program continues as if nothing happened. © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
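A minimal user-space sketch of requesting zero-initialized memory this way (the 64 KB size is arbitrary):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 64 * 1024;   /* arbitrary size; rounded up to whole pages by the kernel */

        /* Anonymous, private mapping: pages read as zero and get their own frame on first write. */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); exit(1); }

        printf("p[0] before write: %d\n", p[0]);   /* 0: backed by the shared zero page        */
        p[0] = 7;                                  /* write faults; a private frame is allocated */
        printf("p[0] after write:  %d\n", p[0]);   /* 7 */

        munmap(p, len);
        return 0;
    }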
4. Allocate zero-initialized memory. After allocating zero-initialized memory with sbrk or mmap, all pages point to a single page with zeroes [Figure: pages A, B, and C in the parent's virtual memory all map to the same physical page of 0's] © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
4. Allocate zero-initialized memory. When a page is modified, the OS creates a copy of the page and the modification is done in the copy [Figure: page B in the parent's virtual memory now maps to its own physical page containing X, while pages A and C still map to the shared page of 0's] © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra
Summary Demand paged virtual memory Is a form of caching Makes use of the fixed-size page idea to speed operation Lets main memory be shared in many ways Two-level page tables conserve main memory Speeds creation of child processes and allocation of zero-initialized memory © 2017 by George B. Adams III Portions © 2017 Dr. Jeffrey A. Turkstra