Address Translation and Virtual Memory CENG331 - Computer Organization Instructor: Murat Manguoglu (Section 1) Adapted from: http://inst.eecs.berkeley.edu/~cs152
Bare Machine
[Figure: pipeline (PC, D, E, M, W) with instruction cache, decode, and data cache. Instruction fetches and data accesses both issue physical addresses, which pass through the memory controller to main memory (DRAM).]
In a bare machine, the only kind of address is a physical address.
CS252 S05
Absolute Addresses
EDSAC, early 50's
Only one program ran at a time, with unrestricted access to the entire machine (RAM + I/O devices).
Addresses in a program depended upon where the program was to be loaded in memory.
But it was more convenient for programmers to write location-independent subroutines.
How could location independence be achieved?
Location independence: load-time rewriting, PC-relative branches, PC-relative data access or a base pointer on entry to a routine.
The linker and/or loader modify addresses of subroutines and callers when building a program memory image.
Dynamic Address Translation
Motivation: in the early machines, I/O operations were slow and each word transferred involved the CPU. Throughput is higher if the CPU and I/O of 2 or more programs are overlapped. How? Multiprogramming with DMA I/O devices and interrupts.
Location-independent programs: ease of programming and storage management creates the need for a base register.
Protection: independent programs should not affect each other inadvertently, which creates the need for a bound register.
Multiprogramming drives the requirement for resident supervisor software to manage context switches between multiple programs.
[Figure: physical memory holding prog1, prog2, and the OS.]
Simple Base and Bound Translation
[Figure: a Load X instruction in the program address space issues a logical address. The logical address is compared against the Bound Register (the segment length) to detect a bounds violation, and is added to the Base Register to form the physical address within the current segment in physical memory.]
Base and bound registers are visible/accessible only when the processor is running in supervisor mode.
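The check-and-add step of base-and-bound translation can be sketched in a few lines. This is a minimal illustration, not a model of any particular machine; the names `translate` and `BoundsViolation` are hypothetical, and the bound register is assumed to hold the segment length.

```python
class BoundsViolation(Exception):
    """Raised when a logical address falls outside the current segment."""


def translate(logical_addr, base, bound):
    """Map a logical address to a physical address (hypothetical helper)."""
    # Assumption: the bound register holds the segment length, so the
    # valid logical addresses are 0 .. bound - 1.
    if not 0 <= logical_addr < bound:
        raise BoundsViolation(hex(logical_addr))
    # Relocation is a single addition of the base register.
    return base + logical_addr
```

For example, with `base=0x4000` and `bound=0x100`, logical address `0x10` maps to physical address `0x4010`, while logical address `0x100` raises `BoundsViolation`.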
Separate Areas for Program and Data
(Scheme used on all Cray vector supercomputers prior to X1, 2002)
[Figure: two base-and-bound register pairs. For a data access such as Load X, the logical address in the memory address register is checked against the Data Bound Register and added to the Data Base Register to form the physical address in the data segment of main memory. For an instruction fetch, the program counter is checked against the Program Bound Register and added to the Program Base Register to form the physical address in the program segment.]
What is an advantage of this separation? It permits sharing of program segments.
Base and Bound Machine
[Figure: pipeline (PC, D, E, M, W) with instruction cache, decode, and data cache. The instruction fetch address is checked against the Program Bound Register and added to the Program Base Register; the data address is checked against the Data Bound Register and added to the Data Base Register. The resulting physical addresses go through the memory controller to main memory (DRAM).]
[Can fold addition of base register into (register+immediate) address calculation using a carry-save adder (sums three numbers with only a few gate delays more than adding two numbers).]
Memory Fragmentation
[Figure: three snapshots of memory. First: OS space plus users 1, 2, and 3, with free regions. Second: users 4 and 5 arrive and occupy free regions. Third: users 2 and 5 leave. Region sizes labeled in the figure: 8K, 16K, 24K, 32K.]
As users come and go, the storage is "fragmented". Therefore, at some stage programs have to be moved around to compact the storage (called "burping" the memory).
Paged Memory Systems
A processor-generated address can be split into a page number and an offset.
A page table contains the physical address of the base of each page.
[Figure: the numbered pages of the address space of User-1 are mapped through the page table of User-1 to non-adjacent locations in physical memory.]
Page tables relax the contiguous allocation requirement: they make it possible to store the pages of a program non-contiguously.
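The split into page number and offset, followed by a page table lookup, can be sketched as follows. The 4 KB page size, the function names, and the sample base addresses are assumptions chosen for illustration; the slide does not fix any of them.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def split(addr):
    """Split an address into (page number, offset)."""
    return addr // PAGE_SIZE, addr % PAGE_SIZE


def translate(addr, page_table):
    """page_table[i] holds the physical base address of page i."""
    page_number, offset = split(addr)
    return page_table[page_number] + offset


# Hypothetical table: consecutive pages live at non-contiguous bases.
example_table = [0x8000, 0x2000, 0x5000]
```

With this table, address `0x1004` is page 1, offset 4, and translates to `0x2004`, even though page 0 sits at a higher physical address than page 1.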
Private Address Space per User
Each user has a page table.
The page table contains an entry for each user page.
The OS ensures that the page tables are disjoint.
[Figure: each user's page table maps that user's address space onto physical memory, which also holds OS pages and free pages.]
Where Should Page Tables Reside?
Space required by the page tables (PT) is proportional to the address space, number of users, ...
Too large to keep in registers.
Idea: keep PTs in main memory.
This needs one reference to retrieve the page base address and another to access the data word, which doubles the number of memory references!
Page Tables in Physical Memory
[Figure: the virtual address spaces of User 1 and User 2 each contain an address VA1. Physical memory holds PT User 1 and PT User 2 alongside the pages they map.]
A Problem in the Early Sixties
There were many applications whose data could not fit in main memory, e.g., payroll.
Paged memory systems reduced fragmentation but still required the whole program to be resident in main memory.
Manual Overlays
Assume an instruction can address all the storage on the drum.
Method 1: the programmer keeps track of addresses in main memory and initiates an I/O transfer when required. Difficult and error-prone!
Method 2: automatic initiation of I/O transfers by software address translation (Brooker's interpretive coding, 1960). Inefficient!
[Figure: Ferranti Mercury, 1956: 40k bits of main (central) store, 640k bits of drum.]
The British firm Ferranti built Mercury and then Atlas. Method 1 was too difficult for users; Method 2 was too slow.
Not just an ancient black art: e.g., the IBM Cell microprocessor used in the PlayStation 3 has an explicitly managed local store!
Demand Paging in Atlas (1962)
"A page from secondary storage is brought into the primary storage whenever it is (implicitly) demanded by the processor." Tom Kilburn
Primary memory acts as a cache for secondary memory.
[Figure: central memory with primary storage of 32 pages (512 words/page) and secondary storage (drum) of 32 x 6 pages.]
Single-level store: the user sees 32 x 6 x 512 words of storage.
Hardware Organization of Atlas
[Figure: the effective address goes through initial address decode and selects among:
16 ROM pages, 0.4 ~ 1 μsec: system code (not swapped)
2 subsidiary pages, 1.4 μsec: system data
Main store, 32 pages, 1.4 μsec
Drum (4), 192 pages
8 tape decks, 88 μsec/word
Page Address Registers (PARs), numbered up to 31, each holding <effective PN, status>.]
48-bit words, 512-word pages.
1 Page Address Register (PAR) per page frame.
Compare the effective page address against all 32 PARs:
match: normal access
no match: page fault; save the state of the partially executed instruction
Atlas Demand Paging Scheme
On a page fault:
Input transfer into a free page is initiated.
The Page Address Register (PAR) is updated.
If no free page is left, a page is selected to be replaced (based on usage).
The replaced page is written to the drum. To minimize the effect of drum latency, the first empty page on the drum was selected.
The page table is updated to point to the new location of the page on the drum.
Linear Page Table
Page Table Entry (PTE) contains:
A bit to indicate if a page exists
PPN (physical page number) for a memory-resident page
DPN (disk page number) for a page on the disk
Status bits for protection and usage
OS sets the Page Table Base Register whenever the active user process changes.
[Figure: the virtual address is split into VPN and offset. The PT Base Register plus the VPN selects a PTE in the page table, whose entries hold either a PPN or a DPN. The PPN combined with the offset addresses the data word in the data pages.]
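A linear page table lookup with the PTE fields listed above might look like the sketch below. The field names, the two exception classes, and the 4 KB page size are illustrative assumptions; status bits for protection and usage are omitted to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_BITS = 12  # assumed 4 KB pages


@dataclass
class PTE:
    exists: bool = False        # does the page exist at all?
    ppn: Optional[int] = None   # physical page number if memory-resident
    dpn: Optional[int] = None   # disk page number if the page is on disk


class InvalidPage(Exception):
    """The virtual page does not exist."""


class PageFault(Exception):
    """The page exists but is on disk; args[0] is its disk page number."""


def translate(vaddr, page_table):
    """page_table is indexed by VPN, as PT base register + VPN would be."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    pte = page_table[vpn]
    if not pte.exists:
        raise InvalidPage(vpn)
    if pte.ppn is None:
        raise PageFault(pte.dpn)
    return (pte.ppn << PAGE_BITS) | offset
```

A table such as `[PTE(True, ppn=7), PTE(True, dpn=42), PTE()]` then maps `0x0ABC` to `0x7ABC`, raises `PageFault` for any address in page 1, and raises `InvalidPage` for page 2.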
Size of Linear Page Table
With 32-bit addresses, 4-KB pages & 4-byte PTEs:
2^20 PTEs, i.e., 4 MB page table per user
4 GB of swap needed to back up full virtual address space
Larger pages?
Internal fragmentation (not all memory in a page is used)
Larger page fault penalty (more time to read from disk)
What about a 64-bit virtual address space?
Even 1MB pages would require 2^44 8-byte PTEs (2^47 bytes = 128 TiB!)
What is the "saving grace"?
The virtual address space is large but only a small fraction of the pages are populated, so we can use a sparse representation of the table.
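The table sizes follow directly from the number of pages times the PTE size, which a short calculation can confirm. The helper name `linear_table_bytes` is hypothetical; note that 2^44 PTEs of 8 bytes each come to 2^47 bytes.

```python
def linear_table_bytes(va_bits, page_bytes, pte_bytes):
    """Size of a linear page table covering the full virtual address space."""
    num_ptes = 2**va_bits // page_bytes
    return num_ptes * pte_bytes


MIB = 2**20
TIB = 2**40

# 32-bit addresses, 4 KB pages, 4-byte PTEs: 2^20 PTEs
print(linear_table_bytes(32, 4096, 4) // MIB)   # 4 (MiB)

# 64-bit addresses, 1 MB pages, 8-byte PTEs: 2^44 PTEs
print(linear_table_bytes(64, MIB, 8) // TIB)    # 128 (TiB)
```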
Hierarchical Page Table
Virtual address: bits 31-22 hold p1 (10-bit L1 index), bits 21-12 hold p2 (10-bit L2 index), and bits 11-0 hold the offset.
[Figure: a processor register holds the root of the current page table. p1 indexes the Level 1 page table, whose entry points to a Level 2 page table; p2 indexes that table, whose entry points to a data page. Legend: page in primary memory, page in secondary memory, PTE of a nonexistent page.]
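The 10/10/12-bit split and the two-level walk can be sketched as below. Nested dictionaries stand in for the Level 1 and Level 2 tables, which is an assumption made to show the sparse representation; a missing entry at either level is treated as a page fault, and the distinction between pages on disk and nonexistent pages is left out.

```python
class PageFault(Exception):
    """No mapping exists at one of the two levels."""


def split(vaddr):
    """Return (p1, p2, offset) for a 32-bit virtual address."""
    p1 = (vaddr >> 22) & 0x3FF   # bits 31-22: Level 1 index
    p2 = (vaddr >> 12) & 0x3FF   # bits 21-12: Level 2 index
    offset = vaddr & 0xFFF       # bits 11-0: offset within the page
    return p1, p2, offset


def walk(root, vaddr):
    """root maps p1 -> Level 2 table; a Level 2 table maps p2 -> PPN."""
    p1, p2, offset = split(vaddr)
    level2 = root.get(p1)
    if level2 is None:
        raise PageFault(vaddr)
    ppn = level2.get(p2)
    if ppn is None:
        raise PageFault(vaddr)
    return (ppn << 12) | offset
```

With `root = {1: {2: 0x55}}`, the address `0x402034` splits into `(1, 2, 0x34)` and walks to `0x55034`. Only the Level 2 tables that are actually populated need to exist.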
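Two-Level Page Tables in Physical Memory
[Figure: the virtual address spaces of User 1 and User 2 each contain an address VA1. Physical memory holds the Level 1 PT of User 1, the Level 1 PT of User 2, a Level 2 PT of User 2, and the data pages User1/VA1 and User2/VA1.]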
Address Translation & Protection
[Figure: a virtual address (virtual page number (VPN), offset) passes through address translation and a protection check that uses the kernel/user mode and the read/write access type. The result is either an exception or a physical address (physical page number (PPN), offset).]
Every instruction and data access needs address translation and protection checks.
A good VM design needs to be fast (~ one cycle) and space efficient.
Translation Lookaside Buffers (TLB)
Address translation is very expensive! In a two-level page table, each reference becomes several memory accesses.
Solution: cache translations in a TLB.
TLB hit: single-cycle translation
TLB miss: page-table walk to refill
[Figure: the virtual address is split into VPN and offset. Each TLB entry holds V, R, W, and D bits, a tag, and a PPN. The VPN is compared against the tags; on a hit, the PPN is combined with the offset to form the physical address. (VPN = virtual page number, PPN = physical page number)]
Notes: 3 memory references; 2 page faults (disk accesses) + ...; actually used in IBM before paged memory.
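The hit and miss paths of a TLB can be modeled in software as a small cache in front of the page-table walk. This is a sketch under several assumptions: the class name and counters are hypothetical, the TLB is fully associative with FIFO replacement, pages are 4 KB, and the `walk` callback stands in for whatever page-table walk refills the TLB.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumed 4 KB pages


class TLB:
    def __init__(self, entries, walk):
        self.entries = entries    # capacity in translations
        self.walk = walk          # callback: VPN -> PPN (the page-table walk)
        self.map = OrderedDict()  # VPN -> PPN, kept in insertion order
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn in self.map:
            self.hits += 1
        else:
            self.misses += 1
            if len(self.map) >= self.entries:
                # FIFO replacement: drop the oldest inserted translation.
                self.map.popitem(last=False)
            self.map[vpn] = self.walk(vpn)
        return (self.map[vpn] << PAGE_BITS) | offset
```

For example, `TLB(2, {0: 5, 1: 9, 2: 3}.__getitem__)` misses on the first access to each page and hits on repeats; touching a third page evicts the translation that was inserted first.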
TLB Designs
Typically 32-128 entries, usually fully associative.
Each entry maps a large page, hence less spatial locality across pages, so it is more likely that two entries conflict.
Sometimes larger TLBs (256-512 entries) are 4-8 way set-associative.
Larger systems sometimes have multi-level (L1 and L2) TLBs.
Random or FIFO replacement policy.
No process information in TLB?
TLB Reach: size of the largest virtual address space that can be simultaneously mapped by the TLB.
Example: 64 TLB entries, 4KB pages, one page per entry.
TLB Reach = 64 entries * 4 KB = 256 KB (if contiguous)
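TLB reach is just the entry count times the page size, so the example is easy to check and to vary. The helper name `tlb_reach` is hypothetical.

```python
KB = 1024


def tlb_reach(entries, page_bytes):
    """Largest amount of virtual address space the TLB can map at once."""
    return entries * page_bytes


print(tlb_reach(64, 4 * KB) // KB)  # 256 (KB)
```

The same function shows why larger pages extend reach: keeping 64 entries but mapping 1 MB pages gives `tlb_reach(64, 1024 * KB)`, which is 64 MB.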
Handling a TLB Miss
Software (MIPS, Alpha): a TLB miss causes an exception, and the operating system walks the page tables and reloads the TLB. A privileged "untranslated" addressing mode is used for the walk.
Hardware (SPARC v8, x86, PowerPC, RISC-V): a memory management unit (MMU) walks the page tables and reloads the TLB. If a missing (data or PT) page is encountered during the TLB reloading, the MMU gives up and signals a Page-Fault exception for the original instruction.
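Hierarchical Page Table Walk: SPARC v8
Virtual address fields, with boundaries at bits 31, 23, 17, 11, and 0: Index 1, Index 2, Index 3, Offset.
[Figure: starting from the Context Table Register, the walk reads the root pointer from the Context Table. Index 1 selects a PTP in the L1 table, Index 2 selects a PTP in the L2 table, and Index 3 selects the PTE in the L3 table. The PTE supplies the PPN, which is combined with the offset to form the physical address.]
MMU does this table walk in hardware on a TLB miss.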
Page-Based Virtual-Memory Machine (Hardware Page-Table Walk)
[Figure: pipeline (PC, D, E, M, W). The instruction virtual address goes through the Inst. TLB to the Inst. Cache, and the data virtual address goes through the Data TLB to the Data Cache; each TLB can signal a page fault or protection violation. On a TLB miss, the hardware page table walker, using the Page-Table Base Register, issues physical addresses to the memory controller and main memory (DRAM).]
Assumes page tables held in untranslated physical memory.
Modern Virtual Memory Systems
Illusion of a large, private, uniform store
Protection & Privacy: several users, each with their private address space and one or more shared address spaces (page table ≡ name space)
Demand Paging: provides the ability to run programs larger than the primary memory; hides differences in machine configurations
The price is address translation on each memory reference.
[Figure: the OS and user i share primary memory, backed by a swapping store; a mapping, cached in the TLB, takes VA to PA.]
Note: portability on machines with different memory configurations.
Address Translation: putting it all together
[Flowchart:
Virtual Address -> TLB Lookup (hardware)
TLB Lookup, hit -> Protection Check (hardware)
TLB Lookup, miss -> Page Table Walk (hardware or software)
Page Table Walk, the page is ∉ memory -> Page Fault (OS loads page; software) -> restart instruction
Page Table Walk, the page is ∈ memory -> Update TLB
Protection Check, denied -> Protection Fault (software) -> SEGFAULT
Protection Check, permitted -> Physical Address (to cache)]
Notes: need to restart instruction; soft and hard page faults.
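The decision chain of TLB lookup, page table walk, and protection check can be written out as one function. This is a simplified sketch: plain dictionaries stand in for the TLB and the page table, each entry is an assumed `(ppn, writable)` pair, only write protection is checked, and the exceptions mark the points where a real system would trap to the OS and later restart the instruction.

```python
PAGE_BITS = 12  # assumed 4 KB pages


class PageFault(Exception):
    """Page is not in memory; the OS would load it and restart."""


class ProtectionFault(Exception):
    """Access denied; typically reported to the program as a segfault."""


def translate(vaddr, is_write, tlb, page_table):
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    entry = tlb.get(vpn)                 # TLB lookup
    if entry is None:                    # miss: page table walk
        entry = page_table.get(vpn)
        if entry is None:                # page not in memory
            raise PageFault(vpn)
        tlb[vpn] = entry                 # update TLB
    ppn, writable = entry
    if is_write and not writable:        # protection check
        raise ProtectionFault(vpn)
    return (ppn << PAGE_BITS) | offset   # physical address (to cache)
```

Going straight from the TLB update to the protection check gives the same result as re-running the lookup, which would now hit.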
Page Fault Handler
When the referenced page is not in DRAM:
The missing page is located (or created).
It is brought in from disk, and the page table is updated. Another job may be run on the CPU while the first job waits for the requested page to be read from disk.
If no free pages are left, a page is swapped out (pseudo-LRU replacement policy).
Since it takes a long time to transfer a page (msecs), page faults are handled completely in software by the OS.
An untranslated addressing mode is essential to allow the kernel to access page tables.
Handling VM-related exceptions
[Figure: pipeline (PC, D, E, M, W) with Inst. TLB, Inst. Cache, Decode, Data TLB, and Data Cache. Both TLBs can signal a TLB miss, page fault, or protection violation.]
Handling a TLB miss needs a hardware or software mechanism to refill the TLB.
Handling a page fault (e.g., page is on disk) needs a restartable exception so the software handler can resume after retrieving the page.
Precise exceptions are easy to restart.
Exceptions can be imprecise but restartable, but this complicates OS software.
Handling a protection violation may abort the process, but it is often handled the same as a page fault.
Address Translation in CPU Pipeline
[Figure: pipeline (PC, D, E, M, W) with Inst. TLB, Inst. Cache, Decode, Data TLB, and Data Cache. Both TLBs can signal a TLB miss, page fault, or protection violation.]
Need to cope with the additional latency of the TLB:
slow down the clock?
pipeline the TLB and cache access?
virtual address caches
parallel TLB/cache access
Virtual-Address Caches
[Figure: conventional organization: CPU -> (VA) -> TLB -> (PA) -> Physical Cache -> Primary Memory.]
Alternative: place the cache before the TLB.
[Figure: CPU -> (VA) -> Virtual Cache -> TLB -> (PA) -> Primary Memory (StrongARM).]
one-step process in case of a hit (+)
cache needs to be flushed on a context switch unless address space identifiers (ASIDs) are included in tags (-)
aliasing problems due to the sharing of pages (-)
maintaining cache coherence (-) (see later in course)
Virtually Addressed Cache (Virtual Index/Virtual Tag)
[Figure: pipeline (PC, D, E, M, W). The Inst. Cache and Data Cache are accessed directly with virtual addresses. On a cache miss, the Inst. TLB or Data TLB, refilled by the hardware page table walker using the Page-Table Base Register, produces the physical address sent to the memory controller and main memory (DRAM) for instructions and data.]
Translate on miss.
VM features track historical uses:
Bare machine, only physical addresses
  One program owned entire machine
Batch-style multiprogramming
  Several programs sharing CPU while waiting for I/O
  Base & bound: translation and protection between programs (not virtual memory)
  Problem with external fragmentation (holes in memory), needed occasional memory defragmentation as new jobs arrived
Time sharing
  More interactive programs, waiting for user. Also, more jobs/second.
  Motivated move to fixed-size page translation and protection, no external fragmentation (but now internal fragmentation, wasted bytes in page)
  Motivated adoption of virtual memory to allow more jobs to share limited physical memory resources while holding working set in memory
Virtual Machine Monitors
  Run multiple operating systems on one machine
  Idea from 1970s IBM mainframes, now common on laptops, e.g., run Windows on top of Mac OS X
  Hardware support for two levels of translation/protection: Guest OS virtual -> Guest OS physical -> Host machine physical
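The two levels of translation used under a virtual machine monitor compose naturally: the output of the guest's mapping is the input to the host's. The sketch below assumes 4 KB pages at both levels and uses dictionaries from page number to page number for the two tables; all names are hypothetical, and real hardware uses multi-level tables at each stage.

```python
PAGE_BITS = 12  # assumed 4 KB pages at both levels


def translate(addr, table):
    """One level of translation: table maps page number -> page number."""
    page = addr >> PAGE_BITS
    offset = addr & ((1 << PAGE_BITS) - 1)
    return (table[page] << PAGE_BITS) | offset


def guest_virtual_to_host_physical(gva, guest_table, host_table):
    # Guest OS virtual -> Guest OS physical
    gpa = translate(gva, guest_table)
    # Guest OS physical -> Host machine physical
    return translate(gpa, host_table)
```

With `guest_table = {3: 1}` and `host_table = {1: 0x20}`, guest virtual address `0x3ABC` becomes guest physical `0x1ABC` and then host physical `0x20ABC`.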
Virtual Memory Use Today - 1
Servers/desktops/laptops/smartphones have full demand-paged virtual memory
  Portability between machines with different memory sizes
  Protection between multiple users or multiple tasks
  Share small physical memory among active tasks
  Simplifies implementation of some OS features
Vector supercomputers have translation and protection but rarely complete demand-paging
(Older Crays: base & bound; Japanese & Cray X1/X2: pages)
  Don't waste expensive CPU time thrashing to disk (make jobs fit in memory)
  Mostly run in batch mode (run set of jobs that fits in memory)
  Difficult to implement restartable vector instructions
Virtual Memory Use Today - 2
Most embedded processors and DSPs provide physical addressing only
  Can't afford area/speed/power budget for virtual memory support
  Often there is no secondary storage to swap to!
  Programs custom written for particular memory configuration in product
  Difficult to implement restartable instructions for exposed architectures
Acknowledgements
These slides contain material developed and copyright by:
Arvind (MIT)
Krste Asanovic (MIT/UCB)
Joel Emer (Intel/MIT)
James Hoe (CMU)
John Kubiatowicz (UCB)
David Patterson (UCB)
MIT material derived from course 6.823
UCB material derived from course CS252