Computer Architecture 2011 – VM x86 1 Computer Architecture Virtual Memory (VM) – x86 By Dan Tsafrir, 30/5/2011 Presentation based on slides by Lihu Rappoport
Reminder: VM motivation
VM provides:
– The illusion of a large memory
– The illusion of contiguity
– The ability to overcommit memory
– Process isolation
Reminder: the page table translates VA => PA
[Figure: the page table is indexed by virtual page number; each entry has a valid bit and points to either a frame in physical memory or a disk address]
Think of it as a hash table that maps VA to PA.
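To make the hash-table analogy concrete, here is a minimal Python sketch of a flat, single-level page table stored in a dict keyed by virtual page number. It assumes 4KB pages; the names `translate` and `PageFault` are hypothetical, and real x86 page tables are hierarchical, as the following slides show.

```python
PAGE_SIZE = 4096  # assumption: 4KB pages


class PageFault(Exception):
    """Raised when the entry for a page is missing or not valid."""


def translate(page_table, va):
    """Translate a virtual address using a flat page table.

    page_table: dict mapping virtual page number -> (valid, frame_number).
    """
    vpn, offset = divmod(va, PAGE_SIZE)               # split VA into VPN + offset
    valid, frame = page_table.get(vpn, (False, None))
    if not valid:
        raise PageFault(f"page fault at {va:#x}")     # page not in physical memory
    return frame * PAGE_SIZE + offset
```

For example, with VPN 1 mapped to frame 5, `translate({1: (True, 5)}, 0x1234)` returns 0x5234.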
Reminder: the TLB accelerates translation
The TLB is a VA => PA cache: it holds recently used translations, so most accesses skip the page-table lookup.
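A minimal sketch of the TLB as a small cache in front of the page table. It assumes a fully associative TLB with LRU replacement (real TLB organizations vary); the class `TLB` and its hit/miss counters are hypothetical, for illustration only.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumption: 4KB pages


class TLB:
    """Tiny fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, page_table, entries=4):
        self.page_table = page_table  # vpn -> frame number (valid pages only)
        self.entries = entries        # TLB capacity
        self.cache = OrderedDict()    # vpn -> frame, ordered by recency
        self.hits = self.misses = 0

    def translate(self, va):
        vpn, offset = divmod(va, PAGE_SIZE)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)          # mark as most recently used
        else:
            self.misses += 1
            frame = self.page_table[vpn]         # slow path; KeyError stands in for a page fault
            if len(self.cache) >= self.entries:
                self.cache.popitem(last=False)   # evict the least recently used entry
            self.cache[vpn] = frame
        return self.cache[vpn] * PAGE_SIZE + offset
```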
Reminder: VM concepts
A page can be:
– Not yet loaded
– Loaded
– On disk
A loaded page can be:
– Dirty
– Clean
When a page is not loaded (P bit clear), a page fault occurs
– It may require evicting a loaded page to make room for the new one
  The OS prioritizes eviction using LRU and the dirty/clean/available bits
  A dirty page must be written back to disk; a clean page need not be
– The new page is either loaded from disk or “initialized”
– The CPU sets the page's “accessed” flag when it is accessed, and its “dirty” flag when it is written
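One common way to combine the accessed and dirty bits when choosing a page to evict is the Not Recently Used (NRU) heuristic sketched below: prefer pages not accessed recently, and among those prefer clean pages, since only dirty ones cost a disk write. This is an illustrative policy, not necessarily what any specific OS does; `Page`, `pick_victim`, and `evict` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Page:
    vpn: int
    accessed: bool  # mirrors the A flag
    dirty: bool     # mirrors the D flag


def pick_victim(pages):
    """NRU-style choice: not-accessed before accessed, clean before dirty."""
    return min(pages, key=lambda p: (p.accessed, p.dirty))  # False sorts before True


def evict(pages, write_back):
    victim = pick_victim(pages)
    if victim.dirty:
        write_back(victim)   # only dirty pages must be written to disk
    pages.remove(victim)
    return victim
```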
Goal
In the context of x86, provide a method to map
– From: virtual address (used by the program)
– To: physical address
The method should be efficient
– Can generally be carried out by HW alone
– Typically with no SW involvement
32-BIT X86 REGULAR PAGING
Hierarchical translation
x86 supports 4KB & 4MB pages
– Q: why would we want a 4MB page (called a “super-page”)?
– A: the TLB is small, so a larger page lets each TLB entry cover more memory
Page directory
– Each process has its own page directory (threads of a process share it)
  CR3 points to the page directory of the current process
– Holds 1024 PDEs (page-directory entries), each 32 bits
– Each PDE contains a PS (“page size”) flag
  PS=1: the PDE points directly to a 4MB (super)page
  PS=0: the PDE points to a “page table” whose entries point to 4KB pages
Page table
– Holds 1024 PTEs (page-table entries), each 32 bits
– Each PTE points to a 4KB page in physical memory
Mapping only 4KB pages (typical)
2-level hierarchy
– All pages are 4KB aligned
– A total of 2^20 (=1M) 4KB pages = 4GB
DIR (10 bits)
– Points to a PDE in the page directory
– We assume all PDEs have PS=0
– => Each PDE provides the 20-bit, 4KB-aligned base physical address of a 4KB page table (no superpaging)
TABLE (10 bits)
– Points to a PTE in the page table
– The PTE provides the 20-bit, 4KB-aligned base physical address of a 4KB page
OFFSET (12 bits)
– Offset within the selected 4KB page
[Figure: 32-bit linear address split into DIR (bits 31:22), TABLE (bits 21:12), OFFSET (bits 11:0); CR3 (PDBR) → 4KB page directory with 1K PDEs → 4KB page table with 1K PTEs → 4KB data page; physical address = 20-bit base + 12-bit offset = 32 bits]
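The DIR/TABLE/OFFSET split and the two-level walk can be sketched in Python as follows. Physical memory is modelled as a dict from 4-byte-aligned physical addresses to 32-bit entries; this toy model and the function names are assumptions for illustration, and only the present (P) flag is checked.

```python
def split_4k(va):
    """Split a 32-bit linear address into DIR (10) | TABLE (10) | OFFSET (12)."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF


def walk_4k(mem, cr3, va):
    """Two-level walk, assuming every PDE has PS=0.

    mem: dict mapping 4-byte-aligned physical addresses to 32-bit entries.
    cr3: physical base of the page directory (bits 31:12).
    """
    dir_idx, table_idx, offset = split_4k(va)
    pde = mem[(cr3 & 0xFFFFF000) + dir_idx * 4]     # each PDE is 4 bytes
    if not pde & 0x1:                               # P flag clear
        raise LookupError(f"page fault at {va:#010x} (PDE not present)")
    pte = mem[(pde & 0xFFFFF000) + table_idx * 4]   # PDE bits 31:12 = page-table base
    if not pte & 0x1:
        raise LookupError(f"page fault at {va:#010x} (PTE not present)")
    return (pte & 0xFFFFF000) | offset              # PTE bits 31:12 = page base
```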
Mapping only 4MB pages
1-level hierarchy
– All pages are 4MB aligned
– A total of 2^10 (=1K) 4MB pages = 4GB
DIR (10 bits)
– Points to a PDE in the page directory
– We assume all PDEs have PS=1
– => Each PDE provides the 10-bit, 4MB-aligned base physical address of a 4MB page
TABLE (10 bits)
– None! (its bits are moved into the offset)
OFFSET (22 bits)
– Offset within the selected 4MB page
Fine print
– The PSE flag in CR4 must be set for 4MB support to work
– Otherwise, PS=1 settings are ignored
[Figure: 32-bit linear address split into DIR (bits 31:22) and OFFSET (bits 21:0); CR3 (PDBR) → 4KB page directory with 1K PDEs → 4MB data page; physical address = 10-bit base + 22-bit offset = 32 bits]
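The 4MB case needs a single lookup: bits 31:22 of the PDE supply the page base and the low 22 bits of the linear address supply the offset. A sketch, assuming physical memory is modelled as a dict of 32-bit entries keyed by physical address and that CR4.PSE=1; the function name is hypothetical.

```python
def walk_4m(mem, cr3, va):
    """One-level walk, assuming CR4.PSE=1 and every PDE has PS=1."""
    dir_idx = (va >> 22) & 0x3FF                     # DIR = bits 31:22
    pde = mem[(cr3 & 0xFFFFF000) + dir_idx * 4]      # each PDE is 4 bytes
    if not pde & 0x1:                                # P flag clear
        raise LookupError(f"page fault at {va:#010x}")
    return (pde & 0xFFC00000) | (va & 0x3FFFFF)      # 10-bit base + 22-bit offset
```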
Mixing 4KB & 4MB pages
Works “out of the box”
– When CR4.PSE=1
– Alignment constraints: 4MB for superpages, 4KB for regular pages
TLB issues?
– No, as the CPU keeps 4MB and 4KB translations in separate TLBs
Benefits
– Superpages are often used for frequently used kernel code
– Frees up 4KB TLB entries
– Reduces TLB misses => improves overall system performance
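A sketch of a 32-bit walk that mixes both page sizes. Physical memory is modelled as a dict of 32-bit entries keyed by physical address (an assumption), and CR4.PSE is passed in as a hypothetical `cr4_pse` parameter: PS is honored only when it is set, otherwise the PDE is treated as pointing to a page table.

```python
P = 1 << 0    # present flag
PS = 1 << 7   # page-size flag (PDEs only)


def walk_mixed(mem, cr3, va, cr4_pse=True):
    """Return (physical_address, page_size) for a 32-bit linear address."""
    pde = mem[(cr3 & 0xFFFFF000) + ((va >> 22) & 0x3FF) * 4]
    if not pde & P:
        raise LookupError("page fault (PDE not present)")
    if cr4_pse and pde & PS:                          # PDE maps a 4MB superpage
        return (pde & 0xFFC00000) | (va & 0x3FFFFF), "4MB"
    pte = mem[(pde & 0xFFFFF000) + ((va >> 12) & 0x3FF) * 4]
    if not pte & P:
        raise LookupError("page fault (PTE not present)")
    return (pte & 0xFFFFF000) | (va & 0xFFF), "4KB"  # regular 4KB page
```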
PDE & PTE format
20-bit physical address
– 4K-aligned pointer
12 bits of flags
– Virtual memory: present, accessed, dirty
– Protection: read, write, user, privileged
– Caching: WB, WT, disable
– 3 bits for OS use
[Figure: Page directory entry: page-frame address (bits 31:12), AVAIL (available for OS use), reserved 0, PS (0: 4KB), reserved 0, A (accessed), PCD (cache disable), PWT (write-through), U (user), W (writable), P (present, bit 0). Page table entry: page-frame address (bits 31:12), AVAIL, two reserved bits (should be zero), D (dirty), A, PCD, PWT, U, W, P]
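A small decoder for a 32-bit PDE/PTE, using the standard bit positions (P = bit 0 through G = bit 8, AVAIL = bits 11:9). Bit 7 means PS in a PDE and PAT in a 4KB-page PTE, so the sketch reports it under a combined name. The function and dictionary names are hypothetical.

```python
FLAG_BITS = {
    "P": 0, "R/W": 1, "U/S": 2, "PWT": 3, "PCD": 4,
    "A": 5, "D": 6, "PS/PAT": 7, "G": 8,
}


def decode_entry(entry):
    """Decode a 32-bit PDE/PTE into base address, flags, and OS-available bits."""
    flags = {name: bool((entry >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return {
        "base": entry & 0xFFFFF000,   # bits 31:12, 4KB-aligned frame address
        "avail": (entry >> 9) & 0x7,  # bits 11:9, free for OS use
        "flags": flags,
    }
```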
4KB-page PTE format
[Figure: bits 31:12 page base address; bits 11:9 AVAIL (available for OS use); bit 8 G (global page); bit 7 PAT (page attribute table index); bit 6 D (dirty); bit 5 A (accessed); bit 4 PCD (cache disable); bit 3 PWT (write-through); bit 2 U/S (user/supervisor); bit 1 R/W (writable); bit 0 P (present)]
4KB-page PDE format
[Figure: bits 31:12 page-table base address; bits 11:9 AVAIL (available for OS use); bit 8 G (global page, ignored); bit 7 PS (0 indicates 4KB); bit 6 AVL (ignored); bit 5 A (accessed); bit 4 PCD (cache disable); bit 3 PWT (write-through); bit 2 U/S (user/supervisor); bit 1 R/W (writable); bit 0 P (present)]
4MB-page PDE format
[Figure: bits 31:22 page base address; bits 21:13 reserved; bit 12 PAT (page attribute table index); bits 11:9 AVAIL (available for OS use); bit 8 G (global page); bit 7 PS (1 indicates 4MB); bit 6 D (dirty); bit 5 A (accessed); bit 4 PCD (cache disable); bit 3 PWT (write-through); bit 2 U/S (user/supervisor); bit 1 R/W (writable); bit 0 P (present)]
VM attributes: present flag (P)
Set => the page is in physical memory
– Translation is carried out by the MMU (memory management unit)
Clear => the page is not in physical memory
– When the MMU encounters it => it generates a page-fault exception
– The faulting address is available to the SW exception handler (x86 places it in CR2)
The MMU does not set/clear this flag (it only reads it)
– It's up to the OS
Upon a page-fault exception => the OS typically does the following:
1. Copies the page from disk to memory (unless it is already in the buffer cache)
2. Updates the PTE/PDE with the page's RAM address
3. Sets P = 1; dirty = accessed = 0; etc.
4. Invalidates the associated PTE in the TLB
5. Resumes the program at the faulting instruction
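The five OS steps above can be sketched as a toy handler. It assumes a free frame is always available (no eviction), models the disk and RAM as dicts and the TLB as a set of cached virtual page numbers; `PTE` and `handle_page_fault` are hypothetical names, not an OS API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PTE:
    disk_block: int                 # where the page lives on disk
    present: bool = False           # P flag
    frame: Optional[int] = None     # physical frame number once loaded
    accessed: bool = False          # A flag
    dirty: bool = False             # D flag


def handle_page_fault(vpn, page_table, disk, ram, free_frames, tlb):
    pte = page_table[vpn]
    frame = free_frames.pop()                # assumption: a free frame exists
    ram[frame] = disk[pte.disk_block]        # 1. copy page from disk to memory
    pte.frame = frame                        # 2. update PTE with the RAM address
    pte.present = True                       # 3. P = 1; dirty = accessed = 0
    pte.dirty = pte.accessed = False
    tlb.discard(vpn)                         # 4. invalidate the TLB entry
    return "retry"                           # 5. resume at the faulting instruction
```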
VM attributes: page size flag (PS)
In PDEs only
Determines the page size
– Clear => page size = 4KB (& the PDE points to a page table)
– Set => page size = 4MB (& the PDE points to a superpage)
VM attributes: accessed (A) & dirty (D)
The MMU sets the A flag
– The first time a page (or page table) is accessed (load or store)
The MMU sets the D flag
– The first time a page is written (store only)
A & D are sticky
– Once set, the MMU (=HW) never clears them
– Only SW does
The OS clears them
– When initially loading the PTE
– Possibly from time to time as part of an LRU approximation (used to decide which pages to swap out and which to keep)
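A common LRU approximation built on the A flag is the clock (second-chance) algorithm, sketched below. The list `accessed` stands in for the A flags of the resident pages, and the sweep clears them, which is the SW-only operation described above. This is one illustrative policy, not the one any particular OS is known to use; `clock_select` is a hypothetical name.

```python
def clock_select(accessed, hand):
    """Second-chance (clock) victim selection driven by the A flag.

    accessed[i] mirrors the A flag of resident page i. A page whose A flag
    is still clear when the hand reaches it has not been touched since the
    previous sweep and is chosen as the victim.
    Returns (victim_index, new_hand).
    """
    n = len(accessed)
    while True:
        if accessed[hand]:
            accessed[hand] = False   # second chance: SW clears A
            hand = (hand + 1) % n
        else:
            return hand, (hand + 1) % n
```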
VM attributes: global flag (G)
Has effect only when PGE=1 in CR4
When set, indicates the page is “global”
– Not flushed from the TLB when CR3 is loaded
– Ignored for PDEs with PS=0 (those that point to page tables)
Used to improve performance
– Keeps important OS pages in the TLB across context switches
Only software can set or clear this flag
Cache attributes: PWT
PWT
– Means “page-level write-through”
Controls the write-through / write-back caching policy of a page / PT
– 1: write-through caching enabled
– 0: write-through disabled => write-back caching enabled
Ignored if
– The CD (“cache disable”) flag in CR0 is set
– The associated PCD flag is set
Cache attributes: PCD
PCD
– Means “page-level cache disable” flag
Controls caching of individual pages / PTs
– 1: caching of the associated page/PT is prevented
– 0: caching allowed
Used
– When caching doesn't help performance (e.g., streaming)
– For memory-mapped I/O ports used to communicate with devices
Treated as set (regardless of its actual value)
– If the CD (“cache disable”) flag in CR0 is set
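The interplay of CR0.CD, PCD, and PWT described on these two slides can be summarized as a small decision function. This is a deliberately simplified model: it ignores PAT and the MTRRs, which also affect the final memory type on real hardware, and the function name is hypothetical.

```python
def page_cache_policy(cr0_cd, pcd, pwt):
    """Simplified caching policy for a page (ignores PAT and MTRRs).

    CR0.CD=1 makes PCD act as set; PCD=1 disables caching and PWT is then
    ignored; otherwise PWT selects write-through (1) or write-back (0).
    """
    if cr0_cd or pcd:
        return "uncached"
    return "write-through" if pwt else "write-back"
```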
Cache attributes: PAT
PAT
– Means “page attribute table index” flag
Used together with the PCD & PWT flags (as a 3-bit index) to select an entry in the PAT
– Which in turn selects the memory type for the page
– The PAT is a 64-bit register
– (Not going into the details)
Protection attributes: R/W & U/S
Read/write (R/W) flag
– Specifies read-write privileges for a page (in a PTE) or a group of pages (in a PDE)
– 0 = read only
– 1 = read & write
User/supervisor (U/S) flag
– Specifies privileges for a page (in a PTE) or a group of pages (in a PDE that points to a page table)
– 0 = supervisor privilege level
– 1 = user privilege level
– A user-mode access to a supervisor page triggers a page-fault exception
  Typically resulting in the termination of the program
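A simplified permission check for a 4KB page: the access must be allowed by R/W (bit 1) and U/S (bit 2) in both the PDE and the PTE. The sketch models user-mode accesses and assumes CR0.WP=0 for supervisor mode (so supervisor writes to read-only pages succeed); those assumptions, and the names `check_access` and `ProtectionFault`, are for illustration only.

```python
class ProtectionFault(Exception):
    """Stands in for the page-fault exception raised on a protection violation."""


def check_access(pde, pte, user, write):
    """Return True if allowed; raise ProtectionFault otherwise (assumes CR0.WP=0)."""
    rw = (pde >> 1) & (pte >> 1) & 1   # writable only if both levels allow it
    us = (pde >> 2) & (pte >> 2) & 1   # user-accessible only if both levels allow it
    if user and not us:
        raise ProtectionFault("user access to supervisor page")
    if user and write and not rw:
        raise ProtectionFault("user write to read-only page")
    return True
```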
Misc issues
Memory aliasing/sharing
– When two (or more) PDEs point to a common page table
– When two (or more) PTEs point to a common page
– SW must then maintain consistency of the accessed & dirty bits in these PDEs & PTEs
Base address of the page directory
– The physical address of the current page directory is stored in CR3
  Also called the page-directory base register (PDBR)
– The PDBR is typically reloaded upon task switches
– The page directory must remain in memory as long as the task is active
X86 EXTENDED PAGING
PAE – Physical Address Extension
A 32-bit address imposes a limit
– We can use at most 2^32 bytes = 4GB of memory
– Too small for many systems
PAE (physical address extension) support
– Allows access to up to 2^36 bytes of RAM (= 64GB)
– But not directly (linear addresses remain 32-bit)
Only applicable when paging is enabled
– And PAE is also turned on in CR4
– Supports 4KB and 2MB pages (rather than 4MB)
PAE – Physical Address Extension
Relies on an additional page-directory-pointer table
– Lies above the page directory in the translation hierarchy
– Has 4 entries of 64 bits each, to support up to 4 page directories
– PDEs and PTEs are increased to 64 bits to accommodate 36-bit base physical addresses
– Each 4KB page directory and page table can thus hold up to 512 entries
– CR3 contains the page-directory-pointer-table base address
4KB Page Mapping with PAE
The linear address is divided into:
– Page-directory-pointer-table entry (2 bits)
  Indexed by bits 31:30 of the linear address
  Selects one of the 4 entries in the page-directory-pointer table
  The selected entry provides the base physical address of a page directory
– Dir (9 bits): points to a PDE in the page directory
  PS in the PDE = 0
  The PDE provides a 24-bit, 4KB-aligned base physical address of a page table
– Table (9 bits): points to a PTE in the page table
  The PTE provides a 24-bit, 4KB-aligned base physical address of a 4KB page
– Offset (12 bits): offset within the selected 4KB page
[Figure: linear address split into PDPT index (bits 31:30), DIR (bits 29:21), TABLE (bits 20:12), OFFSET (bits 11:0); CR3 (PDPTR, 32-byte aligned) → page-directory-pointer table → 512-entry page directory → 512-entry page table → 4KB data page]
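The PAE 4KB split and walk can be sketched as below. Entries are 8 bytes, CR3 holds a 32-byte-aligned PDPT address, and base addresses occupy bits 35:12 (24 bits). Physical memory is modelled as a dict of 64-bit entries keyed by physical address (an assumption), present checks are omitted for brevity, and the names are hypothetical.

```python
ADDR_MASK = 0xF_FFFF_F000   # bits 35:12 of an entry (36-bit physical addresses)


def split_pae_4k(va):
    """PDPT index (2) | DIR (9) | TABLE (9) | OFFSET (12)."""
    return ((va >> 30) & 0x3, (va >> 21) & 0x1FF,
            (va >> 12) & 0x1FF, va & 0xFFF)


def walk_pae_4k(mem, cr3, va):
    pdpt_i, dir_i, table_i, off = split_pae_4k(va)
    pdpte = mem[(cr3 & 0xFFFFFFE0) + pdpt_i * 8]   # PDPT is 32-byte aligned
    pde = mem[(pdpte & ADDR_MASK) + dir_i * 8]     # 8-byte entries from here on
    pte = mem[(pde & ADDR_MASK) + table_i * 8]
    return (pte & ADDR_MASK) | off                 # may exceed 4GB
```

Note how the resulting physical address can lie above 4GB even though the linear address is only 32 bits.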
2MB Page Mapping with PAE
The linear address is divided into:
– Page-directory-pointer-table entry (2 bits)
  Indexed by bits 31:30 of the linear address
  Selects one of the 4 entries in the page-directory-pointer table
  The selected entry provides the base physical address of a page directory
– Dir (9 bits): points to a PDE in the page directory
  PS in the PDE = 1
  The PDE provides a 15-bit, 2MB-aligned base physical address of a 2MB page
– Offset (21 bits): offset within the selected 2MB page
[Figure: linear address split into PDPT index (bits 31:30), DIR (bits 29:21), OFFSET (bits 20:0); CR3 (PDPTR, 32-byte aligned) → page-directory-pointer table → page directory → 2MB data page]
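For a 2MB page the PAE walk stops at the PDE, whose bits 35:21 (15 bits) supply the page base. A sketch, assuming physical memory is a dict of 64-bit entries keyed by physical address; only the PS flag is checked, and the function name is hypothetical.

```python
def walk_pae_2m(mem, cr3, va):
    """PAE walk for a 2MB page: PDPT index (2) | DIR (9) | OFFSET (21)."""
    pdpte = mem[(cr3 & 0xFFFFFFE0) + ((va >> 30) & 0x3) * 8]
    pde = mem[(pdpte & 0xF_FFFF_F000) + ((va >> 21) & 0x1FF) * 8]
    if not pde & (1 << 7):
        raise ValueError("PS=0: PDE points to a page table, not a 2MB page")
    return (pde & 0xF_FFE0_0000) | (va & 0x1FFFFF)   # 15-bit base + 21-bit offset
```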
PTE/PDE/PDP Entry Format with PAE
The major differences in these entries are:
– A page-directory-pointer-table entry is added
– The size of the entries is increased from 32 bits to 64 bits
– The maximum number of entries in a page directory or page table is 512
– The base physical address field in each entry is extended to 24 bits
Paging in 64-bit Mode
PAE paging structures are expanded
– Potentially supporting the mapping of a 64-bit linear address to a 52-bit physical address
– The first implementation supports mapping a 48-bit linear address into a 40-bit physical address
A 4th page-mapping table is added: the page map level 4 table (PML4)
– The base physical address of the PML4 is stored in CR3
– A PML4 entry contains the base physical address of a page-directory-pointer table
The page-directory-pointer table is expanded to 512 eight-byte entries
– Indexed by 9 bits of the linear address
The PDE/PTE tables remain 512 eight-byte entries
– Each indexed by nine linear-address bits
The linear address thus uses 48 bits in total (4 × 9 index bits + 12 offset bits)
The PS flag in PDEs selects between 4KB and 2MB page sizes
– The CR4.PSE bit is ignored
4KB Page Mapping in 64-bit Mode
[Figure: sign-extended linear address split into PML4 (bits 47:39), PDP (bits 38:30), DIR (bits 29:21), TABLE (bits 20:12), OFFSET (bits 11:0), 9 bits per index; CR3 (40-bit, 4KB-aligned base) → PML4 table → page-directory-pointer table → 512-entry page directory → 512-entry page table → 4KB data page]
2MB Page Mapping in 64-bit Mode
[Figure: sign-extended linear address split into PML4 (bits 47:39), PDP (bits 38:30), DIR (bits 29:21), OFFSET (bits 20:0); CR3 (40-bit, 4KB-aligned base) → PML4 table → page-directory-pointer table → page directory → 2MB data page]
PTE/PDE/PDP/PML4 Entry Format – 4KB Pages
TLBs
The processor caches the most recently used PDEs and PTEs in TLBs
– Separate TLBs for data and instructions
– Separate TLBs for 4KB and 2/4MB page sizes
The OS, running at privilege level 0, can invalidate TLB entries
– The INVLPG instruction invalidates the TLB entry for a specific page
  This instruction ignores the setting of the G flag
– Whenever a PDE/PTE is changed (including when the present flag is cleared), the OS must invalidate the corresponding TLB entry
– All non-global TLB entries are automatically invalidated when CR3 is loaded
The global (G) flag prevents frequently used pages from being automatically invalidated on a task switch
– The entry remains in the TLB until it is evicted or explicitly invalidated
– INVLPG can invalidate a global page entry (toggling CR4.PGE also flushes the whole TLB, global entries included)
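The flush rules on this slide can be modelled with a toy TLB. The class and method names are hypothetical; `load_cr3` takes a `pge_enabled` flag standing in for CR4.PGE, since the G flag only has an effect when PGE is set.

```python
class GlobalAwareTLB:
    """Toy TLB modelling the CR3-load and INVLPG rules (illustrative only)."""

    def __init__(self):
        self.entries = {}   # vpn -> (frame, global_flag)

    def fill(self, vpn, frame, global_flag=False):
        self.entries[vpn] = (frame, global_flag)

    def load_cr3(self, pge_enabled=True):
        """Loading CR3 flushes every entry except global ones (if CR4.PGE=1)."""
        self.entries = {v: e for v, e in self.entries.items()
                        if pge_enabled and e[1]}

    def invlpg(self, vpn):
        """INVLPG drops the entry for one page regardless of its G flag."""
        self.entries.pop(vpn, None)
```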