1 Virtual Memory
2 Outline Case analysis –Pentium/Linux Memory System –Core i7 Suggested reading: 9.7
3 bus interface unit DRAM external system bus (e.g. PCI) instruction fetch unit L1 i-cache L2 cache cache bus L1 d-cache inst TLB data TLB processor package 32 bit address space 4 KB page size L1, L2, and TLBs 4-way set associative inst TLB 32 entries 8 sets data TLB 64 entries 16 sets L1 i-cache and d-cache 16 KB 32 B line size 128 sets L2 cache unified 128 KB -- 2 MB 32 B line size P6 Memory System
4 P6 Address Translation
5 Page directory – byte page directory entries (PDEs) that point to page tables –one page directory per process –page directory must be in memory when its process is running –always pointed to by PDBR Page tables: – byte page table entries (PTEs) that point to pages –page tables can be paged in and out page directory... Up to 1024 page tables 1024 PTEs 1024 PTEs 1024 PTEs PDEs P6 Page Table
6 Page table physical base addrAvailGPSACDWTU/SR/WP=1 Page table physical base address: 20 most significant bits of physical page table address (forces page tables to be 4KB aligned) Avail: available for system programmers G: global page (don’t evict from TLB on task switch) PS: page size 4K (0) or 4M (1) A: accessed (set by MMU on reads and writes, cleared by software) CD: cache disabled (1) or enabled (0) WT: write-through or write-back cache policy for this page table U/S: user or supervisor mode access R/W: read-only or read-write access P: page table is present in memory (1) or not (0) Available for OS (page table location in secondary storage)P= P6 page directory entry (PDE)
7 Page physical base addressAvailG0DACDWTU/SR/WP=1 Page base address: 20 most significant bits of physical page address (forces pages to be 4 KB aligned) Avail: available for system programmers G: global page (don’t evict from TLB on task switch) D: dirty (set by MMU on writes) A: accessed (set by MMU on reads and writes) CD: cache disabled or enabled WT: write-through or write-back cache policy for this page U/S: user/supervisor R/W: read/write P: page is present in physical memory (1) or not (0) Available for OS (page location in secondary storage)P= P6 page table entry (PTE)
8 PDE PDBR physical address of page table base (if P=1) physical address of page base (if P=1) physical address of page directory word offset into page directory word offset into page table page directorypage table VPN1 10 VPO 1012 VPN2 Virtual address PTE PPNPPO Physical address word offset into physical and virtual page Page tables Translation
9 P6 TLB Translation
10 TLB entry (not all documented, so this is speculative): –V: indicates a valid (1) or invalid (0) TLB entry –PD: is this entry a PDE (1) or a PTE (0)? –tag: disambiguates entries cached in the same set –PDE/PTE: page directory or page table entry Structure of the data TLB: –16 sets, 4 entries/set PDE/PTETagPDV entry... set 0 set 1 set 2 set 15 P6 TLB
11 1. Partition VPN into TLBT and TLBI. 2. Is the PTE for VPN cached in set TLBI? 3. Yes: then build physical address. 4. No: then read PTE (and PDE if not cached) from memory and build physical address. CPU VPNVPO 2012 TLBTTLBI 416 virtual address PDEPTE... TLB miss TLB hit page table translation PPNPPO 2012 physical address Translating with the P6 TLB
12 P6 Page Table Translation
13 Case 1/1: page table and page present. MMU Action: –MMU build physical address and fetch data word. OS action –none VPN VPN1VPN2 PDE PDBR PPNPPO VPO 12 p=1PTEp=1 Data page data Page directory Page table Mem Disk Translating with the P6 page tables (case 1/1)
14 Case 1/0: page table present but page missing. MMU Action: –page fault exception –handler receives the following args: VA that caused fault fault caused by non-present page or page-level protection violation read/write user/supervisor VPN VPN1VPN2 PDE PDBR 20 VPO 12 p=1PTE Page directory Page table Mem Disk Data page data p=0 Translating with the P6 page tables (case 1/0)
15 OS Action: –Check for a legal virtual address. –Read PTE through PDE. –Find free physical page (swapping out current page if necessary) –Read virtual page from disk and copy to virtual page –Restart faulting instruction by returning from exception handler. VPN VPN1VPN2 PDE PDBR 20 VPO 12 p=1PTEp=1 Page directory Page table Data page data PPNPPO 2012 Mem Disk Translating with the P6 page tables (case 1/0, cont)
16 Case 0/1: page table missing but page present. Introduces consistency issue. –potentially every page out requires update of disk page table. Linux disallows this –if a page table is swapped out, then swap out its data pages too. VPN VPN1VPN2 PDE PDBR 20 VPO 12 p=0 PTEp=1 Page directory Page table Mem Disk Data page data Translating with the P6 page tables (case 0/1)
17 Case 0/0: page table and page missing. MMU Action: –page fault exception VPN VPN1VPN2 PDE PDBR 20 VPO 12 p=0 PTE Page directory Page table Mem Disk Data page datap=0 Translating with the P6 page tables (case 0/0)
18 OS action: –swap in page table. –restart faulting instruction by returning from handler. Like case 0/1 from here on. VPN VPN1VPN2 PDE PDBR 20 VPO 12 p=1PTE Page directory Page table Mem Disk Data page data p=0 Translating with the P6 page tables (case 0/0, cont)
19 P6 L1 Cache Access
20 Partition physical address into CO, CI, and CT. Use CT to determine if line containing word at address PA is cached in set CI. If no: check L2. If yes: extract word at byte offset CO and return to processor. physical address (PA) data CTCO 205 CI 7 L2 and DRAM L1 (128 sets, 4 lines/set) L1 hit L1 miss L1 cache access
21 Core i7 –The 64-bit Nehalem microarchitecture –Current support 48-bit (256 TB) virtual address space and 52-bit (4 PB) physical address space –Compatibility mode support 32-bit (4 GB) virtual and physical address spaces Core i7 Summary
22 Processor package Core i7 Memory System Core x 4 Registers Instruction fetch L1 d-cache 32 KB,8-way L1 i-cache 32 KB,8-way L2 unified cache 256 KB, 8-way L3 unified cache 8 MB, 16-way (shared by all cores) MMU (addr translation) L1 d-TLB 64 entries, 4-way L1 i-TLB 64 entries, 4-way L2 unified TLB 512 entries, 4-way QuickPath interconnect GB/s GB/s total DDR3 memory controller 3 x GB/s 32 GB/s total (shared by all cores) Main memory To other cores To I/O bridge
CPU VPN VPO Virtual address (VA) L1 TLB (16 sets, 4 entries/set) PPN PPO TLB miss TLB hit physical address (PA) result 32/64 CT L2, L3, and main memory L1 d-cache (64 sets, 8 lines/set) L1 hit L1 miss TLBT TLBI Core i7 Address Translation CI CO VPN1 VPN2 VPN3 VPN4 PTE CR3 Page Tables 9 999
24 Page table physical base addrUnusedGPSACDWTU/SR/WP=1 XD Disable or enable instruction fetches Base addr 40 most significant bits of base address of child page table (forces page tables to be 4KB aligned) G global page (don’t evict from TLB on task switch) PS Page size either 4K(0) or 4M(1) (defined for Level 1 PTEs only) A Reference bit (set by MMU on reads and writes, cleared by software) CD Cache disabled(1) or enabled(0) for child page table WT Write-through or write-back cache policy U/S User or supervisor(kernel) mode access permission R/W Read-only or read-write access permissiom P Child page table present in memory(1) or not(0) Available for OS (page table location in secondary storage)P=0 Level 1, Level 2 and Level 3 Page Table Entry Unused 52 XD 6263
25 Page table physical base addrUnusedGDACDWTU/SR/WP=1 XD Disable or enable instruction fetches Base addr 40 most significant bits of base address of child page table (forces page tables to be 4KB aligned) G global page (don’t evict from TLB on task switch) D Dirty bit (Set by MMU on writes, cleared by software) A Reference bit (set by MMU on reads and writes, cleared by software) CD Cache disabled(1) or enabled(0) for child page table WT Write-through or write-back cache policy U/S User or supervisor(kernel) mode access permission R/W Read-only or read-write access permissiom P Child page table present in memory(1) or not(0) Available for OS (page table location in secondary storage)P=0 Level 4 Page Table Entry Unused 52 XD 6263
26 Physical address of L1 PT physical address of page 512 GB region per entry L1 PT Page global directory 12 Virtual address Physical address Offset into physical and virtual page Page tables Translation VPN 1VPN 2VPN 3VPN 4VPO 9999 PPO L1 PTE CR3 1 GB region per entry L2 PT Page upper directory L1 PTE 2 MB region per entry L3 PT Page middle directory L1 PTE 4 KB region per entry L4 PT Page directory L1 PTE PPO
Cute Trick for Speeding Up L1 Access Observation –Bits that determine CI identical in virtual and physical address –Can index into cache while address translation taking place –Generally we hit in TLB, so PPN bits (CT bits) available next –“Virtually indexed, physically tagged” –Cache carefully sized to make this possible Physical address (PA) CTCO 366 CI 6 Virtual address (VA) VPNVPO 3612 PPOPPN Address Translation No Change CI L1 Cache CT Tag Check
28 Process-specific data structures (e.g. task and mm structs, page tables, kernel stack) Process-specific data structures (e.g. task and mm structs, page tables, kernel stack) Physical memory Kernel code and data User stack Memory mapped region for shared libraries Run-time heap (via malloc) Uninitialized data (.bss) Initialized data (.data) Program text (.text) Kernel virtual memory Process virtual memory Different for each process Identical for each process %esp brk x (32) x (64) Linux Virtual Memory System
29 Next Memory Mapping Suggested reading: 9.7, 9.8