Miseon Han; Thomas W. Barr, Alan L. Cox, Scott Rixner; Rice Computer Architecture Group, Rice University; ISCA, June 2011
Motivation
Virtual memory
– Performance overhead of 5-14% for ‘typical’ applications [Bhargava08]
– 89% under virtualization [Bhargava08]
– Large pages are not always a good solution
What page size to pick?
– 4KB, 2MB, 1GB on x86
Can’t always use the largest size
– Wasted memory
– Increased I/O traffic
Dynamic page size selection
SpecTLB (Speculative TLB)
– A hardware/software system
Reservation-based physical memory allocator [Talluri94]
– Allocates small pages by default to maintain fine-grained control
Predict small-page translations in hardware
– Performance of large pages, control of small pages
Background
Four-level radix-tree page table
Virtual address 0x5c8315cc2016, split by bit field:
[47:39]  [38:30]  [29:21]  [20:12]  [11:0]
{0b9, 00c, 0ae, 0c2, 016}
Physical address: {123, 016}
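The field split on this slide can be reproduced with a few shifts and masks. The following is a minimal sketch, assuming the 48-bit x86-64 layout shown above (four 9-bit indices and a 12-bit offset); the function name `split_virtual_address` is hypothetical and used only for illustration.

```python
def split_virtual_address(va: int) -> tuple:
    """Split a 48-bit virtual address into four 9-bit table indices and a 12-bit offset."""
    offset = va & 0xFFF  # bits [11:0]
    # Bits [47:39], [38:30], [29:21], [20:12]: one index per page table level.
    indices = [(va >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return (*indices, offset)


fields = split_virtual_address(0x5C8315CC2016)
print("{" + ", ".join(f"{f:03x}" for f in fields) + "}")  # {0b9, 00c, 0ae, 0c2, 016}
```

Each index selects one of 512 entries in the table at its level, which is why every field except the offset is 9 bits wide.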
Page table levels describe the physical address space at different granularities: 512GB, 1GB, 2MB, 4KB
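These granularities follow from 512 entries per table and a 4KB base page. A short sketch of the arithmetic, assuming levels are numbered from 1 (leaf) to 4 (root); that numbering and the name `entry_coverage` are choices made here for illustration, not taken from the slides.

```python
PAGE_SIZE = 4 * 1024        # 4KB base page
ENTRIES_PER_TABLE = 512     # 9 index bits per level


def entry_coverage(level: int) -> int:
    """Bytes of address space described by one entry at `level` (1 = leaf, 4 = root)."""
    return PAGE_SIZE * ENTRIES_PER_TABLE ** (level - 1)


for level in (4, 3, 2, 1):
    print(level, entry_coverage(level))
```

Under these assumptions the four values are 512GB, 1GB, 2MB, and 4KB, matching the slide.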
Reservation-based memory allocation [Talluri94]
– Always allocate small pages at first, recording them in a book-keeping entry
– Place these small pages in a large-page ‘reservation’ if the handler decides that a reservation is needed
– Promote the reservation to a large page when all small pages in the reservation are allocated
– Extended and implemented in FreeBSD [Navarro02]
Default memory allocator in FreeBSD
Handler reserves a 2MB region of physical space.
Reservation is ‘promoted’ into a large page.
Reservations may not be filled.
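The reserve, fill, and promote steps can be sketched as a small model. This is an illustrative sketch only, not the FreeBSD implementation: the `Reservation` class, its fields, and the use of a set for book-keeping are all assumptions made for the example.

```python
SMALL_PAGE = 4 * 1024
PAGES_PER_RESERVATION = 512  # 2MB reservation / 4KB small pages


class Reservation:
    """Hypothetical model of one 2MB-aligned physical region set aside for small pages."""

    def __init__(self, phys_base: int):
        if phys_base % (SMALL_PAGE * PAGES_PER_RESERVATION) != 0:
            raise ValueError("reservation must be 2MB-aligned")
        self.phys_base = phys_base
        self.allocated = set()   # indices of small pages handed out so far
        self.promoted = False

    def allocate(self, page_index: int) -> int:
        """Back small page `page_index` of the region and return its physical address."""
        if not 0 <= page_index < PAGES_PER_RESERVATION:
            raise IndexError("page index outside reservation")
        self.allocated.add(page_index)
        # Promotion happens only once every small page in the region is allocated.
        if len(self.allocated) == PAGES_PER_RESERVATION:
            self.promoted = True
        return self.phys_base + page_index * SMALL_PAGE
```

Because each small page is placed at its aligned position inside the region, a partially filled reservation already has the property that the physical address of any page in it is the reservation base plus the page's offset.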
SpecTLB
TLB-like structure
– Tracks reservations, not actual mappings
– Detect reservations
– Predict translations
– Verify predictions
Virtual Address              Physical Address
{0b9, 00c, 0ae, 002, 313}    {8002, 313}
{0b9, 00c, 0ae, 000, 000}    {8000, 000}
Current Reservations: {8000, 000}
Virtual Address              Physical Address
{0b9, 00c, 0ae, 005, 313}    {8005, 313}?
{0b9, 00c, 0ae, 000, 000}    {8000, 000}
Current Reservations: {8000, 000} ?
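The prediction in this example is the reservation's physical base plus the offset of the address within its 2MB region. A minimal software sketch of that lookup, assuming a dictionary keyed by the 2MB-aligned virtual base; the `SpecTLB` class and its `track` and `predict` methods are hypothetical names, and the real structure is hardware.

```python
from typing import Optional

LARGE_PAGE = 2 * 1024 * 1024


class SpecTLB:
    """Hypothetical model: maps 2MB-aligned virtual bases to reservation physical bases."""

    def __init__(self):
        self.reservations = {}

    def track(self, virt_base: int, phys_base: int) -> None:
        """Record a detected reservation."""
        self.reservations[virt_base] = phys_base

    def predict(self, va: int) -> Optional[int]:
        """Return a speculative physical address, or None if no reservation is tracked."""
        virt_base = va & ~(LARGE_PAGE - 1)
        phys_base = self.reservations.get(virt_base)
        if phys_base is None:
            return None
        return phys_base + (va - virt_base)


spec = SpecTLB()
spec.track(0x5C8315C00000, 0x8000000)          # {0b9, 00c, 0ae, 000, 000} -> {8000, 000}
print(hex(spec.predict(0x5C8315C05313)))       # 0x8005313, i.e. {8005, 313}
```

The result is only a guess: the page at index 005 may not have been allocated yet, which is why the prediction carries a question mark.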
Provides predicted translations for pages within tracked reservations
Predictions may be incorrect
– Page table must still be walked
Page walk can occur in parallel with speculative execution
– Latency is hidden
– Speculative translation can be used concurrently
– Microarchitecture cancels speculative work if the prediction turns out to be wrong
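The verify step can be illustrated with a sequential model of what the hardware overlaps in time. This is a sketch under stated assumptions: `resolve_tlb_miss`, the `page_walk` callback, and the outcome strings are hypothetical, and a `None` result from the walk stands in for an unmapped page.

```python
from typing import Callable, Optional, Tuple


def resolve_tlb_miss(
    va: int,
    predicted: Optional[int],
    page_walk: Callable[[int], Optional[int]],
) -> Tuple[Optional[int], str]:
    """Compare a speculative translation against the authoritative page walk."""
    # In hardware the walk proceeds while speculative work uses `predicted`;
    # here it is simply called first.
    actual = page_walk(va)
    if predicted is None:
        return actual, "no prediction"   # the miss pays the full walk latency
    if predicted == actual:
        return actual, "commit"          # speculative work is kept, walk latency hidden
    return actual, "squash"              # speculative work is cancelled


# Toy page table with a single mapped address.
page_table = {0x5C8315C05313: 0x8005313}
print(resolve_tlb_miss(0x5C8315C05313, 0x8005313, page_table.get))
```

The page walk result is always the one returned, so a wrong prediction costs discarded work but never produces a wrong translation.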
Simulation & Results
Full-system simulator, unmodified FreeBSD kernel
[Table: TLB miss rate (per 1k DRAM accesses), speculative prediction frequency, prediction accuracy, and DRAM accesses overlapped, for each benchmark: PostgreSQL, python, SPECjbb, bzip, gcc, mcf, dc.B, ep.C]
SpecTLB and TLB prefetching both hide the latency of TLB misses
– SpecTLB: exploits large-page reservations; targets the current TLB miss
– TLB prefetcher: exploits access patterns; targets future TLB misses
Speculative work
– SpecTLB: instructions execute in parallel with confirmation of the translation
– TLB prefetcher: prefetches page table entries
TLB prefetcher generally hides fewer walks than SpecTLB
– Prefetcher does well with high access regularity
[Table: TLB miss rate under SpecTLB and under the TLB prefetcher, for each benchmark: PostgreSQL, python, SPECjbb, bzip, gcc, mcf, dc.B, ep.C]
SpecTLB hides the latency of TLB misses
– Predictions allow the page walk to occur in parallel with speculative work
– >62% of TLB miss latencies hidden for the majority of benchmarks