Geiger: Monitoring the Buffer Cache in a Virtual Machine Environment
Stephen T. Jones, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Department of Computer Sciences
Buffer Cache
In modern OSes, the file system buffer cache and the virtual memory system are unified
–When a file is first accessed, its data is buffered in a memory page
–Under memory pressure, a page is evicted
  If the page is dirty, it is first written back to swap space or the file system
  Then the page can be reused
–Later, if the data is needed again, a page fault occurs
  A free page is allocated and the data is reloaded from disk
(A small code sketch of this behavior follows)
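To make the background concrete, here is a minimal sketch of a unified cache under memory pressure. All names (UnifiedCache, access) are hypothetical, not from the talk; it only illustrates LRU eviction with write-back of dirty pages and reload on a later access.

```python
# Minimal sketch (hypothetical names): a fixed pool of pages caches disk
# blocks; when the pool is full, the LRU victim is evicted (written back
# first if dirty), and a later access to that block "faults" and reloads it.
from collections import OrderedDict

class UnifiedCache:
    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.pages = OrderedDict()              # block -> dirty flag, LRU order

    def access(self, block, write=False):
        if block in self.pages:                 # hit: refresh LRU position
            self.pages[block] |= write
            self.pages.move_to_end(block)
            return "hit"
        if len(self.pages) >= self.num_pages:   # memory pressure: evict LRU page
            victim, dirty = self.pages.popitem(last=False)
            if dirty:
                pass                            # write back to swap/file system first
        self.pages[block] = write               # page fault: reload from disk
        return "miss"

cache = UnifiedCache(num_pages=2)
print([cache.access(b) for b in ["A", "B", "A", "C", "B"]])
# ['miss', 'miss', 'hit', 'miss', 'miss']  -- B was evicted by C, then reloaded
```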
Useful Information About the Buffer Cache
If the VMM knows about eviction/promotion events, it can
–Tell whether the guest OS is thrashing and how much additional memory would prevent it
–Guide eviction-based cache placement
  Exclusive cache: on a hit, the data item is removed from the cache
A transparent secondary cache may be desirable
–E.g. a 32-bit OS running on a host with 16 GB of memory
Why does an exclusive cache work?
–Normally, once the OS reads a page from disk, it will not read it again without evicting it first
–Exclusivity therefore increases overall cache utilization
Services in a VMM
The VMM layer is an attractive development target
–Security (isolation from the OS and applications)
–Portability (transparent to the OS)
Our target services
–VMM-driven eviction-based cache placement
  Increases the hit ratio of remote storage caches
  Transparent to the guest OS
–Working set size estimation for thrashing VMs
  Complements the ESX Server technique
VMM Services Need Information
Information about the guest operating system
For our target services
–Information about the OS buffer cache
This information is hidden from the VMM
–Layered design approach
–Narrow interface (virtual architecture)
Geiger Monitors the Buffer Cache
A virtual machine monitor extension
Implicitly observes buffer cache events
–Uses only information intrinsically available to the VMM
–An explicit approach is possible, but has drawbacks
No guest OS modifications required
–Applicable to closed-source and legacy OSes
Accurate (usually less than 5% error)
Low cost (usually less than 3% overhead)
Enables service implementation in the VMM
Outline
Geiger approach
New Geiger techniques
Evaluation
Application
Buffer Cache Events
Cache promotion
–Disk block inserted into the buffer cache
Cache demotion
–Disk block removed from the cache
Detecting Promotion
Disk reads and writes are visible to the VMM
Associated Disk Location (ADL): the disk block last transferred through each buffer cache page
[Diagram: block read and block write operations move data between the disk, the buffer cache, and a user process; the ADL table records which disk block each cache page holds]
Detecting Demotion
Detect when a page is removed from the cache
–The VMM cannot observe a page free directly
–Instead, look for page reuse: if a cache page's data is reused, the page was logically freed in the interim
–Reuse inconsistent with the ADL -> eviction
[Diagram: a buffer cache page whose contents no longer match its ADL has been evicted]
Read / Write Evictions
–Read eviction
  A non-free page is reused to read from a different disk location
  E.g. reading a large file or memory region
–Write eviction
  A non-free page is reused for writing; the reuse (eviction) is detected only when the page is written back, so detection lags the actual eviction
(A sketch of the ADL-based detection follows)
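The following sketch is a hypothetical illustration of the ADL heuristic described on the previous slides, not Geiger's actual code. Names (adl, on_disk_read, on_disk_write, notify) are mine.

```python
# Minimal sketch (hypothetical names) of the ADL heuristic: the VMM sees only
# disk I/O, so it remembers the Associated Disk Location (ADL) last transferred
# through each guest page frame. A later transfer through the same frame for a
# *different* location implies the previously cached block was evicted.
adl = {}          # page frame number -> disk location cached in that frame

def on_disk_read(pfn, location, notify):
    if pfn in adl and adl[pfn] != location:
        notify("read-eviction", adl[pfn])     # frame reused for another block
    adl[pfn] = location                       # promotion: block now cached here

def on_disk_write(pfn, location, notify):
    if pfn in adl and adl[pfn] != location:
        notify("write-eviction", adl[pfn])    # detected late, at write-back time
    adl[pfn] = location

events = []
notify = lambda kind, block: events.append((kind, block))
on_disk_read(pfn=7, location="blk-A", notify=notify)   # promote A into frame 7
on_disk_read(pfn=7, location="blk-B", notify=notify)   # frame reused -> A evicted
print(events)    # [('read-eviction', 'blk-A')]
```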
Existing Techniques
Promotion via reads and writes
Demotion via reads and writes
Chen et al., USENIX 2003
–Implemented within the OS (pseudo device driver)
Initial basis for Geiger
Outline
Geiger approach
New Geiger techniques
Evaluation
Application
New Geiger Techniques
Other ways buffer cache pages are evicted
–Unified buffer cache/virtual memory system
–Non-I/O allocations cause eviction
Two new eviction detection heuristics
–Copy-on-write
–Anonymous allocation
When Does Eviction Happen?
Explicit eviction
–Read eviction
–Write eviction
Implicit eviction
–A non-free page is reused without any disk read or write
  Page allocation or copy-on-write
–E.g. when a process requests a new page, a non-dirty cache page may be reclaimed to satisfy it
Detecting Allocation Eviction
Page not-present fault -> page allocation (possible reuse) -> new writable mapping -> detect eviction -> invalidate ADL
[Diagram: a buffer cache page with a valid ADL is remapped writable for a newly allocated user page, signaling an eviction]
(A sketch of this fault-path heuristic follows)
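A minimal, hypothetical sketch of the idea, assuming the VMM can hook the resolution of not-present faults; the names (adl, on_new_writable_mapping) are mine, not Geiger's.

```python
# Minimal sketch (hypothetical names): allocation/copy-on-write evictions leave
# no disk I/O behind, so detection hooks the page-fault path instead. When the
# guest maps a frame writable for a freshly allocated (anonymous or COW) page,
# any block whose ADL still points at that frame must have been evicted.
adl = {7: "blk-A"}                 # frame 7 still caches block A, as far as we know

def on_new_writable_mapping(pfn, events):
    """Called when a not-present fault is resolved by mapping `pfn` writable
    for a newly allocated or copy-on-write page."""
    block = adl.pop(pfn, None)     # invalidate the stale ADL, if any
    if block is not None:
        events.append(("alloc/cow-eviction", block))

events = []
on_new_writable_mapping(pfn=7, events=events)   # guest reuses frame 7 for new data
print(events)   # [('alloc/cow-eviction', 'blk-A')]
```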
Filesystem Issues
Filesystem features cause false positives
Filesystem blocks can be deleted
–Leads to dangling ADLs and spurious evictions
Journaling causes aliasing
–The same cache page is written to both the journal and its filesystem location
–Interferes with the write-eviction heuristic
Geiger Is Filesystem Aware
Uses static filesystem information
–Journal location and size
–Block allocation bitmaps
Ignores writes to the journal
Tracks allocation bitmap updates and invalidates ADLs when blocks are deallocated
Significantly reduces Geiger's false positives
(A sketch of this filtering follows)
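A small sketch of how such filtering might look, under the assumption that the journal extent and bitmap contents are known; the names and the journal location are hypothetical.

```python
# Minimal sketch (hypothetical names) of filesystem-aware filtering: writes that
# land inside the journal extent are ignored (they alias data blocks), and
# writes to block-allocation bitmaps are diffed so that ADLs for newly freed
# blocks can be invalidated instead of producing spurious evictions.
JOURNAL_START, JOURNAL_BLOCKS = 1000, 128        # from static filesystem metadata

def in_journal(location):
    return JOURNAL_START <= location < JOURNAL_START + JOURNAL_BLOCKS

def bitmap_update(old_bits, new_bits, first_block, adl):
    """Invalidate ADLs for blocks that flipped from allocated (1) to free (0)."""
    for i, (old, new) in enumerate(zip(old_bits, new_bits)):
        if old == 1 and new == 0:
            block = first_block + i
            stale_frames = [pfn for pfn, loc in adl.items() if loc == block]
            for pfn in stale_frames:
                del adl[pfn]       # block is dead; its reuse is not an eviction

adl = {3: 2048, 4: 2049}
bitmap_update([1, 1], [1, 0], first_block=2048, adl=adl)
print(in_journal(1005), adl)       # True {3: 2048}  (block 2049 was deallocated)
```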
Block Liveness
Reusing a free page is not an eviction
–Geiger infers the liveness of a page from the liveness of its associated block
A block dies when
–A file is deleted or truncated
–A process terminates (its swap-space blocks are freed)
Block Liveness for Files
Detect file block deallocation by observing writes to filesystem metadata (e.g. the superblock)
+: The metadata lives at known, special disk locations
–: The OS caches it in memory and syncs it to disk only every 30 seconds or more
Solution: mark the pages caching this metadata read-only
–Write attempts then cause page faults
–Geiger intercepts the faults and invalidates the affected ADLs
Block Liveness for Swap Space
No on-disk structure tracks block usage
–When a disk block is written from a different memory page, the original block is considered dead
–Maintain a reverse mapping between blocks and ADLs
–Invalidate ADLs when blocks are overwritten
–If a dead block is never overwritten, its death cannot be detected
  Leads to as much as 37% false-positive evictions
(A sketch of the reverse-mapping heuristic follows)
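A hypothetical sketch of the reverse-mapping idea; the names (adl, rmap, on_swap_write) are mine and the structure is simplified to one owner per swap block.

```python
# Minimal sketch (hypothetical names) of swap-block liveness: keep a reverse
# map from swap blocks to the frame that last wrote them. If a block is later
# written from a *different* frame, the old copy is dead, so reusing the old
# frame should not be reported as an eviction.
adl = {}        # frame -> swap block
rmap = {}       # swap block -> frame that currently owns it

def on_swap_write(pfn, block, events):
    prev = rmap.get(block)
    if prev is not None and prev != pfn:
        adl.pop(prev, None)                  # old frame's block is now dead
        events.append(("block-dead", block, prev))
    rmap[block] = pfn
    adl[pfn] = block

events = []
on_swap_write(pfn=1, block="swap-9", events=events)
on_swap_write(pfn=2, block="swap-9", events=events)   # overwritten from elsewhere
print(events)   # [('block-dead', 'swap-9', 1)]
```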
Outline
Geiger approach
New Geiger techniques
Evaluation
Application
Evaluation Goals
Measure Geiger accuracy
–Missed evictions (false negatives)
–Spurious evictions (false positives)
Measure Geiger timeliness
–Lag between the actual event and its detection
Experimental Environment
Xen VMM [Barham et al., SOSP '03]
–Extensions to observe page faults, page table updates, and I/O requests/completions
Linux 2.4 and 2.6 guests
Microbenchmarks
–Isolate specific eviction types
–Read, write, COW, allocation
Application benchmarks
–Dbench, Mogrify, TPC-W, SPC disk trace
Eviction Detection Accuracy

Workload      False Neg %   False Pos %
Read Evict    0.96          0.58
Write Evict   1.68          0.03
COW Evict     2.47          1.45
Alloc Evict   0.17
Eviction Detection Lag
[Chart: distribution of detection lag; lag is roughly 3 seconds]
Application Accuracy

Workload   Geiger Opt           False Neg %   False Pos %
Dbench     w/o block liveness   1.10          30.23
Dbench     w/ block liveness    2.30          5.72
Mogrify    w/o block liveness   0.05          22.99
Mogrify    w/ block liveness    0.65          2.46
TPC-W                           0.14          3.12
SPC Web2                        2.24          0.32
Outline
Geiger approach
New Geiger techniques
Evaluation
Application
–Eviction-based cache placement
Application: Eviction-Based Cache Placement
Disk cache utilization is critical to performance
–Storage servers have large caches
–Demand-based placement => poor utilization
Increase cache utilization via exclusivity
–Use client cache evictions as placement hints [Chen et al., USENIX '03; Wong and Wilkes, USENIX '02]
Our approach: use VMM-based, implicit eviction information to inform a remote storage cache
–No client or OS storage interface changes
(A sketch of eviction-based placement follows)
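To illustrate the placement policy, here is a minimal sketch of a second-tier cache driven by eviction hints; the class and method names (ExclusiveStorageCache, on_client_eviction) are hypothetical and not from the talk.

```python
# Minimal sketch (hypothetical names) of eviction-based placement in a remote
# storage cache: instead of caching every demand read, the cache inserts a
# block only when the VMM reports that the client (guest) evicted it, and it
# drops the block again on a hit, keeping the two cache tiers mostly exclusive.
from collections import OrderedDict

class ExclusiveStorageCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()            # block -> data, in LRU order

    def read(self, block):
        data = self.blocks.pop(block, None)    # exclusive: a hit removes the item
        return data                            # None means fetch from disk

    def on_client_eviction(self, block, data):
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)    # evict the LRU of the second tier
        self.blocks[block] = data              # place on eviction, not on demand

cache = ExclusiveStorageCache(capacity=2)
cache.on_client_eviction("blk-A", b"...")      # hint produced by Geiger
print(cache.read("blk-A") is not None, cache.read("blk-A"))   # True None
```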
Cache Placement Results
Geiger outperforms demand placement
–Mogrify: the buffer approach misses too many evictions
–Mogrify: Geiger's false positives are fortuitous
–Dbench: detection lag causes the in-OS approach to outperform Geiger
[Chart: second-tier cache hit rates for demand, OS-driven, and Geiger placement; annotations: 13%, 51%]
Outline
Geiger approach
New Geiger techniques
Evaluation
Application
–Eviction-based cache placement
–Working set size estimator
LRU Miss Ratio Curve
[Figure: pages kept in an LRU queue; a hit histogram associated with each LRU position is accumulated and converted into a fault (miss ratio) curve]
Application: Working Set Size Estimator
MemRx: observe evictions and reloads, then compute a miss ratio curve
WSS = current memory allocation + LRU-based estimate of the additional memory needed
Only works when the WSS exceeds the current memory allocation
(A sketch of the estimate follows)
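A hypothetical sketch of the MemRx idea, assuming each reload reports its depth in a ghost LRU list of evicted pages; the function name and parameters are mine, and real MemRx works incrementally rather than offline.

```python
# Minimal sketch (hypothetical names): evicted pages sit in a ghost LRU list;
# when a page is reloaded, its depth in that list says how much *extra* memory
# would have kept it resident. The hit histogram over depths yields a miss
# ratio curve, and the smallest extra allocation that drives the remaining
# misses (near) zero is added to current memory to estimate the WSS.
def working_set_estimate(current_mem_pages, reload_depths, max_extra):
    histogram = [0] * (max_extra + 1)          # hits gained at each extra-page count
    for depth in reload_depths:                # depth = ghost-LRU position at reload
        if depth <= max_extra:
            histogram[depth] += 1
    floor = sum(1 for d in reload_depths if d > max_extra)   # unavoidable misses
    misses = len(reload_depths)
    for extra in range(max_extra + 1):         # walk the miss ratio curve
        misses -= histogram[extra]
        if misses <= floor:                    # curve has flattened out
            return current_mem_pages + extra
    return current_mem_pages + max_extra

# Reloads observed at ghost-LRU depths 1..3: about 3 extra pages would avoid them.
print(working_set_estimate(current_mem_pages=32, reload_depths=[1, 2, 3, 3], max_extra=8))
# 35
```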
Estimation Results: Microbenchmarks
The virtual machine is configured with 128 MB of memory
Each benchmark accesses 256 MB of file data or memory
–FS: file access
–VM: memory access
Estimation Results: Applications
Summary
System services in a VMM need information about the guest OS
Geiger provides implicit information about the buffer cache
–No guest OS modifications
–Accurate
–Low overhead
It enables services and optimizations in the VMM
–Eviction-based cache placement
–Working set size estimation
Computer Sciences Department
Advanced Systems Laboratory