1
Fast and Correct Performance Recovery of Operating Systems Using a Virtual Machine Monitor
Kenichi Kourai
Kyushu Institute of Technology, Japan
2
Recovery by OS Reboot
OS reboot is a final but powerful recovery technique
  For recovery from OS crashes
    Against Mandelbugs
    A rebooted OS rarely crashes again
  For software rejuvenation
    Against aging-related bugs
    A rebooted OS restores its normal state
[Figure labels: OS crash, memory leak, reboot, recovered]
3
Performance Degradation
OS reboot degrades the performance of file accesses
  Disk access increases due to frequent cache misses
    The page cache in memory is lost
    It takes a long time to refill the page cache
  Disk access also degrades the performance of the other virtual machines (VMs)
[Figure labels: page cache, reboot, slow, disk, VM]
4
Performance Recovery Required
OS recovery does not complete until the performance is also recovered
  Traditional OS reboot restores only the functionalities
  Fast reboot techniques have been proposed...
5
Warm-cache Reboot
A new OS recovery mechanism with fast performance recovery
  It preserves the page cache during OS reboot
    An OS can reuse it after the reboot
  It guarantees the consistency of the page cache
    Using the virtual machine monitor (VMM)
[Figure labels: CacheMind, VMM, VM, page cache, reboot, discard, corrupted cache]
6
Reusing the Page Cache
Collaboration between an OS and the VMM
  cmLinux registers cache information with the VMM
  On reboot, the VMM re-allocates the same memory
  cmLinux reserves the memory for the old page cache
  cmLinux searches the old page cache before disk reads
[Figure labels: CacheMind, VMM, cmLinux, register, page cache, reboot, re-allocate, old page cache, meta data]
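The lookup order on this slide (current page cache, then the old page cache, then the disk) can be illustrated with a small model. This is a minimal sketch and not cmLinux code: `OldCache`, `read_block`, the dictionary-based disk, and the `stats` counters are hypothetical names, and registration with the VMM is reduced to a table that is assumed to survive the simulated reboot.

```python
class OldCache:
    """Hypothetical stand-in for cache information registered with the VMM."""

    def __init__(self):
        # (file_id, block_no) -> page contents; assumed to survive a reboot
        self.table = {}

    def register(self, file_id, block_no, data):
        self.table[(file_id, block_no)] = data

    def lookup(self, file_id, block_no):
        return self.table.get((file_id, block_no))


def read_block(file_id, block_no, page_cache, old_cache, disk, stats):
    """Serve a block from the page cache, then the old cache, then the disk."""
    key = (file_id, block_no)
    if key in page_cache:
        stats["hit"] += 1
        return page_cache[key]
    data = old_cache.lookup(file_id, block_no)
    if data is not None:
        stats["reused"] += 1          # old page reused, no disk access
    else:
        data = disk[key]              # cache miss everywhere: go to disk
        stats["disk"] += 1
        old_cache.register(file_id, block_no, data)
    page_cache[key] = data
    return data


disk = {("f", 0): b"aaaa", ("f", 1): b"bbbb"}
old_cache = OldCache()
stats = {"hit": 0, "reused": 0, "disk": 0}

page_cache = {}
read_block("f", 0, page_cache, old_cache, disk, stats)   # first access: disk

page_cache = {}                                          # simulated OS reboot
after_reboot = read_block("f", 0, page_cache, old_cache, disk, stats)
read_block("f", 1, page_cache, old_cache, disk, stats)   # never cached: disk
```

In this model the block read before the reboot is served without a disk access afterwards, while a block that was never cached still goes to the disk.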
7
Cache Consistency
Only consistent cache pages are reused
  Our definition: a cache page is consistent if its contents are the same as those on disk
    Consistent when a file block is read from a disk
    Inconsistent when the cache page is modified
    Consistent when it is written back to a disk
[Figure labels: disk, cmLinux, page cache, read, modify, write back]
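The definition on this slide can be written down directly as a comparison between a cache page and its disk block. The following is a minimal sketch under that definition only; `CachePage` and its method names are hypothetical and do not come from CacheMind.

```python
class CachePage:
    """Toy cache page backed by one block of a dictionary-based disk."""

    def __init__(self, disk, block_no):
        self.disk = disk
        self.block_no = block_no
        self.data = None              # nothing cached yet

    def read_from_disk(self):
        self.data = self.disk[self.block_no]

    def modify(self, data):
        self.data = data              # page now differs from the disk block

    def write_back(self):
        self.disk[self.block_no] = self.data

    @property
    def consistent(self):
        # Consistent iff the page holds the same contents as the disk block
        return self.data is not None and self.data == self.disk[self.block_no]


disk = {7: b"v1"}
page = CachePage(disk, 7)

page.read_from_disk()
after_read = page.consistent          # same as disk

page.modify(b"v2")
after_modify = page.consistent        # differs from disk

page.write_back()
after_write_back = page.consistent    # same as disk again
```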
8
Reusability Management (Read)
The VMM makes a cache page reusable after it reads data from a disk
  It protects the page before the read
    To detect page corruption during the read
    The VMM can still write data to the page
[Figure labels: cmLinux, VMM, disk, read request, protect, read, possible corruption, reusable]
9
Reusability Management (Write)
The VMM makes a cache page non-reusable when an OS modifies its contents
  It unprotects the page at the same time
    To enable the OS to modify the page
  It makes the page reusable again after it writes back the contents
[Figure labels: VMM, disk, modify request, unprotect, write back, write request, protect]
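The read and write paths can be modelled together as two per-page flags that the VMM toggles on each request. This is a minimal sketch, not the Xen implementation: `Page`, `VMM`, `guest_write`, and `ProtectionFault` are hypothetical names, memory protection is simulated with an exception, and the order of operations in `write_request` is an assumption read off the figure labels.

```python
class ProtectionFault(Exception):
    """Raised when the guest OS writes to a page the VMM has protected."""


class Page:
    def __init__(self):
        self.data = b""
        self.protected = False
        self.reusable = False


class VMM:
    def __init__(self, disk):
        self.disk = disk

    def read_request(self, page, block_no):
        page.reusable = False
        page.protected = True             # protect before the read
        page.data = self.disk[block_no]   # the VMM itself may still write
        page.reusable = True              # contents now match the disk

    def modify_request(self, page):
        page.reusable = False             # about to diverge from the disk
        page.protected = False            # let the OS modify the page

    def write_request(self, page, block_no):
        page.protected = True             # assumed: protect, then write back
        self.disk[block_no] = page.data
        page.reusable = True


def guest_write(page, data):
    """A write issued by the guest OS, blocked if the page is protected."""
    if page.protected:
        raise ProtectionFault("write to protected cache page")
    page.data = data


disk = {0: b"old"}
vmm = VMM(disk)
page = Page()

vmm.read_request(page, 0)
try:
    guest_write(page, b"stray")           # e.g. a wild write by a buggy kernel
    stray_blocked = False
except ProtectionFault:
    stray_blocked = True

vmm.modify_request(page)
guest_write(page, b"new")
reusable_while_dirty = page.reusable

vmm.write_request(page, 0)
```

The point of the model is that a page is flagged reusable only while it is protected, so a stray write cannot silently change a page that would later be reused.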
10
More Checks for Cache Reuse
Isn’t the cache page mapped elsewhere in a writable manner?
  The VMM counts writable mappings
Hasn’t the cache page been mapped in a writable manner since it was protected?
  The VMM maintains a canary bit as a history
[Figure labels: VMM, read/write request, protect, read/write, corruption, map, unmap]
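The two checks can be sketched as a per-page counter plus a one-bit history. This is a minimal model under stated assumptions: `TrackedPage` and its methods are hypothetical, and the polarity of the canary bit (cleared on protect, set by a writable mapping) is an assumption, since the slide only says that the bit records history.

```python
class TrackedPage:
    """Toy per-page bookkeeping a VMM might keep for the two checks."""

    def __init__(self):
        self.protected = False
        self.writable_mappings = 0    # check 1: current writable mappings
        self.canary_tripped = False   # check 2: history since protection

    def protect(self):
        self.protected = True
        self.canary_tripped = False   # assumed: history restarts on protect

    def map_page(self, writable):
        if writable:
            self.writable_mappings += 1
            if self.protected:
                self.canary_tripped = True

    def unmap_page(self, writable):
        if writable:
            self.writable_mappings -= 1

    def may_become_reusable(self):
        return (self.protected
                and self.writable_mappings == 0
                and not self.canary_tripped)


clean = TrackedPage()
clean.protect()

still_mapped = TrackedPage()
still_mapped.map_page(writable=True)  # writable alias exists before protect
still_mapped.protect()

mapped_then_unmapped = TrackedPage()
mapped_then_unmapped.protect()
mapped_then_unmapped.map_page(writable=True)
mapped_then_unmapped.unmap_page(writable=True)  # count is 0, history remains
```

The third case shows why a counter alone is not enough in this model: the writable mapping is gone by the time the check runs, so only the history bit reveals that the page could have been modified.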
11
Reusability Management (Mmap)
cmLinux uses unprotect-on-write to exactly detect writes to memory-mapped files
  It maps a cache page with protection on a read
  It unprotects the page on a fault by the first write
  It protects the page again after msync
[Figure labels: VMM, read, write, msync, RO, RW, reuse, no reuse]
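Unprotect-on-write can be viewed as a two-state mapping (read-only or read-write) driven by read, write, and msync events. The sketch below is a simplified model, not cmLinux code: `MappedCachePage` is a hypothetical name, page faults are counted rather than raised, and the page is assumed to be consistent with the disk when it is first mapped for a read.

```python
class MappedCachePage:
    """Toy model of one cache page of a memory-mapped file."""

    def __init__(self):
        self.mapping = None       # None, "RO", or "RW"
        self.reusable = False
        self.write_faults = 0

    def read(self):
        if self.mapping is None:
            self.mapping = "RO"   # map with protection on a read
            self.reusable = True  # assumed consistent with the disk

    def write(self):
        if self.mapping != "RW":
            self.write_faults += 1    # first write faults on the mapping
            self.mapping = "RW"       # unprotect so the write can proceed
            self.reusable = False

    def msync(self):
        if self.mapping == "RW":
            self.mapping = "RO"       # protect again after write-back
            self.reusable = True


page = MappedCachePage()
page.read()
page.write()
page.write()                          # already RW: no second fault
faults_before_msync = page.write_faults
dirty_state = (page.mapping, page.reusable)

page.msync()
synced_state = (page.mapping, page.reusable)

page.write()                          # faults again after msync
```

In this model a write after a read costs one extra fault per page per msync cycle, which is the kind of cost the worst-case overhead measurement for memory-mapped files refers to.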
12
Optimization: Double Caching
cmLinux writes data to a new cache page if the original page is protected
  This can delay unprotecting the original page until writeback
  The improvement depends on written bytes
[Figure labels: VMM, cmLinux, page cache, original, new, more than 1.5KB]
13
Experiments
We have developed CacheMind using Xen
We conducted several experiments to show
  Fast performance recovery
  Overheads
  Reusing only consistent cache pages
Setup
  CPU: 2 dual-core Opteron
  Memory: 12 GB
  Disk: Ultra 320 SCSI
  NIC: Gigabit Ethernet
  VMM: Xen 3.0.0
  OS: Linux 2.6.12
[Figure labels: domain 0, domain U, VMM, disk, cache-mapping table, blkback, blkfront, reuse bitmap, page cache]
14
Throughput of File Read
We measured the read throughput of a 1GB file
  We rebooted the OS after the 3rd access
Just after the reboot
  8.7x higher throughput
  Only 16% degradation
Recovery time is 1 s
[Figure label: 4KB buffer size]
15
Throughput of File Write
We measured the write throughput
Just after the reboot
  4KB buffer size
    Not improved
    Due to no read
  2KB buffer size
    8x higher throughput
    33% degradation
16
Throughput of Mmap Read/Write
We measured the throughput of read/write of a memory-mapped file
Just after the reboot
  Read
    6x higher throughput
    15% degradation
  Write
    5x higher throughput
    9% degradation
[Figure label: 4KB buffer size]
17
Overheads
We measured the overheads for enabling the warm-cache reboot
  IOzone
    0-13% for files
    3-9% for mmap
  Writeback
    0.4% for fsync
    1.6% for msync
[Figure label: 2KB buffer size]
18
Worst-case Overheads
We measured the overheads in extreme cases
  Partial writes to cache pages
    Cost for double caching or unprotecting
    33% for 1 byte/page
  Unprotect-on-write for memory-mapped files
    Cost for extra page faults on write after read
    25% for read & write
19
Throughput of a Web Server
We measured the changes of the throughput during OS reboot
  40% degradation for 90 seconds
  5% degradation for 60 seconds
20
Fault Injection (1/2)
We examined whether inconsistent cache pages were reused
  We injected various faults into the OS kernel
First, we disabled the consistency mechanism
  Cache pages were often corrupted
21
Fault Injection (2/2)
Next, we enabled the consistency mechanism
  Reused cache pages were inconsistent only for DST
    Ext3 failed to write back
    Faults were injected into ext3
  Cache pages were not corrupted
    Reusing them is correct
22
Related Work
Rio File Cache [Chen+ ASPLOS’96]
  Reusing dirty file cache after OS crash
  Relying on an OS
OtherWorld [Depoutovitch+ EuroSys’10]
  Recovering application state after OS crash
  Relying on low probability of cache corruption
Geiger [Jones+ ASPLOS’06]
  Inferring the page cache in the VMM
  Difficult to recognize cache eviction
23
Conclusion
We proposed the warm-cache reboot
  It achieves fast performance recovery by reusing the page cache
    8.7x faster recovery at maximum
  The VMM maintains the consistency of the page cache
    Consistent, or not corrupted at least
Future work
  Reducing modification to an OS