Fast and Correct Performance Recovery of Operating Systems Using a Virtual Machine Monitor Kenichi Kourai Kyushu Institute of Technology, Japan.

Recovery by OS Reboot
- OS reboot is a final but powerful recovery technique
  - For recovery from OS crashes
    - Against Mandelbugs
    - A rebooted OS rarely crashes again
  - For software rejuvenation
    - Against aging-related bugs
    - A rebooted OS restores its normal state
[Figure labels: reboot, recovered, OS crash, memory leak]

Performance Degradation
- OS reboot degrades the performance of file accesses
  - Disk access increases due to frequent cache misses
    - The page cache in memory is lost
    - It takes a long time to fill the page cache
  - Disk access also degrades the performance of the other virtual machines (VMs)
[Figure labels: page cache, reboot, slow, disk, VM]

Performance Recovery Required
- OS recovery does not complete until the performance is also recovered
  - Traditional OS reboot restores only the functionalities
  - Fast reboot techniques have been proposed...

Warm-cache Reboot
- A new OS recovery mechanism with fast performance recovery
  - It preserves the page cache during OS reboot
    - An OS can reuse it after the reboot
  - It guarantees the consistency of the page cache
    - Using the virtual machine monitor (VMM)
[Figure labels: CacheMind, VMM, VM, page cache, reboot, page cache, discard, corrupted cache]

Reusing the Page Cache
- Collaboration between an OS and the VMM
  - cmLinux registers cache information with the VMM
  - On reboot, the VMM re-allocates the same memory
  - cmLinux reserves the memory for the old page cache
  - cmLinux searches the old page cache before disk reads
[Figure labels: CacheMind, VMM, register, page cache, cmLinux, reboot, re-allocate, old page cache, metadata]
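The collaboration on this slide can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class and method names (`Vmm`, `Guest`, `register`, `read_block`) are hypothetical and are not taken from CacheMind or cmLinux, memory is modeled as a dictionary, and consistency checks are ignored here.

```python
class Vmm:
    """Toy stand-in for the VMM: its state survives guest reboots."""

    def __init__(self):
        self.memory = {}         # page frame number -> page contents
        self.cache_mapping = {}  # (file, block) -> page frame number
        self.next_pfn = 0

    def alloc_page(self, data):
        pfn = self.next_pfn
        self.next_pfn += 1
        self.memory[pfn] = data
        return pfn

    def register(self, key, pfn):
        # The guest tells the VMM which page caches which file block.
        self.cache_mapping[key] = pfn


class Guest:
    """Toy guest OS; creating a new Guest on the same Vmm models a reboot."""

    def __init__(self, vmm, disk):
        self.vmm = vmm
        self.disk = disk
        self.page_cache = {}                      # lost on every reboot
        self.old_cache = dict(vmm.cache_mapping)  # preserved by the VMM
        self.disk_reads = 0

    def read_block(self, key):
        if key not in self.page_cache:
            if key in self.old_cache:
                # Search the old page cache before going to disk.
                pfn = self.old_cache.pop(key)
            else:
                self.disk_reads += 1
                pfn = self.vmm.alloc_page(self.disk[key])
            self.page_cache[key] = pfn
            self.vmm.register(key, pfn)
        return self.vmm.memory[self.page_cache[key]]


disk = {("f", 0): b"a", ("f", 1): b"b"}
vmm = Vmm()
before = Guest(vmm, disk)
before.read_block(("f", 0))
before.read_block(("f", 1))

after = Guest(vmm, disk)  # reboot: the guest-side cache is gone, the VMM state is not
data = after.read_block(("f", 0))
print(before.disk_reads, after.disk_reads, data)
```

In this toy model the rebooted guest serves the block from preserved memory without touching the disk, which is the effect the slide describes.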

Cache Consistency
- Only consistent cache pages are reused
  - Our definition: a cache page is consistent if its contents are the same as those on disk
    - Consistent when a file block is read from a disk
    - Inconsistent when the cache page is modified
    - Consistent when it is written back to a disk
[Figure labels: disk, cmLinux, page cache, read, write back, modify]
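The definition above can be checked mechanically in a toy model. The sketch below is an illustration only: `CachePage` and `is_consistent` are hypothetical names, and the disk is modeled as a dictionary from block number to contents.

```python
class CachePage:
    """A cached copy of one disk block (hypothetical model)."""

    def __init__(self, block_no):
        self.block_no = block_no
        self.data = None


def is_consistent(page, disk):
    # Consistent means: page contents equal the contents on disk.
    return page.data == disk[page.block_no]


disk = {7: b"old"}
page = CachePage(7)

page.data = disk[7]                 # read from disk
after_read = is_consistent(page, disk)

page.data = b"new"                  # the OS modifies the cached page
after_modify = is_consistent(page, disk)

disk[7] = page.data                 # write back to disk
after_write_back = is_consistent(page, disk)

print(after_read, after_modify, after_write_back)
```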

Reusability Management (Read)
- The VMM makes a cache page reusable after it reads data from a disk
  - It protects the page before the read
    - To detect page corruption during the read
    - The VMM can write data to the page
[Figure labels: VMM, read request, disk, possible corruption, reusable, protect, read, cmLinux]

Reusability Management (Write)
- The VMM makes a cache page non-reusable when an OS modifies its contents
  - It unprotects the page at the same time
    - To enable the OS to modify the page
- It makes the page reusable again after it writes back the contents
[Figure labels: VMM, modify request, unprotect, write back, write request, disk, protect]
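The read and write rules from the two Reusability Management slides can be combined into one small bookkeeping model. This is a sketch, not the CacheMind implementation: `ReuseTracker` and its method names are hypothetical, pages are identified by an integer, and protecting the page at the write request (rather than at completion) is an assumption read off the figure labels.

```python
class ReuseTracker:
    """Toy model of VMM-side bookkeeping for cache pages (hypothetical names)."""

    def __init__(self):
        self.protected = set()  # pages the guest OS cannot write
        self.reusable = set()   # pages that may be reused after a reboot

    def read_request(self, pfn):
        # Protect before the disk read so a stray guest write is caught.
        self.protected.add(pfn)
        self.reusable.discard(pfn)

    def read_done(self, pfn):
        # The page now matches the disk, so it becomes reusable.
        self.reusable.add(pfn)

    def modify_request(self, pfn):
        # The OS is about to change the page: no longer reusable, and unprotected.
        self.reusable.discard(pfn)
        self.protected.discard(pfn)

    def write_request(self, pfn):
        # Assumption: the page is protected again when writeback starts.
        self.protected.add(pfn)

    def write_done(self, pfn):
        # The contents are back on disk, so the page is reusable again.
        self.reusable.add(pfn)

    def guest_may_write(self, pfn):
        return pfn not in self.protected


t = ReuseTracker()
states = []

t.read_request(1)
t.read_done(1)
states.append((1 in t.reusable, t.guest_may_write(1)))

t.modify_request(1)
states.append((1 in t.reusable, t.guest_may_write(1)))

t.write_request(1)
t.write_done(1)
states.append((1 in t.reusable, t.guest_may_write(1)))

print(states)
```

Each tuple is `(reusable, guest may write)`; in this model a page is never both reusable and writable by the guest.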

More Checks for Cache Reuse
- Isn’t the cache page mapped elsewhere in a writable manner?
  - The VMM counts writable mappings
- Hasn’t the cache page been mapped in a writable manner since protected?
  - The VMM maintains a canary bit as a history
[Figure labels: VMM, read/write request, protect, read/write, corruption, map, unmap]
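A minimal sketch of the two checks, assuming a per-page record with a writable-mapping counter and a canary bit. The names (`PageInfo`, `map_writable`, `can_reuse`) are hypothetical, and resetting the canary when the page is protected is an assumption made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    protected: bool = False
    writable_mappings: int = 0  # current number of writable mappings
    canary: bool = False        # set if mapped writable since protection


def protect(page):
    page.protected = True
    page.canary = False  # assumption: history starts fresh at protection


def map_writable(page):
    page.writable_mappings += 1
    if page.protected:
        page.canary = True  # remember the mapping even after it is removed


def unmap_writable(page):
    page.writable_mappings -= 1


def can_reuse(page):
    return page.protected and page.writable_mappings == 0 and not page.canary


page = PageInfo()
protect(page)
clean = can_reuse(page)

map_writable(page)
while_mapped = can_reuse(page)

unmap_writable(page)
after_unmap = can_reuse(page)

print(clean, while_mapped, after_unmap)
```

The last case shows why a counter alone is not enough in this model: the count is back to zero, but the canary still records that the page could have been written through another mapping.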

Reusability Management (Mmap)
- cmLinux uses unprotect-on-write to exactly detect writes to memory-mapped files
  - It maps a cache page with protection on a read
  - It unprotects the page on a fault by the first write
  - It protects the page again after msync
[Figure labels: RW, read, write, msync, VMM, reuse, RO, RW, no reuse, RO]
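The unprotect-on-write cycle can be sketched as a small state machine. This is a toy model with hypothetical names (`MappedPage`, `write_faults`), not cmLinux code; creating the object stands for the first read, which maps the page read-only.

```python
class MappedPage:
    """Toy model of one memory-mapped cache page (hypothetical names)."""

    def __init__(self):
        # A read maps the page with protection: read-only and reusable.
        self.writable = False
        self.reusable = True
        self.write_faults = 0

    def write(self):
        if not self.writable:
            # The first write faults; the page is unprotected and not reusable.
            self.write_faults += 1
            self.writable = True
            self.reusable = False
        # Later writes hit the writable mapping and do not fault.

    def msync(self):
        # After the contents are synced, the page is protected again.
        self.writable = False
        self.reusable = True


p = MappedPage()
p.write()
p.write()
faults_before_msync = p.write_faults
reusable_while_dirty = p.reusable

p.msync()
reusable_after_msync = p.reusable

p.write()
faults_total = p.write_faults

print(faults_before_msync, reusable_while_dirty, reusable_after_msync, faults_total)
```

In this model each write that follows a read or an msync costs one extra fault, which corresponds to the "extra page faults on write after read" cost listed under Worst-case Overheads.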

Optimization: Double Caching
- cmLinux writes data to a new cache page if the original page is protected
  - This can delay unprotecting the original page until writeback
  - The improvement depends on written bytes
[Figure labels: VMM, page cache, original, new, more than 1.5KB, cmLinux]

Experiments
- We have developed CacheMind using Xen
- We conducted several experiments to show
  - Fast performance recovery
  - Overheads
  - Reusing only consistent cache pages
Experimental setup:
- CPU: 2 dual-core Opteron
- Memory: 12 GB
- Disk: Ultra 320 SCSI
- NIC: Gigabit Ethernet
- VMM: Xen 3.0.0
- OS: Linux 2.6.12
[Figure labels: domain 0, domain U, VMM, disk, cache-mapping table, blkback, blkfront, reuse bitmap, page cache]

Throughput of File Read
- We measured the read throughput of a 1GB file
  - We rebooted the OS after the 3rd access
- Just after the reboot
  - 4KB buffer size
    - 8.7x higher throughput
    - Only 16% degradation
    - Recovery time is 1s

Throughput of File Write
- We measured the write throughput
- Just after the reboot
  - 4KB buffer size
    - Not improved
    - Due to no read
  - 2KB buffer size
    - 8x higher throughput
    - 33% degradation

Throughput of Mmap Read/Write
- We measured the throughput of read/write of a memory-mapped file
- Just after the reboot
  - Read
    - 6x higher throughput
    - 15% degradation
  - Write
    - 5x higher throughput
    - 9% degradation
[Figure label: 4KB buffer size]

Overheads
- We measured the overheads for enabling the warm-cache reboot
  - IOzone
    - 0-13% for files
    - 3-9% for mmap
  - Writeback
    - 0.4% for fsync
    - 1.6% for msync
[Figure label: 2KB buffer size]

Worst-case Overheads
- We measured the overheads in extreme cases
  - Partial writes to cache pages
    - Cost for double caching or unprotecting
    - 33% for 1 byte/page
  - Unprotect-on-write for memory-mapped files
    - Cost for extra page faults on write after read
    - 25% for read & write

Throughput of a Web Server
- We measured the changes of the throughput during OS reboot
[Figure annotations: 40% degradation for 90 seconds; 5% degradation for 60 seconds]

Fault Injection (1/2)
- We examined the reuses of inconsistent cache
  - We injected various faults into the OS kernel
- First, we disabled the consistency mechanism
  - Cache pages were often corrupted

Fault Injection (2/2)
- Next, we enabled the consistency mechanism
  - Reused cache pages were inconsistent only for DST
    - Ext3 failed to write back
    - Faults were injected into ext3
  - Cache pages were not corrupted
    - Reusing them is correct

Related Work
- Rio File Cache [Chen+ ASPLOS’96]
  - Reusing dirty file cache after OS crash
  - Relying on an OS
- OtherWorld [Depoutovitch+ EuroSys’10]
  - Recovering application state after OS crash
  - Relying on low probability of cache corruption
- Geiger [Jones+ ASPLOS’06]
  - Inferring the page cache in the VMM
  - Difficult to recognize cache eviction

Conclusion
- We proposed the warm-cache reboot
  - It achieves fast performance recovery by reusing the page cache
    - 8.7x faster recovery at maximum
  - The VMM maintains the consistency of the page cache
    - Consistent, or not corrupted at least
- Future work
  - Reducing modification to an OS
