Memory-efficient Virtual Machine High Availability
Karen Kai-Yuan Hou
Prof. Kang G. Shin
University of Michigan
Mustafa Uysal (VMware)
Arif Merchant (HP Labs)
Sharad Singhal (HP Labs)

Protect VM from Host Failures
Set up a backup by replicating the primary VM
The backup takes over execution promptly if the primary fails
High memory cost
– E.g., to protect a 1 GB VM, an additional 1 GB of memory is reserved just to hold the backup.
[Figure: a primary VM (App 1, App 2) on the primary host and its backup VM (App 1, App 2) on the backup host, each running on a hypervisor; the primary host suffers a physical host failure.]

Use Shared Storage
"Maintain" the backup VM in storage instead of RAM
– Improves resource and energy efficiency
– Recover anywhere
[Figure: the primary VM (App 1, App 2) runs on Host 1; its backup VM is kept in shared storage attached to Host 1, Host 2, ..., Host n, each running a hypervisor; another primary (active) VM is also shown.]

Protection: Tracking Primary VM State
Take checkpoints of the primary VM
– Incremental, periodic, copy-on-write checkpoints
[Figure: the memory space of the primary VM (App 1, App 2) is checkpointed into the VM fail-over image.]
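
The checkpointing idea on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Xen prototype: the class `ToyVM`, its methods, and the use of a plain dict as the fail-over image are all hypothetical names introduced for illustration. It shows the three properties named above: only dirty pages are written (incremental), checkpoints are taken repeatedly (periodic), and a page overwritten while a checkpoint is still being flushed has its old contents preserved (copy-on-write), so the VM only needs to pause long enough to swap the dirty set.

```python
class ToyVM:
    """Toy guest memory with dirty-page tracking (hypothetical model)."""

    def __init__(self, num_pages):
        self.pages = [b""] * num_pages  # current page contents
        self.dirty = set()              # pages written since the last checkpoint began
        self.pending = set()            # pages the in-flight checkpoint still has to flush
        self.preserved = {}             # copy-on-write copies: page number -> old contents

    def write(self, page_no, data):
        # Copy-on-write: keep the checkpoint-time contents of a page that
        # is overwritten before the in-flight checkpoint has flushed it.
        if page_no in self.pending and page_no not in self.preserved:
            self.preserved[page_no] = self.pages[page_no]
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def begin_checkpoint(self):
        # The only step that needs the VM paused in this model:
        # hand the dirty set to the checkpoint and start a fresh one.
        self.pending, self.dirty = self.dirty, set()
        self.preserved = {}

    def flush_checkpoint(self, image):
        # Incremental: only pages dirtied since the previous checkpoint
        # are written to the fail-over image (a dict stands in for storage).
        flushed = len(self.pending)
        for page_no in sorted(self.pending):
            image[page_no] = self.preserved.get(page_no, self.pages[page_no])
        self.pending = set()
        self.preserved = {}
        return flushed


if __name__ == "__main__":
    vm, image = ToyVM(8), {}
    vm.write(1, b"a")
    vm.write(3, b"b")
    vm.begin_checkpoint()
    vm.write(1, b"c")  # guest keeps running while the checkpoint flushes
    print(vm.flush_checkpoint(image), image)
```

Calling `begin_checkpoint` and `flush_checkpoint` from a timer would make the checkpoints periodic; the interval is left out here because the slide does not state one.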

Fail-over: Bringing Up Backup VM
Slim VM Restore
– Load only necessary information and switch on the backup VM quickly
– Fetch pages on demand as the backup VM executes
[Figure: the memory space of the restored backup VM (App 1, App 2) is populated from the VM fail-over image.]
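
The slim restore described above can be sketched in the same toy style. Again this is an assumption-laden illustration, not the prototype's code: `DemandPagedVM`, the `preload` argument, and the dict-based image are hypothetical. The slide does not say which state counts as "necessary information", so the caller chooses the preloaded pages here. Every other page is fetched from the fail-over image the first time the running backup VM touches it.

```python
class DemandPagedVM:
    """Toy backup VM restored with only a few pages resident (hypothetical model)."""

    def __init__(self, image, preload=()):
        self.image = image  # fail-over image: page number -> contents
        # Slim restore: load only the pages the caller deems necessary.
        self.resident = {p: image[p] for p in preload}
        self.faults = 0     # number of on-demand fetches so far

    def read(self, page_no):
        # First touch of a non-resident page triggers a fetch from the image.
        if page_no not in self.resident:
            self.faults += 1
            self.resident[page_no] = self.image[page_no]
        return self.resident[page_no]


if __name__ == "__main__":
    image = {0: b"kernel", 1: b"app1", 2: b"app2"}
    vm = DemandPagedVM(image, preload=[0])
    vm.read(2)
    vm.read(2)
    print(vm.faults, sorted(vm.resident))
```

Pages that are never touched are never read from storage, which is what lets the backup VM switch on before its whole memory image has been loaded.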

Improving I/O Efficiency with SSDs
Small, random I/Os are more efficient on SSDs
[Figure: the primary side updates the VM fail-over image continuously with small, random writes; the restore side fetches from the VM fail-over image on demand with small, random reads.]

Preliminary Evaluation
Prototype built on Xen
Questions
– How much overhead does continuous checkpointing introduce on the primary VM?
– How does the shared storage support continuous updating of the fail-over image?
– How quickly can our system bring up a backup VM?
– How does the backup VM perform when it executes by fetching pages on demand?

Checkpointing Overheads
[Figure: two panels, Kernel Compilation and RUBiS.]

CoW and SSD Enhancements
CoW reduces the VM pause time for taking checkpoints
Checkpoints commit faster on an SSD

Fail-over Time and Demand Fetching
Time required to bring up a backup VM
Overheads of fetching VM pages on demand

Interesting Observations: Page Fetching Behavior
[Figure: how a VM uses (demand fetches) its pages while compiling a kernel.]

Interesting Observations: Page Fetching Behavior
[Figure: what actually happens on disk, as recorded by blktrace.]

Conclusions

Thank you!