Split Migration of Large Memory Virtual Machines


1 Split Migration of Large Memory Virtual Machines
Masato Suetake, Hazuki Kizu, Kenichi Kourai (Kyushu Institute of Technology)

I'm Masato Suetake from Kyushu Institute of Technology. I'm going to talk about Split Migration of Large Memory Virtual Machines.

2 Large Memory VMs
Recent IaaS clouds provide virtual machines (VMs) with a large amount of memory
  E.g., new X1 instances (2 TB) in Amazon EC2
Such VMs are required for
  Big data analysis using Apache Spark
  In-memory databases, e.g., SAP HANA

Infrastructure-as-a-Service clouds provide virtual machines to users. Many VMs are consolidated onto a small number of hosts to reduce costs. Recently, IaaS clouds have also begun to provide VMs with a large amount of memory; for example, the new X1 instances in Amazon EC2 have 2 TB. Such large memory VMs are required for big data analysis, for example, using Apache Spark, because big data can be analyzed faster by keeping as much data in memory as possible. Another application of large memory VMs is in-memory databases such as SAP HANA.

3 Migration of Large Memory VMs
Large memory VMs make VM migration difficult
  Not cost-efficient to always reserve hosts with a large amount of free memory
If they cannot be migrated...
  Big data analysis is disrupted for a long time
  The whole data in memory is lost after restart

However, large memory VMs make VM migration difficult. One issue is the long migration time, which can be addressed by parallel migration and fast networks. The other, unresolved issue is the availability of the destination host. VM migration needs sufficient free memory at the destination host, but always reserving hosts with a large amount of free memory is not cost-efficient. If large memory VMs cannot be migrated, they have to be stopped during host maintenance. Then, big data analysis is disrupted for a long time. In addition, the whole data in memory is lost, and it takes a long time to restore it by reading disks or redoing computation. This largely degrades performance for a long time after the VM restarts.

4 VM Migration with Virtual Memory
Virtual memory allows a larger amount of memory than physical memory
Incompatible with VM migration
  Page-outs occur regardless of the VM's access pattern (1st iteration)
  Read-only pages tend to be paged out
  Excessive paging degrades performance during/after VM migration

When there is not sufficient free memory at the destination host, virtual memory can be used. Virtual memory allows the system to use a larger amount of memory than physical memory by paging out to disk the memory pages that cannot be stored in physical memory. However, virtual memory is incompatible with VM migration. In the first iteration of VM migration, all the memory pages are transferred in order, so pages are paged out unconditionally, regardless of the VM's memory access pattern. In the following iterations, updated pages are re-transferred and overwritten after page-ins. When VM migration completes, frequently updated pages tend to reside in physical memory, while frequently accessed read-only pages tend to be paged out. This excessive paging degrades performance during and after VM migration.

5 VM Migration with Remote Paging
Remote paging can use multiple hosts with a small amount of free memory
  May be faster than paging with local disks
Also incompatible with VM migration
  Even pages stored in swap hosts are transferred via the destination host
  The network bandwidth is consumed

To reduce the overhead of paging with disks, remote paging has been proposed. It pages memory in and out using the memory of other hosts instead of local disks. If the network is fast enough, remote paging is faster than paging with slow disks. However, remote paging is also incompatible with VM migration. During VM migration, even pages destined for swap hosts are first transferred to the destination host, which then has to page them out to the swap hosts. Therefore, the network bandwidth between the destination host and the swap hosts is consumed.

6 S-memV
Split migration
  Migrate a large memory VM using multiple hosts
    One main host for running the VM
    Zero or more sub-hosts for providing memory
  Divide the VM's memory using its access pattern
Remote paging
  Swap pages between the main and sub-hosts

To solve this problem, we propose S-memV, which enables split migration of large memory VMs. Split migration migrates a large memory VM using multiple hosts: one main host and zero or more sub-hosts. The main host runs the VM and the sub-hosts provide memory. Split migration divides the VM's memory into smaller pieces, considering the VM's memory access pattern, and directly transfers them to the main host or to sub-hosts. After split migration, S-memV performs remote paging when the VM needs a page that does not exist at the main host. It swaps the requested memory page at a sub-host with an infrequently accessed page at the main host, as sketched below.
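
A minimal sketch of one remote-paging swap as described above. All names here (subhost_request_page, pick_least_recently_used_page, and so on) are illustrative assumptions, not the actual S-memV interfaces:

/* Illustrative sketch of one remote-paging swap (not the actual S-memV code). */
#include <string.h>

typedef unsigned long gfn_t;                          /* guest frame number */
#define PAGE_SIZE 4096

/* Assumed helpers: network I/O to a sub-host and local page management. */
extern void  subhost_request_page(gfn_t gfn, void *buf);    /* page-in  */
extern void  subhost_send_page(gfn_t gfn, const void *buf); /* page-out */
extern gfn_t pick_least_recently_used_page(void);
extern void *guest_page(gfn_t gfn);

/* Called when the VM touches a page that currently lives on a sub-host. */
static void swap_in_remote_page(gfn_t wanted)
{
    char incoming[PAGE_SIZE];

    /* 1. Fetch the requested page from the sub-host that holds it. */
    subhost_request_page(wanted, incoming);

    /* 2. Choose an infrequently accessed local page as the victim. */
    gfn_t victim = pick_least_recently_used_page();

    /* 3. Page the victim out to the sub-host to keep memory balanced. */
    subhost_send_page(victim, guest_page(victim));

    /* 4. Install the fetched page into the VM's memory at the main host. */
    memcpy(guest_page(wanted), incoming, PAGE_SIZE);
}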

7 One-to-N Migration
Migrate a VM to multiple hosts
  To the main host
    The VM's core information (CPU/device states)
    Frequently accessed pages
  To the sub-hosts
    Pages that cannot be accommodated in the main host
  The transfers are done in parallel

In one-to-N migration, split migration migrates a large memory VM from one host to multiple hosts. It transfers core information such as CPU and device states to the main host, along with frequently accessed memory pages. It transfers the memory pages that cannot be accommodated in the main host to the sub-hosts. To divide the VM's memory appropriately, S-memV monitors the VM's memory access pattern and predicts future accesses. To migrate a large memory VM faster, split migration transfers memory pages to multiple hosts in parallel. A sketch of this division step follows.
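
One way to express the division step: sort pages by how recently they were accessed, keep the most recently used ones for the main host, and spread the rest over the sub-hosts. This is only a sketch under assumed data structures; the real logic lives inside QEMU-KVM:

/* Sketch: divide a VM's pages between the main host and sub-hosts
 * by access recency (illustrative, not the actual QEMU-KVM code). */
#include <stdlib.h>

struct page_info {
    unsigned long gfn;        /* guest frame number              */
    unsigned long last_used;  /* from the collected access history */
    int           dest;       /* 0 = main host, 1..N = sub-hosts */
};

static int by_recency(const void *a, const void *b)
{
    const struct page_info *pa = a, *pb = b;
    /* Most recently used pages first (descending last_used). */
    return (pa->last_used < pb->last_used) - (pa->last_used > pb->last_used);
}

/* main_capacity: number of pages that fit in the main host's free memory. */
void assign_destinations(struct page_info *pages, size_t n,
                         size_t main_capacity, int num_subhosts)
{
    qsort(pages, n, sizeof(*pages), by_recency);

    for (size_t i = 0; i < n; i++) {
        if (i < main_capacity)
            pages[i].dest = 0;                       /* main host */
        else
            pages[i].dest = 1 + (i % num_subhosts);  /* spread over sub-hosts */
    }
    /* Each destination is then sent its share of pages in parallel. */
}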

8 Aware of Remote Paging
Does not occur at all during VM migration
  Each page is directly transferred to either the main host or a sub-host
Less likely to occur just after VM migration
  Frequently accessed pages are stored in the memory of the main host
  Depends on the working set

Unlike traditional VM migration, split migration is aware of remote paging. Remote paging does not occur at all during VM migration: no memory pages are paged out from the main host to sub-hosts. Instead, each page is directly transferred to the main host or a sub-host, so there is no wasteful network transfer between the main host and sub-hosts. This enables fast VM migration. In addition, remote paging is less likely to occur after VM migration because frequently accessed pages have been transferred to the main host. This keeps VM performance after split migration, although the performance after migration depends on the working set size.

9 N-to-One Migration
Migrate a VM from multiple hosts to one
  From the main host
    Normal migration except for non-existent pages
  From the sub-hosts
    Simple memory transfer
  Transfer pages without redundancy or omission
    Even for those paged in/out during migration

In N-to-one migration, split migration migrates a large memory VM from multiple hosts back to one host. This is used after the maintenance of the originally used host is completed or when another host with sufficient free memory becomes available. From the main host, split migration migrates the VM normally, except that it does not transfer non-existent pages. From the sub-hosts, split migration simply transfers their parts of the VM's memory. Even for memory pages that are paged in and out during the migration, split migration transfers them without redundancy or omission, as sketched below.
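
How the "without redundancy or omission" property might be kept: the main host's page-location map records which host currently holds each guest page, and N-to-one migration sends each page exactly once from that host; remote paging during the migration only updates the map before a page is sent. A sketch with assumed names, not the actual implementation:

/* Sketch: transfer each page exactly once from whichever host holds it
 * (illustrative names, not the actual S-memV code). */
enum location { ON_MAIN_HOST, ON_SUBHOST };

struct page_location {
    enum location where;
    int           subhost_id;   /* valid when where == ON_SUBHOST        */
    int           transferred;  /* already sent to the destination host? */
};

extern struct page_location location_map[];   /* one entry per guest page */
extern void send_from_main_host(unsigned long gfn);
extern void send_from_subhost(int subhost_id, unsigned long gfn);

void n_to_one_transfer(unsigned long num_pages)
{
    for (unsigned long gfn = 0; gfn < num_pages; gfn++) {
        struct page_location *loc = &location_map[gfn];
        if (loc->transferred)
            continue;                        /* no redundant transfer */
        if (loc->where == ON_MAIN_HOST)
            send_from_main_host(gfn);
        else
            send_from_subhost(loc->subhost_id, gfn);
        loc->transferred = 1;                /* remote paging updates the map
                                              * before this point, so no page
                                              * is omitted either */
    }
}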

10 Partial Migration
Migrate the whole/part of a VM running across multiple hosts to different hosts
  From the main host
    One-to-N migration of a VM with partial memory
  From sub-hosts
    Only memory transfer to the destination sub-hosts

In partial migration, split migration migrates the whole or part of a large memory VM running across multiple hosts to different hosts. This is used when some or all of those hosts need to be maintained. The figure on this slide shows an example of partial migration only from the main host. When the VM is migrated from the main host, one-to-N migration is performed for the VM with its partial memory. After the migration, the VM runs across the destination main host, the destination sub-host, and the source sub-host. When part of a VM is migrated from sub-hosts, split migration transfers only its memory pages to the destination sub-hosts.

11 System Architecture of S-memV
QEMU-KVM at the main host
  Supports one-to-N migration
  Maintains the page locations of a VM
  Runs a VM with remote paging
Memory servers at the sub-hosts
  Manage part of the memory of a VM
  Handle page-in/-out requests
Host management server
  Chooses sub-hosts

Here is the system architecture of S-memV. We have implemented S-memV in QEMU-KVM running at the main host. Currently, our QEMU-KVM supports one-to-N split migration. It maintains the page locations of a VM during and after VM migration and enables the VM to run with remote paging after migration. At each sub-host, a memory server runs and manages part of the VM's memory; it handles page-in and page-out requests from the main host, as sketched below. The host management server runs in the cloud and is used for choosing appropriate sub-hosts. It manages the free memory of all hosts, the network latency between hosts, and so on.
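
The memory server at each sub-host is essentially a loop that serves page-in and page-out requests over the network. A minimal sketch under an assumed wire format (one-byte opcode, page number, optional page data); the actual S-memV protocol is not described here:

/* Sketch of a sub-host memory server loop (assumed wire format:
 * 1-byte opcode, 8-byte guest frame number, optional 4 KB of data). */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SIZE   4096
#define OP_PAGE_IN  1   /* main host asks for a page   */
#define OP_PAGE_OUT 2   /* main host sends a page back */

/* Assumed helpers: page-pool lookup and full read/write on the socket. */
extern void   *lookup_page(uint64_t gfn);
extern ssize_t read_full(int fd, void *buf, size_t len);
extern ssize_t write_full(int fd, const void *buf, size_t len);

void memory_server_loop(int conn_fd)
{
    uint8_t  op;
    uint64_t gfn;
    char     buf[PAGE_SIZE];

    while (read_full(conn_fd, &op, 1) == 1 &&
           read_full(conn_fd, &gfn, sizeof(gfn)) == sizeof(gfn)) {
        if (op == OP_PAGE_IN) {
            /* Page-in: send the requested page to the main host. */
            write_full(conn_fd, lookup_page(gfn), PAGE_SIZE);
        } else if (op == OP_PAGE_OUT) {
            /* Page-out: store the received page in the sub-host's memory. */
            read_full(conn_fd, buf, PAGE_SIZE);
            memcpy(lookup_page(gfn), buf, PAGE_SIZE);
        }
    }
}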

12 Collecting Memory Access Data
S-memV keeps track of memory accesses inside a VM
  Examines the access bits in the extended page tables (EPT) for the VM
Uses the collected access history for
  Split migration
    Recently used pages go to the main host
  Remote paging
    Least recently used pages are paged out

To accommodate frequently accessed memory pages in the main host as much as possible, S-memV keeps track of the pages recently used by a VM. Our QEMU-KVM periodically invokes KVM to obtain memory access data; KVM traverses the extended page tables for the VM and examines the access bits. S-memV uses the collected access history for two purposes, as illustrated below. During split migration, recently used pages are transferred to the main host. For remote paging, the least recently used pages are paged out to sub-hosts.
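
On Intel processors the accessed flag of an EPT entry is bit 8 (when EPT A/D bits are enabled), so collecting access data amounts to walking the leaf entries, recording which ones have the bit set, and clearing it for the next interval. The real walk is done inside KVM; the loop below only illustrates the bookkeeping under assumed data structures:

/* Illustration of the bookkeeping only; not real kernel code.
 * EPT accessed flag = bit 8 when A/D bits are enabled. */
#include <stdint.h>

#define EPT_ACCESSED_BIT (1ULL << 8)

/* One leaf EPT entry per guest page (assumed flat layout for illustration). */
extern uint64_t      *ept_leaf_entries;
extern unsigned long  num_guest_pages;

/* access_history[gfn] is bumped in every interval in which the page was touched. */
extern unsigned long *access_history;

void collect_access_bits(void)
{
    for (unsigned long gfn = 0; gfn < num_guest_pages; gfn++) {
        if (ept_leaf_entries[gfn] & EPT_ACCESSED_BIT) {
            access_history[gfn]++;                        /* page was used recently */
            ept_leaf_entries[gfn] &= ~EPT_ACCESSED_BIT;   /* clear for the next scan */
        }
    }
    /* A TLB/EPT flush is needed after clearing the bits so that future
     * accesses set them again; omitted here. */
}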

13 Remote Paging with userfaultfd
QEMU-KVM receives an event when a VM accesses a non-existent page
  Using userfaultfd, introduced in Linux 4.3
It sends a page-in request to a sub-host
  Writes the received data to the faulting page
  Sends a page-out request to the sub-host later

To achieve remote paging, S-memV uses the userfaultfd mechanism introduced in Linux 4.3. QEMU-KVM first registers the memory of a VM with userfaultfd. When the VM accesses a non-existent page, a page fault occurs and QEMU-KVM receives an event from userfaultfd. QEMU-KVM then sends a page-in request to the sub-host that manages the corresponding memory page. When it receives the data of the page, QEMU-KVM writes the data to the faulting page. Later, it sends a page-out request to the sub-host to balance the amount of memory. A condensed sketch of this flow follows.
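
userfaultfd is the standard Linux API for this: the VM's memory is registered in missing mode, page faults on unbacked pages arrive as events on the file descriptor, and UFFDIO_COPY installs the fetched data into the faulting page. A condensed sketch of the page-in side (error handling and the page-out side omitted; fetch_page_from_subhost is an assumed helper, not a real API):

/* Condensed userfaultfd-based page-in sketch (error handling omitted). */
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#define PAGE_SIZE 4096
extern void fetch_page_from_subhost(uint64_t addr, void *buf);  /* assumed helper */

void handle_remote_paging(void *vm_mem, size_t vm_mem_len)
{
    /* Create the userfaultfd and register the VM's memory in missing mode. */
    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
    struct uffdio_api api = { .api = UFFD_API };
    ioctl(uffd, UFFDIO_API, &api);

    struct uffdio_register reg = {
        .range = { .start = (uint64_t)(uintptr_t)vm_mem, .len = vm_mem_len },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    char page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
    struct uffd_msg msg;

    /* Each fault on a non-existent page arrives as an event on uffd. */
    while (read(uffd, &msg, sizeof(msg)) == sizeof(msg)) {
        if (msg.event != UFFD_EVENT_PAGEFAULT)
            continue;

        uint64_t fault = msg.arg.pagefault.address & ~(uint64_t)(PAGE_SIZE - 1);

        /* Page-in: ask the sub-host that manages this page for its data. */
        fetch_page_from_subhost(fault, page);

        /* Install the data into the faulting page with UFFDIO_COPY. */
        struct uffdio_copy copy = {
            .dst = fault, .src = (uint64_t)(uintptr_t)page,
            .len = PAGE_SIZE, .mode = 0,
        };
        ioctl(uffd, UFFDIO_COPY, &copy);
        /* A page-out request is sent to the sub-host later to rebalance. */
    }
}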

14 Experiments
We examined the performance of split migration
  Baseline: VM migration with sufficient memory
  Comparison: VM migration with virtual memory
We used a VM with 1 vCPU and 2 GB of memory

                  source host      destination main host       sub-host
  CPU             Xeon E3-1270v3   Xeon E3-1270v2              Intel Xeon E5640
  Memory          16 GB            2 GB or 4 GB (~1 GB used)
  HDD             SATA 600 GB
  OS              Linux 4.3
  Virtualization  QEMU-KVM 2.4.1                               -

To examine the performance of split migration, we measured the migration time and downtime in S-memV. As a baseline, we measured the migration performance when the destination host had sufficient memory. For comparison, we also executed VM migration with virtual memory. We used the three machines above. The destination main host had 2 GB of memory, of which about 1 GB was free. We used a VM with 2 GB of memory.

15 Migration Performance (Idle)
We measured the performance for an idle VM
VM migration with virtual memory
  87% longer migration time / 2.9x longer downtime
  Large degradation even with fewer memory re-transfers
Split migration
  17% longer migration time / 0.1 s longer downtime
  Performance degradation was suppressed

First, we measured the migration performance of an idle VM. We configured S-memV to transfer 1 GB of memory to the main host and the rest to the sub-host. The left figure on the slide shows the migration time and the right figure shows the downtime. Compared with using sufficient memory, VM migration with virtual memory increased the migration time by 87% and the downtime by 2.9 times. This means that using virtual memory largely degrades migration performance even for an idle VM, which re-transfers few memory pages. For split migration, the migration time was only 17% longer and the downtime increased by only 0.1 seconds. Split migration suppressed the performance degradation.

16 Migration Performance (Busy)
We stressed memcached in a VM
VM migration with virtual memory
  5.4x longer migration time / 3.6x longer downtime
  The variance was very large due to paging
Split migration
  17% longer migration time / 49% shorter downtime
  The reason for the shorter downtime is under investigation

Next, we stressed memcached in the VM using the memaslap benchmark and measured the migration performance. Compared with using sufficient memory, using virtual memory increased the migration time by 5.4 times and the downtime by 3.6 times. In addition, the variance was very large due to complex paging behavior. For split migration, the increase in the migration time was still 17%. On the other hand, the downtime was 49% shorter; the reason for this is under investigation.

17 Collection of Memory Access Data
We measured the time for collecting access data on the VM's memory
  It took more time when more pages were used
  3 ms for 2 GB of memory
  The overhead is 0.3% if data is collected every second
Estimation
  3 s for 2 TB of memory? Probably less
  EPT shrinks when pages are not accessed

We also measured the time for collecting access data on the VM's memory by traversing the EPT. When the VM was idle, the collection time was 1 ms; when memcached frequently accessed the VM's memory, it increased to 3 ms. This means that the collection took more time when more pages were used. If S-memV collects this data every second, the overhead is 0.3%. From this result, we can estimate that the collection time would grow to about 3 seconds for a VM with 2 TB of memory. Fortunately, the collection time should be less than this estimate because the EPT shrinks when pages are not accessed.
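
The 0.3% overhead and the 2 TB estimate follow directly from the measured 3 ms scan time:

\[
\frac{3\ \mathrm{ms}}{1\ \mathrm{s}} = 0.3\%,
\qquad
3\ \mathrm{ms} \times \frac{2\ \mathrm{TB}}{2\ \mathrm{GB}} = 3\ \mathrm{ms} \times 1024 \approx 3\ \mathrm{s}.
\]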

18 VM Performance after Migration
We estimated the performance of a VM with remote paging from [12]
  Baseline: performance with sufficient memory
Quick sort is 1.5 to 2 times slower
  The working set is much larger than local memory
Barnes is almost not degraded
  The working set is slightly larger than local memory

Finally, we discuss the performance of a VM with remote paging. Since our implementation is incomplete, we estimated the performance using the results of previous work [12]. The baseline is the performance when the main host has sufficient memory. When the working set is much larger than local memory, quick sort is 1.5 to 2 times slower. However, when the working set is only slightly larger, Barnes is almost not degraded. From these previous results, we expect that S-memV would not largely degrade VM performance if the working set is not too large.

19 Related Work
Post-copy migration [Hines+ VEE'09]
  Special case of one-to-N migration
  Needs two hosts with a large amount of memory
Scatter-Gather migration [Deshpande+ CLOUD'14]
  Similar to one-to-N migration
  Finally transfers the whole memory to one host
MemX [Deshpande+ ICPP'10]
  Runs a VM using the memory of multiple hosts
  Supports only inflexible partial migration

Post-copy migration immediately switches the execution of a VM to the destination host and then transfers the memory from the source host on demand. Since it runs a VM using two hosts, it is a special case of one-to-N migration; however, it needs two hosts with a large amount of memory. Scatter-Gather migration uses multiple intermediate hosts between the source and destination hosts in post-copy migration. This is similar to one-to-N migration, but Scatter-Gather migration finally transfers the whole memory of a VM to only one destination host. MemX can run a VM using the memory of multiple hosts and supports partial migration from some of the hosts; however, the VM's memory in the main host can be transferred only to a new main host.

20 Conclusion
Split migration
  Divides the memory of a large memory VM
  Directly migrates the pieces using multiple hosts
  Aware of remote paging
    Achieves fast VM migration
    Keeps VM performance after migration
S-memV supports one-to-N migration
  The performance was comparable to VM migration with sufficient memory
  Much better than using virtual memory

In conclusion, we proposed S-memV, which enables split migration of large memory VMs. It divides the memory of a large memory VM and directly migrates the memory pieces using multiple hosts. Since split migration is aware of remote paging, S-memV achieves fast VM migration and keeps VM performance after migration. Currently, S-memV supports one-to-N migration. According to our experimental results, its migration performance was comparable to that of VM migration with sufficient memory and much better than that of VM migration with virtual memory.

21 Future Work
Integrate several mechanisms into S-memV
  Collecting memory access data of VMs
  Remote paging
Evaluate S-memV
  Show that page-ins/outs are reduced
Support N-to-one and partial migration
  Need to synchronize multiple source hosts
Recover from failures during split migration
  Switch only failed destination hosts to others

One item of future work is to integrate the mechanisms for collecting memory access data of VMs and for remote paging into S-memV; we have implemented these mechanisms, but they are still incomplete. Second, we will evaluate split migration for a larger memory VM using real-world workloads and show that page-ins and page-outs are reduced. Third, S-memV has to support N-to-one and partial migration; unlike one-to-N migration, a new mechanism is needed to synchronize multiple source hosts during VM migration. Finally, S-memV should recover from failures during split migration, preferably by switching only the failed destination hosts to other hosts. Thank you for listening! Do you have any questions?

