vTurbo: Accelerating Virtual Machine I/O Processing Using Designated Turbo-Sliced Core
Cong Xu, Sahan Gamage, Hui Lu, Ramana Kompella, Dongyan Xu
2013 USENIX Annual Technical Conference
Presented by Sewoog Kim, Embedded Lab.
Motivation
Pay-as-you-go: server consolidation saves application running costs and operational expenditure
Multiple VMs share the same core, so each VM waits for CPU access
Result: low I/O throughput
[Figure: VM1-VM4 sharing cores on the hypervisor (VMM), leading to low I/O throughput]
I/O Processing
Two basic stages:
  Device interrupts are processed synchronously in the kernel
  The application asynchronously copies the data from the kernel buffer
[Figure: CPU time shared by VM1-VM3; IRQ processing fills the kernel buffer, but the application's copy is deferred, causing IRQ processing delay]
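A minimal C sketch of these two stages (a simulation with assumed timings, not guest-kernel code): an "IRQ" thread synchronously appends packets to a kernel-style buffer while the application thread drains it asynchronously; the sleep in the application thread stands in for the vCPU being descheduled, which is when the backlog grows.

/* Sketch (assumed timings): stage 1, an "IRQ" thread, synchronously
 * appends packets to a kernel-style buffer; stage 2, the application
 * thread, copies them out asynchronously. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int backlog, max_backlog;           /* packets waiting in buffer */
static volatile int done;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *irq_stage(void *arg)          /* stage 1: fill the buffer */
{
    for (int pkt = 0; pkt < 100; pkt++) {
        pthread_mutex_lock(&lock);
        if (++backlog > max_backlog)
            max_backlog = backlog;
        pthread_mutex_unlock(&lock);
        usleep(1000);                      /* one packet every 1 ms */
    }
    done = 1;
    return NULL;
}

static void *app_stage(void *arg)          /* stage 2: drain the buffer */
{
    while (!done) {
        usleep(30000);                     /* ~30 ms until the vCPU runs */
        pthread_mutex_lock(&lock);
        backlog = 0;                       /* copy out all buffered data */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, irq_stage, NULL);
    pthread_create(&t2, NULL, app_stage, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("max kernel-buffer backlog: %d packets\n", max_backlog);
    return 0;
}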
Effect of CPU Sharing on TCP Receive
[Figure: a TCP client sends DATA to VM1 via the hypervisor's shared buffer; VM1 cannot process the IRQ and return an ACK until VM2 and VM3 finish their slices, so every round trip is inflated by the IRQ processing delay]
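Because TCP is ACK-clocked, the sender can keep at most one window in flight per round trip, so the inflated RTT directly caps throughput. A back-of-the-envelope C sketch; the window, base RTT, and slice length are illustrative assumptions, not the paper's measurements:

/* Sketch: why IRQ processing delay hurts TCP receive throughput.
 * The VM's scheduling delay is added to every RTT because ACKs
 * wait for the vCPU to run. */
#include <stdio.h>

int main(void)
{
    double window_bytes = 65536.0;     /* assumed receive window */
    double base_rtt_s   = 0.0005;      /* 0.5 ms LAN round trip */
    double slice_s      = 0.030;       /* 30 ms scheduler time slice */
    int    nvms         = 3;           /* VMs sharing the core */

    /* Worst case: an ACK waits for the other VMs' full slices. */
    double irq_delay_s = (nvms - 1) * slice_s;

    double tput_ideal  = window_bytes / base_rtt_s;
    double tput_shared = window_bytes / (base_rtt_s + irq_delay_s);

    printf("ideal : %.1f MB/s\n", tput_ideal  / 1e6);
    printf("shared: %.2f MB/s\n", tput_shared / 1e6);
    return 0;
}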
Effect of CPU Sharing on UDP Receive
[Figure: a UDP client keeps sending DATA to VM1; while VM1 waits for its turn, the hypervisor's shared buffer fills up and subsequent datagrams are dropped before ever reaching the application buffer]
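The drop condition is simple: datagrams are lost whenever the scheduling gap lasts longer than the time the buffer can absorb incoming traffic. A C sketch; the rate, buffer size, and slice length are assumptions for illustration:

/* Sketch: UDP receive drops under CPU sharing. While the vCPU is
 * descheduled, arriving datagrams accumulate in the shared buffer;
 * once it fills, the rest are dropped. */
#include <stdio.h>

int main(void)
{
    double rate_Bps  = 125e6;          /* 1 Gbps arrival rate */
    double buf_bytes = 2e6;            /* ~2 MB shared/kernel buffer */
    int    nvms      = 3;
    double slice_s   = 0.030;          /* 30 ms time slice */

    double gap_s  = (nvms - 1) * slice_s;      /* time descheduled */
    double fill_s = buf_bytes / rate_Bps;      /* time to fill buffer */

    if (gap_s > fill_s) {
        double lost = (gap_s - fill_s) * rate_Bps;
        printf("buffer fills after %.1f ms; ~%.1f MB dropped per gap\n",
               fill_s * 1e3, lost / 1e6);
    } else {
        printf("buffer absorbs the scheduling gap; no drops\n");
    }
    return 0;
}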
Effect of CPU Sharing on Disk Write
[Figure: the application writes DATA into kernel memory (the page cache); flushing it to the disk drive stalls on IRQ processing delay because the completion interrupt waits until VM3 is scheduled again]
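The same reasoning applies to writes: each flushed batch must have its completion IRQ processed before the next batch is issued, so the scheduling delay is added to every batch. A C sketch with assumed numbers (batch size, disk speed, and delay are illustrative):

/* Sketch: disk write throughput under delayed completion IRQs. */
#include <stdio.h>

int main(void)
{
    double batch_bytes = 4e6;          /* 4 MB issued per batch */
    double disk_Bps    = 100e6;        /* 100 MB/s raw disk speed */
    double irq_delay_s = 0.060;        /* 60 ms worst-case IRQ delay */

    double device_s = batch_bytes / disk_Bps;      /* time on device */
    double eff_Bps  = batch_bytes / (device_s + irq_delay_s);

    printf("raw disk : %.0f MB/s\n", disk_Bps / 1e6);
    printf("effective: %.1f MB/s\n", eff_Bps  / 1e6);
    return 0;
}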
Intuitive Solution
Reduce the time slice of each VM
But this causes significant context-switch overhead (see the sketch below)
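A quick C estimate of that overhead: with slice length S and per-switch cost C, the fraction of CPU lost to switching is roughly C / (S + C). The ~30 us switch cost (including cache and TLB refill effects) is an assumption for illustration:

/* Sketch: why shrinking every VM's time slice backfires. */
#include <stdio.h>

int main(void)
{
    double switch_cost_s = 0.00003;            /* ~30 us per switch */
    double slices_s[] = { 0.030, 0.001, 0.0001 };

    for (int i = 0; i < 3; i++) {
        double s = slices_s[i];
        printf("slice %7.1f us -> overhead %5.2f%%\n",
               s * 1e6, 100.0 * switch_cost_s / (s + switch_cost_s));
    }
    return 0;
}

Micro-slicing every core would burn a large fraction of the CPU this way, which is why vTurbo confines micro-slicing to one designated turbo core.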
Our Solution: vTurbo
Our Solution: vTurbo
IRQ processing is offloaded to a dedicated turbo core
  Turbo core: any physical core scheduled with micro-slicing (e.g., 0.1 ms)
Expose the turbo core as a special vCPU to the VM
  The turbo vCPU runs on the turbo core; regular vCPUs run on regular cores
Pin the IRQ context of the guest OS to the turbo vCPU (see the sketch below)
Benefits
  Improved I/O throughput (TCP/UDP, disk)
  Self-adaptive system
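For a flavor of the pinning step: Linux exposes per-IRQ CPU affinity through /proc/irq/<n>/smp_affinity, which accepts a hex CPU mask. The C sketch below routes a hypothetical IRQ 42 to vCPU 3, standing in for the turbo vCPU; the IRQ number and mask are illustrative assumptions, not vTurbo's actual code. Run as root inside the guest.

/* Sketch: pin one IRQ's handling to a chosen vCPU via procfs. */
#include <stdio.h>

int main(void)
{
    const char *path = "/proc/irq/42/smp_affinity"; /* hypothetical IRQ */
    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);
        return 1;
    }
    fprintf(f, "8\n");      /* mask 0x8: route IRQ 42 to CPU/vCPU 3 */
    fclose(f);
    return 0;
}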
vTurbo Design
[Figure: vTurbo timeline. VM1-VM3 take long slices on the regular core but rotate in micro-slices on the turbo core, so IRQs promptly move arriving data into the buffer for the application]
vTurbo's Impact on Disk Write
[Figure: the application writes DATA into kernel memory; because the turbo core runs VM1-VM3 in micro-slices, completion IRQs are handled promptly and flushes to the disk drive no longer wait for a regular-core slice]
vTurbo's Impact on UDP Receive
[Figure: compared with regular cores alone, vTurbo's micro-sliced IRQ processing drains the hypervisor's shared buffer into the kernel buffer before it fills, so DATA reaches the application buffer without drops]
vTurbo's Impact on TCP Receive
[Figure: with vTurbo, the turbo vCPU processes the kernel buffer (backlog queue and receive queue) and returns ACKs promptly, even while the application buffer stays locked until the VM's regular vCPU is scheduled]
VM Scheduling Policy for Fairness
Turbo cores are not free: CPU fair share must be maintained among VMs
  Calculate credits over both regular and turbo cores
  Guarantee the CPU allocation on turbo cores
  Deduct I/O-intensive VMs' credits on regular cores
  Allocate the deduction to non-I/O-intensive VMs (see the sketch below)
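A C sketch of the accounting idea (assumed numbers and data shapes, not Xen's actual scheduler code): whatever a VM consumes on the turbo core is deducted from its regular-core credit, and the deducted amount is handed to the VMs that are not I/O-intensive.

/* Sketch: turbo-core time is charged against regular-core credit. */
#include <stdio.h>

#define NVMS 3

int main(void)
{
    double base = 100.0;                          /* regular credit per VM */
    double turbo_used[NVMS] = { 60.0, 5.0, 0.0 }; /* turbo-core consumption */
    double regular[NVMS];

    /* 1. Deduct each VM's turbo-core usage from its regular credit. */
    double reclaimed = 0.0, avg = 0.0;
    for (int i = 0; i < NVMS; i++) {
        regular[i] = base - turbo_used[i];
        reclaimed += turbo_used[i];
        avg += turbo_used[i] / NVMS;
    }

    /* 2. Give the reclaimed credit to non-I/O-intensive VMs (here:
     *    those using less than the average amount of turbo time). */
    int quiet = 0;
    for (int i = 0; i < NVMS; i++)
        if (turbo_used[i] < avg) quiet++;
    if (quiet > 0)
        for (int i = 0; i < NVMS; i++)
            if (turbo_used[i] < avg) regular[i] += reclaimed / quiet;

    for (int i = 0; i < NVMS; i++)
        printf("VM%d: regular %6.1f + turbo %5.1f = %6.1f credits\n",
               i + 1, regular[i], turbo_used[i], regular[i] + turbo_used[i]);
    return 0;
}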
Evaluation Setup
VM hosts: 3.2 GHz quad-core Intel Xeon CPU, 16 GB RAM
One core assigned exclusively to the driver domain (dom0)
Xen with Linux 3.2
One core chosen as the turbo core
Gigabit Ethernet switch (10 Gbps for two of the experiments)
File Read/Write Throughput: Micro-Benchmark
[Figure: throughput comparison between a regular core and the turbo core]
TCP/UDP Throughput: Micro-Benchmark
NFS/SCP Throughput: Application Benchmark
Apache Olio: Application Benchmark
Three components:
  A web server to process user requests
  A MySQL database server to store user profiles and event information
  An NFS server to store images and documents specific to events
Conclusions
Problem: CPU sharing among VMs degrades I/O throughput
Solution: vTurbo offloads IRQ processing to a dedicated, turbo-sliced core
Results
  UDP throughput improved by up to 4x
  TCP throughput improved by up to 3x
  Disk write throughput improved by up to 2x
  NFS throughput improved by up to 3x
  Olio throughput improved by up to 38.7%
References
Cheng, L., and Wang, C.-L. "vBalance: Using Interrupt Load Balance to Improve I/O Performance for SMP Virtual Machines." In ACM SoCC (2012).
Dong, Y., Yu, Z., and Rose, G. "SR-IOV Networking in Xen: Architecture, Design and Implementation." In WIOV (2008).
Gordon, A., Amit, N., Har'El, N., Ben-Yehuda, M., Landau, A., Schuster, A., and Tsafrir, D. "ELI: Bare-Metal Performance for I/O Virtualization." In ACM ASPLOS (2012).
Thank You!