1 Xen and Co.: Communication-aware CPU Scheduling for Consolidated Xen-based Hosting Platforms Sriram Govindan, Arjun R Nath, Amitayu Das, Bhuvan Urgaonkar, Anand Sivasubramaniam, Computer Systems Laboratory, The Pennsylvania State University.
2 Data centers Rent server resources Provide resource and performance guarantees Problem: Server sprawl Solution: Consolidation Reduce resource wastage Reduced floor space Better power management How?
3 Server virtualization [Figure labels: Applications (2-tiered e-commerce application, single-tier streaming server), Operating system (Linux, Windows), VMM, Hardware] Ability to create multiple virtual servers from a single physical server. Allows consolidation by hosting heterogeneous OS instances over the same hardware. Why now? Emergence of highly efficient virtual machine monitors (Xen, VMware, etc.). Hardware support (Intel, AMD, IBM, etc.). Real-world example: Amazon EC2
4 Consolidation: How? Know what the applications need. Ensure the resource requirements of the applications are met.
5 Consolidation: Example Consider a representative e-commerce benchmark, TPC-W, an online book store application Measure application resource needs and record performance, Run TPC-W tiers on dedicated servers Hardware VMM Clients Jboss Mysql Query Response Requests Record response times Record resource usage Responses
6 Consolidation: Example [Figure labels: Clients, Jboss, Mysql, VMM, Hardware] 95th-percentile CPU utilization: Jboss ~10%, Mysql ~20% [Plot: CDF of response time in seconds]
7 Consolidation: Example Hardware VMM Jboss mysql Clients Resource underutilized CPU intensive VMs Consolidate the TPC-W tiers on to a single server Use Hypervisor to ensure resource guarantees Reserve for the peak requirement Pack more applications to utilize the remaining server capacity 10% 20% Almost 100% Server Utilization Other resource requirements are also met
8 Consolidation: Example [Figure labels: Clients, Jboss, Mysql, CPU-intensive VMs, VMM, Hardware] [Plot: CDF of response time in seconds, with consolidation vs. without consolidation] Why did this happen?
9 Scheduler induced delays Jboss DB query1 reply1 query2 reply2 Network latency TPC-W tiers running on dedicated servers
10 Scheduler induced delays Jboss DB query1 reply1 query2 reply2 Network latency Jboss DB query1 reply1 query2 reply2 Scheduler induced delays TPC-W tiers running on dedicated servers Consolidated TPC-W tiers
11 Does this look familiar? Parallel systems: gang scheduling/co-scheduling (Feitelson et al., Ousterhout et al., Andrea et al.). Schedulers with low-latency dispatch, e.g. BVT (Duda et al.). Our contribution: Fairness guarantees – applications pay for resources. Self-tuning – reduced administrator intervention. Adapts to applications’ varying I/O behaviour. Network I/O is virtualized – this further increases the delays.
12 Xen Virtual Machine Monitor Xen Hypervisor Domain 0/ Driver domain Modified Guest OS Modified Guest OS Modified Guest OS … Virtual machines I/O virtualization VM scheduler Virtual hardware (vCpu, vDisk, vNic, vMemory etc.) Physical hardware (Cpu, Disk, Nic, Memory etc.) Applications
13 Network Virtualization in Xen - Reception NIC Netback driver Netfront Driver Hardware drivers domain0 Guest VM Hypervisor Application Interrupt Notify Virtual Interrupt Packet delivery
14 Network Virtualization in Xen - Transmission NIC Netback driver Netfront Driver Hardware drivers domain0 Guest VM Application Packet send Send over virtual NIC Send over NIC
15 Scheduler induced Delays Delay associated with scheduling of Domain0 When a guest domain transmits a packet When a packet is received at the physical NIC Jboss Issues a query to db dom0 DB dom0
16 Scheduler induced Delays Delay associated with scheduling of Domain0 Delay at the recipient When Domain0 sends a packet to a guest domain Jboss Issues a query to db dom0 DB dom0
17 Scheduler induced Delays Delay associated with scheduling of Domain0 Delay at the recipient Delay at the sender Before a domain sends a network packet (on its virtual NIC). Unlike reception, sending a packet can only be anticipated.
18 Scheduler induced Delays Delay associated with scheduling of Domain0 Delay at the recipient Delay at the sender Network latency Jboss DB queryreply Scheduler induced delays with virtualization overhead Consolidated TPC-W tiers in a virtualized environment dom0 Jboss
19 Scheduler design Recall: reservations must be provided. Build on top of a reservation-based scheduler, SEDF: a (slice, period) pair means the domain needs ‘slice’ ms of CPU every ‘period’ ms. Communication-aware SEDF scheduler: enhance the CPU scheduler to reduce scheduler-induced delays. Change the scheduling order to preferentially schedule communicating domains. This introduces short-term unfairness but still preserves reservation guarantees over a coarser time scale, the PERIOD.
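The (slice, period) reservation on this slide can be illustrated with a minimal sketch. It assumes that a domain's guaranteed CPU share is simply slice/period and that a set of reservations fits on one CPU when the shares sum to at most 1. The `Reservation` class, the `admissible` helper, and the example values are hypothetical and are not taken from Xen's SEDF code.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Reservation:
    """Hypothetical (slice, period) pair: 'slice_ms' of CPU every 'period_ms'."""
    name: str
    slice_ms: int
    period_ms: int

    def share(self) -> Fraction:
        # Fraction of one CPU guaranteed over each period.
        return Fraction(self.slice_ms, self.period_ms)


def admissible(reservations) -> bool:
    # Assumption: one CPU can honour the reservations if the shares sum to <= 1.
    return sum((r.share() for r in reservations), Fraction(0)) <= 1


# Illustrative values only, chosen to match the ~10% / ~20% peaks
# reported earlier in the deck for the Jboss and Mysql tiers.
tiers = [Reservation("jboss", 10, 100), Reservation("mysql", 20, 100)]
print(admissible(tiers))
```

Exact fractions are used instead of floats so that the admission check does not depend on rounding.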
20 Scheduler Implementation Key idea: associate impending network activity with each virtual machine. Incorporate communication activity into decision making. Greedy heuristic: prefer the VM that is likely to benefit the most – the VM with the most pending packets.
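The greedy heuristic can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Domain` fields, the rule that a domain is eligible only while it has slice left in its current period, and the earliest-deadline tie-break are all hypothetical choices made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Domain:
    """Hypothetical scheduler view of one VM."""
    name: str
    deadline_ms: int          # end of the current period
    remaining_slice_ms: int   # reserved CPU time not yet consumed this period
    pending_packets: int = 0  # impending network activity for this VM


def pick_next(domains: List[Domain]) -> Optional[Domain]:
    # Assumption: only domains with slice left in this period may run,
    # which is what keeps the reservation intact over the period.
    eligible = [d for d in domains if d.remaining_slice_ms > 0]
    if not eligible:
        return None
    # Greedy choice: most pending packets first; ties (including the case
    # where nobody has pending packets) fall back to earliest deadline.
    return min(eligible, key=lambda d: (-d.pending_packets, d.deadline_ms))
```

With no pending packets anywhere, the sort key reduces to the deadline alone, so the sketch degenerates to plain earliest-deadline-first ordering.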
21 Communication aware scheduler - Reception [Figure labels: Domain0, Guest Domains (Domain 1, Domain 2, …, Domain n), Hypervisor, NIC] Packet arrives at the NIC. Interrupt. Domain0.pending++ Domain1.pending++ Now, schedule Domain0. Schedule Domain 1. Domain0.pending-- Domain1.pending--
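The per-domain `pending` counters on this slide can be mimicked with a small bookkeeping sketch. The `PendingTracker` class and its method names are hypothetical, and the walk-through follows one plausible reading of the slide's ordering (both counters incremented, then each decremented once the corresponding domain has run).

```python
from collections import defaultdict


class PendingTracker:
    """Hypothetical per-domain count of network packets awaiting processing."""

    def __init__(self):
        self.pending = defaultdict(int)

    def packet_queued_for(self, domain: str) -> None:
        # Corresponds to 'DomainX.pending++' on the slide.
        self.pending[domain] += 1

    def packet_handled_by(self, domain: str) -> None:
        # Corresponds to 'DomainX.pending--'; never drops below zero.
        if self.pending[domain] > 0:
            self.pending[domain] -= 1


tracker = PendingTracker()
tracker.packet_queued_for("Domain0")   # interrupt: packet arrived at the NIC
tracker.packet_queued_for("Domain1")   # packet is destined for guest Domain 1
tracker.packet_handled_by("Domain0")   # Domain0 was scheduled and forwarded it
tracker.packet_handled_by("Domain1")   # Domain 1 was scheduled and received it
```

A scheduler could consult `tracker.pending` when choosing which domain to run next; how the real patch stores and reads these counts is not shown on the slide.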
22 Evaluation Environment Applications: TPC-W benchmark (jboss and mysql tiers). Multi-threaded UDP streaming server: simultaneously streams data at 3 Mbps to a specified number of clients. Every client is provided with an 8 MB buffer. Clients start consuming data only when the buffer is full. CPU-intensive workloads, used for illustrative purposes.
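As a back-of-the-envelope check on the streaming setup, the sketch below computes how long a client buffer takes to fill and the aggregate rate the server must sustain. It assumes 1 MB = 10^6 bytes and 1 Mbps = 10^6 bits/s (the slides do not say which convention is meant), and the helper names are hypothetical.

```python
def buffer_fill_seconds(buffer_bytes: int, rate_bits_per_s: float) -> float:
    # Time for a stream at the given rate to fill an initially empty buffer.
    return buffer_bytes * 8 / rate_bits_per_s


def aggregate_rate_mbps(clients: int, per_client_mbps: float) -> float:
    # Total outbound rate if every client is streamed to simultaneously.
    return clients * per_client_mbps


# 8 MB buffer at 3 Mbps (decimal units assumed): about 21.3 seconds to fill.
fill_s = buffer_fill_seconds(8_000_000, 3_000_000)

# 45 clients (the count used in the streaming experiments) at 3 Mbps each.
total_mbps = aggregate_rate_mbps(45, 3)

print(round(fill_s, 1), total_mbps)
```

Under these assumptions a client starts playback roughly 21 seconds after streaming begins, and the server pushes 135 Mbps in total to 45 clients.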
23 Streaming media experiments - performance improvement Streaming to 45 clients at 3 Mbps for 20 minutes. Default scheduler suffered a playback discontinuity every 1.5 minutes.
24 Streaming media experiments - performance improvement Streaming to 45 clients at 3 Mbps for 20 minutes. Default scheduler suffered a playback discontinuity every 1.5 minutes. Communication-aware scheduler suffered a discontinuity only after the 18th minute.
25 Streaming media experiments - improved consolidation A single buffer underrun at the client is fixed as the Service Level Objective (SLO). The communication-aware scheduler is able to sustain 30 more clients than the default scheduler. [Plot: no. of buffer underruns at the client (lower is better) vs. no. of clients supported at the server, with the “SLO” threshold marked]
26 TPC-W performance TPC-W benchmark ran for 20 minutes. Around 35 percent improvement in response time compared to the default scheduler. Scheduler Average (secs) 95th percentile (secs) Maximum (secs) Default SEDF Modified SEDF Percentage improvement %19.98 %51.15 %
27 Scheduler Fairness Evaluation: CPU-intensive virtual machine. The CPU-intensive VM lost less than 1% of CPU compared to the default scheduler but was still above its reservation, which was 10%. Just changing the order of scheduling resulted in a huge response time improvement for the streaming server. [Plot: CPU utilization vs. time in minutes; series: Reservation, Default SEDF, Modified SEDF]
28 Conclusion A communication-aware CPU scheduler developed for a consolidated environment. Low-overhead run-time monitoring of network events by the hypervisor scheduler. Addressed additional problems due to network I/O virtualization in Xen. Source code (~300 lines) and Xen 3.0.2 patch available in the software link in,
29 Questions
30 Streaming media experiments - performance improvement Streaming to 45 clients at 3 Mbps for 20 minutes. Default scheduler suffered glitches every 1.5 minutes. Communication-aware scheduler suffered a glitch only after the 18th minute. With only the domain0 optimization ON, a glitch occurred at the 15th minute.
31 Communication aware scheduler [Figure labels: Domain0, Guest Domains (Domain 1, Domain 2, …, Domain n), Hypervisor, Domain0’s book-keeping page, Guest domain book-keeping pages]
32 Communication aware scheduler - Reception [Figure labels: Domain0, Guest Domains (Domain 1, Domain 2, …, Domain n), Hypervisor, NIC, Domain0’s book-keeping page] Packet arrives at the NIC. Interrupt. Domain0: network_reception_intensity++. Domain 1: network_reception_intensity++. Now, schedule Domain0. Domain0 is descheduled; now we are in the hypervisor. Schedule Domain 1. Receive packets. Domain 1 is descheduled; now we are in the hypervisor. Update packet reception. Update pending activity.
33 Communication aware scheduler - Transmission [Figure labels: Domain0, Guest Domains (Domain 1, Domain 2, …, Domain n), Hypervisor, Domain0’s book-keeping page] Domain1: network_transmission_intensity++. Domain0: network_transmission_intensity++. Domain1: anticipated_network_transmission_intensity++. Now Domain 1 is descheduled; we are in the hypervisor.
34 I/O Virtualization in Xen DiskNIC Backend driver Domain0 Frontend Driver Guest domain I/O Devices Shared pages Notify Transfer Hardware drivers