1
Virtualizing Mission-Critical Apps
1PM EST, 3/29/2011
Ilya Mirman, Philip Thomas
2
Agenda
– The Rise of “The Virtualization Chasm”
– 3 Fundamental inefficiencies
– Best practices
– Live demonstration
3
Background
4
Before Virtualization
Traditional IT guarantees apps’ performance by:
– Dedicating physical machines (PMs) to apps
– Provisioning sufficient capacity to service peak loads
Consider an app requiring 16 cores, 8 GB of memory, and 10k IOPS (I/O operations per second) of I/O bandwidth to service its peaks.
[Figure: capacity chart of a PM (CPU, memory, I/O) showing the peak CPU workload plus excess capacity to keep utilization under 80%. CPU capacity: 16 cores; memory capacity: 8 GB; I/O capacity: 10k IOPS.]
5
Over-Provisioning Waste
Workloads are ‘bursty’: the average-to-peak ratio is often under 10%.
Dedicating hardware wastes the slack capacity between average and peak.
[Figure: PM capacity chart. Capacity is over-provisioned for peak demands; average utilization is 10%, and the rest is wasted capacity.]
6
Virtualization Is Set to Resolve This Waste
Consolidate workloads into shared PMs.
This increases average utilization additively.
But it also increases interference among VMs:
– E.g., peak traffic of VM1 can interfere with CPU availability for other VMs
[Figure: peak workloads of VM1 through VM10, consolidated into shared PMs.]
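Both effects on this slide can be illustrated with a small sketch. The load values, the 12-core capacity, and the function names are assumptions made up for illustration; they are not taken from the presentation.

```python
from typing import List, Sequence


def average_utilization(vm_loads: Sequence[Sequence[float]], capacity: float) -> float:
    """Average utilization of a PM hosting all the VMs; per-VM averages add up."""
    per_vm_average = [sum(series) / len(series) for series in vm_loads]
    return sum(per_vm_average) / capacity


def interference_slots(vm_loads: Sequence[Sequence[float]], capacity: float) -> List[int]:
    """Time slots in which aggregate demand exceeds the PM's capacity."""
    n_slots = len(vm_loads[0])
    if any(len(series) != n_slots for series in vm_loads):
        raise ValueError("all VM load series must cover the same time slots")
    return [
        t for t in range(n_slots)
        if sum(series[t] for series in vm_loads) > capacity
    ]


# Hypothetical CPU demand (in cores) of three VMs over four time slots.
loads = [
    [1, 1, 8, 1],  # VM1 peaks in slot 2
    [1, 6, 1, 1],  # VM2 peaks in slot 1
    [1, 1, 6, 1],  # VM3 also peaks in slot 2
]
print(average_utilization(loads, capacity=12))
print(interference_slots(loads, capacity=12))
```

In this made-up data, the averages add up to a healthy utilization, yet the coinciding peaks of VM1 and VM3 exceed the shared capacity in one slot, which is the interference the slide warns about.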
7
VMs Compete for Resources
Resource allocations are best-effort (vs. dedicated):
– VMs get their allocations if capacity is available
– VMs experience interference when capacity is insufficient
Interference can create congestion, bottlenecks, and delays.
Performance-insensitive apps can tolerate interference:
– This permits simple, risk-free virtualization
But mission-critical apps are highly vulnerable to interference!
8
The Rise of “The Virtualization Chasm”
Virtualization 1.0: Virtualize performance-insensitive apps
– E.g., print servers, non-critical web apps (the low-hanging fruit)
– 20%-30% of enterprise apps
Virtualization 2.0: Virtualize production apps
– The remaining 70%-80% of important/critical production apps
[Figure: ROI vs. percentage of apps virtualized (20%, 40%, 80%, 100%). “The Virtualization Chasm” separates Virtualization 1.0 (performance-insensitive apps) from Virtualization 2.0 (production apps).]
9
Virtualizing Mission-Critical Apps
10
The Key Challenge: Ensuring That Production Apps Get Their Resources
Interference results from statistical over-commitment:
– Apps’ demands can momentarily exceed capacity
Interference may be controlled by two mechanisms:
– Resource allocation: protect apps against over-commitment
– Workload placement: move workloads to minimize interference
Let’s take a look at recommendations from the hypervisor vendors…
11
VMware Best Practices: Managing Production App Performance
Source: Best Practice Guide to Exchange Server Virtualization: http://www.vmware.com/files/pdf/Exchange_2010_on_VMware_-_Best_Practices_Guide.pdf
Assure Peak Utilization: “It is recommended that standalone servers…be designed to not exceed 70% utilization during peak period.”
Avoid Over-Commitment: “For performance-critical Exchange virtual machines (i.e., production systems), try to ensure the total number of vCPUs assigned to all the virtual machines is equal to or less than the total number of cores on the host machine.”
12
VMware Best Practices: Managing Production App Performance
VMware’s production apps strategy rests on 2 rules. VMs running production apps should ensure that:
– R-I: “Resource allocations are sufficient to serve peak demands.”
– R-II: “Aggregate allocations do not exceed the PM capacity.”
R-I guarantees that an app may get its peak demands served, if capacity is available.
R-II guarantees that the capacity allocation will be available.
E.g., if VM1 and VM2 each need 4 vCPUs, we need a PM with ≥8 CPU cores!
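The two rules can be written as simple checks. This is a minimal sketch limited to vCPUs; the `VM` class and the function names are hypothetical and do not correspond to any VMware API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class VM:
    name: str
    peak_vcpus: int       # vCPUs needed to serve the app's peak demand
    allocated_vcpus: int  # vCPUs actually assigned to the VM


def rule_1_violations(vms: Sequence[VM]) -> List[str]:
    """R-I: names of VMs whose allocation cannot serve their peak demand."""
    return [vm.name for vm in vms if vm.allocated_vcpus < vm.peak_vcpus]


def satisfies_rule_2(vms: Sequence[VM], host_cores: int) -> bool:
    """R-II: aggregate allocations must not exceed the PM's capacity."""
    return sum(vm.allocated_vcpus for vm in vms) <= host_cores


# The slide's example: VM1 and VM2 each need 4 vCPUs.
vms = [VM("VM1", peak_vcpus=4, allocated_vcpus=4),
       VM("VM2", peak_vcpus=4, allocated_vcpus=4)]
print(rule_1_violations(vms))               # no VM is under-allocated
print(satisfies_rule_2(vms, host_cores=8))  # an 8-core PM is enough
print(satisfies_rule_2(vms, host_cores=6))  # a 6-core PM is over-committed
```

Taken together, the checks show why following both rules amounts to dedicating peak capacity to each production VM.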
13
Wait… Really? Then Why Virtualize?
Though there’s no sharing of resources, we still enjoy the other benefits of virtualization (app isolation, VM set-up, back-up, etc.).
– R-I: “Resource allocations are sufficient to serve peak demands.”
– R-II: “Aggregate allocations do not exceed the PM capacity.”
R-I guarantees that an app may get its peak demands served, if capacity is available.
R-II guarantees that the capacity allocation will be available.
14
Virtualization Can Result in 3 Fundamental Inefficiencies
1. Over-provisioning inefficiency
2. Workload packing inefficiency
3. Non-adaptive control inefficiency
These fundamental inefficiencies are considered next…
15
Over-provisioning Inefficiency
16
How to Avoid Over-Provisioning Waste?
To avoid waste: increase average workload without increasing reservations
– Add performance-insensitive apps with high average workload
– E.g., consolidate spam-filter apps and email archival apps alongside mission-critical apps
Need an additional best practice rule: smart consolidation
Best Practice #1: Maintain a consolidation balance between performance-sensitive and performance-insensitive workloads
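A minimal sketch of the arithmetic behind Best Practice #1 follows. The workload names, core counts, and the `Workload` class are hypothetical; the only figures borrowed from the deck are the 16-core PM and the 10% average utilization.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Workload:
    name: str
    reserved_cores: float  # capacity guaranteed to the workload (0 = best effort)
    average_cores: float   # long-run average demand


def total_reserved(workloads: Sequence[Workload]) -> float:
    """Sum of guaranteed reservations on the PM."""
    return sum(w.reserved_cores for w in workloads)


def average_utilization(workloads: Sequence[Workload], capacity: float) -> float:
    """Average utilization of the PM; average demands add up."""
    return sum(w.average_cores for w in workloads) / capacity


PM_CORES = 16

# A mission-critical app reserved for its peak but averaging 10% of the PM.
critical = [Workload("exchange", reserved_cores=16, average_cores=1.6)]

# Assumed performance-insensitive apps that run best-effort (no reservation).
insensitive = [
    Workload("spam-filter", reserved_cores=0, average_cores=4.0),
    Workload("email-archive", reserved_cores=0, average_cores=4.0),
]

before = average_utilization(critical, PM_CORES)
after = average_utilization(critical + insensitive, PM_CORES)
print(f"average utilization: {before:.0%} -> {after:.0%}")
print("reservations unchanged:",
      total_reserved(critical) == total_reserved(critical + insensitive))
```

The sketch assumes the best-effort workloads can tolerate being squeezed whenever the mission-critical app claims its reservation, which is what makes them performance-insensitive.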
17
Workload-Packing Inefficiency
18
A Greatly Simplified Example
[Figure: virtualized workloads VM1 through VM6 and their manual, ad hoc assignment to PM1, PM2, and PM3. Capacities shown: CPU capacity: 16 cores; memory capacity: 8 GB; IO capacity: 10k IOPS.]
19
What If We Get New VMs?
[Figure: new workloads VM7 through VM10 arrive. The ad hoc assignment uses PM1 through PM5; the optimized assignment uses PM1 through PM3.]
Can we do better? Optimized assignment uses 40% fewer resources (3 PMs vs. 5).
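The gap between ad hoc and optimized assignment can be reproduced with a standard bin-packing heuristic. The sketch below is not the presenters' algorithm: it uses next-fit in arrival order as a stand-in for ad hoc assignment and first-fit decreasing as a simple improvement. It also considers CPU only, and the demand values are hypothetical, chosen so that the result mirrors the slide's 5-vs-3 outcome.

```python
from typing import List, Sequence


def next_fit(demands: Sequence[float], capacity: float) -> List[List[float]]:
    """Place each VM on the most recently opened PM, or open a new PM."""
    pms: List[List[float]] = []
    for demand in demands:
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds PM capacity {capacity}")
        if pms and sum(pms[-1]) + demand <= capacity:
            pms[-1].append(demand)
        else:
            pms.append([demand])
    return pms


def first_fit_decreasing(demands: Sequence[float], capacity: float) -> List[List[float]]:
    """Sort VMs by demand (largest first), then use the first PM with room."""
    pms: List[List[float]] = []
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds PM capacity {capacity}")
        for pm in pms:
            if sum(pm) + demand <= capacity:
                pm.append(demand)
                break
        else:
            pms.append([demand])  # no existing PM had room
    return pms


# Hypothetical CPU demands (cores) in arrival order; each PM has 16 cores.
vm_cores = [10, 8, 10, 8, 6, 6]
ad_hoc = next_fit(vm_cores, capacity=16)
optimized = first_fit_decreasing(vm_cores, capacity=16)
print(len(ad_hoc), "PMs vs.", len(optimized), "PMs")
```

A real placement would have to pack memory and I/O at the same time and respect placement constraints, which is considerably harder than this one-dimensional case.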
20
What Can We Learn from This Example?
Changes may require (re-)assignment of workloads.
Even a trivialized example can be very complex.
Complexity and waste can grow dramatically:
– When the number of VMs increases
– When physical machines vary
– When there are constraints (e.g., storage access, security policies)
– When the rate of change is high
Ad hoc processes can lead to costly inefficiencies.
Planning and workload placement must consider all workload types (not just CPU).
21
Overcoming the Packing Inefficiency
Use improved workload placement algorithms:
– Look holistically at all workloads and resources
– Exploit the flexibility of performance-insensitive workloads
– Exploit the dynamics of workload peaks and troughs
Best Practice #2: Use improved workload placement algorithms
22
Non-adaptive Control Inefficiency
23
Mission-Critical App Example
Virtualized MS Exchange app
High IOPS during the night (2AM-5AM):
– Peak: 10 k-IOPS
– <1 k-IOPS during the rest of the time
[Figure: IOPS rate (k-IOPS) vs. time of day over 24 hours, peaking at 10 k-IOPS.]
24
What If Workloads Grow?
What if VM1 needs more memory & storage?
[Figure: two assignments of VM1 through VM6 to PMs, one using PM1 through PM4 and one using PM1 through PM3.]
Can we do better? Optimized assignment uses 25% fewer resources.
25
Adaptive vs. Non-Adaptive Workload Control
Workload demands (and interference) change over time:
– E.g., the Exchange server is active through the night
– Why keep its reservation during the day?
Static workload management is limited in handling emergent problems:
– App profiles reflect long-term statistics; fluctuations can cause interference
Adaptive workload control offers superior management:
– Exploit workload dynamics to reduce the waste of static policies
– Eliminate emergent interference
Best Practice #3: Provide adaptive control to optimize resource use & avoid interference
Best Practice #4: Use forward-looking workload projection
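To put a rough number on the Exchange example, the sketch below compares a static reservation held at the 10 k-IOPS peak all day with a time-of-day schedule. It assumes the peak window covers hours 2, 3, and 4, and that reserving 1 k-IOPS is enough for the rest of the day (the deck only says demand is below 1 k-IOPS then). The schedule representation and function names are hypothetical.

```python
from typing import List

PEAK_IOPS = 10_000        # nightly peak from the Exchange example
OFF_PEAK_IOPS = 1_000     # assumed reservation for the rest of the day
PEAK_HOURS = range(2, 5)  # 2AM-5AM, i.e. hours 2, 3 and 4


def static_schedule() -> List[int]:
    """Non-adaptive policy: hold the peak reservation for all 24 hours."""
    return [PEAK_IOPS] * 24


def adaptive_schedule() -> List[int]:
    """Adaptive policy: reserve the peak only during the nightly window."""
    return [PEAK_IOPS if hour in PEAK_HOURS else OFF_PEAK_IOPS
            for hour in range(24)]


def reserved_iops_hours(schedule: List[int]) -> int:
    """Total reservation over the day, in IOPS-hours."""
    return sum(schedule)


static_total = reserved_iops_hours(static_schedule())
adaptive_total = reserved_iops_hours(adaptive_schedule())
savings = 1 - adaptive_total / static_total
print(f"static: {static_total}, adaptive: {adaptive_total}, saved: {savings:.0%}")
```

Under these assumptions the adaptive schedule frees roughly 79% of the reserved IOPS-hours for other workloads. A fixed timetable is still a simplification, since the adaptive control described above would also react to fluctuations that the long-term profile does not capture.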
26
Adaptive Control: Too Complex for Manual Management
Manual management requires administrators to:
– Master voluminous details of hypervisor and application internals
– Manage interference and waste problems manually
– Manage resource allocations and move applications as workloads change
– Maintain tight coordination between virtualization & app administrators
This complexity is a central barrier for Virtualization 2.0!
27
Virtualizing Production Apps: Improved Best Practices
28
Conclusions
Workload placement can be very inefficient:
– Over-provisioning waste; workload-packing waste; non-adaptive inefficiencies
Virtualization is much too complex for manual administration.
It must be augmented by workload management:
– Eliminate the over-provisioning waste through balanced consolidation
– Minimize the workload-packing waste by exploiting workload features
– Support adaptive control to optimize resource use & avoid interference
Virtualization 2.0 strategy: replace manual management with automated, optimized workload management
29
Live Demonstration
30
Thank you! www.vmturbo.com