Published by Elmer Hoxworth. Modified over 10 years ago.
1
VirtualKnotter: Online Virtual Machine Shuffling for Congestion Resolving in Virtualized Datacenter
Xitao Wen, Kai Chen, Yan Chen, Yongqiang Liu, Yong Xia, Chengchen Hu
2
Datacenter as Infrastructure
3
Congestion in Datacenter
Packet loss! Queuing delay! Degrading throughput!
Figure labels: 10:1~100:1, 2:1~10:1
4
Outline: Congestion in the Wild, General Approaches, Problem Formulation, Main Design, Evaluation
Current section: Congestion in the Wild
5
Spatial Pattern
Unbalanced utilization
  – Hotspot: hot links account for <10% of core links [IMC10]
  – Spatially unbalanced utilization
Figure axes: Sender, Receiver
6
Temporal Pattern
Long congestion events
  – Last for 10s of minutes
  – Each individual event has a clear spatial pattern
Figure axis: Core Link Index
7
Traffic Stability
Bursty at a fine granularity
  – Not predictable at 10s or 100s of milliseconds [IMC10][SIGCOMM09]
Predictable at a timescale of 10s of minutes
  – 40% to 70% of pairwise traffic can be expected to be stable
  – 90%+ predictable traffic when aggregated at core links
8
Outline: Congestion in the Wild, General Approaches, Problem Formulation, Main Design, Evaluation
Current section: General Approaches
9
General Approaches
Network Layer
  – Increase network bandwidth (Fat-tree, BCube, OSA…): expensive; requires upgrading the entire DC network
  – Optimize flow routing (Hedera, MicroTE): not scalable; requires hardware support; depends on rich path diversity
Application Layer
  – Optimize VM placement: scalable; lightweight deployment; suitable for existing over-subscribed networks
10
Background on Virtualized DC
Virtualization Layer
VM Live Migration
  – Keeps service continuous while migrating
  – Transfers 1.1x to 1.4x the VM memory (major cost!)
Figure labels: Server, VM, DC Network
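The 1.1x to 1.4x figure above gives a quick way to bound the traffic a set of migrations would generate. The sketch below is only an illustration of that arithmetic; the function name `migration_traffic_bounds`, its parameters, and the use of gigabytes are assumptions made here, not part of the presentation.

```python
def migration_traffic_bounds(vm_memory_gb, low_factor=1.1, high_factor=1.4):
    """Estimate (low, high) migration traffic in GB for the VMs being moved.

    vm_memory_gb: memory sizes (GB) of the VMs to migrate (hypothetical input).
    low_factor / high_factor: the 1.1x and 1.4x multipliers quoted on the slide.
    """
    total_memory = sum(vm_memory_gb)
    return total_memory * low_factor, total_memory * high_factor


# Example: moving one 4 GB VM and one 8 GB VM.
low, high = migration_traffic_bounds([4, 8])
print(f"estimated migration traffic: {low:.1f} GB to {high:.1f} GB")
```

Under these assumptions, shuffling 12 GB of VM memory would put roughly 13 GB to 17 GB on the network, which is why the number of moved VMs has to be kept under control.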
11
Optimize VM Placement
Offload traffic from congested link
Figure legend: active VM, idle VM
12
Outline: Congestion in the Wild, General Approaches, Problem Formulation, Main Design, Evaluation
Current section: Problem Formulation
13
Design Goal
Objective
  – Mitigate congestion: maximum link utilization (MLU)
Constraints
  – Controllable migration traffic (i.e. moving VMs): less than the reduced traffic
  – Reasonable runtime overhead: far less than the target timescale (10s of mins)
14
Problem Statement
Input
  – Topology and routing of physical servers
  – Traffic matrix among VMs
  – Current placement
Variable & Output
  – Optimized placement
NP-hardness
  – Proof: reduction from the Quadratic Bottleneck Assignment Problem
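To make the inputs and the MLU objective concrete, here is a minimal sketch that evaluates a placement on a toy topology. Everything about the model is an assumption for illustration: servers sit in racks, each rack has a single uplink of equal capacity, and traffic between VMs in different racks crosses both racks' uplinks. The names `max_link_utilization`, `rack_of`, and `uplink_capacity` are hypothetical.

```python
def max_link_utilization(traffic, placement, rack_of, uplink_capacity):
    """Return the MLU over rack uplinks for a given VM placement.

    traffic: {(vm_a, vm_b): rate} traffic matrix among VMs.
    placement: {vm: server} current or candidate placement.
    rack_of: {server: rack} toy stand-in for topology and routing.
    uplink_capacity: capacity of every rack uplink (same unit as rate).
    """
    load = {rack: 0.0 for rack in set(rack_of.values())}
    for (src, dst), rate in traffic.items():
        src_rack = rack_of[placement[src]]
        dst_rack = rack_of[placement[dst]]
        if src_rack != dst_rack:
            # Inter-rack traffic crosses the uplink of both racks.
            load[src_rack] += rate
            load[dst_rack] += rate
    return max(link_load / uplink_capacity for link_load in load.values())


rack_of = {"s0": "r0", "s1": "r0", "s2": "r1", "s3": "r1"}
traffic = {("a", "b"): 6, ("c", "d"): 2}
current = {"a": "s0", "b": "s2", "c": "s1", "d": "s3"}
optimized = {"a": "s0", "b": "s1", "c": "s2", "d": "s3"}  # b and c swapped

print(max_link_utilization(traffic, current, rack_of, 10))
print(max_link_utilization(traffic, optimized, rack_of, 10))
```

In this toy case, swapping two VMs keeps both communicating pairs inside a rack, so the uplinks carry no traffic at all.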
15
Related Work
Optimize VM placement
  – Server consolidation [SOSP07]
  – Fault tolerance [ICS07]
  – Network scalability [INFOCOM10]
16
Outline: Congestion in the Wild, General Approaches, Problem Formulation, Main Design, Evaluation
Current section: Main Design
17
Inspiration
Stretch the tie violently, making it loose and less tangled.
Solve each tie gently, by carefully reeving the end out of the tie.
18
Two-step Algorithm
Step 1
  – Fast and greedy
  – Searches for localizing overall traffic
  – May get stuck in a local minimum
Step 2
  – Fine-grained and randomized
  – Searches for mitigating traffic on the most congested links
  – Helps avoid local minima
19
Multiway Θ-Kernighan-Lin (KL)
Top-down graph cut improvement
Introduce Θ to limit # of moves
O(n^2 log(n))
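As a rough illustration of the idea, the sketch below runs one classic two-way Kernighan-Lin pass with a cap on the number of tentative swaps. It is a simplification under stated assumptions, not the multiway algorithm from the slides: the interpretation of Θ as a cap on pair swaps (`theta`), the function names, and the naive recomputation of gains (which does not achieve the quoted complexity) are all choices made here for brevity.

```python
def cut_weight(traffic, side):
    """Total traffic crossing the cut between the two sides."""
    n = len(traffic)
    return sum(traffic[i][j] for i in range(n) for j in range(i + 1, n)
               if side[i] != side[j])


def kl_pass(traffic, side, theta):
    """One two-way KL pass, tentatively swapping at most theta VM pairs.

    traffic: symmetric matrix of pairwise VM traffic.
    side: list of 0/1 labels, one per VM.
    Returns (new_side, cut_reduction).
    """
    n = len(traffic)
    work = list(side)
    locked = set()
    swaps, gains = [], []
    for _ in range(theta):
        # D[v] = external cost minus internal cost under the working partition.
        d = []
        for v in range(n):
            external = sum(traffic[v][u] for u in range(n) if work[u] != work[v])
            internal = sum(traffic[v][u] for u in range(n)
                           if u != v and work[u] == work[v])
            d.append(external - internal)
        best = None
        for a in range(n):
            if a in locked or work[a] != 0:
                continue
            for b in range(n):
                if b in locked or work[b] != 1:
                    continue
                gain = d[a] + d[b] - 2 * traffic[a][b]
                if best is None or gain > best[0]:
                    best = (gain, a, b)
        if best is None:
            break
        gain, a, b = best
        work[a], work[b] = 1, 0  # tentative swap, then lock both VMs
        locked.update((a, b))
        swaps.append((a, b))
        gains.append(gain)
    # Commit only the prefix of swaps with the best cumulative gain.
    best_k, best_total, running = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_total:
            best_k, best_total = k, running
    result = list(side)
    for a, b in swaps[:best_k]:
        result[a], result[b] = 1, 0
    return result, best_total


traffic = [[0, 1, 10, 0],
           [1, 0, 0, 10],
           [10, 0, 0, 1],
           [0, 10, 1, 0]]
new_side, reduction = kl_pass(traffic, [0, 0, 1, 1], theta=1)
print(new_side, reduction, cut_weight(traffic, new_side))
```

Because each committed swap corresponds to two VM moves, a small `theta` directly bounds the migration traffic a pass can cause.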
22
Simulated Annealing Searching (SA)
Randomized global searching
Terminates when a satisfactory solution is obtained or the predefined max depth is reached
Figure labels: MLU=.60, MLU=.53
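The two termination conditions can be sketched with a generic simulated annealing loop over VM swaps. This is a hedged illustration only: the toy rack-uplink MLU model, the random choice of VM pairs (rather than targeting the most congested links), the cooling schedule, and all names (`anneal`, `target_mlu`, `max_depth`, `cooling`) are assumptions made here.

```python
import math
import random


def mlu(traffic, placement, rack_of, capacity):
    """Toy MLU: inter-rack traffic loads the uplink of both racks."""
    load = {rack: 0.0 for rack in set(rack_of.values())}
    for (src, dst), rate in traffic.items():
        r_src, r_dst = rack_of[placement[src]], rack_of[placement[dst]]
        if r_src != r_dst:
            load[r_src] += rate
            load[r_dst] += rate
    return max(l / capacity for l in load.values())


def anneal(traffic, placement, rack_of, capacity, target_mlu, max_depth,
           seed=0, temperature=1.0, cooling=0.95):
    """Randomized search over VM swaps; returns (best_placement, best_mlu)."""
    rng = random.Random(seed)
    current = dict(placement)
    current_cost = mlu(traffic, current, rack_of, capacity)
    best, best_cost = dict(current), current_cost
    vms = sorted(current)
    for _ in range(max_depth):          # stop at the predefined max depth
        if best_cost <= target_mlu:     # stop once the solution is satisfactory
            break
        a, b = rng.sample(vms, 2)
        current[a], current[b] = current[b], current[a]
        cost = mlu(traffic, current, rack_of, capacity)
        delta = cost - current_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_cost = cost
            if cost < best_cost:
                best, best_cost = dict(current), cost
        else:
            current[a], current[b] = current[b], current[a]  # undo the swap
        temperature = max(temperature * cooling, 1e-9)
    return best, best_cost


rack_of = {"s0": "r0", "s1": "r0", "s2": "r1", "s3": "r1"}
traffic = {("a", "b"): 6, ("c", "d"): 2}
start = {"a": "s0", "b": "s2", "c": "s1", "d": "s3"}
best, best_cost = anneal(traffic, start, rack_of, 10, target_mlu=0.1, max_depth=200)
print(best, best_cost)
```

Swapping two VMs at a time keeps per-server occupancy unchanged, and tracking the best placement seen guarantees the result is never worse than the starting point.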
23
Outline: Congestion in the Wild, General Approaches, Problem Formulation, Main Design, Evaluation
Current section: Evaluation
24
Methodology
Baseline Algorithm
  – Clustering-based algorithm
  – Pro: best-known static optimality
  – Con: high runtime and migration overhead
Metrics
  – MLU reduction without migration overhead
  – Overhead: migration traffic, runtime overhead
  – Simulation results
25
MLU Reduction without Overhead
VirtualKnotter demonstrates static performance similar to that of Clustering.
26
Migration Traffic
VirtualKnotter shows significantly less migration traffic than Clustering.
27
Runtime Overhead
VirtualKnotter demonstrates reasonable runtime overhead.
28
Simulation Results
53% less congestion
Altogether, VirtualKnotter achieves a significant gain in resolving congestion.
29
Conclusions
Collaborative VM migration can substantially resolve long-term congestion in DC
The trade-off between optimality and migration traffic is essential to harvest the benefit
DC networking projects of Northwestern LIST: http://list.cs.northwestern.edu/dcn
30
Thank you!
31
Backup
32
General Approaches
Approach               Cost  Hardware Support  Scalability  Other Dependency
Increase Bandwidth     High  Yes               Varies       -
Optimize Routing       Low   Yes               Low          Rich path diversity
Optimize VM Placement  Low   No                High         VM deployment
33
Problem Statement
Objective
  – Minimize Maximum Link Utilization (MLU)
  – Cool down the hottest spot
Constraints
  – Migration traffic
  – Server hardware capacity
  – Inseparable VM
NP-hardness
  – Proof: reduction from the Quadratic Bottleneck Assignment Problem
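The three constraints can be read as a feasibility check on a candidate placement. The sketch below is one possible reading, with several assumptions labeled as such: server capacity is modeled as a number of VM slots, migration traffic as a multiple of the moved VMs' memory (defaulting to the 1.4x upper figure from the live-migration slide), and the names `is_feasible`, `server_slots`, and `budget_gb` are hypothetical.

```python
from collections import Counter


def is_feasible(old, new, vm_memory_gb, server_slots, budget_gb, factor=1.4):
    """Check a candidate placement against the three constraints.

    old, new: {vm: server} placements before and after shuffling.
    vm_memory_gb: {vm: memory in GB}.
    server_slots: {server: max number of VMs} (assumed capacity model).
    budget_gb: maximum migration traffic allowed, in GB (assumed unit).
    """
    # Inseparable VM: every VM is mapped to exactly one server, none dropped.
    if set(old) != set(new):
        return False
    # Server hardware capacity: no server hosts more VMs than it has slots.
    occupancy = Counter(new.values())
    if any(count > server_slots.get(server, 0)
           for server, count in occupancy.items()):
        return False
    # Migration traffic: only VMs that change server generate traffic.
    moved = [vm for vm in new if new[vm] != old[vm]]
    migration_gb = factor * sum(vm_memory_gb[vm] for vm in moved)
    return migration_gb <= budget_gb


old = {"a": "s0", "b": "s1"}
swapped = {"a": "s1", "b": "s0"}
memory = {"a": 4, "b": 2}
slots = {"s0": 1, "s1": 1}
print(is_feasible(old, swapped, memory, slots, budget_gb=10))
print(is_feasible(old, swapped, memory, slots, budget_gb=5))
```

A search procedure would call such a check on each candidate before comparing MLU values, so infeasible placements are never considered improvements.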
34
Observation Summary
Unbalanced jam (spatial)
Long-term congestion (temporal)
Predictable at 10s of minutes scale (stability)
35
Two-step Algorithm
Multiway Θ-Kernighan-Lin Algorithm (KL)
  – Fast search for approximation
Simulated Annealing Searching (SA)
  – Fine search for better solution