1
Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement
Authors: Xiaoqiao Meng, Vasileios Pappas, and Li Zhang. Presented by: Jinpeng Liu
2
Introduction Modern virtualized data centers host a wide spectrum of applications. Bandwidth usage between VMs is growing rapidly, so the scalability of data center networks becomes a concern. Suggested techniques: rich connectivity at the edge of the network, dynamic routing protocols, … Current solutions require changes to the network architecture and routing protocols.
3
Introduction This paper tackles the issue from a different perspective. The current VM placement strategy has issues: placement is decided by capacity planning tools based on CPU, memory, and power consumption, and it ignores the consumption of network resources. As a result, VM pairs with heavy traffic could be placed on hosts with a large network cost between them. How often does this pattern happen in practice?
4
Background DC traffic pattern Data collected:
A data warehouse (DWH) hosted by IBM Global Services: server resource utilization from hundreds of server farms. A server cluster with hundreds of VMs: aggregate traffic of 68 VMs. Traces were collected for 10 days.
5
Background DC traffic pattern
Uneven distribution of traffic volumes from VMs: while 80% of VMs have an average rate no larger than 800 KBytes/min, 4% of them have a rate 10x higher. A heatmap of inter-VM traffic shows that the traffic rate varies significantly across VM pairs.
6
Background DC traffic pattern Stable per-VM traffic at large timescales
b. The standard deviation of the traffic rate is no more than two times the mean. c. A stable interval means the rate within that interval is no more than one standard deviation away from the mean of the entire interval (no less than 80%). d. The long tail indicates that, at the two large timescales, a large fraction (82%) of a VM's traffic is relatively constant.
7
Background DC traffic pattern
Weak correlation between traffic rate and latency: based on measurements of the traffic rate and end-to-end latency among 68 VMs in a production cluster, there is visually no correlation, and the correlation coefficient between the two matrices is -0.32, i.e., a weak correlation.
8
Background Tree architecture Cons: Topology scaling limitations
Scaling is done by scaling up each individual switch, and the core tier accommodates only 8 switches. Other limitations: address space limitations and higher server over-subscription. Current data centers follow, to a great extent, a common 3-tier network architecture: each server connects to 1 (or 2) access switches, each access switch connects to 1 (or 2) aggregation switches, and each aggregation switch connects to multiple core switches (Cisco Data Center Infrastructure 2.5 Design Guide).
9
Background VL2 Architecture: shares many features with the Tree architecture.
Complete bipartite graph: the core tier and the aggregation tier form a Clos topology. Valiant load balancing: randomly selected core switches serve as intermediate destinations, making routing location independent. A bipartite graph is a graph whose vertices can be divided into two disjoint sets U and V such that every edge connects a vertex in U to one in V; if the edges connect every vertex in U with all vertices in V, it is a complete bipartite graph. Valiant load balancing ensures that load is balanced independently of the destination of the traffic flows: an access switch first randomly selects an aggregation switch, then a core switch, and the packet is then forwarded to the destination by MAC address.
10
Background PortLand Architecture (Fat-Tree)
Requires all switches to be identical, i.e., with the same number of ports. Built around the concept of pods: a collection of access and aggregation switches forms a Clos topology, and the pods and core switches form a second Clos topology, with the up-links evenly distributed. Pros: full bisection bandwidth (1:1 oversubscription ratio); low cost (commodity switches, low power/cooling). Cons: scalability, since the size of the network depends on the number of ports per switch; for 48-port switches, at most 27,648 hosts.
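For context, the 27,648 figure matches the standard fat-tree host count of k^3/4 for k-port switches (this derivation is not spelled out on the slide):

\[
\frac{k^{3}}{4}\Big|_{k=48} \;=\; \frac{48^{3}}{4} \;=\; \frac{110{,}592}{4} \;=\; 27{,}648 .
\]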
11
Background Modular Data Center (MDC) Thousands of servers
Interconnected by switches and packed into a shipping container (Sun, HP, IBM, Dell, …). Benefits: higher degree of mobility, higher system and power density, lower cost (cooling and manufacturing).
12
Background BCube Architecture. Purpose: data-intensive computing with
bandwidth-intensive communication among MDC servers, built from low-end COTS mini-switches, with graceful performance degradation. BCube is server-centric: servers are part of the network, but servers are not directly connected to one another (they connect through switches). The structure is defined recursively.
13
Background BCube: At level 0, a BCube_0 consists of n servers connected by one n-port switch. A BCube_k is constructed from n BCube_{k-1}s and n^k n-port switches. Example with k = 1, n = 4: a BCube_0 is 4 servers connected by one 4-port switch, and a BCube_1 is 4 BCube_0s connected by 4 additional 4-port switches.
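A minimal sketch (illustrative helper, not from the paper) that just follows the recursion above to count servers and switches; for the slide's example (n = 4, k = 1) it gives 16 servers and 8 switches:

```python
def bcube_size(n: int, k: int) -> tuple[int, int]:
    """Return (num_servers, num_switches) for a BCube_k built from n-port switches.

    Follows the recursion on the slide: BCube_0 has n servers and 1 switch;
    BCube_k is n copies of BCube_{k-1} plus n**k extra n-port switches.
    """
    servers, switches = n, 1          # BCube_0
    for level in range(1, k + 1):
        servers *= n                  # n copies of the previous structure
        switches = n * switches + n ** level
    return servers, switches

# Example from the slide: k = 1, n = 4 -> (16, 8)
print(bcube_size(4, 1))
```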
14
Background BCube: Server labels encode the servers' locations in the BCube structure. Servers are connected at the i-th level if their labels differ at that level. Examples: label 2.4 → Level 0: 4th, Level 1: 2nd; label 1.3 → Level 0: 3rd, Level 1: 1st.
16
Background BCube: Server labels encode the servers' locations in the BCube structure. Servers are connected at the i-th level if their labels differ at that level. Examples: label 2.4 → Level 0: 4th, Level 1: 2nd; label 1.4 → Level 1: 1st.
17
Background BCube: Server labels encode the servers' locations in the BCube structure. Servers are connected at the i-th level if their labels differ at that level. Examples: label 2.4 → Level 0: 4th, Level 1: 2nd; label 1.4 → Level 0: 4th, Level 1: 1st.
18
Background BCube: Server labels encode the servers' locations in the BCube structure. Servers are connected at the i-th level if their labels differ at that level. Examples: label 2.4 → Level 0: 4th, Level 1: 2nd; label 1.4 → Level 0: 4th, Level 1: 1st. The impact of the fourth arc will be studied.
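A small sketch (illustrative helper, not from the paper) encoding the labeling rule above in the standard BCube reading: two servers attach to a common level-i switch exactly when their addresses differ in a single digit, and the position of that digit gives the level.

```python
def connected_level(a: tuple[int, ...], b: tuple[int, ...]) -> int | None:
    """Return the level at which two BCube servers share a switch, or None.

    Addresses are (k+1)-digit tuples, most significant digit first, matching
    the slide's "2.4" style labels. Two servers attach to the same level-i
    switch exactly when their addresses differ in a single digit; that
    digit's position determines the level.
    """
    assert len(a) == len(b)
    diff = [idx for idx, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) != 1:
        return None                       # not neighbours on any switch
    # digit 0 is the level-k digit when written most-significant first
    return len(a) - 1 - diff[0]

print(connected_level((2, 4), (1, 4)))    # differ in the level-1 digit -> 1
print(connected_level((2, 4), (2, 1)))    # differ in the level-0 digit -> 0
print(connected_level((2, 4), (1, 3)))    # differ in both digits -> None
```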
19
Virtual Machine Placement Problem
How to place VMs on a set of physical hosts? Assumptions: existing CPU/memory-based capacity tools have already decided the number of VMs that a host can accommodate; a slot refers to one CPU/memory allocation on a host; a host can have multiple slots, and a VM can take any unoccupied slot; routing is static and single-path; all external traffic is routed through a common gateway switch. Scenario: place n VMs in n slots (a minimal data-model sketch follows below).
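A minimal data-model sketch of the assumptions above; all class and variable names are hypothetical illustrations, not something the paper prescribes.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    """A physical host with a fixed number of slots (one slot = one CPU/memory allocation)."""
    name: str
    num_slots: int

@dataclass
class Placement:
    """Maps each VM id to a (host, slot index) pair; at most one VM per slot."""
    assignment: dict[str, tuple[Host, int]] = field(default_factory=dict)

    def place(self, vm: str, host: Host, slot: int) -> None:
        occupied = {(h.name, s) for h, s in self.assignment.values()}
        assert 0 <= slot < host.num_slots and (host.name, slot) not in occupied
        self.assignment[vm] = (host, slot)

h1, h2 = Host("h1", 2), Host("h2", 2)
p = Placement()
p.place("vm-a", h1, 0)
p.place("vm-b", h2, 1)
```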
20
Traffic-aware VM Placement Problem (TVMPP)
C_ij: fixed communication cost from slot i to slot j. D_ij: traffic rate from VM i to VM j. e_i: external traffic rate for VM i. g_i: communication cost between VM i and the gateway. π: [1, …, n] → [1, …, n]: permutation function assigning n VMs to n slots. The TVMPP is defined as finding a π that minimizes objective (1).
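The objective (1) itself is not reproduced in the transcript; with the notation above it can be reconstructed as (treat the exact layout as an approximation of the paper's formula):

\[
\min_{\pi} \;\; \sum_{i=1}^{n} e_i\, g_{\pi(i)} \;+\; \sum_{i=1}^{n}\sum_{j=1}^{n} D_{ij}\, C_{\pi(i)\pi(j)} \tag{1}
\]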
21
Traffic-aware VM Placement Problem (TVMPP)
The meaning of the objective depends on the definition of C_ij. Here C_ij is defined as the number of switches on the routing path from slot i to slot j, so (1) is the sum of the traffic rates perceived by each switch. If (1) is normalized by the sum of VM-to-VM bandwidth demand, it is equivalent to the average number of switches that a data unit traverses; assuming equal delay on every switch, it can further be interpreted as the average latency a data unit experiences while traversing the network. In that sense, optimizing TVMPP is equivalent to minimizing the average traffic latency.
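Spelled out, the normalization argument is the following, where d is an assumed uniform per-switch delay (a symbol introduced here for illustration):

\[
\bar{h} \;=\; \frac{\sum_{i,j} D_{ij}\, C_{\pi(i)\pi(j)}}{\sum_{i,j} D_{ij}}, \qquad \text{average latency} \;\approx\; d\,\bar{h}.
\]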
22
Traffic-aware VM Placement Problem (TVMPP)
What about the case where the number of slots exceeds the number of VMs? Add dummy VMs that generate no traffic; they do not affect the placement of the real VMs. TVMPP can also be simplified by ignoring the external-traffic term (the e_i g_{π(i)} part), which is relatively constant under the assumption that the communication cost between every host and the gateway is the same.
23
Traffic-aware VM Placement Problem (TVMPP)
Offline mode: data center operators estimate the traffic matrix, collect the network topology, and solve TVMPP to decide on which host(s) to create the VMs. Online mode: re-solve TVMPP periodically and reshuffle the VM placement when needed.
24
TVMPP Complexity Matrix notation for (1):
D is the traffic-rate matrix, C is the communication-cost matrix, and Π denotes the set of permutation matrices. This is a Quadratic Assignment Problem (QAP): there are a set of n facilities and a set of n locations; for each pair of locations a distance is specified, and for each pair of facilities a weight or flow is specified (e.g., the amount of supplies transported between the two facilities). The problem is to assign all facilities to distinct locations so as to minimize the sum of the distances multiplied by the corresponding flows.
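One standard way to write this matrix form (a sketch; the paper's exact notation may differ) is:

\[
\min_{X \in \Pi} \; \operatorname{tr}\!\left(D\, X\, C^{\mathsf T} X^{\mathsf T}\right),
\qquad X_{ik} = 1 \iff \text{VM } i \text{ is assigned to slot } k,
\]

which expands to \(\sum_{i,j} D_{ij}\, C_{\pi(i)\pi(j)}\), i.e., objective (1) without the gateway term.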
25
TVMPP Complexity Matrix notation for (1):
Same matrix formulation as on the previous slide; the QAP is NP-hard.
26
TVMPP Complexity The TVMPP belongs to the class of general QAP problems.
No existing exact solution can scale to the size of current data centers.
27
Algorithms: Cluster-and-Cut
Proposition 1 (rearrangement inequality): suppose 0 ≤ a_1 ≤ a_2 ≤ … ≤ a_n and 0 ≤ b_1 ≤ b_2 ≤ … ≤ b_n; then for any permutation π of [1, …, n],
\[
\sum_{i=1}^{n} a_i\, b_{n-i+1} \;\le\; \sum_{i=1}^{n} a_i\, b_{\pi(i)} \;\le\; \sum_{i=1}^{n} a_i\, b_i .
\]
Design principle 1: since the TVMPP objective sums the products of each C_ij with the corresponding D_ij, solving TVMPP amounts to finding a mapping of VMs to slots such that VM pairs with heavy mutual traffic are assigned to slot pairs with low-cost connections.
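A tiny numeric illustration of design principle 1, with made-up values a = (1, 2) for costs and b = (10, 20) for traffic rates: pairing the heavy rate with the low cost attains the lower bound of the inequality,

\[
1\cdot 20 + 2\cdot 10 \;=\; 40 \;\le\; 1\cdot 10 + 2\cdot 20 \;=\; 50 .
\]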
28
Algorithms: Cluster-and-Cut
Design principle 2: divide and conquer. Partition the VMs into VM-clusters and the slots into slot-clusters; map each VM-cluster to a slot-cluster by solving a (smaller) TVMPP; then, within each mapped VM-cluster/slot-cluster pair, map individual VMs to slots, again via TVMPP.
29
Algorithms: Cluster-and-Cut
Design principle 2 (divide and conquer), step 1: cluster the VMs using a classical minimum k-cut graph algorithm, so that VM pairs with high mutual traffic rates fall within the same VM-cluster. This is consistent with the earlier finding that traffic generated by a small group of VMs comprises a large fraction of the total traffic. Approximation ratio: (k−1)n/k. A heuristic sketch of such traffic-based clustering follows below.
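A heuristic sketch of traffic-based VM clustering, not the paper's exact min k-cut routine: repeatedly split the largest component along its cheapest cut (Stoer-Wagner global minimum cut) until k clusters remain. It assumes the traffic graph is connected and uses networkx purely for illustration.

```python
import networkx as nx

def cluster_vms(traffic: dict[tuple[str, str], float], k: int) -> list[set[str]]:
    """Partition VMs into k clusters, keeping heavy-traffic pairs together."""
    g = nx.Graph()
    for (u, v), rate in traffic.items():
        g.add_edge(u, v, weight=g.get_edge_data(u, v, {}).get("weight", 0.0) + rate)
    clusters = [set(g.nodes)]
    while len(clusters) < k:
        big = max(clusters, key=len)          # split the largest cluster
        clusters.remove(big)
        # cheapest cut of the induced subgraph separates the weakest traffic ties
        _, (part_a, part_b) = nx.stoer_wagner(g.subgraph(big), weight="weight")
        clusters.extend([set(part_a), set(part_b)])
    return clusters

traffic = {("a", "b"): 9.0, ("b", "c"): 8.5, ("c", "d"): 0.1, ("d", "e"): 7.0}
print(cluster_vms(traffic, 2))   # expected: {a, b, c} and {d, e}
```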
30
Algorithms: Cluster-and-Cut
Design principle 2 (divide and conquer), step 2: cluster the slots using a classical clustering algorithm, so that slot pairs with low-cost connections belong to the same slot-cluster. This exploits the fact that data center networks contain many groups of densely connected end hosts. Approximation ratio: 2.
31
Algorithms: Cluster-and-Cut
Design principle 2:
32
Algorithms: Cluster-and-Cut
Design principle 2, complexity: the clustering steps are O(nk) and O(n^4), and the overall algorithm is O(n^4). How about applying the procedure recursively?
33
Impact of Network Architectures & Traffic Patterns
Performance gains are affected by: the cost matrix C (Tree, VL2, Fat-Tree, BCube) and the traffic matrix D (global traffic model, partitioned traffic model). From the problem formulation, the traffic and cost matrices are the two determining factors for optimizing the VM placement. Consequently, we seek to answer a fundamental question: given that traffic patterns and network architectures in data centers differ significantly, how are the performance gains due to optimal VM placement affected?
34
Impact of Network Architectures & Traffic Patterns
Define the Tree cost matrix C (n × n). p_0: the fan-out of the access switches. p_1: the fan-out of the aggregation switches.
36
Impact of Network Architectures & Traffic Patterns
Define the Tree cost matrix C (n × n). p_0: the fan-out of the access switches. p_1: the fan-out of the aggregation switches. Matrix entry shown in the figure: 1.
37
Impact of Network Architectures & Traffic Patterns
Define the Tree cost matrix C (n × n). p_0: the fan-out of the access switches. p_1: the fan-out of the aggregation switches. Matrix entry shown in the figure: 3.
38
Impact of Network Architectures & Traffic Patterns
Define the Tree cost matrix C (n × n). p_0: the fan-out of the access switches. p_1: the fan-out of the aggregation switches. Matrix entry shown in the figure: 5.
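A small sketch (hypothetical helper, not from the paper) of how such a tree cost matrix can be filled in, assuming the 1/3/5 entries on the preceding slides are the switch counts on each path: 1 if two slots share an access switch, 3 if they share only an aggregation switch, 5 if the path crosses the core.

```python
import numpy as np

def tree_cost_matrix(n: int, p0: int, p1: int) -> np.ndarray:
    """Cost C[i, j] = number of switches on the path between slots i and j
    in a 3-tier tree, assuming slots are numbered consecutively: p0 slots per
    access switch and p1 access switches per aggregation switch (the fan-outs
    p0, p1 defined on the slide)."""
    c = np.empty((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                c[i, j] = 0                      # same slot / same host
            elif i // p0 == j // p0:
                c[i, j] = 1                      # same access switch
            elif i // (p0 * p1) == j // (p0 * p1):
                c[i, j] = 3                      # same aggregation switch
            else:
                c[i, j] = 5                      # path crosses the core tier
    return c

print(tree_cost_matrix(8, p0=2, p1=2))
```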
39
Impact of Network Architectures & Traffic Patterns
Define the VL2 cost matrix C (n × n). p_0: the fan-out of the access switches. p_1: the fan-out of the aggregation switches. Matrix entry shown in the figure: 5.
40
Impact of Network Architectures & Traffic Patterns
Define the Fat-Tree cost matrix C (n × n). k: the total number of ports on each switch. Matrix entry shown in the figure: 3.
41
Impact of Network Architectures & Traffic Patterns
Define the Fat-Tree cost matrix C (n × n). k: the total number of ports on each switch. Matrix entry shown in the figure: 5.
42
Impact of Network Architectures & Traffic Patterns
Define the BCube cost matrix C (n × n): based on the Hamming distance between server addresses.
43
Impact of Network Architectures & Traffic Patterns
Define the BCube cost matrix C (n × n): based on the Hamming distance between server addresses. Example shown in the figure: distance = 1.
44
Impact of Network Architectures & Traffic Patterns
Define the BCube cost matrix C (n × n): based on the Hamming distance between server addresses. Example shown in the figure: distance = 2.
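A sketch of building the BCube cost matrix from the rule stated on the slide: the cost between two servers is the Hamming distance between their BCube addresses. The address enumeration below is an illustrative choice, not the paper's exact construction.

```python
from itertools import product
import numpy as np

def bcube_cost_matrix(n: int, k: int) -> np.ndarray:
    addrs = list(product(range(n), repeat=k + 1))       # all (k+1)-digit addresses
    size = len(addrs)                                    # n ** (k + 1) servers
    c = np.zeros((size, size), dtype=int)
    for i, a in enumerate(addrs):
        for j, b in enumerate(addrs):
            c[i, j] = sum(x != y for x, y in zip(a, b))  # Hamming distance
    return c

m = bcube_cost_matrix(4, 1)   # BCube_1 with 4-port switches: a 16 x 16 matrix
print(m.shape, m[0, 1], m[0, 5])
```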
45
Impact of Network Architectures & Traffic Patterns
Global traffic model: each VM communicates with every other VM at a constant rate. In this case the problem is solved by the Hungarian algorithm with complexity O(n^3). S_opt: the optimized objective value; S_rand: the objective value of a random placement.
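A toy illustration of the two quantities S_opt and S_rand for a given cost matrix C and traffic matrix D. It uses made-up data and brute-force enumeration instead of the Hungarian method, so it is only feasible for very small n.

```python
from itertools import permutations
import random
import numpy as np

def objective(D: np.ndarray, C: np.ndarray, perm: tuple[int, ...]) -> float:
    """Sum of D[i, j] * C[perm[i], perm[j]] over all VM pairs
    (objective (1) without the external-traffic term)."""
    n = len(perm)
    return sum(D[i, j] * C[perm[i], perm[j]] for i in range(n) for j in range(n))

def s_opt_and_rand(D: np.ndarray, C: np.ndarray, seed: int = 0) -> tuple[float, float]:
    n = D.shape[0]
    s_opt = min(objective(D, C, p) for p in permutations(range(n)))
    rand_perm = tuple(random.Random(seed).sample(range(n), n))
    return s_opt, objective(D, C, rand_perm)

rng = np.random.default_rng(0)
D = rng.uniform(0, 1, size=(6, 6)); np.fill_diagonal(D, 0)
C = np.array([[0 if i == j else (1 if i // 2 == j // 2 else 3) for j in range(6)]
              for i in range(6)])          # toy two-level cost structure
print(s_opt_and_rand(D, C))                # S_opt <= S_rand
```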
46
Impact of Network Architectures & Traffic Patterns
Global traffic model: at zero traffic variance, S_opt = S_rand (e.g., MapReduce-type workloads); otherwise S_opt ≤ S_rand (e.g., multi-tiered web applications). The gaps indicate the improvement space over random placement: BCube has the largest improvement space (a benefit in terms of scalability), while VL2 has the smallest. Setup: 1024 VMs; with 4-port switches, BCube has 4 levels of intermediate switches.
47
Impact of Network Architectures & Traffic Patterns
Partitioned traffic model: VMs form isolated partitions, and only VMs within the same partition communicate with each other; pairwise traffic rates follow a normal distribution. GLB is a lower bound for S_opt, and the gaps indicate the performance improvement potential: S_rand leaves improvement space under different traffic variances, and BCube has the larger improvement potential. A small generator sketch follows below.
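A sketch of a partitioned traffic-matrix generator matching the model described above: VMs are split into equal-size partitions, only intra-partition pairs exchange traffic, and rates are drawn from a normal distribution. The mean and standard deviation are illustrative values, not the paper's parameters.

```python
import numpy as np

def partitioned_traffic(n_vms: int, partition_size: int,
                        mean: float = 100.0, std: float = 30.0,
                        seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    d = np.zeros((n_vms, n_vms))
    for start in range(0, n_vms, partition_size):
        members = range(start, min(start + partition_size, n_vms))
        for i in members:
            for j in members:
                if i != j:
                    d[i, j] = max(rng.normal(mean, std), 0.0)  # no negative rates
    return d

D = partitioned_traffic(16, partition_size=4)
print(D[0, 1] > 0, D[0, 5] == 0)   # intra-partition traffic vs. none across partitions
```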
48
Impact of Network Architectures & Traffic Patterns
At partition size 31, VL2 overlaps with random placement. Smaller partition sizes have higher improvement potential, and there is more performance improvement potential in a system with varying partition sizes.
49
Impact of Network Architectures & Traffic Patterns
Greater benefits under the following conditions: increased traffic variance, an increased number of partitions, and a multi-layer architecture.
50
Evaluation Compare Cluster-and-Cut to other QAP-solving algorithms:
Local Optimal Pairwise Interchange (LOPI) Simulated Annealing (SA)
51
Evaluation Compare Cluster-and-Cut to other QAP-solving algorithms:
Local Optimal Pairwise Interchange (LOPI), Simulated Annealing (SA). Cluster-and-Cut's result is about 10% smaller.
52
Evaluation Compare Cluster-and-Cut to other QAP-solving algorithms:
Local Optimal Pairwise Interchange (LOPI), Simulated Annealing (SA). Cluster-and-Cut takes about 50% less time.
53
Summary Used traffic-aware virtual machine placement to improve network scalability. Formulated VM placement as an NP-hard optimization problem. Proposed the Cluster-and-Cut algorithm as an efficient solution. Evaluated the potential performance gains under different traffic patterns and network architectures.
54
Thank you !