CloudMirror: Application-Driven Bandwidth Guarantees in Datacenters

JK Lee, Yoshio Turner, Myungjin Lee¹, Lucian Popa², Sujata Banerjee, Joon-Myung Kang, Puneet Sharma
HP Labs, ¹University of Edinburgh, ²Databricks
Presented by Jack Clark

Overview
What problem does CloudMirror try to solve?
How do other solutions solve the problem?
How does CloudMirror work?
How well does CloudMirror solve the problem?

The Problem

Clients want predictable performance from their applications:
Consistent throughput
Bounded tail latencies
Modern cloud application performance is highly dependent on network performance.

Underprovisioning => Contention => Unpredictable Performance
Cloud providers want to squeeze as many tenants onto their infrastructure as possible, which leads to underprovisioning of network resources.

Why is Bandwidth So Important?
If you have more data to send than fits in the pipe, you have to wait. This intuition is backed up by Parley, which demonstrates that adequate bandwidth is critical for low tail latencies.
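The "you have to wait" intuition can be made concrete with a simple fluid model: when a sender offers more than the link capacity, the excess piles up in a queue, and the last queued bit waits until that backlog drains. The function names and the 1.5 Gb/s vs 1 Gb/s numbers below are hypothetical, chosen only to illustrate the arithmetic.

```python
def backlog_after(offered_bps: float, capacity_bps: float, seconds: float) -> float:
    """Bits left queued after sending at offered_bps into a capacity_bps link for `seconds`."""
    # Only the excess over capacity accumulates; an underloaded link builds no queue.
    return max(0.0, (offered_bps - capacity_bps) * seconds)


def drain_time(backlog_bits: float, capacity_bps: float) -> float:
    """Extra seconds the last queued bit waits once the sender stops (fluid model)."""
    return backlog_bits / capacity_bps


# Hypothetical numbers: 1.5 Gb/s offered into a 1 Gb/s pipe for 2 seconds.
q = backlog_after(1.5e9, 1.0e9, 2.0)
print(q, drain_time(q, 1.0e9))  # 1000000000.0 1.0
```

The added delay scales with the shortfall in bandwidth, which is why a bandwidth guarantee indirectly bounds queueing delay and hence tail latency.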

Idea: Provide bandwidth guarantees to clients
Clients are happy because they get more predictable performance.
Cloud providers are happy because clients are more confident about moving to the cloud (and they get a new dimension for billing).

Existing Solutions

Pipe Model
Specify a bandwidth requirement between every pair of VMs.
This makes efficient VM placement slow (~O(n^3) in the number of VMs n).
It also makes it difficult for clients to specify their requirements.
Must provision peak VM bandwidth for each VM.
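To see why the pipe model is hard for clients to fill in, count how many numbers each model asks for. This sketch assumes pipes are directional (one value per ordered source/destination pair); with undirected pipes the count halves to n(n-1)/2, which is still quadratic. The function names are hypothetical.

```python
def pipe_entries(n_vms: int) -> int:
    # Pipe model: one bandwidth value per ordered (source, destination) VM pair.
    return n_vms * (n_vms - 1)


def hose_entries(n_vms: int) -> int:
    # Hose model: one aggregate bandwidth value per VM.
    return n_vms


for n in (10, 100, 1000):
    print(n, pipe_entries(n), hose_entries(n))
# 10 90 10
# 100 9900 100
# 1000 999000 1000
```

A 1000-VM tenant would have to supply roughly a million pipe values, versus a thousand hose values.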

Hose Model
Problem #1: The hose model fails to guarantee bandwidth under congestion! TCP-like fair allocation would split the bandwidth 300:200 instead of the intended 400:100.
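One way a 300:200 split can arise is per-flow fair sharing, which allocates by flow count rather than by guarantee. The setup below is hypothetical: a 500 Mb/s bottleneck, tenant "X" with 3 flows and tenant "Y" with 2. It was chosen only to reproduce the 300:200 numbers and is not necessarily the scenario the authors used. The helper names are also illustrative.

```python
def per_flow_fair(capacity: float, flows_per_tenant: dict[str, int]) -> dict[str, float]:
    """TCP-like sharing: split capacity equally among all flows, then total per tenant."""
    share = capacity / sum(flows_per_tenant.values())
    return {tenant: share * k for tenant, k in flows_per_tenant.items()}


def meets_guarantees(alloc: dict[str, float], guarantees: dict[str, float]) -> bool:
    """True if every tenant receives at least its guaranteed bandwidth."""
    return all(alloc[t] >= g for t, g in guarantees.items())


# Hypothetical setup: 500 Mb/s link, X has 3 flows, Y has 2.
alloc = per_flow_fair(500, {"X": 3, "Y": 2})
print({t: int(v) for t, v in alloc.items()})          # {'X': 300, 'Y': 200}
print(meets_guarantees(alloc, {"X": 400, "Y": 100}))  # False
```

Because fair sharing ignores what each tenant was promised, the tenant with the larger guarantee ends up below it.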

Problem #2: The hose model would provision 2x the bandwidth actually required on L2.
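One simple way the hose model over-reserves is that a symmetric hose cannot express traffic direction. The sketch below assumes a symmetric hose: each VM may both send and receive up to its bandwidth B, so a link between two groups of VMs must reserve min(sum of B on the left, sum of B on the right) in each direction. It compares that against traffic that only flows left to right. This is a hypothetical illustration of over-provisioning in general, not a reconstruction of the slide's L2 example.

```python
def hose_link_reservation(left_b: list[float], right_b: list[float]) -> tuple[float, float]:
    """Symmetric hose: returns (left->right, right->left) reservations on the link."""
    l, r = sum(left_b), sum(right_b)
    # Either side can be the bottleneck, in both directions.
    return min(l, r), min(r, l)


def one_way_reservation(left_senders: list[float], right_receivers: list[float]) -> tuple[float, float]:
    """Directional requirement: traffic only flows left->right, nothing comes back."""
    return min(sum(left_senders), sum(right_receivers)), 0


# Hypothetical: 4 front-end VMs (left) each push 100 Mb/s to 4 back-end VMs (right).
hose = hose_link_reservation([100] * 4, [100] * 4)
need = one_way_reservation([100] * 4, [100] * 4)
print(hose, need, sum(hose) / sum(need))  # (400, 400) (400, 0) 2.0
```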

CloudMirror
Tenant Application Graph (TAG): lets tenants specify the bandwidth guarantees they desire
VM placement algorithm: lets the cloud provider efficiently fulfill tenant requests

Tenant Application Graph (TAG)
The TAG model allows users to specify their desired bandwidth guarantees in terms of the communication structure of their application.
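A minimal sketch of a TAG as a data structure, based on our reading of the model. Vertices are tiers with a VM count. A directed edge (src, dst) carries a pair (S, R): each src VM may send up to S to the dst tier, and each dst VM may receive up to R from the src tier. A self-edge (src == dst) acts as a hose inside the tier. The class name, field names and the `uplink_out` reservation formula are assumptions for illustration, and only the outbound direction of a subtree's uplink is computed.

```python
from dataclasses import dataclass, field


@dataclass
class TAG:
    """Hypothetical Tenant Application Graph encoding.

    tiers: tier name -> number of VMs in that tier.
    edges: (src, dst) -> (S, R) per-VM send / receive guarantees.
    """
    tiers: dict[str, int]
    edges: dict[tuple[str, str], tuple[float, float]] = field(default_factory=dict)

    def uplink_out(self, inside: dict[str, int]) -> float:
        """Bandwidth to reserve leaving a subtree holding inside[t] VMs of tier t."""
        total = 0.0
        for (src, dst), (s, r) in self.edges.items():
            senders_inside = inside.get(src, 0)
            receivers_outside = self.tiers[dst] - inside.get(dst, 0)
            # Limited by what the inside senders may send and what outside receivers may accept.
            total += min(senders_inside * s, receivers_outside * r)
        return total


# Hypothetical tenant: 4 web VMs, 2 db VMs; web->db (S=50, R=100); web talks to itself (10, 10).
tag = TAG({"web": 4, "db": 2}, {("web", "db"): (50, 100), ("web", "web"): (10, 10)})
print(tag.uplink_out({"web": 2, "db": 1}))  # 120.0
print(tag.uplink_out({"web": 4, "db": 2}))  # 0.0 (whole tenant inside the subtree)
```

Unlike a hose, the edge labels keep per-tier-pair and per-direction information, so traffic that never crosses a link is never reserved on it.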

VM Placement Algorithm
Goal: Deploy as many tenants as possible onto the topology while maintaining their bandwidth guarantees. This problem is NP-hard.
Insights/heuristics:
1. Core bandwidth can be saved by placing more than half of the VMs from communicating tiers together in the same subtree (collocate function).
2. If no bandwidth saving is possible, collocate high-bandwidth and low-bandwidth tiers (balance function).
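Why "more than half" matters for the collocate heuristic can be seen with a single tier whose VMs all talk to each other (an intra-tier hose). Assume n VMs with per-VM bandwidth b, of which k sit in one subtree. The subtree's uplink must carry min(k, n - k) * b in each direction, which peaks at k = n/2 and only starts shrinking once more than half of the tier is collocated. The function name and numbers are hypothetical, and the balance heuristic is not modeled here.

```python
def intra_tier_uplink(k: int, n: int, b: float) -> float:
    """One-direction uplink reservation for a subtree holding k of the n VMs of a
    tier whose VMs communicate among themselves at up to b each."""
    return min(k, n - k) * b


n, b = 8, 100
print([intra_tier_uplink(k, n, b) for k in range(n + 1)])
# [0, 100, 200, 300, 400, 300, 200, 100, 0]
```

Adding VMs to a subtree increases the uplink cost until the halfway point. Past it, each additional collocated VM saves core bandwidth.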

Surely This is Slow...
Time complexity is O(T^2), where T is the number of tiers: it scales with the number of tiers rather than the number of VMs.
Runs in < 1 sec on bing.com data.

High Availability
Worst Case Survivability (WCS) = the fraction of VMs in a tier that remain alive during the failure of a subtree.
A big assumption is made here: that it is enough to guarantee WCS at the server level. The authors argue that because core switches are usually fault tolerant, it is acceptable to ensure WCS only across individual servers.
However, most outages in a modern datacenter are caused not by switch failures but by operator error, e.g. bad configuration.
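Following the WCS definition above, WCS for one tier can be computed from how its VMs are spread over failure domains: the worst single failure is the domain hosting the most VMs of that tier. This sketch assumes the failure domain is a server, matching the server-level assumption criticized above. The function name and server labels are hypothetical.

```python
from collections import Counter


def worst_case_survivability(placement: list[str]) -> float:
    """placement[i] is the failure domain (e.g. server) hosting the i-th VM of a tier.
    Returns the fraction of the tier still alive if the single worst domain fails."""
    per_domain = Counter(placement)
    return 1 - max(per_domain.values()) / len(placement)


print(worst_case_survivability(["s1", "s1", "s2", "s3"]))  # 0.5
print(worst_case_survivability(["s1", "s2", "s3", "s4"]))  # 0.75
```

Spreading a tier's VMs across more servers raises WCS, but it also pushes more of the tier's traffic onto the network, which is the tension the placement algorithm has to balance.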

Evaluation
Workload: bing.com; topology: 3-level tree, 2048 hosts, 25 VM slots per host
1. For a given number of requests, how much bandwidth does CloudMirror require compared to rival solutions?

Evaluation
2. For a given amount of bandwidth, how many tenant requests can CloudMirror accept compared to rival solutions?

Criticisms and Future Work
How can this be integrated with other resources such as CPU and memory?
Would this actually work for HA? WCS assumes that failures will be simple mechanical failures of switches. What if a bad network configuration causes a problem?
It would have been nice to see the authors run real applications on their system and verify that the bandwidth guarantees are honoured.
Pricing model?

Questions?
