Towards Predictable Datacenter Networks
Hitesh Ballani, Paolo Costa, Thomas Karagiannis, Ant Rowstron
SIGCOMM 2011
Presenter: Lili Sun, 2019/11/16
Outline
- Motivation and Goals
- Virtual Network Abstractions
- Oktopus
- Evaluation
- Conclusion
- Discussion Clues
Background
- Datacenter types: cloud datacenter, production datacenter
- Interface: computing resources, storage resources
[Figure: cloud datacenter and production datacenter, showing the provider, the tenant, the interface, the virtual network (VMs), the physical network, and the storage and computing resources]
Motivation and Goals
- Motivation: network performance variability
  - Cloud datacenter: system load and VM placement
  - Production datacenter: variable network bandwidth
- Challenges
  - Unstable application performance
  - Unpredictable tenant costs
  - Provider revenue loss
- Goals
  - Guaranteed application performance
  - Predictable tenant costs
  - Improved provider revenue
Virtual Network Abstractions
- Virtual cluster (VC)
- Virtual oversubscribed cluster (VOC)
- Design goals
  - Tenant suitability: an intuitive way for tenants to reason about network performance
  - Provider flexibility: providers can multiplex many virtual networks on their physical network
Virtual cluster
- Tenant request: <N, B> (N VMs, each with bandwidth B)
- All-to-all traffic patterns
- Suitable for data-intensive applications
Virtual oversubscribed cluster
- Tenant request: <N, B, S, O> (N VMs, bandwidth B, group size S, oversubscription factor O)
- Local communication patterns
- Suitable for applications whose communication is mostly local to a group
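The two request types can be written down as small data structures. The sketch below is illustrative only: the class and field names are hypothetical, and the group uplink capacity B' = S * B / O is taken from the Oktopus paper's definition of the virtual oversubscribed cluster, not from the slide itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VOCRequest:
    """Hypothetical container for a <N, B, S, O> request."""
    n_vms: int         # N: total number of VMs
    bandwidth: float   # B: per-VM bandwidth, in Mbps
    group_size: int    # S: VMs per group
    oversub: float     # O: oversubscription factor

    def num_groups(self) -> int:
        # ceil(N / S) without importing math
        return -(-self.n_vms // self.group_size)

    def group_uplink(self) -> float:
        # Assumed from the paper: each group reaches the other
        # groups over a virtual link of capacity B' = S * B / O.
        return self.group_size * self.bandwidth / self.oversub


req = VOCRequest(n_vms=40, bandwidth=100.0, group_size=10, oversub=5.0)
print(req.num_groups(), req.group_uplink())
```

With O = 1 the group uplink equals S * B, so the request behaves like a plain virtual cluster <N, B>.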
Oktopus
- Tenants can opt for
  - Virtual cluster
  - Virtual oversubscribed cluster
  - No virtual network
- Two main components
  - Management plane: handles requests, accounts for network resources, and maintains bandwidth reservations
  - Data plane: enforces the bandwidth available to each tenant
- Network manager
  - Meet the bandwidth demands
  - Maximize the number of tenants
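Cluster Allocation
- A virtual cluster request r: <N, B>
- Topology: tree-like physical network
- Bandwidth required on link L: min(m, N - m) * B, where m is the number of the request's VMs on one side of L
[Figure: example allocation on a tree topology, with links labeled 200 Mbps and 100 Mbps]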
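As a worked example of the per-link requirement, the sketch below assumes the rule given in the Oktopus paper: a link that splits the request's N VMs into m on one side and N - m on the other must carry min(m, N - m) * B. The function name is hypothetical.

```python
def link_bandwidth(n_vms: int, bandwidth: float, vms_on_one_side: int) -> float:
    """Bandwidth a <N, B> request needs on a link with m VMs on one side."""
    if not 0 <= vms_on_one_side <= n_vms:
        raise ValueError("m must be between 0 and N")
    # The smaller side limits the traffic that can cross the link.
    return min(vms_on_one_side, n_vms - vms_on_one_side) * bandwidth


# Request <3, 100 Mbps>: one VM on one side, two on the other.
print(link_bandwidth(3, 100.0, 1))
# All three VMs on the same side: nothing crosses the link.
print(link_bandwidth(3, 100.0, 3))
```

This is why placing all of a tenant's VMs in one sub-tree is attractive: the sub-tree's uplink then needs no reservation for that tenant.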
Allocation Algorithm
- Allocate VMs to a sub-tree (a machine, a rack, a pod)
- For a machine, the number of VMs that can be placed is limited by
  - The number of empty VM slots in the sub-tree
  - The residual bandwidth on the physical link
- At the same level: choose the sub-tree with the least amount of residual bandwidth
- Across levels: start from the lowest level (physical machine < rack < pod)
- Goal: keep sub-trees with greater outbound bandwidth available so that more future tenants can be accommodated
Oversubscribed Cluster Allocation
- An oversubscribed cluster request r: <N, B, S, O>
- The total bandwidth required on link L is computed for each group i
- The bandwidth to be reserved on link L for request r is the sum across all the groups
Allocation Algorithm
- An individual group is similar to a virtual cluster, so the cluster allocation algorithm is reused
- Quantities used by the algorithm
  - The conditional bandwidth needed for the j-th group of request r on link L
  - The bandwidth required by groups [1, …, i] on L
  - The VMs allocated to sub-tree v
Enforcing Virtual Network
- Rate limiting mechanism
  - Traditional approach: bandwidth reservation at switches
  - Oktopus: endhost-based rate enforcement
- Design
  - Enforcement module (EM): measures the traffic rate to other VMs and sends it to the controller VM
  - Controller VM: calculates the max-min fair share and returns the rates
  - Enforcement module: uses per-destination-VM rate limiters to enforce them
- Advantages
  - Calculating at one controller VM per tenant reduces control traffic
  - Enforcement modules enable distributed rate limits
  - Tenant-specific computation reduces the scale of the problem: rates are computed for each virtual network separately
[Figure: the EMs of VM 1 to VM i measure traffic rates and send them to the controller VM, which calculates the max-min fair share and returns the rates; each EM enforces them with per-destination-VM limiters]
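The controller's max-min fair share computation can be illustrated with the standard water-filling procedure. The sketch below is a simplification under stated assumptions: it splits a single sender's bandwidth B among its destinations based on measured demands, whereas the real controller also has to respect each receiver's limit. The function name and the example numbers are hypothetical.

```python
from typing import Dict


def max_min_fair(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Split `capacity` among flows so that small demands are fully
    served and the remaining flows share what is left equally."""
    alloc: Dict[str, float] = {}
    remaining = capacity
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        share = remaining / len(pending)
        name, demand = pending[0]
        if demand <= share:
            # The smallest demand fits within an equal share: grant it.
            alloc[name] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining flow wants more than the equal share.
            for other, _ in pending:
                alloc[other] = share
            break
    return alloc


# A VM with B = 100 Mbps sending to three destination VMs.
rates = max_min_fair(100.0, {"vm2": 10.0, "vm3": 40.0, "vm4": 80.0})
print(rates)
```

Each resulting rate would then be installed as the limit of the corresponding per-destination-VM rate limiter.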
Enforcing Virtual Network
- Tenants without a virtual network
  - Two priority levels
  - Traffic from tenants with a virtual network has high priority
  - Other traffic has low priority and gets a fair share
- Unused capacity of a VM with a virtual network
  - Weighted sharing mechanisms
  - Unused capacity is distributed among all tenants
Design Discussion
- NM and routing
  - Oktopus assumes that the datacenter has a simple tree topology
  - For topologies with limited path diversity: multiple physical links can be treated as a single aggregate link
  - For even richer network topologies: the NM can control datacenter routing to build tenant-specific trees
- Failures
  - For failures of physical links and switches, the allocation algorithms can be extended to determine the tenant VMs that need to be migrated and reallocated
Evaluation
- Simulation setup
  - Tc: minimum compute time for the job
  - Tn: the time for the last flow to finish
  - T = max(Tc, Tn): the completion time
  - Bandwidth is chosen so that Tn < Tc, which minimizes the tenant's cost
  - Baseline: purely VM-based resource allocation with a locality-aware allocation algorithm; a flow's bandwidth is calculated according to max-min fairness
- Virtual network request
  - A request <N> can be expressed as <N, B> or <N, B, S, O>
- Simulation breadth
  - Covers the entire space for most parameters of interest in today's datacenters: tenant bandwidth requirements, datacenter load, and physical topology oversubscription
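The completion-time model T = max(Tc, Tn) can be tried out with a few lines of code. The sketch below assumes, purely for illustration, that Tn is the data volume of the slowest flow divided by the reserved bandwidth; that network model and the function names are hypothetical.

```python
def completion_time(t_compute: float, data_mbit: float, bandwidth_mbps: float) -> float:
    """T = max(Tc, Tn), with Tn modeled as data volume / bandwidth."""
    t_network = data_mbit / bandwidth_mbps
    return max(t_compute, t_network)


def min_bandwidth(t_compute: float, data_mbit: float) -> float:
    """Smallest bandwidth for which the network is not the bottleneck."""
    return data_mbit / t_compute


# A job with Tc = 100 s whose slowest flow moves 50,000 Mbit.
print(completion_time(100.0, 50_000.0, 250.0))   # network-bound
print(min_bandwidth(100.0, 50_000.0))
```

Under this model, reserving more than the minimum bandwidth does not shorten the job, so a tenant has no reason to pay for it.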
Production Datacenter Experiment
- Job completion time
Production Datacenter Experiment
- Utilization
- The allocation of VMs does not account for network demands
Production Datacenter Experiment
- Diverse communication patterns: each tenant VM requires a different bandwidth
Cloud Datacenter Experiment
- Rejected requests
- Tenant dynamics, with requests arriving over time
- Admission control scheme
Cloud Datacenter Experiment
- Tenant costs and provider revenue
- Tenants are charged based on the time they occupy their VMs
Cloud Datacenter Experiment
- Charging for bandwidth
- Virtual network abstractions allow explicitly charging for network bandwidth
- For a request <N, B> held for time T, the tenant cost accounts for both VM occupancy and reserved bandwidth
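The two charging models can be compared with a small calculation. The sketch below is an assumption-laden illustration: the prices `k_v` (per VM per unit time) and `k_b` (per Mbps per unit time) and the cost expression N * T * (k_v + k_b * B) are assumed here, since the slide's own cost expressions are not preserved. All numbers are made up.

```python
def tenant_cost(n_vms: int, duration: float, k_v: float,
                bandwidth: float = 0.0, k_b: float = 0.0) -> float:
    """Cost of holding N VMs for time T.

    With k_b = 0 this is plain per-VM-time charging; otherwise each
    VM also pays for its reserved bandwidth B.
    """
    return n_vms * duration * (k_v + k_b * bandwidth)


# 10 VMs for 2 hours at a hypothetical price of 1.0 per VM-hour.
vm_only = tenant_cost(10, 2.0, k_v=1.0)
# Same request with 100 Mbps reserved per VM at 0.01 per Mbps-hour.
with_bw = tenant_cost(10, 2.0, k_v=1.0, bandwidth=100.0, k_b=0.01)
print(vm_only, with_bw)
```

Under bandwidth charging, a tenant who reserves more bandwidth pays more per hour but may finish sooner, which is the trade-off between performance and cost.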
Results and Conclusion
- Virtual network abstractions
  - Are practical, can be efficiently implemented, and provide significant benefits
  - Provide a simple way to exchange information between tenants and providers
- Tenants expose their network requirements and pick the trade-off between application performance and cost
- Providers account for the network resources and improve their revenue
Discussion Clues
- Network security: compared to a physical switch, a virtual switch has weaker monitoring capability, so how can network security be ensured?
- Description of network bandwidth resources: how can network bandwidth requirements be described, given that there are no datasets describing job bandwidth requirements?
- Actual bandwidth requirement: many tenants do not know exactly how much bandwidth they need for all kinds of applications, so how should this be handled? Unlike computing and storage resources, one tenant's bandwidth use affects other tenants because the total bandwidth is limited. So, besides the pricing model, how can we make sure that a tenant's bandwidth requirement is appropriate, neither too much nor too little (for example, with a monitoring system that reports actual demands to tenants)?
- Failure of tenant VMs: for the virtual oversubscribed cluster, if a tenant VM fails, does only the failed VM need to be migrated and reallocated, or all tenant VMs in its group? Communication between the reallocated VM and the other VMs would increase the bandwidth required from the underlying physical infrastructure.
Thank you!