Towards Predictable Datacenter Networks

Towards Predictable Datacenter Networks
Hitesh Ballani, Paolo Costa, Thomas Karagiannis, Ant Rowstron
SIGCOMM 2011
Presenter: Lili Sun, 2019/11/16

Outline
- Motivation and Goals
- Virtual Network Abstractions
- Oktopus
- Evaluation
- Conclusion
- Discussion Clues

Backgrounds
- Datacenter types: cloud datacenter, production datacenter
- Interface: computing resources, storage resources
[Figure: cloud datacenter and production datacenter. Labels: virtual network (VMs), physical network, storage resource, computing resource, interface, provider, tenant]

Motivation and Goals
- Motivation: network performance variability
  - Cloud datacenter (system load and VM placement)
  - Production datacenter (variable network bandwidth)
- Challenges
  - Unstable application performance
  - Unpredictable tenant costs
  - Provider revenue loss
- Goals
  - Guaranteed application performance
  - Predictable tenant costs
  - Higher provider revenue

Virtual Network Abstractions
- Virtual cluster (VC)
- Virtual oversubscribed cluster (VOC)
- Design goals
  - Tenant suitability: an intuitive way for tenants to reason about network performance
  - Provider flexibility: providers can multiplex many virtual networks on their physical network

Virtual cluster
- Tenant request: <N, B>
- All-to-all traffic patterns
- Suitable for data-intensive applications

Virtual oversubscribed cluster
- Tenant request: <N,B,S,O>
- Local communication patterns
- Suitable for applications with localized communication patterns

Oktopus
- Tenants can opt for
  - A virtual cluster
  - A virtual oversubscribed cluster
  - No virtual network
- Two main components
  - Management plane (requests and accounts for network resources and maintains bandwidth reservations)
  - Data plane (enforces the available bandwidth)
- Network manager
  - Meet the bandwidth demands
  - Maximize the number of tenants

Cluster Allocation
- A virtual cluster request r: <N,B>
- Topology: tree-like physical network
- Bandwidth required on link L:
[Figure: example links labeled 200 Mbps and 100 Mbps]
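The per-link bandwidth expression is not preserved on this slide. A minimal sketch, assuming the usual hose-model reasoning for a virtual cluster: if a link splits the tenant's N VMs into m on one side and N - m on the other, the traffic crossing the link is limited by the smaller side, so min(m, N - m) * B is reserved. The function and parameter names are illustrative, not from the slides.

```python
def vc_link_bandwidth(n_total: int, m_below: int, b_mbps: float) -> float:
    """Bandwidth to reserve on one link for a virtual cluster <N, B>.

    Assumption: the link separates m_below of the tenant's VMs from the
    remaining n_total - m_below VMs, and each VM sends or receives at
    most b_mbps, so the smaller side bounds the traffic on the link.
    """
    if not 0 <= m_below <= n_total:
        raise ValueError("m_below must be between 0 and n_total")
    return min(m_below, n_total - m_below) * b_mbps


if __name__ == "__main__":
    # Three VMs at 100 Mbps each, two of them below the link.
    print(vc_link_bandwidth(3, 2, 100))
```

Note that a link with all N VMs on one side needs no reservation, which is why packing a request into a single sub-tree is attractive.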

Allocation Algorithm
- Allocate the request's VMs to a sub-tree (a machine, a rack, a pod)
- The number of VMs a sub-tree can take is constrained by
  - The number of empty VM slots in the sub-tree
  - The residual bandwidth on the physical link
  - For a machine:
- Search order
  - Across levels: start from the lowest level (physical machine < rack < pod)
  - Within the same level: choose the sub-tree with the least residual bandwidth
- Goals
  - Keep more outbound bandwidth available
  - Accommodate more future tenants
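The search described on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the `Node` class, its field names, and the exhaustive count combination are hypothetical, and the sketch only picks the sub-tree rather than assigning individual VMs to slots. It assumes a link separating m of the N VMs from the rest must carry min(m, N - m) * B.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Node:
    """A machine (leaf) or a switch-rooted sub-tree such as a rack or pod."""
    name: str
    free_slots: int = 0                       # used for machines only
    uplink_residual: float = float("inf")     # residual bandwidth to the parent
    children: List["Node"] = field(default_factory=list)


def feasible_counts(node: Node, n: int, b: float) -> Set[int]:
    """VM counts m (0..n) that fit under node without overloading its uplink."""
    if not node.children:
        counts = set(range(min(node.free_slots, n) + 1))
    else:
        counts = {0}
        for child in node.children:
            child_counts = feasible_counts(child, n, b)
            counts = {s + c for s in counts for c in child_counts if s + c <= n}
    # Assumed uplink need: min(m, n - m) * b (zero when all n VMs are inside).
    return {m for m in counts if min(m, n - m) * b <= node.uplink_residual}


def height(node: Node) -> int:
    return 0 if not node.children else 1 + max(height(c) for c in node.children)


def all_nodes(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from all_nodes(child)


def choose_subtree(root: Node, n: int, b: float) -> Optional[Node]:
    """Lowest-level sub-tree that fits the whole request; ties go to the
    sub-tree with the least residual uplink bandwidth."""
    candidates = [v for v in all_nodes(root) if n in feasible_counts(v, n, b)]
    if not candidates:
        return None  # the request would be rejected
    return min(candidates, key=lambda v: (height(v), v.uplink_residual))
```

Preferring the lowest level keeps traffic off the upper links, and preferring the sub-tree with the least residual bandwidth leaves the better-provisioned sub-trees free for later requests.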

Oversubscribed Cluster Allocation
- An oversubscribed cluster request: <N,S,B,O>
- The total bandwidth required by group i on link L:
- The bandwidth to be reserved on link L for request r is the sum across all the groups
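The per-group expression is missing from this slide, so the sketch below does not reproduce it. It is a conservative upper bound worked out from first principles, under the assumptions that each VM is limited to B, that a group of S VMs reaches other groups through a virtual link of bandwidth B' = S * B / O, and that g_i VMs of group i sit on one side of the link. All function and parameter names are illustrative.

```python
from typing import Sequence


def voc_group_link_bound(g_i: int, s: int, b: float, o: float) -> float:
    """Upper bound on the traffic group i can push across one link.

    Assumptions (not taken from the slide):
      - g_i VMs of the group are on the sending side of the link,
        the other s - g_i VMs of the group are on the far side;
      - inter-group traffic of a group is capped at b_prime = s * b / o.
    """
    if not 0 <= g_i <= s:
        raise ValueError("g_i must be between 0 and s")
    b_prime = s * b / o
    sender_limit = g_i * b                    # what the g_i VMs can send
    receiver_limit = (s - g_i) * b + b_prime  # intra-group plus inter-group sinks
    return min(sender_limit, receiver_limit)


def voc_link_bound(group_counts: Sequence[int], s: int, b: float, o: float) -> float:
    """Reservation bound for a request: the sum across all of its groups."""
    return sum(voc_group_link_bound(g, s, b, o) for g in group_counts)
```

With O = 1 the inter-group cap equals S * B, so a fully packed group is limited only by its own VMs, which matches the intuition that an oversubscription factor of 1 behaves like a plain virtual cluster for that group.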

Allocation Algorithm
- An individual group is similar to a virtual cluster
- Reuse the cluster allocation algorithm
- Conditional bandwidth needed for the jth group of request r on link L:
- The bandwidth required by groups [1,…,i] on L:
- Allocate VMs to sub-tree v:

Enforcing Virtual Network
- Rate limiting mechanism
  - Traditional approach: bandwidth reservation at switches
  - Oktopus: endhost-based rate enforcement
- Design
  - Enforcement module: measures the traffic rate to other VMs
  - Controller VM: calculates the max-min fair share
  - Enforcement module: uses per-destination-VM limiters to enforce the calculated rates
- Advantages
  - Calculating rates at a controller VM for each tenant reduces control traffic
  - Enforcement modules enable distributed rate limiting
  - Tenant-specific computation reduces the scale of the problem: rates are computed separately for each virtual network
[Figure: the enforcement modules (EM1 … EMi) of VM1 … VMi measure traffic rates and send them to the controller VM, which calculates the max-min fair share between a minimal and a maximal rate and returns the rates, which are enforced by per-destination-VM limiters]
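The controller's max-min fair share step can be illustrated with the standard water-filling procedure. This is a minimal sketch for a single shared limit (for example one VM's bandwidth B split across its destinations), which is a simplification of what a controller would compute for a whole virtual network. The function name and the dictionary-based interface are assumptions made for the example.

```python
from typing import Dict


def max_min_fair_share(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Split capacity among flows so that no flow gets more than it demands
    and leftover capacity is shared equally among the still-unsatisfied flows."""
    allocation: Dict[str, float] = {}
    remaining = float(capacity)
    # Serve the smallest demands first; each flow gets at most an equal
    # share of whatever capacity is still unallocated.
    pending = sorted(demands.items(), key=lambda item: item[1])
    for index, (flow, demand) in enumerate(pending):
        fair_share = remaining / (len(pending) - index)
        granted = min(demand, fair_share)
        allocation[flow] = granted
        remaining -= granted
    return allocation


if __name__ == "__main__":
    # Hypothetical measured demands (Mbps) toward three destination VMs.
    print(max_min_fair_share(10, {"vm2": 2, "vm3": 4, "vm4": 8}))
```

The resulting per-destination rates are what an enforcement module would then program into its per-destination-VM limiters.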

Enforcing Virtual Network
- Tenants without a virtual network
  - Two-level priorities
  - Traffic from tenants with a virtual network is high priority
  - Other traffic is low priority (fair share)
- Unused capacity in a VM with a virtual network
  - Weighted sharing mechanisms
  - Unused capacity is distributed among all tenants

Design Discussion
- NM and routing
  - The allocation algorithm assumes that the datacenter has a simple tree topology
  - For topologies with limited path diversity: multiple physical links can be treated as a single aggregate link
  - For even richer network topologies: NM can control datacenter routing to build tenant-specific trees
- Failures
  - For failures of physical links and switches, the allocation algorithms can be extended to determine the tenant VMs that need to be migrated and to reallocate them

Evaluation
- Simulation setup
  - Tc: minimum compute time for the job
  - Tn: the time for the last flow to finish
  - T = max(Tc, Tn): the completion time
  - Tn < Tc: minimizes the tenant's cost
  - Baseline: purely VM-based resource allocation with a locality-aware allocation algorithm
  - A flow's bandwidth is calculated according to max-min fairness
- Virtual network request
  - A baseline request <N> can be expressed as <N,B> or <N,B,S,O>
- Simulation breadth
  - Covers the entire space for most parameters of interest in today's datacenters: tenant bandwidth requirements, datacenter load, and physical topology oversubscription
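The completion-time model T = max(Tc, Tn) can be written out as a short sketch. The helper below assumes each flow runs at a fixed rate for its whole lifetime, which is a simplification; the function name, parameter names, and units are illustrative.

```python
from typing import Sequence


def completion_time(
    compute_time_s: float,
    flow_sizes_mbit: Sequence[float],
    flow_rates_mbps: Sequence[float],
) -> float:
    """Job completion time T = max(Tc, Tn).

    Tc is the minimum compute time; Tn is the time at which the last
    flow finishes, assuming flow i moves flow_sizes_mbit[i] megabits at
    a constant flow_rates_mbps[i].
    """
    if len(flow_sizes_mbit) != len(flow_rates_mbps):
        raise ValueError("each flow needs exactly one size and one rate")
    network_time_s = max(
        (size / rate for size, rate in zip(flow_sizes_mbit, flow_rates_mbps)),
        default=0.0,
    )
    return max(compute_time_s, network_time_s)
```

When Tn < Tc the network is not the bottleneck and the job finishes in Tc, which is the case the slide associates with minimal tenant cost under time-based charging.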

Production Datacenter Experiment
- Job completion time

Production Datacenter Experiment
- Utilization
  - The allocation of VMs does not account for network demands

Production Datacenter Experiment
- Diverse communication patterns
  - Each tenant VM requires a different bandwidth

Cloud Datacenter Experiment
- Rejected requests
  - Tenant dynamics with requests arriving over time
  - Admission control scheme

Cloud Datacenter Experiment
- Tenant costs and provider revenue
  - Tenants are charged based on the time they occupy their VMs

Cloud Datacenter Experiment
- Charging for bandwidth
  - Virtual network abstractions allow explicitly charging for network bandwidth
  - For a request <N,B> held for time T, tenant cost (two alternatives):
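The two cost expressions are not preserved on this slide. A hedged sketch of what the two alternatives plausibly look like: time-based charging per VM, and the same charge plus an explicit per-unit bandwidth price. The unit prices `k_v` and `k_b` and both function names are hypothetical, and the formulas are an assumption consistent with the slide's wording rather than a quotation of it.

```python
def vm_only_cost(n_vms: int, time_units: float, k_v: float) -> float:
    """Charge only for VM occupancy: N * T * k_v (k_v is a hypothetical
    price per VM per unit time)."""
    return n_vms * time_units * k_v


def vm_plus_bandwidth_cost(
    n_vms: int, bandwidth: float, time_units: float, k_v: float, k_b: float
) -> float:
    """Charge for VM occupancy and reserved bandwidth:
    N * T * (k_v + k_b * B), with k_b a hypothetical price per unit of
    bandwidth per VM per unit time."""
    return n_vms * time_units * (k_v + k_b * bandwidth)


if __name__ == "__main__":
    # Illustrative numbers only: 4 VMs, 3 bandwidth units each, 5 time units.
    print(vm_only_cost(4, 5, k_v=2))
    print(vm_plus_bandwidth_cost(4, 3, 5, k_v=2, k_b=1))
```

Under the second scheme a tenant that reserves more bandwidth pays a higher rate but may finish sooner, which is the performance versus cost trade-off mentioned in the conclusion.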

Results and conclusion
- Virtual network abstractions
  - Are practical, can be efficiently implemented, and provide significant benefits
  - Provide a simple way of exchanging information between tenants and providers
- Tenants expose their network requirements and pick the trade-off between application performance and cost
- Providers account for network resources and improve their revenue

Discussion clues
- Network security
  - Compared with a physical switch, a virtual switch has weaker monitoring capability, so how can network security be ensured?
- Description of network bandwidth resources
  - How can network bandwidth resources be described? There are no datasets describing job bandwidth requirements.
- Actual bandwidth requirement
  - Many tenants do not know exactly how much bandwidth they need for their various applications, so how should this be handled?
  - Unlike computing and storage resources, one tenant's use of bandwidth affects other tenants because the total bandwidth is limited. Besides the pricing model, how can we make sure that a tenant's bandwidth requirement is appropriate (neither too much nor too little)? For example, a monitoring system could report actual demands to tenants.
- Failure of tenant VMs
  - For the virtual oversubscribed cluster, if a tenant VM fails, does only the failed VM need to be migrated and reallocated, or all tenant VMs in its group? Communication between the reallocated VM and the other VMs will increase the bandwidth required from the underlying physical infrastructure.

Thank you!