BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers Chuanxiong Guo1, Guohan Lu1, Dan Li1, Haitao Wu1, Xuan Zhang2,

Slides:



Advertisements
Similar presentations
Data Center Networking with Multipath TCP
Advertisements

Symbiotic Routing in Future Data Centers Hussam Abu-Libdeh, Paolo Costa, Antony Rowstron, Greg OShea, Austin Donnelly Cornell University Microsoft Research.
Chuanxiong Guo, Haitao Wu, Kun Tan, Lei Shi, Yongguang Zhang, Songwu Lu Microsoft Research Asia, Tsinghua University, UCLA 1 DCell: A Scalable and Fault-Tolerant.
PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric
BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers Chuanxiong Guo1, Guohan Lu1, Dan Li1, Haitao Wu1, Xuan Zhang2,
Cs/ee 143 Communication Networks Chapter 6 Internetworking Text: Walrand & Parekh, 2010 Steven Low CMS, EE, Caltech.
1 © 2004, Cisco Systems, Inc. All rights reserved. Chapter 3 Ethernet Technologies/ Ethernet Switching/ TCP/IP Protocol Suite and IP Addressing.
PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric. Presented by: Vinuthna Nalluri Shiva Srivastava.
Data Center Fabrics. Forwarding Today Layer 3 approach: – Assign IP addresses to hosts hierarchically based on their directly connected switch. – Use.
1 Exploring Efficient and Scalable Multicast Routing in Future Data Center Networks Dan Li, Jiangwei Yu, Junbiao Yu, Jianping Wu Tsinghua University Presented.
Data Center Network Topologies: VL2 (Virtual Layer 2) Hakim Weatherspoon Assistant Professor, Dept of Computer Science CS 5413: High Performance Systems.
Module 8: Concepts of a Network Load Balancing Cluster
Datacenter Network Topologies
Chuanxiong Guo, Haitao Wu, Kun Tan,
PortLand Presented by Muhammad Sadeeq and Ling Su.
ROUTING PROTOCOL IGRP. REVIEW 4 Purpose of Router –determine best path to destination –pass the frames to the destination 4 Protocols –routed - used by.
Data Center Network Topologies: FatTree
ProActive Routing In Scalable Data Centers with PARIS Joint work with Dushyant Arora + and Jennifer Rexford* + Arista Networks *Princeton University Theophilus.
Department of Computer Science, Jinan University, Guangzhou, P.R. China Lijun Lyu, Junjie Xie, Yuhui Deng, Yongtao Zhou ICA3PP 2014: The 14th International.
UNIVERSITY OF ELECTRONIC SCIENCE & TECHNOLOGY OF CHINA IEEE INFOCOM 2015, Hong Kong RAPIER: Integrating Routing and Scheduling for Coflow-aware Data Center.
A Scalable, Commodity Data Center Network Architecture Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat Presented by Gregory Peaker and Tyler Maclean.
1 K. Salah Module 4.3: Repeaters, Bridges, & Switches Repeater Hub NIC Bridges Switches VLANs GbE.
Ji-Yong Shin * Bernard Wong +, and Emin Gün Sirer * * Cornell University + University of Waterloo 2 nd ACM Symposium on Cloud ComputingOct 27, 2011 Small-World.
A Scalable, Commodity Data Center Network Architecture Mohammad AI-Fares, Alexander Loukissas, Amin Vahdat Presented by Ye Tao Feb 6 th 2013.
A Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network Architecture.
Camdoop: Exploiting In-network Aggregation for Big Data Applications Paolo Costa, Austin Donnelly, Antony Rowstron, Greg O’Shea Presenter – Manoj Kumar(mkumar11)
B 謝宗廷 B96b02016 張煥基 1. Outline Introduction Bcube structure Bcube source routing OTHER DESIGN ISSUES GRACEFUL DEGRADATION Implementation Architecture.
Datacast: A Scalable and Efficient Reliable Group Data Delivery Service for Data Centers Jiaxin Cao, Chuanxiong Guo, Guohan Lu, Yongqiang Xiong, Yixin.
Presenter: Po-Chun Wu. Outline Introduction BCube Structure BCube Source Routing (BSR) Other Design Issues Graceful degradation Implementation.
Networking the Cloud Presenter: b 電機三 姜慧如.
VL2 – A Scalable & Flexible Data Center Network Authors: Greenberg et al Presenter: Syed M Irteza – LUMS CS678: 2 April 2013.
Routing & Architecture
1 Department of Computer Science, Jinan University 2 School of Computer Science & Technology, Huazhong University of Science & Technology Junjie Xie 1,
© Copyright 2010 Hewlett-Packard Development Company, L.P. 1 Jayaram Mudigonda, HP Labs Praveen Yalagandula, HP Labs Mohammad Al-Fares, UCSD Jeff Mogul,
Floodless in SEATTLE : A Scalable Ethernet ArchiTecTure for Large Enterprises. Changhoon Kim, Matthew Caesar and Jenifer Rexford. Princeton University.
A.SATHEESH Department of Software Engineering Periyar Maniammai University Tamil Nadu.
VL2: A Scalable and Flexible Data Center Network Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David.
Chi-Cheng Lin, Winona State University CS 313 Introduction to Computer Networking & Telecommunication Local Area Networks.
Datacenter Network Simulation using ns3
Dual Centric Data Center Network Architectures DAWEI LI, JIE WU (TEMPLE UNIVERSITY) ZHIYONG LIU, AND FA ZHANG (CHINESE ACADEMY OF SCIENCES) ICPP 2015.
Kai Chen (HKUST) Nov 1, 2013, USTC, Hefei Data Center Networking 1.
Chuanxiong Guo, Haitao Wu, Kun Tan, Lei Shi, Yongguang Zhang, Songwu Lu SIGCOMM 2008 Presented by Ye Tian for Course CS05112.
Authors: Xiaoqiao Meng, Vasileio Pappas and Li Zhang
SecondNet: A Data Center Network Virtualization Architecture with Bandwidth Guarantees Chuanxiong Guo 1, Guohan Lu 1, Helen J. Wang 2, Shuang Yang 3, Chao.
Jiaxin Cao, Rui Xia, Pengkun Yang, Chuanxiong Guo,
Approaches to Improve Data Center Performance through Networking - Gurubaran.
CubicRing ENABLING ONE-HOP FAILURE DETECTION AND RECOVERY FOR DISTRIBUTED IN- MEMORY STORAGE SYSTEMS Yiming Zhang, Chuanxiong Guo, Dongsheng Li, Rui Chu,
Semester 3 The Final Exam?? Fastest Card Question 1 A) MAC addressB) Software specifications C) Hardware specificationsD) Configuration of peripherals.
PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis.
VL2: A Scalable and Flexible Data Center Network
Data Center Architectures
Yiting Xia, T. S. Eugene Ng Rice University
Data Center Networking
Instructor Materials Chapter 1: LAN Design
CIS 700-5: The Design and Implementation of Cloud Networks
Architecture and Algorithms for an IEEE 802
Data Center Network Architectures
Chuanxiong Guo, et al, Microsoft Research Asia, SIGCOMM 2008
Improving Datacenter Performance and Robustness with Multipath TCP
NTHU CS5421 Cloud Computing
BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers Chuanxiong Guo1, Guohan Lu1, Dan Li1, Haitao Wu1, Xuan Zhang2,
Chuanxiong Guo, Haitao Wu, Kun Tan,
Data Center Networking
NTHU CS5421 Cloud Computing
VL2: A Scalable and Flexible Data Center Network
RDMA over Commodity Ethernet at Scale
Data Center Architectures
A Closer Look at NFV Execution Models
Data Center Traffic Engineering
Presentation transcript:

BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers Chuanxiong Guo1, Guohan Lu1, Dan Li1, Haitao Wu1, Xuan Zhang2, Yunfeng Shi3, Tian Chen4, Yongguang Zhang1, Songwu Lu5 1: Microsoft Research Asia (MSR-A), 2: Tsinghua U, 3: PKU, 4: HUST, 5: UCLA August 17, 2009 Barcelona Spain

Container-based modular DC 1000-2000 servers in a single container Core benefits of Shipping Container DCs: Easy deployment High mobility Just plug in power, network, & chilled water Increased cooling efficiency Manufacturing & H/W Admin. Savings Sun Project Black Box 242 systems in 20’ Rackable Systems Container 2800 servers in 40’

BCube design goals High network capacity for various traffic patterns One-to-one unicast One-to-all and one-to-several reliable groupcast All-to-all data shuffling Only use low-end, commodity switches Graceful performance degradation Performance degrades gracefully as servers/switches failure increases

BCube structure Connecting rule A BCubek network supports servers <1,0> A BCubek network supports servers n is the number of servers in a BCube0 k is the level of that BCube A server is assigned a BCube addr (ak,ak-1,…,a0) where ai [0,k] Neighboring server addresses differ in only one digit <1,1> <1,2> <1,3> Level-1 BCube0 <0,0> 00 01 02 03 <0,1> 10 11 12 13 <0,2> 20 21 22 23 <0,3> 30 31 32 33 Level-0 Connecting rule The i-th server in the j-th BCube0 connects to the j-th port of the i-th level-1 switch switch server

BCube: Server centric network MAC03 MAC13 1 MAC23 2 MAC33 3 port Switch <1,3> MAC table <1,0> Server-centric BCube Switches never connect to other switches and only act as L2 crossbars Servers control routing, load balancing, fault-tolerance <1,1> <1,2> <1,3> MAC23 MAC03 20 03 data MAC20 MAC21 1 MAC22 2 MAC23 3 port Switch <0,2> MAC table BCube0 <0,0> 00 01 02 03 <0,1> 10 11 12 13 <0,2> 20 21 22 23 <0,3> 30 31 32 33 MAC20 MAC23 20 03 data dst src MAC20 MAC23 20 03 data MAC addr Bcube addr MAC23 MAC03 20 03 data

Multi-paths for one-to-one traffic Theorem 1. The diameter of a BCubek is k+1 Theorem 3. There are k+1 parallel paths between any two servers in a BCubek <1,0> <1,1> <1,2> <1,3> <0,0> <0,1> 10 11 12 13 <0,2> <0,3> 30 31 32 33 00 01 02 03 20 21 22 23

Speedup for one-to-several traffic Theorem 4. Server A and a set of servers {di|di is A’s level-i neighbor} form an edge disjoint complete graph of diameter 2 <1,0> <1,1> <1,2> <1,3> <0,0> <0,1> <0,2> <0,3> 30 31 32 33 00 01 02 03 10 11 12 13 20 21 22 23 P1 P1 P1 P2 P2 P2

Speedup for one-to-all traffic Theorem 5. There are k+1 edge-disjoint spanning trees in a Bcubek src 00 01 02 03 10 11 12 13 The one-to-all and one-to-several SPTs can be implemented by TCP unicast to achieve reliability 20 21 22 23 30 31 32 33

Aggregate bottleneck throughput for all-to-all traffic Aggregate bottleneck throughput (ABT) is the total number of flows times the throughput of the bottleneck flow under the all-to-all communication pattern Theorem 6. The ABT for a BCube network is where n is the switch port number and N is the total server number

BCube Source Routing (BSR) Server-centric source routing Source server decides the best path for a flow by probing a set of parallel paths Source server adapts to network condition by re-probing periodically or due to failures BSR design rationale Network structural property Scalability Routing performance

Path compression and fast packet forwarding Traditional address array needs 16 bytes: Path(00,13) = {02,22,23,13} Forwarding table of server 23 NHI Output port MAC 0:0 Mac20 0:1 Mac21 0:2 Mac22 1:0 1 Mac03 1:1 Mac13 1:3 Mac33 The Next Hop Index (NHI) Array needs 4 bytes: Path(00,13)={0:2,1:2,0:3,1:1} <1,0> <1,1> <1,2> <1,3> Fwd node 2 3 Next hop 1 3 <0,0> <0,1> <0,2> <0,3> 00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33

Graceful degradation The metric: aggregation bottleneck throughput (ABT) under different server and switch failure rates Server failure Switch failure BCube BCube Fat-tree Fat-tree DCell DCell

Routing to external networks Ethernet has two levels link rate hierarchy 1G for end hosts and 10G for uplink aggregator 10G <1,3> <1,0> <1,1> <1,1> <1,2> <1,3> <1,1> 1G <0,0> <0,1> 10 11 12 13 <0,2> 20 21 22 23 <0,3> 30 31 32 33 01 11 21 31 00 01 02 03 gateway gateway gateway gateway

TCP/IP protocol driver Implementation BCube configuration app software TCP/IP protocol driver kernel Intermediate driver BCube driver Packet send/recv BSR path probing & selection packet fwd Neighbor maintenance Flow-path cache Ava_band calculation Intel® PRO/1000 PT Quad Port Server Adapter Ethernet miniport driver IF 0 IF 1 IF k hardware packet fwd Neighbor maintenance server ports NetFPGA Ava_band calculation

Testbed A BCube testbed NIC 16 servers (Dell Precision 490 workstation with Intel 2.00GHz dualcore CPU, 4GB DRAM, 160GB disk) 8 8-port mini-switches (DLink 8-port Gigabit switch DGS-1008D) NIC Intel Pro/1000 PT quad-port Ethernet NIC NetFPGA

Bandwidth-intensive application support Per-server throughput

Support for all-to-all traffic Total throughput for all-to-all

Related work Speedup

Related work UCSD08 and Portland VL2 DCell Rearrangeable non-blocking Clos network No server change needed Destination addr based routing VL2 Reduces cables by using 10G Ethernet Leveraging existing OSFP and ECMP Randomized Valiant routing DCell For different purposes but share the same design philosophy BCube provides better load-balancing and network capacity

Summary Internetworking for modular mega data centers A novel network architecture for container-based, modular data centers Enables speedup for one-to-x and x-to-one traffic Provides high network capacity to all-to-all traffic Purely constructed from low-end commodity switches Graceful performance degradation IEEE Spectrum Feb.

Q & A