Presenter: Po-Chun Wu. Outline Introduction BCube Structure BCube Source Routing (BSR) Other Design Issues Graceful degradation Implementation.

Presentation transcript:

Presenter: Po-Chun Wu

Outline
- Introduction
- BCube Structure
- BCube Source Routing (BSR)
- Other Design Issues
- Graceful degradation
- Implementation and Evaluation
- Conclusion

Introduction

Container-based modular DC: servers packed in a single shipping container
Core benefits of shipping-container DCs:
- Easy deployment and high mobility: just plug in power, network, and chilled water
- Increased cooling efficiency
- Savings in manufacturing and hardware administration

BCube design goals
- High network capacity for:
  - One-to-one unicast
  - One-to-all and one-to-several reliable groupcast
  - All-to-all data shuffling
- Use only low-end, commodity switches
- Graceful performance degradation: performance degrades gracefully as the number of server and switch failures increases

BCube Structure

[Figure: a BCube1 built from BCube0s, showing servers, level-0 switches, and level-1 switches]
Connecting rule
- The i-th server in the j-th BCube0 connects to the j-th port of the i-th level-1 switch
- Example: by this rule, server "13" (server 3 in BCube0 number 1) connects to level-0 switch 1 and to port 1 of level-1 switch 3
A BCube_k has:
- k+1 levels: 0 through k
- n-port switches, the same number at each level ($n^k$)
- $n^{k+1}$ servers in total and $(k+1)n^k$ switches in total
- Example: n=8, k=3 gives 4 levels connecting 4096 servers, using 512 8-port switches at each level
A server is assigned a BCube address $(a_k, a_{k-1}, \ldots, a_0)$ where $a_i \in [0, n-1]$
- Neighboring server addresses differ in only one digit
- Switches only connect to servers
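The counting rules and the one-digit neighbor property above can be checked with a short sketch. This is an illustration only: the function names and the representation of an address as a tuple of digits (a_k, ..., a_0) are assumptions made here, not part of the slides.

```python
def bcube_size(n, k):
    """Sizes of a BCube_k built from n-port switches."""
    return {
        "levels": k + 1,
        "switches_per_level": n ** k,
        "servers": n ** (k + 1),
        "switches": (k + 1) * n ** k,
    }


def attached_switches(server):
    """Return (level, switch_id) for every switch a server connects to.

    Assumption: the level-l switch is identified by the server address
    with digit a_l removed, so servers that differ only in a_l share it.
    """
    k = len(server) - 1
    switches = []
    for level in range(k + 1):
        pos = k - level  # tuple index of digit a_level
        switches.append((level, server[:pos] + server[pos + 1:]))
    return switches


def neighbors(server, n):
    """All servers whose address differs from `server` in exactly one digit."""
    k = len(server) - 1
    result = []
    for level in range(k + 1):
        pos = k - level
        for value in range(n):
            if value != server[pos]:
                result.append(server[:pos] + (value,) + server[pos + 1:])
    return result


if __name__ == "__main__":
    print(bcube_size(8, 3))
    print(attached_switches((1, 3)))
    print(neighbors((1, 3), 4))
```

For server "13" in a BCube1 with n=4, the sketch reports one switch per level and (k+1)(n-1) = 6 one-hop neighbors.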

Bigger BCube: 3 levels (k=2)
[Figure: a BCube2 built from BCube1s]

BCube: Server-centric network
[Figure: BCube1 made of BCube0s; each server is labeled with its MAC address and BCube address, and example frames carry dst and src MAC fields followed by data]

Switch MAC table (switch connecting servers 03, 13, 23, 33):
MAC addr | port
MAC03    | 0
MAC13    | 1
MAC23    | 2
MAC33    | 3

Switch MAC table (switch connecting servers 20, 21, 22, 23):
MAC addr | port
MAC20    | 0
MAC21    | 1
MAC22    | 2
MAC23    | 3

Server-centric BCube
- Switches never connect to other switches; they connect only to servers
- Servers control routing, load balancing, and fault tolerance

Bandwidth-intensive application support
- One-to-one: one server moves data to another server (e.g., disk backup)
- One-to-several: one server transfers the same copy of data to several receivers (e.g., distributed file systems)
- One-to-all: one server transfers the same copy of data to all the other servers in the cluster (e.g., broadcast)
- All-to-all: every server transmits data to all the other servers (e.g., MapReduce)

Multi-paths for one-to-one traffic
- Theorem 1. The diameter (the longest shortest path between any two servers) of a BCube_k is k+1.
- Theorem 3. There are k+1 parallel paths between any two servers in a BCube_k.
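A minimal sketch of why both bounds are plausible: a path can be built by correcting one differing address digit per hop, so it needs at most k+1 hops, and starting the correction at a different level yields a different path. The code below covers only the simple case where source and destination differ in every digit. It is not the full path-set construction, and the function names are assumptions made for illustration.

```python
def digit_correcting_path(src, dst, order):
    """Walk from src to dst, fixing one address digit per hop.

    src, dst: tuples of digits (a_k, ..., a_0).
    order: the sequence of levels whose digits are corrected.
    """
    k = len(src) - 1
    current = list(src)
    path = [tuple(current)]
    for level in order:
        pos = k - level  # tuple index of digit a_level
        if current[pos] != dst[pos]:
            current[pos] = dst[pos]
            path.append(tuple(current))
    return path


def rotated_paths(src, dst):
    """One path per starting level, obtained by rotating the correction order."""
    k = len(src) - 1
    paths = []
    for start in range(k, -1, -1):
        order = [(start - i) % (k + 1) for i in range(k + 1)]
        paths.append(digit_correcting_path(src, dst, order))
    return paths


if __name__ == "__main__":
    for path in rotated_paths((0, 0), (1, 3)):
        print(path)
```

For servers 00 and 13 in a BCube1 this gives two 2-hop paths, through 10 and through 03, that share no intermediate server.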

Speedup for one-to-several traffic
- Theorem 4. Server A and a set of servers {d_i | d_i is A's level-i neighbor} form an edge-disjoint complete graph of diameter 2.
- Writing to r servers is r times faster than pipeline replication.
[Figure: data split into parts P1 and P2 and sent over the edge-disjoint graph]

Speedup for one-to-all traffic
- Theorem 5. There are k+1 edge-disjoint spanning trees in a BCube_k.
- The one-to-all and one-to-several spanning trees can be implemented with TCP unicast to achieve reliability.

Aggregate bottleneck throughput for all-to-all traffic
- The flows that receive the smallest throughput are called the bottleneck flows.
- Aggregate bottleneck throughput (ABT) = (throughput of the bottleneck flow) × (total number of flows in the all-to-all traffic)
- A larger ABT means a shorter all-to-all job finish time.
- Theorem 6. The ABT for a BCube network is $\frac{n}{n-1}(N-1)$, where n is the switch port number and N is the total number of servers.
- In BCube there are no bottleneck links, since all links are used equally.
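As a worked example of the ABT definition, the sketch below evaluates the bound for a complete BCube. It assumes the Theorem 6 expression n/(n-1) · (N-1) as given in the BCube paper, with N = n^(k+1) and throughput measured in units of one link's bandwidth; the function names are illustrative.

```python
from fractions import Fraction


def abt(n, k):
    """ABT of a complete BCube_k, in units of one link's bandwidth.

    Assumes the Theorem 6 expression n/(n-1) * (N-1) with N = n**(k+1).
    """
    total_servers = n ** (k + 1)
    return Fraction(n, n - 1) * (total_servers - 1)


def bottleneck_flow_throughput(n, k):
    """Per-flow throughput implied by ABT = bottleneck throughput * number of flows."""
    total_servers = n ** (k + 1)
    flows = total_servers * (total_servers - 1)  # every ordered server pair
    return abt(n, k) / flows


if __name__ == "__main__":
    print(abt(8, 3))
    print(bottleneck_flow_throughput(8, 3))
```

Exact fractions are used so that the relation between the aggregate figure and the per-flow share can be checked without rounding.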

BCube Source Routing (BSR)

BCube Source Routing (BSR)
Server-centric source routing:
- The source server decides the best path for a flow by probing a set of parallel paths
- The source server adapts to network conditions by re-probing periodically or after failures
- Intermediate servers only forward packets based on the packet header
[Figure: the source sends probe packets over k+1 paths, through intermediate servers, to the destination]

BSR Path Selection
Source server:
1. Constructs k+1 paths using BuildPathSet
2. Probes all of these paths (no link-status broadcasting)
3. If a path is not found, uses BFS to find an alternative (after removing all the other paths)
4. Uses a metric to select the best path (maximum available bandwidth or end-to-end delay)
Intermediate servers:
- Update the bandwidth carried in the probe: min(PacketBW, InBW, OutBW)
- If the next hop is not found, return a failure to the source
Destination server:
- Updates the bandwidth: min(PacketBW, InBW)
- Sends the probe response to the source on the reverse path
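The probing steps above can be sketched as follows. This is a simplified model, not the driver's implementation: bandwidth values are plain numbers, a failed probe is represented by None, the metric is taken to be maximum available bandwidth with shorter paths winning ties, and all names are assumptions made here.

```python
def probe_bandwidth(source_out_bw, intermediates, dest_in_bw):
    """Available bandwidth reported by one probe.

    intermediates: list of (InBW, OutBW) pairs, one per intermediate server.
    Each intermediate server applies min(PacketBW, InBW, OutBW);
    the destination applies min(PacketBW, InBW).
    """
    packet_bw = source_out_bw
    for in_bw, out_bw in intermediates:
        packet_bw = min(packet_bw, in_bw, out_bw)
    return min(packet_bw, dest_in_bw)


def select_best_path(probe_results):
    """Pick the live path with the most available bandwidth.

    probe_results: dict mapping a path (tuple of hops) to its probed
    bandwidth, or to None if the probe returned a failure.
    Ties go to the shorter path (an assumption of this sketch).
    """
    alive = {path: bw for path, bw in probe_results.items() if bw is not None}
    if not alive:
        return None
    return max(alive, key=lambda path: (alive[path], -len(path)))


if __name__ == "__main__":
    results = {
        ("00", "10", "13"): probe_bandwidth(1.0, [(0.4, 0.9)], 1.0),
        ("00", "03", "13"): probe_bandwidth(1.0, [(0.9, 0.9)], 1.0),
        ("00", "02", "22", "23", "13"): None,  # probe reported a failure
    }
    print(select_best_path(results))
```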

Path Adaptation
- The source performs path selection periodically (every 10 seconds) to adapt to failures and changes in network conditions.
- If a failure is reported, the source switches to an available path right away, but it waits for the next timer to expire before starting the next selection round rather than starting one immediately.
- The timer is usually randomized to avoid path oscillation.

Packet Forwarding
Each server has two components:
- Neighbor status table
  - $(k+1)(n-1)$ entries
  - Maintained by the neighbor maintenance protocol (updated upon probing and packet forwarding)
  - Uses NHI (next hop index) encoding to index neighbors ([DP:DV])
    - DP: the differing digit (2 bits)
    - DV: the value of the differing digit (6 bits)
  - NHA (next hop index array): 8 bytes, since the maximum diameter is 8
  - Almost static (except for the Status field)
- Packet forwarding procedure
  - Intermediate servers update the next-hop MAC address on the packet if the next hop is alive
  - Intermediate servers update neighbor status from the packet
  - One table lookup per packet
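The [DP:DV] encoding fits one next hop into a single byte. The sketch below shows one possible bit layout, with DP in the two high bits and DV in the six low bits; the slides give only the field widths, so the layout and the function names are assumptions.

```python
def encode_nhi(dp, dv):
    """Pack a next hop index into one byte: 2 bits of DP, 6 bits of DV."""
    if not 0 <= dp < 4:
        raise ValueError("DP must fit in 2 bits")
    if not 0 <= dv < 64:
        raise ValueError("DV must fit in 6 bits")
    return (dp << 6) | dv


def decode_nhi(byte):
    """Unpack one byte into (DP, DV)."""
    return byte >> 6, byte & 0x3F


def encode_nha(hops, max_hops=8):
    """Encode a path as an array of NHI bytes, at most max_hops long."""
    if len(hops) > max_hops:
        raise ValueError("path longer than the next hop index array")
    return bytes(encode_nhi(dp, dv) for dp, dv in hops)


if __name__ == "__main__":
    nha = encode_nha([(0, 2), (1, 2), (0, 3), (1, 1)])
    print(nha.hex(), [decode_nhi(b) for b in nha])
```

With these field widths a single byte can address up to 4 levels and up to 64 values per digit.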

Path compression and fast packet forwarding
- A traditional address array needs 16 bytes: Path(00,13) = {02,22,23,13}
- The Next Hop Index (NHI) array needs 4 bytes: Path(00,13) = {0:2,1:2,0:3,1:1}

Forwarding table of server 23:
NHI | Output port | MAC
0:0 | 0           | Mac20
0:1 | 0           | Mac21
0:2 | 0           | Mac22
1:0 | 1           | Mac03
1:1 | 1           | Mac13
1:3 | 1           | Mac33
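To illustrate how an NHI array stands in for a list of full addresses, the sketch below expands Path(00,13) hop by hop and rebuilds the forwarding table of server 23 for n=4, k=1. It assumes that an NHI level:value means "replace digit a_level with value" and that the output port equals the level, which matches the table above; the names are illustrative.

```python
def apply_nhi(server, nhi):
    """Return the neighbor reached by replacing digit a_level with value."""
    level, value = nhi
    pos = len(server) - 1 - level  # tuple index of digit a_level
    return server[:pos] + (value,) + server[pos + 1:]


def expand_path(src, nhi_array):
    """Turn an NHI array into the list of server addresses it visits."""
    path = []
    current = src
    for nhi in nhi_array:
        current = apply_nhi(current, nhi)
        path.append(current)
    return path


def forwarding_table(server, n):
    """Map each NHI to (output port, MAC label) for one server."""
    k = len(server) - 1
    table = {}
    for level in range(k + 1):
        for value in range(n):
            if value != server[k - level]:
                neighbor = apply_nhi(server, (level, value))
                table[(level, value)] = (level, "Mac" + "".join(map(str, neighbor)))
    return table


if __name__ == "__main__":
    print(expand_path((0, 0), [(0, 2), (1, 2), (0, 3), (1, 1)]))
    for nhi, entry in sorted(forwarding_table((2, 3), 4).items()):
        print(nhi, entry)
```

Because each hop is keyed by its NHI, an intermediate server needs a single dictionary lookup to find the output port and next-hop MAC.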

Other Design Issues

Partial BCube_k
[Figure: BCube0s and a BCube1 with level-0 and level-1 switches]
How to build a partial BCube_k:
(1) Build the needed BCube_{k-1}s
(2) Use only part of the layer-k switches?
- Solution: connect the BCube_{k-1}s using a full layer of level-k switches.
- Advantage: BCubeRouting performs just as in a complete BCube, and BSR works as before.
- Disadvantage: the switches in layer k are not fully utilized.

Packing and Wiring (1/2)
- 2048 servers and 1280 8-port switches: a partial BCube with n = 8 and k = 3
- 40-foot container (12 m × 2.35 m × 2.38 m)
- 32 racks in a container

Packing and Wiring (2/2)
- One rack = one BCube1
- Each rack has 44 units
  - 1U = 2 servers or 4 switches
  - 64 servers occupy 32 units
  - 40 switches occupy 10 units
- Super-rack (8 racks) = one BCube2
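The packing figures can be cross-checked with a few lines of arithmetic. This is only a sanity check of the numbers on the two packing slides; the helper name and its default densities (2 servers or 4 switches per unit) are taken from the slide, everything else is illustrative.

```python
def rack_units(servers, switches, servers_per_unit=2, switches_per_unit=4):
    """Rack units needed for the given servers and switches (rounded up)."""
    server_units = -(-servers // servers_per_unit)    # ceiling division
    switch_units = -(-switches // switches_per_unit)  # ceiling division
    return server_units + switch_units


if __name__ == "__main__":
    racks = 32
    servers_per_rack, switches_per_rack = 64, 40
    print("units used per rack:", rack_units(servers_per_rack, switches_per_rack), "of 44")
    print("servers in container:", racks * servers_per_rack)
    print("switches in container:", racks * switches_per_rack)
    print("servers in a complete BCube3 with n=8:", 8 ** 4)
```

Each rack uses 42 of its 44 units, and the container holds half of the servers of a complete BCube3 with n=8, which is why the deployment is a partial BCube.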

Routing to external networks (1/2)
- Ethernet has a two-level link-rate hierarchy: 1G for end hosts and 10G for uplinks
[Figure labels: aggregator, gateway (G)]

Routing to external networks (2/2)
When an internal server sends a packet to an external IP address:
1) The server chooses one of the gateways.
2) The packet is routed to the gateway using BSR (BCube Source Routing).
3) After the gateway receives the packet, it strips the BCube protocol header and forwards the packet to the external network via the 10G uplink.

Graceful degradation

[Figure: DCell and fat-tree topologies]

Graceful degradation
- When the number of server or switch failures increases, ABT decreases slowly and there are no dramatic drops in performance (simulation-based).
[Figure: ABT under server failure and under switch failure, comparing BCube, DCell, and fat-tree]

Implementation and Evaluation

Implementation
[Figure: implementation architecture, with the following labels]
- Software: app, kernel, TCP/IP protocol driver, BCube driver (an intermediate driver), Ethernet miniport driver
- BCube driver functions: BCube configuration, BSR path probing & selection, flow-path cache, neighbor maintenance, available-bandwidth (Ava_band) calculation, packet send/recv, packet fwd
- Hardware: server ports IF 0, IF 1, ..., IF k, with neighbor maintenance, Ava_band calculation, and packet fwd

Testbed
A BCube testbed:
- 16 servers (Dell Precision 490 workstations, each with an Intel 2.00 GHz dual-core CPU, 4 GB DRAM, and a 160 GB disk)
- 8 8-port mini-switches (D-Link 8-port Gigabit switch DGS-1008D)
NICs:
- Intel PRO/1000 PT quad-port Ethernet NIC (Intel PRO/1000 PT Quad Port Server Adapter)
- NetFPGA

Bandwidth-intensive application support Per-server throughput

Support for all-to-all traffic Total throughput for all-to-all

Related work Speedup

Conclusion
- BCube is a novel network architecture for shipping-container-based MDCs
- It forms a server-centric network architecture
- It uses mini-switches instead of 24-port switches
- BSR enables graceful degradation and meets the special requirements of MDCs