DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers
Chuanxiong Guo, Haitao Wu, Kun Tan, Lei Shi, Yongguang Zhang, Songwu Lu
Microsoft Research Asia, Tsinghua University, UCLA

Background
Data Center
Data Center Networking (DCN): the networking infrastructure inside a data center, which connects a large number of servers via high-speed links and switches.

DCN Motivation
Ever-increasing scale: Google had 450,000 servers in 2006. Microsoft doubles its number of servers in 14 months. The expansion rate exceeds Moore's Law.
Network capacity: bandwidth-hungry, data-centric applications, such as data shuffling in MapReduce/Dryad, data replication and re-replication in distributed file systems, and index building in search.
Fault tolerance: when data centers scale, failures become the norm.
Cost: using high-end switches/routers to scale up is costly.

DCN Structure
The current DCN practice is to connect all the servers using a tree hierarchy of switches, core switches, or core routers. This cannot meet the design requirements.
The proposal: a novel network structure called DCell.

DCell Structure
DCell is a recursively defined structure, in which a high-level DCell is constructed from many low-level DCells, and DCells at the same level are fully connected with one another.
Scalable: the number of servers scales doubly exponentially with the server node degree.
Fault tolerant: there is no single point of failure, and various failures are addressed.
High capacity: traffic is distributed evenly, with no bottleneck.

DCell Physical Structure
DCell_k (k >= 0) denotes a level-k DCell.
DCell_0 is the building block used to construct larger DCells. It has n servers and a mini-switch. All servers in a DCell_0 are connected to the mini-switch.
A level-1 DCell_1 is constructed using n + 1 DCell_0s. In a DCell_1, each DCell_0 is connected to all the other DCell_0s with one link.
Assign each server a 2-tuple [a_1, a_0], where a_1 and a_0 are the level-1 and level-0 IDs, respectively. Then the two servers with 2-tuples [i, j-1] and [j, i] are connected with a link for every i and every j > i.
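The wiring rule for a DCell_1 can be sketched in a few lines of Python. This is only an illustration of the [i, j-1] to [j, i] rule stated above; the function name `build_dcell1_links` and the representation of servers as `(a1, a0)` tuples are assumptions made for this sketch.

```python
from itertools import combinations

def build_dcell1_links(n):
    """Return the level-1 links of a DCell_1 built from n + 1 DCell_0s.

    Servers are (a1, a0) tuples: a1 is the DCell_0 index and a0 is the
    server index inside that DCell_0. (Hypothetical representation.)
    """
    links = []
    # combinations() yields every pair (i, j) with i < j over the n + 1 DCell_0s.
    for i, j in combinations(range(n + 1), 2):
        links.append(((i, j - 1), (j, i)))
    return links

for a, b in build_dcell1_links(2):
    print(a, "<->", b)
# (0, 0) <-> (1, 0)
# (0, 1) <-> (2, 0)
# (1, 1) <-> (2, 1)
```

With n = 2 there are three DCell_0s and three level-1 links, and every server ends up with exactly one level-1 link in addition to its link to the mini-switch.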

DCell: the Construction
[Figure: a DCell_0 (n = 2, k = 0) with n servers attached to a mini-switch, and a DCell_1 (n = 2, k = 1).]

DCell Physical Structure (Cont.)
Build level-2 or higher DCell_k recursively, in the same way as the DCell_1 construction above.
If we have built DCell_{k-1} and each DCell_{k-1} has t_{k-1} servers, then we can create a maximum of t_{k-1} + 1 DCell_{k-1}s. Again we treat each DCell_{k-1} as a virtual node and fully connect these virtual nodes to form a complete graph.
g_k: the number of DCell_{k-1}s in a DCell_k
t_k: the number of servers in a DCell_k
g_k = t_{k-1} + 1
t_k = g_k × t_{k-1}
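To see how quickly these sizes grow, the recurrence g_k = t_{k-1} + 1, t_k = g_k × t_{k-1} can be evaluated directly. The sketch below assumes the base case t_0 = n (a DCell_0 has n servers) and uses g_0 = 1 purely as a placeholder convention; the function name `dcell_sizes` is hypothetical.

```python
def dcell_sizes(n, k):
    """Return (g, t), where g[i] is the number of DCell_(i-1)s in a DCell_i
    and t[i] is the number of servers in a DCell_i, for i = 0..k."""
    g = [1]   # placeholder for level 0 (assumed convention)
    t = [n]   # a DCell_0 has n servers
    for _ in range(k):
        g.append(t[-1] + 1)      # g_k = t_{k-1} + 1
        t.append(g[-1] * t[-1])  # t_k = g_k * t_{k-1}
    return g, t

g, t = dcell_sizes(4, 1)
print(g, t)   # [1, 5] [4, 20]

g, t = dcell_sizes(6, 3)
print(t)      # [6, 42, 1806, 3263442]
```

With n = 4 and k = 1 this gives 5 DCell_0s and 20 servers. With n = 6, three levels already exceed three million servers, which illustrates the doubly exponential growth.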

Build a DCell_l Network
Each server in a DCell_k is assigned a (k + 1)-tuple [a_k, a_{k-1}, …, a_1, a_0], where a_i < g_i (0 < i <= k) indicates which DCell_{i-1} this server is located in, and a_0 < n indicates the index of the server in that DCell_0.
We further denote [a_k, a_{k-1}, …, a_{i+1}] (i > 0) as the prefix that indicates the DCell_i this node belongs to.

Build a DCell_l Network (Cont.)
1. Build the sub-DCells.
2. Connect the sub-DCells to form a complete graph.
3. End the recursion by building DCell_0.
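The three steps can be sketched as a recursive Python procedure. This is an illustrative reconstruction under stated assumptions, not the authors' code: servers are tuples, a mini-switch is modelled as a hypothetical node `("switch",) + pref`, and a server's index inside a sub-DCell is converted to a tuple by the helper `uid_to_tuple`, which is also an assumption of this sketch.

```python
def sizes(n, level):
    """t[i] = number of servers in a DCell_i, using t_i = (t_{i-1} + 1) * t_{i-1}."""
    t = [n]
    for _ in range(level):
        t.append((t[-1] + 1) * t[-1])
    return t

def uid_to_tuple(uid, level, t):
    """Convert a server's index inside a DCell_level into its (level + 1)-tuple."""
    digits = []
    for i in range(level, 0, -1):
        digits.append(uid // t[i - 1])
        uid %= t[i - 1]
    digits.append(uid)
    return tuple(digits)

def build_dcells(pref, n, level, t, links):
    """Append to `links` every link of the DCell_level whose prefix is `pref`."""
    if level == 0:
        # End of recursion: the n servers of a DCell_0 share one mini-switch.
        switch = ("switch",) + pref
        for i in range(n):
            links.append((pref + (i,), switch))
        return
    g = t[level - 1] + 1  # number of sub-DCells at this level
    # Step 1: build the sub-DCells.
    for i in range(g):
        build_dcells(pref + (i,), n, level - 1, t, links)
    # Step 2: connect every pair of sub-DCells (i < j) with one link.
    for i in range(g):
        for j in range(i + 1, g):
            n1 = pref + (i,) + uid_to_tuple(j - 1, level - 1, t)
            n2 = pref + (j,) + uid_to_tuple(i, level - 1, t)
            links.append((n1, n2))

links = []
build_dcells((), 2, 2, sizes(2, 2), links)
print(len(links))  # 84
```

For n = 2 and two levels, the 84 links are 42 server-to-switch links, 21 level-1 links, and 21 level-2 links, so each of the 42 servers has degree k + 1 = 3.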

Routing in a DCell
A global link-state routing scheme cannot be used.
Hierarchical OSPF cannot be used.
Use the DCell Fault-tolerant Routing protocol (DFR).
First, routing without failure. Second, the broadcast scheme. Finally, DFR.

Routing without Failure
DCellRouting: DCell uses a simple and efficient single-path routing algorithm for unicast by exploiting the recursive structure of DCell.
To find the route from src to dst in a DCell_k:
1. Calculate the intermediate link (n1, n2) that interconnects the two DCell_{k-1}s.
2. Routing is then divided into finding the two sub-paths from src to n1 and from n2 to dst.

Routing without Failure (Cont.)
GetLink: let s_{k-m} and d_{k-m} (s_{k-m} < d_{k-m}) be the indices of the two sub-DCells. Based on BuildDCells, the link that interconnects these two sub-DCells is ([s_{k-m}, d_{k-m} - 1], [d_{k-m}, s_{k-m}]).
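DCellRouting and GetLink fit together as a short divide-and-conquer recursion. The sketch below is one possible reading of the two steps, assuming servers are tuples [a_k, ..., a_0]; the helper names `sizes`, `uid_to_tuple`, `get_link`, and `dcell_routing` are chosen for this illustration, and two servers in the same DCell_0 are treated as one hop apart through the mini-switch.

```python
def sizes(n, level):
    """t[i] = number of servers in a DCell_i."""
    t = [n]
    for _ in range(level):
        t.append((t[-1] + 1) * t[-1])
    return t

def uid_to_tuple(uid, level, t):
    """Convert a server's index inside a DCell_level into its (level + 1)-tuple."""
    digits = []
    for i in range(level, 0, -1):
        digits.append(uid // t[i - 1])
        uid %= t[i - 1]
    digits.append(uid)
    return tuple(digits)

def get_link(pref, s, d, level, t):
    """Link joining sub-DCells s and d (s != d) of the DCell_level at `pref`.

    Returns (node in sub-DCell s, node in sub-DCell d).
    """
    if s < d:
        return (pref + (s,) + uid_to_tuple(d - 1, level - 1, t),
                pref + (d,) + uid_to_tuple(s, level - 1, t))
    n2, n1 = get_link(pref, d, s, level, t)  # same link, seen from the other side
    return n1, n2

def dcell_routing(src, dst, t):
    """Return the list of servers on the DCellRouting path from src to dst."""
    if src == dst:
        return [src]
    k = len(src) - 1
    m = 0
    while src[m] == dst[m]:   # length of the common prefix
        m += 1
    if m == k:
        return [src, dst]     # same DCell_0: one hop through the mini-switch
    n1, n2 = get_link(src[:m], src[m], dst[m], k - m, t)
    return dcell_routing(src, n1, t) + dcell_routing(n2, dst, t)

print(dcell_routing((0, 0), (2, 1), sizes(2, 1)))
# [(0, 0), (0, 1), (2, 0), (2, 1)]
```

In the n = 2 DCell_1, the path from (0, 0) to (2, 1) first reaches (0, 1) inside its own DCell_0, crosses the level-1 link to (2, 0), and finishes inside the destination DCell_0.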

[Figure: routing example with nodes src, n1, n2, and dst, illustrating GetLink.]

Routing without Failure (Cont.)

Broadcast
A spanning tree? Not fault tolerant!
DCellBroadcast: a sender delivers the broadcast packet to all of its k + 1 neighbors when broadcasting a packet in a DCell_k. Upon receiving a broadcast packet, a receiver first checks whether this packet has been received before. The receiver drops a duplicate packet but broadcasts a new packet to its other k links.
DCellBroadcast is fault tolerant in that a broadcast packet can reach all the receivers as long as the network is connected.
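The flooding rule with duplicate suppression can be simulated on a small graph. This is a sketch only: the adjacency-set representation, the helper `make_adj`, and the treatment of two servers in one DCell_0 as direct neighbors (hiding the mini-switch) are simplifying assumptions made for the example.

```python
from collections import deque

def make_adj(links):
    """Build an undirected adjacency map from a list of (a, b) links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def dcell_broadcast(adj, sender):
    """Flood a packet from `sender`; return the set of nodes that receive it."""
    seen = {sender}
    queue = deque((sender, nbr) for nbr in adj[sender])
    while queue:
        frm, node = queue.popleft()
        if node in seen:
            continue          # duplicate packet: drop it
        seen.add(node)
        # New packet: forward it on every link except the one it arrived on.
        queue.extend((node, nbr) for nbr in adj[node] if nbr != frm)
    return seen

# DCell_1 with n = 2, servers in the same DCell_0 shown as neighbors.
links = [((0, 0), (0, 1)), ((1, 0), (1, 1)), ((2, 0), (2, 1)),
         ((0, 0), (1, 0)), ((0, 1), (2, 0)), ((1, 1), (2, 1))]
print(len(dcell_broadcast(make_adj(links), (0, 0))))  # 6

# One level-1 link fails, but the network stays connected.
degraded = [l for l in links if l != ((0, 1), (2, 0))]
print(len(dcell_broadcast(make_adj(degraded), (0, 0))))  # 6
```

Unlike a spanning tree, no single link is essential here: the second run still reaches all six servers after a link is removed.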

Fault-tolerant Routing
DFR uses DCellRouting and DCellBroadcast as building blocks.
DFR handles three types of failures: server failure, rack failure, and link failure.
Solutions:
Local-reroute -> link failure (to bypass failed links)
Local link-state -> server failure (to avoid the loops that arise with local-reroute alone)
Jump-up -> rack failure (to bypass a whole failed rack)

DFR: DCell Fault-tolerant Routing
[Figure: DFR example showing src, dst, and proxies, with nodes m1, m2, n1, n2, p1, q1, q2, r1, and s1 in DCell_b units i1, i2, and i3 at levels L and L+1. Servers in the same DCell_b share local link-state.]

Local-reroute and Proxy
Consider routing from src to dst in the same DCell_k. First compute a path from src to dst using DCellRouting. Now assume an intermediate link (n1, n2) has failed.
Local-reroute bypasses the failed link as follows:
1. n1 calculates the level of (n1, n2), denoted by l. Then n1 and n2 are known to be in the same DCell_l but in two different DCell_{l-1}s.
2. n1 can always choose another DCell_{l-1} (e.g., the one nearest to n1 but different from the one n2 is in). There must exist a link, denoted (p1, p2), that connects this DCell_{l-1} and the one where n1 resides.
3. n1 then chooses p2 as its proxy and re-routes packets to p2. p2 simply uses DCellRouting to route the packet to dst.
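Proxy selection is easiest to see in a DCell_1, where the failed link has level 1 and the sub-DCells are DCell_0s. The sketch below is restricted to that case. The slide does not define "nearest", so the code assumes the smallest difference in DCell_0 index; that choice, and the names `get_link` and `choose_proxy`, are assumptions of this example.

```python
def get_link(i, j):
    """Level-1 link between DCell_0 i and DCell_0 j (i != j) in a DCell_1.

    Returns (node in DCell_0 i, node in DCell_0 j), following the
    [i, j-1] <-> [j, i] rule for i < j.
    """
    if i < j:
        return (i, j - 1), (j, i)
    b, a = get_link(j, i)
    return a, b

def choose_proxy(n1, n2, n):
    """After level-1 link (n1, n2) fails, return the bypass link (p1, p2).

    p1 is in n1's DCell_0 and p2 is the proxy in another DCell_0.
    Returns None if no alternative DCell_0 exists.
    """
    here, failed = n1[0], n2[0]
    candidates = [c for c in range(n + 1) if c not in (here, failed)]
    if not candidates:
        return None
    # "Nearest" is interpreted as closest DCell_0 index (an assumption).
    c = min(candidates, key=lambda idx: abs(idx - here))
    return get_link(here, c)

print(choose_proxy((0, 1), (2, 0), 2))  # ((0, 0), (1, 0))
```

Here the link between (0, 1) and (2, 0) has failed, so traffic is handed to (0, 0), which crosses to the proxy (1, 0); the proxy then applies DCellRouting towards dst.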

Local-reroute and Proxy (Cont.)
Problem! With pure local-reroute, if a node on the path to dst fails, the packet may never be rerouted to dst. Local-reroute alone cannot completely address node failures, because it is based purely on the DCell topology and does not use any link or node state.
Consider a sub DCellRouting path {(q1, q2), (q2, q3)} from src to dst. The level of (q1, q2) is 1 and the level of (q2, q3) is 3. Now q1 finds that (q1, q2) is down (while actually q2 has failed). Then, no matter how we re-route inside this DCell_2, we will be routed back to the failed node q2!

Local Link-state
In a DCell_b, each node uses DCellBroadcast to broadcast the status of all its (k + 1) links, either periodically or when it detects a link failure. A node thus knows the status of all the outgoing/incoming links in its DCell_b.
Intra-DCell routing: link-state routing (Dijkstra's algorithm).
Inter-DCell routing: DCellRouting and local-reroute.
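Once a node holds the link status for its DCell_b, intra-DCell routing reduces to a shortest-path computation over the links that are up. The sketch below is a generic Dijkstra implementation; the `link_state` dictionary layout, the node names, and the use of hop count as the link cost are assumptions made for the example.

```python
import heapq

def shortest_path(link_state, src, dst):
    """Dijkstra over links reported as up.

    link_state[u][v] is True if link (u, v) is up, False if it is down.
    Returns the node list from src to dst, or None if dst is unreachable.
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, up in link_state.get(u, {}).items():
            if up and d + 1 < dist.get(v, float("inf")):
                dist[v] = d + 1   # every hop costs 1 (assumed metric)
                prev[v] = u
                heapq.heappush(heap, (d + 1, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical link-state view: link b-d has been reported down.
link_state = {
    "a": {"b": True, "c": True},
    "b": {"a": True, "d": False},
    "c": {"a": True, "d": True},
    "d": {"b": False, "c": True},
}
print(shortest_path(link_state, "a", "d"))  # ['a', 'c', 'd']
```

Because the failed link is excluded before the search starts, the computed path never leads back to it, which is the property pure local-reroute lacks.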

Jump-up for Rack Failure
(See Figure 4.) Upon receiving the rerouted packet (implying that (n1, n2) has failed), p2 checks whether (q1, q2) has failed. If (q1, q2) has also failed, it is a good indication that the whole i2 has failed. p2 then chooses a proxy from DCells at a higher level (i.e., it jumps up). Therefore, with jump-up, the failed DCell i2 can be bypassed.
To remove a packet from the network:
1. A retry count is added in the packet header.
2. Each packet has a time-to-live (TTL) field.
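The two packet-removal safeguards can be sketched as header fields that are checked on each forwarding decision. The slide only names the fields, so the rules used here are assumptions: the TTL is decremented at every hop, the retry count is decremented on every local-reroute, and the packet is dropped when either is exhausted. The initial values and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dst: tuple
    ttl: int = 64      # hypothetical initial TTL
    retries: int = 3   # hypothetical initial retry count

def forward_hop(pkt):
    """Account for one hop. Returns False if the packet must be dropped."""
    pkt.ttl -= 1
    return pkt.ttl > 0

def local_reroute(pkt):
    """Account for one reroute attempt. Returns False if none are left."""
    if pkt.retries == 0:
        return False
    pkt.retries -= 1
    return True

pkt = Packet(dst=(2, 1), ttl=2, retries=1)
print(forward_hop(pkt))    # True  (ttl is now 1)
print(forward_hop(pkt))    # False (ttl reached 0, drop)
print(local_reroute(pkt))  # True  (retries is now 0)
print(local_reroute(pkt))  # False (no retries left, drop)
```

Together the two counters bound how long a packet can circulate when no working path to dst exists.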

DFR (DCell Fault-tolerant Routing)
Combine DCellRouting, local-reroute, and local link-state:
1. Perform DCellRouting.
2. Get the first link (the highest-level link).
3. If the link has failed, perform local-reroute (then perform DCellRouting recursively).
4. If it has not failed, perform local link-state routing (Dijkstra routing).

DFR (Cont.)

Incremental Expansion
1. Re-wiring should not be allowed.
2. Addresses of existing machines should not change.
Bottom-up (from DCell_0 to DCell_1, DCell_2, …): not fault tolerant!
Top-down: when constructing a DCell_k, we start by building many incomplete DCell_{k-1}s and make them fully connected.

Simulation

Simulation (Cont.)

Implementation
The DCN protocol suite serves as a network layer for DCell-based data centers, similar to IP over the Internet.

Implementation (Cont.)
Layer-2.5 DCN prototyping.
Apps only see TCP/IP.
Routing is in DCN.
More than lines of C code.

Experimental Environment
The testbed is a DCell_1 with 20 server nodes. This DCell_1 is composed of 5 DCell_0s, each of which has 4 servers (Figure 1).
Each server is a DELL 755DT desktop with an Intel 2.33GHz dual-core CPU, 2GB DRAM, and a 160GB hard disk. Each server also has an Intel PRO/1000 PT Quad Port Ethernet adapter installed.
The Ethernet switches used to form the DCell_0s are D-Link 8-port Gigabit switches, model DGS-1008D (each costing about $50).

Experimental Result

Thank You!