
1 Department of Computer Science, Jinan University, Guangzhou, P.R. China Lijun Lyu, Junjie Xie, Yuhui Deng, Yongtao Zhou ICA3PP 2014: The 14th International Conference on Algorithms & Architectures for Parallel Processing. August 24-27, Dalian, China.

2 Motivation; Challenges; Related work; Our idea; System architecture; Evaluation; Conclusion

3 The Explosive Growth of Data
- IDC: 1,800 EB of data in 2011, with a 40-60% annual increase
- Larger data centers: Google has 19 data centers with more than 1 million servers
- Higher traffic: Cisco forecasts that annual traffic in global data centers will nearly triple over the next 5 years and reach 7.7 ZB by the end of 2017
Figure: Google Data Center

4 Data Center Network
- Node increment: scalability?
- Failures are common: fault tolerance?
  - Google MapReduce in a 4,000-node cluster: 5 nodes fail during a job; 1 disk fails every 6 hours
- Bandwidth-hungry services: network capacity?
  - Infrastructure services: MapReduce, GFS, …
  - Network applications: Cloud disk, Video, …

5 Tree-based Structure
- Traditional tree: bandwidth bottleneck, single points of failure, expensive
- Modified tree (Fat-tree): high capacity, limited scalability
Figures: Traditional Tree-based Structure; Fat-tree

6 Other novel, hybrid network structures
- Physical topology: level-based, but not tree-based; recursively defined
- Routing mechanism: no routers and no traditional Internet routing mechanism; routing intelligence is put on servers; takes advantage of structural properties
- Typical structures: DCell, FiConn, BCube, Totoro, …

7 Physical structures
Figures: DCell; Totoro; FiConn; BCube

8 Routing mechanisms
Structures compared: DCell, Totoro, FiConn, BCube
- Core idea: Divide-and-Conquer; Correct different address digits
- Calculation: Hop by hop; Full path
- Link state: Broadcast domain; Path probing
- Path selection: Dijkstra + Rerouting; Greedy; Available one
- Traffic-aware: No mention; Yes; No mention
- Shortest distance: No; Yes

9 What we achieve: Athena Routing Mechanism
- Routing algorithm: based on Dynamic Programming; finds the shortest path with lower complexity than classic algorithms; supports multi-path
- Path probing mechanism: bypasses failed nodes and links; traffic-aware
- Properties: more resilient, shorter latency, higher capacity, lower complexity

10 Athena Routing Mechanism
- Implemented on the structure of Totoro
- Compared with the original Totoro Fault-tolerant Routing algorithm (TFR) and the Shortest Path Algorithm (SPA, based on Floyd-Warshall)
- Applicable to DCell, FiConn, BCube, …, which share a similar topology (level-based, recursively defined) and also put routing intelligence on servers

11 Totoro
- Two-port servers
- Low-end switches
- Level-based
- Recursively defined
Figures: two-port NIC; Totoro Structure of One Level

12 Building Totoro
- Connect N servers to an N-port switch (here, N = 4)
- Basic partition: Totoro_0
- Switch type: intra-switch
Figure: A Totoro_0 Structure

13 Building Totoro
- Available ports in a Totoro_0: c (here, c = 4)
- Connect n Totoro_0s to n-port switches by using c/2 ports
- Switch type: inter-switch
- A Totoro_1 structure consists of n Totoro_0s

14 Building Totoro
- Connect n Totoro_{i-1}s to n-port switches to build a Totoro_i
- Recursively defined
- Only half of the available ports are used at each level ⇒ Open & Scalable
- The number of paths among Totoro_i s is n/2 times the number of paths among Totoro_{i-1}s ⇒ Multi-redundant links ⇒ High network capacity

15 Figure: Totoro_2 structure with N = 4, n = 4, K = 2.
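
The construction rules on the preceding slides can be turned into a small size calculation. The sketch below is only an illustration under stated assumptions: a Totoro_0 has N servers and N free ports (the slides give c = 4 for N = 4), each higher level uses half of the free ports of every Totoro_{i-1}, and a Totoro_i consists of n Totoro_{i-1}s. The function name `totoro_counts` is hypothetical.

```python
def totoro_counts(N, n, K):
    """Return [(level, servers, free_ports)] for Totoro_0 .. Totoro_K.

    Assumptions (illustrative, derived from the slides' construction rules):
    - a Totoro_0 has N servers and N free server ports;
    - building level i uses half of the free ports of each Totoro_{i-1};
    - a Totoro_i consists of n Totoro_{i-1}s.
    """
    servers, free_ports = N, N
    levels = [(0, servers, free_ports)]
    for i in range(1, K + 1):
        used = free_ports // 2                 # ports taken by level-i switches
        servers *= n                           # n copies of the lower level
        free_ports = n * (free_ports - used)   # ports left for higher levels
        levels.append((i, servers, free_ports))
    return levels


if __name__ == "__main__":
    for level, servers, free_ports in totoro_counts(4, 4, 2):
        print(f"Totoro_{level}: {servers} servers, {free_ports} free ports")
```

With N = 4, n = 4, K = 2 this gives 64 servers for the Totoro_2 in the figure, and N = n = 16 gives the 4096-server Totoro_2 used in the evaluation slides.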

16 Athena Routing Algorithm (ARA)
- Based on Dynamic Programming (DP)
- DP is applicable to problems that exhibit overlapping subproblems and optimal substructure
- Paths are calculated recursively

17 Steps of ARA:
1. Suppose src and dst belong to two partitions.
2. Get all paths connecting these two partitions.
3. For each path, recursively calculate it.
4. Store all paths.
5. Sort all paths by length.
6. Remove the extra paths.
Slide annotations: "This function is based on the corresponding structural properties."; "Cartesian product"
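
The six steps can be sketched in Python on a toy recursively defined structure. This is not Totoro's real wiring and not the authors' implementation: the address layout (tuples of K + 1 digits), the rule in `bridge_suffixes`, and the name `ara_paths` are all assumptions made for illustration. Memoization via `lru_cache` stands in for the dynamic-programming table, and the Cartesian product joins the sub-paths on both sides of each connecting link.

```python
from functools import lru_cache
from itertools import product

K = 2  # highest level; addresses have K + 1 digits (toy assumption)
n = 4  # every digit ranges over 0..n-1 (toy assumption)


def bridge_suffixes(level):
    """Hypothetical wiring rule: a server owns a level-`level` link when
    (last digit % K) + 1 == level. Two servers in sibling partitions that
    share such a suffix are one hop apart."""
    return [s for s in product(range(n), repeat=level)
            if s[-1] % K + 1 == level]


@lru_cache(maxsize=None)  # overlapping subproblems are computed once
def ara_paths(src, dst, k):
    """Return up to k candidate paths (tuples of addresses), shortest first."""
    if src == dst:
        return ((src,),)
    # Step 1: the first differing digit tells which sub-partitions differ.
    d = next(i for i in range(K + 1) if src[i] != dst[i])
    level = K - d
    if level == 0:                       # same level-0 switch: one hop
        return ((src, dst),)
    results = []
    for suffix in bridge_suffixes(level):          # Step 2: connecting links
        u = src[:d + 1] + suffix                   # link end on the src side
        v = dst[:d + 1] + suffix                   # link end on the dst side
        # Step 3: solve both sides recursively and join by Cartesian product.
        for left, right in product(ara_paths(src, u, k), ara_paths(v, dst, k)):
            results.append(left + right)           # Step 4: store
    results.sort(key=len)                          # Step 5: sort by length
    return tuple(results[:k])                      # Step 6: drop extra paths


if __name__ == "__main__":
    for path in ara_paths((0, 0, 0), (0, 1, 3), 2):
        print(len(path) - 1, "hops:", path)
```

Asking for k > 1 keeps several candidates, which is one way the multi-path support mentioned earlier could be exposed; the sketch only ranks the candidates it enumerates and makes no claim about global optimality in other topologies.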

18 Case study of ARA: work out the path from src to dst

19 Case study of ARA Step 1: src and dst belong to two different sub-partitions

20 Case study of ARA Step 2: there exist two paths between these two sub-partitions

21 Case study of ARA Step 3: for Path 1, recursively work out the sub-paths in these sub-partitions, and join them into a full path

22 Case study of ARA Step 3 (continued): similarly, work out the full path for Path 2

23 Case study of ARA Step 4: add all paths into the result set

24 Case study of ARA Step 5: sort the paths by length

25 Case study of ARA Step 6: remove the extra paths (here, we suppose the size of the set to return is 1, i.e., only the shortest path is returned)

26 Path Probing Mechanism
- The source host sends probing request packets
- The destination host sends probing reply packets
- Intermediate servers record the link capacities in the probing packets and forward them

27 Path Probing Mechanism
- Detects failed paths: no extra rerouting technique is required
- Detects link capacity: supports load balancing, …
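
A minimal sketch of how probing could drive path selection is shown below. The slides only state that intermediate servers record link capacities and that failed paths are detected; picking the path with the largest bottleneck capacity is an assumption made here for illustration, and `probe`, `choose_path` and the `capacity` map are hypothetical names.

```python
def probe(path, capacity):
    """Simulate one probing round trip along `path`.

    `capacity` maps a directed link (u, v) to its available capacity; a
    missing or non-positive entry models a failed link or node, in which
    case no reply reaches the source and None is returned.
    """
    recorded = []
    for u, v in zip(path, path[1:]):
        c = capacity.get((u, v))
        if c is None or c <= 0:
            return None
        recorded.append(c)       # intermediate server records the capacity
    return recorded


def choose_path(paths, capacity):
    """Pick the live path with the largest bottleneck capacity (assumed
    policy); return None if every candidate path has failed."""
    best, best_bottleneck = None, float("-inf")
    for path in paths:
        recorded = probe(path, capacity)
        if recorded is None:
            continue             # failed path is skipped, no rerouting step
        bottleneck = min(recorded, default=float("inf"))
        if bottleneck > best_bottleneck:
            best, best_bottleneck = path, bottleneck
    return best


if __name__ == "__main__":
    capacity = {("a", "b"): 10, ("b", "d"): 4,
                ("a", "c"): 6, ("c", "d"): 6,
                ("a", "e"): 9}   # link e -> d is down
    candidates = [["a", "b", "d"], ["a", "c", "d"], ["a", "e", "d"]]
    print(choose_path(candidates, capacity))
```

Because failed candidates simply produce no reply, the source falls back to the next candidate from the routing algorithm without a separate rerouting procedure.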

30 Protocol Implementation
- ARM packet formats: path-probing packet; data packet

31 Protocol Implementation
- Protocol: 2.5-layer protocol
- How does an intermediate server determine the next hop?
  - A fact: two adjacent servers in a path differ in only one "bit"
  - Hence, we only store the differing "bit"s in the vector
Protocol stack: Layer 2 Ethernet; Layer 2.5 ARM; Layer 3 IP; Layer 4 TCP
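
The idea of storing only the differing "bit"s can be sketched as follows, assuming that a "bit" means one digit of a server address written as a tuple. The packet layout itself is not given on the slide, so `encode_path` and `next_hop` are hypothetical helpers rather than the actual ARM format.

```python
def encode_path(path):
    """Store, for every hop, only (position, new digit) of the one address
    digit that changes between two adjacent servers."""
    hops = []
    for a, b in zip(path, path[1:]):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) != 1:
            raise ValueError("adjacent servers must differ in exactly one digit")
        hops.append((diff[0], b[diff[0]]))
    return hops


def next_hop(current, hop):
    """An intermediate server rebuilds the next address from its own address
    and the stored (position, digit) entry."""
    pos, digit = hop
    return current[:pos] + (digit,) + current[pos + 1:]


if __name__ == "__main__":
    path = [(0, 0, 0), (0, 1, 0), (0, 1, 3)]
    vector = encode_path(path)
    print(vector)
    server = path[0]
    for hop in vector:
        server = next_hop(server, hop)
        print("forward to", server)
```

Each hop then costs two small integers in the header instead of a full address, which is the saving the slide points at.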

32 Evaluating Path Failure & Average Path Lengths; Evaluating Resource Usage
- ARM vs. TFR vs. SPA
- TFR: the original Totoro Fault-tolerant Routing algorithm
- SPA: Shortest Path Algorithm (Floyd-Warshall), used as the performance bound
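
For reference, the Floyd-Warshall baseline behind SPA can be sketched in a few lines. This is the textbook all-pairs algorithm on an unweighted graph with integer node ids, not the authors' simulation code; its cubic cost in the number of servers is what makes it a bound rather than a practical routing scheme.

```python
import math


def floyd_warshall(num_nodes, links):
    """All-pairs hop distances on an undirected graph.

    `links` is an iterable of (u, v) pairs of live links; failed links are
    simply left out. Unreachable pairs keep distance math.inf.
    """
    dist = [[0 if i == j else math.inf for j in range(num_nodes)]
            for i in range(num_nodes)]
    for u, v in links:
        dist[u][v] = dist[v][u] = 1
    for k in range(num_nodes):          # allow k as an intermediate node
        for i in range(num_nodes):
            for j in range(num_nodes):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(floyd_warshall(4, ring)[0][3])        # all links alive
    print(floyd_warshall(4, ring[:-1])[0][3])   # link (3, 0) has failed
```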

33 Evaluating Path Failure & Average Path Lengths
Experimental parameters:
- Types of failures: Link, Node, Switch & Rack failures
- Platform: Totoro_2 (4096 servers)
- Failure ratios: 2% - 20%
- Communication mode: All-to-all
- Simulation times: 20 times

34 Evaluating Path Failure: path failure ratio vs. server/rack failure ratio
The performance of ARM and TFR is almost identical to that of SPA!

35 Evaluating Path Failure: path failure ratio vs. switch failure ratio
The performance of ARM is almost identical to that of SPA, but that of TFR is not.

36 Evaluating Path Failure: path failure ratio vs. link failure ratio
- When the link failure ratio is high, ARM achieves slightly better capacity than TFR.
- A performance gap between ARM and SPA still exists, because SPA traverses all feasible links in the whole structure until it finds a valid path.
- This is a tradeoff that ARM makes to reduce algorithmic complexity and save computation resources.

37 Evaluating Average Path Lengths
ARM:
1. Better than TFR.
2. Almost identical to SPA.
3. Where it is shorter than SPA, this is because the path failure ratio of ARM is a bit higher than that of SPA, so our total path length is shorter.

38 Evaluating Resource Usage
Experimental parameters:
- Testbed: Lenovo T350, Quad-core, 8 GB memory
- Platform: Totoro_2 (4096 servers)
- Size of each result: 10 paths
- Communication mode: One-to-all in 4 Totoro_1s

39 Evaluating Resource Usage
- CPU: 1. Nodes increase by 10 per second. 2. Peak value of 28% at 18 s. 3. Benefits from the cache.
- Memory: each host costs at most 164 KB.
Figure annotations: +10 nodes/s; 28%; 18 s; 0%

40 More resilient; shorter latency; higher capacity; lower complexity
In future work, we will focus on the implementation of ARM in DCell, FiConn and other structures.


