Junjie Xie¹, Yuhui Deng¹, Ke Zhou²
¹ Department of Computer Science, Jinan University
² School of Computer Science & Technology, Huazhong University of Science & Technology
NPC 2013: The 10th IFIP International Conference on Network and Parallel Computing. Guiyang, China.
Outline: Motivation, Challenges, Related Work, Our Idea, System Architecture, Evaluation, Conclusion
The Explosive Growth of Data ⇒ Large Data Centers
Industrial manufacturing, e-commerce, social networks...
IDC: 1,800 EB of data in 2011, with 40-60% annual increase
YouTube: 72 hours of video are uploaded per minute.
Facebook: 1 billion active users upload 250 million photos per day.
(Image from http://www.buzzfeed.com)
Feb. 2011, Science: "On the Future of Genomic Data"
Feb. 2011, Science: "Climate Data Challenges in the 21st Century"
Jim Gray: the global amount of information would double every 18 months (1998).
IDC report: most of the data will be stored in data centers.
Large Data Center ⇒ Scalability
Google: 19 data centers, > 1 million servers
Facebook, Microsoft, Amazon...: > 100k servers each
Large Data Center ⇒ Fault Tolerance
Google MapReduce: 5 nodes fail during a job; 1 disk fails every 6 hours
(Image: Google data center)
Therefore, the data center network has to be very scalable and fault-tolerant.
Tree-based structure: bandwidth bottleneck, single points of failure, expensive.
Fat-tree: high capacity, limited scalability.
(Figures: a tree-based structure and a fat-tree.)
DCell: scalable, fault-tolerant, high capacity; but complex and expensive.
DCell is a level-based, recursively defined interconnection structure. It requires multiport servers (e.g., 3, 4 or 5 ports). DCell scales doubly exponentially with the server node degree. It is also fault-tolerant and supports high network capacity.
Downside: it trades off the expensive core switches/routers for multiport NICs and a higher wiring cost.
C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang and S. Lu. DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers. In: Proc. of ACM SIGCOMM'08, Aug. 2008.
FiConn: scalable, fault-tolerant, but low capacity.
FiConn utilizes servers with two built-in ports and low-end commodity switches to form the structure. FiConn has a lower wiring cost than DCell. Routing in FiConn also makes balanced use of links at different levels and is traffic-aware to better utilize the link capacities.
Downside: it has a lower aggregate network capacity.
Other architectures: PortLand, VL2, CamCube...
D. Li, C. Guo, H. Wu, K. Tan, and S. Lu. FiConn: Using Backup Port for Server Interconnection in Data Centers. In: Proc. of IEEE INFOCOM, 2009.
What we achieve:
Scalability: millions of servers
Fault tolerance: structure & routing
Low cost: commodity devices
High capacity: multi-redundant links
(Figure: a one-level Totoro structure.)
(Figure: a Totoro structure with N = 4, n = 4, K = 2.)
Architecture: two-port servers, low-end switches, recursively defined.
Building algorithm: constructs a k-level Totoro out of servers with two-port NICs.
Connect N servers to an N-port switch (here, N = 4). This switch is an intra-switch, and the resulting basic partition is a Totoro_0.
(Figure: a Totoro_0 structure.)
Available ports in a Totoro_0: c (here, c = 4). Connect n Totoro_0s to n-port switches (inter-switches) by using c/2 of those ports. A Totoro_1 structure consists of n Totoro_0s.
Connect n Totoro_{i-1}s to n-port switches to build a Totoro_i: the structure is recursively defined.
Only half of the available ports are used at each level ⇒ open & scalable.
The number of paths among Totoro_i's is n/2 times the number of paths among Totoro_{i-1}'s ⇒ multi-redundant links ⇒ high network capacity.
Building Algorithm

TotoroBuild(N, n, K):
    t_K = N * n^K
    server = [a_K, a_{K-1}, ..., a_1, a_0]
    for tid = 0 to t_K - 1:
        for i = 0 to K - 1:
            a_{i+1} = (tid / (N * n^i)) mod n
        a_0 = tid mod N
        intra-switch = (0 - a_K, a_{K-1}, ..., a_1, a_0)
        Connect(server, intra-switch)
        for i = 1 to K:
            if (tid - 2^{i-1} + 1) mod 2^i == 0:
                u = i
                inter-switch = (u - b_{K-u}, ..., b_i, ..., b_0)
                for j = i to K - 1:
                    b_j = (tid / (N * n^{j-1})) mod n
                b_0 = (tid / 2^u) mod ((N / n) * (n/2)^u)
                Connect(server, inter-switch)

The key is to work out the level of the outgoing link of each server.
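As a minimal Python sketch of the labelling step above, read directly from the slide's pseudocode: for each server tid it derives the label [a_K, ..., a_1, a_0] and the level u of the server's outgoing (inter-switch) link, which is the key quantity named above. The switch-naming details are omitted, and the exact semantics are an assumption based on this slide rather than the paper's reference implementation.

def totoro_labels(N, n, K):
    # Total number of servers in a K-level Totoro: t_K = N * n^K.
    t_K = N * n ** K
    for tid in range(t_K):
        # Server label [a_K, ..., a_1, a_0]: a_0 indexes the server inside its
        # basic partition; a_1..a_K index the partition at each level.
        label = [(tid // (N * n ** i)) % n for i in reversed(range(K))] + [tid % N]
        # Level of the outgoing link: the unique i >= 1 with
        # (tid - 2^(i-1) + 1) mod 2^i == 0. If that i exceeds K, the second
        # server port is left open for future scaling.
        u = next((i for i in range(1, K + 1)
                  if (tid - 2 ** (i - 1) + 1) % 2 ** i == 0), None)
        yield tid, label, u

# Example (hypothetical sizes): a Totoro_2 with N = n = 4 has 4 * 4^2 = 64 servers.
for tid, label, level in totoro_labels(4, 4, 2):
    print(tid, label, level)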
Building Algorithm

N (= n)   u   t_u
16        2   4096
24        2   13824
32        2   32768
16        3   65536
24        3   331776
32        3   1048576

Millions of servers.
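These sizes follow directly from t_u = N * n^u; a quick sanity check in Python (assuming N = n in each row, as the table suggests):

for N, u in [(16, 2), (24, 2), (32, 2), (16, 3), (24, 3), (32, 3)]:
    n = N                       # the table's examples appear to use N = n
    print(N, u, N * n ** u)     # 4096, 13824, 32768, 65536, 331776, 1048576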
Totoro Routing Algorithm (TRA): the basic routing scheme, not fault-tolerant.
Totoro Broadcast Domain (TBD): detects & shares link states.
Totoro Fault-tolerant Routing (TFR): TRA + the Dijkstra algorithm (based on TBDs).
Totoro Routing Algorithm (TRA): a divide & conquer algorithm. How do we find a path from src to dst?
Totoro Routing Algorithm (TRA). Step 1: src and dst belong to two different partitions.
Step 2: Take a link between these two partitions.
Totoro Routing Algorithm (TRA): m and n are the intermediate servers; the intermediate path runs from m to n.
Totoro Routing Algorithm (TRA). Step 3: if src (dst) and m (n) are in the same basic partition, just return the direct path.
Totoro Routing Algorithm (TRA). Step 3 (otherwise): return to Step 1 to work out the path from src (dst) to m (n).
Totoro Routing Algorithm (TRA). Step 4: Join P(src, m), P(m, n) and P(n, dst) into a full path.
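A hedged sketch of this divide-and-conquer recursion in Python. Servers are represented by their label tuples (a_K, ..., a_1, a_0) from the building algorithm, and the choice of the inter-partition link in Step 2 is abstracted into a lookup table `links` supplied by the caller; how that link is actually selected is topology-dependent and not spelled out on these slides.

def totoro_route(src, dst, links):
    # Step 3 (base case): same basic partition (all digits except a_0 agree)
    # -> the direct path through the shared intra-switch.
    if src[:-1] == dst[:-1]:
        return [src] if src == dst else [src, dst]
    # Step 1: find the highest level at which src and dst diverge, i.e. the
    # first differing digit in (a_K, ..., a_1).
    level = next(i for i in range(len(src) - 1) if src[i] != dst[i])
    part_src, part_dst = src[:level + 1], dst[:level + 1]
    # Step 2: take one link (m, n) between these two partitions; m lies in
    # src's partition and n in dst's partition.
    m, n = links[(part_src, part_dst)]
    # Steps 3-4: recurse on both sides and join P(src, m), (m, n), P(n, dst).
    return (totoro_route(src, m, links)[:-1]
            + [m, n]
            + totoro_route(n, dst, links)[1:])

In the real structure the link would be picked from the multi-redundant inter-partition links rather than from a precomputed dictionary; the dictionary is only a stand-in to keep the sketch self-contained.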
Totoro Routing Algorithm (TRA): the path length under TRA is close to that of the shortest-path (SP) algorithm across different sizes. Simple & efficient.

N (= n)   u   t_u     M_u   TRA mean   TRA std dev   SP mean   SP std dev
24        1   576     6     4.36       1.03          4.36      1.03
32        1   1024    6     4.40       1.00          4.39      1.00
48        1   2304    6     4.43       0.96          4.43      0.96
24        2   13824   10    7.61       1.56          7.39      1.32
32        2   32768   10    7.68       1.50          7.45      1.26

The mean value and standard deviation of path length under TRA and the SP algorithm in Totoro_u of different sizes. M_u is the maximum distance between any two servers in Totoro_u; t_u is the total number of servers.
Totoro Broadcast Domain (TBD)
Fault tolerance ⇒ detect and share link states.
Time cost & CPU load ⇒ a global strategy is infeasible.
So Totoro is divided into several TBDs.
(Figure: green = inner-server, yellow = outer-server.)
Totoro Fault-tolerant Routing (TFR)
Two strategies: the Dijkstra algorithm within a TBD, and TRA between TBDs.
Proxy: a temporary destination.
Next hop: the next server on P(src, proxy/dst).
Totoro Fault-tolerant Routing (TFR): what if the proxy is unreachable?
Totoro Fault-tolerant Routing (TFR): reroute the packet to another proxy by using the local redundant links.
Evaluating path failure: Totoro (TFR) vs. a shortest-path algorithm (Floyd-Warshall).
Evaluating the network structure: Totoro vs. the tree-based structure, Fat-Tree, DCell & FiConn.
Evaluating Path Failure
Types of failures: link, node, switch & rack failures.
Comparison: TFR vs. SP.
Platform: Totoro_1 (N=48, n=48, K=1, t_K = 2,304 servers) and Totoro_2 (N=16, n=16, K=2, t_K = 4,096 servers).
Failure ratios: 2% - 20%.
Communication mode: all-to-all.
Each configuration is simulated 20 times.
Evaluating Path Failure: path failure ratio vs. node failure ratio. The performance of TFR is almost identical to that of SP: TFR maximizes the use of redundant links when a node failure occurs.
Evaluating Path Failure: path failure ratio vs. link failure ratio. TFR performs well when the link failure ratio is small (i.e., < 4%); as the ratio grows, the performance gap between TFR and SP widens. TFR is neither globally optimal nor guaranteed to find an existing path, so there is substantial room for improvement.
Evaluating Path Failure: path failure ratio vs. switch failure ratio. TFR performs almost as well as SP in Totoro_1, while the gap between TFR and SP widens in Totoro_2.
Evaluating Path Failure: path failure ratio vs. switch failure ratio. The path failure ratio of SP is lower in a higher-level Totoro: more redundant high-level switches help bypass the failures.
Evaluating Path Failure: path failure ratio vs. rack failure ratio. In a low-level Totoro, TFR achieves results very close to SP; the capability of TFR in a relatively high-level Totoro can still be improved.
Evaluating Network Structure
Low degree: the server degree approaches but never reaches 2. A lower degree means lower deployment and maintenance overhead.

Structure   Degree       Diameter           Bisection Width
Tree        --           2 log_{d-1} T      1
Fat-Tree    --           2 log_2 T          T/2
DCell       k + 1        < 2 log_n T - 1    T / (4 log_n T)
FiConn      2 - 1/2^k    O(log T)           O(T / log T)
Totoro      2 - 1/2^k    O(T)               T / 2^{k+1}

N: the number of ports on an intra-switch; n: the number of ports on an inter-switch; T: the total number of servers. For Totoro, T = N * n^k.
Evaluating Network Structure
Relatively large diameter. A smaller diameter means a more efficient routing mechanism. In practice, the diameter of a Totoro_3 with about one million servers is only 18; this can still be improved.
Evaluating Network Structure
Large bisection width. A large bisection width means the structure is fault-tolerant and resilient. For small k the bisection width stays large: BiW = T/4, T/8, T/16 when k = 1, 2, 3.
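A quick check of the Totoro row of the table for example sizes taken from the earlier size table; the degree 2 - 1/2^k and bisection width T/2^(k+1) follow the formulas above:

for N, n, k in [(16, 16, 2), (32, 32, 2), (32, 32, 3)]:
    T = N * n ** k                       # total number of servers
    degree = 2 - 1 / 2 ** k              # approaches but never reaches 2
    bisection = T // 2 ** (k + 1)        # T/8 for k = 2, T/16 for k = 3
    print(f"k={k}: T={T}, degree={degree:.3f}, bisection width={bisection}")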
Conclusion
Scalability: millions of servers & an open structure.
Fault tolerance: structure & routing mechanism.
Low cost: two-port servers & commodity switches.
High capacity: multi-redundant links.
Totoro is a viable interconnection solution for data centers!
Future work
Fault tolerance (structure): how can Totoro be made more resilient?
Routing under complex failures: more robust rerouting techniques?
Network capacity & data locality: mapping between servers and switches? Data storage allocation policies?
NPC 2013: The 10th IFIP International Conference on Network and Parallel Computing. Guiyang, China.