Published by Barnaby McKenzie. Modified over 9 years ago
1
Pipelined Broadcast on Ethernet Switched Clusters
Pitch Patarasuk, Ahmad Faraj, Xin Yuan
Department of Computer Science, Florida State University, Tallahassee, FL 32306
2
Broadcast communication (MPI_Bcast)
Before: n0 holds the message ABCD; n1, n2, and n3 hold nothing.
After: n0, n1, n2, and n3 each hold ABCD.
Let T(msize) be the time to send a message of size msize. Then Broadcast(msize) >= T(msize).
3
Ethernet Switched Cluster
(Figure: cluster machines connected through Ethernet switches.)
4
Problem statement: how to efficiently realize the broadcast operation for large messages on Ethernet switched clusters.
Pipelined broadcast can achieve near-optimal results: roughly T(msize) to broadcast a message of size msize.
Two issues must be solved: finding a contention-free broadcast tree, and finding a good segment size.
5
Traditional broadcast algorithms
Linear tree: 0 → 1 → 2 → 3 → 4 → 5 → 6 → 7
Flat tree: 0 sends directly to each of 1, 2, 3, 4, 5, 6, 7
Time = (P - 1) × T(msize)
6
Binary tree: 0 → {1, 2}; 1 → {3, 4}; 2 → {5, 6}; 3 → {7}
k-ary tree (k = 3 shown): 0 → {1, 2, 3}, with nodes 4, 5, 6, 7 on the next level
Time = 2 × (log2(P + 1) - 1) × T(msize)
7
Binomial tree: 0 → {4, 2, 1}; 4 → {6, 5}; 2 → {3}; 6 → {7}
Time = log2 P × T(msize)
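The tree in the figure matches the usual bit-mask construction for a binomial broadcast: a node forwards to rank + 2^j for every power of two below its lowest set bit, largest first. The sketch below assumes the root is rank 0 and ranks run from 0 to P - 1; the helper names are hypothetical. It reproduces 0 → 4, 2, 1 for P = 8 and needs ceil(log2 P) sequential rounds, which is where the log2 P × T(msize) cost comes from.

```python
def binomial_children(rank: int, P: int) -> list:
    """Children of `rank` in a binomial broadcast tree rooted at rank 0, in send order."""
    # Find the lowest set bit of rank; for the root this ends at the first power of two >= P.
    mask = 1
    while mask < P and not (rank & mask):
        mask <<= 1
    # Forward to rank + mask/2, rank + mask/4, ..., rank + 1, skipping ranks >= P.
    children = []
    mask >>= 1
    while mask > 0:
        if rank + mask < P:
            children.append(rank + mask)
        mask >>= 1
    return children


def binomial_rounds(P: int) -> int:
    """Sequential communication rounds needed: ceil(log2 P)."""
    rounds = 0
    while (1 << rounds) < P:
        rounds += 1
    return rounds


for r in range(8):
    print(r, binomial_children(r, 8))  # 0 -> [4, 2, 1], 4 -> [6, 5], 2 -> [3], 6 -> [7]
print("rounds:", binomial_rounds(8))   # 3 rounds for P = 8
```

Each round doubles the number of nodes holding the full message, so with large messages every round costs a full T(msize).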
8
Scatter/allgather
Before: n0 holds ABCD.
Scatter: n0 keeps A and sends B to n1, C to n2, and D to n3.
Allgather: every node collects all pieces, so n0, n1, n2, and n3 each hold ABCD.
Time = 2 × T(msize)
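A small simulation makes the two phases concrete. It assumes a ring algorithm for the allgather phase (one common choice; the slide does not say which allgather is used) and treats the message as a string so that the pieces A, B, C, D stay visible. In the scatter the root sends (P - 1)/P of the message, and in the ring allgather each node sends P - 1 pieces of size msize/P, so each phase costs about T(msize), giving the 2 × T(msize) total.

```python
def scatter_allgather(message: str, P: int) -> list:
    """Simulate broadcast of `message` from node 0 via scatter + ring allgather.
    Returns the message each node holds at the end."""
    # Scatter: cut the message into P pieces; node i receives piece i from the root.
    size = -(-len(message) // P)  # ceiling division
    pieces = [message[i * size:(i + 1) * size] for i in range(P)]
    have = [{i: pieces[i]} for i in range(P)]  # have[node] maps piece index -> data

    # Ring allgather: in step s, node i forwards piece (i - s) mod P to node i + 1.
    # After P - 1 steps every node holds all P pieces.
    for step in range(P - 1):
        sends = [((i + 1) % P, (i - step) % P, have[i][(i - step) % P]) for i in range(P)]
        for dst, idx, data in sends:  # apply after collecting: sends are concurrent
            have[dst][idx] = data

    return ["".join(h[k] for k in range(P)) for h in have]


print(scatter_allgather("ABCD", 4))  # ['ABCD', 'ABCD', 'ABCD', 'ABCD']
```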
9
Time complexity for large messages
Linear tree: (P - 1) × T(msize)
Flat tree: (P - 1) × T(msize)
Binary tree: 2 × (log2(P + 1) - 1) × T(msize), approx. 2 × log2 P × T(msize)
Binomial tree: log2 P × T(msize)
Scatter/allgather: 2 × T(msize)
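To put numbers on these formulas, the sketch below evaluates them under a hypothetical linear cost model T(m) = alpha + beta × m. The alpha and beta values are assumptions (beta roughly corresponds to 100 Mbit/s Ethernet), not measurements from the talk, and the binomial entry uses ceil(log2 P), which equals log2 P when P is a power of two.

```python
import math

def T(msize, alpha=50e-6, beta=1 / 12.5e6):
    """Hypothetical cost of one point-to-point message of `msize` bytes:
    startup alpha (s) plus msize * beta (s/byte); beta ~ 100 Mbit/s Ethernet."""
    return alpha + beta * msize

def broadcast_times(P, msize):
    """Large-message broadcast time of each non-pipelined scheme in the table."""
    t = T(msize)
    return {
        "linear tree": (P - 1) * t,
        "flat tree": (P - 1) * t,
        "binary tree": 2 * (math.log2(P + 1) - 1) * t,
        "binomial tree": math.ceil(math.log2(P)) * t,
        "scatter/allgather": 2 * t,
    }

# Print the schemes from fastest to slowest for 16 nodes and a 1 MB message.
for name, secs in sorted(broadcast_times(16, 1 << 20).items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {secs * 1e3:8.1f} ms")
```

Because every entry is a multiple of T(msize), the ranking does not depend on alpha and beta; only the multipliers matter, and scatter/allgather has the smallest one for any P > 4.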
10
Pipelined broadcast algorithm
Linear pipeline: 0 → 1 → 2 → 3
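Assuming each node can receive one segment and send another in the same step (full-duplex links, no contention), the linear pipeline follows a simple schedule: in step s, node i forwards segment s - i to node i + 1. The hypothetical simulation below counts steps and confirms that the last of P nodes holds all X segments after X + P - 2 steps. With each step costing T(msize/X), the total approaches T(msize) once X is much larger than P.

```python
def linear_pipeline_steps(P: int, X: int) -> int:
    """Steps for a chain 0 -> 1 -> ... -> P-1 to deliver X segments from node 0."""
    have = [set(range(X))] + [set() for _ in range(P - 1)]
    step = 0
    while len(have[P - 1]) < X:
        # All sends in one step happen concurrently: collect them first, then apply.
        sends = []
        for i in range(P - 1):
            s = step - i  # segment that node i forwards during this step
            if 0 <= s < X:
                assert s in have[i], "a segment must arrive before it is forwarded"
                sends.append((i + 1, s))
        for dst, s in sends:
            have[dst].add(s)
        step += 1
    return step


print(linear_pipeline_steps(4, 3))  # 5 == X + P - 2
```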
11
Performance of pipelined broadcast
Assume no network contention. A message of size msize is broken into X segments of size msize/X.
H: tree height (number of edges on the longest root-to-leaf path); D: number of children per node.
Time per pipeline stage: D × T(msize/X)
Total time: (X + H - 1) × D × T(msize/X)
Linear tree: H = P - 1, D = 1, total ≈ T(msize) when X is much larger than P and the per-segment startup cost is small
Binary tree: H ≈ log2 P, D = 2, total ≈ 2 × T(msize)
k-ary tree: H ≈ log_k P, D = k, total ≈ k × T(msize); in general not as efficient as a binary tree.
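The total-time formula can be explored numerically under a hypothetical linear cost model T(m) = alpha + beta × m; the alpha and beta values below are assumptions (roughly 100 Mbit/s Ethernet), not measurements. The sketch takes H as the number of edges on the longest root-to-leaf path, so a P-node linear tree has H = P - 1 and a complete binary tree has H = floor(log2 P). A brute-force search over X shows the linear pipeline landing near T(msize) and the binary pipeline near 2 × T(msize); the per-segment startup cost alpha keeps either from reaching its limit exactly.

```python
import math

def T(msize, alpha=50e-6, beta=1 / 12.5e6):
    # Hypothetical point-to-point cost: startup alpha (s) + msize * beta (s/byte).
    return alpha + beta * msize

def pipelined_time(msize, X, H, D):
    """(X + H - 1) pipeline stages, each sending D segments of msize / X bytes."""
    return (X + H - 1) * D * T(msize / X)

def best_segment_count(msize, H, D, max_X=4096):
    """Brute-force the number of segments X that minimises the modelled time."""
    X = min(range(1, max_X + 1), key=lambda x: pipelined_time(msize, x, H, D))
    return X, pipelined_time(msize, X, H, D)

P, msize = 16, 1 << 20
for name, H, D in [("linear", P - 1, 1), ("binary", math.floor(math.log2(P)), 2)]:
    X, t = best_segment_count(msize, H, D)
    print(f"{name}: X = {X}, time = {t / T(msize):.2f} x T(msize)")
```

Setting D = k and H = ceil(log_k P) in the same function shows why a k-ary pipeline trends toward k × T(msize).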
12
Time complexity for large messages
Pipelined (linear): T(msize)
Pipelined (binary): 2 × T(msize)
Pipelined (k-ary): k × T(msize)
Binomial tree: log2 P × T(msize)
Scatter/allgather: 2 × T(msize)
13
Pipelined broadcast
How to find a contention-free broadcast tree?
How to select the best segment size?
14
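Example of network contention
Binary tree: 0 → {1, 2}; 1 → {3, 4}; 2 → {5, 6}; 3 → {7}
Topology: n0, n1, n2, n3 on one switch; n4, n5, n6, n7 on a second switch
There is link contention caused by the communications (1 → 4), (2 → 5), (2 → 6), and (3 → 7), which all cross the same inter-switch link in the same direction.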
15
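Linear tree
Topology: n0, n1, n4, n5 on one switch; n2, n3, n6, n7 on a second switch
The linear tree 0 → 1 → 2 → 3 → … → 7 has contention caused by (1 → 2) and (5 → 6), which both cross the inter-switch link in the same direction.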
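Both contention examples can be checked mechanically. The sketch below assumes the switches form a tree (given as a parent map) and treats every inter-switch link as two directed channels; two communications that run at the same time contend if they use the same directed channel. The names switch_path and contended_links are illustrative, links between a machine and its own switch are ignored, and the two switches are assumed to be connected directly.

```python
from collections import defaultdict

def switch_path(a, b, parent):
    """Directed inter-switch links on the tree path from switch a to switch b."""
    up_a = [a]
    while parent[up_a[-1]] is not None:
        up_a.append(parent[up_a[-1]])
    up_b = [b]
    while parent[up_b[-1]] is not None:
        up_b.append(parent[up_b[-1]])
    ancestors_b = set(up_b)
    lca = next(s for s in up_a if s in ancestors_b)  # lowest common ancestor
    links = [(s, parent[s]) for s in up_a[:up_a.index(lca)]]            # up to the LCA
    links += [(parent[s], s) for s in reversed(up_b[:up_b.index(lca)])]  # down to b
    return links

def contended_links(comms, switch_of, parent):
    """Map each directed link used by more than one concurrent (src, dst) pair to its users."""
    usage = defaultdict(list)
    for src, dst in comms:
        for link in switch_path(switch_of[src], switch_of[dst], parent):
            usage[link].append((src, dst))
    return {link: users for link, users in usage.items() if len(users) > 1}

# Two directly connected switches (assumed), S1 hanging below S0.
parent = {"S0": None, "S1": "S0"}

# Binary tree example: n0-n3 on S0, n4-n7 on S1.
sw1 = {n: "S0" if n < 4 else "S1" for n in range(8)}
binary_edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6), (3, 7)]
print(contended_links(binary_edges, sw1, parent))  # S0 -> S1 used by 4 pairs

# Linear tree example: n0, n1, n4, n5 on S0; n2, n3, n6, n7 on S1.
sw2 = {n: "S0" if n in (0, 1, 4, 5) else "S1" for n in range(8)}
linear_edges = [(i, i + 1) for i in range(7)]
print(contended_links(linear_edges, sw2, parent))  # S0 -> S1 used by (1, 2) and (5, 6)
```

In a pipelined broadcast every tree edge is active at once, so the whole edge list is treated as one set of concurrent communications. The edge (3, 4) crosses the link in the opposite direction and therefore does not contend.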
16
Algorithm for constructing a contention-free linear tree
Step 1: Traverse all switches using depth-first search (DFS), naming each switch by its order of arrival in the DFS (S0, S1, S2, and so on).
Step 2: The linear tree consists of all machines in switch S0, followed by all machines in S1, then S2, and so on.
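The two steps translate almost directly into code. This is a sketch under assumptions: the switch-level topology is given as an adjacency list forming a tree, each switch lists its machines, and the broadcast root is listed first in the root switch. The physical switch names A to D and their links are hypothetical; the machine placement is taken from the contention-free linear tree example, so DFS order A, B, C, D plays the role of S0 to S3.

```python
def contention_free_linear_order(root_switch, adjacency, machines):
    """Step 1: DFS over the switches; the i-th switch reached is named S_i.
    Step 2: list the machines of S0, then S1, then S2, and so on."""
    order, visited = [], set()

    def dfs(switch):
        visited.add(switch)
        order.append(switch)
        for neighbor in adjacency[switch]:
            if neighbor not in visited:
                dfs(neighbor)

    dfs(root_switch)
    return [m for switch in order for m in machines[switch]]


# Hypothetical topology: A - B, with C and D both attached to B.
adjacency = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
machines = {"A": [0, 1, 4, 5], "B": [2, 3, 6, 7], "C": [8, 9, 10, 11], "D": [12, 13, 14, 15]}
print(contention_free_linear_order("A", adjacency, machines))
# [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

The intuition for why this avoids contention: the path from the last machine of S_i to the first machine of S_(i+1) is exactly the route the DFS walks between those switches, so across the whole chain each inter-switch link is crossed at most once in each direction.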
17
Example of a contention-free linear tree
Switch S0: n0, n1, n4, n5
Switch S1: n2, n3, n6, n7
Switch S2: n8, n9, n10, n11
Switch S3: n12, n13, n14, n15
Linear tree: n0 → n1 → n4 → n5 → n2 → n3 → n6 → n7 → n8 → n9 → … → n15
18
Algorithm for constructing a contention-free binary tree
Start with a contention-free linear tree.
Recursively divide the tree into two subtrees, making sure that there cannot be any contention.
The subtrees are chosen so that the height of the whole tree is minimal.
(Figure: example linear order of nodes 0 to 15.)
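A bare-bones version of the recursive split is sketched below. It is a simplification, not the authors' algorithm: it always cuts the remaining nodes into two contiguous halves of nearly equal size and omits both the contention check and the search for the best split that the slide calls for, so the halves of a contention-free linear order are not guaranteed to stay contention-free. What it does show is the height target: a balanced split of P nodes gives height floor(log2 P), the same as a complete binary tree, which is the baseline a contention-free split is measured against.

```python
import math
from collections import Counter

def split_binary_tree(order):
    """Build a binary tree from a linear order by recursive splitting.
    Returns (parent map, tree height in edges)."""
    parent = {order[0]: None}

    def build(seq):
        # seq[0] is already placed; attach the rest below it and return the subtree height.
        rest = seq[1:]
        if not rest:
            return 0
        mid = (len(rest) + 1) // 2  # balanced split; no contention check here
        height = 0
        for half in (rest[:mid], rest[mid:]):
            if half:
                parent[half[0]] = seq[0]
                height = max(height, 1 + build(half))
        return height

    return parent, build(order)


parent, height = split_binary_tree(list(range(16)))
print(height, math.floor(math.log2(16)))  # 4 4: same as a complete binary tree
print(max(Counter(p for p in parent.values() if p is not None).values()))  # 2
```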
19
Binary tree height
The performance of binary pipelined broadcast depends on the height of the binary tree.
Even though a contention-free binary tree may not be a complete binary tree, its height is not much larger than that of a complete binary tree.
20
Average tree heights for 20 randomly generated topologies
21
Evaluation
Contention-free pipelined algorithms: routines are produced by generators from topology information, and the generated routines are built on MPICH point-to-point primitives. Variants: linear tree, binary tree, 3-ary tree.
Targets for comparison:
MPICH: binomial tree, scatter/allgather
LAM: flat tree, binomial tree
Topology-unaware pipelined linear and binary algorithms
22
Evaluation
23
Performance of different pipelined trees (topology 1)
24
Comparing pipelined broadcast with other schemes
25
Topology unaware and contention-free pipelined broadcast
26
Segment size for pipelined broadcast
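The measured curves are not reproduced in this text. As an illustration only, the same kind of hypothetical alpha-beta cost model (alpha and beta are assumed values, not measurements) can be swept over power-of-two segment sizes for a linear pipeline, using the X + P - 2 step count of a P-node chain. In this model the time is fairly flat around the best segment size, with neighbouring power-of-two sizes staying within about 10% of it; real alpha and beta would have to be measured on the target cluster.

```python
def linear_pipeline_time(msize, seg, P, alpha=50e-6, beta=1 / 12.5e6):
    """Modelled linear-pipeline broadcast time for segment size `seg` bytes.
    Hypothetical cost per segment: alpha + beta * seg (100 Mbit/s-like link)."""
    X = -(-msize // seg)  # number of segments (ceiling division)
    return (X + P - 2) * (alpha + beta * seg)

msize, P = 1 << 20, 16
candidates = [1 << k for k in range(8, 21)]  # 256 B up to 1 MB
times = {seg: linear_pipeline_time(msize, seg, P) for seg in candidates}
best = min(times, key=times.get)
for seg in candidates:
    marker = "  <- best" if seg == best else ""
    print(f"{seg:>8} B {times[seg] * 1e3:8.2f} ms{marker}")
```

Small segments pay the startup cost alpha too many times, while large segments leave the pipeline mostly empty; the sweep simply picks the point between the two.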
27
Conclusions
Pipelined broadcast is faster than the current broadcast algorithms for medium and large messages.
The linear pipeline has a completion time roughly equal to T(msize).
Binary pipelined broadcast is best for medium messages.
A contention-free broadcast tree is necessary for pipelined algorithms.
A good segment size for pipelined broadcast is not difficult to find.
28
Questions?