CS 584
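Algorithm Analysis Assumptions
- We consider ring, mesh, and hypercube topologies.
- Each process can either send or receive a single message at a time.
- There is no special communication hardware.
- When discussing a mesh architecture, we consider a square toroidal mesh (a $\sqrt{p} \times \sqrt{p}$ mesh with wraparound links).
- The latency (startup time) is $t_s$ and the per-word transfer time (the inverse of bandwidth) is $t_w$, so sending a message of $m$ words between neighbors takes $t_s + t_w m$.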
Basic Algorithms
- Broadcast algorithms
  - one to all (scatter)
  - all to one (gather)
  - all to all
- Reduction
  - all to one
  - all to all
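Broadcast (ring)
- Distribute a message of size $m$ to all nodes.
- Starting at the source, send the message in both directions around the ring.

[Figure: 8-node ring; starting from the source, the other nodes receive the message at steps 1, 2, 2, 3, 3, 4, 4.]

$T = (t_s + t_w m)(p/2)$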
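The ring schedule can be checked with a small Python simulation. This is a sketch under stated assumptions: nodes are numbered 0 to p-1 around the ring, and because a node sends only one message per step, the source starts the clockwise chain at step 1 and the counter-clockwise chain at step 2. `ring_broadcast_schedule` and `ring_broadcast_time` are hypothetical names used only for illustration.

```python
def ring_broadcast_schedule(p: int, source: int = 0) -> dict[int, int]:
    """Map each non-source node to the step at which it receives the message.

    Hypothetical helper: the source sends clockwise at step 1 and
    counter-clockwise at step 2 (it may send only one message per step);
    afterwards every new holder forwards the message one hop per step.
    """
    recv_step: dict[int, int] = {}
    cw = p // 2          # nodes reached by the clockwise chain
    ccw = p - 1 - cw     # nodes reached by the counter-clockwise chain
    for k in range(1, cw + 1):
        recv_step[(source + k) % p] = k
    for k in range(1, ccw + 1):
        recv_step[(source - k) % p] = k + 1  # this chain starts one step later
    return recv_step


def ring_broadcast_time(p: int, m: int, t_s: float, t_w: float) -> float:
    """Total time: number of steps times the cost of one m-word message."""
    steps = max(ring_broadcast_schedule(p).values(), default=0)
    return steps * (t_s + t_w * m)


if __name__ == "__main__":
    schedule = ring_broadcast_schedule(8)
    print(sorted(schedule.values()))  # [1, 2, 2, 3, 3, 4, 4]
    print(ring_broadcast_time(8, m=100, t_s=10.0, t_w=1.0))  # 440.0
```

For even p the last node receives at step p/2, matching T above; with this schedule an odd p needs (p+1)/2 steps.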
Broadcast (mesh)
- Broadcast along the source's row using the ring algorithm.
- Then every node in the source row broadcasts down its column using the ring algorithm.

$T = 2(t_s + t_w m)(\sqrt{p}/2) = (t_s + t_w m)\sqrt{p}$
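A sketch of the two-phase mesh broadcast in Python, assuming p is a perfect square, nodes are addressed as (row, column) pairs on the torus, and each ring phase uses a one-direction-per-step ring schedule (the source starts one direction at step 1 and the other at step 2). `ring_offsets` and `mesh_broadcast_schedule` are hypothetical helpers, and the column phase starts only after the row phase has finished.

```python
import math


def ring_offsets(n: int) -> dict[int, int]:
    """Offset (mod n) from a ring's source -> step at which it receives."""
    steps: dict[int, int] = {}
    cw = n // 2
    for k in range(1, cw + 1):
        steps[k % n] = k
    for k in range(1, n - cw):
        steps[(-k) % n] = k + 1  # second direction starts one step later
    return steps


def mesh_broadcast_schedule(
    p: int, src: tuple[int, int] = (0, 0)
) -> dict[tuple[int, int], int]:
    """Two-phase broadcast on a sqrt(p) x sqrt(p) torus: source row, then columns."""
    q = math.isqrt(p)
    assert q * q == p, "assumes a square mesh"
    r0, c0 = src
    ring = ring_offsets(q)
    step_of = {src: 0}
    # Phase 1: ring broadcast along the source row.
    for off, s in ring.items():
        step_of[(r0, (c0 + off) % q)] = s
    row_done = max(ring.values(), default=0)
    # Phase 2: each node of the source row runs a ring broadcast down its column.
    for col in range(q):
        for off, s in ring.items():
            step_of[((r0 + off) % q, col)] = row_done + s
    return step_of


if __name__ == "__main__":
    print(max(mesh_broadcast_schedule(16).values()))  # 4
```

For p = 16 the last nodes receive at step 4 = √p, i.e. 2 · (√p/2) message times.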
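Broadcast (hypercube)
- A message is sent along each dimension of the hypercube in turn.
- Parallelism grows as a binary tree: the number of nodes holding the message doubles at every step.

[Figure: 3-dimensional hypercube; starting from the source, the other nodes receive the message at steps 1, 2, 2, 3, 3, 3, 3.]

$T = (t_s + t_w m)\log_2 p$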
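To see how the three broadcast formulas compare, a short Python function can evaluate them side by side. It assumes p is a power of 4, so that p/2, √p and log₂ p are all whole numbers; `broadcast_times` and the parameter values in the demo are illustrative only.

```python
import math


def broadcast_times(p: int, m: int, t_s: float, t_w: float) -> dict[str, float]:
    """Evaluate T for ring, square toroidal mesh, and hypercube broadcasts."""
    per_msg = t_s + t_w * m  # cost of one m-word message between neighbors
    return {
        "ring": per_msg * (p / 2),
        "mesh": 2 * per_msg * (math.sqrt(p) / 2),
        "hypercube": per_msg * math.log2(p),
    }


if __name__ == "__main__":
    for topology, t in broadcast_times(p=64, m=1, t_s=1.0, t_w=0.0).items():
        print(f"{topology:>9}: {t:g}")
```

With p = 64 and one time unit per message, the ring needs 32 message times, the mesh 8 and the hypercube 6.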
Broadcast
- The mesh algorithm was based on embedding rings in the mesh.
- Can we do better on the mesh?
- Can we embed a tree in a mesh?
  - Exercise for the reader. (-: hint, hint ;-)
Other Broadcasts
- Many algorithms for all-to-one and all-to-all communication are simply reversals and duals of the one-to-all broadcast.
- Examples
  - All-to-one
    - Reverse the algorithm and concatenate.
  - All-to-all
    - Butterfly and concatenate.
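Both examples can be sketched in Python on a hypercube with p = 2^d nodes, assuming each node n initially owns one block, represented here simply by the integer n. The gather reverses the broadcast (lowest bit first) and concatenates at each receiver; the all-to-all version uses a butterfly, where every node exchanges with its partner across each dimension. Function names are hypothetical, and the communication is simulated with Python lists rather than real messages.

```python
def gather_to_root(d: int) -> list[int]:
    """All-to-one gather: reversed hypercube broadcast plus concatenation.

    In step i, every node whose lower i bits are 0 and whose bit i is 1
    sends everything it holds to n XOR 2**i, which appends it.
    """
    p = 1 << d
    blocks = {n: [n] for n in range(p)}
    mask = 0
    for i in range(d):
        bit = 1 << i
        for n in range(p):
            if (n & mask) == 0 and (n & bit):
                # receiver has the lower ID, so appending keeps blocks in order
                blocks[n ^ bit] += blocks[n]
        mask |= bit
    return blocks[0]


def allgather_butterfly(d: int) -> dict[int, list[int]]:
    """All-to-all broadcast: partners n and n XOR 2**i swap and concatenate."""
    p = 1 << d
    blocks = {n: [n] for n in range(p)}
    for i in range(d):
        bit = 1 << i
        # all pairs exchange simultaneously, so build from the previous state
        blocks = {
            n: blocks[n] + blocks[n ^ bit] if not (n & bit)
            else blocks[n ^ bit] + blocks[n]
            for n in range(p)
        }
    return blocks


if __name__ == "__main__":
    print(gather_to_root(3))          # [0, 1, 2, 3, 4, 5, 6, 7]
    print(allgather_butterfly(2)[3])  # [0, 1, 2, 3]
```

Both finish in log₂ p steps, but the messages grow as blocks are concatenated, so later steps carry more data than earlier ones.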
Reduction Algorithms
- Reduce, or combine, the sets of values held on the processors into a single set using an associative operation, e.g.:
  - Summation
  - Max/Min
- Many reduction algorithms simply use the all-to-one broadcast algorithm.
  - The operation is performed at each node as partial results arrive.
Reduction
- If only one processor needs the answer, use the (all-to-one) broadcast algorithms.
- If every processor needs the answer, use the butterfly.
  - It reduces the cost from $2\log p$ steps (reduce to one node, then broadcast the result) to $\log p$ steps.
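A minimal Python sketch of the butterfly reduction (often called all-reduce), assuming p is a power of two and the operation is associative and commutative, as summation and max/min are. In each step every node combines its partial result with its partner's across one dimension; `butterfly_allreduce` is a hypothetical name, and the exchange is simulated in a single process.

```python
import operator
from typing import Callable


def butterfly_allreduce(
    values: list[int], op: Callable[[int, int], int]
) -> tuple[list[int], int]:
    """Every node ends with op applied over all values; returns (results, steps)."""
    p = len(values)
    assert p > 0 and (p & (p - 1)) == 0, "assumes p is a power of two"
    partial = list(values)
    steps = 0
    bit = 1
    while bit < p:
        # each node combines with its partner across this dimension;
        # the comprehension reads only the previous step's partial results
        partial = [op(partial[n], partial[n ^ bit]) for n in range(p)]
        bit <<= 1
        steps += 1
    return partial, steps


if __name__ == "__main__":
    print(butterfly_allreduce(list(range(8)), operator.add))  # ([28, 28, 28, 28, 28, 28, 28, 28], 3)
```

With 8 nodes the loop runs 3 = log₂ 8 steps, whereas reducing to one node and then broadcasting the answer would take 2 log₂ 8 = 6.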
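How'd they do that?
- Broadcast and reduction algorithms are based on the Gray code numbering of nodes: the binary labels of neighboring nodes differ in exactly one bit.
- Consider a hypercube.

[Figure: 3-dimensional hypercube with nodes labeled 000, 100, 010, 001, 110, 101, 111, 011 (0 through 7).]

Neighboring nodes differ in only one bit position.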
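The one-bit property can be checked in a few lines of Python. The sketch assumes the binary-reflected Gray code g(i) = i XOR (i >> 1), the standard construction; the slides do not say which Gray code the course uses. Listing g(0), ..., g(p-1) gives a ring whose consecutive nodes, including the wraparound pair, are hypercube neighbors. The helper names are hypothetical.

```python
def gray(i: int) -> int:
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def hypercube_neighbors(node: int, d: int) -> list[int]:
    """Neighbors of `node` in a d-dimensional hypercube: flip one bit at a time."""
    return [node ^ (1 << k) for k in range(d)]


def differ_in_one_bit(a: int, b: int) -> bool:
    """True when the labels a and b differ in exactly one bit position."""
    x = a ^ b
    return x != 0 and (x & (x - 1)) == 0


if __name__ == "__main__":
    d = 3
    ring = [gray(i) for i in range(1 << d)]
    print([format(g, "03b") for g in ring])
    # ['000', '001', '011', '010', '110', '111', '101', '100']
    # every consecutive pair, including the wraparound, differs in one bit
    assert all(differ_in_one_bit(ring[i], ring[(i + 1) % len(ring)])
               for i in range(len(ring)))
    print(hypercube_neighbors(0b000, d))  # [1, 2, 4]
```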
How'd they do that?
- Start with the most significant bit.
- Every node that already has the message flips that bit of its own ID and sends to that processor.
- Proceed with the next most significant bit.
- Continue until all bits have been used.
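The bit-flipping procedure can be simulated in a few lines of Python. This sketch assumes a d-dimensional hypercube with nodes labeled 0 to 2^d - 1 and records the step at which each node first holds the message; `hypercube_broadcast` is a hypothetical name. Nodes that receive during a step wait until the next step before sending.

```python
def hypercube_broadcast(d: int, source: int = 0) -> dict[int, int]:
    """One-to-all broadcast on a d-dimensional hypercube, most significant bit first.

    Returns {node: step at which it first holds the message}; the source is step 0.
    """
    recv_step = {source: 0}
    for step, bit_index in enumerate(range(d - 1, -1, -1), start=1):
        bit = 1 << bit_index
        # snapshot of current holders: new receivers do not send until next step
        for node in list(recv_step):
            recv_step[node ^ bit] = step
    return recv_step


if __name__ == "__main__":
    schedule = hypercube_broadcast(3)
    for node, step in sorted(schedule.items(), key=lambda kv: kv[1]):
        print(f"step {step}: node {node:03b}")
```

For d = 3 and source 0, node 100 receives at step 1, nodes 010 and 110 at step 2, and the remaining four nodes at step 3, which is the 1, 2, 2, 3, 3, 3, 3 pattern from the hypercube broadcast slide.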
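Procedure SingleNodeAccum(d, my_id, m, X, sum)
    for j = 0 to m-1
        sum[j] = X[j]
    mask = 0
    for i = 0 to d-1
        // only nodes whose lower i bits are 0 take part in step i
        if ((my_id AND mask) == 0)
            if ((my_id AND 2^i) <> 0)
                msg_dest = my_id XOR 2^i
                send(sum, msg_dest)
            else
                msg_src = my_id XOR 2^i
                // receive the partner's partial sum into X (already copied into sum),
                // so the running total in sum is not overwritten
                recv(X, msg_src)
                for j = 0 to m-1
                    sum[j] += X[j]
            endif
        endif
        mask = mask XOR 2^i    // set bit i of mask
    endfor
end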
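To check the accumulation procedure, here is a minimal Python simulation that runs every node's loop in lockstep within one process, with list arithmetic standing in for the `send`/`recv` calls (which, as in the pseudocode, are not a specific library's API). It walks the dimensions from least to most significant bit, the reverse of the broadcast order, and the receiver adds the incoming partial sum to its own running total. `single_node_accum` is a hypothetical name.

```python
def single_node_accum(d: int, X: list[list[float]]) -> list[list[float]]:
    """Simulate SingleNodeAccum on all 2**d nodes at once; node 0 ends with the total.

    X[node] is that node's length-m vector. Returns each node's `sum` buffer
    after the loop (only node 0's buffer holds the complete result).
    """
    p = 1 << d
    assert len(X) == p
    sums = [list(x) for x in X]  # for j = 0 to m-1: sum[j] = X[j]
    mask = 0
    for i in range(d):
        bit = 1 << i
        for my_id in range(p):
            if (my_id & mask) == 0 and (my_id & bit) != 0:
                dest = my_id ^ bit  # send(sum, msg_dest)
                # receiver side: recv into a separate buffer, then sum[j] += X[j]
                sums[dest] = [a + b for a, b in zip(sums[dest], sums[my_id])]
        mask ^= bit  # set bit i of mask
    return sums


if __name__ == "__main__":
    X = [[float(n), 1.0] for n in range(8)]
    print(single_node_accum(3, X)[0])  # [28.0, 8.0]
```

Receivers in a step never send in that same step, so updating the buffers in place inside the inner loop gives the same result as simultaneous exchanges.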