Computer Science 320: Broadcasting
Floyd’s Algorithm on SMP

    for i = 0 to n - 1
        parallel for r = 0 to n - 1
            for c = 0 to n - 1
                d[r][c] = min(d[r][c], d[r][i] + d[i][c])
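For comparison, the SMP version can be sketched in plain Java. This is not the Parallel Java library that the course code uses. As an assumption, the sketch substitutes a parallel stream (`IntStream.range(0, n).parallel()`) for the `parallel for` over rows. The class name `FloydSmp` and the three-node test graph are illustrative only.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Shared-memory Floyd's algorithm: every thread reads and writes the same matrix.
class FloydSmp {
    static final double INF = Double.POSITIVE_INFINITY;

    // Updates d in place so that d[r][c] becomes the shortest distance from r to c.
    // Assumes d[i][i] == 0 for all i and no negative cycles.
    static void floyd(double[][] d) {
        int n = d.length;
        for (int i = 0; i < n; ++i) {
            final int k = i;          // the lambda needs an effectively final index
            final double[] dI = d[k]; // row i, read by every thread through shared memory
            // "parallel for r": the rows are split among the common fork-join pool's threads.
            // Because d[i][i] == 0, pass i leaves the values in row i unchanged,
            // so all threads see the same row i.
            IntStream.range(0, n).parallel().forEach(r -> {
                double[] dR = d[r];
                for (int c = 0; c < n; ++c)
                    dR[c] = Math.min(dR[c], dR[k] + dI[c]);
            });
        }
    }

    public static void main(String[] args) {
        double[][] d = {
            {0, 4, INF},
            {INF, 0, 1},
            {2, INF, 0}
        };
        floyd(d);
        for (double[] row : d) System.out.println(Arrays.toString(row));
    }
}
```

A parallel stream's `forEach` returns only after every row has been processed. As a result, each pass through the outer loop finishes before the next one begins, which is the synchronization the pseudocode's `parallel for` implies. Running `main` prints `[0.0, 4.0, 5.0]`, `[3.0, 0.0, 1.0]`, and `[2.0, 6.0, 0.0]`.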
Floyd’s Algorithm on Cluster

- The root node reads the distance matrix from the input file and scatters row slices to the other nodes.
- The nodes compute distances and update their own slices.
- The slices are gathered back to the root node for output.
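The slides do not say how the rows are divided into slices. A common choice, assumed here, is a block partition in which each of the K processes receives a contiguous range of about n/K rows. The helpers `partition` and `owner` are hypothetical stand-ins for the range objects (`ranges[p]` with `contains`) that the course code uses.

```java
import java.util.Arrays;

// Block partition of n matrix rows among np processes, one contiguous slice each.
class RowSlices {
    // Returns bounds[p] = {lb, ub}, the inclusive row range owned by process p.
    // The first n % np processes get one extra row. A slice is empty (ub == lb - 1)
    // when there are more processes than rows.
    static int[][] partition(int n, int np) {
        int[][] bounds = new int[np][2];
        int base = n / np, extra = n % np, lb = 0;
        for (int p = 0; p < np; ++p) {
            int size = base + (p < extra ? 1 : 0);
            bounds[p][0] = lb;
            bounds[p][1] = lb + size - 1;
            lb += size;
        }
        return bounds;
    }

    // Returns the rank of the process whose slice contains the given row.
    static int owner(int[][] bounds, int row) {
        for (int p = 0; p < bounds.length; ++p)
            if (bounds[p][0] <= row && row <= bounds[p][1]) return p;
        throw new IllegalArgumentException("row out of range: " + row);
    }

    public static void main(String[] args) {
        int[][] bounds = partition(10, 3);
        System.out.println(Arrays.deepToString(bounds)); // [[0, 3], [4, 6], [7, 9]]
        System.out.println(owner(bounds, 5));            // 1
    }
}
```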
Parallel I/O File Pattern

- Eliminate the gather of data by having each node write its slice to a separate file.
- Eliminate the scatter of data by having each node read its own slice from the input file.
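Here is a minimal sketch of the pattern in plain Java. It assumes a binary file that stores n as an int, followed by the n × n distances as doubles in row-major order. The slides do not specify a file format, so this layout, the class `SliceIo`, and its methods are hypothetical. Because every row has the same size, a process can compute where its first row starts and seek directly to it.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Parallel I/O sketch. Hypothetical file layout: int n (row length), then rows of n doubles.
class SliceIo {
    static final int HEADER_BYTES = Integer.BYTES;

    // Writes the header and the given rows. This is used both for the whole input
    // matrix and for one process's slice written to its own output file.
    static void writeRows(Path file, double[][] rows, int n) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(n);
            for (double[] row : rows)
                for (int c = 0; c < n; ++c) out.writeDouble(row[c]);
        }
    }

    // Reads rows lb..ub (inclusive) by seeking past the rows owned by other processes.
    static double[][] readSlice(Path file, int lb, int ub) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            int n = in.readInt();
            in.seek(HEADER_BYTES + (long) lb * n * Double.BYTES);
            double[][] slice = new double[ub - lb + 1][n];
            for (double[] row : slice)
                for (int c = 0; c < n; ++c) row[c] = in.readDouble();
            return slice;
        }
    }

    public static void main(String[] args) throws IOException {
        double[][] d = {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}};
        Path input = Files.createTempFile("matrix", ".bin");
        writeRows(input, d, 3);
        // The process owning rows 1..2 reads only those rows, then writes them to its own file.
        double[][] slice = readSlice(input, 1, 2);
        Path output = Files.createTempFile("slice", ".bin");
        writeRows(output, slice, 3);
        System.out.println(Arrays.deepToString(slice)); // [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
    }
}
```

Each process would call `readSlice` with its own bounds at startup and `writeRows` on its own output file at the end. Under this design, the root never handles the other processes' rows.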
Execution Timeline
Sharing Data in Computation

On each pass through the outer loop, row i must be available to all of the processes, because every process executes the same inner-loop statement, and that statement reads d[i][c].

- In an SMP this works automatically, because the threads share the entire matrix.
- In a cluster it does not, because the processes do not share memory; each one holds only its own slice.

    for i = 0 to n - 1
        parallel for r = 0 to n - 1
            for c = 0 to n - 1
                d[r][c] = min(d[r][c], d[r][i] + d[i][c])
Share Row via a Broadcast Message

- On each pass through the outer loop, the process that owns row i broadcasts it before the parallel loop runs.
- The owning process acts as the root of the broadcast and sets up the source buffer.
- The other processes set up a destination buffer.
- The broadcast also enforces synchronization: all processes wait at the broadcast until row i is available.

    for i = 0 to n - 1
        broadcast row i of d
        parallel for r = 0 to n - 1
            for c = 0 to n - 1
                d[r][c] = min(d[r][c], d[r][i] + d[i][c])
    // Allocate storage for a row broadcast from another process.
    double[] row_i = new double [n];
    DoubleBuf row_i_buf = DoubleBuf.buffer (row_i);
    int i_root = 0;
    for (int i = 0; i < n; ++ i){
        double[] d_i = d[i];
        // Determine which process owns row i. A while loop (rather than an if)
        // also skips past processes whose slice is empty.
        while (! ranges[i_root].contains(i)) ++ i_root;
        // Broadcast row i from the owner process to all processes.
        if (rank == i_root)
            world.broadcast(i_root, DoubleBuf.buffer (d_i));
        else{
            world.broadcast(i_root, row_i_buf);
            d_i = row_i;
        }
        // Inner loops over rows in my slice and over all columns.
        for (int r = mylb; r <= myub; ++ r){
            double[] d_r = d[r];
            for (int c = 0; c < n; ++ c)
                d_r[c] = Math.min (d_r[c], d_r[i] + d_i[c]);
        }
    }
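To experiment with the broadcast logic without a cluster, it can be simulated in plain Java. This is a swapped-in technique and not the Parallel Java `world.broadcast` call. Each thread stands in for one process and keeps a private copy of its slice. A shared one-row array stands in for the broadcast message, and a `java.util.concurrent.CyclicBarrier` models the waiting that a real broadcast imposes. The class name, the block partition, and the test graph are assumptions.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Shared-memory simulation of Floyd's algorithm with one row broadcast per pass.
class BroadcastFloydSim {
    static double[][] run(double[][] input, int np) throws InterruptedException {
        int n = input.length;
        double[] bcast = new double[n];                // stands in for the broadcast message
        CyclicBarrier barrier = new CyclicBarrier(np); // models everyone waiting for it
        double[][] result = new double[n][];
        Thread[] procs = new Thread[np];
        int lbNext = 0;
        for (int p = 0; p < np; ++p) {
            // Block partition: the first n % np processes get one extra row.
            int size = n / np + (p < n % np ? 1 : 0);
            final int lb = lbNext, ub = lbNext + size - 1;
            lbNext += size;
            procs[p] = new Thread(() -> {
                // Private copy of this process's slice; no other thread touches it.
                double[][] slice = new double[ub - lb + 1][];
                for (int r = lb; r <= ub; ++r) slice[r - lb] = input[r].clone();
                double[] rowI = new double[n];         // destination buffer
                try {
                    for (int i = 0; i < n; ++i) {
                        // The owner of row i acts as the root and fills the source buffer.
                        if (lb <= i && i <= ub) System.arraycopy(slice[i - lb], 0, bcast, 0, n);
                        barrier.await();                // row i is now in the buffer
                        System.arraycopy(bcast, 0, rowI, 0, n);
                        barrier.await();                // buffer may be reused for row i + 1
                        for (double[] dR : slice)
                            for (int c = 0; c < n; ++c)
                                dR[c] = Math.min(dR[c], dR[i] + rowI[c]);
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                for (int r = lb; r <= ub; ++r) result[r] = slice[r - lb]; // "gather"
            });
            procs[p].start();
        }
        for (Thread t : procs) t.join();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        double INF = Double.POSITIVE_INFINITY;
        double[][] d = {{0, 4, INF}, {INF, 0, 1}, {2, INF, 0}};
        for (double[] row : run(d, 2)) System.out.println(Arrays.toString(row));
    }
}
```

The simulation uses two barrier waits per pass because it reuses a single buffer. The first wait guarantees that row i has been written, and the second guarantees that no process overwrites the buffer with row i + 1 while another process is still copying row i.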
Problem: Too Many Messages

The program broadcasts a row on every pass through the outer loop, which means n broadcasts in all. The time spent in communication is too high compared with the time spent in computation.
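A rough cost model makes this imbalance concrete. The model rests on assumptions that the slides do not state:

- K processes each own about n/K rows.
- Each broadcast is implemented as a tree that takes ⌈log2 K⌉ message steps.
- Sending m bytes costs t_s + m·t_w (latency plus per-byte time).
- One min-plus update costs t_c.
- A row holds n doubles of 8 bytes each.

```latex
T_{\text{comp}} \approx n \cdot \frac{n}{K} \cdot n \cdot t_c = \frac{n^3 t_c}{K}

T_{\text{comm}} \approx n \,\lceil \log_2 K \rceil \,(t_s + 8n\,t_w)

\frac{T_{\text{comm}}}{T_{\text{comp}}} \approx \frac{K \lceil \log_2 K \rceil \,(t_s + 8n\,t_w)}{n^2 t_c}
```

Under this model the ratio grows with K and shrinks only as n grows. Each of the n passes also pays the message latency t_s at least ⌈log2 K⌉ times, and on typical cluster hardware t_s is orders of magnitude larger than t_c.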