Pipeline Computations: Increasing Throughput by Using More Processes
Original Pipeline
  [Diagram: stages P1 through P6 in sequence, drawn against a time axis]
  Latency: 24 sec
  Throughput: 1 every 8 sec
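The two figures for the original pipeline can be reproduced with a little arithmetic: latency is the sum of the stage times, and the steady-state interval between finished packets is set by the slowest stage. The sketch below assumes stage times of 2, 4, 2, 6, 8 and 2 seconds for P1 through P6, taken from the delays passed to proc() in the MPI simulation slides; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Latency of one packet through an unreplicated pipeline:
   the sum of all stage times. */
int pipeline_latency(const int stage_secs[], size_t n_stages)
{
    int total = 0;
    for (size_t i = 0; i < n_stages; i++) {
        assert(stage_secs[i] > 0);
        total += stage_secs[i];
    }
    return total;
}

/* Steady-state interval between finished packets:
   the time of the slowest stage (assumes at least one stage). */
int pipeline_interval(const int stage_secs[], size_t n_stages)
{
    assert(n_stages > 0);
    int slowest = stage_secs[0];
    for (size_t i = 1; i < n_stages; i++) {
        if (stage_secs[i] > slowest) {
            slowest = stage_secs[i];
        }
    }
    return slowest;
}
```

With the assumed stage times, pipeline_latency gives 24 and pipeline_interval gives 8, matching the slide.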
Add More Processes
  Replicated stages: P2-1 & P2-2; P4-1, P4-2 & P4-3; P5-1, P5-2, P5-3 & P5-4
  P2 takes twice as long as the fastest stage, so make 2 of those.
  P4 takes three times as long, so make 3.
  P5 takes four times as long, so make 4 of those.
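The rule on this slide (one copy per multiple of the fastest stage's time) can be written as a small helper. This is only a sketch: the name replica_counts is hypothetical, and rounding up for stage times that are not exact multiples is an assumption, since the slides only cover exact multiples.

```c
#include <assert.h>
#include <stddef.h>

/* Fill copies[i] with the number of replicas stage i needs so that,
   taken together, its replicas can accept a packet as often as the
   fastest stage does. */
void replica_counts(const int stage_secs[], int copies[], size_t n_stages)
{
    assert(n_stages > 0);

    /* Find the fastest stage. */
    int fastest = stage_secs[0];
    for (size_t i = 1; i < n_stages; i++) {
        if (stage_secs[i] < fastest) {
            fastest = stage_secs[i];
        }
    }
    assert(fastest > 0);

    /* One replica per multiple of the fastest time, rounded up
       (the rounding is an assumption, see the text above). */
    for (size_t i = 0; i < n_stages; i++) {
        copies[i] = (stage_secs[i] + fastest - 1) / fastest;
    }
}
```

For stage times 2, 4, 2, 6, 8, 2 this yields 1, 2, 1, 3, 4, 1 copies, the same counts the slide arrives at.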
Add More Processes
  P1 (2 sec) feeds two copies of P2 (4 sec each); each copy does half the work.
  P2-1 gets every second message (the even ones).
  P2-2 gets every second message (the odd ones).
Add Dispersers and Collectors
  P1 (2) -> Disperser -> P2-1 (4) / P2-2 (4) -> Collector -> P3 (2)
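A disperser and a collector that both visit the replicas in the same cyclic order keep the packets in their original sequence. The sketch below illustrates this with in-memory queues instead of MPI messages; the names replica_for_packet and disperse_then_collect and the fixed array limits are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REPLICAS 8
#define MAX_PACKETS 64

/* Round-robin rule shared by disperser and collector:
   packet k (0-based) is handled by replica k mod n. */
size_t replica_for_packet(size_t packet, size_t n_replicas)
{
    assert(n_replicas > 0);
    return packet % n_replicas;
}

/* Disperse packets 0..n_packets-1 into per-replica queues, then collect
   them by visiting the replicas in the same cyclic order.
   Writes the collected packet numbers to out[] and returns the count. */
size_t disperse_then_collect(size_t n_packets, size_t n_replicas, int out[])
{
    assert(n_replicas > 0 && n_replicas <= MAX_REPLICAS);
    assert(n_packets <= MAX_PACKETS);

    int queue[MAX_REPLICAS][MAX_PACKETS];
    size_t len[MAX_REPLICAS] = {0};   /* packets queued per replica */
    size_t head[MAX_REPLICAS] = {0};  /* next packet to collect per replica */

    /* Disperser: hand packet k to replica k mod n. */
    for (size_t k = 0; k < n_packets; k++) {
        size_t r = replica_for_packet(k, n_replicas);
        queue[r][len[r]++] = (int)k;
    }

    /* Collector: take one packet from each replica in turn. */
    size_t n_out = 0;
    for (size_t k = 0; k < n_packets; k++) {
        size_t r = replica_for_packet(k, n_replicas);
        out[n_out++] = queue[r][head[r]++];
    }
    return n_out;
}
```

With two replicas, packets 0, 2, 4, ... go to the first replica and packets 1, 3, 5, ... to the second, and the collector emits them again as 0, 1, 2, 3, ...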
New Pipeline (packets that can move at each time step)
  Time =  0: (no moves yet)
  Time =  2: P1 -> P2-1
  Time =  4: P1 -> P2-2
  Time =  6: P1 -> P2-1; P2-1 -> P3
  Time =  8: P1 -> P2-2; P2-2 -> P3; P3 -> P4-1
  Time = 10: P1 -> P2-1; P2-1 -> P3; P3 -> P4-2
  Time = 12: P1 -> P2-2; P2-2 -> P3; P3 -> P4-3
  Time = 14: P1 -> P2-1; P2-1 -> P3; P3 -> P4-1; P4-1 -> P5-1
  Time = 16: P1 -> P2-2; P2-2 -> P3; P3 -> P4-2; P4-2 -> P5-2
  Time = 18: P2-1 -> P3; P3 -> P4-3; P4-3 -> P5-3
  Time = 20: P2-2 -> P3; P3 -> P4-1; P4-1 -> P5-4
  Time = 22: P3 -> P4-2; P4-2 -> P5-1; P5-1 -> P6
  Time = 24: P4-3 -> P5-2; P5-2 -> P6
  Time = 26: P4-1 -> P5-3; P5-3 -> P6
  Time = 28: P4-2 -> P5-4; P5-4 -> P6
  Time = 30: P5-1 -> P6
  Time = 32: P5-2 -> P6
  Time = 34: P5-3 -> P6
  Time = 36: P5-4 -> P6
Result
  Latency: 24 secs. (same as before)
  Throughput: 1 every 2 secs. (1 every 8 secs. before)
  Input rate: 1 every 2 secs. (1 every 8 secs. before)
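These results can be checked with a small timing model that needs no MPI. The sketch below is an illustration under stated assumptions: stage times of 2, 4, 2, 6, 8 and 2 seconds (the delays used in the MPI simulation slides), round-robin assignment of packets to replicas, packets always available at the input, and unlimited buffering between stages. The name simulate_pipeline and the array limits are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STAGES 16
#define MAX_COPIES 8

/* Fill finish[k] with the time at which packet k leaves the last stage.
   copies[s] is the number of replicas of stage s; packet k uses
   replica k mod copies[s] of every stage. */
void simulate_pipeline(const int stage_secs[], const int copies[],
                       size_t n_stages, size_t n_packets, int finish[])
{
    assert(n_stages <= MAX_STAGES);

    /* free_at[s][c]: time at which replica c of stage s is idle again. */
    int free_at[MAX_STAGES][MAX_COPIES] = {{0}};

    for (size_t k = 0; k < n_packets; k++) {
        int t = 0;  /* time at which packet k is ready for the next stage */
        for (size_t s = 0; s < n_stages; s++) {
            assert(copies[s] >= 1 && copies[s] <= MAX_COPIES);
            size_t c = k % (size_t)copies[s];
            /* Start when both the packet and the replica are ready. */
            int start = t > free_at[s][c] ? t : free_at[s][c];
            t = start + stage_secs[s];
            free_at[s][c] = t;
        }
        finish[k] = t;
    }
}
```

In this model, one copy of every stage gives a first packet at 24 seconds and one more every 8 seconds; with copies 1, 2, 1, 3, 4, 1 the first packet still arrives at 24 seconds, but the following ones arrive every 2 seconds.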
Simulation in MPI
  A process:
    void proc(int delay, int from, int to) {
      while (true) {
        MPI_Recv(…, from, …);
        // work for 'delay' secs.
        MPI_Send(…, to, …);
      }
    }
  A disperser:
    void disperser(int noProc, int from, int procs[]) {
      while (true) {
        // hand each incoming message to the next process in turn
        for (int i = 0; i < noProc; i++) {
          MPI_Recv(…, from, …);
          MPI_Send(…, procs[i], …);
        }
      }
    }
  A collector:
    void collector(int noProc, int procs[], int to) {
      while (true) {
        // take one message from each process in turn and pass it on
        for (int i = 0; i < noProc; i++) {
          MPI_Recv(…, procs[i], …);
          MPI_Send(…, to, …);
        }
      }
    }
New Pipeline Diagram
    switch (rank) {
      case 1:  proc(2, input, 2); break;                            // P1
      case 2:  disperser(2, 1, (int[]){3, 4}); break;
      case 3: case 4:
               proc(4, 2, 5); break;                                // P2-1, P2-2
      case 5:  collector(2, (int[]){3, 4}, 6); break;
      case 6:  proc(2, 5, 7); break;                                // P3
      case 7:  disperser(3, 6, (int[]){8, 9, 10}); break;           // receives from rank 6
      case 8: case 9: case 10:
               proc(6, 7, 11); break;                               // P4-1 to P4-3
      case 11: collector(3, (int[]){8, 9, 10}, 12); break;
      case 12: disperser(4, 11, (int[]){13, 14, 15, 16}); break;
      case 13: case 14: case 15: case 16:
               proc(8, 12, 17); break;                              // P5-1 to P5-4
      case 17: collector(4, (int[]){13, 14, 15, 16}, 18); break;
      case 18: proc(2, 17, output); break;                          // P6
    }
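The rank numbering follows a simple pattern: an unreplicated stage takes one rank, while a replicated stage takes one rank per copy plus one for its disperser and one for its collector. A minimal sketch of that count, with the hypothetical name pipeline_ranks:

```c
#include <assert.h>
#include <stddef.h>

/* Number of ranks needed for the pipeline itself: one per stage copy,
   plus a disperser and a collector around every replicated stage. */
int pipeline_ranks(const int copies[], size_t n_stages)
{
    int total = 0;
    for (size_t i = 0; i < n_stages; i++) {
        assert(copies[i] >= 1);
        total += copies[i];
        if (copies[i] > 1) {
            total += 2;  /* disperser + collector */
        }
    }
    return total;
}
```

For copies 1, 2, 1, 3, 4, 1 this gives 18, the highest rank used in the switch statement, and 6 for the original pipeline. The runs shown on the results slides use -np 20 and -np 8; the two extra ranks are presumably the input and output endpoints, but the slides do not say so explicitly.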
Improvement?
  Theory: we went from 1 packet every 8 seconds to 1 every 2 seconds.
  Practice: run the program and see.
Original Pipeline Results
    donald-duck.cs.unlv.edu(5) mpirun -np 8 pipe 10
    Packet 0 - Transport Time:    Elapsed since last packet:
    Packet 1 - Transport Time:    Elapsed since last packet:
    Packet 2 - Transport Time:    Elapsed since last packet:
    Packet 3 - Transport Time:    Elapsed since last packet:
    Packet 4 - Transport Time:    Elapsed since last packet:
    Packet 5 - Transport Time:    Elapsed since last packet:
    Packet 6 - Transport Time:    Elapsed since last packet:
    Packet 7 - Transport Time:    Elapsed since last packet:
    Packet 8 - Transport Time:    Elapsed since last packet:
    Packet 9 - Transport Time:    Elapsed since last packet:
    Latency               :
    Total Time            :
    Average Transport Time:
    Average Rate          : 91072
Improved Pipeline Results
    donald-duck.cs.unlv.edu(6) mpirun -np 20 pipe2 10
    Packet 0 - Transport Time:    Elapsed since last packet:
    Packet 1 - Transport Time:    Elapsed since last packet:
    Packet 2 - Transport Time:    Elapsed since last packet:
    Packet 3 - Transport Time:    Elapsed since last packet:
    Packet 4 - Transport Time:    Elapsed since last packet:
    Packet 5 - Transport Time:    Elapsed since last packet:
    Packet 6 - Transport Time:    Elapsed since last packet:
    Packet 7 - Transport Time:    Elapsed since last packet:
    Packet 8 - Transport Time:    Elapsed since last packet:
    Packet 9 - Transport Time:    Elapsed since last packet:
    Latency               :
    Total Time            :
    Average Transport Time:
    Average Rate          : 29991
Results (1000 packets)
                              Original    Improved Pipeline
  Latency                      296,258              293,727
  Total Time                90,209,466           30,264,697
  Average Transport Time       569,236              329,841
  Average Rate                  89,986               29,974
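To compare practice with theory, divide each original figure by its improved counterpart. The helper below is a minimal sketch with the hypothetical name improvement; the slides do not state the unit of the measurements, which does not matter for a ratio.

```c
#include <assert.h>

/* How many times smaller `after` is than `before`. */
double improvement(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}
```

Applied to the 1000-packet figures, improvement(90209466, 30264697) is about 2.98 for total time and improvement(89986, 29974) is about 3.00 for average rate, while latency is essentially unchanged (296,258 versus 293,727). The measured gain is therefore close to a factor of 3, somewhat below the factor of 4 that the move from 1 packet every 8 seconds to 1 every 2 seconds predicts.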