Download presentation
Presentation is loading. Please wait.
1
12e.1 More on Parallel Computing UNC-Wilmington, C. Ferner, 2007 Mar 21, 2007
2
12e.2 Block Mapping (Review) blksz = (int)ceil((float)N / P); for (i = lb + my_rank * blksz; i < min(N, lb + (my_rank + 1) * blksz); i++) {... } (lb is the lower bound of the original loop)
3
12e.3 Example for (i = 1; i < N; i++) { for (j = 0; j < N; j++) { a[i][j] += f(a[i-1][j]); }
4
12e.4 Example 0,00,10,20,3 0,N-1... 1,0 1,1 1,2 1,31,N-1... 2,02,12,22,3 2,N-1... N-1,0N-1,1N-1,2N-1,3N-1,N-1... j i
5
12e.5 Example If we mapped iterations of the i loop to processors, the dependencies cross processors boundaries Therefore interprocessor communication would be required
6
12e.6 N-1,N-1 Example 0,00,10,20,3 0,N-1... 1,0 1,1 1,2 1,31,N-1... 2,02,12,22,3 2,N-1... N-1,0N-1,1N-1,2N-1,3... PE 0 : PE 1 : PE 2 : PE P :
7
12e.7 Example A better solution would be to map iterations of the j loop to processors
8
12e.8 N-1,N-1 Example 0,00,10,20,3 0,N-1... 1,0 1,1 1,2 1,31,N-1... 2,02,12,22,3 2,N-1... N-1,0N-1,1N-1,2N-1,3... PE 0 : PE 1 : PE 2 : PE 3 :
9
12e.9 Example for (i = 1; i < N; i++) { for (j = my_rank * blksz; i < min(N, (my_rank + 1) * blksz); i++) { a[i][j] += f(a[i-1][j]); }
10
12e.10 Block Mapping (Review) blksz = (int)ceil((float)N / P); for (i = lb + my_rank * blksz; i < min(N, lb + (my_rank + 1) * blksz); i++) {... } (lb is the lower bound of the original loop)
11
12e.11 Block Mapping
12
12e.12 Block Mapping The problem is that block mapping can lead to a load imbalance Example, let N=26, P=6 blksz = ceiling(26/6) = 5 (lb = 0)
13
12e.13 Block Mapping Processors 0-4 have 5 iterations of work Processor 5 has 1 iteration
14
12e.14 Cyclic Mapping An alternative to block mapping is cyclic mapping This is where each iteration is assigned to each processors in a round robin fashion This leads to a better load balance
15
12e.15 Cyclic Mapping Processors 0-2 have 6 iterations of work Processor 3-6 have only 5, but it is only 1 iteration fewer!
16
12e.16 Cyclic Mapping for (i = lb + my_rank; i < N; i += P) {... } (lb is the lower bound of the original loop)
17
12e.17 Cyclic Mapping Conceptually, this is an easier mapping to implement than block mapping It leads to better load balancing However, it can (and often does) lead to more communication Suppose that each iteration in the above example is dependent on the previous iteration
18
12e.18 Cyclic Mapping A message is sent from iteration 0 to 1, from 1 to 2, from 2 to 3, from 3 to 4, from 4 to 5, from 5 to 6,...
19
12e.19 Block Mapping With block mapping, only messages are sent from iteration 5 to 6, from 11 to 12, from 17 to 18, and from 23 to 24
20
12e.20 Block vs Cyclic Block mapping increases the granularity and reduces overall communication (O(P)). However, it can lead to load imbalances (O(N/P)). Cyclic mapping decreases granularity and increases overall communication (O(N)). However, it improves load balance (O(1)). Block-Cyclic is a combination of the two
21
12e.21 Block-Cyclic Mapping Block-cyclic with N=26, P=6, and blksz=2 The load imbalance will be <= blksz
22
12e.22 Block-Cyclic Mapping (N, P, and blksz are given) nLayers = (int)ceil(((float)N)/(blksz*P)); for (layer = 0; layer < nLayers; layer++) { beginBlk = layer*blksz*N; for (i = beginBlk + mypid*blksz; i < min(N, beginBlk + (mypid + 1)*blksz); i++) {... }
23
12e.23 Block vs Cyclic Block-Cyclic is in between Block and Cyclic in terms of granularity, communication, and load balancing. Block and Cyclic are special cases of Block-Cyclic –Block = Block-Cyclic with blksz = ceiling(N/P) –Cyclic = Block-Cyclic with blksz = 1
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.