Arun Kejariwal, Paolo D’Alberto, Alexandru Nicolau, Constantine D. Polychronopoulos: A Geometric Approach for Partitioning.


1 A Geometric Approach for Partitioning N-Dimensional Non-Rectangular Iteration Spaces
Arun Kejariwal (1), Paolo D’Alberto (1), Alexandru Nicolau (1), Constantine D. Polychronopoulos (2)
(1) Center for Embedded Computer Systems, University of California at Irvine
(2) Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign

2 Outline
- Introduction
- Terminology
- Motivation
- Problem statement
  - Uniform Partitioning
  - Processor Allocation
- Our Approach
- Experimental Results
- Conclusion

3 Introduction
- Scientific and numerical applications
- Computation intensive
- Large amounts of parallelism
- Multiprocessor systems
- Exploit parallelism
- Expose high-level loop parallelism
- Loop spreading
- Minimize communication overhead
- Minimize the number of processors

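4 Terminology
    do i = 1, N
      do j = 1, N
        H(i, j)
      enddo
    enddo
[Figure: the iteration space (Γ) of the loop nest, drawn on i-j axes starting at (1,1); each dot is an index point, with (2,5) and (5,5) marked as examples.]
* Notation used in “Loop Transformations for Restructuring Compilers” [Banerjee’93]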

5 Motivating Example
    do i1 = 1, N
      do i2 = 1, i1
        do i3 = 1, N
          H(i1, i2, i3)
        end do
      end do
    end do
[Figure: the three-dimensional iteration space on i_1, i_2, i_3 axes for N = 6.]
- Top view (i_1-i_2 plane): triangular geometry
- Front view (i_1-i_3 plane): rectangular geometry

6 Motivating Example (top view, assume P = 3)
[Figure: two partitions of the triangular i_1-i_2 space, starting at (1,1), into sets S_1, S_2, S_3.]
Contiguous partitioning:
- Load imbalance
Non-contiguous partitioning:
- Perfect load balance
- Multiple loops per set
- Loss of locality
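The trade-off on this slide can be checked by counting index points. The sketch below is only an illustration: the two concrete partitions (equal-width blocks of i_1, and pairing a short row with a long one) are assumptions, since the original figure is not recoverable, and the helper names are hypothetical.

```python
def row_load(i1):
    # Row i1 of the triangular top view holds i2 = 1..i1, i.e. i1 index points.
    return i1

def set_loads(partition):
    # Number of index points assigned to each set of the partition.
    return [sum(row_load(i1) for i1 in rows) for rows in partition]

N = 6
# Assumed contiguous split: equal-width blocks of the outer index i1.
contiguous = [[1, 2], [3, 4], [5, 6]]
# Assumed non-contiguous split: pair a short row with a long row.
non_contiguous = [[1, 6], [2, 5], [3, 4]]

print(set_loads(contiguous))      # [3, 7, 11]  -> load imbalance
print(set_loads(non_contiguous))  # [7, 7, 7]   -> perfect balance, but each set needs two loops
```

Every top-view point carries the same N iterations of i_3, so the ratios between the sets are unchanged in the full three-dimensional space.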

7 Motivating Example (front view, assume P = 3)
[Figure: the rectangular i_1-i_3 space, starting at (1,1), split into three sets.]
Loop permutation-based contiguous partitioning:
- Perfect load balance
- Remapping of index expressions
- Finding a permutation for uniform partitioning is non-trivial
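A brute-force count shows why permuting the loops helps in this example. This is a minimal sketch, assuming the N = 6 nest from the motivating example, equal-width contiguous blocks, and P dividing N; the function name `loads` and the `axis` parameter are hypothetical.

```python
from collections import Counter

def loads(N, P, axis):
    """Index points per set when the loop at position `axis` is cut into P equal blocks."""
    width = N // P  # assumes P divides N
    counts = Counter()
    for i1 in range(1, N + 1):
        for i2 in range(1, i1 + 1):
            for i3 in range(1, N + 1):
                point = (i1, i2, i3)
                counts[(point[axis] - 1) // width] += 1
    return [counts[p] for p in range(P)]

print(loads(6, 3, axis=0))  # [18, 42, 66]: blocks of i1 (triangular direction)
print(loads(6, 3, axis=2))  # [42, 42, 42]: blocks of i3 (rectangular direction)
```

Cutting along i_3 corresponds to making i_3 the outermost loop, which is the permutation the slide refers to.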

8 Motivating Example (top view)
[Figure: the triangular i_1-i_2 space, starting at (1,1), partitioned for P = 4 into sets S_1 to S_4 and for P = 5 into sets S_1 to S_5.]
Processor Allocation during Iteration Space Partitioning

9 Previous Work
- Cyclic Partitioning
  - False sharing
- Balanced Chunk Scheduling [Haghighat92]
  - Restricted to double loops
- Canonical loop partitioning [Sakellariou96]
  - Non-contiguous partitioning
- Communication minimization [Dion96, Koziris97]
These techniques do not address Processor Allocation.

10 Our Model
A perfectly nested DOALL loop:
    do i1 = 1, N, s1
      do i2 = f1(i1), g1(i1), s2
        ...
          do in = f_{n-1}(i1, i2, ..., i_{n-1}), g_{n-1}(i1, i2, ..., i_{n-1}), sn
            LOOP BODY
          enddo
        ...
    enddo
Non-rectangular iteration spaces (affine loop bounds):
    f_r(i_1, i_2, ..., i_{r-1}) = a_{r0} + a_{r1} i_1 + ... + a_{r(r-1)} i_{r-1}
    g_r(i_1, i_2, ..., i_{r-1}) = b_{r0} + b_{r1} i_1 + ... + b_{r(r-1)} i_{r-1}
    f_r ≤ g_r
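To make the model concrete, the following sketch counts the index points of a perfect nest whose bounds are affine in the enclosing indices. The coefficient-list encoding and the names `affine` and `count_points` are assumptions made for illustration; they are not taken from the slides.

```python
def affine(coeffs, idx):
    # coeffs = [c0, c1, ..., ck] encodes c0 + c1*idx[0] + ... + ck*idx[k-1].
    return coeffs[0] + sum(c * i for c, i in zip(coeffs[1:], idx))

def count_points(lower, upper, steps, idx=()):
    """Count index points of a perfect nest; loop d has bounds lower[d], upper[d]."""
    depth = len(idx)
    if depth == len(lower):
        return 1  # reached the loop body: one index point
    lo = affine(lower[depth], idx)
    hi = affine(upper[depth], idx)
    return sum(count_points(lower, upper, steps, idx + (i,))
               for i in range(lo, hi + 1, steps[depth]))

# The motivating nest with N = 6: i1 = 1..6, i2 = 1..i1, i3 = 1..6.
lower = [[1], [1, 0], [1, 0, 0]]
upper = [[6], [0, 1], [6, 0, 0]]
print(count_points(lower, upper, steps=[1, 1, 1]))  # 126
```

The result 126 equals 21 points in the triangular i_1-i_2 view times 6 values of i_3.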

11 Problem Statement
Input:
- N-dimensional iteration space (Γ)
- P processors
Output:
- P partitions with “uniform” load
[Figure: an iteration space on i-j axes from (1,1) to (N,1), cut along the outermost loop into partitions P_1, P_2, ..., P.]

12 Problem Statement
I. Uniform Partitioning
   Given: An iteration space Γ and P processors
   Objective: Find a contiguous partition with uniform load across the processors
II. Processor Allocation
   Given: A partition with minimum execution time
   Objective: Minimize the number of processors for the given partition while maintaining the performance

13 Our Approach: Basic Idea
- Model the iteration space as a convex polytope
- Partition the polytope into sets of equal volumes
- Equal volumes ≡ uniform distribution of index points
- Each set of the partition is mapped to a different processor

14 Our Approach
Step 1: Compute the total volume V of Γ
    do i = 1, N
      do j = 1, i
        do k = 1, j
          LOOP BODY
        enddo
      enddo
    enddo
[Figure: the iteration space on i, j, k axes for N = 7, spanning (1,1,1) to (7,7,7).]
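For this nest the volume can be compared with the exact number of index points. The sketch below is an illustration only: it assumes the continuous model integrates each index from 0 up to its bound, which gives N^3/6, and the function names are hypothetical.

```python
def count_points(N):
    # Exact number of index points of: i = 1..N, j = 1..i, k = 1..j.
    total = 0
    for i in range(1, N + 1):
        for j in range(1, i + 1):
            for k in range(1, j + 1):
                total += 1
    return total

def closed_form(N):
    # Closed form of the same count (tetrahedral number).
    return N * (N + 1) * (N + 2) // 6

def continuous_volume(N):
    # Assumed continuous model: integrate 1 over 0 <= k <= j <= i <= N.
    return N ** 3 / 6

N = 7
print(count_points(N), closed_form(N))  # 84 84
print(continuous_volume(N))
```

The continuous volume underestimates the discrete count for small N, but both grow as N^3/6, so the relative position of equal-volume cuts is similar.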

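15 Our Approach
Step 2: Compute a partial volume V(x) of Γ
[Figure: the iteration space on i, j, k axes from (1,1,1) to (7,7,7), cut by a plane at i = x.]
Step 3: Determine the breakpoints γ_k, for 1 ≤ k ≤ P-1
[Figure: the same space for P = 3, cut at γ_1 and γ_2; each set has equal volume.]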
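Steps 2 and 3 can be sketched for the N = 7, P = 3 example. The code assumes a continuous partial volume V(x) = x^3/6 and the equal-volume condition V(γ_k) = k·V/P, which gives γ_k = N·(k/P)^(1/3); rounding each breakpoint to the nearest integer is also an assumption, not something stated on the slides.

```python
import math

def breakpoints(N, P, dim):
    """Real-valued gamma_k with V(gamma_k) = k*V/P, assuming V(x) proportional to x**dim."""
    return [N * (k / P) ** (1.0 / dim) for k in range(1, P)]

def partial_count(x):
    # Exact index points with i <= x in the nest i = 1..N, j = 1..i, k = 1..j.
    return x * (x + 1) * (x + 2) // 6

N, P = 7, 3
gammas = breakpoints(N, P, dim=3)
cuts = [math.floor(g + 0.5) for g in gammas]  # assumed rounding rule
edges = [0] + cuts + [N]
loads = [partial_count(hi) - partial_count(lo) for lo, hi in zip(edges, edges[1:])]

print(cuts)   # [5, 6]
print(loads)  # [35, 21, 28]
```

With such a small N the integer breakpoints leave the sets only roughly equal; the discretization error shrinks as N grows.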

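16 Our Approach
Step 4: Eliminate void sets
[Figure: the triangular i_1-i_2 space, starting at (1,1), partitioned for P = 5; only sets S_1 to S_4 contain index points.]
Eliminating void sets:
- Minimizes the number of processors
- Size of the largest set remains constant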
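A void set can appear when two real-valued breakpoints round to the same integer. The sketch below reproduces that situation for a triangular space with N = 6 and P = 5; the volume model V(x) proportional to x^2, the rounding rule, and all function names are assumptions made for illustration.

```python
import math

def rounded_breakpoints(N, P):
    # Assumes V(x) proportional to x**2, so gamma_k = N*sqrt(k/P), rounded to nearest.
    return [math.floor(N * math.sqrt(k / P) + 0.5) for k in range(1, P)]

def to_sets(N, cuts):
    # Turn breakpoints into (lb, ub) ranges of the outermost index.
    sets, lb = [], 1
    for ub in cuts + [N]:
        sets.append((lb, ub))
        lb = ub + 1
    return sets

def eliminate_void(sets):
    # A set with lb > ub contains no iterations and needs no processor.
    return [(lb, ub) for lb, ub in sets if lb <= ub]

def load(lb, ub):
    # Row i1 of the triangle holds i1 index points.
    return sum(range(lb, ub + 1))

sets = to_sets(6, rounded_breakpoints(6, 5))
kept = eliminate_void(sets)
print(sets)  # [(1, 3), (4, 4), (5, 5), (6, 5), (6, 6)]
print(kept)  # [(1, 3), (4, 4), (5, 5), (6, 6)]
print(max(load(*s) for s in sets), max(load(*s) for s in kept))  # 6 6
```

Dropping the empty range frees one processor while the largest set, which bounds the parallel execution time, stays at 6 index points.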

17 Our Approach
Step 5: Determine loop bounds
Given the breakpoints γ_k, compute lb_i, ub_i:
    (lb_1, ub_1) = (1, 3)
    (lb_2, ub_2) = (4, 4)
    (lb_3, ub_3) = (5, 5)
    (lb_4, ub_4) = (6, 6)
[Figure: the triangular i_1-i_2 space from (1,1) to (6,1), cut at breakpoints γ_1, γ_2, γ_3 into sets S_1 to S_4.]
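The bounds listed on this slide are consistent with a simple rule: lb_1 = 1, ub_k = γ_k, lb_{k+1} = γ_k + 1, and the last ub equal to N. The sketch below applies that rule, which is an inference from the listed values rather than a formula given in the slides, and checks that the per-set loops cover the triangular space exactly once. The function names are hypothetical.

```python
def loop_bounds(N, gammas):
    # Assumed rule: each set ends at a breakpoint, the next starts one past it.
    lbs = [1] + [g + 1 for g in gammas]
    ubs = list(gammas) + [N]
    return list(zip(lbs, ubs))

def run_set(lb, ub):
    # Iterations executed by one processor: outer loop restricted to [lb, ub].
    return [(i1, i2) for i1 in range(lb, ub + 1) for i2 in range(1, i1 + 1)]

N = 6
bounds = loop_bounds(N, [3, 4, 5])
print(bounds)  # [(1, 3), (4, 4), (5, 5), (6, 6)]

covered = [p for lb, ub in bounds for p in run_set(lb, ub)]
full = run_set(1, N)
print(sorted(covered) == sorted(full))  # True: every index point is run exactly once
```

Only the outermost loop bounds change per processor, so each set remains a single contiguous loop nest.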

18 Results: Setup
- Applications: numerical packages (LINPACK etc.) and literature
- Platform: 4-way shared-memory multiprocessor
- Problem size: N = 1000
Legend:
- VOL: our volume-based approach
- CAN: canonical loop partitioning

19 Results (contd.)
Performance comparison: number of index points in the largest set

Loop nest   P = 2                  P = 4                  P = 8                  P = 16
            VOL        CAN         VOL        CAN         VOL        CAN         VOL        CAN
L1          83368284   83596878    41810044   43807393    21003180   22397760    10738024   11538395
L2          476000     516000      219900     225037      109500     112056      55000      57232
L3          200000     NA          100000     NA          50000      NA          25000      NA
L4          25000      NA          12500      NA          6250       NA          3150       NA

Highlights:
a) Yields better performance
b) A generic approach

20 Conclusions
- Geometric approach for iteration space partitioning
  - Load balancing
  - Processor allocation
- More general than existing techniques
Future Work
- Run-time partitioning


