
1 " Teaching Parallel Design Patterns to Undergraduates in Computer Science” Panel member SIGCSE 2014 - The 45 th ACM Technical Symposium on Computer Science.




1 1 " Teaching Parallel Design Patterns to Undergraduates in Computer Science” Panel member SIGCSE 2014 - The 45 th ACM Technical Symposium on Computer Science Education Saturday March 8, 2014, 9:00 am - 10:15 am Dr. Clayton Ferner University of North Carolina Wilmington

2 Paraguin Compiler
–Creates an abstraction similar to OpenMP for generating MPI code
–Uses pragma statements
–Allows for easy hybrid compilation
–Source-to-source compiler
–User can inspect and modify the resulting MPI code

3 Example 1 (Monte Carlo Estimation of PI)

int main(int argc, char *argv[]) {
  char *usage = "Usage: %s N\n";
  int i, error = 0, count, count_tmp, total;
  double x, y, result;
  …
  total = atoi(argv[1]);

  #pragma paraguin begin_parallel       // Parallel region
  #pragma paraguin bcast total          // Broadcast input

  count = 0;
  srandom(…);
  for (i = 0; i < total; i++) {
    x = ((double) random()) / RAND_MAX;
    y = ((double) random()) / RAND_MAX;
    if (x*x + y*y <= 1.0) {
      count++;
    }
  }
  ;
  #pragma paraguin reduce sum count count_tmp   // Reduce partial results
  #pragma paraguin end_parallel                 // End parallel region

  result = 4.0 * (((double) count_tmp) / (__guin_NP * total));
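For comparison, here is a minimal hand-written MPI sketch of the same pattern. This is not actual Paraguin output; the per-rank seed and the output format are assumptions, but the broadcast/reduce structure mirrors the pragmas above (each of the np processes runs total trials, so the denominator is np * total).

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
  int rank, np, i, count = 0, count_tmp, total;
  double x, y, result;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &np);

  if (rank == 0) total = atoi(argv[1]);              /* master reads input */
  MPI_Bcast(&total, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* broadcast input */

  srandom(rank + 1);                    /* assumed: distinct stream per rank */
  for (i = 0; i < total; i++) {
    x = ((double) random()) / RAND_MAX;
    y = ((double) random()) / RAND_MAX;
    if (x*x + y*y <= 1.0) count++;
  }

  /* reduce (sum) the partial counts into count_tmp on rank 0 */
  MPI_Reduce(&count, &count_tmp, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

  if (rank == 0) {
    result = 4.0 * (((double) count_tmp) / ((double) np * total));
    printf("Estimation of PI = %f\n", result);
  }
  MPI_Finalize();
  return 0;
}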

4 Example 2 (Matrix Addition)

int main(int argc, char *argv[]) {
  int i, j, error = 0;
  double A[N][N], B[N][N], C[N][N];
  char *usage = "Usage: %s file\n";
  …
  // Read input matrices A and B

  #pragma paraguin begin_parallel       // Parallel region

  // Scatter the input to all processors.
  #pragma paraguin scatter A B          // Scatter input

  // Parallelize the following loop nest, assigning
  // iterations of the outermost loop (i) to different
  // partitions.
  #pragma paraguin forall               // Forall
  for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
      C[i][j] = A[i][j] + B[i][j];
    }
  }
  ;
  #pragma paraguin gather C             // Gather partial results
  #pragma paraguin end_parallel         // End parallel region
  …
  // Process results
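Again for comparison, a hand-written MPI sketch of the scatter/forall/gather structure above, not actual Paraguin output. It assumes a row-block distribution with N divisible by the number of processes; the local buffers are an assumption that keeps the root's send and receive buffers distinct, as MPI requires.

#include <mpi.h>
#include <stdlib.h>

#define N 512                  /* assumed; must be divisible by np here */

double A[N][N], B[N][N], C[N][N];   /* full matrices, used on rank 0 */

int main(int argc, char *argv[]) {
  int rank, np, i, j, rows;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &np);
  rows = N / np;               /* block of consecutive rows per process */

  /* rank 0 reads matrices A and B here ... */

  /* local row blocks (separate buffers avoid aliasing on the root) */
  double (*locA)[N] = malloc(rows * sizeof *locA);
  double (*locB)[N] = malloc(rows * sizeof *locB);
  double (*locC)[N] = malloc(rows * sizeof *locC);

  /* scatter the inputs by row blocks */
  MPI_Scatter(A, rows * N, MPI_DOUBLE, locA, rows * N, MPI_DOUBLE,
              0, MPI_COMM_WORLD);
  MPI_Scatter(B, rows * N, MPI_DOUBLE, locB, rows * N, MPI_DOUBLE,
              0, MPI_COMM_WORLD);

  /* forall: each process adds its own block of rows */
  for (i = 0; i < rows; i++)
    for (j = 0; j < N; j++)
      locC[i][j] = locA[i][j] + locB[i][j];

  /* gather the partial results back into C on rank 0 */
  MPI_Gather(locC, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE,
             0, MPI_COMM_WORLD);

  MPI_Finalize();
  return 0;
}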

5 Compilation and Running

$ scc -D__x86_64__ -cc mpicc montecarlo.c -o montecarlo.out
$ mpirun -np 12 montecarlo.out 10000000
Estimation of PI = 3.141623

[Diagram: Creating a Hybrid Program. Paraguin source code w/ pragmas is compiled via mpicc (gcc w/ OpenMP) into a hybrid executable.]

6 Compile Web Page

http://babbage.cis.uncw.edu/~cferner/compoptions.html
–Upload your source code
–Compile
–Download the resulting MPI source code and compiler log messages

7 Implemented Patterns
–Scatter/Gather
–Stencil

8 Scatter/Gather

Monte Carlo and Matrix Addition are examples of Scatter/Gather. The scatter/gather pattern can also use broadcast, reduction, or both. It is done as a template:
–Master prepares input
–Scatter/Broadcast input
–Compute partial results
–Gather/Reduce partial results into the final result

9 Stencil

#define TOTAL_TIME 3000
#define N 200
#define M 200

double computeValue(double A[][M], int i, int j) {
  return (A[i-1][j] + A[i+1][j] + A[i][j-1] + A[i][j+1]) * 0.25;
}

int main(int argc, char *argv[]) {
  int i, j, n, m, max_iterations, done;
  double A[2][N][M];
  …
  // Initialize input A

  #pragma paraguin begin_parallel
  n = N;
  m = M;
  max_iterations = TOTAL_TIME;
  ;
  #pragma paraguin stencil A n m \
          max_iterations computeValue   // Stencil pattern
  #pragma paraguin end_parallel

10 The Stencil Pragma is Replaced with Code to do the Following:

1. The array given as an argument to the stencil pragma is broadcast to all available processors.
2. A loop is created to iterate max_iterations times. Within that loop, code is inserted to perform the following steps (a sketch of the halo exchange in steps a-d appears after this list):
   a. Each processor (except the last one) sends its last row to the processor with rank one more than its own.
   b. Each processor (except the first one) receives that row from the processor with rank one less than its own.
   c. Each processor (except the first one) sends its first row to the processor with rank one less than its own.
   d. Each processor (except the last one) receives that row from the processor with rank one more than its own.
   e. Each processor iterates through the values of the rows for which it is responsible and uses the provided function to compute the next value.
3. The data is gathered back to the root processor (rank 0).
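The sketch below shows one halo exchange following steps a-d. It is not actual Paraguin output; the buffer layout is an assumption (each process owns rows a[1..rows] of its block, with ghost rows a[0] and a[rows+1] for the neighbors' boundary rows), and M matches the example on the previous slide.

#include <mpi.h>

#define M 200   /* columns, as in the stencil example above */

/* One halo exchange (steps a-d). Assumed layout: owned rows are
 * a[1..rows]; a[0] and a[rows+1] are ghost rows from the neighbors. */
void exchange_halo(double a[][M], int rows, int rank, int np) {
  MPI_Status status;

  /* steps a & b: the last owned row flows down (rank -> rank+1) */
  if (rank < np - 1)
    MPI_Send(a[rows], M, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD);
  if (rank > 0)
    MPI_Recv(a[0], M, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD, &status);

  /* steps c & d: the first owned row flows up (rank -> rank-1) */
  if (rank > 0)
    MPI_Send(a[1], M, MPI_DOUBLE, rank - 1, 1, MPI_COMM_WORLD);
  if (rank < np - 1)
    MPI_Recv(a[rows + 1], M, MPI_DOUBLE, rank + 1, 1, MPI_COMM_WORLD,
             &status);
}

Note that posting blocking sends on every rank before the matching receives can deadlock for large rows; production code would typically use MPI_Sendrecv for each direction instead.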

11 Future Work

12 Acknowledgements

Extension of this work to the teaching environment is supported by the National Science Foundation under the grant "Collaborative Research: Teaching Multicore and Many-Core Programming at a Higher Level of Abstraction," #1141005/1141006 (2012-2015). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

This work was initiated by Jeremy Villalobos in his PhD thesis, "Running Parallel Applications on a Heterogeneous Environment with Accessible Development Practices and Automatic Scalability," UNC-Charlotte, 2011. Jeremy developed the "Seeds" pattern programming software.

