
1 csinparallel.org Patternlets: A Teaching Tool for Introducing Students to Parallel Design Patterns Joel C. Adams Department of Computer Science Calvin College

2 Overview
Parallel and distributed computing (PDC) can be overwhelming to those new to the area.
Parallel design patterns provide a stable intellectual framework for organizing and understanding PDC concepts.
Patternlets are a teaching tool for introducing students to parallel design patterns.

3 PDC Can Be Overwhelming: Hardware
Shared-memory (multicore) multiprocessors
– Tasks communicate via the shared memory
– Today’s desktops, laptops, tablets, phones, …
– Accelerators (GPUs, co-processors) are a special case
Distributed-memory multiprocessors
– Tasks communicate via message passing
– Older Beowulf clusters and supercomputers
Heterogeneous multiprocessors
– Newer Beowulf clusters and supercomputers
– Distributed-memory systems w/ shared-memory nodes

4 PDC Can Be Overwhelming: Software
Shared-memory communication systems
– Thread languages (Java, C++11, …)
– Thread libraries (POSIX threads, OpenMP, …)
– GPU libraries (CUDA, OpenCL, OpenACC, …)
Message-passing communication systems
– Languages (Scala, Erlang, …)
– Libraries (MPI)
MPI+X for heterogeneous systems
– MPI+OpenMP
– MPI+OpenCL
– MPI+CUDA
– …

5 Some Good News
PDC has existed since the 1980s.
Over this interval, parallel practitioners have identified industry-standard best practices for solving frequently occurring problems.
Researchers have organized these practices into collections called parallel design patterns:
– Johnson, Chen, et al. (UIUC group)
– Keutzer, Mattson, et al. (OPL / Berkeley group)
– Textbooks: Structured Parallel Programming (McCool, Robison, Reinders); Multicore and GPU Programming (Barlas)

6 Parallel Design Patterns
Are the collective wisdom of 30+ years of experience by parallel practitioners:
– Originated in industry practice, not academia
– Consistently useful, high-value practices
– Not ephemeral
Have been organized into hierarchies:
– Provide a stable intellectual framework within which we can understand current and future parallel and distributed computing technologies

7 Parallel Pattern Hierarchy
Hierarchical classification (from higher level to lower level):
– Computational Patterns: N-body, Monte Carlo, Graph Algorithms, Linear Algebra, MapReduce, …
– Algorithmic Strategies: Data Decomposition, Task Decomposition, Pipeline, Pure Data Parallel, …
– Implementation Strategies: SPMD, Fork-Join, Parallel Loop, Master-Worker, Actors, …
– Concurrent Execution Patterns: Message-Passing, Reduction, Scatter, Gather, Broadcast, Barrier, Atomic, Mutual Exclusion, …
A typical parallel program uses one or more patterns from each level.

8 Ex: Data Decomposition (1 task)
[Diagram: the entire data set is assigned to Task 0]

9 Ex: Data Decomposition (2 Tasks)
[Diagram: the data set is split into halves, one for Task 0 and one for Task 1]

10 Ex: Data Decomposition (4 Tasks)
[Diagram: the data set is split into quarters, one each for Tasks 0 through 3]

11 Ex: Reduction (8 Tasks)
To sum the local value-results of N parallel tasks:
[Diagram: 8 tasks hold the values 8, 9, 1, 5, 7, 6, 2, 4; partial sums are combined pairwise in a tree, so after log2(8) = 3 time steps Task 0 holds the total, 42]

12 Parallel Thinking
Parallel experts think in terms of parallel design patterns, just as sequential experts think in terms of sequential design patterns.
The more we can do to help our students think in terms of parallel design patterns, the more like experts their thought processes will be!
How can we get our students to think this way?

13 Patternlets…
… are minimalist, scalable, correct programs, each illustrating a particular pattern’s behavior:
– Minimalist, so that students can grasp the concept without non-essential details getting in the way
– Scalable, so that students can see how the behavior changes as the number of tasks is varied
– Syntactically correct, so that students can use them as models for creating their own programs
… are a tool for introducing students to patterns that lets them experiment with those patterns.

14 Example: The Parallel Loop Pattern

15
/* parallelLoopEqualChunks.c (OpenMP) … */
#include <stdio.h>   // printf()
#include <stdlib.h>  // atoi()
#include <omp.h>     // OpenMP

int main(int argc, char** argv) {
    const int REPS = 8;
    printf("\n");
    if (argc > 1) {
        omp_set_num_threads( atoi(argv[1]) );
    }
    #pragma omp parallel for
    for (int i = 0; i < REPS; i++) {
        int id = omp_get_thread_num();
        printf("Thread %d performed iteration %d\n", id, i);
    }
    printf("\n");
    return 0;
}

16 Sample Executions: parallelLoop

$ ./parallelLoopEqualChunks 1        (Thread 0 does all iterations)
Thread 0 performed iteration 0
Thread 0 performed iteration 1
Thread 0 performed iteration 2
Thread 0 performed iteration 3
Thread 0 performed iteration 4
Thread 0 performed iteration 5
Thread 0 performed iteration 6
Thread 0 performed iteration 7

$ ./parallelLoopEqualChunks 2        (Thread 0 does the first half, iterations 0–3; Thread 1 does the second half, iterations 4–7)
Thread 0 performed iteration 0
Thread 1 performed iteration 4
Thread 0 performed iteration 1
Thread 1 performed iteration 5
Thread 0 performed iteration 2
Thread 1 performed iteration 6
Thread 0 performed iteration 3
Thread 1 performed iteration 7

17 Sample Executions: parallelLoop (2)

$ ./parallelLoopEqualChunks 4        (each thread does one quarter of the iteration range: Thread 0 the 1st quarter, Thread 1 the 2nd, Thread 2 the 3rd, Thread 3 the 4th)
Thread 0 performed iteration 0
Thread 0 performed iteration 1
Thread 1 performed iteration 2
Thread 2 performed iteration 4
Thread 3 performed iteration 6
Thread 1 performed iteration 3
Thread 2 performed iteration 5
Thread 3 performed iteration 7

$ ./parallelLoopEqualChunks 8        (each thread does 1/8 of the iteration range)
Thread 4 performed iteration 4
Thread 1 performed iteration 1
Thread 6 performed iteration 6
Thread 2 performed iteration 2
Thread 5 performed iteration 5
Thread 3 performed iteration 3
Thread 0 performed iteration 0
Thread 7 performed iteration 7

18
/* parallelLoopEqualChunks.c (MPI) … */
#include <stdio.h>  // printf()
#include <mpi.h>    // MPI
#include <math.h>   // ceil()

int main(int argc, char** argv) {
    const int REPS = 8;
    int id = -1, numProcesses = -1, chunkSize = -1,
        start = -1, stop = -1, i = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcesses);

    chunkSize = (int) ceil( (double) REPS / numProcesses );
    start = id * chunkSize;
    if ( id < numProcesses - 1 ) {
        stop = (id + 1) * chunkSize;
    } else {
        stop = REPS;
    }

    for (i = start; i < stop; i++) {
        printf("Process %d performed iteration %d\n", id, i);
    }

    MPI_Finalize();
    return 0;
}

19 Sample Executions

$ mpirun -np 1 ./parallelLoopEqualChunks        (Process 0 does all iterations)
Process 0 performed iteration 0
Process 0 performed iteration 1
Process 0 performed iteration 2
Process 0 performed iteration 3
Process 0 performed iteration 4
Process 0 performed iteration 5
Process 0 performed iteration 6
Process 0 performed iteration 7

$ mpirun -np 2 ./parallelLoopEqualChunks        (Process 0 does the first half; Process 1 does the second half)
Process 0 performed iteration 0
Process 0 performed iteration 1
Process 0 performed iteration 2
Process 0 performed iteration 3
Process 1 performed iteration 4
Process 1 performed iteration 5
Process 1 performed iteration 6
Process 1 performed iteration 7

20 Example: The Master-Worker Pattern

21
/* masterWorker.c (OpenMP) … */
#include <stdio.h>
#include <omp.h>

int main(int argc, char** argv) {
    int id = -1, numThreads = -1;

    // #pragma omp parallel
    {
        id = omp_get_thread_num();
        numThreads = omp_get_num_threads();
        if ( id == 0 ) {
            printf("Greetings from the master, #%d of %d threads\n\n",
                   id, numThreads);
        } else {
            printf("Greetings from a worker, #%d of %d threads\n\n",
                   id, numThreads);
        }
    }
    return 0;
}

22 Sample Executions: masterWorker

$ ./masterWorker                     (with #pragma omp parallel disabled)
Greetings from the master, #0 of 1 threads

But if we “uncomment” the #pragma, rebuild, and run:

$ ./masterWorker                     (with #pragma omp parallel enabled)
Greetings from a worker, #1 of 8 threads
Greetings from a worker, #2 of 8 threads
Greetings from a worker, #5 of 8 threads
Greetings from a worker, #3 of 8 threads
Greetings from a worker, #6 of 8 threads
Greetings from the master, #0 of 8 threads
Greetings from a worker, #4 of 8 threads
Greetings from a worker, #7 of 8 threads

23
/* masterWorker.c (MPI) … */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    int id = -1, numProcs = -1, length = -1;
    char hostName[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Get_processor_name(hostName, &length);

    if ( id == 0 ) {
        printf("Greetings from the master, #%d (%s) of %d processes\n",
               id, hostName, numProcs);
    } else {
        printf("Greetings from a worker, #%d (%s) of %d processes\n",
               id, hostName, numProcs);
    }

    MPI_Finalize();
    return 0;
}

24 Sample Executions: masterWorker

$ mpirun -np 1 ./masterWorker
Greetings from the master, #0 (node-01) of 1 processes

$ mpirun -np 8 ./masterWorker
Greetings from the master, #0 (node-01) of 8 processes
Greetings from a worker, #1 (node-02) of 8 processes
Greetings from a worker, #5 (node-06) of 8 processes
Greetings from a worker, #3 (node-04) of 8 processes
Greetings from a worker, #4 (node-05) of 8 processes
Greetings from a worker, #7 (node-08) of 8 processes
Greetings from a worker, #2 (node-03) of 8 processes
Greetings from a worker, #6 (node-07) of 8 processes

25 Example: The Broadcast Pattern

26
/* broadcast.c (MPI) … */
…
#define MAX 8

int main(int argc, char** argv) {
    int array[MAX] = {0};
    int numProcs, myRank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    if (myRank == 0) {
        fill(array, MAX);
    }
    print("array before", myRank, array);

    MPI_Bcast(array, MAX, MPI_INT, 0, MPI_COMM_WORLD);

    print("array after", myRank, array);
    MPI_Finalize();
    return 0;
}

27 Sample Executions: broadcast

$ mpirun -np 2 ./broadcast
process 1 array before: {0, 0, 0, 0, 0, 0, 0, 0}
process 0 array before: {11, 12, 13, 14, 15, 16, 17, 18}
process 0 array after: {11, 12, 13, 14, 15, 16, 17, 18}
process 1 array after: {11, 12, 13, 14, 15, 16, 17, 18}

$ mpirun -np 4 ./broadcast
process 1 array before: {0, 0, 0, 0, 0, 0, 0, 0}
process 0 array before: {11, 12, 13, 14, 15, 16, 17, 18}
process 3 array before: {0, 0, 0, 0, 0, 0, 0, 0}
process 0 array after: {11, 12, 13, 14, 15, 16, 17, 18}
process 2 array before: {0, 0, 0, 0, 0, 0, 0, 0}
process 1 array after: {11, 12, 13, 14, 15, 16, 17, 18}
process 3 array after: {11, 12, 13, 14, 15, 16, 17, 18}
process 2 array after: {11, 12, 13, 14, 15, 16, 17, 18}

28 Example: The Scatter Pattern

29
/* scatter.c (MPI) … */
…
int main(int argc, char** argv) {
    const int SIZE = 8;
    int* arrSend = NULL;
    int* arrRcv = NULL;
    int numProcs = -1, myRank = -1, numSent = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    if (myRank == 0) {
        arrSend = (int*) malloc( SIZE * sizeof(int) );
        for (int i = 0; i < SIZE; i++) {
            arrSend[i] = (i+1) * 11;
        }
        print(myRank, "arrSend", arrSend, SIZE);
    }

    numSent = SIZE / numProcs;
    arrRcv = (int*) malloc( numSent * sizeof(int) );
    MPI_Scatter(arrSend, numSent, MPI_INT, arrRcv, numSent, MPI_INT,
                0, MPI_COMM_WORLD);
    print(myRank, "arrRcv", arrRcv, numSent);

    free(arrSend);
    free(arrRcv);
    MPI_Finalize();
    return 0;
}

30 Sample Executions: scatter

$ mpirun -np 2 ./scatter
Process 0, arrSend: 11 22 33 44 55 66 77 88
Process 0, arrRcv: 11 22 33 44
Process 1, arrRcv: 55 66 77 88

$ mpirun -np 4 ./scatter
Process 0, arrSend: 11 22 33 44 55 66 77 88
Process 0, arrRcv: 11 22
Process 1, arrRcv: 33 44
Process 2, arrRcv: 55 66
Process 3, arrRcv: 77 88

$ mpirun -np 8 ./scatter
Process 0, arrSend: 11 22 33 44 55 66 77 88
Process 0, arrRcv: 11
Process 1, arrRcv: 22
Process 2, arrRcv: 33
Process 3, arrRcv: 44
Process 5, arrRcv: 66
Process 6, arrRcv: 77
Process 7, arrRcv: 88
Process 4, arrRcv: 55

31 Patternlets Collection (so far)
OpenMP: SPMD, Master-Worker, Parallel Loops, Barrier, Fork-Join, Reduction, Private, Atomic, Mutex (Critical), Sections, …
MPI: SPMD, Master-Worker, Parallel Loops, Barrier, Message Passing, Reduction, Broadcast, Scatter, Gather, …
Pthreads: Fork-Join, SPMD

32 Curricular Uses of Patternlets
Instructors have used the patternlets to introduce students to parallel patterns in these courses:
– Intro to Data Structures (1st year)
– System Fundamentals (1st or 2nd year)
– Computer Organization (1st or 2nd year)
– Operating Systems (2nd or 3rd year)
– Parallel Computing (3rd or 4th year)
– High Performance Computing (3rd or 4th year)
Some use them for in-class live-coding demos; others for hands-on closed-lab exercises.

33 Assessment
In Intro to Data Structures (CS2):
– Fall 2012: lectures on parallel topics; no patternlets
– Spring 2013: live-coding patternlet demos instead
Students were much more engaged than previously, asking many “What if you change…” kinds of questions.
On the final exam, we ask 4 “parallel” questions:
– Fall 2012 average (41 students): 2.95 out of 4
– Spring 2013 average (38 students): 3.05 out of 4
The 2.5% improvement is not significant (p = 0.293), but:
– Fall 2012 students were mostly 3rd-year EE majors
– Spring 2013 students were mostly 1st-year CS majors

34 Conclusions
Parallel experts think in parallel patterns:
– Scalable, industry-standard best practices
– We want our students to think this way too
Patternlets are minimalist, scalable, working programs illustrating pattern behaviors:
– Can be used in a variety of courses
– Can be used by an instructor or by the students
Students find patternlets very engaging:
– We believe they can improve student learning

35 Resources
The current collection of patternlets may be freely downloaded from CSinParallel.org.
Thanks to:
– CSinParallel
– The National Science Foundation
– You!

