Suzaku Pattern Programming Framework (a) Structure and low level patterns © 2015 B. Wilkinson Suzaku.pptx Modification date February 22, 2016 1.

Slides:



Advertisements
Similar presentations
Introduction to C Programming
Advertisements

C Language.
Practical techniques & Examples
Chapter 7 Introduction to Procedures. So far, all programs written in such way that all subtasks are integrated in one single large program. There is.
Grid Computing, B. Wilkinson, C Program Command Line Arguments A normal C program specifies command line arguments to be passed to main with:
1 ITCS 5/4145 Parallel computing, B. Wilkinson, April 11, CUDAMultiDimBlocks.ppt CUDA Grids, Blocks, and Threads These notes will introduce: One.
AU/MITM/1.6 By Mohammed A. Saleh 1. Arguments passed by reference  Until now, in all the functions we have seen, the arguments passed to the functions.
Reference: / MPI Program Structure.
Embarrassingly Parallel Computations Partitioning and Divide-and-Conquer Strategies Pipelined Computations Synchronous Computations Asynchronous Computations.
ספטמבר 04Copyright Meir Kalech1 C programming Language Chapter 10: Appendices.
Chapter 8 Arrays and Strings
Guide To UNIX Using Linux Third Edition
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Programming Arrays. Question Write a program that reads 3 numbers from the user and print them in ascending order. How many variables do we need to store.
Monte Carlo Simulation Used when it is infeasible or impossible to compute an exact result with a deterministic algorithm Especially useful in –Studying.
Java Unit 9: Arrays Declaring and Processing Arrays.
Exercise problems for students taking the Programming Parallel Computers course. Janusz Kowalik Piotr Arlukowicz Tadeusz Puzniakowski Informatics Institute.
Programming with Shared Memory Introduction to OpenMP
CS470/570 Lecture 5 Introduction to OpenMP Compute Pi example OpenMP directives and options.
chap13 Chapter 13 Programming in the Large.
1 " Teaching Parallel Design Patterns to Undergraduates in Computer Science” Panel member SIGCSE The 45 th ACM Technical Symposium on Computer Science.
© The McGraw-Hill Companies, 2006 Chapter 4 Implementing methods.
Arrays Module 6. Objectives Nature and purpose of an array Using arrays in Java programs Methods with array parameter Methods that return an array Array.
Hybrid MPI and OpenMP Parallel Programming
1 " Teaching Parallel Design Patterns to Undergraduates in Computer Science” Panel member SIGCSE The 45 th ACM Technical Symposium on Computer Science.
Structure of a C program Preprocessor directive (header file) Program statement } Preprocessor directive Global variable declaration Comments Local variable.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Chapter 8: Arrays and Functions Department of Computer Science Foundation Year Program Umm Alqura University, Makkah Computer Programming Skills
Embarrassingly Parallel Computations Partitioning and Divide-and-Conquer Strategies Pipelined Computations Synchronous Computations Asynchronous Computations.
CSCI-455/522 Introduction to High Performance Computing Lecture 4.
Message-Passing Computing Chapter 2. Programming Multicomputer Design special parallel programming language –Occam Extend existing language to handle.
1 BİL 542 Parallel Computing. 2 Message Passing Chapter 2.
Copyright © 2002 W. A. Tucker1 Chapter 9 Lecture Notes Bill Tucker Austin Community College COSC 1315.
 2008 Pearson Education, Inc. All rights reserved. 1 Arrays and Vectors.
UniMAP Sem2-10/11 DKT121: Fundamental of Computer Programming1 Arrays.
Chapter 8 Arrays. A First Book of ANSI C, Fourth Edition2 Introduction Atomic variable: variable whose value cannot be further subdivided into a built-in.
JAVA: An Introduction to Problem Solving & Programming, 5 th Ed. By Walter Savitch and Frank Carrano. ISBN © 2008 Pearson Education, Inc., Upper.
Pattern Programming with the Seeds Framework © 2013 B. Wilkinson/Clayton Ferner SIGCSE 2013 Workshop 31 intro.ppt Modification date: Feb 17,
Arrays Chapter 7. MIS Object Oriented Systems Arrays UTD, SOM 2 Objectives Nature and purpose of an array Using arrays in Java programs Methods.
SEQUENTIAL AND OBJECT ORIENTED PROGRAMMING Arrays.
1 Lecture 4: Part1 Arrays Introduction Arrays  Structures of related data items  Static entity (same size throughout program)
1 Agenda Arrays: Definition Memory Examples Passing arrays to functions Multi dimensional arrays.
KUKUM-06/07 EKT120: Computer Programming 1 Week 6 Arrays-Part 1.
ITCS 4/5145 Parallel Computing, UNC-Charlotte, B
LESSON 8: INTRODUCTION TO ARRAYS. Lesson 8: Introduction To Arrays Objectives: Write programs that handle collections of similar items. Declare array.
1 ITCS4145 Parallel Programming B. Wilkinson March 23, hybrid-abw.ppt Hybrid Parallel Programming Introduction.
The Suzaku Pattern Programming Framework 6th NSF/TCPP Workshop on Parallel and Distributed Computing Education (EduPar-16) 2016 IEEE International Parallel.
BIL 104E Introduction to Scientific and Engineering Computing Lecture 4.
User-Written Functions
The Suzaku Pattern Programming Framework
Pattern Parallel Programming
Introduction to OpenMP
Functions Separate Compilation
Suzaku Pattern Programming Framework Workpool pattern (Version 2)
JavaScript: Functions.
Arrays & Functions Lesson xx
Using compiler-directed approach to create MPI code automatically
Pattern Parallel Programming
ITCS 4/5145 Parallel Computing, UNC-Charlotte, B
EKT150 : Computer Programming
Pattern Programming Tools
© B. Wilkinson/Clayton Ferner SIGCSE 2013 Workshop 31 session2a
Introduction to parallelism and the Message Passing Interface
ITCS 4/5145 Parallel Computing, UNC-Charlotte, B
Monte Carlo Methods A so-called “embarrassingly parallel” computation as it decomposes into obviously independent tasks that can be done in parallel without.
Using compiler-directed approach to create MPI code automatically
Parallel Techniques • Embarrassingly Parallel Computations
Pattern Programming Seeds Framework Workpool Assignment 1
Patterns Paraguin Compiler Version 2.1.
Matrix Addition and Multiplication
Presentation transcript:

Suzaku Pattern Programming Framework (a) Structure and low level patterns © 2015 B. Wilkinson Suzaku.pptx Modification date February 22,

Similar to OpenMP but using processes instead of threads. Computation begins with a single master process (after initialization of the environment). One or more parallel sections created that will use all the processes including the master process. Outside parallel sections only executed by the master process. 2 Suzaku program structure

3 int main (int argc, char **argv ) { int P,... // variables declaration and initialization SZ_Init(P);// initialize message-passing environment // sets P to number of processes... SZ_Parallel_begin // parallel section … SZ_Parallel_end;... SZ_Finalize(); return(0); } All the variables declared here are duplicated in each process. All initializations here will apply to all copies of the variables. After call to SZ_Init() only master process executes code, until a parallel section. After SZ_Parallel_begin, all processes execute code, until a SZ_Parallel_end Only master process executed code here.

4 Routines available in Suzaku divided into: Low-level message passing patterns or routines and Routines that implement high-level patterns

5

Implementation Low level Suzaku routines implemented using macros and exposed in suzaku.h. Macros perform in-line text substitution and are more efficient to replace short simple routines. Macros used here because do not require parameters of a specific data type or size, whereas routines/functions would require parameters of a specific type and size in C. Low-level Suzaku routines do not require programmer to specify data type or size and in some cases accept different data types. Point to point routine accepts characters, integers, doubles, one- dimensional arrays of characters, integers, and doubles, and multi-dimensional arrays of doubles without specification. 6

Point-to-point pattern with various data types #include "suzaku.h" // Suzaku macros int main(int argc, char *argv[]) { char m[20], n[20]; int p, x, y, xx[5], yy[5]; double a, b, aa[10], bb[10], aaa[2][3], bbb[2][3];... SZ_Init(p);// initialize environment, SZ_Parallel_begin// parallel section - from proc 0 to proc 1: SZ_Point_to_point(0, 1, m, n); // send a string SZ_Point_to_point(0, 1, &x, &y); // send an int SZ_Point_to_point(0, 1, &a, &b); // send a double SZ_Point_to_point(0, 1, xx, yy); // send 1-D array of ints SZ_Point_to_point(0, 1, aa, bb); // send 1-D array of doubles SZ_Point_to_point(0, 1, aaa, bbb); // send 2-D array of doubles SZ_Parallel_end;// end of parallel section... SZ_Finalize(); return 0; } 7 Address of an individual variable specified by prefixing argument with & address operator.

Compilation/execution Compile pt-to-pt.c: mpicc -o pt-to-pt pt-to-pt.c where suzaku.h is present in the same directory. Execute pt-to-pt.c: mpiexec -n pt-to-pt where is number of processes you wish to use. 8 As a regular MPI program.

Master-slave pattern 9 Slaves Master Implemented using broadcast, scatter, and gather patterns. For efficient mapping to collective MPI routines, master also acts as one slave. SZ_Parallel_begin SZ_Scatter(a,c); // Scatter data SZ_Broadcast(b); // Broadcast data Compute();// All slaves do this SZ_Gather(b,a);// Gather results SZ_Parallel_end;

10 SZ_Parallel_begin SZ_Scatter(A,A1); // Scatter A array into A1 SZ_Broadcast(B); // broadcast B array blksz = N/P;// note: must be executed by all processes for (i = 0 ; i < blksz; i++) { for (j = 0 ; j < N ; j++) { sum = 0; for(k = 0 ; k < N ; k++) { sum += A1[i][k] * B[k][j]; } C1[i][j] = sum; } SZ_Gather(C1,C);// gather results SZ_Parallel_end; Matrix multiplication with Suzaku routines

11 Higher Level Patterns Workpool (three versions) Iterative synchronous pipeline pattern Generalized patterns (to implement any pattern, stencil. etc.) Higher level patterns implemented as routines in suzaku.c, which needs to be compiled.

Workpool Pattern 12 Master Task from task queue Another task if task queue not empty Aggregate answers Slaves/Workers Result Task queue Very widely applicable pattern Master gives slaves tasks Once a slave return the result for a task, it is given another task from task queue, until empty Creates load- balancing quality. Need to differentiate from master-slave pattern, which does not imply a task queue.

Programming Modeled on the Seeds framework (see later). Workpool Three phases: 1. Master sends data to slaves (in Seeds terminology “diffuses”) 2. Slaves performs computations 3. Master gathers results for slaves 13 Diffuse Compute Gather Slaves Master Message passing done by framework Slaves Master Workpool

Suzaku Interface In first implementation of the workpool, data sent between processes limited to 1-D arrays of doubles. Programmer must implement four routines: init() -- sets values for number of tasks (T), number of data elements in each task (D), and number of data elements in results (R). diffuse() -- generates the next task – a 1-D array of D doubles with an associated task ID - when called by the master. compute() -- takes a task generated by diffuse and generate the corresponding result - a 1-D array of R doubles with an associated task ID. Executed by slaves. gather() -- accepts a slave result and use it to develop the final answer. Executed by master. 14

Workpool code Implemented by the provided routine SZ_Workpool() placed within a parallel section: SZ_Parallel_begin SZ_Workpool(init, diffuse, compute, gather); SZ_Parallel_end; Input function parameters: init(), diffuse(), compute(), and gather() Can be re-named to accommodate for example multiple workpools in a single program. (Not possible in Seeds) 15

#include "suzaku.h" void init(int *T, int *D, int *R) {... } void diffuse(int *taskID,double output[D]) {... } void compute(int taskID, double input[D], double output[R]) {... } void gather(int taskID, double input[R]) {... } int main(int argc, char *argv[]) { int P; SZ_Init(P);... SZ_Parallel_begin SZ_Workpool(init, diffuse, compute, gather); SZ_Parallel_end;... // final results SZ_Finalize(); return 0; } 16 Program structure

Compilation/Execution SZ_Workpool() is in the file suzaku.c. Could be separately compiled into suzaku.o with mpicc –c –o suzaku.o suzuku.c Then compile prog1 with: mpicc –o prog1 prog1.c suzaku.o with suzaku.h and suzaku.o in the same directory. suzaku.c does not use suzaku.h but since SZ_Workpool() needs to be within a parallel section, application code must include suzaku.h. Execute as a regular MPI program: mpiexec –n prog1 17 Add Add to create object file

Sample workpool programs 1. Adding and multiplying numbers // Suzaku Workpool : Adding and multiplying numbers. B. Wilkinson April 3, 2015 #include #include "suzaku.h" #define T 6// number of tasks, max = INT_MAX - 1 #define D 10// number of data items in each task, doubles only #define R 2// number of data items in result of each task, doubles only double workpool_result[T][R];// Final results computed by workpool, in gather() double task[T][D];// Initial data, created by initialize() for testing void init(int *tasks, int *data_items, int *result_items) { *tasks = T; *data_items = D; *result_items = R; return; } void diffuse(int *taskID,double output[D]) { int j;// taskID not used static int temp = 0;// only initialized first time function called for (j = 0; j < D; j++) output[j] = ++temp;// set elements to consecutive data values } 18

void compute(int taskID, double input[D], double output[R]) { // function done by slaves // adding numbers together, and multiply them output[0] = 0; output[1] = 1; int i; for (i = 0; i < D; i++) { output[0] += input[i]; output[1] *= input[i]; } return; } void gather(int taskID, double input[R]) { // function by master collecting slave results, //uses taskID int j; for (j = 0; j < R; j++) { workpool_result[taskID][j] = input[j]; } // additional routines used in this application void initialize() { // create initial data for sequential testing, not used by workpool... } void compute_seq() {// Compute results sequentially and print out... } 19

int main(int argc, char *argv[]) { int i; // All variables declared here are in every process int P;// number of processes, set by SZ_Init(P) SZ_Init(P);// initialize MPI message-passing, sets P to # of processes printf("number of tasks = %d\n",T); printf("number of data items in each task = %d\n",D); printf("number of data items in each result = %d\n",R); initialize();// create initial data for sequential testing, not used by workpool compute_seq();// compute results sequentially and print out SZ_Parallel_begin SZ_Workpool(init,diffuse,compute,gather); SZ_Parallel_end;// end of parallel printf("\nWorkpool results\n"); // print out workpool results for (i = 0; i < T; i++) { printf("%5.2f ", workpool_result[i][0]);// result printf("%5.2e", workpool_result[i][1]);// result printf("\n"); } SZ_Finalize(); return 0; } 20

Sample output 21

Sample output with debug messages There is version of SZ_Workpool that will print out debug messages. 22

Circle formed within a 2 x 2 square. Ratio of area of circle to square: Points within square chosen randomly. Count how many points lie within circle. Fraction of points within circle will be  /4, given sufficient number of randomly selected samples. 2. Calculate  using the Monte Carlo method 23 1 x y Area =  2 2 Typically only one quadrant used. Random pairs of numbers, (xr,yr) generated, each between 0 and 1. Counted as in circle if

24 Slaves Master inside seed Starting seed for random sequence to each slave Return number of random points inside arc of circle Workpool implementation DiffuseData GatherData Compute Aggregate answers

Suzaku Monte Carlo Pi program // Suzaku Workpool: Monte Carlo Pi. B. Wilkinson April 4, 2015 #include #include "suzaku.h" // required Suzaku constants #define T 100// number of tasks, max = INT_MAX - 1 #define D 1// number of data items in each task, doubles only #define R 1// number of data items in result of each task, doubles only #define S // constant used in computation, sample pts done in a slave double total = 0;// // gobal variable, final result void init(int *tasks, int *data_items, int *result_items) { *tasks = T; *data_items = D; *result_items = R; return; } void diffuse(int *taskID,double output[D]) {// taskID not used in computation static int temp = 0;// only initialized first time function called output[0] = ++temp;// set seed to consecutive data value } 25

void compute(int taskID, double input[D], double output[R]) { int i; double x, y; double inside = 0; srand(input[0]);// initialize random number generator for (i = 0; i < S; i++) { x = rand() / (double) RAND_MAX; y = rand() / (double) RAND_MAX; if ( (x * x + y * y) <= 1.0 ) inside++; } output[0] = inside; return; } void gather(int taskID, double input[R]) { total += input[0];// aggregate answer } double get_pi() { // additional routines used in this application double pi; pi = 4 * total / (S*T); printf("\nWorkpool results, Pi = %f\n",pi);// results } 26

int main(int argc, char *argv[]) { int i; // All variables declared here are in every process int P;// number of processes, set by SZ_Init(P) double time1, time2; // use clock for timing SZ_Init(P);// initialize MPI environment, sets P to number of procs printf("number of tasks = %d\n",T); printf("number of samples done in slave per task = %d\n",S); time1 = SZ_Wtime(); // record time stamp SZ_Parallel_begin// start of parallel section SZ_Workpool(init,diffuse,compute,gather); SZ_Parallel_end;// end of parallel time2 = SZ_Wtime(); // record time stamp get_pi();// calculate final result printf(“Exec. time =\t%lf (secs)\n",time2- time1); SZ_Finalize(); return 0; } 27

Sample output 28

Slaves Master C A B Send one row of A and one column of B to slave Return one element of C Workpool implementation With one slave computing one element of result. 29 Matrix Multiplication

3 Suzaku matrix multiplication workpool program // Suzaku Workpool: Matrix Multiplication. B. Wilkinson April 5, 2015 #include...// usual C includes #include "suzaku.h" #define T 100// req. Suzaku constants, T (tasks), D (data in task), R (result elements #define D 6 #define R 1 #define N 3 // size of arrays double A[N][N], B[N][N], C[N][N], Cseq[N][N]; void init(int *tasks, int *data_items, int *result_items) { *tasks = T; *data_items = D; *result_items = R; return; } void diffuse(int taskID,double output[D]) { // same as Seeds sample but inefficient int i, a, b; // copying of arrays, taskID used a = taskID / N; b = taskID % N; for (i = 0; i < N; i++) { //Copy one row of A and one col of B into output output[i] = A[a][i]; output[i+N] = B[i][b]; } return; } 30

void compute(int taskID, double input[D], double output[R]) { int i; output[0] = 0; for (i = 0; i < N; i++) { output[0] += input[i] * input[i+N]; } return; } void gather(int taskID, double input[R]) { int a, b; a = taskID / 3; b = taskID % 3; C[a][b]= input[0]; return; } // additional routines used in this application void initialize() { // initialize arrays... return; } void seq_matrix_mult(double A[N][N], double B[N][N], double C[N][N]) {... return; } void print_array(double array[N][N]) { // print out an array... return; } 31

int main(int argc, char *argv[]) { int i; // All variables declared here are in every process int P;// number of processes, set by SZ_Init(P) double time1, time2; // use clock for timing SZ_Init(P);// initialize MPI environment, sets P initialize();// initialize input arrays...// print out arrays.. // compute sequentially to check results time1 = SZ_Wtime(); // record time stamp SZ_Parallel_begin// start of parallel section SZ_Workpool(init,diffuse,compute,gather); SZ_Parallel_end;// end of parallel time2 = SZ_Wtime(); // record time stamp printf("Workpool results");...// print final result printf(“Time =\t%lf (seconds)\n", time2 - time1); SZ_Finalize(); return 0; } 32

Sample output 33

Questions 34