Using a compiler-directed approach to create MPI code automatically


Using a compiler-directed approach to create MPI code automatically
Paraguin Compiler
ITCS 4145/5145, Parallel Programming
Clayton Ferner / B. Wilkinson, March 10, 2014. ParagionSlides1.ppt

The Paraguin compiler is being developed by Dr. C. Ferner, UNC-Wilmington. The following is based upon his slides (and assumes you have already seen OpenMP).

Paraguin compiler
- A source-to-source compiler built using the Stanford SUIF compiler (suif.stanford.edu).
- Transforms a sequential program into an MPI program suitable for compilation and execution on a distributed-memory system.
- The user can inspect and modify the resulting MPI code.
- Creates an abstraction similar to OpenMP for creating MPI code, using pragma statements.
- Directives are also provided for higher-level patterns (scatter/gather, master-worker, workpool*, stencil, etc.)
* Not yet implemented.

Compiler Directives
- The advantage of using pragmas is that other compilers will ignore them.
- You can provide information to Paraguin that is ignored by other compilers, such as gcc.
- You can create a hybrid program by using pragmas intended for different compilers (a sketch follows below).
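As a hedged illustration of that last point (my own sketch, not taken from the slides): a single source file can carry pragmas for both Paraguin and OpenMP. Paraguin acts on its own directives and leaves the omp pragma to the backend compiler (e.g. mpicc with -fopenmp), while a compiler without Paraguin simply ignores the paraguin pragmas. This anticipates the matrix-addition example later in these slides.

#include <stdio.h>
#define N 8

int main(void)
{
    int i, j;
    double A[N][N], B[N][N], C[N][N];

    /* Input prepared before the parallel region; the scatter below
       distributes it to the MPI processes. */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = i * j;
        }

#pragma paraguin begin_parallel              /* Paraguin: MPI parallel region          */
#pragma paraguin scatter A B                 /* Paraguin: scatter the input matrices   */
#pragma paraguin forall                      /* Paraguin: divide rows among processes  */
    for (i = 0; i < N; i++) {
#pragma omp parallel for private(j)          /* OpenMP: thread the inner loop per node */
        for (j = 0; j < N; j++) {
            C[i][j] = A[i][j] + B[i][j];
        }
    }
    ;
#pragma paraguin gather C                    /* Paraguin: collect the result rows      */
#pragma paraguin end_parallel

    printf("C[1][1] = %f\n", C[1][1]);
    return 0;
}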

Paraguin Directives
Paraguin syntax:
    #pragma paraguin <type> [<parameters>]
A parallel region is specified by:
    #pragma paraguin begin_parallel
    ...
    #pragma paraguin end_parallel
Other directives are placed inside this region, as in OpenMP, but here they apply to MPI processes rather than threads.
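Before the full examples, a minimal skeleton may help fix the shape of a Paraguin program. This is my own sketch, not from the slides; it assumes __guin_rank is Paraguin's predefined variable for the process rank, analogous to the __guin_NP variable used in Example 1 below (both are supplied by Paraguin, so the program only compiles unchanged after being processed by scc).

#include <stdio.h>

int main(void)
{
#pragma paraguin begin_parallel
    /* Code between begin_parallel and end_parallel is executed by the MPI
       processes; __guin_NP and __guin_rank are assumed to be the
       Paraguin-supplied number of processes and this process's rank. */
    printf("Hello from process %d of %d\n", __guin_rank, __guin_NP);
#pragma paraguin end_parallel
    return 0;
}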

Example 1 (Monte Carlo Estimation of Pi)

int main(int argc, char *argv[])
{
    char *usage = "Usage: %s N\n";
    int i, error = 0, count, count_tmp, total = atoi(argv[1]);
    double x, y, result;

#pragma paraguin begin_parallel                 // Parallel region
#pragma paraguin bcast total                    // Broadcast input

    count = 0;                                  // Computation (all processes do this)
    srandom(…);
    for (i = 0; i < total; i++) {
        x = ((double) random()) / RAND_MAX;
        y = ((double) random()) / RAND_MAX;
        if (x*x + y*y <= 1.0) {
            count++;
        }
    }
    ;
#pragma paraguin reduce sum count count_tmp     // Reduce partial results
#pragma paraguin end_parallel                   // End parallel region

    // __guin_NP is the Paraguin variable for the number of processes
    result = 4.0 * (((double) count_tmp) / (__guin_NP * total));
    …

Example 2 (Matrix Addition)

int main(int argc, char *argv[])
{
    int i, j, error = 0;
    double A[N][N], B[N][N], C[N][N];
    char *usage = "Usage: %s file\n";
    …                                           // Read input matrices A and B

#pragma paraguin begin_parallel                 // Parallel region
#pragma paraguin scatter A B                    // Scatter input to all processors

#pragma paraguin forall                         // Parallelize the for loop
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            C[i][j] = A[i][j] + B[i][j];
        }
    }
    ;
#pragma paraguin gather C                       // Gather partial results
#pragma paraguin end_parallel                   // End parallel region

    …                                           // Process results

Scatter/Gather Pattern
- Monte Carlo and Matrix Addition are both examples of the scatter/gather pattern.
- The scatter/gather pattern can also use a broadcast, a reduction, or both (see the sketch below).
- Done as a template:
  1. Master prepares input
  2. Scatter/broadcast input
  3. Compute partial results
  4. Gather/reduce partial results into the final result
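As a sketch of mixing those building blocks (my own example, not from the slides), the program below scatters a matrix, has each process sum its own rows, and then reduces the partial sums into a single total, i.e. a scatter on the way in and a reduction instead of a gather on the way out.

#include <stdio.h>
#define N 16

int main(void)
{
    int i, j;
    double A[N][N], partial, total = 0.0;       /* total is filled by the reduction */

    for (i = 0; i < N; i++)                     /* Master prepares the input        */
        for (j = 0; j < N; j++)
            A[i][j] = 1.0;

#pragma paraguin begin_parallel
#pragma paraguin scatter A                      /* Scatter rows of A                */

    partial = 0.0;
#pragma paraguin forall                         /* Each process sums its own rows   */
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            partial += A[i][j];
        }
    }
    ;
#pragma paraguin reduce sum partial total       /* Reduce partial sums into total   */
#pragma paraguin end_parallel

    printf("Sum of elements = %f\n", total);    /* Expected: N*N = 256 on rank 0    */
    return 0;
}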

Stencil Pattern

Paraguin Stencil Pattern Example (heat distribution)

#define TOTAL_TIME 3000
#define N 200
#define M 200

double computeValue(double A[][M], int i, int j)
{
    return (A[i-1][j] + A[i+1][j] + A[i][j-1] + A[i][j+1]) * 0.25;
}

int main(int argc, char *argv[])
{
    int i, j, n, m, max_iterations, done;
    double A[2][N][M];
    …                                           // Initialize input A

#pragma paraguin begin_parallel
    n = N;                                      // Computation (all processes do this)
    m = M;
    max_iterations = TOTAL_TIME;
    ;
#pragma paraguin stencil A n m max_iterations computeValue   // Stencil pattern
#pragma paraguin end_parallel
    …

Stencil Pragma
The stencil pragma is replaced with code to do the following:
1. The array given as an argument to the stencil pragma is broadcast to all available processors.
2. A loop is created to iterate max_iterations times. Within that loop, code is inserted to perform the following steps:
   a. Each processor (except the last one) sends its last row to the processor with rank one more than its own rank.
   b. Each processor (except the first one) receives that row from the processor with rank one less than its own rank.
   c. Each processor (except the first one) sends its first row to the processor with rank one less than its own rank.
   d. Each processor (except the last one) receives that row from the processor with rank one more than its own rank.
   e. Each processor iterates through the values of the rows for which it is responsible and uses the function provided to compute the next value.
3. The data is gathered back to the root processor (rank 0).
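For readers who want to see what that boundary exchange looks like in plain MPI, here is a rough hand-written sketch of steps (a) through (d) above. It is an illustration of the idea only, not the code Paraguin actually emits; the function and variable names (exchange_ghost_rows, local, local_rows, m) are my own.

#include <mpi.h>

/* One iteration's boundary-row exchange for a block-row decomposition:
   `local` holds this process's local_rows real rows of width m, plus one
   ghost row above (row 0) and one below (row local_rows + 1).             */
void exchange_ghost_rows(double *local, int local_rows, int m,
                         int rank, int np)
{
    int up   = (rank > 0)      ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < np - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send my last real row down; receive my upper ghost row from above. */
    MPI_Sendrecv(&local[local_rows * m],       m, MPI_DOUBLE, down, 0,
                 &local[0],                    m, MPI_DOUBLE, up,   0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Send my first real row up; receive my lower ghost row from below.  */
    MPI_Sendrecv(&local[1 * m],                m, MPI_DOUBLE, up,   1,
                 &local[(local_rows + 1) * m], m, MPI_DOUBLE, down, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

Using MPI_Sendrecv with MPI_PROC_NULL at the ends keeps the exchange deadlock-free and avoids special-casing the first and last processes.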

Compilation and Running

Pipeline: source code with pragmas → Paraguin (scc) → MPI source code → mpicc → executable → run with mpiexec.

scc is the SUIF compiler driver (a source-to-source compiler):

    scc -DPARAGUIN -D__x86_64__ hello.c -.out.c
    mpicc -o hello.out hello.out.c
    mpiexec -n 8 ./hello.out          (in this case, 8 processes)

The separate scc and mpicc steps can be replaced with:

    scc -DPARAGUIN -D__x86_64__ -cc mpicc hello.c -o hello.out

Compile Web Page (avoids needing the scc compiler)

http://babbage.cis.uncw.edu/~cferner/compoptions.html

- Upload your source code; it is compiled remotely.
- Download the resulting MPI source code and compiler log messages (or the actual executable).
- You can then use your own MPI environment to compile and execute.

Questions so far?