
1 OpenMP Presented by Kyle Eli

2 OpenMP
Open – an open, collaborative specification managed by the OpenMP Architecture Review Board (ARB)
MP – Multi Processing

3 OpenMP is… a specification for shared-memory parallelism in Fortran and C/C++, consisting of:
–Compiler directives
–Library routines
–Environment variables
Usable with Fortran 77, Fortran 90, ANSI C (C89), or ANSI C++ – it does not require Fortran 90 or C++.

4 OpenMP requires…
Platform support
–Many operating systems, including Windows, Solaris, Linux, AIX, HP-UX, IRIX, OS X
–Many CPU architectures, including x86, x86-64, PowerPC, Itanium, PA-RISC, MIPS
Compiler support
–Many commercial compilers from vendors such as Microsoft, Intel, Sun, and IBM
–GCC via GOMP, which should be included in GCC 4.2 and may already be available with some distributions

5 OpenMP offers…
–Consolidation of vendor-specific implementations
–Single-source portability
–Support for coarse-grain parallelism, allowing complex (coarse-grain) code in parallel applications

6 OpenMP offers…
–Scalability: simple constructs with low overhead, though still dependent on the application and algorithm
–Nested parallelism, which may be executed on a single thread
–Loop-level parallelism
–However, no support for task parallelism yet (the ARB is working on it)

7 OpenMP compared to…
Message Passing Interface (MPI)
–OpenMP is not a message passing specification
–Less overhead
High Performance Fortran (HPF)
–Not widely accepted
–Focus on data parallelism

8 OpenMP compared to…
Pthreads
–Not targeted at HPC/scientific computing
–No support for data parallelism
–Requires lower-level programming
FORALL loops
–Simple loops only
–Subroutine calls can't have side effects
Various parallel programming languages
–May be architecture-specific
–May be application-specific

9 The OpenMP Model
Sequential code
–Implemented in the usual way
–Executes normally
Parallel code
–Multiple threads are created; the number of threads can be user-specified
–Each thread executes the code in the parallel region

10 Using OpenMP
Compiler directives
–Begin with #pragma omp
–In C/C++, the code region is defined by curly braces following the directive
–Should be ignored by compilers that don't understand OpenMP
–Define how regions of code should be executed
–Define variable scope
–Specify synchronization

11 Using OpenMP
Parallel region construct
–#pragma omp parallel
–Defines a region of parallel code
–Causes a team of threads to be created
–Each thread in the team executes the code in the region
–Threads join after the region ends
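
For illustration, a minimal parallel region in C might look like the sketch below (not from the original slides; the printed text is arbitrary):

   #include <stdio.h>
   #include <omp.h>

   int main(void)
   {
       /* A team of threads is created here; each thread executes the block. */
       #pragma omp parallel
       {
           printf("Hello from thread %d of %d\n",
                  omp_get_thread_num(), omp_get_num_threads());
       }   /* Threads join at the implicit barrier when the region ends. */
       return 0;
   }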

12 Using OpenMP Work-sharing directives –For –Sections –Single

13 Using OpenMP
For construct
–#pragma omp for
–Loop parallelism: iterations of the loop are divided amongst worker threads
–Workload division can be user-specified
–Branching out of the loop is not allowed
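
A sketch of loop parallelism, using the combined parallel for form (function and variable names are illustrative):

   #include <omp.h>

   /* Scale an array in parallel: loop iterations are divided amongst threads. */
   void scale(double *a, int n, double factor)
   {
       int i;
       #pragma omp parallel for
       for (i = 0; i < n; i++)
           a[i] *= factor;
   }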

14 Using OpenMP
Sections construct
–#pragma omp sections
–Divides code into sections that are distributed amongst worker threads
–#pragma omp section is used to define each section
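
A sketch of the sections construct (function and array names are illustrative); each section is handed to a different worker thread:

   #include <omp.h>

   void init_arrays(double *x, double *y, int n)
   {
       int i;
       #pragma omp parallel sections private(i)
       {
           #pragma omp section
           for (i = 0; i < n; i++)   /* one thread initializes x */
               x[i] = 0.0;

           #pragma omp section
           for (i = 0; i < n; i++)   /* another thread initializes y */
               y[i] = 1.0;
       }
   }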

15 Using OpenMP
Single construct
–#pragma omp single
–Only one thread executes the code; useful when code is not thread-safe
–All other threads wait until execution completes
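
A sketch of the single construct (the setup message is illustrative):

   #include <stdio.h>
   #include <omp.h>

   void demo_single(void)
   {
       #pragma omp parallel
       {
           #pragma omp single
           {
               /* Executed by exactly one thread, e.g. non-thread-safe setup. */
               printf("setup done by a single thread\n");
           }   /* The other threads wait at the implicit barrier here. */
       }
   }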

16 Using OpenMP
Synchronization directives
–Master: #pragma omp master – the code is executed only by the master thread
–Critical: #pragma omp critical – the code is executed by only one thread at a time
–Barrier: #pragma omp barrier – threads wait for all other threads to reach this point before continuing
–Atomic: #pragma omp atomic – the following statement (which must be a simple update of a scalar variable) is performed atomically
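
One possible sketch combining these four directives (the variables count and sum are illustrative):

   #include <stdio.h>
   #include <omp.h>

   void demo_sync(void)
   {
       int count = 0;      /* shared */
       double sum = 0.0;   /* shared */

       #pragma omp parallel
       {
           #pragma omp master
           printf("printed only by the master thread\n");

           #pragma omp barrier      /* wait for every thread before continuing */

           #pragma omp critical
           {
               /* Only one thread at a time executes this block. */
               count += 1;
           }

           #pragma omp atomic       /* the following scalar update is atomic */
           sum += 1.0;
       }
       printf("count = %d, sum = %f\n", count, sum);
   }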

17 Using OpenMP
Synchronization directives
–Flush: #pragma omp flush – thread-visible variables are written back to memory to present a consistent view across all threads
–Ordered: #pragma omp ordered – forces iterations of a loop to be executed in sequential order; used with the For directive
–Threadprivate: #pragma omp threadprivate – causes global variables to be local and persistent to a thread across multiple parallel regions
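
A sketch of ordered and threadprivate (flush is omitted; the counter variable is illustrative):

   #include <stdio.h>
   #include <omp.h>

   int counter;                       /* one private, persistent copy per thread */
   #pragma omp threadprivate(counter)

   void demo_ordered(int n)
   {
       int i;
       #pragma omp parallel for ordered
       for (i = 0; i < n; i++) {
           counter++;                 /* updates this thread's private copy */
           #pragma omp ordered
           printf("iteration %d printed in sequential order\n", i);
       }
   }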

18 Using OpenMP
Data scope
–By default, most variables are shared
–Loop index and subroutine stack variables are private

19 Using OpenMP
Data scoping attributes
–Private: a new object of the same type is created for each thread (not initialized)
–Shared: shared amongst all threads
–Default: allows specification of the default scope (Private, Shared, or None)
–Firstprivate: the private variable is initialized with the value of the original object
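
A sketch of these clauses on a parallel loop (variable names and values are illustrative):

   #include <stdio.h>
   #include <omp.h>

   void demo_scope(int n)
   {
       int i;
       int base = 10;      /* firstprivate: each thread's copy starts at 10 */
       int scratch;        /* private: a new, uninitialized copy per thread */
       int total = 0;      /* shared: a single object seen by all threads   */

       #pragma omp parallel for private(scratch) firstprivate(base) shared(total)
       for (i = 0; i < n; i++) {
           scratch = base + i;
           #pragma omp critical
           total += scratch;
       }
       printf("total = %d\n", total);
   }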

20 Using OpenMP
Data scoping attributes
–Lastprivate: the original object is updated with data from the last section or loop iteration
–Copyin: the variable in each thread is initialized with the data from the original object in the master thread
–Reduction: each thread gets a private copy of the variable, and the reduction clause specifies an operator for combining the private copies into the final result
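
The reduction clause in particular is common enough to warrant a sketch (the dot-product function is illustrative):

   #include <omp.h>

   double dot(const double *a, const double *b, int n)
   {
       int i;
       double sum = 0.0;

       /* Each thread accumulates into a private copy of sum; the private
          copies are combined with '+' into the shared sum at the end. */
       #pragma omp parallel for reduction(+:sum)
       for (i = 0; i < n; i++)
           sum += a[i] * b[i];

       return sum;
   }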

21 Using OpenMP

22 OpenMP Example A short OpenMP example…
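
The example itself is not reproduced in this transcript; a stand-in program that combines the constructs above (a parallel initialization loop and a reduction) might look like:

   #include <stdio.h>
   #include <omp.h>

   #define N 1000000

   int main(void)
   {
       static double a[N];
       double sum = 0.0;
       int i;

       #pragma omp parallel for          /* initialize the array in parallel */
       for (i = 0; i < N; i++)
           a[i] = 0.5 * i;

       #pragma omp parallel for reduction(+:sum)
       for (i = 0; i < N; i++)
           sum += a[i];

       printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
       return 0;
   }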

23 References http://www.openmp.org http://www.llnl.gov/computing/tutorials/openMP/

24 MPI By Chris Van Horn

25 What is MPI? Message Passing Interface – more specifically, a library specification for message passing.

26 Why MPI? What are the advantages of a message passing interface? What could a message passing interface be used for?

27 History of MPI (MPI 1.1) Before MPI, everyone had to implement their own message passing interface. A committee of around 60 people from 40 organizations was formed.

28 MPI 1.1 The standardization process began in April 1992. A preliminary draft was submitted in November 1992, intended mainly to get the ball rolling.

29 MPI 1.1 Continued Subcommittees were formed for the major component areas, with the goal of producing a standard by Fall 1993.

30 MPI Goals
–Design an application programming interface (not necessarily for compilers or a system implementation library).
–Allow efficient communication: avoid memory-to-memory copying, allow overlap of computation and communication, and offload to a communication co-processor where available.
–Allow for implementations that can be used in a heterogeneous environment.
–Allow convenient C and Fortran 77 bindings for the interface.
–Assume a reliable communication interface: the user need not cope with communication failures. Such failures are dealt with by the underlying communication subsystem.

31 MPI Goals Continued
–Define an interface that is not too different from current practice, such as PVM, NX, Express, p4, etc., and provide extensions that allow greater flexibility.
–Define an interface that can be implemented on many vendors' platforms, with no significant changes in the underlying communication and system software.
–Semantics of the interface should be language independent.
–The interface should be designed to allow for thread-safety.

32 MPI 2.0 In March 1995, work began on extensions to MPI 1.1; forward compatibility was preserved.

33 Goals of MPI 2.0
–Further corrections and clarifications for the MPI-1.1 document.
–Additions to MPI-1.1 that do not significantly change its types of functionality (new datatype constructors, language interoperability, etc.).
–Completely new types of functionality (dynamic processes, one-sided communication, parallel I/O, etc.) that are what everyone thinks of as "MPI-2 functionality."
–Bindings for Fortran 90 and C++. The document specifies C++ bindings for both MPI-1 and MPI-2 functions, and extensions to the Fortran 77 binding of MPI-1 and MPI-2 to handle Fortran 90 issues.
–Discussions of areas in which the MPI process and framework seem likely to be useful, but where more discussion and experience are needed before standardization (e.g. 0-copy semantics on shared-memory machines, real-time specifications).

34 How MPI is used An MPI program consists of autonomous processes The processes communicate via calls to MPI communication primitives
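
As a sketch of that structure (the message value is illustrative), a small two-process program in C:

   #include <stdio.h>
   #include "mpi.h"

   int main(int argc, char *argv[])
   {
       int rank, size, token;
       MPI_Status status;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       MPI_Comm_size(MPI_COMM_WORLD, &size);

       if (rank == 0 && size > 1) {
           token = 42;
           MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
       } else if (rank == 1) {
           MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
           printf("rank 1 received %d from rank 0\n", token);
       }

       MPI_Finalize();
       return 0;
   }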

35 Features
–Process management
–One-sided communication
–Collective operations
–I/O

36 What MPI Does Not Do Resource control – the forum was not able to design a portable interface that would be appropriate for the broad spectrum of existing and potential resource and process controllers.

37 Process Management Can be tricky to implement properly. What to watch out for:
–The MPI-2 process model must apply to the vast majority of current parallel environments, including everything from tightly integrated MPPs to heterogeneous networks of workstations.
–MPI must not take over operating system responsibilities. It should instead provide a clean interface between an application and system software.

38 Warnings Continued
–MPI must continue to guarantee communication determinism, i.e., process management must not introduce unavoidable race conditions.
–MPI must not contain features that compromise performance.
–MPI-1 programs must work under MPI-2, i.e., the MPI-1 static process model must be a special case of the MPI-2 dynamic model.

39 How the Issues Were Addressed MPI remains primarily a communication library, and MPI-2 does not change the concept of a communicator.

40 One Sided Communication Functions that establish communication between two sets of MPI processes that do not share a communicator. When would one sided communication be useful?

41 One Sided Communication How are the two sets of processes going to communicate with each other? Need some sort of rendezvous point.
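
In MPI-2 the rendezvous point is a port name; a rough sketch of the two sides (the function names serve and attach are illustrative, and the port name must be passed between the groups out of band):

   #include <stdio.h>
   #include "mpi.h"

   /* Server side: publish a port name and wait for the other group. */
   void serve(void)
   {
       char port_name[MPI_MAX_PORT_NAME];
       MPI_Comm client;

       MPI_Open_port(MPI_INFO_NULL, port_name);
       printf("connect to me at: %s\n", port_name);   /* the rendezvous point */
       MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
       /* client is now an intercommunicator to the other set of processes */
   }

   /* Client side: connect using the port name obtained out of band. */
   void attach(char *port_name)
   {
       MPI_Comm server;
       MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
   }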

42 Collective Operations Intercommunicator collective operations:
All-To-All – all processes contribute to the result; all processes receive the result.
–MPI_Allgather, MPI_Allgatherv
–MPI_Alltoall, MPI_Alltoallv
–MPI_Allreduce, MPI_Reduce_scatter
All-To-One – all processes contribute to the result; one process receives the result.
–MPI_Gather, MPI_Gatherv
–MPI_Reduce

43 Collective Operations One-To-All – one process contributes to the result; all processes receive the result.
–MPI_Bcast
–MPI_Scatter, MPI_Scatterv
Other – collective operations that do not fit into one of the above categories.
–MPI_Scan
–MPI_Barrier
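
A small program exercising one collective from each category (the values being broadcast and summed are illustrative):

   #include <stdio.h>
   #include "mpi.h"

   int main(int argc, char *argv[])
   {
       int rank, value, sum;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       /* One-to-all: rank 0 broadcasts a value to every process. */
       if (rank == 0) value = 7;
       MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

       /* All-to-one: every process contributes, only rank 0 gets the sum. */
       MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
       if (rank == 0) printf("sum of ranks = %d\n", sum);

       /* All-to-all: every process contributes and receives the result. */
       MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

       MPI_Finalize();
       return 0;
   }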

44 I/O Optimizations required for efficiency can only be implemented if the parallel I/O system provides a high-level interface
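
MPI-2's parallel I/O interface (MPI-IO) is one such high-level interface; a rough sketch in which each rank writes its own block of a shared file (the function name write_block and the fixed layout are illustrative):

   #include "mpi.h"

   void write_block(char *filename, int *buf, int count, int rank)
   {
       MPI_File fh;
       MPI_Offset offset = (MPI_Offset)rank * count * sizeof(int);

       MPI_File_open(MPI_COMM_WORLD, filename,
                     MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
       MPI_File_write_at(fh, offset, buf, count, MPI_INT, MPI_STATUS_IGNORE);
       MPI_File_close(&fh);
   }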

45 MPI Implementations There are many different implementations; the most widely used are MPICH (MPI 1.1) and MPICH2 (MPI 2.0), from Argonne National Laboratory.

46 Examples To run the program "ocean" with arguments "-gridfile" and "ocean1.grd" in C:

   char command[] = "ocean";
   char *argv[] = {"-gridfile", "ocean1.grd", NULL};
   MPI_Comm_spawn(command, argv, ...);

To run the program "ocean" with arguments "-gridfile" and "ocean1.grd" and the program "atmos" with argument "atmos.grd" in C:

   char *array_of_commands[2] = {"ocean", "atmos"};
   char **array_of_argv[2];
   char *argv0[] = {"-gridfile", "ocean1.grd", (char *)0};
   char *argv1[] = {"atmos.grd", (char *)0};
   array_of_argv[0] = argv0;
   array_of_argv[1] = argv1;
   MPI_Comm_spawn_multiple(2, array_of_commands, array_of_argv, ...);

47 More Examples

   /* manager */
   #include "mpi.h"

   int main(int argc, char *argv[])
   {
       int world_size, universe_size, *universe_sizep, flag;
       MPI_Comm everyone;            /* intercommunicator */
       char worker_program[100];

       MPI_Init(&argc, &argv);
       MPI_Comm_size(MPI_COMM_WORLD, &world_size);
       if (world_size != 1)
           error("Top heavy with management");

       MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
                    &universe_sizep, &flag);

48 Example Continued

       if (!flag) {
           printf("This MPI does not support UNIVERSE_SIZE. How many\n\
   processes total?");
           scanf("%d", &universe_size);
       } else
           universe_size = *universe_sizep;
       if (universe_size == 1)
           error("No room to start workers");

       /*
        * Now spawn the workers. Note that there is a run-time determination
        * of what type of worker to spawn, and presumably this calculation must
        * be done at run time and cannot be calculated before starting
        * the program. If everything is known when the application is
        * first started, it is generally better to start them all at once
        * in a single MPI_COMM_WORLD.
        */
       choose_worker_program(worker_program);
       MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, universe_size-1,
                      MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
                      MPI_ERRCODES_IGNORE);

       /*
        * Parallel code here. The communicator "everyone" can be used
        * to communicate with the spawned processes, which have ranks 0,..
        * MPI_UNIVERSE_SIZE-1 in the remote group of the intercommunicator
        * "everyone".
        */

       MPI_Finalize();
       return 0;
   }

49 Yet More Example

   /* worker */
   #include "mpi.h"

   int main(int argc, char *argv[])
   {
       int size;
       MPI_Comm parent;

       MPI_Init(&argc, &argv);
       MPI_Comm_get_parent(&parent);
       if (parent == MPI_COMM_NULL)
           error("No parent!");
       MPI_Comm_remote_size(parent, &size);
       if (size != 1)
           error("Something's wrong with the parent");

       /*
        * Parallel code here.
        * The manager is represented as the process with rank 0 in (the remote
        * group of) MPI_COMM_PARENT. If the workers need to communicate among
        * themselves, they can use MPI_COMM_WORLD.
        */

       MPI_Finalize();
       return 0;
   }

50 References MPI Standards (http://www-unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-2.0/mpi2-report.htm)

