OpenMP Presented by Kyle Eli

Presentation transcript:

OpenMP Presented by Kyle Eli

OpenMP
Open
  – Open, collaborative specification
  – Managed by the OpenMP Architecture Review Board (ARB)
MP
  – Multi Processing

OpenMP is…
A specification for using shared memory parallelism in Fortran and C/C++.
  – Compiler directives
  – Library routines
  – Environment variables
Usable with
  – Fortran 77, Fortran 90, ANSI C (C89), or ANSI C++
  – Does not require Fortran 90 or C++

OpenMP requires…
Platform support
  – Many operating systems, including Windows, Solaris, Linux, AIX, HP-UX, IRIX, and OS X
  – Many CPU architectures, including x86, x86-64, PowerPC, Itanium, PA-RISC, and MIPS
Compiler support
  – Many commercial compilers from vendors such as Microsoft, Intel, Sun, and IBM
  – GCC via GOMP
      Should be included in GCC 4.2
      May already be available with some distributions

OpenMP offers…
Consolidation of vendor-specific implementations
Single-source portability
Support for coarse grain parallelism
  – Allows for complex (coarse grain) code in parallel applications

OpenMP offers…
Scalability
  – Simple constructs with low overhead
  – However, still dependent on the application and algorithm
Nested parallelism
  – A nested parallel region may be executed by a single thread
Loop-level parallelism
However, no support for general task parallelism in the version presented here
  – They're working on it (the task construct was added later, in OpenMP 3.0)

OpenMP compared to…
Message Passing Interface (MPI)
  – OpenMP is not a message passing specification
  – Less overhead
High Performance Fortran (HPF)
  – Not widely accepted
  – Focus on data parallelism

OpenMP compared to…
Pthreads
  – Not targeted for HPC/scientific computing
  – No support for data parallelism
  – Requires lower-level programming
FORALL loops
  – Simple loops
  – Subroutine calls can't have side effects
Various parallel programming languages
  – May be architecture specific
  – May be application specific

The OpenMP Model
Sequential code
  – Implemented in the usual way
  – Executes normally
Parallel code
  – Multiple threads created
      Number of threads can be user-specified
  – Each thread executes the code in the parallel region

Using OpenMP
Compiler directives
  – Begin with #pragma omp
  – In C/C++, the code region is defined by curly braces following the directive
  – Should be ignored by compilers that don't understand OpenMP
  – Define how regions of code should be executed
  – Define variable scope
  – Synchronization
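
The points above (directives that a non-OpenMP compiler skips, plus library routines) can be sketched in C as follows. This is a minimal illustration rather than part of the original slides; the function names `available_threads` and `greet_from_team` are made up for the example. The `_OPENMP` macro guard lets the same source build with or without OpenMP enabled.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* OpenMP library routines; only present when OpenMP is enabled */
#endif

/* Hypothetical helper: upper bound on the team size of a parallel region,
   or 1 when the file is compiled without OpenMP support. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Hypothetical helper: prints one greeting per thread and returns the
   number of threads that took part. */
int greet_from_team(void)
{
    int team_size = 1;

    #pragma omp parallel          /* ignored by a compiler without OpenMP */
    {
        #pragma omp critical      /* one thread at a time prints and writes */
        {
#ifdef _OPENMP
            printf("hello from thread %d\n", omp_get_thread_num());
            team_size = omp_get_num_threads();
#else
            printf("hello from the only thread\n");
#endif
        }
    }
    return team_size;
}
```

With GCC, OpenMP is enabled by the `-fopenmp` flag, and the `OMP_NUM_THREADS` environment variable sets the default team size at run time.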

Using OpenMP
Parallel region construct
  – #pragma omp parallel
  – Defines a region of parallel code
  – Causes a team of threads to be created
  – Each thread in the team executes the same code in the region
  – Threads join at the end of the region (implicit barrier), after which only the master thread continues
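
A small sketch of the parallel region construct, assuming a hypothetical helper named `visit_all_threads`: every thread of the team runs the region body once, and the code after the region runs only after all threads have joined. The `num_threads` clause is one way to request a team size; the runtime may provide fewer threads.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: runs a parallel region with up to `requested`
   threads (requested must be >= 1). Each thread records its visit in
   visits[thread id]; `visits` must have room for `requested` ints.
   Returns the actual team size. */
int visit_all_threads(int requested, int *visits)
{
    int team_size = 1;

    for (int i = 0; i < requested; i++)
        visits[i] = 0;

    #pragma omp parallel num_threads(requested)
    {
        int id = 0;                       /* declared in the region: private */
#ifdef _OPENMP
        id = omp_get_thread_num();
        #pragma omp single                /* one thread records the team size */
        team_size = omp_get_num_threads();
#endif
        visits[id] += 1;                  /* each thread touches only its own slot */
    }
    /* Implicit barrier: every thread has finished the region here. */
    return team_size;
}
```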

Using OpenMP
Work-sharing directives
  – For
  – Sections
  – Single

Using OpenMP
For construct
  – #pragma omp for
  – Loop parallelism
      Iterations of the loop are divided amongst worker threads
  – Workload division can be user-specified
Branching out of the loop is not allowed
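
As an illustration of the For construct, the sketch below uses the combined `parallel for` form and the `schedule` clause to choose how iterations are divided. The function names are invented for this example. Static scheduling suits loops whose iterations cost about the same; dynamic scheduling hands out chunks on demand, which helps when the cost per iteration varies.

```c
/* Hypothetical example: multiplies every element of x by `factor`.
   Iterations are independent, so they can be split among threads. */
void scale(double *x, int n, double factor)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        x[i] *= factor;
    }
}

/* Hypothetical example with uneven work per iteration: steps[i] receives
   the number of Collatz steps needed to bring i + 1 down to 1. */
void collatz_steps(int *steps, int n)
{
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < n; i++) {
        unsigned long v = (unsigned long)i + 1;
        int count = 0;
        while (v != 1) {
            v = (v % 2 == 0) ? v / 2 : 3 * v + 1;
            count++;
        }
        steps[i] = count;   /* each iteration writes its own element */
    }
}
```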

Using OpenMP
Sections construct
  – #pragma omp sections
  – Divides code into sections which are divided amongst worker threads
      #pragma omp section
  – Used to define each section
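
A minimal sketch of the Sections construct, assuming a hypothetical function `min_and_max`: two independent pieces of work are each placed in their own section, so different threads may run them at the same time.

```c
/* Hypothetical example: finds the minimum and maximum of a[0..n-1]
   (n must be >= 1). Each section writes a different variable, so the
   two sections do not interfere with each other. */
void min_and_max(const int *a, int n, int *min_out, int *max_out)
{
    int lo = a[0];
    int hi = a[0];

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 1; i < n; i++)
                if (a[i] < lo) lo = a[i];
        }
        #pragma omp section
        {
            for (int i = 1; i < n; i++)
                if (a[i] > hi) hi = a[i];
        }
    }

    *min_out = lo;
    *max_out = hi;
}
```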

Using OpenMP
Single construct
  – #pragma omp single
  – Only one thread executes the code
      Useful when code is not thread-safe
      All other threads wait until execution completes (unless nowait is specified)
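
The Single construct might be used as in the sketch below, where a hypothetical function `fill_from_seed` lets exactly one thread pick a value and print it, while the rest of the team waits at the implicit barrier before using that value.

```c
#include <stdio.h>

/* Hypothetical example: one thread chooses the seed and logs it; then
   all threads share the loop that fills data[0..n-1].
   Returns how many times the single block ran. */
int fill_from_seed(int *data, int n)
{
    int seed = 0;
    int announcements = 0;

    #pragma omp parallel
    {
        #pragma omp single
        {
            seed = 42;
            announcements++;
            printf("seed chosen once: %d\n", seed);
        }
        /* Implicit barrier at the end of single: seed is now set for all. */

        #pragma omp for
        for (int i = 0; i < n; i++)
            data[i] = seed + i;
    }
    return announcements;
}
```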

Using OpenMP
Synchronization directives
  – Master
      #pragma omp master
      Code is executed only by the master thread; other threads skip it without waiting
  – Critical
      #pragma omp critical
      Code is executed by only one thread at a time
  – Barrier
      #pragma omp barrier
      Threads will wait for all other threads to reach this point before continuing
  – Atomic
      #pragma omp atomic
      The following statement, which must be a simple update such as x += expr or x++, updates its memory location atomically
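
The four directives above can be combined in one small sketch. The function `tally` is hypothetical: it counts even values with an atomic update, tracks the largest value inside a critical section, waits at an explicit barrier, and lets the master thread publish the results.

```c
/* Hypothetical example: counts the even entries of values[0..n-1] and
   finds the largest entry (n must be >= 1). */
void tally(const int *values, int n, int *count_even, int *largest)
{
    int evens = 0;
    int biggest = values[0];

    #pragma omp parallel
    {
        #pragma omp for nowait            /* no barrier at the end of this loop */
        for (int i = 0; i < n; i++) {
            if (values[i] % 2 == 0) {
                #pragma omp atomic        /* single update, done atomically */
                evens++;
            }
            #pragma omp critical          /* compare and store as one unit */
            {
                if (values[i] > biggest)
                    biggest = values[i];
            }
        }

        #pragma omp barrier               /* wait until every thread is done */

        #pragma omp master                /* only the master thread publishes */
        {
            *count_even = evens;
            *largest = biggest;
        }
    }
}
```

Atomic is the lighter choice when the protected code is a single update; critical is needed when several statements must run as one unit.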

Using OpenMP
Synchronization directives
  – Flush
      #pragma omp flush
      Thread-visible variables are written back to memory to present a consistent view across all threads
  – Ordered
      #pragma omp ordered
      Forces the enclosed code to be executed in sequential iteration order
      Used inside a loop whose For directive carries the ordered clause
  – Threadprivate
      #pragma omp threadprivate
      Causes global variables to be local and persistent to a thread across multiple parallel regions
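
A sketch of Ordered and Threadprivate together, using invented names. The squares may be computed concurrently, but the ordered block appends them in iteration order. The file-scope counter is threadprivate, so each thread keeps its own running count.

```c
/* Hypothetical per-thread counter: every thread has its own copy. */
static int calls_on_this_thread = 0;
#pragma omp threadprivate(calls_on_this_thread)

/* Hypothetical example: writes 0*0, 1*1, ... into out[0..n-1] in
   iteration order and returns how many values were written. */
int squares_in_order(int *out, int n)
{
    int next = 0;

    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; i++) {
        int sq = i * i;               /* this part may run concurrently */
        calls_on_this_thread++;       /* private to the executing thread */

        #pragma omp ordered
        {
            out[next++] = sq;         /* executed in iteration order */
        }
    }
    return next;
}

/* Hypothetical accessor: count seen by whichever thread calls it. */
int calls_seen_by_this_thread(void)
{
    return calls_on_this_thread;
}
```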

Using OpenMP
Data scope
  – By default, most variables are shared
      Loop index and subroutine stack variables are private

Using OpenMP
Data scoping attributes
  – Private
      New object of the same type is created for each thread
      Not initialized
  – Shared
      Shared amongst all threads
  – Default
      Allows specification of the default scope (Private, Shared, or None in Fortran; C/C++ at the time of this presentation accepts only Shared or None)
  – Firstprivate
      Like Private, but each thread's copy is initialized with the value from the original object
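
The scoping clauses might look like this in practice. The function `scoping_demo` is a made-up example; `default(none)` forces every variable used in the construct to be given an explicit scope.

```c
/* Hypothetical example: fills doubled[i] = 2 * i and
   shifted[i] = 100 + i for i in 0..n-1. */
void scoping_demo(int n, int *doubled, int *shifted)
{
    int offset = 100;   /* firstprivate: each thread's copy starts at 100 */
    int scratch = -1;   /* private: each thread's copy starts uninitialized */

    #pragma omp parallel for default(none) \
            shared(n, doubled, shifted) firstprivate(offset) private(scratch)
    for (int i = 0; i < n; i++) {
        scratch = i * 2;            /* must be assigned before it is read */
        doubled[i] = scratch;
        shifted[i] = offset + i;    /* reads the initialized private copy */
    }
}
```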

Using OpenMP
Data scoping attributes
  – Lastprivate
      Original object is updated with data from the last section or the last loop iteration
  – Copyin
      A threadprivate variable in each thread is initialized with the data from the original object in the master thread
  – Reduction
      Each thread gets a private copy of the variable, and the reduction clause allows specification of an operator for combining the private copies into the final result
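
Reduction and Lastprivate can be sketched as below; both function names are invented for the illustration.

```c
/* Hypothetical example: dot product of a and b. Each thread sums into
   its own private copy of `sum`; the copies are combined with + when
   the loop ends. */
double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}

/* Hypothetical example: returns (n-1)*(n-1) for n >= 1. After the loop,
   `sq` holds the value assigned by the sequentially last iteration. */
int last_square(int n)
{
    int sq = 0;

    #pragma omp parallel for lastprivate(sq)
    for (int i = 0; i < n; i++)
        sq = i * i;

    return sq;
}
```

With floating-point data the order in which the private copies are combined can change the rounding of the result, so a parallel sum may differ slightly from the serial one.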

OpenMP Example A short OpenMP example…

MPI By Chris Van Horn

What is MPI?
Message Passing Interface
More specifically, a library specification for a message passing interface

Why MPI?
What are the advantages of a message passing interface?
What could a message passing interface be used for?

History of MPI
MPI 1.1
Before MPI, everyone had to implement their own message passing interface
A committee was formed of around 60 people from 40 organizations

MPI 1.1
The standardization process began in April 1992
Preliminary draft submitted November 1992
Just meant to get the ball rolling

MPI 1.1 Continued
Subcommittees were formed for the major component areas
Goal to produce a standard by Fall 1993

MPI Goals
Design an application programming interface (not necessarily for compilers or a system implementation library).
Allow efficient communication: avoid memory-to-memory copying, allow overlap of computation and communication, and allow offload to a communication co-processor, where available.
Allow for implementations that can be used in a heterogeneous environment.
Allow convenient C and Fortran 77 bindings for the interface.
Assume a reliable communication interface: the user need not cope with communication failures. Such failures are dealt with by the underlying communication subsystem.

MPI Goals Continued
Define an interface that is not too different from current practice, such as PVM, NX, Express, p4, etc., and provides extensions that allow greater flexibility.
Define an interface that can be implemented on many vendors' platforms, with no significant changes in the underlying communication and system software.
Semantics of the interface should be language independent.
The interface should be designed to allow for thread safety.

MPI 2.0
In March 1995 work began on extensions to MPI 1.1
Forward compatibility was preserved: a valid MPI 1.1 program is also a valid MPI 2.0 program

Goals of MPI 2.0
Further corrections and clarifications for the MPI-1.1 document.
Additions to MPI-1.1 that do not significantly change its types of functionality (new datatype constructors, language interoperability, etc.).
Completely new types of functionality (dynamic processes, one-sided communication, parallel I/O, etc.) that are what everyone thinks of as "MPI-2 functionality."
Bindings for Fortran 90 and C++. This document specifies C++ bindings for both MPI-1 and MPI-2 functions, and extensions to the Fortran 77 binding of MPI-1 and MPI-2 to handle Fortran 90 issues.
Discussions of areas in which the MPI process and framework seem likely to be useful, but where more discussion and experience are needed before standardization (e.g. 0-copy semantics on shared-memory machines, real-time specifications).

How MPI is used
An MPI program consists of autonomous processes
The processes communicate via calls to MPI communication primitives

Features
Process Management
One Sided Communication
Collective Operations
I/O

What MPI Does Not Do
Resource control
The MPI Forum was not able to design a portable interface that would be appropriate for the broad spectrum of existing and potential resource and process controllers.

Process Management
Can be tricky to implement properly
What to watch out for:
The MPI-2 process model must apply to the vast majority of current parallel environments. These include everything from tightly integrated MPPs to heterogeneous networks of workstations.
MPI must not take over operating system responsibilities. It should instead provide a clean interface between an application and system software.

Warnings continued
MPI must continue to guarantee communication determinism, i.e., process management must not introduce unavoidable race conditions.
MPI must not contain features that compromise performance.
MPI-1 programs must work under MPI-2, i.e., the MPI-1 static process model must be a special case of the MPI-2 dynamic model.

How Issues Are Addressed
MPI remains primarily a communication library.
MPI does not change the concept of communicator.

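One Sided Communication
Remote memory access: one process reads or writes memory that another process has exposed, without a matching receive call on the target (MPI_Put, MPI_Get, MPI_Accumulate).
When would one sided communication be useful?

Establishing Communication
MPI-2 also provides functions that establish communication between two sets of MPI processes that do not share a communicator.
How are the two sets of processes going to communicate with each other?
Need some sort of rendezvous point (a port opened with MPI_Open_port, then MPI_Comm_accept on one side and MPI_Comm_connect on the other).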

Collective Operations
Intercommunicator collective operations
All-To-All
All processes contribute to the result. All processes receive the result.
  * MPI_Allgather, MPI_Allgatherv
  * MPI_Alltoall, MPI_Alltoallv
  * MPI_Allreduce, MPI_Reduce_scatter
All-To-One
All processes contribute to the result. One process receives the result.
  * MPI_Gather, MPI_Gatherv
  * MPI_Reduce

Collective Operations
One-To-All
One process contributes to the result. All processes receive the result.
  * MPI_Bcast
  * MPI_Scatter, MPI_Scatterv
Other
Collective operations that do not fit into one of the above categories.
  * MPI_Scan
  * MPI_Barrier

I/O
Optimizations required for efficiency can only be implemented if the parallel I/O system provides a high-level interface

MPI Implementations
Many different implementations exist
Most widely used: MPICH (MPI 1.1) and MPICH2 (MPI 2.0), from Argonne National Laboratory

Examples
To run the program "ocean" with arguments "-gridfile" and "ocean1.grd" in C:

    char command[] = "ocean";
    char *argv[] = {"-gridfile", "ocean1.grd", NULL};
    MPI_Comm_spawn(command, argv,...);

To run the program "ocean" with arguments "-gridfile" and "ocean1.grd" and the program "atmos" with argument "atmos.grd" in C:

    char *array_of_commands[2] = {"ocean", "atmos"};
    char **array_of_argv[2];
    char *argv0[] = {"-gridfile", "ocean1.grd", (char *)0};
    char *argv1[] = {"atmos.grd", (char *)0};
    array_of_argv[0] = argv0;
    array_of_argv[1] = argv1;
    MPI_Comm_spawn_multiple(2, array_of_commands, array_of_argv,...);

More Examples

    /* manager */
    #include <stdio.h>   /* printf, scanf */
    #include "mpi.h"

    /* error() and choose_worker_program() are application-supplied
       helpers that this example does not define. */

    int main(int argc, char *argv[])
    {
        int world_size, universe_size, *universe_sizep, flag;
        MPI_Comm everyone;           /* intercommunicator */
        char worker_program[100];

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);

        if (world_size != 1) error("Top heavy with management");

        MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
                     &universe_sizep, &flag);

Example Continued

        if (!flag) {
            printf("This MPI does not support UNIVERSE_SIZE. How many\n\
    processes total?");
            scanf("%d", &universe_size);
        } else universe_size = *universe_sizep;
        if (universe_size == 1) error("No room to start workers");

        /*
         * Now spawn the workers. Note that there is a run-time determination
         * of what type of worker to spawn, and presumably this calculation must
         * be done at run time and cannot be calculated before starting
         * the program. If everything is known when the application is
         * first started, it is generally better to start them all at once
         * in a single MPI_COMM_WORLD.
         */

        choose_worker_program(worker_program);
        MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, universe_size-1,
                       MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
                       MPI_ERRCODES_IGNORE);
        /*
         * Parallel code here. The communicator "everyone" can be used
         * to communicate with the spawned processes, which have ranks 0,..
         * MPI_UNIVERSE_SIZE-1 in the remote group of the intercommunicator
         * "everyone".
         */

        MPI_Finalize();
        return 0;
    }

Yet More Examples

    /* worker */
    #include "mpi.h"

    /* error() is an application-supplied helper that this example
       does not define. */

    int main(int argc, char *argv[])
    {
        int size;
        MPI_Comm parent;

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&parent);
        if (parent == MPI_COMM_NULL) error("No parent!");
        MPI_Comm_remote_size(parent, &size);
        if (size != 1) error("Something's wrong with the parent");

        /*
         * Parallel code here.
         * The manager is represented as the process with rank 0 in (the remote
         * group of) MPI_COMM_PARENT. If the workers need to communicate among
         * themselves, they can use MPI_COMM_WORLD.
         */

        MPI_Finalize();
        return 0;
    }

References
MPI Standards (unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-2.0/mpi2-report.htm)