1 OpenMP—An API for Shared Memory Programming Slides are based on:

2 The case for moving beyond MPI
Generally speaking, MPI is doing well, but most agree "there has to be a better way":
Functional and domain decomposition is manual, which complicates program design
Low-level programming: higher-level parallelism constructs are needed
Steep learning curve and complex application programming: was MPI really meant for writing applications?
Scalability: will MPI scale to next-generation petaflop machines, and how will it deal with the emergence of multi-core systems?
Fault tolerance: the frequency of hardware failures on large clusters is high
The MPI-2 specification arguably needs to clarify the semantics of one-sided communication

3 Lines of code for Matrix Multiplication
[Chart: source-code lines required for matrix multiplication in MPI, UPC, and OpenMP]

4 Introduction to OpenMP
Formulated in 1997 as an API for writing portable, multithreaded applications:
Bindings available for Fortran, C, and C++ (a Java API also exists)
Considered the de facto standard for programming shared-memory parallel computers
Reasons for its popularity:
Provides the ability to incrementally parallelize a serial program
Easy to use: parallelism is expressed through directives, so the underlying serial code stays largely unchanged
Requires a compiler that supports OpenMP; most modern compilers do:
Visual C++, Intel compilers, gcc 4.1 and above, Sun compilers
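As an illustration of incremental parallelization (a minimal sketch, not part of the original slides; the array names and size are hypothetical): a serial loop can often be parallelized by adding a single directive, and a compiler without OpenMP support simply ignores the pragma, leaving the serial program intact.

#include <stdio.h>

#define N 1000

int main(void) {
    static double a[N], b[N];

    /* Initialize the input array (serial code). */
    for (int i = 0; i < N; i++)
        b[i] = (double)i;

    /* The only change needed to parallelize this loop is the pragma;
       without -fopenmp (or equivalent) it is ignored and the code stays serial. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i];

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}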

5 Programming Model
Shared-memory model:
All CPUs have access to the same globally shared memory
Data can be shared or private
Data transfer is transparent to the programmer: no need to explicitly call send() or recv() as in MPI

6 OpenMP HelloWorld Program
#include <omp.h>
#define N 4

int main() {
    int arr[N];
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        arr[i] = 0;
    }
    return 0;
}

7 Compile and Execute "HelloWorld"
Export an environment variable telling how many threads should be started (normally the number of threads equals the number of processors in the system):
# export OMP_NUM_THREADS=4
On Sun Solaris, compile with the Sun C compiler:
# cc -xopenmp -o hello hello.c
(with gcc, the equivalent flag is -fopenmp: gcc -fopenmp -o hello hello.c)
Execute the code:
# ./hello

8 [Diagram: the loop iterations of the HelloWorld program (iteration 0 through iteration 3) distributed among the threads of the parallel region]

9 Thread-based Parallelism
The master thread executes the serial regions of the code
The master thread spawns additional threads to execute parallel regions

10 Fork-join Model
[Diagram: the master thread forks a team of threads at the start of each parallel region; the team joins back into the master thread at the end of the region]
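A minimal sketch of the fork-join behavior (not from the original slides): the master thread runs alone in serial regions, a team of threads executes the parallel region, and execution joins back to the master afterwards.

#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("serial region: only the master thread runs here\n");

    /* Fork: a team of threads executes this block. */
    #pragma omp parallel
    {
        printf("parallel region: hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }   /* Join: implicit barrier, then only the master continues. */

    printf("serial region again: back to the master thread\n");
    return 0;
}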

11 Components of OpenMP
The API comprises three primary components:
Compiler directives (pragmas): creating threads, sharing the work among threads, synchronizing the threads
Runtime library routines: setting and querying thread attributes
Environment variables: controlling the behavior of the parallel program at runtime

12 Pragmas
Pragmas are compiler directives that instruct the compiler to parallelize sections of code:
#pragma omp directive-name [clause [[,] clause] ...]
Where the directive can be:
◦ parallel
◦ for
◦ parallel for
◦ sections / section
◦ single
Clauses are optional modifiers of the directives and affect their behavior
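The parallel and for directives are demonstrated on the following slides; as a small sketch (not from the original slides) of the remaining two, the fragment below uses sections and single inside one parallel region.

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        /* single: the block is executed by exactly one thread of the team. */
        #pragma omp single
        printf("setup done by thread %d\n", omp_get_thread_num());

        /* sections: each section is executed once, by some thread of the team. */
        #pragma omp sections
        {
            #pragma omp section
            printf("section A on thread %d\n", omp_get_thread_num());

            #pragma omp section
            printf("section B on thread %d\n", omp_get_thread_num());
        }
    }
    return 0;
}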

13 parallel for directive
#pragma omp parallel
#pragma omp for
for (int i = 0; i < 100; i++) {
    printf("iter %d\n", i);
}

/* Equivalent combined form: */
#pragma omp parallel for
for (int i = 0; i < 100; i++) {
    printf("iter %d\n", i);
}

14 parallel for directive
#include <stdio.h>
#include <omp.h>

int main(void) {
    int i = 0, iam = 0, np = 1;
    int arr[16];
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    #pragma omp parallel private(iam)   /* start of the parallel region */
    {
        np = omp_get_max_threads();
        #pragma omp for schedule(static, 4)
        for (i = 0; i < 16; i++) {      /* the loop variable i is private automatically */
            iam = omp_get_thread_num();
            arr[i] = iam;
            printf("%d", arr[i]);
        }
    }                                   /* end of the parallel region */
    return 0;
}

15 Work-sharing construct: loop
/* Work is shared: the 100 iterations are divided among the threads. */
#pragma omp parallel
#pragma omp for
for (int i = 0; i < 100; i++)
    printf("hello world\n");

/* No work-sharing directive: every thread executes all 100 iterations. */
#pragma omp parallel
for (int i = 0; i < 100; i++)
    printf("hello world\n");

/* Combined form, equivalent to the first variant. */
#pragma omp parallel for
for (int i = 0; i < 100; i++)
    printf("hello world\n");

16 Work-sharing construct: loop
No jump statements from inside the loop to outside the loop are allowed
If goto or break is used, it must stay within the loop body
Exceptions (in C++) must be caught within the loop

17 The schedule clause
The schedule clause specifies how:
◦ iterations are divided into chunks
◦ chunks are assigned to threads
schedule(static)
◦ iterations are divided among the threads in contiguous chunks
◦ chunks are assigned in a round-robin fashion
schedule(dynamic)
◦ chunks are assigned as threads request them, i.e., as threads become available
We can also specify the chunk size for both static and dynamic schedules (a sketch follows)
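A minimal sketch (not from the original slides) showing explicit chunk sizes with static and dynamic schedules; the owner array and loop bound are hypothetical placeholders used only to record which thread ran each iteration.

#include <stdio.h>
#include <omp.h>

#define N 64

int main(void) {
    int owner[N];   /* records which thread executed each iteration */

    /* static, chunk size 4: iterations are handed out in contiguous chunks of 4,
       assigned to the threads in a round-robin fashion, decided up front. */
    #pragma omp parallel for schedule(static, 4)
    for (int i = 0; i < N; i++)
        owner[i] = omp_get_thread_num();

    /* dynamic, chunk size 4: each thread grabs the next chunk of 4 iterations
       as soon as it finishes its previous chunk. */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < N; i++)
        owner[i] = omp_get_thread_num();

    printf("iteration 0 last run by thread %d\n", owner[0]);
    return 0;
}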

18 Data scope
shared - the variable is shared by all threads
private - each thread has its own private copy of the variable
#pragma omp parallel for shared(A, B, C, n) private(i)
for (i = 0; i < n; i++)
    B[i] = A[i] + C[i];
All threads have access to the same storage area for A, B, C and n, but each thread needs its own private value of the loop index i.

19 Data scope
By default, all variables in a parallel region are shared, with three exceptions (see the sketch below):
the loop index of a parallel for loop
variables that are declared locally inside the parallel block
any variables listed in a private, firstprivate, lastprivate or reduction clause
These exceptions are private.
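A small sketch (not from the original slides) illustrating the three exceptions; the variable names are hypothetical.

#include <omp.h>

#define N 100

int main(void) {
    double a[N];          /* shared by default */
    double scale = 2.0;   /* shared by default (read-only here, so safe) */
    double t;             /* made private explicitly via the clause below */

    #pragma omp parallel for private(t)
    for (int i = 0; i < N; i++) {   /* i: loop index, private automatically */
        double local = scale * i;   /* declared inside the block: private */
        t = local + 1.0;            /* t: private because of private(t) */
        a[i] = t;                   /* a: shared, but each index i is distinct */
    }
    return 0;
}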

20 Data scope
#pragma omp parallel for shared(A, C, n) private(i, temp)
for (i = 0; i < n; i++)
    temp = A[i] + C[i];
In this loop, each thread needs its own private copy of the variable temp. If temp were shared, the result would be unpredictable, since multiple threads would be writing to the same memory location.
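Slide 19 also mentions the reduction clause. As a sketch (not from the original slides): a reduction gives each thread a private partial result and combines the partial results when the loop ends, avoiding the kind of shared-write race described above. The array and its contents are hypothetical.

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    double a[N], sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    /* Each thread accumulates into its own private copy of sum;
       the private copies are added together when the loop finishes. */
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);   /* prints 1000.000000 */
    return 0;
}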

21 Environment variables
OpenMP provides four environment variables for controlling the execution of parallel code
All environment variable names are uppercase
The values assigned to them are not case sensitive

22 Environment variables
OMP_NUM_THREADS
Sets the maximum number of threads to use during execution
export OMP_NUM_THREADS=4
OMP_SCHEDULE
Applies only to loop directives whose schedule clause is set to "runtime"
The value of this variable determines how the loop iterations are scheduled
export OMP_SCHEDULE="dynamic"
(a sketch of schedule(runtime) follows)
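A minimal sketch (not from the original slides) of a loop whose schedule is chosen at run time via OMP_SCHEDULE; the array and its size are hypothetical.

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    double a[N];

    /* The actual schedule (e.g. "dynamic" or "static,8") is read from the
       OMP_SCHEDULE environment variable when the loop starts executing. */
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < N; i++)
        a[i] = (double)i;

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}

Run with, for example: export OMP_SCHEDULE="static,8" before executing the program.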

23 Environment variables
OMP_DYNAMIC
Enables or disables dynamic adjustment of the number of threads available for executing parallel regions
Valid values are TRUE or FALSE
export OMP_DYNAMIC=TRUE
OMP_NESTED
Enables or disables nested parallelism
Valid values are TRUE or FALSE
export OMP_NESTED=TRUE
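A sketch of nested parallelism (not from the original slides): whether the inner region actually gets extra threads depends on OMP_NESTED / omp_set_nested and on the implementation; the thread counts below are arbitrary.

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_set_nested(1);   /* same effect as export OMP_NESTED=TRUE */

    #pragma omp parallel num_threads(2)
    {
        int outer = omp_get_thread_num();

        /* If nesting is enabled, each outer thread forks its own team;
           otherwise the inner region runs with a team of one thread. */
        #pragma omp parallel num_threads(2)
        {
            printf("outer thread %d, inner thread %d\n",
                   outer, omp_get_thread_num());
        }
    }
    return 0;
}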

24 Runtime library routines
Used primarily to set and retrieve information about the environment and thread attributes
There are three broad classes of runtime routines:
execution environment routines
synchronization routines
timing routines
All OpenMP routines begin with omp_ and are declared in omp.h

25
void omp_set_num_threads(int num_threads)   /* set the number of threads for subsequent parallel regions */
int  omp_get_num_threads(void)              /* number of threads in the current team */
int  omp_get_max_threads(void)              /* upper bound on threads for the next parallel region */
int  omp_get_thread_num(void)               /* id of the calling thread (0 = master) */
int  omp_get_num_procs(void)                /* number of processors available to the program */
int  omp_in_parallel(void)                  /* nonzero if called from inside a parallel region */
void omp_set_dynamic(int dynamic_threads)   /* enable/disable dynamic thread adjustment */
int  omp_get_dynamic(void)                  /* query dynamic thread adjustment */
void omp_set_nested(int nested)             /* enable/disable nested parallelism */
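A small sketch (not from the original slides) exercising several of these routines:

#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("procs available : %d\n", omp_get_num_procs());
    printf("in parallel?    : %d\n", omp_in_parallel());      /* 0 here */

    omp_set_num_threads(4);
    printf("max threads     : %d\n", omp_get_max_threads());  /* 4 */

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)   /* only the master thread reports */
            printf("team size       : %d, in parallel? %d\n",
                   omp_get_num_threads(), omp_in_parallel());
    }
    return 0;
}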

26 Internal control variables
Store information that determines, for example:
the number of threads to use for a parallel region
how to schedule a work-sharing loop
Initialized by the implementation; their values can be changed by setting environment variables or by calling OpenMP library routines
Their values are retrieved by OpenMP library routines

27 Internal control variables
Control variable   Ways to modify                            Ways to retrieve value
nthreads-var       OMP_NUM_THREADS, omp_set_num_threads()    omp_get_max_threads()
dyn-var            OMP_DYNAMIC, omp_set_dynamic()            omp_get_dynamic()
nest-var           OMP_NESTED, omp_set_nested()              omp_get_nested()
run-sched-var      OMP_SCHEDULE                              none
def-sched-var      none                                      none
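As an illustration (not from the original slides) of how the table works in practice: nthreads-var can be set from the environment or from code, and omp_get_max_threads() reports its current value.

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* nthreads-var as initialized from OMP_NUM_THREADS (or a default). */
    printf("before: max threads = %d\n", omp_get_max_threads());

    /* Modifying nthreads-var from code overrides the environment setting. */
    omp_set_num_threads(3);

    /* Retrieving the updated value. */
    printf("after : max threads = %d\n", omp_get_max_threads());
    return 0;
}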

28 Matrix Multiplication using OpenMP
#include <omp.h>

int main() {
    /* ... declarations of a, b, c, temp and the loop indices i, j, k ... */
    omp_set_num_threads(10);
    /* j, k and temp must be private; otherwise the threads would race on them. */
    #pragma omp parallel for private(j, k, temp) schedule(static)
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            temp = 0;
            for (k = 0; k < N; k++)
                temp += a[i][k] * b[k][j];
            c[i][j] = temp;
        }
    }
    return 0;
}
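A complete, runnable version of the same loop nest (a sketch, not from the original slides), with a fixed matrix size and omp_get_wtime() from the runtime library added so the effect of OMP_NUM_THREADS on wall-clock time can be observed; the size N = 512 and the initial values are arbitrary.

#include <stdio.h>
#include <omp.h>

#define N 512
static double a[N][N], b[N][N], c[N][N];

int main(void) {
    int i, j, k;
    double temp;

    /* Initialize the input matrices (serial). */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) { a[i][j] = 1.0; b[i][j] = 2.0; }

    double start = omp_get_wtime();   /* wall-clock timer from the runtime library */
    #pragma omp parallel for private(j, k, temp) schedule(static)
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            temp = 0;
            for (k = 0; k < N; k++)
                temp += a[i][k] * b[k][j];
            c[i][j] = temp;
        }
    double end = omp_get_wtime();

    printf("c[0][0] = %f, elapsed = %f s\n", c[0][0], end - start);
    return 0;
}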

29 Questions