Shared Memory Parallelism - OpenMP
Sathish Vadhiyar

Credits/Sources:
- OpenMP C/C++ standard (openmp.org)
- OpenMP tutorial (http://www.llnl.gov/computing/tutorials/openMP/)
- OpenMP sc99 tutorial presentation (openmp.org)
- Dr. Eric Strohmaier (University of Tennessee, CS594 class, Feb 9, 2000)

Introduction  An API for multi-threaded shared memory parallelism  A specification for a set of compiler directives, library routines, and environment variables – standardizing pragmas  Standardized by a group of hardware and software vendors  Both fine-grain and coarse-grain parallelism (orphaned directives)  Much easier to program than MPI

History  Many different vendors provided their own ways of compiler directives for shared memory programming using threads  OpenMP standard started in 1997  October 1997: Fortran version 1.0  Late 1998: C/C++ version 1.0  June 2000: Fortran version 2.0  April 2002: C/C++ version 2.0

Introduction  Mainly supports loop-level parallelism  Follows fork-join model  The number of threads can be varied from one region to another  Based on compiler directives  User’s responsibility to ensure program correctness, avoiding deadlocks etc.

Execution Model  Begins as a single thread called master thread  Fork: When parallel construct is encountered, team of threads are created  Statements in the parallel region are executed in parallel  Join: At the end of the parallel region, the team threads synchronize and terminate

Definitions  Construct – statement containing directive and structured block  Directive - #pragma Based on C #pragma directives #pragma omp directive-name [clause [, clause] …] new-line Example: #pragma omp parallel default(shared) private(beta,pi)

Types of constructs, Calls, Variables  Work-sharing constructs  Synchronization constructs  Data environment constructs  Library calls, environment variables

parallel construct #pragma omp parallel [clause [, clause] …] new-line structured-block Clauses: if(scalar-expression), num_threads(integer-expression), default(shared | none), private(list), firstprivate(list), shared(list), copyin(list), reduction(op: list)

Parallel construct  Parallel region executed by multiple threads  Implied barrier at the end of parallel section  If num_threads, omp_set_num_threads(), OMP_SET_NUM_THREADS not used, then number of created threads is implementation dependent  Number of threads can be dynamically adjusted using omp_set_dynamic() or OMP_DYNAMIC  Number of physical processors hosting the thread also implementation dependent  Threads numbered from 0 to N-1  Nested parallelism by embedding one parallel construct inside another

Parallel construct - Example

#include <stdio.h>
#include <omp.h>

int main() {
  int nthreads, tid;
  #pragma omp parallel private(nthreads, tid)
  {
    tid = omp_get_thread_num();
    printf("Hello World from thread %d\n", tid);
    if (tid == 0) {
      nthreads = omp_get_num_threads();
      printf("Number of threads = %d\n", nthreads);
    }
  }
  return 0;
}

Work sharing construct  For distributing the execution among the threads that encounter it  3 types – for, sections, single

for construct  For distributing the iterations among the threads #pragma omp for [clause [, clause] …] new-line for-loop Clause:

for construct  Restriction in the structure of the for loop so that the compiler can determine the number of iterations – e.g. no branching out of loop  The assignment of iterations to threads depend on the schedule clause  Implicit barrier at the end of for if not nowait

schedule clause
1. schedule(static, chunk_size) – iterations are divided into chunks of chunk_size and distributed to the threads in round-robin order
2. schedule(dynamic, chunk_size) – a chunk of chunk_size iterations is given to the next ready thread
3. schedule(guided, chunk_size) – the next ready thread gets a chunk proportional to unassigned_iterations/number_of_threads, but never smaller than chunk_size; chunk sizes thus decrease roughly exponentially
4. schedule(runtime) – the decision is deferred to runtime (e.g. via the OMP_SCHEDULE environment variable); implementation dependent
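A small sketch makes the point concrete (the name schedule_demo is illustrative; build with -fopenmp): the schedule clause only changes which thread executes which iterations, not the computed result.

```c
#include <omp.h>

/* guided hands out chunks that shrink as the pool of unassigned
   iterations runs out; the output is identical under any schedule. */
void schedule_demo(int *a, int n) {
    int i;
    #pragma omp parallel for schedule(guided, 4)
    for (i = 0; i < n; i++)
        a[i] = 2 * i;   /* independent iterations: safe to distribute */
}
```

Switching the clause to schedule(static, 4) or schedule(dynamic, 4) changes only the load-balancing behavior.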

for - Example

#include <stdio.h>
#include <omp.h>
#define CHUNKSIZE 100
#define N 1000

int main() {
  int i, chunk;
  float a[N], b[N], c[N];
  /* Some initializations */
  for (i = 0; i < N; i++)
    a[i] = b[i] = i * 1.0;
  chunk = CHUNKSIZE;
  #pragma omp parallel shared(a,b,c,chunk) private(i)
  {
    #pragma omp for schedule(dynamic,chunk) nowait
    for (i = 0; i < N; i++)
      c[i] = a[i] + b[i];
  } /* end of parallel section */
  return 0;
}

sections construct  For distributing non-iterative sections among threads  Clause:

sections - Example
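The code for this example slide did not survive the transcript; a minimal sketch (the name sections_demo is illustrative; build with -fopenmp):

```c
#include <omp.h>

/* Each section is executed exactly once, by whichever thread picks
   it up; different sections may run concurrently on different threads. */
void sections_demo(const float *a, const float *b,
                   float *c, float *d, int n) {
    int i;
    #pragma omp parallel private(i)
    {
        #pragma omp sections nowait
        {
            #pragma omp section
            for (i = 0; i < n; i++)
                c[i] = a[i] + b[i];   /* one section: elementwise sum */

            #pragma omp section
            for (i = 0; i < n; i++)
                d[i] = a[i] * b[i];   /* another section: elementwise product */
        }
    }
}
```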

single directive  Only a single thread can execute the block

Single - Example
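The example code was lost in the transcript; a minimal sketch (single_demo is an illustrative name; build with -fopenmp):

```c
#include <omp.h>

int single_demo(void) {
    int initialized = 0;
    #pragma omp parallel shared(initialized)
    {
        /* exactly one thread runs this block; the rest of the team
           waits at the implicit barrier at the end of single */
        #pragma omp single
        initialized++;
    }
    return initialized;   /* always 1, regardless of team size */
}
```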

Synchronization directives  master – block executed only by the master thread, no implied barrier  critical – block executed by at most one thread at a time  barrier – each thread waits until all threads of the team arrive  atomic – a single memory update is performed atomically  flush – enforces a consistent view of memory  ordered – block executed in sequential loop-iteration order

critical - Example

#include <stdio.h>
#include <omp.h>

int main() {
  int x = 0;
  #pragma omp parallel shared(x)
  {
    #pragma omp critical
    x = x + 1;
  }
  return 0;
}

atomic - Example
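This example slide is empty in the transcript; a minimal sketch (atomic_demo is an illustrative name; build with -fopenmp):

```c
#include <omp.h>

int atomic_demo(int iterations) {
    int x = 0, i;
    #pragma omp parallel for
    for (i = 0; i < iterations; i++) {
        /* the update of x is performed atomically; without the
           directive the concurrent increments would race and lose updates */
        #pragma omp atomic
        x += 1;
    }
    return x;
}
```

atomic protects only a single memory update, so it is typically cheaper than a full critical section.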

flush directive  Point where consistent view of memory is provided among the threads  Thread-visible variables (global variables, shared variables etc.) are written to memory  If var-list is used, only variables in the list are flushed

flush - Example

flush – Example (Contd…)

ordered - Example
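The ordered example did not survive the transcript; a minimal sketch (ordered_demo is an illustrative name; build with -fopenmp):

```c
#include <omp.h>

void ordered_demo(int *out, int n) {
    int i, pos = 0;
    #pragma omp parallel for ordered
    for (i = 0; i < n; i++) {
        /* iterations may run in parallel up to this point ... */
        #pragma omp ordered
        out[pos++] = i;   /* ... but this block executes in order 0,1,2,... */
    }
}
```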

Data Environment – threadprivate  Global variables in the variable-list are made private to each thread  Each thread gets its own copy  The copies persist between different parallel regions

#include <stdio.h>
#include <omp.h>

int alpha[10], beta[10], i;
#pragma omp threadprivate(alpha)

int main() {
  /* Explicitly turn off dynamic threads */
  omp_set_dynamic(0);
  /* First parallel region */
  #pragma omp parallel private(i,beta)
  for (i = 0; i < 10; i++)
    alpha[i] = beta[i] = i;
  /* Second parallel region */
  #pragma omp parallel
  printf("alpha[3]= %d and beta[3]= %d\n", alpha[3], beta[3]);
  return 0;
}

Data Scope Attribute Clauses  Most variables are shared by default  Data scopes are explicitly specified by data scope attribute clauses
Clauses:
1. private
2. firstprivate
3. lastprivate
4. shared
5. default
6. reduction
7. copyin
8. copyprivate

private, firstprivate & lastprivate  private (variable-list)  variable-list private to each thread  A new object with automatic storage duration allocated for the construct  firstprivate (variable-list)  The new object is initialized with the value of the old object that existed prior to the construct  lastprivate (variable-list)  The value of the private object corresponding to the last iteration or the last section is assigned to the original object

private - Example
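The example slide is empty in the transcript; a minimal sketch (private_demo is an illustrative name; build with -fopenmp):

```c
#include <omp.h>

void private_demo(int *a, int n) {
    int i, tmp = -1;
    /* each thread gets its own uninitialized copy of tmp; the
       original tmp is not touched inside the region */
    #pragma omp parallel for private(tmp)
    for (i = 0; i < n; i++) {
        tmp = i * i;     /* each thread writes only its own copy */
        a[i] = tmp;
    }
    /* the value of tmp here is unspecified after the region,
       so the sketch deliberately does not use it */
}
```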

lastprivate - Example
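This example was also lost; a minimal sketch (lastprivate_demo is an illustrative name; build with -fopenmp):

```c
#include <omp.h>

int lastprivate_demo(int n) {
    int i, last = -1;
    #pragma omp parallel for lastprivate(last)
    for (i = 0; i < n; i++)
        last = 2 * i;    /* every thread writes its private copy */
    /* the copy from the sequentially last iteration (i == n-1)
       is copied back to the original variable */
    return last;
}
```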

shared, default, reduction  shared(variable-list)  default(shared | none) – specifies the default sharing behavior of all variables visible in the construct  reduction(op: variable-list) – private copies of the variables are made for each thread; at the end of the construct, the private copies are combined with op into the original object

default - Example
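The example slide is empty; a minimal sketch of default(none) (default_demo is an illustrative name; build with -fopenmp):

```c
#include <omp.h>

int default_demo(void) {
    int i, n = 10, total = 0;
    int a[10];
    /* default(none): every variable used inside the construct must be
       scoped explicitly, which turns accidental sharing into a compile
       error; the loop variable i is automatically private */
    #pragma omp parallel for default(none) shared(a) firstprivate(n)
    for (i = 0; i < n; i++)
        a[i] = i;
    for (i = 0; i < n; i++)   /* serial check */
        total += a[i];
    return total;             /* 0 + 1 + ... + 9 = 45 */
}
```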

reduction - Example

#include <stdio.h>
#include <omp.h>

int main() {
  int i, n, chunk;
  float a[100], b[100], result;
  /* Some initializations */
  n = 100;
  chunk = 10;
  result = 0.0;
  for (i = 0; i < n; i++) {
    a[i] = i * 1.0;
    b[i] = i * 2.0;
  }
  #pragma omp parallel for \
    default(shared) private(i) \
    schedule(static,chunk) \
    reduction(+:result)
  for (i = 0; i < n; i++)
    result = result + (a[i] * b[i]);
  printf("Final result= %f\n", result);
  return 0;
}

copyin, copyprivate  copyin(variable-list)  Applicable to threadprivate variables  Value of the variable in the master thread is copied to the individual threadprivate copies  copyprivate(variable-list)  Appears on a single directive  Variables in variable-list are broadcast to other threads in the team from the thread that executed the single construct

copyprivate - Example
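The copyprivate example did not survive the transcript; a minimal sketch (copyprivate_demo is an illustrative name; build with -fopenmp):

```c
#include <omp.h>

int copyprivate_demo(void) {
    int mismatch = 0;
    #pragma omp parallel
    {
        int val = 0;                 /* private: declared inside the region */
        #pragma omp single copyprivate(val)
        val = 99;                    /* one thread produces the value ... */
        if (val != 99) {             /* ... copyprivate broadcast it to all */
            #pragma omp critical
            mismatch = 1;
        }
    }
    return mismatch;   /* 0: every thread saw the broadcast value */
}
```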

Library Routines (API)  Querying function (number of threads etc.)  General purpose locking routines  Setting execution environment (dynamic threads, nested parallelism etc.)

API  OMP_SET_NUM_THREADS(num_threads)  OMP_GET_NUM_THREADS()  OMP_GET_MAX_THREADS()  OMP_GET_THREAD_NUM()  OMP_GET_NUM_PROCS()  OMP_IN_PARALLEL()  OMP_SET_DYNAMIC(dynamic_threads)  OMP_GET_DYNAMIC()  OMP_SET_NESTED(nested)  OMP_GET_NESTED()

API(Contd..)  omp_init_lock(omp_lock_t *lock)  omp_init_nest_lock(omp_nest_lock_t *lock)  omp_destroy_lock(omp_lock_t *lock)  omp_destroy_nest_lock(omp_nest_lock_t *lock)  omp_set_lock(omp_lock_t *lock)  omp_set_nest_lock(omp_nest_lock_t *lock)  omp_unset_lock(omp_lock_t *lock)  omp_unset_nest__lock(omp_nest_lock_t *lock)  omp_test_lock(omp_lock_t *lock)  omp_test_nest_lock(omp_nest_lock_t *lock)  omp_get_wtime()  omp_get_wtick()

Lock details  Simple locks and nestable locks  Simple locks are not locked if they are already in a locked state  Nestable locks can be locked multiple times by the same thread  Simple locks are available if they are unlocked  Nestable locks are available if they are unlocked or owned by a calling thread

Example – Lock functions

Example – Nested lock

Example – Nested lock (Contd..)

Hybrid Programming – Combining MPI and OpenMP benefits
 MPI
- explicit parallelism, no synchronization problems
- suitable for coarse grain
 OpenMP
- easy to program, dynamic scheduling allowed
- only for shared memory, data synchronization problems
 MPI/OpenMP Hybrid
- can combine MPI data placement with OpenMP fine-grain parallelism
- suitable for clusters of SMPs (Clumps)
- can implement a hierarchical model