CPE779: More on OpenMP
Based on slides by Laxmikant V. Kale and David Padua of the University of Illinois.

PARALLEL DO: Restrictions
Spring 2010, CPE 779 Parallel Computing
- The number of times the loop body is executed (the trip count) must be known at run time before the loop begins executing.
- The loop body must be such that every iteration of the loop runs to completion; control may not leave the loop early.
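
As a sketch of a loop that meets both restrictions, the C fragment below uses `parallel for`, the C/C++ counterpart of PARALLEL DO. The function name `scale_by_two` is hypothetical and does not come from the slides.

```c
/* Hypothetical helper, not from the slides: doubles a[0..n-1] in place. */
void scale_by_two(double *a, int n)
{
    /* The trip count n is fixed before the loop starts, and no iteration
       exits early (no break or goto out of the loop), so the loop is
       eligible for work sharing. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        a[i] = 2.0 * a[i];
    }
}
```

By contrast, a search loop that leaves with `break` at the first match would not qualify as written, because its trip count is not known before the loop runs.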

PARALLEL Directive: Syntax
Fortran:
    !$omp parallel [clause [,] [clause …]]
        structured block
    !$omp end parallel
C/C++:
    #pragma omp parallel [clause [clause …]]
        structured block

Parallel Regions Details
- When a parallel directive is encountered, a team of threads is spawned, and every thread executes the code of the enclosed structured block (the parallel region).
- The number of threads can be specified in the same way as for the parallel do directive.
- The parallel region is replicated: each thread executes its own copy of the region.

Example
    double A[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();
        pooh(ID, A);
    }
    printf("all done\n");
Each of the four threads executes the region with its own ID, so pooh(0,A), pooh(1,A), pooh(2,A), and pooh(3,A) run concurrently on the shared array A. The threads join at the end of the region, and "all done" is printed once.
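
A self-contained variant of this example might look like the sketch below. The body given to `pooh` and the wrapper `run_region` are assumptions made for illustration; the slides do not define them. The thread ID comes from the library routine `omp_get_thread_num`. The `#else` branch holds serial stand-ins, also an assumption, so the sketch still builds when OpenMP is not enabled.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption) so the sketch also builds without OpenMP. */
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical body for pooh: each thread marks its own slot of A. */
static void pooh(int id, double *A)
{
    A[id] = 1.0;
}

/* Runs the region with at most four threads; returns the actual team size.
   A must have at least four elements. */
int run_region(double *A)
{
    int team_size = 1;
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();  /* private: declared inside the region */
        pooh(ID, A);
        if (ID == 0) {
            team_size = omp_get_num_threads();  /* written by one thread only */
        }
    }
    printf("all done\n");  /* executed once, after the threads have joined */
    return team_size;
}
```

Because `ID` is declared inside the region, every thread gets its own copy, while `A` is shared by the whole team.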

PARALLEL vs PARALLEL DO
- Arbitrary structured blocks vs. loops
- Coarse grained vs. fine grained
- Replication vs. work division
PARALLEL (the whole loop is replicated in every thread):
    !$omp parallel
    do I = 1,10
      print *, 'Hello world', I
    enddo
    !$omp end parallel
Output: 10*T Hello world messages, where T = number of threads
PARALLEL DO (the iterations are divided among the threads):
    !$omp parallel do
    do I = 1,10
      print *, 'Hello world', I
    enddo
Output: 10 Hello world messages
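
The difference between replication and work division can be checked by counting loop-body executions instead of printing. The C sketch below is an illustration under assumptions: the function names `count_replicated` and `count_divided` are hypothetical, the shared counter is protected with an atomic update, and the `#else` branch supplies serial stand-ins so the code also builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption) so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* parallel: every thread runs all 10 iterations (replication). */
int count_replicated(int *team_size)
{
    int count = 0;
    *team_size = 1;
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0) {
            *team_size = omp_get_num_threads();
        }
        for (int i = 0; i < 10; i++) {
            #pragma omp atomic
            count++;
        }
    }
    return count;  /* 10 * T */
}

/* parallel for: the 10 iterations are shared out among the threads. */
int count_divided(void)
{
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < 10; i++) {
        #pragma omp atomic
        count++;
    }
    return count;  /* 10, whatever the team size */
}
```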

Synchronization: Motivation
- Concurrent access to shared data may result in data inconsistency, so a mechanism is required to maintain data consistency: mutual exclusion.
- Sometimes code sections executed by different threads need to be sequenced in a particular order: event synchronization.

Mutual Exclusion
- Mechanisms for ensuring the consistency of data that is accessed concurrently by several threads:
  - Critical directive: specifies a region of code that must be executed by only one thread at a time.
  - Atomic directive: specifies that a specific memory location must be updated atomically, rather than letting multiple threads attempt to write to it at the same time.
  - Library lock routines

Critical Section: Syntax
Fortran:
    !$omp critical [(name)]
        structured block
    !$omp end critical [(name)]
C/C++:
    #pragma omp critical [(name)]
        structured block

Example
    cur_max = MINUS_INFINITY
    !$omp parallel do
    do i = 1, n
      …
    !$OMP CRITICAL
      if (a(i) .gt. cur_max) then
        cur_max = a(i)
      endif
    !$OMP END CRITICAL
      …
    enddo
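
The same pattern can be sketched in C. The function name `max_critical` is hypothetical, and `-DBL_MAX` stands in for the slide's `MINUS_INFINITY`; both are assumptions made for illustration.

```c
#include <float.h>

/* Hypothetical helper: returns the largest element of a[0..n-1]. */
double max_critical(const double *a, int n)
{
    double cur_max = -DBL_MAX;  /* stand-in for MINUS_INFINITY */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* Only one thread at a time may compare against and update the
           shared cur_max, so no update is lost. */
        #pragma omp critical
        {
            if (a[i] > cur_max) {
                cur_max = a[i];
            }
        }
    }
    return cur_max;
}
```

Every iteration enters the critical section, so this version is correct but serializes the comparisons; it illustrates the directive rather than an efficient way to compute a maximum.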

Atomic Directive
- The body of an atomic directive is a single assignment statement.
- Restrictions on the statement ensure that it can be translated into an atomic sequence of machine instructions that reads, modifies, and writes a memory location.
- An atomic statement must follow a specific syntax; see the most recent OpenMP specification for details.

Example
Using atomic:
    C$OMP PARALLEL PRIVATE(B)
          B = DOIT(I)
    C$OMP ATOMIC
          X = X + B
    C$OMP END PARALLEL
The same update using a named critical section:
    C$OMP PARALLEL PRIVATE(B)
          B = DOIT(I)
    C$OMP CRITICAL(XB)
          X = X + B
    C$OMP END CRITICAL(XB)
    C$OMP END PARALLEL
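
A C sketch of the two equivalent updates follows. The functions `sum_atomic` and `sum_critical` are hypothetical, and the input array `b` stands in for the per-thread values that `DOIT` produces on the slide.

```c
/* Accumulates b[0..n-1] into a shared x with an atomic update. */
double sum_atomic(const double *b, int n)
{
    double x = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* A single read-modify-write statement on one memory location. */
        #pragma omp atomic
        x += b[i];
    }
    return x;
}

/* The same accumulation protected by a named critical section. */
double sum_critical(const double *b, int n)
{
    double x = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical(xb)
        {
            x += b[i];
        }
    }
    return x;
}
```

Both versions give the same result; atomic is limited to a single update statement, while a critical section may enclose an arbitrary structured block.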

Library Lock Routines
- Routines to:
  - create a lock: omp_init_lock
  - acquire a lock, waiting until it becomes available if necessary: omp_set_lock
  - release a lock, resuming a waiting thread if one exists: omp_unset_lock
  - try to acquire a lock, but return instead of waiting if it is not available: omp_test_lock
  - destroy a lock: omp_destroy_lock

Example
    omp_lock_t lck;
    omp_init_lock(&lck);
    #pragma omp parallel private(tmp, id)
    {
        id = omp_get_thread_num();
        tmp = do_lots_of_work(id);
        omp_set_lock(&lck);
        printf("%d %d\n", id, tmp);
        omp_unset_lock(&lck);
    }
    omp_destroy_lock(&lck);
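
The non-blocking routine `omp_test_lock` can be sketched in the same style. The function `tally_with_test_lock` is hypothetical, and the `#else` branch contains serial stand-ins (an assumption) so that the code also builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption) so the sketch also builds without OpenMP. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static int  omp_test_lock(omp_lock_t *l)    { if (*l) return 0; *l = 1; return 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Hypothetical helper: increments a shared counter n_updates times. */
int tally_with_test_lock(int n_updates)
{
    omp_lock_t lck;
    int total = 0;
    omp_init_lock(&lck);
    #pragma omp parallel for
    for (int i = 0; i < n_updates; i++) {
        /* omp_test_lock returns nonzero once the lock has been acquired;
           it returns zero immediately if another thread holds the lock. */
        while (!omp_test_lock(&lck)) {
            /* Lock busy: a real program could do unrelated work here. */
        }
        total++;
        omp_unset_lock(&lck);
    }
    omp_destroy_lock(&lck);
    return total;
}
```

Unlike `omp_set_lock`, the thread never blocks inside the library call, so it stays free to make progress on other work between attempts.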

Library Lock Routines
- Locks are the most flexible of the mutual exclusion primitives because there are no restrictions on where they can be placed.
- The previous routines do not support nested acquires: a thread that tries to re-acquire a lock it already holds deadlocks. A separate set of routines (the omp_*_nest_lock family) exists to allow nesting.
- Nesting of locks is useful for code such as recursive routines.
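
A minimal sketch of the nestable routines in a recursive setting is shown below. The names `visit`, `count_visits`, `nlck`, and `visits` are hypothetical, and the `#else` branch holds serial stand-ins (an assumption) so the code also builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption) so the sketch also builds without OpenMP. */
typedef int omp_nest_lock_t;
static void omp_init_nest_lock(omp_nest_lock_t *l)    { *l = 0; }
static void omp_set_nest_lock(omp_nest_lock_t *l)     { ++*l; }
static void omp_unset_nest_lock(omp_nest_lock_t *l)   { --*l; }
static void omp_destroy_nest_lock(omp_nest_lock_t *l) { (void)l; }
#endif

static omp_nest_lock_t nlck;  /* hypothetical shared nestable lock */
static int visits = 0;        /* shared counter protected by nlck */

/* Recursive routine: each level re-acquires the lock its caller holds.
   With a plain omp_lock_t the second acquire would deadlock. */
static void visit(int depth)
{
    omp_set_nest_lock(&nlck);
    visits++;
    if (depth > 0) {
        visit(depth - 1);
    }
    omp_unset_nest_lock(&nlck);  /* one unset per set */
}

/* Calls visit(depth) `calls` times in parallel; returns the total visits. */
int count_visits(int calls, int depth)
{
    visits = 0;
    omp_init_nest_lock(&nlck);
    #pragma omp parallel for
    for (int i = 0; i < calls; i++) {
        visit(depth);
    }
    omp_destroy_nest_lock(&nlck);
    return visits;  /* calls * (depth + 1) */
}
```

The lock is released to other threads only when the owning thread has called `omp_unset_nest_lock` as many times as it called `omp_set_nest_lock`.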

Mutual Exclusion Features
- These apply to critical, atomic, and the library lock routines:
  - No fairness guarantee.
  - Guarantee of progress.
  - Be careful when nesting: there are many chances for deadlock.
