CPE779: More on OpenMP
Based on slides by Laxmikant V. Kale and David Padua of the University of Illinois
PARALLEL DO: Restrictions (CPE 779 Parallel Computing, Spring 2010)

- The number of times the loop body is executed (the trip count) must be available at runtime, before the loop is executed.
- The loop body must be such that all iterations of the loop are completed (the loop cannot be exited prematurely).
PARALLEL Directive: Syntax

Fortran:
    !$omp parallel [clause [,] [clause ...]]
        structured block
    !$omp end parallel

C/C++:
    #pragma omp parallel [clause [clause ...]]
        structured block
Parallel Regions Details

- When a parallel directive is encountered, threads are spawned to execute the code of the enclosed structured block (the parallel region).
- The number of threads can be specified, just as for the parallel do directive.
- The parallel region is replicated: each thread executes a copy of the replicated region.
Example

    double A[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();
        pooh(ID, A);
    }
    printf("all done\n");

Each of the four threads executes pooh(ID, A) with its own ID (0, 1, 2, or 3); when all threads finish, a single thread prints "all done".
PARALLEL vs PARALLEL DO

- Arbitrary structured blocks vs. loops
- Coarse grained vs. fine grained
- Replication vs. work division

    !$omp parallel
    do I = 1, 10
       print *, 'Hello world', I
    enddo
    !$omp end parallel

Output: 10*T Hello world messages, where T = number of threads

    !$omp parallel do
    do I = 1, 10
       print *, 'Hello world', I
    enddo

Output: 10 Hello world messages
Synchronization: Motivation

- Concurrent access to shared data may result in data inconsistency; a mechanism is required to maintain data consistency: mutual exclusion.
- Sometimes code sections executed by different threads need to be sequenced in a particular order: event synchronization.
Mutual Exclusion

Mechanisms for ensuring the consistency of data that is accessed concurrently by several threads:

- Critical directive: specifies a region of code that must be executed by only one thread at a time.
- Atomic directive: specifies that a specific memory location must be updated atomically, rather than letting multiple threads attempt to write to it.
- Library lock routines
Critical Section: Syntax

Fortran:
    !$omp critical [(name)]
        structured block
    !$omp end critical [(name)]

C/C++:
    #pragma omp critical [(name)]
        structured block
Example

    cur_max = MINUS_INFINITY
    !$omp parallel do
    do i = 1, n
       ...
       !$omp critical
       if (a(i) .gt. cur_max) then
          cur_max = a(i)
       endif
       !$omp end critical
       ...
    enddo
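The same reduction can be written in C. This is a sketch rather than the slide's code: parallel_max is a hypothetical name, and every iteration enters the critical section, which is correct but serializes the comparisons.

```c
#include <float.h>
#include <omp.h>

/* Hypothetical C sketch of the slide's maximum reduction: the
   critical section makes the read-compare-write on cur_max atomic. */
double parallel_max(const double *a, int n) {
    double cur_max = -DBL_MAX;   /* plays the role of MINUS_INFINITY */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical
        {
            if (a[i] > cur_max)
                cur_max = a[i];
        }
    }
    return cur_max;
}
```

A common refinement is to test a[i] > cur_max once outside the critical section and again inside it, so most iterations skip the lock entirely.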
Atomic Directive

- The body of an atomic directive is a single assignment statement.
- There are restrictions on the statement which ensure that it can be translated into an atomic sequence of machine instructions to read, modify, and write a memory location.
- An atomic statement must follow a specific syntax; see the most recent OpenMP specification for details.
Example

Using atomic:

    C$OMP PARALLEL PRIVATE(B)
          B = DOIT(I)
    C$OMP ATOMIC
          X = X + B
    C$OMP END PARALLEL

Using a named critical section:

    C$OMP PARALLEL PRIVATE(B)
          B = DOIT(I)
    C$OMP CRITICAL(XB)
          X = X + B
    C$OMP END CRITICAL(XB)
    C$OMP END PARALLEL
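A C analogue of the atomic version might look like the following sketch; atomic_sum and doit are hypothetical stand-ins for the slide's X = X + B pattern and DOIT(I).

```c
#include <omp.h>

static double doit(int i) { return (double)i; }  /* stand-in for DOIT(I) */

/* Sketch of the atomic update pattern: b is private to each
   iteration, and only the shared update x += b is made atomic. */
double atomic_sum(int n) {
    double x = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double b = doit(i);
        #pragma omp atomic
        x += b;
    }
    return x;
}
```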
Library Lock Routines

Routines to:
- create a lock: omp_init_lock
- acquire a lock, waiting until it becomes available if necessary: omp_set_lock
- release a lock, resuming a waiting thread, if one exists: omp_unset_lock
- try to acquire a lock, but return instead of waiting if it is not available: omp_test_lock
- destroy a lock: omp_destroy_lock
Example

    omp_lock_t lck;
    omp_init_lock(&lck);
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int tmp = do_lots_of_work(id);
        omp_set_lock(&lck);
        printf("%d %d\n", id, tmp);
        omp_unset_lock(&lck);
    }
    omp_destroy_lock(&lck);

Declaring id and tmp inside the parallel region makes them private to each thread (so no private clause is needed), and the lock should be destroyed once it is no longer in use.
Library Lock Routines

- Locks are the most flexible of the mutual exclusion primitives because there are no restrictions on where they can be placed.
- The previous routines do not support nested acquires (deadlock if tried!); a separate set of routines exists to allow nesting.
- Nesting of locks is useful for code such as recursive routines.
Mutual Exclusion Features

These apply to critical and atomic as well as the library routines:
- No fairness guarantee.
- Guarantee of progress.
- Be careful when nesting: many chances for deadlock.