Introduction to OpenMP, Part II. White Rose Grid Computing Training Series. Deniz Savas, Alan Real, Mike Griffiths. RTP Module, February 2012.

Synchronisation: pitfalls when using shared variables (race conditions). A variable that is only used (read from) and never updated (written to) can safely be declared as a shared variable in a parallel region. Problems arise when this rule is violated by changing the value of a shared variable within the parallel region. Such problems are known as data races and should be avoided at the programming level. However, for situations where avoidance is not possible or not efficient, OpenMP provides a set of directives for resolving them: BARRIER, ATOMIC, CRITICAL and FLUSH, which we will discuss later.

Synchronisation example: a=a+1 on 2 threads, where a is a shared variable with initial value 10. Each thread runs the same three instructions, using its own private data: load a, add 1, store a.
Case 1 (thread 2 runs behind thread 1): each thread sees the other's increment, so a=12.
Case 2 (thread 2 runs at about the same time as thread 1): both threads load a=10 and both store 11, so a=11.
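
As a minimal illustration, a C sketch of this race; the printed value depends entirely on thread timing:

#include <stdio.h>

int main(void) {
    int a = 10;                      /* shared between the threads by default */
    #pragma omp parallel num_threads(2)
    {
        a = a + 1;                   /* unsynchronised read-modify-write: a data race */
    }
    printf("a = %d\n", a);           /* may print 12 (case 1) or 11 (case 2) */
    return 0;
}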

Synchronization-related directives. We have seen the potential problems arising from the interaction of multiple threads, in particular the race conditions that occur when multiple threads attempt to write to the same shared variable simultaneously. Such events may render our results useless, determined in effect by the toss of a coin, according to which thread runs ahead of which. The following OMP directives, namely CRITICAL, BARRIER, ATOMIC and FLUSH, help us to avoid these synchronization problems.

OMP Barrier. Syntax: C: #pragma omp barrier; Fortran: !$omp barrier. This directive defines a point which all threads must reach before execution of the program continues. It is a useful tool where you need to ensure that the work relating to one set of tasks is completed before embarking on a new set of tasks. Beware: overuse of this feature may reduce efficiency, and it may also give rise to DEADLOCK situations. Nevertheless it is very useful for ensuring the correct working of complex programs. Most of the work-sharing directives have an implied barrier at the end of their block (unless NOWAIT is used), i.e. OMP END DO, OMP END SECTIONS, OMP END WORKSHARE. Note that they do not have an implied barrier at the beginning, only at the end, and even that one is removed if NOWAIT is specified, e.g. !$OMP END WORKSHARE NOWAIT.
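
As a small illustration, a C sketch in which a barrier separates two phases of work; the phase routines are hypothetical placeholders:

#include <stdio.h>
#include <omp.h>

void phase_one(int id) { printf("thread %d: phase one\n", id); }
void phase_two(int id) { printf("thread %d: phase two\n", id); }

int main(void) {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        phase_one(id);
        /* no thread starts phase two until every thread has finished phase one */
        #pragma omp barrier
        phase_two(id);
    }
    return 0;
}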

OMP BARRIER. To avoid deadlocks, NEVER use !$OMP BARRIER inside any of these blocks:
!$OMP MASTER ... !$OMP END MASTER
!$OMP SECTIONS ... !$OMP END SECTIONS
!$OMP CRITICAL ... !$OMP END CRITICAL
!$OMP SINGLE ... !$OMP END SINGLE

NOWAIT clause. We saw during the earlier discussion of the BARRIER statement that the directives END DO/FOR, END SECTIONS, END SINGLE and END WORKSHARE all imply a barrier, where executing threads must wait until every one of them has finished its work and arrived there. The NOWAIT clause on these statements removes this restriction, allowing the threads that finish earlier to proceed straight on to the instructions following the work-sharing construct without waiting for the other threads to catch up. This reduces the amount of idle time and increases efficiency, but at the risk of producing wrong results, SO BE VERY CAREFUL! Syntax:
–Fortran: !$OMP DO, then the do loop, then !$OMP END DO NOWAIT
–C/C++: #pragma omp for nowait, then the for loop
Similar for END SECTIONS, END SINGLE and END WORKSHARE.

NOWAIT example. Two loops with no dependencies present an ideal opportunity for the NOWAIT clause.
!$OMP PARALLEL
!$OMP DO
      do j=1,n
         a(j) = c * b(j)
      end do
!$OMP END DO NOWAIT
!$OMP DO
      do i=1,m
         x(i) = sqrt(y(i)) * 2.0
      end do
!$OMP END DO
!$OMP END PARALLEL
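
For comparison, a C sketch of the same pattern; the array sizes and initial values here are arbitrary assumptions:

#include <math.h>

#define N 1000

double a[N], b[N], x[N], y[N];

int main(void) {
    double c = 2.0;
    #pragma omp parallel
    {
        /* the two loops are independent, so the implicit barrier can be dropped */
        #pragma omp for nowait
        for (int j = 0; j < N; j++)
            a[j] = c * b[j];

        /* threads that finish the first loop early start here immediately */
        #pragma omp for
        for (int i = 0; i < N; i++)
            x[i] = sqrt(y[i]) * 2.0;
    }
    return 0;
}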

NOWAIT warning. Use with EXTREME CAUTION: it is too easy to remove a barrier which is necessary. This results in non-deterministic behaviour:
–Sometimes the right result
–Sometimes wrong results
–Behaviour changes under a debugger
It would arguably be good coding style to use NOWAIT everywhere and make all barriers explicit, but this is not done in practice.

NOWAIT warning example.
!$OMP DO
      do j=1,n
         a(j) = b(j) + c(j)
      end do
!$OMP END DO
!$OMP DO
      do j=1,n
         d(j) = e(j) * f
      end do
!$OMP END DO
!$OMP DO
      do j=1,n
         z(j) = (a(j) + a(j+1)) * 0.5
      end do
We can remove the first barrier but not the second, as there is a dependency on a( ): in the third loop, a(j+1) could be updated by a different thread from the one reading it.

OMP CRITICAL (mutual exclusion). A thread waits at the start of a critical section until no other thread is executing a section with the same critical name. This construct can be used to mark sections of code that, for example, change global flags once a particular task is performed, so that the same work is not repeated. It is also useful for sectioning off code such as updates of heaps and stacks, where simultaneous updating by competing threads could prove disastrous! The OMP ATOMIC directive becomes a better choice when the synchronisation concerns a single specific memory location.

OMP CRITICAL example.
!$OMP PARALLEL SHARED( MYDATA )
!$OMP CRITICAL (updatepart)
! Perform operations on the global/shared array WORK,
! which redefines WORK and then sets new flags to
! indicate what the next call to partition will see in MYDATA.
      CALL PARTITION ( I, MYDATA )
!$OMP END CRITICAL (updatepart)
! Now perform the work that can be done in isolation
! without affecting the other threads.
      CALL SOLVE( MYDATA )
!$OMP END PARALLEL
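
A minimal C sketch of a named critical section protecting a shared counter; the variable and section names are hypothetical:

#include <stdio.h>

int main(void) {
    int hits = 0;
    #pragma omp parallel
    {
        /* only one thread at a time may execute a section with this name */
        #pragma omp critical (update_hits)
        hits = hits + 1;
    }
    printf("hits = %d\n", hits);   /* always equals the number of threads */
    return 0;
}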

OMP Atomic. Unlike most of the other OMP directives, this directive applies to the single statement immediately following it, rather than to a block of statements. It ensures that a specific shared memory location is updated atomically, so that it is not exposed to simultaneous writes that may give rise to race conditions. It may be more efficient than using CRITICAL directives, e.g. if different array elements can be protected separately. By using the atomic directive we can be confident that no race condition will arise while an expression is evaluated and assigned to a variable. Note that the ATOMIC directive does not impose any conditions on the order in which the threads execute the statement; it merely ensures that no two threads execute it simultaneously. See OMP ORDERED later.

ATOMIC directive syntax.
–Fortran: !$OMP ATOMIC followed by a statement, where the statement must be one of: x = x op (expr), x = (expr) op x, x = intr(x, expr) or x = intr(expr, x). Here x is a scalar shared variable, op is one of +, *, -, /, .and., .or., .eqv., .neqv., and intr is one of the MAX, MIN, IAND, IOR or IEOR intrinsic functions.
–C: #pragma omp atomic followed by a statement, where the statement must be one of: x binop= expr, x++, ++x, x-- or --x. Here binop is one of +, *, -, /, &, ^, |, << or >>, and expr is an expression of scalar type that does not reference the object designated by x.

ATOMIC example.
!$OMP PARALLEL DO PRIVATE(xlocal,ylocal)
      DO i=1,n
         call work(xlocal,ylocal)
!$OMP ATOMIC
         x(index(i)) = x(index(i)) + xlocal
         y(i) = y(i) + ylocal
      END DO
The ATOMIC directive prevents simultaneous updates of an element of x by multiple threads, while still allowing different elements of x to be updated simultaneously; a CRITICAL region would serialise the updates. Note that the update of y is not atomic, as ATOMIC only applies to the statement that immediately follows the directive.
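
As a rough C analogue, a sketch in which atomic updates protect individual histogram bins; the binning rule is made up for illustration:

#include <stdio.h>

#define N 100000
#define NBINS 16

int main(void) {
    int hist[NBINS] = {0};
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        int bin = i % NBINS;
        /* protects this single update; different bins may still be updated concurrently */
        #pragma omp atomic
        hist[bin] += 1;
    }
    printf("hist[0] = %d\n", hist[0]);
    return 0;
}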

Lock routines. Occasionally we need more flexibility than the CRITICAL and ATOMIC directives offer, although locks are not as easy to use.
–A lock is a special variable that may be set by a thread.
–No other thread may set the lock until the thread which set it has unset it.
–Setting a lock may be blocking ('set_lock') or non-blocking ('test_lock').
–A lock must be initialised before it is used and may be destroyed when no longer required.
–Lock variables should not be used for any other purpose.

Syntax. Fortran:
SUBROUTINE OMP_INIT_LOCK(var)
SUBROUTINE OMP_SET_LOCK(var)
LOGICAL FUNCTION OMP_TEST_LOCK(var)
SUBROUTINE OMP_UNSET_LOCK(var)
SUBROUTINE OMP_DESTROY_LOCK(var)
var should be an INTEGER of the same size as an address (e.g. INTEGER*8 on a 64-bit machine).
C/C++:
#include <omp.h>
void omp_init_lock(omp_lock_t *lock);
void omp_set_lock(omp_lock_t *lock);
int omp_test_lock(omp_lock_t *lock);
void omp_unset_lock(omp_lock_t *lock);
void omp_destroy_lock(omp_lock_t *lock);

Lock example.
      call omp_init_lock(ilock)
!$OMP PARALLEL SHARED(ilock)
      :
      do while (.not. omp_test_lock(ilock))
         call something_else()
      end do
      call work()
      call omp_unset_lock(ilock)
      :
!$OMP END PARALLEL
OMP_TEST_LOCK will set the lock if it is not already set.
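
An equivalent sketch in C, with the work routines left as hypothetical placeholders:

#include <omp.h>

void something_else(void) { /* other useful work done while waiting */ }
void work(void)           { /* work that needs exclusive access */ }

int main(void) {
    omp_lock_t lock;
    omp_init_lock(&lock);
    #pragma omp parallel shared(lock)
    {
        /* spin, doing other work, until the lock can be acquired */
        while (!omp_test_lock(&lock))
            something_else();
        work();
        omp_unset_lock(&lock);
    }
    omp_destroy_lock(&lock);
    return 0;
}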

FLUSH directive. Ensures that a variable is written to / read from main memory; the variable will be flushed out of the register file (and usually out of cache). This is also called a memory fence. It allows "normal" variables to be used for synchronisation and avoids the need for volatile type qualifiers in this context.

FLUSH syntax. Fortran: !$OMP FLUSH [(list)]. C/C++: #pragma omp flush [(list)]. list specifies the variables to be flushed; if no list is present, all shared variables are flushed. FLUSH directives are implied by a BARRIER, at entry to and exit from CRITICAL and ORDERED sections, and at the end of PARALLEL, DO/FOR, SECTIONS and SINGLE directives (except when a NOWAIT clause is present).

FLUSH example.
!$OMP PARALLEL PRIVATE(myid,i,neighb)
      :
      do j=1, niters
         do i=lb(myid),ub(myid)
            a(i) = ( a(i+1) + a(i) ) * 0.5
         end do
         ndone(myid) = ndone(myid) + 1
!$OMP FLUSH (ndone)
         do while (ndone(neighb).lt.ndone(myid))
!$OMP FLUSH (ndone)
         end do
a(i+1) may be updated on a different thread, so each thread must wait for the previous iteration to finish on its neighbour. The first FLUSH makes sure the write of ndone goes to main memory; the FLUSH inside the while loop makes sure each read of ndone comes from main memory while waiting for the neighbour.
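
As a C illustration of the same idea, a sketch of a simple flag handshake between two threads; the variable names are invented, and in a strictly conforming code the flag accesses would also be made atomic:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int data = 0, flag = 0;
    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num() == 0) {
            data = 42;
            #pragma omp flush(data)      /* make the data visible before the flag */
            flag = 1;
            #pragma omp flush(flag)      /* publish the flag to main memory */
        } else {
            int ready = 0;
            while (!ready) {
                #pragma omp flush(flag)  /* re-read the flag from main memory */
                ready = flag;
            }
            #pragma omp flush(data)      /* make sure the up-to-date data is read */
            printf("data = %d\n", data);
        }
    }
    return 0;
}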

Choosing synchronisation.
–Use ATOMIC if possible: it allows the most optimisation.
–If not possible, use CRITICAL, with different names wherever possible.
–If appropriate, use variable flushing.
–As a last resort use the lock routines; this should be a rare occurrence in practice.

Practical: Molecular dynamics Aim: to introduce atomic updates Code is a simple MD simulation of the melting of solid argon. Computation is dominated by the calculation of force pairs in the subroutine forces. Parallelise this routine using a DO/FOR directive and atomic updates. –Watch out for PRIVATE and REDUCTION variables.

Practical: Image processing Aim: Introduction to the use of parallel DO/for loops. Simple image processing algorithm to reconstruct an image from an edge-detected version. Use parallel DO/for directives to run in parallel.

OpenMP resources.
Web sites:
–Official web site, including language specifications, links to compilers and tools, and mailing lists.
–OpenMP community site: links, events, resources.
Book: "Parallel programming in OpenMP", Chandra et al., Morgan Kaufmann.
PGI Users Guide on 'iceberg'.