CPE779: More on OpenMP
Based on slides by Laxmikant V. Kale and David Padua of the University of Illinois.


PARALLEL DO: Restrictions

- The number of times the loop body executes (the trip count) must be computable at run time, before the loop starts.
- The loop body must let every iteration run to completion: no break, goto, or other premature exit from the loop.
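To make the restrictions concrete, here is a minimal C sketch (array name and bounds are illustrative): the first loop has a trip count known before it starts and is a legal worksharing loop; the second exits early, so its trip count cannot be determined in advance.

#include <stdio.h>

int main(void) {
    int n = 100, a[100];

    /* Legal: the trip count (n) is known before the loop begins,
       and every iteration runs to completion. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = i * i;

    /* Illegal as a worksharing loop: the break makes the number of
       iterations unknowable in advance, so compilers reject it.
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (a[i] > 2500) break;
        a[i] = 0;
    }
    */

    printf("a[10] = %d\n", a[10]);
    return 0;
}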

PARALLEL Directive: Syntax

Fortran:

!$omp parallel [clause [,] [clause …]]
   structured block
!$omp end parallel

C/C++:

#pragma omp parallel [clause [clause …]]
   structured block

Parallel Regions Details

- When a parallel directive is encountered, a team of threads is spawned to execute the enclosed structured block (the parallel region).
- The number of threads can be specified just as for the parallel do directive (one way is sketched below).
- The parallel region is replicated: each thread executes its own copy of the enclosed code.
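A minimal C sketch of requesting a team size on the directive itself via the num_threads clause (the message text is illustrative):

#include <omp.h>
#include <stdio.h>

int main(void) {
    /* Request a team of 4 threads for this one region. */
    #pragma omp parallel num_threads(4)
    {
        /* Every thread executes this block (replication). */
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}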

Example

double A[1000];
omp_set_num_threads(4);
#pragma omp parallel
{
    int ID = omp_get_thread_num();
    pooh(ID, A);
}
printf("all done\n");

The four threads execute pooh(0,A), pooh(1,A), pooh(2,A), and pooh(3,A) concurrently; "all done" is printed once, after all threads complete the region.

PARALLEL vs PARALLEL DO

- Arbitrary structured blocks vs. loops
- Coarse grained vs. fine grained
- Replication vs. work division

!$omp parallel
do I = 1,10
   print *, 'Hello world', I
enddo
!$omp end parallel

Output: 10*T Hello world messages, where T = number of threads

!$omp parallel do
do I = 1,10
   print *, 'Hello world', I
enddo

Output: 10 Hello world messages
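The same contrast in C/C++ (a sketch; the loop bound is illustrative): the first region replicates the whole loop in every thread, while parallel for divides the iterations among them.

#include <stdio.h>

int main(void) {
    /* Replication: each of the T threads runs all 10 iterations,
       so 10*T lines are printed. */
    #pragma omp parallel
    for (int i = 0; i < 10; i++)
        printf("Hello world %d\n", i);

    /* Work division: the 10 iterations are split among the threads,
       so exactly 10 lines are printed. */
    #pragma omp parallel for
    for (int i = 0; i < 10; i++)
        printf("Hello world %d\n", i);

    return 0;
}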

Synchronization - Motivation

- Concurrent access to shared data may leave it inconsistent; a mechanism is required to maintain consistency: mutual exclusion.
- Sometimes code sections executed by different threads must be sequenced in a particular order: event synchronization.
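A minimal C sketch of the inconsistency problem (counts are illustrative): counter++ is a read-modify-write, so two threads can read the same old value and one increment is lost.

#include <stdio.h>

int main(void) {
    int counter = 0;

    #pragma omp parallel num_threads(4)
    for (int i = 0; i < 100000; i++)
        counter++;          /* unsynchronized update of shared data */

    /* With 4 threads each doing 100000 increments, the correct total
       is 400000, but a smaller value is typically printed. */
    printf("counter = %d\n", counter);
    return 0;
}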

Mutual Exclusion

Mechanisms for ensuring the consistency of data that is accessed concurrently by several threads:

- Critical directive: specifies a region of code that only one thread may execute at a time.
- Atomic directive: specifies that a memory location must be updated atomically, rather than letting multiple threads race to write it.
- Library lock routines

Critical Section: Syntax

Fortran:

!$omp critical [(name)]
   structured block
!$omp end critical [(name)]

C/C++:

#pragma omp critical [(name)]
   structured block

Example

cur_max = MINUS_INFINITY
!$omp parallel do
do i = 1, n
   ...
   !$OMP CRITICAL
   if (a(i) .gt. cur_max) then
      cur_max = a(i)
   endif
   !$OMP END CRITICAL
   ...
enddo
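Entering the critical section on every iteration serializes the loop. A common remedy, sketched here in C (variable and function names are illustrative), is to keep a per-thread maximum and enter the critical section only once per thread:

#include <float.h>

double cur_max = -DBL_MAX;

void find_max(const double *a, int n) {
    #pragma omp parallel
    {
        double local_max = -DBL_MAX;   /* private to each thread */

        /* Each thread scans its share of the iterations without locking. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            if (a[i] > local_max) local_max = a[i];

        /* One critical entry per thread instead of one per iteration. */
        #pragma omp critical
        if (local_max > cur_max) cur_max = local_max;
    }
}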

Atomic Directive

- The body of an atomic directive is a single assignment statement.
- Restrictions on the statement ensure that it can be translated into an atomic sequence of machine instructions that read, modify, and write a memory location.
- The statement must follow a specific syntax; see the most recent OpenMP specification for the allowed forms.
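A C rendering of the idiom shown on the next slide (doit here is a hypothetical stand-in for real per-thread work):

#include <omp.h>

double X = 0.0;

static double doit(int id) { return (double)id; }  /* placeholder work */

void accumulate(void) {
    #pragma omp parallel
    {
        double B = doit(omp_get_thread_num());  /* B is private here */

        /* X = X + B matches one of the allowed atomic forms, so the
           read-modify-write of X happens indivisibly. */
        #pragma omp atomic
        X = X + B;
    }
}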

Example

C$OMP PARALLEL PRIVATE(B)
      B = DOIT(I)
C$OMP ATOMIC
      X = X + B
C$OMP END PARALLEL

behaves like the named critical section:

C$OMP PARALLEL PRIVATE(B)
      B = DOIT(I)
C$OMP CRITICAL(XB)
      X = X + B
C$OMP END CRITICAL(XB)
C$OMP END PARALLEL

Library Lock Routines

Routines to:

- create a lock: omp_init_lock
- acquire a lock, waiting until it becomes available if necessary: omp_set_lock
- release a lock, resuming a waiting thread if one exists: omp_unset_lock
- try to acquire a lock, but return instead of waiting if it is unavailable: omp_test_lock
- destroy a lock: omp_destroy_lock

Example

omp_lock_t lck;
omp_init_lock(&lck);
#pragma omp parallel
{
    int id  = omp_get_thread_num();   /* private: declared inside the region */
    int tmp = do_lots_of_work(id);
    omp_set_lock(&lck);               /* one thread prints at a time */
    printf("%d %d\n", id, tmp);
    omp_unset_lock(&lck);
}
omp_destroy_lock(&lck);

Library Lock Routines

- Locks are the most flexible of the mutual exclusion primitives because there are no restrictions on where they can be placed.
- The routines above do not support nested acquires; a thread that re-acquires a lock it already holds will deadlock! A separate set of nestable-lock routines allows nesting.
- Nesting locks is useful in code such as recursive routines (see the sketch below).
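A sketch of the nestable-lock routines in a recursive setting (the function name and depth are illustrative); omp_unset_nest_lock releases the lock only when the acquisition count returns to zero:

#include <omp.h>

omp_nest_lock_t nlck;   /* nestable: the owning thread may re-acquire it */

void visit(int depth) {
    omp_set_nest_lock(&nlck);   /* re-acquiring here does not deadlock */
    if (depth > 0)
        visit(depth - 1);
    omp_unset_nest_lock(&nlck);
}

int main(void) {
    omp_init_nest_lock(&nlck);
    #pragma omp parallel
    visit(3);
    omp_destroy_nest_lock(&nlck);
    return 0;
}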

Mutual Exclusion Features

These apply to critical and atomic as well as the library routines:

- No fairness guarantee.
- Guarantee of progress.
- Be careful when nesting: there are many opportunities for deadlock (see the sketch below).
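A sketch of the classic nesting hazard (lock and function names are illustrative; this shows what not to do, not code to run): two threads acquire the same two locks in opposite orders, and each can end up waiting forever for the other. Acquiring locks in one fixed global order avoids this.

#include <omp.h>

omp_lock_t lck_a, lck_b;

void worker_one(void) {
    omp_set_lock(&lck_a);
    omp_set_lock(&lck_b);   /* blocks if the other thread holds lck_b */
    /* ... use both protected resources ... */
    omp_unset_lock(&lck_b);
    omp_unset_lock(&lck_a);
}

void worker_two(void) {
    omp_set_lock(&lck_b);   /* opposite order: the deadlock ingredient */
    omp_set_lock(&lck_a);   /* blocks if the other thread holds lck_a */
    /* ... */
    omp_unset_lock(&lck_a);
    omp_unset_lock(&lck_b);
}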