Computer Architecture II: Programming with POSIX Threads and OpenMP



Computer Architecture II 2 OpenMP Overview
OpenMP: Open specifications for Multi-Processing
- A set of APIs for writing multithreaded applications
- C/C++ and Fortran
Thread-based parallelism: fork/join model
OpenMP consists of three components:
1. Compiler directives
2. Library calls
3. Environment variables

Computer Architecture II 3 OpenMP Release History
1997: OpenMP Fortran 1.0
1998: OpenMP C/C++ 1.0
1999: OpenMP Fortran 1.1
2000: OpenMP Fortran 2.0
2002: OpenMP C/C++ 2.0

Computer Architecture II 4 Goals
- Standardization: a standard for shared-memory machines, backed by the major computer hardware and software vendors
- Lean: a limited number of directives
- Ease of use: incrementally parallelize a serial program; supports both coarse-grain and fine-grain parallelism
- Portability: Fortran (77, 90, and 95), C, and C++

Computer Architecture II 5 OpenMP Constructs
1. Directives
   a. Parallel region
   b. Work-sharing
   c. Synchronization
2. Runtime library routines
3. Environment variables

Computer Architecture II 6 OpenMP C Directive Format

#pragma omp directive-name [clauses...]
{
    /* structured block */
}

Computer Architecture II 7 1.a. Parallel Region Directive
Indicates a block of code that will be executed by multiple threads. Fork-join model: the master thread (M) creates (forks) a team of threads (T0, T1, T2) at the start of the parallel region and joins them at its end. [Slide figure: fork-join thread diagram]

#include <omp.h>

int main(void) {
    int x;
    sequential_code();
    #pragma omp parallel
    {
        parallel_code();
    }
    sequential_code();
}

Computer Architecture II 8 1.b. Work-Sharing Directives
Types: for, sections, single
For construct:
- Assigns loop iterations to all threads.
- The method of assignment depends on the SCHEDULE clause.
- An implicit barrier is assumed at the end.
- All private variables are flushed at the end.

Computer Architecture II 9 Work-Sharing Example

Sequential code:
for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }

OpenMP parallel region (manual work splitting):
#pragma omp parallel
{
    int id, i, Nthrds, istart, iend;
    id = omp_get_thread_num();
    Nthrds = omp_get_num_threads();
    istart = id * N / Nthrds;
    iend = (id + 1) * N / Nthrds;
    for (i = istart; i < iend; i++) { a[i] = a[i] + b[i]; }
}

OpenMP parallel region with a work-sharing for construct:
#pragma omp parallel
#pragma omp for schedule(static)
for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }

Computer Architecture II 10 Schedule Clause
The schedule clause affects how loop iterations are mapped onto threads:
- schedule(static [,chunk]): assigns blocks of "chunk" iterations to each thread.
- schedule(dynamic [,chunk]): when free, each thread picks "chunk" iterations from a queue until all iterations have been executed.
- schedule(guided [,chunk]): a special dynamic schedule. Each thread initially grabs a large block of iterations; the block size then decreases gradually toward "chunk".

Computer Architecture II 11 Section Directive
Non-iterative construct. Each section is executed by one thread.

#pragma omp parallel
{
    #pragma omp sections
    {
        #pragma omp section
        code_executed_by_one();
        #pragma omp section
        code_executed_by_another_one();
    }
}

Computer Architecture II 12 Single Directive
Only one thread will execute the single section, while the others skip it.

#pragma omp single
code_executed_by_only_one();

Computer Architecture II 13 Parallel Region and Work-Sharing Directives
A parallel region directive can be combined with a work-sharing construct:

#pragma omp parallel for [schedule clause]
#pragma omp parallel sections

Computer Architecture II 14 Data Scoping Clauses
Scoping: in which blocks of the program are declared variables visible.
By default, the majority of variables are shared. Exceptions:
- the loop index within a parallel for
- local variables of subroutines called within a parallel region
- local variables declared within the lexical scope of a parallel region
It is recommended to declare the scope of variables explicitly, using the clauses:
- SHARED: the variable is shared among threads
- PRIVATE: the variable is private to each thread
- FIRSTPRIVATE: the variable is private, and each private copy is initialized with the value of the original object before entering the parallel region
- LASTPRIVATE: the value from the sequentially last iteration is copied back to the original object
- REDUCTION: performs a reduction on the private copies at the end of the parallel construct

Computer Architecture II 15 Reduction Example

#include <omp.h>

#define n 100

int main(void) {
    int i;
    float a[n], b[n], result;
    result = 0.0;
    #pragma omp parallel for private(i) reduction(+:result)
    for (i = 0; i < n; i++)
        result = result + (a[i] * b[i]);
}

- The a and b arrays are shared (by default).
- result is named in a reduction clause: each thread gets a private copy, and the copies are combined (+) at the end.
- i is private by default (one of the three exceptions).

Computer Architecture II 16 1.c. Synchronization Directives
#pragma omp barrier
#pragma omp critical
#pragma omp master
#pragma omp flush
(plus the nowait clause, which removes implicit barriers)

Computer Architecture II 17 Synchronization Directives
When a BARRIER directive is reached, a thread waits at that point until all other threads have reached the barrier.
Implicit barriers are applied at:
- the end of parallel regions
- the end of work-sharing constructs (for, sections, single)
(A flush, not a barrier, is implied at the entry and exit of critical sections; see the FLUSH slide.)

Computer Architecture II 18 Synchronization Directives
nowait is a clause that removes the implicit barrier at the end of the work-sharing directives (for, sections, single). The barrier at the end of a parallel region itself cannot be removed.

Computer Architecture II 19 Synchronization Directives
The CRITICAL directive specifies a region of code that must be executed by only one thread at a time. It blocks all other threads until the current thread exits the CRITICAL region.

#pragma omp critical [name]

The optional name enables multiple distinct CRITICAL regions to exist; different CRITICAL regions with the same name are treated as the same region.

Computer Architecture II 20 Synchronization Directives
The FLUSH directive identifies a synchronization point at which the implementation must provide a consistent view of memory. Thread-visible variables are written back to memory at this point.
FLUSH is implied implicitly by these directives:
- critical (entry and exit)
- barrier
- parallel (exit)
- for (exit)
- sections (exit)
- single (exit)

Computer Architecture II 21 2. Runtime Library Routines
The OpenMP standard defines an API of library calls that perform a variety of functions:
- query the number of threads/processors, set the number of threads to use
- general-purpose locking routines (semaphores)
- set execution environment functions: nested parallelism, dynamic adjustment of threads

Computer Architecture II 22 Runtime Library Routines
void omp_set_num_threads(int num_threads)
- Sets the number of threads that will be used in the next parallel region.
int omp_get_num_threads(void)
- Returns the number of threads currently in the team executing the parallel region from which it is called.
int omp_get_thread_num(void)
- Returns the thread number of the calling thread within the team, between 0 and omp_get_num_threads()-1. The master thread of the team is thread 0.
int omp_get_num_procs(void)
- Returns the number of processors available to the program.
int omp_in_parallel(void)
- Used to determine whether the section of code that is executing is parallel or not.

Computer Architecture II 23 Runtime Library Routines
By default, a program with multiple parallel regions uses the same number of threads to execute each region. This behavior can be changed to allow the run-time system to dynamically adjust the number of threads created for a given parallel section.
void omp_set_dynamic(int dynamic_threads)
- Enables or disables dynamic adjustment (by the run-time system) of the number of threads available for executing parallel regions.

Computer Architecture II 24 3. Environment Variables
Some of them are variants of run-time library calls.
OMP_NUM_THREADS
- Sets the maximum number of threads to use during execution. For example: setenv OMP_NUM_THREADS 8
OMP_DYNAMIC
- Enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. Valid values are TRUE or FALSE. For example: setenv OMP_DYNAMIC TRUE
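The setenv form above is csh/tcsh syntax; the sh/bash equivalent is a sketch like the following (the program name `./my_program` is a hypothetical placeholder):

```shell
# Request up to 8 threads and let the runtime shrink the team if needed
export OMP_NUM_THREADS=8
export OMP_DYNAMIC=TRUE

# ./my_program   # hypothetical OpenMP binary; it reads both variables at startup
echo "$OMP_NUM_THREADS"
```

A call to omp_set_num_threads() inside the program overrides OMP_NUM_THREADS for the regions that follow it.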