OpenMP – Introduction* *Compiled from the UHEM summer workshop notes. (uhem.itu.edu.tr)


Outline
What is OpenMP?
– Introduction (code structure, directives, threads, etc.)
– Limitations
– Data scope clauses (Shared, Private)
– Work-sharing constructs
– Synchronization

What is OpenMP?
An Application Program Interface (API) that may be used to explicitly direct multithreaded, shared-memory parallelism.
OpenMP is managed by the nonprofit technology consortium OpenMP Architecture Review Board (OpenMP ARB) and is jointly defined by a group of major computer hardware and software vendors, including AMD, IBM, Intel, Cray, HP, Fujitsu, Nvidia, NEC, Red Hat, Texas Instruments, Oracle Corporation, and more.
Portable & standardized
– APIs exist for both C/C++ and Fortran 90/77
– Multi-platform support (Unix, Linux, etc.)

OpenMP Specifications
Version 3.1, Complete Specifications, July 2011
Version 3.0, May 2008
Version 2.5, May 2005 (C/C++ & Fortran)
Version 2.0
– C/C++, March 2002
– Fortran, November 2000
Version 1.0
– C/C++, October 1998
– Fortran, October 1997
Detailed info:

Intel & GNU OpenMP
Intel Compilers
– OpenMP 2.5 conforming
– Nested parallelism
– Workqueuing extension to OpenMP
– Interoperability with POSIX and Windows threads
– OMP_DYNAMIC support
GNU OpenMP (OpenMP + gcc)
– OpenMP 3.0 support (gcc 4.4 and later)

OpenMP Programming Model
Explicit parallelism
Thread-based parallelism; the program runs with a user-specified number of threads
Uses the fork & join model
Synchronization points ("barrier", "critical region", "single processor region")

Limitations of OpenMP
Shared Memory Model
– Each thread must be able to reach the shared memory (SMP)

Terminology and Behavior
OpenMP Team = Master + Workers
A parallel region is a block of code executed by all threads simultaneously (it has an implicit barrier at the end)
– The master thread always has thread id 0
– Parallel regions can be nested
– An if clause can be used to guard the parallel region (see the sketch below)
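As a minimal sketch of the if clause mentioned above (the threshold and variable names are illustrative assumptions, not part of the original slides), the region below is executed by a team of threads only when the condition holds; otherwise it runs serially on the master thread:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int n = 100000;                      /* problem size (example value) */

    #pragma omp parallel if(n > 1000)    /* a team is created only if n > 1000 */
    {
        /* Every thread in the team executes this block; the master has id 0. */
        printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    }                                    /* implicit barrier here */

    return 0;
}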

Terminology and Behavior A Work-Sharing construct divides the execution of the enclosed code region among the members of the team. (Loop, Section etc.)

OpenMP Code Structure (C/C++)

#include <omp.h>

main ()
{
  int var1, var2, var3;

  /* Serial code */
  ...

  /* Beginning of parallel section. Fork a team of threads.
     Specify variable scoping. */
  #pragma omp parallel private(var1, var2) shared(var3)
  {
    /* Parallel section executed by all threads.
       All threads join the master thread and disband. */
  }

  /* Resume serial code */
  ...
}

OpenMP Directives
Format in C/C++:
#pragma omp directivename [clause, ...]

#pragma omp : Required for all OpenMP C/C++ directives.
directivename : A valid OpenMP directive. Must appear after the pragma and before any clauses.
[clause, ...] : Optional. Clauses can be in any order, and repeated as necessary unless otherwise restricted.

OpenMP Directives
Example:
#pragma omp parallel default(shared) private(beta,pi)

General rules:
– Directives follow conventions of the C/C++ standards for compiler directives.
– Case sensitive.
– Only one directivename may be specified per directive.
– Long directive lines can be "continued" on succeeding lines by escaping the newline character with a backslash ("\") at the end of a directive line.

OpenMP Directives PARALLEL Region Construct: A parallel region is a block of code that will be executed by multiple threads. This is the fundamental OpenMP parallel construct. #pragma omp parallel [clause...]

OpenMP Directives
C/C++ OpenMP structured block definition:
#pragma omp parallel [clause ...]
{
  structured_block
}

When a thread reaches a PARALLEL directive:
– It creates a team of threads and becomes the master of the team
– The master is a member of that team and has thread number 0 within it (thread ID)
– Starting from the beginning of this parallel region, the code is duplicated and all threads will execute it
– There is an implied barrier at the end of a parallel section
– Only the master thread continues execution past this point

Lab: Helloworld
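The lab source itself is not included in the transcript; below is a minimal sketch of what an omp_hello.c for this lab might look like (the variable name is an assumption), producing the output shown on the next slide:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    /* Fork a team of threads; each thread gets its own copy of tid. */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();   /* ID of this thread */
        printf("Hello World from thread = %d\n", tid);
    }   /* all threads join the master thread and disband */

    return 0;
}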

Lab: Compiling Helloworld
$ gcc -fopenmp omp_hello.c -o omp_hello
$ export OMP_NUM_THREADS=2
$ ./omp_hello
Hello World from thread = 0
Hello World from thread = 1

Lab: Helloworld
Set the environment variable (export), then run your OpenMP executable:
bash: $ export OMP_NUM_THREADS=4
bash: $ ./omp_hello
Hello OpenMP!
Optional exercise:
1 – Set OMP_NUM_THREADS to a higher value (such as 10)
2 – Repeat the example.

OpenMP Constructs

Data Scope Attribute Clauses
SHARED clause:  shared(list)   (C/C++)
– Declares the variables in its list to be shared among the threads in the team.
– Behavior: a reference to the object of the same type is available to each thread in the team; all threads refer to the original object.

Data Scope Attribute Clauses
PRIVATE clause:  private(list)   (C/C++)
– Declares the variables in its list to be private to each thread.
– Behavior:
  – A new object of the same type is declared once for each thread in the team
  – All references to the original object are replaced with references to the new object
  – Variables declared PRIVATE are uninitialized for each thread (FIRSTPRIVATE can be used to initialize them)
A short sketch follows.
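A minimal sketch (the variable names are invented for illustration) of how shared, private, and firstprivate differ inside a parallel region:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int var_shared = 10;   /* one copy, visible to all threads           */
    int var_priv   = 20;   /* each thread gets a new, UNINITIALIZED copy */
    int var_first  = 30;   /* each thread gets a copy initialized to 30  */

    #pragma omp parallel shared(var_shared) private(var_priv) firstprivate(var_first)
    {
        var_priv = omp_get_thread_num();      /* must be set before use */
        printf("thread %d: shared=%d private=%d firstprivate=%d\n",
               omp_get_thread_num(), var_shared, var_priv, var_first);
    }

    return 0;
}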

Work-Sharing Constructs
A work-sharing construct divides the execution of the enclosed code region among the members of the team that encounter it.
It must be enclosed in a parallel region; otherwise it is simply ignored.
Work-sharing constructs do not launch/create new threads.
There is no implied barrier upon entry to a work-sharing construct; however, there is an implicit barrier at the end of a work-sharing construct.

Work-Sharing Constructs – Types

Work-Sharing Constructs
for: shares iterations of a loop across the team; represents a type of "data parallelism".
sections: breaks work into separate, discrete sections, each executed by a thread; can be used to implement a type of "functional parallelism".
single: serializes a section of code (it is executed by only one thread in the team).
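As a rough sketch of the three constructs (the task functions and array sizes are placeholders, not from the slides):

#include <omp.h>
#include <stdio.h>

#define N 8

/* Placeholder tasks for the sections example (hypothetical names). */
static void do_task_x(void) { printf("task x by thread %d\n", omp_get_thread_num()); }
static void do_task_y(void) { printf("task y by thread %d\n", omp_get_thread_num()); }

int main(void)
{
    int a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2 * i; }

    #pragma omp parallel
    {
        /* for: iterations of the loop are divided among the threads */
        #pragma omp for
        for (int i = 0; i < N; i++)
            a[i] = b[i] + c[i];

        /* sections: each section is executed by one thread */
        #pragma omp sections
        {
            #pragma omp section
            do_task_x();
            #pragma omp section
            do_task_y();
        }

        /* single: this block is executed by only one thread in the team */
        #pragma omp single
        printf("single: a[%d] = %d\n", N - 1, a[N - 1]);
    }
    return 0;
}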

for directive (C/C++):
#pragma omp for [clause ...]
for_loop
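For instance (a hedged sketch; the array and its size are assumptions), the combined parallel for form creates a team and shares the loop iterations in a single directive:

#include <omp.h>
#include <stdio.h>

#define N 1000

int main(void)
{
    double x[N];

    /* Fork a team and divide the N iterations among its threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        x[i] = i * 0.5;

    printf("x[N-1] = %f\n", x[N - 1]);
    return 0;
}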

Work-Sharing Constructs
schedule clause: schedule(kind [, chunk_size])
– static: less overhead; the default on many OpenMP compilers
– dynamic & guided: useful for poorly balanced and unpredictable workloads; with guided, the size of each chunk decreases over time
– runtime: if this schedule is selected, the decision regarding the scheduling kind is made at run time; the schedule and (optional) chunk size are set through the OMP_SCHEDULE environment variable

Work-Sharing Constructs
schedule clause: describes how iterations of the loop are divided among the threads in the team (see the sketch below)
– static: loop iterations are divided into pieces of size chunk and assigned statically
– dynamic: when a thread finishes one chunk, it is dynamically assigned another; the default chunk size is 1
– guided: the chunk size is exponentially reduced with each dispatched piece of the iteration space; the default chunk size is 1
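A small sketch (the loop body, array, and chunk size are illustrative assumptions) of a schedule clause on a loop whose iterations do uneven amounts of work:

#include <omp.h>
#include <stdio.h>

#define N 100

/* Hypothetical helper: later iterations do more work, so the load is uneven. */
static double work(int i)
{
    double s = 0.0;
    for (int k = 0; k < i * 1000; k++)
        s += k * 1e-9;
    return s;
}

int main(void)
{
    double result[N];

    /* dynamic scheduling with a chunk size of 4 helps balance the uneven load;
       with schedule(runtime) the kind would instead be read from OMP_SCHEDULE. */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < N; i++)
        result[i] = work(i);

    printf("result[N-1] = %f\n", result[N - 1]);
    return 0;
}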

Work-Sharing Constructs
nowait clause (C/C++):
– If specified, threads do not synchronize at the end of the parallel loop; threads proceed directly to the statements after the loop.

Work-Sharing Lab : nowait
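The lab source is not part of the transcript; a minimal sketch of a nowait exercise (array names and sizes are assumptions) might look like:

#include <omp.h>
#include <stdio.h>

#define N 16

int main(void)
{
    int a[N], b[N];

    #pragma omp parallel
    {
        /* Without nowait, every thread would wait here until all
           iterations of the first loop are finished. */
        #pragma omp for nowait
        for (int i = 0; i < N; i++)
            a[i] = i * i;

        /* Threads that finish their share of the first loop start the
           second loop immediately (it must not depend on a[]). */
        #pragma omp for
        for (int i = 0; i < N; i++)
            b[i] = 2 * i;
    }

    printf("a[%d]=%d  b[%d]=%d\n", N - 1, a[N - 1], N - 1, b[N - 1]);
    return 0;
}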