Heterogeneous Computing using OpenMP, Lecture 2. F21DP Distributed and Parallel Technology. Sven-Bodo Scholz.


Adding Insult to Injury
(num_steps and step are assumed to be declared globally, as in the serial version from lecture 1.)

int main()
{
  int i, num_t;
  double pi = 0.0, sum[10] = { 0.0 };
  step = 1.0/(double)num_steps;
  omp_set_num_threads( 10);
  #pragma omp parallel
  {
    int i, id, num_threads;
    double x;
    id = omp_get_thread_num();
    num_threads = omp_get_num_threads();
    if( id==0) num_t = num_threads;
    for( i=id; i<num_steps; i = i + num_threads) {
      x = (i+0.5) * step;
      sum[id] += 4.0/(1.0+x*x);
    }
  }
  for( i=0; i<num_t; i++)
    pi += sum[i] * step;
}
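The title presumably alludes to the weakness of this version: the per-thread slots sum[0] … sum[9] are adjacent in memory and typically fall on the same cache line, so every update invalidates that line in the other cores' caches (false sharing). A common workaround, sketched here as a fragment under the assumption of 64-byte cache lines (not part of the slides), pads each slot onto its own line:

  #define PAD 8                 /* assumption: 8 doubles = 64 bytes = one cache line */
  double sum[10][PAD];          /* each thread's accumulator now sits on its own line */
  /* threads then update sum[id][0] instead of sum[id] */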

Synchronisation in OpenMP
– #pragma omp barrier – synchronises across all threads
– #pragma omp critical { … } – ensures mutual exclusion for the enclosed block
– #pragma omp atomic – applies to the single update statement that follows (no braces); it tries to use an atomic machine operation, i.e. it only works for very simple updates. Where that is not possible, it degenerates into the equivalent of omp critical.
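A minimal sketch (not from the slides; names are made up) contrasting the three primitives:

  #include <stdio.h>
  #include <omp.h>

  int main( void)
  {
    int hits = 0;                          /* shared counter */
    #pragma omp parallel
    {
      int id = omp_get_thread_num();
      #pragma omp atomic                   /* cheap: a single scalar update */
      hits += 1;
      #pragma omp barrier                  /* wait until every thread has incremented */
      #pragma omp critical                 /* mutual exclusion for arbitrary code */
      {
        printf( "thread %d sees hits == %d\n", id, hits);
      }
    }
    return 0;
  }

Thanks to the barrier, every thread is guaranteed to print the full thread count.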

Using Critical Sections
The array of partial sums is replaced by a scalar sum that is private to each thread; every thread then adds its contribution to pi inside a critical section:

int main()
{
  int i;
  double pi = 0.0;
  step = 1.0/(double)num_steps;
  omp_set_num_threads( 10);
  #pragma omp parallel
  {
    int i, id, num_threads;
    double x, sum = 0.0;
    id = omp_get_thread_num();
    num_threads = omp_get_num_threads();
    for( i=id; i<num_steps; i = i + num_threads) {
      x = (i+0.5) * step;
      sum += 4.0/(1.0+x*x);
    }
    #pragma omp critical
    {
      pi += sum * step;
    }
  }
}

Going Atomic
The critical section shrinks to a single update: the expensive part of each iteration is computed into the private x, and only the accumulation into the shared sum is protected. Note that atomic governs exactly the one statement that follows, so it takes no braces:

int main()
{
  int i;
  double pi, sum = 0.0;
  step = 1.0/(double)num_steps;
  omp_set_num_threads( 10);
  #pragma omp parallel
  {
    int i, id, num_threads;
    double x;
    id = omp_get_thread_num();
    num_threads = omp_get_num_threads();
    for( i=id; i<num_steps; i = i + num_threads) {
      x = (i+0.5) * step;
      x = 4.0/(1.0+x*x);
      #pragma omp atomic
      sum += x;
    }
  }
  pi = sum * step;
}

Concurrent Loops
OpenMP provides a loop pattern!
– Requires: no data dependencies (read/write or write/write pairs) between iterations
– The compiler calculates the loop bounds for each thread directly from the serial source
– Scheduling is no longer hand-coded

#pragma omp parallel
{
  #pragma omp for
  for( i=0; i<20; i++) {
    printf( "Hello World!");
  }
}

(The slide's figure shows the 20 iterations being divided among the threads, e.g. into four chunks of 5.)
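For contrast, here is a small complete program (illustrative, not from the slides) with a loop that must not be annotated with omp for: iteration i reads the value written by iteration i-1, a read/write dependency:

  #include <stdio.h>

  int main( void)
  {
    double a[8] = { 1, 1, 1, 1, 1, 1, 1, 1 };
    int i;
    for( i=1; i<8; i++) {
      a[i] = a[i-1] + a[i];    /* loop-carried dependency: reads the previous iteration's result */
    }
    printf( "prefix sum: a[7] = %f\n", a[7]);    /* 8.0 in the serial version */
    return 0;
  }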

Loop Scheduling
The schedule clause determines how loop iterations are divided among the thread team:
– static([chunk]) divides iterations statically between threads; each thread receives [chunk] iterations, rounding as necessary to account for all iterations. The default [chunk] is ceil( #iterations / #threads ).
– dynamic([chunk]) allocates [chunk] iterations per thread, allocating an additional [chunk] iterations when a thread finishes; this forms a logical work queue consisting of all loop iterations. The default [chunk] is 1.
– guided([chunk]) allocates dynamically, but [chunk] is exponentially reduced with each allocation.
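A compilable sketch of the clause syntax (work( i) is a made-up function whose cost grows with i, chosen to make the imbalance visible; not from the slides):

  #include <stdio.h>

  static void work( int i)          /* hypothetical: cost proportional to i */
  {
    volatile double s = 0.0;
    for( int j = 0; j < i * 1000; j++) s += j;
  }

  int main( void)
  {
    int i;
    #pragma omp parallel for schedule( static, 4)    /* fixed chunks of 4, assigned round-robin */
    for( i=0; i<100; i++) work( i);

    #pragma omp parallel for schedule( dynamic, 4)   /* threads grab chunks of 4 from a work queue */
    for( i=0; i<100; i++) work( i);

    #pragma omp parallel for schedule( guided)       /* chunk size shrinks as the queue drains */
    for( i=0; i<100; i++) work( i);

    printf( "done\n");
    return 0;
  }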

Loop Scheduling II
The schedule clause can also defer the decision:
– auto leaves the choice to the compiler
– runtime enables dynamic control, either through the environment variable OMP_SCHEDULE=type or through the library function omp_set_schedule( kind, chunk)
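A sketch of deferring the decision to run time (the array and its values are made up):

  #include <stdio.h>
  #include <omp.h>

  int main( void)
  {
    int i;
    double a[100];

    /* either set the schedule in the shell before running:  export OMP_SCHEDULE="dynamic,4" */
    /* or set it programmatically before the loop: */
    omp_set_schedule( omp_sched_dynamic, 4);   /* schedule kind and chunk size */

    #pragma omp parallel for schedule( runtime)
    for( i=0; i<100; i++) a[i] = (i + 0.5) * 0.01;

    printf( "a[99] = %f\n", a[99]);
    return 0;
  }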

Using OMP FOR
The hand-coded cyclic distribution is replaced by a work-sharing loop; schedule( static, 1) reproduces the previous round-robin assignment of iterations:

int main()
{
  int i;
  double pi, sum = 0.0;
  step = 1.0/(double)num_steps;
  omp_set_num_threads( 10);
  #pragma omp parallel
  {
    double x;
    #pragma omp for schedule( static, 1)
    for( i=0; i<num_steps; i++) {
      x = (i+0.5) * step;
      x = 4.0/(1.0+x*x);
      #pragma omp atomic
      sum += x;
    }
  }
  pi = sum * step;
}

Reductions
Typical pattern, where op is some associative operator and neutral its identity element:

double vect[N];
double accu = neutral;
for( i=0; i<N; i++) {
  accu = accu op big_load( i);
}

Examples: sum / product / mean / …

Reductions ctnd.
The reduction clause parallelises this pattern: each thread accumulates into a private copy of accu, and the copies are combined when the loop ends:

double vect[N];
double accu = neutral;
#pragma omp parallel for reduction( op: accu)
for( i=0; i<N; i++) {
  accu = accu op big_load( i);
}
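A complete, compilable instance of the pattern (a made-up example: summing squares), built with e.g. gcc -fopenmp:

  #include <stdio.h>

  int main( void)
  {
    double accu = 0.0;                /* 0.0 is the neutral element of + */
    int i;
    #pragma omp parallel for reduction( +: accu)
    for( i=0; i<1000; i++) {
      accu += (double)i * i;          /* each thread sums into a private copy */
    }
    /* here the private copies have been combined into accu */
    printf( "sum of squares: %f\n", accu);
    return 0;
  }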

Using Reduction
The atomic update disappears: the reduction clause lets each thread accumulate into a private copy of sum and combines the copies at the end of the loop:

int main()
{
  int i;
  double pi, sum = 0.0;
  step = 1.0/(double)num_steps;
  omp_set_num_threads( 10);
  #pragma omp parallel
  {
    double x;
    #pragma omp for reduction( +: sum)
    for( i=0; i<num_steps; i++) {
      x = (i+0.5) * step;
      sum += 4.0/(1.0+x*x);
    }
  }
  pi = sum * step;
}

Private vs Shared
Default behaviour:
– variables declared outside the parallel region: shared
– variables declared inside the parallel region: private
Explicit control:
– clause shared( var, …)
– clause private( var, …)

Example:

int x=1, y=1, z=1, q=1;
#pragma omp parallel private( y, z) shared( q)
{
  x=2; y=2; q=z;
}

What values of x, y, z and q do we have after the parallel region?
x==2 (shared by default), y==1 and z==1 (the private copies are discarded), and q is undefined: the private z is uninitialised inside the region, so q receives an indeterminate value.
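A compilable version of the example (not from the slides). The unsynchronised writes go unpunished here only because every thread writes the same value:

  #include <stdio.h>

  int main( void)
  {
    int x=1, y=1, z=1, q=1;
    #pragma omp parallel private( y, z) shared( q)
    {
      x = 2;     /* shared by default: the write survives the region */
      y = 2;     /* private copy: discarded at the end of the region */
      q = z;     /* z is private and uninitialised: q becomes indeterminate */
    }
    printf( "x=%d y=%d z=%d q=%d\n", x, y, z, q);    /* q may print anything */
    return 0;
  }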

Private vs Shared ctnd.
Default behaviour:
– variables declared outside the parallel region: shared
– variables declared inside the parallel region: private
Explicit control:
– clause shared( var, …)
– clause private( var, …)
– clause firstprivate( var, …) – private, but each copy is initialised with the value from before the region

Example:

int x=1, y=1, z=1, q=1;
#pragma omp parallel private( y) shared( q) firstprivate( z)
{
  x=2; y=2; q=z;
}

What values of x, y, z and q do we have after the parallel region?
x==2, y==1, z==1, q==1 – firstprivate initialises each thread's copy of z to 1, so q receives 1.

Summary
The most common patterns can easily be achieved. There is much more, for example:
– Tasks
– Sections
– Vector instructions
– Teams
– …
Check it out!