Presentation transcript:

1 Tuesday, November 07, 2006 “If anything can go wrong, it will.” -Murphy’s Law

5 Shared Memory Model §A collection of processors, each with access to the same shared memory. §Processors can interact and synchronize with each other through shared variables.

6 Shared Memory Programming §It is possible to write parallel programs for multiprocessors using MPI. §But we can achieve better performance by using a programming model tailored for a shared-memory environment.

7 OpenMP §On shared-memory multiprocessors, memory can be shared among the processors. §A directive-based OpenMP Application Program Interface (API) has been developed specifically for shared-memory parallel processing. §Directives assist the compiler in the parallelization of application codes.

8 §In the past, almost all major manufacturers of high-performance shared-memory multiprocessor computers had their own sets of directives. §The functionalities and syntaxes of these directive sets varied among vendors.

9 Code portability…

10 §To define a standard ensuring code portability across shared-memory platforms, an independent organization, openmp.org, was established in 1997. §As a result, the OpenMP API came into being in 1997. §The primary benefit of using OpenMP is the relative ease of code parallelization made possible by the shared-memory architecture.

11 OpenMP §OpenMP has broad support from many major computer hardware and software manufacturers. §Similar to MPI's achievement as the standard for distributed-memory parallel processing, OpenMP has emerged as the standard for shared-memory parallel computing.

12 §fork §join

13 §The standard view of parallelism in a shared memory program is fork/join parallelism. §At the beginning of a program, only a single thread, called the master thread, is active. §At points where parallel operations are required, the master thread forks (creates/awakens) additional threads.

15 §The master thread and child threads work concurrently through the parallel section. §At the end of the parallel code, the child threads die or are suspended, and the flow of control returns to the single master thread (join). §The number of active threads can change dynamically throughout the execution of the program.
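
A minimal sketch of this fork/join behavior (the printed messages and the region body are illustrative, not from the original slides); compile with an OpenMP-enabled compiler, e.g. gcc -fopenmp:

#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("Before the parallel region: only the master thread is active.\n");

    #pragma omp parallel          /* fork: the master creates/awakens a team of threads */
    {
        int id = omp_get_thread_num();        /* this thread's id, 0..N-1 */
        int nthreads = omp_get_num_threads(); /* size of the team */
        printf("Hello from thread %d of %d\n", id, nthreads);
    }                             /* join: the team finishes; only the master continues */

    printf("After the parallel region: back to a single thread.\n");
    return 0;
}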

17 Parallel for loops §Parallel operations are often expressed as loops. §With OpenMP it is easy to indicate when the iterations of a for loop may be executed in parallel.

18 for (i=first; i<size; i+=prime) marked[i]=1; §There is no dependence between one iteration of the loop and another.

19 for (i=first; i<size; i+=prime) marked[i]=1; §In OpenMP we simply indicate that the iterations of the for loop may be executed in parallel. §The compiler will take care of generating the code that forks/joins threads and schedules the iterations.

20 pragma §A compiler directive in C/C++ is called a pragma (short for "pragmatic information"). §A pragma is used to communicate information to the compiler.

21 pragma §The compiler may ignore that information and still generate a correct object program. §Information provided by a pragma can help the compiler optimize the program.

22 parallel for pragma §OpenMP directives in C/C++ begin with #pragma omp

23 parallel for pragma #pragma omp parallel for §Instructs the compiler to parallelize the for loop that immediately follows the directive.

24 parallel for pragma #pragma omp parallel for for (i=first; i<size; i+=prime) marked[i]=1;

25 parallel for pragma #pragma omp parallel for for (i=first; i<size; i+=prime) marked[i]=1; §The runtime system must have the information it needs to determine the number of iterations when it evaluates the control clause. §The for loop must not contain statements that allow the loop to be exited prematurely (e.g. break, return, exit, goto). §continue is allowed.

26 parallel for pragma #pragma omp parallel for for (i=first; i<size; i+=prime) marked[i]=1; §In a parallel for pragma, variables are shared by default, except the loop index, which is private.
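
A self-contained sketch that puts slides 23-26 together; the array size and the values of first and prime are made up here for illustration (suggesting a sieve-style computation), and the final count is added only so the program prints something:

#include <stdio.h>

#define SIZE 100

int main(void) {
    char marked[SIZE] = {0};
    int first = 4, prime = 2;   /* illustrative values */
    int i, count = 0;

    /* marked, first, prime are shared; the loop index i is private */
    #pragma omp parallel for
    for (i = first; i < SIZE; i += prime)
        marked[i] = 1;

    for (i = 0; i < SIZE; i++)  /* serial check of the result */
        count += marked[i];
    printf("%d entries marked\n", count);
    return 0;
}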

27 parallel for pragma int b[3]; char* cptr; int i; cptr = malloc(1); #pragma omp parallel for for(i=0; i<3; i++) b[i]=i;
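
The same fragment expanded into a compilable sketch, with comments restating the default data-sharing rules from slide 26 (the printf and free at the end are added for illustration):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int b[3];
    char *cptr;
    int i;

    cptr = malloc(1);

    /* b and cptr are shared among the threads (the default);
       i, the loop index, is private to each thread */
    #pragma omp parallel for
    for (i = 0; i < 3; i++)
        b[i] = i;

    printf("%d %d %d\n", b[0], b[1], b[2]);
    free(cptr);
    return 0;
}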

28 parallel for pragma for(i=2; i<=5; i++) a[i] = a[i] + a[i-1];

29 parallel for pragma for(i=2; i<=5; i++) a[i] = a[i] + a[i-1]; Assume that the array a has been initialized with the integers 1 through 5.

30 parallel for pragma §Suppose we have 2 threads. §Assume the first thread is assigned loop indices 2 and 3. §The second thread is assigned loop indices 4 and 5.

31 parallel for pragma One possible order of execution: thread 1 performs the computation for i=4, reading the value of a[3] before thread 0 has completed the computation for i=3, which updates a[3].

32 §OpenMP will do what you tell it to. §If you parallelize a loop with a data dependence, it may give a wrong result. §The programmer is responsible for the correctness of the code.
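
A sketch of the loop from slides 28-31: run serially it produces running sums, and the commented-out directive marks where the incorrect parallelization would go, since iteration i reads a[i-1] written by the previous iteration (a loop-carried dependence):

#include <stdio.h>

int main(void) {
    int a[6] = {0, 1, 2, 3, 4, 5};   /* a[1..5] initialized with 1..5, as on slide 29 */
    int i;

    /* #pragma omp parallel for  -- WRONG here: with threads, a[i-1] may be
       read before the iteration that writes it has finished, so the
       result becomes unpredictable */
    for (i = 2; i <= 5; i++)
        a[i] = a[i] + a[i-1];

    for (i = 1; i <= 5; i++)         /* serial result: 1, 3, 6, 10, 15 */
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}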

33 parallel for pragma §The runtime system needs to know how many threads to create. §There are several ways to specify the number of threads to be used. §One of these is to set the environment variable OMP_NUM_THREADS to the required number of threads.

34 parallel for pragma Environment variable OMP_NUM_THREADS In bash: export OMP_NUM_THREADS=4
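
Besides the OMP_NUM_THREADS environment variable, the thread count can also be requested from inside the program; a sketch using the standard omp_set_num_threads() routine and the num_threads clause (the loop body is illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int i;

    omp_set_num_threads(4);   /* overrides OMP_NUM_THREADS for subsequent parallel regions */

    #pragma omp parallel for num_threads(2)   /* the clause overrides both, for this loop only */
    for (i = 0; i < 8; i++)
        printf("iteration %d run by thread %d\n", i, omp_get_thread_num());

    return 0;
}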

35 parallel for pragma §The loop indices are distributed among the specified number of threads. §The way in which the loop indices are distributed is known as the schedule.

36 parallel for pragma §In the "static" schedule, which is typically the default, each thread will get a chunk of indices of approximately equal size. §For example, if the loop goes from 1 to 100 and there are 3 threads: §The first thread will process i=1 through i=34. §The second thread will process i=35 through i=67. §The third thread will process i=68 through i=100.
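
A sketch that makes the static schedule visible for the 1-to-100, 3-thread example above; schedule(static) simply spells out the default described on the slide, and the exact chunk boundaries may vary slightly between implementations:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int i;

    /* with 3 threads, thread 0 gets roughly i=1..34,
       thread 1 gets i=35..67, and thread 2 gets i=68..100 */
    #pragma omp parallel for schedule(static) num_threads(3)
    for (i = 1; i <= 100; i++)
        printf("i = %3d handled by thread %d\n", i, omp_get_thread_num());

    return 0;
}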

37 parallel for pragma §There is an implied barrier at the end of the loop. §Each thread will wait at the end of the loop until all threads have reached that point before they continue.
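
A sketch illustrating the implied barrier (the arrays and loop bodies are illustrative): inside one parallel region, the second loop can safely read values written by the first loop only because every thread waits at the implicit barrier after the first for construct:

#include <stdio.h>

#define N 16

int main(void) {
    int x[N], y[N];
    int i;

    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < N; i++)
            x[i] = i * i;
        /* implied barrier: no thread starts the next loop
           until every thread has finished this one */

        #pragma omp for
        for (i = 0; i < N; i++)
            y[i] = x[N - 1 - i];   /* reads elements written by other threads */
    }

    printf("y[0] = %d\n", y[0]);   /* (N-1)*(N-1) = 225 */
    return 0;
}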

38 §A sequential program is a special case of a shared-memory parallel program (i.e. one with no forks/joins in it).

39 §Directives, as well as OpenMP function calls, are treated as comments when OpenMP is not enabled or not available during compilation, so the code is in effect a serial code. §This affords a unified code base for both serial and parallel applications, which can ease code maintenance.
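
A hedged sketch of this unified serial/parallel code point: a compiler without OpenMP support skips the directive as an unknown pragma, and any OpenMP library calls are guarded with the standard _OPENMP macro so the same file also builds as a serial program:

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>              /* only available when compiling with OpenMP support */
#endif

#define SIZE 100

int main(void) {
    int a[SIZE];
    int i;

    #pragma omp parallel for  /* a non-OpenMP compiler ignores this unknown pragma */
    for (i = 0; i < SIZE; i++)
        a[i] = 2 * i;

#ifdef _OPENMP
    printf("compiled with OpenMP, up to %d threads\n", omp_get_max_threads());
#else
    printf("compiled as a serial program\n");
#endif

    printf("a[%d] = %d\n", SIZE - 1, a[SIZE - 1]);
    return 0;
}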

40 §The shared-memory model supports incremental parallelization. §Incremental parallelization is the process of transforming a sequential program into a parallel program one block of code at a time.

41 §Benefits of incremental parallelization?

42 §Benefits of incremental parallelization: §Profile the execution of the sequential program. §Sort program blocks in terms of the time they consume.

43 §An overhead is incurred any time parallel threads are spawned, such as in the case of the parallel for directive. This overhead is system dependent. §Therefore, when a short loop is parallelized, it will probably take longer to execute on multiple threads than on a single thread, since the overhead is greater than the time saved by parallelization.

44 "How long is long enough?" Answer is dependent upon the system and the loop under consideration. As a very rough estimate, several thousand operations (total over all loop iterations, not per iteration) There is only one way to know for sure: Try parallelizing the loop, and then time it and see if it is running faster.