
Introduction to OpenMP 曾奕倫 Department of Computer Science & Engineering Yuan Ze University

Outline EETimes news articles regarding parallel computing Simple C programs Simple OpenMP programs How to compile & execute OpenMP programs 2

A Number of EETimes Articles Researchers report progress on parallel path (2009/08/24) [link] Parallel software plays catch-up with multicore (2009/06/22) [link] Cadence adds parallel solving capabilities to Spectre (2008/12/15) [link] Mentor releases parallel timing analysis and optimization technology (2008/10/13) [link] 3

A Number of EETimes Articles Researchers report progress on parallel path (2009/08/24) [link] “The industry expects processors with 64 cores or more will arrive by 2015, forcing the need for parallel software,” said David Patterson of the Berkeley Parallel Lab. Although researchers have failed to create a useful parallel programming model in the past, he was upbeat that this time there is broad industry focus on solving the problem. “In a separate project, one graduate student used new data structures to map a high-end computer vision algorithm to a multicore graphics processor, shaving the time to recognize an image from 7.8 to 2.1 seconds.” 4

A Number of EETimes Articles 5 Parallel software plays catch-up with multicore (2009/06/22) [link] “Microprocessors are marching into a multicore future to keep delivering performance gains... But mainstream software has yet to find its path to using the new parallelism.” "Anything performance-critical will have to be rewritten," said Kunle Olukotun, director of the Pervasive Parallelism Lab at Stanford University, one of many research groups working on the problem seen as the toughest in computer science today. Some existing multiprocessing tools, such as OpenMP, are now applied at the chip level. Intel and others have released libraries to manage software threads. Startups such as Critical Blue (Edinburgh, Scotland) and Cilk Arts Inc. (Burlington, Mass.) have developed tools to help find parallelism in today's C code. Freescale has doubled the size of its multicore software team in preparation for such offerings, Cole said.

A Number of EETimes Articles 6 Parallel software plays catch-up with multicore (2009/06/22) [link]

The Textbook Barbara Chapman, Gabriele Jost, and Ruud van der Pas, Using OpenMP – Portable Shared Memory Parallel Programming, The MIT Press, 2008. The book can be viewed on-line within the yzu.edu.tw domain: [Link] 7

Block Diagram of a Dual-core CPU 8

Shared Memory and Distributed Memory 9

Fork-Join Programming Model 10
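The fork-join idea in OpenMP terms, as a minimal C sketch (not from the slides): the program starts with a single master thread, a team of threads is forked at a parallel region, and the team joins back into the master thread at the end of the region.

#include <stdio.h>
#include <omp.h>

int main()
{
    printf("Before the parallel region: only the master thread is running\n");

    #pragma omp parallel        /* fork: a team of threads executes this region */
    {
        printf("Inside the parallel region: thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                           /* join: the team synchronizes; only the master continues */

    printf("After the parallel region: back to the master thread only\n");
    return 0;
}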

Environment Used in this Tutorial Ubuntu Linux version 9.04 Desktop Edition (64-bit version); gcc (version 4.3.3). To check the installed compiler version: $ gcc --version or $ gcc -v. The gcc version available on Luna: OK 11

Your First C Program (HelloWorld.c) 12

#include <stdio.h>

int main()
{
    printf("Hello World\n");
}

Compiling Your C Program 13
Method #1: $ gcc HelloWorld.c /* the executable file “a.out” (default) will be generated */
Method #2: $ gcc -o HelloW HelloWorld.c /* the executable file “HelloW” (instead of “a.out”) will be generated */

Executing Your First C Program 14
Method #1: $ ./a.out /* if “$ gcc HelloWorld.c” was used */
Method #2: $ ./HelloW /* if “$ gcc -o HelloW HelloWorld.c” was used */

A Simple Makefile (for HelloWorld.c) 15

Makefile:
HelloWorld: HelloWorld.c
	gcc -o HelloWorld HelloWorld.c

The first line: “HelloWorld” is the binary target. The second line (gcc -o ...), which is a build rule, must begin with a tab. To compile, just type $ make

C Program – For Loop & printf (HelloWorld_2.c) #include int main() { int i; for (i=1; i<=10; i++) { printf("Hello World: %d\n", i); } 16

Your First OpenMP Program (omp_test00.c) 17

#include <stdio.h>
#include <omp.h>

int main()
{
    #pragma omp parallel
    printf("Hello from thread %d, nthreads %d\n",
           omp_get_thread_num(), omp_get_num_threads() );
}

#pragma Directive The ‘#pragma’ directive is the method specified by the C standard for providing additional information to the compiler, beyond what is conveyed in the language itself. (Source: ) 18

#pragma Directive Each implementation of C and C++ supports some features unique to its host machine or operating system. Some programs, for instance, need to exercise precise control over the memory areas where data is placed or to control the way certain functions receive parameters. The #pragma directives offer a way for each compiler to offer machine- and operating system-specific features while retaining overall compatibility with the C and C++ languages. Pragmas are machine- or operating system-specific by definition, and are usually different for every compiler. (Source: ) 19

#pragma Directive Computing Dictionary: pragma (pragmatic information) A standardized form of comment which has meaning to a compiler. It may use a special syntax or a specific form within the normal comment syntax. A pragma usually conveys non-essential information, often intended to help the compiler to optimize the program. 20
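To make this concrete, here is a small sketch (not from the slides): a compiler invoked without OpenMP support simply ignores the omp pragma and runs the loop sequentially, and the _OPENMP macro, which OpenMP-enabled compilers define, can be tested to see which case applies.

#include <stdio.h>

int main()
{
    int i;

    /* Without OpenMP support (e.g., gcc invoked without -fopenmp),
     * this pragma is ignored and the loop runs sequentially. */
    #pragma omp parallel for
    for (i = 0; i < 4; i++)
        printf("iteration %d\n", i);

#ifdef _OPENMP
    printf("Compiled with OpenMP support (_OPENMP = %d)\n", _OPENMP);
#else
    printf("Compiled without OpenMP support\n");
#endif
    return 0;
}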

Compiling Your OpenMP Program 21
Method #1: $ gcc -fopenmp omp_test00.c /* the executable file “a.out” will be generated */
Method #2: $ gcc -fopenmp -o omp_test00 omp_test00.c /* the executable file “omp_test00” will be generated */

Executing Your OpenMP Program 22
Method #1: $ ./a.out /* if “a.out” has been generated */
Method #2: $ ./omp_test00 /* if “omp_test00” has been generated */

UNIX/Linux Shell BASH CSH TCSH What is my current shell?  $ echo $0 What is my login shell?  $ echo $SHELL 23

The OMP_NUM_THREADS Environment Variable BASH (Bourne Again Shell) $ export OMP_NUM_THREADS=3 $ echo $OMP_NUM_THREADS CSH/TCSH $ setenv OMP_NUM_THREADS 3 $ echo $OMP_NUM_THREADS Exercise: Change the environment variable to different values and then execute the program omp_test00. 24
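The thread count can also be requested from inside the program; a brief sketch (not in the original slides) using the standard OpenMP calls omp_set_num_threads() and the num_threads clause, both of which take precedence over the OMP_NUM_THREADS environment variable:

#include <stdio.h>
#include <omp.h>

int main()
{
    omp_set_num_threads(3);              /* request 3 threads for later parallel regions */

    #pragma omp parallel
    printf("first region: thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());

    #pragma omp parallel num_threads(2)  /* this clause overrides the setting for this region only */
    printf("second region: thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());

    return 0;
}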

#pragma omp parallel for (omp_test01.c) 25

#include <stdio.h>

int main()
{
    int i;
    #pragma omp parallel for
    for (i=1; i<=10; i++) {
        printf("Hello: %d\n", i );
    }
}

#pragma omp parallel for The purpose of the directive #pragma omp parallel for: both to create a parallel region and to specify that the iterations of the loop should be distributed among the executing threads. It is a parallel work-sharing construct. 26
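For illustration, a minimal sketch (not from the slides) of the loop from omp_test01.c written with the two parts of the combined construct spelled out: a parallel directive that creates the team, and a for directive that shares the loop iterations among the team's threads. The combined #pragma omp parallel for behaves like this form.

#include <stdio.h>
#include <omp.h>

int main()
{
    int i;

    #pragma omp parallel          /* create a team of threads */
    {
        #pragma omp for           /* distribute the loop iterations among the team */
        for (i=1; i<=10; i++) {
            printf("Hello: %d (thread %d)\n", i, omp_get_thread_num());
        }
    }
}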

#pragma omp parallel for (omp_test02.c) 27

#include <stdio.h>
#include <omp.h>

int main()
{
    int i;
    #pragma omp parallel for
    for (i=1; i<=10; i++) {
        printf("Hello: %d (thread=%d, #threads=%d)\n",
               i, omp_get_thread_num(), omp_get_num_threads() );
    } /*-- End of omp parallel for --*/
}

Executing omp_test02 28
$ gcc -fopenmp -o omp_test02 omp_test02.c
$ export OMP_NUM_THREADS=1
$ ./omp_test02
$ export OMP_NUM_THREADS=2
$ ./omp_test02
$ export OMP_NUM_THREADS=4
$ ./omp_test02
$ export OMP_NUM_THREADS=10
$ ./omp_test02
$ export OMP_NUM_THREADS=100
$ ./omp_test02

Executing omp_test02 The work in the for-loop is shared among threads. You can specify the number of threads (for sharing the work) via the OMP_NUM_THREADS environment variable. 29

OpenMP: shared & private data Data in an OpenMP program is either shared by threads in a team, or is private. Private data: Each thread has its own copy of the data object, and hence the variable may have different values for different threads. Shared data: The shared data will be shared among the threads executing the parallel region it is associated with; each thread can freely read or modify the values of shared data. 30

OpenMP: shared & private data (omp_test03.c) #include int main() { int i; int a=101, b=102, c=103, d=104; #pragma omp parallel for shared(c,d) private(i,a,b) for (i=1; i<=10; i++) { a = 201; d = 204; printf("Hello: %d (thread_id=%d, #threads=%d), a=%d, b=%d, c=%d, d=%d\n", i, omp_get_thread_num(), omp_get_num_threads(), a, b, c, d ); } /*-- End of omp parallel for --*/ printf("a=%d, b=%d, c=%d, d=%d\n", a, b, c, d); } 31

Executing omp_test03 32 (the slide repeats the code of omp_test03.c alongside sample output; assume that 3 threads are used)

Hello: 5 (thread_id=1, #threads=3), a=201, b= , c=103, d=204
Hello: 6 (thread_id=1, #threads=3), a=201, b= , c=103, d=204
Hello: 7 (thread_id=1, #threads=3), a=201, b= , c=103, d=204
Hello: 8 (thread_id=1, #threads=3), a=201, b= , c=103, d=204
Hello: 1 (thread_id=0, #threads=3), a=201, b= , c=103, d=204
Hello: 2 (thread_id=0, #threads=3), a=201, b= , c=103, d=204
Hello: 3 (thread_id=0, #threads=3), a=201, b= , c=103, d=204
Hello: 4 (thread_id=0, #threads=3), a=201, b= , c=103, d=204
Hello: 9 (thread_id=2, #threads=3), a=201, b=0, c=103, d=204
Hello: 10 (thread_id=2, #threads=3), a=201, b=0, c=103, d=204
a=101, b=102, c=103, d=204

Note: a and b are private, so each thread gets its own uninitialized copy (the values printed for b are whatever happened to be in memory) and the assignment a = 201 is not visible after the loop; d is shared, so the assignment d = 204 is visible in the final printf.

Race Condition (omp_test04_p.c)

#include <stdio.h>
#include <omp.h>

int main()
{
    int i;
    int a=0, b, c=0;
    #pragma omp parallel for shared(a) private(i,c)
    for (i=1; i<=50; i++) {
        a++;
        for (b=0; b<= /* loop bound missing in the transcript */ ; b++) { c++; c--; } /* for slowing down the thread */
        a--;
        printf("Hello: %d (thread_id=%d, #threads=%d), a=%d\n",
               i, omp_get_thread_num(), omp_get_num_threads(), a);
    } /*-- End of omp parallel for --*/
    printf("a=%d\n", a);
}

Shared Data Can Cause Race Condition An important implication of the shared attribute is that multiple threads might attempt to simultaneously update the same memory location, or that one thread might try to read from a location that another thread is updating. Special care has to be taken to ensure that neither of these situations occurs and that accesses to shared data are ordered as required by the algorithm. OpenMP places the responsibility for doing so on the user and provides several constructs that may help. 34
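One of the constructs that can help is atomic (critical or a reduction clause would also work); a minimal sketch, not from the slides, of how the shared counter updates from omp_test04_p.c can be made safe:

#include <stdio.h>
#include <omp.h>

int main()
{
    int i;
    int a = 0;

    #pragma omp parallel for shared(a) private(i)
    for (i = 1; i <= 50; i++) {
        #pragma omp atomic
        a++;                      /* the increment is now performed atomically */

        #pragma omp atomic
        a--;                      /* ... and so is the decrement */
    }

    printf("a=%d\n", a);          /* prints a=0 for any number of threads */
    return 0;
}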

Matrix * Vector (slides 35-37: figures and a worked example of the operation) The product of an m x n matrix B and a vector c of length n is a vector a of length m whose elements are a[i] = sum over j of B[i][j] * c[j], for i = 0, ..., m-1.

Matrix * Vector – main() 38

/* Figure 3.5 */
#include <stdio.h>
#include <stdlib.h>

void mxv(int m, int n, double *a, double *b, double *c); /* Figure 3.7 or 3.10 */

int main(void)
{
    double *a, *b, *c;
    int i, j, m, n;

    printf("Please give m and n: ");
    scanf("%d %d", &m, &n);

    if ( (a=(double *)malloc(m*sizeof(double))) == NULL )
        perror("memory allocation for a");
    if ( (b=(double *)malloc(m*n*sizeof(double))) == NULL )
        perror("memory allocation for b");
    if ( (c=(double *)malloc(n*sizeof(double))) == NULL )
        perror("memory allocation for c");

    printf("Initializing matrix B and vector c\n");
    for (j=0; j<n; j++)
        c[j] = 2.0;
    for (i=0; i<m; i++)
        for (j=0; j<n; j++)
            b[i*n+j] = i;

    printf("Executing mxv function for m = %d n = %d\n", m, n);
    (void) mxv(m, n, a, b, c);

    free(a); free(b); free(c);
    return(0);
}

Matrix * Vector – mxv() - sequential 39

/* Figure 3.7 */
void mxv( int m, int n, double * a, double * b, double * c )
{
    int i, j;

    for (i=0; i<m; i++) {
        a[i] = 0.0;
        for (j=0; j<n; j++)
            a[i] += b[i*n+j]*c[j];
    }
}

Matrix * Vector – mxv() - parallel 40

/* Figure 3.10 */
void mxv( int m, int n, double * a, double * b, double * c )
{
    int i, j;

    #pragma omp parallel for default(none) \
            shared(m,n,a,b,c) private(i,j)
    for (i=0; i<m; i++) {
        a[i] = 0.0;
        for (j=0; j<n; j++)
            a[i] += b[i*n+j]*c[j];
    } /*-- End of omp parallel for --*/
}
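A possible way to compare the sequential and parallel versions of mxv() is to time them with omp_get_wtime(); the helper below is a sketch under the assumption that mxv() and the arrays are set up exactly as in the main() program of Figure 3.5 shown earlier (the name time_mxv is hypothetical).

#include <stdio.h>
#include <omp.h>

void mxv(int m, int n, double *a, double *b, double *c);   /* Figure 3.7 or 3.10 */

void time_mxv(int m, int n, double *a, double *b, double *c)
{
    double start, elapsed;

    start = omp_get_wtime();      /* wall-clock time in seconds */
    mxv(m, n, a, b, c);
    elapsed = omp_get_wtime() - start;

    printf("mxv for m = %d, n = %d took %f seconds\n", m, n, elapsed);
}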