
1 Programming with OpenMP*, Part II. Intel Software College

2 Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Agenda
What is OpenMP?
Parallel Regions
Worksharing Construct
Data Scoping to Protect Data
Explicit Synchronization
Scheduling Clauses
Other Helpful Constructs and Clauses

3 Assigning Iterations
The schedule clause affects how loop iterations are mapped onto threads.
schedule(static [,chunk]): blocks of "chunk" iterations are assigned to threads in a round-robin distribution.
schedule(dynamic [,chunk]): each thread grabs "chunk" iterations and, when done with them, requests the next set.
schedule(guided [,chunk]): a dynamic schedule that starts with large blocks; block size shrinks, but no smaller than "chunk".

4 Which Schedule to Use
Schedule Clause | When To Use
STATIC          | Predictable and similar work per iteration
DYNAMIC         | Unpredictable, highly variable work per iteration
GUIDED          | Special case of dynamic to reduce scheduling overhead

5 Schedule Clause Example

#pragma omp parallel for schedule(static, 8)
for (int i = start; i <= end; i += 2) {
    if (TestForPrime(i)) gPrimesFound++;
}

Iterations are divided into chunks of 8.
If start = 3, the first chunk is i = {3, 5, 7, 9, 11, 13, 15, 17}.
(Note that the unprotected increment of the shared gPrimesFound is itself a race and needs an atomic update or a reduction.)
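A minimal runnable sketch of the same loop: the slides name TestForPrime but never show its body, so a trial-division stand-in is used here, and the shared counter is protected with an atomic update so concurrent increments are not lost.

```c
#include <stdbool.h>

// Hypothetical trial-division primality test; the slides use the name
// TestForPrime but do not show its implementation.
static bool TestForPrime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

// The slide's loop with schedule(static, 8); the shared counter is
// updated atomically so concurrent increments are not lost.
int CountPrimes(int start, int end) {
    int gPrimesFound = 0;
    #pragma omp parallel for schedule(static, 8)
    for (int i = start; i <= end; i += 2) {
        if (TestForPrime(i)) {
            #pragma omp atomic
            gPrimesFound++;
        }
    }
    return gPrimesFound;
}
```

With start = 3, the first chunk of 8 iterations covers i = 3, 5, ..., 17, exactly as the slide states.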

6 Parallel Sections
Independent sections of code can execute concurrently.

#pragma omp parallel sections
{
    #pragma omp section
    phase1();
    #pragma omp section
    phase2();
    #pragma omp section
    phase3();
}
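A runnable sketch of the construct, with trivial arithmetic standing in for phase1() through phase3(); the phases are assumed independent and write to distinct slots, so no synchronization is needed.

```c
// Three independent phases, one per section; each section runs exactly
// once, on whichever thread picks it up. The phases write to distinct
// slots, so no synchronization is needed.
void RunPhases(int out[3]) {
    #pragma omp parallel sections
    {
        #pragma omp section
        out[0] = 10;   // stands in for phase1()
        #pragma omp section
        out[1] = 20;   // stands in for phase2()
        #pragma omp section
        out[2] = 30;   // stands in for phase3()
    }
}
```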

7 Parallel Sections (continued)
Sections are distributed among the threads in the parallel team. Each section is executed exactly once, and each thread may execute zero or more sections.
It is not possible to determine whether one section will execute before another; therefore, the output of one section should not serve as the input to another. Instead, the section that generates the output should be moved before the sections construct.

8 Single Construct
Denotes a block of code to be executed by only one thread.
The first thread to arrive is chosen.
Implicit barrier at the end.

#pragma omp parallel
{
    DoManyThings();
    #pragma omp single
    {
        ExchangeBoundaries();
    }   // threads wait here for single (implicit barrier)
    DoManyMoreThings();
}

9 Master Construct
Denotes a block of code to be executed only by the master thread.
No implicit barrier at the end.

#pragma omp parallel
{
    DoManyThings();
    #pragma omp master
    {   // threads other than the master skip to the next statement
        ExchangeBoundaries();
    }
    DoManyMoreThings();
}

10 Implicit Barriers
Several OpenMP* constructs have implicit barriers: parallel, for, single.
Unnecessary barriers hurt performance: waiting threads accomplish no work!
Suppress implicit barriers, when safe, with the nowait clause.

11 Nowait Clause
Use when threads would otherwise wait between independent computations.

#pragma omp single nowait
{ [...] }

#pragma omp for nowait
for (...) { ... }

#pragma omp for schedule(dynamic,1) nowait
for (int i = 0; i < n; i++)
    a[i] = bigFunc1(i);

#pragma omp for schedule(dynamic,1)
for (int j = 0; j < m; j++)
    b[j] = bigFunc2(j);
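A runnable sketch of the two-loop pattern above, with simple arithmetic standing in for bigFunc1 and bigFunc2 (their bodies are not shown in the slides); the nowait belongs on the first loop, since b[] never reads a[].

```c
// Two independent worksharing loops in one parallel region. nowait on
// the first loop suppresses its implicit barrier, so threads that finish
// early can start on b[] immediately; safe because the loops share no data.
void FillArrays(double *a, double *b, int n, int m) {
    #pragma omp parallel
    {
        #pragma omp for schedule(dynamic, 1) nowait
        for (int i = 0; i < n; i++)
            a[i] = i * 2.0;      // stands in for bigFunc1(i)
        #pragma omp for schedule(dynamic, 1)
        for (int j = 0; j < m; j++)
            b[j] = j + 0.5;      // stands in for bigFunc2(j)
    }
}
```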

12 Need an Explicit Barrier Construct
At a barrier, each thread waits until all threads arrive.
Consider the following code segment:

#pragma omp parallel shared(A, B, C)
{
    DoSomeWork(A, B);
    printf("Processed A into B\n");
    DoSomeWork(B, C);
    printf("Processed B into C\n");
}

What is wrong? Nothing forces every thread to finish producing B before some thread starts consuming it: a #pragma omp barrier is needed between the two phases.
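One corrected arrangement as a runnable sketch: simple array arithmetic stands in for DoSomeWork (not shown in the slides), the first loop's implicit barrier is suppressed with nowait for illustration, and an explicit barrier separates the two phases so no thread reads B before every thread has finished writing it.

```c
// Phase 1 writes B from A; phase 2 writes C from B. The explicit barrier
// guarantees all of B is ready before any thread starts phase 2 (the
// first loop's own implicit barrier was suppressed with nowait).
void TwoPhases(const int *A, int *B, int *C, int n) {
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            B[i] = A[i] + 1;     // stands in for DoSomeWork(A, B)
        #pragma omp barrier
        #pragma omp for
        for (int i = 0; i < n; i++)
            C[i] = B[i] * 2;     // stands in for DoSomeWork(B, C)
    }
}
```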

13 More on Synchronization
Consider the following code segment:

#pragma omp parallel for shared(x, y, index, n)
for (i = 0; i < n; i++) {
    x[index[i]] += work1(i);
    y[i] += work2(i);
}

What is wrong? Since index[i] can be the same for different values of i, the update to x must be protected.
Example: if index[0] = 5 and index[1] = 5, threads 0 and 1 both update x[5]: a race condition.

14 Race Condition
A race condition is nondeterministic behavior caused by the times at which two or more threads access a shared variable.
For example, suppose both Thread A and Thread B are executing the statement:

area += 4.0 / (1.0 + y*y);

15 Two Timings: Possibility 1
area starts at 11.667.
Thread A reads 11.667, adds 3.765, and writes area = 15.432.
Thread B then reads 15.432, adds 3.563, and writes area = 18.995.

16 Two Timings: Possibility 2
area starts at 11.667.
Thread A and Thread B both read 11.667.
Thread A adds 3.765 and writes area = 15.432.
Thread B adds 3.563 to its stale copy of 11.667 and writes area = 15.230, losing Thread A's update.

17 Two Timings
(This slide shows the two interleavings from the previous two slides side by side: one ending at 18.995, the other at 15.230.)
The order of thread execution causes nondeterministic behavior in a data race.

18 Using a Critical Section

#pragma omp parallel for shared(x, y, index, n)
for (i = 0; i < n; i++) {
    #pragma omp critical
    x[index[i]] += work1(i);
    y[i] += work2(i);
}

This serializes the updates of x: threads wait their turn, and only one at a time updates x.
However, this can be overkill: not all updates to all elements of x need to be protected.

19 Using a Critical Section (continued)
Example: suppose the loop produces
i=0: index[0]=5
i=1: index[1]=29
i=2: index[2]=8
i=3: index[3]=5
Only the two updates to x[5] collide and need protection, yet the critical section serializes every update.

20 Atomic Construct
A special case of a critical section: it applies only to a simple update of a memory location.

#pragma omp parallel for shared(x, y, index, n)
for (i = 0; i < n; i++) {
    #pragma omp atomic
    x[index[i]] += work1(i);
    y[i] += work2(i);
}

Atomic protects individual elements of the x array, so when concurrent instances of index[i] differ, the updates can still be done in parallel.

21 Atomic Construct (continued)
In the previous example (index values 5, 29, 8, 5), the updates to x[29] and x[8] are done in parallel; only the updates to x[5] are serialized.
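A runnable sketch of the pattern, with a constant increment standing in for work1(i) and the y update omitted for brevity: atomic protects each x element individually, so only clashing indices serialize.

```c
// Histogram-style accumulation: index[] may repeat, so concurrent +=
// on the same x element must be atomic. Updates to distinct elements
// still proceed in parallel.
void Accumulate(double *x, const int *index, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        x[index[i]] += 1.0;      // stands in for work1(i)
    }
}
```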

22 OpenMP* API
Get the thread number within a team: int omp_get_thread_num(void);
Get the number of threads in a team: int omp_get_num_threads(void);
Usually not needed for OpenMP codes, but it does have specific uses (e.g., debugging).
Must include the header file: #include <omp.h>
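A small sketch of the two calls; the #ifdef fallback is my addition so the sketch also builds serially, where the team size is simply 1.

```c
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also compiles without OpenMP support.
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

// Thread 0 records the size of the team it belongs to.
int TeamSize(void) {
    int size = 0;
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)
            size = omp_get_num_threads();
    }
    return size;
}
```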

23 Workshop on Patterns in High Performance Computing University of Illinois at Urbana-Champaign May 4-6, 2005

24 What Makes HPC Applications Challenging
MITRE / ISI / MIT Lincoln Laboratory, Benchmarking Working Group Session Agenda
1:00-1:15  David Koester    What Makes HPC Applications Challenging?
1:15-1:30  Piotr Luszczek   HPCchallenge Challenges
1:30-1:45  Fred Tracy       Algorithm Comparisons of Application Benchmarks
1:45-2:00  Henry Newman     I/O Challenges
2:00-2:15  Phil Colella     The Seven Dwarfs
2:15-2:30  Glenn Luecke     Run-Time Error Detection Benchmark
2:30-3:00  Break
3:00-3:15  Bill Mann        SSCA #1 Draft Specification
3:15-3:30  Theresa Meuse    SSCA #6 Draft Specification
3:30-??    Discussions: User Needs; HPCS Vendor Needs for the MS4 Review; HPCS Vendor Needs for the MS5 Review; HPCS Productivity Team Working Groups

25 Killer Kernels: The Seven Dwarfs (Phil Colella, DOE Computational Research Division / DARPA)
Algorithms that consume the bulk of the cycles of current high-end systems in DOE:
Structured Grids (including locally structured grids, e.g. AMR)
Unstructured Grids
Fast Fourier Transform
Dense Linear Algebra
Sparse Linear Algebra
Particles
Monte Carlo
(Should also include optimization / solution of nonlinear systems, which at the high end is used mainly in conjunction with the other seven.)

26 Monte Carlo Simulations
Monte Carlo (MC) stochastic methods are based on the use of random numbers and probability statistics to investigate problems in many fields, such as economics, nuclear physics, and many others.

27 Monte Carlo Simulations (continued)
Solving the equations that describe the interactions between hundreds or thousands of atoms is impossible. With MC, however, a large system can be sampled in a number of random configurations, and that data can be used to describe the system as a whole.

28 Example: Calculating Pi with an MC Simulation
Consider a player throwing darts randomly at a square board.
The number of darts that hit the shaded part (a circle quadrant) is proportional to the area of that part.

29 Monte Carlo Pi

loop 1 to MAX
    x.coor = (random #)
    y.coor = (random #)
    dist = sqrt(x^2 + y^2)
    if (dist <= 1) hits = hits + 1
pi = 4 * hits / MAX

30 Making Monte Carlos Parallel

hits = 0
call SEED48(1)
DO I = 1, max
    x = DRAND48()
    y = DRAND48()
    IF (SQRT(x*x + y*y) .LT. 1) THEN
        hits = hits + 1
    ENDIF
END DO
pi = REAL(hits)/REAL(max) * 4.0

What is the challenge here?

31 Making Monte Carlos Parallel (continued)
The random number generator maintains a static variable (the seed). The only way to parallelize the code as written is to wrap each DRAND48() call in a critical section, which is a lot of overhead.
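The course's remedy is MKL VSL with one seed per thread. An alternative runnable sketch of the same idea uses POSIX rand_r, where each thread owns a private seed so no call needs a critical section (the seeding constants here are arbitrary choices of mine, not from the slides).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback so the sketch also compiles without OpenMP support.
static int omp_get_thread_num(void) { return 0; }
#endif

// Monte Carlo pi with a private RNG state per thread: the shared-seed
// bottleneck disappears, and hits are combined with a reduction.
double EstimatePi(long max) {
    long hits = 0;
    #pragma omp parallel reduction(+ : hits)
    {
        // One independent seed per thread; no shared RNG state.
        unsigned int seed = 1234u + 97u * (unsigned)omp_get_thread_num();
        #pragma omp for
        for (long i = 0; i < max; i++) {
            double x = rand_r(&seed) / (double)RAND_MAX;
            double y = rand_r(&seed) / (double)RAND_MAX;
            if (x * x + y * y <= 1.0)
                hits++;
        }
    }
    return 4.0 * hits / (double)max;
}
```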

32 Lab Activity: Computing Pi (due Tuesday, November 25)
Use the Intel® Math Kernel Library (Intel® MKL) VSL (Vector Statistics Library):
VSL creates an array of random numbers, rather than a single random number.
VSL can have multiple seeds (one for each thread).
Objective: use basic OpenMP* syntax to parallelize the Pi computation.
Choose the best code to divide the task up.
Categorize all variables properly.

33 What's Been Covered
OpenMP* is a simple approach to parallel programming for shared-memory machines.
We explored basic OpenMP coding and how to:
Make code regions parallel (omp parallel)
Split up work (omp for)
Categorize variables (omp private, ...)
Synchronize (omp critical, ...)

34 Programming with OpenMP*

35 Advanced Concepts

36 More OpenMP*: Data Environment Constructs
FIRSTPRIVATE
LASTPRIVATE
THREADPRIVATE

37 Firstprivate Clause
Private variables are initialized from the shared variable.
C++ objects are copy-constructed.

incr = 0;
#pragma omp parallel for firstprivate(incr)
for (i = 0; i <= MAX; i++) {
    if ((i % 2) == 0) incr++;
    A[i] = incr;
}
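A smaller runnable sketch of the clause (my example, not the slide's): each thread's private copy of offset is initialized from the value the shared variable held on entry to the region.

```c
// firstprivate gives every thread its own copy of `offset`, initialized
// from the value the shared variable held on entry to the region.
void AddOffset(int *a, int n, int offset) {
    #pragma omp parallel for firstprivate(offset)
    for (int i = 0; i < n; i++)
        a[i] = i + offset;   // each thread reads its own initialized copy
}
```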

38 Lastprivate Clause
Variables update the shared variable using the value from the (sequentially) last iteration.
C++ objects are updated as if by assignment.

void sq2(int n, double *lastterm)
{
    double x;
    int i;
    #pragma omp parallel
    #pragma omp for lastprivate(x)
    for (i = 0; i < n; i++) {
        x = a[i]*a[i] + b[i]*b[i];
        b[i] = sqrt(x);
    }
    *lastterm = x;
}
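A self-contained sketch of the clause with trivial arithmetic (avoiding the slide's external arrays): after the loop, the shared x holds the value from the sequentially last iteration.

```c
// lastprivate(x) copies the private x from the sequentially last
// iteration (i == n-1) back into the shared x when the loop ends.
double LastDoubled(const double *a, double *b, int n) {
    double x = 0.0;
    #pragma omp parallel for lastprivate(x)
    for (int i = 0; i < n; i++) {
        x = a[i] * 2.0;
        b[i] = x;
    }
    return x;   // value from iteration n-1
}
```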

39 Threadprivate Clause
Preserves global scope for per-thread storage.
Legal for namespace-scope and file-scope variables.
Use copyin to initialize from the master thread.

struct Astruct A;
#pragma omp threadprivate(A)
...
#pragma omp parallel copyin(A)
    do_something_to(&A);
...
#pragma omp parallel
    do_something_else_to(&A);

Private copies of "A" persist between regions.

40 Performance Issues
Idle threads do no useful work.
Divide work among threads as evenly as possible: threads should finish parallel tasks at the same time.
Synchronization may be necessary: minimize the time spent waiting for protected resources.

41 Load Imbalance
Unequal work loads lead to idle threads and wasted time.

#pragma omp parallel
{
    #pragma omp for
    for ( ; ; ) {
    }
}

(The slide's timeline figure shows some threads busy while others sit idle at the end of the loop.)

42 Synchronization
Lost time waiting for locks.

#pragma omp parallel
{
    #pragma omp critical
    {
        ...
    }
    ...
}

(The slide's timeline figure shows threads idling while one at a time holds the critical section.)

43 Performance Tuning
Profilers use sampling to provide performance data.
Traditional profilers are of limited use for tuning OpenMP*: they measure CPU time rather than wall-clock time, do not report contention for synchronization objects, cannot report load imbalance, and are unaware of OpenMP constructs.
Programmers need profilers specifically designed for OpenMP.

44 Static Scheduling: Doing It By Hand
Must know the number of threads (Nthrds) and each thread's ID number (id).
Compute the start and end iterations:

#pragma omp parallel
{
    int i, istart, iend;
    istart = id * N / Nthrds;
    iend = (id + 1) * N / Nthrds;
    for (i = istart; i < iend; i++) {
        c[i] = a[i] + b[i];
    }
}
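A runnable version of the slide's sketch; the slide omits where id and Nthrds come from, so the OpenMP API calls (with serial fallbacks, my addition) are filled in here.

```c
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also compiles without OpenMP support.
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

// Hand-rolled block distribution, equivalent to schedule(static):
// thread id handles iterations [id*N/Nthrds, (id+1)*N/Nthrds).
void AddByHand(const double *a, const double *b, double *c, int N) {
    #pragma omp parallel
    {
        int id     = omp_get_thread_num();
        int Nthrds = omp_get_num_threads();
        int istart = id * N / Nthrds;
        int iend   = (id + 1) * N / Nthrds;
        for (int i = istart; i < iend; i++)
            c[i] = a[i] + b[i];
    }
}
```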

