Published by Christal Campbell, modified over 9 years ago
1
3/12/2013, Computer Engg, IIT(BHU)
OpenMP-3
2
OMP_INIT_LOCK, OMP_INIT_NEST_LOCK
Purpose:
● This subroutine initializes a lock associated with the lock variable.
● The nest routine is new with OpenMP version 3.0.
3
OMP_DESTROY_LOCK, OMP_DESTROY_NEST_LOCK
Purpose:
● This subroutine disassociates the given lock variable from any locks.
● The nest routine is new with OpenMP version 3.0.
4
OMP_SET_LOCK, OMP_SET_NEST_LOCK
Purpose:
● This subroutine forces the executing thread to wait until the specified lock is available. A thread is granted ownership of a lock when it becomes available.
● The nest routine is new with OpenMP version 3.0.
5
OMP_UNSET_LOCK, OMP_UNSET_NEST_LOCK
Purpose:
● This subroutine releases the lock held by the executing thread.
● The nest routine is new with OpenMP version 3.0.
6
OMP_TEST_LOCK, OMP_TEST_NEST_LOCK
Purpose:
● This subroutine attempts to set a lock, but does not block if the lock is unavailable.
● The nest routine is new with OpenMP version 3.0.
7
OMP_GET_WTIME
Purpose:
● Provides a portable wall-clock timing routine.
● Returns a double-precision floating-point value equal to the number of elapsed seconds since some point in the past. It is usually called in pairs: the value of the first call is subtracted from the value of the second call to obtain the elapsed time for a block of code.
● The times are designed to be "per thread" times, so they may not be globally consistent across all threads in a team; this depends on what a thread is doing compared with the other threads.
8
OMP_GET_WTICK
Purpose:
● Companion to OMP_GET_WTIME: reports the resolution of its portable wall-clock timer.
● Returns a double-precision floating-point value equal to the number of seconds between successive clock ticks.
9
Performance-Related Issues
10
Best Practices
● Optimize barrier use
● Avoid the ordered construct
● Avoid large critical regions
● Maximize parallel regions
● Address poor load balance
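Several of these practices map directly onto OpenMP clauses. The sketch below (hypothetical `work` and `best_practice_demo`, with a deliberately unbalanced workload) keeps both loops in one parallel region, drops a barrier with `nowait` because the second loop does not read what the first one writes, uses `schedule(dynamic, 16)` for the uneven loop, and uses a `reduction` instead of a critical section. The chunk size 16 is an arbitrary choice:

```cpp
#include <vector>

// Hypothetical workload whose cost grows with i (deliberately unbalanced).
static double work(int i) {
    double x = 0.0;
    for (int j = 0; j <= i; ++j) x += 1.0 / (j + 1.0);
    return x;
}

// One parallel region (instead of two) containing two independent loops.
// Assumes b.size() >= a.size().
void best_practice_demo(std::vector<double> &a, std::vector<double> &b, double *total) {
    const int n = static_cast<int>(a.size());
    double sum = 0.0;
    #pragma omp parallel
    {
        // Poor load balance: dynamic scheduling hands out small chunks on demand.
        // nowait: the next loop touches only b and sum, so no barrier is needed.
        #pragma omp for schedule(dynamic, 16) nowait
        for (int i = 0; i < n; ++i)
            a[i] = work(i);

        // Reduction instead of a critical section around every update.
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < n; ++i) {
            b[i] = 2.0 * i;
            sum += b[i];
        }
    }   // implicit barrier at the end of the region: a and b are complete
    *total = sum;
}
```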
11
Intel Core i7 Processor Features
● Model name: Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz
● Cache size: 8192 KB
● Number of cores: 4; number of threads: 8
● Max Turbo Frequency: 3.06 GHz
● Max memory bandwidth: 21 GB/s
● This quad-core processor can run 8 threads simultaneously and includes an L3 cache.
● Intel® Hyper-Threading Technology (Intel® HT Technology) allows each core of the processor to work on two tasks at the same time.
12
AMD Phenom II
● Frequency: 3.2 GHz
● Total L2 cache: 3 MB
● L3 cache: 6 MB
● The AMD Phenom™ II X6 1090T shifts its frequency from 3.2 GHz on six cores to 3.6 GHz on three cores.
13
Pi function on Intel i7 Processor
● Model name: Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz; cache size: 8192 KB
● Terms in the pi function: 10 crore (10^8)
● User time decreases as the number of threads increases up to 8, but scalability falls off rapidly beyond 4 threads because the Intel i7 is a quad-core machine: threads 5 to 8 have to share the 4 physical cores through Hyper-Threading.
14
Pi function on AMD Phenom II
● User time decreases as the number of threads increases up to 6 threads; scalability falls off rapidly beyond 6 threads, matching the six cores of the Phenom II X6.
15
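Just 4 statements would do it!
// Needs <omp.h>, <cmath> and <iostream>; pi must be 0.0 before the region.
#pragma omp parallel shared(totalTerms, pi) private(mypi)
{
    mypi = 0;                           // per-thread partial sum
    #pragma omp for                     // iterations split among threads; i is private
    for (i = 0; i < totalTerms; i++)
        mypi += 4.0 * pow(-1.0, i) / double(2*i + 1);
    #pragma omp critical (update_pi)    // one thread at a time adds its partial sum
    {
        pi += mypi;
    }
    #pragma omp single                  // print once, not once per thread
    {
        std::cout << "omp_get_num_threads()=" << omp_get_num_threads() << "\n";
    }
}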
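The same series can also be written with a `reduction` clause, which creates the private partial sums and combines them automatically, so the `mypi` variable and the named critical section disappear. A sketch assuming an OpenMP-enabled C++ compiler; `compute_pi` is a hypothetical name, and replacing `pow(-1, i)` with a parity test is a change of technique that gives the same sign without a library call per term:

```cpp
// Leibniz series: pi = 4 * (1 - 1/3 + 1/5 - 1/7 + ...)
// reduction(+:pi) gives each thread a private copy initialized to 0 and adds
// all copies into the shared pi at the end of the loop.
double compute_pi(long total_terms) {
    double pi = 0.0;
    #pragma omp parallel for reduction(+:pi)
    for (long i = 0; i < total_terms; ++i) {
        // (-1)^i without calling pow(): +1 for even i, -1 for odd i
        double sign = (i % 2 == 0) ? 1.0 : -1.0;
        pi += 4.0 * sign / (2.0 * i + 1.0);
    }
    return pi;
}
```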
16
Pi function on i7 Processor
17
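Just 6 statements would do it!
// pi[] holds one result per repetition; the shared counter k must start at 0.
#pragma omp parallel shared(totalTerms, pi, k) private(mypi)
{
    while (k < No_Iterations) {
        #pragma omp single              // one thread clears this repetition's slot
        { pi[k] = 0; }                  // implicit barrier: all threads see pi[k] == 0
        mypi = 0;
        #pragma omp for
        for (int i = 0; i < totalTerms; i++)
            mypi += 4.0 * pow(-1.0, i) / (2.0*i + 1.0);
        #pragma omp critical (update_pi)
        { pi[k] += mypi; }
        #pragma omp barrier             // every partial sum is in before k changes
        #pragma omp single              // one thread advances the shared counter;
        { k++; }                        // its implicit barrier keeps all threads in step
    }
}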
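A variant of the repeated computation that avoids the shared counter `k`: if every thread runs the same private loop over the repetitions, all threads still meet the worksharing constructs in the same order, and zeroing the result array before the region removes the need for the first `single`, the explicit barrier and the second `single`. A sketch under those assumptions; `repeated_pi` is a hypothetical name and an OpenMP-enabled C++ compiler is assumed:

```cpp
#include <vector>

// Repeats the pi computation `iterations` times inside ONE parallel region.
// k is a private loop variable, so every thread walks the same k sequence
// and no shared counter or extra synchronization is required.
std::vector<double> repeated_pi(long total_terms, int iterations) {
    std::vector<double> pi(iterations, 0.0);     // zeroed before the region
    #pragma omp parallel shared(pi)
    {
        for (int k = 0; k < iterations; ++k) {   // private: declared inside
            double mypi = 0.0;
            #pragma omp for                      // implicit barrier at the end
            for (long i = 0; i < total_terms; ++i)
                mypi += 4.0 * ((i % 2 == 0) ? 1.0 : -1.0) / (2.0 * i + 1.0);
            #pragma omp critical (update_pi)
            pi[k] += mypi;
        }
    }   // implicit barrier: every pi[k] is complete here
    return pi;
}
```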
18
Summary
● OpenMP provides a compact yet powerful programming model for shared-memory programming.
● OpenMP preserves the sequential version of the program: a compiler without OpenMP support simply ignores the directives.
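Because a compiler without OpenMP support ignores the `#pragma omp` lines, the annotated code still builds as the sequential program. Calls into the runtime library are the exception; the standard `_OPENMP` macro, defined only when OpenMP is enabled, can guard them. A sketch; `team_size_used` and `sum_squares` are hypothetical names:

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Reports how many threads ran the region: 1 when built without OpenMP.
int team_size_used() {
    int threads = 1;                     // sequential fallback
    #pragma omp parallel
    {
        #pragma omp single
        {
#ifdef _OPENMP
            threads = omp_get_num_threads();
#endif
        }
    }
    return threads;
}

// Same result with or without OpenMP: the pragma only adds parallelism.
double sum_squares(int n) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (int i = 1; i <= n; ++i)
        s += double(i) * i;
    return s;
}
```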
19
Summary
Developing an OpenMP program:
➢ Start from a sequential program.
➢ Identify the code segments that take most of the time.
➢ Determine whether the important loops can be parallelized; the loops may have critical sections, reduction variables, etc.
➢ Determine the shared and private variables.
➢ Add directives.
➢ See, for example, the pi.c and piomp.c programs.
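The steps above, applied to a small hypothetical loop: the sequential version comes first, then the parallel one after classifying each variable. `process_serial` and `process_parallel` are illustrative names, not the pi.c and piomp.c programs the slide refers to:

```cpp
#include <cstddef>
#include <vector>

// Step 1: the sequential loop (hypothetical hot spot).
double process_serial(const std::vector<double> &a, std::vector<double> &b) {
    double total = 0.0, tmp = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        tmp = a[i] * a[i];
        b[i] = tmp + 1.0;
        total += tmp;
    }
    return total;
}

// Steps 2-5: iterations are independent except for `total`, so:
//   a, b  -> shared by default (each iteration touches its own element)
//   tmp   -> private (scratch value rewritten in every iteration)
//   total -> reduction variable
double process_parallel(const std::vector<double> &a, std::vector<double> &b) {
    double total = 0.0, tmp = 0.0;
    const long n = static_cast<long>(a.size());
    #pragma omp parallel for private(tmp) reduction(+:total)
    for (long i = 0; i < n; ++i) {
        tmp = a[i] * a[i];
        b[i] = tmp + 1.0;
        total += tmp;
    }
    return total;
}
```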
20
● Challenges in developing correct OpenMP programs:
➢ Dealing with loop-carried dependences
➢ Removing unnecessary dependences
➢ Managing shared and private variables
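An example of removing an unnecessary dependence: an induction variable updated on every iteration creates a loop-carried dependence, but it can be replaced by a closed-form expression in the loop index. A genuine dependence such as a running prefix sum (`a[i] = a[i-1] + x[i]`) cannot be removed this way. The function names are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Loop-carried dependence through an induction variable: j in iteration i
// depends on j from iteration i-1, so the loop cannot be split as written.
void fill_serial(std::vector<int> &out) {
    int j = 5;
    for (std::size_t i = 0; i < out.size(); ++i) {
        j += 2;
        out[i] = j;
    }
}

// The dependence is unnecessary: after i+1 steps, j == 5 + 2*(i+1), so each
// iteration can compute its value directly and the iterations are independent.
void fill_parallel(std::vector<int> &out) {
    const long n = static_cast<long>(out.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        out[i] = 5 + 2 * static_cast<int>(i + 1);
}
```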
21
Thanks and References
● Wikipedia: http://en.wikipedia.org/wiki/OpenMP
● MSDN Magazine
● www.openmp.org