
OpenMP-3 (Computer Engg, IIT(BHU), 3/12/2013)

OMP_INIT_LOCK OMP_INIT_NEST_LOCK Purpose: ● This subroutine initializes a lock associated with the lock variable. ● The nestable routine is new in OpenMP 3.0.
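
In C/C++ these routines are omp_init_lock / omp_init_nest_lock. A minimal sketch (not from the original slides) showing initialization together with the matching destroy calls:

    #include <omp.h>

    int main() {
        omp_lock_t lock;               // simple lock variable
        omp_nest_lock_t nlock;         // nestable lock variable (OpenMP 3.0)

        omp_init_lock(&lock);          // a lock must be initialized before first use
        omp_init_nest_lock(&nlock);

        // ... use the locks (see the following slides) ...

        omp_destroy_lock(&lock);       // disassociate the variables when done
        omp_destroy_nest_lock(&nlock);
        return 0;
    }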

OMP_DESTROY_LOCK OMP_DESTROY_NEST_LOCK Purpose: ● This subroutine disassociates the given lock variable from any locks. ● The nestable routine is new in OpenMP 3.0.

OMP_SET_LOCK OMP_SET_NEST_LOCK Purpose: ● This subroutine forces the executing thread to wait until the specified lock is available. A thread is granted ownership of a lock when it becomes available. ● The nestable routine is new in OpenMP 3.0.

OMP_UNSET_LOCK OMP_UNSET_NEST_LOCK Purpose: ● This subroutine releases the lock from the executing thread. ● The nestable routine is new in OpenMP 3.0.
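
A sketch combining the set and unset calls (C/C++ names omp_set_lock / omp_unset_lock; the counter variable is illustrative):

    #include <omp.h>
    #include <iostream>

    int main() {
        omp_lock_t lock;
        int counter = 0;                  // shared resource the lock protects
        omp_init_lock(&lock);

        #pragma omp parallel
        {
            omp_set_lock(&lock);          // wait until the lock becomes available
            counter++;                    // only the lock owner executes this
            omp_unset_lock(&lock);        // release ownership for waiting threads
        }

        std::cout << "counter = " << counter << "\n";
        omp_destroy_lock(&lock);
        return 0;
    }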

OMP_TEST_LOCK OMP_TEST_NEST_LOCK Purpose: ● This subroutine attempts to set a lock, but does not block if the lock is unavailable. ● The nestable routine is new in OpenMP 3.0.
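
A sketch of non-blocking acquisition with omp_test_lock (the "useful work" is a placeholder):

    #include <omp.h>

    int main() {
        omp_lock_t lock;
        omp_init_lock(&lock);

        #pragma omp parallel
        {
            // omp_test_lock returns nonzero if the lock was acquired
            while (!omp_test_lock(&lock)) {
                // lock busy: the thread could do other useful work here
            }
            // ... exclusive work ...
            omp_unset_lock(&lock);
        }

        omp_destroy_lock(&lock);
        return 0;
    }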

OMP_GET_WTIME Purpose: ● Provides a portable wall-clock timing routine. ● Returns a double-precision floating-point value equal to the number of elapsed seconds since some point in the past. Usually used in pairs: the value of the first call is subtracted from the value of the second call to obtain the elapsed time for a block of code. ● Times are "per thread" and therefore may not be globally consistent across all threads in a team; it depends on what each thread is doing relative to the others.
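
The usual pairing, as a sketch (the loop body is a stand-in for the code being timed):

    #include <omp.h>
    #include <iostream>

    int main() {
        double start = omp_get_wtime();          // first reading

        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 100000000; i++)
            sum += 1.0 / i;                      // stand-in workload

        double end = omp_get_wtime();            // second reading
        std::cout << "elapsed = " << (end - start) << " s\n";
        return 0;
    }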

OMP_GET_WTICK Purpose: ● Provides a portable wall-clock timing routine. ● Returns a double-precision floating-point value equal to the number of seconds between successive clock ticks.
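
For example, printing the resolution of the omp_get_wtime() clock:

    #include <omp.h>
    #include <iostream>

    int main() {
        // seconds between successive ticks of the wall clock
        std::cout << "tick = " << omp_get_wtick() << " s\n";
        return 0;
    }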

Performance-related Issues

Best Practices  Optimize Barrier Use (see the sketch below)  Avoid the Ordered Construct  Avoid Large Critical Regions  Maximize Parallel Regions  Address Poor Load Balance
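
As an illustration of the first point, a sketch using the nowait clause (arrays a, b and functions f, g are hypothetical): when two work-sharing loops touch independent data, the implicit barrier after the first loop can be dropped.

    #pragma omp parallel
    {
        #pragma omp for nowait       // b does not depend on a: skip the implicit barrier
        for (int i = 0; i < n; i++)
            a[i] = f(i);

        #pragma omp for              // implicit barrier at the end still synchronizes
        for (int i = 0; i < n; i++)
            b[i] = g(i);
    }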

Intel Core i7 Processor  Features  Model Name: Intel(R) Core(TM) i7 CPU  Cache size: 8192 KB  # of Cores = 4, # of Threads = 8  Max Turbo Frequency = 3.8 GHz  Max Memory Bandwidth: 21 GB/s  This quad-core processor features 8-way multitasking capability and additional L3 cache.  Intel® Hyper-Threading Technology (Intel® HT Technology) allows each core of the processor to work on two tasks at the same time.

AMD Phenom II  Frequency: 3.2 GHz  Total L2 Cache: 3 MB, L3 Cache: 6 MB  The AMD Phenom™ II X6 1090T shifts frequency from 3.2 GHz on six cores to 3.6 GHz on three cores.

Pi function on Intel i7 Processor  Model Name: Intel(R) Core(TM) i7 CPU, 2.80 GHz  Cache size: 8192 KB  Terms in Pi function: 10 crore (100 million)  User time decreases as the number of threads increases up to 8; scalability falls off rapidly after 4 threads because the Intel i7 is a quad-core machine.

Pi function on AMD Phenom II  User time decreases as the number of threads increases up to 6; scalability falls off rapidly after 6 threads because the Phenom II X6 has six cores.

Just 4 statements would do it!

    #pragma omp parallel shared(totalTerms, pi) private(mypi)
    {
        mypi = 0;                       // per-thread partial sum avoids contention on pi
        #pragma omp for
        for (int i = 0; i < totalTerms; i++)
            mypi += 4 * (pow(-1, i) / double(2 * i + 1));   // Leibniz series term
        #pragma omp critical (update_pi)
        {
            pi += mypi;                 // each thread adds its partial sum once
        }
        #pragma omp single
        {
            std::cout << "omp_get_num_threads()=" << omp_get_num_threads() << "\n";
        }
    }

Pi function on i7 Processor

Just 6 statements would do it!

    #pragma omp parallel shared(totalTerms, pi, k) private(mypi)
    {
        while (k < No_Iterations) {
            #pragma omp single
            {
                pi[k] = 0;              // one thread clears this iteration's slot
            }
            mypi = 0;
            #pragma omp for
            for (int i = 0; i < totalTerms; i++)
                mypi += 4.0 * (pow(-1.0, i) / double(2.0 * i + 1.0));
            #pragma omp critical (update_pi)
            {
                pi[k] += mypi;          // combine partial sums for iteration k
            }
            #pragma omp barrier         // ensure all updates to pi[k] are done
            #pragma omp single
            {
                k++;                    // one thread advances the shared counter
            }
        }
    }
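
Both fragments are slide excerpts: they assume #include <omp.h>, <cmath>, and <iostream>, plus earlier declarations of totalTerms, pi, k, and No_Iterations. Assuming the source is the piomp.c file mentioned on the summary slide, it could be built with an OpenMP-capable C++ compiler, e.g.:

    g++ -fopenmp piomp.c -o piomp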

Summary  OpenMP provides a compact yet powerful programming model for shared-memory programming.  OpenMP preserves the sequential version of the program.

Summary  Developing an OpenMP program: ➢ Start from a sequential program. ➢ Identify the code segment that takes most of the time. ➢ Determine whether the important loops can be parallelized; the loops may have critical sections, reduction variables, etc. ➢ Determine the shared and private variables. ➢ Add directives. ➢ See, for example, the pi.c and piomp.c programs.

● Challenges in developing correct OpenMP programs ➢ Dealing with loop-carried dependence (see the sketch below) ➢ Removing unnecessary dependencies ➢ Managing shared and private variables
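
A hedged sketch of the first challenge (arrays a, b, c and bound n are illustrative):

    // Loop-carried dependence: iteration i reads a[i-1], written by iteration i-1,
    // so the iterations cannot safely run in parallel as written.
    for (int i = 1; i < n; i++)
        a[i] = a[i-1] + b[i];

    // Independent iterations: each i touches only its own elements, so this is safe.
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];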

Thanks and References  Wikipedia  MSDN Magazine