0907532 Special Topics in Computer Engineering: OpenMP* Essentials (*Open Multi-Processing)

Topics:
- What OpenMP is and what it can do
- Parallel regions
- Work sharing: "parallel for"
- Work queuing: "taskq"
- Shared and private variables
- Protection of shared data: critical sections, locks, etc.
- The reduction clause

What is OpenMP?
A collection of compiler directives, library routines, and environment variables that can be used to specify shared-memory parallelism.
Designed with the cooperation of many computer vendors, including Intel, HP, IBM, and SGI. For this reason it has become the standard (and therefore portable) way of programming SMPs.
The Fortran directives are very similar to the C/C++ OpenMP directives.

OpenMP Programming Model
Fork-join parallelism: the master thread spawns a team of threads as needed.
Parallelism is added incrementally: the sequential program evolves into a parallel program.
[Figure: the fork/join threading model, with the master thread forking teams for parallel regions and joining them back over time.]
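A minimal sketch of the fork-join model (not from the slides; compile with an OpenMP-enabled compiler, e.g. gcc -fopenmp, and set the team size with the OMP_NUM_THREADS environment variable):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        printf("sequential part: master thread only\n");

        #pragma omp parallel      /* fork: a team of threads executes this block */
        {
            printf("parallel region: one of %d threads\n", omp_get_num_threads());
        }                         /* join: the team ends, the master continues */

        printf("sequential part again: master thread only\n");
        return 0;
    }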

Writing Parallel Code
The section of code that is meant to run in parallel is marked with a compiler directive that causes the threads to be forked before the section is executed.
Example: to parallelize a "for loop", precede the loop with the compiler directive

    #pragma omp parallel for

or, equivalently,

    #pragma omp parallel
    #pragma omp for

Writing Parallel Code (continued)
Each thread has an "id" attached to it. The thread id is an integer, and the master thread has an id of "0".
After the execution of the parallelized code, the threads "join" back into the master thread, which continues onward to the end of the program.
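A thread can read its own id with the library routine omp_get_thread_num(); a minimal sketch (not from the slides):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel
        {
            int id = omp_get_thread_num();   /* integer thread id; the master is 0 */
            printf("hello from thread %d\n", id);
        }   /* join: the threads merge back into the master thread */
        printf("the master thread continues to the end of the program\n");
        return 0;
    }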

Example Loop

    for (i = 1; i < 13; i++)
        c[i] = a[i] + b[i];

On a three-core chip, each of three threads is assigned 4 of the 12 iterations and executes on a different core:
Thread 0 executes i = 1, 2, 3, 4
Thread 1 executes i = 5, 6, 7, 8
Thread 2 executes i = 9, 10, 11, 12
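A complete, compilable version of this example (a sketch: the array contents and the num_threads clause are assumptions, and the contiguous block assignment shown above is what the default static schedule produces):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        double a[13], b[13], c[13];
        int i;
        for (i = 1; i < 13; i++) { a[i] = i; b[i] = 2.0 * i; }

        #pragma omp parallel for num_threads(3)
        for (i = 1; i < 13; i++) {
            c[i] = a[i] + b[i];
            printf("thread %d computed c[%d] = %g\n", omp_get_thread_num(), i, c[i]);
        }
        return 0;
    }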

Work-sharing Construct
Each thread is assigned an independent set of iterations.
Threads must wait at the end of a work-sharing construct: there is an implicit barrier.
[Figure: the iterations i = 1 through i = 12 are divided among the threads of the parallel region, and all threads synchronize at the implicit barrier after the for construct.]

    #pragma omp parallel
    #pragma omp for
    for (i = 1; i < 13; i++)
        c[i] = a[i] + b[i];

or

    #pragma omp parallel for
    for (i = 1; i < 13; i++)
        c[i] = a[i] + b[i];

Race Condition
A race condition is nondeterministic behavior caused by the times at which two or more threads access a shared variable.

Race Condition
For example, suppose both Thread A and Thread B are executing the statement

    area += 4.0 / (1.0 + x*x);

If the variable area is shared, then we can run into a race condition: nondeterministic behavior caused by the times at which the threads access the shared variable area.
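A minimal sketch of how such a statement races (the surrounding numerical-integration loop is an assumption, not from the slide):

    double area = 0.0;
    double h = 1.0 / 1000000;
    int i;

    #pragma omp parallel for        /* BUG: unsynchronized updates to shared area */
    for (i = 0; i < 1000000; i++) {
        double x = (i + 0.5) * h;
        area += 4.0 / (1.0 + x*x);  /* two threads can interleave this read-modify-write */
    }
    /* area * h should approximate pi, but the result varies from run to run */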

Two Timings
[Figure: two interleavings of Thread A and Thread B updating the shared value of area.]
In the first timing, Thread A reads area, adds its term, and writes the result back before Thread B reads, so both updates survive. In the second timing, both threads read the same old value of area; the later write overwrites the earlier one and an update is lost.
The order of thread execution causes nondeterministic behavior in a data race.
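The topic list above includes critical sections and the reduction clause; either one repairs this update. A sketch of both fixes, reusing the declarations from the integration sketch above:

    /* Fix 1: a critical section serializes the update (correct, but slower) */
    #pragma omp parallel for
    for (i = 0; i < 1000000; i++) {
        double x = (i + 0.5) * h;
        #pragma omp critical
        area += 4.0 / (1.0 + x*x);
    }

    /* Fix 2: the reduction clause gives each thread a private copy of area
       and combines the copies with + at the end of the loop */
    #pragma omp parallel for reduction(+:area)
    for (i = 0; i < 1000000; i++) {
        double x = (i + 0.5) * h;
        area += 4.0 / (1.0 + x*x);
    }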

The Private Clause
Can you spot the race condition? The temporaries x and y are shared by default, so concurrent iterations overwrite each other's values. Making x and y private to each thread resolves the race:

    void work(float* a, float* b, float* c, int N) {
        float x, y;
        int i;
        #pragma omp parallel for private(x, y)
        for (i = 0; i < N; i++) {
            x = a[i];
            y = b[i];
            c[i] = x + y;
        }
    }
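An equivalent fix (a common idiom, not shown on the slide) is to declare the temporaries inside the loop body, where each iteration gets its own automatically private copies:

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        float x = a[i];   /* declared inside the loop: private to each iteration */
        float y = b[i];
        c[i] = x + y;
    }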

Cache Effects
Poor use of cache can degrade performance by a factor of 10 in some loops.
Techniques for exploiting cache reuse:
- Loop interchange (next slides)
- Cache blocking (a sketch follows the loop-interchange example below)
- Avoiding false sharing (a sketch follows the eggs analogy below)

Loop Interchange

    for (i = 0; i < NUM; i++)
        for (j = 0; j < NUM; j++)
            for (k = 0; k < NUM; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];

Loop Interchange
Interchanging the j and k loops makes j the fast (innermost) loop index, so c and b are accessed with unit stride:

    for (i = 0; i < NUM; i++)
        for (k = 0; k < NUM; k++)
            for (j = 0; j < NUM; j++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];

Non-unit-stride skipping through memory can cause cache thrashing, particularly for array sizes that are powers of two (2^n).
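Cache blocking, listed under cache-reuse techniques above but not shown on the slides, tiles the loops so that one block of each matrix stays in cache while it is reused. A minimal sketch (the block size BS is an assumed tuning parameter, and NUM is taken to be a multiple of BS):

    #define BS 64   /* assumed block size; tune to the cache */

    for (int ii = 0; ii < NUM; ii += BS)
        for (int kk = 0; kk < NUM; kk += BS)
            for (int jj = 0; jj < NUM; jj += BS)
                /* multiply one BS x BS block; its data stays resident in cache */
                for (int i = ii; i < ii + BS; i++)
                    for (int k = kk; k < kk + BS; k++)
                        for (int j = jj; j < jj + BS; j++)
                            c[i][j] = c[i][j] + a[i][k] * b[k][j];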

Poor Cache Utilization, with Eggs
In the analogy: a carton represents a cache line, the refrigerator represents main memory, and the table (next to a pan ready to fry eggs) represents the cache.
A request for an egg that is not already on the table brings a whole new carton from the refrigerator, but the user fries only one egg from each carton.
When the table fills up, old cartons are evicted and most of their eggs are wasted.
[Figure: the user requests one specific egg, then a 2nd specific egg, then a 3rd; the third request forces a carton to be evicted.]

Good Cache Utilization, with Eggs
The user eventually asks for all the eggs, but requests them carton by carton.
As before, a request for one egg brings a new carton from the refrigerator, but now the user specifically requests eggs from cartons already on the table, frying all the eggs in a carton before requesting an egg from the next carton.
Carton eviction no longer hurts, because all the eggs in the cartons on the table have already been fried.
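False sharing, listed under cache effects above but not illustrated on the slides, is the multithreaded counterpart of the carton analogy: two threads that update data in the same cache line keep snatching the "carton" back and forth between their caches. A minimal sketch of the problem and a common padding fix (the structure layout and 64-byte line size are assumptions):

    #include <omp.h>

    #define NTHREADS   4
    #define CACHE_LINE 64   /* assumed cache-line size in bytes */

    /* BAD: adjacent per-thread counters share a cache line, so every
       increment by one thread invalidates the line for the others */
    long sums_bad[NTHREADS];

    /* FIX: pad each counter so it occupies its own cache line */
    struct padded { long sum; char pad[CACHE_LINE - sizeof(long)]; };
    struct padded sums_good[NTHREADS];

    void count(const int *data, int n) {
        #pragma omp parallel num_threads(NTHREADS)
        {
            int id = omp_get_thread_num();
            #pragma omp for
            for (int i = 0; i < n; i++)
                sums_good[id].sum += data[i];   /* no false sharing */
        }
    }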