ICS 535-101 Design and Implementation of Programming Languages, Part 1: OpenMP Example. Dr. Muhammed Al-Mulhem.

Presentation transcript:

Dr. Muhammed Al-Mulhem 1 ICS 535 Design and Implementation of Programming Languages, Part 1: OpenMP Example

Dr. Muhammed Al-Mulhem 2 ICS 535 Example: Consider a simple C program that calculates values of some mathematical function. The source of these slides is the following: programming/?f=OpenMP_debug_and_optimization.html&lang=en&content=parallel-programming

Dr. Muhammed Al-Mulhem 3 ICS 535 Listing 1: When calling this function with N set to 15000, we get the following result:
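The listing itself is not reproduced in this transcript. Below is a minimal sketch of what such a serial function might look like; the function name, the loop bounds and the expressions for x and y are assumptions made for illustration, and only the accumulation s += j*y is confirmed by the later slides.

/* Hypothetical reconstruction of Listing 1 (serial version). */
double Calculate(int N)
{
    double x, y, s = 0;
    for (int i = 1; i <= N; i++)
    {
        for (int j = 1; j <= N; j++)
        {
            x = (double)i / N;   /* assumed expression */
            y = (double)j / N;   /* assumed expression */
            s += j * y;          /* accumulation referred to in the slides */
        }
    }
    return s;
}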

Dr. Muhammed Al-Mulhem 4 ICS 535 Listing 2: This function can easily be parallelized with the help of OpenMP. To do this, we use the #pragma directive before the first for statement (a sketch is given below):
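A sketch of what Listing 2 presumably looks like, with the directive placed before the first for statement; the names and expressions are carried over from the sketch above and remain assumptions. The code must be built with an OpenMP-enabled compiler (for example, gcc -fopenmp).

/* Hypothetical reconstruction of Listing 2 (naively parallelized). */
double CalculateParallel(int N)
{
    double x, y, s = 0;
    #pragma omp parallel for
    for (int i = 1; i <= N; i++)
    {
        for (int j = 1; j <= N; j++)
        {
            x = (double)i / N;
            y = (double)j / N;
            s += j * y;   /* unsynchronized update of shared s */
        }
    }
    return s;
}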

Dr. Muhammed Al-Mulhem 5 ICS 535 Listing 2: Unfortunately, the code we've created is incorrect and the result of the function is in general undefined. For example, it can be a different (incorrect) value on each run. Why doesn't it work?

Dr. Muhammed Al-Mulhem 6 ICS 535 Listing 2: The main cause of errors in parallel programs is incorrect handling of shared resources, i.e. resources common to all launched threads, and in particular shared variables. Variables in OpenMP programs are divided into: shared, which exist as a single copy and are available to all the threads, and private, which are local to a particular thread. By default, all the variables in parallel regions of OpenMP are shared, except for parallel loop indexes and variables defined inside these parallel regions.

Dr. Muhammed Al-Mulhem 7 ICS 535 Listing 2: In the example above, x, y and s are treated as shared variables, which is incorrect. Only the s variable should be shared. Each thread calculates its own values of x and y and writes them into the corresponding variables (x or y). With x and y shared, the result depends on the order in which the parallel threads execute.
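The slides do not show the fix for x and y explicitly; one way to express it, assuming the variables are declared outside the loop as in the sketches above, is a private clause on the directive. This removes the races on x and y, while the race on s still has to be handled (see the atomic and reduction variants later).

/* Illustration only (not shown in the slides): x and y become private
   copies in each thread, but the unsynchronized update of the shared
   variable s is still a data race. */
double CalculatePrivateXY(int N)
{
    double x, y, s = 0;
    #pragma omp parallel for private(x, y)
    for (int i = 1; i <= N; i++)
    {
        for (int j = 1; j <= N; j++)
        {
            x = (double)i / N;
            y = (double)j / N;
            s += j * y;   /* still a data race on s */
        }
    }
    return s;
}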

Dr. Muhammed Al-Mulhem 8 ICS 535 Listing 2: To find such errors we need a tool such as Intel Thread Checker (a dynamic code analyzer) or VivaMP (a static code analyzer).

Dr. Muhammed Al-Mulhem 9 ICS 535 Listing 2: Let's consider the s += j*y instruction. Originally it is intended that each thread adds its calculated result to the current value of the s variable, and then the other threads do the same in turn. But in some cases two threads begin to execute the s += j*y instruction simultaneously: each of them first reads the current value of s, then adds the result of j*y to that value, and writes the final result into s, so one of the updates is lost.

Dr. Muhammed Al-Mulhem 10 ICS 535 Listing 2: You can avoid such a situation by making sure that at any moment only one thread is allowed to execute the s += j*y operation. Such operations are called indivisible, or atomic. To make an instruction atomic we use #pragma omp atomic. The program code in which the described operations are corrected is shown in Listing 3.
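A sketch of what Listing 3 might look like: x and y are made private and the update of s is protected with #pragma omp atomic. The function name and the expressions for x and y are still assumptions.

/* Hypothetical reconstruction of Listing 3 (correct but slow). */
double CalculateAtomic(int N)
{
    double x, y, s = 0;
    #pragma omp parallel for private(x, y)
    for (int i = 1; i <= N; i++)
    {
        for (int j = 1; j <= N; j++)
        {
            x = (double)i / N;
            y = (double)j / N;
            #pragma omp atomic
            s += j * y;   /* only one thread at a time may update s */
        }
    }
    return s;
}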

Dr. Muhammed Al-Mulhem 11 ICS 535 Listing 3: Does this work? Is it efficient?

Dr. Muhammed Al-Mulhem 12 ICS 535 Listing 3: Running the code shows that it now produces the correct result. But is the parallelism effective? Let's measure the execution time for three variants of the function: (1) sequential, (2) parallel incorrect, and (3) parallel correct. The results of this measurement for N = 1500 are given in Table 1.

Dr. Muhammed Al-Mulhem 13 ICS 535 Listing 3: The correct variant works more than 60 times slower than the sequential one (why?). Do we need such parallelism? Of course not.

Dr. Muhammed Al-Mulhem 14 ICS 535 Listing 3: The reason is that we have chosen a very inefficient way of summing the result into the s variable: using the atomic directive on every addition. This approach causes the threads to wait for each other very often. To avoid this constant locking on the atomic summing operation, we can use the special reduction clause. The reduction option specifies that the variable will receive the combined value at the exit from the parallel block. The following operations are permissible: +, *, -, &, |, ^, &&, ||. The modified variant of the function is shown in Listing 4.
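A sketch of what Listing 4 might look like: each thread accumulates into its own private copy of s, and the copies are combined with + when the parallel loop ends. Names and expressions remain assumptions carried over from the earlier sketches.

/* Hypothetical reconstruction of Listing 4 (correct and efficient). */
double CalculateReduction(int N)
{
    double x, y, s = 0;
    #pragma omp parallel for private(x, y) reduction(+: s)
    for (int i = 1; i <= N; i++)
    {
        for (int j = 1; j <= N; j++)
        {
            x = (double)i / N;
            y = (double)j / N;
            s += j * y;   /* updates the thread's private copy of s */
        }
    }
    return s;
}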

Dr. Muhammed Al-Mulhem 15 ICS 535 Listing 4: Table 2 shows the result of running this code.

Dr. Muhammed Al-Mulhem 16 ICS 535 Listing 4: The code is now both correct and faster: the speed of the calculations has almost doubled.

Dr. Muhammed Al-Mulhem 17 ICS 535 Conclusion: Although parallel programming provides many ways to increase code effectiveness, it demands attention and a good knowledge of the technologies used from the programmer. Fortunately, there exist tools such as Intel Thread Checker and VivaMP which greatly simplify the creation and testing of multi-threaded applications.