OpenMP (Open Multi-Processing)
David Valentine, Computer Science, Slippery Rock University

The Buzz for OpenMP
- There are more than a dozen events at SC12 with "OpenMP" in their titles; OpenMP is celebrating 15 years at booth #2237.
- API designed for C/C++ and Fortran
- Shared-memory parallelism, in a multicore world
- As such, it offers an incremental learning curve for current programmers:
  - Start with serial code
  - Grab the obvious parallelizable sections to get the quickest results (Amdahl's Law)

Shared Memory Parallelism
- Our world has already gone multicore.
- How best can we take advantage of the cores already on the desktop, without jumping into the weeds of low-level thread manipulation?
- There are several choices:
  - OpenMP
  - Cilk
  - Threading Building Blocks (TBB)

OpenMP (Open Multi-Processing): openmp.org
- Started in 1997, as a continuation of ANSI X3H5
- Supported by industry (HP, IBM, Intel, Sun, et al.) and government (DoE)
- Designed for shared memory, multicore
- Thread-based parallelism
- Explicit programmer control
- Fork-join model

OpenMP Fork-Join Model
- Explicit programmer control
- Can use the thread number (omp_get_thread_num()) to set different tasks per thread in a parallel region, as sketched below
[Figure: fork-join diagram (master thread forks a team of threads at a parallel region; the threads join back at its end); image: wikipedia.com]
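To make the fork-join idea concrete, here is a minimal sketch (not from the slides) that forks a team and uses the thread number to give each thread a different task; the closing brace of the region is the implicit join:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        printf("master thread alone before the fork\n");

        #pragma omp parallel      /* fork: a team of threads starts here */
        {
            if (omp_get_thread_num() == 0)
                printf("thread 0 takes one task\n");
            else
                printf("thread %d takes another task\n",
                       omp_get_thread_num());
        }                         /* join: all threads synchronize here */

        printf("master thread alone again after the join\n");
        return 0;
    }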

OpenMP
Made of 3 components (a sketch combining all three follows below):
- Compiler directives (20 as of 3.1): #pragma omp parallel will spawn a parallel region
- Run-time library routines (32): int myNum = omp_get_thread_num();
- Environment variables (9): setenv OMP_NUM_THREADS 8
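A minimal sketch touching all three components (assuming gcc with -fopenmp; the file name demo.c is hypothetical): the environment variable sets the default team size, a run-time routine queries it, and a directive forks the team.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* Run-time library routine: the team size a parallel region
           would get; reflects OMP_NUM_THREADS if it is set. */
        printf("default team size: %d\n", omp_get_max_threads());

        /* Compiler directive: fork a team of threads. */
        #pragma omp parallel
        printf("hello from thread %d\n", omp_get_thread_num());

        return 0;
    }

Compile with gcc -fopenmp demo.c, then setenv OMP_NUM_THREADS 8 (csh, as on the slide) or export OMP_NUM_THREADS=8 (bash) before running.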

OpenMP Goals
- Their 4 stated goals are:
  - Standardization
  - Lean and Mean
  - Ease of Use
  - Portability
- CS2 students see their programs "go parallel" with just 2 or 3 lines of additional code!
- At this level we are just exposing them to the concept of multicore, shared-memory parallelism.

General Code Structure (from https://computing.llnl.gov/tutorials/openMP/#Abstract)

    #include <omp.h>

    int main() {
        int var1, var2, var3;
        /* Serial code ... */

        /* Beginning of parallel section: fork a team of threads
           and specify variable scoping. */
        #pragma omp parallel private(var1, var2) shared(var3)
        {
            /* Parallel section executed by all threads:
               other OpenMP directives, run-time library calls.
               All threads join the master thread and disband. */
        } // parallel block

        /* Resume serial code ... */
    } // main

The Obligatory Hello World Example
- Compile with OpenMP enabled:
  - Visual Studio: Project > Properties > Configuration Properties > C/C++ > Language > OpenMP Support > Yes
  - or gcc: use -fopenmp

    #include <stdio.h>
    #include <omp.h>

    int main() {
        printf("Getting started...\n\n");

        #pragma omp parallel
        printf("Hello World from thread %i of %i\n",
               omp_get_thread_num(), omp_get_num_threads());

        printf("\nThat's all Folks!\n");
        return 0;
    }

For CS1/CS2
- NB: most programs are severely I/O bound.
- But we are looking for only:
  - a simple counting loop (for),
  - where each iteration is independent,
  - and which has enough work to distribute across our cores.
- The first two requirements are easy; the third one can involve "handicapping" the loop work.
- We won't show them nearly all of OpenMP; we just want to whet their appetites here.
- Tell them the Truth, tell them nothing but the Truth, but for heaven's sake don't tell them ALL the Truth!

Eg. Trapezoidal Rule

    float trap(float xLo, float xHi, int numIntervals) {
        float area;   // area under the curve (the integral)
        float width;  // width of each trapezoid
        float x;      // our points of evaluation
        float sum;    // sum up all our f(x)'s

        sum = 0.0;                               // init our summing var
        width = (xHi - xLo) / numIntervals;      // width of each trap
        for (int i = 1; i < numIntervals; i++) { // get the interior points
            x = xLo + i * width;                 // each iter. independent of others
            sum += f(x);                         // add the interior value
        } // for
        sum += (f(xLo) + f(xHi)) / 2.0;          // add the endpoints
        area = width * sum;                      // calc the total area
        return area;                             // return the approximation
    } // trap

Eg. Trapezoidal Rule
- Students add two lines: #include <omp.h> and the #pragma below.
- When they see all the cores engaged, they are hooked.

    #pragma omp parallel for private(x) reduction(+:sum)
    for (int i = 1; i < numIntervals; i++) { // get the interior points
        x = xLo + i * width;                 // each iteration independent of others
        sum += f(x);                         // add the interior value
    } // for
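For readers who want to compile the whole thing, here is a minimal end-to-end sketch; the integrand f(x) = x*x, the interval [0, 1], and the interval count are illustrative assumptions, not from the slides:

    #include <stdio.h>
    #include <omp.h>

    float f(float x) { return x * x; }  // stand-in integrand (assumption)

    float trap(float xLo, float xHi, int numIntervals) {
        float width = (xHi - xLo) / numIntervals;
        float sum = 0.0;
        float x;

        #pragma omp parallel for private(x) reduction(+:sum)
        for (int i = 1; i < numIntervals; i++) {
            x = xLo + i * width;
            sum += f(x);
        }
        sum += (f(xLo) + f(xHi)) / 2.0;
        return width * sum;
    }

    int main(void) {
        // Exact integral of x^2 over [0,1] is 1/3; output should be close.
        printf("approx = %f\n", trap(0.0, 1.0, 1000000));
        return 0;
    }

Build and run with: gcc -fopenmp trap.c && ./a.out (file name hypothetical).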