Scientific Programming: Open Multi-Processing (OpenMP) and Message Passing Interface (MPI)
Shared memory machines: all processors are connected to each other and to the whole memory.
Distributed memory machines: memory is divided between (groups of) processors, which are still interconnected.
[Diagrams: a shared memory machine with several CPUs attached to one memory over a common bus, and a distributed memory machine with separate CPU + memory nodes connected to each other.]
Two kinds of parallelism:
1. Totally independent programs that run on separate processors and communicate with each other.
2. Parts of a single code that can be executed in arbitrary order; the compiler is instructed to create parallel threads for such parts.
Message passing:
- Works equally well on shared memory and on distributed memory computers
- Offers full control: you decide when to run in parallel and when serially, when to synchronize, etc.
- The most popular incarnation is the Message Passing Interface or MPI (look it up on Wikipedia)
MPI is a library of functions. Using MPI functions, each program (referred to as a process) can:
- Find out the total number of processes and its own serial number (rank)
- Send data to another process
- Wait for data to arrive from another process
- Send data to all processes (this implies a matching receive)
- Add an ID (tag) to each message so that one message is not taken for another
- Impose synchronization: traffic lights, barriers, or simply waiting until the data you sent is received
- Decide that a certain part of the work (e.g. input/output) is done by one process only (the convention is that process number 0 is the main process)
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER IERR,MY_PROC,N_PROC,MAIN_PROC...
      ! start MPI and find out how many processes there are and which one I am
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD,N_PROC,IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD,MY_PROC,IERR)
      MAIN_PROC=0
      ! only the main process reads the input file
      if(MY_PROC.EQ.MAIN_PROC) then
        open(1,file='disk_model.dat',status='OLD',form='UNFORMATTED')
        read(1) (((rho(iX,iY,iZ),iX=1,nX),iY=1,nY),iZ=1,nZ)
        close(1)
      endif
      ! broadcast the density cube to all processes
      CALL MPI_BCAST(rho,ntot,MPI_REAL,MAIN_PROC,MPI_COMM_WORLD,IERR)
      ! each process handles every N_PROC-th cell
      do i=MY_PROC+1,ntot,N_PROC
        ...
        acc=rho(kX(i),kY(i),kZ(i))*volume/dist_cube
        ...
      enddo
      ! sum the partial results on the main process
      CALL MPI_REDUCE(acc,gX,ntot,MPI_REAL,MPI_SUM,MAIN_PROC,MPI_COMM_WORLD,IERR)
      CALL MPI_FINALIZE(IERR)
      end
The simplest approach is to write a single program that decides what it is supposed to do based on the total number of processes and on the process number (rank) assigned to it.
We then run a bunch of copies of this program; to the OS they are still a single program!
Normally, when you install MPI you get special scripts for compilation and running:
  mpif90 -o mpicode mpicode.f90
  mpirun -np 10 ./mpicode
An MPI code can send information and wait until it is received, but it can also send and not wait (non-blocking send). Alternatively, one can issue a receive statement and not wait for its completion (non-blocking receive)!
This allows automatic load balancing (see the sketch after this list):
1. Send a work order to one process
2. Issue a non-blocking receive and continue with the next process until all are busy
3. Wait until any response arrives
4. Receive the result and send the next work order
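A minimal self-contained sketch of the non-blocking calls themselves (a hypothetical example, not from the lecture): two processes exchange arrays with MPI_ISEND/MPI_IRECV, are free to do other work, and only then wait for the transfers to complete. Names such as BUF_OUT and BUF_IN are invented for the illustration.

      program nonblocking
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER, PARAMETER :: N=1000
      INTEGER IERR,MY_PROC,N_PROC,OTHER,REQ_S,REQ_R
      INTEGER STATUS(MPI_STATUS_SIZE)
      REAL BUF_OUT(N),BUF_IN(N)

      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD,MY_PROC,IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD,N_PROC,IERR)
      OTHER=1-MY_PROC                   ! partner rank (run with exactly 2 processes)
      BUF_OUT=REAL(MY_PROC)

      ! start both transfers without waiting for them to complete
      CALL MPI_ISEND(BUF_OUT,N,MPI_REAL,OTHER,1,MPI_COMM_WORLD,REQ_S,IERR)
      CALL MPI_IRECV(BUF_IN,N,MPI_REAL,OTHER,1,MPI_COMM_WORLD,REQ_R,IERR)

      ! ... here the process can do useful work,
      !     as long as it does not touch BUF_OUT or BUF_IN ...

      ! now block until both transfers have finished
      CALL MPI_WAIT(REQ_S,STATUS,IERR)
      CALL MPI_WAIT(REQ_R,STATUS,IERR)
      write(*,*) 'process',MY_PROC,'received',BUF_IN(1)

      CALL MPI_FINALIZE(IERR)
      end program nonblocking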
The Master-Worker(s) scheme requires two different roles (often two different codes):
- Master: performs the input and initialization, then distributes the tasks between the workers.
- Worker: waits for the initialization data, then waits for a task, completes it, sends back the result and waits for more.
In the end the Master must inform the Workers that the job is done and they can finish execution. A sketch of such a scheme follows below.
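A minimal sketch of a Master-Worker scheme (hypothetical example; here both roles live in one program that branches on the process rank, which works just as well as two separate codes). The master hands out numbers one at a time, the workers return their squares, and the message tag carries the task number; tag 0 is used as the "job is done" signal. It assumes at least two processes and no more workers than tasks.

      program masterworker
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER, PARAMETER :: NTASK=100, MAXWORK=128
      INTEGER IERR,MY_PROC,N_PROC,IW,IDX,ITASK,NDONE
      INTEGER STATUS(MPI_STATUS_SIZE),REQ(MAXWORK)
      REAL WORK,BUF(MAXWORK),RES(NTASK)

      CALL MPI_INIT(IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD,N_PROC,IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD,MY_PROC,IERR)

      if(MY_PROC.EQ.0) then                       ! ----- master -----
        ITASK=0
        do IW=1,N_PROC-1                          ! first task for every worker
          ITASK=ITASK+1                           ! (assumes NTASK >= number of workers)
          WORK=REAL(ITASK)
          CALL MPI_SEND(WORK,1,MPI_REAL,IW,ITASK,MPI_COMM_WORLD,IERR)
          CALL MPI_IRECV(BUF(IW),1,MPI_REAL,IW,MPI_ANY_TAG,MPI_COMM_WORLD,REQ(IW),IERR)
        enddo
        do NDONE=1,NTASK                          ! collect results, keep workers busy
          CALL MPI_WAITANY(N_PROC-1,REQ,IDX,STATUS,IERR)
          RES(STATUS(MPI_TAG))=BUF(IDX)           ! the tag says which task this result is
          if(ITASK.LT.NTASK) then                 ! more work: send the next task
            ITASK=ITASK+1
            WORK=REAL(ITASK)
            CALL MPI_SEND(WORK,1,MPI_REAL,IDX,ITASK,MPI_COMM_WORLD,IERR)
            CALL MPI_IRECV(BUF(IDX),1,MPI_REAL,IDX,MPI_ANY_TAG,MPI_COMM_WORLD,REQ(IDX),IERR)
          else                                    ! no more work: tell this worker to stop
            CALL MPI_SEND(WORK,1,MPI_REAL,IDX,0,MPI_COMM_WORLD,IERR)
          endif
        enddo
      else                                        ! ----- worker -----
        do
          CALL MPI_RECV(WORK,1,MPI_REAL,0,MPI_ANY_TAG,MPI_COMM_WORLD,STATUS,IERR)
          if(STATUS(MPI_TAG).EQ.0) exit           ! tag 0 means "job is done"
          WORK=WORK*WORK                          ! the actual computation
          CALL MPI_SEND(WORK,1,MPI_REAL,0,STATUS(MPI_TAG),MPI_COMM_WORLD,IERR)
        enddo
      endif
      CALL MPI_FINALIZE(IERR)
      end program masterworker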
MPI-2 (version 2.2, 2009) adds:
- One-sided communication: one process can read or write variables that belong to another process.
- Establishing a new MPI communication space or joining an existing one (dynamic process management).
- Parallel I/O: the behavior of I/O is put under MPI control, i.e. additional synchronization allows ordering of read/write operations coming from different processes.
A sketch of one-sided communication is shown below.
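A minimal sketch of one-sided communication (hypothetical example): every process exposes its array A through an MPI window, and process 1 reads process 0's copy with MPI_GET without process 0 issuing any receive. A 4-byte REAL is assumed when computing the window size.

      program onesided
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER, PARAMETER :: N=10
      INTEGER IERR,MY_PROC,WIN
      INTEGER(KIND=MPI_ADDRESS_KIND) WINSIZE,DISP
      REAL A(N),B(N)

      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD,MY_PROC,IERR)
      A=REAL(MY_PROC)                    ! each process fills its own copy of A
      WINSIZE=4*N                        ! window size in bytes (4-byte REALs assumed)
      CALL MPI_WIN_CREATE(A,WINSIZE,4,MPI_INFO_NULL,MPI_COMM_WORLD,WIN,IERR)
      CALL MPI_WIN_FENCE(0,WIN,IERR)     ! open the access epoch on all processes
      if(MY_PROC.EQ.1) then              ! rank 1 reads A of rank 0; rank 0 does nothing
        DISP=0
        CALL MPI_GET(B,N,MPI_REAL,0,DISP,N,MPI_REAL,WIN,IERR)
      endif
      CALL MPI_WIN_FENCE(0,WIN,IERR)     ! close the epoch: B now holds rank 0's data
      CALL MPI_WIN_FREE(WIN,IERR)
      CALL MPI_FINALIZE(IERR)
      end program onesided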
More information about MPI can be found at:
Open Multi-Processing (OpenMP) is a set of specifications (an interface) for compilers capable of creating multi-threaded applications.
The programmer gets the possibility to explain to the compiler which parts of the code can be split into separate threads and run in parallel.
The compiler does not have to guess: it relies on the hints contained in the OpenMP directives.
All threads of a multi-threaded code see the same variables and memory space: OpenMP is for shared memory computers!
[Diagram: two execution timelines built from Serial part 1, Parallel part 1, Parallel part 2, Serial part 2, Serial part 3; in the multi-threaded version the parallel parts are split over several threads while the serial parts run on a single thread.]
The hints for the compiler are given as directives (pre-processor-like statements):
  C, C++:   #pragma omp ...
  FORTRAN:  C$omp ... or !$omp ...
Most of the statements indicate where a multi-threaded section starts and finishes:

program hellof90
  use omp_lib
  integer :: id, nthreads
!$omp parallel private(id)
  id = omp_get_thread_num()
  write (*,*) 'Hello World ' // 'from thread', id
!$omp barrier
  if (id == 0) then
    nthreads = omp_get_num_threads()
    write (*,*) 'There are', nthreads, 'threads'
  end if
!$omp end parallel
end program hellof90
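To try this with, for example, gfortran, one typically compiles with the OpenMP flag switched on and selects the number of threads through the standard environment variable (the file name and thread count below are arbitrary; other compilers use other flags):

  gfortran -fopenmp -o hello hellof90.f90
  export OMP_NUM_THREADS=4
  ./hello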
In parallel statements one can specify variables that must be private to each thread. The loop variable of a loop scheduled for parallel execution is automatically made private.
Sometimes it is easier to declare certain sections of the code as critical: although they are executed by several threads, this is done one thread at a time.
OpenMP has even more sophisticated tools: barriers, reduction, conditional parallelization etc. (see the sketch below).
OpenMP directives are not 100% obligatory for the compiler, they are just hints!
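A minimal sketch combining these features (hypothetical example): a parallel loop where the loop variable is automatically private, a temporary declared private, a reduction that combines the partial sums, and a critical section that serializes the output:

program omp_features
  use omp_lib
  implicit none
  integer, parameter :: n = 1000000
  integer :: i
  real :: x, total

  total = 0.0
!$omp parallel do private(x) reduction(+:total)
  do i = 1, n                      ! i is automatically private
    x = real(i)                    ! x must be private: each thread has its own copy
    total = total + 1.0/(x*x)      ! partial sums are combined by the reduction
  end do
!$omp end parallel do

!$omp parallel
!$omp critical
  ! only one thread at a time executes this block
  write (*,*) 'thread', omp_get_thread_num(), 'sees total =', total
!$omp end critical
!$omp end parallel
end program omp_features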
More information about OpenMP can be found here:
MPI:
- Runs on shared and distributed memory computers
- Program behavior and performance are fully predictable
- Very flexible
- Needs modifications of the source code (the code will not run without MPI installed)
- All message passing requires a separate buffer (two in a distributed memory system)
- Performance depends heavily on the latency of the interconnect

OpenMP:
- Requires no structural code modification (the directives are just comments to a non-OpenMP compiler), so the code stays fully portable
- Has minimum overhead in terms of memory
- Is primarily aimed at parallelization of loops
- Only runs on shared memory machines
- Has no standard for I/O parallelization
Nothing prevents you from combining MPI and OpenMP: MPI between nodes, OpenMP within each node (a minimal hybrid sketch follows).
Most modern computing clusters consist of distributed memory machines with a fast interconnect (up to 100 Gbyte/second!), and each processor is normally a multi-core CPU.
Example: UPPMAX Isis, 200 IBM x3455 compute nodes with two dual- or quad-core AMD CPUs in each.
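A minimal sketch of a hybrid code (hypothetical example): MPI splits the loop range between processes, and within each process an OpenMP loop spreads its share over the cores of the node. MPI_INIT_THREAD replaces MPI_INIT to declare the required level of thread support.

program hybrid
  implicit none
  include 'mpif.h'
  integer :: ierr, my_proc, n_proc, provided, i
  real :: local_sum, global_sum

  ! ask for the thread level where only the main thread makes MPI calls
  call MPI_INIT_THREAD(MPI_THREAD_FUNNELED, provided, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, my_proc, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, n_proc, ierr)

  local_sum = 0.0
!$omp parallel do reduction(+:local_sum)
  do i = my_proc+1, 1000000, n_proc      ! MPI processes split the range ...
    local_sum = local_sum + 1.0/real(i)  ! ... OpenMP threads split each share
  end do
!$omp end parallel do

  ! combine the per-process sums on process 0
  call MPI_REDUCE(local_sum, global_sum, 1, MPI_REAL, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
  if (my_proc == 0) write (*,*) 'sum =', global_sum
  call MPI_FINALIZE(ierr)
end program hybrid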