1 Computer Science, University of Warwick Distributed Shared Memory
Distributed Shared Memory (DSM) systems build the shared-memory abstraction on top of distributed-memory machines
The users see a virtual global address space; the message passing underneath is handled by the DSM layer, transparently to the users
We can then use shared-memory programming techniques
Software for implementing DSM

2 Computer Science, University of Warwick Three types of DSM implementations
Page-based technique
The virtual global address space is divided into equal-sized chunks (pages) which are spread over the machines
The page is the minimal sharing unit
A request by a process to access a non-local piece of memory results in a page fault: a trap occurs and the DSM software fetches the required page of memory and restarts the instruction
A decision has to be made whether to replicate pages or to maintain only one copy of each page and move it around the network
The granularity of the pages has to be decided before implementation
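
As a concrete illustration of the page-based bookkeeping described above, here is a minimal, purely hypothetical C sketch of the per-page metadata a DSM layer might keep; the struct and field names (owner, copyset, state) are invented for this example and do not come from any particular DSM system.

/* Hypothetical per-page metadata for a page-based DSM layer (illustrative only). */
#include <stdint.h>

enum dsm_page_state { DSM_INVALID, DSM_READ_SHARED, DSM_READ_WRITE };

struct dsm_page_entry {
    int                 owner;    /* rank of the node that currently owns the page */
    uint64_t            copyset;  /* bitmask of nodes holding read-only replicas   */
    enum dsm_page_state state;    /* local access rights for this page             */
};

/* On a page fault the DSM run-time would consult this entry: if pages are
   replicated, it fetches a read-only copy and sets the requester's bit in
   copyset; if pages are migrated, it transfers ownership and invalidates
   the old owner's copy. */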

3 Computer Science, University of Warwick Three types of DSM implementations
Shared-variable based technique
Only the variables and data structures required by more than one process are shared
The variable is the minimal sharing unit
Trade-off between consistency and network traffic

4 Computer Science, University of Warwick Three types of DSM implementations
Object-based technique
Memory can be conceptualized as an abstract space filled with objects (including data and methods)
The object is the minimal sharing unit
Trade-off between consistency and network traffic

5 Computer Science, University of Warwick OpenMP
OpenMP stands for Open specification for Multi-Processing
Used to help compilers understand and parallelise serial code
Can be used to specify shared-memory parallelism in Fortran, C and C++ programs
OpenMP is a specification for a set of compiler directives, run-time library routines, and environment variables
Started in the mid-to-late 1980s with the emergence of shared-memory parallel computers with proprietary directive-driven programming environments
OpenMP is an industry standard

6 Computer Science, University of Warwick OpenMP
OpenMP specifications include:
OpenMP 1.0 for Fortran, 1997
OpenMP 1.0 for C/C++, 1998
OpenMP 2.0 for Fortran, 2000
OpenMP 2.0 for C/C++, 2002
OpenMP 2.5 for C/C++ and Fortran, 2005
OpenMP Architecture Review Board: Compaq, HP, IBM, Intel, SGI, Sun

7 Computer Science, University of Warwick OpenMP programming model Shared Memory, thread-based parallelism Explicit parallelism Fork-join model

8 Computer Science, University of Warwick OpenMP code structure in C

#include <omp.h>

int main () {
  int var1, var2, var3;

  /* Serial code */

  /* Beginning of parallel section. Fork a team of threads. Specify variable scoping */
  #pragma omp parallel private(var1, var2) shared(var3)
  {
    /* Parallel section executed by all threads … */
    /* All threads join master thread and disband */
  }

  /* Resume serial code */
}

9 Computer Science, University of Warwick OpenMP code structure in Fortran

      PROGRAM HELLO
      INTEGER VAR1, VAR2, VAR3

! Serial code
      ...

! Beginning of parallel section. Fork a team of threads. Specify variable scoping
!$OMP PARALLEL PRIVATE(VAR1, VAR2) SHARED(VAR3)

! Parallel section executed by all threads
      ...
! All threads join master thread and disband
!$OMP END PARALLEL

! Resume serial code
      ...
      END

10 Computer Science, University of Warwick OpenMP Directives Format
C/C++: #pragma omp directive-name [clause, ...]
Fortran: !$OMP directive-name [clause, ...]

11 Computer Science, University of Warwick OpenMP features
OpenMP directives are ignored by compilers that don't support OpenMP, so the code can also be run on sequential machines
Compiler directives are used to specify: sections of code that can be executed in parallel; critical sections; the scope of variables (private or shared)
Mainly used to parallelize loops, e.g. separate threads handle separate iterations of the loop
There is also a run-time library with several useful routines for checking the number of threads and number of processors, changing the number of threads, etc.

12 Computer Science, University of Warwick Fork-Join Model
Multiple threads are created using the parallel construct

For C and C++:
#pragma omp parallel
{
  ... do stuff
}

For Fortran:
!$OMP PARALLEL
  ... do stuff
!$OMP END PARALLEL

13 Computer Science, University of Warwick How many threads are generated
The number of threads in a parallel region is determined by the following factors, in order of precedence:
1. Use of the omp_set_num_threads() library function
2. Setting of the OMP_NUM_THREADS environment variable
3. Implementation default - the number of CPUs on a node
Threads are numbered from 0 (master thread) to N-1
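
A minimal sketch of this precedence in C, assuming an OpenMP-capable compiler: the call to omp_set_num_threads() overrides any OMP_NUM_THREADS setting and the implementation default. The thread count of 3 is an arbitrary choice for illustration.

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Overrides the OMP_NUM_THREADS environment variable (and the implementation default). */
    omp_set_num_threads(3);

    #pragma omp parallel
    {
        /* Each thread reports its number; thread 0 is the master thread. */
        printf("Thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}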

14 Computer Science, University of Warwick Parallelizing loops in OpenMP – Work Sharing construct
A compiler directive specifies that the loop iterations can be done in parallel

For C and C++:
#pragma omp parallel for
for (i = 0; i < N; i++) {
  value[i] = compute(i);
}

For Fortran:
!$OMP PARALLEL DO
DO i = 1, N
  value(i) = compute(i)
END DO
!$OMP END PARALLEL DO

Can use thread scheduling to specify the partition and allocation of iterations to threads, e.g.
#pragma omp parallel for schedule(static,4)
schedule(static [,chunk]): deal out blocks of iterations of size chunk to each thread
schedule(dynamic [,chunk]): each thread grabs a chunk of iterations off a queue until all are done
schedule(runtime): take the schedule from the environment variable OMP_SCHEDULE
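
A small, self-contained sketch of the schedule clause: each thread records which iterations it executed under schedule(dynamic, 2). The array size and chunk size are arbitrary choices for illustration.

#include <stdio.h>
#include <omp.h>

#define N 12

int main(void) {
    int owner[N];   /* owner[i] records which thread executed iteration i */
    int i;

    /* Iterations are handed out in chunks of 2; faster threads grab more chunks. */
    #pragma omp parallel for schedule(dynamic, 2)
    for (i = 0; i < N; i++) {
        owner[i] = omp_get_thread_num();
    }

    for (i = 0; i < N; i++) {
        printf("iteration %2d executed by thread %d\n", i, owner[i]);
    }
    return 0;
}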

15 Computer Science, University of Warwick Synchronisation in OpenMP Critical construct Barrier construct

16 Computer Science, University of Warwick Example of Critical Section in OpenMP

#include <omp.h>

int main() {
  int x;
  x = 0;
  #pragma omp parallel shared(x)
  {
    #pragma omp critical
    x = x + 1;
  } /* end of parallel section */
}

17 Computer Science, University of Warwick Example of Barrier in OpenMP

#include <stdio.h>
#include <omp.h>

int main (int argc, char *argv[]) {
  int th_id, nthreads;
  #pragma omp parallel private(th_id)
  {
    th_id = omp_get_thread_num();
    printf("Hello World from thread %d\n", th_id);
    #pragma omp barrier
    if ( th_id == 0 ) {
      nthreads = omp_get_num_threads();
      printf("There are %d threads\n", nthreads);
    }
  }
  return 0;
}

18 Computer Science, University of Warwick Data Scope Attributes in OpenMP
OpenMP Data Scope Attribute Clauses are used to explicitly define how variables should be scoped
These clauses are used in conjunction with several directives (e.g. PARALLEL, DO/for) to control the scoping of enclosed variables
Three often-encountered clauses: shared, private, reduction

19 Computer Science, University of Warwick Shared and private data in OpenMP
private(var) creates a local copy of var for each thread
shared(var) states that var is a global variable to be shared among threads
The default data storage attribute is shared

!$OMP PARALLEL DO
!$OMP& PRIVATE(xx,yy) SHARED(u,f)
      DO j = 1, m
        DO i = 1, n
          xx = dx * (i-1)
          yy = dy * (j-1)
          u(i,j) = 0.0
          f(i,j) = -alpha * (1.0-xx*xx) * &
                   (1.0-yy*yy)
        END DO
      END DO
!$OMP END PARALLEL DO
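
For readers following the C examples, a hedged C analogue of the same idea: xx and yy are per-thread scratch variables and must be private, while the arrays u and f are shared. The array sizes and the values of dx, dy and alpha are made up for illustration.

#include <omp.h>

#define M 8
#define N 8

int main(void) {
    double u[M][N], f[M][N];
    double dx = 0.1, dy = 0.1, alpha = 0.5;   /* arbitrary illustrative values */
    double xx, yy;
    int i, j;

    /* xx and yy hold per-thread intermediate values, so they must be private;
       u and f are written at disjoint (j,i) indices, so sharing them is safe. */
    #pragma omp parallel for private(i, xx, yy) shared(u, f)
    for (j = 0; j < M; j++) {
        for (i = 0; i < N; i++) {
            xx = dx * i;
            yy = dy * j;
            u[j][i] = 0.0;
            f[j][i] = -alpha * (1.0 - xx * xx) * (1.0 - yy * yy);
        }
    }
    return 0;
}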

20 Computer Science, University of Warwick Reduction Clause
reduction (op : var), where op is e.g. + (add) or logical OR
A local copy of the variable is made for each thread; the reduction operation is applied to each thread's local copy, and the local values are then combined into the global value

double ZZ, res = 0.0;
#pragma omp parallel for reduction(+:res) private(ZZ)
for (i = 1; i <= N; i++) {
  ZZ = i;
  res = res + ZZ;
}
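
Without the reduction clause, the update to the shared variable res would be a data race; a critical section would fix that but serialize every addition. A minimal runnable sketch of the reduction version (N is an arbitrary choice):

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    double res = 0.0;
    int i;

    /* Each thread accumulates into its own private copy of res;
       the partial sums are combined with + when the loop finishes. */
    #pragma omp parallel for reduction(+:res)
    for (i = 1; i <= N; i++) {
        res = res + (double)i;
    }

    printf("sum 1..%d = %.0f (expected %d)\n", N, res, N * (N + 1) / 2);
    return 0;
}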

21 Computer Science, University of Warwick Run-Time Library Routines Can perform a variety of functions, including Query the number of threads/thread no. Set number of threads …

22 Computer Science, University of Warwick Run-Time Library Routines
Query routines allow you to get the number of threads and the ID of the calling thread:
id = omp_get_thread_num();        // thread no.
Nthreads = omp_get_num_threads(); // number of threads
Can specify the number of threads at runtime:
omp_set_num_threads(Nthreads);

23 Computer Science, University of Warwick Environment Variables
Control the execution of parallel code
Four environment variables:
OMP_SCHEDULE: how iterations of a loop are scheduled
OMP_NUM_THREADS: maximum number of threads
OMP_DYNAMIC: enable or disable dynamic adjustment of the number of threads
OMP_NESTED: enable or disable nested parallelism

24 Computer Science, University of Warwick OpenMP compilers
Since parallelism is mostly achieved by parallelising loops over shared memory, OpenMP compilers work well for SMP multiprocessors and vector machines
OpenMP could work for distributed-memory machines, but this would need a good distributed shared memory (DSM) implementation underneath
For more information on OpenMP, see

High Performance Computing Course Notes Message Passing Programming I

26 Computer Science, University of Warwick Message Passing Programming
Message passing is the most widely used parallel programming model
Message passing works by creating a number of tasks, uniquely named, that interact by sending and receiving messages to and from one another (hence the name message passing)
Generally, processes communicate by sending data from the address space of one process to that of another
Communication between processes (via files, pipes, sockets)
Communication between threads within a process (via the global data area)
Programs based on message passing can be written in standard sequential languages (C/C++, Fortran), augmented with calls to library functions for sending and receiving messages

27 Computer Science, University of Warwick Message Passing Interface (MPI)
MPI is a specification, not a particular implementation
It does not specify process startup, error codes, the amount of system buffering, etc.
MPI is a library, not a language
The goals of MPI: functionality, portability and efficiency
Message passing model -> MPI specification -> MPI implementation

28 Computer Science, University of Warwick OpenMP vs MPI
In a nutshell:
MPI is used on distributed-memory systems
OpenMP is used for code parallelisation on shared-memory systems
Both provide explicit parallelism
High-level control (OpenMP), lower-level control (MPI)

29 Computer Science, University of Warwick A little history
Message-passing libraries were developed for a number of early distributed-memory computers
By 1993 there were many vendor-specific implementations
By 1994 MPI-1 came into being
By 1996 MPI-2 was finalized

30 Computer Science, University of Warwick The MPI programming model
MPI standards - MPI-1 (1.1, 1.2), MPI-2 (2.0)
Forward compatibility is preserved between versions
Standard bindings for C, C++ and Fortran; MPI bindings have also been seen for Python, Java, etc. (all non-standard)
We will stick to the C binding for the lectures and coursework
More info on MPI implementations - for your laptop, pick up MPICH, a free, portable implementation of MPI (…gov/mpi/mpich/index.htm)
The coursework will use MPICH

31 Computer Science, University of Warwick MPI
MPI is a complex system comprising 129 functions with numerous parameters and variants
Six of them are indispensable; with just these six you can already write a large number of useful programs
The other functions add flexibility (datatypes), robustness (non-blocking send/receive), efficiency (ready-mode communication), modularity (communicators, groups) or convenience (collective operations, topologies)
In the lectures we are going to cover the most commonly encountered functions

32 Computer Science, University of Warwick The MPI programming model
A computation comprises one or more processes that communicate by calling library routines to send and receive messages to and from other processes
(Generally) a fixed set of processes is created at the outset, one process per processor
Different from PVM

33 Computer Science, University of Warwick Intuitive interfaces for sending and receiving messages
Send(data, destination), Receive(data, source) - the minimal interface
This is not enough in some situations; we also need message matching - add a message_id at both the send and receive interfaces, so they become Send(data, destination, msg_id) and Receive(data, source, msg_id)
The message_id is expressed as an integer, termed the message tag
It allows the programmer to deal with the arrival of messages in an orderly fashion (queue the messages and then deal with them in order)
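
A hedged MPI sketch of tag-based matching (run with at least two processes): rank 0 sends two logically different messages to rank 1, and each receive names the tag it wants, so the messages cannot be confused. The tag values 100 and 200 are arbitrary.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, a = 1, b = 2, x, y;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Two logically different messages to the same destination,
           distinguished by their tags (100 and 200 are arbitrary choices). */
        MPI_Send(&a, 1, MPI_INT, 1, 100, MPI_COMM_WORLD);
        MPI_Send(&b, 1, MPI_INT, 1, 200, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Each receive names the tag it wants, so the messages cannot be mixed up. */
        MPI_Recv(&x, 1, MPI_INT, 0, 100, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(&y, 1, MPI_INT, 0, 200, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("tag 100 carried %d, tag 200 carried %d\n", x, y);
    }

    MPI_Finalize();
    return 0;
}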

34 Computer Science, University of Warwick How to express the data in the send/receive interfaces
In the early stages: (address, length) for the send interface and (address, max_length) for the receive interface
These are not always good enough:
The data to be sent may not be in contiguous memory locations
The storage format of the data may not be the same, or known in advance, on a heterogeneous platform
Eventually, a triple (address, count, datatype) is used to express the data to be sent, and (address, max_count, datatype) for the data to be received
This reflects the fact that a message contains much more structure than just a string of bits; for example, (vector_A, 300, MPI_REAL)
Programmers can also construct their own datatypes
Now the interfaces become send(address, count, datatype, destination, msg_id) and receive(address, max_count, datatype, source, msg_id)
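
A hedged sketch of a user-constructed datatype (run with at least two processes): MPI_Type_vector describes one column of a row-major 4x4 matrix, a non-contiguous strided layout, which can then be sent with a single (address, count, datatype) triple. The matrix size is an arbitrary choice.

#include <stdio.h>
#include <mpi.h>

#define N 4

int main(int argc, char *argv[]) {
    int rank, i, j;
    double a[N][N];          /* row-major: elements of one column are N doubles apart */
    double col[N];
    MPI_Datatype column_t;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* N blocks of 1 double each, with a stride of N doubles between blocks:
       this describes one matrix column without copying it into a buffer first. */
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column_t);
    MPI_Type_commit(&column_t);

    if (rank == 0) {
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                a[i][j] = 10.0 * i + j;
        /* Send column 2: one element of the derived type, starting at a[0][2]. */
        MPI_Send(&a[0][2], 1, column_t, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Receive the column as N contiguous doubles. */
        MPI_Recv(col, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (i = 0; i < N; i++)
            printf("col[%d] = %g\n", i, col[i]);
    }

    MPI_Type_free(&column_t);
    MPI_Finalize();
    return 0;
}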

35 Computer Science, University of Warwick How to distinguish messages
The message tag is necessary, but not sufficient
So, the communicator is introduced …

36 Computer Science, University of Warwick Communicators
Messages are put into contexts
Contexts are allocated at run time by the system in response to programmer requests
The system can guarantee that each generated context is unique
Processes belong to groups
The notions of context and group are combined in a single object called a communicator
A communicator identifies a group of processes and a communication context
The MPI library defines an initial communicator, MPI_COMM_WORLD, which contains all the processes running in the system
Messages from different process groups can have the same tag
So the send interface becomes send(address, count, datatype, destination, tag, comm)
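
A hedged sketch of creating new communicators from MPI_COMM_WORLD with MPI_Comm_split: processes are grouped by the parity of their world rank, and each process gets a new rank within its own sub-communicator. Splitting by parity is an arbitrary choice for illustration.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int world_rank, world_size, sub_rank, sub_size;
    MPI_Comm sub_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Processes with the same colour end up in the same new communicator;
       the key (world_rank) determines the rank ordering inside it. */
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &sub_comm);

    MPI_Comm_rank(sub_comm, &sub_rank);
    MPI_Comm_size(sub_comm, &sub_size);
    printf("world rank %d/%d is rank %d/%d in the %s communicator\n",
           world_rank, world_size, sub_rank, sub_size,
           (world_rank % 2 == 0) ? "even" : "odd");

    MPI_Comm_free(&sub_comm);
    MPI_Finalize();
    return 0;
}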

37 Computer Science, University of Warwick Status of the received messages
A message status structure is added to the receive interface
The status holds information about the source, the tag and the actual message size
In C, the source can be retrieved by accessing status.MPI_SOURCE, the tag by status.MPI_TAG, and the actual message size by calling the function MPI_Get_count(&status, datatype, &count)
The receive interface becomes receive(address, max_count, datatype, source, tag, communicator, status)
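
A hedged sketch of inspecting the status object (run with at least two processes): rank 1 receives from any source with any tag, using the wildcards introduced on slide 39, and then recovers the sender, the tag and the number of elements that actually arrived.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, count;
    int buf[10];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int data[3] = {7, 8, 9};
        /* Send fewer elements (3) than the receiver's buffer can hold (10). */
        MPI_Send(data, 3, MPI_INT, 1, 42, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Accept a message from any source with any tag, then query the status. */
        MPI_Recv(buf, 10, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_INT, &count);
        printf("received %d ints from rank %d with tag %d\n",
               count, status.MPI_SOURCE, status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}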

38 Computer Science, University of Warwick How to express source and destination
The processes in a communicator (group) are identified by ranks
If a communicator contains n processes, process ranks are integers from 0 to n-1
The source and destination processes in the send/receive interface are specified by their ranks
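
A hedged ring example in which ranks are used directly as source and destination: each process sends its rank to its right-hand neighbour, (rank + 1) mod n, and receives from its left-hand neighbour. MPI_Sendrecv combines the two operations so the exchange cannot deadlock.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, nprocs, left, right, token, received;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    right = (rank + 1) % nprocs;            /* destination rank */
    left  = (rank - 1 + nprocs) % nprocs;   /* source rank      */
    token = rank;

    /* Combined send and receive: pass our token to the right neighbour
       while receiving the left neighbour's token, in one deadlock-free call. */
    MPI_Sendrecv(&token, 1, MPI_INT, right, 0,
                 &received, 1, MPI_INT, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received %d from rank %d\n", rank, received, left);

    MPI_Finalize();
    return 0;
}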

39 Computer Science, University of Warwick Some other issues
In the receive interface, the tag can be a wildcard (MPI_ANY_TAG), which means a message with any tag will be received
In the receive interface, the source can also be a wildcard (MPI_ANY_SOURCE), which matches any source

40 Computer Science, University of Warwick MPI basics First six functions (C bindings)
MPI_Send (buf, count, datatype, dest, tag, comm) - send a message
buf: address of send buffer
count: no. of elements to send (>= 0)
datatype: datatype of the elements
dest: process id (rank) of the destination
tag: message tag
comm: communicator (handle)

43 Computer Science, University of Warwick MPI basics First six functions (C bindings)
MPI_Send (buf, count, datatype, dest, tag, comm)
Calculating the size of the data to be sent:
buf: address of send buffer
count * sizeof(datatype) bytes of data

46 Computer Science, University of Warwick MPI basics First six functions (C bindings)
MPI_Recv (buf, count, datatype, source, tag, comm, status) - receive a message
buf: address of receive buffer (var param)
count: max no. of elements in receive buffer (>= 0)
datatype: datatype of the receive buffer elements
source: process id of the source process, or MPI_ANY_SOURCE
tag: message tag, or MPI_ANY_TAG
comm: communicator
status: status object

47 Computer Science, University of Warwick MPI basics First six functions (C bindings)
MPI_Init (int *argc, char ***argv) - initiate a computation
argc (number of arguments) and argv (argument vector) are the main program's arguments
Must be called first, and once per process
MPI_Finalize ( ) - shut down a computation
The last MPI call in the program

48 Computer Science, University of Warwick MPI basics First six functions (C bindings)
MPI_Comm_size (MPI_Comm comm, int *size) - determine the number of processes in comm
comm is the communicator handle; MPI_COMM_WORLD is the default (including all MPI processes)
size holds the number of processes in the group
MPI_Comm_rank (MPI_Comm comm, int *pid) - determine the id (rank) of the current (calling) process
pid holds the id of the current process

49 Computer Science, University of Warwick MPI basics – a basic example

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello, world. I am %d of %d\n", rank, nprocs);
    MPI_Finalize();
}

mpirun -np 4 myprog
Hello, world. I am 1 of 4
Hello, world. I am 3 of 4
Hello, world. I am 0 of 4
Hello, world. I am 2 of 4

50 Computer Science, University of Warwick MPI basics – send and recv example (1)

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, i;
    int buffer[10];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (size < 2) {
        printf("Please run with two processes.\n");
        MPI_Finalize();
        return 0;
    }

    if (rank == 0) {
        for (i = 0; i < 10; i++)
            buffer[i] = i;
        MPI_Send(buffer, 10, MPI_INT, 1, 123, MPI_COMM_WORLD);
    }

51 Computer Science, University of Warwick MPI basics – send and recv example (2)

    if (rank == 1) {
        for (i = 0; i < 10; i++)
            buffer[i] = -1;
        MPI_Recv(buffer, 10, MPI_INT, 0, 123, MPI_COMM_WORLD, &status);
        for (i = 0; i < 10; i++) {
            if (buffer[i] != i)
                printf("Error: buffer[%d] = %d but is expected to be %d\n", i, buffer[i], i);
        }
    }

    MPI_Finalize();
}

52 Computer Science, University of Warwick MPI language bindings
Standard (accepted) bindings for Fortran, C and C++
Java bindings are work in progress:
JavaMPI - Java wrapper to native calls
mpiJava - JNI wrappers
jmpi - pure Java implementation of the MPI library
MPIJ - same idea
The Java Grande Forum is trying to sort it all out
We will use the C bindings