Parallel Processing Javier Delgado Grid-Enabledment of Scientific Applications Professor S. Masoud Sadjadi
Outline Why parallel processing Overview The Message Passing Interface (MPI) Introduction Basics Examples OpenMP Alternatives to MPI
Why parallel processing? Computationally-intensive scientific applications Hurricane modelling Bioinformatics High-Energy Physics Physical limits of one processor There are many open areas in science that require massive computation power to solve. Many new areas have emerged recently, such as bioinformatics. As the professor has discussed in class, there are physical limitations to how fast a single processor can go. Even if there were not, we are still many years away from a single processor being able to solve these problems.
Types of Parallel Processing Shared Memory e.g. Multiprocessor computer Distributed Memory e.g. Compute Cluster
Shared Memory Advantages No explicit message passing Fast Disadvantages Scalability Synchronization Since all processors are on the same box, the user does not need to pass messages as in a distributed system. In many cases, a simple pragma statement will take care of everything, as in the sketch below. Also, since all load balancing is handled within a single machine, it is fast. However, as more and more processors/cores are added, simultaneous access to memory can lead to bus saturation. Also, synchronization becomes a problem if multiple cores are reading and writing the same area in memory. Source: http://kelvinscale.net
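A minimal sketch of what such a pragma looks like in practice (this example is not from the original slides; it assumes a C compiler with OpenMP support, e.g. gcc -fopenmp):

#include <stdio.h>

int main(void)
{
    double sum = 0.0;
    int i;
    /* one pragma parallelizes the loop across the available cores
       and combines the per-thread partial sums */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < 1000000; i++)
        sum += (double)i * (double)i;
    printf("sum = %f\n", sum);
    return 0;
}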
Distributed Memory Advantages Each processor has its own memory Usually more cost-effective Disadvantages More programmer involvement Slower
Combination of Both Emerging trend Best and worst of both worlds As processors themselves are scaling out instead of up, we end up with a combination of shared memory and distributed memory
Outline Why parallel processing Overview The Message Passing Interface (MPI) Introduction Basics Examples OpenMP Alternatives to MPI
Message Passing Standard for Distributed Memory systems Networked workstations can communicate De Facto specification: The Message Passing Interface (MPI) Free MPI Implementations: MPICH OpenMPI LAM-MPI “Specification” is highlighted since MPI is not really an implementation. It is a specification of what the implementations should do. There are several implementations available today.
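Whichever implementation is used, the typical workflow is the same (the commands below are the conventional ones; exact names and flags vary by implementation):

mpicc integration.c -o integration      (compile with the MPI wrapper compiler)
mpirun -np 4 ./integration              (launch 4 processes; some systems use mpiexec)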
MPI Basics Design Virtues Defines communication, but not its hardware Expressive Performance Concepts No adding/removing of processors during computation Same program runs on all processors Single-Program, Multiple Data (SPMD) Multiple Instruction, Multiple Data (MIMD) Processes identified by “rank” Process determines its role from program logic Master node is the entry point Core commands: Init, Send, Receive, Finalize MPI specifies the communication directives that are allowed, but it does not tie them to any particular hardware: the same program can run over Ethernet, Myrinet, or even on shared-memory systems (although by default this is disabled in most implementations, as far as I know). It is designed so that programs can be written with a minimal subset of the specified functions, yet many powerful functions are provided for optimal performance and programming power. Since it is an open standard, a lot of thought went into its design, and it is optimized for parallel programs. It also works with other compiler optimizations, since standard system compilers are used. The number of nodes doing computation stays constant; this makes for an easier implementation and is generally safe for jobs that complete in a reasonable amount of time, on servers, and not in a “dangerous” environment. One of the main problems of grid computing, which Marlon will cover in a later lecture, is that this assumption does not hold. MPI programs consist of a single executable that runs on all participating nodes. Sometimes the same instructions are carried out on different data; other times different instructions are carried out on the data. A minimal skeleton follows.
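A minimal SPMD skeleton (not from the original slides) showing the core calls; every process runs the same executable and branches on its rank:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                 /* start MPI */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes are running */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which one am I */
    if (rank == 0)
        printf("Master: %d processes total\n", size);
    else
        printf("Worker %d reporting\n", rank);
    MPI_Finalize();                         /* shut MPI down */
    return 0;
}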
Communication Types Standard Synchronous (blocking send) Ready Buffered (asynchronous) For non-blocking communication: MPI_Wait – block until complete MPI_Test – true/false At the heart of MPI is message passing, in other words sending (and receiving) messages. Here we begin to see the flexibility provided by MPI. It defines a standard communication type, which may be synchronous or asynchronous; the underlying implementation tries to make the best decision. Synchronous communication requires the call to block until a “receive” is posted by the destination node. Ready mode assumes that the destination node is ready, so it will complete even if the receiver was not ready, which could be dangerous. Buffered mode makes a copy of the message in a local buffer, so that it can execute when ready. With non-blocking calls, other work can be performed while the message is transferring. To effectively convert them to blocking calls, MPI_Wait (or one of its variants) can be used. To test whether a transfer has completed, MPI_Test (or one of its variants) may be used, as sketched below.
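A minimal sketch (not from the original slides) of non-blocking communication with MPI_Isend/MPI_Irecv plus MPI_Test and MPI_Wait; run with at least two processes:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, done = 0;
    double value = 3.14, received = 0.0;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* non-blocking send to rank 1; the call returns immediately */
        MPI_Isend(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        /* ... other useful work could go here while the message is in flight ... */
        MPI_Wait(&req, &status);           /* block until the send completes */
    } else if (rank == 1) {
        MPI_Irecv(&received, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Test(&req, &done, &status);    /* poll: sets done to true/false */
        if (!done)
            MPI_Wait(&req, &status);       /* block until the receive completes */
        printf("rank 1 received %f\n", received);
    }

    MPI_Finalize();
    return 0;
}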
Message Structure Send: data (variable name), data length, data type, destination, tag, communication context Recv: data, data length, data type, status, tag, communication context Naturally, for things like Send/Receive and Wait/Test to work, there needs to be a way of identifying messages. Various parameters related to the data being transferred must be specified. Also, in order to differentiate messages, a tag needs to be issued. Since tags are user-generated and collisions are possible, there is a need for contexts as well. Contexts are system-generated. An annotated example follows.
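To make the mapping concrete, here is a small assumed example (not from the original slides) showing how those fields appear as arguments of MPI_Send/MPI_Recv and in the status object; run with at least two processes:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, data = 42;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /*       data,  length, type,    destination, tag, context (communicator) */
        MPI_Send(&data, 1,      MPI_INT, 1,           7,   MPI_COMM_WORLD);
    } else if (rank == 1) {
        /*       data,  length, type,    source, tag, context,        status */
        MPI_Recv(&data, 1,      MPI_INT, 0,      7,   MPI_COMM_WORLD, &status);
        /* the status object records who actually sent the message and its tag */
        printf("got %d from rank %d with tag %d\n",
               data, status.MPI_SOURCE, status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}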
Data Types and Functions Uses its own types for consistency MPI_INT, MPI_CHAR, etc. All Functions prefixed with “MPI_” MPI_Init, MPI_Send, MPI_Recv, etc.
Our First Program: Numerical Integration Objective: Calculate the area under f(x) = x^2 Outline: Define variables Initialize MPI Determine subset of problem to calculate Perform Calculation Collect Information (at Master) Send Information (Slaves) Finalize Problem: Determine the area under the curve f(x) = x^2, between x = [2,5], using a 50-rectangle resolution
Our First Program Download Link: http://www.fiu.edu/~jdelga06/integration.c
Variable Declarations

#include "mpi.h"
#include <stdio.h>

/* problem parameters */
#define f(x) ((x) * (x))
#define numberRects 50
#define lowerLimit 2.0
#define upperLimit 5.0

int main( int argc, char * argv[] )
{
    /* MPI variables */
    int dest, noProcesses, processId, src, tag;
    MPI_Status status;

    /* problem variables */
    int i;
    double area, x, height, lower, width, total, range;
    ...
MPI Initialization

int main( int argc, char * argv[] )
{
    ...
    MPI_Init(&argc, &argv);                        /* start up MPI */
    MPI_Comm_size(MPI_COMM_WORLD, &noProcesses);   /* number of participating processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &processId);     /* this process's rank */

This is the same main; I'm just copying it again so you realize we are still in it.
Calculation

int main( int argc, char * argv[] )
{
    ...
    /* adjust problem size for subproblem */
    range = (upperLimit - lowerLimit) / noProcesses;
    width = range / numberRects;
    lower = lowerLimit + range * processId;

    /* calculate area for subproblem */
    area = 0.0;
    for (i = 0; i < numberRects; i++)
    {
        x = lower + i * width + width / 2.0;
        height = f(x);
        area = area + width * height;
    }
Sending and Receiving

int main( int argc, char * argv[] )
{
    ...
    tag = 0;
    if (processId == 0)    /* MASTER */
    {
        total = area;
        for (src = 1; src < noProcesses; src++)
        {
            MPI_Recv(&area, 1, MPI_DOUBLE, src, tag, MPI_COMM_WORLD, &status);
            total = total + area;
        }
        fprintf(stderr, "The area from %f to %f is: %f\n",
                lowerLimit, upperLimit, total);
    }
    else                   /* WORKER (i.e. compute node) */
    {
        dest = 0;
        MPI_Send(&area, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
    }

Using “0” as the destination is good since there will always be a processor with rank 0. If you are going to be testing code on a single-processor system, as is often the case, this is especially applicable.
Finalizing

int main( int argc, char * argv[] )
{
    ...
    MPI_Finalize();
    return 0;
}
Communicators MPI_COMM_WORLD – All processes involved What if different workers have different tasks? MPI_COMM_WORLD is the default. Simple example: one process acts as a random number generator that distributes unique numbers to the other nodes, while the MASTER node sends the compute tasks to the rest of the nodes. In this case, you could have a communicator called “WORKER”. When an MPI call is given WORKER as the communicator, only the “WORKER” processes will be involved. A sketch of creating such a communicator follows.
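A minimal sketch (not from the original slides) of creating a sub-communicator with MPI_Comm_split; the split of rank 0 versus everyone else is an assumption for illustration:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int worldRank, subRank, color;
    MPI_Comm workerComm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &worldRank);

    /* color 0 = the generator (world rank 0), color 1 = the workers */
    color = (worldRank == 0) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, worldRank, &workerComm);

    MPI_Comm_rank(workerComm, &subRank);
    printf("world rank %d has rank %d in its sub-communicator\n",
           worldRank, subRank);

    /* collective calls that are given workerComm involve only that group */
    MPI_Comm_free(&workerComm);
    MPI_Finalize();
    return 0;
}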
Additional Functions Data Management MPI_Bcast (broadcast) Collective Computation Min, Max, Sum, AND, etc. Benefits: Abstraction Optimized As mentioned earlier, many complete MPI programs can be created with the 6 basic functions. However, for optimal performance and development time, it is sometimes necessary to use other functions. These functions still use send and receive internally, but provide abstraction. Also, they are internally optimized for performance. I can't go over everything here, but these are a couple. A typical example is data management, and the most common one is the broadcast message. This is used to send something to all participating nodes, for example a constant variable. Another example is a collective computation function. For example, if you calculate different subsets of a problem at different nodes and need to get the sum of them all, a sum function is provided. A sketch using both follows. Source: http://www.pdc.kth.se
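A minimal sketch (not from the original slides) of MPI_Bcast and MPI_Reduce; in the integration example, MPI_Reduce with MPI_SUM could replace the master's manual receive loop:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    double constant = 0.0, localArea, totalArea;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        constant = 2.5;                 /* value initially known only to the master */

    /* broadcast: after this call every process has the master's value */
    MPI_Bcast(&constant, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    localArea = constant * (rank + 1);  /* stand-in for a per-process partial result */

    /* collective computation: sum all partial results into totalArea at rank 0 */
    MPI_Reduce(&localArea, &totalArea, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f\n", totalArea);

    MPI_Finalize();
    return 0;
}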
Typical Problems Designing Debugging Scalability The first two are existing problems in computer science in general; the fact that you are dealing with a distributed environment merely makes them even bigger problems. Scalability is the new problem: since the programs must deal with communication overhead, it is usually difficult to reduce the computation time in a nearly-linear fashion as processors are added.
Scalability Analysis Definition: Estimation of the resource (computation and communication) requirements of a program as problem size and/or number of processors increases Requires knowledge of communication time Assumes otherwise idle nodes Ignores data requirements of each node When performing scalability analysis, we need knowledge of the propagation time of messages in order to make an estimate. Also, we assume that the nodes are not performing any other computation or communication; in other words, 100 percent of their resources are devoted to the task at hand. Lastly, we ignore the fact that as problem size increases, so does the likelihood of having to use virtual memory, which can have a profound effect on computation time.
Simple Scalability Example Tcomm = time to send a message Tcomm = s + rn s = start-up time (latency) r = time to send a single byte (i.e. 1/bandwidth) n = size of the message in bytes (e.g. the size of the data type: int, double, etc.)
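For instance (illustrative numbers, not from the original slides), with a start-up time of s = 100 microseconds and a bandwidth of 100 MB/s (r = 0.01 microseconds per byte), sending a single 8-byte double costs Tcomm = 100 + 0.01 x 8, roughly 100.08 microseconds, so the cost is dominated by start-up time; sending 1 MB costs roughly 100 + 0.01 x 10^6 = 10,100 microseconds, dominated by bandwidth.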
Simple Scalability Example Matrix multiplication of two square matrices of size n x n. The first matrix is broadcast to all nodes. Cost for the rest: Computation: n multiplications and (n - 1) additions per cell, i.e. n^2 x (2n - 1) = 2n^3 - n^2 floating point operations. Communication: send n elements to a worker node and return the resulting n elements to the master node (2n); doing this for each column of the result matrix gives n x 2n = 2n^2 elements transferred.
Simple Scalability Example Therefore, we get the following ratio of communication to computation: 2n^2 / (2n^3 - n^2) = 2 / (2n - 1). As n becomes very large, the ratio approaches 1/n, so this problem is not severely affected by communication overhead.
References
http://nf.apac.edu.au/training/MPIProg/mpi-slides/allslides.html
High Performance Linux Clusters. Joseph D. Sloan. O'Reilly Press.
Using MPI, Second Edition. Gropp, Lusk, and Skjellum. MIT Press.