Parallel Programming & Cluster Computing: Distributed Multiprocessing


Parallel Programming & Cluster Computing: Distributed Multiprocessing
David Joiner, Kean University
Tom Murphy, Contra Costa College
Henry Neeman, University of Oklahoma
Charlie Peck, Earlham College
Kay Wanous, Earlham College
SC09 Education Program, Louisiana State University, July 5-11 2009

Message = Envelope + Contents

MPI_Send(message, strlen(message) + 1, MPI_CHAR, destination, tag, MPI_COMM_WORLD);

When MPI sends a message, it doesn’t just send the contents; it also sends an “envelope” describing the contents:
Size (number of elements of the data type)
Data type
Source: rank of sending process
Destination: rank of process to receive
Tag (message ID)
Communicator (for example, MPI_COMM_WORLD)
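On the receiving side, the matching call names the same envelope fields. A minimal sketch, assuming MPI_Init has already been called (the source and tag values here are illustrative, not from the deck):

  int source = 0;                      /* rank we expect the message from        */
  int tag = 0;                         /* must match the tag used by the sender  */
  char message[100];                   /* buffer large enough for the contents   */
  MPI_Status status;                   /* reports what actually arrived          */
  MPI_Recv(message, sizeof(message), MPI_CHAR, source, tag,
           MPI_COMM_WORLD, &status);

The receive matches on source, tag and communicator; the count is a maximum, not an exact requirement.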

MPI Data Types

C type     MPI data type   Fortran type       MPI data type
char       MPI_CHAR        CHARACTER          MPI_CHARACTER
int        MPI_INT         INTEGER            MPI_INTEGER
float      MPI_FLOAT       REAL               MPI_REAL
double     MPI_DOUBLE      DOUBLE PRECISION   MPI_DOUBLE_PRECISION

MPI supports several other data types, but most are variations of these, and probably these are all you’ll use.
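As a quick illustration (not from the deck; the destination and tag values are placeholders), the buffer’s C type and the MPI data type passed in the call must agree:

  int destination = 1;                 /* illustrative receiver rank  */
  int tag = 0;                         /* illustrative message tag    */
  double results[5] = {0.0, 1.0, 2.0, 3.0, 4.0};
  /* The buffer is double, so the matching MPI data type is MPI_DOUBLE; */
  /* mismatching the two is a classic source of garbage on the receive. */
  MPI_Send(results, 5, MPI_DOUBLE, destination, tag, MPI_COMM_WORLD);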

Message Tags

My daughter was born in mid-December. So, if I give her a present in December, how does she know which of these it’s for?
Her birthday
Christmas
Hanukah
She knows because of the tag on the present:
A little cake and candles means birthday
A little tree or a Santa means Christmas
A little menorah means Hanukah

Message Tags

for (source = 0; source < num_procs; source++) {
    if (source != server_rank) {
        mpi_error_code =
            MPI_Recv(message, maximum_message_length + 1,
                MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
        fprintf(stderr, "%s\n", message);
    } /* if (source != server_rank) */
} /* for source */

The greetings are printed in deterministic order not because messages are sent and received in order, but because each has a tag (message identifier), and MPI_Recv asks for a specific message (by tag) from a specific source (by rank).

Parallelism is Nondeterministic

for (source = 0; source < num_procs; source++) {
    if (source != server_rank) {
        mpi_error_code =
            MPI_Recv(message, maximum_message_length + 1,
                MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &status);
        fprintf(stderr, "%s\n", message);
    } /* if (source != server_rank) */
} /* for source */

But here the greetings are printed in non-deterministic order.

Communicators

An MPI communicator is a collection of processes that can send messages to each other. MPI_COMM_WORLD is the default communicator; it contains all of the processes. It’s probably the only one you’ll need. Some libraries create special library-only communicators, which can simplify keeping track of message tags.
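One standard way to create such a communicator is MPI_Comm_split, which groups processes by a “color” value. A minimal sketch (the even/odd split and the variable names are illustrative, not from the deck):

  int my_rank;
  MPI_Comm half_comm;   /* will hold the new, smaller communicator */
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  /* Processes that pass the same color land in the same new communicator; */
  /* here even ranks form one communicator and odd ranks form another.     */
  MPI_Comm_split(MPI_COMM_WORLD, my_rank % 2, my_rank, &half_comm);
  /* Messages sent on half_comm can never be confused with messages sent   */
  /* on MPI_COMM_WORLD, even if they happen to use the same tags.          */
  MPI_Comm_free(&half_comm);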

Broadcasting

What happens if one process has data that everyone else needs to know? For example, what if the server process needs to send an input value to the others?

CALL MPI_Bcast(length, 1, MPI_INTEGER, source, MPI_COMM_WORLD, mpi_error_code)

Note that MPI_Bcast doesn’t use a tag, and that the call is the same for both the sender and all of the receivers. All processes have to call MPI_Bcast at the same time; everyone waits until everyone is done.
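For comparison, the C binding of the same broadcast takes the buffer’s address, so it needs an ampersand. A minimal sketch (source is the broadcasting rank, the server in the example that follows):

  int source = 0;   /* rank of the broadcasting process                    */
  int length = 0;   /* set from input on the source in a real program      */
  MPI_Bcast(&length, 1, MPI_INT, source, MPI_COMM_WORLD);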

Broadcast Example: Setup

PROGRAM broadcast
  IMPLICIT NONE
  INCLUDE "mpif.h"
  INTEGER,PARAMETER :: server = 0
  INTEGER,PARAMETER :: source = server
  INTEGER,DIMENSION(:),ALLOCATABLE :: array
  INTEGER :: length, memory_status
  INTEGER :: num_procs, my_rank, mpi_error_code

  CALL MPI_Init(mpi_error_code)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, my_rank, mpi_error_code)
  CALL MPI_Comm_size(MPI_COMM_WORLD, num_procs, mpi_error_code)
  [input]
  [broadcast]
  CALL MPI_Finalize(mpi_error_code)
END PROGRAM broadcast

Broadcast Example: Input

PROGRAM broadcast
  IMPLICIT NONE
  INCLUDE "mpif.h"
  INTEGER,PARAMETER :: server = 0
  INTEGER,PARAMETER :: source = server
  INTEGER,DIMENSION(:),ALLOCATABLE :: array
  INTEGER :: length, memory_status
  INTEGER :: num_procs, my_rank, mpi_error_code

  [MPI startup]
  IF (my_rank == server) THEN
    OPEN (UNIT=99,FILE="broadcast_in.txt")
    READ (99,*) length
    CLOSE (UNIT=99)
    ALLOCATE(array(length), STAT=memory_status)
    array(1:length) = 0
  END IF !! (my_rank == server)
  [broadcast]
  CALL MPI_Finalize(mpi_error_code)
END PROGRAM broadcast

Broadcast Example: Broadcast

PROGRAM broadcast
  IMPLICIT NONE
  INCLUDE "mpif.h"
  INTEGER,PARAMETER :: server = 0
  INTEGER,PARAMETER :: source = server
  [other declarations]

  [MPI startup and input]
  IF (num_procs > 1) THEN
    CALL MPI_Bcast(length, 1, MPI_INTEGER, source, MPI_COMM_WORLD, mpi_error_code)
    IF (my_rank /= server) THEN
      ALLOCATE(array(length), STAT=memory_status)
    END IF !! (my_rank /= server)
    CALL MPI_Bcast(array, length, MPI_INTEGER, source, MPI_COMM_WORLD, mpi_error_code)
    WRITE (0,*) my_rank, ": broadcast length = ", length
  END IF !! (num_procs > 1)
  CALL MPI_Finalize(mpi_error_code)
END PROGRAM broadcast

Broadcast Compile & Run

% mpif90 -o broadcast broadcast.f90
% mpirun -np 4 broadcast
0 : broadcast length = [value]
1 : broadcast length = [value]
2 : broadcast length = [value]
3 : broadcast length = [value]

After the broadcast, every rank reports the same length.

Reductions

A reduction converts an array to a scalar: for example, sum, product, minimum value, maximum value, Boolean AND, Boolean OR, etc.

Reductions are so common, and so important, that MPI has two routines to handle them:
MPI_Reduce: sends result to a single specified process
MPI_Allreduce: sends result to all processes (and therefore takes longer)

Reduction Example

PROGRAM reduce
  IMPLICIT NONE
  INCLUDE "mpif.h"
  INTEGER,PARAMETER :: server = 0
  INTEGER :: value, value_sum
  INTEGER :: num_procs, my_rank, mpi_error_code

  CALL MPI_Init(mpi_error_code)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, my_rank, mpi_error_code)
  CALL MPI_Comm_size(MPI_COMM_WORLD, num_procs, mpi_error_code)
  value_sum = 0
  value = my_rank * num_procs
  !! MPI_INTEGER (not the C data type MPI_INT) matches the INTEGER buffers.
  CALL MPI_Reduce(value, value_sum, 1, MPI_INTEGER, MPI_SUM, &
    & server, MPI_COMM_WORLD, mpi_error_code)
  WRITE (0,*) my_rank, ": reduce value_sum = ", value_sum
  CALL MPI_Allreduce(value, value_sum, 1, MPI_INTEGER, MPI_SUM, &
    & MPI_COMM_WORLD, mpi_error_code)
  WRITE (0,*) my_rank, ": allreduce value_sum = ", value_sum
  CALL MPI_Finalize(mpi_error_code)
END PROGRAM reduce

Compiling and Running

% mpif90 -o reduce reduce.f90
% mpirun -np 4 reduce
3 : reduce value_sum = 0
1 : reduce value_sum = 0
2 : reduce value_sum = 0
0 : reduce value_sum = 24
0 : allreduce value_sum = 24
1 : allreduce value_sum = 24
2 : allreduce value_sum = 24
3 : allreduce value_sum = 24

Why Two Reduction Routines?

MPI has two reduction routines because of the high cost of each communication. If only one process needs the result, then it doesn’t make sense to pay the cost of sending the result to all processes. But if all processes need the result, then it may be cheaper to reduce to all processes than to reduce to a single process and then broadcast to all.
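As a sketch of the tradeoff in C (value, total, my_rank and server are illustrative names), the single MPI_Allreduce call at the end replaces the reduce-then-broadcast pair above it:

  int my_rank, server = 0, value, total;
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  value = my_rank;
  /* Option 1: reduce to the server, then broadcast the answer to everyone. */
  MPI_Reduce(&value, &total, 1, MPI_INT, MPI_SUM, server, MPI_COMM_WORLD);
  MPI_Bcast(&total, 1, MPI_INT, server, MPI_COMM_WORLD);
  /* Option 2: one call that leaves the answer on every process, typically */
  /* cheaper than the pair of calls above.                                 */
  MPI_Allreduce(&value, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);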

Non-blocking Communication

MPI allows a process to start a send, then go on and do work while the message is in transit. This is called non-blocking or immediate communication. Here, “immediate” refers to the fact that the call to the MPI routine returns immediately rather than waiting for the communication to complete.

Immediate Send

mpi_error_code = MPI_Isend(array, size, MPI_FLOAT, destination, tag, communicator, &request);

Likewise:

mpi_error_code = MPI_Irecv(array, size, MPI_FLOAT, source, tag, communicator, &request);

This call starts the send/receive, but the send/receive won’t be complete until:

MPI_Wait(&request, &status);

What’s the advantage of this?

Communication Hiding

In between the call to MPI_Isend/Irecv and the call to MPI_Wait, both processes can do work! If that work takes at least as much time as the communication, then the cost of the communication is effectively zero, since the communication won’t affect how much work gets done. This is called communication hiding.
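A minimal C sketch of the pattern, assuming MPI_Init has already been called; the neighbor ranks, message size and stand-in computation are illustrative, not from the deck:

  int left_neighbor = 0, right_neighbor = 1, tag = 0;   /* illustrative */
  float inbound[1000], outbound[1000] = {0};
  MPI_Request recv_request, send_request;
  MPI_Status status;

  /* Start the communication... */
  MPI_Irecv(inbound, 1000, MPI_FLOAT, left_neighbor, tag,
            MPI_COMM_WORLD, &recv_request);
  MPI_Isend(outbound, 1000, MPI_FLOAT, right_neighbor, tag,
            MPI_COMM_WORLD, &send_request);

  /* ...do work that touches neither buffer while the messages are in transit... */
  double local_work = 0.0;
  for (int i = 0; i < 1000000; i++) {
      local_work += i * 0.5;   /* stand-in for real computation */
  }

  /* ...then wait before reading inbound or reusing outbound. */
  MPI_Wait(&recv_request, &status);
  MPI_Wait(&send_request, &status);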

Rule of Thumb for Hiding

When you want to hide communication: as soon as you calculate the data, send it; don’t receive it until you need it. That way, the communication has the maximal amount of time to happen in the background (behind the scenes).

SC09 Summer Workshops

1. May 17-23: Oklahoma State U: Computational Chemistry
2. May 25-30: Calvin Coll (MI): Intro to Computational Thinking
3. June 7-13: U Cal Merced: Computational Biology
4. June 7-13: Kean U (NJ): Parallel Progrmg & Cluster Comp
5. July 5-11: Atlanta U Ctr: Intro to Computational Thinking
6. July 5-11: Louisiana State U: Parallel Progrmg & Cluster Comp
7. July 12-18: U Florida: Computational Thinking Grades
8. July 12-18: Ohio Supercomp Ctr: Computational Engineering
9. Aug 2-8: U Arkansas: Intro to Computational Thinking
10. Aug 9-15: U Oklahoma: Parallel Progrmg & Cluster Comp

OK Supercomputing Symposium

2003 Keynote: Peter Freeman, NSF Computer & Information Science & Engineering Assistant Director
2004 Keynote: Sangtae Kim, NSF Shared Cyberinfrastructure Division Director
2005 Keynote: Walt Brooks, NASA Advanced Supercomputing Division Director
2006 Keynote: Dan Atkins, Head of NSF’s Office of Cyberinfrastructure
2007 Keynote: Jay Boisseau, Director, Texas Advanced Computing Center, U. Texas Austin
2008 Keynote: José Munoz, Deputy Office Director / Senior Scientific Advisor, Office of Cyberinfrastructure, National Science Foundation
2009 Keynote: Ed Seidel, Director, NSF Office of Cyberinfrastructure

FREE! Symposium: Wed Oct 7, OU. Over 235 registrations already! Over 150 in the first day, over 200 in the first week, over 225 in the first month.
FREE! Parallel Programming Workshop: Tue Oct 6, OU, sponsored by the SC09 Education Program.

Thanks for your attention! Questions?

References

[1] P. S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers, 1997.
[2] W. Gropp, E. Lusk and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, 2nd ed., MIT Press, 1999.