A Short Introduction to PVM and MPI
Philip Papadopoulos, Department of CSE, University of California, San Diego, and San Diego Supercomputer Center

Outline
What is message passing? Why do I care?
"Hello World" for message passing
Level 0 issues
What are PVM and MPI?
MPI implementations
Inner workings of PVM

But First … Please ask questions at any time. Things will be more interesting when you do; I'd rather answer questions. Got it?

What is Message Passing? Why Do I Care?
Message passing allows two processes to:
–Exchange information
–Synchronize with each other
Message passing is "Sockets for Dummies"
So?
–Applications need much more power and/or memory than a single machine can deliver
–Large parallel programs need well-defined mechanisms to coordinate and exchange info

Message Passing in the HPC World Large scientific applications scale to 100’s of processors (routinely) and 1000’s of processors (in rare cases) –Climate/Ocean modeling –Molecular physics (QCD, dynamics, materials, …) –Computational Fluid Dynamics –And many more … Message passing and SPMD programming style have been key infrastructure enablers –Why not shared memory?

How Does Message Passing Differ from Socket Programming?
Socket programming (OS 101) is a type of message passing
–open, bind, connect, accept: too arcane
–sendto, recvfrom (UDP): not reliable
–Good point-to-point, but multicast and broadcast support are limited
Message passing usually means (pt-2-pt):
–Low latency
–High performance
–Reliable, in-sequence delivery
+Group operations
+Broadcast
+Reduce (e.g., sum an array whose parts are held in different processes)
+Group synchronize (barrier)
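
The reduce operation mentioned above is easy to show concretely. Below is a minimal sketch in C, assuming an installed MPI implementation (compile with mpicc): each rank contributes its piece of an array and rank 0 receives the element-wise sum.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double part[4] = {1.0, 2.0, 3.0, 4.0};  /* this process's piece of the data */
        double sum[4];                          /* element-wise totals, valid on rank 0 */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Combine the arrays from every process with element-wise addition onto rank 0. */
        MPI_Reduce(part, sum, 4, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum[0] = %g\n", sum[0]);

        MPI_Finalize();
        return 0;
    }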

Hello World – MP Style
Process A:
Initialize
Send(B, "Hello World")
Recv(B, String)
Print String   –prints "Hi There"
Finalize
Process B:
Initialize
Recv(A, String)
Print String   –prints "Hello World"
Send(A, "Hi There")
Finalize
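
For comparison, here is a minimal sketch of the same exchange written against the MPI C bindings (not the slide's pseudocode): rank 0 plays Process A and rank 1 plays Process B.

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        int rank;
        char msg[32];

        MPI_Init(&argc, &argv);                      /* "Initialize" */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {                             /* Process A */
            strcpy(msg, "Hello World");
            MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(msg, (int)sizeof msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", msg);                     /* prints "Hi There" */
        } else if (rank == 1) {                      /* Process B */
            MPI_Recv(msg, (int)sizeof msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", msg);                     /* prints "Hello World" */
            strcpy(msg, "Hi There");
            MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();                              /* "Finalize" */
        return 0;
    }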

Message Addressing
Identify an endpoint
Use a tag to distinguish a particular message
–pvm_send(dest, tag)
–MPI_SEND(COMM, dest, tag, buf, len, type)
Receiving
–recv(src, tag); recv(*, tag); recv(src, *); recv(*, *)
What if you want to build a library that uses message passing? Is (src, tag) safe in all instances?
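
A short sketch of wildcard receiving in MPI, where MPI_ANY_SOURCE and MPI_ANY_TAG play the role of recv(*, tag) and recv(src, *) above; the actual sender and tag are recovered from the status object. The tag convention here is illustrative only.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, buf;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank != 0) {
            /* Every non-root rank sends its rank number, using its rank as the tag. */
            MPI_Send(&rank, 1, MPI_INT, 0, rank, MPI_COMM_WORLD);
        } else {
            int nprocs, i;
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
            for (i = 1; i < nprocs; i++) {
                /* Wildcard on both source and tag, i.e. recv(*, *). */
                MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &status);
                printf("message from rank %d, tag %d\n",
                       status.MPI_SOURCE, status.MPI_TAG);
            }
        }

        MPI_Finalize();
        return 0;
    }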

Level 0 Issues
Basic pt-2-pt message passing is straightforward, but how does one …
–Make it go fast (eliminate extra memory copies, take advantage of specialized hardware)
–Move complex data structures (packing)
–Receive from one-of-many (wildcarding)
–Synchronize a group of tasks
–Recover from errors
–Start tasks
–Build safe libraries
–Monitor tasks
–…

MPI-1 addresses many of the level 0 issues (but not all)

A long history of research efforts in message passing
P4, Chameleon, Parmacs, TCGMSG, CHIMP, NX (Intel i860, Paragon), PVM, …
And these begot MPI

So What is MPI?
It is a standard message passing API
–Specifies many variants of send/recv: 9 send interface calls
(e.g., synchronous send, asynchronous send, ready send, asynchronous ready send)
–Plus other defined APIs: process topologies, group operations, derived data types, profiling API (a standard way to instrument MPI code)
–Implemented and optimized by machine vendors
–Should you use it? Absolutely!
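
A sketch of a few of those send variants in C; the function and variable names are illustrative, and a real exchange would of course need matching receives posted on the destination rank.

    #include <mpi.h>

    /* The four calls share one argument list; they differ only in completion semantics. */
    void send_variants(double *buf, int n, int dest, int tag)
    {
        MPI_Request req;

        MPI_Send (buf, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);        /* standard send */
        MPI_Ssend(buf, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);        /* synchronous: completes only when matched */
        MPI_Rsend(buf, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);        /* ready: receive must already be posted */
        MPI_Isend(buf, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD, &req);  /* nonblocking: returns immediately */
        MPI_Wait(&req, MPI_STATUS_IGNORE);                               /* complete the nonblocking send */
    }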

So What's Missing in MPI-1?
Process control
–How do you start 100 tasks?
–How do you kill/signal/monitor remote tasks?
I/O
–Addressed in MPI-2
Fault tolerance
–One MPI process dies, the rest eventually hang
Interoperability
–No standard for running a single parallel job across architectures (e.g., you cannot split a computation between x86 Linux and Alpha)

What is PVM?
Heterogeneous virtual machine support for:
Resource management
–add/delete hosts from a virtual machine
Process control
–spawn/kill tasks dynamically
Message passing
–blocking send, blocking and non-blocking receive, mcast
Dynamic task groups
–a task can join or leave a group at any time
Fault tolerance
–the VM automatically detects faults and adjusts
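
A minimal sketch of the process-control calls in C, assuming the standard PVM 3 library; the executable name "worker" is hypothetical.

    #include <stdio.h>
    #include <pvm3.h>

    int main(void)
    {
        int tids[4];
        int started;

        /* Spawn four copies of "worker" anywhere in the virtual machine
           (the "where" argument is ignored with PvmTaskDefault). */
        started = pvm_spawn("worker", NULL, PvmTaskDefault, (char *)0, 4, tids);
        printf("started %d of 4 tasks\n", started);

        /* Process control also lets us kill a task again. */
        if (started > 0)
            pvm_kill(tids[0]);

        pvm_exit();
        return 0;
    }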

Popular PVM Uses
Poor man's supercomputer
–Beowulf (PC) clusters, Linux, Solaris, NT
–Cobble together whatever resources you can get
Metacomputer linking multiple supercomputers
–ultimate performance: e.g., runs have combined nearly 3000 processors and up to 53 supercomputers
Education tool
–teaching parallel programming
–academic and thesis research

PVM In a Nutshell
Each host (could be an MPP or SMP) runs a PVMD
A collection of PVMDs defines a virtual machine
Once configured, tasks can be started (spawned), killed, and signaled from a console
Basic message passing
Performance is OK, but API semantics limit optimizations

MPI Design Goals
Make it go as fast as possible
Operate in a serverless (daemonless) environment
Specify portability but not interoperability
Standardize best practices of research environments
Encourage competing implementations
Enable the building of safe libraries
The "assembly language" of message passing

MPI in the Marketplace
MPICH (Mississippi-Argonne, open source)
–A top-quality reference implementation
High-performance cluster MPIs
–AM-MPI, FM-MPI, PM-MPI, GM-MPI, BIP-MPI: 10 us latency, 100 MB/sec on Myrinet
Vendor-supported MPI
–SGI, Cray, IBM, Fujitsu, Sun, Hitachi, …
MPI vendors
–ScaMPI, MPI Soft-Tech, Genias, …

Comparisons
PVM (best for distributed computing): interoperability, fault tolerance, heterogeneity, resource control, dynamic model
MPI (best for a large multiprocessor): MPP performance, many communication methods, topology, static model (SPMD)
Each API has its unique strengths
Evaluate the needs of your application, then choose

PVM? MPI?
PVM is easy to use, especially on a network of workstations, and its message passing API is relatively simple
MPI is a standard, has a steeper learning curve, and doesn't have a standard way to start tasks
–MPICH does have an "mpirun" command
If building a new scalable, production code, you should use MPI (widely supported now)
If experimenting with message passing, or you are interested in dynamics, use PVM

Some Inner Workings of PVM
Every process has a unique, virtual-machine-wide identifier called a task ID (TID)
PVMDs run on each host and act as points of presence
A single master PVMD disseminates the current virtual machine configuration and holds the PVM mailbox
The VM can grow and shrink around the master (if the master dies, the machine falls apart)
Dynamic configuration is used whenever practical

[Diagram: How PVM is Designed. Each host (one per IP address) runs one pvmd (PVM daemon); tasks are linked against libpvm. The pvmds are fully connected using UDP; tasks on the same host reach their pvmd over Unix domain sockets; tasks on different hosts can use direct TCP connections; on a shared-memory multiprocessor (P0, P1, P2) tasks communicate via shared memory; on a distributed-memory MPP they use the internal interconnect.]

PVM Tasks Can Use Multiple Transports
Uses sockets mostly
–Unix-domain on a host
–TCP between tasks on different hosts
–UDP between daemons (custom reliability)
SysV shared memory transport for SMPs
–Tasks still use pvm_send(), pvm_recv()
Native MPP
–PVM can ride atop a native MPI implementation
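
A sketch of how a task opts into the direct TCP transport, assuming the standard PVM 3 calls; dest_tid and the tag value are illustrative. The send/recv calls themselves are unchanged regardless of which transport is chosen.

    #include <pvm3.h>

    void send_direct(int dest_tid, double *data, int n)
    {
        pvm_setopt(PvmRoute, PvmRouteDirect);   /* request direct task-to-task TCP
                                                   instead of routing through the pvmds */

        pvm_initsend(PvmDataDefault);           /* start a new send buffer */
        pvm_pkdouble(data, n, 1);               /* pack n doubles, stride 1 */
        pvm_send(dest_tid, 0 /* tag */);
    }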

Task ID (tid)
PVM uses the tid to identify pvmds, tasks, and groups
It fits into a 32-bit integer: the S bit addresses a pvmd, the G bit forms a multicast address, 12 bits hold the host ID, and the remaining 18 bits are a local part defined by each pvmd
e.g., for the PGON (Paragon) the local part is split into an 11-bit node ID and a 7-bit process field: 4096 hosts, each with 2048 nodes

Things to Note About PVM Addressing
Addresses contain routing information by virtue of the host part
–Transport selection at runtime is simplified: bit-mask + table lookup
Moving a PVM task is very difficult
–Condor (U. Wisconsin) manages it, with effort
The group/multicast bit makes it straightforward to implement multicast within the pt-2-pt infrastructure
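
A purely illustrative sketch (these are not PVM's actual macros) of how routing information can be pulled out of a TID with the bit layout shown on the previous slide: S bit, G bit, 12-bit host ID, 18-bit local part.

    #include <stdio.h>

    #define TID_S(t)     (((t) >> 31) & 0x1u)     /* addresses a pvmd          */
    #define TID_G(t)     (((t) >> 30) & 0x1u)     /* multicast/group address   */
    #define TID_HOST(t)  (((t) >> 18) & 0xFFFu)   /* 12-bit host ID            */
    #define TID_LOCAL(t) ((t) & 0x3FFFFu)         /* 18-bit pvmd-defined part  */

    int main(void)
    {
        unsigned int tid = 0x40042u;              /* a made-up example TID */
        printf("host %u, local %u, group bit %u\n",
               TID_HOST(tid), TID_LOCAL(tid), TID_G(tid));
        return 0;
    }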

Communication Context in MPI
MPI wraps together group and context into a single entity called a communicator
An MPI program starts with one communicator, MPI_COMM_WORLD; all communicators are derived from this
Library implementers are passed a communicator (group) and derive a new communicator -> a safe communication envelope
Messages have a 3-tuple to identify them
–(comm, src, tag)
–comm cannot be wildcarded
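
A sketch of that safe-envelope idea for a library author: the library duplicates whatever communicator it is handed, so its internal traffic can never match the caller's messages, whatever tags either side uses. The mylib_* names are hypothetical.

    #include <mpi.h>

    typedef struct { MPI_Comm private_comm; } mylib_t;   /* hypothetical library handle */

    void mylib_init(mylib_t *lib, MPI_Comm user_comm)
    {
        /* Same group of processes, new context: messages sent on private_comm
           can only be received on private_comm. */
        MPI_Comm_dup(user_comm, &lib->private_comm);
    }

    void mylib_finalize(mylib_t *lib)
    {
        MPI_Comm_free(&lib->private_comm);
    }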

Communication Context in PVM
One task gets a new, globally unique context and distributes it:
newcontext = pvm_newcontext( );
broadcast newcontext to all tasks, or put it in a persistent message
oldcontext = pvm_setcontext( newcontext );
… safe communication for your application or library …
newcontext = pvm_setcontext( oldcontext );
pvm_freecontext( newcontext );
All tasks switch to the safe context
Be aware: unlike MPI, the current context is not explicit in the send/recv API

Receiving a Message (library viewpoint)
Messages arrive at a process and must be discriminated
–The message header contains src, tag, context, length, flags
–The library "buffers" incoming messages until the task receives them
Available messages must be matched against the receiver's match criteria
–Tasks may ask to process messages in a different order than they were actually received (MPI has many variants of send/recv to handle various cases for optimization)
PVM allows message handlers: when a particular match criterion is met, a subroutine is called
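
A sketch of registering such a handler, assuming PVM 3.4's pvm_addmhf interface; the tag value and handler body are illustrative.

    #include <stdio.h>
    #include <pvm3.h>

    /* Called by the library when a matching message arrives; mid is the ID of
       the already-received message buffer. A real handler would unpack it here
       (e.g. with pvm_upk* calls). */
    int my_handler(int mid)
    {
        printf("handler invoked, message buffer id %d\n", mid);
        return 0;
    }

    void install_handler(void)
    {
        /* -1 wildcards the source; 99 is an application-chosen tag;
           0 is the default context. */
        pvm_addmhf(-1, 99, 0, my_handler);
    }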

[Diagram: Message Handlers. An incoming message is matched on (source, tag, context); VM control messages and user-defined handlers each dispatch to a handler function, giving active-message behavior for data or control messages.]

Persistent Messages
Tasks can store and retrieve messages by name
A distributed information database for dynamic programs
–provides rendezvous, attachment, groups, and many other uses
Multiple messages per "name" are possible
index = pvm_putinfo( name, msgbuf, flag )
pvm_recvinfo( name, index, flag )
pvm_delinfo( name, index, flag )
pvm_getmboxinfo( pattern, #names, array of struct )
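
A usage sketch of these calls, assuming the PVM 3.4 mailbox interface; the name "app-contact", the flag choice, and the buffer handling are illustrative. The stored item is an ordinary packed message buffer keyed by a string.

    #include <pvm3.h>

    void publish_contact(char *hostname)
    {
        int bufid = pvm_initsend(PvmDataDefault);          /* pack a message as usual */
        pvm_pkstr(hostname);
        pvm_putinfo("app-contact", bufid, PvmMboxDefault); /* store it under a name */
    }

    void lookup_contact(char *hostname_out)
    {
        /* Retrieve entry 0 under the name; it arrives like a normal message. */
        int bufid = pvm_recvinfo("app-contact", 0, PvmMboxDefault);
        pvm_setrbuf(bufid);      /* make it the active receive buffer before unpacking */
        pvm_upkstr(hostname_out);
    }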

Persistent Messages (message box)
Message box storage is coordinated across the pvmds; each entry is a key-to-message mapping
A task stores information under a name, e.g., how to contact the application, a network load forecast, etc.
Later, another task can request this message and receive it normally
A task can specify when and by whom a message it has placed in the message box may be replaced

Monitoring Performance
PVM allows messages to be "traced" so that flows can be debugged
MPI provides a standard profiling interface for building profiling tools
–Nupshot, Jumpshot, MPITrace, VaMPIr, …
XPVM (screen shot on the next slide) provides visual information about machine utilization, flows, and configuration
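
A sketch of how that profiling interface is used: a tool redefines an MPI call and forwards to the PMPI_ entry point that every implementation must also provide. (The const qualifier on buf follows recent MPI versions; older ones declare it as void *.)

    #include <mpi.h>

    static long send_count = 0;

    /* Intercept MPI_Send: count the call, then hand off to the real implementation. */
    int MPI_Send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        send_count++;                                        /* profiling work */
        return PMPI_Send(buf, count, type, dest, tag, comm); /* the real send */
    }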

XPVM Screen Shot

Wrapping Up
MPI has a very rich messaging interface and is designed for efficiency
PVM has a simpler messaging interface, plus process control, interoperability, and dynamics
The two perform comparably on Ethernet; MPI outperforms PVM on an MPP
Both are still popular, but MPI is an accepted community standard with many support chains

Questions?