Distributed-Memory Programming Using MPIGAP. Vladimir Janjic. International Workshop “Parallel Programming in GAP”, Aug 2013.

What is MPIGAP?
A library for distributed-memory programming in GAP
– Based on the ParGAP package by Gene Cooperman
– Uses the MPI communication library for communication between distributed nodes
– Each node can itself be a multicore processor, so shared-memory operations within a node are supported
– Easiest to use in a batch/SIMD way
Supports distributed implementations of many shared-memory parallel primitives
– RunTask, TaskResult, WaitTask, …
Also supports explicit copying/sharing of objects between distributed nodes and explicit task placement
– RemoteCopyObj, RemotePushObj, SendTask, …
In the final version, it will support implicitly-distributed data structures and skeletons

What is this MPI thing?
Message Passing Interface
– A standardised and portable message-passing protocol
Contains operations for sending and receiving binary messages between nodes in a distributed system
Supports point-to-point and collective operations, synchronisation primitives (barriers), …
Bindings exist for C, C++, Fortran, Python, …
The two best-known C implementations are MPICH and OpenMPI

MPIGAP Architecture: Object Marshalling, Low-Level MPI Bindings, Global Object Pointers, Shared Objects, Distributed Tasks, Implicitly Distributed Data, Skeletons, User Application

Object Marshalling
Since we work with GAP objects, and MPI can only send binary data, we need a method for converting GAP objects into their binary (string) representation: object marshalling.
Two marshalling methods are currently supported:
– Object serialisation (SerializeToNativeString, DeserializeNativeString) -- the default method
– IO pickling (IO_Pickle, IO_Unpickle) -- requires the IO package
Object serialisation: faster, less general, not architecture-independent
IO pickling: slower, more general, architecture-independent
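As an illustration, a minimal round-trip sketch using the default route (assuming SerializeToNativeString and DeserializeNativeString convert an object to its native string representation and back):
l := [1, 2, 3, "four"];
s := SerializeToNativeString(l);    # binary (string) representation of l
l2 := DeserializeNativeString(s);   # reconstructs a copy equal to l
l2 = l;                             # true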

What method of object marshalling should I use?
Most of the time, there is no need to worry about this: by default, object serialisation is used.
– To use IO pickling, set the MPIGAP_MARSHALLING variable to "Pickle" in your init file
If you can, use object serialisation. Use IO pickling instead if
– you need to transfer "unusual" objects between nodes (ones that do not have a serialisation primitive installed), or
– you are working on a platform where not all nodes have the same architecture
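For example (a sketch; the variable name MPIGAP_MARSHALLING comes from this slide, but the exact form of the assignment in the init file is an assumption):
# In your init file: switch from the default serialisation to IO pickling.
MPIGAP_MARSHALLING := "Pickle";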

Low-level MPI Bindings
Borrowed from the ParGAP package.
Simplified GAP bindings for a small subset of MPI operations:
– MPI_Send(msg, dest), MPI_Binsend(msg, size, dest)
– MPI_Recv(msg, [source, [tag]]), MPI_ProbeRecv(msg)
– MPI_Probe(), MPI_Iprobe()
– MPI_comm_rank(), MPI_Get_size(), MPI_Get_source()
All of these bindings work with strings, so you need to explicitly marshal your objects into strings using some marshalling method.
– For IO pickling, use MPI_Send
– For object serialisation, use MPI_Binsend
Not recommended for general use (too low-level); use these bindings only if your application cannot be written using the task/shared-object abstractions.
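A hypothetical point-to-point sketch built only from the bindings listed above; the exact calling conventions (in particular whether MPI_Recv fills a string buffer that is passed in) are assumptions, so treat this as an outline rather than working code:
# Rank 0 sends a serialised list to rank 1 via the binary send.
if processId = 0 then
  msg := SerializeToNativeString([1, 2, 3]);
  MPI_Binsend(msg, Length(msg), 1);       # send Length(msg) bytes to node 1
elif processId = 1 then
  buf := "";
  MPI_Recv(buf);                          # assumed to fill buf with the incoming message
  obj := DeserializeNativeString(buf);    # back to a GAP object
fi;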

Some Useful (Read-only) Variables
processId -- the rank (id) of a node in the distributed system
commSize -- the number of nodes in the system
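For instance, every node can report where it sits in the system:
# Each node prints its own rank and the total number of nodes.
Print("node ", processId, " of ", commSize, "\n");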

Global Object Pointers (Handles)
A global object handle is a GAP object that represents a global pointer to an object.
Handles can be copied to multiple distributed nodes
– and can then be used to access the same object on different nodes
Handles (and the underlying shared objects) are managed in a semi-automatic way:
– reference counting
– manual creation and destruction

Creation of Global Object Handles
CreateHandleFromObj(obj, accessType)
– Creates a handle for object obj (with access type accessType) on the node where it is called
Access types limit the operations that can be used on the underlying objects:
– ACCESS_TYPES.READ_ONLY
– ACCESS_TYPES.READ_WRITE
– ACCESS_TYPES.VOLATILE
Internally, handles are identified by a combination of node id and local id (unique on a node).
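A small sketch of handle creation on the current node, using the access types listed above:
l := [1, 2, 3];
h := CreateHandleFromObj(l, ACCESS_TYPES.READ_WRITE);   # read-write handle to the list l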

Opening and Closing of Handles
Before you do anything with a handle, you need to open it: Open(handle)
– Each thread that works with the handle needs to open it separately
After a thread finishes working with a handle, it needs to close it: Close(handle)
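The intended usage pattern, as a sketch:
Open(h);     # this thread starts working with the handle
# ... use the handle here ...
Close(h);    # this thread is done with the handle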

Distribution of Global Object Handles
SendHandle(handle, node)
– Sends handle to the distributed node node
– Example: SendHandle(h, 1);
SendAndAssignHandle(handle, node, name)
– Sends handle to the distributed node node, creates a global variable with name name there, and assigns the handle to it
– Example: SendAndAssignHandle(handle, node, "h");

Accessing Underlying Objects
GetHandleObj(handle)
– Returns the underlying object of handle
– This does not copy the object to the node where it is called
SetHandleObj(handle, obj)
– Sets the underlying object of handle to obj
– Only allowed if handle is not read-only
SetHandleObjList(handle, index, obj)
– If the underlying object of handle is a list, puts obj at position index in that list
– Only allowed if handle is not read-only
CAUTION: Wrapping an object in a handle automatically shares that object (and the handle), so a lock needs to be obtained to use it.
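Putting the accessors together, a sketch of reading and updating the shared list behind a read-write handle (the atomic block reflects the caution above; the handle h is assumed to wrap a list):
Open(h);
l := GetHandleObj(h);        # the underlying list itself, not a copy
atomic readonly l do
  Print(l[1], "\n");         # read under a read lock
od;
SetHandleObjList(h, 2, 42);  # write element 2; only allowed for non-read-only handles
Close(h);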

Global Object Handles Example (1): initially, node 0 has a local variable x bound to the list [1,2,3]; node 1 holds nothing.

Global Object Handles Example (2): On node 0: h := CreateHandleFromObj(x, ACCESS_TYPES.READ_WRITE); -- node 0 now has a handle h pointing to [1,2,3].

Global Object Handles Example (3): On node 0: SendAndAssignHandle(h, 1, "h"); -- both node 0 and node 1 now have the handle h, pointing to the list [1,2,3] on node 0.

Global Object Handles Example (4): On node 1: SetHandleObjList(h, 2, 4); -- the list on node 0 becomes [1,4,3], and both nodes see the change through h.

Operations on Shared Objects
Global object handles enable the user to have pointers on different distributed nodes that all refer to the same object, but they do not allow the user to transfer (copy/move) the object between nodes.
That is where operations on shared objects come into play: they use global object handles and enable the user to copy, push, clone and pull the objects pointed to by handles between nodes.

Copying of Shared Objects
Allowed for read-only and volatile handles.
RemoteCopyObj(handle, dest)
– Copies the object that handle points to to node dest
RemoteCloneObj(handle)
– Copies the object that handle points to from the node that owns the object
– RemoteCloneObj(handle) called on node dest has the same effect as calling RemoteCopyObj(handle, dest) on the node that owns the object pointed to by handle
If handle does not exist on the destination node, RemoteCopyObj will also copy it there.

Pushing of Shared Objects
Allowed for all types of handles.
RemotePushObj(handle, dest)
– Pushes the object that handle points to to node dest
– Node dest becomes the owner of the object
RemotePullObj(handle)
– Pulls the object that handle points to from the node that owns the object
– The node on which this is called becomes the owner of the object
– RemotePullObj(handle) called on node dest has the same effect as calling RemotePushObj(handle, dest) on the node that owns the object pointed to by handle
As with copying, if handle does not exist on node dest, RemotePushObj copies it there.
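A compact sketch combining the two families of operations, mirroring the examples that follow (the handle names ro and rw are just for illustration):
# On node 0: a read-only handle can be copied, a read-write handle can be pushed.
ro := CreateHandleFromObj([1, 2, 3], ACCESS_TYPES.READ_ONLY);
RemoteCopyObj(ro, 1);        # node 1 gets its own copy of the list (and the handle)
rw := CreateHandleFromObj([4, 5, 6], ACCESS_TYPES.READ_WRITE);
RemotePushObj(rw, 1);        # the list moves to node 1, which becomes its owner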

Shared Objects Example (1.1): initially, neither node 0 nor node 1 holds any object.

Shared Objects Example (1.2): On node 0: h := CreateHandleFromObj([1,2,3], ACCESS_TYPES.READ_WRITE); -- node 0 now has a handle h pointing to the local list [1,2,3].

Shared Objects Example (1.3): On node 0: h := RemotePushObj(h, 1); -- the list is pushed to node 1, which becomes its owner; both nodes keep the handle h.

Shared Objects Example (2.1): On node 0: h := CreateHandleFromObj([1,2,3], ACCESS_TYPES.READ_WRITE);

Shared Objects Example (2.2): On node 0: h := RemoteCopyObj(h, 1); -- error, copying is not allowed for a read-write handle!

Shared Objects Example (3.1): On node 0: h := CreateHandleFromObj([1,2,3], ACCESS_TYPES.READ_ONLY);

Shared Objects Example (3.2): On node 0: h := RemoteCopyObj(h, 1); -- the list [1,2,3] is copied to node 1, together with the handle h.

Shared Objects Example (4.1): On node 0: h := CreateHandleFromObj([1,2,3], ACCESS_TYPES.READ_WRITE);

Shared Objects Example (4.2): On node 0: SendAndAssignHandle(h, 1, "h"); -- node 1 now also has the handle h.

Shared Objects Example (4.3): On node 1: RemotePullObj(h); -- the list [1,2,3] moves to node 1, which becomes its owner.

Explicit Task Placement
MPIGAP supports explicit task placement on nodes.
CreateTask([taskArgs]) creates a task (but does not execute it)
– taskArgs is a list whose first element is the task function name, and the remaining elements are the task arguments
SendTask(t, dest) sends the task t to the destination node dest
– SendTask creates a handle for the task result and returns this handle
The task result can be obtained using TaskResult(t), or it can be fetched back to the node that called SendTask.

Explicit Task Placement Example
DeclareGlobalFunction("f");
InstallGlobalFunction(f, function(handle, num)
  local l, res;
  res := [];
  l := GetHandleObj(handle);
  atomic readonly l do
    res := List(l, x -> x + num);
  od;
  return res;
end);

if processId = 0 then
  h := CreateHandleFromObj([1,2,3,4,5]);
  t := CreateTask(["f", h, 1]);
  SendTask(t, 1);
  result := TaskResult(t);
fi;

Explicit Task Placement Example (2) -- the same task, but the list is first pushed to node 1
DeclareGlobalFunction("f");
InstallGlobalFunction(f, function(handle, num)
  local l, res;
  res := [];
  l := GetHandleObj(handle);
  atomic readonly l do
    res := List(l, x -> x + num);
  od;
  return res;
end);

if processId = 0 then
  h := CreateHandleFromObj([1,2,3,4,5]);
  RemotePushObj(h, 1);
  t := CreateTask(["f", h, 1]);
  SendTask(t, 1);
  result := TaskResult(t);
fi;

Implicit Task Placement
Some of the task primitives have distributed implementations with almost the same API as their shared-memory counterparts:
– RunTask(f, arg1, arg2, …, argN)
– TaskResult(t)
– WaitTask(t)
Tasks are "magically" distributed to the nodes.
One minor difference in the API (see the sketch below):
– If object serialisation is used as the marshalling method, the first argument of RunTask needs to be the function name (rather than a function object)
– Functions that are used for tasks need to be global (declared and installed using DeclareGlobalFunction and InstallGlobalFunction)
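A minimal implicit-placement sketch respecting the two points above (the function double is just an illustration):
DeclareGlobalFunction("double");
InstallGlobalFunction(double, x -> 2 * x);
t := RunTask("double", 21);   # the runtime decides on which node this executes
WaitTask(t);
Print(TaskResult(t), "\n");   # prints 42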

Implicit Task Placement -- Task Distribution
Task distribution over the distributed nodes is done using work-stealing (more about it in a minute).
– Work-stealing needs to be enabled using the StartStealing() function
– It can be turned off using the StopStealing() function
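In a script this typically brackets the task-spawning phase (a sketch):
StartStealing();    # from now on, idle nodes may steal tasks
# ... create tasks with RunTask, collect results with TaskResult ...
StopStealing();     # switch the work-stealing machinery off again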

Implicit Task Placement -- Details (1-6): diagram sequence. Each node runs a Message Manager, a Task Manager and a set of worker threads (Worker 1 … Worker n) that take work from the node's task queue. When a node runs out of work, its Message Manager sends a STEAL_MSG to another node's Message Manager in order to obtain tasks from that node's task queue.

Implicit Task Placement Example -- Bad Fibonacci
DeclareGlobalFunction("badFib");
InstallGlobalFunction(badFib, function(n)
  local t1, t2;
  if n < 3 then
    return 1;
  else
    t1 := RunTask("badFib", n-1);
    t2 := RunTask("badFib", n-2);
    return TaskResult(t1) + TaskResult(t2);
  fi;
end);

if processId = 0 then
  res := badFib(20);
  Print(res, "\n");
fi;

Summary
MPIGAP supports distributed-memory computing, with multiple threads within each distributed node.
It supports sharing objects across multiple distributed nodes, and explicit moving of objects between nodes.
It supports task management in the distributed world, where each node has multiple worker threads:
– implicit task placement using RunTask
– explicit task placement using SendTask
Still to come: implicitly-distributed data structures and skeletons (ParList, ParDivideConquer, ParMasterWorker, etc.)