Distributed-Memory Programming Using MPIGAP Vladimir Janjic International Workshop “Parallel Programming in GAP” Aug 2013
What is MPIGAP?
– A library for distributed-memory programming in GAP
  – Based on the ParGAP package by Gene Cooperman
  – Uses the MPI library for communication between distributed nodes
  – Each node can itself be a multicore processor, so shared-memory operations within a node are supported
  – Easiest to use in a batch/SIMD way
– Supports distributed implementations of many shared-memory parallel primitives
  – RunTask, TaskResult, WaitTask, …
– Also supports explicit copying/sharing of objects between distributed nodes and explicit task placement
  – RemoteCopyObj, RemotePushObj, SendTask, …
– In the final version, it will support implicitly distributed data structures and skeletons
What is this MPI thing?
– Message Passing Interface
  – Standardised and portable message-passing protocol
– Contains operations for sending and receiving binary messages between nodes in a distributed system
– Supports point-to-point and collective operations, synchronisation primitives (barriers), …
– Bindings exist for C, C++, Fortran, Python, …
– The two best-known C implementations are MPICH and OpenMPI
MPIGAP Architecture
(Layer diagram: Object Marshalling, Low-Level MPI Bindings, Global Object Pointers, Shared Objects, Distributed Tasks, Implicitly Distributed Data, Skeletons, User Application.)
Object Marshalling
– We work with GAP objects, but MPI can only send binary data, so we need a way to convert GAP objects into a binary (string) representation: object marshalling
– Two marshalling methods are currently supported:
  – Object serialisation ( SerializeToNativeString, DeserializeNativeString ), the default method
  – IO pickling ( IO_Pickle, IO_Unpickle ), which requires the IO package
– Object serialisation: faster, less general, not architecture independent
– IO pickling: slower, more general, architecture independent
What method of object marshalling should I use?
– Most of the time, there is no need to worry about this
– By default, object serialisation is used
  – To use IO pickling, set the MPIGAP_MARSHALLING variable to "Pickle" in your init file
– If you can, use object serialisation
– Use IO pickling if
  – you need to transfer "unusual" objects between nodes (ones that have no serialisation primitive installed), or
  – you are working on a platform where not all nodes have the same architecture
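The trade-off between the two marshalling methods can be illustrated outside GAP. The following is a minimal Python sketch, not MPIGAP code: `struct` with native byte order stands in for a fast but architecture-dependent serialisation, and `pickle` stands in for a more general, architecture-independent pickling. All function names here are hypothetical and chosen only for illustration.

```python
import pickle
import struct


def native_marshal(ints):
    """Pack a list of ints in the host's native layout (compact, not portable)."""
    # "@" selects native byte order and alignment; "q" is a C long long.
    return struct.pack(f"@{len(ints)}q", *ints)


def native_unmarshal(data):
    """Inverse of native_marshal; only valid on a machine with the same layout."""
    count = len(data) // struct.calcsize("@q")
    return list(struct.unpack(f"@{count}q", data))


def portable_marshal(obj):
    """Marshal an arbitrary picklable object into a byte string."""
    return pickle.dumps(obj)


def portable_unmarshal(data):
    """Inverse of portable_marshal."""
    return pickle.loads(data)


payload = [1, 2, 3]
assert native_unmarshal(native_marshal(payload)) == payload
assert portable_unmarshal(portable_marshal(payload)) == payload

# A nested object has no fixed native layout in this model,
# so only the general method can carry it.
nested = {"gens": [(1, 2), (3, 4)], "name": "G"}
assert portable_unmarshal(portable_marshal(nested)) == nested
```

The native path is limited to one data shape and one machine layout, which mirrors the "faster, less general, not architecture independent" description; the pickled path accepts arbitrary objects at the cost of extra work.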
Low-Level MPI Bindings
– Borrowed from the ParGAP package
– Simplified GAP bindings for a small subset of MPI operations
  – MPI_Send(msg, dest), MPI_Binsend(msg, size, dest)
  – MPI_Recv(msg, [source, [tag]]), MPI_ProbeRecv(msg)
  – MPI_Probe(), MPI_Iprobe()
  – MPI_Comm_rank(), MPI_Get_size(), MPI_Get_source()
– All of these bindings work with strings
– You need to explicitly marshal your objects into strings using one of the marshalling methods
  – For IO pickling, use MPI_Send
  – For object serialisation, use MPI_Binsend
– Not recommended: too low-level
  – Use only if your application cannot be written using the task/shared-object abstractions
Some Useful (Read-Only) Variables
– processId
  – rank (id) of a node in the distributed system
– commSize
  – number of nodes in the system
Global Object Pointers (Handles)
– A global object handle is a GAP object that represents a global pointer to an object
– Handles can be copied to multiple distributed nodes
  – They can be used to access the same object from different nodes
– Handles (and the underlying shared objects) are managed in a semi-automatic way
  – Reference counting
  – Manual creation and destruction
Creation of Global Object Handles
– CreateHandleFromObj(obj, accessType)
  – Creates a handle for the object obj (with access type accessType) on the node where it is called
– Access types limit the operations that can be used on the underlying objects:
  – ACCESS_TYPES.READ_ONLY
  – ACCESS_TYPES.READ_WRITE
  – ACCESS_TYPES.VOLATILE
– Internally, handles are identified by a combination of node id and local id (unique on a node)
Opening and Closing of Handles
– Before you do anything with a handle, you need to open it
  – Each thread that works with the handle needs to open it separately
  – Open(handle)
– After a thread finishes working with a handle, it needs to close it
  – Close(handle)
Distribution of Global Object Handles
– SendHandle(handle, node)
  – Sends handle to the distributed node node
  – SendHandle(h, 1);
– SendAndAssignHandle(handle, node, name)
  – Sends handle to the distributed node node, creates a global variable called name there and assigns the handle to it
  – SendAndAssignHandle(handle, node, "h");
Accessing Underlying Objects
– GetHandleObj(handle)
  – Returns the underlying object of handle
  – This does not copy the object to the node where it is called
– SetHandleObj(handle, obj)
  – Sets the underlying object of handle to obj
  – Only if handle is not read-only
– SetHandleObjList(handle, index, obj)
  – If the underlying object of handle is a list, puts obj at position index in that list
  – Only if handle is not read-only
– CAUTION: Wrapping an object in a handle automatically shares that object (and the handle), so a lock must be obtained before using it
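The handle rules above (identity as a pair of node id and local id, and write access depending on the access type) can be modelled in a few lines. This is a hedged Python sketch of the idea only, not the MPIGAP implementation: the classes `Handle` and `Node` and their methods are hypothetical, and a real system would send a message to the owning node instead of touching its table directly.

```python
import itertools
from dataclasses import dataclass
from enum import Enum


class AccessType(Enum):
    READ_ONLY = 1
    READ_WRITE = 2
    VOLATILE = 3


@dataclass(frozen=True)
class Handle:
    """A global pointer: (node_id, local_id) is unique across the whole system."""
    node_id: int
    local_id: int
    access: AccessType


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self._next_local = itertools.count()
        self.objects = {}  # (node_id, local_id) -> object stored on this node

    def create_handle(self, obj, access):
        """Register obj on this node and return a handle that refers to it."""
        handle = Handle(self.node_id, next(self._next_local), access)
        self.objects[(handle.node_id, handle.local_id)] = obj
        return handle

    def get(self, handle):
        """Return the underlying object without copying it."""
        return self.objects[(handle.node_id, handle.local_id)]

    def set_list_entry(self, handle, index, value):
        """Write into the underlying list, unless the handle is read-only."""
        if handle.access is AccessType.READ_ONLY:
            raise PermissionError("handle is read-only")
        self.get(handle)[index - 1] = value  # GAP lists are 1-based


node0 = Node(0)
h = node0.create_handle([1, 2, 3], AccessType.READ_WRITE)
h_on_node1 = h  # a handle sent to another node is just the same identifier
node0.set_list_entry(h_on_node1, 2, 4)
print(node0.get(h))
```

Because the handle is only an identifier, copying it to another node is cheap; the object itself stays where it was created until it is explicitly copied or moved.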
Global Object Handles Example
(Diagram: Node 0 and Node 1; on node 0, x refers to the list [1,2,3].)
Global Object Handles Example (2)
On node 0:
    h := CreateHandleFromObj(x, ACCESS_TYPES.READ_WRITE);
(Diagram: handle h on node 0 points to [1,2,3].)
Global Object Handles Example (3)
On node 0:
    SendAndAssignHandle(h, 1, "h");
(Diagram: both nodes now hold a handle h for [1,2,3].)
Global Object Handles Example (4)
On node 1:
    SetByHandleList(h, 2, 4);
(Diagram: the list is now [1,4,3].)
Operations on Shared Objects
– Global object handles let the user hold pointers on different distributed nodes that point to the same object
– They do not let the user transfer (copy/move) the object between nodes
– That is where operations on shared objects come into play
– These operations use global object handles to copy, push, clone and pull the objects pointed to by handles between nodes
Copying of Shared Objects
– Allowed for read-only and volatile handles
– RemoteCopyObj(handle, dest)
  – Copies the object that handle points to to the dest node
– RemoteCloneObj(handle)
  – Copies the object that handle points to from the node that owns the object
  – RemoteCloneObj(handle) called on the dest node has the same effect as calling RemoteCopyObj(handle, dest) on the node that owns the object pointed to by handle
– If handle does not exist on the destination node, RemoteCopyObj will also copy it there
Pushing of Shared Objects
– Allowed for all types of handles
– RemotePushObj(handle, dest)
  – Pushes the object that handle points to to the dest node
  – The dest node becomes the owner of the object
– RemotePullObj(handle)
  – Pulls the object that handle points to from the node that owns the object
  – The node on which this is called becomes the owner of the object
  – RemotePullObj(handle) called on the dest node has the same effect as calling RemotePushObj(handle, dest) on the node that owns the object pointed to by handle
– As with copying, if handle does not exist on the dest node, RemotePushObj copies it there
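The difference between copying and pushing is mostly a question of ownership. The sketch below is a small Python model of those rules, written under two stated assumptions: that a push moves the object (the source node no longer stores it), and that a copy is a deep copy. The class `SharedObjects` and its method names are hypothetical and only mirror the operations named on the slides.

```python
import copy


class SharedObjects:
    """Toy model of object ownership across a fixed number of nodes."""

    def __init__(self, num_nodes):
        self.stores = [dict() for _ in range(num_nodes)]  # per node: name -> object
        self.owner = {}   # name -> id of the owning node
        self.access = {}  # name -> "READ_ONLY", "READ_WRITE" or "VOLATILE"

    def create(self, name, obj, node, access):
        self.stores[node][name] = obj
        self.owner[name] = node
        self.access[name] = access

    def remote_copy(self, name, dest):
        """Copy the object to dest; the owner does not change."""
        if self.access[name] == "READ_WRITE":
            raise PermissionError("copying needs a read-only or volatile handle")
        src = self.owner[name]
        self.stores[dest][name] = copy.deepcopy(self.stores[src][name])

    def remote_push(self, name, dest):
        """Move the object to dest, which becomes the new owner."""
        src = self.owner[name]
        self.stores[dest][name] = self.stores[src].pop(name)
        self.owner[name] = dest

    def remote_pull(self, name, caller):
        """Pull is a push requested by the receiving node."""
        self.remote_push(name, caller)


cluster = SharedObjects(2)
cluster.create("h", [1, 2, 3], node=0, access="READ_WRITE")
cluster.remote_push("h", 1)
print(cluster.owner["h"], cluster.stores[1]["h"])
```

Restricting copies to read-only and volatile handles is consistent with the model: a read-write object that existed in several places would need some protocol to keep the copies in step, whereas a pushed object still exists exactly once.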
Shared Objects Example (1.1)
(Diagram: Node 0 and Node 1, both empty.)
Shared Objects Example (1.2)
On node 0:
    h := CreateHandleFromObj([1,2,3], ACCESS_TYPES.READ_WRITE);
(Diagram: h on node 0 points to [1,2,3].)
Shared Objects Example (1.3)
On node 0:
    h := RemotePushObj(h, 1);
(Diagram: both nodes hold h; node 1 now owns [1,2,3].)
Shared Objects Example (2.1)
On node 0:
    h := CreateHandleFromObj([1,2,3], ACCESS_TYPES.READ_WRITE);
(Diagram: h on node 0 points to [1,2,3].)
Shared Objects Example (2.2)
On node 0:
    h := RemoteCopyObj(h, 1);   # error: copying is not allowed for a read-write handle
(Diagram: unchanged; h on node 0 points to [1,2,3].)
Shared Objects Example (3.1)
On node 0:
    h := CreateHandleFromObj([1,2,3], ACCESS_TYPES.READ_ONLY);
(Diagram: h on node 0 points to [1,2,3].)
Shared Objects Example (3.2)
On node 0:
    h := RemoteCopyObj(h, 1);
(Diagram: the read-only list [1,2,3] is copied to node 1.)
Shared Objects Example (4.1)
On node 0:
    h := CreateHandleFromObj([1,2,3], ACCESS_TYPES.READ_WRITE);
(Diagram: h on node 0 points to [1,2,3].)
Shared Objects Example (4.2)
On node 0:
    SendAndAssignHandle(h, 1, "h");
(Diagram: both nodes hold h; [1,2,3] is still on node 0.)
Shared Objects Example (4.3)
On node 1:
    RemotePullObj(h);
(Diagram: both nodes hold h; node 1 now owns [1,2,3].)
Explicit Task Placement
– MPIGAP supports explicit task placement on nodes
– CreateTask([taskArgs]) creates a task (but does not execute it)
  – taskArgs is a list whose first element is the task function name; the remaining elements are the task arguments
– SendTask(t, dest) sends the task t to the destination node dest
– SendTask creates a handle for the task result and returns this handle
– The task result can be obtained using TaskResult(t), or it can be fetched back to the node that called SendTask
Explicit Task Placement Example
    DeclareGlobalFunction("f");
    InstallGlobalFunction(f, function(handle, num)
      local l, res;
      res := [];
      l := GetHandleObj(handle);
      # read the shared list under a read lock
      atomic readonly l do
        res := List(l, x -> x + num);
      od;
      return res;
    end);

    if processId = 0 then
      h := CreateHandleFromObj([1,2,3,4,5]);
      t := CreateTask(["f", h, 1]);
      SendTask(t, 1);
      result := TaskResult(t);
    fi;
Explicit Task Placement Example
    DeclareGlobalFunction("f");
    InstallGlobalFunction(f, function(handle, num)
      local l, res;
      res := [];
      l := GetHandleObj(handle);
      # read the shared list under a read lock
      atomic readonly l do
        res := List(l, x -> x + num);
      od;
      return res;
    end);

    if processId = 0 then
      h := CreateHandleFromObj([1,2,3,4,5]);
      # push the list to node 1, the node the task is sent to
      RemotePushObj(h, 1);
      t := CreateTask(["f", h, 1]);
      SendTask(t, 1);
      result := TaskResult(t);
    fi;
Implicit Task Placement
– Some of the task primitives have distributed implementations with almost the same API as their shared-memory counterparts
  – RunTask(f, arg1, arg2, …, argN)
  – TaskResult(t)
  – WaitTask(t)
– Tasks are "magically" distributed to the nodes
– One minor difference in the API:
  – If object serialisation is used as the marshalling method, the first argument of RunTask needs to be a function name (rather than a function object)
  – Functions that are used for tasks need to be global (implemented using DeclareGlobalFunction and InstallGlobalFunction)
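Passing a function name works because a name is plain data that can be marshalled, and the receiving node can look it up among its own global functions. The slides do not describe the mechanism, so the following Python sketch is only an illustration of that idea: the registry `GLOBAL_FUNCTIONS`, the decorator `install_global_function` and the use of JSON as the wire format are all assumptions made for the example.

```python
import json

# Hypothetical per-node table of globally installed functions, keyed by name.
GLOBAL_FUNCTIONS = {}


def install_global_function(func):
    """Register func under its own name so that remote tasks can refer to it."""
    GLOBAL_FUNCTIONS[func.__name__] = func
    return func


@install_global_function
def add_to_all(values, num):
    return [v + num for v in values]


def marshal_task(func_name, *args):
    """Turn a task description [name, arg1, ...] into bytes that can be sent."""
    return json.dumps([func_name, *args]).encode()


def run_marshalled_task(message):
    """On the receiving node: decode the task, look the function up, call it."""
    func_name, *args = json.loads(message.decode())
    return GLOBAL_FUNCTIONS[func_name](*args)


message = marshal_task("add_to_all", [1, 2, 3, 4, 5], 1)
print(run_marshalled_task(message))
```

The lookup only succeeds if every node has installed the same function under the same name, which matches the requirement that task functions be global.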
Implicit Task Placement -- Task Distribution
– Tasks are distributed over the nodes using work-stealing (more about it in a minute)
  – Work-stealing needs to be enabled using the StartStealing() function
  – It can be turned off using the StopStealing() function
Implicit Task Placement -- Details (1) to (6)
(Diagram sequence. Each node runs a Message Manager, a Task Manager, a task queue and a set of worker threads: Worker 1 … Worker n on Node 0 and Worker 1 … Worker m on Node 1. Frame (1) shows Node 0 alone, frames (2) to (6) show both nodes, and frames (3) and (6) show a STEAL_MSG message between the nodes.)
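The work-stealing idea can be tried out in a single-process simulation. The Python sketch below is not how MPIGAP is implemented: the victim choice (the node with the longest queue), the stealing end of the queue (oldest task first) and the `Node` class itself are assumptions made for illustration, and the steal request that would travel as a message is modelled by a direct method call.

```python
from collections import deque


class Node:
    def __init__(self, name):
        self.name = name
        self.queue = deque()  # pending tasks, oldest on the left
        self.done = []        # results of the tasks this node executed

    def step(self, others):
        """Run one local task; if the queue is empty, try to steal one first."""
        if not self.queue:
            victim = max(others, key=lambda n: len(n.queue))
            if victim.queue:
                # Stand-in for a steal request: take the victim's oldest task.
                self.queue.append(victim.queue.popleft())
        if self.queue:
            task = self.queue.pop()  # the owner works on its newest task
            self.done.append(task())


def run(nodes):
    """Let the nodes take turns until no task is left anywhere."""
    while any(n.queue for n in nodes):
        for node in nodes:
            node.step([n for n in nodes if n is not node])


node0, node1 = Node("node0"), Node("node1")
for i in range(6):
    node0.queue.append(lambda i=i: i * i)  # all work starts on node 0

run([node0, node1])
print(node0.done, node1.done)
```

Although every task is created on node 0, the idle node ends up executing part of the work, which is the effect that enabling work-stealing is meant to achieve.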
Implicit Task Placement Example -- Bad Fibonacci
    DeclareGlobalFunction("badFib");
    InstallGlobalFunction(badFib, function(n)
      local t1, t2;
      if n < 3 then
        return 1;
      else
        # one task per recursive call
        t1 := RunTask("badFib", n-1);
        t2 := RunTask("badFib", n-2);
        return TaskResult(t1) + TaskResult(t2);
      fi;
    end);

    if processId = 0 then
      res := badFib(20);
      Print(res, "\n");
    fi;
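The slide does not say why this Fibonacci is "bad"; one plausible reading is that it creates a very large number of tiny tasks. The Python sketch below mirrors the recursion of badFib and counts how many RunTask calls it would make. The function name `bad_fib_with_task_count` is hypothetical, and no tasks are actually run in parallel here.

```python
def bad_fib_with_task_count(n):
    """Return (badFib(n), number of RunTask calls made while computing it)."""
    if n < 3:
        return 1, 0
    r1, t1 = bad_fib_with_task_count(n - 1)
    r2, t2 = bad_fib_with_task_count(n - 2)
    # Two tasks are spawned at this level, plus everything spawned below it.
    return r1 + r2, t1 + t2 + 2


value, tasks = bad_fib_with_task_count(20)
print(value, tasks)  # prints: 6765 13528
```

For n = 20 the result needs 13528 tasks, each of which does at most one addition, so the cost of creating and distributing tasks is likely to dominate the useful work.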
Summary
– MPIGAP supports distributed-memory computing, with multiple threads within each distributed node
– Supports sharing objects across multiple distributed nodes, and explicit moving of objects between nodes
– Supports task management in a distributed setting, where each node has multiple worker threads
  – Implicit task placement using RunTask
  – Explicit task placement using SendTask
– Still to come: implicitly distributed data structures and skeletons ( ParList, ParDivideConquer, ParMasterWorker etc.)