Parallel Programming with PVM

Prof. Sivarama Dandamudi
School of Computer Science, Carleton University
Parallel Algorithm Models

Five basic models:
- Data parallel model
- Task graph model
- Work pool model
- Master-slave model
- Pipeline model
In addition, hybrid models combine the basic ones.
Parallel Algorithm Models (cont'd)

Data parallel model
- One of the simplest of all the models
- Tasks are statically mapped onto processors
- Each task performs a similar operation on different data
- Also called the data-parallelism model
- Work may be done in phases; operations in different phases may differ
- Ex: Matrix multiplication
Parallel Algorithm Models (cont'd)

Data parallel model: a 2x2 block decomposition of C = A B into four tasks

[Figure: the matrices A, B, and C, each partitioned into four blocks]

Task 1: C11 = A11 B11 + A12 B21
Task 2: C12 = A11 B12 + A12 B22
Task 3: C21 = A21 B11 + A22 B21
Task 4: C22 = A21 B12 + A22 B22

A plain-C sketch of this four-task decomposition follows.
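The sketch below is a minimal, self-contained illustration of the decomposition above; the matrix size N, the test data, and the sequential task() calls are assumptions for demonstration. In a real data-parallel run each of the four calls would execute on a different processor.

#include <stdio.h>

#define N  4       /* full matrices are N x N, partitioned into 2x2 blocks */
#define BS (N/2)   /* block size */

double A[N][N], B[N][N], C[N][N];

/* Task (bi,bj): compute block C[bi][bj] = A[bi][0] B[0][bj] + A[bi][1] B[1][bj]
 * (block indices); the k loop over the full dimension computes both terms. */
void task(int bi, int bj)
{
    int i, j, k;
    for (i = bi*BS; i < (bi+1)*BS; i++)
        for (j = bj*BS; j < (bj+1)*BS; j++) {
            C[i][j] = 0.0;
            for (k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}

int main(void)
{
    int i, j;
    for (i = 0; i < N; i++)        /* arbitrary test data */
        for (j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = (i == j);    /* identity, so C should equal A */
        }
    task(0, 0); task(0, 1); task(1, 0); task(1, 1);   /* the four tasks */
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++)
            printf("%5.1f ", C[i][j]);
        printf("\n");
    }
    return 0;
}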
Parallel Algorithm Models (cont'd)

Task graph model
- Parallel algorithm is viewed as a task-dependency graph
- Also called the task-parallelism model
- Typically used when tasks are associated with large amounts of data
- Static mapping is used to optimize the data-movement cost; locality-based mapping is important
- Ex: Divide-and-conquer algorithms, parallel quicksort
Parallel Algorithm Models (cont'd)

[Figure: a task-dependency graph illustrating task parallelism]
Parallel Algorithm Models (cont'd)

Work pool model
- Dynamic mapping of tasks onto processors; important for load balancing
- Used on message-passing systems when the data associated with a task is relatively small
- Granularity of tasks:
  - Too small: overhead in accessing tasks can increase
  - Too big: load imbalance
- Ex: Parallelization of loops by chunk scheduling (a sketch follows)
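The following is a minimal shared-memory sketch of chunk scheduling, assuming POSIX threads; the slide's setting is message passing, but the work-pool idea is the same. SIZE, CHUNK, and NWORKERS are illustrative choices, not values from the original.

#include <pthread.h>
#include <stdio.h>

#define SIZE     1000
#define CHUNK    50   /* granularity: too small -> overhead, too big -> imbalance */
#define NWORKERS 4

long data[SIZE];
long next = 0;        /* next unclaimed index: the shared work pool */
double total = 0.0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg)
{
    double local = 0.0;
    for (;;) {
        long start, end, i;
        pthread_mutex_lock(&lock);   /* grab the next chunk from the pool */
        start = next;
        next += CHUNK;
        pthread_mutex_unlock(&lock);
        if (start >= SIZE)           /* pool exhausted */
            break;
        end = start + CHUNK > SIZE ? SIZE : start + CHUNK;
        for (i = start; i < end; i++)
            local += data[i];
    }
    pthread_mutex_lock(&lock);       /* fold local result into the total */
    total += local;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t[NWORKERS];
    long i;
    for (i = 0; i < SIZE; i++)
        data[i] = i;
    for (i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (i = 0; i < NWORKERS; i++)
        pthread_join(t[i], NULL);
    printf("sum = %.0f\n", total);   /* expect SIZE*(SIZE-1)/2 = 499500 */
    return 0;
}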
Parallel Algorithm Models (cont'd)

Master-slave model
- One or more master processes generate work and allocate it to worker processes
- Also called the manager-worker model
- Suitable for both shared-memory and message-passing systems
- The master can potentially become a bottleneck; granularity of tasks is important
Parallel Algorithm Models (cont'd)

Pipeline model
- A stream of data passes through a series of processes; each process performs some task on the data
- Also called the stream-parallelism model
- Uses a producer-consumer relationship with overlapped execution
- Useful in applications such as database query processing
- Potential problem: one slow process can delay the whole pipeline
Parallel Algorithm Models (cont'd)

Pipeline model
[Figure: a pipeline of stages R1 through R5]
Pipelined processing can avoid writing temporary results to disk and reading them back.
Parallel Algorithm Models (cont'd)

Hybrid models
- It is possible to use multiple models:
  - Hierarchically: different models at different levels
  - Sequentially: different models in different phases
- Ex: the major computation may use the task graph model, while each node of the graph uses data parallelism or the pipeline model
PVM

Parallel Virtual Machine
- Collaborative effort: Oak Ridge National Lab, University of Tennessee, Emory University, and Carnegie Mellon University
- Began in 1989; Version 1.0 was used internally
- Version 2.0 released in March 1991; Version 3.0 in February 1993
PVM (cont'd)

Parallel Virtual Machine
- Targeted for heterogeneous network computing; hosts may differ in:
  - Architecture
  - Data format
  - Computational speed
  - Machine load
  - Network load
PVM Calls

Process control

int tid = pvm_mytid(void)
- Returns the tid of the calling process
- Can be called multiple times

int info = pvm_exit(void)
- Does not kill the process; tells the local pvmd that this process is leaving PVM
- info < 0 indicates an error (e.g., pvmd not responding)

A minimal sketch using both calls follows.
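The sketch below enrolls in PVM, reports the task id, and leaves; it assumes a PVM daemon (pvmd) is already running on the host.

#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    int tid = pvm_mytid();   /* enroll in PVM and obtain the task id */
    if (tid < 0) {
        printf("pvm_mytid failed: %d\n", tid);
        return 1;
    }
    printf("my tid = t%x\n", tid);
    pvm_exit();              /* leave PVM; the process itself keeps running */
    return 0;
}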
PVM Calls (cont'd)

Process control

int numt = pvm_spawn(char *task, char **argv, int flag, char *where, int ntask, int *tids)
- Starts ntask copies of the executable file task
- argv: arguments to task (NULL-terminated)
- where: a specific host or a specific architecture (PVM_ARCH), as selected by flag
PVM Calls (cont'd)

flag specifies spawn options:

Value  Option          Meaning
0      PvmTaskDefault  PVM chooses where to spawn
1      PvmTaskHost     where specifies a host
2      PvmTaskArch     where specifies an architecture
3      PvmTaskDebug    starts tasks under a debugger

A spawn sketch follows.
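A sketch of a typical spawn call; the executable name "worker" and the task count are assumptions for illustration. With PvmTaskDefault the where argument is ignored, so an empty string is passed.

#include <stdio.h>
#include "pvm3.h"

#define NTASK 4   /* illustrative number of copies to start */

int main(void)
{
    int tids[NTASK];
    int numt = pvm_spawn("worker", (char **)0, PvmTaskDefault,
                         "", NTASK, tids);
    if (numt < NTASK)   /* numt = number of tasks actually started */
        printf("only %d of %d tasks started\n", numt, NTASK);
    pvm_exit();
    return 0;
}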
PVM Calls (cont'd)

Process control

int info = pvm_kill(int tid)
- Kills the PVM task identified by tid
- Does not kill the calling task; to kill the calling task, first call pvm_exit(), then exit()
- Writes to the file /tmp/pvml.
PVM Calls (cont'd)

Information

int tid = pvm_parent(void)
- Returns the tid of the process that spawned the calling task
- Returns PvmNoParent if the task was not created by pvm_spawn()

A sketch of the usual master/worker test follows.
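A sketch of the common idiom in which one executable plays both roles: it acts as master when it has no parent and as a worker when it was spawned.

#include "pvm3.h"

int main(void)
{
    if (pvm_parent() == PvmNoParent) {
        /* started from the command line: act as master (e.g., spawn workers) */
    } else {
        /* created by pvm_spawn(): act as worker */
    }
    pvm_exit();
    return 0;
}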
PVM Calls (cont'd)

Information

int info = pvm_config(int *nhost, int *narch, struct pvmhostinfo **hostp)
- Returns nhost = number of hosts
- Returns narch = number of different data formats
PVM Calls (cont'd)

Message sending involves three steps:
1. Initialize the send buffer, using pvm_initsend()
2. Pack the message, using the pvm_pk*() routines (several pack routines are available)
3. Send the message, using pvm_send()
PVM Calls (cont'd)

Message sending

int bufid = pvm_initsend(int encoding)
- Called before packing a new message into the buffer
- Clears the send buffer and creates a new one for packing a new message
- bufid = new buffer id
PVM Calls (cont'd)

encoding can take three values:
- PvmDataDefault: XDR encoding is used by default; useful for heterogeneous architectures
- PvmDataRaw: no encoding is done; messages are sent in their original form
- PvmDataInPlace: no buffer copying; the buffer should not be modified until it is sent
PVM Calls (cont'd)

Packing data
- Several routines are available (one for each data type); each takes three arguments

int info = pvm_pkbyte(char *cp, int nitem, int stride)
- nitem = number of items to be packed
- stride = stride in elements
PVM Calls (cont'd)

Packing data: pvm_pkint, pvm_pklong, pvm_pkfloat, pvm_pkdouble, pvm_pkshort
- The pack-string routine requires only the NULL-terminated string pointer: pvm_pkstr(char *cp)
PVM Calls (cont'd)

Sending data

int info = pvm_send(int tid, int msgtag)
- Sends the message in the packed buffer to task tid
- The message is tagged with msgtag; message tags are useful to distinguish different types of messages

A sketch combining all three send steps follows.
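A sketch putting the three steps together for a small integer vector; dest_tid is assumed to be a valid task id, and the tag value is arbitrary.

#include "pvm3.h"

#define MSG_DATA 1   /* arbitrary message tag */

void send_vector(int dest_tid)
{
    int v[10], i;
    for (i = 0; i < 10; i++)
        v[i] = i;
    pvm_initsend(PvmDataDefault);   /* step 1: initialize the send buffer */
    pvm_pkint(v, 10, 1);            /* step 2: pack 10 ints with stride 1 */
    pvm_send(dest_tid, MSG_DATA);   /* step 3: send, tagged MSG_DATA */
}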
PVM Calls (cont'd)

Sending data (multicast)

int info = pvm_mcast(int *tids, int ntask, int msgtag)
- Sends the message in the packed buffer to all tasks in the tids array (except itself)
- The tids array length is given by ntask

A multicast sketch follows.
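A multicast sketch; tids and ntask are assumed to come from an earlier pvm_spawn() call, and the tag value is arbitrary.

#include "pvm3.h"

#define MSG_WORK 3   /* arbitrary message tag */

/* Broadcast one integer to every task in tids[0..ntask-1] */
void broadcast_int(int *tids, int ntask, int value)
{
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&value, 1, 1);
    pvm_mcast(tids, ntask, MSG_WORK);
}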
PVM Calls (cont'd)

Receiving data involves two steps:
1. Receive the data
2. Unpack it

Two versions of receive:
- Blocking: waits until the message arrives
- Non-blocking: does not wait
PVM Calls (cont'd)

Receiving data: blocking receive

int info = pvm_recv(int tid, int msgtag)
- Waits until a message with tag msgtag has arrived from task tid
- The wildcard value (-1) is allowed for both msgtag and tid
PVM Calls (cont'd)

Receiving data: non-blocking receive

int info = pvm_nrecv(int tid, int msgtag)
- If no message with tag msgtag has arrived from task tid, returns bufid = 0
- Otherwise, behaves like the blocking receive
PVM Calls (cont'd)

Receiving data: probing for a message

int info = pvm_probe(int tid, int msgtag)
- If no message with tag msgtag has arrived from task tid, returns bufid = 0
- Otherwise, returns a bufid for the message, but does not receive it

A polling sketch built on pvm_nrecv() follows.
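A polling sketch using the non-blocking receive; the tag and the do_other_work() helper are hypothetical, and worker_tid is assumed valid.

#include "pvm3.h"

#define MSG_RESULT 2             /* arbitrary message tag */

extern void do_other_work(void); /* hypothetical useful computation */

double wait_for_result(int worker_tid)
{
    int bufid;
    double result = 0.0;
    while ((bufid = pvm_nrecv(worker_tid, MSG_RESULT)) == 0)
        do_other_work();         /* nothing yet: overlap other computation */
    if (bufid > 0)               /* bufid < 0 would indicate an error */
        pvm_upkdouble(&result, 1, 1);
    return result;
}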
PVM Calls (cont'd)

Unpacking data (similar to the packing routines): pvm_upkint, pvm_upklong, pvm_upkfloat, pvm_upkdouble, pvm_upkshort, pvm_upkbyte
- The unpack-string routine requires only the NULL-terminated string pointer: pvm_upkstr(char *cp)
PVM Calls (cont'd)

Buffer information
- Useful to find the size of a received message

int info = pvm_bufinfo(int bufid, int *bytes, int *msgtag, int *tid)
- Returns the message tag, the source tid, and the size in bytes

A sketch follows; the slave program at the end of these notes uses the same pattern.
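A sketch of receiving a vector of unknown length and sizing it with pvm_bufinfo(), mirroring the slave program below; MAXLEN and the tag are illustrative, and the byte count is assumed to correspond to sizeof(long)-sized items, as in that program.

#include "pvm3.h"

#define MAXLEN   100000   /* illustrative capacity */
#define MSG_DATA 1        /* arbitrary message tag */

long buffer[MAXLEN];

/* Receive a long vector from src_tid; returns the element count */
int recv_vector(int src_tid)
{
    int bufid, nbytes, msgtag, tid, nitems;
    bufid = pvm_recv(src_tid, MSG_DATA);          /* blocking receive */
    pvm_bufinfo(bufid, &nbytes, &msgtag, &tid);   /* size of the message */
    nitems = nbytes / sizeof(long);
    pvm_upklong(buffer, nitems, 1);               /* unpack that many longs */
    return nitems;
}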
Example

- Finds the sum of the elements of a given vector; the vector size is given as input
- The program can be run on a PVM with up to 10 nodes (can be changed via a constant)
- The vector size is assumed to be evenly divisible by the number of nodes in the PVM (it is easy to remove this restriction)
- Master (vecsum.c) and slave (vecsum_slave.c) programs
Example (cont'd): vecsum.c

#include <stdio.h>
#include <sys/time.h>      /* for gettimeofday(); include restored */
#include "pvm3.h"

#define MAX_SIZE 100000    /* max. vector size (original value lost; assumed here) */
#define NPROCS   10        /* max. number of PVM nodes */
main()
{
    int cc, tid[NPROCS];
    long vector[MAX_SIZE];
    double sum = 0,
           partial_sum;    /* partial sum received from slaves */
    long i, vector_size;
    int nhost,             /* actual # of hosts in PVM */
        size;              /* size of vector chunk to be distributed */
    struct timeval start_time, finish_time;
    long sum_time;
    printf("Vector size = ");
    scanf("%ld", &vector_size);
    for (i = 0; i < vector_size; i++)   /* initialize vector */
        vector[i] = i;
    gettimeofday(&start_time, (struct timezone *)0);   /* start time */
    tid[0] = pvm_mytid();   /* establish my tid */
    /* get # of hosts using pvm_config() */
    pvm_config(&nhost, (int *)0, (struct pvmhostinfo **)0);
    size = vector_size/nhost;   /* size of vector chunk sent to each slave */
    if (nhost > 1)
        pvm_spawn("vecsum_slave", (char **)0, 0, "", nhost-1, &tid[1]);
    for (i = 1; i < nhost; i++) {   /* distribute data to slaves */
        pvm_initsend(PvmDataDefault);
        pvm_pklong(&vector[i*size], size, 1);
        pvm_send(tid[i], 1);        /* msg tag 1 = vector data */
    }
    for (i = 0; i < size; i++)      /* perform local sum */
        sum += vector[i];
    for (i = 1; i < nhost; i++) {   /* collect partial sums from slaves */
        pvm_recv(-1, 2);            /* msg tag 2 = partial sum */
        pvm_upkdouble(&partial_sum, 1, 1);
        sum += partial_sum;
    }
    gettimeofday(&finish_time, (struct timezone *)0);   /* finish time */
    /* elapsed time in microseconds; the 1000000 multiplier was lost in the
       original and is reconstructed from the tv_sec/tv_usec arithmetic */
    sum_time = (finish_time.tv_sec - start_time.tv_sec) * 1000000
               + finish_time.tv_usec - start_time.tv_usec;
    printf("Sum = %lf\n", sum);
    printf("Sum time on %d hosts = %lf sec\n",
           nhost, (double)sum_time/1000000);   /* microseconds to seconds; divisor assumed */
    pvm_exit();
}
Example (cont'd): vecsum_slave.c

#include "pvm3.h"

#define MAX_SIZE 100000    /* max. vector size (original value lost; assumed here) */

main()
{
    int ptid, bufid, vector_bytes;
    long vector[MAX_SIZE];
    double sum = 0;
    int i;
    ptid = pvm_parent();        /* find parent (master) tid */
    bufid = pvm_recv(ptid, 1);  /* receive data from master (msg tag 1) */
    /* use pvm_bufinfo() to find the number of bytes received */
    pvm_bufinfo(bufid, &vector_bytes, (int *)0, (int *)0);
    pvm_upklong(vector, vector_bytes/sizeof(long), 1);   /* unpack */
    for (i = 0; i < vector_bytes/sizeof(long); i++)      /* local summation */
        sum += vector[i];
    pvm_initsend(PvmDataDefault);   /* send sum to master */
    pvm_pkdouble(&sum, 1, 1);
    pvm_send(ptid, 2);              /* use msg tag 2 for partial sum */
    pvm_exit();
}