CS 4410 – Parallel Computing, Chap 9: Manager-Worker Paradigm

In Chapter 9, we will explore the Manager-Worker paradigm
  – Most of the problems we have seen so far have lent themselves to a domain decomposition
  – Here we will look at a problem which can be functionally decomposed
Document cataloging
  – Given a directory structure of documents and a list of desired keywords to search for in those documents, count the keywords in each document
  – Tasks
      Determine file list, read keywords
      Examine each document for keywords
      Consolidate and save results

Partitioning
  – Make each operation a primitive task, using functional decomposition
Communication
  – The task-channel graph is our communication pattern

[Figure: task-channel graph with tasks Find Docs, Read Keywords, Read Doc (one per document), Generate Summary (one per document), Write Results]

Agglomeration and Mapping
  – It probably makes sense to combine the ReadDocs and GenerateSummary steps
  – But we don’t know a priori how many documents there will be
      Some documents will be smaller than others
      Apportioning them statically may result in load imbalance
      We want to assign them “on demand” as processors become available to do tasks
This is the Manager-Worker paradigm
  – One processor is dedicated to keeping all of the others busy
      Like a professor, he hands out work, but never does any himself
      This “manager” knows when the job is done and notifies others appropriately
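
To make the load-imbalance argument concrete, here is a small sketch in plain C (no MPI) that compares the finishing time of a static, equal-count split against on-demand assignment. The per-document costs, the function names, and the 64-worker cap are all assumptions made for illustration; they do not come from the course code.

```c
#include <assert.h>
#include <stddef.h>

/* Finishing time when documents are split into contiguous, equal-count
   blocks (static assignment). cost[i] is a hypothetical processing time. */
long static_makespan(const long *cost, size_t n, size_t workers)
{
    assert(workers > 0);
    long worst = 0;
    for (size_t w = 0; w < workers; w++) {
        size_t lo = w * n / workers;        /* first document of worker w */
        size_t hi = (w + 1) * n / workers;  /* one past its last document */
        long busy = 0;
        for (size_t i = lo; i < hi; i++)
            busy += cost[i];
        if (busy > worst)
            worst = busy;
    }
    return worst;
}

/* Finishing time when each document goes to whichever worker becomes
   free first (on-demand assignment). Capped at 64 workers in this sketch. */
long on_demand_makespan(const long *cost, size_t n, size_t workers)
{
    assert(workers > 0 && workers <= 64);
    long free_at[64] = {0};                 /* time at which each worker is idle */
    for (size_t i = 0; i < n; i++) {
        size_t next = 0;
        for (size_t w = 1; w < workers; w++)
            if (free_at[w] < free_at[next])
                next = w;                   /* earliest-available worker */
        free_at[next] += cost[i];
    }
    long worst = 0;
    for (size_t w = 0; w < workers; w++)
        if (free_at[w] > worst)
            worst = free_at[w];
    return worst;
}
```

With two large documents followed by six small ones, the static split gives both large documents to the same worker, while on-demand assignment spreads them out. The sketch ignores communication cost, which is the price the on-demand scheme pays.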

Manager needs the list of documents that he will assign
Workers analyze documents
  – They are the users of the keywords, not the Manager
Manager assembles results and writes them

[Figure: agglomerated task-channel graph with tasks Find Docs, Read Keywords, Read/Gen Summary (several), Write Results]

The Manager-Worker paradigm is significantly different from our previous SPMD model
  – Manager performs a much different role than the workers, so they don’t really need to run the same code
  – How would we handle this?
  – Still run the same program, but split into different subroutines early on

    #include <mpi.h>

    int main (int argc, char *argv[])
    {
        int myRank, numProcs;
        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &myRank);
        MPI_Comm_size (MPI_COMM_WORLD, &numProcs);
        if (myRank == 0) {
            doManager (myRank, numProcs, argc, argv);   /* rank 0 is the Manager */
        } else {
            doWorker (myRank, numProcs, argc, argv);    /* all other ranks are Workers */
        }
        MPI_Finalize ();
        return 0;
    }

Let’s think about what the Manager needs to do
  1. Get list of documents to be processed
  2. Hand out documents to workers (assume they report in for work)
  3. Receive keyword counts from workers
       Needs to know # of keywords
  4. Write overall results to file

[Figure: agglomerated task-channel graph with tasks Find Docs, Read Keywords, Read/Gen Summary (several), Write Results]

Pseudocode for doManager ()

    doManager (myRank, numProcs, argc, argv) {
        getFilenames (argc, argv)       // path to directory in argv
        receive #keywords from Worker 0
        allocate storage array for n documents by k keywords
        while (terminatedWorkers < numProcs - 1) {
            receive a msg from any worker
            if (msgType == initialCheckin)
                do nothing
            else if (msgType == gotResults)
                store result in storage array
            if (docsAssigned < n)
                assign nextDoc to this worker
                docsAssigned++
            else
                terminate this worker
                terminatedWorkers++
        }
    }

What do the Workers need to do?
  1. Read the keyword file
       Individually, or just one followed by broadcast?
  2. If Worker 0, send Manager size k
  3. Build hash to process documents
  4. Report for duty to Manager
  5. While (receive msg w/ work)
       Read document
       Generate keyword count
       Send results to Manager

Pseudocode for doWorker ()

    doWorker (myRank, numProcs, argc, argv) {
        send (Manager, RequestForWork);     // get started on handshake early
        if (worker0)
            ReadKeywordsFromFile();
        Broadcast (keywords, worker0, all workers)
        if (worker0)
            send (Manager, numKeywords);
        Build hash table for keywords
        while (!terminated)
            recv (Manager, Document)
            if (tag == terminate)
                terminated = true;
            else
                read Document
                analyze Document
                send (Manager, ResultsMsg);
    }

Chap 9: Splitting Comm Groups

An issue: the worker broadcast of keywords should go just to the workers, not to the Manager as well (so not over MPI_COMM_WORLD)
  – We need to create a separate comm group
  – Easiest way is MPI_Comm_split
  – MPI_Comm_split (oldComm, color, key, &newComm)
      oldComm: the communicator being split, here MPI_COMM_WORLD
      color: int representing which group to put this process in
        – If you don’t want this process assigned to any group, pass MPI_UNDEFINED
      key: int used to order the ranks in the new group
        – If all processes pass the same int, they stay in the same order as in oldComm
        – If they pass different ints, ranks are assigned in key order
      newComm: the comm group containing the processes of your color
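
The color and key rules above can be sketched as a small model in plain C. This is not how an MPI library implements MPI_Comm_split; it only reproduces the ordering rule, including the tie-break on the old rank when keys are equal. The name split_rank and the constant UNDEFINED_COLOR are made up for this sketch (the real MPI_UNDEFINED value is implementation-defined).

```c
#include <assert.h>

#define UNDEFINED_COLOR (-1)   /* stand-in for MPI_UNDEFINED in this model */

/* Rank that process `me` would receive in its new group, or -1 if it
   passed UNDEFINED_COLOR and therefore joins no group. color[p] and
   key[p] are the values passed by the process with old rank p. */
int split_rank(const int *color, const int *key, int nprocs, int me)
{
    assert(me >= 0 && me < nprocs);
    if (color[me] == UNDEFINED_COLOR)
        return -1;
    int rank = 0;
    for (int p = 0; p < nprocs; p++) {
        if (p == me || color[p] != color[me])
            continue;                       /* different group: ignore */
        /* p comes first if its key is smaller, or equal with a lower old rank */
        if (key[p] < key[me] || (key[p] == key[me] && p < me))
            rank++;
    }
    return rank;
}
```

For example, with four processes where old rank 0 passes the undefined color and ranks 1 to 3 pass color 0 with their old rank as key, the workers get new ranks 0, 1, 2, so old rank 1 becomes "Worker 0".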

Each worker joins the worker group by passing color 0:

    MPI_Comm workerComm;
    MPI_Comm_split(MPI_COMM_WORLD, 0, myRank, &workerComm);

MPI_Comm_split is collective, so the Manager must call it too; passing MPI_UNDEFINED keeps him out of the group:

    MPI_Comm workerComm;
    if (myRank == 0)
        MPI_Comm_split(MPI_COMM_WORLD, MPI_UNDEFINED, myRank, &workerComm);
    else
        MPI_Comm_split(MPI_COMM_WORLD, 0, myRank, &workerComm);

We could have also done (forms 2 groups: the workers in one, the Manager alone in the other)

    MPI_Comm workerComm;
    MPI_Comm_split(MPI_COMM_WORLD, (myRank==0), myRank, &workerComm);

Chap 9: NonBlocking Communications

So far we’ve looked only at blocking sends and receives
  – But sometimes we may be able to overlap computation and communication by initiating a nonblocking receive
Consider the work the Manager has to do
  – He must get filenames and receive keywords before he can allocate the storage array
  – But there is no implied order of getting filenames and receiving keywords
  – Perhaps we could do them in parallel!

    doManager (myRank, numProcs, argc, argv) {
        getFilenames (argc, argv)       // path to directory in argv
        receive #keywords from Worker 0
        allocate storage array for n documents by k keywords
        while (terminatedWorkers < numProcs - 1) {
            ...

With a nonblocking recv, the Manager can initiate the receive, then go off and get filenames, and then just has to be sure the receive finishes before he allocates the array
  – MPI supports nonblocking communication with
      MPI_Isend
      MPI_Irecv
      MPI_Wait

Blocking version:

    doManager (myRank, numProcs, argc, argv) {
        getFilenames (argc, argv)
        receive #keywords from Worker 0
        allocate storage array
        while (terminatedWorkers < numProcs - 1) {
            ...

Nonblocking version:

    doManager (myRank, numProcs, argc, argv) {
        initiate receive of #keywords
        getFilenames (argc, argv)
        complete receive of #keywords
        allocate storage array
        while (terminatedWorkers < numProcs - 1) {
            ...

With either MPI_Isend or MPI_Irecv, the requirement is that your program must not change the buffer holding the data (or, for a receive, read from it) until the operation has completed
  – You can test for completion with MPI_Test
  – More commonly, you will use MPI_Wait to wait for a particular asynchronous operation to complete
  – To tie the MPI_Wait to a particular MPI_Isend or MPI_Irecv, a handle (of type MPI_Request) is used
  – MPI_Isend (buffer, count, type, dest, tag, comm, handle);
      MPI_Isend(&numKeys, 1, MPI_INT, 0, KEY_TAG, MPI_COMM_WORLD, &myHandle);
  – MPI_Wait (handle, status)
      MPI_Wait (&myHandle, &status);

MPI_Irecv(buffer, count, type, src, tag, comm, handle);
  – Essentially the same as MPI_Isend except for src/dest
  – MPI_Irecv(&numKeys, 1, MPI_INT, MPI_ANY_SOURCE, KEY_TAG, MPI_COMM_WORLD, &myHandle);
Another useful command is MPI_Probe
  – It blocks until a matching message is pending, and lets you see how long that message is before reading it
      This allows you to create a buffer of the appropriate size before you actually read
  – MPI_Probe(src, tag, comm, status)
      MPI_Probe(0, NAME_TAG, MPI_COMM_WORLD, &status);
  – Can use MPI_ANY_SOURCE and MPI_ANY_TAG to check for any incoming message
  – Status can be used to check src, tag, message length

Chap 9: Other Communication

MPI_Get_count is another useful function
  – Given a status (e.g., returned from MPI_Probe), returns a count of the number of elements in the message
  – MPI_Get_count (status, dataType, count)
      MPI_Get_count (&status, MPI_CHAR, &count);

Chap 9: Manager Code

Look at the code on pgs

    doManager (myRank, numProcs, argc, argv) {
        getFilenames (argc, argv)       // path to directory in argv
        receive #keywords from Worker 0
        allocate storage array for n documents by k keywords
        while (terminatedWorkers < numProcs - 1) {
            receive a msg from any worker
            if (msgType == initialCheckin)
                do nothing
            else if (msgType == gotResults)
                store result in storage array
            if (docsAssigned < n)
                assign nextDoc to this worker
                docsAssigned++
            else
                terminate this worker
                terminatedWorkers++
        }
    }

Chap 9: Worker Code

    doWorker (myRank, numProcs, argc, argv) {
        send (Manager, RequestForWork);     // get started on handshake early
        if (worker0)
            ReadKeywordsFromFile();
        Broadcast (keywords, worker0, all workers)
        if (worker0)
            send (Manager, numKeywords);
        Build hash table for keywords
        while (!terminated)
            recv (Manager, Document)
            if (tag == terminate)
                terminated = true;
            else
                read Document
                analyze Document
                send (Manager, ResultsMsg);
    }

Chap 9: Enhancing the Code

Idea #1
  – Why didn’t we just divide the documents evenly between workers?
      Because we would likely have gotten a load imbalance
  – So, we went hard over in the opposite direction, and handed out one document at a time
      What’s the problem with that?
  – We spend a lot of time doing communication
  – Is there a middle-ground approach?
      Could we hand out groups of tasks to workers?
      Could we vary the size of the groups?
      Maybe give out large groups at first, and then reduce the group size?

For a particular problem, a certain group size may produce optimal results

[Figure: Time versus Documents Allocated per Request, ranging from 1 to n/p. At 1 per request: excessive communication overhead. At n/p per request: load imbalance.]
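
One possible way to hand out large groups first and smaller groups later is to give each requesting worker an equal share of whatever is left, which resembles the scheme often called guided self-scheduling. This is a sketch of that idea in plain C, not something the slides prescribe; the function names are made up for illustration.

```c
#include <assert.h>

/* Number of documents to hand out on the next request: an equal share of
   what is left, rounded up, so batches start large and shrink toward 1. */
int next_batch_size(int remaining, int workers)
{
    assert(remaining >= 0 && workers > 0);
    if (remaining == 0)
        return 0;
    return (remaining + workers - 1) / workers;   /* ceil(remaining / workers) */
}

/* Counts how many requests it takes to hand out n documents this way. */
int requests_needed(int n, int workers)
{
    int requests = 0;
    while (n > 0) {
        n -= next_batch_size(n, workers);
        requests++;
    }
    return requests;
}
```

For 100 documents and 4 workers, the batch sizes run 25, 19, 14, 11, ... down to 1, for 14 requests in total instead of 100. The small batches at the end are what let the workers finish at about the same time.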

Idea #2
  – The Manager searches and finds all of the filenames before it ever sends out the first task
  – If the time spent finding the filenames is a significant portion of the runtime, the problem will not scale well as processors are added
      Why?
  – Could we pipeline this task to reduce its overhead?
  – Also, if finding filenames is a significant portion of the work, would it help to have someone other than the Manager write the results to file?

Pseudocode for Pipelined Manager

    a ← 0    {assigned jobs}
    j ← 0    {available jobs}
    w ← 0    {workers waiting for assignment}
    repeat
        if (j > 0) and (w > 0) then
            assign job to worker
            j ← j - 1; w ← w - 1; a ← a + 1
        elseif (j > 0) then
            check for an incoming message from workers
            increment w if appropriate
        else
            get another job
            increment j
        endif
    until (a = n) and (w = p)

One of the things we need to do in the previous algorithm is to check whether there are any incoming messages
  – How do we go about this?
When the Manager hands out a task, he immediately posts a non-blocking MPI_Irecv
  – This gives him a handle for this recv
  – How can he know which handles have completed?
MPI_Testsome solves this
  – Given a set of handles, it will return which have completed
  – MPI_Testsome (handleCount, handleArray, doneCount, doneArray, statusArray)
      doneCount is set to the number of handles that have completed
      doneArray will hold the indices of handleArray that are done
      statusArray holds the statuses of the completed handles

Chap 9: Summary

Manager/worker paradigm – useful when:
  – Dynamic number of tasks
  – Variable task lengths
  – No communications between tasks
New tools
  – MPI_Comm_split – limit comm to particular processes
  – Non-blocking send/receive
  – Testing length of received message
  – Testing for completed communications