Mapping Techniques for Load Balancing
Overheads
- Two sources of overhead:
  - time spent in inter-process interaction
  - time spent idle
- A good mapping must ensure that the computations and interactions among processes are well balanced at each stage of the parallel algorithm's execution.
Mapping Techniques
- Static mapping: distribute the tasks among processes prior to the execution of the algorithm.
- Dynamic mapping: distribute the work among processes during the execution of the algorithm.
- Dynamic mapping applies:
  - if tasks are generated dynamically
  - if task sizes are unknown
  - but it may entail moving data among processes, which is costly if the amount of data associated with tasks is large
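The difference between the two mappings can be sketched in a few lines of C. This is an illustrative sketch only: the function names static_block_owner and dynamic_least_loaded are hypothetical, the static case assumes a simple block distribution, and the dynamic case assumes a per-process load estimate is available at run time.

```c
#include <assert.h>

/* Static mapping (block distribution): the owner of each task is fixed
 * before execution. The first (ntasks % nprocs) processes get one extra task. */
int static_block_owner(int task, int ntasks, int nprocs)
{
    assert(nprocs > 0 && task >= 0 && task < ntasks);
    int base = ntasks / nprocs;          /* minimum tasks per process      */
    int extra = ntasks % nprocs;         /* processes holding base+1 tasks */
    int boundary = extra * (base + 1);   /* first task of the smaller blocks */
    if (task < boundary)
        return task / (base + 1);
    return extra + (task - boundary) / base;
}

/* Dynamic mapping: the owner is chosen during execution, here by picking
 * the process whose current load estimate is smallest. */
int dynamic_least_loaded(const double *load, int nprocs)
{
    assert(load != 0 && nprocs > 0);
    int best = 0;
    for (int r = 1; r < nprocs; r++)
        if (load[r] < load[best])
            best = r;
    return best;
}
```

With 10 tasks on 3 processes, static_block_owner assigns tasks 0-3 to process 0, tasks 4-6 to process 1, and tasks 7-9 to process 2. The dynamic variant needs no knowledge of task sizes in advance, at the price of tracking load while the program runs.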
Parallel Algorithm Models
- Data-Parallel Model
- Work Pool Model
- Master-Slave Model
- Pipeline or Producer-Consumer Model
- Hybrid Model
Communication Model of Parallel Platforms
- Two primary forms of data exchange between parallel tasks:
  - accessing a shared data space
  - exchanging messages
- Shared-Address-Space Platforms
  - The logical view of the parallel platform supports a common data space that is accessible to all processors.
  - Processors interact by modifying data objects stored in this space.
Message-Passing Platform (MPP)
- Logical machine view of an MPP:
  - consists of p processing nodes
  - each node is either a single processor or a shared-address-space multiprocessor
  - each node has its own exclusive address space
  - e.g. a cluster of workstations
- Messages are used to:
  - transfer data and work
  - synchronize
MP Paradigm
- The MP paradigm supports execution of a different program on each node.
- Four basic operations:
  - Interactions: send and receive
  - whoami, which gives the ID of the calling process
  - numprocs, which specifies the number of processes
- MP APIs:
  - MPI (Message Passing Interface)
  - PVM (Parallel Virtual Machine)
Explicit Parallel Programming
- Numerous programming languages and libraries have been developed.
- They differ in:
  - their view of the address space
  - the degree of synchronization
  - the multiplicity of programs
- MPI has been widely adopted because it imposes minimal requirements on the hardware.
Principles of Message-Passing Programming
- Two key attributes of MPP:
  - It assumes a partitioned address space.
    - Each data element must belong to one of the partitions of the space, i.e. data must be explicitly partitioned and placed.
    - All interactions require the cooperation of two processes.
    - For dynamic and/or unstructured interactions, the complexity of the code is very high.
  - It supports only explicit parallelization.
    - The programmer must decompose the computations and extract the concurrency.
Structure of MP Programs (1)
- Asynchronous
  - All concurrent tasks execute asynchronously.
  - Harder to reason about; can have non-deterministic behavior due to race conditions.
- Loosely synchronous
  - Tasks or subtasks synchronize to perform interactions.
  - Between these interactions, tasks execute asynchronously.
  - Easy to reason about.
Structure of MP Programs (2)
- The MP paradigm supports execution of a different program on each of the p processes, but this makes the job of writing parallel programs effectively unscalable.
- Most programs use single program multiple data (SPMD).
  - The code executed by different processes is identical except for a small number of processes (e.g. the 'root' process).
- SPMD programs can be loosely synchronous or asynchronous.
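The SPMD idea can be illustrated without a message-passing library. The sketch below is a single-threaded stand-in, not a real parallel program: every simulated process runs the same function, and only the hypothetical rank and nprocs arguments (playing the role of a process ID and process count) make their behavior differ. The names spmd_partial_sum and run_spmd are assumptions made for this example.

```c
#include <assert.h>

/* The single program that every process runs. Process `rank` handles
 * elements rank, rank + nprocs, rank + 2*nprocs, ... (a cyclic partition). */
long spmd_partial_sum(int rank, int nprocs, const int *data, int n)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    long sum = 0;
    for (int i = rank; i < n; i += nprocs)
        sum += data[i];
    return sum;
}

/* Stand-in for launching nprocs copies of the program: calls the same
 * function once per rank and combines the results. In a real run, each
 * non-root process would send its partial sum to the root process. */
long run_spmd(int nprocs, const int *data, int n)
{
    long total = 0;
    for (int rank = 0; rank < nprocs; rank++)
        total += spmd_partial_sum(rank, nprocs, data, n);
    return total;
}
```

For the array {1, 2, 3, 4, 5, 6} and three processes, the ranks compute the partial sums 5, 7, and 9, and the combined total is 21.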
Building Blocks
Send and Receive Operations (1)
    send(void *sendbuf, int nelems, int dest)
    receive(void *recvbuf, int nelems, int source)
- sendbuf points to a buffer that stores the data to be sent
- recvbuf points to a buffer that stores the data to be received
- nelems is the number of data units
- dest is the identifier of the process that receives the data
- source is the identifier of the process that sends the data
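To make the roles of the parameters concrete, here is a small single-process simulation. It is only a sketch under stated assumptions: sim_send and sim_receive are hypothetical names, the extra self argument stands in for the identity a real process would know implicitly, data units are assumed to be ints, and each (source, dest) pair has a single fixed-size message slot in place of a network.

```c
#include <assert.h>
#include <string.h>

#define MAX_PROCS 8
#define SLOT_INTS 64

/* One message slot per (source, dest) pair; slot_len == 0 means empty. */
static int slot[MAX_PROCS][MAX_PROCS][SLOT_INTS];
static int slot_len[MAX_PROCS][MAX_PROCS];

/* Process `self` sends nelems ints from sendbuf to process dest.
 * Returns 0 on success, -1 if the slot is busy or the size is invalid. */
int sim_send(int self, const int *sendbuf, int nelems, int dest)
{
    assert(self >= 0 && self < MAX_PROCS && dest >= 0 && dest < MAX_PROCS);
    if (nelems <= 0 || nelems > SLOT_INTS || slot_len[self][dest] != 0)
        return -1;
    memcpy(slot[self][dest], sendbuf, (size_t)nelems * sizeof(int));
    slot_len[self][dest] = nelems;
    return 0;
}

/* Process `self` receives nelems ints from process source into recvbuf.
 * Returns 0 on success, -1 if no message of that size is waiting. */
int sim_receive(int self, int *recvbuf, int nelems, int source)
{
    assert(self >= 0 && self < MAX_PROCS && source >= 0 && source < MAX_PROCS);
    if (nelems <= 0 || slot_len[source][self] != nelems)
        return -1;
    memcpy(recvbuf, slot[source][self], (size_t)nelems * sizeof(int));
    slot_len[source][self] = 0;   /* mark the slot empty again */
    return 0;
}
```

A call such as sim_send(0, in, 2, 1) must be matched by sim_receive(1, out, 2, 0): dest on the sending side names the receiver, and source on the receiving side names the sender.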
Send and Receive Operations (2)
    P0                    P1
    a = 100;              receive(&a, 1, 0);
    send(&a, 1, 1);       printf("%d\n", a);
    a = 0;
What does P1 print?
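The answer depends on the semantics of send. If send returns only once the value has been safely captured, P1 receives 100. If send returns immediately and the data is copied out of P0's memory later, the assignment a = 0 may already have happened, and P1 may receive 0. The single-threaded model below replays the example under both assumptions; the names send_mode, model_send, model_deliver, and p1_prints are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Two hypothetical send semantics: capture the value when send is called,
 * or remember only the address and copy when the message is delivered. */
enum send_mode { COPY_AT_SEND, COPY_AT_DELIVERY };

struct pending_send {
    enum send_mode mode;
    int copied;        /* value captured at send time         */
    const int *src;    /* address to read at delivery time    */
};

static struct pending_send model_send(const int *sendbuf, enum send_mode mode)
{
    assert(sendbuf != 0);
    struct pending_send m = { mode, *sendbuf, sendbuf };
    return m;
}

static int model_deliver(const struct pending_send *m)
{
    return m->mode == COPY_AT_SEND ? m->copied : *m->src;
}

/* Replays the P0/P1 example in program order and returns what P1 prints. */
int p1_prints(enum send_mode mode)
{
    int a = 100;                                   /* P0: a = 100;           */
    struct pending_send m = model_send(&a, mode);  /* P0: send(&a, 1, 1);    */
    a = 0;                                         /* P0: a = 0;             */
    return model_deliver(&m);                      /* P1: receive, then print */
}
```

Under COPY_AT_SEND the function returns 100, and under COPY_AT_DELIVERY it returns 0, which is why the semantics of send must be pinned down before the program's result is well defined.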