Slide 1: Parallel Computing and MPI
FLASH Tutorial, May 13, 2004
The Center for Astrophysical Thermonuclear Flashes
An Advanced Simulation & Computing (ASC) Academic Strategic Alliances Program (ASAP) Center at The University of Chicago
Slide 2: What Is Parallel Computing? And Why Is It Useful?
- Parallel computing is more than one CPU working together on one problem.
- It is useful when:
  - the problem is large and would take very long on a single processor;
  - the data are too big to fit in the memory of one processor.
- When to parallelize: when the problem can be subdivided into relatively independent tasks.
- How much to parallelize: as long as the speedup relative to a single processor remains of the order of the number of processors (see the note below).
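The "how much to parallelize" criterion can be made quantitative with Amdahl's law, which is not stated on the slide but is the standard way to express it: if a fraction f of the work is inherently serial, the speedup on p processors is bounded by

    S(p) = \frac{1}{f + (1 - f)/p} \le \frac{1}{f},

so near-ideal speedup S(p) on the order of p requires the serial fraction to satisfy f much smaller than 1/p.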
Slide 3: Parallel Paradigms
- SIMD (Single Instruction, Multiple Data): processors work in lock-step.
- MIMD (Multiple Instruction, Multiple Data): processors do their own thing, with occasional synchronization.
- Shared memory: one-sided communication.
- Distributed memory: message passing.
- Loosely coupled: the process on each CPU is fairly self-contained and relatively independent of the processes on other CPUs.
- Tightly coupled: CPUs need to communicate with each other frequently.
Slide 4: How to Parallelize
- Divide the problem into a set of mostly independent tasks (partitioning the problem).
- Give each task its own data (localize the task).
- Let tasks operate on their own data for the most part (keep them self-contained).
- Occasionally:
  - data may be needed from other tasks (inter-process communication);
  - synchronization may be required between tasks (global operations).
- Map tasks to different processors:
  - one processor may get more than one task;
  - the task distribution should be well balanced.
Slide 5: New Code Components
- Initialization
- Query of the parallel state: identify this process; identify the number of processes
- Data exchange between processes: local, global
- Synchronization: barriers, blocking communication, locks
- Finalization
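The components above map directly onto MPI calls. Below is a minimal sketch in C (not taken from FLASH; the printed message is purely illustrative) showing initialization, query of the parallel state, a synchronization, and finalization.

/* Minimal sketch of the new code components, using the MPI C bindings. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);                    /* initialization */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* identify this process */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);    /* identify the number of processes */

    printf("Process %d of %d\n", rank, nprocs);

    MPI_Barrier(MPI_COMM_WORLD);               /* synchronization */
    MPI_Finalize();                            /* finalization */
    return 0;
}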
Slide 6: MPI
- Message Passing Interface: the standard for the distributed-memory model of parallelism.
- MPI-2 supports one-sided communication, commonly associated with shared-memory operations.
- Works with communicators, i.e., collections of processes; MPI_COMM_WORLD is the default.
- Supports both the lowest-level communication operations and composite (collective) operations.
- Has blocking and non-blocking operations.
Slide 7: Communicators
[Figure: two communicators, COMM1 and COMM2, each grouping a subset of the processes]
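A hedged sketch of how sub-communicators like COMM1 and COMM2 can be created with MPI_Comm_split; the even/odd-rank splitting criterion is an illustrative assumption, not something prescribed by the tutorial.

/* Split MPI_COMM_WORLD into two sub-communicators by even/odd rank. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int world_rank, sub_rank, color;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    color = world_rank % 2;                      /* 0 -> "COMM1", 1 -> "COMM2" */
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);
    MPI_Comm_rank(subcomm, &sub_rank);

    printf("World rank %d has rank %d in sub-communicator %d\n",
           world_rank, sub_rank, color);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}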
Slide 8: Low-Level Operations in MPI
- MPI_Init
- MPI_Comm_size: find the number of processes
- MPI_Comm_rank: find my process number (rank)
- MPI_Send / MPI_Recv: communicate with other processes, one at a time
- MPI_Bcast: global data transmission
- MPI_Barrier: synchronization
- MPI_Finalize
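A hedged sketch exercising the calls listed on this slide: rank 0 broadcasts a parameter to everyone, and every other rank sends its rank number back to rank 0 with point-to-point MPI_Send/MPI_Recv. The parameter value and message contents are purely illustrative.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, nprocs, param = 0, value, src;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) param = 42;                          /* some run parameter */
    MPI_Bcast(&param, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* global data transmission */

    if (rank != 0) {
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        for (src = 1; src < nprocs; src++) {
            MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD, &status);
            printf("Rank 0 received %d from rank %d\n", value, src);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);                        /* synchronization */
    MPI_Finalize();
    return 0;
}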
Slide 9: Advanced Constructs in MPI
- Composite operations: Gather/Scatter, Allreduce, Alltoall (an Allreduce sketch follows this slide)
- Cartesian grid operations: Shift
- Communicators: creating subgroups of processes to operate on
- User-defined datatypes
- I/O: parallel file operations
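A hedged sketch of one composite operation, MPI_Allreduce, computing a global sum and a global maximum of a per-process value; the local value is a stand-in for a locally computed physical quantity.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    double local, global_sum, global_max;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    local = (double)(rank + 1);     /* stand-in for a locally computed quantity */

    MPI_Allreduce(&local, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(&local, &global_max, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);

    printf("Rank %d: sum = %g, max = %g\n", rank, global_sum, global_max);

    MPI_Finalize();
    return 0;
}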
Slide 10: Communication Patterns
[Figure: point-to-point, one-to-all broadcast, all-to-all, shift, and collective communication patterns among a small set of processes]
Slide 11: Communication Overheads
- Latency vs. bandwidth
- Blocking vs. non-blocking: overlap of communication and computation; buffering and copies (see the non-blocking sketch after this list)
- Scale of communication: nearest neighbor, short range, long range
- Volume of data: resource contention for links
- Efficiency: hardware, software, communication method
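A hedged sketch of overlapping communication with computation using non-blocking MPI_Isend/MPI_Irecv on a periodic ring of processes; the "interior work" loop is a placeholder for computation that does not depend on the incoming data.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, nprocs, right, left, i;
    double sendbuf, recvbuf = 0.0, interior = 0.0;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    right = (rank + 1) % nprocs;                /* periodic ring of processes */
    left  = (rank - 1 + nprocs) % nprocs;
    sendbuf = (double)rank;

    /* Post the communication first ... */
    MPI_Irecv(&recvbuf, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendbuf, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... do work that does not need the incoming data (overlap) ... */
    for (i = 0; i < 1000000; i++) interior += 1.0e-6;

    /* ... then wait for the messages before touching the boundary data. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    printf("Rank %d received %g from rank %d\n", rank, recvbuf, left);
    MPI_Finalize();
    return 0;
}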
Slide 12: Parallelism in FLASH
- Short-range communications: nearest neighbor
- Long-range communications: regridding
- Other global operations:
  - all-reduce operations on physical quantities
  - operations specific to solvers: the multipole method, FFT-based solvers
Slide 13: Domain Decomposition
[Figure: the computational domain split into subdomains assigned to processors P0, P1, P2, and P3]
Slide 14: Border Cells / Ghost Points
- When solnData is split across processors, each processor needs data from the others.
- Each subdomain needs a layer of cells from each neighboring processor.
- These layers need to be updated every time step.
Slide 15: Border/Ghost Cells
[Figure: exchange of border/ghost cell layers between neighboring subdomains, a short-range communication]
Slide 16: Two MPI Methods for Filling Ghost Cells
Method 1: Cartesian topology (sketched after this slide)
- MPI_Cart_create: create the topology
- MPE_Decomp1d: domain decomposition on the topology
- MPI_Cart_shift: who is on the left/right?
- MPI_Sendrecv: fill ghost cells on the left
- MPI_Sendrecv: fill ghost cells on the right
Method 2: Manual decomposition
- MPI_Comm_rank, MPI_Comm_size
- manually decompose the grid over the processors
- calculate the left/right neighbors
- MPI_Send / MPI_Recv, ordered carefully to avoid deadlocks
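A hedged sketch of the first method in one dimension: a periodic Cartesian topology built with MPI_Cart_create, neighbors found with MPI_Cart_shift, and one ghost cell filled on each side with MPI_Sendrecv. The local array size N and the cell values are illustrative assumptions, and MPE_Decomp1d is omitted for brevity.

#include <stdio.h>
#include <mpi.h>

#define N 8                                /* interior cells per process (assumed) */

int main(int argc, char *argv[])
{
    int nprocs, rank, left, right, i;
    int dims[1], periods[1] = {1};         /* periodic in the one dimension */
    double u[N + 2];                       /* u[0] and u[N+1] are ghost cells */
    MPI_Comm cart;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    dims[0] = nprocs;
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 0, &cart);
    MPI_Comm_rank(cart, &rank);
    MPI_Cart_shift(cart, 0, 1, &left, &right);   /* who is on the left/right? */

    for (i = 1; i <= N; i++) u[i] = rank;        /* fill interior cells */

    /* Fill the left ghost cell: send my rightmost interior cell to the right
       neighbor while receiving the left neighbor's rightmost cell. */
    MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, right, 0,
                 &u[0], 1, MPI_DOUBLE, left,  0, cart, &status);
    /* Fill the right ghost cell in the opposite direction. */
    MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                 &u[N + 1], 1, MPI_DOUBLE, right, 1, cart, &status);

    printf("Rank %d: ghost cells = (%g, %g)\n", rank, u[0], u[N + 1]);

    MPI_Finalize();
    return 0;
}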
Slide 17: Adaptive Grid Issues
- The discretization is not uniform.
- Simple left-right guard-cell fills are inadequate.
- Adjacent grid points may not be mapped to nearest neighbors in the processor topology.
- Redistribution of work becomes necessary.
Slide 18: Regridding
- The number of cells/blocks changes.
- Some processors get more work than others: load imbalance.
- Data are redistributed to even out the work across all processors.
- This involves long-range communications and moves large quantities of data.
Slide 19: Regridding (figure)
Slide 20: Other Parallel Operations in FLASH
- Global max/sum, etc. (Allreduce): on physical quantities, in solvers, and for performance monitoring
- Alltoall: the FFT-based solver on the uniform grid (UG)
- User-defined datatypes and file operations: parallel I/O (see the sketch below)
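A hedged sketch of a parallel file operation with MPI-IO: every process writes its own block of data to a shared file at a rank-dependent offset using the collective MPI_File_write_at_all. The filename, block size, and data values are illustrative assumptions and do not reflect FLASH's actual I/O layer.

#include <mpi.h>

#define N 4                                     /* values per process (assumed) */

int main(int argc, char *argv[])
{
    int rank, i;
    double buf[N];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < N; i++) buf[i] = rank + 0.1 * i;   /* local data block */

    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: each rank writes N doubles at its own offset. */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * N * sizeof(double),
                          buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}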