Slide 1: Performance Comparison of Pure MPI vs Hybrid MPI-OpenMP Parallelization Models on SMP Clusters
Nikolaos Drosinos and Nectarios Koziris
National Technical University of Athens, Computing Systems Laboratory
{ndros,nkoziris}@cslab.ece.ntua.gr
www.cslab.ece.ntua.gr
IPDPS 2004, April 27, 2004
Slide 2: Overview
- Introduction
- Pure Message-passing Model
- Hybrid Models
  - Hyperplane Scheduling
  - Fine-grain Model
  - Coarse-grain Model
- Experimental Results
- Conclusions – Future Work
Slide 3: Motivation
- Active research interest in:
  - SMP clusters
  - hybrid programming models
- However:
  - mostly fine-grain hybrid paradigms (masteronly model)
  - mostly DOALL multi-threaded parallelization
Slide 4: Contribution
- Comparison of 3 programming models for the parallelization of tiled loop algorithms:
  - pure message-passing
  - fine-grain hybrid
  - coarse-grain hybrid
- Advanced hyperplane scheduling that:
  - minimizes synchronization need
  - overlaps computation with communication
  - preserves data dependencies
Slide 5: Algorithmic Model
Tiled nested loops with constant flow data dependencies:

  FORACROSS tile_0 DO
    ...
    FORACROSS tile_{n-2} DO
      FOR tile_{n-1} DO
        Receive(tile);
        Compute(tile);
        Send(tile);
      END FOR
    END FORACROSS
    ...
  END FORACROSS
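To make the pattern concrete, here is a minimal C sketch of the same structure for a three-dimensional iteration space (n = 3); the tile counts T0/T1/T2 and the empty Receive/Compute/Send stubs are illustrative assumptions, not taken from the slides.

  #include <stdio.h>

  #define T0 4   /* tiles along dimension 0 (assumed size)        */
  #define T1 4   /* tiles along dimension 1 (assumed size)        */
  #define T2 16  /* tiles along the innermost dimension (assumed) */

  /* Hypothetical stubs for the per-tile phases. */
  static void Receive(int t0, int t1, int t2) { /* fetch boundary data from upstream tiles */ }
  static void Compute(int t0, int t1, int t2) { /* update every point inside the tile      */ }
  static void Send(int t0, int t1, int t2)    { /* forward boundary data downstream        */ }

  int main(void)
  {
      /* The two outer loops correspond to FORACROSS (distributed over
         processes/threads); the innermost loop is the sequential FOR. */
      for (int t0 = 0; t0 < T0; t0++)
          for (int t1 = 0; t1 < T1; t1++)
              for (int t2 = 0; t2 < T2; t2++) {
                  Receive(t0, t1, t2);
                  Compute(t0, t1, t2);
                  Send(t0, t1, t2);
              }
      printf("processed %d tiles\n", T0 * T1 * T2);
      return 0;
  }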
Slide 6: Target Architecture
SMP clusters (figure)
Slide 8: Pure Message-passing Model

  tile_0 = pr_0;
  ...
  tile_{n-2} = pr_{n-2};
  FOR tile_{n-1} = 0 TO … DO
    Pack(snd_buf, tile_{n-1} - 1, pr);
    MPI_Isend(snd_buf, dest(pr));
    MPI_Irecv(recv_buf, src(pr));
    Compute(tile);
    MPI_Waitall;
    Unpack(recv_buf, tile_{n-1} + 1, pr);
  END FOR
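A compilable sketch of this send-ahead/receive-behind pattern with non-blocking MPI calls is shown below, assuming a 1D ring of processes; the message size, tag, and the Pack/Compute/Unpack stubs are hypothetical stand-ins for the kernel's actual routines, and boundary ranks are not treated specially.

  #include <mpi.h>

  #define NTILES 16     /* tiles along the sequential dimension (assumed) */
  #define BUFSZ  1024   /* boundary data per tile, in doubles (assumed)   */

  /* Hypothetical stand-ins for the kernel's routines. */
  void Pack(double *buf, int tile);
  void Compute(int tile);
  void Unpack(double *buf, int tile);

  void pure_mpi_pipeline(int rank, int size)
  {
      double snd_buf[BUFSZ], recv_buf[BUFSZ];
      int dest = (rank + 1) % size;          /* downstream neighbour (1D ring assumed) */
      int src  = (rank - 1 + size) % size;   /* upstream neighbour                     */

      for (int t = 0; t < NTILES; t++) {
          MPI_Request req[2];
          Pack(snd_buf, t - 1);              /* boundary of the previously computed tile */
          MPI_Isend(snd_buf, BUFSZ, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &req[0]);
          MPI_Irecv(recv_buf, BUFSZ, MPI_DOUBLE, src,  0, MPI_COMM_WORLD, &req[1]);
          Compute(t);                        /* overlaps with the pending communication  */
          MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
          Unpack(recv_buf, t + 1);           /* data needed by the next tile             */
      }
  }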
Slide 9: Pure Message-passing Model (figure)
Slide 11: Hyperplane Scheduling
- Implements coarse-grain parallelism assuming inter-tile data dependencies
- Tiles are organized into data-independent subsets (groups)
- Tiles of the same group can be executed concurrently by multiple threads
- Barrier synchronization between threads
Slide 12: Hyperplane Scheduling
(figure: mapping of a tile, identified by (mpi_rank, omp_tid, tile), to its group)
Slide 13: Hyperplane Scheduling

  #pragma omp parallel
  {
    group_0 = pr_0;
    ...
    group_{n-2} = pr_{n-2};
    tile_0 = pr_0 * m_0 + th_0;
    ...
    tile_{n-2} = pr_{n-2} * m_{n-2} + th_{n-2};
    FOR(group_{n-1}) {
      tile_{n-1} = group_{n-1} - …;
      if (0 <= tile_{n-1} <= …)
        compute(tile);
      #pragma omp barrier
    }
  }
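For intuition (this is an assumption on my part; the slides leave the exact group formula to the elided expressions above): in a hyperplane/wavefront schedule the group index of a tile is typically the sum of its coordinates, so tiles in the same group carry no mutual dependencies. A small sketch:

  /* Assumed sketch: a tile's group (hyperplane) index is the sum of its
     coordinates, so all tiles of one group are mutually independent. */
  int group_of(const int tile[], int n)
  {
      int g = 0;
      for (int i = 0; i < n; i++)
          g += tile[i];            /* wavefront index = coordinate sum */
      return g;
  }

Under this assumption, the pseudocode's recovery of tile_{n-1} from group_{n-1} is just the subtraction of the fixed outer coordinates, and the bounds check skips groups that fall outside the locally owned tile range.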
Slide 15: Fine-grain Model
- Incremental parallelization of computationally intensive parts
- Pure MPI + hyperplane scheduling
- Inter-node communication outside of the multi-threaded part (MPI_THREAD_MASTERONLY)
- Thread synchronization through the implicit barrier of the omp parallel directive
Slide 16: Fine-grain Model

  FOR(group_{n-1}) {
    Pack(snd_buf, tile_{n-1} - 1, pr);
    MPI_Isend(snd_buf, dest(pr));
    MPI_Irecv(recv_buf, src(pr));
    #pragma omp parallel
    {
      thread_id = omp_get_thread_num();
      if (valid(tile, thread_id, group_{n-1}))
        Compute(tile);
    }
    MPI_Waitall;
    Unpack(recv_buf, tile_{n-1} + 1, pr);
  }
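Both hybrid variants presuppose an MPI library initialized with thread support. A minimal initialization sketch using the MPI-2 call MPI_Init_thread is given below; whether the MPICH 1.2.5 build used later in the experiments exposes this call is not stated on the slides.

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int provided;
      /* The masteronly (fine-grain) and funneled (coarse-grain) styles both
         rely on the master thread being allowed to call MPI, i.e. a thread
         level of at least MPI_THREAD_FUNNELED. */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
      if (provided < MPI_THREAD_FUNNELED) {
          fprintf(stderr, "MPI library lacks funneled thread support\n");
          MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
      }
      /* ... hybrid MPI-OpenMP kernel would run here ... */
      MPI_Finalize();
      return 0;
  }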
Slide 18: Coarse-grain Model
- Threads are initialized only once
- SPMD paradigm (requires more programming effort)
- Inter-node communication inside the multi-threaded part (requires MPI_THREAD_FUNNELED)
- Thread synchronization through an explicit barrier (omp barrier directive)
Slide 19: Coarse-grain Model

  #pragma omp parallel
  {
    thread_id = omp_get_thread_num();
    FOR(group_{n-1}) {
      #pragma omp master
      {
        Pack(snd_buf, tile_{n-1} - 1, pr);
        MPI_Isend(snd_buf, dest(pr));
        MPI_Irecv(recv_buf, src(pr));
      }
      if (valid(tile, thread_id, group_{n-1}))
        Compute(tile);
      #pragma omp master
      {
        MPI_Waitall;
        Unpack(recv_buf, tile_{n-1} + 1, pr);
      }
      #pragma omp barrier
    }
  }
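One point worth spelling out (my gloss, consistent with the slide text): omp master has no implied barrier, so the explicit omp barrier at the end of each group is what keeps worker threads from racing ahead of the master's MPI_Waitall and Unpack. A stripped-down sketch of that pattern, with hypothetical do_communication/do_computation stubs:

  #include <omp.h>

  /* Hypothetical stubs: MPI traffic done by the master thread only, and
     per-thread tile computation. */
  void do_communication(int step);
  void do_computation(int step, int tid);

  void coarse_grain_skeleton(int nsteps)
  {
      #pragma omp parallel              /* threads created once, SPMD style  */
      {
          int tid = omp_get_thread_num();
          for (int step = 0; step < nsteps; step++) {
              #pragma omp master        /* no implied barrier: workers go on */
              do_communication(step);
              do_computation(step, tid);
              #pragma omp barrier       /* explicit per-step synchronization */
          }
      }
  }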
Slide 21: Experimental Results
- 8-node SMP Linux cluster (800 MHz Pentium III, 128 MB RAM, kernel 2.4.20)
- MPICH v1.2.5 (--with-device=ch_p4, --with-comm=shared)
- Intel C++ compiler 7.0 (-O3 -mcpu=pentiumpro -static)
- Fast Ethernet interconnection
- ADI micro-kernel benchmark (3D)
Slide 22: Alternating Direction Implicit (ADI)
- Stencil computation used for solving partial differential equations
- Unitary data dependencies
- 3D iteration space (X x Y x Z)
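To illustrate what unitary data dependencies mean here (the actual ADI update formula is not reproduced on the slides, so the kernel below is only a schematic stand-in): every point depends on its distance-1 predecessor along each dimension, which is what forces the wavefront/pipelined execution described earlier.

  #define X 128
  #define Y 512
  #define Z 8192   /* one of the problem sizes reported on the slides */

  /* Schematic sweep with unitary (distance-1) flow dependencies along all
     three dimensions; the real ADI coefficients and sweeps are omitted. */
  void adi_like_sweep(double (*a)[Y][Z])
  {
      for (int i = 1; i < X; i++)
          for (int j = 1; j < Y; j++)
              for (int k = 1; k < Z; k++)
                  a[i][j][k] = (a[i-1][j][k] + a[i][j-1][k] + a[i][j][k-1]) / 3.0;
  }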
Slide 23: ADI – 2 dual SMP nodes (figure)
Slide 24: ADI X=128 Y=512 Z=8192 – 2 nodes (figure)
Slide 25: ADI X=256 Y=512 Z=8192 – 2 nodes (figure)
Slide 26: ADI X=512 Y=512 Z=8192 – 2 nodes (figure)
Slide 27: ADI X=512 Y=256 Z=8192 – 2 nodes (figure)
Slide 28: ADI X=512 Y=128 Z=8192 – 2 nodes (figure)
Slide 29: ADI X=128 Y=512 Z=8192 – 2 nodes, computation vs communication breakdown (figure)
Slide 30: ADI X=512 Y=128 Z=8192 – 2 nodes, computation vs communication breakdown (figure)
Slide 32: Conclusions
- Tiled loop algorithms with arbitrary data dependencies can be adapted to the hybrid parallel programming paradigm
- Hybrid models can be competitive with the pure message-passing paradigm
- The coarse-grain hybrid model can be more efficient than the fine-grain one, but is also more complicated
- Programming efficiently in OpenMP is not easier than programming efficiently in MPI
Slide 33: Future Work
- Application of the methodology to real applications and standard benchmarks
- Work balancing for the coarse-grain model
- Investigation of alternative topologies and irregular communication patterns
- Performance evaluation on advanced interconnection networks (SCI, Myrinet)
Slide 34: Thank You! Questions?