Load Balancing Hybrid Programming Models for SMP Clusters and Fully Permutable Loops
Nikolaos Drosinos and Nectarios Koziris
National Technical University of Athens, Computing Systems Laboratory
{ndros,nkoziris}@cslab.ece.ntua.gr
www.cslab.ece.ntua.gr

Oslo, June 15, 2005, ICPP-HPSEC 2005

Motivation
- Fully permutable loops have always been a computational challenge for HPC.
- Hybrid parallelization is attractive for DSM architectures.
- Currently, popular free message-passing libraries provide only limited multi-threading support.
- SPMD hybrid parallelization suffers from intrinsic load imbalance.

Contribution
- Two static thread load-balancing schemes (constant and variable) for coarse-grain funneled hybrid parallelization of fully permutable loops:
  - generic
  - simple to implement
- Experimental evaluation against micro-kernel benchmarks of different programming models:
  - message passing
  - fine-grain hybrid
  - coarse-grain hybrid (unbalanced and balanced)

Algorithmic model
    FORACROSS tile_1 DO
      …
        FORACROSS tile_{N-1} DO
          FOR tile_N DO
            Receive(tile);
            Compute(A, tile);
            Send(tile);
Restrictions:
- fully permutable loops
- unitary inter-process dependencies
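
The loop nest above can be exercised with a small in-process simulation. This is a minimal sketch under stated assumptions, not the authors' code. It uses a 2D iteration space with unitary dependencies A[i][j] = A[i-1][j] + A[i][j-1]. The i tile dimension is distributed over processes (the FORACROSS level), and the j tile dimension is swept sequentially (the FOR level). Processes are simulated inside one Python program. `sequential`, `run_tiled`, the tile sizes and the `inbox` queues are hypothetical stand-ins for real MPI processes and messages.

```python
from collections import deque


def sequential(n, m):
    """Reference result: untiled A[i][j] = A[i-1][j] + A[i][j-1]."""
    A = [[1] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 are boundary values
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = A[i - 1][j] + A[i][j - 1]
    return A


def run_tiled(n, m, tx, ty):
    """Simulate P = n // tx processes, each owning tx rows and sweeping its
    T = m // ty tiles along j with Receive(tile); Compute(A, tile); Send(tile)."""
    assert n % tx == 0 and m % ty == 0, "tile sizes must divide the domain"
    P, T = n // tx, m // ty
    # local[p][0] is the halo row received from process p - 1
    local = [[[1] * (m + 1) for _ in range(tx + 1)] for _ in range(P)]
    inbox = [deque() for _ in range(P)]  # FIFO message channel into each process
    for step in range(P + T - 1):  # pipelined (wavefront) schedule
        for p in range(P):
            t = step - p  # tile handled by process p at this step
            if not 0 <= t < T:
                continue  # pipeline fill or drain: p is idle
            lo, hi = t * ty + 1, (t + 1) * ty
            L = local[p]
            if p > 0:  # Receive(tile): boundary row segment from p - 1
                L[0][lo:hi + 1] = inbox[p].popleft()
            for i in range(1, tx + 1):  # Compute(A, tile)
                for j in range(lo, hi + 1):
                    L[i][j] = L[i - 1][j] + L[i][j - 1]
            if p < P - 1:  # Send(tile): last row segment to p + 1
                inbox[p + 1].append(L[tx][lo:hi + 1])
    # Gather the distributed rows into one global array for comparison.
    A = [[1] * (m + 1)]
    for p in range(P):
        A.extend(row[:] for row in local[p][1:])
    return A


print(run_tiled(8, 12, 2, 3) == sequential(8, 12))  # True
```

Process p can start its first tile only after p - 1 has sent one. As a result, the outer `step` loop completes all P·T tiles in P + T - 1 steps, which is the pipelined execution listed under message passing parallelization.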

Message passing parallelization
- tiling transformation
- computation and communication phases (overlapped?)
- pipelined execution
- portable
- scalable
- highly optimized
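
The "(overlapped?)" question can be made concrete with a back-of-the-envelope model of pipelined execution. The model assumes a 1D pipeline of P processes, T tiles per process and identical per-step costs. The function name and numbers are illustrative, not measurements from the paper.

```python
def pipeline_time(P, T, t_comp, t_comm, overlapped):
    """Completion time of a 1D pipeline of P processes, each handling T tiles.

    Assumed cost model: every pipeline step costs the same. Without overlap a
    step is computation followed by communication; with perfect overlap it
    costs only the longer of the two phases.
    """
    step = max(t_comp, t_comm) if overlapped else t_comp + t_comm
    # The last process starts P - 1 steps late and then needs T steps.
    return (P + T - 1) * step


print(pipeline_time(8, 100, 2.0, 1.0, overlapped=False))  # 321.0
print(pipeline_time(8, 100, 2.0, 1.0, overlapped=True))   # 214.0
```

Under this model, overlapping hides the shorter of the two phases at every step. The gain is largest when computation and communication take similar time.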

Hybrid parallelization
So… why bother?

Hybrid parallelization: why bother? (I)
- shared-memory programming model vs. message-passing programming model on a shared-memory architecture

Hybrid parallelization: why bother? (II)
- DSM architectures are popular!

Fine-grain hybrid parallelization
Advantages:
- incremental parallelization of loops
- relatively easy to implement
- popular
Drawbacks:
- Amdahl's law restricts parallel efficiency
- overhead of re-initializing thread structures
- restrictive programming model for many applications
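
Amdahl's law explains the first drawback. In fine-grain hybrid code, the MPI calls typically sit outside the parallelized loops and are executed by a single thread, so communication acts like a serial fraction. A minimal sketch follows; the 20% serial fraction is an assumed figure, not a measurement.

```python
def amdahl_speedup(serial_fraction, threads):
    """Amdahl's law: S(n) = 1 / (f + (1 - f) / n), with f the fraction of the
    run time that stays serial and n the number of threads."""
    if not 0.0 <= serial_fraction <= 1.0 or threads < 1:
        raise ValueError("need 0 <= f <= 1 and n >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


# Illustrative: if 20% of the time is MPI communication executed outside the
# parallel loops, two threads per node give at most about 1.67x, and no
# thread count can exceed 5x.
print(round(amdahl_speedup(0.2, 2), 2))  # 1.67
```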

Coarse-grain hybrid parallelization
Advantages:
- generic
- SPMD programming style
- good parallelization efficiency
- no thread re-initialization overhead
Drawbacks:
- more difficult to implement
- intrinsic load imbalance, assuming the common funneled thread support level
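
Inside the single parallel region of a coarse-grain hybrid program, each thread typically works on its own slice of the process tile. Below is a minimal sketch of a static, equal-size split along one tile dimension; the split and the name `block_partition` are assumptions for illustration. Under funneled support, the master thread receives the same slice as every other thread plus all of the communication. That is where the intrinsic load imbalance comes from.

```python
def block_partition(n_rows, threads):
    """Static block distribution: split rows 0..n_rows-1 into `threads`
    contiguous half-open ranges whose sizes differ by at most one."""
    base, extra = divmod(n_rows, threads)
    ranges, start = [], 0
    for t in range(threads):
        size = base + (1 if t < extra else 0)  # first `extra` threads get one more row
        ranges.append((start, start + size))
        start += size
    return ranges


print(block_partition(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```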

MPI thread support levels
- single
- masteronly
- funneled
- serialized
- multiple
Figure: fine-grain hybrid vs. coarse-grain hybrid execution, with alternating communication (comm) and computation (comp) phases
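
The four levels that the MPI standard itself defines can be modeled as an ordered enumeration. Masteronly, also listed on the slide, has no constant of its own in the standard; it describes calling MPI only outside parallel regions. In C, a program requests a level with `MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided)` and must then check the level actually granted in `provided`. The Python names below (`ThreadLevel`, `supports_funneled_hybrid`) are hypothetical.

```python
from enum import IntEnum


class ThreadLevel(IntEnum):
    """The four thread support levels defined by the MPI standard. The standard
    only guarantees the ordering SINGLE < FUNNELED < SERIALIZED < MULTIPLE;
    the integer values used here are illustrative."""
    SINGLE = 0      # only one thread executes
    FUNNELED = 1    # many threads, but only the main thread makes MPI calls
    SERIALIZED = 2  # any thread may call MPI, but only one at a time
    MULTIPLE = 3    # any thread may call MPI with no restriction


def supports_funneled_hybrid(provided):
    """Hypothetical check: coarse-grain funneled code lets only the master
    thread call MPI while other threads compute, so it needs FUNNELED or higher."""
    return provided >= ThreadLevel.FUNNELED


print(supports_funneled_hybrid(ThreadLevel.SERIALIZED))  # True
print(supports_funneled_hybrid(ThreadLevel.SINGLE))      # False
```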

Load balancing
- Idea: with funneled thread support only the master thread communicates, so it should compute less.
- Consequence: the master thread assumes a smaller fraction of the process tile computational load than the other threads.
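
A simple per-tile cost model shows why the master needs a smaller share. The model is an assumption made for illustration, not the one from the paper: costs are linear, the master alone communicates, and the other threads split the remaining computation evenly. `thread_times` and `tile_time` are hypothetical names.

```python
def thread_times(master_share, threads, t_comp, t_comm):
    """Per-tile busy time of the master and of each worker thread.

    Assumed cost model (not taken from the slides): the master thread performs
    all of the tile's communication (t_comm) plus `master_share` of its
    computation (t_comp); the other threads split the remaining work evenly.
    """
    if threads < 2:
        raise ValueError("need at least one worker thread besides the master")
    worker_share = (1.0 - master_share) / (threads - 1)
    return t_comm + master_share * t_comp, worker_share * t_comp


def tile_time(master_share, threads, t_comp, t_comm):
    """A process tile is finished only when its slowest thread is done."""
    return max(thread_times(master_share, threads, t_comp, t_comm))


# Even split on a dual SMP node (illustrative numbers): the master is the bottleneck.
print(thread_times(0.5, 2, 10.0, 2.0))  # prints (7.0, 5.0)
```

Moving work away from the master brings both threads to 6.0 time units, up to floating-point rounding. For example, `master_share = 0.4` does this here. That is the intuition behind balancing.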

Load balancing (2)
- T: total number of threads
- p: current process id
Assuming: (equation not recoverable from the slide)
It follows: (equation not recoverable from the slide)
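
The slide's own equations did not survive extraction, so the following is an illustration only. It solves a simple linear cost model for the master's share. Here T is taken to be the number of threads per process (an assumption). M is the per-tile communication time handled by the master alone, C the per-tile computation time, and m the master's fraction of C. The other T - 1 threads share the rest equally, and balance means the master and each worker finish together.

```latex
M + mC = \frac{(1 - m)\,C}{T - 1}
\;\Longrightarrow\; (T - 1)M + (T - 1)mC = C - mC
\;\Longrightarrow\; TmC = C - (T - 1)M
\;\Longrightarrow\; m = \frac{1}{T} - \frac{T - 1}{T}\cdot\frac{M}{C}
```

With M = 0 this reduces to the even split m = 1/T. When (T - 1)M ≥ C the model gives m ≤ 0, meaning the master should only communicate.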

Load balancing (3)
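
The slides do not spell out how the constant and variable schemes differ. The load-balancing formulas involve the process id p, so one plausible reading is that the variable scheme recomputes the master's share from each process's own communication volume. This is an assumption and may not match the paper. The sketch uses the balanced share m = 1/T - (T - 1)/T · t_comm/t_comp of a simple linear cost model in which the master alone communicates. It assumes a 1D pipeline in which the first process never receives and the last never sends. All names are hypothetical.

```python
def master_share(threads, t_comp, t_comm):
    """Balanced master fraction m = 1/T - (T - 1)/T * t_comm / t_comp, clipped
    at 0, under the assumed linear cost model (master alone communicates)."""
    m = 1.0 / threads - (threads - 1) / threads * (t_comm / t_comp)
    return max(0.0, m)


def constant_shares(P, threads, t_comp, t_msg):
    """Hypothetical constant scheme: every process plans for the interior case
    (one receive and one send per tile)."""
    return [master_share(threads, t_comp, 2 * t_msg)] * P


def variable_shares(P, threads, t_comp, t_msg):
    """Hypothetical variable scheme: process p balances against its own traffic
    in a 1D pipeline (no receive for p == 0, no send for p == P - 1)."""
    shares = []
    for p in range(P):
        n_msgs = int(p > 0) + int(p < P - 1)
        shares.append(master_share(threads, t_comp, n_msgs * t_msg))
    return shares


print(constant_shares(4, 2, 10.0, 1.0))
print(variable_shares(4, 2, 10.0, 1.0))
```

In this reading, the boundary processes let their master compute more, because they exchange fewer messages per tile than interior processes.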

Experimental Results
- 8-node dual-SMP Linux cluster (800 MHz Pentium III, 256 MB RAM, kernel 2.4.26)
- MPICH v1.2.6 (--with-device=ch_p4, --with-comm=shared, P4_SOCKBUFSIZE=104KB)
- Intel C++ compiler 8.1 (-O3 -static -mcpu=pentiumpro)
- Fast Ethernet interconnection network

Alternating Direction Implicit (ADI)
- stencil computation used for solving partial differential equations
- unitary data dependencies
- 3D iteration space (X × Y × Z)
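
Unitary, non-negative dependence distances are what make such a nest fully permutable. Every loop order respects the dependencies, and therefore so does rectangular tiling. The sketch below checks this for a generic 3D recurrence with the same dependence pattern. It is not the ADI kernel used in the experiments, and `recurrence_3d` is a hypothetical name.

```python
import itertools


def recurrence_3d(X, Y, Z, order):
    """Evaluate A[i][j][k] = A[i-1][j][k] + A[i][j-1][k] + A[i][j][k-1] with the
    three loops nested in `order` (outermost first), e.g. "ijk" or "kji"."""
    A = [[[1] * (Z + 1) for _ in range(Y + 1)] for _ in range(X + 1)]  # boundary = 1
    bounds = {"i": range(1, X + 1), "j": range(1, Y + 1), "k": range(1, Z + 1)}
    # itertools.product varies its last argument fastest, like nested for loops.
    for idx in itertools.product(*(bounds[c] for c in order)):
        pos = dict(zip(order, idx))
        i, j, k = pos["i"], pos["j"], pos["k"]
        A[i][j][k] = A[i - 1][j][k] + A[i][j - 1][k] + A[i][j][k - 1]
    return A


reference = recurrence_3d(3, 4, 5, "ijk")
# Every one of the 3! loop orders is legal and yields the same array.
print(all(recurrence_3d(3, 4, 5, "".join(perm)) == reference
          for perm in itertools.permutations("ijk")))  # True
```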

ADI

Synthetic benchmark

Conclusions
- Fine-grain hybrid parallelization is inefficient.
- Unbalanced coarse-grain hybrid parallelization is also inefficient.
- Balancing improves the performance of the hybrid model.
- The variable balanced coarse-grain hybrid model is the most efficient approach overall.
- The relative performance improvement grows as the communication-to-computation ratio increases.

Thank you! Questions?

ADI

Synthetic benchmark