Published by Noreen Heath. Modified over 9 years ago.
1 Application-specific Topology-aware Mapping for Three Dimensional Topologies
Abhinav Bhatelé, Laxmikant V. Kalé
2 Outline
Motivation
The Mapping Problem
Static Mapping: 3D Stencil
Load Balancing: NAMD
Future Work
3 The network latency for wormhole routing is $(L_f / B) \times D + L / B$
$L_f$ = length of each flit, $B$ = bandwidth
$D$ = number of hops, $L$ = length of message
Lionel M. Ni and Philip K. McKinley, “A Survey of Wormhole Routing Techniques in Direct Networks”, Computer, Volume 26, Issue 2, pages 62-76, 1993
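As a worked illustration of the formula above, the sketch below evaluates it for a nearby and a distant destination. The flit size, message size, and bandwidth are hypothetical values chosen only for the arithmetic; they are not measurements from the talk.

```python
def wormhole_latency(flit_bytes, msg_bytes, hops, bandwidth):
    """Contention-free wormhole latency: (L_f / B) * D + L / B.

    bandwidth is in bytes per second, so the result is in seconds.
    """
    return (flit_bytes / bandwidth) * hops + msg_bytes / bandwidth


# Hypothetical parameters, for illustration only.
FLIT = 32          # L_f, bytes
MESSAGE = 1024     # L, bytes
BANDWIDTH = 1e8    # B, bytes per second

near = wormhole_latency(FLIT, MESSAGE, hops=1, bandwidth=BANDWIDTH)
far = wormhole_latency(FLIT, MESSAGE, hops=10, bandwidth=BANDWIDTH)
print(f"1 hop: {near * 1e6:.2f} us, 10 hops: {far * 1e6:.2f} us")
```

Only the first term depends on distance, so in this model the hop count matters little whenever $L_f \ll L$. The formula assumes no contention for links.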
4 Message Latencies
NN = Near Neighbor, RND = Random
5 Hardware Latencies
Blue Gene/L
– Near neighbor: < 1 µs
– Worst case: 7 µs
Blue Gene/P
– Near neighbor: < 1 µs
– Worst case: 5 µs
Corresponding differences for MPI messages
6 Topology-aware mapping
Problem: Given an object communication graph and a processor graph, find an optimal mapping that
– Minimizes communication
– Ensures load balance
Metric for communication traffic
– Hop-bytes = number of links (hops) traversed × message size
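The hop-bytes metric can be sketched in a few lines. The code below assumes a 3D torus with wraparound links and shortest-path routing; the function names and the message format are hypothetical, introduced only for this example.

```python
def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus.

    Each dimension has wraparound links, so the distance along a
    dimension of size d is min(|x - y|, d - |x - y|).
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def hop_bytes(messages, placement, dims):
    """Sum of hops * message size over all messages.

    messages:  iterable of (src_object, dst_object, size_in_bytes)
    placement: dict mapping each object to processor coordinates
    """
    return sum(
        torus_hops(placement[src], placement[dst], dims) * size
        for src, dst, size in messages
    )


dims = (4, 4, 4)
placement = {"a": (0, 0, 0), "b": (2, 3, 1)}
total = hop_bytes([("a", "b", 100)], placement, dims)
```

In this example the two objects are 2 + 1 + 1 = 4 hops apart (the y distance wraps around), so a 100-byte message contributes 400 hop-bytes.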
7 Machine Topology
Information required at runtime
– No. of processors in the allocated partition
– No. of processors along each dimension
– Physical coordinates of each processor
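To make the three pieces of information concrete, the sketch below converts between a processor rank and its coordinates. It assumes ranks are laid out with x varying fastest, which a real machine need not follow; on an actual system the coordinates come from the runtime or vendor interface. Both helpers are hypothetical.

```python
def rank_to_coords(rank, dims):
    """Coordinates of a rank, assuming x varies fastest, then y, then z."""
    x_dim, y_dim, z_dim = dims
    if not 0 <= rank < x_dim * y_dim * z_dim:
        raise ValueError("rank outside the allocated partition")
    x = rank % x_dim
    y = (rank // x_dim) % y_dim
    z = rank // (x_dim * y_dim)
    return (x, y, z)


def coords_to_rank(coords, dims):
    """Inverse of rank_to_coords under the same layout assumption."""
    x, y, z = coords
    return x + dims[0] * (y + dims[1] * z)


dims = (4, 2, 2)                          # processors along each dimension
num_procs = dims[0] * dims[1] * dims[2]   # processors in the partition
coords = [rank_to_coords(r, dims) for r in range(num_procs)]
```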
9 Communication Graph
Static
– 3D Stencil: regular communication graph
Dynamic
– Molecular dynamics application
– Changes as atoms migrate from one processor to another
10 Static Graph - 3D Stencil
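A minimal sketch of why mapping matters for a regular stencil: it compares the average hop count of a topology-aware placement with a random one. It assumes a periodic 7-point stencil with one element per processor on a 3D torus of the same shape; these choices and all names are illustrative, not the configuration used in the talk.

```python
import itertools
import random


def stencil_edges(dims):
    """Yield (src, dst) pairs of a periodic 3D stencil with 6 neighbors."""
    for cell in itertools.product(*(range(d) for d in dims)):
        for axis in range(3):
            for step in (-1, 1):
                nbr = list(cell)
                nbr[axis] = (nbr[axis] + step) % dims[axis]
                yield cell, tuple(nbr)


def torus_hops(a, b, dims):
    """Shortest hop count between two coordinates on a torus."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def average_hops(placement, dims):
    """Mean hops per stencil message under the given placement."""
    edges = list(stencil_edges(dims))
    total = sum(torus_hops(placement[s], placement[t], dims) for s, t in edges)
    return total / len(edges)


dims = (4, 4, 4)
cells = list(itertools.product(*(range(d) for d in dims)))

# Topology-aware: element (i, j, k) runs on processor (i, j, k).
identity = {c: c for c in cells}

# Random: elements are scattered over the processors.
shuffled = cells[:]
random.Random(0).shuffle(shuffled)
random_map = dict(zip(cells, shuffled))

print(average_hops(identity, dims), average_hops(random_map, dims))
```

With the identity placement every message travels exactly one hop; the random placement sends most messages several hops away.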
11 Performance
12 Hop counts
13 Dynamic Graph - NAMD
Molecular Dynamics (MD) application
Simulation box is a 3D cell full of atoms
15 Load Balancing in NAMD
Measurement-based (Charm++)
– Principle of persistence
Patches are statically mapped
– Orthogonal recursive bisection
Computes can be migrated
Load balancing framework gathers the communication information
Goal
– Minimize communication
– Maximize load balance
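Orthogonal recursive bisection can be sketched generically as follows. This is not NAMD's implementation: the patch representation, the load-proportional cut, and the function name `orb` are assumptions made for the example.

```python
def orb(patches, procs):
    """Assign patches to processors by orthogonal recursive bisection.

    patches: list of ((x, y, z), load) tuples
    procs:   list of processor ids
    Returns a dict mapping patch coordinates to a processor id.
    """
    if not patches:
        return {}
    if len(procs) == 1:
        return {coords: procs[0] for coords, _ in patches}

    # Cut orthogonally to the axis along which the patches extend furthest.
    def extent(k):
        values = [coords[k] for coords, _ in patches]
        return max(values) - min(values)

    axis = max(range(3), key=extent)
    ordered = sorted(patches, key=lambda p: p[0][axis])

    # Split the processors in half and the load in the same proportion.
    left_procs = procs[: len(procs) // 2]
    right_procs = procs[len(procs) // 2:]
    target = sum(load for _, load in ordered) * len(left_procs) / len(procs)

    running, cut = 0.0, 0
    while cut < len(ordered) and running + ordered[cut][1] <= target:
        running += ordered[cut][1]
        cut += 1

    mapping = orb(ordered[:cut], left_procs)
    mapping.update(orb(ordered[cut:], right_procs))
    return mapping


# 4 x 4 x 4 patches of unit load on 8 processors.
patches = [((x, y, z), 1.0) for x in range(4) for y in range(4) for z in range(4)]
mapping = orb(patches, list(range(8)))
```

Each processor ends up with a compact 2 x 2 x 2 block of patches, so neighboring patches tend to sit on the same or nearby processors.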
17 Old strategy
Greedy approach
Pick the heaviest compute
Place it on a processor that holds one of its patches
OR
On a processor that already has a compute for one of these patches
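The greedy rule above can be sketched as follows. The slide does not say how to choose among several candidate processors; picking the least loaded one is an assumption of this example, as are the data layout and the name `greedy_place`.

```python
def greedy_place(computes, patch_home, num_procs):
    """Greedy placement of computes, heaviest first.

    computes:   list of (name, load, patches) tuples
    patch_home: dict mapping each patch to the processor that holds it
    Returns (placement, loads): compute -> processor, and per-processor load.
    """
    loads = [0.0] * num_procs
    placement = {}
    compute_procs = {}  # patch -> processors already running a compute for it

    for name, load, patches in sorted(computes, key=lambda c: c[1], reverse=True):
        candidates = set()
        for patch in patches:
            candidates.add(patch_home[patch])               # holds the patch
            candidates |= compute_procs.get(patch, set())   # has a compute for it

        # Assumption: among candidates, take the least loaded (ties by id).
        target = min(candidates, key=lambda q: (loads[q], q))
        placement[name] = target
        loads[target] += load
        for patch in patches:
            compute_procs.setdefault(patch, set()).add(target)

    return placement, loads


patch_home = {"A": 0, "B": 1}
computes = [
    ("c1", 5.0, ("A", "B")),
    ("c2", 3.0, ("A",)),
    ("c3", 1.0, ("B",)),
]
placement, loads = greedy_place(computes, patch_home, num_procs=2)
```

Note that the rule only looks at where patches and related computes already live; it does not consider how many hops separate the candidate processors from the patches.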
19 Hop-bytes ~17%
20 Future Work
Reason for contention
– Heavy communication exceeding bandwidth
– Link contention (such as in deterministic routing)
Use UPC/PAPI on Blue Gene/L and P
21 Future Work
Automatic Mapping
– Initial Static Mapping
– Use case: meshing applications
Extend work on the Charm++ load balancers
– Section-multicast aware load balancers
– Useful in matrix multiplication
22 Future Work
Optimization on other topologies
– SiCortex (Kautz graph)
– InfiniBand clusters (fat-tree)
23 Summary
Topology mapping helps!
– Especially for heavily communication-bound applications
Static mapping
Dynamic mapping during load balancing
Automatic mapping to relieve the user