Application-specific Topology-aware Mapping for Three Dimensional Topologies Abhinav Bhatelé Laxmikant V. Kalé.


1 Application-specific Topology-aware Mapping for Three Dimensional Topologies Abhinav Bhatelé Laxmikant V. Kalé

2 Outline
– Motivation
– The Mapping Problem
– Static Mapping: 3D Stencil
– Load Balancing: NAMD
– Future Work

3 The network latency for wormhole routing is $(L_f / B) \times D + L / B$, where $L_f$ = length of each flit, $B$ = bandwidth, $D$ = number of hops, and $L$ = length of the message.
Lionel M. Ni and Philip K. McKinley, “A Survey of Wormhole Routing Techniques in Direct Networks”, Computer, Volume 26, Issue 2, pages 62-76, 1993
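A minimal sketch of the latency formula above, included only to make the two terms concrete. The function and parameter names are illustrative and not taken from the slides; units are whatever the caller uses consistently (for example bytes and bytes per second).

```python
def wormhole_latency(flit_length, message_length, hops, bandwidth):
    """Evaluate (L_f / B) * D + L / B for wormhole routing.

    flit_length    -- L_f, length of each flit
    message_length -- L, length of the whole message
    hops           -- D, number of hops between source and destination
    bandwidth      -- B, link bandwidth
    """
    distance_term = (flit_length / bandwidth) * hops   # grows with hop count
    transfer_term = message_length / bandwidth         # independent of distance
    return distance_term + transfer_term
```

When the message is much longer than a flit, the distance-dependent term is small next to L / B, so in this no-contention model the hop count contributes little to the latency of large messages.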

4 Message Latencies (legend: NN = Near Neighbor, RND = Random)

5 Hardware Latencies
Blue Gene/L
– Near neighbor: < 1 µs
– Worst case: 7 µs
Blue Gene/P
– Near neighbor: < 1 µs
– Worst case: 5 µs
MPI message latencies show corresponding differences

6 Topology-aware mapping
Problem: Given an object communication graph and a processor graph, find an optimal mapping that
– Minimizes communication
– Ensures load balance
Metric for communication traffic
– Hop-bytes = number of links (hops) traversed × message size
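The hop-bytes metric can be sketched as follows. This assumes a 3D torus with wraparound links and shortest-path routing, and the names `torus_hops`, `hop_bytes`, `messages`, and `placement` are hypothetical helpers for illustration, not part of any library named in the slides.

```python
def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus.

    Each dimension has wraparound links, so the distance along a
    dimension of size d is min(|x - y|, d - |x - y|).
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def hop_bytes(messages, placement, dims):
    """Total hop-bytes for a mapping.

    messages  -- iterable of (src_object, dst_object, size_in_bytes)
    placement -- dict mapping each object to its processor's (x, y, z)
    dims      -- processors along each torus dimension
    """
    return sum(size * torus_hops(placement[src], placement[dst], dims)
               for src, dst, size in messages)
```

A mapping with lower hop-bytes puts less total load on the network links, which is what makes it usable as a comparison metric between candidate mappings.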

7 Machine Topology
Information required at runtime
– Number of processors in the allocated partition
– Number of processors along each dimension
– Physical coordinates of each processor
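As an illustration of the three pieces of information listed above, here is a small hypothetical container. The class name `Partition` and the rank ordering (x varies fastest, then y, then z) are assumptions made for this sketch; on a real machine the coordinates come from a system-specific interface and the ordering may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical view of an allocated 3D partition."""
    dims: tuple  # processors along each dimension, e.g. (8, 8, 16)

    @property
    def num_processors(self):
        x_dim, y_dim, z_dim = self.dims
        return x_dim * y_dim * z_dim

    def rank_to_coords(self, rank):
        # Assumed ordering: x varies fastest, then y, then z.
        x_dim, y_dim, _ = self.dims
        return (rank % x_dim, (rank // x_dim) % y_dim, rank // (x_dim * y_dim))

    def coords_to_rank(self, coords):
        # Inverse of rank_to_coords under the same assumed ordering.
        x, y, z = coords
        x_dim, y_dim, _ = self.dims
        return x + x_dim * (y + y_dim * z)
```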


9 Communication Graph
Static
– 3D Stencil: regular communication graph
Dynamic
– Molecular dynamics application
– Changes as atoms migrate from one processor to another

10 Static Graph - 3D Stencil
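To illustrate why mapping matters for a regular 3D stencil, the sketch below compares a topology-aware placement with a scattered one on a small torus. It is not the mapping code from the talk: the 4 x 4 x 4 size, the six-neighbor stencil with periodic boundaries, one cell per processor, and all function names are assumptions chosen for the example.

```python
import random
from itertools import product


def torus_hops(a, b, dims):
    """Shortest hop count between two coordinates on a torus."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def stencil_neighbors(cell, dims):
    """Six face neighbors of a cell, with periodic boundaries (assumed)."""
    for axis in range(3):
        for step in (-1, 1):
            neighbor = list(cell)
            neighbor[axis] = (neighbor[axis] + step) % dims[axis]
            yield tuple(neighbor)


def average_hops(placement, dims):
    """Mean hops per stencil message for a cell -> processor placement."""
    hops = [torus_hops(placement[cell], placement[nb], dims)
            for cell in placement
            for nb in stencil_neighbors(cell, dims)]
    return sum(hops) / len(hops)


dims = (4, 4, 4)
cells = list(product(*(range(d) for d in dims)))

# Topology-aware: cell (x, y, z) runs on the processor at (x, y, z).
identity = {cell: cell for cell in cells}

# Topology-oblivious: cells scattered over processors (fixed seed).
shuffled = cells[:]
random.Random(0).shuffle(shuffled)
scattered = dict(zip(cells, shuffled))
```

Under the identity placement every stencil message travels exactly one hop, so `average_hops(identity, dims)` is 1.0, while the scattered placement sends most messages across several links.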

11 Performance

12 Hop counts

13 Dynamic Graph - NAMD
– Molecular Dynamics (MD) application
– Simulation box is a 3D cell full of atoms


15 Load Balancing in NAMD
Measurement-based (Charm++)
– Principle of persistence
Patches are statically mapped
– Orthogonal recursive bisection
Computes can be migrated
Load balancing framework gathers the communication information
Goal
– Minimize communication
– Maximize load balance
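A generic sketch of orthogonal recursive bisection, to show the idea behind the static patch mapping. This is not NAMD's implementation: the function name `orb`, the input format, and the rule for choosing the cut (split across the longest extent, give the left half its proportional share of the load) are assumptions made for illustration.

```python
def orb(patches, procs):
    """Assign patches to processors by orthogonal recursive bisection.

    patches -- list of (coords, load) with coords an (x, y, z) tuple
    procs   -- list of processor ids, ordered so that neighbors in the
               list are close on the machine (assumed)
    Returns a dict mapping coords to a processor id.
    """
    if len(procs) == 1 or len(patches) <= 1:
        return {coords: procs[0] for coords, _ in patches}

    # Cut across the dimension with the largest spatial extent.
    ndim = len(patches[0][0])
    axis = max(range(ndim),
               key=lambda d: max(p[0][d] for p in patches)
               - min(p[0][d] for p in patches))
    ordered = sorted(patches, key=lambda p: p[0][axis])

    # The left half of the processors should receive a proportional
    # share of the total load.
    half = len(procs) // 2
    target = sum(load for _, load in ordered) * half / len(procs)

    running, cut = 0.0, 0
    while cut < len(ordered) - 1 and running + ordered[cut][1] <= target:
        running += ordered[cut][1]
        cut += 1
    cut = max(cut, 1)  # keep both sides non-empty

    mapping = orb(ordered[:cut], procs[:half])
    mapping.update(orb(ordered[cut:], procs[half:]))
    return mapping
```

Because each cut is spatial, patches that are close in the simulation box tend to land on processors that are close in the processor list.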


17 Old strategy
Greedy approach
– Pick the heaviest compute
– Place it on a processor that holds one of its patches, OR on a processor that already has a compute for this patch
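The greedy rule above can be sketched as follows. The slide does not say how to choose among the candidate processors, so picking the least-loaded candidate (ties broken by lowest processor id) is an assumption of this sketch, as are the function name `greedy_place` and the input format.

```python
def greedy_place(computes, patch_home):
    """Greedy placement of computes, heaviest first.

    computes   -- list of (name, load, patches), where patches is a tuple
                  of the patches the compute needs
    patch_home -- dict mapping each patch to the processor that owns it
    Returns (placement, load): compute -> processor, processor -> load.
    """
    load = {proc: 0.0 for proc in set(patch_home.values())}
    # Processors that already run a compute for each patch.
    users = {patch: set() for patch in patch_home}
    placement = {}

    for name, work, patches in sorted(computes, key=lambda c: c[1], reverse=True):
        # Candidates: processors holding one of the patches, or already
        # running a compute that uses one of them.
        candidates = set()
        for patch in patches:
            candidates.add(patch_home[patch])
            candidates |= users[patch]

        # Assumption: choose the least-loaded candidate.
        best = min(sorted(candidates), key=lambda proc: load[proc])
        placement[name] = best
        load[best] += work
        for patch in patches:
            users[patch].add(best)

    return placement, load
```

Note that this rule considers only where the patch data already lives; it does not use the physical distance between processors.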


19 Hop-bytes: ~17%

20 Future Work
Reason for contention
– Heavy communication exceeding bandwidth
– Link contention (such as in deterministic routing)
Use UPC/PAPI on Blue Gene/L and Blue Gene/P

21 Future Work
Automatic Mapping
– Initial Static Mapping
– Use case: meshing applications
Extend work on the Charm++ load balancers
– Section-multicast aware load balancers
– Useful in matrix multiplication

22 Future Work
Optimization on other topologies
– SiCortex (Kautz Graph)
– InfiniBand clusters (Fat-tree)

23 Summary
Topology mapping helps!
– Especially for heavily communication-bound applications
Static mapping
Dynamic mapping during load balancing
Automatic mapping to relieve the user

