Zoltan: Toolkit of parallel combinatorial algorithms for unstructured, dynamic and/or adaptive computations

Unstructured Communication Tools
- Communication “plans” describe complex interprocessor patterns
- Data migration
- Mapping between different partitions

Dynamic Load Balancing
- Geometric (coordinate-based): fast, maintains geometric locality
- Topological (graph/hypergraph-based): explicitly model application communication costs
- Interfaces to ParMETIS, Scotch, PaToH

Distributed Data Directories
- Parallel look-ups of off-processor data
- Scalable (O(N)) total memory usage
[Figure: items A-I labeled 0 1 0, 2 1 0, 1 2 1]

Parallel Graph Coloring
- Finds disjoint sets of vertices, identifying independence
- Distance 1
- Distance 2
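
The distance-1 and distance-2 coloring bullets above can be illustrated with a small serial sketch. This is not Zoltan's parallel algorithm or its API; it is a plain greedy heuristic, and the names `greedy_coloring` and `graph` are hypothetical. In distance-1 coloring, adjacent vertices get different colors; in distance-2 coloring, vertices that share a neighbor must differ as well.

```python
def greedy_coloring(adjacency, distance=1):
    """Serial greedy coloring sketch (illustrative, not Zoltan's algorithm).

    adjacency : dict mapping each vertex to a list of its neighbors
    distance  : 1 -> adjacent vertices must differ
                2 -> vertices within two hops must differ
    """
    colors = {}
    for v in sorted(adjacency):
        # Collect every vertex that is not allowed to share v's color.
        conflict = set(adjacency[v])
        if distance == 2:
            for u in adjacency[v]:
                conflict.update(adjacency[u])
        conflict.discard(v)
        # Pick the smallest color not already used in the conflict set.
        used = {colors[u] for u in conflict if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


# A 5-cycle: 0-1-2-3-4-0
graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
d1 = greedy_coloring(graph, distance=1)
d2 = greedy_coloring(graph, distance=2)
```

Each color class is a set of vertices that can be processed together without conflicts. On the 5-cycle, distance 1 needs three colors, while distance 2 needs five because every pair of vertices is within two hops.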
Zoltan2: Next generation toolkit targeting needs of applications on emerging architectures

Multijagged (MJ) Geometric (Coordinate) Partitioning
- MPI+OpenMP implementation
- Multisection results in less data movement, greater scalability during partitioning than RCB
- Ex: Used in Trilinos’ MueLu multigrid solver on 524K cores
[Figure: 16-part 4x4 MJ partition]

On-node Balanced Graph Coloring
- Finds disjoint sets of vertices for parallelism in multicore execution
- Each label has roughly equal number of vertices
- Balanced coloring reduces idle cores in GPUs

Architecture-aware Geometric Task Mapping
- Map MPI tasks to cores to keep congestion and communication costs low
- Uses MJ to assign interdependent tasks to “nearby” cores
- Reduced MiniGhost execution time on 64K cores of Cielo by 34% relative to default; by 24% relative to custom 2x2x4 grouping
[Figure: Tasks; Allocated Nodes in Torus Network]
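
The multisection idea behind MJ can be sketched serially as follows. This is a simplified illustration under stated assumptions, not the Zoltan2 implementation: it assumes unit weights, runs on one process, and uses sorting instead of a parallel cut search. The function name `multijagged_partition` and its arguments are hypothetical. The points are cut into several equal-count sections along the first coordinate in one step, and each section is then cut independently along the next coordinate, which gives the "jagged" cut lines of a 4x4 partition.

```python
import random
from collections import Counter


def multijagged_partition(points, parts_per_dim):
    """Serial multisection sketch: return one part id per point.

    points        : list of coordinate tuples
    parts_per_dim : e.g. (4, 4) cuts coordinate 0 into 4 sections, then
                    cuts each section independently into 4 along coordinate 1
                    (assumes len(parts_per_dim) <= number of coordinates)
    """
    labels = [0] * len(points)

    def recurse(indices, level, first_part):
        if level == len(parts_per_dim):
            for i in indices:
                labels[i] = first_part
            return
        k = parts_per_dim[level]
        # Number of final parts contained in each of the k sections.
        parts_below = 1
        for p in parts_per_dim[level + 1:]:
            parts_below *= p
        # Order the points along this level's coordinate, then slice the
        # ordering into k sections whose sizes differ by at most one.
        ordered = sorted(indices, key=lambda i: points[i][level])
        n = len(ordered)
        for j in range(k):
            section = ordered[j * n // k:(j + 1) * n // k]
            recurse(section, level + 1, first_part + j * parts_below)

    recurse(list(range(len(points))), 0, 0)
    return labels


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]
labels = multijagged_partition(points, (4, 4))
sizes = Counter(labels)  # number of points assigned to each of the 16 parts
```

Recursive bisection would need four levels of cuts to reach 16 parts; the multisection sketch reaches them in two, which is the property the slide credits for less data movement during partitioning.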
Further information:

Download via the Trilinos toolkit:

Zoltan vs. Zoltan2

Parallelism
- Zoltan: MPI-only
- Zoltan2: MPI+X

API
- Zoltan: Application builds model (e.g., graph, hypergraph) for algorithm
- Zoltan2: Application describes its data (matrix, mesh, coordinates); algorithm builds model

Capabilities
- Zoltan: Parallel partitioning; parallel coloring; global and local ordering
- Zoltan2: Parallel partitioning; architecture-aware task placement; on-node balanced coloring; on-node ordering

Optional TPLs
- Zoltan: Scotch (INRIA/U. Bordeaux); ParMETIS (U. Minnesota); PaToH (Ohio St. U.)
- Zoltan2: Scotch (INRIA/U. Bordeaux); ParMETIS (U. Minnesota); ParMA Partition Improvement (RPI); AMD (U. Florida); LDMS: Lightweight Distributed Metric Service (Sandia)

Maturity
- Zoltan: Highly mature; maintenance only
- Zoltan2: Research platform for emerging archs

Integration
- Zoltan: No dependence on Trilinos
- Zoltan2: Integrated with Trilinos next-generation software stack

Language
- Zoltan: C (with F90 and C++ APIs)
- Zoltan2: Templated C++11

Distribution
- Zoltan: Stand-alone or in Trilinos
- Zoltan2: In Trilinos