Presentation is loading. Please wait.

Presentation is loading. Please wait.

Parallel Linear Solver Using Hierarchical Matrix

Similar presentations


Presentation on theme: "Parallel Linear Solver Using Hierarchical Matrix"— Presentation transcript:

1 Parallel Linear Solver Using Hierarchical Matrix
C. Chen, L. Cambier, E. Darve, Stanford; S. Rajamanickam, R. Tuminaro, E. Boman, Sandia

2 [NY times, 2015, Liu, 2017, Aminfar, 2015, Takahashi, 2017]
Motivations (1/2) Increasingly large linear systems from various applications Ice Sheet Simulation Structural Mechanics cite Solid Mechanics Electro- magnetics [NY times, 2015, Liu, 2017, Aminfar, 2015, Takahashi, 2017]

3 Motivations (2/2) Heterogeneous machine architectures
NERSC Cori Computer cite NVIDIA GPU Intel KNL

4 Solve Ax = b Existing linear solvers
- sparse direct solvers (e.g., SuperLU) - iterative methods - preconditioners (e.g., ILU, multigrid) Hierarchical solvers/preconditioners - tunable accuracy - fast and robust - high arithmetic intensity - reduced communication and synchronization cite

5 Key tool: hierarchical matrices
[Hackbusch, Bebendorf, Borm, Gu, etc]: off-diagonal blocks of typical differential and integral operators are effectively low-rank Theoretically optimal complexity of operation counts and memory footprint Data-sparse representation - H, H2, HSS and HODLR cite [Aminfar, 2015]

6 Key idea: compressing fill-in
Exploit the low-rank structure of fill-in - singular value-based truncation - in contrast to level-based and threshold-based dropping rules in ILU or IChol - compression of fill-in in a hierarchical fashion cite [Aminfar, 2015]

7 Algorithm of our solver
Partition the sparse matrix (mesh or graph) Compress fill-in and create coarse vertices (red in fig.) Eliminate fine vertices (green in fig.) Recurse on the coarse system (level-1 fill-in) Partition of mesh grid/unknowns [Pouransari, 2017] Far-field (fill-in) compression

8 Algorithm illustration
(1) original partition (2) one coarse node (3) two coarse nodes The coarse level carries all information of the fill-in (up to approximation error). (6) coarse level (5) many coarse nodes (4) three coarse nodes

9 Parallel algorithm

10 Parallel algorithm: concurrency
Fill-in exists between two clusters with at most distance 2 Fill-in pattern: ILU(1) cite [Yang, 2017]

11 Parallel algorithm: data dependency
d1: boundary nodes; on critical path; requires coloring d2: lower priority than d1 d3: interior nodes; lowest priority; no communication needed

12 d1: relaxed distance-2 coloring
Four colors for regular grids P0 P1 P3 P2

13 Parallel algorithm Asynchronous algorithm:
d3 computation overlaps with d1, d2 communication. d1 nodes d2 nodes d3 nodes No communication

14 Communication & Computation
Local communication with neighbors at every level Batched dense linear algebra, good for many-core processors (e.g., GPU, MIC) Assume: problem size=N, # processors=p, rank=r and M = Nr2/p - computation: O(M) - communication: O(M2/3) - # messages: O( log(N/rp)+log(p) )

15 Comparison with SuperLU-Dist
16 cores on NERSC Edison

16 Matrices from UFL collections

17 Application: ice sheet modeling

18 Solve for ice sheet velocity
Larsen C ice shelf Extruded (thin) mesh in the vertical direction Neumann & Robin boundary conditions [NASA, 2017, Tuminaro, 2016]

19 Improved convergence Model problem on a cube: strong vertical coupling; anisotropic Preserve near-null space (piecewise constant) in the low-rank approximation [Yang, 2017]

20 Summary Hierarchical solver - data-sparse representation
- fast and robust Parallel algorithm - local communication - high computational intensity - reduced global synchronization Various applications - symmetric and nonsymmetric - positive definite or indefinite

21 Acknowledgements

22 References Liu, Yinjian, Jinyou Xiao, An Enhanced Hierarchical Solver for Preconditioning of Large-Scale Finite Element, working manuscript. Aminfar, AmirHossein, A fast and memory efficient sparse solver with applications to finite element matrices (PhD thesis), Stanford University, 2015. Takahashi, Toru, Pieter Coulier, and Eric Darve. "Application of the inverse fast multipole method as a preconditioner in a 3D Helmholtz boundary element method." Journal of Computational Physics 341 (2017): Pouransari, Hadi, Pieter Coulier, and Eric Darve. "Fast hierarchical solvers for sparse matrices using extended sparsification and low-rank approximation." SIAM Journal on Scientific Computing 39, no. 3 (2017): A797-A830. Yang, Kai, Hadi Pouransari, and Eric Darve. "Sparse hierarchical solvers with guaranteed convergence." arXiv preprint arXiv: (2016). Tuminaro, Raymond, Mauro Perego, Irina Tezaur, Andrew Salinger, and Stephen Price. "A matrix dependent/algebraic multigrid approach for extruded meshes with applications to ice sheet modeling." SIAM Journal on Scientific Computing 38, no. 5 (2016): C504-C532. Chen, Chao, Hadi Pouransari, Siva Rajamanickam, Erik G. Boman, and Erik Darve. "A distributed-memory hierarchical solver for general sparse linear systems." Journal of Parallel Computing, accepted subject to minor changes.


Download ppt "Parallel Linear Solver Using Hierarchical Matrix"

Similar presentations


Ads by Google