Published by Rodger Oliver. Modified over 6 years ago.
1
Figure: A computational loop. Integration → Newton iteration → Linear system solvers.
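The inner part of this loop can be sketched minimally, assuming the integrator passes a nonlinear system F(x) = 0 to Newton's method. Every Newton step needs one linear solve J(x) dx = -F(x), and that call is where a solver such as SPIKE would plug in. The 2x2 Cramer's-rule solver and the example system are hypothetical stand-ins chosen only for illustration.

```python
def solve_2x2(a, b):
    """Solve the 2x2 system a x = b by Cramer's rule (a stand-in linear solver)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0.0:
        raise ZeroDivisionError("singular Jacobian")
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


def newton(residual, jacobian, x, tol=1e-12, max_iter=50):
    """Newton iteration: each step performs one linear solve J(x) dx = -F(x)."""
    for _ in range(max_iter):
        f = residual(x)
        if max(abs(v) for v in f) < tol:
            return x
        dx = solve_2x2(jacobian(x), [-v for v in f])  # the linear-solver call
        x = [xi + di for xi, di in zip(x, dx)]
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical example: x^2 + y^2 = 4 and x = y, with root (sqrt(2), sqrt(2)).
def residual(v):
    return [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]]


def jacobian(v):
    return [[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]]


root = newton(residual, jacobian, [1.0, 0.5])
```

In a real time integrator the Jacobian is large and sparse, so the cost of the loop is dominated by the linear solves.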
2
SPIKE: A Parallel Banded System Solver – an introduction
Figure: a matrix after RCM (reverse Cuthill-McKee) reordering.
Large sparse linear systems arise frequently in computational science and engineering applications. After reordering, such systems sometimes become banded, or low-rank perturbations of banded systems, and may be dense or sparse within the band. SPIKE is proposed as a parallel solver for banded systems, with the potential to exploit multilevel parallelism.
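Reordering is what exposes the band. The following is a minimal pure-Python sketch of reverse Cuthill-McKee on the graph of a symmetric sparse matrix, assuming the graph is given as a dict of neighbour sets. It runs a breadth-first search from a minimum-degree node, visits neighbours in order of increasing degree, and then reverses the order. The scrambled path graph is a hypothetical example. SciPy provides a production version as `scipy.sparse.csgraph.reverse_cuthill_mckee`.

```python
from collections import deque


def half_bandwidth(adj, order):
    """Half-bandwidth after relabeling node order[i] as row/column i."""
    pos = {node: i for i, node in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def reverse_cuthill_mckee(adj):
    """Return a node ordering computed by Reverse Cuthill-McKee."""
    degree = {u: len(adj[u]) for u in adj}
    visited, order = set(), []
    # Each connected component starts from an unvisited minimum-degree node.
    for start in sorted(adj, key=lambda u: (degree[u], u)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            # Enqueue unvisited neighbours by increasing degree (Cuthill-McKee rule).
            for v in sorted(adj[u], key=lambda w: (degree[w], w)):
                if v not in visited:
                    visited.add(v)
                    queue.append(v)
    order.reverse()  # reversing the Cuthill-McKee order gives RCM
    return order


# Hypothetical example: a path graph whose nodes carry scrambled labels.
labels = [5, 2, 7, 0, 3, 6, 1, 4]
adj = {u: set() for u in labels}
for u, v in zip(labels, labels[1:]):
    adj[u].add(v)
    adj[v].add(u)

print(half_bandwidth(adj, list(range(8))))              # 7 (original labels)
print(half_bandwidth(adj, reverse_cuthill_mckee(adj)))  # 1 (after RCM)
```

For a path graph RCM recovers a tridiagonal ordering. On general matrices it is a heuristic that usually, but not always, shrinks the bandwidth.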
3
SPIKE design principles
- Reduce memory references and interprocessor communication at the cost of extra arithmetic operations compared to LAPACK.
- Allow multiple levels of parallelism.
- Provide a polyalgorithm: versions range from direct to preconditioned iterative schemes.
4
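Next Generation Sparse Solvers: The SPIKE Algorithm
Solve Ax = f using the factorization A = DS. For p = 4 partitions, A is block tridiagonal with diagonal blocks A1, ..., A4, coupling blocks B1, B2, B3 above the diagonal and C2, C3, C4 below it; x = (x1, x2, x3, x4) and f = (f1, f2, f3, f4) are partitioned conformally. Because A is banded, each coupling block is nonzero only in an m × m corner, where m is the half-bandwidth of A.
$$
A = \begin{pmatrix} A_1 & B_1 & & \\ C_2 & A_2 & B_2 & \\ & C_3 & A_3 & B_3 \\ & & C_4 & A_4 \end{pmatrix},
\qquad D = \operatorname{diag}(A_1, A_2, A_3, A_4)
$$
1. Solve Dy = f.
2. Solve Sx = y.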
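Since D = diag(A1, ..., Ap) is block diagonal, the first step Dy = f splits into p independent solves A_j y_j = f_j, one per partition. This is the first level of parallelism. The sketch below is minimal and assumes dense storage, equal-sized partitions, and textbook Gaussian elimination in place of a LAPACK or Pardiso call. The thread pool is there only to show that the block solves are independent. With CPython threads, pure-Python arithmetic does not actually run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def gauss_solve(a, b):
    """Dense Gaussian elimination with partial pivoting (stand-in block solver)."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy [a | b]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def solve_block_diagonal(a, f, p):
    """Solve D y = f, where D keeps only the p diagonal blocks A_1..A_p of a."""
    size = len(a) // p  # assumes len(a) is divisible by p
    jobs = []
    for j in range(p):
        lo, hi = j * size, (j + 1) * size
        jobs.append(([row[lo:hi] for row in a[lo:hi]], f[lo:hi]))
    # The p block solves share no data, so they can be dispatched concurrently.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda job: gauss_solve(*job), jobs))
    return [value for part in parts for value in part]


# Hypothetical example: tridiagonal A (4 on the diagonal, -1 off it), p = 4 blocks of size 2.
n = 8
a = [[4.0 if i == k else (-1.0 if abs(i - k) == 1 else 0.0) for k in range(n)]
     for i in range(n)]
y = solve_block_diagonal(a, [1.0] * n, p=4)
```

The coupling entries B_j and C_j are ignored here on purpose. Their effect is deferred to the spike matrix S in the second step.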
5
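The Spike Matrix S and the Reduced System
$$
S = D^{-1}A = \begin{pmatrix} I & V_1 & & \\ W_2 & I & V_2 & \\ & W_3 & I & V_3 \\ & & W_4 & I \end{pmatrix},
\qquad V_j = A_j^{-1} B_j, \quad W_j = A_j^{-1} C_j
$$
The nonzero part of each spike V_j and W_j has only m columns, where m is the half-bandwidth. Taking the equations of Sx = y for the bottom m unknowns of x_j and the top m unknowns of x_{j+1} at each of the p - 1 interfaces yields a reduced system of order 2m(p - 1).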
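The two steps fit together in a short dense sketch of the explicit-spike variant, with the reduced system solved directly. It computes y = D^{-1} f and the spikes V_j = A_j^{-1} B_j and W_j = A_j^{-1} C_j. It then assembles the 2m(p - 1) reduced system from the rows next to each interface and solves it. Finally it retrieves every unknown from x_j = y_j - V_j x_{j+1}^(t) - W_j x_{j-1}^(b). The sketch assumes dense list-of-lists storage, half-bandwidth m on both sides, equal partitions of size at least 2m, and textbook Gaussian elimination in place of LAPACK. A real implementation would factor each A_j once and exploit the band.

```python
def gauss_solve(a, b):
    """Dense Gaussian elimination with partial pivoting (stand-in for LAPACK)."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def spike_solve(a, f, p, m):
    """Explicit-spike SPIKE sketch for a matrix with half-bandwidth m."""
    n = len(a)
    s = n // p
    assert s * p == n and s >= 2 * m, "sketch assumes equal partitions of size >= 2m"
    st = [j * s for j in range(p)]  # first global row of each partition

    def spike(j, cols):
        """A_j^{-1} applied to the columns `cols` of A restricted to partition j."""
        rows = range(st[j], st[j] + s)
        a_j = [[a[i][k] for k in rows] for i in rows]
        sols = [gauss_solve(a_j, [a[i][c] for i in rows]) for c in cols]
        return [list(r) for r in zip(*sols)]  # s x m, stored by rows

    y, v, w = [], [None] * p, [None] * p
    for j in range(p):
        rows = range(st[j], st[j] + s)
        y += gauss_solve([[a[i][k] for k in rows] for i in rows], [f[i] for i in rows])
        if j < p - 1:
            v[j] = spike(j, range(st[j + 1], st[j + 1] + m))  # right spike V_j
        if j > 0:
            w[j] = spike(j, range(st[j] - m, st[j]))          # left spike W_j

    def coupling(i):
        """(column, coefficient) pairs of the spike entries in row i of S."""
        j, li = divmod(i, s)
        terms = []
        if j < p - 1:
            terms += [(st[j + 1] + c, v[j][li][c]) for c in range(m)]
        if j > 0:
            terms += [(st[j] - m + c, w[j][li][c]) for c in range(m)]
        return terms

    # Reduced system: bottom m unknowns of x_j and top m of x_{j+1} at each interface.
    red = [i for j in range(1, p) for i in range(st[j] - m, st[j] + m)]
    pos = {g: r for r, g in enumerate(red)}
    mat = [[0.0] * len(red) for _ in red]
    for r, i in enumerate(red):
        mat[r][r] = 1.0
        for col, coef in coupling(i):
            mat[r][pos[col]] += coef
    x_red = dict(zip(red, gauss_solve(mat, [y[i] for i in red])))

    # Retrieval: x_i = y_i minus the spike terms, using the interface values.
    return [y[i] - sum(coef * x_red[col] for col, coef in coupling(i)) for i in range(n)]


# Hypothetical example: pentadiagonal A (m = 2), 12 unknowns, p = 3 partitions.
n, m, p = 12, 2, 3
a = [[6.0 if i == k else (-1.0 if abs(i - k) <= m else 0.0) for k in range(n)]
     for i in range(n)]
x_true = [float(i + 1) for i in range(n)]
f = [sum(a[i][k] * x_true[k] for k in range(n)) for i in range(n)]
x = spike_solve(a, f, p, m)
```

The reduced system stays small, 2m(p - 1) unknowns, independent of the partition size. Only that solve couples the partitions. Everything else is per-partition work.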
6
SPIKE: A Polyalgorithm
The choice of variant depends on the properties of the matrix and on the platform architecture (towards an adaptive library).
The diagonal blocks can be solved:
- directly (LU, Cholesky, or their sparse counterparts);
- iteratively (with a preconditioning strategy).
The spikes can be computed:
- explicitly (fully or partially);
- approximately;
- on the fly.
The reduced system can be solved:
- directly (recursive SPIKE);
- approximately (truncated SPIKE);
- iteratively (with a preconditioning scheme).
7
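SPIKE vs ScaLAPACK
Figure: ScaLAPACK solves AX = F with A = L*U. SPIKE solves AX = F with A = D*S, where D = diag(A1, A2, A3, A4), the coupling blocks are B1, B2, B3 and C2, C3, C4, and S is the spike matrix with identity diagonal blocks and spikes V1, V2, V3, W2, W3, W4. SPIKE solves a reduced system and then retrieves the solution.
SPIKE algorithm design: no LU factorization of the whole matrix, no reordering, no Schur complement; new banded primitives using BLAS-3; polyalgorithm implementation.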
8
Multilevel Parallelism: SPIKE calling MKL-Pardiso for banded systems that are sparse within the band
Figure: SPIKE spans cluster Nodes 1 to 4 and uses Pardiso on each cluster node.
9
SPIKE Options (dense within the band)
1. Solving the reduced system:
   R = recursive, E = explicit, F = on-the-fly, T = truncated
2. Factorization (diagonal blocks):
   No pivoting (diagonal boosting, if necessary): L = LU, U = LU & UL, A = alternate LU or UL
   Pivoting: P = LU
3. Solution improvement:
   0 = direct solver only, 2 = iterative refinement, 3 = outer BiCGStab iterations
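Options L, U and A factor the diagonal blocks without pivoting and fall back on diagonal boosting, which perturbs the factorization slightly. Solution-improvement option 2 then recovers accuracy with iterative refinement. The dense sketch below shows that pairing under stated assumptions. The boosting rule replaces any pivot smaller than a threshold tau with ±tau. The threshold value and the 3x3 matrix are hypothetical choices for illustration. The matrix's zero leading pivot would stop plain LU without pivoting.

```python
def lu_boosted(a, tau=1e-8):
    """LU without pivoting on a copy of a; pivots with |u_kk| < tau become +/-tau."""
    n = len(a)
    lu = [row[:] for row in a]
    for k in range(n):
        if abs(lu[k][k]) < tau:
            lu[k][k] = tau if lu[k][k] >= 0.0 else -tau  # diagonal boosting
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier l_ik, stored below the diagonal
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def lu_solve(lu, b):
    """Forward substitution with unit-lower L, then back substitution with U."""
    n = len(lu)
    z = list(b)
    for i in range(n):
        z[i] -= sum(lu[i][j] * z[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def refine(a, f, lu, max_iter=20, tol=1e-12):
    """Iterative refinement: x <- x + (LU)^{-1} (f - A x) until the residual is tiny."""
    x = lu_solve(lu, f)
    for _ in range(max_iter):
        r = [fi - sum(aij * xj for aij, xj in zip(row, x)) for row, fi in zip(a, f)]
        if max(abs(ri) for ri in r) < tol:
            break
        x = [xi + di for xi, di in zip(x, lu_solve(lu, r))]
    return x


# Hypothetical example: the zero leading pivot would stop LU without pivoting.
a = [[0.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 3.0]]
f = [2.0, 8.0, 11.0]  # = A @ [1, 2, 3]
x = refine(a, f, lu_boosted(a))
```

Refinement converges here because the boost is tiny relative to the matrix entries. A large boost can make the correction diverge, which is one reason an outer Krylov method such as BiCGStab (option 3) is also offered.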
10
Hierarchy of Computational Modules
The SPIKE algorithm is organized in three levels:
Level 3: SPIKE
Level 2: LAPACK; Pardiso, SuperLU, MUMPS; iterative solvers
Level 1: primitives for banded matrices (our own): banded triangular solve, banded UL; BLAS3 (dense matrix-matrix primitives); Sparse BLAS
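Level 1 holds the banded primitives. The simplest one, a banded lower-triangular solve, touches only the m sub-diagonals of each row, so it costs O(nm) operations instead of O(n^2). The row-wise band storage used here (low[i][d] = L[i][i - d]) is an assumption of this sketch. It is not necessarily the layout used by SPIKE or by LAPACK's band routines. The sketch also handles a single right-hand side rather than the BLAS-3 multi-column case.

```python
def banded_lower_solve(low, b):
    """Solve L x = b for lower-triangular banded L.

    Assumed storage: low[i][d] holds L[i][i - d] for d = 0..m, so low[i][0]
    is the diagonal; entries with i - d < 0 are ignored.
    """
    n = len(b)
    m = len(low[0]) - 1
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for d in range(1, min(m, i) + 1):  # only the m sub-diagonals are touched
            s -= low[i][d] * x[i - d]
        x[i] = s / low[i][0]
    return x


# Hypothetical example: n = 5, lower bandwidth m = 2.
n, m = 5, 2
low = [[2.0, 1.0, 0.5] for _ in range(n)]  # diagonal 2, sub-diagonals 1 and 0.5
x_true = [1.0, 2.0, 3.0, 4.0, 5.0]
# b = L x_true, formed directly from the band storage.
b = [sum(low[i][d] * x_true[i - d] for d in range(min(m, i) + 1)) for i in range(n)]
x = banded_lower_solve(low, b)
```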
11
SPIKE algorithms
Reduced-system strategy: E = explicit, R = recursive, T = truncated, F = on the fly. Factorization of the diagonal blocks: P = LU with pivoting, L = LU without pivoting, U = LU and UL without pivoting, A = alternate LU / UL.
P (LU with pivoting):
- EP: explicit generation of spikes; the reduced system is solved iteratively with a preconditioner.
- RP: explicit generation of spikes; the reduced system is solved directly using recursive SPIKE.
- FP: implicit generation of the reduced system, which is solved on the fly using an iterative method.
L (LU without pivoting):
- EL
- RL: explicit generation of spikes; the reduced system is solved directly using recursive SPIKE.
- TL: truncated generation of spike tips (Vb is exact, Wt is approximate); the reduced system is solved directly.
- FL
U (LU and UL without pivoting):
- TU: truncated generation of spike tips (Vb and Wt are exact); the reduced system is solved directly.
- FU: implicit generation of the reduced system, which is solved on the fly using an iterative method with a preconditioner.
A (alternate LU / UL):
- EA: explicit generation of spikes using a new partitioning; the reduced system is solved iteratively with a preconditioner.
- TA: truncated generation of spikes using a new partitioning; the reduced system is solved directly.
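The truncated (T) variants rest on a property of diagonally dominant systems: the spikes decay away from the coupling blocks. Dropping the far spike tips W_j^(b) and V_{j+1}^(t) therefore decouples the reduced system into p - 1 independent 2m x 2m systems with blocks [[I, V_j^(b)], [W_{j+1}^(t), I]]. The dense sketch below assumes dense storage, half-bandwidth m, equal partitions of size at least 2m, and textbook Gaussian elimination. For simplicity it computes the tips Vb and Wt exactly from full spike solves, as in TU. The cheaper LU/UL tip computation is not shown. Retrieval solves A_j x_j = f_j - B_j x_{j+1}^(t) - C_j x_{j-1}^(b), so only the interface values are needed.

```python
def gauss_solve(a, b):
    """Dense Gaussian elimination with partial pivoting (stand-in for LAPACK)."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def truncated_spike_solve(a, f, p, m):
    """Truncated SPIKE sketch: only the spike tips V_j^(b) and W_{j+1}^(t) are kept."""
    n = len(a)
    s = n // p
    assert s * p == n and s >= 2 * m, "sketch assumes equal partitions of size >= 2m"
    st = [j * s for j in range(p)]

    def diag_block(j):
        return [row[st[j]:st[j] + s] for row in a[st[j]:st[j] + s]]

    def spike(j, cols):
        """A_j^{-1} applied to columns `cols` of A restricted to partition j (s x m)."""
        sols = [gauss_solve(diag_block(j), [a[i][c] for i in range(st[j], st[j] + s)])
                for c in cols]
        return [list(r) for r in zip(*sols)]

    y = []
    for j in range(p):
        y += gauss_solve(diag_block(j), f[st[j]:st[j] + s])

    x_int = {}  # interface unknowns, keyed by global index
    for j in range(p - 1):
        vb = spike(j, range(st[j + 1], st[j + 1] + m))[-m:]     # V_j^(b)
        wt = spike(j + 1, range(st[j + 1] - m, st[j + 1]))[:m]  # W_{j+1}^(t)
        mat = [[1.0 if r == c else 0.0 for c in range(2 * m)] for r in range(2 * m)]
        for r in range(m):
            for c in range(m):
                mat[r][m + c] = vb[r][c]
                mat[m + r][c] = wt[r][c]
        idx = range(st[j + 1] - m, st[j + 1] + m)  # x_j^(b) then x_{j+1}^(t)
        x_int.update(zip(idx, gauss_solve(mat, [y[i] for i in idx])))

    # Retrieval: A_j x_j = f_j - B_j x_{j+1}^(t) - C_j x_{j-1}^(b).
    x = []
    for j in range(p):
        rhs = []
        for i in range(st[j], st[j] + s):
            v = f[i]
            if j < p - 1:
                v -= sum(a[i][c] * x_int[c] for c in range(st[j + 1], st[j + 1] + m))
            if j > 0:
                v -= sum(a[i][c] * x_int[c] for c in range(st[j] - m, st[j]))
            rhs.append(v)
        x += gauss_solve(diag_block(j), rhs)
    return x


# Hypothetical example: strongly diagonally dominant tridiagonal A (m = 1), p = 3.
n, m, p = 24, 1, 3
a = [[10.0 if i == k else (-1.0 if abs(i - k) == 1 else 0.0) for k in range(n)]
     for i in range(n)]
x_true = [float(i + 1) for i in range(n)]
f = [sum(a[i][k] * x_true[k] for k in range(n)) for i in range(n)]
x = truncated_spike_solve(a, f, p, m)
```

The result is approximate, not exact. Its error is governed by how far the dropped tips have decayed, so the approach suits strongly diagonally dominant matrices with partitions that are not too small.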