A computational loop: Integration, Newton Iteration

1 A computational loop
[Figure: a computational loop with three stages labeled Integration, Newton Iteration, and Linear system solvers; the loop indices shown are k and t.]
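
The loop on this slide can be illustrated with a minimal sketch. Everything below is an assumption chosen for illustration and does not come from the presentation: a 1D model problem u' = u_xx - u^3 with zero boundary values, backward Euler for the integration, and the Thomas algorithm standing in for the banded linear solver. The function names `thomas`, `residual`, and `integrate` are hypothetical.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system without pivoting; lower[0] and upper[-1] are unused."""
    n = len(diag)
    d, b = list(diag), list(rhs)
    for i in range(1, n):
        w = lower[i] / d[i - 1]
        d[i] -= w * upper[i - 1]
        b[i] -= w * b[i - 1]
    x = [0.0] * n
    x[-1] = b[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (b[i] - upper[i] * x[i + 1]) / d[i]
    return x


def residual(u, u_old, dt, h):
    """G(u) = u - u_old - dt * (Laplacian(u) - u**3), with zero boundary values."""
    n = len(u)
    g = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        lap = (left - 2.0 * u[i] + right) / h**2
        g.append(u[i] - u_old[i] - dt * (lap - u[i] ** 3))
    return g


def integrate(u0, dt, steps, h, tol=1e-10, max_newton=20):
    """Backward Euler in time; each step runs Newton; each Newton step solves a banded system."""
    u = list(u0)
    n = len(u)
    linear_solves = 0
    for _ in range(steps):                       # integration loop
        u_old = list(u)
        for _ in range(max_newton):              # Newton iteration
            g = residual(u, u_old, dt, h)
            if max(abs(v) for v in g) < tol:
                break
            # Tridiagonal Jacobian of G: I - dt * (Laplacian - 3 * diag(u**2)).
            off = [-dt / h**2] * n
            diag = [1.0 + dt * (2.0 / h**2 + 3.0 * ui**2) for ui in u]
            du = thomas(off, diag, off, [-v for v in g])   # linear system solver
            linear_solves += 1
            u = [ui + di for ui, di in zip(u, du)]
    return u, linear_solves
```

The banded solve sits in the innermost position and is called several times per time step, which is why its cost dominates a loop of this shape.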

2 SPIKE: A Parallel Banded System Solver – an introduction
Large sparse linear systems arise often in computational science and engineering applications. Reordering (the slide's figure shows a matrix after RCM reordering) sometimes yields systems that are banded, or low-rank perturbations of banded systems, and that are either dense or sparse within the band. SPIKE is proposed as a parallel solver for banded systems, with the potential to exhibit multilevel parallelism.
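
To make the effect of RCM (reverse Cuthill-McKee) reordering concrete, here is a simplified sketch, not the implementation used by the authors. It assumes the sparsity pattern is given as an undirected graph, and it starts each component from a minimum-degree vertex rather than searching for a pseudo-peripheral one. The names `bandwidth` and `reverse_cuthill_mckee` and the example graph are illustrative.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest |i - j| over all edges when vertices are numbered in `order`."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def reverse_cuthill_mckee(adj):
    """Return an RCM ordering for an undirected graph {vertex: set(neighbours)}."""
    visited, order = set(), []
    # Start each connected component from a vertex of minimum degree.
    for start in sorted(adj, key=lambda v: (len(adj[v]), v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Visit unvisited neighbours in order of increasing degree.
            for w in sorted(adj[v] - visited, key=lambda w: (len(adj[w]), w)):
                visited.add(w)
                queue.append(w)
    return order[::-1]   # the "reverse" in reverse Cuthill-McKee


# A path graph whose vertices carry scrambled labels.
chain = [3, 0, 4, 1, 5, 2]
adj = {v: set() for v in chain}
for a, b in zip(chain, chain[1:]):
    adj[a].add(b)
    adj[b].add(a)

natural = list(range(6))
rcm = reverse_cuthill_mckee(adj)
```

For this graph the natural numbering gives a bandwidth of 4, while the RCM ordering recovers the underlying chain and gives a bandwidth of 1.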

3 SPIKE design principles
Reducing memory references and interprocessor communication, at the cost of extra arithmetic operations compared to LAPACK.
Allowing multiple levels of parallelism.
Creating a polyalgorithm: versions vary from direct to preconditioned iterative schemes.

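4 SPIKE: An Introduction
Solve Ax = f, where A is partitioned into diagonal blocks A1, A2, A3, A4 with off-diagonal coupling blocks B1, B2, B3 and C2, C3, C4, and x and f are partitioned conformally:

$$
\begin{pmatrix}
A_1 & B_1 &     &     \\
C_2 & A_2 & B_2 &     \\
    & C_3 & A_3 & B_3 \\
    &     & C_4 & A_4
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{pmatrix}
$$

Factor A = D × S, with D = diag(A1, A2, A3, A4). Then:
Solve Dy = f
Solve Sx = y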
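
The two-stage solve (Dy = f, then Sx = y) can be sketched on a small dense example. This is an illustration under stated assumptions, not the SPIKE implementation: the helper names `solve` and `spike_solve` are hypothetical, the partitions are assumed to be of equal size, S = D^{-1}A is formed explicitly, and the second stage uses a dense solve where SPIKE would work with a much smaller reduced system.

```python
def solve(M, B):
    """Solve M X = B by Gaussian elimination with partial pivoting (B is a matrix of columns)."""
    n, ncols = len(M), len(B[0])
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        for r in range(k + 1, n):
            factor = aug[r][k] / aug[k][k]
            for c in range(k, n + ncols):
                aug[r][c] -= factor * aug[k][c]
    X = [[0.0] * ncols for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for c in range(ncols):
            s = aug[i][n + c] - sum(aug[i][j] * X[j][c] for j in range(i + 1, n))
            X[i][c] = s / aug[i][i]
    return X


def spike_solve(A, f, p):
    """Solve A x = f through A = D S with p equal-sized partitions (assumes p divides n)."""
    n = len(A)
    size = n // p
    y = [0.0] * n
    S = [[0.0] * n for _ in range(n)]
    for j in range(p):
        lo, hi = j * size, (j + 1) * size
        Aj = [row[lo:hi] for row in A[lo:hi]]
        # Stage 1: y_j = A_j^{-1} f_j, independent for every partition.
        yj = solve(Aj, [[f[i]] for i in range(lo, hi)])
        # Block row j of S = D^{-1} A: identity on the diagonal block, spikes next to it.
        Sj = solve(Aj, [A[i] for i in range(lo, hi)])
        for i in range(size):
            y[lo + i] = yj[i][0]
            S[lo + i] = Sj[i]
    # Stage 2: S x = y (dense here for brevity).
    x = solve(S, [[v] for v in y])
    return [row[0] for row in x], S


n, p = 8, 4
A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
f = [sum(row) for row in A]        # chosen so that the exact solution is all ones
x, S = spike_solve(A, f, p)
```

Stage 1 touches each partition independently, which is where the first level of parallelism comes from; only stage 2 couples the partitions.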

5 The Spike Matrix “S”
Sx = y
[Figure: the spike matrix S, built from identity blocks (I), zero blocks (0), and the spikes; the blocks highlighted on the slide are each m × m.]
Reduced system order: 2m(p-1)
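
As a sketch of where the order 2m(p-1) comes from, the block structure can be written out for p = 4. The formulas below follow the usual description of SPIKE rather than anything legible in this transcript, so treat them as an assumption: V_j and W_j are the spike names used on the later SPIKE vs ScaLAPACK slide, each spike is a tall block with m columns, and the superscripts (t) and (b) denote the top and bottom m rows of a partition.

```latex
S = D^{-1}A =
\begin{pmatrix}
I   & V_1 &     &     \\
W_2 & I   & V_2 &     \\
    & W_3 & I   & V_3 \\
    &     & W_4 & I
\end{pmatrix},
\qquad
V_j = A_j^{-1}\begin{pmatrix} 0 \\ B_j \end{pmatrix},
\quad
W_j = A_j^{-1}\begin{pmatrix} C_j \\ 0 \end{pmatrix}.

% Only the top and bottom m rows of each partition couple to a neighbour:
% unknowns x_1^{(b)}, \; x_j^{(t)}, x_j^{(b)} \ (j = 2, \dots, p-1), \; x_p^{(t)}
\underbrace{m}_{x_1^{(b)}} + \underbrace{2m\,(p-2)}_{x_2,\dots,x_{p-1}} + \underbrace{m}_{x_p^{(t)}} = 2m\,(p-1).
```

For p = 4 this gives a reduced system of order 6m, independent of the partition size.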

6 SPIKE: A Polyalgorithm
Different choices depending on the properties of the matrix and the platform architecture (towards an adaptive library).
The diagonal blocks can be solved:
  Directly (LU, Cholesky, or sparse counterparts)
  Iteratively (with a preconditioning strategy)
The spikes can be computed:
  Explicitly (fully or partially)
  Approximately
  On the fly
The reduced system can be solved:
  Directly (Recursive SPIKE)
  Approximately (Truncated SPIKE)
  Iteratively (with a preconditioning scheme)

7 SPIKE vs ScaLAPACK
ScaLAPACK: AX = F with A = L*U.
SPIKE: AX = F with A = D*S, where D holds the diagonal blocks A1, A2, A3, A4 (coupled by B1, B2, B3 and C2, C3, C4) and the spike matrix S consists of identity blocks I and the spikes V1, V2, V3 and W2, W3, W4. Solve the reduced system, then retrieve the solution.
SPIKE algorithm design: no LU factorization, no reordering, no Schur complement.
New banded primitives using BLAS-3.
Polyalgorithm implementation.

8 Multilevel Parallelism: SPIKE calling MKL-Pardiso for banded systems that are sparse within the band
[Figure: SPIKE spans four cluster nodes (Node 1 to Node 4), with Pardiso running on each node.]
SPIKE uses Pardiso on each cluster node.
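
The outer level of this scheme, one independent solve per node, can be mimicked in a few lines. The sketch below is only an analogy under loud assumptions: threads stand in for cluster nodes, a tridiagonal Thomas solve stands in for Pardiso, and the names `solve_tridiagonal` and `solve_block_diagonal` are hypothetical. It makes no claim about how SPIKE actually calls Pardiso.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_tridiagonal(block):
    """Stand-in for the per-node solver: Thomas algorithm on one diagonal block."""
    lower, diag, upper, rhs = block
    n = len(diag)
    d, b = list(diag), list(rhs)
    for i in range(1, n):
        w = lower[i] / d[i - 1]
        d[i] -= w * upper[i - 1]
        b[i] -= w * b[i - 1]
    x = [0.0] * n
    x[-1] = b[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (b[i] - upper[i] * x[i + 1]) / d[i]
    return x


def solve_block_diagonal(blocks, workers=4):
    """Solve D y = f where D is block diagonal; every block is handled by its own worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(solve_tridiagonal, blocks))


# Four identical 3x3 blocks; each right-hand side is chosen so the solution is all ones.
block = ([0.0, -1.0, -1.0], [4.0, 4.0, 4.0], [-1.0, -1.0, 0.0], [3.0, 2.0, 3.0])
y_parts = solve_block_diagonal([block] * 4)
```

Because the blocks share no data, the workers need no communication until the reduced system is assembled.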

9 SPIKE Options (dense within the band)
1. Solving the reduced system:
   R = recursive
   E = explicit
   F = on-the-fly
   T = truncated
2. Factorization (diagonal blocks):
   No pivoting (diagonal boosting, if necessary):
     L = LU
     U = LU & UL
     A = alternate LU or UL
   Pivoting:
     P = LU
3. Solution improvement:
   0 = direct solver only
   2 = iterative refinement
   3 = outer BiCGSTAB iterations

10 Hierarchy of Computational Modules
The SPIKE algorithm: hierarchy of computational modules

Level  Description
3      SPIKE
2      LAPACK; Pardiso, SuperLU, MUMPS; iterative solvers
1      Primitives for banded matrices (our own): banded triangular solve, banded UL; BLAS3 (dense matrix-matrix primitives); Sparse BLAS
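
As an illustration of the lowest level, a banded triangular solve can be sketched as plain forward substitution restricted to the band. This is not the authors' primitive: theirs is described as BLAS3-based, whereas the scalar loop below only shows which entries are touched. The name `banded_lower_solve` and the dense row-list storage are assumptions made for readability.

```python
def banded_lower_solve(L, b, k):
    """Solve L x = b for lower-triangular L with lower bandwidth k.

    Entries L[i][j] with i - j > k are assumed to be zero and are never read,
    so the work is O(n * k) instead of O(n**2).
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(max(0, i - k), i))
        x[i] = (b[i] - s) / L[i][i]
    return x


# Bidiagonal example: 2 on the diagonal, 1 on the subdiagonal, solution of all ones.
n = 5
L = [[2.0 if i == j else 1.0 if i - j == 1 else 0.0 for j in range(n)] for i in range(n)]
b = [sum(row) for row in L]
x = banded_lower_solve(L, b, k=1)
```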

11 SPIKE algorithms
Each variant is named by two letters: the algorithm (E = explicit, R = recursive, T = truncated, F = on the fly) followed by the factorization (P, L, U, or A).

P (LU w/ pivoting):
  EP: Explicit generation of spikes; reduced system is solved iteratively with a preconditioner.
  RP: Explicit generation of spikes; reduced system is solved directly using recursive SPIKE.
  FP: Implicit generation of reduced system, which is solved on-the-fly using an iterative method.
L (LU w/o pivoting):
  RL: Explicit generation of spikes; reduced system is solved directly using recursive SPIKE.
  TL: Truncated generation of spike tips: Vb is exact, Wt is approximate; reduced system is solved directly.
  EL, FL: listed on the slide without a separate description.
U (LU and UL w/o pivoting):
  TU: Truncated generation of spike tips: Vb, Wt are exact; reduced system is solved directly.
  FU: Implicit generation of reduced system, which is solved on-the-fly using an iterative method with a preconditioner.
A (alternate LU / UL):
  EA: Explicit generation of spikes using new partitioning; reduced system is solved iteratively with a preconditioner.
  TA: Truncated generation of spikes using new partitioning; reduced system is solved directly.
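
The two-letter naming scheme on this slide can be captured in a small lookup. The sketch below is purely illustrative: the dictionaries restate the slide's legend, the set of listed variants is copied from the slide, and `describe` is a hypothetical helper, not part of any SPIKE interface.

```python
REDUCED_SYSTEM = {
    "E": "explicit",
    "R": "recursive",
    "T": "truncated",
    "F": "on-the-fly",
}
FACTORIZATION = {
    "P": "LU with pivoting",
    "L": "LU without pivoting",
    "U": "LU and UL without pivoting",
    "A": "alternate LU / UL",
}
# Only these combinations appear on the slide.
LISTED = {"EP", "RP", "FP", "EL", "RL", "TL", "FL", "TU", "FU", "EA", "TA"}


def describe(code):
    """Decode a two-letter variant such as 'TU'; raise ValueError if it is not listed."""
    if code not in LISTED:
        raise ValueError(f"{code!r} is not among the listed SPIKE variants")
    return REDUCED_SYSTEM[code[0]], FACTORIZATION[code[1]]
```

For example, `describe("TU")` returns `("truncated", "LU and UL without pivoting")`, while a combination that is absent from the slide, such as `"TP"`, raises `ValueError`.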
