Presentation is loading. Please wait.

Presentation is loading. Please wait.

A Scalable Parallel Preconditioned Sparse Linear System Solver Murat ManguoğluMiddle East Technical University, Turkey Joint work with: Ahmed Sameh Purdue.

Similar presentations


Presentation on theme: "A Scalable Parallel Preconditioned Sparse Linear System Solver Murat ManguoğluMiddle East Technical University, Turkey Joint work with: Ahmed Sameh Purdue."— Presentation transcript:

1 A Scalable Parallel Preconditioned Sparse Linear System Solver Murat ManguoğluMiddle East Technical University, Turkey Joint work with: Ahmed Sameh Purdue University, USA Madan Sathe University of Basel, Switzerland Olaf Schenk University of Basel, Switzerland Precond11, May 16-18 Bordeaux, France

2 Design principles Reducing memory references and interprocessor communications at the cost of extra arithmetic operations. Allowing multiple levels of parallelism. Creating a polyalgorithm – versions vary depending on architectural characteristics of the underlying computing platform.

3 Outline 1. Solve (on parallel computers): Ax = f ; where A is large and sparse 2. Reorder a coefficient matrix A such that large elements are clustered around the main diagonal

4 PART I Parallel Solution of Sparse Linear Systems Ax = f

5 Target Computational Loop Integration Newton iteration Linear system solvers kk kk tt Most time consuming part in large simulations

6 Solving Ax=f Solve Mz = r using DS fact. (M is “banded”) BiCGstab (or any other Krylov subspace method) Step 1: weighted matrix reordering (symmetric/nonsymmetric) Step 2: extract a “banded” preconditioner,M, and solve the system: Manguoglu M, Koyuturk M, Sameh A, Grama A, SIAM Journal on Scientific Computing, 32(3), pp.1201- 1216, 2010

7 reordering with weights and extracting the preconditioner A1A1 A2A2 A3A3 A4A4 C2C2 C3C3 C4C4 B1B1 B2B2 B3B3 PSPIKE Manguoglu M, Sameh A, and Schenk O, PSPIKE: A Parallel Hybrid Sparse Linear System Solver, Lecture Notes in Computer Science(proceedings of EuroPar09), Volume 5704, pp.797-808, 2009

8 Parallel solution of banded linear systems

9

10 Robustness of the narrow banded preconditioner 17 linear systems from UF sparse matrix collection. N: ranging from 7,000 to 2 M. NNZ: ranging from 42,000 to 20 M. Stopping criterion: relres ≤ 10-5 10

11

12 Robustness 12

13 g3_circuit: ~1.5M unknowns, speed improvement compared to Pardiso on one core (75.3 seconds) Platform: a cluster where each node has two Intel Xeon (X5560@2.8GHz) / 8 cores per node Stopping tolarence: ||f-Ax|| ∞ /||f|| ∞ ≤ 10 -6

14 Sparsity structure of The stiffness and mass matrices: N ~ 1.5 Million Pardiso version 4.1.1; 1.5 M problem Intel Westmere processors, 2.67 GHz, 4 sockets w/ 8 cores each http://www.pardiso-project.org/

15 2 nd approach for solving Ax=f Solve Mz = r using DS factorization BiCGstab (or any other Krylov subspace method) Step 1: extract a “sparse” preconditioner,M, and solve the system:

16

17

18 torso3: ~260K unknowns, speed improvement compared to Pardiso on one core (49.4 seconds)

19 PART II Parallel Spectral Matrix Reordering (Graph Partitioning)

20 Eigenvalues of the Laplacian of a Graph Case 1: – A is a symmetric matrix of order n – B = A – The weighted Laplacian matrix L is given by: L(i,i) = ∑ |B(i,k)| ; for k = 1,2,…,n; k ≠ i L(i,j) = - |B(i,j)| ; for i ≠ j Case 2: – A is nonsymmetric – B= (|A|+|A T |)/2 – L is obtained as in Case 1.

21 The Fiedler vector Obtain the eigenvector of the second smallest eigenvalue of: L x = λ x ; λ:= {0 = λ 1 ≤ λ 2 ≤ ……≤ λ n } Minimize σ = ∑ |B(i,j)| (x(i) – x(j)) 2 for all i, j for which B(i,j) ≠ 0, such that: e T x = 0, and x T x = 1. In other words: σ = x T Lx Miroslav Fiedler, Algebraic connectivity of graphs(English), Czechoslovak Mathematical Journal, vol. 23 (1973), issue 2, pp. 298-305

22 Trace Minimization (TraceMin) A x = λ B x – A :=symmetric; B:= symmetric positive definite Minimize tr(Y T A Y) s.t. (Y T BY)=I p Solution: min tr(Y T AY) = Σ λ j (j=1,2,..,p) λ 1 ≤ λ 2 ≤ λ 3 ≤ …≤ λ p < λ p+1 ≤ ……≤ λ n A. Sameh & J. Wisniewski: SIAM J. Num. Analysis, 1982 + A. Sameh & Z. Tong: J. Comp. Appl. Math., 2000.

23 TraceMin iteration

24 A Parallel Algorithm for Computing the Fiedler Vector: TraceMin-Fiedler Based on the Trace Minimization algorithm computes λ 2 and v 2 of L x = λ x – Forms the small Schur complement explicitly – Solution of the linear systems involving L via CG with diagonal preconditioning – L is not spd (it is spsd) Comparison against the best sequential scheme (Harwell Subroutine MC73) Manguoglu M, Cox E, Saied F, Sameh A: in review ACM TOMS 2010 ( http://arxiv.org/abs/1003.3689v1) Source code(GPL): www.cs.purdue.edu/homes/mmanguog/fiedler.html

25 Most time consuming part

26 f2: structral mechanics N: 71,505 NNZ: 5,294,285 Original matrixAfter MC73After TraceMin-Fiedler

27

28 Platform: a cluster where each node has two Intel (Westmere) X5670@2.93Ghz / 12 cores per nodeX5670@2.93Ghz Number of cores (using at most 8 cores per node) Stopping tolarence: ||Lx 2 -λ 2 x 2 || ∞ /||L|| ∞ ≤ 10 -5

29 Conclusions ● New, robust and scalable hybrid solvers on distributed memory clusters ● Tracemin-Fiedler is an effective parallel alternative for computing the fiedler vector.

30 Thank you


Download ppt "A Scalable Parallel Preconditioned Sparse Linear System Solver Murat ManguoğluMiddle East Technical University, Turkey Joint work with: Ahmed Sameh Purdue."

Similar presentations


Ads by Google