A Factored Sparse Approximate Inverse software package (FSAIPACK) for the parallel preconditioning of linear systems Massimiliano Ferronato, Carlo Janna,


1 A Factored Sparse Approximate Inverse software package (FSAIPACK) for the parallel preconditioning of linear systems Massimiliano Ferronato, Carlo Janna, Giuseppe Gambolati, Flavio Sartoretto Department ICEA Sparse Days 2014 June 5-6

2 Outline
- Introduction: preconditioning techniques for high performance computing
- Approximate inverse preconditioning for Symmetric Positive Definite matrices: the FSAI-based approach
- FSAIPACK: a software package for high performance FSAI preconditioning
- Numerical results
- Conclusions and future work

3 Introduction: Preconditioning techniques for high performance computing
- Large models are increasingly common in many applications, which makes the use of parallel computational resources almost mandatory
- One of the most expensive and memory-consuming tasks in any numerical application is the solution of large and sparse linear systems
- Conjugate Gradient-like solution methods can be efficiently implemented on parallel computers provided that an effective parallel preconditioner is available
- Algebraic preconditioners: robust algorithms that generate a preconditioner from the knowledge of the system matrix only, independently of the problem it arises from
- Most popular and successful classes of preconditioners:
  - Incomplete LU factorizations
  - Approximate inverses
  - Algebraic multigrid

4 Introduction: Preconditioning techniques for high performance computing
- For parallel computations the Factorized Sparse Approximate Inverse (FSAI) approach is quite attractive, as it is «naturally» parallel
- FSAIPACK: a parallel software package for high performance FSAI preconditioning in the solution of Symmetric Positive Definite linear systems
- Collection of routines that implement several different existing methods for computing an FSAI-based preconditioner
- Allows for a very flexible user-specified construction of a parallel FSAI preconditioner
- General-purpose package that is easy to include as an external library in any existing code
- Currently coded in FORTRAN90 with OpenMP directives for shared memory machines
- Freely available online at www.dmsa.unipd.it/~janna/software.html

5 The FSAI-based approach: FSAI definition
- Factorized Sparse Approximate Inverse (FSAI): an almost perfectly parallel factored preconditioner for SPD problems [Kolotilina & Yeremin, 1993]: $M^{-1} = G^T G$, with $G$ a lower triangular matrix such that $\|I - G L\|_F \to \min$ over the set of matrices with a prescribed lower triangular sparsity pattern $S_L$, e.g. the pattern of $A$ or $A^2$, where $L$ is the exact Cholesky factor of $A$
- $L$ is not actually required for computing $G$!
- Computed via the solution of $n$ independent small dense systems and applied via matrix-vector products
- Nice features: (1) ideally perfect parallel construction and application of the preconditioner; (2) preservation of the positive definiteness of the native matrix

6 The FSAI-based approach: FSAI definition
- The key property for the quality of any FSAI-based parallel preconditioner is the selection of the sparsity pattern $S_L$
- Historically, the first idea to build $S_L$ was to define it a priori, but more effective strategies can be developed by dynamically selecting the position of the non-zero entries in $S_L$
- Static FSAI: $S_L$ is defined a priori, e.g., as the pattern of $A^k$, possibly after a sparsification of $A$ [Huckle 1999; Chow 2000, 2001]
- Dynamic FSAI: $S_L$ is defined dynamically during the computation of $G$ using some optimization algorithm [Huckle 2003; Janna & Ferronato, 2011]
- Recurrent FSAI: the FSAI factor $G$ is defined as the product of several factors, computed either statically or dynamically [Wang & Zhang 2003; Bergamaschi & Martinez 2012]
- Post-filtration: it is generally recommended to apply an a posteriori sparsification of $G$, dropping the smallest entries [Kolotilina & Yeremin, 1999]

7 FSAIPACK: Static FSAI construction
- FSAIPACK is a software library that collects several different ways of computing an FSAI preconditioner in a shared memory environment and allows the construction techniques to be combined into original user-specified strategies
- Assuming that $S_L$ is given, it is possible to compute $G$
- Static FSAI: denote by $P_i$ the set of column indices belonging to the i-th row of $S_L$, and by $m_i$ its size. Compute the vector $\hat{g}_i$ by solving the $m_i \times m_i$ linear system $A[P_i, P_i]\,\hat{g}_i = e_{m_i}$, where $e_{m_i}$ is the last vector of the canonical basis, and scale to obtain the dense i-th row of $G$: $g_i = \hat{g}_i / \sqrt{\hat{g}_i[m_i]}$
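The row-by-row construction described on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not FSAIPACK code: it uses dense lists for brevity, and the names `solve_dense` and `static_fsai` are hypothetical. It assumes that each pattern row is sorted, contains only indices not larger than the row index, and ends with the diagonal position.

```python
import math

def solve_dense(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x

def static_fsai(A, pattern):
    """Compute G row by row; pattern[i] lists the columns of row i (last one is i)."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):                      # every row is independent of the others
        P = pattern[i]
        sub = [[A[r][c] for c in P] for r in P]
        rhs = [0.0] * (len(P) - 1) + [1.0]  # last canonical basis vector
        g_hat = solve_dense(sub, rhs)
        scale = math.sqrt(g_hat[-1])        # positive because A is SPD
        for k, j in enumerate(P):
            G[i][j] = g_hat[k] / scale
    return G

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
pattern = [[0], [0, 1], [1, 2]]             # lower triangular pattern of A
G = static_fsai(A, pattern)
```

With this scaling the diagonal of $G A G^T$ is equal to one, and because the loop over `i` has no dependencies between iterations it is the natural candidate for a parallel loop.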

8 FSAIPACK: Static pattern generation
- Static pattern generation: $S_L$ is the lower triangular pattern of a power $\kappa$ of $A$, or of a sparsified matrix $\tilde{A}$ obtained from $A$ with a threshold $\tau$ [defining formulas not preserved in the transcript]
- User-specified parameters needed: $\kappa$ (integer), $\tau$ (real)
- The non-zero pattern for the Static FSAI computation can be generated with the aid of a recurrence [formula not preserved in the transcript]
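A possible way to generate such a pattern is sketched below. The slide's own formulas are missing from the transcript, so two things here are assumptions rather than the authors' definitions: the drop rule (keep $a_{ij}$ when $|a_{ij}| > \tau \sqrt{|a_{ii} a_{jj}|}$) and the recurrence (the pattern of row i of the next power is the union of the base-pattern rows indexed by the current row i). The function names are hypothetical and the result is a structural pattern that ignores numerical cancellation.

```python
import math

def sparsify_pattern(A, tau):
    """Row-wise pattern of A after dropping small entries (assumed relative rule)."""
    n = len(A)
    rows = []
    for i in range(n):
        keep = {i}                           # the diagonal is always kept
        for j in range(n):
            limit = tau * math.sqrt(abs(A[i][i] * A[j][j]))
            if A[i][j] != 0.0 and abs(A[i][j]) > limit:
                keep.add(j)
        rows.append(keep)
    return rows

def power_pattern_lower(A, kappa, tau):
    """Lower triangular part of the structural pattern of (sparsified A)^kappa."""
    base = sparsify_pattern(A, tau)
    current = [set(row) for row in base]
    for _ in range(kappa - 1):
        # row i of the next power: union of the base rows reachable from row i
        current = [set().union(*(base[j] for j in row)) for row in current]
    return [sorted(j for j in row if j <= i) for i, row in enumerate(current)]

A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
patt1 = power_pattern_lower(A, kappa=1, tau=1e-2)
patt2 = power_pattern_lower(A, kappa=2, tau=1e-2)
```

For this tridiagonal example, raising the power from 1 to 2 adds one extra sub-diagonal to the pattern, which illustrates why the preconditioner density grows quickly with the power.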

9 FSAIPACK: Dynamic FSAI construction
- For ill-conditioned problems, high values of $\kappa$ may be needed to properly decrease the iteration count, or even to allow for convergence, and the preconditioner construction and application can become quite heavy
- A more efficient option relies on selecting the pattern dynamically by an adaptive procedure that picks, in some sense, the “best” available positions for the non-zero coefficients
- The Kaporin conditioning number $\beta$ of an SPD matrix $A$ of size $n$ is defined as $\beta(A) = \dfrac{\mathrm{tr}(A)}{n \, \det(A)^{1/n}}$, where $\beta(A) \ge 1$, and $\beta(A) = 1$ iff $A = \alpha I$
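As a small numerical illustration, the Kaporin conditioning number, i.e. the ratio between the arithmetic and the geometric mean of the eigenvalues, $\mathrm{tr}(A) / (n \det(A)^{1/n})$, can be evaluated from a Cholesky factorization. The sketch below is a dense, standard-library-only example with a hypothetical function name; it is meant for tiny matrices and is not how a sparse package would proceed.

```python
import math

def kaporin_number(A):
    """Return tr(A) / (n * det(A)**(1/n)) for a dense SPD matrix A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            acc = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("matrix is not symmetric positive definite")
                L[i][j] = math.sqrt(acc)
                log_det += 2.0 * math.log(L[i][j])   # det(A) = prod(L_ii^2)
            else:
                L[i][j] = acc / L[j][j]
    trace = sum(A[i][i] for i in range(n))
    return (trace / n) / math.exp(log_det / n)

beta_identity = kaporin_number([[2.0, 0.0], [0.0, 2.0]])
beta_diag = kaporin_number([[1.0, 0.0], [0.0, 4.0]])
```

The logarithm of the determinant is accumulated instead of the determinant itself to avoid overflow or underflow on larger matrices.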

10 FSAIPACK: Dynamic FSAI construction
- The Kaporin conditioning number of an FSAI preconditioned matrix [Janna & Ferronato 2011; Janna et al. 2014] can be written in terms of scalars $\psi_i$, where $\psi_i$ depends on the non-zero entries in the i-th row of $G$ [formulas not preserved in the transcript]
- The scalar $\psi_i$ is a quadratic form of $A$ in the i-th row of $G$
- Idea for generating the pattern dynamically: for each row, select the non-zero positions providing the largest decrease in the $\psi_i$ value
- Compute the gradient of $\psi_i$ with respect to the entries of the i-th row of $G$ and retain the positions containing the largest entries
- The procedure can be iterated until either a maximum number of iterations or some exit tolerance is met

11 FSAIPACK: Dynamic FSAI construction
- Adaptive FSAI: $S_L$ is built dynamically and $G$ immediately computed, adding $s$ entries per step to the i-th row, for at most $k_{max}$ steps or until the exit tolerance $\varepsilon$ is achieved [selection and exit criteria not preserved in the transcript]
- Dynamic construction of FSAI by an adaptive row-by-row pattern generation
- User-specified parameters needed: $k_{max}$ (integer), $s$ (integer), $\varepsilon$ (real)
- The default initial guess $G_0$ is $\mathrm{diag}(A)^{-1/2}$, but any other user-specified lower triangular matrix is possible
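The adaptive procedure for a single row can be sketched as follows. The transcript does not preserve the exact selection and exit formulas, so this sketch makes its own assumptions: the row is handled with a unit diagonal entry, the quadratic form is $\psi = f A f^T$, the gradient with respect to the entry in column j is $2 (A f^T)_j$, and the loop stops when the relative decrease of $\psi$ falls below `eps`. The function names are hypothetical and dense lists are used for brevity.

```python
import math

def solve_dense(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x

def adaptive_row(A, i, k_max, s, eps):
    """Return (pattern, scaled row) for row i, adding s positions per step."""
    n = len(A)
    f = [0.0] * n
    f[i] = 1.0                               # start from the diagonal entry only
    P = []                                   # off-diagonal positions selected so far
    psi = A[i][i]
    for _ in range(k_max):
        grad = [2.0 * sum(A[j][c] * f[c] for c in range(n)) for j in range(i)]
        cand = [j for j in range(i) if j not in P and grad[j] != 0.0]
        cand = sorted(cand, key=lambda j: -abs(grad[j]))[:s]
        if not cand:
            break
        P = sorted(P + cand)
        # minimize psi over the enlarged pattern: A[P,P] f_P = -A[P,i]
        sub = [[A[r][c] for c in P] for r in P]
        sol = solve_dense(sub, [-A[r][i] for r in P])
        f = [0.0] * n
        f[i] = 1.0
        for k, j in enumerate(P):
            f[j] = sol[k]
        idx = P + [i]
        new_psi = sum(f[r] * A[r][c] * f[c] for r in idx for c in idx)
        done = (psi - new_psi) <= eps * psi  # assumed exit criterion
        psi = new_psi
        if done:
            break
    scale = 1.0 / math.sqrt(psi)             # so that the row satisfies g A g^T = 1
    return P + [i], [v * scale for v in f]

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
cols, g = adaptive_row(A, i=2, k_max=10, s=1, eps=1e-3)
```

On this small example the first step selects column 1, the position with the largest gradient, and the second step adds column 0, which is a fill-in position outside the pattern of $A$.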

12 FSAIPACK: Dynamic FSAI construction
- Iterative FSAI: the i-th row of $G$ is computed by minimizing $\psi_i$ with an incomplete Steepest Descent method, retaining the $s$ largest entries per row, for $k_{iter}$ iterations or until the exit tolerance $\varepsilon$ is achieved [update formula not preserved in the transcript]
- As $\psi_i$ is a quadratic form of $A$ in the i-th row of $G$, it can be minimized by using a gradient method
- This gives rise to an iterative construction of $S_L$ and $G$, another kind of Dynamic FSAI
- User-specified parameters needed: $k_{iter}$ (integer), $s$ (integer), $\varepsilon$ (real)
- The default initial guess $G_0$ is $\mathrm{diag}(A)^{-1/2}$, but any other user-specified lower triangular matrix is possible
- The use of an inner preconditioner $M^{-1}$ is also allowed
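A minimal sketch of an incomplete steepest descent iteration for one row is given below. The exact update used in FSAIPACK is not preserved in the transcript, so the details are assumptions: the row has a unit diagonal entry, the step length comes from an exact line search on the quadratic form, only the `s` largest off-diagonal entries are kept after each step, and the loop stops when the relative change of the quadratic form is below `eps`. No inner preconditioner is used and the function name is hypothetical.

```python
import math

def iterative_row(A, i, k_iter, s, eps):
    """Approximate row i of G by incomplete steepest descent on psi = f A f^T."""
    n = len(A)
    f = [0.0] * n
    f[i] = 1.0
    psi = A[i][i]
    for _ in range(k_iter):
        Af = [sum(A[r][c] * f[c] for c in range(n)) for r in range(n)]
        # gradient direction restricted to the strictly lower part of the row
        r = [Af[j] if j < i else 0.0 for j in range(n)]
        rr = sum(v * v for v in r)
        if rr == 0.0:
            break
        rAr = sum(r[p] * A[p][q] * r[q] for p in range(i) for q in range(i))
        alpha = rr / rAr                      # exact line search for a quadratic
        trial = [f[j] - alpha * r[j] for j in range(n)]
        # "incomplete" step: keep only the s largest off-diagonal entries
        keep = sorted(range(i), key=lambda j: -abs(trial[j]))[:s]
        f = [trial[j] if (j == i or j in keep) else 0.0 for j in range(n)]
        new_psi = sum(f[p] * A[p][q] * f[q]
                      for p in range(i + 1) for q in range(i + 1))
        converged = abs(psi - new_psi) <= eps * psi   # assumed exit criterion
        psi = new_psi
        if converged:
            break
    scale = 1.0 / math.sqrt(psi)              # so that the row satisfies g A g^T = 1
    return [v * scale for v in f]

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
g = iterative_row(A, i=2, k_iter=5, s=1, eps=1e-8)
```

Unlike the adaptive approach, no small dense system is solved here: each iteration costs only matrix-vector products, at the price of less accurate coefficients.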

13 FSAIPACK: Recurrent FSAI construction
- Recurrent FSAI: the final factor $G$ is obtained as the product of $n_l$ factors $G_k$, where $G_k$ is the k-level preconditioning factor for the matrix preconditioned with the previous levels, with $A_0 = A$ and $G_0 = I$ [formulas not preserved in the transcript]
- Even if each factor is very sparse and computationally very cheap, the resulting preconditioner is actually very dense and never formed explicitly
- Implicit construction of the sparsity pattern $S_L$, writing the FSAI preconditioner as a product of factors
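Applying a preconditioner stored as a product of factors without ever forming the product can be sketched as below. The ordering $G = G_{n_l} \cdots G_1$ is an assumption, since the product formula is missing from the transcript, and the helper names are hypothetical. Dense lists stand in for the sparse factors.

```python
def matvec(M, v):
    """Return M v for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def matvec_transposed(M, v):
    """Return M^T v without building the transpose."""
    n = len(v)
    return [sum(M[r][c] * v[r] for r in range(n)) for c in range(n)]

def apply_recurrent(factors, v):
    """Return G^T G v with G = G_nl ... G_1 and factors = [G_1, ..., G_nl]."""
    w = v
    for Gk in factors:                 # w = G_nl ... G_1 v
        w = matvec(Gk, w)
    for Gk in reversed(factors):       # w = G_1^T ... G_nl^T w
        w = matvec_transposed(Gk, w)
    return w

G1 = [[1.0, 0.0],
      [0.5, 1.0]]
G2 = [[2.0, 0.0],
      [0.0, 1.0]]
z = apply_recurrent([G1, G2], [1.0, 1.0])
```

Each application costs two matrix-vector products per level, so the cost grows with the number of levels but the dense product is never stored.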

14 FSAIPACK: Numerical results
- Analysis of the properties of each single method on a structural test case (size = 190,581, no. of non-zeroes: 7,531,389):
- Static FSAI

15 FSAIPACK: Numerical results
- Adaptive FSAI

16 FSAIPACK: Numerical results
- Iterative FSAI

17 FSAIPACK: Numerical results
- Recurrent FSAI

18 FSAIPACK: Numerical results
- Comparison between the different methods on a Linux Cluster with 24 processors:

  Method      μ_G = 0.50          μ_G = 1.00          μ_G = 2.00
              T_p [s]  # iter.    T_p [s]  # iter.    T_p [s]  # iter.
  Static      0.20     885        0.84     858        2.68     558
  Adaptive    1.24     622        1.96     557        7.47     444
  Iterative   1.42     697        2.13     607        3.97     562
  Recurrent   2.72     617        6.64     504        13.48    426

- The most efficient option is combining the different methods so as to maximize the pros and minimize the cons
- FSAIPACK implements all the methods for building an FSAI-based preconditioner following a user-specified strategy that can be prescribed by a pseudo-programming language

19 FSAIPACK: Numerical results
- Examples and numerical results (Linux Cluster, 24 processors)
- EMILIA (reservoir mechanics): size = 923,136, non-zeroes = 41,005,206

  Strategy                            # iter.  T_p [s]  T_s [s]  T_t [s]  μ_G
  Static (κ=3, τ=1e-2)                2245     16.0     101.4    117.4    0.214
  Adaptive (k_max=10, s=5, ε=1e-2)    897      9.5      43.9     53.4     0.323
  Iterative (k_iter=20, s=10)         1332     27.8     59.1     86.9     0.213
  Static + Adaptive                   861      6.2      34.3     40.5     0.270
  Iterative + Static + Adaptive       675      9.3      33.8     43.1     0.332

- Note: post-filtration is used in all cases

20 FSAIPACK: Numerical results
- STOCF (porous media flow): size = 1,465,137, non-zeroes = 21,005,389

  Strategy                            # iter.  T_p [s]  T_s [s]  T_t [s]  μ_G
  Static (κ=4, τ=1e-2)                736      1.6      24.8     26.4     0.329
  Adaptive (k_max=20, s=1, ε=1e-3)    360      3.4      13.4     16.8     0.476
  Iterative (k_iter=10, s=10)         1204     5.3      40.9     46.2     0.387
  Iterative+Static+S.P. Iterative     191      7.0      7.7      14.7     0.626
  Static+S.P. Iterative+Adaptive      220      4.8      8.8      13.6     0.590

- Note: post-filtration is used in all cases

21 FSAIPACK: Numerical results
- MECH (structural mechanics): size = 1,102,614, non-zeroes = 48,987,558

  Strategy                            # iter.  T_p [s]  T_s [s]  T_t [s]  μ_G
  Static (κ=3, τ=1e-2)                2208     23.1     119.4    142.5    0.238
  Adaptive (k_max=25, s=2, ε=1e-3)    681      25.5     40.2     65.7     0.317
  Iterative (k_iter=30, s=10)         1981     40.0     102.0    142.0    0.187
  Static+S.P. Iterative+Adaptive      661      16.5     39.3     55.8     0.305
  Iterative+Adaptive                  689      14.9     39.8     54.7     0.294

- Note: post-filtration is used in all cases

22 FSAIPACK: Numerical results
- Example of strategy prescribed using the pseudo-programming language:

  > MK_PATTERN [ A : patt ] -t -k 1e-2 2
  > STATIC_FSAI [ A, patt : F ]
  > TRANSP_FSAI [ F : Ft ]
  > PROJ_FSAI [ A, F, Ft : F ] -n -s -e 1 10 1e-8
  > ADAPT_FSAI [ A : F ] -n -s -e 10 1 1e-3
  > POST_FILT [ A : F ] -t 0.01
  > TRANSP_FSAI [ F : Ft ]
  > APPEND_FSAI [ F, Ft : PREC ]

- Complex strategies are also easy to manage

23 FSAIPACK: Numerical results
- FSAIPACK scalability on the largest example
- Test on an IBM Blue Gene/Q node equipped with 16 cores
- Between 16 and 64 threads the ideal profile is flat because all physical cores are saturated
- Using more threads than cores is convenient, as it hides memory access latencies

24 Conclusions: Results…
- FSAI-based approaches are attractive preconditioners for an efficient solution of SPD linear systems on parallel computers
- The traditional static pattern generation is fast and cheap, but can give rise to poor preconditioners
- The dynamic pattern generation can considerably improve the FSAI quality, especially in ill-conditioned problems, but its cost typically increases quite rapidly with the density of the preconditioner
- FSAIPACK is a high performance software package for building an FSAI-based preconditioner using a user-specified strategy that combines different methods for selecting the sparsity pattern
- A smart combination of static and dynamic pattern generation techniques is probably the most efficient way to build an effective preconditioner even for very ill-conditioned problems

25 Conclusions: … and future work
- Generalizing the results to non-symmetric linear systems: difficulties with existence and uniqueness of the preconditioner, and with an efficient dynamic pattern generation
- Implementing the FSAIPACK library also for distributed memory computers and GPU accelerators, mixing OpenMP, MPI and CUDA
- Studying in more detail the Iterative FSAI construction:
  - Analysis of the theoretical properties of Incomplete gradient methods
  - Replace the Incomplete Steepest Descent method with an Incomplete Self-Preconditioned Conjugate Gradient method
  - Understand why the pattern is generally good, even though the computed coefficients could be inaccurate
- FSAIPACK is freely available online at: http://www.dmsa.unipd.it/~janna/software.html

26 Department ICEA Thank you for your attention Sparse Days 2014 June 5-6

