Published by Owen Holland. Modified over 9 years ago.
1
A Factored Sparse Approximate Inverse software package (FSAIPACK) for the parallel preconditioning of linear systems
Massimiliano Ferronato, Carlo Janna, Giuseppe Gambolati, Flavio Sartoretto
Department ICEA
Sparse Days 2014, June 5-6
2
Outline
- Introduction: preconditioning techniques for high performance computing
- Approximate inverse preconditioning for Symmetric Positive Definite matrices: the FSAI-based approach
- FSAIPACK: a software package for high performance FSAI preconditioning
- Numerical results
- Conclusions and future work
3
Introduction: Preconditioning techniques for high performance computing
- The implementation of large models is becoming increasingly common in several applications, making the use of parallel computational resources almost mandatory.
- One of the most expensive and memory-consuming tasks in any numerical application is the solution of large and sparse linear systems.
- Conjugate Gradient-like solution methods can be efficiently implemented on parallel computers provided that an effective parallel preconditioner is available.
- Algebraic preconditioners: robust algorithms that generate a preconditioner from the knowledge of the system matrix only, independently of the problem it arises from.
- Most popular and successful classes of preconditioners:
  - Incomplete LU factorizations
  - Approximate inverses
  - Algebraic multigrid
4
Introduction: Preconditioning techniques for high performance computing
- For parallel computations the Factorized Sparse Approximate Inverse (FSAI) approach is quite attractive, as it is «naturally» parallel.
- FSAIPACK: a parallel software package for high performance FSAI preconditioning in the solution of Symmetric Positive Definite linear systems:
  - Collection of routines that implement several different existing methods for computing an FSAI-based preconditioner
  - Allows for a very flexible user-specified construction of a parallel FSAI preconditioner
  - General purpose package, easy to include as an external library in any existing code
  - Currently coded in Fortran 90 with OpenMP directives for shared memory machines
  - Freely available online at www.dmsa.unipd.it/~janna/software.html
5
The FSAI-based approach: FSAI definition
- Factorized Sparse Approximate Inverse (FSAI): an almost perfectly parallel factored preconditioner for SPD problems [Kolotilina & Yeremin, 1993]: $M^{-1} = G^T G$, with $G$ a lower triangular matrix such that $\|I - GL\|_F \to \min$ over the set of matrices with a prescribed lower triangular sparsity pattern $S_L$, e.g. the pattern of $A$ or $A^2$, where $L$ is the exact Cholesky factor of $A$.
- $L$ is not actually required for computing $G$!
- $G$ is computed via the solution of $n$ independent small dense systems and applied via matrix-vector products.
- Nice features: (1) ideally perfect parallel construction and application of the preconditioner; (2) preservation of the positive definiteness of the native matrix.
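To see why the Cholesky factor never has to be formed, the Frobenius-norm minimization can be written out row by row. The following is a sketch of the standard argument. The symbols $g_i$ (row $i$ of $G$ as a column vector), $\bar g_i$ (its entries at the pattern positions $P_i$, with $i$ last), $l_{ii}$ (diagonal entry of $L$) and $e_{m_i}$ (last unit vector of size $m_i = |P_i|$) are notation introduced here.

```latex
% The norm splits into n independent row problems.
\|I - GL\|_F^2 = \sum_{i=1}^{n} \| e_i^T - g_i^T L \|_2^2 ,
\qquad
\| e_i^T - g_i^T L \|_2^2 = 1 - 2\, l_{ii}\, g_{ii} + g_i^T A\, g_i ,
\quad A = L L^T .
% Setting the derivative with respect to the entries in P_i to zero:
A[P_i, P_i]\, \bar g_i = l_{ii}\, e_{m_i} .
```

Only $A$ appears on the left-hand side. The unknown $l_{ii}$ merely rescales row $i$, so it can be dropped and the row normalized afterwards, for instance so that $(GAG^T)_{ii} = 1$.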
6
The FSAI-based approach: FSAI definition
- The key property for the quality of any FSAI-based parallel preconditioner is the selection of the sparsity pattern $S_L$.
- Historically, the first idea was to define $S_L$ a priori, but more effective strategies can be developed by dynamically selecting the position of the non-zero entries in $S_L$.
- Static FSAI: $S_L$ is defined a priori, e.g., as the pattern of $A^k$, possibly after a sparsification of $A$ [Huckle 1999; Chow 2000, 2001].
- Dynamic FSAI: $S_L$ is defined dynamically during the computation of $G$ using some optimization algorithm [Huckle 2003; Janna & Ferronato, 2011].
- Recurrent FSAI: the FSAI factor $G$ is defined as the product of several factors, computed either statically or dynamically [Wang & Zhang 2003; Bergamaschi & Martinez 2012].
- Post-filtration: it is generally recommended to apply an a posteriori sparsification of $G$, dropping the smallest entries [Kolotilina & Yeremin, 1999].
7
FSAIPACK: Static FSAI construction
- FSAIPACK is a software library that collects several different ways of computing an FSAI preconditioner in a shared memory environment and allows the construction techniques to be combined into original user-specified strategies.
- Assuming that $S_L$ is given, it is possible to compute $G$.
- Static FSAI: denote by $P_i$ the set of column indices belonging to the i-th row of $S_L$, and by $m_i$ its size. Compute the vector $\tilde g_i$ by solving the $m_i \times m_i$ linear system $A[P_i, P_i]\,\tilde g_i = e_{m_i}$, and scale it to obtain the dense i-th row of $G$: $g_i = \tilde g_i / \sqrt{\tilde g_{i,m_i}}$.
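The row-by-row construction can be illustrated with a minimal sketch in pure Python. It is not FSAIPACK code (the package is written in Fortran and uses sparse storage). The names `solve_spd` and `static_fsai`, the dense list-of-lists matrices, and the convention that each pattern row is sorted and ends with the diagonal index are assumptions made for illustration.

```python
import math

def solve_spd(M, b):
    """Solve M x = b for a small dense SPD matrix via Cholesky factorization."""
    n = len(M)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            C[i][j] = math.sqrt(s) if i == j else s / C[j][j]
    y = [0.0] * n
    for i in range(n):                      # forward substitution: C y = b
        y[i] = (b[i] - sum(C[i][k] * y[k] for k in range(i))) / C[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):            # back substitution: C^T x = y
        x[i] = (y[i] - sum(C[k][i] * x[k] for k in range(i + 1, n))) / C[i][i]
    return x

def static_fsai(A, pattern):
    """Compute G for a given lower triangular pattern.

    pattern[i] holds the sorted column indices of row i, ending with i.
    Every row is independent of the others, hence the natural parallelism.
    """
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for i, P in enumerate(pattern):
        sub = [[A[r][c] for c in P] for r in P]        # A[P_i, P_i]
        rhs = [0.0] * (len(P) - 1) + [1.0]             # last unit vector
        g = solve_spd(sub, rhs)
        scale = 1.0 / math.sqrt(g[-1])                 # gives (G A G^T)_ii = 1
        for c, v in zip(P, g):
            G[i][c] = v * scale
    return G

# Usage: tridiagonal SPD matrix with the lower triangular pattern of A itself.
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
G = static_fsai(A, [[0], [0, 1], [1, 2]])
```

With the full lower triangular pattern the same routine returns the inverse of the Cholesky factor, so $GAG^T$ becomes the identity. With a sparser pattern only the unit diagonal is guaranteed.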
8
FSAIPACK: Static pattern generation
- Static pattern generation: $S_L$ is the lower triangular pattern of a power $k$ of $A$, or of a sparsified $A$ in which the entries that are small with respect to a threshold $\tau$ are dropped.
- The non-zero pattern for the Static FSAI computation can be generated with the aid of a recurrence over the successive powers.
- User-specified parameters needed: $k$ (integer), $\tau$ (real).
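A possible way to generate such a pattern is sketched below in pure Python. The drop rule (keep $a_{ij}$ when $|a_{ij}| \ge \tau \sqrt{|a_{ii} a_{jj}|}$) is an assumption, chosen because it is a common scaled-threshold rule; the rule actually used by FSAIPACK is not stated here. The name `static_pattern` and the dense input format are also hypothetical.

```python
import math

def static_pattern(A, k, tau):
    """Lower triangular pattern of the k-th power of a sparsified A.

    Returns, for every row i, the sorted column indices j <= i.
    """
    n = len(A)
    # Pattern of the sparsified matrix: the diagonal is always kept,
    # off-diagonal entries only if non-zero and above the scaled threshold
    # (assumed rule, see the text above).
    base = [
        {j for j in range(n)
         if j == i
         or (A[i][j] != 0.0
             and abs(A[i][j]) >= tau * math.sqrt(abs(A[i][i] * A[j][j])))}
        for i in range(n)
    ]
    # Recurrence on the powers: row i of the next power is the union of the
    # base rows indexed by the current row i (structural product only).
    power = [set(row) for row in base]
    for _ in range(k - 1):
        power = [set().union(*(base[j] for j in power[i])) for i in range(n)]
    return [sorted(j for j in power[i] if j <= i) for i in range(n)]

# Usage on a 4 x 4 tridiagonal matrix.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
pattern_k2 = static_pattern(A, 2, 0.0)
```

Working on index sets rather than numerical values means cancellations are ignored, which is the usual choice when only the pattern is needed.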
9
FSAIPACK: Dynamic FSAI construction
- For ill-conditioned problems high values of $k$ may be needed to properly decrease the iteration count, or even to allow for convergence, and the preconditioner construction and application can become quite heavy.
- A more efficient option relies on selecting the pattern dynamically by an adaptive procedure that picks, in some sense, the “best” available positions for the non-zero coefficients.
- The Kaporin conditioning number of an SPD matrix $A$ is defined as $\kappa(A) = \frac{\frac{1}{n}\,\mathrm{tr}(A)}{\det(A)^{1/n}}$, where $\kappa(A) \ge 1$ and $\kappa(A) = 1$ iff $A = \alpha I$.
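As a small numerical illustration, the sketch below evaluates the Kaporin number of a dense SPD matrix using its standard definition: the mean of the diagonal divided by the n-th root of the determinant. The determinant is taken from a Cholesky factorization. The function name `kaporin_number` and the dense list-of-lists storage are assumptions for illustration only.

```python
import math

def kaporin_number(A):
    """Return (trace(A) / n) / det(A)**(1 / n) for a dense SPD matrix A."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]       # Cholesky factor, A = C C^T
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                C[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(C[i][i])   # det(A) = prod C_ii^2
            else:
                C[i][j] = s / C[j][j]
    mean_diag = sum(A[i][i] for i in range(n)) / n
    return mean_diag / math.exp(log_det / n)

# Usage: a multiple of the identity gives 1, anything else gives more.
k_identity = kaporin_number([[3.0, 0.0], [0.0, 3.0]])
k_other = kaporin_number([[2.0, 1.0], [1.0, 2.0]])
```

The value is the ratio between the arithmetic and geometric means of the eigenvalues, which is why it is bounded below by 1. Accumulating the logarithm of the determinant avoids overflow for larger matrices.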
10
FSAIPACK: Dynamic FSAI construction
- The Kaporin conditioning number of an FSAI preconditioned matrix [Janna & Ferronato 2011; Janna et al. 2014] can be written in terms of $n$ scalars $\psi_i$, where $\psi_i$ depends only on the non-zero entries in the i-th row of $G$.
- The scalar $\psi_i$ is a quadratic form of $A$ in the entries of the i-th row.
- Idea for generating the pattern dynamically: for each row, select the non-zero positions providing the largest decrease in the $\psi_i$ value.
- Compute the gradient of $\psi_i$ with respect to the row entries and retain the positions containing the largest entries.
- The procedure can be iterated until either a maximum number of iterations or some exit tolerance is met.
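One way to make the quadratic form and its gradient explicit is sketched below. The notation is an assumption made here: row $i$ of $G$ is written as $d_i\,[h_i^T,\ 1]$, where $h_i$ holds the off-diagonal entries at the positions $\bar P_i$ (all smaller than $i$) and $d_i = \psi_i^{-1/2}$, so that $GAG^T$ has unit diagonal.

```latex
\psi_i(h_i) = a_{ii} + 2\, h_i^T A[\bar P_i, i] + h_i^T A[\bar P_i, \bar P_i]\, h_i ,
\qquad
\kappa(GAG^T) = \frac{\left( \prod_{i=1}^{n} \psi_i \right)^{1/n}}{\det(A)^{1/n}} .
% Gradient with respect to a candidate position j < i (entries outside \bar P_i are zero):
\frac{\partial \psi_i}{\partial h_{ij}} = 2 \Big( a_{ji} + \sum_{l \in \bar P_i} a_{jl}\, h_{il} \Big) .
% Minimizer on a fixed pattern:
A[\bar P_i, \bar P_i]\, h_i = -A[\bar P_i, i] .
```

Under this scaling the trace of $GAG^T$ divided by $n$ equals 1 and $\det(A)$ is fixed, so lowering each $\psi_i$ independently lowers the Kaporin number. This is why the rows can be treated separately and in parallel.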
11
FSAIPACK: Dynamic FSAI construction
- Dynamic construction of FSAI by an adaptive pattern generation, row by row.
- Adaptive FSAI: $S_L$ is built dynamically and $G$ immediately computed, adding $s$ entries per step to the i-th row, chosen among the positions with the largest gradient components of the scalar $\psi_i$ entering the Kaporin number, for at most $k_{max}$ steps or until the exit tolerance $\varepsilon$ is achieved.
- User-specified parameters needed: $k_{max}$ (integer), $s$ (integer), $\varepsilon$ (real).
- The default initial guess $G_0$ is $\mathrm{diag}(A)^{-1/2}$, but any other user-specified lower triangular matrix is possible.
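The adaptive procedure for a single row can be sketched as follows, in pure Python with dense storage. This is an illustration under stated assumptions, not the FSAIPACK implementation. The exit test (relative decrease of $\psi_i$ in one step not larger than `eps`) is assumed, the row is written as $[h, 1]/\sqrt{\psi_i}$, and the names `solve_spd` and `adaptive_fsai_row` are hypothetical.

```python
import math

def solve_spd(M, b):
    """Solve M x = b for a small dense SPD matrix via Cholesky factorization."""
    n = len(M)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            t = M[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            C[i][j] = math.sqrt(t) if i == j else t / C[j][j]
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(C[i][k] * y[k] for k in range(i))) / C[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(C[k][i] * x[k] for k in range(i + 1, n))) / C[i][i]
    return x

def adaptive_fsai_row(A, i, k_max, s, eps):
    """Return row i of G (length i + 1), growing its pattern adaptively."""
    P, h = [], []                 # chosen off-diagonal columns and their values
    psi = A[i][i]                 # psi_i for the diagonal-only initial guess
    for _ in range(k_max):
        # Gradient of psi_i at the candidate positions not yet in the pattern.
        grad = {j: 2.0 * (A[j][i] + sum(A[j][l] * v for l, v in zip(P, h)))
                for j in range(i) if j not in P}
        best = sorted((j for j in grad if grad[j] != 0.0),
                      key=lambda j: abs(grad[j]), reverse=True)[:s]
        if not best:
            break                 # nothing left that can reduce psi_i
        P = sorted(P + best)
        # Exact minimizer of psi_i on the enlarged pattern.
        h = solve_spd([[A[r][c] for c in P] for r in P], [-A[r][i] for r in P])
        psi_new = A[i][i] + sum(v * A[l][i] for l, v in zip(P, h))
        converged = psi - psi_new <= eps * psi     # assumed exit test
        psi = psi_new
        if converged:
            break
    row = [0.0] * (i + 1)
    for l, v in zip(P, h):
        row[l] = v
    row[i] = 1.0
    return [v / math.sqrt(psi) for v in row]

# Usage: last row of a 4 x 4 tridiagonal matrix, one new entry per step.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
row3 = adaptive_fsai_row(A, 3, 2, 1, 1e-3)
```

Re-solving the small dense system after every enlargement keeps the row optimal on its current pattern, at the price of a cost that grows with the number of retained entries.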
12
FSAIPACK: Dynamic FSAI construction
- Iterative FSAI: the i-th row of $G$ is computed by minimizing $\psi_i$ with an incomplete Steepest Descent method, retaining the $s$ largest entries per row, for $k_{iter}$ iterations or until the exit tolerance $\varepsilon$ is achieved.
- As $\psi_i$ is a quadratic form of $A$ in the i-th row of $G$, it can be minimized by using a gradient method.
- This gives rise to an iterative construction of $S_L$ and $G$, another kind of Dynamic FSAI.
- User-specified parameters needed: $k_{iter}$ (integer), $s$ (integer), $\varepsilon$ (real).
- The default initial guess $G_0$ is $\mathrm{diag}(A)^{-1/2}$, but any other user-specified lower triangular matrix is possible.
- The use of an inner preconditioner $M^{-1}$ is also allowed.
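A minimal sketch of an incomplete steepest descent iteration for one row is given below, in pure Python with dense storage and without inner preconditioner. Several details are assumptions: the row is parametrized as $[h, 1]/\sqrt{\psi_i}$, the dropping keeps the `s` entries of largest magnitude after each step, a step that does not reduce $\psi_i$ is rejected, and the exit test is a relative decrease not larger than `eps`. The name `iterative_fsai_row` is hypothetical.

```python
import math

def iterative_fsai_row(A, i, k_iter, s, eps):
    """Return row i of G (length i + 1) from an incomplete steepest descent."""
    a = [A[j][i] for j in range(i)]          # column i of A above the diagonal

    def times_A(v):                          # leading i x i block of A times v
        return [sum(A[j][l] * v[l] for l in range(i)) for j in range(i)]

    def psi(v):                              # psi_i for the unscaled row [v, 1]
        Av = times_A(v)
        return A[i][i] + sum(v[j] * (2.0 * a[j] + Av[j]) for j in range(i))

    h = [0.0] * i                            # initial guess: diagonal only
    psi_old = psi(h)
    for _ in range(k_iter):
        Ah = times_A(h)
        r = [-(a[j] + Ah[j]) for j in range(i)]      # minus half the gradient
        Ar = times_A(r)
        rAr = sum(x * y for x, y in zip(r, Ar))
        if rAr <= 0.0:
            break                                    # gradient is zero
        alpha = sum(x * x for x in r) / rAr          # exact line search
        trial = [h[j] + alpha * r[j] for j in range(i)]
        # Incomplete step: keep only the s entries of largest magnitude.
        keep = set(sorted(range(i), key=lambda j: abs(trial[j]),
                          reverse=True)[:s])
        trial = [trial[j] if j in keep else 0.0 for j in range(i)]
        psi_new = psi(trial)
        if psi_new >= psi_old:
            break                                    # dropping undid the gain
        converged = psi_old - psi_new <= eps * psi_old   # assumed exit test
        h, psi_old = trial, psi_new
        if converged:
            break
    return [v / math.sqrt(psi_old) for v in h + [1.0]]

# Usage: rows of a 4 x 4 tridiagonal matrix with at most 2 off-diagonal entries.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
rows = [iterative_fsai_row(A, i, 20, 2, 1e-8) for i in range(4)]
```

Unlike the adaptive construction, no dense system is solved here. The coefficients are therefore only approximate minimizers on the retained pattern, while the final scaling still guarantees a unit diagonal for $GAG^T$.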
13
FSAIPACK: Recurrent FSAI construction
- Implicit construction of the sparsity pattern $S_L$, writing the FSAI preconditioner as a product of factors.
- Recurrent FSAI: the final factor $G$ is obtained as the product of $n_l$ factors, $G = G_{n_l} G_{n_l-1} \cdots G_1$, where $G_k$ is the k-th level preconditioning factor for $A_{k-1} = G_{k-1} A_{k-2} G_{k-1}^T$, with $A_0 = A$ and $G_0 = I$.
- Even if each factor is very sparse and computationally very cheap, the resulting preconditioner is actually very dense and is never formed explicitly: $M^{-1} = G_1^T \cdots G_{n_l}^T\, G_{n_l} \cdots G_1$.
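Applying a product-form preconditioner without ever forming the dense product amounts to a chain of matrix-vector products. The sketch below shows this in pure Python. Dense list-of-lists factors are used only to keep the example short (a real code would use sparse storage), and the function names are hypothetical.

```python
def matvec(M, v):
    """Return M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matvec_t(M, v):
    """Return M^T v without building the transpose."""
    return [sum(M[r][c] * v[r] for r in range(len(M))) for c in range(len(M[0]))]

def apply_recurrent_fsai(factors, v):
    """Apply G_1^T ... G_nl^T G_nl ... G_1 to v, with factors = [G_1, ..., G_nl]."""
    w = list(v)
    for G in factors:                 # G_1 first, G_nl last
        w = matvec(G, w)
    for G in reversed(factors):       # G_nl^T first, G_1^T last
        w = matvec_t(G, w)
    return w

# Usage with two small lower triangular factors.
G1 = [[1, 0], [2, 1]]
G2 = [[1, 0], [3, 1]]
w = apply_recurrent_fsai([G1, G2], [1, 1])
```

The cost per application is the sum of the costs of the individual sparse factors, which is what makes the recurrent form cheap even though the explicit product would be dense.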
14
FSAIPACK: Numerical results
Analysis of the properties of each single method on a structural test case (size = 190,581; non-zeroes = 7,531,389).
Static FSAI
15
FSAIPACK: Numerical results
Adaptive FSAI
16
FSAIPACK: Numerical results
Iterative FSAI
17
FSAIPACK: Numerical results
Recurrent FSAI
18
FSAIPACK: Numerical results
Comparison between the different methods on a Linux cluster with 24 processors:

            μ_G = 0.50          μ_G = 1.00          μ_G = 2.00
Method      T_p [s]   # iter.   T_p [s]   # iter.   T_p [s]   # iter.
Static      0.20      885       0.84      858       2.68      558
Adaptive    1.24      622       1.96      557       7.47      444
Iterative   1.42      697       2.13      607       3.97      562
Recurrent   2.72      617       6.64      504       13.48     426

- The most efficient option is to combine the different methods so as to maximize the pros and minimize the cons.
- FSAIPACK implements all the methods for building an FSAI-based preconditioner following a user-specified strategy that can be prescribed by a pseudo-programming language.
19
FSAIPACK: Numerical results
Examples and numerical results (Linux cluster, 24 processors)
EMILIA (reservoir mechanics): size = 923,136; non-zeroes = 41,005,206

Strategy                            # iter.   T_p [s]   T_s [s]   T_t [s]   μ_G
Static (k=3, τ=1e-2)                2245      16.0      101.4     117.4     0.214
Adaptive (k_max=10, s=5, ε=1e-2)    897       9.5       43.9      53.4      0.323
Iterative (k_iter=20, s=10)         1332      27.8      59.1      86.9      0.213
Static + Adaptive                   861       6.2       34.3      40.5      0.270
Iterative + Static + Adaptive       675       9.3       33.8      43.1      0.332

Note: post-filtration is used in all cases.
20
FSAIPACK: Numerical results
STOCF (porous media flow): size = 1,465,137; non-zeroes = 21,005,389

Strategy                            # iter.   T_p [s]   T_s [s]   T_t [s]   μ_G
Static (k=4, τ=1e-2)                736       1.6       24.8      26.4      0.329
Adaptive (k_max=20, s=1, ε=1e-3)    360       3.4       13.4      16.8      0.476
Iterative (k_iter=10, s=10)         1204      5.3       40.9      46.2      0.387
Iterative + Static + S.P. Iterative 191       7.0       7.7       14.7      0.626
Static + S.P. Iterative + Adaptive  220       4.8       8.8       13.6      0.590

Note: post-filtration is used in all cases.
21
FSAIPACK: Numerical results
MECH (structural mechanics): size = 1,102,614; non-zeroes = 48,987,558

Strategy                            # iter.   T_p [s]   T_s [s]   T_t [s]   μ_G
Static (k=3, τ=1e-2)                2208      23.1      119.4     142.5     0.238
Adaptive (k_max=25, s=2, ε=1e-3)    681       25.5      40.2      65.7      0.317
Iterative (k_iter=30, s=10)         1981      40.0      102.0     142.0     0.187
Static + S.P. Iterative + Adaptive  661       16.5      39.3      55.8      0.305
Iterative + Adaptive                689       14.9      39.8      54.7      0.294

Note: post-filtration is used in all cases.
22
FSAIPACK: Numerical results
Example of strategy prescribed using the pseudo-programming language:
> MK_PATTERN [ A : patt ] -t -k 1e-2 2
> STATIC_FSAI [ A, patt : F ]
> TRANSP_FSAI [ F : Ft ]
> PROJ_FSAI [ A, F, Ft : F ] -n -s -e 1 10 1e-8
> ADAPT_FSAI [ A : F ] -n -s -e 10 1 1e-3
> POST_FILT [ A : F ] -t 0.01
> TRANSP_FSAI [ F : Ft ]
> APPEND_FSAI [ F, Ft : PREC ]
Easy management also of complex strategies.
23
FSAIPACK: Numerical results
- FSAIPACK scalability on the largest example.
- Test on an IBM Blue Gene/Q node equipped with 16 cores.
- Between 16 and 64 threads the ideal profile is flat because all physical cores are saturated.
- Using more threads than cores is convenient because it hides memory access latencies.
24
Conclusions: Results…
- FSAI-based approaches are attractive preconditioners for an efficient solution of SPD linear systems on parallel computers.
- The traditional static pattern generation is fast and cheap, but can give rise to poor preconditioners.
- The dynamic pattern generation can considerably improve the FSAI quality, especially in ill-conditioned problems, but its cost typically increases quite rapidly with the density of the preconditioner.
- FSAIPACK is a high performance software package for building an FSAI-based preconditioner using a user-specified strategy that combines different methods for selecting the sparsity pattern.
- A smart combination of static and dynamic pattern generation techniques is probably the most efficient way to build an effective preconditioner, even for very ill-conditioned problems.
25
Conclusions: … and future work
- Generalizing the results to non-symmetric linear systems: difficulties with existence and uniqueness of the preconditioner, and with an efficient dynamic pattern generation.
- Implementing the FSAIPACK library also for distributed memory computers and GPU accelerators, mixing OpenMP, MPI and CUDA.
- Studying in more detail the Iterative FSAI construction:
  - Analysis of the theoretical properties of incomplete gradient methods
  - Replace the Incomplete Steepest Descent method with an Incomplete Self-Preconditioned Conjugate Gradient method
  - Understand why the pattern is generally good, even though the computed coefficients could be inaccurate
- FSAIPACK is freely available online at: http://www.dmsa.unipd.it/~janna/software.html
26
Thank you for your attention