1
Trends in Sparse Linear Research and Software Developments in France
P. Amestoy (1), M. Daydé (1), I. Duff (2), L. Giraud (1), A. Haidar (3), S. Lanteri (4), J.-Y. L'Excellent (5), P. Ramet (6)
(1) IRIT – INPT / ENSEEIHT   (2) CERFACS & RAL   (3) CERFACS   (4) INRIA, nachos project-team   (5) INRIA / LIP-ENSL   (6) INRIA Futurs / LaBRI
2
Overview of sparse linear algebra techniques (from Iain Duff, CERFACS & RAL)
3
Introduction
Solution of Ax = b, where A is large (of order 10^6 or greater) and sparse.
4
Direct methods
Key idea: factorize the matrix into a product of matrices that are easy to invert (triangular), with possibly some permutations to preserve sparsity and maintain numerical stability.
E.g. Gaussian elimination: PAQ = LU, where P and Q are row/column permutations, L is lower triangular (sparse) and U is upper triangular (sparse).
Then forward/backward substitution: Ly = Pb, UQ^T x = y (see the sketch below).
Good points: can solve very large problems, even in 3D; generally fast, even on fancy machines; robust and well packaged.
Bad points: need A either element-wise or assembled, which can lead to very high storage requirements; can be costly.
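To make the factorize-then-substitute pattern concrete, here is a minimal C sketch on a small dense matrix. It is illustrative only: it keeps the PA = LU, Ly = Pb, Ux = y structure, but ignores sparsity, the fill-reducing column permutation Q, and everything a real sparse direct solver does to exploit the nonzero structure.

```c
#include <stdio.h>
#include <math.h>

#define N 3

/* In-place LU factorization with partial (row) pivoting: PA = LU.
 * On exit, a[][] holds L (unit lower triangular, below the diagonal) and U
 * (upper triangular), and piv[] records the successive row interchanges P. */
static int lu_factor(double a[N][N], int piv[N]) {
    for (int k = 0; k < N; k++) {
        int p = k;                           /* find the pivot row */
        for (int i = k + 1; i < N; i++)
            if (fabs(a[i][k]) > fabs(a[p][k])) p = i;
        if (a[p][k] == 0.0) return 1;        /* matrix is singular */
        piv[k] = p;
        for (int j = 0; j < N; j++) {        /* swap rows k and p */
            double t = a[k][j]; a[k][j] = a[p][j]; a[p][j] = t;
        }
        for (int i = k + 1; i < N; i++) {    /* eliminate below the pivot */
            a[i][k] /= a[k][k];
            for (int j = k + 1; j < N; j++)
                a[i][j] -= a[i][k] * a[k][j];
        }
    }
    return 0;
}

/* Solve Ax = b using the factors: apply P to b, then Ly = Pb, then Ux = y. */
static void lu_solve(double a[N][N], const int piv[N], double x[N]) {
    for (int k = 0; k < N; k++) {            /* x <- Pb */
        double t = x[k]; x[k] = x[piv[k]]; x[piv[k]] = t;
    }
    for (int i = 1; i < N; i++)              /* forward substitution (unit L) */
        for (int j = 0; j < i; j++) x[i] -= a[i][j] * x[j];
    for (int i = N - 1; i >= 0; i--) {       /* backward substitution */
        for (int j = i + 1; j < N; j++) x[i] -= a[i][j] * x[j];
        x[i] /= a[i][i];
    }
}

int main(void) {
    double A[N][N] = {{2, 1, 1}, {4, 3, 3}, {8, 7, 9}};
    double b[N] = {4, 10, 24};               /* exact solution: x = (1, 1, 1) */
    int piv[N];
    if (lu_factor(A, piv) == 0) {
        lu_solve(A, piv, b);
        printf("x = (%g, %g, %g)\n", b[0], b[1], b[2]);
    }
    return 0;
}
```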
5
Iterative methods
Generate a sequence of vectors converging towards the solution of the linear system.
E.g. iterative methods based on Krylov subspaces: K_k(A; r_0) = span{r_0, Ar_0, ..., A^{k-1} r_0}, where r_0 = b − Ax_0.
The idea is then to choose a suitable x_k ∈ x_0 + K_k(A; r_0), for example so that it minimizes ||b − Ax_k||_2 (GMRES). There are many Krylov methods, depending on the criterion used for choosing x_k (a small example follows below).
Good points: may not require A to be formed explicitly; usually very low storage requirements; can be efficient on 3D problems.
Bad points: may require many iterations to converge; may require preconditioning.
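As a concrete, if simplified, illustration of a Krylov method, the following C sketch runs the conjugate gradient iteration on a matrix stored in compressed sparse row (CSR) format. CG is the simplest Krylov method and applies to symmetric positive definite A; GMRES, quoted above, handles general matrices but is longer to write out.

```c
#include <stdio.h>
#include <math.h>

/* Sparse matrix-vector product y = A*x, with A in CSR format (n rows). */
static void spmv(int n, const int *rowptr, const int *col, const double *val,
                 const double *x, double *y) {
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            s += val[k] * x[col[k]];
        y[i] = s;
    }
}

static double dot(int n, const double *a, const double *b) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* Unpreconditioned conjugate gradient for SPD A: builds x_k in x_0 + K_k(A; r_0). */
static int cg(int n, const int *rowptr, const int *col, const double *val,
              const double *b, double *x, double tol, int maxit) {
    double r[64], p[64], Ap[64];             /* sketch: assumes n <= 64 */
    spmv(n, rowptr, col, val, x, Ap);
    for (int i = 0; i < n; i++) r[i] = b[i] - Ap[i], p[i] = r[i];
    double rho = dot(n, r, r);
    for (int it = 0; it < maxit; it++) {
        if (sqrt(rho) < tol) return it;      /* residual small enough: converged */
        spmv(n, rowptr, col, val, p, Ap);
        double alpha = rho / dot(n, p, Ap);
        for (int i = 0; i < n; i++) x[i] += alpha * p[i], r[i] -= alpha * Ap[i];
        double rho_new = dot(n, r, r);
        for (int i = 0; i < n; i++) p[i] = r[i] + (rho_new / rho) * p[i];
        rho = rho_new;
    }
    return maxit;
}

int main(void) {
    /* 1D Laplacian, n = 5: tridiag(-1, 2, -1) stored in CSR. */
    int rowptr[] = {0, 2, 5, 8, 11, 13};
    int col[]    = {0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4};
    double val[] = {2, -1, -1, 2, -1, -1, 2, -1, -1, 2, -1, -1, 2};
    double b[5]  = {1, 0, 0, 0, 1};          /* exact solution: x = (1,1,1,1,1) */
    double x[5]  = {0, 0, 0, 0, 0};
    int its = cg(5, rowptr, col, val, b, x, 1e-10, 100);
    printf("converged in %d iterations, x[2] = %g\n", its, x[2]);
    return 0;
}
```

Note that the matrix enters only through the matrix-vector product spmv(), which is why Krylov methods do not require A to be formed or factorized explicitly.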
6
Hybrid methods
Not just preconditioning: ILU, sparse approximate inverse or incomplete Cholesky already reuse core techniques of direct methods.
We focus here on using a direct method/code combined with an iterative method.
7
Hybrid methods (cont'd)
Generic examples of hybrid methods are:
Domain decomposition, using a direct method on the local subdomains and/or a direct preconditioner on the interface
Block iterative methods, with a direct solver on the subblocks
Factorization of a nearby problem used as a preconditioner
8
Sparse direct methods
9
Sparse Direct Solvers
Usually three steps:
Pre-processing: symbolic factorization
Numerical factorization
Forward-backward substitution
Typical chain: simulate a phenomenon → solve a sparse linear system (~100 MB) → factorize the matrix (~10 GB).
MUMPS: 71 GFlops on 512 processors of a CRAY T3E.
10
MUMPS and sparse direct methods
MUMPS team
http://graal.ens-lyon.fr/MUMPS and http://mumps.enseeiht.fr
11
History
Main contributors since 1996: Patrick Amestoy, Iain Duff, Abdou Guermouche, Jacko Koster, Jean-Yves L'Excellent, Stéphane Pralet
Current development team: Patrick Amestoy (ENSEEIHT-IRIT), Abdou Guermouche (LaBRI-INRIA), Jean-Yves L'Excellent (INRIA), Stéphane Pralet (now working for SAMTECH)
PhD students: Emmanuel Agullo (ENS Lyon), Tzvetomila Slavova (CERFACS)
12
Users
Around 1000 users, 2 requests per day, academic or industrial.
Types of applications: structural mechanics, CAD; fluid dynamics, magnetohydrodynamics, physical chemistry; wave propagation and seismic imaging, ocean modelling; acoustics and electromagnetic propagation; biology; finite element analysis, numerical optimization, simulation...
13
MUMPS: A MUltifrontal Massively Parallel Solver
MUMPS solves large systems of linear equations of the form Ax = b by factorizing A into A = LU or A = LDL^T
Symmetric or unsymmetric matrices (partial pivoting)
Parallel factorization and solution phases (uniprocessor version also available)
Iterative refinement and backward error analysis
Various matrix input formats: assembled format, distributed assembled format, sum of elemental matrices
Partial factorization and Schur complement matrix
Version for complex arithmetic
Several orderings interfaced: AMD, AMF, PORD, METIS, SCOTCH
14
The multifrontal method (Duff, Reid '83)
Memory is divided into two parts (that can overlap in time): the factors and the active memory.
The elimination tree represents the task dependencies (a small sketch of its computation follows below).
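To make the notion of task dependencies concrete, here is a small C sketch that computes the elimination tree of a symmetric sparse pattern (essentially Liu's algorithm with path compression, in the spirit of CSparse's cs_etree). This is an illustrative sketch, not MUMPS code.

```c
#include <stdio.h>

#define N 5

/* Elimination tree of a symmetric sparse pattern stored in compressed sparse
 * column (CSC) form; both triangles may be present, only entries with
 * row < column are used.  parent[k] = -1 marks a root of the tree/forest. */
static void etree(int n, const int *colptr, const int *rowind, int *parent) {
    int ancestor[64];                      /* sketch: assumes n <= 64 */
    for (int k = 0; k < n; k++) {
        parent[k] = -1;
        ancestor[k] = -1;
        for (int p = colptr[k]; p < colptr[k + 1]; p++) {
            int i = rowind[p];
            while (i != -1 && i < k) {     /* walk up from i towards the root */
                int inext = ancestor[i];
                ancestor[i] = k;           /* path compression: shortcut to k */
                if (inext == -1) parent[i] = k;
                i = inext;
            }
        }
    }
}

int main(void) {
    /* 5x5 symmetric pattern with off-diagonal entries (0,2), (1,2), (2,4), (3,4),
     * stored as the columns of the full pattern in CSC form. */
    int colptr[] = {0, 2, 4, 8, 10, 13};
    int rowind[] = {0, 2,  1, 2,  0, 1, 2, 4,  3, 4,  2, 3, 4};
    int parent[N];
    etree(N, colptr, rowind, parent);
    /* Expected: parent = 2 2 4 4 -1, i.e. nodes 0 and 1 are independent
     * children of node 2, and the subtrees rooted at 2 and at 3 are
     * independent children of the root 4. */
    for (int k = 0; k < N; k++) printf("parent[%d] = %d\n", k, parent[k]);
    return 0;
}
```

Independent subtrees of the elimination tree have no data dependencies, which is what the multifrontal method exploits for parallelism.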
15
Implementation
Distributed multifrontal solver (MPI / F90 based)
Dynamic distributed scheduling to accommodate both numerical fill-in and multi-user environments
Use of BLAS, ScaLAPACK
A fully asynchronous distributed solver (VAMPIR trace, 8 processors)
16
MUMPS: 3 main steps (plus initialization and termination):
JOB=-1: initialize solver type (LU, LDL^T) and default parameters
JOB=1: analyse the matrix, build an ordering, prepare the factorization
JOB=2: (parallel) numerical factorization A = LU
JOB=3: (parallel) solution step, forward and backward substitutions (Ly = b, Ux = y)
JOB=-2: termination, deallocate all MUMPS data structures
(A minimal calling-sequence sketch follows below.)
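This calling sequence maps directly onto code. The sketch below follows the pattern of the c_example.c driver distributed with MUMPS (double precision, centralized assembled input, 1-based indices, a single MPI process); field names such as nz/nnz differ slightly between MUMPS versions, so treat it as a hedged template rather than a drop-in program.

```c
#include <stdio.h>
#include <mpi.h>
#include "dmumps_c.h"

#define JOB_INIT -1
#define JOB_END  -2
#define USE_COMM_WORLD -987654   /* "use MPI_COMM_WORLD" flag from the MUMPS examples */

int main(int argc, char **argv) {
    /* Tiny 2x2 system [1 0; 0 2] x = [1; 4], assembled coordinate format, 1-based. */
    MUMPS_INT n = 2, nz = 2;
    MUMPS_INT irn[] = {1, 2}, jcn[] = {1, 2};
    double a[]   = {1.0, 2.0};
    double rhs[] = {1.0, 4.0};

    DMUMPS_STRUC_C id;
    MPI_Init(&argc, &argv);

    id.comm_fortran = USE_COMM_WORLD;
    id.par = 1;                   /* host participates in the computation */
    id.sym = 0;                   /* unsymmetric (LU); 1 or 2 select LDL^T variants */
    id.job = JOB_INIT;            /* JOB = -1: initialize and set defaults */
    dmumps_c(&id);

    id.n = n; id.nz = nz;         /* matrix and right-hand side on the host */
    id.irn = irn; id.jcn = jcn;
    id.a = a; id.rhs = rhs;

    id.job = 1; dmumps_c(&id);    /* JOB = 1: analysis (ordering, symbolic factorization) */
    id.job = 2; dmumps_c(&id);    /* JOB = 2: numerical factorization A = LU */
    id.job = 3; dmumps_c(&id);    /* JOB = 3: forward/backward substitutions */

    printf("solution: %g %g\n", rhs[0], rhs[1]);   /* rhs is overwritten with x */

    id.job = JOB_END;             /* JOB = -2: free MUMPS data structures */
    dmumps_c(&id);
    MPI_Finalize();
    return 0;
}
```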
17
Example: car body, 148,770 unknowns and 5,396,386 nonzeros (MSC.Software)
AVAILABILITY
MUMPS is available free of charge. It is used on a number of platforms (CRAY, SGI, IBM, Linux, ...) and is downloaded once a day on average (applications in chemistry, aeronautics, geophysics, ...).
If you are interested in obtaining MUMPS for your own use, please refer to the MUMPS home page.
Some MUMPS users: Boeing, BRGM, CEA, Dassault, EADS, EDF, MIT, NASA, SAMTECH, ...
18
COMPETITIVE PERFORMANCE
Comparison with SuperLU, extracted from ACM TOMS 2001 and obtained with S. Li.
Recent performance results (ops = number of operations): factorization time in seconds of large matrices on the CRAY T3E (1 proc: not enough memory).
19
Functionalities, Features
Recent features:
Symmetric indefinite matrices: preprocessing and 2-by-2 pivots
Hybrid scheduling
2D cyclic distributed Schur complement
Sparse, multiple right-hand sides
Singular matrices with detection of null pivots
Interfaces to MUMPS: Fortran, C, Matlab (S. Pralet, while at ENSEEIHT-IRIT) and Scilab (A. Fèvre, INRIA)
Future functionalities:
Out-of-core execution
Parallel analysis phase
Rank-revealing algorithms
Hybrid direct-iterative solvers (with Luc Giraud)
20
On-going research on out-of-core solvers
Use disk storage to solve very large problems
Parallel out-of-core factorization
Preprocessing to minimize the volume of I/O
Scheduling for the out-of-core solution phase
Figure: ratio of active to total memory peak on different numbers of processors for several large problems (Ph.D. of E. Agullo, ENS Lyon, and Ph.D. of M. Slavova, CERFACS)
21
Hybrid Scheduling
Both memory and workload information are used to obtain better behaviour in terms of estimated memory, memory used and factorization time in the context of parallel factorization algorithms.
The estimated memory is much closer to the memory effectively used.
Table: estimated and effective memory (millions of reals) for the factorization on 64 processors (Max: maximum amount of memory; Avg: average memory per processor)
22
Memory-minimizing schedules
Multifrontal methods can use a large amount of temporary data.
By decoupling task allocation and task processing, we can reduce the amount of temporary data: a new optimal schedule was proposed in this context (Guermouche, L'Excellent, ACM TOMS).
Memory gains: active memory ratio (new algorithm vs Liu's ordering).
Remark: gains relative to Liu's algorithm are 27.1, 17.5 and 19.6 for matrices 8, 9 and 10 (Gupta matrices), respectively.
23
Preprocessing for symmetric matrices (S. Pralet, ENSEEIHT-IRIT)
Preprocessing: new scaling available, symmetric weighted matching, and automatic tuning of the preprocessing strategies
Pivoting strategy: 2-by-2 pivots and static pivoting
Improvements: factorization time, robustness (in particular on KKT systems arising from optimization), memory estimation
Figure: factorization time on a Linux PC (Pentium 4, 2.80 GHz)
24
Scotch, PaStiX
PaStiX team, INRIA Futurs / LaBRI
25
PaStiX solver
Functionalities:
LLt, LDLt, LU factorization (symmetric pattern) with a supernodal implementation
Static pivoting (maximum weight matching) + iterative refinement / CG / GMRES
1D/2D block distribution + full BLAS3
Support for external ordering libraries (Scotch ordering provided)
MPI/threads implementation (SMP node / cluster / multi-core / NUMA)
Single/double precision + real/complex arithmetic
Requires only C + MPI + POSIX threads
Multiple RHS (direct factorization)
Incomplete factorization ILU(k) preconditioner
26
PaStiX solver (cont'd)
Available on the INRIA GForge:
All-in-one source code
Easy to install on Linux or AIX systems
Simple API (WSMP-like)
Thread-safe (can be called from multiple threads in multiple MPI communicators)
Current work:
Use of parallel ordering (PT-Scotch) and parallel symbolic factorization
Dynamic scheduling inside SMP nodes (static mapping)
Out-of-core implementation
Generic finite element assembly (domain decomposition associated to the matrix distribution)
27
Direct solver chain (in PaStiX)
Scotch (ordering & amalgamation) → Fax (block symbolic factorization) → Blend (refinement & mapping) → Sopalin (factorizing & solving)
Data flow: graph → partition → symbol matrix → distributed solver matrix → distributed factorized solver matrix → distributed solution
Analysis (sequential steps) // parallel factorization and solve
28
Direct solver chain (in PaStiX): Scotch, sparse matrix ordering (minimizes fill-in)
Scotch is a hybrid algorithm: incomplete Nested Dissection, the resulting subgraphs being ordered with an Approximate Minimum Degree method under constraints (HAMD)
29
Direct solver chain (in PaStiX): Fax, the block symbolic factorization
Q(G,P) → Q(G,P)* = Q(G*,P) ⇒ linear in the number of blocks!
Dense block structures → only a few extra pointers are needed to store the matrix
30
Direct solver chain (in PaStiX): Blend, refinement & mapping
CPU time prediction, exact memory resources
31
Hybrid 1D and 2D distributions
Yields 1D and 2D block distributions:
1D block distribution: BLAS efficiency on small supernodes
2D block distribution: scalability on larger supernodes
A switching criterion selects between the two.
32
Supernodal scheme: «1D» to «1D/2D» and «2D» to «2D» communication schemes; a dynamic technique is used to improve the «1D» to «1D/2D» communication scheme.
33
Direct solver chain (in PaStiX): modern architecture management (SMP nodes)
Hybrid threads/MPI implementation: all processors in the same SMP node work directly in shared memory
Less MPI communication and a lower parallel memory overhead
34
Incomplete factorization in PaStiX
Starting point: it is difficult to build a generic and robust preconditioner for large-scale 3D problems and high-performance computing.
Idea: derive direct solver techniques into a preconditioner.
What's new: a (dense) block formulation.
Incomplete block symbolic factorization: remove blocks with algebraic criteria, then use an amalgamation algorithm to get dense blocks.
Provides incomplete factorizations LDL^t, Cholesky, LU (with static pivoting for symmetric patterns). A much-simplified scalar sketch of the idea follows below.
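PaStiX works at the level of dense blocks and supports general ILU(k); as a much-simplified scalar illustration of the underlying principle (discard everything outside a prescribed sparsity pattern, here the pattern of A itself), a plain ILU(0) factorization and its triangular solves on a CSR matrix can be sketched in C as follows. This is an illustrative sketch, not PaStiX code.

```c
#include <stdio.h>

#define N 5

/* ILU(0) on a CSR matrix: approximate factorization A ~= LU keeping only the
 * original nonzero pattern (no fill).  Requires column indices sorted within
 * each row and an explicit diagonal entry in every row.  On exit, val[] holds
 * L (unit diagonal, strictly lower part) and U (upper part, incl. diagonal). */
static void ilu0(int n, const int *rowptr, const int *col, double *val, int *diag) {
    for (int i = 0; i < n; i++)                       /* locate diagonal entries */
        for (int p = rowptr[i]; p < rowptr[i + 1]; p++)
            if (col[p] == i) diag[i] = p;
    for (int i = 1; i < n; i++) {
        for (int pk = rowptr[i]; pk < diag[i]; pk++) {    /* k = col[pk] < i */
            int k = col[pk];
            val[pk] /= val[diag[k]];                      /* multiplier L(i,k) */
            for (int pj = pk + 1; pj < rowptr[i + 1]; pj++) {
                int j = col[pj];
                /* subtract L(i,k)*U(k,j) only if (k,j) lies in the pattern of row k */
                for (int pp = diag[k] + 1; pp < rowptr[k + 1]; pp++)
                    if (col[pp] == j) { val[pj] -= val[pk] * val[pp]; break; }
            }
        }
    }
}

/* Apply the preconditioner: solve L U z = r by a forward then a backward sweep. */
static void ilu0_solve(int n, const int *rowptr, const int *col, const double *val,
                       const int *diag, const double *r, double *z) {
    for (int i = 0; i < n; i++) {                     /* L z = r (unit diagonal) */
        z[i] = r[i];
        for (int p = rowptr[i]; p < diag[i]; p++) z[i] -= val[p] * z[col[p]];
    }
    for (int i = n - 1; i >= 0; i--) {                /* U z = z */
        for (int p = diag[i] + 1; p < rowptr[i + 1]; p++) z[i] -= val[p] * z[col[p]];
        z[i] /= val[diag[i]];
    }
}

int main(void) {
    /* 1D Laplacian tridiag(-1, 2, -1): its factorization has no fill, so here
     * ILU(0) reproduces the exact LU factors and the preconditioner is exact. */
    int rowptr[] = {0, 2, 5, 8, 11, 13};
    int col[]    = {0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4};
    double val[] = {2, -1, -1, 2, -1, -1, 2, -1, -1, 2, -1, -1, 2};
    int diag[N];
    double r[N] = {1, 0, 0, 0, 1}, z[N];
    ilu0(N, rowptr, col, val, diag);
    ilu0_solve(N, rowptr, col, val, diag, r, z);      /* prints 1 1 1 1 1 */
    printf("z = %g %g %g %g %g\n", z[0], z[1], z[2], z[3], z[4]);
    return 0;
}
```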
35
Numerical experiments (TERA1)
Successful approach for a large collection of industrial test cases (PARASOL, Boeing-Harwell, CEA) on an IBM SP3
TERA1 supercomputer of CEA Ile-de-France (ES45 SMP nodes, 4 procs)
COUPOLE40000: 26.5 × 10^6 unknowns, 1.5 × 10^10 NNZ in L, 10.8 TFlops
356 procs: 34 s; 512 procs: 27 s; 768 procs: 20 s (>500 GFlop/s, about 35% of peak performance)
36
Numerical experiments (TERA10)
Successful approach on a 3D mesh problem with about 30 million unknowns on the TERA10 supercomputer
But memory is the bottleneck!
ODYSSEE code of the French CEA/CESTA: electromagnetism code (finite element method + integral equation), complex double precision, Schur complement
37
SP3: AUDI; N = 943 × 10^3; NNZ_L = 1.21 × 10^9; 5.3 TFlops
Results by number of threads per MPI process and total number of processors:
Threads / MPI process     16 procs    32 procs    64 procs    128 procs
16                        481.96      270.16      152.58      86.07
8                         476.40      265.09      145.19      81.90
4                         472.67      266.89      155.23      87.56
1                         ###         ###         (211.35)    (134.45)
38
ASTER project
ANR CIS 2006: Adaptive MHD Simulation of Tokamak ELMs for ITER (ASTER)
The ELM (Edge-Localized Mode) is an MHD instability localised at the boundary of the plasma.
The energy losses induced by the ELMs within several hundred microseconds are a real concern for ITER.
The non-linear MHD simulation code JOREK is under development at the CEA to study the evolution of the ELM instability.
To simulate the complete cycle of the ELM instability, a large range of time scales needs to be resolved, to study: the evolution of the equilibrium pressure gradient (~seconds) and the ELM instability (~hundred microseconds).
A fully implicit time evolution scheme is used in the JOREK code. This leads to a large sparse matrix system to be solved at every time step. This scheme is possible due to the recent advances made in the parallelized direct solution of general sparse matrices (MUMPS, PaStiX, SuperLU, ...).
39
NUMASIS project
Middleware for NUMA architectures, applied to seismology simulation
New dynamic scheduling algorithms for solvers, taking non-uniform memory access into account
Use of MadMPI (R. Namyst)
A middleware is needed to handle complex architectures: multi-rail, packet re-ordering, ...
PhD of Mathieu Faverge (INRIA-LaBRI)
40
Links
Scotch: http://gforge.inria.fr/projects/scotch
PaStiX: http://gforge.inria.fr/projects/pastix
MUMPS: http://mumps.enseeiht.fr/ and http://graal.ens-lyon.fr/MUMPS
ScAlApplix: http://www.labri.fr/project/scalapplix
ANR CIGC Numasis
ANR CIS Solstice & Aster
Latest publication (to appear in Parallel Computing): On finding approximate supernodes for an efficient ILU(k) factorization
For more publications, see: http://www.labri.fr/~ramet/
41
Industrial applications
OSSAU code of the French CEA/CESTA: 2D/3D structural mechanics code
ODYSSEE code of the French CEA/CESTA: electromagnetism code (finite element method + integral equation), complex double precision, Schur complement
Fluid mechanics: LU factorization with static pivoting (SuperLU-like approach)
42
Other parallel sparse direct codes: shared-memory codes
Code       Technique            Scope      Availability (www.)
MA41       Multifrontal         UNS        cse.clrc.ac.uk/Activity/HSL
MA49       Multifrontal QR      RECT       cse.clrc.ac.uk/Activity/HSL
PanelLLT   Left-looking         SPD        Ng
PARDISO    Left-right looking   UNS        Schenk
PSL        Left-looking         SPD/UNS    SGI product
SPOOLES    Fan-in               SYM/UNS    netlib.org/linalg/spooles
SuperLU    Left-looking         UNS        nersc.gov/xiaoye/SuperLU
WSMP       Multifrontal         SYM/UNS    IBM product
43
Other parallel sparse direct codes: distributed-memory codes
Code       Technique       Scope      Availability (www.)
CAPSS      Multifrontal    SPD        netlib.org/scalapack
MUMPS      Multifrontal    SYM/UNS    mumps.enseeiht.fr, graal.ens-lyon.fr/MUMPS
PaStiX     Fan-in          SYM/UNS    gforge.inria.fr/pastix
PSPASES    Multifrontal    SPD        cs.umn.edu/mjoshi/pspases
SPOOLES    Fan-in          SYM/UNS    netlib.org/linalg/spooles
SuperLU    Fan-out         UNS        nersc.gov/xiaoye/SuperLU
S+         Fan-out         UNS        cs.ucsb.edu/research/S+
WSMP       Multifrontal    SYM        IBM product
44
Sparse solver for Ax = b: only a black box?
Preprocessing and postprocessing:
Symmetric permutations to reduce fill: Ax = b ⇒ (PAP^T)(Px) = Pb
Numerical pivoting, scaling to preserve numerical accuracy
Maximum transversal (set large entries on the diagonal)
Preprocessing for parallelism (influence of task mapping on parallelism)
Iterative refinement, error analysis (see the formulas below)
A default (often automatic/adaptive) setting of the options is available. However, a better knowledge of the options can help the user to further improve memory usage, time to solution and numerical accuracy.
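For reference, the iterative refinement and error analysis step mentioned above is typically built on the following standard quantities (a generic formulation, not tied to any particular solver). With the computed factors L, U and the permutations at hand, one repeats:

```latex
% One refinement step, reusing the computed factors (one extra forward/backward substitution):
r_k = b - A x_k, \qquad LU\,\delta x_k = r_k, \qquad x_{k+1} = x_k + \delta x_k .
% A common componentwise (Oettli--Prager style) backward error used in the error analysis:
\omega = \max_i \; \frac{|b - A\hat{x}|_i}{\bigl(|A|\,|\hat{x}| + |b|\bigr)_i} .
```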
45
The GRID-TLSE Project
A web expert site for sparse linear algebra
46
Overview of GRID-TLSE: http://gridtlse.org
Supported by: the ANR LEGO project, the ANR SOLSTICE project, the CNRS / JST program (REDIMPS project) and the ACI GRID-TLSE project
47
Sparse Matrices Expert Site?
Expert site: help users in choosing the right solver and its parameters for a given problem
Chosen approach: expert scenarios which answer common user requests
Main goal: provide a friendly test environment for expert and non-expert users of sparse linear algebra software
48
Sparse Matrices Expert Site? (cont'd)
Easy access to: software and tools; a wide range of computer architectures; matrix collections; expert scenarios
Also: provide a testbed for sparse linear algebra software
49
Why use the grid?
Sparse linear algebra software makes use of sophisticated algorithms for (pre-/post-)processing the matrix.
Multiple parameters interfere with the efficient execution of a sparse direct solver: ordering, amount of memory, computer architecture, available libraries.
Determining the best combination of parameter values is a multi-parametric problem, well suited for execution over a grid.
50
Components
Typical user question: how do software X and Y compare in terms of memory and CPU on my favourite matrix A?
Software components:
Weaver: high-level administrator for the deployment and exploitation of services on the grid
WebSolve: a web interface to start services on the grid
Middleware: DIET, developed within GRID-ASP (LIP, LORIA Resedas, LIFC-SDRP), and soon ITBL (?)
51
Hybrid Solvers
52
Parallel hybrid iterative/direct solver for the solution of large sparse linear systems arising from 3D elliptic discretization
L. Giraud (1), A. Haidar (2), S. Watson (3)
(1) ENSEEIHT, Parallel Algorithms and Optimization Group, 2 rue Camichel, 31071 Toulouse, France
(2) CERFACS, Parallel Algorithm Project, 42 Avenue Coriolis, 31057 Toulouse, France
(3) Departments of Computer Science and Mathematics, Virginia Polytechnic Institute, USA
53
Non-overlapping domain decomposition
A natural approach for PDEs, extended to general sparse matrices
Partition the problem into subdomains / subgraphs
Use a direct solver on the subdomains (MUMPS package)
A robust, algebraically preconditioned iterative solver on the interface (algebraic additive Schwarz preconditioner, possibly with sparsified and mixed-arithmetic variants); the reduced interface system is sketched below.
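In matrix terms, this family of methods is usually written as follows (a standard formulation, given here for reference rather than as the authors' exact algorithm). Ordering the subdomain interior unknowns first and the interface unknowns last gives

```latex
A =
\begin{pmatrix}
A_{II} & A_{I\Gamma} \\
A_{\Gamma I} & A_{\Gamma\Gamma}
\end{pmatrix},
\qquad
S = A_{\Gamma\Gamma} - A_{\Gamma I}\, A_{II}^{-1} A_{I\Gamma},
\qquad
S\, x_\Gamma = b_\Gamma - A_{\Gamma I}\, A_{II}^{-1} b_I .
```

Here A_{II} is block diagonal over the subdomains, so every application of A_{II}^{-1} amounts to independent sparse direct solves (MUMPS) on the subdomains; the reduced system with S is solved iteratively with the additive Schwarz preconditioner, and the interior unknowns are recovered afterwards from A_{II} x_I = b_I − A_{IΓ} x_Γ.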
54
Numerical behaviour of the preconditioners
Convergence history on a 43 million dof problem on 1000 System X processors (Virginia Tech)
55
Parallel scaled (weak) scalability study
Parallel elapsed time with a fixed sub-problem size (43,000 dof per processor) when the number of processors varies from 27 (1.1 × 10^6 dof) up to 1000 (43 × 10^6 dof)
56
Hybrid iterative/direct strategies for solving large sparse linear systems resulting from the finite element discretization of the time-harmonic Maxwell equations
L. Giraud (1), A. Haidar (2), S. Lanteri (3)
(1) ENSEEIHT, Parallel Algorithms and Optimization Group, 2 rue Camichel, 31071 Toulouse, France
(2) CERFACS, Parallel Algorithm Project, 42 Avenue Coriolis, 31057 Toulouse, France
(3) INRIA, nachos project-team, 2004 Route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex, France
57
Context and objectives
Solution of time-harmonic electromagnetic wave propagation problems
Discretization in space: discontinuous Galerkin time-harmonic methods; unstructured tetrahedral meshes; based on Pp nodal (Lagrange) interpolation; centered or upwind fluxes for the calculation of jump terms at cell boundaries
The discretization in space results in a large sparse linear system with complex coefficients
Direct (sparse LU) solvers for 2D problems; parallel solvers are mandatory for 3D problems
Related publications: H. Fol (PhD thesis, 2006); V. Dolean, H. Fol, S. Lanteri and R. Perrussel (J. Comp. Appl. Math., to appear, 2007)
58
Solution algorithm
Parallel hybrid iterative/direct solver in a domain decomposition framework
Schwarz algorithm with characteristic interface conditions
Sparse LU subdomain solver (MUMPS: P.R. Amestoy, I.S. Duff and J.-Y. L'Excellent, Comput. Meth. Appl. Mech. Engng., Vol. 184, 2000)
Interface (Schur complement type) formulation
Iterative interface solver (GMRES or BiCGstab)
Algebraic block preconditioning of the interface system, exploiting the structure of the system (free to construct and store)
59
Scattering of a plane wave by a PEC cube Plane wave frequency: 900 MHz Tetrahedral mesh: # vertices = 67,590 # elements = 373,632 Total number of DOF: 2,241,792
60
Performance results on various numbers of processors (IBM JS21)
Scattering of a plane wave by a PEC cube: number of iterations
Precond \ # procs     8     16    32
None                  50    54    63
M1                    24    25    26
61
Scattering of a plane wave by a head Plane wave frequency: 1800 MHz Tetrahedral mesh: # vertices = 188101 # elements = 1118952 Total number of DOF: 6,713,712
62
Performance results on various numbers of processors (IBM Blue Gene/L)
Scattering of a plane wave by a head: number of iterations
Precond \ # procs     48    64    128    256
None                  150   161   198    240
M1                    40    42    51     62