ParCFD: Parallel computation of pollutant dispersion in industrial sites
Julien Montagnier, Marc Buffat, David Guibert
Motivation
- Numerical simulation of pollutant dispersion in industrial sites
- Better risk evaluation than with 1D dispersion models
- An efficient Navier-Stokes solver is needed to run parametric studies
- Development of a parallel 3D Navier-Stokes solver on unstructured meshes
[Figure: comparison of the 3D simulation, field observations, and the 1D model]
Numerical methods
Properties:
- Finite volume on unstructured finite-element meshes
- Incompressible segregated solver with projection methods
- Extension to variable-density flows with a projection on the energy equation (Nerinckx, "Mach-uniformity through the coupled pressure and temperature correction algorithm", 2005)
Algorithm:
- Fixed-point nonlinear iteration at each time step: A (W^{k+1} - W^k) = F(W^k)
Parallelization:
- Flux evaluation and assembly (matrix + RHS) parallelized using domain decomposition
- Implicit upwind schemes require efficient solvers for large unstructured sparse linear systems with several million dofs (see the sketch below)
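A minimal sketch of the fixed-point iteration described above, assuming hypothetical user-supplied assembly callables (`assemble_matrix`, `assemble_rhs`); this is an illustration, not the authors' code:

```python
import numpy as np

def time_step(W, dt, assemble_matrix, assemble_rhs, tol=1e-8, max_iter=20):
    """Advance the state vector W by one time step with the fixed-point
    iteration A (W^{k+1} - W^k) = F(W^k)."""
    Wk = W.copy()
    for k in range(max_iter):
        A = assemble_matrix(Wk, dt)   # implicit (upwind) operator at iterate k
        F = assemble_rhs(Wk, dt)      # residual at iterate k
        dW = np.linalg.solve(A, F)    # in practice: a parallel sparse Krylov solve
        Wk += dW
        if np.linalg.norm(dW) < tol * max(np.linalg.norm(Wk), 1.0):
            break                     # nonlinear iteration converged
    return Wk
```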
Parallel linear solvers
- Use of PETSc Krylov subspace iterative methods
- Convergence accelerated with different preconditioners (Hypre library): parallel ILU / AMG (algebraic multigrid)
- Many ways of tuning the AMG methods:
  - Coarsening schemes: Falgout, PMIS (Parallel Maximal Independent Set), HMIS
  - Interpolation operators: classical interpolation, FF, FF1 (De Sterck, Yang: Copper Mountain 2005; De Sterck 2006)
A configuration sketch is given below.
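A minimal petsc4py sketch, as an assumption about how such a solver could be set up (not the authors' code), showing a Krylov method preconditioned by Hypre BoomerAMG with the coarsening and interpolation schemes discussed above (requires PETSc built with Hypre):

```python
from petsc4py import PETSc

# AMG tuning from the slide: PMIS coarsening and FF1 interpolation.
opts = PETSc.Options()
opts.setValue("pc_hypre_type", "boomeramg")
opts.setValue("pc_hypre_boomeramg_coarsen_type", "PMIS")
opts.setValue("pc_hypre_boomeramg_interp_type", "FF1")

def make_solver(A):
    """Return a Krylov solver (GMRES) preconditioned by BoomerAMG for the sparse operator A."""
    ksp = PETSc.KSP().create(A.getComm())
    ksp.setOperators(A)
    ksp.setType(PETSc.KSP.Type.GMRES)
    pc = ksp.getPC()
    pc.setType(PETSc.PC.Type.HYPRE)   # BoomerAMG selected through the options above
    ksp.setFromOptions()              # allow command-line overrides
    return ksp

# Usage (A and b assembled elsewhere):
#   ksp = make_solver(A)
#   x = A.createVecRight()
#   ksp.solve(b, x)
```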
3D Poisson equation: tetrahedral mesh
- Scale-up: 1 to 64 processors, 12,500 to 400,000 dofs / processor
- Speed-up: 1,000,000 dofs
- P2CHPD IBM cluster, with Intel dual quad-core processor nodes and InfiniBand
Scale-up results
- Three groups of preconditioning methods stand out: 1) ILU, 2) AMG with high-complexity coarsening schemes, 3) AMG with low-complexity coarsening schemes
- AMG scales up better with low-complexity coarsening schemes
- Krylov with AMG preconditioning + FF1 interpolation gives the best scale-up (500x faster than ILU)
- On the IBM cluster, scalability is good from 200,000 dofs / processor upwards
- With fewer dofs per processor, communication overhead causes a loss of scalability
- Beware of the problem size!
Speed-up on 1,000,000 dofs
- PMIS-FF1 gives the best results
- On 32 processors: 10% faster than PMIS-FF, 270% faster than Falgout-classical, 500% faster than ILU
- Efficiency collapses above 16 processors (62,500 dofs / processor)
[Figure: speed-up vs. number of processors]
Real case study (PMIS-FF1)
- Real geometry; application on meshes of 5M cells, 30M dofs
- Scalar transport equation
Parallel efficiency on the Navier-Stokes problem
- Assembly time: 30% of the total time
- Parallelization of the matrix and RHS assembly performs well (a sketch follows below)
- Parallelization of the linear solver performs well but depends on the problem size
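A minimal petsc4py sketch of parallel matrix and RHS assembly under assumed mesh data structures (hypothetical `local_elements`, `element_matrix`, `element_rhs`; not the authors' assembly code): each MPI rank adds the contributions of its own subdomain elements, and PETSc exchanges off-process entries during assembly.

```python
from petsc4py import PETSc

def assemble(A, b, local_elements, element_matrix, element_rhs):
    """Each rank inserts its local element contributions; rows owned by other
    ranks are communicated during assemblyEnd()."""
    for elem in local_elements:                 # elements of this subdomain
        rows = elem.global_dof_indices          # hypothetical mesh data structure
        Ae = element_matrix(elem)               # dense local element matrix
        be = element_rhs(elem)                  # local element RHS
        A.setValues(rows, rows, Ae, addv=PETSc.InsertMode.ADD_VALUES)
        b.setValues(rows, be, addv=PETSc.InsertMode.ADD_VALUES)
    A.assemblyBegin(); A.assemblyEnd()          # exchange off-process contributions
    b.assemblyBegin(); b.assemblyEnd()
```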
Conclusion
- Objective: build a new efficient parallel Navier-Stokes solver
- Laplacian equation: the low-complexity PMIS coarsening scheme with FF1 interpolation gives the best results (speed-up, scale-up, simulation times), 500 times faster than ILU preconditioning
- Navier-Stokes problem on a 5M-cell mesh runs in 6 hours on 64 processors
- Good speed-up on the 5M-cell mesh up to 64 processors; communications in the linear solver limit the speed-up