SOME EXPERIMENTS on GRID COMPUTING in COMPUTATIONAL FLUID DYNAMICS Thierry Coupez(**), Alain Dervieux(*), Hugues Digonnet(**), Hervé Guillard(*), Jacques Massoni (***), Vanessa Mariotti(*), Youssef Mesri(*), Patrick Nivet (*), Steve Wornom(*)
Large scale computations and CFD
Turbulent flows: required number of mesh points N = Re^(9/4)
Laboratory experiment: Re =
Industrial devices: Re =
Geophysical flows: Re =
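An illustration with an assumed value (not on the original slide): for Re = 10^6, the estimate gives N ≈ (10^6)^(9/4) ≈ 3 × 10^13 mesh points, already far beyond what a single in-house cluster can handle.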
Future of large scale computations in CFD
What kind of architecture for these computations?
Super clusters, e.g. the Tera10 machine of CEA/DAM: 4532 Intel Itanium processors
Grid architecture?
1 M mesh: 1 Tflops
10 M mesh: 10 Tflops
100 M mesh: 100 Tflops
End-user requirements
Transparent solution: the grid must be seen as a single unified resource by the end-users
No major code modifications: codes using Fortran/MPI and C/C++/MPI must run on the grid
Secure
MecaGrid project
Started 11/2002
Connects 3 sites in the PACA region
Performs experiments in grid computing applied to multimaterial fluid dynamics
Set-up of the Grid
The Marseille and CEMEF clusters use private IP addresses; only the front-ends are routable through the internet
Solution: create a VPN; the front-ends are connected by a tunnel in which packets are encrypted and transmitted
Installation of the Globus middleware
Message passing: MPICH-G2
The MecaGrid: heterogeneous architecture of 162 procs
Four clusters: INRIA Sophia pf, INRIA Sophia nina, CEMEF Sophia, IUSTI Marseille
Node configurations (N: number of nodes, Sp: processor speed, Vpq: network bandwidth):
N=32, bi-proc, Sp=2.4 GHz, Vpq=100 Mb/s
N=32, mono-proc, Sp=2.4 GHz, Vpq=100 Mb/s
N=19, bi-proc, Sp=2.4 GHz, Vpq=1 Gb/s
N=16, bi-proc, Sp=933 MHz, Vpq=100 Mb/s
Inter-site links (diagram): 10 Mb/s, 100 Mb/s, 10 Mb/s
The MecaGrid: measured performances
Same four clusters and nominal node configurations as above
Measured link bandwidths (diagram): 100 Mb/s, 3.7 Mb/s, 7.2 Mb/s, 5 Mb/s
Stability of the external network?
CFD and parallelism: the SPMD model
[Diagram: the initial mesh is partitioned into sub-domains 1, 2 and 3; each sub-domain runs the solver on its own data and solution, and the sub-domains are coupled by message passing]
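To make the SPMD pattern sketched above concrete, here is a minimal, illustrative C++/MPI example: each process owns one sub-domain and exchanges interface (halo) values with its neighbours before every solver update. It is not taken from AERO-3D or CIMlib; the 1-D decomposition, array names and sizes are assumptions for the example.

    #include <mpi.h>
    #include <vector>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // Each process owns a 1-D strip of the mesh plus two ghost cells.
        const int n_local = 1000;
        std::vector<double> u(n_local + 2, static_cast<double>(rank));

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        // Halo exchange: send my boundary values, receive the neighbours' ones.
        MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  0,
                     &u[n_local + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[n_local],     1, MPI_DOUBLE, right, 1,
                     &u[0],           1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // ... local solver update on u[1..n_local] would go here ...

        if (rank == 0) std::printf("halo exchange done on %d processes\n", size);
        MPI_Finalize();
        return 0;
    }

On a grid, exactly the same code runs; only the cost of the two MPI_Sendrecv calls changes when the neighbour lives on another cluster, which is why the communication/work ratio matters so much in the results below.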
CODE PORTING
AERO-3D: finite volume code in Fortran77/MPI, 3D compressible Navier-Stokes equations with turbulence modeling (… instructions); the code was rewritten in Fortran 90
AEDIPH: finite volume code designed for multimaterial studies
CIMlib: the C++/MPI finite element library of CEMEF, solving multimaterial incompressible flows
Test case: jet in cross flow
3D LES turbulence modeling, compressible flow, explicit solver
Results for 32 partitions, 100 time steps:

                Sophia clusters   Sophia1-Marseille   Sophia2-Marseille
    241K mesh   729 s             817 s               1181 s
    Com/work    9%                69%                 46%
    400K mesh   827 s
    Com/work    1%                13%                 6%
Test case 2: 3D dam break problem
3D incompressible Navier-Stokes computation
Level-set representation of the interface with Hamilton-Jacobi reinitialization
Iterative implicit scheme using GMRES (MINRES) preconditioned with ILU
600 time steps
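For reference (the slide does not give the exact formulation used in the code), the usual Hamilton-Jacobi reinitialization restores the signed-distance property of the level-set function φ by solving, in pseudo-time τ,

    ∂φ/∂τ + sign(φ0) (|∇φ| - 1) = 0,   φ(τ = 0) = φ0,

where φ0 is the current level-set field; typically a few pseudo-time steps of this equation are taken after each transport step.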
3D DAM BREAK RESULTS
500K mesh, 2.5M elements, 600 time steps:
Implicit code: 600 linear systems of size 2M x 2M solved
Result on 3 x 4 procs on 3 different clusters: 60 h
With optimisation of the code for the grid: 37 h
1.5M mesh, 8.7M elements, 600 time steps:
Implicit code: 600 linear systems of size 6M x 6M solved
Result on 3 x 11 procs on 3 different clusters: 125 h
PROVISIONAL CONCLUSIONS
MecaGrid gives access to a large number of processors and the possibility to run larger applications than on an in-house cluster
For sufficiently large applications it competes with an in-house cluster: no significant communication overhead
HOWEVER:
Fine tuning of the application codes is needed to obtain good efficiency
Algorithmic developments are needed
Heterogeneous mesh partitioning
The mapping problem: find the mesh partition that minimizes the CPU time
Homogeneous (cluster architecture): load balancing
Heterogeneous (Grid): load balancing must also account for the heterogeneous processor speeds and network bandwidths (see the sketch below)
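A minimal sketch of the weighted-balancing idea, covering only the CPU-speed part of the mapping problem (it is not the MecaGrid partitioner): the target fraction of the mesh given to each process is made proportional to its relative speed, and the resulting fractions can then be handed to a weighted graph partitioner, for instance as the tpwgts array of METIS/ParMETIS. The speed values below are made up for the example.

    #include <vector>
    #include <numeric>
    #include <cstdio>

    // Compute target partition fractions proportional to processor speed.
    // speeds[i] = relative speed of process i (e.g. measured Mflop/s).
    std::vector<double> target_fractions(const std::vector<double>& speeds) {
        double total = std::accumulate(speeds.begin(), speeds.end(), 0.0);
        std::vector<double> frac(speeds.size());
        for (std::size_t i = 0; i < speeds.size(); ++i)
            frac[i] = speeds[i] / total;   // fraction of the mesh for process i
        return frac;
    }

    int main() {
        // Hypothetical MecaGrid-like mix: 2.4 GHz nodes and slower 933 MHz nodes.
        std::vector<double> speeds = {2.4, 2.4, 2.4, 0.933};
        std::vector<double> frac = target_fractions(speeds);
        for (std::size_t i = 0; i < frac.size(); ++i)
            std::printf("process %zu gets %.1f%% of the mesh\n", i, 100.0 * frac[i]);
        // These fractions would replace the uniform 1/nparts split
        // used for a homogeneous cluster.
        return 0;
    }

A full grid-aware mapping would also weight the partition interfaces by the (very different) intra- and inter-cluster bandwidths measured above.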
Algorithmic developments
Iterative linear solvers for Ax = b, A sparse: x ← x + P(b - Ax), where P is a preconditioning matrix
Incomplete LU factorization A ≈ LU gives P = ILU(0), ILU(1), …, ILU(k)
[Chart: normalized iteration count and CPU time, on the cluster and on the MecaGrid, for ILU(0) to ILU(3)]
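A minimal sketch of the preconditioned fixed-point iteration x ← x + P(b - Ax) quoted above, using a diagonal (Jacobi) preconditioner in place of ILU(k) purely to keep the example short; the 3x3 test matrix and tolerance are made up for illustration.

    #include <vector>
    #include <cmath>
    #include <cstdio>

    // Preconditioned Richardson iteration: x <- x + P(b - A x),
    // here with P = diag(A)^{-1} (Jacobi) standing in for ILU(k).
    int main() {
        const int n = 3;
        double A[3][3] = {{4, 1, 0}, {1, 4, 1}, {0, 1, 4}};  // diagonally dominant test matrix
        std::vector<double> b = {1, 2, 3}, x(n, 0.0);

        for (int it = 0; it < 50; ++it) {
            // residual r = b - A x
            std::vector<double> r(n);
            double norm = 0.0;
            for (int i = 0; i < n; ++i) {
                r[i] = b[i];
                for (int j = 0; j < n; ++j) r[i] -= A[i][j] * x[j];
                norm += r[i] * r[i];
            }
            if (std::sqrt(norm) < 1e-10) break;   // converged
            // preconditioned update x <- x + P r, with P = diag(A)^{-1}
            for (int i = 0; i < n; ++i) x[i] += r[i] / A[i][i];
        }
        std::printf("x = %.6f %.6f %.6f\n", x[0], x[1], x[2]);
        return 0;
    }

A stronger preconditioner such as ILU(k) lowers the iteration count at a higher cost per iteration; the chart referenced above examines how that trade-off shifts between the cluster and the MecaGrid.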
Hierarchical mesh partitioner
[Diagram: initial mesh partitioner vs. hierarchical mesh partitioner]
Heterogeneous mesh partitioning: test case on 32 procs, 400K mesh
[Chart: CPU time per cluster combination, Sophia-MRS and Sophia1-Sophia2, with heterogeneous vs. homogeneous partitioning]
Gain of more than 75%!
Conclusions
The grid appears to be a viable alternative to specialized super-clusters for large scale CFD computations
From the point of view of numerical analysis, grid architectures raise new questions:
Mesh and graph partitioning
Linear solvers
Communication and latency hiding schemes
…