SOME EXPERIMENTS on GRID COMPUTING in COMPUTATIONAL FLUID DYNAMICS
Thierry Coupez(**), Alain Dervieux(*), Hugues Digonnet(**), Hervé Guillard(*), Jacques Massoni(***), Vanessa Mariotti(*), Youssef Mesri(*), Patrick Nivet(*), Steve Wornom(*)

Large scale computations and CFD
Turbulent flows: required number of mesh points N = Re^(9/4)
Laboratory experiments: Re =
Industrial devices: Re =
Geophysical flows: Re =
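As an order of magnitude, a flow at Re = 10^6 would already require N = (10^6)^(9/4) ≈ 3 x 10^13 mesh points under this estimate, far beyond what a single cluster can hold.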

Future of large scale computations in CFD
What kind of architecture for these computations?
Super clusters, e.g. the Tera10 machine of CEA/DAM: 4532 Intel Itanium processors
Grid architectures?
Indicative compute requirements: 1 M mesh ~ 1 Tflops, 10 M mesh ~ 10 Tflops, 100 M mesh ~ 100 Tflops

End-user requirements
Transparency: the grid must be seen as a single unified resource by the end-users
No major code modifications: existing Fortran/MPI and C/C++/MPI codes must run on the grid
Security

The MecaGrid project
Started 11/2002
Connects 3 sites in the PACA region
Goal: perform experiments in grid computing applied to multimaterial fluid dynamics

Set-up of the grid
The Marseille and CEMEF clusters use private IP addresses; only the front-end nodes are routable through the internet
Solution: create a VPN; the front-ends are connected by a tunnel through which packets are encrypted and transmitted
Installation of the Globus middleware
Message passing: MPICH-G2
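For illustration, a multi-site MPICH-G2 run is typically described by a Globus RSL multi-request, one sub-request per cluster. The hostnames, process counts and paths below are hypothetical and the exact attributes depend on the local Globus installation, so this is only a sketch:

    +
    ( &(resourceManagerContact="frontend.sophia.example.fr")
       (count=16)
       (jobtype=mpi)
       (executable=/home/user/aero3d) )
    ( &(resourceManagerContact="frontend.marseille.example.fr")
       (count=16)
       (jobtype=mpi)
       (executable=/home/user/aero3d) )

The job would then be submitted through the MPICH-G2 mpirun wrapper (for instance with its -globusrsl option), which starts one sub-job per cluster and joins them into a single MPI_COMM_WORLD.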

The MecaGrid: a heterogeneous architecture of 162 procs
Sites: INRIA Sophia (clusters pf and nina), CEMEF Sophia, IUSTI Marseille
Node configurations (N = nodes, Sp = processor speed, Vpq = internal network speed): N=32 bi-proc, Sp=2.4 GHz, Vpq=100 Mb/s; N=32 mono-proc, Sp=2.4 GHz, Vpq=100 Mb/s; N=19 bi-proc, Sp=2.4 GHz, Vpq=1 Gb/s; N=16 bi-proc, Sp=933 MHz, Vpq=100 Mb/s
Inter-site links: 10 Mb/s, 100 Mb/s, 10 Mb/s

The MecaGrid: measured performances
Same cluster configuration as above; the measured inter-site bandwidths (3.7 Mb/s, 5 Mb/s and 7.2 Mb/s) are far below the 100 Mb/s internal networks
Stability of the external network is also a concern

CFD and parallelism: the SPMD model
The initial mesh is partitioned into sub-domains; each processor runs the same solver on its own sub-domain and exchanges interface data with its neighbours by message passing.
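To illustrate the message-passing part of this model, here is a minimal C++/MPI sketch of the interface exchange each sub-domain solver performs at every iteration; the neighbour layout and buffer size are hypothetical, and this is not taken from the AERO-3D or CIMlib sources.

    // halo_exchange.cpp -- minimal SPMD sketch: each rank owns one sub-domain and
    // swaps interface values with its neighbours at every solver iteration.
    // Hypothetical example, not the MecaGrid codes. Compile: mpicxx halo_exchange.cpp
    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n_interface = 1000;                // interface unknowns (hypothetical)
        std::vector<double> send(n_interface, rank); // values on our side of the interface
        std::vector<double> recv(n_interface, 0.0);  // ghost values from the neighbour

        int left  = (rank - 1 + size) % size;        // neighbouring sub-domains,
        int right = (rank + 1) % size;               // arranged as a 1-D ring for simplicity

        for (int iter = 0; iter < 10; ++iter) {
            // exchange interface data: send to the right, receive from the left
            MPI_Sendrecv(send.data(), n_interface, MPI_DOUBLE, right, 0,
                         recv.data(), n_interface, MPI_DOUBLE, left,  0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            // ... local solver update on the sub-domain would go here ...
        }
        MPI_Finalize();
        return 0;
    }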

CODE PORTING
AERO-3D: finite volume code using Fortran77/MPI; 3D compressible Navier-Stokes equations with turbulence modeling; code rewritten in Fortran 90
AEDIPH: finite volume code designed for multimaterial studies
CIMlib library of CEMEF: a C++/MPI finite element library solving multimaterial incompressible flows

Test case 1: jet in cross flow
3D LES turbulence modeling, compressible flow, explicit solver
Results for 32 partitions, 100 time steps:

                 Sophia clusters | Sophia1-Marseille | Sophia2-Marseille
    241K mesh          729 s     |       817 s       |      1181 s
      Com/work           9%      |        69%        |        46%
    400K mesh          827 s     |                   |
      Com/work           1%      |        13%        |         6%

Test case 2: 3D dam break problem
3D incompressible Navier-Stokes computation
Level-set representation of the interface with Hamilton-Jacobi reinitialization
Iterative implicit scheme using GMRES (MINRES) preconditioned with ILU (see the sketch after this list)
600 time steps
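The GMRES/ILU combination can be illustrated with a widely used library such as PETSc; this is an assumption for illustration only, since the actual MecaGrid computations use the CIMlib solvers. A minimal serial sketch on a toy tridiagonal system, without error checking, assuming PETSc 3.x:

    // gmres_ilu.cpp -- illustrative only (not the MecaGrid/CIMlib solver):
    // solve a small 1-D Laplacian system with GMRES preconditioned by ILU(1).
    // Compile with mpicxx and the PETSc include/library flags.
    #include <petscksp.h>

    int main(int argc, char **argv) {
        PetscInitialize(&argc, &argv, NULL, NULL);

        const PetscInt n = 100;                 // toy problem size (hypothetical)
        Mat A; Vec x, b; KSP ksp; PC pc;

        // Assemble a tridiagonal (1-D Laplacian) matrix as stand-in data.
        MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 3, NULL, &A);
        for (PetscInt i = 0; i < n; ++i) {
            MatSetValue(A, i, i, 2.0, INSERT_VALUES);
            if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
            if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
        }
        MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
        MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

        VecCreateSeq(PETSC_COMM_SELF, n, &b);
        VecDuplicate(b, &x);
        VecSet(b, 1.0);

        // GMRES preconditioned with ILU(1).
        KSPCreate(PETSC_COMM_SELF, &ksp);
        KSPSetOperators(ksp, A, A);
        KSPSetType(ksp, KSPGMRES);
        KSPGetPC(ksp, &pc);
        PCSetType(pc, PCILU);
        PCFactorSetLevels(pc, 1);               // fill level k of ILU(k)
        KSPSolve(ksp, b, x);

        PetscInt its;
        KSPGetIterationNumber(ksp, &its);
        PetscPrintf(PETSC_COMM_SELF, "GMRES/ILU(1) converged in %d iterations\n", (int)its);

        KSPDestroy(&ksp); MatDestroy(&A); VecDestroy(&x); VecDestroy(&b);
        PetscFinalize();
        return 0;
    }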

3D DAM BREAK RESULTS
500 K mesh, 2.5 M elements, 600 time steps: the implicit code solves 600 linear systems of size 2M x 2M
  Results on 3 x 4 procs on 3 different clusters: 60 h
  With optimisation of the code for the grid: 37 h
1.5 M mesh, 8.7 M elements, 600 time steps: the implicit code solves 600 linear systems of size 6M x 6M
  Results on 3 x 11 procs on 3 different clusters: 125 h

PROVISIONAL CONCLUSIONS:
MecaGrid gives access to a large number of processors and the possibility to run larger applications than on an in-house cluster
For sufficiently large applications the grid competes with an in-house cluster: no significant communication overhead
HOWEVER, this requires:
  Fine tuning of the application codes to obtain good efficiency
  Algorithmic developments

Heterogeneous mesh partitioning
The mapping problem: find the mesh partition that minimises the CPU time
Homogeneous (cluster architecture): plain load balancing
Heterogeneous (grid): the partition must also account for the differing processor speeds and network bandwidths of the clusters
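One simple way to obtain such speed-aware partitions, sketched here under the assumption that a standard graph partitioner such as METIS 5 is available (the MecaGrid work develops its own hierarchical partitioner, so this is only an illustration), is to pass target partition weights proportional to the processor speeds:

    // hetero_partition.cpp -- illustrative only: bias a METIS partition so that
    // faster processors receive proportionally more mesh vertices.
    // Assumes METIS 5.x is installed; link with -lmetis.
    #include <metis.h>
    #include <vector>
    #include <cstdio>

    int main() {
        // A tiny 1-D chain of 8 mesh vertices (0-1-2-...-7) in CSR form.
        idx_t nvtxs = 8, ncon = 1, nparts = 4;
        std::vector<idx_t> xadj   = {0, 1, 3, 5, 7, 9, 11, 13, 14};
        std::vector<idx_t> adjncy = {1, 0, 2, 1, 3, 2, 4, 3, 5, 4, 6, 5, 7, 6};

        // Relative processor speeds (hypothetical): two nodes at 2.4 GHz,
        // two at 0.933 GHz. Target weights must sum to 1 per constraint.
        double speed[4] = {2.4, 2.4, 0.933, 0.933};
        double total = speed[0] + speed[1] + speed[2] + speed[3];
        std::vector<real_t> tpwgts(nparts);
        for (int p = 0; p < nparts; ++p) tpwgts[p] = (real_t)(speed[p] / total);

        std::vector<idx_t> part(nvtxs);
        idx_t objval;
        int status = METIS_PartGraphKway(&nvtxs, &ncon, xadj.data(), adjncy.data(),
                                         nullptr, nullptr, nullptr,   // unit weights
                                         &nparts, tpwgts.data(), nullptr, nullptr,
                                         &objval, part.data());
        if (status != METIS_OK) return 1;

        for (idx_t v = 0; v < nvtxs; ++v)
            std::printf("vertex %d -> partition %d\n", (int)v, (int)part[v]);
        return 0;
    }

In a grid setting the network bandwidths would additionally enter through the edge weights, penalising cut edges that cross slow inter-site links.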

Algorithmic Developments
Iterative linear solvers: AX = b, A sparse; iteration X ← X + P (b - AX)
P: preconditioning matrix, built from an incomplete LU factorization of A (A ≈ LU): P = ILU(0), ILU(1), ..., ILU(k)
(Plot: normalized number of iterations, CPU time on a cluster and CPU time on the MecaGrid, for ILU(0), ILU(1), ILU(2), ILU(3))
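As a toy illustration of the iteration written above, here is a small C++ sketch with a diagonal (Jacobi) preconditioner standing in for the ILU(k) factorisation and a hypothetical dense matrix as data; it is not the MecaGrid solver.

    // richardson.cpp -- toy illustration of the preconditioned iteration
    // X <- X + P (b - A X), with P = diag(A)^{-1} (Jacobi) standing in
    // for an ILU(k) preconditioner. Dense 4x4 example, hypothetical data.
    #include <vector>
    #include <cmath>
    #include <cstdio>

    int main() {
        const int n = 4;
        // A small diagonally dominant matrix and right-hand side.
        double A[n][n] = {{4, 1, 0, 0}, {1, 4, 1, 0}, {0, 1, 4, 1}, {0, 0, 1, 4}};
        std::vector<double> b = {1, 2, 3, 4}, x(n, 0.0);

        for (int iter = 0; iter < 100; ++iter) {
            // residual r = b - A x
            std::vector<double> r(n, 0.0);
            for (int i = 0; i < n; ++i) {
                r[i] = b[i];
                for (int j = 0; j < n; ++j) r[i] -= A[i][j] * x[j];
            }
            // preconditioned update x <- x + P r, with P = diag(A)^{-1}
            double norm = 0.0;
            for (int i = 0; i < n; ++i) { x[i] += r[i] / A[i][i]; norm += r[i] * r[i]; }
            if (std::sqrt(norm) < 1e-10) { std::printf("converged in %d iterations\n", iter + 1); break; }
        }
        for (int i = 0; i < n; ++i) std::printf("x[%d] = %f\n", i, x[i]);
        return 0;
    }

A richer preconditioner (larger fill level k) reduces the number of iterations, and hence the number of communication phases, at the price of more local work; this trade-off shifts in favour of ILU(k) with k > 0 on the grid, where communication is expensive.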

Hierarchical mesh partitioner
(Figure: comparison with the initial mesh partitioner)

Heterogeneous mesh partitioning: test case on 32 procs, mesh size 400 K
(Plot: CPU time for the Sophia-MRS and Sophia1-Sophia2 cluster pairs, with heterogeneous and homogeneous partitions)
Gain of more than 75%!

Conclusions
The grid appears as a viable alternative to specialized super-clusters for large scale CFD computations
From the point of view of numerical analysis, grid architectures raise new questions:
  Mesh and graph partitioning
  Linear solvers
  Communication and latency hiding schemes
  ...