
A New Domain Decomposition Technique for TeraGrid Simulations
Leopold Grinberg 1, Brian Toonen 2, Nicholas Karonis 2,3, George Em Karniadakis 1
1 Brown University, Division of Applied Mathematics; 2 Northern Illinois University; 3 Argonne National Laboratory
This work was supported by the National Science Foundation's CI-TEAM program under contract OCI and the NSF NMI program under contract OCI.

How large is the problem we want to solve?
On average, an adult human weighing 70 kg has a blood volume of about 5 liters.
A tetrahedral computational element with an edge of 0.5 mm has a typical volume of roughly 0.015 mm^3, so discretizing the blood volume requires about 339M tetrahedral elements.
339E6 * (P+3)(P+2)^2 = 85.4E9 degrees of freedom per variable (P = 4).
A human being is a multicellular eukaryote consisting of an estimated 100 trillion cells ...
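The element count and DOF figures above can be checked with a short back-of-the-envelope calculation. This is only a sketch: the assumption that the element volume corresponds to a regular tetrahedron with a 0.5 mm edge is ours, not stated on the slide.

% Rough check of the element count and DOF estimate, in LaTeX.
% Assumption: elements are treated as regular tetrahedra with edge a = 0.5 mm.
\[
  V_{\mathrm{tet}} = \frac{a^{3}}{6\sqrt{2}}
                   = \frac{(0.5\,\mathrm{mm})^{3}}{6\sqrt{2}}
                   \approx 0.0147\ \mathrm{mm}^{3},
  \qquad
  N_{\mathrm{el}} \approx \frac{5\times 10^{6}\ \mathrm{mm}^{3}}{0.0147\ \mathrm{mm}^{3}}
                  \approx 3.4\times 10^{8}.
\]
\[
  N_{\mathrm{DOF}} = N_{\mathrm{el}}\,(P+3)(P+2)^{2}
                   = 3.39\times 10^{8}\cdot 7\cdot 36
                   \approx 8.54\times 10^{10}
  \qquad (P = 4).
\]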

Motivation
Hardware limits:
– Solution of a large-scale problem requires thousands of processors.
– Parallel efficiency is strongly affected by communication cost.
– Low memory availability per core.
Solution of large linear systems is extremely expensive:
– A large condition number results in a high iteration count.
– The most effective preconditioners do not scale well beyond about a thousand processors.
Our solution: a multi-layer hierarchical approach.

Outline
– Software: NEKTAR-G2
– Multilevel Partitioning
– Solution of a Large-Scale Problem: on a single supercomputer; on the TeraGrid
– Summary

NEKTAR-G2
– NEKTAR-G2 Prototype
– The New Domain Decomposition Technique: The Idea

Nektar-G2 Prototype 1
– The overall simulation consists of a 1D computation through the full arterial tree and detailed 3D simulations of arterial bifurcations.
– The 1D results feed the 3D simulations, providing flow rate and pressure for boundary conditions.
– MPICH-G2 was used for intra-site and inter-site communication on the TeraGrid.
(Diagram: the 1D model distributes 1D data to 3D simulations running at SDSC, TACC, UC/ANL, NCSA, and PSC.)
1 S. Dong et al., "Simulating and visualizing the human arterial system on the TeraGrid", Future Generation Computer Systems, Vol. 22, Issue 8, October 2006.

The New Domain Decomposition Technique for TeraGrid Simulations
(Diagram: 3D domains coupled directly across TeraGrid sites: SDSC, TACC, UC/ANL, NCSA, PSC, ORNL, PU, IU.)
The new version of Nektar-G2 features direct coupling between 3D domains. The increased volume of data transferred between 3D blocks requires a high level of parallelism.

Domain Decomposition Technique for TeraGrid Simulations: Method
The numerical solution is performed on two levels: on the outer level a loosely coupled problem is solved, while on the inner level several tightly coupled problems are solved in parallel.
Multi-level partitioning of the entire computational domain requires multi-level parallelism in order to maintain high parallel efficiency on thousands of processors.
The tightly coupled problems are solved by NEKTAR, a parallel numerical library based on the high-order spectral/hp element method 1.
Continuity of the numerical solution is achieved by imposing proper boundary conditions on the sub-domain interfaces.
On the TeraGrid, the inter-site communication required on the outer level is performed by the MPICH-G2 library 2.
1 Karniadakis & Sherwin, Spectral/hp Element Methods for CFD, 2nd edition, Oxford University Press, 2005.
2 N. Karonis et al., "A Grid-Enabled Implementation of the Message Passing Interface", JPDC, Vol. 63, No. 5, May 2003.

Multilevel Partitioning of the Global Communicator
– High-Level Communicator Splitting
– Low-Level Communicator Splitting
– Message Passing Across the Sub-Domain Interface

Dual Domain Decomposition: High-Level Communicator Splitting
(Diagram: MPI_COMM_WORLD is split in two ways: by site (S1, S2, S3), a topology-aware decomposition, and by task (the 1D model and the individual 3D sub-domains), a task-oriented decomposition.)
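A minimal MPI sketch of this high-level split is given below. It assumes each process can learn its site and task from the environment variables SITE_ID and TASK_ID; those names, and the communicator names, are illustrative only and are not part of NEKTAR-G2 or MPICH-G2.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Comm site_comm, task_comm;
    int world_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Topology-aware decomposition: one communicator per TeraGrid site (S1, S2, S3, ...). */
    const char *site = getenv("SITE_ID");          /* assumed to be set by the launcher */
    int site_color = site ? atoi(site) : 0;
    MPI_Comm_split(MPI_COMM_WORLD, site_color, world_rank, &site_comm);

    /* Task-oriented decomposition: the 1D model vs. the individual 3D sub-domains. */
    const char *task = getenv("TASK_ID");          /* 0 = 1D model, 1..N = 3D sub-domains */
    int task_color = task ? atoi(task) : 0;
    MPI_Comm_split(MPI_COMM_WORLD, task_color, world_rank, &task_comm);

    /* ... each tightly coupled solve now runs on its own task_comm ... */

    MPI_Comm_free(&site_comm);
    MPI_Comm_free(&task_comm);
    MPI_Finalize();
    return 0;
}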

Dual Domain Decomposition: Low-Level Communicator Splitting
(Diagram: within each 3D sub-domain, processes are further grouped by the interface they touch, e.g. inlet, outlet 1, outlet 2, outlet 3.)
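A hedged sketch of the low-level split: within the communicator of one 3D sub-domain, the processes that own faces on a given interface form a small communicator used for the gather/scatter at that interface. The names domain_comm, iface_id, and owns_interface_faces are assumptions for illustration, not NEKTAR-G2 identifiers.

#include <mpi.h>

/* Returns a communicator containing only the ranks of 'domain_comm' that own
   faces on interface 'iface_id'; all other ranks receive MPI_COMM_NULL. */
MPI_Comm make_interface_comm(MPI_Comm domain_comm, int iface_id, int owns_interface_faces)
{
    MPI_Comm iface_comm;
    int rank;

    MPI_Comm_rank(domain_comm, &rank);

    /* Ranks that do not touch this interface opt out with MPI_UNDEFINED. */
    int color = owns_interface_faces ? iface_id : MPI_UNDEFINED;
    MPI_Comm_split(domain_comm, color, rank, &iface_comm);

    return iface_comm;
}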

Message Passing Across the Sub-Domain Interface
(Diagram: on each side of the interface the data are collected with MPI_Gatherv, exchanged with MPI_Isend/MPI_Irecv, and redistributed with MPI_Scatterv.)
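The exchange itself can be sketched as follows, assuming the interface data are packed on the root of each interface group, both sides exchange buffers of equal length, and the gather and scatter use the same per-rank layout. The function and argument names (exchange_interface, inter_comm, peer_root, ...) are illustrative, not the NEKTAR-G2 API.

#include <mpi.h>

/* One interface exchange: gather local face data onto the group root,
   swap packed buffers with the peer group's root, then scatter the received data. */
void exchange_interface(const double *send_local, int n_send_local,
                        double *recv_local, int n_recv_local,
                        double *send_buf, double *recv_buf,
                        const int *counts, const int *displs, int n_total,
                        MPI_Comm iface_comm, MPI_Comm inter_comm, int peer_root)
{
    int rank;
    MPI_Comm_rank(iface_comm, &rank);

    /* 1. Collect this side's interface data on the interface-group root. */
    MPI_Gatherv(send_local, n_send_local, MPI_DOUBLE,
                send_buf, counts, displs, MPI_DOUBLE, 0, iface_comm);

    /* 2. The two roots exchange the packed buffers with non-blocking calls. */
    if (rank == 0) {
        MPI_Request req[2];
        MPI_Irecv(recv_buf, n_total, MPI_DOUBLE, peer_root, 0, inter_comm, &req[0]);
        MPI_Isend(send_buf, n_total, MPI_DOUBLE, peer_root, 0, inter_comm, &req[1]);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    }

    /* 3. Redistribute the received data within the interface group. */
    MPI_Scatterv(recv_buf, counts, displs, MPI_DOUBLE,
                 recv_local, n_recv_local, MPI_DOUBLE, 0, iface_comm);
}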

Dual Domain Decomposition Method: Details
– Interface Boundary Conditions
– Accuracy
– Efficiency

Dual Domain Decomposition: Interface Boundary Conditions
(Schematic: the outlet of sub-domain A is coupled to the inlet of sub-domain B; the quantities V_n, P_n, and dV_n/dn are exchanged across the interface.)
Boundary condition / Message size:
– Velocity is computed at the outlet and imposed as a Dirichlet boundary condition at the inlet: N_F * N_M * 3 * sizeof(double)
– Pressure is computed at the inlet and imposed as a Dirichlet boundary condition at the outlet: N_F * N_M * sizeof(double)
– The velocity flux from the inlet is averaged with the velocity flux computed at the outlet and imposed as a Neumann boundary condition at the outlet: N_F * (P+3) * (P+2) * sizeof(double)
N_F, N_M – number of faces and modes; P – order of polynomial approximation.
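For reference, the message sizes in the table can be written as a few helper functions. This is only a sketch: N_F, N_M, and P are the quantities defined above, and none of these helpers exist in NEKTAR itself.

#include <stddef.h>

/* Bytes for the Dirichlet velocity data sent from the outlet to the inlet. */
size_t velocity_msg_bytes(size_t NF, size_t NM) { return NF * NM * 3 * sizeof(double); }

/* Bytes for the Dirichlet pressure data sent from the inlet to the outlet. */
size_t pressure_msg_bytes(size_t NF, size_t NM) { return NF * NM * sizeof(double); }

/* Bytes for the averaged velocity flux imposed as a Neumann condition at the outlet. */
size_t flux_msg_bytes(size_t NF, int P) { return NF * (size_t)(P + 3) * (size_t)(P + 2) * sizeof(double); }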

Dual Domain Decomposition: Accuracy
Simulation of steady flow in a converging pipe using dual domain decomposition. The sub-domain interface is at z = 5. Re = 200; P = 5.
The plot on the right depicts the wall shear stress: solid line, solution with dual domain decomposition; dashed line, solution with standard domain decomposition.

Dual Domain Decomposition: Efficiency Mean CPU time required per time step. Problem size: tetrahedral elements, polynomial order P=4 and P=5. Computations were performed on the CRAY XT3 at PSC.

Solution of a Large-Scale Problem with the New Domain Decomposition Method
– Numerical Simulation of Flow in the Aorta
– Communication/Computation Time Balance

Numerical Simulation of Flow in the Human Aorta with Dual Domain Decomposition
Domain | # of elements | Polynomial order | # of DOF
Domain A | 120,813 | 4 (6) | 30,444,876 (69,588,288)
Domain B | 20,797 | 4 (6) | 5,240,844 (11,979,072)
Domain C | 106,219 | 4 (6) | 26,767,188 (61,182,144)
Domain D | 77,966 | 4 (6) | 19,647,432 (44,908,416)
Total | 325,795 | 4 (6) | 82,100,340 (187,657,920)
(The aorta is decomposed into sub-domains A, B, C, and D.)

Communication / Computation CPU-Time Balance
Simulation of blood flow in the aorta. N_elements = 325,795; P = 6. Computation was performed on the CRAY XT3 with 226 and 508 processors.
(Plots show the per-domain balance for sub-domains A, B, C, and D.)

Standard and Dual Domain Decomposition: Parallel Speed-up
Simulation of blood flow in the aorta. N_elements = 325,795; P = 6. Computation was performed on the CRAY XT3.
(Plot annotations include per-time-step CPU times of 1.24 sec and 0.77 sec.)

The New Domain Decomposition Method on the TeraGrid
– Cross-site computation: sample problem
– Cross-site computation: large-scale problem

Cross-site computation: sample problem
Problem size: N_el = 8432 x 2 (sub-domains A and B); P = .
(Timing plot compares runs on configurations mixing fast and slow CPUs, with 12 and 24 CPUs per group.)

Cross-site computation: Arterial Flow Simulation
Domain | # of CPUs | # of elements | Polynomial order | # of DOF
Domain A (SDSC) | 66 | 120,813 | 4 | 30,444,876
Domain B (UC/ANL) | 12 | 20,797 | 4 | 5,240,844
Domain C (NCSA) | 66 | 106,219 | 4 | 26,767,188
Total | 144 | 247,829 | 4 | 62,452,908
(Sub-domains A, B, and C are distributed across three TeraGrid sites.)

Summary
We developed and implemented a new scalable approach for the solution of large problems on the TeraGrid and beyond.
Overlapping computation with cross-site communication, and performing simultaneous, full-duplex communication over multiple channels, hides the expensive inter-site latency of the TeraGrid.
Future plan: improve the communication algorithm between different process groups.

Thank you!

Reconstruction of the Aorta from CT Images
(Pipeline: 3D CT scan → 3D solid model → domain decomposition → numerical simulation.)

Spectral Element Method
– Flexible: efficient discretization for complex geometries;
– Standard finite element meshes, structured or unstructured;
– High-order polynomial expansion within each element (high-order method);
– Inherently contains multi-scale modeling capability;
– Dual path to convergence: p-type refinement and h-type refinement.
Karniadakis & Sherwin, Spectral/hp Element Methods for CFD, 2nd edition, Oxford University Press, 2005.

Computational Algorithm
for step = 1:NSTEPS
    Irecv(P); Irecv(V); Irecv(dV/dn)            % post receives for interface data
    local_gather(V);     Isend(V)               % collect and send outlet velocity
    local_gather(dV/dn); Isend(dV/dn)           % collect and send velocity flux
    ...
    local_scatter(P); local_scatter(V); local_scatter(dV/dn)   % distribute received interface data
    ...
    P_solve                                     % pressure solve
    local_gather(P); Isend(P)                   % collect and send interface pressure
    ...
    V_solve                                     % velocity solve
end

Standard Domain Decomposition: Parallel Speed-up
Simulation of blood flow in the aorta. N_elements = 325,795; P = 6. Computation was performed on the CRAY XT3.

Outflow Boundary Condition for Terminal Vessels
(Schematic of terminal vessels A, B, C, and D.)