A New Domain Decomposition Technique for TeraGrid Simulations Leopold Grinberg 1 Brian Toonen 2 Nicholas Karonis 2,3 George Em Karniadakis 1 1 Brown University.

1 A New Domain Decomposition Technique for TeraGrid Simulations
Leopold Grinberg (1), Brian Toonen (2), Nicholas Karonis (2,3), George Em Karniadakis (1)
(1) Brown University, Division of Applied Mathematics
(2) Northern Illinois University
(3) Argonne National Laboratory
This work was supported by the National Science Foundation's CI-TEAM program under contract OCI-0636412 and the NSF NMI program under contract OCI-0330664.

2 How large is the problem we want to solve?
• On average, an adult human weighing 70 kg has a blood volume of 5 liters.
• The typical volume of a tetrahedral computational element with an edge of 0.5 mm is 0.0147 mm^3.
• Filling the blood volume therefore takes about 339M tetrahedral elements.
• 339E6*(P+3)(P+2)^2 = 85.4E9 degrees of freedom per variable (P=4).
• A human being is a multicellular eukaryote consisting of an estimated 100 trillion cells …
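The counts on this slide can be reproduced by hand. The worked arithmetic below is a sketch that assumes a regular tetrahedron (whose volume formula matches the quoted 0.0147 mm^3) and 1 liter = 10^6 mm^3; the symbols a, V_tet, N_el and N_DOF are introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
V_{\mathrm{tet}} &= \frac{a^{3}}{6\sqrt{2}}
                  = \frac{(0.5\ \mathrm{mm})^{3}}{6\sqrt{2}}
                  \approx 0.01473\ \mathrm{mm}^{3} \\
N_{\mathrm{el}}  &= \frac{5 \times 10^{6}\ \mathrm{mm}^{3}}{0.01473\ \mathrm{mm}^{3}}
                  \approx 3.39 \times 10^{8} \\
N_{\mathrm{DOF}} &= N_{\mathrm{el}}\,(P+3)(P+2)^{2}
                  = 3.39 \times 10^{8} \times 7 \times 36
                  \approx 8.54 \times 10^{10} \qquad (P = 4)
\end{align*}
```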

3 Motivation
Hardware limits:
• Solving a large-scale problem requires thousands of processors.
• Parallel efficiency is strongly affected by communication cost.
• Little memory is available per core.
Solution of large linear systems is extremely expensive:
• A large condition number results in a high iteration count.
• The most effective preconditioners do not scale well beyond a thousand processors.
Our solution: a multi-layer hierarchical approach.

4 Outline
• Software: NEKTAR-G2
• Multilevel Partitioning
• Solution of Large Scale Problem:
– on a single supercomputer
– on TeraGrid
• Summary

5 NEKTAR-G2
• NEKTAR-G2 Prototype
• The New Domain Decomposition Technique: The Idea

6 Nektar-G2 Prototype [1]
• The overall simulation consists of
– 1D computation through the full arterial tree;
– detailed 3D simulations on arterial bifurcations.
• 1D results feed the 3D simulations, providing flow rate and pressure for boundary conditions.
• MPICH-G2 was used for intra-site and inter-site communications on TeraGrid.
[Figure labels: 1D model, 3D simulation, 1D data; sites SDSC, TACC, UC/ANL, NCSA, PSC]
[1] S. Dong et al., "Simulating and visualizing the human arterial system on the TeraGrid", Future Generation Computer Systems, Volume 22, Issue 8, October 2006, pp. 1011-1017.

7 The New Domain Decomposition Technique for TeraGrid Simulations
The new version of Nektar-G2 features direct coupling between 3D domains.
The increased volume of data transferred between 3D blocks requires a high level of parallelism.
[Figure labels: sites SDSC, TACC, UC/ANL, NCSA, PSC, ORNL, PU, IU]

8 Domain Decomposition Technique for TeraGrid Simulations: Method
The numerical solution is performed on two levels: on the outer level a loosely coupled problem is solved; on the inner level several tightly coupled problems are solved in parallel.
Multi-level partitioning of the entire computational domain requires multi-level parallelism in order to maintain high parallel efficiency on thousands of processors.
The tightly coupled problems are solved by NEKTAR, a parallel numerical library based on the high-order spectral/hp element method [1].
Continuity of the numerical solution is achieved by imposing proper boundary conditions on the sub-domain interfaces.
On TeraGrid, the inter-site communication required on the outer level is performed by the MPICH-G2 library [2].
[1] Karniadakis & Sherwin, Spectral/hp Element Methods for CFD, 2nd edition, Oxford University Press, 2005.
[2] N. Karonis et al., "A Grid-Enabled Implementation of the Message Passing Interface", JPDC, Vol. 63, No. 5, pp. 551-563, May 2003.

9 Multilevel Partitioning of Global Communicator
• High Level Communicator Splitting
• Low Level Communicator Splitting
• Message Passing Across Sub-Domain Interface


11 Dual Domain Decomposition: Low Level Communicator Splitting
[Figure: numbered partitions (labels 0 to 14) shown in three panels, with boundaries labeled Inlet, Outlet 1, Outlet 2 and Outlet 3.]
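The two levels of communicator splitting can be sketched without MPI by computing which ranks end up in which group. This is a minimal illustration, not the Nektar-G2 implementation: the 15-rank layout, the five ranks per sub-domain and the `touches` map are all hypothetical. In an MPI code each grouping would correspond to a call to `MPI_Comm_split`, with the group identifier passed as the color.

```python
from collections import defaultdict

def split(ranks, color_of):
    """Group ranks by color, as MPI_Comm_split does when key = rank."""
    groups = defaultdict(list)
    for r in sorted(ranks):
        groups[color_of(r)].append(r)
    return dict(groups)

# Hypothetical layout: 15 global ranks, three sub-domains of 5 ranks each.
RANKS_PER_SUBDOMAIN = 5
world = range(15)

# High-level split: one group (communicator) per sub-domain.
high = split(world, lambda r: r // RANKS_PER_SUBDOMAIN)

# Hypothetical map: which boundaries of sub-domain 0 each rank's partition touches.
touches = {0: {"inlet"}, 1: {"inlet", "outlet 1"}, 2: set(),
           3: {"outlet 1"}, 4: {"outlet 2"}}

# Low-level split: one group per inlet/outlet, holding only the ranks with
# faces on that boundary. A rank may belong to several groups, so with MPI
# this takes one MPI_Comm_split call per boundary, with non-members passing
# MPI_UNDEFINED as the color.
low = {}
for r in high[0]:
    for name in touches[r]:
        low.setdefault(name, []).append(r)
```

Keeping interface ranks in their own small groups means that collective operations on boundary data involve only the ranks that hold it.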

12 Message Passing Across Sub-Domain Interface
[Figure: numbered partitions (labels 0 to 14) on both sides of the interface; communication calls shown: MPI_Gatherv, MPI_Isend, MPI_Irecv, MPI_Scatterv.]
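One plausible reading of the calls named on this slide is a gather onto a root rank on one side of the interface, a point-to-point exchange between the two roots, and a scatter on the other side. The pure-Python sketch below mimics only the data movement; the chunk sizes and values are hypothetical, and it assumes both sides agree on the ordering of interface values.

```python
def gatherv(chunks):
    """Root-side view of MPI_Gatherv: concatenate variable-length chunks in rank order."""
    counts = [len(c) for c in chunks]
    flat = [x for c in chunks for x in c]
    return flat, counts

def scatterv(flat, counts):
    """Root-side view of MPI_Scatterv: cut a flat buffer into per-rank chunks."""
    if sum(counts) != len(flat):
        raise ValueError("counts must add up to the buffer length")
    out, start = [], 0
    for n in counts:
        out.append(flat[start:start + n])
        start += n
    return out

# Hypothetical data: three outlet ranks of sub-domain A hold 2, 1 and 3 interface values.
outlet_chunks = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
# Hypothetical layout: two inlet ranks of sub-domain B expect 4 and 2 values.
inlet_counts = [4, 2]

buffer_a, _ = gatherv(outlet_chunks)             # MPI_Gatherv onto the outlet root
buffer_b = list(buffer_a)                        # stands in for MPI_Isend / MPI_Irecv between roots
inlet_chunks = scatterv(buffer_b, inlet_counts)  # MPI_Scatterv from the inlet root
```

Routing the exchange through one root per side keeps the number of messages crossing the interface at one per boundary quantity, whatever the number of ranks on either side.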

13 Dual Domain Decomposition Method: details
• Interface Boundary Condition
• Accuracy
• Efficiency

14 Dual Domain Decomposition: Interface Boundary Conditions
[Figure labels: Flow; V_n; P_n, dV_n/dn; Sub-domain A; Sub-domain B; outlet; inlet]

Boundary condition | Message size
Velocity is computed at the outlet and imposed as a Dirichlet boundary condition at the inlet. | N_F*N_M*3*sizeof(double)
Pressure is computed at the inlet and imposed as a Dirichlet boundary condition at the outlet. | N_F*N_M*sizeof(double)
Velocity flux from the inlet is averaged with the velocity flux computed at the outlet and imposed as a Neumann boundary condition at the outlet. | N_F*(P+3)*(P+2)*sizeof(double)

N_F, N_M: number of faces and modes; P: order of polynomial approximation.
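The message-size formulas from this slide can be evaluated directly. The sketch below assumes an 8-byte double; the function name and the sample values for the number of faces and modes are hypothetical and chosen only to show the relative sizes of the three messages.

```python
SIZEOF_DOUBLE = 8  # bytes; assumes IEEE 754 double precision

def message_sizes(n_faces, n_modes, p):
    """Bytes exchanged per interface for each boundary condition on the slide."""
    return {
        # Velocity: three components per mode per face.
        "velocity": n_faces * n_modes * 3 * SIZEOF_DOUBLE,
        # Pressure: one scalar per mode per face.
        "pressure": n_faces * n_modes * SIZEOF_DOUBLE,
        # Velocity flux: (P+3)*(P+2) values per face.
        "velocity_flux": n_faces * (p + 3) * (p + 2) * SIZEOF_DOUBLE,
    }

# Hypothetical interface: 1000 faces, 15 modes per face, polynomial order 4.
sizes = message_sizes(n_faces=1000, n_modes=15, p=4)
```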

15 Dual Domain Decomposition: Accuracy
Simulation of steady flow in a converging pipe using Dual Domain Decomposition. The sub-domain interface is at z=5. Re = 200; P = 5.
The plot on the right depicts the wall shear stress: solid line, solution with dual domain decomposition; dashed line, solution with standard domain decomposition.

16 Dual Domain Decomposition: Efficiency Mean CPU time required per time step. Problem size: 67456 tetrahedral elements, polynomial order P=4 and P=5. Computations were performed on the CRAY XT3 at PSC.

17 Solution of Large Scale Problem with the New Domain Decomposition Method
• Numerical Simulation of a Flow in Aorta
• Communication/Computation Time Balance

18 Numerical Simulation of Flow in Human Aorta with Dual Domain Decomposition

Domain | # of elements | Polynomial order | # of DOF
A | 120,813 | 4 (6) | 30,444,876 (69,588,288)
B | 20,797 | 4 (6) | 5,240,844 (11,979,072)
C | 106,219 | 4 (6) | 26,767,188 (61,182,144)
D | 77,966 | 4 (6) | 19,647,432 (44,908,416)
Total | 325,795 | 4 (6) | 82,100,340 (187,657,920)

[Figure labels: sub-domains A, B, C, D]
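The DOF column follows from the per-element count (P+3)(P+2)^2 quoted on slide 2. The short check below recomputes the table for P = 4 and P = 6 from the element counts; the function and variable names are illustrative only.

```python
def dof(n_elements, p):
    """Degrees of freedom per variable: (P+3)(P+2)^2 per tetrahedral element."""
    return n_elements * (p + 3) * (p + 2) ** 2

# Element counts per sub-domain, taken from the table.
elements = {"A": 120_813, "B": 20_797, "C": 106_219, "D": 77_966}

# DOF per sub-domain for both polynomial orders, plus the totals.
table = {p: {name: dof(n, p) for name, n in elements.items()} for p in (4, 6)}
totals = {p: sum(table[p].values()) for p in (4, 6)}

for p in (4, 6):
    print(f"P={p}: total DOF = {totals[p]:,}")
```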

19 Communication / Computation CPU-time balance
Simulation of blood flow in the aorta. N elements = 325,795; P = 6. Computation was performed on the CRAY XT3 with 226 and 508 processors.
[Figure labels: sub-domains A, B, C, D]

20 Standard and Dual Domain Decomposition: Parallel Speed-up
Simulation of blood flow in the aorta. N elements = 325,795; P = 6. Computation was performed on the CRAY XT3.
[Figure labels: 4.06 sec, 1.24 sec, 0.77 sec]

21 The New Domain Decomposition Method on TeraGrid
• Cross-site computation: sample problem
• Cross-site computation: large scale problem

22 Cross-site computation: sample problem
Problem size: N el = 8432 x 2; P = 6.
[Figure labels: sub-domains A, B; 12 fast CPU, 12 slow CPU, 12 fast CPU, 24 slow CPU]

23 Cross-site computation: Arterial Flow Simulation

Domain (site) | # of CPU | # of elements | Polynomial order | # of DOF
A (SDSC) | 66 | 120,813 | 4 | 30,444,876
B (UC/ANL) | 12 | 20,797 | 4 | 5,240,844
C (NCSA) | 66 | 106,219 | 4 | 26,767,188
Total | 144 | 247,829 | 4 | 62,452,908

[Figure labels: sub-domains A, B, C]
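A rough feel for how evenly the work is spread across sites comes from dividing each domain's DOF count by its CPU count. This is a sketch using only the numbers in the table; DOF per CPU is a crude proxy for load, since it ignores differences in CPU speed and communication cost, and the variable names are illustrative.

```python
# CPU and DOF counts per site, taken from the table.
sites = {
    "A (SDSC)":   {"cpus": 66, "dof": 30_444_876},
    "B (UC/ANL)": {"cpus": 12, "dof": 5_240_844},
    "C (NCSA)":   {"cpus": 66, "dof": 26_767_188},
}

# DOF handled by each CPU at each site.
dof_per_cpu = {name: s["dof"] / s["cpus"] for name, s in sites.items()}

# Ratio of the heaviest to the lightest per-CPU load (1.0 would be perfectly even).
imbalance = max(dof_per_cpu.values()) / min(dof_per_cpu.values())

for name, value in dof_per_cpu.items():
    print(f"{name}: {value:,.0f} DOF per CPU")
```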

24 Summary
• We developed and implemented a new scalable approach for the solution of large problems on the TeraGrid and beyond.
• Overlapping computation with cross-site communication, and performing simultaneous, full-duplex communication over multiple channels, hide the expensive inter-site latency on TeraGrid.
• Future plan: improve the communication algorithm between different process groups.

25 Thank you!


27 Spectral Element Method
• Flexible: efficient discretization for complex geometries;
• Standard finite element meshes, structured or unstructured;
• High-order polynomial expansion within each element (high-order method);
• Inherently contains multi-scale modeling capability;
• Dual path to convergence: p-type refinement and h-type refinement.
[Figure label: Spectral elements]
Karniadakis & Sherwin, Spectral/hp Element Methods for CFD, 2nd edition, Oxford University Press, 2005.

28 Computational Algorithm
for step=1:NSTEPS
    Irecv(P); Irecv(V); Irecv(dV/dn)
    local_gather(V); Isend(V)
    local_gather(dV/dn); Isend(dV/dn);
    .
    local_scatter(P); local_scatter(V); local_scatter(dV/dn);
    .
    P_solve
    local_gather(P); Isend(P)
    .
    V_solve
    .
end

29 Standard Domain Decomposition: Parallel Speed-up
Simulation of blood flow in the aorta. N elements = 325,795; P = 6. Computation was performed on the CRAY XT3.

30 Outflow Boundary Condition for Terminal Vessels
[Figure labels: sub-domains A, B, C, D]

