
1 A New Domain Decomposition Technique for TeraGrid Simulations
Leopold Grinberg (1), Brian Toonen (2), Nicholas Karonis (2,3), George Em Karniadakis (1)
(1) Brown University, Division of Applied Mathematics; (2) Northern Illinois University; (3) Argonne National Laboratory
This work was supported by the National Science Foundation's CI-TEAM program under contract OCI-0636412 and the NSF NMI program under contract OCI-0330664.

2 How large is the problem we want to solve?
On average, an adult human weighing 70 kg has a blood volume of about 5 liters. The typical volume of a tetrahedral computational element with a 0.5 mm edge is 0.0147 mm^3, which gives roughly 339 million (339E6) tetrahedral elements. With polynomial order P = 4, this amounts to 339E6*(P+3)(P+2)^2 = 85.4E9 degrees of freedom per variable. For perspective, a human being is a multicellular eukaryote consisting of an estimated 100 trillion cells.
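As a sanity check, the element count and DOF estimate above can be reproduced with a few lines of C. This is a minimal sketch; the 0.0147 mm^3 element volume and the (P+3)(P+2)^2 modes-per-element count are taken from the slide, and the slide rounds the element count to 339E6 before computing the DOF.

#include <stdio.h>

int main(void)
{
    const double blood_volume_mm3   = 5.0e6;   /* 5 liters = 5e6 mm^3                    */
    const double element_volume_mm3 = 0.0147;  /* tetrahedron with a 0.5 mm edge (slide) */
    const int    P                  = 4;       /* polynomial order                       */

    double n_elements  = blood_volume_mm3 / element_volume_mm3;      /* ~3.4e8; slide rounds to 339e6 */
    double dof_per_var = n_elements * (P + 3) * (P + 2) * (P + 2);   /* ~8.6e10; slide quotes 85.4e9
                                                                        from the rounded element count */

    printf("elements: %.3e, DOF per variable (P=4): %.3e\n", n_elements, dof_per_var);
    return 0;
}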

3 Motivation
Hardware limits:
– Solution of a large-scale problem requires thousands of processors.
– Parallel efficiency is strongly affected by communication cost.
– Low memory availability per core.
Solution of large linear systems is extremely expensive:
– A large condition number results in a high iteration count.
– The most effective preconditioners do not scale well beyond about a thousand processors.
Our solution: a multi-layer hierarchical approach.

4 Outline
– Software: NEKTAR-G2
– Multilevel Partitioning
– Solution of a Large-Scale Problem: on a single supercomputer; on the TeraGrid
– Summary

5 NEKTAR-G2
– NEKTAR-G2 Prototype
– The New Domain Decomposition Technique: The Idea

6 Nektar-G2 Prototype [1]
– The overall simulation consists of a 1D computation through the full arterial tree and detailed 3D simulations at arterial bifurcations.
– The 1D results feed the 3D simulations, providing flow rate and pressure for boundary conditions.
– MPICH-G2 was used for intra-site and inter-site communication on the TeraGrid.
(Diagram: the 1D model distributes 1D data to 3D simulations running at SDSC, TACC, UC/ANL, NCSA and PSC.)
[1] S. Dong et al., "Simulating and visualizing the human arterial system on the TeraGrid", Future Generation Computer Systems, Vol. 22, No. 8, October 2006, pp. 1011-1017.

7 The New Domain Decomposition Technique for TeraGrid Simulations
The new version of Nektar-G2 features direct coupling between 3D domains. The increased volume of data transfer between 3D blocks requires a high level of parallelism.
(Diagram: 3D sub-domains distributed across the TeraGrid sites SDSC, TACC, UC/ANL, NCSA, PSC, ORNL, PU and IU.)

8 Domain Decomposition Technique for TeraGrid Simulations: Method
The numerical solution is performed on two levels: on the outer level a loosely coupled problem is solved, while on the inner level several tightly coupled problems are solved in parallel. Multi-level partitioning of the entire computational domain requires multi-level parallelism in order to maintain high parallel efficiency on thousands of processors. The solution of the tightly coupled problems is performed by NEKTAR, a parallel numerical library based on the high-order spectral/hp element method [1]. Continuity of the numerical solution is achieved by imposing proper boundary conditions on the sub-domain interfaces. On the TeraGrid, the inter-site communication required on the outer level is performed by the MPICH-G2 library [2].
[1] Karniadakis & Sherwin, Spectral/hp Element Methods for CFD, 2nd edition, Oxford University Press, 2005.
[2] N. Karonis et al., "A Grid-Enabled Implementation of the Message Passing Interface", Journal of Parallel and Distributed Computing (JPDC), Vol. 63, No. 5, pp. 551-563, May 2003.

9 Multilevel Partitioning of the Global Communicator
– High-Level Communicator Splitting
– Low-Level Communicator Splitting
– Message Passing Across the Sub-Domain Interface

10 Dual Domain Decomposition: High-Level Communicator Splitting
(Diagram: MPI_COMM_WORLD is split first across the sites S1, S2 and S3, a topology-aware decomposition, and then into the 1D and 3D solver groups, a task-oriented decomposition.)
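A minimal sketch of how the two splits shown above could be expressed with MPI_Comm_split. The rank-range assignment of processes to sites and the choice of rank 0 for the 1D model are illustrative assumptions, not NEKTAR-G2's actual mapping (which would come from the MPICH-G2 job configuration).

#include <mpi.h>

int main(int argc, char **argv)
{
    int world_rank, world_size;
    MPI_Comm site_comm, task_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Topology-aware split: here, three sites of roughly equal size (assumption). */
    int site_id = world_rank / ((world_size + 2) / 3);
    MPI_Comm_split(MPI_COMM_WORLD, site_id, world_rank, &site_comm);

    /* Task-oriented split: rank 0 runs the 1D model, the remaining ranks are
     * grouped by the 3D sub-domain they own (here, one sub-domain per site).   */
    int task_id = (world_rank == 0) ? 0 : 1 + site_id;
    MPI_Comm_split(MPI_COMM_WORLD, task_id, world_rank, &task_comm);

    MPI_Comm_free(&site_comm);
    MPI_Comm_free(&task_comm);
    MPI_Finalize();
    return 0;
}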

11 Dual Domain Decomposition: Low-Level Communicator Splitting
(Diagram: within each 3D sub-domain's process group, the ranks owning faces on the inlet and outlet interfaces are identified and grouped; the example shows sub-domains with one inlet and one to three outlets.)
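One way such a low-level split could look in MPI: within a sub-domain's task communicator, the processes that own faces on a given interface are split into their own small communicator. The interface_id argument and its derivation from the local mesh partition are assumptions for illustration, not NEKTAR's actual code.

#include <mpi.h>

/* Build one communicator per interface of a 3D sub-domain.
 * interface_id identifies the interface this process touches
 * (0 = inlet, 1 = outlet 1, ...), or -1 if it owns no interface faces;
 * how it is obtained from the local mesh partition is assumed here.    */
MPI_Comm make_interface_comm(MPI_Comm task_comm, int interface_id)
{
    int task_rank;
    MPI_Comm interface_comm;
    MPI_Comm_rank(task_comm, &task_rank);

    /* Processes with no faces on an interface opt out of the split. */
    int color = (interface_id >= 0) ? interface_id : MPI_UNDEFINED;
    MPI_Comm_split(task_comm, color, task_rank, &interface_comm);

    return interface_comm;   /* MPI_COMM_NULL for non-interface processes */
}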

12 Message Passing Across the Sub-Domain Interface
(Diagram: interface data is assembled on a root process with MPI_Gatherv, exchanged between the two sub-domains with MPI_Isend/MPI_Irecv, and redistributed on the receiving side with MPI_Scatterv.)
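A sketch of the gather/send/receive/scatter pattern named on this slide, for one field across one interface. The buffer layout, the message tag, the counts/displacements arrays and the choice of rank 0 as the interface root are illustrative assumptions; in the actual algorithm (see the computational algorithm slide) the sends and receives are overlapped with computation rather than waited on immediately.

#include <mpi.h>
#include <stdlib.h>

/* Exchange one field across a sub-domain interface:
 *  - gather the locally owned face data onto the interface root (rank 0),
 *  - exchange the assembled buffer with the neighbouring sub-domain's root
 *    using non-blocking point-to-point calls,
 *  - scatter the received data back to the interface processes.
 * counts/displs describe how many face values each interface rank owns.    */
void exchange_interface_field(MPI_Comm iface_comm, int peer_root_world_rank,
                              double *local_out, double *local_in, int nlocal,
                              const int *counts, const int *displs, int ntotal)
{
    int rank;
    MPI_Comm_rank(iface_comm, &rank);

    double *sendbuf = NULL, *recvbuf = NULL;
    if (rank == 0) {
        sendbuf = malloc(ntotal * sizeof(double));
        recvbuf = malloc(ntotal * sizeof(double));
    }

    /* Assemble the interface data on the local interface root. */
    MPI_Gatherv(local_out, nlocal, MPI_DOUBLE,
                sendbuf, counts, displs, MPI_DOUBLE, 0, iface_comm);

    if (rank == 0) {
        MPI_Request req[2];
        /* Full-duplex exchange with the peer sub-domain's interface root. */
        MPI_Irecv(recvbuf, ntotal, MPI_DOUBLE, peer_root_world_rank, 0,
                  MPI_COMM_WORLD, &req[0]);
        MPI_Isend(sendbuf, ntotal, MPI_DOUBLE, peer_root_world_rank, 0,
                  MPI_COMM_WORLD, &req[1]);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    }

    /* Redistribute the neighbour's data to the interface processes. */
    MPI_Scatterv(recvbuf, counts, displs, MPI_DOUBLE,
                 local_in, nlocal, MPI_DOUBLE, 0, iface_comm);

    free(sendbuf);
    free(recvbuf);
}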

13 Dual Domain Decomposition Method: Details
– Interface Boundary Conditions
– Accuracy
– Efficiency

14 Dual Domain Decomposition: Interface Boundary Conditions
(Diagram: flow passes from sub-domain A (outlet) into sub-domain B (inlet); V_n, P_n and dV_n/dn are exchanged across the interface.)
Boundary condition and corresponding message size:
– Velocity is computed at the outlet and imposed as a Dirichlet boundary condition at the inlet: N_F * N_M * 3 * sizeof(double).
– Pressure is computed at the inlet and imposed as a Dirichlet boundary condition at the outlet: N_F * N_M * sizeof(double).
– The velocity flux from the inlet is averaged with the velocity flux computed at the outlet and imposed as a Neumann boundary condition at the outlet: N_F * (P+3) * (P+2) * sizeof(double).
N_F, N_M: number of faces and modes; P: order of polynomial approximation.
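The message-size formulas above can be evaluated directly; a minimal sketch in C, where the values chosen for N_F and N_M are illustrative assumptions (the slide does not fix them):

#include <stdio.h>

int main(void)
{
    const int P   = 4;     /* polynomial order (illustrative)          */
    const int N_F = 100;   /* number of interface faces (illustrative) */
    const int N_M = 15;    /* number of modes per face (illustrative)  */

    size_t dirichlet_velocity = (size_t)N_F * N_M * 3 * sizeof(double);
    size_t dirichlet_pressure = (size_t)N_F * N_M * sizeof(double);
    size_t neumann_flux       = (size_t)N_F * (P + 3) * (P + 2) * sizeof(double);

    printf("velocity Dirichlet:    %zu bytes\n", dirichlet_velocity);
    printf("pressure Dirichlet:    %zu bytes\n", dirichlet_pressure);
    printf("velocity flux Neumann: %zu bytes\n", neumann_flux);
    return 0;
}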

15 Dual Domain Decomposition: Accuracy
Simulation of steady flow in a converging pipe using dual domain decomposition; the sub-domain interface is at z = 5. Re = 200; P = 5. The plot on the right depicts the wall shear stress: solid line, solution with dual domain decomposition; dashed line, solution with standard domain decomposition.

16 Dual Domain Decomposition: Efficiency
Mean CPU time required per time step. Problem size: 67,456 tetrahedral elements, polynomial orders P = 4 and P = 5. Computations were performed on the CRAY XT3 at PSC.

17 Solution of a Large-Scale Problem with the New Domain Decomposition Method
– Numerical Simulation of Flow in the Aorta
– Communication/Computation Time Balance

18 Numerical Simulation of Flow in the Human Aorta with Dual Domain Decomposition
Number of elements, polynomial order, and number of DOF per domain (values for P = 6 in parentheses):
– Domain A: 120,813 elements; P = 4 (6); 30,444,876 (69,588,288) DOF
– Domain B: 20,797 elements; P = 4 (6); 5,240,844 (11,979,072) DOF
– Domain C: 106,219 elements; P = 4 (6); 26,767,188 (61,182,144) DOF
– Domain D: 77,966 elements; P = 4 (6); 19,647,432 (44,908,416) DOF
– Total: 325,795 elements; P = 4 (6); 82,100,340 (187,657,920) DOF
(Figure: the aorta geometry decomposed into sub-domains A, B, C and D.)

19 Communication / Computation CPU-Time Balance
Simulation of blood flow in the aorta. N_elements = 325,795; P = 6. Computations were performed on the CRAY XT3 with 226 and 508 processors. (Plots show the balance for sub-domains A, B, C and D.)

20 Standard and Dual Domain Decomposition: Parallel Speed-Up
Simulation of blood flow in the aorta. N_elements = 325,795; P = 6. Computations were performed on the CRAY XT3. (Data labels on the plot: 4.06 sec, 1.24 sec, 0.77 sec.)

21 The New Domain Decomposition Method on the TeraGrid
– Cross-site computation: sample problem
– Cross-site computation: large-scale problem

22 Cross-Site Computation: Sample Problem
Two coupled sub-domains, A and B. Problem size: N_el = 8432 x 2; P = 6. (Plot compares configurations of 12 fast CPUs with 12 slow CPUs, and 12 fast CPUs with 24 slow CPUs.)

23 Cross-Site Computation: Arterial Flow Simulation
Number of CPUs, elements, polynomial order, and DOF per domain:
– Domain A (SDSC): 66 CPUs; 120,813 elements; P = 4; 30,444,876 DOF
– Domain B (UC/ANL): 12 CPUs; 20,797 elements; P = 4; 5,240,844 DOF
– Domain C (NCSA): 66 CPUs; 106,219 elements; P = 4; 26,767,188 DOF
– Total: 144 CPUs; 247,829 elements; P = 4; 62,452,908 DOF

24 Summary
We developed and implemented a new scalable approach for the solution of large problems on the TeraGrid and beyond. Overlapping computation with cross-site communication, and performing simultaneous, full-duplex communication over multiple channels, hides the expensive inter-site latency on the TeraGrid. Future plan: improve the communication algorithm between different process groups.

25 Thank you!

26 Reconstruction of the Aorta from CT Images
(Pipeline: 3D CT scan -> 3D solid model -> domain decomposition -> numerical simulation.)

27 Spectral Element Method
– Flexible: efficient discretization of complex geometries;
– Standard finite element meshes, structured or unstructured;
– High-order polynomial expansion within each element (high-order method);
– Inherently contains multi-scale modeling capability;
– Dual path to convergence: p-type refinement and h-type refinement.
(Figure: spectral elements.)
Karniadakis & Sherwin, Spectral/hp Element Methods for CFD, 2nd edition, Oxford University Press, 2005.

28 Computational Algorithm
for step = 1:NSTEPS
   Irecv(P); Irecv(V); Irecv(dV/dn)       % post non-blocking receives for the neighbour's interface data
   local_gather(V);     Isend(V)          % assemble local interface velocity and send it
   local_gather(dV/dn); Isend(dV/dn)      % assemble local interface velocity flux and send it
   ...                                    % other work overlapped with the communication
   local_scatter(P); local_scatter(V); local_scatter(dV/dn)   % distribute received interface data
   ...
   P_solve                                % pressure solve
   local_gather(P); Isend(P)              % assemble local interface pressure and send it
   ...
   V_solve                                % velocity solve
   ...
end

29 Standard Domain Decomposition: Parallel Speed-Up
Simulation of blood flow in the aorta. N_elements = 325,795; P = 6. Computations were performed on the CRAY XT3.

30 Outflow Boundary Condition for Terminal Vessels
(Figure panels A, B, C and D.)
