Integrating Trilinos Solvers to SEAM code
Dagoberto A.R. Justo – UNM
Tim Warburton – UNM
Bill Spotz – Sandia
SEAM (NCAR)
–Spectral Element Atmospheric Method
Trilinos (Sandia Lab)
AztecOO
Epetra
Nox
Ifpack
PETSc
Komplex
AztecOO
Solvers
–CG, CGS, BICGStab, GMRES, Tfqmr
Preconditioners
–Diagonal Jacobi, Least Square, Neumann, Domain Decomposition, Symmetric Gauss-Seidel
Matrix Free implementation
C++ (Fortran interface)
MPI
Implementation
SEAM CODE
–Pcg_solver (F90): calls Aztec_solvers( )
–Sub Aztec_solvers (F90): calls AZ_Iterate( ) (C)
AZTEC
–Matrix_vector_C (C), Matrix_vector (F90)
–Prec_Jacobi_C (C), Prec_Jacobi (F90)
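The layering on this slide, where a generic iterative solver reaches the application only through a matrix-times-vector routine and a preconditioner routine, can be illustrated with a minimal matrix-free preconditioned CG sketch. It is not the SEAM or Aztec code. The operator (a small tridiagonal system), the names `apply_operator`, `apply_jacobi` and `pcg`, and the tolerance are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]

N = 8  # illustrative problem size (assumption)
DIAGONAL = [2.0 + i for i in range(N)]  # diagonal of a made-up SPD operator


def apply_operator(x: Vector) -> Vector:
    """Matrix-free y = A x for a tridiagonal A; the matrix is never stored."""
    y = []
    for i in range(N):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < N - 1 else 0.0
        y.append(DIAGONAL[i] * x[i] - left - right)
    return y


def apply_jacobi(r: Vector) -> Vector:
    """Diagonal (Jacobi) preconditioner: z = D^{-1} r."""
    return [ri / di for ri, di in zip(r, DIAGONAL)]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def pcg(matvec: Callable[[Vector], Vector],
        precond: Callable[[Vector], Vector],
        b: Vector, tol: float = 1e-10,
        max_iter: int = 200) -> Tuple[Vector, int]:
    """Preconditioned CG that sees the problem only through two callbacks."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0                  # trivial system, nothing to iterate
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

A call such as `pcg(apply_operator, apply_jacobi, b)` returns the solution and the iteration count. Swapping the solver or the preconditioner only changes which callbacks are passed, which is the point of the wrapper layering.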
Machines used
Pentium III Notebook (serial)
–Linux, LAM-MPI, Intel Compilers
Los Lobos at
–Linux Cluster
–256 nodes
–IBM Pentium III 750 MHz, 256 KB L2 Cache, 1 GB RAM
–Portland Group compiler
–MPICH for Myrinet interconnections
Graphical Results from SEAM (plots: Energy, Mass)
Memory (in Mbytes per processor)
Speed Up
From 1 to 160 processors.
Time of Simulation
144 time iterations x 300 s = 12 h simulation
Verify results using mass, energy, …
–(Different result for 1 proc)
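The arithmetic on this slide, and the usual definitions behind a speed-up plot, can be sketched as follows. Speed-up is taken here as S(p) = T(1)/T(p) and parallel efficiency as S(p)/p, which is the conventional definition and an assumption about how the plots were produced. The wall-clock times in `placeholder_timings` are invented placeholders, not measurements from SEAM or Los Lobos.

```python
from typing import Dict, Tuple

# Simulated time covered by one run, from the slide: 144 steps of 300 s each.
TIME_STEPS = 144
STEP_SECONDS = 300
simulated_hours = TIME_STEPS * STEP_SECONDS / 3600  # 43200 s = 12 h


def speedup_and_efficiency(timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map processor count p to (T(1)/T(p), T(1)/(p*T(p)))."""
    t1 = timings[1]  # serial wall-clock time is the reference
    return {p: (t1 / t, t1 / (p * t)) for p, t in sorted(timings.items())}


# Placeholder wall-clock times in seconds (NOT measured data).
placeholder_timings = {1: 100.0, 2: 50.0, 4: 40.0}

for p, (s, e) in speedup_and_efficiency(placeholder_timings).items():
    print(f"p={p:3d}  speedup={s:.2f}  efficiency={e:.2f}")
```

Efficiency is reported alongside speed-up because it shows how much of each added processor is actually used.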
Speed Up – SEAM selecting # of elements ne=24x24x6
Speed Up – SEAM selecting order np=6
Speed Up – SEAM+Aztec best: cgs solver
Speed Up – SEAM+Aztec best: cgs solver + Least Square preconditioner
Speed Up – SEAM+Aztec increasing np -> increases speedup
Upshot – SEAM (One CG iteration)
Upshot – SEAM (matrix times vector communication)
Upshot – SEAM+Aztec (One CG iteration)
Upshot – SEAM+Aztec (Matrix times vector communication)
Upshot – SEAM+Aztec (Vector Reduction)
Time (24x24x6 elements, 2 proc.)
Solver            Iter.   Time (loop)   Time/iter
SEAM p=           it      7.48 s        0.22 s/it
SEAM p=           it      81.2 s        1.42 s/it
Cg p=             it      28.2 s        0.32 s/it
Cgs p=            it      28.6 s        0.38 s/it
Tfqmr p=          it      31.1 s        0.41 s/it
Bicg p=           it      29.4 s        0.31 s/it
Cgs ls p=         it      42.0 s        1.19 s/it
CG Jacobi p=      it      17.2 s        0.37 s/it
Cgs Jacobi p=     it      15.3 s        0.48 s/it
Cgs p=            it      274. s        4.53 s/it
Conclusions & Suggested Future Efforts
SEAM+Aztec works!
SEAM+Aztec is 2x slower
–difference in CG algorithms
–SEAM+Aztec time-iteration is 50% slower
–0.1% of time lost in calls, preparation for Aztec.
More time, better tune-up.
Domain decomposition Preconditioners