Integrating Trilinos Solvers to SEAM code
Dagoberto A. R. Justo – UNM
Tim Warburton – UNM
Bill Spotz – Sandia

SEAM (NCAR): Spectral Element Atmospheric Method
Trilinos (Sandia Lab): AztecOO, Epetra, Nox, Ifpack, PETSc, Komplex

AztecOO
– Solvers: CG, CGS, BICGStab, GMRES, Tfqmr
– Preconditioners: Diagonal Jacobi, Least Square, Neumann, Domain Decomposition, Symmetric Gauss-Seidel
– Matrix-free implementation
– C++ (Fortran interface)
– MPI
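
As a concrete illustration of these options, here is a minimal, hedged sketch of how the Aztec C interface is usually configured. The AZ_* constants and calls are part of the Aztec API; the chosen tolerance, iteration limit, and the function name setup_aztec_options are assumptions for this example, not values taken from the SEAM integration.

#include <mpi.h>
#include "az_aztec.h"   /* Aztec C interface (used underneath AztecOO) */

void setup_aztec_options(int options[AZ_OPTIONS_SIZE],
                         double params[AZ_PARAMS_SIZE],
                         int proc_config[AZ_PROC_SIZE])
{
    /* Describe the parallel machine to Aztec and load default settings. */
    AZ_set_proc_config(proc_config, MPI_COMM_WORLD);
    AZ_defaults(options, params);

    /* Krylov solver: AZ_cg, AZ_cgs, AZ_bicgstab, AZ_gmres, AZ_tfqmr. */
    options[AZ_solver] = AZ_cg;

    /* Preconditioner: AZ_Jacobi, AZ_ls (least-squares polynomial),
       AZ_Neumann, AZ_dom_decomp, AZ_sym_GS, or AZ_none. */
    options[AZ_precond] = AZ_Jacobi;

    /* Stopping criteria (assumed values, not the ones used in SEAM). */
    options[AZ_max_iter] = 500;
    params[AZ_tol] = 1.0e-8;
}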

Implementation
SEAM code (F90): Pcg_solver -> Aztec_solvers( )
Aztec_solvers (F90) -> AZ_Iterate( ) in AZTEC (C)
AZTEC callbacks: Matrix_vector_C (C) -> Matrix_vector (F90)
                 Prec_Jacobi_C (C) -> Prec_Jacobi (F90)
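
The sketch below (in C) illustrates, under stated assumptions, what the glue in this call chain can look like: Aztec is given a matrix-free operator whose C wrapper forwards each matrix-vector product to the F90 Matrix_vector routine, and AZ_iterate drives the Krylov iteration. The Fortran symbol name matrix_vector_, the absence of explicit size arguments, and the wrapper/driver names are illustrative assumptions, not SEAM's actual interfaces.

#include <mpi.h>
#include "az_aztec.h"

/* SEAM's F90 operator, assumed reachable from C under this
   compiler-dependent symbol name (illustrative only). */
extern void matrix_vector_(double *x, double *y);

/* Wrapper that Aztec calls for every matrix-vector product y = A*x. */
static void matrix_vector_c(double *x, double *y,
                            AZ_MATRIX *Amat, int proc_config[])
{
    (void)Amat; (void)proc_config;   /* unused in this sketch */
    matrix_vector_(x, y);            /* delegate to the F90 routine */
}

/* Solve A x = b matrix-free; options/params/proc_config are assumed
   to have been set up as in the previous sketch. */
void aztec_solve(double *x, double *b, int n_local,
                 int options[], double params[], int proc_config[])
{
    double status[AZ_STATUS_SIZE];

    AZ_MATRIX *Amat = AZ_matrix_create(n_local);   /* local unknowns  */
    AZ_set_MATFREE(Amat, NULL, matrix_vector_c);   /* register matvec */

    /* Passing NULL lets Aztec build the preconditioner from options[];
       the SEAM implementation instead wires in its own Jacobi wrapper
       (Prec_Jacobi_C -> Prec_Jacobi), which is omitted here. */
    AZ_iterate(x, b, options, params, status, proc_config, Amat, NULL, NULL);

    AZ_matrix_destroy(&Amat);
}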

Machines used
Pentium III notebook (serial)
– Linux, LAM-MPI, Intel compilers
Los Lobos
– Linux cluster, 256 nodes
– IBM Pentium III 750 MHz, 256 KB L2 cache, 1 GB RAM
– Portland Group compiler
– MPICH for Myrinet interconnect

Graphical results from SEAM: energy and mass plots.

Memory (in Mbytes per processor)

Speed Up
From 1 to 160 processors.
Time of simulation: 144 time iterations x 300 s = 12 h simulation.
Verify results using mass, energy, …
– (Different result for 1 proc.)
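
Spelled out, the arithmetic behind the simulated time, together with the standard speedup definition assumed for the plots that follow (the definition itself is an assumption, since the slides do not state it):

T_{\mathrm{sim}} = 144 \times 300\,\mathrm{s} = 43\,200\,\mathrm{s} = 12\,\mathrm{h}
S(p) = \frac{T(1)}{T(p)}, \quad \text{where } T(p) \text{ is the measured wall-clock time on } p \text{ processors.}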

Speed Up – SEAM selecting # of elements ne=24x24x6

Speed Up – SEAM selecting order np=6

Speed Up – SEAM+Aztec best: cgs solver

Speed Up – SEAM+Aztec best: cgs solver + Least Square preconditioner

Speed Up – SEAM+Aztec increasing np -> increases speedup

Upshot – SEAM (One CG iteration)

Upshot – SEAM (matrix times vector communication)

Upshot – SEAM+Aztec (One CG iteration)

Upshot – SEAM+Aztec (Matrix times vector communication)

Upshot – SEAM+Aztec (Vector Reduction)

Time (24x24x6 elements, 2 proc.)

Solver        p     Iter.    Time (loop)   Time/iter.
SEAM          p=       it     7.48 s        0.22 s/it
SEAM          p=       it     81.2 s        1.42 s/it
Cg            p=       it     28.2 s        0.32 s/it
Cgs           p=       it     28.6 s        0.38 s/it
Tfqmr         p=       it     31.1 s        0.41 s/it
Bicg          p=       it     29.4 s        0.31 s/it
Cgs ls        p=       it     42.0 s        1.19 s/it
CG Jacobi     p=       it     17.2 s        0.37 s/it
Cgs Jacobi    p=       it     15.3 s        0.48 s/it
Cgs           p=       it     274 s         4.53 s/it

Conclusions & Suggested Future Efforts SEAM+Aztec works! SEAM+Aztec works! SEAM+Aztec is 2x slower SEAM+Aztec is 2x slower  difference in CG algorithms  SEAM+Aztec time-iteration is 50% slower  0.1% of time lost in calls, preparation for Aztec. More time  better tune-up. More time  better tune-up. Domain decomposition Preconditioners Domain decomposition Preconditioners

Conclusions & Suggested Future Efforts
SEAM + Aztec works!
More time -> better tune-up.