NERSC User Group Meeting
The DOE ACTS Collection
Osni Marques, Lawrence Berkeley National Laboratory


09/19/2007 NUG Meeting, slide 2: What is the ACTS Collection?

Advanced CompuTational Software Collection: tools for developing parallel applications. ACTS started as an "umbrella" project. Goals:
 Extended support for experimental software
 Make ACTS tools available on DOE computers
 Provide technical support
 Maintain the ACTS Information Center
 Coordinate efforts with other supercomputing centers
 Enable large-scale scientific applications
 Educate and train

09/19/2007 NUG Meeting, slide 3: ACTS Timeline

(Timeline diagram. Elements: ACTS Toolkit; user community; ACTS Challenge codes; computing systems; interoperability; pool of software tools; testing and acceptance phase; workshops and training; scientific computing centers; computer vendors; numerical simulations in physics, chemistry, biology, medicine, mathematics, bioinformatics, computer sciences and engineering; software collection; software sustainability center; software tool box.)

09/19/2007 NUG Meeting, slide 4: Challenges in the Development of Scientific Codes

Research in computational sciences is fundamentally interdisciplinary, and the development of complex simulation codes on high-end computers is not a trivial task.
 Productivity: time to first solution (prototype), time to solution (production), other requirements
 Complexity: increasingly sophisticated models, model coupling, interdisciplinarity
 Performance: increasingly complex algorithms, increasingly complex architectures, increasingly demanding applications

Software-engineering challenges:
 Libraries written in different languages
 Discussions about standardizing interfaces are often sidetracked into implementation issues
 Difficulties managing multiple libraries developed by third parties
 Need to use more than one language in one application
 The code is long-lived and different pieces evolve at different rates
 Swapping competing implementations of the same idea and testing without modifying the code
 Need to compose an application with others that were not originally designed to be combined

09/19/2007 NUG Meeting, slide 5: Current ACTS Tools and their Functionalities

Numerical:
 Trilinos: algorithms for the iterative solution of large sparse linear systems (includes AztecOO)
 Hypre: algorithms for the iterative solution of large sparse linear systems, intuitive grid-centric interfaces, and dynamic configuration of parameters
 PETSc: tools for the solution of PDEs that require solving large-scale, sparse linear and nonlinear systems of equations
 OPT++: object-oriented nonlinear optimization package
 SUNDIALS: solvers for systems of ordinary differential equations, nonlinear algebraic equations, and differential-algebraic equations
 ScaLAPACK: library of high-performance dense linear algebra routines for distributed-memory message-passing computers
 SLEPc: eigensolver package built on top of PETSc
 SuperLU: general-purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations
 TAO: large-scale optimization software, including nonlinear least squares, unconstrained minimization, bound-constrained optimization, and general nonlinear optimization

Code development:
 Global Arrays: library for writing parallel programs that use large arrays distributed across processing nodes and that offers a shared-memory view of distributed arrays
 Overture: object-oriented tools for solving computational fluid dynamics and combustion problems in complex geometries

Code execution:
 TAU: set of tools for analyzing the performance of C, C++, Fortran and Java programs

Library development:
 ATLAS: tools for the automatic generation of optimized numerical software for modern computer architectures and compilers

(The original slide also listed per-tool availability at NERSC: installed, to be installed, available upon request, or under test; some tools are also available in LibSci.)

09/19/2007 NUG Meeting, slide 6: ACTS Tools: numerical functionalities

Linear equations, direct methods:
 LU factorization: ScaLAPACK (dense), SuperLU (sparse)
 Cholesky factorization: ScaLAPACK
 LDL^T factorization (tridiagonal matrices): ScaLAPACK
 QR factorization, QR factorization with column pivoting, LQ factorization, full orthogonal factorization, generalized QR factorization: ScaLAPACK

Linear equations, iterative methods:
 Conjugate gradient (CG): AztecOO (Trilinos), PETSc
 GMRES: AztecOO, Hypre, PETSc
 CG squared, Bi-CG-Stab: AztecOO, PETSc
 QMR: AztecOO
 Transpose-free QMR: AztecOO, PETSc
 SYMMLQ, Richardson: PETSc
 Block Jacobi preconditioner: AztecOO, Hypre, PETSc
 Point Jacobi preconditioner, least-squares polynomials: AztecOO
 SOR preconditioner, overlapping additive Schwarz: PETSc
 Approximate inverse: Hypre
 Sparse LU preconditioner, incomplete LU (ILU) preconditioner: AztecOO, Hypre, PETSc

Linear equations, multigrid methods:
 MG preconditioner: Hypre, PETSc
 Algebraic multigrid: ML (Trilinos), Hypre
 Semicoarsening: Hypre
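As a point of reference for the iterative methods listed above, a minimal unpreconditioned conjugate gradient fits in a few lines. This Python sketch (illustrative only; AztecOO and PETSc implement preconditioned, parallel versions with far more machinery) follows the textbook algorithm for a symmetric positive definite matrix A:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Textbook unpreconditioned CG for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x              # initial residual
    p = r.copy()               # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # step length
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:      # converged on residual norm
            break
        p = r + (rs_new / rs_old) * p  # next A-conjugate direction
        rs_old = rs_new
    return x
```

For an n-by-n SPD system this converges in at most n iterations in exact arithmetic, which is why Krylov methods pair so well with the preconditioners in the table: a good preconditioner cuts the iteration count dramatically.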

09/19/2007 NUG Meeting, slide 7: ACTS Tools: numerical functionalities

Linear least squares (ScaLAPACK):
 Least squares: min_x ||b - Ax||_2
 Minimum norm: min_x ||x||_2
 Minimum-norm least squares: min_x ||x||_2 and min_x ||b - Ax||_2

Eigenvalue problems and the SVD (iterative and direct methods; ScaLAPACK for dense, SLEPc for sparse):
 Standard eigenvalue problems: Az = λz for A = A^T or A = A^H
 Generalized eigenvalue problems: Az = λBz, ABz = λz, BAz = λz
 Singular value decomposition: A = UΣV^T, A = UΣV^H

Nonlinear equations (Newton-based):
 Line search: PETSc, KINSOL (SUNDIALS)
 Trust regions: PETSc
 Pseudo-transient continuation: PETSc
 Matrix-free: PETSc

Nonlinear optimization:
 Newton-based: Newton (OPT++, TAO); finite differences (OPT++); quasi-Newton (OPT++, TAO LMVM); nonlinear interior point (OPT++, TAO)
 CG: standard nonlinear CG (OPT++, TAO); limited-memory BFGS (OPT++); gradient projection (TAO)
 Direct search: without derivative information (OPT++)
 Semismooth: infeasible semismooth (TAO); feasible semismooth (TAO)
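The "line search" entry above refers to damped Newton iterations. The following Python sketch is a toy illustration of the basic pattern (not the actual globalization strategies in PETSc or KINSOL, which are more sophisticated): solve the Newton linear system, then backtrack until the residual norm decreases sufficiently:

```python
import numpy as np

def newton_line_search(F, J, x0, tol=1e-10, max_iter=50):
    """Newton's method with simple backtracking on the residual norm."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        f = F(x)
        fnorm = np.linalg.norm(f)
        if fnorm < tol:
            break
        dx = np.linalg.solve(J(x), -f)     # Newton step: J(x) dx = -F(x)
        t = 1.0
        # Backtrack (halve the step) until the residual norm decreases enough.
        while t > 1e-8 and np.linalg.norm(F(x + t * dx)) > (1 - 1e-4 * t) * fnorm:
            t *= 0.5
        x = x + t * dx
    return x
```

The damping keeps the iteration from overshooting far from the solution; near the root the full step t = 1 is accepted and the usual quadratic convergence takes over.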

09/19/2007 NUG Meeting, slide 8: ACTS Tools: numerical functionalities

 ODEs (integration): variable-coefficient Adams-Moulton and backward differentiation formulas, with direct and iterative solvers: CVODE (SUNDIALS)
 ODEs with sensitivity analysis (integration): variable-coefficient Adams-Moulton and backward differentiation formulas, with direct and iterative solvers (SUNDIALS)
 Differential-algebraic equations: backward differentiation formulas, with direct and iterative solvers: IDA (SUNDIALS)
 Nonlinear equations with sensitivity analysis: inexact Newton with line search: SensKINSOL (SUNDIALS)
 Tuning and optimization: automatic code generation for BLAS and some LAPACK routines: ATLAS
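The practical reason backward differentiation formulas appear throughout this table is stiffness: implicit formulas remain stable with step sizes far larger than explicit ones allow. A plain-Python sketch (illustrative only, not SUNDIALS code) contrasts forward Euler with backward Euler, the first-order BDF, on the linear model problem y' = λy:

```python
def forward_euler(lam, y0, h, n):
    """Explicit Euler for y' = lam * y: y_{k+1} = (1 + h*lam) * y_k."""
    y = y0
    for _ in range(n):
        y = (1.0 + h * lam) * y
    return y

def backward_euler(lam, y0, h, n):
    """First-order BDF for y' = lam * y: solve y_{k+1} = y_k + h*lam*y_{k+1},
    which for this linear problem has the closed form y_{k+1} = y_k / (1 - h*lam)."""
    y = y0
    for _ in range(n):
        y = y / (1.0 - h * lam)
    return y
```

With λ = -1000 and h = 0.1, forward Euler multiplies the solution by -99 each step and explodes, while backward Euler multiplies by 1/101 and decays, mirroring the true solution; for a nonlinear system the implicit step requires the Newton solvers listed above, which is why the table pairs integrators with direct and iterative linear solvers.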

09/19/2007 NUG Meeting, slide 9: Software Interfaces

Function-call interface (ScaLAPACK):

    CALL BLACS_GET( -1, 0, ICTXT )
    CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
    :
    CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
    :
    CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, INFO )

Command-line interface (PETSc):

    -ksp_type [cg,gmres,bcgs,tfqmr,...]
    -pc_type [lu,ilu,jacobi,sor,asm,...]

    More advanced:
    -ksp_max_it
    -ksp_gmres_restart
    -pc_asm_overlap
    -pc_asm_type [basic,restrict,interpolate,none]

Problem-domain interface (Hypre): linear-system interfaces matched to the data layout (structured, composite, block-structured, unstructured, CSR) and to the linear solvers (GMG, FAC, hybrid, AMGe, ILU, ...).
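For readers unfamiliar with PDGESV: it is the distributed-memory analogue of an ordinary LU-based dense solve. Its serial counterpart in Python (illustrative only; NumPy, not ScaLAPACK) is a single call, with the BLACS grid setup and matrix descriptors of the Fortran example replaced by one in-memory array:

```python
import numpy as np

def dense_solve(A, B):
    """Single-process counterpart of PDGESV: LU-based solve of A X = B."""
    return np.linalg.solve(A, B)
```

Everything extra in the ScaLAPACK call (ICTXT, DESCA, DESCB, IPIV, the IA/JA offsets) exists to describe how the global matrix is laid out across the process grid.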

09/19/2007 NUG Meeting, slide 10: Use of ACTS Tools

 Electronic structure optimization of (UO2)3(CO3)6 performed with TAO (courtesy of deJong).
 Molecular dynamics and thermal flow simulation using codes based on Global Arrays. Global Arrays has been employed in large simulation codes such as NWChem, GAMESS-UK, Columbus, Molpro, Molcas, MWPhys/Grid, etc.
 Problems on different grid types solved with Hypre.
 Micro-FE bone modeling using ParMetis, Prometheus and PETSc; models up to 537 million degrees of freedom (Adams, Bayraktar, Keaveny, and Papadopoulos).
 Model of the heart mechanics (blood-muscle-valve) by an adaptive and parallel version of the immersed boundary method, using PETSc, Hypre and SAMRAI (courtesy of Boyce Griffith, New York University).
 3D incompressible Euler on a tetrahedral grid, up to 11 million unknowns, based on a legacy NASA code, FUN3D (W. K. Anderson); fully implicit steady-state, parallelized with PETSc (courtesy of Kaushik and Keyes).

09/19/2007 NUG Meeting, slide 11: Use of ACTS Tools

 Induced current (white arrows) and charge density (colored plane and gray surface) in crystallized glycine due to an external field (Louie, Yoon, Pfrommer and Canning); eigenvalue problems solved with ScaLAPACK.
 OPT++ is used in protein energy minimization problems (shown here is protein T162 from CASP5; courtesy of Meza, Oliva et al.).
 Omega3P is a parallel distributed-memory code for the modeling and analysis of accelerator cavities, which requires the solution of generalized eigenvalue problems. A parallel exact shift-invert eigensolver based on PARPACK and SuperLU has allowed the solution of a problem of order 7.5 million with 304 million nonzeros; finding 10 eigenvalues requires about 2.5 hours on 24 processors of an IBM SP.
 Two ScaLAPACK routines, PZGETRF and PZGETRS, are used for the solution of linear systems in the spectral-algorithm-based AORSA code (Batchelor et al.), which studies electromagnetic wave-plasma interactions. The code reaches 68% of peak performance on 1936 processors of an IBM SP.

09/19/2007 NUG Meeting, slide 12: ScaLAPACK (UTK, UCB, ...)

Software hierarchy:
 ScaLAPACK (global): linear systems, least squares, singular value decomposition, eigenvalues. Design goals: clarity, modularity, performance and portability.
 PBLAS (global): parallel BLAS.
 LAPACK (local): built on the BLAS; ATLAS can be used here for automatic tuning.
 BLACS (local): communication routines targeting linear algebra operations.
 MPI/PVM/... (platform specific): communication layer (message passing).

Version released in January 2007; NSF funding for further development.

09/19/2007 NUG Meeting, slide 13: ScaLAPACK: understanding performance

(Performance figures: runs on 60 processors of a dual AMD Opteron 1.4 GHz cluster with Myrinet interconnect and 2 GB memory; LU on a 2.2 GHz AMD Opteron, 4.4 GFlop/s peak performance.)

09/19/2007 NUG Meeting, slide 14: ScaLAPACK: understanding the 2D block-cyclic distribution
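The 2D block-cyclic distribution can be summarized by its ownership rule: blocks of consecutive rows and columns are dealt out to the process grid in round-robin fashion. The helper below is a hypothetical illustration (our own function names, not ScaLAPACK's API) mapping a global index pair to the coordinates of the owning process, for row and column block sizes mb and nb on an nprow x npcol grid:

```python
def owner_1d(i, nb, p):
    """Process coordinate owning global index i when blocks of nb consecutive
    indices are dealt round-robin to p processes (the block-cyclic rule)."""
    return (i // nb) % p

def owner_2d(i, j, mb, nb, nprow, npcol):
    """Grid coordinates (process row, process column) owning global entry (i, j)
    under a 2D block-cyclic distribution: the 1D rule applied independently
    to the row index and the column index."""
    return (owner_1d(i, mb, nprow), owner_1d(j, nb, npcol))
```

For example, with 2 x 2 blocks on a 2 x 2 process grid, global entry (4, 0) cycles back to process (0, 0). This layout balances both storage and the work of factorizations, since active submatrices stay spread over the whole grid.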

09/19/2007 NUG Meeting, slide 15: PETSc (Portable, Extensible Toolkit for Scientific computation; ANL)

(Architecture diagram, bottom to top:)
 Computation and communication kernels: MPI, MPI-IO, BLAS, LAPACK
 Profiling interface
 PETSc object-oriented components: matrices, vectors, index sets; grid management; linear solvers (preconditioners + Krylov methods); nonlinear solvers and unconstrained minimization; ODE integrators; visualization interface
 PDE application codes

09/19/2007 NUG Meeting, slide 16: PETSc: Linear Solvers (SLES)

(Flow diagram.) The main routine (user code) handles PETSc and application initialization and evaluates A and b; the SLES linear solvers (PETSc code: KSP Krylov methods with PC preconditioners) solve Ax = b; post-processing is again user code.

09/19/2007 NUG Meeting, slide 17: PETSc: setting SLES parameters at run time

    -ksp_type [cg,gmres,bcgs,tfqmr,...]
    -pc_type [lu,ilu,jacobi,sor,asm,...]

More advanced:

    -ksp_max_it
    -ksp_gmres_restart
    -pc_asm_overlap
    -pc_asm_type [basic,restrict,interpolate,none]

and many more (see the manual).
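PETSc reads flags like these from an options database at startup, so the solver and preconditioner can be swapped without recompiling. A toy Python sketch of the idea (our own simplified parser, not PETSc's implementation; note it treats any token starting with '-' as a flag, so negative numeric values are not handled):

```python
def parse_options(argv):
    """Collect '-flag value' and bare '-flag' tokens into a dictionary,
    mimicking how an options database lets solvers be chosen at run time."""
    opts, i = {}, 0
    while i < len(argv):
        key = argv[i].lstrip('-')
        if i + 1 < len(argv) and not argv[i + 1].startswith('-'):
            opts[key] = argv[i + 1]   # flag with a value
            i += 2
        else:
            opts[key] = True          # bare flag
            i += 1
    return opts
```

An application would then look up, say, opts.get("ksp_type", "gmres") and dispatch to the chosen Krylov method, which is the essence of swapping competing implementations without modifying the code.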

09/19/2007 NUG Meeting, slide 18: Important Questions for Application Developers

 How does performance vary with different compilers?
 Is poor performance correlated with certain OS features?
 Has a recent change caused unanticipated performance?
 How does performance vary with MPI variants?
 Why is one application version faster than another?
 What is the reason for the observed scaling behavior?
 Did two runs exhibit similar performance?
 How are performance data related to application events?
 Which machines will run my code the fastest, and why?
 Which benchmarks predict my code performance best?

09/19/2007 NUG Meeting, slide 19: TAU (Tuning and Analysis Utilities; U. Oregon)

 Multi-level performance instrumentation; multi-language automatic source instrumentation
 Flexible and configurable performance measurement
 Widely ported parallel performance profiling system: many computer system architectures, operating systems, programming languages and compilers
 Support for multiple parallel programming paradigms: multi-threading, message passing, mixed-mode, hybrid
 Support for performance mapping
 Support for object-oriented and generic programming
 Integration in complex software systems and applications

09/19/2007 NUG Meeting, slide 20: Definitions: profiling and tracing

Profiling
 Recording of summary information during execution (inclusive and exclusive time, number of calls, hardware statistics, etc.)
 Reflects performance behavior of program entities (functions, loops, basic blocks, user-defined "semantic" entities)
 Very good for low-cost performance assessment; helps to expose performance bottlenecks and hotspots
 Implemented through sampling (periodic OS interrupts or hardware counter traps) or instrumentation (direct insertion of measurement code)

Tracing
 Recording of information about significant points (events) during program execution: entering/exiting a code region (function, loop, block, etc.), thread/process interactions (send/receive message, etc.)
 Each event record saves a timestamp, CPU and thread identifiers, the event type, and event-specific information
 An event trace is a time-sequenced stream of event records and can be used to reconstruct dynamic program behavior
 Typically requires code instrumentation
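The distinction can be made concrete with a short Python sketch (illustrative only, unrelated to TAU's implementation): a single instrumentation wrapper that both accumulates a profile (a per-function summary) and appends to a trace (a time-ordered stream of event records):

```python
import time
from collections import defaultdict

profile = defaultdict(lambda: [0, 0.0])  # name -> [call count, total seconds]
trace = []                               # time-ordered (timestamp, event, name)

def instrument(fn):
    """Wrap fn so each call updates the profile summary and appends
    enter/exit event records to the trace."""
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        trace.append((t0, "enter", fn.__name__))
        result = fn(*args, **kwargs)
        t1 = time.perf_counter()
        trace.append((t1, "exit", fn.__name__))
        profile[fn.__name__][0] += 1          # call count
        profile[fn.__name__][1] += t1 - t0    # inclusive time
        return result
    return wrapper
```

After two calls to an instrumented function, the profile holds one entry with a call count of 2 (constant space, however long the run), while the trace holds four timestamped enter/exit events; the trace's size grows with execution, which is exactly the cost/detail trade-off the slide describes.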

09/19/2007 NUG Meeting, slide 21: TAU: Example 1 (1/2)

(Screenshot: configuring TAU. Set the C compiler; example configuration: tau-multiplecounters-mpi-papi-pdt.)

09/19/2007 NUG Meeting, slide 22: TAU: Example 1 (2/2)

PAPI provides access to hardware performance counters (see the PAPI documentation for details on the counters corresponding to TAU events). In this example we are just measuring FLOPS. (Profiles visualized with ParaProf.)

09/19/2007 NUG Meeting, slide 23: TAU: Example 2 (1/2)

PESCAN is a code that uses the folded spectrum method for non-self-consistent nanoscale calculations. It uses a plane-wave basis and conventional Kleinman-Bylander nonlocal pseudopotentials in real space. It is parallelized using MPI and can calculate million-atom systems.

    # Makefile for PESCAN
    include $(TAULIBDIR)/Makefile.tau-multiplecounters-mpi-papi-pdt
    #include $(TAULIBDIR)/Makefile.tau-callpath-mpi-pdt
    FC = $(TAU_COMPILER) mpxlf90_r
    CC = $(TAU_COMPILER) mpcc_r
    ⋮

09/19/2007 NUG Meeting, slide 24: TAU: Example 2 (2/2)

09/19/2007 NUG Meeting, slide 25: The Case for Software Libraries

(Diagram: an application is composed of machine-tuned and machine-dependent modules, application data layout, control, I/O, and algorithmic implementations.) Without reusable libraries:
 New architecture: may or may not need rewriting; minimal to extensive rewriting; extensive tuning; may require new programming paradigms
 New or extended physics: extensive rewriting or increased overhead
 New developments: difficult to predict
 Difficult to maintain!

09/19/2007 NUG Meeting, slide 26: ACTS: value-added services

 Requirements for reusable, high-quality software tools
 Integration, maintenance and support efforts
 Interfaces using scripting languages (PyACTS)
 Software automation
 More information: the ACTS Information Center