High-Performance Numerical Components and Common Interfaces Lois Curfman McInnes Mathematics and Computer Science Division Argonne National Laboratory.

1 High-Performance Numerical Components and Common Interfaces Lois Curfman McInnes Mathematics and Computer Science Division Argonne National Laboratory June 7-8, 2005 Joint ORNL/Indiana University Workshop on Computational Frameworks for Fusion Oak Ridge, TN

2 L.C. McInnes, IU/ORNL Workshop on Computational Frameworks for Fusion, 6/8/2005 2 Outline Motivation –Complex, multiphysics, multiscale nonlinear applications –Distributed, multilevel memory hierarchies Parallel Components for PDEs and Optimization –Two-phased approach Some Challenges –Domain-specific common interfaces –Dynamic adaptivity Concluding Remarks

3 Motivating Scientific Applications
Application areas: astrophysics, molecular structures, aerodynamics, fusion
Shared software needs (diagram labels): physics, meshes, discretization, algebraic solvers, adaptive solution, optimization, derivative computation, data redistribution, parallel I/O, diagnostics, steering, visualization

4 Challenges
Community Perspective
– Life-cycle costs of applications are increasing
    – Require the combined use of software developed by different groups
    – Difficult to leverage expert knowledge and advances in subfields
    – Difficult to obtain portable performance
Individual Scientist Perspective
– Too much energy focused on too many details
    – Little time to think about modeling, physics, mathematics
    – Fear of bad performance without custom code
    – Even when code reuse is possible, it is far too difficult
Our Perspective
– How to manage complexity?
    – Numerical software tools that work together
    – New algorithms (e.g., interactive/dynamic techniques, algorithm composition)
    – Multimodel, multiphysics simulations

5 What are the algorithmic needs of our target applications?
Large-scale, nonlinear PDE-based simulations
– Multirate, multiscale, multicomponent
– Rich variety of time scales and strong nonlinearities
– Can run on 10,000+ processors, where systems have increasingly deep memory hierarchies
– Require 100,000’s of nonlinear solves (time integration)
Need
– Fully or semi-implicit solvers
– Multi-level algorithms
– Support for adaptivity
– Support for user-defined customizations (e.g., physics-informed preconditioners, transfer operators, and smoothers)

6 Software for Nonlinear PDEs and Related Optimization Problems
Goal: For problems arising from PDEs, support the general solution of F(u) = 0
User provides:
– Code to evaluate F(u)
– Code to evaluate Jacobian of F(u) (optional), or use sparse finite difference (FD) approximation, or use automatic differentiation (AD)
– AD support via collaboration with P. Hovland and B. Norris (see )
Goal: Solve related optimization problems, generally $\min f(u)$ subject to $u_l < u < u_u$, $c_l < c(u) < c_u$
Simple example: unconstrained minimization: $\min f(u)$
User provides:
– Code to evaluate f(u)
– Code to evaluate gradient and Hessian of f(u) (optional), or use sparse FD or AD
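
The finite-difference option on this slide can be illustrated with a small Python sketch. It builds a dense forward-difference approximation of the Jacobian of a user-supplied F(u), one column per perturbed unknown. This is only an illustration under stated assumptions: the names `fd_jacobian` and `residual` and the step size `h` are hypothetical, the example is dense rather than sparse, and none of it is PETSc or TAO API.

```python
def fd_jacobian(F, u, h=1e-7):
    """Approximate J[i][j] = dF_i/du_j at u with forward differences."""
    f0 = F(u)
    n = len(u)
    J = [[0.0] * n for _ in range(len(f0))]
    for j in range(n):
        perturbed = list(u)
        perturbed[j] += h              # perturb one unknown at a time
        fj = F(perturbed)
        for i in range(len(f0)):
            J[i][j] = (fj[i] - f0[i]) / h
    return J


def residual(u):
    """Hypothetical two-equation test problem F(u) = 0."""
    x, y = u
    return [x * x + y - 3.0, x + y * y - 5.0]


# The exact Jacobian of the test problem is [[2x, 1], [1, 2y]].
J = fd_jacobian(residual, [1.0, 2.0])
```

Each column costs one extra evaluation of F, which is why sparse variants that perturb several structurally independent unknowns at once matter for large problems.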

7 Interface Issues
How to hide complexity, yet allow customization and access to a range of algorithmic options?
How to achieve portable performance?
How to interface among external tools?
– Including multiple libraries developed by different groups that provide similar functionality (e.g., linear algebra software)
Criteria for evaluation of success
– Efficiency (both per node performance and scalability)
– Usability
– Extensibility

8 Two-Phased Approach to Numerical Components
Phase 1
– Develop parallel, object-oriented numerical libraries
    – OO techniques are effective for development with a moderate-sized team
    – Provide foundation of algorithms, data structures, implementations
Phase 2
– Develop CCA-compliant component interfaces
    – Leverage existing code
    – Provide a more effective means for managing interactions among code developed by different groups

9 Parallel Numerical Libraries: PETSc and TAO
PETSc: Portable, Extensible Toolkit for Scientific Computation
– S. Balay, K. Buschelman, B. Gropp, D. Kaushik, M. Knepley, L. C. McInnes, B. Smith, H. Zhang
– Targets the parallel solution of large-scale PDE-based applications
– Begun in 1991, now over 13,000 downloads since 1995
TAO: Toolkit for Advanced Optimization
– S. Benson, L. C. McInnes, J. Moré, J. Sarich
– Targets the solution of large-scale optimization problems
– Begun in 1997 as part of DOE ACTS Toolkit
Approach
– Freely available and supported research toolkits
    – Hyperlinked docs, many examples, usable from Fortran 77/90, C, and C++
– Portable to any parallel system supporting MPI, including
    – Tightly coupled systems: Cray T3E, SGI Origin, IBM SP, HP 9000, Sun Enterprise
    – Loosely coupled systems, e.g., networks of workstations: Compaq, HP, IBM, SGI, Sun, PCs running Linux or Windows
– Distributed memory ‘shared nothing’ approach; encapsulate message-passing details in objects such as matrices, vectors, index sets

10 PETSc Numerical Libraries
Nonlinear Solvers: Newton-based Methods (Line Search, Trust Region), Others
Time Steppers: Euler, Backward Euler, Pseudo Time Stepping, Others
Krylov Subspace Methods: GMRES, CG, CGS, Bi-CG-STAB, TFQMR, Richardson, Chebychev, Others
Preconditioners: Additive Schwarz, Block Jacobi, ILU, ICC, LU (Sequential only), Others
Matrices: Compressed Sparse Row (AIJ), Blocked Compressed Sparse Row (BAIJ), Block Diagonal (BDIAG), Dense, Matrix-free, Others
Vectors
Distributed Arrays
Index Sets: Indices, Block Indices, Stride, Others

11 TAO Solvers
Unconstrained Minimization: Newton-based Methods (Line Search, Trust Region), Limited Memory Variable Metric (LMVM) Method, Conjugate Gradient Methods (Fletcher-Reeves, Polak-Ribière, Polak-Ribière-Plus), Others
Bound Constrained Optimization: Newton Trust Region, GPCG, Interior Point, LMVM, KT, Others
Nonlinear Least Squares: Levenberg Marquardt, Gauss-Newton, LMVM, Levenberg Marquardt with Bound Constraints, LMVM with Bound Constraints, Others
Complementarity: Semi-smooth Methods, Others
TAO interfaces to external libraries for parallel vectors, matrices, and linear solvers
– PETSc (initial interface)
– Global Arrays (PNNL – thanks to M. Kumar and J. Nieplocha)
– Etc.

12 Newton-Krylov Methods
Newton:
– Solve: $F'(u_{l-1})\,\delta u_l = -F(u_{l-1})$
– Update: $u_l = u_{l-1} + \lambda\,\delta u_l$
Krylov: Projection methods for solving linear systems, $Ax = b$, using the Krylov subspace $K_j = \mathrm{span}(r_0, A r_0, A^2 r_0, \ldots, A^{j-1} r_0)$
– Require A only in the form of matrix-vector products
– Popular methods: CG, GMRES, TFQMR, BiCGStab, etc.
Preconditioning: In practice, typically needed:
– Transform $Ax = b$ into an equivalent form: $B^{-1}Ax = B^{-1}b$ or $(AB^{-1})(Bx) = b$, where the inverse action of B approximates that of A, but at a smaller cost

13 An Application Perspective: Solve F(u) = 0
Application code: Application Driver, Application Initialization, Function Evaluation, Jacobian Evaluation (or finite difference approximation, or automatic differentiation code), Post-Processing
PETSc code: PETSc Nonlinear Solvers, built on
– Krylov Solvers: GMRES, TFQMR, BCGS, CGS, BCG, Others…
– Preconditioners: ASM, ILU, B-Jacobi, SSOR, Multigrid, Others…
– Matrices: AIJ, B-AIJ, Diagonal, Dense, Matrix-free, Others…
– Vectors: Sequential, Parallel, Others…

14 Aerodynamics Example
Developers: D. Kaushik (Argonne), D. Keyes (Columbia Univ), W. Gropp, B. Smith (Argonne), W.K. Anderson (NASA); based on a legacy NASA code, FUN3D, developed by Anderson
Background: The Euler equations describe the conservation of mass, momentum, and energy in an inviscid fluid; here we study the flow of air over an ONERA M6 wing.
Model: Fully implicit steady-state 3D incompressible Euler model using a tetrahedral mesh
Solvers: Newton-Krylov-Schwarz method with pseudo-transient continuation
Won Gordon Bell prize at SC99

15 Performance
ONERA M6 wing test case, tetrahedral grid of 2.8 million vertices (about 11 million unknowns) on up to 3072 ASCI Red nodes (each with dual Pentium Pro 333 MHz processors)

16 Scientific Applications
PETSc and TAO solvers have been used successfully in many scientific applications
– Aerodynamics, acoustics, biomechanics, chemistry, fusion, electromagnetics, micromagnetics, materials science, multiphase flow, nanotechnology, reactive transport, etc.
– See and
– Scale to low 1000s of processors
PETSc usage in fusion applications includes:
– The SEL macroscopic modeling code, A. H. Glasser and X. Z. Tang, Computer Physics Communications, 164, 237-243, 2004.
– A finite element Poisson solver for gyrokinetic particle simulations, Y. Nishimura, Z. Lin, J. Lewandowski, and S. Ethier, submitted to J. Comput. Phys., 2004.
– Global gyrokinetic Particle-in-cell Simulations with Trapped Electrons, J. L. V. Lewandowski, Y. Nishimura, W. W. Lee, Z. Lin, and S. Ethier, Sherwood Fusion Theory Conference, Missoula, MT, 2004.
– Electromagnetic gyrokinetic simulation with a fluid-kinetic hybrid electron model, Y. Nishimura, Z. Lin, L. Chen, J. Lewandowski, S. Ethier, and W. Wang, Sherwood Fusion Theory Conference, Missoula, MT, 2004.
– Numerical studies of a steady state axisymmetric co-axial helicity injection plasma, X. Z. Tang and A. H. Boozer, Physics of Plasmas, 11, 171-185, 2004.
– Inclusion of electromagnetic effects into gyrokinetic particle simulations, Y. Nishimura, Z. Lin, L. Chen, and W. Wang, American Physical Society 45th Annual Meeting Division of Plasma Physics, Albuquerque, New Mexico, October 2003.
– Resistive Magnetohydrodynamics Simulation of Fusion Plasmas, X. Z. Tang, G. Y. Fu, S. C. Jardin, L. L. Lowe, W. Park, and H. R. Strauss, Princeton Plasma Physics Laboratory, PPPL-3532, presented at the 10th Society for Industrial and Applied Mathematics (SIAM) Conference on Parallel Processing for Scientific Computing, Portsmouth, Virginia, March 12-14, 2001.

18 CCA Overview
CCA evolved from DOE2000 as a grassroots effort
– Recognized benefit of component-based software engineering (CBSE) to high-performance scientific computing
– Bridle the burgeoning hardware/software complexity!
– See:
CBSE needed to be specially crafted for HPC
– Supporting parallelism and performance requirements
– Supporting scientific languages (e.g., Fortran 90), legacy codes
With SciDAC support, CCA has:
– Demonstrated effectiveness of component-oriented approach
– Advanced scientific research across several key domains
– Grown a diverse community of users
– See:

19 CCA Compliance in TAO
Paradigm shift; both TAO and the application become components
Each is required to provide a default constructor and to implement the CCA component interface
– contains one method: “setServices” to register ports
All interactions between components use ports
– Application provides a “go” port and uses “taoSolver” port
– TAO provides a “taoSolver” port
There is no “main” routine
Ref: J. Sarich, A Programmer's Guide for Providing CCA Component Interfaces to the Toolkit for Advanced Optimization, Argonne technical report ANL/MCS-TM-279, December, 2004.
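
The port-based wiring described on this slide can be mimicked in a toy Python sketch. Everything here is hypothetical: `ToyFramework`, `set_services`, `add_provides_port`, and `get_port` are stand-ins written for the example, not the actual CCA specification or the TAO component code, and a plain gradient descent stands in for a TAO solver. The point is only the shape of the pattern: components have default constructors, register ports in one method, and the framework (not a `main` routine) starts the run through the “go” port.

```python
class ToyFramework:
    """Hypothetical stand-in for a component framework: a registry of ports."""

    def __init__(self):
        self._ports = {}

    def add_provides_port(self, name, port):
        self._ports[name] = port

    def get_port(self, name):
        return self._ports[name]


class SolverComponent:
    """Provides a 'taoSolver' port (gradient descent as a placeholder solver)."""

    def set_services(self, services):
        services.add_provides_port("taoSolver", self)

    def minimize(self, gradient, x0, step=0.1, iterations=200):
        x = x0
        for _ in range(iterations):
            x -= step * gradient(x)
        return x


class ApplicationComponent:
    """Provides a 'go' port and uses the 'taoSolver' port."""

    def set_services(self, services):
        self._services = services
        services.add_provides_port("go", self)

    def go(self):
        solver = self._services.get_port("taoSolver")
        # Minimize f(x) = (x - 3)**2, whose gradient is 2*(x - 3).
        return solver.minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)


framework = ToyFramework()
for component in (SolverComponent(), ApplicationComponent()):
    component.set_services(framework)   # each component registers its ports
result = framework.get_port("go").go()  # the framework starts the run
```

Because the application only asks for a port by name, a different solver component that provides the same port could be wired in without touching the application class.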

20 Negligible CCA Overhead in TAO Optimization Components
No CCA overhead within components
Small overhead between components
Small overhead for language interoperability
No CCA overhead on parallel computing
Be aware of costs & design with them in mind
– Small costs, easily amortized
Figure: Maximum 0.2% overhead for CCA vs native C++ code for parallel molecular dynamics up to 170 CPUs.
Figure: Aggregate time for linear solver component in unconstrained minimization problem.
Ref: B. Norris et al., Parallel Components for PDEs and Optimization: Some Issues and Experiences, Parallel Computing, 28 (12), 2002, pp. 1811-1831.

21 CCA Application: Optimization in Quantum Chemistry
Collaboration of ANL, PNNL, and SNL researchers, working with their own packages, integrated using CCA:
– TAO (ANL) Limited Memory Variable Metric (LMVM) algorithm
– PETSc (ANL) and Global Arrays (PNNL) for linear algebra
– MPQC (SNL) and NWChem (PNNL) chemistry packages
Significant improvements over “traditional” BFGS optimizers built into packages
Interoperability at linear algebra and chemistry package levels
Ref: J. P. Kenny et al. Component-Based Integration of Chemistry and Optimization Software. J. Computational Chemistry, 24(14):1717-1725, 2004.
Figure: Comparison of native BFGS and TAO LMVM optimization algorithms used with the MPQC and NWChem computational chemistry packages. Vertical axis: number of energy and gradient evaluations (0 to 90); molecules: glycine, isoprene, phosphoserine, acetylsalicylic acid, cholesterol; series: NWChem/native, MPQC/native, NWChem/TAO, MPQC/TAO.
Function evaluations in this domain are very expensive, so reducing optimization steps is very important.

23 The Importance of Interfaces
The CCA Forum participants do not pretend to be experts in all phases of computation, but rather just to be developing a standard way to exchange component capabilities.
Medium of exchange: interfaces
– Components interact only through explicitly defined interfaces
– Quality (generality, completeness) of interfaces varies widely
– Higher quality interfaces…
    – Require general agreement among groups or communities
    – Are more easily used in front of multiple implementations
    – Are more easily (re)used by many applications
    – Facilitate experimentation with new algorithms, implementations, etc.

24 A challenge to the community: Common interfaces are central
Need experts in various areas to define sets of domain-specific common interfaces
– Scientific application domains, meshes, discretization, (non)linear solvers, optimization, data analysis, visualization, etc.
Caveat: Developing common interfaces is difficult!
– Technical challenges
    – Tradeoffs in broad functionality vs. maintaining good performance
– Social challenges
    – Agreement among diverse individuals with different priorities
    – Few academic rewards for software
The CCA is actively developing or promoting the development of common domain-specific interfaces, including
– Distributed array descriptor
– Molecular geometry optimization
– MxN parallel data redistribution
– Adaptive mesh refinement (w/ APDEC SciDAC Center)
– Mesh and discretization interfaces (lead: TSTT SciDAC Center)
– Linear and nonlinear solver interfaces (lead: TOPS SciDAC Center)
This means you!

25 Interface Definition Efforts
Collaborations with math SciDAC centers focus on unified interfaces to numerous existing and new libraries
– Users can swap libraries without having to change their code
– New libraries are more easily integrated into applications
Some info on TOPS and TSTT interfaces:
– Parallel PDE-Based Simulations Using the Common Component Architecture, Lois Curfman McInnes et al., Argonne National Laboratory preprint ANL/MCS-P1179-0704, 2004 (available via, to appear in Are Magnus Bruaset, Petter Bjorstad, and Aslak Tveito, editors, Numerical Solution of PDEs on Parallel Computers,
Diagram: in one panel the application connects directly to each linear solver library (SuperLU, PETSc, Hypre, Sparskit, others …); in the other the application connects through the TOPS solver interfaces to the same solvers (SuperLU, PETSc, Hypre, Sparskit, others …).

26 TOPS’ Linear Solver Interface
Goals
– Simplicity: small number of distinct concepts
– Generality
– Programming language independence (via SIDL)
– High performance
– Extensibility: infrastructure for defining/implementing ‘conceptual’ solver interfaces
Progenitors include
– FEI (finite element interface) / C++ developed at SNL
– ESI (equation solver interface) / C++ multi-lab effort
– Various TOPS software packages
Current drafts available via
– Bitkeeper repository: bk://
– Snapshot:
Who: B. Smith (ANL), R. Falgout (LLNL), various TOPS investigators

27 Object Model Concepts
Solver (is an) Operator
Vector – represents field data
– (has one or more) View – provides access to the data
– (has a) Layout – how data is laid out across processes

28 Conceptual (Linear System) Interfaces
View allows users to access values in the “language of the application”
Handles any data communication transparently
Same idea as conceptual interfaces within hypre (LLNL)
Diagram (c/o Rob Falgout, LLNL): data layouts and the linear solvers associated with them
– structured: GMG, ...
– composite: FAC, ...
– block-struc: Hybrid, ...
– unstruc: AMGe, ...
– CSR: ILU, ...

29 Views differ primarily in the way they “set” and “get” data
Classical Linear Algebra View – Indices are scalars that represent locations in $\mathbb{R}^n$
    array getValues(array indices);
Structured Mesh View – Indices are 3D triples that describe “boxes of data” (think 3D Fortran arrays)
    array getValues( ilower, iupper);
Views / Layouts
– classical linear algebra access
– single structured mesh
– finite element interface
– semi-structured meshes (structured mesh “parts” with additional arbitrary connections)
– etc…
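
The difference between the two views can be made concrete with a small Python sketch. It is a hypothetical illustration only: the class names, the row-major storage order, and the convention that `ilower` and `iupper` are inclusive corners of the box are assumptions made for the example, and the code is not the TOPS SIDL interface. Both views read and write the same underlying storage; they differ only in how indices are expressed.

```python
class Vector:
    """Field data for an nx x ny x nz grid, stored as one flat list."""

    def __init__(self, nx, ny, nz):
        self.shape = (nx, ny, nz)
        self.data = [0.0] * (nx * ny * nz)


class LinearAlgebraView:
    """Indices are scalars: positions in the flat vector."""

    def __init__(self, vector):
        self.vector = vector

    def set_values(self, indices, values):
        for i, v in zip(indices, values):
            self.vector.data[i] = v

    def get_values(self, indices):
        return [self.vector.data[i] for i in indices]


class StructuredMeshView:
    """Indices are (i, j, k) triples; a box is given by two inclusive corners."""

    def __init__(self, vector):
        self.vector = vector

    def _flat(self, i, j, k):
        nx, ny, nz = self.vector.shape
        return (i * ny + j) * nz + k      # assumed row-major ordering

    def get_values(self, ilower, iupper):
        return [self.vector.data[self._flat(i, j, k)]
                for i in range(ilower[0], iupper[0] + 1)
                for j in range(ilower[1], iupper[1] + 1)
                for k in range(ilower[2], iupper[2] + 1)]


vec = Vector(2, 2, 2)
LinearAlgebraView(vec).set_values(range(8), [float(i) for i in range(8)])
box = StructuredMeshView(vec).get_values((1, 0, 0), (1, 1, 1))
```

Here the data is written through the linear algebra view and read back as a box through the structured mesh view, which is the kind of access "in the language of the application" the slides describe.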

30 What’s Coming in TOPS Solvers
Greater interface standardization
Greater solver interoperability
Better integration upwards w/ meshing and discretization systems
Better integration downwards w/ performance monitoring and engineering systems
Better algorithms!
c/o David Keyes, TOPS PI (see more TOPS info at

31 Anticipated Impact of Common TOPS Solver Interfaces on Fusion
Easier for fusion scientists to explore different algorithms and solvers developed by different groups, such as these MHD/TOPS collaborations (for which interfaces were done manually for new algorithms callable across Ax=b interface)
– M3D replacement of additive Schwarz (ASM) preconditioner with algebraic multigrid (AMG) in hypre (LLNL)
    – achieved mesh-independent convergence rate
    – 4-5× improvement in execution time
– NIMROD replacement of diagonally scaled Krylov solver with a supernodal parallel sparse direct solver in SuperLU (LBNL)
    – 2D tests run 100× faster; 3D production runs are 4-5× faster

33 Dynamic Adaptivity
Next generation applications will need to adapt to changing computational conditions
– Changes in physics/models/algorithms in long-running simulations, different resource needs and performance characteristics
CBSE enables component substitution at runtime, based on changing application characteristics and available resources
Diagram labels: application driver; Newton-Krylov solver; linear solver proxy: solve f’(u) du = -f(u); Component Substitution Set (linear solver A, linear solver B, linear solver C); application monitoring; component monitoring; analysis, optimization, replacement, and substitution decision services
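
The idea of a linear solver proxy that substitutes components at runtime can be sketched in Python as follows. This is a deliberately simplified, hypothetical illustration: the proxy class, the residual-based monitoring rule, and the two toy solvers (a few Jacobi sweeps and a longer Gauss-Seidel run) are assumptions made for the example, not the CCA or CQoS implementation.

```python
def jacobi(A, b, sweeps):
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x


def gauss_seidel(A, b, sweeps):
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x


def residual_norm(A, x, b):
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))


class LinearSolverProxy:
    """Callers see one solve(); the proxy picks from a substitution set."""

    def __init__(self, substitution_set, tolerance):
        self.solvers = list(substitution_set)
        self.tolerance = tolerance
        self.active = 0                      # index of the current solver

    def solve(self, A, b):
        x = self.solvers[self.active](A, b)
        # Monitoring: if the active solver misses the tolerance, substitute
        # the next solver in the set and retry.
        while (residual_norm(A, x, b) > self.tolerance
               and self.active + 1 < len(self.solvers)):
            self.active += 1
            x = self.solvers[self.active](A, b)
        return x


proxy = LinearSolverProxy(
    [lambda A, b: jacobi(A, b, 3), lambda A, b: gauss_seidel(A, b, 50)],
    tolerance=1e-8,
)
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = proxy.solve(A, b)
```

The calling code never names a specific solver, so the decision logic inside the proxy could be replaced by richer monitoring (timings, convergence history, available resources) without changing the caller.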

34 Computational Quality of Service (CQoS)
Approach: Automatic selection and configuration of components to suit a particular computational purpose, involves research in:
– Metadata and metrics
– Performance evaluation and monitoring
– Automated application assembly and reconfiguration
– Adaptive polyalgorithmic solvers
Diagram labels: Application Component(s), Abstract Interface, Component Proxy, Provider Components A, B, and C, Adaptive Strategy Components, Runtime Monitoring, Runtime Database Access, Historical Database, Component Framework
Ref: P. Hovland, K. Keahey, L. McInnes, B. Norris, L. Diachin, and P. Raghavan, A Quality-of-Service Architecture for High-Performance Numerical Components, Proceedings of the Workshop on QoS in Component-Based Software Engineering, Toulouse, France, June 20, 2003.
Ref: B. Norris, J. Ray, R. Armstrong, L. McInnes, D. Bernholdt, W. Elwasif, A. Malony, and S. Shende, Computational Quality of Service for Scientific Components, Proceedings of the International Symposium on Component-Based Software Engineering (CBSE7), Edinburgh, Scotland, 2004.
Ref: B. Norris and I. Veljkovic, Performance Monitoring and Analysis Components in Adaptive PDE-Based Simulations, Argonne preprint ANL/MCS-P1221-0105, January, 2005.

35 Concluding Remarks
High-performance numerical components can be effectively built using a two-phased process
– Object-oriented numerical libraries developed by different teams at different institutions
– Light-weight component layers
Domain-specific common interfaces that are defined by various computational science communities are critical for
– Achieving the promise of ‘plug-and-play’ component interoperability
– Addressing issues in dynamic component interactions (reconfiguring and recomposing)
These capabilities are becoming increasingly important for multi-physics, multi-scale computational science applications (e.g., fusion simulations)
