8/23/ Trilinos Tutorial Overview and Basic Concepts Michael A. Heroux Sandia National Laboratories Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.
8/23/ Trilinos Development Team Ross Bartlett Lead Developer of Thyra Developer of Rythmos Paul Boggs Developer of Thyra Todd Coffey Lead Developer of Rythmos Jason Cross Developer of Jpetra David Day Developer of Komplex Clark Dohrmann Developer of CLAPS Michael Gee Developer of ML, NOX Bob Heaphy Lead developer of Trilinos SQA Mike Heroux Trilinos Project Leader Lead Developer of Epetra, AztecOO, Kokkos, Komplex, IFPACK, Thyra, Tpetra Developer of Amesos, Belos, EpetraExt, Jpetra Ulrich Hetmaniuk Developer of Anasazi Robert Hoekstra Lead Developer of EpetraExt Developer of Epetra, Thyra, Tpetra Russell Hooper Developer of NOX Vicki Howle Lead Developer of Meros Developer of Belos and Thyra Jonathan Hu Developer of ML Sarah Knepper Developer of Komplex Tammy Kolda Lead Developer of NOX Joe Kotulski Lead Developer of Pliris Rich Lehoucq Developer of Anasazi and Belos Kevin Long Lead Developer of Thyra, Developer of Belos and Teuchos Roger Pawlowski Lead Developer of NOX Michael Phenow Trilinos Webmaster Lead Developer of New_Package Eric Phipps Developer of LOCA and NOX Marzio Sala Lead Developer of Didasko and IFPACK Developer of ML, Amesos Andrew Salinger Lead Developer of LOCA Paul Sexton Developer of Epetra and Tpetra Bill Spotz Lead Developer of PyTrilinos Developer of Epetra, New_Package Ken Stanley Lead Developer of Amesos and New_Package Heidi Thornquist Lead Developer of Anasazi, Belos and Teuchos Ray Tuminaro Lead Developer of ML and Meros Jim Willenbring Developer of Epetra and New_Package. Trilinos library manager Alan Williams Developer of Epetra, EpetraExt, AztecOO, Tpetra
8/23/ Outline of Talk Background/Motivation. Trilinos Package Concepts. ANAs, LALs and APPs. Managing Parallel Data: Petra Object Model. Whirlwind Tour of Some Trilinos Packages. Generic Programming. SQA/SQE Issues. Concluding remarks.
8/23/ Target Problems: PDEs
8/23/ Target Problems: Circuits Pre-solve steps essential. Preconditioned iterative methods used routinely. Partition Circuit ScaleRCM Singleton Filter Partition LinSys
8/23/ Target Problems: Inhomogenous Fluids
8/23/ Motivation For Trilinos Sandia does LOTS of solver work. When I started at Sandia in May 1998: Aztec was a mature package. Used in many codes. FETI, PETSc, DSCPack, Spooles, ARPACK, DASPK, and many other codes were (and are) in use. New projects were underway or planned in multi-level preconditioners, eigensolvers, non-linear solvers, etc… The challenges: Little or no coordination was in place to: Efficiently reuse existing solver technology. Leverage new development across various projects. Support solver software processes. Provide consistent solver APIs for applications. ASCI was forming software quality assurance/engineering (SQA/SQE) requirements: Daunting requirements for any single solver effort to address alone.
8/23/ Evolving Trilinos Solution Trilinos 1 is an evolving framework to address these challenges: Fundamental atomic unit is a package. Includes core set of vector, graph and matrix classes (Epetra/Tpetra packages). Provides a common abstract solver API (Thyra package). Provides a ready-made package infrastructure (new_package package): Source code management (cvs, bonsai). Build tools (autotools). Automated regression testing (queue directories within repository). Communication tools (mailman mail lists). Specifies requirements and suggested practices for package SQA. In general allows us to categorize efforts: Efforts best done at the Trilinos level (useful to most or all packages). Efforts best done at a package level (peculiar or important to a package). Allows package developers to focus only on things that are unique to their package. 1. Trilinos loose translation: “A string of pearls”
8/23/ Trilinos Package Concepts Package: The Atomic Unit
8/23/ Trilinos Packages Trilinos is a collection of Packages. Each package is: Focused on important, state-of-the-art algorithms in its problem regime. Developed by a small team of domain experts. Self-contained: No explicit dependencies on any other software packages (with some special exceptions). Configurable/buildable/documented on its own. Sample packages: NOX, AztecOO, ML, IFPACK, Meros. Special package collections: Petra (Epetra, Tpetra, Jpetra): Concrete Data Objects Thyra: Abstract Conceptual Interfaces Teuchos: Common Tools. New_package: Jumpstart prototype.
8/23/ Greek Names
8/23/ Trilinos Package Summary ObjectivePackage(s) Distributed linear algebra objectsEpetra, Jpetra, Tpetra Krylov solversAztecOO, Belos ILU-type preconditionersAztecOO, IFPACK Multilevel preconditionersML, CLAPS Eigenvalue problemsAnasazi Block preconditionersMeros Direct sparse linear solversAmesos Direct dense solversEpetra, Teuchos, Pliris Abstract interfacesThyra Nonlinear system solversNOX, LOCA Complex linear systemsKomplex C++ utilities, (some) I/OTeuchos, EpetraExt, Kokkos Trilinos TutorialDidasko Python SupportPyTrilinos Time integrators/DAEsRythmos Archetype packageNewPackage Basic Linear Algebra classes Block Krylov Methods Scalable Preconditioners 3 rd Party Direct Solver Package. Will include WSMP and Pardiso interfaces in 6.0 Abstract Interfaces.
8/23/ Why Packages? Let’s Do Some Matrix Analysis!
8/23/ Trilinos Development Team (Again) Ross Bartlett Lead Developer of Thyra Developer of Rythmos Paul Boggs Developer of Thyra Todd Coffey Lead Developer of Rythmos Jason Cross Developer of Jpetra David Day Developer of Komplex Clark Dohrmann Developer of CLAPS Michael Gee Developer of ML, NOX Bob Heaphy Lead developer of Trilinos SQA Mike Heroux Trilinos Project Leader Lead Developer of Epetra, AztecOO, Kokkos, Komplex, IFPACK, Thyra, Tpetra Developer of Amesos, Belos, EpetraExt, Jpetra Ulrich Hetmaniuk Developer of Anasazi Robert Hoekstra Lead Developer of EpetraExt Developer of Epetra, Thyra, Tpetra Russell Hooper Developer of NOX Vicki Howle Lead Developer of Meros Developer of Belos and Thyra Jonathan Hu Developer of ML Sarah Knepper Developer of Komplex Tammy Kolda Lead Developer of NOX Joe Kotulski Lead Developer of Pliris Rich Lehoucq Developer of Anasazi and Belos Kevin Long Lead Developer of Thyra, Developer of Belos and Teuchos Roger Pawlowski Lead Developer of NOX Michael Phenow Trilinos Webmaster Lead Developer of New_Package Eric Phipps Developer of LOCA and NOX Marzio Sala Lead Developer of Didasko and IFPACK Developer of ML, Amesos Andrew Salinger Lead Developer of LOCA Paul Sexton Developer of Epetra and Tpetra Bill Spotz Lead Developer of PyTrilinos Developer of Epetra, New_Package Ken Stanley Lead Developer of Amesos and New_Package Heidi Thornquist Lead Developer of Anasazi, Belos and Teuchos Ray Tuminaro Lead Developer of ML and Meros Jim Willenbring Developer of Epetra and New_Package. Trilinos library manager Alan Williams Developer of Epetra, EpetraExt, AztecOO, Tpetra
8/23/ Developer and Package Node IDs Developer IDs RossBartlett = 1; PaulBoggs = 2; ToddCoffey = 3; JasonCross = 4; DavidDay = 5; ClarkDohrmann= 6; MichaelGee = 7; BobHeaphy = 8; MikeHeroux = 9; UlrichHetmaniuk = 10; RobHoekstra = 11; RussellHooper = 12; VickiHowle = 13; JonathanHu = 14; SarahKnepper = 15; TammyKolda = 16; JoeKotulski = 17; RichLehoucq = 18; KevinLong = 19; RogerPawlowski = 20; MichaelPhenow = 21; EricPhipps = 22; MarzioSala = 23; AndrewSalinger = 24; PaulSexton = 25; BillSpotz = 26; KenStanley = 27; HeidiThornquist = 28; RayTuminaro = 29; JimWillenbring = 30; AlanWilliams = 31; Package IDs PyTrilinos = 1; amesos = 2; anasazi = 3; aztecoo = 4; belos = 5; claps = 6; didasko = 7; epetra = 8; epetraext = 9; ifpack = 10; jpetra = 11; kokkos = 12; komplex = 13; meros = 14; loca = 15; ml = 16; new_package = 17; nox = 18; pliris = 19; rythmos = 20; teuchos = 21; thyra = 22; tpetra = 23; trilinosframework = 24;
8/23/ Developer-Package Edges A(i,j) = 1 if developer i contributes to package j A = sparse(31,24); A(RossBartlett,rythmos) = 1; A(RossBartlett,thyra) = 1; A(PaulBoggs,thyra) = 1; A(ToddCoffey,rythmos) = 1; A(JasonCross,jpetra) = 1; A(DavidDay,komplex) = 1; A(ClarkDohrmann,claps) = 1; A(MichaelGee,ml) = 1; A(MichaelGee,nox) = 1; A(BobHeaphy,trilinosframework) = 1; A(MikeHeroux,trilinosframework) = 1; A(MikeHeroux,epetra) = 1; A(MikeHeroux,aztecoo) = 1; A(MikeHeroux,kokkos) = 1; A(MikeHeroux,komplex) = 1; A(MikeHeroux,ifpack) = 1; A(MikeHeroux,thyra) = 1; A(MikeHeroux,tpetra) = 1; A(MikeHeroux,amesos) = 1; A(MikeHeroux,belos) = 1; A(MikeHeroux,epetraext) = 1; A(MikeHeroux,jpetra) = 1; A(UlrichHetmaniuk,anasazi) = 1; A(MarzioSala,didasko) = 1; A(MarzioSala,ifpack) = 1; A(MarzioSala,ml) = 1; A(MarzioSala,amesos) = 1; A(AndrewSalinger,loca) = 1; A(PaulSexton,epetra) = 1; A(PaulSexton,tpetra) = 1; A(BillSpotz,PyTrilinos) = 1; A(BillSpotz,epetra) = 1; A(BillSpotz,new_package) = 1; A(KenStanley,amesos) = 1; A(KenStanley,new_package) = 1; A(HeidiThornquist,anasazi) = 1; A(HeidiThornquist,belos) = 1; A(HeidiThornquist,teuchos) = 1; A(RayTuminaro,ml) = 1; A(RayTuminaro,meros) = 1; A(JimWillenbring,epetra) = 1; A(JimWillenbring,new_package) = 1; A(JimWillenbring,trilinosframework) = 1; A(AlanWilliams,epetra) = 1; A(AlanWilliams,epetraext) = 1; A(AlanWilliams,aztecoo) = 1; A(AlanWilliams,tpetra) = 1; A(RobHoekstra,epetra) = 1; A(RobHoekstra,thyra) = 1; A(RobHoekstra,tpetra) = 1; A(RobHoekstra,epetraext) = 1; A(RussellHooper,nox) = 1; A(VickiHowle,meros) = 1; A(VickiHowle,belos) = 1; A(VickiHowle,thyra) = 1; A(JonathanHu,ml) = 1; A(SarahKnepper,komplex) = 1; A(TammyKolda,nox) = 1; A(TammyKolda,trilinosframework) = 1; A(JoeKotulski,pliris) = 1; A(RichLehoucq,anasazi) = 1; A(RichLehoucq,belos) = 1; A(KevinLong,thyra) = 1; A(KevinLong,belos) = 1; A(KevinLong,teuchos) = 1; A(RogerPawlowski,nox) = 1; A(MichaelPhenow,trilinosframework) = 1; A(EricPhipps,loca) = 1; A(EricPhipps,nox) = 1;
8/23/ Developer-Package Matrix Number of developers per package: Maximum: max(sum(A)) = 6 Average:sum(sum(A))/24 = Number of package affiliations per developer: Maximum: max(sum(A’)) = 12 (me). Minus outlier: = 4. Average: sum(sum(A’))/31 = 2.26 Observations: Small package teams. Multiple packages per developer. spy(A);
8/23/ Developer-Developer Matrix (A*A’) B = A*A’ Apply Symmetric AMD reordering. Two developers connected if co- authors of a package. Only two developers “disconnected”: Clark Dohrmann: CLAPS. Joe Kotulski: Pliris. Komplex Subteam ML Subteam Anasazi Subteam Epetra Subteam NOX Subteam Thyra Subteam
8/23/ Package-Package Matrix (A’*A) (Not sure what to say about this) B = A’*A, Apply symamd. Two packages connected if they share a developer. Only two packages “disconnected”: Clark Dohrmann: CLAPS. Joe Kotulski: Pliris. Me Epetra, Amesos, Didasko, Ifpack Without Me Meros, AztecOO, Tpetra, EpetraExt
8/23/ Why Packages? Partial Answer: Small Inter-related teams.
8/23/ Package Interoperability
8/23/ Interoperability vs. Dependence (“Can Use”) (“Depends On”) Although most Trilinos packages have no explicit dependence, each package must interact with some other packages: NOX needs operator, vector and solver objects. AztecOO needs preconditioner, matrix, operator and vector objects. Interoperability is enabled at configure time. For example, NOX: --enable-nox-lapack compile NOX lapack interface libraries --enable-nox-epetra compile NOX epetra interface libraries --enable-nox-petsc compile NOX petsc interface libraries Trilinos configure script is vehicle for: Establishing interoperability of Trilinos components… Without compromising individual package autonomy. Trilinos offers seven basic interoperability mechanisms.../configure –enable-python
8/23/ Trilinos Interoperability Mechanisms (Acquired as Package Matures) Package builds under Trilinos configure scripts. Package can be built as part of a suite of packages; cross-package interfaces enable/disable automatically Package accepts user data as Epetra or Thyra objects Applications using Epetra/Thyra can use package Package accepts parameters from Teuchos ParameterLists Applications using Teuchos ParameterLists can drive package Package can be used via Thyra abstract solver classes Applications or other packages using Thyra can use package Package can use Epetra for private data. Package can then use other packages that understand Epetra Package accesses solver services via Thyra interfaces Package can then use other packages that implement Thyra interfaces Package available via PyTrilinos Package can be used with other Trilinos packages via Python.
8/23/ “Can Use” vs. “Depends On” “Can Use” Interoperable without dependence. Dense is Good. Encouraged. “Depends On” OK, if essential. Epetra: 9 clients. Thyra, Teuchos, NOX: 2 clients. Discouraged.
8/23/ Interoperability Example: ML ML: Multi-level Preconditioner Package. Primary Developers: Ray Tuminaro, Jonathan Hu, Marzio Sala. No explicit, essential dependence on other Trilinos packages. Uses abstract interfaces to matrix/operator objects. Has independent configure/build process (but can be invoked at Trilinos level). Interoperable with other Trilinos packages and other libraries: Accepts user data as Epetra matrices/vectors. Can use Epetra for internal matrices/vectors. Can be used via Thyra abstract interfaces. Can be built via Trilinos configure/build process. Available as preconditioner to all other Trilinos packages. Can use IFPACK, Amesos, AztecOO objects as smoothers, coarse solvers. Can be driven via Teuchos ParameterLists. Available to PETSc users without dependence on any other Trilinos packages.
8/23/ Package Maturation Process Asynchronicity
8/23/ Day 1 of Package Life CVS: Each package is self-contained in Trilinos/package/ directory. Bugzilla: Each package has its own Bugzilla product. Bonsai: Each package is browsable via Bonsai interface. Mailman: Each Trilinos package, including Trilinos itself, has four mail lists: CVS commit s. “Finger on the pulse” list. Mailing list for developers. Issues for package users. Releases and other announcements specific to the package. New_package (optional): Customizable boilerplate for Autoconf/Automake/Doxygen/Python/Thyra/Epetra/TestHarness/Website
8/23/ Sample Package Maturation Process StepExample Package added to CVS: Import existing code or start with new_package. ML CVS repository migrated into Trilinos (July 2002). Mail lists, Bugzilla Product, Bonsai database created. ml-announce, ml-users, ml-developers, ml-checkins, ml- created, linked to CVS (July 2002). Package builds with configure/make, Trilinos- compatible ML adopts Autoconf, Automake starting from new_package (June 2003). Epetra objects recognized by package.ML accepts user data as Epetra matrices and vectors (October 2002). Package accessible via Thyra interfaces.ML adaptors written for TSFCore_LinOp (Thyra) interface (May 2003). Package uses Epetra for internal data.ML able to generate Epetra matrices. Allows use of AztecOO, Amesos, Ifpack, etc. as smoothers and coarse grid solvers (Feb- June 2004). Package parameters settable via Teuchos ParameterList ML gets manager class, driven via ParameterLists (June 2004). Package usable from Python (PyTrilinos)ML Python wrappers written using new_package template (April 2005). Startup StepsMaturation Steps
8/23/ Maturation Jumpstart: NewPackage NewPackage provides jump start to develop/integrate a new package NewPackage is a “Hello World” program and website: Simple but it does work with autotools. Compiles and builds. NewPackage directory contains: Commonly used directory structure: src, test, doc, example, config. Working Autoconf/Automake files. Documentation templates (doxygen). Working regression test setup. Working Python and Thyra adaptors. Substantially cuts down on: Time to integrate new package. Variation in package integration details. Development of website. NOTE: NewPackage can be used independent from Trilinos
8/23/ What Trilinos is not Trilinos is not a single monolithic piece of software. Each package: Can be built independent of Trilinos. Has its own self-contained CVS structure. Has its own Bugzilla product and mail lists. Development team is free to make its own decisions about algorithms, coding style, release contents, testing process, etc. Trilinos top layer is not a large amount of source code: Trilinos repository (6.0 branch) contains: 660,378 source lines of code (SLOC). Sum of the packages SLOC counts :648,993. Trilinos top layer SLOC count: 11,385 (1.7%). Trilinos is not “indivisible”: You don’t need all of Trilinos to get things done. Any collection of packages can be combined and distributed. Current public release contains only 16 of the 20+ Trilinos packages.
8/23/ Insight from History A Philosophy for Future Directions In the early 1800’s U.S. had many new territories. Question: How to incorporate into U.S.? Colonies? No. Expand boundaries of existing states? No. Create process for self-governing regions. Yes. Theme: Local control drawing on national resources. Trilinos package architecture has some similarities: Asynchronous maturation. Packages decide degree of interoperations, use of Trilinos facilities. Strength of each: Scalable growth with local control.
8/23/ Solver Collaboration: The Big Picture
8/23/ ANA Linear Operator Interface Solver Software Components and Interfaces 2) LAL : Linear Algebra Library (e.g. vectors, sparse matrices, sparse factorizations, preconditioners) ANA APP ANA/APP Interface ANA Vector Interface 1) ANA : Abstract Numerical Algorithm (e.g. linear solvers, eigen solvers, nonlinear solvers, stability analysis, uncertainty quantification, transient solvers, optimization etc.) 3) APP : Application (the model: physics, discretization method etc.) Example Trilinos Packages: Belos (linear solvers) Anasazi (eigen solvers) NOX (nonlinear equations) Rhythmos (ODEs,DAEs) MOOCHO (Optimization) … Example Trilinos Packages: Epetra/Tpetra (Mat,Vec) Ifpack, AztecOO, ML (Preconditioners) Meros (Preconditioners) Pliris (Interface to direct solvers) Amesos (Direct solvers) Komplex (Complex/Real forms) … Types of Software Components Thyra ANA Interfaces to Linear Algebra FEI/Thyra APP to LAL Interfaces Custom/Thyra LAL to LAL Interfaces Thyra::Nonlin Examples: SIERRA NEVADA Xyce Sundance … LAL Matrix Preconditioner Vector
8/23/ LAL Foundation: Petra Petra provides a “common language” for distributed linear algebra objects (operator, matrix, vector) Petra provides distributed matrix and vector services. Has 3 implementations under development.
8/23/ Perform redistribution of distributed objects: Parallel permutations. “Ghosting” of values for local computations. Collection of partial results from remote processors. Petra Object Model Abstract Interface to Parallel Machine Shameless mimic of MPI interface. Keeps MPI dependence to a single class (through all of Trilinos!). Allow trivial serial implementation. Opens door to novel parallel libraries (shmem, UPC, etc…) Abstract Interface for Sparse All-to-All Communication Supports construction of pre-recorded “plan” for data-driven communications. Examples: Supports gathering/scatter of off-processor x/y values when computing y = Ax. Gathering overlap rows for Overlapping Schwarz. Redistribution of matrices, vectors, etc… Describes layout of distributed objects: Vectors: Number of vector entries on each processor and global ID Matrices/graphs: Rows/Columns managed by a processor. Called “Maps” in Epetra. Dense Distributed Vector and Matrices: Simple local data structure. BLAS-able, LAPACK-able. Ghostable, redistributable. RTOp-able. Base Class for All Distributed Objects: Performs all communication. Requires Check, Pack, Unpack methods from derived class. Graph class for structure-only computations: Reusable matrix structure. Pattern-based preconditioners. Pattern-based load balancing tools. Basic sparse matrix class: Flexible construction process. Arbitrary entry placement on parallel machine.
8/23/ Petra Implementations Three version under development: Epetra (Essential Petra): Current production version. Restricted to real, double precision arithmetic. Uses stable core subset of C++ (circa 2000). Interfaces accessible to C and Fortran users. Tpetra (Templated Petra): Next generation C++ version. Templated scalar and ordinal fields. Uses namespaces, and STL: Improved usability/efficiency. Jpetra (Java Petra): Pure Java. Portable to any JVM. Interfaces to Java versions of MPI, LAPACK and BLAS via interfaces.
8/23/ Data Model Wrap-Up Our target apps require flexible data model. Petra Object Model (POM) supports: Arbitrary placement of vector, graph, matrix entries on parallel machine. Arbitrary redistribution, ghosting and collection of distributed data. This flexibility is needed by LALs: Algebraic and multi-level preconditioners. Concrete distributed matrix kernels. Direct methods. POM is complex: Non-LALs (ANAs) do not rely on it.
8/23/ Abstract Numerical Algorithms (ANAs)
8/23/ Categories of Abstract Problems and Abstract Algorithms · Linear Problems: · Linear equations: · Eigen problems: · Nonlinear Problems: · Nonlinear equations: · Stability analysis: · Transient Nonlinear Problems: · DAEs/ODEs: · Optimization Problems: · Unconstrained: · Constrained: Trilinos Packages Belos Anasazi NOX LOCA Rythmos MOOCHO
8/23/ Introducing Abstract Numerical Algorithms An ANA is a numerical algorithm that can be expressed abstractly solely in terms of vectors, vector spaces, and linear operators (i.e. not with direct element access) Example Linear ANA (LANA) : Linear Conjugate Gradients scalar product defined by vector space vector-vector operations linear operator applications Scalar operations Types of operationsTypes of objects What is an abstract numerical algorithm (ANA)? Linear Conjugate Gradient Algorithm
8/23/ Examples of Nonlinear Abstract Numerical Algorithms Class of Numerical ProblemExample ANA Nonlinear equations Newton’s method (e.g. NOX, MOOCHO) Initial Value DAE/ODE Backward Euler method (e.g. Rythmos?) Linear equations GMRES (e.g. Belos) Requires
8/23/ Basic Thyra Vector and Operator Interfaces LinearOps apply to Vectors An operator knows its domain and range spaces A linear operator is a kind of operator A Vector knows its VectorSpace > VectorSpaces create Vectors!
8/23/ Basic Thyra Vector and Operator Interfaces > Basically compatible with HCL (Symes et al.) and many other operator/vector interfaces Near optimal for many but not all abstract numerical algorithms (ANAs) What’s missing? => Multi-vectors!
8/23/ Introducing Multi-Vectors What is a multi-vector? An m multi-vector V is a tall thin dense matrix composed of m column vectors v j Example: m = 4 columns V == What ANAs can exploit multi-vectors? Block linear solvers (e.g. block GMRES): Block eigen solvers (i.e. block Arnoldi): Compact limited memory quasi-Newton Tensor methods for nonlinear equations Why are multi-vectors important? Cache performance Reduce global communication Examples of multi-vector operations = Y = A X Q = X T Y Y = Y + X Q Block dot products (m 2 ) Block update Operator applications (i.e. mat-vecs) =+ =
8/23/ Basic Thyra Vector and Operator Interfaces > Where do multi-vectors fit in?
8/23/ Basic Thyra Vector and Operator Interfaces A MulitVector is a linear operator A MulitVector has a collection of column vectors A MulitVector is a tall thin dense matrix > VectorSpaces create MultiVectors LinearOps apply to MultiVectors A Vector is a MultiVector
8/23/ Basic Thyra Vector and Operator Interfaces What about standard vector ops? Reductions (norm, dot etc.)? Transformations (axpy, scaling etc.)? What about specialized vector ops? e.g. Interior point methods for opt >
8/23/ What’s the Big Deal about Vector-Vector Operations? Examples from OOQP (Gertz, Wright) Example from TRICE (Dennis, Heinkenschloss, Vicente) Example from IPOPT (Waechter) Currently in MOOCHO : > 40 vector operations! Many different and unusual vector operations are needed by interior point methods for optimization!
8/23/ Vector Reduction/Transformation Operators Defined Reduction/Transformation Operators (RTOp) Defined z 1 i … z q i op t ( i, v 1 i … v p i, z 1 i … z q i ) element-wise transformation op r ( i, v 1 i … v p i, z 1 i … z q i ) element-wise reduction 2 op rr ( 1, 2 ) reduction of intermediate reduction objects v 1 … v p R n : p non-mutable (constant) input vectors z 1 … z q R n : q mutable (non-constant) input/output vectors : reduction target object (many be non-scalar (e.g. {y k,k}), or NULL) Key to Optimal Performance op t (…) and op r (…) applied to entire sets of subvectors (i = a…b) independently: z 1 a:b … z q a:b, op( a, b, v 1 a:b … v p a:b, z 1 a:b … z q a:b, ) Communication between sets of subvectors only for NULL, op rr ( 1, 2 ) 2 RTOpT apply_op(in sv[1..p], inout sz[1…q], inout beta ) ROpDotProd apply_op(… ) TOpAssignVectors apply_op(… ) … Vector applyOp( in : RTOpT, apply_op(in v[1..p], inout z[1…q], inout beta) SerialVectorStd applyOp( … ) MPIVectorStd applyOp( … ) Software Implementation (see RTOp paper on Trilinos website “Publications”) “Visitor” design pattern (a.k.a. function forwarding) …
8/23/ Basic Thyra Vector and Operator Interfaces The missing piece: Reduction/Transformation Operators Supports all needed vector operations Data/parallel independence Optimal performance R. A. Bartlett, B. G. van Bloemen Waanders and M. A. Heroux. Vector Reduction/Transformation Operators, ACM TOMS, March 2004
8/23/ Managing Parallel Data The Petra Object Model
8/23/ A Simple Epetra/AztecOO Program // Header files omitted… int main(int argc, char *argv[]) { MPI_Init(&argc,&argv); // Initialize MPI, MpiComm Epetra_MpiComm Comm( MPI_COMM_WORLD ); // ***** Create x and b vectors ***** Epetra_Vector x(Map); Epetra_Vector b(Map); b.Random(); // Fill RHS with random #s // ***** Create an Epetra_Matrix tridiag(-1,2,-1) ***** Epetra_CrsMatrix A(Copy, Map, 3); double negOne = -1.0; double posTwo = 2.0; for (int i=0; i<NumMyElements; i++) { int GlobalRow = A.GRID(i); int RowLess1 = GlobalRow - 1; int RowPlus1 = GlobalRow + 1; if (RowLess1!=-1) A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowLess1); if (RowPlus1!=NumGlobalElements) A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowPlus1); A.InsertGlobalValues(GlobalRow, 1, &posTwo, &GlobalRow); } A.FillComplete(); // Transform from GIDs to LIDs // ***** Map puts same number of equations on each pe ***** int NumMyElements = 1000 ; Epetra_Map Map(-1, NumMyElements, 0, Comm); int NumGlobalElements = Map.NumGlobalElements(); // ***** Report results, finish *********************** cout << "Solver performed " << solver.NumIters() << " iterations." << endl << "Norm of true residual = " << solver.TrueResidual() << endl; MPI_Finalize() ; return 0; } // ***** Create/define AztecOO instance, solve ***** AztecOO solver(problem); solver.SetAztecOption(AZ_precond, AZ_Jacobi); solver.Iterate(1000, 1.0E-8); // ***** Create Linear Problem ***** Epetra_LinearProblem problem(&A, &x, &b);
8/23/ Typical Flow of Epetra Object Construction Construct Comm Construct Map Construct x Construct b Construct A Any number of Comm objects can exist. Comms can be nested (ee.g., serial within MPI). Maps describe parallel layout. Maps typically associated with more than one comp object. Two maps (source and target) define an export/import object. Computational objects. Compatibility assured via common map.
8/23/ Parallel Data Redistribution Epetra vectors, multivectors, graphs and matrices are distributed via one of the map objects. A map is basically a partitioning of a list of global IDs: IDs are simply labels, no need to use contiguous values (Directory class handles details for general ID lists). No a priori restriction on replicated IDs. If we are given: A source map and A set of vectors, multivectors, graphs and matrices (or other distributable objects) based on source map. Redistribution is performed by: 1.Specifying a target map with a new distribution of the global IDs. 2.Creating Import or Export object using the source and target maps. 3.Creating vectors, multivectors, graphs and matrices that are redistributed (to target map layout) using the Import/Export object.
8/23/ Example: epetra/ex9.cpp int main(int argc, char *argv[]) { MPI_Init(&argc, &argv); Epetra_MpiComm Comm(MPI_COMM_WORLD); int NumGlobalElements = 4; // global dimension of the problem int NumMyElements; // local nodes Epetra_IntSerialDenseVector MyGlobalElements; if( Comm.MyPID() == 0 ) { NumMyElements = 3; MyGlobalElements.Size(NumMyElements); MyGlobalElements[0] = 0; MyGlobalElements[1] = 1; MyGlobalElements[2] = 2; } else { NumMyElements = 3; MyGlobalElements.Size(NumMyElements); MyGlobalElements[0] = 1; MyGlobalElements[1] = 2; MyGlobalElements[2] = 3; } // create a map Epetra_Map Map(-1,MyGlobalElements.Length(), MyGlobalElements.Values(),0, Comm); // create a vector based on map Epetra_Vector xxx(Map); for( int i=0 ; i<NumMyElements ; ++i ) xxx[i] = 10*( Comm.MyPID()+1 ); if( Comm.MyPID() == 0 ){ double val = 12; int pos = 3; xxx.SumIntoGlobalValues(1,0,&val,&pos); } cout << xxx; // create a target map, in which all elements are on proc 0 int NumMyElements_target; if( Comm.MyPID() == 0 ) NumMyElements_target = NumGlobalElements; else NumMyElements_target = 0; Epetra_Map TargetMap(-1,NumMyElements_target,0,Comm); Epetra_Export Exporter(Map,TargetMap); // work on vectors Epetra_Vector yyy(TargetMap); yyy.Export(xxx,Exporter,Add); cout << yyy; MPI_Finalize(); return( EXIT_SUCCESS ); }
8/23/ Output: epetra/ex9.cpp > mpirun -np 2./ex9.exe Epetra::Vector MyPID GID Value Epetra::Vector Epetra::Vector MyPID GID Value Epetra::Vector PE 0 xxx(0)=10 xxx(1)=10 xxx(2)=10 PE 1 xxx(1)=20 xxx(2)=20 xxx(3)=20 PE 0 yyy(0)=10 yyy(1)=30 yyy(2)=30 yyy(3)=20 PE 1 Export/Add Before ExportAfter Export
8/23/ Import vs. Export Import (Export) means calling processor knows what it wants to receive (send). Distinction between Import/Export is important to user, almost identical in implementation. Import (Export) objects can be used to do an Export (Import) as a reverse operation. When mapping is bijective (1-to-1 and onto), either Import or Export is appropriate.
8/23/ Example: 1D Matrix Assembly a b -u xx = f u(a) = 0 u(b) = 1 x1x1 x2x2 x3x3 PE 0PE 1 3 Equations: Find u at x 1, x 2 and x 3 Equation for u at x 2 gets a contribution from PE 0 and PE 1. Would like to compute partial contributions independently. Then combine partial results.
8/23/ Two Maps We need two maps: Assembly map: PE 0: { 1, 2 }. PE 1: { 2, 3 }. Solver map: PE 0: { 1, 2 } (we arbitrate ownership of 2). PE 1: { 3 }.
8/23/ End of Assembly Phase At the end of assembly phase we have AssemblyMatrix: On PE 0: On PE 1: Want to assign all of Equation 2 to PE 0 for use with solver. NOTE: For a class of Neumann-Neumann preconditioners, the above layout is exactly what we want. Equation 1: Equation 2: Equation 2: Equation 3: Row 2 is shared
8/23/ Export Assembly Matrix to Solver Matrix Epetra_Export Exporter(AssemblyMap, SolverMap); Epetra_CrsMatrix SolverMatrix (Copy, SolverMap, 0); SolverMatrix.Export(AssemblyMatrix, Exporter, Add); SolverMatrix.FillComplete();
8/23/ Matrix Export Equation 1: Equation 2: Equation 3: Equation 1: Equation 2: Equation 2: Equation 3: PE 0 PE 1 Before Export After Export Export/Add
8/23/ Example: epetraext/ex2.cpp int main(int argc, char *argv[]) { MPI_Init(&argc,&argv); Epetra_MpiComm Comm (MPI_COMM_WORLD); int MyPID = Comm.MyPID(); int n=4; // Generate Laplacian2d gallery matrix Trilinos_Util::CrsMatrixGallery G("laplace_2d", Comm); G.Set("problem_size", n*n); G.Set("map_type", "linear"); // Linear map initially // Get the LinearProblem. Epetra_LinearProblem *Prob = G.GetLinearProblem(); // Get the exact solution. Epetra_MultiVector *sol = G.GetExactSolution(); // Get the rhs (b) and lhs (x) Epetra_MultiVector *b = Prob->GetRHS(); Epetra_MultiVector *x = Prob->GetLHS(); // Repartition graph using Zoltan EpetraExt::Zoltan_CrsGraph * ZoltanTrans = new EpetraExt::Zoltan_CrsGraph(); EpetraExt::LinearProblem_GraphTrans * ZoltanLPTrans = new EpetraExt::LinearProblem_GraphTrans( *(dynamic_cast *>(ZoltanTrans)) ); cout << "Creating Load Balanced Linear Problem\n"; Epetra_LinearProblem &BalancedProb = (*ZoltanLPTrans)(*Prob); // Get the rhs (b) and lhs (x) Epetra_MultiVector *Balancedb = Prob->GetRHS(); Epetra_MultiVector *Balancedx = Prob->GetLHS(); cout << "Balanced b: " << *Balancedb << endl; cout << "Balanced x: " << *Balancedx << endl; MPI_Finalize() ; return 0 ; }
8/23/ Need for Import/Export Solvers for complex engineering applications need expressive, easy-to-use parallel data redistribution: Allows better scaling for non-uniform overlapping Schwarz. Necessary for robust solution of multiphysics problems. We have found import and export facilities to be a very natural and powerful technique to address these issues.
8/23/ Details about Epetra Maps Getting beyond standard use case…
8/23/ to-1 Maps 1-to-1 map (defn): A map is 1-to-1 if each GID appears only once in the map (and is therefore associated with only a single processor). Certain operations in parallel data repartitioning require 1- to-1 maps. Specifically: The source map of an import must be 1-to-1. The target map of an export must be 1-to-1. The domain map of a 2D object must be 1-to-1. The range map of a 2D object must be 1-to-1.
8/23/ D Objects: Four Maps Epetra 2D objects: CrsMatrix CrsGraph VbrMatrix Have four maps: RowMap: On each processor, the GIDs of the rows that processor will “manage”. ColMap: On each processor, the GIDs of the columns that processor will “manage”. DomainMap: The layout of domain objects (the x vector/multivector in y=Ax). RangeMap: The layout of range objects (the y vector/multivector in y=Ax). Must be 1-to-1 maps!!! Typically a 1-to-1 map Typically NOT a 1-to-1 map
8/23/ Sample Problem = yA x
8/23/ Case 1: Standard Approach RowMap = {0, 1} ColMap = {0, 1, 2} DomainMap = {0, 1} RangeMap = {0, 1} First 2 rows of A, elements of y and elements of x, kept on PE 0. Last row of A, element of y and element of x, kept on PE 1. PE 0 ContentsPE 1 Contents RowMap = {2} ColMap = {1, 2} DomainMap = {2} RangeMap = {2} Notes: Rows are wholly owned. RowMap=DomainMap=RangeMap (all 1-to-1). ColMap is NOT 1-to-1. Call to FillComplete: A.FillComplete(); // Assumes = yAx Original Problem
8/23/ Case 2: Twist 1 RowMap = {0, 1} ColMap = {0, 1, 2} DomainMap = {1, 2} RangeMap = {0} First 2 rows of A, first element of y and last 2 elements of x, kept on PE 0. Last row of A, last 2 element of y and first element of x, kept on PE 1. PE 0 ContentsPE 1 Contents RowMap = {2} ColMap = {1, 2} DomainMap = {0} RangeMap = {1, 2} Notes: Rows are wholly owned. RowMap is NOT = DomainMap is NOT = RangeMap (all 1-to-1). ColMap is NOT 1-to-1. Call to FillComplete: A.FillComplete(DomainMap, RangeMap); = yAx Original Problem
8/23/ Case 2: Twist 2 RowMap = {0, 1} ColMap = {0, 1} DomainMap = {1, 2} RangeMap = {0} First row of A, part of second row of A, first element of y and last 2 elements of x, kept on PE 0. Last row, part of second row of A, last 2 element of y and first element of x, kept on PE 1. PE 0 ContentsPE 1 Contents RowMap = {1, 2} ColMap = {1, 2} DomainMap = {0} RangeMap = {1, 2} Notes: Rows are NOT wholly owned. RowMap is NOT = DomainMap is NOT = RangeMap (all 1-to-1). RowMap and ColMap are NOT 1-to-1. Call to FillComplete: A.FillComplete(DomainMap, RangeMap); = yAx Original Problem
8/23/ Overview of Trilinos Packages
8/23/ Petra is Greek for “foundation”. Trilinos Common Language: Petra Petra provides a “common language” for distributed linear algebra objects (operator, matrix, vector) Petra 1 provides distributed matrix and vector services. Exists in basic form as an object model: Describes basic user and support classes in UML, independent of language/implementation. Describes objects and relationships to build and use matrices, vectors and graphs. Has 3 implementations under development.
8/23/ Petra Implementations Three version under development: Epetra (Essential Petra): Current production version. Restricted to real, double precision arithmetic. Uses stable core subset of C++ (circa 2000). Interfaces accessible to C and Fortran users. Tpetra (Templated Petra): Next generation C++ version. Templated scalar and ordinal fields. Uses namespaces, and STL: Improved usability/efficiency. Jpetra (Java Petra): Pure Java. Portable to any JVM. Interfaces to Java versions of MPI, LAPACK and BLAS via interfaces. Developers: Mike Heroux, Rob Hoekstra, Alan Williams, Paul Sexton
8/23/ Epetra Basic Stuff: What you would expect. Variable block matrix data structures. Multivectors. Arbitrary index labeling. Flexible, versatile parallel data redistribution. Language support for inheritance, polymorphism and extensions. View vs. Copy.
8/23/ AztecOO Krylov subspace solvers: CG, GMRES, Bi-CGSTAB,… Incomplete factorization preconditioners Aztec is the workhorse solver at Sandia: Extracted from the MPSalsa reacting flow code. Installed in dozens of Sandia apps. external licenses. AztecOO improves on Aztec by: Using Epetra objects for defining matrix and RHS. Providing more preconditioners/scalings. Using C++ class design to enable more sophisticated use. AztecOO interfaces allows: Continued use of Aztec for functionality. Introduction of new solver capabilities outside of Aztec. Developers: Mike Heroux, Alan Williams, Ray Tuminaro
8/23/ IFPACK: Algebraic Preconditioners Overlapping Schwarz preconditioners with incomplete factorizations, block relaxations, block direct solves. Accept user matrix via abstract matrix interface (Epetra versions). Uses Epetra for basic matrix/vector calculations. Supports simple perturbation stabilizations and condition estimation. Separates graph construction from factorization, improves performance substantially. Compatible with AztecOO, ML, Amesos. Can be used by NOX and ML. Developers: Marzio Sala, Mike Heroux
8/23/ Amesos Interface to direct solver for distributed sparse linear systems Challenges: No single solver dominates Different interfaces and data formats, serial and parallel Interface often change between revisions Amesos offers: A single, clear, consistent interface, to various packages Common look-and-feel for all classes Separation from specific solver details Use serial and distributed solvers; Amesos takes care of data redistribution Developers: Ken Stanley, Marzio Sala, Tim Davis
8/23/ Epetra_LinearProblem Problem( A, x, b); Amesos Afact; Amesos_BaseSolver* Solver = Afact.Create( “Klu”, Problem ) ; Solver->SymbolicFactorization( ) ; Solver->NumericFactorization( ) ; Solver->Solve( ) ; Calling Amesos
8/23/ Amesos: Supported Libraries Library NameLanguagecommunicator # procs for solution 1 KLUCSerial1 UMFPACKCSerial1 SuperLU 3.0CSerial1 SuperLU_DIST 2.0CMPIany MUMPS 4.3.1F90MPIany ScaLAPACKF77MPIany 6.0 Version: DSCPACK, PARDISO, TAUCS, WSMP 1 Matrices can be distributed over any number of processes
8/23/ ML: Multi-level Preconditioners Smoothed aggregation, multigrid and domain decomposition preconditioning package Critical technology for scalable performance of some key apps. ML compatible with other Trilinos packages: Accepts user data as Epetra_RowMatrix object (abstract interface). Any implementation of Epetra_RowMatrix works. Implements the Epetra_Operator interface. Allows ML preconditioners to be used with AztecOO and TSF. Can also be used completely independent of other Trilinos packages. Developers: Ray Tuminaro, Jonathan Hu, Marzio Sala
8/23/ NOX: Nonlinear Solvers Suite of nonlinear solution methods NOX uses abstract vector and “group” interfaces: Allows flexible selection and tuning of various strategies: Directions. Line searches. Epetra/AztecOO/ML, LAPACK, PETSc implementations of abstract vector/group interfaces. Designed to be easily integrated into existing applications. Developers: Tammy Kolda, Roger Pawlowski
8/23/ LOCA Library of continuation algorithms Provides Zero order continuation First order continuation Arc length continuation Multi-parameter continuation (via Henderson's MF Library) Turning point continuation Pitchfork bifurcation continuation Hopf bifurcation continuation Phase transition continuation Eigenvalue approximation (via ARPACK or Anasazi) Developers: Andy Salinger, Eric Phipps
8/23/ EpetraExt: Extensions to Epetra Library of useful classes not needed by everyone Most classes are types of “transforms”. Examples: Graph/matrix view extraction. Epetra/Zoltan interface. Explicit sparse transpose. Singleton removal filter, static condensation filter. Overlapped graph constructor, graph colorings. Permutations. Sparse matrix-matrix multiply. Matlab, MatrixMarket I/O functions. … Most classes are small, useful, but non-trivial to write. Developer: Robert Hoekstra, Alan Williams, Mike Heroux
8/23/ Teuchos Utility package of commonly useful tools: ParameterList class: key/value pair database, recursive capabilities. LAPACK, BLAS wrappers (templated on ordinal and scalar type). Dense matrix and vector classes (compatible with BLAS/LAPACK). FLOP counters, Timers. Ordinal, Scalar Traits support: Definition of ‘zero’, ‘one’, etc. Reference counted pointers, and more… Takes advantage of advanced features of C++: Templates Standard Template Library (STL) ParameterList: Allows easy control of solver parameters. Developers: Roscoe Barlett, Kevin Long, Heidi Thorquist, Mike Heroux, Paul Sexton, Kris Kampshoff
8/23/ New Packages: Belos and Anasazi Next generation linear solvers (Belos) and eigensolvers (Anasazi) libraries, written in templated C++. Provide a generic interface to a collection of algorithms for solving large-scale linear problems and eigenproblems. Algorithm implementation is accomplished through the use of abstract base classes (mini interface). Interfaces are derived from these base classes to operator-vector products, status tests, and any arbitrary linear algebra library. Includes block linear solvers and eigensolvers. Developers: Heidi Thorquist, Mike Heroux, Chris Baker, Rich Lehoucq
8/23/ Kokkos Goal: Isolate key non-BLAS kernels for the purposes of optimization. Kernels: Dense vector/multivector updates and collective ops (not in BLAS/Teuchos). Sparse MV, MM, SV, SM. Serial-only for now. Reference implementation provided (templated). Mechanism for improving performance: Default is aggressive compilation of reference source. BeBOP: Jim Demmel, Kathy Yelick, Rich Vuduc, UC Berkeley. Vector version: Cray. Developer: Mike Heroux
8/23/ NewPackage Package NewPackage provides jump start to develop/integrate a new package NewPackage is a “Hello World” program and website: Simple but it does work with autotools. Compiles and builds. NewPackage directory contains: Commonly used directory structure: src, test, doc, example, config. Working autotools files. Documentation templates (doxygen). Working regression test setup. Substantially cuts down on: Time to integrate new package. Variation in package integration details. Development of website.
8/23/ Two Recent Package Additions
8/23/ CLAPS Package Clark Dohrmann: Expert in computational structural mechanics. Unfamiliar with parallel computing. Limited C++ experience. Epetra: Advanced parallel basic linear algebra package. Supports sparse, dense linear algebra and parallel data redistribution. Clark+Epetra: CLOP: Overlapping Schwarz preconditioner. Scales with increasing number of constraints. sleg[1,2,4]x joint models, 12 modes, vplant cluster (previously unsolvable). # DOF#Constraints# procsTime 33,1454, m 36s 227,17018, m 22s 1,674,50270, m 22s 3 Months to develop. Depends only on Epetra. Compatible with rest of Trilinos. CLAPS=Clark’s Linear Algebra Suite
8/23/ Pliris Package Migration of Linpack benchmark code to supported environment. Used new_package as starting point. Put under source control for the first time. Portable build process for the first time. Documented distributed data model (Epetra). Has its own Bugzilla product, mail lists, regression tests. Source code browsable via Bonsai. And more… But Pliris is still an independent piece of code…and better off after the process.
8/23/ Interface Packages Thyra and PyTrilinos
8/23/ template bool sillyCgSolve( const Thyra::LinearOpBase &A,const Thyra::VectorBase &b,const int maxNumIters,const typename Teuchos::ScalarTraits ::magnitudeType tolerance,Thyra::VectorBase *x,std::ostream *out = NULL ) { using Teuchos::RefCountPtr; typedef Teuchos::ScalarTraits ST; // We need to use ScalarTraits to support arbitrary types. typedef typename ST::magnitudeType ScalarMag; // This is the type returned from a vector norm. typedef one ST::one()); // Get the vector space (domain and range spaces should be the same) RefCountPtr > space = A.domain(); // Compute initial residual : r = b - A*x RefCountPtr > r = createMember(space); Thyra::assign(&*r,b); // r = b Thyra::apply(A,Thyra::NOTRANS,*x,&* one, one); // r = -A*x + r const ScalarMag r0_nrm = Thyra::norm(*r); // Compute ||r0|| = sqrt( ) for convergence test if(r0_nrm == ST::zero()) return true; // Trivial RHS and initial LHS guess? // Create workspace vectors and scalars RefCountPtr > p = createMember(space), q = createMember(space); Scalar rho_old; // Perform the iterations for( int iter = 0; iter <= maxNumIters; ++iter ) { // Check convergence and output iteration const ScalarMag r_nrm = Thyra::norm(*r); // Compute ||r|| = sqrt( ) if( r_nrm/r0_nrm < tolerance ) return true; // Converged to tolerance, Success! // Compute iteration const Scalar rho = space->scalarProd(*r,*r); // -> rho if(iter==0) Thyra::assign(&*p,*r); // r -> p (iter == 0) else Thyra::Vp_V( &*p, *r, Scalar(rho/rho_old) ); // r+(rho/rho_old)*p -> p (iter > 0) Thyra::apply(A,Thyra::NOTRANS,*p,&*q); // A*p-> q const Scalar alpha = rho/space->scalarProd(*p,*q);// rho/ -> alpha Thyra::Vp_StV( x, Scalar(+alpha), *p ); // +alpha*p + x -> x Thyra::Vp_StV( &*r, Scalar(-alpha), *q ); // -alpha*q + r -> r rho_old = rho; // rho-> rho_old (remember rho for next iter) } return false; // Failure } // end sillyCgSolve Thyra: sillyCGSolve Generic CG code. To use, provide: Thyra::LinearOp. Thyra::Vector. Works on any parallel machine. Allows mixing of vectors matrices: PETSc & Epetra. Works with any scalar datatype.
8/23/ PyTrilinos PyTrilinos provides Python access to Trilinos packages. Uses SWIG to generate bindings. Epetra, AztecOO, IFPACK, ML, NOX, LOCA, Amesos and NewPackage are support. Possible to: Define RowMatrix implementation in Python. Use from Trilinos C++ code. Performance for large grain is equivalent to C++. Several times hit for very fine grain code.
8/23/ #include "mpi.h" #include "Epetra_MpiComm.h" #include "Epetra_Vector.h" #include "Epetra_Time.h" #include "Epetra_RowMatrix.h" #include "Epetra_CrsMatrix.h" #include "Epetra_Time.h" #include "Epetra_LinearProblem.h" #include "Trilinos_Util_CrsMatrixGallery.h" using namespace Trilinos_Util; int main(int argc, char *argv[]) { MPI_Init(&argc, &argv); Epetra_MpiComm Comm(MPI_COMM_WORLD); int nx = 1000; int ny = 1000 * Comm.NumProc(); CrsMatrixGallery Gallery("laplace_2d", Comm); Gallery.Set("ny", ny); Gallery.Set("nx", nx); Gallery.Set("problem_size",nx*ny); Gallery.Set("map_type", "linear"); Epetra_LinearProblem* Problem = Gallery.GetLinearProblem(); assert (Problem != 0); // retrieve pointers to solution (lhs), right-hand side (rhs) // and matrix itself (A) Epetra_MultiVector* lhs = Problem->GetLHS(); Epetra_MultiVector* rhs = Problem->GetRHS(); Epetra_RowMatrix* A = Problem->GetMatrix(); Epetra_Time Time(Comm); for (int i = 0 ; i < 10 ; ++i) A->Multiply(false, *lhs, *rhs); cout << Time.ElapsedTime() << endl; MPI_Finalize(); return(EXIT_SUCCESS); } // end of main() #! /usr/bin/env python from PyTrilinos import Epetra, Triutils Epetra.Init() Comm = Epetra.PyComm() nx = 1000 ny = 1000 * Comm.NumProc() Gallery = Triutils.CrsMatrixGallery("laplace_2d", Comm) Gallery.Set("nx", nx) Gallery.Set("ny", ny) Gallery.Set("problem_size", nx * ny) Gallery.Set("map_type", "linear") Matrix = Gallery.GetMatrix() LHS = Gallery.GetStartingSolution() RHS = Gallery.GetRHS() Time = Epetra.Time(Comm) for i in xrange(10): Matrix.Multiply(False, LHS, RHS) print Time.ElapsedTime() Epetra.Finalize() C++ vs. Python: Equivalent Code
8/23/ Templates and Generic Programming
8/23/ Investment in Generic Programming 2 nd Generation Trilinos packages are templated on: OrdinalType (think int). ScalarType (think double). Examples: Teuchos::SerialDenseMatrix A; Teuchos::SerialDenseMatrix B; Scalar/Ordinal Traits mechanism completes support for genericity. The following packages support templates: Teuchos (Basic Tools) Thyra (Abstract Interfaces) Tpetra (including MPI support) Belos (Krylov and Block Krylov Linear), IFPACK (algebraic preconditioners, next version), Anasazi (Eigensolvers),
8/23/ Potential Benefits of Templated Types Templated scalar (and ordinal) types have great potential: Generic: Algorithms expressed over abstract field. High Performance: Partial template specialization + Fortran. Facilitate variety of algorithmic studies. Allow study of asymptotic behavior of discretizations. Facilitate debugging: Reduces FP error as source error. Use your imagination… All new Trilinos packages are templated.
8/23/ Kokkos Results: Sparse MV Pentium M 1.6GHz Cygwin/GCC 3.2 (WinXP Laptop) Data SetDimensionNonzeros# RHSMFLOPS DIE3D dday FIDAP
8/23/ ARPREC The ARPREC library uses arrays of 64-bit floating-point numbers to represent high-precision floating-point numbers. ARPREC values behave just like any other floating-point datatype, except the maximum working precision (in decimal digits) must be specified before any calculations are done mp::mp_init(200); Illustrate the use of ARPREC with an example using Hilbert matrices.
8/23/ Hilbert Matrices A Hilbert matrix H N is a square N-by-N matrix such that: For Example:
8/23/ Hilbert Matrices Notoriously ill-conditioned κ(H 3 ) 524 κ(H 5 ) κ(H 10 ) x κ(H 20 ) x κ(H 100 ) x Hilbert matrices introduce large amounts of error
8/23/ Hilbert Matrices and Cholesky Factorization With double-precision arithmetic, Cholesky factorization will fail for H N for all N > 13. Can we improve on this using arbitrary-precision floating-point numbers? PrecisionLargest N for which Cholesky Factorization is successful Single Precision8 Double Precision13 Arbitrary Precision (20)29 Arbitrary Precision (40)40 Arbitrary Precision (200)145 Arbitrary Precision (400)233
8/23/ Software Quality
8/23/ SQA/SQE Software Quality Assurance/Engineering is important. No longer sufficient to say “We do a good job”. Trilinos facilitates SQA/SQE development/processes for packages: 32 of 47 ASCI SQE practices are directly handled by Trilinos (no requirements on packages). Trilinos provides significant support for the remaining 15. Trilinos Dev Guide Part II: Specific to ASCI requirements. Trilinos software engineering policies provide a ready-made infrastructure for new packages. Trilinos philosophy: Few requirements. Instead mostly suggested practices. Provides package with option to provide alternate process.
8/23/ Trilinos ServiceSQE Practices Impact Yearly Trilinos User Group Meeting (TUG) and Developer Forum: Once a year gathering for tutorials, package feature updates, user/developer requirements discussion and developer training. — All Requirements steps: gathering, derivation, documentation, feasibility,etc. — User and Developer training. Monthly Trilinos leaders meetings: Trilinos leaders, including package development leaders, key managers, funding sources and other stakeholders participate in monthly phone meetings to discuss any timely issues related to the Trilinos Project. —Developer Training. —Design reviews. —Policy decisions across all development phases. Trilinos and package mail lists: Trilinos lists for leaders, announcements, developers, users, checkins and similar lists at the package level support a variety of communication. All lists are archived, providing critical artifacts for assessments and audits. —Developer/user/client communication. —Requirements/design/testing artifacts. —Announcement/documenting of releases. Trilinos and Trilinos3PL source repositories: All source code, development and user documentation is retained and tracked. In addition, reference versions of all external software, including BLAS, LAPACK, Umfpack, etc. are retained in Trilinos3PL. —Source management. —Versioning. —Third-party software management. Bugzilla Products: Each package has its own Bugzilla Product with standard components. —Requirements/faults capturing and tracking. Trilinos configure script and M4 macros: The Trilinos configure script and related macros support portable installation of Trilinos and its packages —Portability. —Software release. Trilinos test harness: Trilinos provides a base testing plan and automated testing across multiple platforms, plus creation of testing artifacts. Test harness results are used to derive a variety of metrics for SQE. —Pre-checkin and regression testing. —Software metrics.
8/23/ The Trilinos Tutorial Package Trilinos is large (and still growing…) More than 600K code lines Many packages a lot of functionalities… … and also a lot to learn What is the best way for a new user to learn Trilinos? Using a Trilinos package of course: Didasko!
8/23/ What DIDASKO is not DIDASKO, as a tutorial, cannot cover the most advanced features of Trilinos: Only the “stable” features are covered DIDASKO is not a substitute of each package’s documentation and examples, which remain of fundamental importance
8/23/ Covered Packages At present, we cover EpetraTriutilsIFPACK TeuchosAztecOOML NOXAmesosEpetraExt Anasazi
8/23/ Trilinos Availability/Information Trilinos and related packages are available via LGPL. Current release (5.0) is “click release”. Unlimited availability. 650+ Downloads of Release 5.0 since 3/05 (not including internal Sandia users). 330 registered users: 57% university, 11% industry, 20% gov’t. 35% European, 35% US, 10% Asian. Trilinos Release 6: September Trilinos Awards: 2004 R&D 100 Award. SC2004 HPC Software Challenge Award. Sandia Team Employee Recognition Award. Lockheed-Martin Nova Award Nominee. More information: Additional documentation at my website: 3 rd Annual Trilinos User Group Meeting: November 1-3, 2005 at SNL.
8/23/ Summary Trilinos package model provides a welcoming environment for solver developers: Extremely flexible. Building up package, not making it conform. Scalable growth model for complex multi-physics solver suites. Basic principles can be applied to any similar “project of projects”: Modularity Interfaces Common infrastructure. Trilinos new_package package is self-contained software environment: Package and website useful to any project wanting to ramp up in use of standard tools. Available separately, Open Source. Used by groups outside of Trilinos as well as within.
8/23/ Trilinos Might be of Interest to you if… Your application includes non-PDE equations. Multiple RHS, Block Krylov linear and eigen solvers needed. Interested in multi-physics. App is in C++. Interested in collaborating on Fortran interface via SIDL/Babel. Unstructured Data redistribution is important. Multi-precision useful. Parallel Python (using MPI only) interest.