Download presentation
Presentation is loading. Please wait.
2
8/23/2005 1 Trilinos Tutorial Overview and Basic Concepts Michael A. Heroux Sandia National Laboratories Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.
3
8/23/2005 2 Trilinos Development Team Ross Bartlett Lead Developer of Thyra Developer of Rythmos Paul Boggs Developer of Thyra Todd Coffey Lead Developer of Rythmos Jason Cross Developer of Jpetra David Day Developer of Komplex Clark Dohrmann Developer of CLAPS Michael Gee Developer of ML, NOX Bob Heaphy Lead developer of Trilinos SQA Mike Heroux Trilinos Project Leader Lead Developer of Epetra, AztecOO, Kokkos, Komplex, IFPACK, Thyra, Tpetra Developer of Amesos, Belos, EpetraExt, Jpetra Ulrich Hetmaniuk Developer of Anasazi Robert Hoekstra Lead Developer of EpetraExt Developer of Epetra, Thyra, Tpetra Russell Hooper Developer of NOX Vicki Howle Lead Developer of Meros Developer of Belos and Thyra Jonathan Hu Developer of ML Sarah Knepper Developer of Komplex Tammy Kolda Lead Developer of NOX Joe Kotulski Lead Developer of Pliris Rich Lehoucq Developer of Anasazi and Belos Kevin Long Lead Developer of Thyra, Developer of Belos and Teuchos Roger Pawlowski Lead Developer of NOX Michael Phenow Trilinos Webmaster Lead Developer of New_Package Eric Phipps Developer of LOCA and NOX Marzio Sala Lead Developer of Didasko and IFPACK Developer of ML, Amesos Andrew Salinger Lead Developer of LOCA Paul Sexton Developer of Epetra and Tpetra Bill Spotz Lead Developer of PyTrilinos Developer of Epetra, New_Package Ken Stanley Lead Developer of Amesos and New_Package Heidi Thornquist Lead Developer of Anasazi, Belos and Teuchos Ray Tuminaro Lead Developer of ML and Meros Jim Willenbring Developer of Epetra and New_Package. Trilinos library manager Alan Williams Developer of Epetra, EpetraExt, AztecOO, Tpetra
4
8/23/2005 3 Outline of Talk Background/Motivation. Trilinos Package Concepts. ANAs, LALs and APPs. Managing Parallel Data: Petra Object Model. Whirlwind Tour of Some Trilinos Packages. Generic Programming. SQA/SQE Issues. Concluding remarks.
5
8/23/2005 4 Target Problems: PDEs
6
8/23/2005 5 Target Problems: Circuits Pre-solve steps essential. Preconditioned iterative methods used routinely. Partition Circuit ScaleRCM Singleton Filter Partition LinSys
7
8/23/2005 6 Target Problems: Inhomogenous Fluids
8
8/23/2005 7 Motivation For Trilinos Sandia does LOTS of solver work. When I started at Sandia in May 1998: Aztec was a mature package. Used in many codes. FETI, PETSc, DSCPack, Spooles, ARPACK, DASPK, and many other codes were (and are) in use. New projects were underway or planned in multi-level preconditioners, eigensolvers, non-linear solvers, etc… The challenges: Little or no coordination was in place to: Efficiently reuse existing solver technology. Leverage new development across various projects. Support solver software processes. Provide consistent solver APIs for applications. ASCI was forming software quality assurance/engineering (SQA/SQE) requirements: Daunting requirements for any single solver effort to address alone.
9
8/23/2005 8 Evolving Trilinos Solution Trilinos 1 is an evolving framework to address these challenges: Fundamental atomic unit is a package. Includes core set of vector, graph and matrix classes (Epetra/Tpetra packages). Provides a common abstract solver API (Thyra package). Provides a ready-made package infrastructure (new_package package): Source code management (cvs, bonsai). Build tools (autotools). Automated regression testing (queue directories within repository). Communication tools (mailman mail lists). Specifies requirements and suggested practices for package SQA. In general allows us to categorize efforts: Efforts best done at the Trilinos level (useful to most or all packages). Efforts best done at a package level (peculiar or important to a package). Allows package developers to focus only on things that are unique to their package. 1. Trilinos loose translation: “A string of pearls”
10
8/23/2005 9 Trilinos Package Concepts Package: The Atomic Unit
11
8/23/2005 10 Trilinos Packages Trilinos is a collection of Packages. Each package is: Focused on important, state-of-the-art algorithms in its problem regime. Developed by a small team of domain experts. Self-contained: No explicit dependencies on any other software packages (with some special exceptions). Configurable/buildable/documented on its own. Sample packages: NOX, AztecOO, ML, IFPACK, Meros. Special package collections: Petra (Epetra, Tpetra, Jpetra): Concrete Data Objects Thyra: Abstract Conceptual Interfaces Teuchos: Common Tools. New_package: Jumpstart prototype.
12
8/23/2005 11 Greek Names
13
8/23/2005 12 Trilinos Package Summary ObjectivePackage(s) Distributed linear algebra objectsEpetra, Jpetra, Tpetra Krylov solversAztecOO, Belos ILU-type preconditionersAztecOO, IFPACK Multilevel preconditionersML, CLAPS Eigenvalue problemsAnasazi Block preconditionersMeros Direct sparse linear solversAmesos Direct dense solversEpetra, Teuchos, Pliris Abstract interfacesThyra Nonlinear system solversNOX, LOCA Complex linear systemsKomplex C++ utilities, (some) I/OTeuchos, EpetraExt, Kokkos Trilinos TutorialDidasko Python SupportPyTrilinos Time integrators/DAEsRythmos Archetype packageNewPackage Basic Linear Algebra classes Block Krylov Methods Scalable Preconditioners 3 rd Party Direct Solver Package. Will include WSMP and Pardiso interfaces in 6.0 Abstract Interfaces.
14
8/23/2005 13 Why Packages? Let’s Do Some Matrix Analysis!
15
8/23/2005 14 Trilinos Development Team (Again) Ross Bartlett Lead Developer of Thyra Developer of Rythmos Paul Boggs Developer of Thyra Todd Coffey Lead Developer of Rythmos Jason Cross Developer of Jpetra David Day Developer of Komplex Clark Dohrmann Developer of CLAPS Michael Gee Developer of ML, NOX Bob Heaphy Lead developer of Trilinos SQA Mike Heroux Trilinos Project Leader Lead Developer of Epetra, AztecOO, Kokkos, Komplex, IFPACK, Thyra, Tpetra Developer of Amesos, Belos, EpetraExt, Jpetra Ulrich Hetmaniuk Developer of Anasazi Robert Hoekstra Lead Developer of EpetraExt Developer of Epetra, Thyra, Tpetra Russell Hooper Developer of NOX Vicki Howle Lead Developer of Meros Developer of Belos and Thyra Jonathan Hu Developer of ML Sarah Knepper Developer of Komplex Tammy Kolda Lead Developer of NOX Joe Kotulski Lead Developer of Pliris Rich Lehoucq Developer of Anasazi and Belos Kevin Long Lead Developer of Thyra, Developer of Belos and Teuchos Roger Pawlowski Lead Developer of NOX Michael Phenow Trilinos Webmaster Lead Developer of New_Package Eric Phipps Developer of LOCA and NOX Marzio Sala Lead Developer of Didasko and IFPACK Developer of ML, Amesos Andrew Salinger Lead Developer of LOCA Paul Sexton Developer of Epetra and Tpetra Bill Spotz Lead Developer of PyTrilinos Developer of Epetra, New_Package Ken Stanley Lead Developer of Amesos and New_Package Heidi Thornquist Lead Developer of Anasazi, Belos and Teuchos Ray Tuminaro Lead Developer of ML and Meros Jim Willenbring Developer of Epetra and New_Package. Trilinos library manager Alan Williams Developer of Epetra, EpetraExt, AztecOO, Tpetra
16
8/23/2005 15 Developer and Package Node IDs Developer IDs RossBartlett = 1; PaulBoggs = 2; ToddCoffey = 3; JasonCross = 4; DavidDay = 5; ClarkDohrmann= 6; MichaelGee = 7; BobHeaphy = 8; MikeHeroux = 9; UlrichHetmaniuk = 10; RobHoekstra = 11; RussellHooper = 12; VickiHowle = 13; JonathanHu = 14; SarahKnepper = 15; TammyKolda = 16; JoeKotulski = 17; RichLehoucq = 18; KevinLong = 19; RogerPawlowski = 20; MichaelPhenow = 21; EricPhipps = 22; MarzioSala = 23; AndrewSalinger = 24; PaulSexton = 25; BillSpotz = 26; KenStanley = 27; HeidiThornquist = 28; RayTuminaro = 29; JimWillenbring = 30; AlanWilliams = 31; Package IDs PyTrilinos = 1; amesos = 2; anasazi = 3; aztecoo = 4; belos = 5; claps = 6; didasko = 7; epetra = 8; epetraext = 9; ifpack = 10; jpetra = 11; kokkos = 12; komplex = 13; meros = 14; loca = 15; ml = 16; new_package = 17; nox = 18; pliris = 19; rythmos = 20; teuchos = 21; thyra = 22; tpetra = 23; trilinosframework = 24;
17
8/23/2005 16 Developer-Package Edges A(i,j) = 1 if developer i contributes to package j A = sparse(31,24); A(RossBartlett,rythmos) = 1; A(RossBartlett,thyra) = 1; A(PaulBoggs,thyra) = 1; A(ToddCoffey,rythmos) = 1; A(JasonCross,jpetra) = 1; A(DavidDay,komplex) = 1; A(ClarkDohrmann,claps) = 1; A(MichaelGee,ml) = 1; A(MichaelGee,nox) = 1; A(BobHeaphy,trilinosframework) = 1; A(MikeHeroux,trilinosframework) = 1; A(MikeHeroux,epetra) = 1; A(MikeHeroux,aztecoo) = 1; A(MikeHeroux,kokkos) = 1; A(MikeHeroux,komplex) = 1; A(MikeHeroux,ifpack) = 1; A(MikeHeroux,thyra) = 1; A(MikeHeroux,tpetra) = 1; A(MikeHeroux,amesos) = 1; A(MikeHeroux,belos) = 1; A(MikeHeroux,epetraext) = 1; A(MikeHeroux,jpetra) = 1; A(UlrichHetmaniuk,anasazi) = 1; A(MarzioSala,didasko) = 1; A(MarzioSala,ifpack) = 1; A(MarzioSala,ml) = 1; A(MarzioSala,amesos) = 1; A(AndrewSalinger,loca) = 1; A(PaulSexton,epetra) = 1; A(PaulSexton,tpetra) = 1; A(BillSpotz,PyTrilinos) = 1; A(BillSpotz,epetra) = 1; A(BillSpotz,new_package) = 1; A(KenStanley,amesos) = 1; A(KenStanley,new_package) = 1; A(HeidiThornquist,anasazi) = 1; A(HeidiThornquist,belos) = 1; A(HeidiThornquist,teuchos) = 1; A(RayTuminaro,ml) = 1; A(RayTuminaro,meros) = 1; A(JimWillenbring,epetra) = 1; A(JimWillenbring,new_package) = 1; A(JimWillenbring,trilinosframework) = 1; A(AlanWilliams,epetra) = 1; A(AlanWilliams,epetraext) = 1; A(AlanWilliams,aztecoo) = 1; A(AlanWilliams,tpetra) = 1; A(RobHoekstra,epetra) = 1; A(RobHoekstra,thyra) = 1; A(RobHoekstra,tpetra) = 1; A(RobHoekstra,epetraext) = 1; A(RussellHooper,nox) = 1; A(VickiHowle,meros) = 1; A(VickiHowle,belos) = 1; A(VickiHowle,thyra) = 1; A(JonathanHu,ml) = 1; A(SarahKnepper,komplex) = 1; A(TammyKolda,nox) = 1; A(TammyKolda,trilinosframework) = 1; A(JoeKotulski,pliris) = 1; A(RichLehoucq,anasazi) = 1; A(RichLehoucq,belos) = 1; A(KevinLong,thyra) = 1; A(KevinLong,belos) = 1; A(KevinLong,teuchos) = 1; A(RogerPawlowski,nox) = 1; A(MichaelPhenow,trilinosframework) = 1; A(EricPhipps,loca) = 1; A(EricPhipps,nox) = 1;
18
8/23/2005 17 Developer-Package Matrix Number of developers per package: Maximum: max(sum(A)) = 6 Average:sum(sum(A))/24 = 2.875 Number of package affiliations per developer: Maximum: max(sum(A’)) = 12 (me). Minus outlier: = 4. Average: sum(sum(A’))/31 = 2.26 Observations: Small package teams. Multiple packages per developer. spy(A);
19
8/23/2005 18 Developer-Developer Matrix (A*A’) B = A*A’ Apply Symmetric AMD reordering. Two developers connected if co- authors of a package. Only two developers “disconnected”: Clark Dohrmann: CLAPS. Joe Kotulski: Pliris. Komplex Subteam ML Subteam Anasazi Subteam Epetra Subteam NOX Subteam Thyra Subteam
20
8/23/2005 19 Package-Package Matrix (A’*A) (Not sure what to say about this) B = A’*A, Apply symamd. Two packages connected if they share a developer. Only two packages “disconnected”: Clark Dohrmann: CLAPS. Joe Kotulski: Pliris. Me Epetra, Amesos, Didasko, Ifpack Without Me Meros, AztecOO, Tpetra, EpetraExt
21
8/23/2005 20 Why Packages? Partial Answer: Small Inter-related teams.
22
8/23/2005 21 Package Interoperability
23
8/23/2005 22 Interoperability vs. Dependence (“Can Use”) (“Depends On”) Although most Trilinos packages have no explicit dependence, each package must interact with some other packages: NOX needs operator, vector and solver objects. AztecOO needs preconditioner, matrix, operator and vector objects. Interoperability is enabled at configure time. For example, NOX: --enable-nox-lapack compile NOX lapack interface libraries --enable-nox-epetra compile NOX epetra interface libraries --enable-nox-petsc compile NOX petsc interface libraries Trilinos configure script is vehicle for: Establishing interoperability of Trilinos components… Without compromising individual package autonomy. Trilinos offers seven basic interoperability mechanisms.../configure –enable-python
24
8/23/2005 23 Trilinos Interoperability Mechanisms (Acquired as Package Matures) Package builds under Trilinos configure scripts. Package can be built as part of a suite of packages; cross-package interfaces enable/disable automatically Package accepts user data as Epetra or Thyra objects Applications using Epetra/Thyra can use package Package accepts parameters from Teuchos ParameterLists Applications using Teuchos ParameterLists can drive package Package can be used via Thyra abstract solver classes Applications or other packages using Thyra can use package Package can use Epetra for private data. Package can then use other packages that understand Epetra Package accesses solver services via Thyra interfaces Package can then use other packages that implement Thyra interfaces Package available via PyTrilinos Package can be used with other Trilinos packages via Python.
25
8/23/2005 24 “Can Use” vs. “Depends On” “Can Use” Interoperable without dependence. Dense is Good. Encouraged. “Depends On” OK, if essential. Epetra: 9 clients. Thyra, Teuchos, NOX: 2 clients. Discouraged.
26
8/23/2005 25 Interoperability Example: ML ML: Multi-level Preconditioner Package. Primary Developers: Ray Tuminaro, Jonathan Hu, Marzio Sala. No explicit, essential dependence on other Trilinos packages. Uses abstract interfaces to matrix/operator objects. Has independent configure/build process (but can be invoked at Trilinos level). Interoperable with other Trilinos packages and other libraries: Accepts user data as Epetra matrices/vectors. Can use Epetra for internal matrices/vectors. Can be used via Thyra abstract interfaces. Can be built via Trilinos configure/build process. Available as preconditioner to all other Trilinos packages. Can use IFPACK, Amesos, AztecOO objects as smoothers, coarse solvers. Can be driven via Teuchos ParameterLists. Available to PETSc users without dependence on any other Trilinos packages.
27
8/23/2005 26 Package Maturation Process Asynchronicity
28
8/23/2005 27 Day 1 of Package Life CVS: Each package is self-contained in Trilinos/package/ directory. Bugzilla: Each package has its own Bugzilla product. Bonsai: Each package is browsable via Bonsai interface. Mailman: Each Trilinos package, including Trilinos itself, has four mail lists: package-checkins@software.sandia.gov CVS commit emails. “Finger on the pulse” list. package-developers@software.sandia.gov Mailing list for developers. package-users@software.sandia.gov Issues for package users. package-announce@software.sandia.gov Releases and other announcements specific to the package. New_package (optional): Customizable boilerplate for Autoconf/Automake/Doxygen/Python/Thyra/Epetra/TestHarness/Website
29
8/23/2005 28 Sample Package Maturation Process StepExample Package added to CVS: Import existing code or start with new_package. ML CVS repository migrated into Trilinos (July 2002). Mail lists, Bugzilla Product, Bonsai database created. ml-announce, ml-users, ml-developers, ml-checkins, ml- regression @software.sandia.gov created, linked to CVS (July 2002). Package builds with configure/make, Trilinos- compatible ML adopts Autoconf, Automake starting from new_package (June 2003). Epetra objects recognized by package.ML accepts user data as Epetra matrices and vectors (October 2002). Package accessible via Thyra interfaces.ML adaptors written for TSFCore_LinOp (Thyra) interface (May 2003). Package uses Epetra for internal data.ML able to generate Epetra matrices. Allows use of AztecOO, Amesos, Ifpack, etc. as smoothers and coarse grid solvers (Feb- June 2004). Package parameters settable via Teuchos ParameterList ML gets manager class, driven via ParameterLists (June 2004). Package usable from Python (PyTrilinos)ML Python wrappers written using new_package template (April 2005). Startup StepsMaturation Steps
30
8/23/2005 29 Maturation Jumpstart: NewPackage NewPackage provides jump start to develop/integrate a new package NewPackage is a “Hello World” program and website: Simple but it does work with autotools. Compiles and builds. NewPackage directory contains: Commonly used directory structure: src, test, doc, example, config. Working Autoconf/Automake files. Documentation templates (doxygen). Working regression test setup. Working Python and Thyra adaptors. Substantially cuts down on: Time to integrate new package. Variation in package integration details. Development of website. NOTE: NewPackage can be used independent from Trilinos
31
8/23/2005 30 What Trilinos is not Trilinos is not a single monolithic piece of software. Each package: Can be built independent of Trilinos. Has its own self-contained CVS structure. Has its own Bugzilla product and mail lists. Development team is free to make its own decisions about algorithms, coding style, release contents, testing process, etc. Trilinos top layer is not a large amount of source code: Trilinos repository (6.0 branch) contains: 660,378 source lines of code (SLOC). Sum of the packages SLOC counts :648,993. Trilinos top layer SLOC count: 11,385 (1.7%). Trilinos is not “indivisible”: You don’t need all of Trilinos to get things done. Any collection of packages can be combined and distributed. Current public release contains only 16 of the 20+ Trilinos packages.
32
8/23/2005 31 Insight from History A Philosophy for Future Directions In the early 1800’s U.S. had many new territories. Question: How to incorporate into U.S.? Colonies? No. Expand boundaries of existing states? No. Create process for self-governing regions. Yes. Theme: Local control drawing on national resources. Trilinos package architecture has some similarities: Asynchronous maturation. Packages decide degree of interoperations, use of Trilinos facilities. Strength of each: Scalable growth with local control.
33
8/23/2005 32 Solver Collaboration: The Big Picture
34
8/23/2005 33 ANA Linear Operator Interface Solver Software Components and Interfaces 2) LAL : Linear Algebra Library (e.g. vectors, sparse matrices, sparse factorizations, preconditioners) ANA APP ANA/APP Interface ANA Vector Interface 1) ANA : Abstract Numerical Algorithm (e.g. linear solvers, eigen solvers, nonlinear solvers, stability analysis, uncertainty quantification, transient solvers, optimization etc.) 3) APP : Application (the model: physics, discretization method etc.) Example Trilinos Packages: Belos (linear solvers) Anasazi (eigen solvers) NOX (nonlinear equations) Rhythmos (ODEs,DAEs) MOOCHO (Optimization) … Example Trilinos Packages: Epetra/Tpetra (Mat,Vec) Ifpack, AztecOO, ML (Preconditioners) Meros (Preconditioners) Pliris (Interface to direct solvers) Amesos (Direct solvers) Komplex (Complex/Real forms) … Types of Software Components Thyra ANA Interfaces to Linear Algebra FEI/Thyra APP to LAL Interfaces Custom/Thyra LAL to LAL Interfaces Thyra::Nonlin Examples: SIERRA NEVADA Xyce Sundance … LAL Matrix Preconditioner Vector
35
8/23/2005 34 LAL Foundation: Petra Petra provides a “common language” for distributed linear algebra objects (operator, matrix, vector) Petra provides distributed matrix and vector services. Has 3 implementations under development.
36
8/23/2005 35 Perform redistribution of distributed objects: Parallel permutations. “Ghosting” of values for local computations. Collection of partial results from remote processors. Petra Object Model Abstract Interface to Parallel Machine Shameless mimic of MPI interface. Keeps MPI dependence to a single class (through all of Trilinos!). Allow trivial serial implementation. Opens door to novel parallel libraries (shmem, UPC, etc…) Abstract Interface for Sparse All-to-All Communication Supports construction of pre-recorded “plan” for data-driven communications. Examples: Supports gathering/scatter of off-processor x/y values when computing y = Ax. Gathering overlap rows for Overlapping Schwarz. Redistribution of matrices, vectors, etc… Describes layout of distributed objects: Vectors: Number of vector entries on each processor and global ID Matrices/graphs: Rows/Columns managed by a processor. Called “Maps” in Epetra. Dense Distributed Vector and Matrices: Simple local data structure. BLAS-able, LAPACK-able. Ghostable, redistributable. RTOp-able. Base Class for All Distributed Objects: Performs all communication. Requires Check, Pack, Unpack methods from derived class. Graph class for structure-only computations: Reusable matrix structure. Pattern-based preconditioners. Pattern-based load balancing tools. Basic sparse matrix class: Flexible construction process. Arbitrary entry placement on parallel machine.
37
8/23/2005 36 Petra Implementations Three version under development: Epetra (Essential Petra): Current production version. Restricted to real, double precision arithmetic. Uses stable core subset of C++ (circa 2000). Interfaces accessible to C and Fortran users. Tpetra (Templated Petra): Next generation C++ version. Templated scalar and ordinal fields. Uses namespaces, and STL: Improved usability/efficiency. Jpetra (Java Petra): Pure Java. Portable to any JVM. Interfaces to Java versions of MPI, LAPACK and BLAS via interfaces.
38
8/23/2005 37 Data Model Wrap-Up Our target apps require flexible data model. Petra Object Model (POM) supports: Arbitrary placement of vector, graph, matrix entries on parallel machine. Arbitrary redistribution, ghosting and collection of distributed data. This flexibility is needed by LALs: Algebraic and multi-level preconditioners. Concrete distributed matrix kernels. Direct methods. POM is complex: Non-LALs (ANAs) do not rely on it.
39
8/23/2005 38 Abstract Numerical Algorithms (ANAs)
40
8/23/2005 39 Categories of Abstract Problems and Abstract Algorithms · Linear Problems: · Linear equations: · Eigen problems: · Nonlinear Problems: · Nonlinear equations: · Stability analysis: · Transient Nonlinear Problems: · DAEs/ODEs: · Optimization Problems: · Unconstrained: · Constrained: Trilinos Packages Belos Anasazi NOX LOCA Rythmos MOOCHO
41
8/23/2005 40 Introducing Abstract Numerical Algorithms An ANA is a numerical algorithm that can be expressed abstractly solely in terms of vectors, vector spaces, and linear operators (i.e. not with direct element access) Example Linear ANA (LANA) : Linear Conjugate Gradients scalar product defined by vector space vector-vector operations linear operator applications Scalar operations Types of operationsTypes of objects What is an abstract numerical algorithm (ANA)? Linear Conjugate Gradient Algorithm
42
8/23/2005 41 Examples of Nonlinear Abstract Numerical Algorithms Class of Numerical ProblemExample ANA Nonlinear equations Newton’s method (e.g. NOX, MOOCHO) Initial Value DAE/ODE Backward Euler method (e.g. Rythmos?) Linear equations GMRES (e.g. Belos) Requires
43
8/23/2005 42 Basic Thyra Vector and Operator Interfaces LinearOps apply to Vectors An operator knows its domain and range spaces A linear operator is a kind of operator A Vector knows its VectorSpace > VectorSpaces create Vectors!
44
8/23/2005 43 Basic Thyra Vector and Operator Interfaces > Basically compatible with HCL (Symes et al.) and many other operator/vector interfaces Near optimal for many but not all abstract numerical algorithms (ANAs) What’s missing? => Multi-vectors!
45
8/23/2005 44 Introducing Multi-Vectors What is a multi-vector? An m multi-vector V is a tall thin dense matrix composed of m column vectors v j Example: m = 4 columns V == What ANAs can exploit multi-vectors? Block linear solvers (e.g. block GMRES): Block eigen solvers (i.e. block Arnoldi): Compact limited memory quasi-Newton Tensor methods for nonlinear equations Why are multi-vectors important? Cache performance Reduce global communication Examples of multi-vector operations = Y = A X Q = X T Y Y = Y + X Q Block dot products (m 2 ) Block update Operator applications (i.e. mat-vecs) =+ =
46
8/23/2005 45 Basic Thyra Vector and Operator Interfaces > Where do multi-vectors fit in?
47
8/23/2005 46 Basic Thyra Vector and Operator Interfaces A MulitVector is a linear operator A MulitVector has a collection of column vectors A MulitVector is a tall thin dense matrix > VectorSpaces create MultiVectors LinearOps apply to MultiVectors A Vector is a MultiVector
48
8/23/2005 47 Basic Thyra Vector and Operator Interfaces What about standard vector ops? Reductions (norm, dot etc.)? Transformations (axpy, scaling etc.)? What about specialized vector ops? e.g. Interior point methods for opt >
49
8/23/2005 48 What’s the Big Deal about Vector-Vector Operations? Examples from OOQP (Gertz, Wright) Example from TRICE (Dennis, Heinkenschloss, Vicente) Example from IPOPT (Waechter) Currently in MOOCHO : > 40 vector operations! Many different and unusual vector operations are needed by interior point methods for optimization!
50
8/23/2005 49 Vector Reduction/Transformation Operators Defined Reduction/Transformation Operators (RTOp) Defined z 1 i … z q i op t ( i, v 1 i … v p i, z 1 i … z q i ) element-wise transformation op r ( i, v 1 i … v p i, z 1 i … z q i ) element-wise reduction 2 op rr ( 1, 2 ) reduction of intermediate reduction objects v 1 … v p R n : p non-mutable (constant) input vectors z 1 … z q R n : q mutable (non-constant) input/output vectors : reduction target object (many be non-scalar (e.g. {y k,k}), or NULL) Key to Optimal Performance op t (…) and op r (…) applied to entire sets of subvectors (i = a…b) independently: z 1 a:b … z q a:b, op( a, b, v 1 a:b … v p a:b, z 1 a:b … z q a:b, ) Communication between sets of subvectors only for NULL, op rr ( 1, 2 ) 2 RTOpT apply_op(in sv[1..p], inout sz[1…q], inout beta ) ROpDotProd apply_op(… ) TOpAssignVectors apply_op(… ) … Vector applyOp( in : RTOpT, apply_op(in v[1..p], inout z[1…q], inout beta) SerialVectorStd applyOp( … ) MPIVectorStd applyOp( … ) Software Implementation (see RTOp paper on Trilinos website “Publications”) “Visitor” design pattern (a.k.a. function forwarding) …
51
8/23/2005 50 Basic Thyra Vector and Operator Interfaces The missing piece: Reduction/Transformation Operators Supports all needed vector operations Data/parallel independence Optimal performance R. A. Bartlett, B. G. van Bloemen Waanders and M. A. Heroux. Vector Reduction/Transformation Operators, ACM TOMS, March 2004
52
8/23/2005 51 Managing Parallel Data The Petra Object Model
53
8/23/2005 52 A Simple Epetra/AztecOO Program // Header files omitted… int main(int argc, char *argv[]) { MPI_Init(&argc,&argv); // Initialize MPI, MpiComm Epetra_MpiComm Comm( MPI_COMM_WORLD ); // ***** Create x and b vectors ***** Epetra_Vector x(Map); Epetra_Vector b(Map); b.Random(); // Fill RHS with random #s // ***** Create an Epetra_Matrix tridiag(-1,2,-1) ***** Epetra_CrsMatrix A(Copy, Map, 3); double negOne = -1.0; double posTwo = 2.0; for (int i=0; i<NumMyElements; i++) { int GlobalRow = A.GRID(i); int RowLess1 = GlobalRow - 1; int RowPlus1 = GlobalRow + 1; if (RowLess1!=-1) A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowLess1); if (RowPlus1!=NumGlobalElements) A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowPlus1); A.InsertGlobalValues(GlobalRow, 1, &posTwo, &GlobalRow); } A.FillComplete(); // Transform from GIDs to LIDs // ***** Map puts same number of equations on each pe ***** int NumMyElements = 1000 ; Epetra_Map Map(-1, NumMyElements, 0, Comm); int NumGlobalElements = Map.NumGlobalElements(); // ***** Report results, finish *********************** cout << "Solver performed " << solver.NumIters() << " iterations." << endl << "Norm of true residual = " << solver.TrueResidual() << endl; MPI_Finalize() ; return 0; } // ***** Create/define AztecOO instance, solve ***** AztecOO solver(problem); solver.SetAztecOption(AZ_precond, AZ_Jacobi); solver.Iterate(1000, 1.0E-8); // ***** Create Linear Problem ***** Epetra_LinearProblem problem(&A, &x, &b);
54
8/23/2005 53 Typical Flow of Epetra Object Construction Construct Comm Construct Map Construct x Construct b Construct A Any number of Comm objects can exist. Comms can be nested (ee.g., serial within MPI). Maps describe parallel layout. Maps typically associated with more than one comp object. Two maps (source and target) define an export/import object. Computational objects. Compatibility assured via common map.
55
8/23/2005 54 Parallel Data Redistribution Epetra vectors, multivectors, graphs and matrices are distributed via one of the map objects. A map is basically a partitioning of a list of global IDs: IDs are simply labels, no need to use contiguous values (Directory class handles details for general ID lists). No a priori restriction on replicated IDs. If we are given: A source map and A set of vectors, multivectors, graphs and matrices (or other distributable objects) based on source map. Redistribution is performed by: 1.Specifying a target map with a new distribution of the global IDs. 2.Creating Import or Export object using the source and target maps. 3.Creating vectors, multivectors, graphs and matrices that are redistributed (to target map layout) using the Import/Export object.
56
8/23/2005 55 Example: epetra/ex9.cpp int main(int argc, char *argv[]) { MPI_Init(&argc, &argv); Epetra_MpiComm Comm(MPI_COMM_WORLD); int NumGlobalElements = 4; // global dimension of the problem int NumMyElements; // local nodes Epetra_IntSerialDenseVector MyGlobalElements; if( Comm.MyPID() == 0 ) { NumMyElements = 3; MyGlobalElements.Size(NumMyElements); MyGlobalElements[0] = 0; MyGlobalElements[1] = 1; MyGlobalElements[2] = 2; } else { NumMyElements = 3; MyGlobalElements.Size(NumMyElements); MyGlobalElements[0] = 1; MyGlobalElements[1] = 2; MyGlobalElements[2] = 3; } // create a map Epetra_Map Map(-1,MyGlobalElements.Length(), MyGlobalElements.Values(),0, Comm); // create a vector based on map Epetra_Vector xxx(Map); for( int i=0 ; i<NumMyElements ; ++i ) xxx[i] = 10*( Comm.MyPID()+1 ); if( Comm.MyPID() == 0 ){ double val = 12; int pos = 3; xxx.SumIntoGlobalValues(1,0,&val,&pos); } cout << xxx; // create a target map, in which all elements are on proc 0 int NumMyElements_target; if( Comm.MyPID() == 0 ) NumMyElements_target = NumGlobalElements; else NumMyElements_target = 0; Epetra_Map TargetMap(-1,NumMyElements_target,0,Comm); Epetra_Export Exporter(Map,TargetMap); // work on vectors Epetra_Vector yyy(TargetMap); yyy.Export(xxx,Exporter,Add); cout << yyy; MPI_Finalize(); return( EXIT_SUCCESS ); }
57
8/23/2005 56 Output: epetra/ex9.cpp > mpirun -np 2./ex9.exe Epetra::Vector MyPID GID Value 0 0 10 0 1 10 0 2 10 Epetra::Vector 1 1 20 1 2 20 1 3 20 Epetra::Vector MyPID GID Value 0 0 10 0 1 30 0 2 30 0 3 20 Epetra::Vector PE 0 xxx(0)=10 xxx(1)=10 xxx(2)=10 PE 1 xxx(1)=20 xxx(2)=20 xxx(3)=20 PE 0 yyy(0)=10 yyy(1)=30 yyy(2)=30 yyy(3)=20 PE 1 Export/Add Before ExportAfter Export
58
8/23/2005 57 Import vs. Export Import (Export) means calling processor knows what it wants to receive (send). Distinction between Import/Export is important to user, almost identical in implementation. Import (Export) objects can be used to do an Export (Import) as a reverse operation. When mapping is bijective (1-to-1 and onto), either Import or Export is appropriate.
59
8/23/2005 58 Example: 1D Matrix Assembly a b -u xx = f u(a) = 0 u(b) = 1 x1x1 x2x2 x3x3 PE 0PE 1 3 Equations: Find u at x 1, x 2 and x 3 Equation for u at x 2 gets a contribution from PE 0 and PE 1. Would like to compute partial contributions independently. Then combine partial results.
60
8/23/2005 59 Two Maps We need two maps: Assembly map: PE 0: { 1, 2 }. PE 1: { 2, 3 }. Solver map: PE 0: { 1, 2 } (we arbitrate ownership of 2). PE 1: { 3 }.
61
8/23/2005 60 End of Assembly Phase At the end of assembly phase we have AssemblyMatrix: On PE 0: On PE 1: Want to assign all of Equation 2 to PE 0 for use with solver. NOTE: For a class of Neumann-Neumann preconditioners, the above layout is exactly what we want. Equation 1: Equation 2: Equation 2: Equation 3: Row 2 is shared
62
8/23/2005 61 Export Assembly Matrix to Solver Matrix Epetra_Export Exporter(AssemblyMap, SolverMap); Epetra_CrsMatrix SolverMatrix (Copy, SolverMap, 0); SolverMatrix.Export(AssemblyMatrix, Exporter, Add); SolverMatrix.FillComplete();
63
8/23/2005 62 Matrix Export Equation 1: Equation 2: Equation 3: Equation 1: Equation 2: Equation 2: Equation 3: PE 0 PE 1 Before Export After Export Export/Add
64
8/23/2005 63 Example: epetraext/ex2.cpp int main(int argc, char *argv[]) { MPI_Init(&argc,&argv); Epetra_MpiComm Comm (MPI_COMM_WORLD); int MyPID = Comm.MyPID(); int n=4; // Generate Laplacian2d gallery matrix Trilinos_Util::CrsMatrixGallery G("laplace_2d", Comm); G.Set("problem_size", n*n); G.Set("map_type", "linear"); // Linear map initially // Get the LinearProblem. Epetra_LinearProblem *Prob = G.GetLinearProblem(); // Get the exact solution. Epetra_MultiVector *sol = G.GetExactSolution(); // Get the rhs (b) and lhs (x) Epetra_MultiVector *b = Prob->GetRHS(); Epetra_MultiVector *x = Prob->GetLHS(); // Repartition graph using Zoltan EpetraExt::Zoltan_CrsGraph * ZoltanTrans = new EpetraExt::Zoltan_CrsGraph(); EpetraExt::LinearProblem_GraphTrans * ZoltanLPTrans = new EpetraExt::LinearProblem_GraphTrans( *(dynamic_cast *>(ZoltanTrans)) ); cout << "Creating Load Balanced Linear Problem\n"; Epetra_LinearProblem &BalancedProb = (*ZoltanLPTrans)(*Prob); // Get the rhs (b) and lhs (x) Epetra_MultiVector *Balancedb = Prob->GetRHS(); Epetra_MultiVector *Balancedx = Prob->GetLHS(); cout << "Balanced b: " << *Balancedb << endl; cout << "Balanced x: " << *Balancedx << endl; MPI_Finalize() ; return 0 ; }
65
8/23/2005 64 Need for Import/Export Solvers for complex engineering applications need expressive, easy-to-use parallel data redistribution: Allows better scaling for non-uniform overlapping Schwarz. Necessary for robust solution of multiphysics problems. We have found import and export facilities to be a very natural and powerful technique to address these issues.
66
8/23/2005 65 Details about Epetra Maps Getting beyond standard use case…
67
8/23/2005 66 1-to-1 Maps 1-to-1 map (defn): A map is 1-to-1 if each GID appears only once in the map (and is therefore associated with only a single processor). Certain operations in parallel data repartitioning require 1- to-1 maps. Specifically: The source map of an import must be 1-to-1. The target map of an export must be 1-to-1. The domain map of a 2D object must be 1-to-1. The range map of a 2D object must be 1-to-1.
68
8/23/2005 67 2D Objects: Four Maps Epetra 2D objects: CrsMatrix CrsGraph VbrMatrix Have four maps: RowMap: On each processor, the GIDs of the rows that processor will “manage”. ColMap: On each processor, the GIDs of the columns that processor will “manage”. DomainMap: The layout of domain objects (the x vector/multivector in y=Ax). RangeMap: The layout of range objects (the y vector/multivector in y=Ax). Must be 1-to-1 maps!!! Typically a 1-to-1 map Typically NOT a 1-to-1 map
69
8/23/2005 68 Sample Problem = yA x
70
8/23/2005 69 Case 1: Standard Approach RowMap = {0, 1} ColMap = {0, 1, 2} DomainMap = {0, 1} RangeMap = {0, 1} First 2 rows of A, elements of y and elements of x, kept on PE 0. Last row of A, element of y and element of x, kept on PE 1. PE 0 ContentsPE 1 Contents RowMap = {2} ColMap = {1, 2} DomainMap = {2} RangeMap = {2} Notes: Rows are wholly owned. RowMap=DomainMap=RangeMap (all 1-to-1). ColMap is NOT 1-to-1. Call to FillComplete: A.FillComplete(); // Assumes = yAx Original Problem
71
8/23/2005 70 Case 2: Twist 1 RowMap = {0, 1} ColMap = {0, 1, 2} DomainMap = {1, 2} RangeMap = {0} First 2 rows of A, first element of y and last 2 elements of x, kept on PE 0. Last row of A, last 2 element of y and first element of x, kept on PE 1. PE 0 ContentsPE 1 Contents RowMap = {2} ColMap = {1, 2} DomainMap = {0} RangeMap = {1, 2} Notes: Rows are wholly owned. RowMap is NOT = DomainMap is NOT = RangeMap (all 1-to-1). ColMap is NOT 1-to-1. Call to FillComplete: A.FillComplete(DomainMap, RangeMap); = yAx Original Problem
72
8/23/2005 71 Case 2: Twist 2 RowMap = {0, 1} ColMap = {0, 1} DomainMap = {1, 2} RangeMap = {0} First row of A, part of second row of A, first element of y and last 2 elements of x, kept on PE 0. Last row, part of second row of A, last 2 element of y and first element of x, kept on PE 1. PE 0 ContentsPE 1 Contents RowMap = {1, 2} ColMap = {1, 2} DomainMap = {0} RangeMap = {1, 2} Notes: Rows are NOT wholly owned. RowMap is NOT = DomainMap is NOT = RangeMap (all 1-to-1). RowMap and ColMap are NOT 1-to-1. Call to FillComplete: A.FillComplete(DomainMap, RangeMap); = yAx Original Problem
73
8/23/2005 72 Overview of Trilinos Packages
74
8/23/2005 73 1 Petra is Greek for “foundation”. Trilinos Common Language: Petra Petra provides a “common language” for distributed linear algebra objects (operator, matrix, vector) Petra 1 provides distributed matrix and vector services. Exists in basic form as an object model: Describes basic user and support classes in UML, independent of language/implementation. Describes objects and relationships to build and use matrices, vectors and graphs. Has 3 implementations under development.
75
8/23/2005 74 Petra Implementations Three version under development: Epetra (Essential Petra): Current production version. Restricted to real, double precision arithmetic. Uses stable core subset of C++ (circa 2000). Interfaces accessible to C and Fortran users. Tpetra (Templated Petra): Next generation C++ version. Templated scalar and ordinal fields. Uses namespaces, and STL: Improved usability/efficiency. Jpetra (Java Petra): Pure Java. Portable to any JVM. Interfaces to Java versions of MPI, LAPACK and BLAS via interfaces. Developers: Mike Heroux, Rob Hoekstra, Alan Williams, Paul Sexton
76
8/23/2005 75 Epetra Basic Stuff: What you would expect. Variable block matrix data structures. Multivectors. Arbitrary index labeling. Flexible, versatile parallel data redistribution. Language support for inheritance, polymorphism and extensions. View vs. Copy.
77
8/23/2005 76 AztecOO Krylov subspace solvers: CG, GMRES, Bi-CGSTAB,… Incomplete factorization preconditioners Aztec is the workhorse solver at Sandia: Extracted from the MPSalsa reacting flow code. Installed in dozens of Sandia apps. 1900+ external licenses. AztecOO improves on Aztec by: Using Epetra objects for defining matrix and RHS. Providing more preconditioners/scalings. Using C++ class design to enable more sophisticated use. AztecOO interfaces allows: Continued use of Aztec for functionality. Introduction of new solver capabilities outside of Aztec. Developers: Mike Heroux, Alan Williams, Ray Tuminaro
78
8/23/2005 77 IFPACK: Algebraic Preconditioners Overlapping Schwarz preconditioners with incomplete factorizations, block relaxations, block direct solves. Accept user matrix via abstract matrix interface (Epetra versions). Uses Epetra for basic matrix/vector calculations. Supports simple perturbation stabilizations and condition estimation. Separates graph construction from factorization, improves performance substantially. Compatible with AztecOO, ML, Amesos. Can be used by NOX and ML. Developers: Marzio Sala, Mike Heroux
79
8/23/2005 78 Amesos Interface to direct solver for distributed sparse linear systems Challenges: No single solver dominates Different interfaces and data formats, serial and parallel Interface often change between revisions Amesos offers: A single, clear, consistent interface, to various packages Common look-and-feel for all classes Separation from specific solver details Use serial and distributed solvers; Amesos takes care of data redistribution Developers: Ken Stanley, Marzio Sala, Tim Davis
80
8/23/2005 79 Epetra_LinearProblem Problem( A, x, b); Amesos Afact; Amesos_BaseSolver* Solver = Afact.Create( “Klu”, Problem ) ; Solver->SymbolicFactorization( ) ; Solver->NumericFactorization( ) ; Solver->Solve( ) ; Calling Amesos
81
8/23/2005 80 Amesos: Supported Libraries Library NameLanguagecommunicator # procs for solution 1 KLUCSerial1 UMFPACKCSerial1 SuperLU 3.0CSerial1 SuperLU_DIST 2.0CMPIany MUMPS 4.3.1F90MPIany ScaLAPACKF77MPIany 6.0 Version: DSCPACK, PARDISO, TAUCS, WSMP 1 Matrices can be distributed over any number of processes
82
8/23/2005 81 ML: Multi-level Preconditioners Smoothed aggregation, multigrid and domain decomposition preconditioning package Critical technology for scalable performance of some key apps. ML compatible with other Trilinos packages: Accepts user data as Epetra_RowMatrix object (abstract interface). Any implementation of Epetra_RowMatrix works. Implements the Epetra_Operator interface. Allows ML preconditioners to be used with AztecOO and TSF. Can also be used completely independent of other Trilinos packages. Developers: Ray Tuminaro, Jonathan Hu, Marzio Sala
83
8/23/2005 82 NOX: Nonlinear Solvers Suite of nonlinear solution methods NOX uses abstract vector and “group” interfaces: Allows flexible selection and tuning of various strategies: Directions. Line searches. Epetra/AztecOO/ML, LAPACK, PETSc implementations of abstract vector/group interfaces. Designed to be easily integrated into existing applications. Developers: Tammy Kolda, Roger Pawlowski
84
8/23/2005 83 LOCA Library of continuation algorithms Provides Zero order continuation First order continuation Arc length continuation Multi-parameter continuation (via Henderson's MF Library) Turning point continuation Pitchfork bifurcation continuation Hopf bifurcation continuation Phase transition continuation Eigenvalue approximation (via ARPACK or Anasazi) Developers: Andy Salinger, Eric Phipps
85
8/23/2005 84 EpetraExt: Extensions to Epetra Library of useful classes not needed by everyone Most classes are types of “transforms”. Examples: Graph/matrix view extraction. Epetra/Zoltan interface. Explicit sparse transpose. Singleton removal filter, static condensation filter. Overlapped graph constructor, graph colorings. Permutations. Sparse matrix-matrix multiply. Matlab, MatrixMarket I/O functions. … Most classes are small, useful, but non-trivial to write. Developer: Robert Hoekstra, Alan Williams, Mike Heroux
86
8/23/2005 85 Teuchos Utility package of commonly useful tools: ParameterList class: key/value pair database, recursive capabilities. LAPACK, BLAS wrappers (templated on ordinal and scalar type). Dense matrix and vector classes (compatible with BLAS/LAPACK). FLOP counters, Timers. Ordinal, Scalar Traits support: Definition of ‘zero’, ‘one’, etc. Reference counted pointers, and more… Takes advantage of advanced features of C++: Templates Standard Template Library (STL) ParameterList: Allows easy control of solver parameters. Developers: Roscoe Barlett, Kevin Long, Heidi Thorquist, Mike Heroux, Paul Sexton, Kris Kampshoff
87
8/23/2005 86 New Packages: Belos and Anasazi Next generation linear solvers (Belos) and eigensolvers (Anasazi) libraries, written in templated C++. Provide a generic interface to a collection of algorithms for solving large-scale linear problems and eigenproblems. Algorithm implementation is accomplished through the use of abstract base classes (mini interface). Interfaces are derived from these base classes to operator-vector products, status tests, and any arbitrary linear algebra library. Includes block linear solvers and eigensolvers. Developers: Heidi Thorquist, Mike Heroux, Chris Baker, Rich Lehoucq
88
8/23/2005 87 Kokkos Goal: Isolate key non-BLAS kernels for the purposes of optimization. Kernels: Dense vector/multivector updates and collective ops (not in BLAS/Teuchos). Sparse MV, MM, SV, SM. Serial-only for now. Reference implementation provided (templated). Mechanism for improving performance: Default is aggressive compilation of reference source. BeBOP: Jim Demmel, Kathy Yelick, Rich Vuduc, UC Berkeley. Vector version: Cray. Developer: Mike Heroux
89
8/23/2005 88 NewPackage Package NewPackage provides jump start to develop/integrate a new package NewPackage is a “Hello World” program and website: Simple but it does work with autotools. Compiles and builds. NewPackage directory contains: Commonly used directory structure: src, test, doc, example, config. Working autotools files. Documentation templates (doxygen). Working regression test setup. Substantially cuts down on: Time to integrate new package. Variation in package integration details. Development of website.
90
8/23/2005 89 Two Recent Package Additions
91
8/23/2005 90 CLAPS Package Clark Dohrmann: Expert in computational structural mechanics. Unfamiliar with parallel computing. Limited C++ experience. Epetra: Advanced parallel basic linear algebra package. Supports sparse, dense linear algebra and parallel data redistribution. Clark+Epetra: CLOP: Overlapping Schwarz preconditioner. Scales with increasing number of constraints. sleg[1,2,4]x joint models, 12 modes, vplant cluster (previously unsolvable). # DOF#Constraints# procsTime 33,1454,956 411m 36s 227,17018,366 2429m 22s 1,674,50270,566 14429m 22s 3 Months to develop. Depends only on Epetra. Compatible with rest of Trilinos. CLAPS=Clark’s Linear Algebra Suite
92
8/23/2005 91 Pliris Package Migration of Linpack benchmark code to supported environment. Used new_package as starting point. Put under source control for the first time. Portable build process for the first time. Documented distributed data model (Epetra). Has its own Bugzilla product, mail lists, regression tests. Source code browsable via Bonsai. And more… But Pliris is still an independent piece of code…and better off after the process.
93
8/23/2005 92 Interface Packages Thyra and PyTrilinos
94
8/23/2005 93 template bool sillyCgSolve( const Thyra::LinearOpBase &A,const Thyra::VectorBase &b,const int maxNumIters,const typename Teuchos::ScalarTraits ::magnitudeType tolerance,Thyra::VectorBase *x,std::ostream *out = NULL ) { using Teuchos::RefCountPtr; typedef Teuchos::ScalarTraits ST; // We need to use ScalarTraits to support arbitrary types. typedef typename ST::magnitudeType ScalarMag; // This is the type returned from a vector norm. typedef one ST::one()); // Get the vector space (domain and range spaces should be the same) RefCountPtr > space = A.domain(); // Compute initial residual : r = b - A*x RefCountPtr > r = createMember(space); Thyra::assign(&*r,b); // r = b Thyra::apply(A,Thyra::NOTRANS,*x,&* one, one); // r = -A*x + r const ScalarMag r0_nrm = Thyra::norm(*r); // Compute ||r0|| = sqrt( ) for convergence test if(r0_nrm == ST::zero()) return true; // Trivial RHS and initial LHS guess? // Create workspace vectors and scalars RefCountPtr > p = createMember(space), q = createMember(space); Scalar rho_old; // Perform the iterations for( int iter = 0; iter <= maxNumIters; ++iter ) { // Check convergence and output iteration const ScalarMag r_nrm = Thyra::norm(*r); // Compute ||r|| = sqrt( ) if( r_nrm/r0_nrm < tolerance ) return true; // Converged to tolerance, Success! // Compute iteration const Scalar rho = space->scalarProd(*r,*r); // -> rho if(iter==0) Thyra::assign(&*p,*r); // r -> p (iter == 0) else Thyra::Vp_V( &*p, *r, Scalar(rho/rho_old) ); // r+(rho/rho_old)*p -> p (iter > 0) Thyra::apply(A,Thyra::NOTRANS,*p,&*q); // A*p-> q const Scalar alpha = rho/space->scalarProd(*p,*q);// rho/ -> alpha Thyra::Vp_StV( x, Scalar(+alpha), *p ); // +alpha*p + x -> x Thyra::Vp_StV( &*r, Scalar(-alpha), *q ); // -alpha*q + r -> r rho_old = rho; // rho-> rho_old (remember rho for next iter) } return false; // Failure } // end sillyCgSolve Thyra: sillyCGSolve Generic CG code. To use, provide: Thyra::LinearOp. Thyra::Vector. Works on any parallel machine. Allows mixing of vectors matrices: PETSc & Epetra. Works with any scalar datatype.
95
8/23/2005 94 PyTrilinos PyTrilinos provides Python access to Trilinos packages. Uses SWIG to generate bindings. Epetra, AztecOO, IFPACK, ML, NOX, LOCA, Amesos and NewPackage are support. Possible to: Define RowMatrix implementation in Python. Use from Trilinos C++ code. Performance for large grain is equivalent to C++. Several times hit for very fine grain code.
96
8/23/2005 95 #include "mpi.h" #include "Epetra_MpiComm.h" #include "Epetra_Vector.h" #include "Epetra_Time.h" #include "Epetra_RowMatrix.h" #include "Epetra_CrsMatrix.h" #include "Epetra_Time.h" #include "Epetra_LinearProblem.h" #include "Trilinos_Util_CrsMatrixGallery.h" using namespace Trilinos_Util; int main(int argc, char *argv[]) { MPI_Init(&argc, &argv); Epetra_MpiComm Comm(MPI_COMM_WORLD); int nx = 1000; int ny = 1000 * Comm.NumProc(); CrsMatrixGallery Gallery("laplace_2d", Comm); Gallery.Set("ny", ny); Gallery.Set("nx", nx); Gallery.Set("problem_size",nx*ny); Gallery.Set("map_type", "linear"); Epetra_LinearProblem* Problem = Gallery.GetLinearProblem(); assert (Problem != 0); // retrieve pointers to solution (lhs), right-hand side (rhs) // and matrix itself (A) Epetra_MultiVector* lhs = Problem->GetLHS(); Epetra_MultiVector* rhs = Problem->GetRHS(); Epetra_RowMatrix* A = Problem->GetMatrix(); Epetra_Time Time(Comm); for (int i = 0 ; i < 10 ; ++i) A->Multiply(false, *lhs, *rhs); cout << Time.ElapsedTime() << endl; MPI_Finalize(); return(EXIT_SUCCESS); } // end of main() #! /usr/bin/env python from PyTrilinos import Epetra, Triutils Epetra.Init() Comm = Epetra.PyComm() nx = 1000 ny = 1000 * Comm.NumProc() Gallery = Triutils.CrsMatrixGallery("laplace_2d", Comm) Gallery.Set("nx", nx) Gallery.Set("ny", ny) Gallery.Set("problem_size", nx * ny) Gallery.Set("map_type", "linear") Matrix = Gallery.GetMatrix() LHS = Gallery.GetStartingSolution() RHS = Gallery.GetRHS() Time = Epetra.Time(Comm) for i in xrange(10): Matrix.Multiply(False, LHS, RHS) print Time.ElapsedTime() Epetra.Finalize() C++ vs. Python: Equivalent Code
97
8/23/2005 96 Templates and Generic Programming
98
8/23/2005 97 Investment in Generic Programming 2 nd Generation Trilinos packages are templated on: OrdinalType (think int). ScalarType (think double). Examples: Teuchos::SerialDenseMatrix A; Teuchos::SerialDenseMatrix B; Scalar/Ordinal Traits mechanism completes support for genericity. The following packages support templates: Teuchos (Basic Tools) Thyra (Abstract Interfaces) Tpetra (including MPI support) Belos (Krylov and Block Krylov Linear), IFPACK (algebraic preconditioners, next version), Anasazi (Eigensolvers),
99
8/23/2005 98 Potential Benefits of Templated Types Templated scalar (and ordinal) types have great potential: Generic: Algorithms expressed over abstract field. High Performance: Partial template specialization + Fortran. Facilitate variety of algorithmic studies. Allow study of asymptotic behavior of discretizations. Facilitate debugging: Reduces FP error as source error. Use your imagination… All new Trilinos packages are templated.
100
8/23/2005 99 Kokkos Results: Sparse MV Pentium M 1.6GHz Cygwin/GCC 3.2 (WinXP Laptop) Data SetDimensionNonzeros# RHSMFLOPS DIE3D987317333711247.62 2418.94 3577.79 4691.62 5787.90 dday01211809230061230.75 2407.96 3553.80 4668.24 5738.40 FIDAP035197162183081171.22 2349.29 3374.24 4498.99 5545.77
101
8/23/2005 100 ARPREC The ARPREC library uses arrays of 64-bit floating-point numbers to represent high-precision floating-point numbers. ARPREC values behave just like any other floating-point datatype, except the maximum working precision (in decimal digits) must be specified before any calculations are done mp::mp_init(200); Illustrate the use of ARPREC with an example using Hilbert matrices.
102
8/23/2005 101 Hilbert Matrices A Hilbert matrix H N is a square N-by-N matrix such that: For Example:
103
8/23/2005 102 Hilbert Matrices Notoriously ill-conditioned κ(H 3 ) 524 κ(H 5 ) 476610 κ(H 10 ) 1.6025 x 10 13 κ(H 20 ) 7.8413 x 10 17 κ(H 100 ) 1.7232 x 10 20 Hilbert matrices introduce large amounts of error
104
8/23/2005 103 Hilbert Matrices and Cholesky Factorization With double-precision arithmetic, Cholesky factorization will fail for H N for all N > 13. Can we improve on this using arbitrary-precision floating-point numbers? PrecisionLargest N for which Cholesky Factorization is successful Single Precision8 Double Precision13 Arbitrary Precision (20)29 Arbitrary Precision (40)40 Arbitrary Precision (200)145 Arbitrary Precision (400)233
105
8/23/2005 104 Software Quality
106
8/23/2005 105 SQA/SQE Software Quality Assurance/Engineering is important. No longer sufficient to say “We do a good job”. Trilinos facilitates SQA/SQE development/processes for packages: 32 of 47 ASCI SQE practices are directly handled by Trilinos (no requirements on packages). Trilinos provides significant support for the remaining 15. Trilinos Dev Guide Part II: Specific to ASCI requirements. Trilinos software engineering policies provide a ready-made infrastructure for new packages. Trilinos philosophy: Few requirements. Instead mostly suggested practices. Provides package with option to provide alternate process.
107
8/23/2005 106 Trilinos ServiceSQE Practices Impact Yearly Trilinos User Group Meeting (TUG) and Developer Forum: Once a year gathering for tutorials, package feature updates, user/developer requirements discussion and developer training. — All Requirements steps: gathering, derivation, documentation, feasibility,etc. — User and Developer training. Monthly Trilinos leaders meetings: Trilinos leaders, including package development leaders, key managers, funding sources and other stakeholders participate in monthly phone meetings to discuss any timely issues related to the Trilinos Project. —Developer Training. —Design reviews. —Policy decisions across all development phases. Trilinos and package mail lists: Trilinos lists for leaders, announcements, developers, users, checkins and similar lists at the package level support a variety of communication. All lists are archived, providing critical artifacts for assessments and audits. —Developer/user/client communication. —Requirements/design/testing artifacts. —Announcement/documenting of releases. Trilinos and Trilinos3PL source repositories: All source code, development and user documentation is retained and tracked. In addition, reference versions of all external software, including BLAS, LAPACK, Umfpack, etc. are retained in Trilinos3PL. —Source management. —Versioning. —Third-party software management. Bugzilla Products: Each package has its own Bugzilla Product with standard components. —Requirements/faults capturing and tracking. Trilinos configure script and M4 macros: The Trilinos configure script and related macros support portable installation of Trilinos and its packages —Portability. —Software release. Trilinos test harness: Trilinos provides a base testing plan and automated testing across multiple platforms, plus creation of testing artifacts. Test harness results are used to derive a variety of metrics for SQE. —Pre-checkin and regression testing. —Software metrics.
108
8/23/2005 107 The Trilinos Tutorial Package Trilinos is large (and still growing…) More than 600K code lines Many packages a lot of functionalities… … and also a lot to learn What is the best way for a new user to learn Trilinos? Using a Trilinos package of course: Didasko!
109
8/23/2005 108 What DIDASKO is not DIDASKO, as a tutorial, cannot cover the most advanced features of Trilinos: Only the “stable” features are covered DIDASKO is not a substitute of each package’s documentation and examples, which remain of fundamental importance
110
8/23/2005 109 Covered Packages At present, we cover EpetraTriutilsIFPACK TeuchosAztecOOML NOXAmesosEpetraExt Anasazi
111
8/23/2005 110 Trilinos Availability/Information Trilinos and related packages are available via LGPL. Current release (5.0) is “click release”. Unlimited availability. 650+ Downloads of Release 5.0 since 3/05 (not including internal Sandia users). 330 registered users: 57% university, 11% industry, 20% gov’t. 35% European, 35% US, 10% Asian. Trilinos Release 6: September 2005. Trilinos Awards: 2004 R&D 100 Award. SC2004 HPC Software Challenge Award. Sandia Team Employee Recognition Award. Lockheed-Martin Nova Award Nominee. More information: http://software.sandia.gov http://software.sandia.gov/trilinos Additional documentation at my website: http://www.cs.sandia.gov/~mheroux. 3 rd Annual Trilinos User Group Meeting: November 1-3, 2005 at SNL.
112
8/23/2005 111 Summary Trilinos package model provides a welcoming environment for solver developers: Extremely flexible. Building up package, not making it conform. Scalable growth model for complex multi-physics solver suites. Basic principles can be applied to any similar “project of projects”: Modularity Interfaces Common infrastructure. Trilinos new_package package is self-contained software environment: Package and website useful to any project wanting to ramp up in use of standard tools. Available separately, Open Source. Used by groups outside of Trilinos as well as within.
113
8/23/2005 112 Trilinos Might be of Interest to you if… Your application includes non-PDE equations. Multiple RHS, Block Krylov linear and eigen solvers needed. Interested in multi-physics. App is in C++. Interested in collaborating on Fortran interface via SIDL/Babel. Unstructured Data redistribution is important. Multi-precision useful. Parallel Python (using MPI only) interest.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.