An Overview of Epetra Michael A. Heroux Sandia National Labs

Trilinos Concrete Support Component: Petra
Petra provides distributed matrix and vector services:
- Construction of and operations with matrices, vectors and graphs.
- Parallel redistribution of all these objects (including a Zoltan interface).
- All Trilinos solver components understand and use Petra matrices and vectors.
Three versions under development:
- Epetra (Essential Petra): Restricted to real, double-precision arithmetic. Uses a stable core subset of C++. Interfaces accessible to C and Fortran users.
- Tpetra (Templated Petra): Next-generation C++ version. Templated scalar fields (and ordinal fields). Uses namespaces and the STL for improved usability/efficiency.
- Jpetra (Java Petra): Pure Java; completely portable to any JVM. Interfaces with Java versions of MPI, LAPACK and BLAS.

Solving Linear Systems via Epetra/AztecOO
Goal: Solve Ax = b using Epetra/AztecOO.
Proceed step-by-step through the following classes:
- Comm: Defines the parallel machine.
- Map: Defines the data distribution.
- Vector: Defines RHS/LHS vectors.
- Matrix: Defines the linear operator.
- Problem: Combines the pieces to define the linear problem.
- AztecOO: Solves the linear problem.

Epetra Features  Epetra contains constructors and utility routines for:  Distributed dense multivectors and vectors.  Local replicated multivectors, vectors.  Distributed Sparse Graphs and Matrices.  Written in C++.  C/Fortran wrapper functions provide access to library.

Epetra User Class Categories
- Sparse Matrices: RowMatrix, (CrsMatrix, VbrMatrix, FECrsMatrix, FEVbrMatrix)
- Linear Operator: Operator (AztecOO, ML, Ifpack)
- Dense Matrices: DenseMatrix, DenseVector, BLAS, LAPACK, SerialDenseSolver
- Vectors: Vector, MultiVector
- Graphs: CrsGraph
- Data Layout: Map, BlockMap, LocalMap
- Redistribution: Import, Export
- Aggregates: LinearProblem
- Parallel Machine: Comm, (SerialComm, MpiComm, MpiSmpComm)
- Utilities: Time, Flops

Epetra Communication Classes  Epetra_Comm is a pure virtual class:  Has no executable code: Interfaces only.  Encapsulates behavior and attributes of the parallel machine.  Defines interfaces for basic services such as: Collective communications. Gather/scatter capabilities.  Allows multiple parallel machine implementations.  Implementation details of parallel machine confined to Comm classes.  In particular, rest of Epetra has no dependence on MPI.

Comm Methods
Barrier() const=0 [pure virtual]
Broadcast(double *MyVals, int Count, int Root) const=0 [pure virtual]
Broadcast(int *MyVals, int Count, int Root) const=0 [pure virtual]
CreateDistributor() const=0 [pure virtual]
GatherAll(double *MyVals, double *AllVals, int Count) const=0 [pure virtual]
GatherAll(int *MyVals, int *AllVals, int Count) const=0 [pure virtual]
MaxAll(double *PartialMaxs, double *GlobalMaxs, int Count) const=0 [pure virtual]
MaxAll(int *PartialMaxs, int *GlobalMaxs, int Count) const=0 [pure virtual]
MinAll(double *PartialMins, double *GlobalMins, int Count) const=0 [pure virtual]
MinAll(int *PartialMins, int *GlobalMins, int Count) const=0 [pure virtual]
MyPID() const=0 [pure virtual]
NumProc() const=0 [pure virtual]
Print(ostream &os) const=0 [pure virtual]
ScanSum(double *MyVals, double *ScanSums, int Count) const=0 [pure virtual]
ScanSum(int *MyVals, int *ScanSums, int Count) const=0 [pure virtual]
SumAll(double *PartialSums, double *GlobalSums, int Count) const=0 [pure virtual]
SumAll(int *PartialSums, int *GlobalSums, int Count) const=0 [pure virtual]
~Epetra_Comm() [inline, virtual]

Comm Implementations
Three current implementations of Epetra_Comm:
- Epetra_SerialComm: Allows easy simultaneous support of serial and parallel versions of user code.
- Epetra_MpiComm: OO wrapping of the C MPI interface.
- Epetra_MpiSmpComm: Allows definition/use of shared-memory multiprocessor nodes.
- PVM and LBComm versions in the future.
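
The following minimal sketch (not from the original slides) shows the usual pattern for constructing a Comm: the EPETRA_MPI guard selects Epetra_MpiComm when Epetra is built with MPI and Epetra_SerialComm otherwise, and everything after construction sees only the Epetra_Comm interface.

// Sketch: construct a Comm and exercise a few of its methods.
#ifdef EPETRA_MPI
#include "mpi.h"
#include "Epetra_MpiComm.h"
#else
#include "Epetra_SerialComm.h"
#endif

int main(int argc, char *argv[]) {
#ifdef EPETRA_MPI
  MPI_Init(&argc, &argv);
  Epetra_MpiComm Comm(MPI_COMM_WORLD);
#else
  Epetra_SerialComm Comm;
#endif

  int myPID    = Comm.MyPID();    // rank of this process
  int numProcs = Comm.NumProc();  // total number of processes

  double myValue = 1.0, globalSum = 0.0;
  Comm.SumAll(&myValue, &globalSum, 1);  // collective sum: globalSum == numProcs

  Comm.Barrier();
#ifdef EPETRA_MPI
  MPI_Finalize();
#endif
  return 0;
}

The same source compiles and runs serially or in parallel; only the construction of the Comm object differs.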

Map Classes
Epetra maps prescribe the layout of distributed objects across the parallel machine.
Typical map: 99 elements, 4 MPI processes could look like:
- Number of elements = 25 on PE 0 through 2, = 24 on PE 3.
- GlobalElementList = {0, 1, 2, ..., 24} on PE 0, = {25, 26, ..., 49} on PE 1, ... etc.
Funky map: 10 elements, 3 MPI processes could look like:
- Number of elements = 6 on PE 0, = 4 on PE 1, = 0 on PE 2.
- GlobalElementList = {22, 3, 5, 2, 99, 54} on PE 0, = {5, 10, 12, 24} on PE 1, = {} on PE 2.
Note: Global element IDs (GIDs) are only labels:
- Need not be a contiguous range on a processor.
- Need not be uniquely assigned to processors.
- The funky map is not unreasonable, given auto-generated meshes, etc.
- Use of a "Directory" facilitates arbitrary GID support.
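
As a concrete illustration, here is a sketch of how the "funky" map above could be constructed, assuming a 3-process run; the helper name BuildFunkyMap is made up for this example.

// Sketch: build the 10-element "funky" map with arbitrary GIDs.
#include "Epetra_Map.h"

void BuildFunkyMap(const Epetra_Comm& Comm) {
  int myPID = Comm.MyPID();
  int numMyElements = 0;
  int* myGIDs = 0;

  int gidsPE0[] = {22, 3, 5, 2, 99, 54};
  int gidsPE1[] = {5, 10, 12, 24};
  if (myPID == 0)      { numMyElements = 6; myGIDs = gidsPE0; }
  else if (myPID == 1) { numMyElements = 4; myGIDs = gidsPE1; }
  // PE 2 contributes no elements.

  // -1: let Epetra compute the global element count (10 here).
  Epetra_Map FunkyMap(-1, numMyElements, myGIDs, 0, Comm);
}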

Epetra Map Collaboration Diagram & Inheritance Graph
Notes:
1. Epetra_Object is the base class for all concrete Epetra classes: it has labeling and ostream methods, and maintains definitions of global constants.
2. BlockMap is the base map class.
3. Maps have an Epetra_Directory to keep track of the global ID distribution.

Types of Epetra Maps
Two basic characteristic attributes:
- Local or not:
  - A local map creates and maintains replicated local objects: the object is the same across all processors. Useful for some algorithms, e.g. the Hessenberg matrix in GMRES, block dot products, etc.
  - A non-local map creates distributed global objects: the object is distributed across all processors. This is what we think of as a "standard" map.
- Block or not:
  - Block supports a variable weight (size) per element. Primarily used for sparse matrices whose entries are dense matrices.
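
A brief sketch contrasting replicated (local) and distributed maps; the function name and sizes here are illustrative only.

// Sketch: a replicated (local) map versus a distributed map.
#include "Epetra_LocalMap.h"
#include "Epetra_Map.h"
#include "Epetra_MultiVector.h"

void MapKinds(const Epetra_Comm& Comm) {
  // Distributed: 1000 global elements spread across all processes.
  Epetra_Map DistMap(1000, 0, Comm);

  // Local/replicated: every process owns all 20 elements,
  // e.g. for a small Hessenberg matrix or block dot products.
  Epetra_LocalMap RepMap(20, 0, Comm);

  Epetra_MultiVector Distributed(DistMap, 5);  // 5 distributed vectors
  Epetra_MultiVector Replicated(RepMap, 5);    // same small object on every process
}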

BlockMap Ctors and Dtors
- Epetra_BlockMap (int NumGlobalElements, int ElementSize, int IndexBase, const Epetra_Comm &Comm)
  Constructor for an Epetra-defined uniform linear distribution of constant block size elements.
- Epetra_BlockMap (int NumGlobalElements, int NumMyElements, int ElementSize, int IndexBase, const Epetra_Comm &Comm)
  Constructor for a user-defined linear distribution of constant block size elements.
- Epetra_BlockMap (int NumGlobalElements, int NumMyElements, int *MyGlobalElements, int ElementSize, int IndexBase, const Epetra_Comm &Comm)
  Constructor for a user-defined arbitrary distribution of constant block size elements.
- Epetra_BlockMap (int NumGlobalElements, int NumMyElements, int *MyGlobalElements, int *ElementSizeList, int IndexBase, const Epetra_Comm &Comm)
  Constructor for a user-defined arbitrary distribution of variable block size elements.
- Epetra_BlockMap (const Epetra_BlockMap &map)
  Copy constructor.
- virtual ~Epetra_BlockMap (void)
  Destructor.

Some Map Methods
Local/global ID accessor functions:
- int RemoteIDList (int NumIDs, const int *GIDList, int *PIDList, int *LIDList) const
  Returns the processor IDs and corresponding local index values for a given list of global indices.
- int LID (int GID) const
  Returns the local ID of a global ID; returns -1 if the GID is not on this processor.
- int GID (int LID) const
  Returns the global ID of a local ID; returns IndexBase-1 if the LID is not on this processor.
Size and dimension accessor functions:
- int NumGlobalElements () const
  Number of elements across all processors.
- int NumMyElements () const
  Number of elements on the calling processor.
- int MyGlobalElements (int *MyGlobalElementList) const
  Puts the list of global elements on this processor into the user-provided array.
- int IndexBase () const
  Index base for this map.
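
A short usage sketch of these accessors; the helper name MapQueries is hypothetical.

// Sketch: querying a map for sizes and local/global IDs.
#include <vector>
#include "Epetra_Map.h"

void MapQueries(const Epetra_Map& Map) {
  int numGlobal = Map.NumGlobalElements();   // across all processes
  int numLocal  = Map.NumMyElements();       // on this process

  std::vector<int> myGIDs(numLocal);
  if (numLocal > 0)
    Map.MyGlobalElements(&myGIDs[0]);        // fill user-provided array

  if (numLocal > 0) {
    int firstGID = Map.GID(0);               // global ID of local element 0
    int lid      = Map.LID(firstGID);        // round-trip back to local ID (0)
  }
  int notHere = Map.LID(numGlobal + 1);      // -1: that GID is not on this process
}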

Epetra Vector Class  Supports construction and manipulation of vectors.  Distributed global vectors.  Replicated local vectors.  Can perform common vector operations:  Dot products, vector scalings and norms.  Fill with random values.  Used with the Epetra Matrix classes for matrix-vector multiplication.  Use in a parallel or serial environment is mostly transparent.  Specialization of the Epetra MultiVector class.

Epetra MultiVector Class
- A multivector is a collection of one or more vectors with the same memory layout (map).
- Useful for block algorithms, multiple right-hand sides, and replicated local computations.
- A generalization of a 2D array: if the memory stride between vectors is constant, then a multivector is equivalent to a 2D Fortran array.
- Can wrap calls to the BLAS and LAPACK in this class.
- Provides most of the implementation for the Epetra Vector class.
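
A small sketch of common MultiVector operations; the helper name and sizes are illustrative.

// Sketch: basic MultiVector operations.
#include "Epetra_Map.h"
#include "Epetra_MultiVector.h"

void MultiVectorBasics(const Epetra_Map& Map) {
  int numVectors = 3;
  Epetra_MultiVector X(Map, numVectors);
  Epetra_MultiVector Y(Map, numVectors);

  X.Random();            // fill with random values
  Y.PutScalar(1.0);      // set every entry to 1.0

  double norms[3];
  X.Norm2(norms);        // one 2-norm per column

  double dots[3];
  X.Dot(Y, dots);        // column-wise dot products

  Y.Update(2.0, X, 1.0); // Y = 2.0*X + 1.0*Y (BLAS-like update)
}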

Epetra Vector/MultiVector Inheritance Graph
Notes:
1. Vector is a specialization of MultiVector: a multivector with one vector.
2. MultiVector is a:
   a) Distributed object: data spread (or replicated) across processors.
   b) Computational object: floating point operations occur (and will be recorded if the user desires).
   c) BLAS object: uses BLAS kernels for fast computations.
   d) More on common base classes later...

Epetra CrsGraph Class  Provides “skeletal” information for both sparse matrix classes (CRS and VBR).  Allows a priori construction of skeleton that can be used by multiple matrices and reused in future.  Provides graph information used by some load balancing tools.  Exists in one of two states:  Global index space.  Local index space.
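
A sketch of the construct-once/reuse pattern: build the graph, then create a matrix on top of it. The helper name and the tridiagonal structure are illustrative; FillComplete is the newer name for what the slides elsewhere call TransformToLocal.

// Sketch: build a tridiagonal graph once, then reuse it for a matrix.
#include "Epetra_CrsGraph.h"
#include "Epetra_CrsMatrix.h"

void GraphThenMatrix(const Epetra_Map& Map) {
  Epetra_CrsGraph Graph(Copy, Map, 3);   // at most 3 entries per row

  int numGlobal = Map.NumGlobalElements();
  for (int i = 0; i < Map.NumMyElements(); ++i) {
    int row = Map.GID(i);
    int cols[3];
    int n = 0;
    if (row > 0)             cols[n++] = row - 1;
    cols[n++] = row;
    if (row < numGlobal - 1) cols[n++] = row + 1;
    Graph.InsertGlobalIndices(row, n, cols);
  }
  Graph.FillComplete();   // older Epetra versions call this TransformToLocal()

  // The graph's structure is fixed; the matrix only needs values.
  Epetra_CrsMatrix A(Copy, Graph);
}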

Epetra Matrix Classes  Support construction and manipulation of:  Row based (Epetra_CrsMatrix) and  Block row based (Epetra_VbrMatrix) matrices.  Constructors allow:  row-by-row or entry-by-entry construction.  Injection, replacement or sum-into entry capabilities.  Supports common matrix operations:  Scaling.  Norms.  Matrix-vector multiplication.  Matrix-multivector multiplication.

Matrix Class Inheritance Details CrsMatrix and VbrMatrix inherit from: Distributed Object: How data is spread across machine. Computational Object: Performs FLOPS. BLAS: Use BLAS kernels. RowMatrix: An object from either class has a common row access interface (used by AztecOO).

LinearProblem Class  A linear problem is defined by:  Matrix A : An Epetra_RowMatrix or Epetra_Operator object. (often a CrsMatrix or VbrMatrix object.)  Vectors x, b : Vector objects.  To call AztecOO, first define a LinearProblem:  Constructed from A, x and b.  Once defined, can: Scale the problem (explicit preconditioning). Precondition it (implicitly). Change x and b.

LinearProblem Collaboration Diagram

Some LinearProblem Methods
- Epetra_LinearProblem (Epetra_RowMatrix *A, Epetra_MultiVector *X, Epetra_MultiVector *B)
  Epetra_LinearProblem constructor.
- void SetOperator (Epetra_RowMatrix *A)
  Set operator A of linear problem AX = B.
- void SetLHS (Epetra_MultiVector *X)
  Set left-hand side X of linear problem AX = B.
- void SetRHS (Epetra_MultiVector *B)
  Set right-hand side B of linear problem AX = B.
- int CheckInput () const
  Check input parameters for size consistency.
- int LeftScale (const Epetra_Vector &D)
  Perform left scaling of a linear problem.
- int RightScale (const Epetra_Vector &D)
  Perform right scaling of a linear problem.
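
A sketch combining a LinearProblem with explicit diagonal (Jacobi) left scaling via LeftScale; the helper name and the use of ExtractDiagonalCopy/Reciprocal are illustrative choices, not the only way to build the scaling vector.

// Sketch: wrap A, x, b in a LinearProblem and apply explicit diagonal scaling.
#include "Epetra_LinearProblem.h"
#include "Epetra_CrsMatrix.h"
#include "Epetra_Vector.h"

void ScaledProblem(Epetra_CrsMatrix& A, Epetra_Vector& x, Epetra_Vector& b) {
  Epetra_LinearProblem problem(&A, &x, &b);

  // Left-scale by the inverse of the diagonal of A.
  Epetra_Vector diag(A.RowMap());
  A.ExtractDiagonalCopy(diag);
  Epetra_Vector invDiag(A.RowMap());
  invDiag.Reciprocal(diag);        // invDiag[i] = 1.0 / diag[i]
  problem.LeftScale(invDiag);      // scales A and b in place
}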

AztecOO
Aztec is the legacy iterative solver package at Sandia:
- Extracted from the MPSalsa reacting flow code.
- Installed in dozens of Sandia apps.
- Licensed externally; still dozens of downloads per month.
AztecOO leverages the investment in Aztec:
- Uses Aztec iterative methods and preconditioners.
AztecOO improves on Aztec by:
- Using Epetra objects for defining the matrix and RHS.
- Providing more preconditioners/scalings.
- Using C++ class design to enable more sophisticated use.
AztecOO interfaces allow:
- Continued use of Aztec functionality.
- Introduction of new solver capabilities outside of Aztec.
- Advanced status testing capabilities (more tomorrow).

Some AztecOO Methods
- AztecOO (const Epetra_LinearProblem &problem)
  AztecOO constructor.
- int SetAztecDefaults ()
  Restores default option/parameter settings.
- int SetAztecOption (int option, int value)
  Option-setting function.
- int SetAztecParam (int param, double value)
  Parameter-setting function.
- int Iterate (int MaxIters, double Tolerance)
  Iteration function.
- int NumIters () const
  Returns the total number of iterations performed on this problem.
- double TrueResidual () const
  Returns the true unscaled residual for this problem.
- double ScaledResidual () const
  Returns the true scaled residual for this problem.
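
A sketch of a typical option setup, choosing GMRES with an ILU subdomain preconditioner; the particular options shown are one reasonable combination, not a recommendation from the slides.

// Sketch: select GMRES with an ILU domain-decomposition preconditioner.
#include "AztecOO.h"

void SolveWithGmresIlu(Epetra_LinearProblem& problem) {
  AztecOO solver(problem);

  solver.SetAztecOption(AZ_solver, AZ_gmres);         // Krylov method
  solver.SetAztecOption(AZ_precond, AZ_dom_decomp);   // domain decomposition
  solver.SetAztecOption(AZ_subdomain_solve, AZ_ilu);  // ILU on each subdomain
  solver.SetAztecOption(AZ_kspace, 50);               // GMRES restart length
  solver.SetAztecOption(AZ_output, 10);               // print residual every 10 iters

  solver.Iterate(500, 1.0e-8);                        // max iterations, tolerance
}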

A Simple Epetra/AztecOO Program

// Header files omitted...
int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);                 // Initialize MPI, MpiComm
  Epetra_MpiComm Comm(MPI_COMM_WORLD);

  // ***** Map puts same number of equations on each PE *****
  int NumMyElements = 1000;
  Epetra_Map Map(-1, NumMyElements, 0, Comm);
  int NumGlobalElements = Map.NumGlobalElements();

  // ***** Create x and b vectors *****
  Epetra_Vector x(Map);
  Epetra_Vector b(Map);
  b.Random();                             // Fill RHS with random #s

  // ***** Create an Epetra_Matrix tridiag(-1,2,-1) *****
  Epetra_CrsMatrix A(Copy, Map, 3);
  double negOne = -1.0;
  double posTwo =  2.0;
  for (int i=0; i<NumMyElements; i++) {
    int GlobalRow = A.GRID(i);
    int RowLess1 = GlobalRow - 1;
    int RowPlus1 = GlobalRow + 1;
    if (RowLess1!=-1)
      A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowLess1);
    if (RowPlus1!=NumGlobalElements)
      A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowPlus1);
    A.InsertGlobalValues(GlobalRow, 1, &posTwo, &GlobalRow);
  }
  A.TransformToLocal();                   // Transform from GIDs to LIDs

  // ***** Create Linear Problem *****
  Epetra_LinearProblem problem(&A, &x, &b);

  // ***** Create/define AztecOO instance, solve *****
  AztecOO solver(problem);
  solver.SetAztecOption(AZ_precond, AZ_Jacobi);
  solver.Iterate(1000, 1.0E-8);

  // ***** Report results, finish *****
  cout << "Solver performed " << solver.NumIters() << " iterations." << endl
       << "Norm of true residual = " << solver.TrueResidual() << endl;

  MPI_Finalize();
  return 0;
}

Additional Epetra Classes: Utility and Base  This completes the description of the basic user-oriented Epetra classes.  Next we discuss some of the base and utility classes.

Epetra DistObject Base Class
Epetra has 9 user-oriented distributed object classes:
- Vector, IntVector
- MultiVector
- CrsGraph
- CrsMatrix, FECrsMatrix
- VbrMatrix, FEVbrMatrix
- MapColoring
DistObject is a base class for all the above:
- Construction of a DistObject requires a Map (or BlockMap or LocalMap).
- Has concrete methods for parallel data redistribution of an object.
- Has virtual Pack/Unpack methods that each derived class must implement.
DistObject advantages:
- Minimizes redundant code.
- Facilitates incorporation of other distributed objects in the future.

Epetra_DistObject Import/Export Methods
- int Import (const Epetra_SrcDistObject &A, const Epetra_Import &Importer, Epetra_CombineMode CombineMode)
  Imports an Epetra_DistObject using the Epetra_Import object.
- int Import (const Epetra_SrcDistObject &A, const Epetra_Export &Exporter, Epetra_CombineMode CombineMode)
  Imports an Epetra_DistObject using the Epetra_Export object.
- int Export (const Epetra_SrcDistObject &A, const Epetra_Import &Importer, Epetra_CombineMode CombineMode)
  Exports an Epetra_DistObject using the Epetra_Import object.
- int Export (const Epetra_SrcDistObject &A, const Epetra_Export &Exporter, Epetra_CombineMode CombineMode)
  Exports an Epetra_DistObject using the Epetra_Export object.

Epetra_DistObject Virtual Methods
- virtual int CheckSizes (const Epetra_SrcDistObject &Source)=0
  Allows the source and target (this) objects to be compared for compatibility; returns nonzero if they are not compatible.
- virtual int CopyAndPermute (const Epetra_SrcDistObject &Source, int NumSameIDs, int NumPermuteIDs, int *PermuteToLIDs, int *PermuteFromLIDs)=0
  Performs ID copies and permutations that are on-processor.
- virtual int PackAndPrepare (const Epetra_SrcDistObject &Source, int NumExportIDs, int *ExportLIDs, int Nsend, int Nrecv, int &LenExports, char *&Exports, int &LenImports, char *&Imports, int &SizeOfPacket, Epetra_Distributor &Distor)=0
  Performs any packing or preparation required for the call to DoTransfer().
- virtual int UnpackAndCombine (const Epetra_SrcDistObject &Source, int NumImportIDs, int *ImportLIDs, char *Imports, int &SizeOfPacket, Epetra_Distributor &Distor, Epetra_CombineMode CombineMode)=0
  Performs any unpacking and combining after the call to DoTransfer().

Epetra_Time and Epetra_Flops
- All Epetra computational classes count floating point operations (FLOPS):
  - FLOPS are associated with the "this" object.
  - Op counts are serial counts, that is, independent of the number of processors.
- Each computational class has a Flops() method that can be queried for the flop count of an object:

  Epetra_Vector V(map);
  Epetra_Flops counter;
  V.SetFlopCounter(counter);
  V.Random();
  double norm;
  V.Norm2(&norm);
  double v_flops = V.Flops(); // v_flops should = 2*(length of V)
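
The slide title also mentions Epetra_Time; here is a sketch (not from the original slides) combining the two utilities to report a rough MFLOP/s rate. The helper name is hypothetical.

// Sketch: combine Epetra_Time and the flop counters to report MFLOP/s.
#include "Epetra_Time.h"
#include "Epetra_Flops.h"
#include "Epetra_Vector.h"

void TimedNorm(const Epetra_Map& map, const Epetra_Comm& Comm) {
  Epetra_Vector V(map);
  Epetra_Flops counter;
  V.SetFlopCounter(counter);
  V.Random();

  Epetra_Time timer(Comm);
  double norm;
  V.Norm2(&norm);

  double seconds = timer.ElapsedTime();  // wall-clock time since construction
  double mflops  = V.Flops() / seconds / 1.0e6;
}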

Epetra_CompObject Class Epetra has 8 major user-oriented computational object classes: –Vector –MultiVector –CrsMatrix (FECrsMatrix) –VbrMatrix (FEVbrMatrix) –SerialDenseVector –SerialDenseMatrix –SimpleSerialDenseSolver, HardSerialDenseSolver CompObject is a base class for all the above: –Trivial constructor. –Manages pointer to an Epetra_Flops counter object. –Allows a computational object to donate its FLOPS to a specified counter. –Any number of objects can be associated with a single counter object.

Epetra Serial Dense Matrix and Vector Classes
Epetra provides two types of serial dense classes:
- (Thin) Epetra_BLAS, Epetra_LAPACK: Provide thin wrappers to BLAS and LAPACK routines.
  - A single interface to any BLAS routine (there is one call to DGEMM in all of Epetra).
  - A single method for all precision types (GEMM covers SGEMM, DGEMM, CGEMM, ZGEMM). Helps with templates.
  - Inheritable: any class can be a BLAS or LAPACK class.
- (OO) Epetra_SerialDenseMatrix, Epetra_SerialDenseVector: Fairly lightweight OO dense matrix and vector classes.
  - Epetra_IntSerialDenseMatrix, Epetra_IntSerialDenseVector: Integer versions; very useful for Epetra_Map manipulations.
  - Epetra_SerialDenseSolver: Careful implementation that provides OO access to robust scaling and factorization techniques in LAPACK. An SPD version of the above is also provided.
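
A sketch of the OO dense classes solving a tiny system; the matrix values and the helper name are illustrative.

// Sketch: solve a small dense system with the OO serial dense classes.
#include "Epetra_SerialDenseMatrix.h"
#include "Epetra_SerialDenseVector.h"
#include "Epetra_SerialDenseSolver.h"

void SmallDenseSolve() {
  int n = 3;
  Epetra_SerialDenseMatrix A(n, n);
  Epetra_SerialDenseVector x(n), b(n);

  for (int i = 0; i < n; ++i) {
    A(i, i) = 2.0;                       // simple diagonally dominant matrix
    if (i > 0)     A(i, i - 1) = -1.0;
    if (i < n - 1) A(i, i + 1) = -1.0;
    b(i) = 1.0;
  }

  Epetra_SerialDenseSolver solver;
  solver.SetMatrix(A);
  solver.SetVectors(x, b);               // solve A x = b
  solver.FactorWithEquilibration(true);  // let LAPACK equilibrate if needed
  solver.Factor();
  solver.Solve();
}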

Parallel Data Redistribution  Epetra vectors, multivectors, graphs and matrices are distributed via map objects.  A map is basically a partitioning of a list of global IDs:  IDs are simply labels, no need to use contiguous values (Directory class handles details for general ID lists).  No a priori restriction on replicated IDs.  If we are given:  A source map and  A set of vectors, multivectors, graphs and matrices (or other distributable objects) based on source map.  Redistribution is performed by:  Specifying a target map with a new distribution of the global IDs.  Creating Import or Export object using the source and target maps.  Creating vectors, multivectors, graphs and matrices that are redistributed (to target map layout) using the Import/Export object.

Import vs. Export  Import (Export) means calling processor knows what it wants to receive (send).  Distinction between Import/Export is important to user, almost identical in implementation.  Import (Export) objects can be used to do an Export (Import) as a reverse operation.  When mapping is bijective (1-to-1 and onto), either Import or Export is appropriate.

Example: 1D Matrix Assembly
Solve -u_xx = f on (a, b), with Dirichlet values u(a) and u(b) prescribed.
Three equations: find u at x1, x2 and x3 (x1 and x2 lie in the PE 0 subdomain, x2 and x3 in the PE 1 subdomain).
The equation for u at x2 gets a contribution from both PE 0 and PE 1.
We would like to compute the partial contributions independently, then combine the partial results.

Two Maps  We need two maps:  Assembly map: PE 0: { 1, 2 }. PE 1: { 2, 3 }.  Solver map: PE 0: { 1, 2 } (we arbitrate ownership of 2). PE 1: { 3 }.

End of Assembly Phase
At the end of the assembly phase we have AssemblyMatrix:
- On PE 0: Equation 1 and a partial contribution to Equation 2.
- On PE 1: a partial contribution to Equation 2 and Equation 3.
We want to assign all of Equation 2 to PE 0 for use with the solver.

Export Assembly Matrix to Solver Matrix

Epetra_Export Exporter(AssemblyMap, SolverMap);
Epetra_CrsMatrix SolverMatrix(Copy, SolverMap, 0);
SolverMatrix.Export(AssemblyMatrix, Exporter, Add);
SolverMatrix.TransformToLocal();

End of Export Phase
At the end of the export phase we have SolverMatrix:
- On PE 0: Equation 1 and Equation 2 (Equation 2 now fully summed).
- On PE 1: Equation 3.
Each row is uniquely owned.

Read-File Example

Epetra_Map* readMap;
Epetra_CrsMatrix* readA;
Epetra_Vector* readx;
Epetra_Vector* readb;
Epetra_Vector* readxexact;

// Call routine to read in HB problem. All data on PE 0.
Trilinos_Util_ReadHb2Epetra(argv[1], Comm, readMap, readA, readx, readb, readxexact);

// Create uniform distributed map
Epetra_Map map(readMap->NumGlobalElements(), 0, Comm);

// Create Exporter to distribute read-in matrix and vectors
Epetra_Export exporter(*readMap, map);
Epetra_CrsMatrix A(Copy, map, 0);
Epetra_Vector x(map);
Epetra_Vector b(map);
Epetra_Vector xexact(map);

// Distribute data from PE 0 to uniform layout across all PEs.
x.Export(*readx, exporter, Add);
b.Export(*readb, exporter, Add);
xexact.Export(*readxexact, exporter, Add);

FEM Example

int Drumm1(const Epetra_Map& map) {
  // Simple 2-element FEM problem from Clif Drumm.
  // Two triangular elements, one per processor, as shown here:
  //
  //   *----*
  //  3|\  2|
  //   | \  |
  //   | 0\1|
  //   |   \|
  //   *----*
  //  0     1
  //
  // Element 0 on processor 0, element 1 on processor 1.
  // Processor 0 will own nodes 0,1 and processor 1 will own nodes 2,3.
  // Each processor will pass a 3x3 element matrix to Epetra_FECrsMatrix.
  // After GlobalAssemble(), the matrix should be as follows
  // (values implied by the element matrix below):
  //
  //          row 0: [ 2  1  0  1 ]
  // proc 0   row 1: [ 1  4  1  2 ]
  //
  //          row 2: [ 0  1  2  1 ]
  // proc 1   row 3: [ 1  2  1  4 ]

  // Executable code
  int numProcs  = map.Comm().NumProc();
  int localProc = map.Comm().MyPID();
  if (numProcs != 2) return(0);

  int indexBase = 0, ierr = 0;
  int numMyNodes = 2;
  Epetra_IntSerialDenseVector myNodes(numMyNodes);
  int numMyAssemNodes = 3;
  Epetra_IntSerialDenseVector myAssemNodes(numMyAssemNodes);

  if (localProc == 0) {
    myNodes[0] = 0; myNodes[1] = 1;
    myAssemNodes[0] = 0; myAssemNodes[1] = 1; myAssemNodes[2] = 3;
  }
  else {
    myNodes[0] = 2; myNodes[1] = 3;
    myAssemNodes[0] = 1; myAssemNodes[1] = 2; myAssemNodes[2] = 3;
  }

  Epetra_SerialDenseVector values(9);
  values[0] = 2.0; values[1] = 1.0; values[2] = 1.0;
  values[3] = 1.0; values[4] = 2.0; values[5] = 1.0;
  values[6] = 1.0; values[7] = 1.0; values[8] = 2.0;

FEM Example cont.

  Epetra_Map Map(-1, numMyNodes, myNodes.Values(), indexBase, map.Comm());
  int rowLengths = 3;
  Epetra_FECrsMatrix A(Copy, Map, rowLengths);

  A.SumIntoGlobalValues(numMyAssemNodes, myAssemNodes.Values(),
                        numMyAssemNodes, myAssemNodes.Values(),
                        values.Values(),
                        Epetra_FECrsMatrix::ROW_MAJOR);
  A.GlobalAssemble();
  return(0);
}

Need for Import/Export  Solvers for complex engineering applications need expressive, easy-to-use parallel data redistribution:  Allows better scaling for non-uniform overlapping Schwarz.  Necessary for robust solution of multiphysics problems.  We have found import and export facilities to be a very natural and powerful technique to address these issues.  A Zoltan interface exists to create Epetra_Export objects with optimal layouts.

Other uses for Import/Export
In addition, import and export facilities provide a variety of other capabilities:
- Communication needs of sparse matrix multiplication.
- Parallel assembly: shared nodes receive contributions from multiple processors; the reverse operation replicates results back.
- Higher-order interpolations are easy to implement.
- Ghost node distributions (see the sketch below).
- Changing workloads can be rebalanced.
- Sparse matrix transpose becomes trivial to implement.
- Allows gradual MPI-izing of an application.
- Cached overlapped distributed vectors (a generalization of distributed sparse matrix-vector multiplication).
- Rendezvous algorithms are easy to implement.
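
For the ghost-node case mentioned above, here is a sketch (not from the original slides) of filling an overlapped vector from a 1-to-1 owned vector. The map names and the helper function are hypothetical; note that the Import constructor takes the target map first.

// Sketch: pull ghost (overlap) values into a locally overlapped vector.
#include "Epetra_Map.h"
#include "Epetra_Vector.h"
#include "Epetra_Import.h"

void FillGhosts(const Epetra_Map& OwnedMap,     // 1-to-1 map of owned IDs
                const Epetra_Map& OverlapMap,   // owned IDs plus ghost IDs
                const Epetra_Vector& ownedValues) {
  // Importer: target map first, then source map.
  Epetra_Import Importer(OverlapMap, OwnedMap);

  Epetra_Vector overlapValues(OverlapMap);
  // Each process receives the ghost entries it asked for.
  overlapValues.Import(ownedValues, Importer, Insert);
}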

View vs. Copy  All Epetra objects that contain user-specified data can be created in View or Copy mode:  Copy mode:  Any user-specified data is copied.  User can delete or modify input data without affecting Epetra object.  View mode:  User-specified data is “pointed to” and not copied.  If user modifies input data, Epetra object is affected.  Views are dangerous but necessary.

Vector View vs. Copy Example

int VectorViews(const Epetra_Map& map)
{
  // Simple vector constructor.
  // Defaults to "Copy" mode; values are set to zero.
  Epetra_Vector x0(map);
  x0.Random();

  // Get pointer to values in x0
  double *y;
  x0.ExtractView(&y);

  // Vector constructor, copy in data
  Epetra_Vector x1(Copy, map, y);

  // Redefine y[0]. What else changes?
  y[0] *= 2; // x0[0] changes, x1[0] doesn't

  // Vector constructor, view of data
  Epetra_Vector x2(View, map, y);

  // Redefine y[0] again. What changes?
  y[0] *= 2; // x0[0], x2[0] change, x1[0] doesn't

  return(0);
}

Special Notes on View/Copy
- View/Copy name conflicts (sorry!):
  - In a design oversight, View/Copy is not in a namespace.
  - This occasionally causes name conflicts with user variable names.
  - Beware, and sorry! Fixed in Tpetra.
- Use of View with matrix and graph objects forces some restrictions:
  - The insertion methods can only be called once for each row (necessary because the class has no control over the data space).
- Use of View should be considered an "optimization" phase if possible:
  - Write initial code using Copy mode if possible.

Final Comments
- Epetra is the current production concrete matrix and vector library for Trilinos.
- It is being actively developed, but its interfaces are very stable.
- Near-term major developments:
  - Permutation classes.
  - Kokkos package (high-performance kernels).
- Extensive online documentation.
- User Guide in preparation; will be announced when ready.