1 An Overview of TSFCore
Roscoe A. Bartlett, 9211, Optimization and Uncertainty Estimation
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.

2 TSFCore SAND Reports
Get the most recent copy at: Trilinos/doc/TSFCore

3 Nonlinear Equations : Foundation for all our Work!
Applications:
· Discretized PDEs (e.g. finite element, finite volume, finite difference etc.)
· Network problems (e.g. Xyce)

4 Nonlinear Equations : Sensitivities
Related algorithms:
· Gradient-based optimization (SAND, NAND)
· Nonlinear equations (NLS)
· Multidisciplinary analysis
· Linear (matrix) analysis
· Block iterative solvers
· Eigenvalue problems
· Uncertainty quantification (SFE)
· Stability analysis / continuation
· Transients (ODEs, DAEs)
B. van Bloemen Waanders, R. A. Bartlett, K. R. Long and P. T. Boggs. Large Scale Non-Linear Programming: PDE Applications and Dynamical Systems, Sandia National Laboratories, SAND2002-3198, 2002

5 Applications, Algorithms, Linear-Algebra Software
APP : Application (e.g. MPSalsa, Xyce, SIERRA, NEVADA etc.)
LAL : Linear-Algebra Library (e.g. Petra/Ifpack, PETSc, Aztec etc.)
ANA : Abstract Numerical Algorithm (e.g. optimization, nonlinear solvers, stability analysis, SFE, transient solvers etc.)
Key points: complex algorithms, complex software, complex interfaces, complex computers. Duplication of effort?
Examples: Epetra_RowMatrix, fei::Matrix, TSF::MatrixOperator

6 TSFCore
Key points:
· Maximizing development impact
· Software can be run on more sophisticated computers
· Fosters improved algorithm development
(Interface layers: TSFCore, TSFCore::Nonlin)

7 Requirements for TSFCore
TSFCore should:
· Be portable to ASCI platforms
· Provide for stable and accurate numerical computations
· Represent a minimal but complete interface that will result in implementations that are:
  · Near optimal in computational speed
  · Near optimal in storage
· Be independent of computing environment (SPMD, MS, CS etc.)
· Be easy to develop adapters for existing libraries (e.g. Epetra, PETSc etc.)

8 Example ANA : Linear Conjugate Gradient Solver

9 TSFCore : Basic Linear Algebra Interfaces
Warning! Unified Modeling Language (UML) notation!
· An operator knows its domain and range spaces
· A linear operator is a kind of operator
· A Vector knows its VectorSpace
· VectorSpaces create Vectors!

10 TSFCore : Basic Linear Algebra Interfaces
(UML class diagram)

11 TSFCore : Basic Linear Algebra Interfaces
· A MultiVector is a linear operator!
· A MultiVector has a collection of column vectors!
· A MultiVector is a tall thin dense matrix
· VectorSpaces create MultiVectors!
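To make the "a MultiVector is a linear operator" point concrete, here is a small hypothetical helper (not from the slides): it applies a MultiVector as a LinearOp and then uses one of its columns as an ordinary Vector. The column accessor is written as col(j) with 1-based indexing; the exact accessor name and signature are assumptions.

  // Sketch only: the col(...) accessor and its return type are assumed.
  template<class Scalar>
  Scalar firstColumnDotApply(
    const TSFCore::MultiVector<Scalar> &V,  // "tall thin dense matrix"
    const TSFCore::Vector<Scalar>      &x,  // x in the domain of V
    TSFCore::Vector<Scalar>            *y   // on return, y = V*x
    )
  {
    V.apply( TSFCore::NOTRANS, x, y );      // use V as a LinearOp: y = V*x
    Teuchos::RefCountPtr< const TSFCore::Vector<Scalar> >
      v1 = V.col(1);                        // first column as a Vector (1-based indexing assumed)
    return dot( *v1, *y );                  // any vector operation applies to a column
  }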

12 TSFCore : Basic Linear Algebra Interfaces
(UML class diagram)

13 TSFCore : Basic Linear Algebra Interfaces
The key to success: Reduction/Transformation Operators (RTOp)
· Supports all needed vector operations
· Data/parallel independence
· Optimal performance
R. A. Bartlett, B. G. van Bloemen Waanders and M. A. Heroux. Vector Reduction/Transformation Operators, accepted to ACM TOMS, 2003

14 Background for TSFCore
· 1996 : Hilbert Class Library (HCL) [Symes and Gockenbach]: abstract vector spaces, vectors, linear operators
· 2000 : Epetra [Heroux]: concrete multi-vectors
· 2001 : Trilinos Solver Framework (TSF) 0.1 [Long]
· 2001 : AbstractLinAlgPack (ALAP) (MOOCHO LA interfaces) [Bartlett]: reduction/transformation operators (RTOp), abstract multi-vectors

15 TSFCore : Basic Linear Algebra Interfaces
· VectorSpaceFactory is related to MultiVectors
· VectorSpaces create Vectors and MultiVectors!
· MultiVector subviews can be created!
· Vector and MultiVector versions of apply(…)!
· Adjoints supported but are optional!
· Only one vector method!

16 TSFCore Details
· All interfaces are templated on Scalar type (support real and complex)
· Smart reference-counted pointer class Teuchos::RefCountPtr<> used for all dynamic memory management
· Many operations have default implementations based on very few pure virtual methods
· RTOp operators (and wrapper functions) are provided for many common level-1 vector and multi-vector operations
· Default implementation provided for MultiVector ( MultiVectorCols )
· Default implementations provided for serial computation: VectorSpace ( SerialVectorSpace ), VectorSpaceFactory ( SerialVectorSpaceFactory ), Vector ( SerialVector )

17 Vector-Vector Operations Provided with TSFCore

namespace TSFCore {

template<class Scalar> Scalar sum( const Vector<Scalar>& v );          // result = sum(v(i))
template<class Scalar> Scalar norm_1( const Vector<Scalar>& v );       // result = ||v||_1
template<class Scalar> Scalar norm_2( const Vector<Scalar>& v );       // result = ||v||_2
template<class Scalar> Scalar norm_inf( const Vector<Scalar>& v_rhs ); // result = ||v||_inf
template<class Scalar> Scalar dot( const Vector<Scalar>& x, const Vector<Scalar>& y ); // result = x'*y
template<class Scalar> Scalar get_ele( const Vector<Scalar>& v, Index i );             // result = v(i)
template<class Scalar> void set_ele( Index i, Scalar alpha, Vector<Scalar>* v );       // v(i) = alpha
template<class Scalar> void assign( Vector<Scalar>* y, const Scalar& alpha );          // y = alpha
template<class Scalar> void assign( Vector<Scalar>* y, const Vector<Scalar>& x );      // y = x
template<class Scalar> void Vp_S( Vector<Scalar>* y, const Scalar& alpha );            // y += alpha
template<class Scalar> void Vt_S( Vector<Scalar>* y, const Scalar& alpha );            // y *= alpha
template<class Scalar> void Vp_StV( Vector<Scalar>* y, const Scalar& alpha, const Vector<Scalar>& x ); // y = alpha*x + y
template<class Scalar> void ele_wise_prod( const Scalar& alpha, const Vector<Scalar>& x, const Vector<Scalar>& v, Vector<Scalar>* y );   // y(i) += alpha*x(i)*v(i)
template<class Scalar> void ele_wise_divide( const Scalar& alpha, const Vector<Scalar>& x, const Vector<Scalar>& v, Vector<Scalar>* y ); // y(i) = alpha*x(i)/v(i)
template<class Scalar> void seed_randomize( unsigned int );                            // seed for randomize()
template<class Scalar> void randomize( Scalar l, Scalar u, Vector<Scalar>* v );        // v(i) = random(l,u)

} // end namespace TSFCore

18 TSFCore : Vectors and Vector Spaces

C++ code:

template<class Scalar>
Scalar foo( const VectorSpace<Scalar>& S )
{
  Teuchos::RefCountPtr< Vector<Scalar> >
    x = S.createMember(),          // create x
    y = S.createMember();          // create y
  assign( &*x, 1.0 );              // x = 1
  randomize( -1.0, +1.0, &*y );    // y = rand(-1,1)
  Vp_StV( &*y, -2.0, *x );         // y += -2.0 * x
  Scalar gamma = dot( *x, *y );    // gamma = x'*y
  return gamma;
}

Mathematical notation: gamma = x'*(y - 2*x), with x = 1 and y random in [-1,1]
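A possible way to call foo(…) above is with the serial default implementations mentioned on slide 16; the assumption here is that SerialVectorSpace is constructed from the global dimension (the exact constructor is not shown in the slides).

  #include <iostream>

  int main()
  {
    // Serial vector space of dimension 100 (constructor signature assumed)
    const TSFCore::SerialVectorSpace<double> S(100);
    const double gamma = foo( S );  // gamma = x'*(y - 2*x) with x = 1, y random in [-1,1]
    std::cout << "gamma = " << gamma << std::endl;
    return 0;
  }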

19 TSFCore : Applying a Linear Operator

C++ prototype:

namespace TSFCore {

enum ETransp { NOTRANS, TRANS, CONJTRANS };

template<class Scalar>
class LinearOp : public virtual OpBase<Scalar> {
public:
  virtual void apply(
    ETransp M_trans, const Vector<Scalar>& x, Vector<Scalar>* y,
    Scalar alpha = 1.0, Scalar beta = 0.0
    ) const = 0;
};

} // end namespace TSFCore

Example:

template<class Scalar>
void myOp( const Vector<Scalar>& x, const LinearOp<Scalar>& M, Vector<Scalar>* y )
{
  M.apply( NOTRANS, x, y );
}
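For completeness, here is a short sketch, in the same style as myOp above, of the adjoint (transposed) mode noted on slide 15, y = alpha*M'*x + beta*y; only the apply(…) prototype and the TRANS enum value shown above are used, so nothing beyond the declared interface is assumed.

  template<class Scalar>
  void myAdjointUpdate(
    const LinearOp<Scalar> &M, const Vector<Scalar> &x, Vector<Scalar> *y,
    const Scalar alpha, const Scalar beta
    )
  {
    // y = alpha*M'*x + beta*y (x must lie in the range of M, y in its domain)
    M.apply( TRANS, x, y, alpha, beta );
  }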

20 Example ANA : Linear Conjugate Gradient Solver
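The CG example itself appears as an algorithm listing on the original slide. As a stand-in, here is a minimal sketch (not the actual TSFCore example code) of an unpreconditioned linear CG solver written only in terms of the operations introduced above: createMember(), assign(), dot(), norm_2(), Vp_StV(), Vt_S() and LinearOp::apply(). The workspace vector space is passed in explicitly to avoid assuming the exact accessor for an operator's domain space, and a real Scalar type is assumed for the convergence test.

  template<class Scalar>
  void cgSolve(
    const TSFCore::LinearOp<Scalar>    &A,     // symmetric positive-definite operator
    const TSFCore::Vector<Scalar>      &b,     // right-hand side
    const TSFCore::VectorSpace<Scalar> &space, // space used to create workspace vectors
    TSFCore::Vector<Scalar>            *x,     // in: initial guess, out: solution
    const Scalar                       tol,    // relative residual tolerance
    const int                          maxIter
    )
  {
    using Teuchos::RefCountPtr;
    RefCountPtr< TSFCore::Vector<Scalar> >
      r = space.createMember(),
      p = space.createMember(),
      q = space.createMember();
    assign( &*r, b );                              // r = b
    A.apply( TSFCore::NOTRANS, *x, &*r, -1.0, +1.0 ); // r = b - A*x
    assign( &*p, *r );                             // p = r
    Scalar rho = dot( *r, *r );
    const Scalar norm_b = norm_2( b );
    for( int iter = 0; iter < maxIter; ++iter ) {
      A.apply( TSFCore::NOTRANS, *p, &*q );        // q = A*p
      const Scalar alpha = rho / dot( *p, *q );
      Vp_StV( x,   +alpha, *p );                   // x += alpha*p
      Vp_StV( &*r, -alpha, *q );                   // r -= alpha*q
      if( norm_2( *r ) <= tol * norm_b ) break;    // converged (real Scalar assumed)
      const Scalar rho_new = dot( *r, *r );
      Vt_S( &*p, rho_new / rho );                  // p = (rho_new/rho)*p
      Vp_StV( &*p, 1.0, *r );                      // p += r
      rho = rho_new;
    }
  }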

21 Multi-vector Conjugate-Gradient Solver : Single Iteration

template<class Scalar>
void CGSolver<Scalar>::doIteration(
  const LinearOp<Scalar> &M, ETransp opM_notrans, ETransp opM_trans,
  MultiVector<Scalar> *X, Scalar a,
  const LinearOp<Scalar> *M_tilde_inv, ETransp opM_tilde_inv_notrans, ETransp opM_tilde_inv_trans
  ) const
{
  const Index m = currNumSystems_;
  int j;
  if( M_tilde_inv )
    M_tilde_inv->apply( opM_tilde_inv_notrans, *R_, &*Z_ );  // Z = inv(M~)*R (apply preconditioner)
  else
    assign( &*Z_, *R_ );                                     // Z = R (no preconditioner)
  dot( *Z_, *R_, &rho_[0] );                                 // rho[j] = Z(:,j)'*R(:,j)
  if( currIteration_ == 1 ) {
    assign( &*P_, *Z_ );                                     // P = Z
  }
  else {
    for( j = 0; j < m; ++j ) beta_[j] = rho_[j]/rho_old_[j];
    update( *Z_, &beta_[0], 1.0, &*P_ );                     // P(:,j) = Z(:,j) + beta[j]*P(:,j)
  }
  M.apply( opM_notrans, *P_, &*Q_ );                         // Q = M*P
  dot( *P_, *Q_, &gamma_[0] );                               // gamma[j] = P(:,j)'*Q(:,j)
  for( j = 0; j < m; ++j ) alpha_[j] = rho_[j]/gamma_[j];
  update( &alpha_[0], +1.0, *P_, X );                        // X(:,j) += alpha[j]*P(:,j)
  update( &alpha_[0], -1.0, *Q_, &*R_ );                     // R(:,j) -= alpha[j]*Q(:,j)
}

22 The TSFCore Trilinos package
· packages/TSFCore
  · src
    · interfaces
      · Core : VectorSpace, Vector, LinearOp etc …
      · Solvers : Iterative linear solver interfaces (unofficial!)
      · Nonlin : Nonlinear problem interfaces (unofficial!)
    · utilities
      · Core : Testing etc …
      · Solvers : Some iterative solvers (CG, BiCG, GMRES)
      · Nonlin : Testing etc …
    · adapters
      · mpi-base : Node classes for MPI-based vector spaces
      · Epetra : EpetraVectorSpace, EpetraVector etc …
  · examples …

23 TSFCore::Nonlin : Interfaces to Nonlinear Problems
Supported areas: NAND optimization, SAND optimization, nonlinear equations, multidisciplinary analysis, stability analysis / continuation, SFE
Interfaces:
· Function evaluations: NonlinearProblem
· (Nonsingular) state Jacobian evaluations and auxiliary Jacobian evaluations: NonlinearProblemFirstOrder

24 TSFCore::Nonlin : Interfaces to Nonlinear Problems
State constraints and response functions
Supported areas: SAND, nonlinear equations, multidisciplinary analysis, stability analysis / continuation, SFE

25 Summary
SAND Reports:
· R. A. Bartlett, M. A. Heroux and K. R. Long. TSFCore : A Package of Light-Weight Object-Oriented Abstractions for the Development of Abstract Numerical Algorithms and Interfacing to Linear Algebra Libraries and Applications, Sandia National Laboratories, SAND2003-1378, 2003
· R. A. Bartlett. TSFCore::Nonlin : An Extension of TSFCore for the Development of Nonlinear Abstract Numerical Algorithms and Interfacing to Nonlinear Applications, Sandia National Laboratories, SAND2003-1377, 2003
Location: Trilinos/doc/TSFCore

26 The End Thank You!

27 Extra Slides

28 Examples of Non-Standard Vector Operations
· Examples from OOQP (Gertz, Wright)
· Example from TRICE (Dennis, Heinkenschloss, Vicente)
· Example from IPOPT (Waechter)
Currently in MOOCHO : > 40 vector operations!

29 Goals for a Vector Interface
· Compute efficiency => near optimal performance
· Optimization developers add new operations => independence of linear-algebra library developers
· Compute environment independence => flexible optimization software
· Minimal number of methods => easy to write adapters

30 Approaches to Developing Vector Interfaces
(1) Linear algebra library allows direct access to vector elements
(2) Optimizer-specific interfaces
(3) General-purpose primitive vector operations

31 Vector Reduction/Transformation Operators Defined
Reduction/Transformation Operators (RTOp) defined:
· Element-wise transformation:  ( z1(i) … zq(i) ) ← op_t( i, v1(i) … vp(i), z1(i) … zq(i) )
· Element-wise reduction:       β ← op_r( i, v1(i) … vp(i), z1(i) … zq(i) )
· Reduction of intermediate reduction objects:  β2 ← op_rr( β1, β2 )
where
· v1 … vp ∈ R^n : p non-mutable input vectors
· z1 … zq ∈ R^n : q mutable input/output vectors
· β : reduction target object (may be non-scalar (e.g. {y_k, k}), or NULL)
Key to optimal performance: op_t(…) and op_r(…) are applied to entire sets of subvectors (i = a…b) independently:
  ( z1(a:b) … zq(a:b), β ) ← op( a, b, v1(a:b) … vp(a:b), z1(a:b) … zq(a:b), β )
Communication between sets of subvectors is needed only for β ≠ NULL: op_rr( β1, β2 ) → β2
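To make the block-wise picture above concrete, here is a toy, self-contained C++ illustration of the idea (this is not the actual RTOp/TSFCore interface): an element-wise reduction operator is applied to each contiguous subvector independently, and only the small intermediate reduction objects are combined afterwards. The operator chosen is a plain dot product, so the reduction object is a scalar and op_rr is addition.

  #include <algorithm>
  #include <cstddef>
  #include <vector>

  // Toy stand-in for a user-defined reduction operator (NOT the real RTOp API).
  struct DotProductOp {
    // op_r applied to one contiguous subvector block of length n
    static double apply_block( const double* v1, const double* v2,
                               std::size_t n, double beta )
    {
      for( std::size_t i = 0; i < n; ++i )
        beta += v1[i] * v2[i];
      return beta;
    }
    // op_rr: combine two intermediate reduction objects
    static double reduce( double beta1, double beta2 ) { return beta1 + beta2; }
  };

  // Apply the operator block by block; only the small reduction objects
  // (one double per block here) would need to be communicated in parallel.
  double blockwiseDot( const std::vector<double>& v1, const std::vector<double>& v2,
                       std::size_t blockSize )
  {
    double beta = 0.0;  // global reduction object
    for( std::size_t a = 0; a < v1.size(); a += blockSize ) {
      const std::size_t n = std::min( blockSize, v1.size() - a );
      const double betaLocal = DotProductOp::apply_block( &v1[a], &v2[a], n, 0.0 ); // op_r
      beta = DotProductOp::reduce( beta, betaLocal );                               // op_rr
    }
    return beta;
  }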

32 Object-Oriented Design for User-Defined RTOp Operators
Advantages:
· Functionality
  · Linear-algebra implementations can be changed with no impact on the optimizer
  · Optimizer developers can unilaterally add new vector operations
· Performance
  · Near optimal performance (large subvectors)
  · Multiple simultaneous global reductions => no sequential bottlenecks
  · No unnecessary temporary vectors or multiple vector reads/writes
Disadvantages:
· New concepts, initially harder to understand interfaces?

33 RTOp vs. Primitives : Communication
Compare:
· RTOp (all-at-once reduction, e.g. the ISIS++ QMR solver):
  { β1, β2, β3, β4, β5 } ← { (x^T x)^(1/2), (v^T v)^(1/2), (w^T w)^(1/2), w^T v, v^T t }
· Primitives (5 separate reductions):
  β1 ← (x^T x)^(1/2),  β2 ← (v^T v)^(1/2),  β3 ← (w^T w)^(1/2),  β4 ← w^T v,  β5 ← v^T t
* 128 processors on CPlant®

34 RTOp vs. Primitives : Multiple Ops and Temporaries
Compare:
· RTOp (all-at-once reduction):
  α = max{ α : x + α d ≥ β } = min{ max( (β - x(i))/d(i), 0 ), for i = 1 … n }
· Primitives (5 temporaries, 6 vector operations):
  -x(i) → u(i),  u(i) + β → v(i),  v(i)/d(i) → w(i),  0 → y(i),  max{ w(i), y(i) } → z(i),  min{ z(i), i = 1 … n } → α
* 1 processor (gcc 3.1 under Linux)

35 Parallel Scalability of MOOCHO
Question: Does OO C++ allow for good scalability for massively parallel computing (i.e. 100 to 10,000 processors)?
Scalable example NLP (m = n/2):
· Variable-reduction range / null space decomposition
· Diagonal matrices => all vector ops!
Where is the parallel bottleneck? Is it OO C++ or MPI? Answer => MPI
· Serial overhead of MOOCHO (n = 2, Np = 1) ≈ 0.41 milliseconds per rSQP iteration
· Overhead of MPI communication (Np = 4) ≈ 0.42 milliseconds per global reduction
* Red Hat Linux cluster (4 nodes), 2.0 GHz Intel P4 processors, MPICH 1.2.2.1
(Plot: iteration time vs. global dimension n = 20,000, 200,000, 2,000,000)

