An Overview of TSFCore

Presentation transcript:

An Overview of TSFCore
Roscoe A. Bartlett
9211, Optimization and Uncertainty Estimation
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.

TSFCore SAND Reports
Get most recent copy at: Trilinos/doc/TSFCore

Nonlinear Equations : Foundation for all our Work!
Applications
· Discretized PDEs (e.g. finite element, finite volume, finite difference etc.)
· Network problems (e.g. Xyce)

Nonlinear Equations : Sensitivities
Related Algorithms
· Gradient-based optimization
  · SAND
  · NAND
· Nonlinear equations (NLS)
· Multidisciplinary analysis
· Linear (matrix) analysis
  · Block iterative solvers
  · Eigenvalue problems
· Uncertainty quantification
  · SFE
· Stability analysis / continuation
· Transients (ODEs, DAEs)
B. van Bloemen Waanders, R. A. Bartlett, K. R. Long and P. T. Boggs. Large Scale Non-Linear Programming: PDE Applications and Dynamical Systems, Sandia National Laboratories, SAND , 2002

Applications, Algorithms, Linear-Algebra Software
· APP : Application (e.g. MPSalsa, Xyce, SIERRA, NEVADA etc.)
· LAL : Linear-Algebra Library (e.g. Petra/Ifpack, PETSc, Aztec etc.)
· ANA : Abstract Numerical Algorithm (e.g. optimization, nonlinear solvers, stability analysis, SFE, transient solvers etc.)
Key points
· Complex algorithms
· Complex software
· Complex interfaces
· Complex computers
· Duplication of effort?
Examples:
· Epetra_RowMatrix
· fei::Matrix
· TSF::MatrixOperator

TSFCore
Key points
· Maximizing development impact
· Software can be run on more sophisticated computers
· Fosters improved algorithm development
Diagram labels: TSFCore, TSFCore::Nonlin

Requirements for TSFCore
TSFCore should:
· Be portable to ASCI platforms
· Provide for stable and accurate numerical computations
· Represent a minimal but complete interface that will result in implementations that are:
  · Near optimal in computational speed
  · Near optimal in storage
· Be independent of computing environment (SPMD, MS, CS etc.)
· Be easy to develop adapters for existing libraries (e.g. Epetra, PETSc etc.)

Example ANA : Linear Conjugate Gradient Solver

TSFCore : Basic Linear Algebra Interfaces
Warning! Unified Modeling Language (UML) Notation!
· An operator knows its domain and range spaces
· A linear operator is a kind of operator
· A Vector knows its VectorSpace
· VectorSpaces create Vectors!

TSFCore : Basic Linear Algebra Interfaces

TSFCore : Basic Linear Algebra Interfaces
· A MultiVector is a linear operator!
· A MultiVector has a collection of column vectors!
· A MultiVector is a tall thin dense matrix
· VectorSpaces create MultiVectors!

TSFCore : Basic Linear Algebra Interfaces

TSFCore : Basic Linear Algebra Interfaces
The Key to success! Reduction/Transformation Operators
· Supports all needed vector operations
· Data/parallel independence
· Optimal performance
R. A. Bartlett, B. G. van Bloemen Waanders and M. A. Heroux. Vector Reduction/Transformation Operators, Accepted to ACM TOMS, 2003

Background for TSFCore
· 1996 : Hilbert Class Library (HCL), [Symes and Gockenbach]
  · Abstract vector spaces, vectors, linear operators
· 2000 : Epetra, [Heroux]
  · Concrete multi-vectors
· 2001 : Trilinos Solver Framework (TSF) 0.1, [Long]
· 2001 : AbstractLinAlgPack (ALAP) (MOOCHO LA interfaces), [Bartlett]
  · Reduction/transformation operators (RTOp)
  · Abstract multi-vectors

TSFCore : Basic Linear Algebra Interfaces
· VectorSpaceFactory is related to MultiVectors
· VectorSpaces create Vectors and MultiVectors!
· MultiVector subviews can be created!
· Vector and MultiVector versions of apply(…)!
· Adjoints supported but are optional!
· Only one vector method!

TSFCore Details
· All interfaces are templated on Scalar type (support real and complex)
· Smart reference counted pointer class Teuchos::RefCountPtr<> used for all dynamic memory management
· Many operations have default implementations based on very few pure virtual methods
· RTOp operators (and wrapper functions) are provided for many common level-1 vector and multi-vector operations
· Default implementation provided for MultiVector ( MultiVectorCols )
· Default implementations provided for serial computation: VectorSpace ( SerialVectorSpace ), VectorSpaceFactory ( SerialVectorSpaceFactory ), Vector ( SerialVector )

Vector-Vector Operations Provided with TSFCore

namespace TSFCore {
  template<class Scalar> Scalar sum( const Vector<Scalar>& v );       // result = sum(v(i))
  template<class Scalar> Scalar norm_1( const Vector<Scalar>& v );    // result = ||v||1
  template<class Scalar> Scalar norm_2( const Vector<Scalar>& v );    // result = ||v||2
  template<class Scalar> Scalar norm_inf( const Vector<Scalar>& v_rhs ); // result = ||v||inf
  template<class Scalar> Scalar dot( const Vector<Scalar>& x, const Vector<Scalar>& y ); // result = x'*y
  template<class Scalar> Scalar get_ele( const Vector<Scalar>& v, Index i ); // result = v(i)
  template<class Scalar> void set_ele( Index i, Scalar alpha, Vector<Scalar>* v ); // v(i) = alpha
  template<class Scalar> void assign( Vector<Scalar>* y, const Scalar& alpha ); // y = alpha
  template<class Scalar> void assign( Vector<Scalar>* y, const Vector<Scalar>& x ); // y = x
  template<class Scalar> void Vp_S( Vector<Scalar>* y, const Scalar& alpha ); // y += alpha
  template<class Scalar> void Vt_S( Vector<Scalar>* y, const Scalar& alpha ); // y *= alpha
  template<class Scalar> void Vp_StV( Vector<Scalar>* y, const Scalar& alpha, const Vector<Scalar>& x ); // y = alpha*x + y
  template<class Scalar> void ele_wise_prod( const Scalar& alpha, const Vector<Scalar>& x,
    const Vector<Scalar>& v, Vector<Scalar>* y );                      // y(i)+=alpha*x(i)*v(i)
  template<class Scalar> void ele_wise_divide( const Scalar& alpha, const Vector<Scalar>& x,
    const Vector<Scalar>& v, Vector<Scalar>* y );                      // y(i)=alpha*x(i)/v(i)
  template<class Scalar> void seed_randomize( unsigned int );          // Seed for randomize()
  template<class Scalar> void randomize( Scalar l, Scalar u, Vector<Scalar>* v ); // v(i) = random(l,u)
} // end namespace TSFCore

TSFCore : Vectors and Vector Spaces
C++ code:

template<class Scalar>
Scalar foo( const VectorSpace<Scalar>& S )
{
  Teuchos::RefCountPtr<Vector<Scalar> >
    x = S.createMember(),          // create x
    y = S.createMember();          // create y
  assign( &*x, 1.0 );              // x = 1
  randomize( -1.0, +1.0, &*y );    // y = rand(-1,1)
  Vp_StV( &*y, -2.0, *x );         // y += -2.0 * x
  Scalar gamma = dot(*x,*y);       // gamma = x'*y
  return gamma;
}

Mathematical notation: $x \leftarrow 1$, $y \leftarrow \mathrm{rand}(-1,1)$, $y \leftarrow y - 2x$, $\gamma \leftarrow x^T y$
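The pattern on this slide (a space creates its vectors, and client code touches them only through abstract interfaces) can be sketched without Trilinos. The following is a minimal stand-alone illustration, not the TSFCore classes: `DemoVectorSpace`, `DemoVector`, and the `get`/`set` element accessors are hypothetical names introduced here, `std::shared_ptr` stands in for `Teuchos::RefCountPtr`, and `randomize` is replaced by a constant `assign` so the result is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

template<class Scalar> class Vector;

// Abstract space: the only way client code obtains new vectors.
template<class Scalar>
class VectorSpace {
public:
  virtual ~VectorSpace() {}
  virtual std::size_t dim() const = 0;
  virtual std::shared_ptr<Vector<Scalar> > createMember() const = 0;
};

// Abstract vector. Element access via get/set is an assumption made to keep
// this sketch short; it is not claimed to match the TSFCore Vector interface.
template<class Scalar>
class Vector {
public:
  virtual ~Vector() {}
  virtual const VectorSpace<Scalar>& space() const = 0;  // a vector knows its space
  virtual Scalar get(std::size_t i) const = 0;
  virtual void set(std::size_t i, Scalar value) = 0;
};

// Hypothetical serial implementation backed by std::vector.
template<class Scalar>
class DemoVector : public Vector<Scalar> {
public:
  explicit DemoVector(const VectorSpace<Scalar>& space)
    : space_(&space), data_(space.dim(), Scalar(0)) {}
  const VectorSpace<Scalar>& space() const override { return *space_; }
  Scalar get(std::size_t i) const override { return data_[i]; }
  void set(std::size_t i, Scalar value) override { data_[i] = value; }
private:
  const VectorSpace<Scalar>* space_;  // the space must outlive its vectors
  std::vector<Scalar> data_;
};

template<class Scalar>
class DemoVectorSpace : public VectorSpace<Scalar> {
public:
  explicit DemoVectorSpace(std::size_t dim) : dim_(dim) {}
  std::size_t dim() const override { return dim_; }
  std::shared_ptr<Vector<Scalar> > createMember() const override {
    return std::make_shared<DemoVector<Scalar> >(*this);
  }
private:
  std::size_t dim_;
};

// Free functions written only against the abstract interfaces.
template<class Scalar>
void assign(Vector<Scalar>* y, Scalar alpha) {             // y = alpha
  for (std::size_t i = 0; i < y->space().dim(); ++i) y->set(i, alpha);
}

template<class Scalar>
void Vp_StV(Vector<Scalar>* y, Scalar alpha, const Vector<Scalar>& x) {  // y += alpha*x
  for (std::size_t i = 0; i < y->space().dim(); ++i) y->set(i, y->get(i) + alpha * x.get(i));
}

template<class Scalar>
Scalar dot(const Vector<Scalar>& x, const Vector<Scalar>& y) {  // x'*y (real case)
  Scalar s = Scalar(0);
  for (std::size_t i = 0; i < x.space().dim(); ++i) s += x.get(i) * y.get(i);
  return s;
}

// Same shape as foo() on the slide, with y = 0.5 instead of a random fill.
template<class Scalar>
Scalar foo(const VectorSpace<Scalar>& S) {
  std::shared_ptr<Vector<Scalar> > x = S.createMember(), y = S.createMember();
  assign(x.get(), Scalar(1));        // x = 1
  assign(y.get(), Scalar(0.5));      // y = 0.5
  Vp_StV(y.get(), Scalar(-2), *x);   // y += -2*x
  return dot(*x, *y);                // gamma = x'*y
}
```

Because `foo` never names a concrete vector type, swapping `DemoVectorSpace` for another implementation would leave it unchanged, which is the point the slide makes.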

TSFCore : Applying a Linear Operator
C++ Prototype:

namespace TSFCore {
  enum ETransp { NOTRANS, TRANS, CONJTRANS };
  template<class Scalar>
  class LinearOp : public virtual OpBase<Scalar> {
  public:
    virtual void apply( ETransp M_trans, const Vector<Scalar> &x, Vector<Scalar> *y,
                        Scalar alpha = 1.0, Scalar beta = 0.0 ) const = 0;
  };
}

Example:

template<class Scalar>
void myOp( const Vector<Scalar> &x, const LinearOp<Scalar> &M, Vector<Scalar> *y )
{
  M.apply( NOTRANS, x, y );
}
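To make the role of the `alpha` and `beta` defaults concrete, here is a small stand-alone sketch of an operator with the same calling shape. It assumes the usual BLAS-style meaning y = alpha*op(M)*x + beta*y, which the slide does not spell out; `SimpleLinearOp` and `DenseOp` are hypothetical names, and `std::vector` replaces the abstract `Vector` type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum ETransp { NOTRANS, TRANS };

// Hypothetical stand-in for an abstract linear operator.
template<class Scalar>
class SimpleLinearOp {
public:
  virtual ~SimpleLinearOp() {}
  // Assumed semantics: y = alpha*op(M)*x + beta*y, with op(M) = M or M^T.
  virtual void apply(ETransp trans, const std::vector<Scalar>& x,
                     std::vector<Scalar>* y,
                     Scalar alpha = Scalar(1), Scalar beta = Scalar(0)) const = 0;
};

// Dense row-major matrix implementing the interface.
template<class Scalar>
class DenseOp : public SimpleLinearOp<Scalar> {
public:
  DenseOp(std::size_t rows, std::size_t cols, const std::vector<Scalar>& a)
    : rows_(rows), cols_(cols), a_(a) { assert(a.size() == rows * cols); }

  void apply(ETransp trans, const std::vector<Scalar>& x, std::vector<Scalar>* y,
             Scalar alpha = Scalar(1), Scalar beta = Scalar(0)) const override {
    const std::size_t m = (trans == NOTRANS) ? rows_ : cols_;  // length of y
    const std::size_t n = (trans == NOTRANS) ? cols_ : rows_;  // length of x
    assert(x.size() == n && y->size() == m);
    for (std::size_t i = 0; i < m; ++i) {
      Scalar s = Scalar(0);
      for (std::size_t j = 0; j < n; ++j)
        s += (trans == NOTRANS ? a_[i * cols_ + j] : a_[j * cols_ + i]) * x[j];
      // beta == 0 overwrites y, so stale values in y are never read.
      (*y)[i] = (beta == Scalar(0)) ? alpha * s : alpha * s + beta * (*y)[i];
    }
  }

private:
  std::size_t rows_, cols_;
  std::vector<Scalar> a_;
};
```

With the defaults, `M.apply(NOTRANS, x, &y)` is a plain product y = M*x; passing `alpha` and `beta` explicitly lets a caller accumulate into `y` without a temporary.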

Example ANA : Linear Conjugate Gradient Solver

Multi-vector Conjugate-Gradient Solver : Single Iteration

template<class Scalar>
void CGSolver<Scalar>::doIteration(
  const LinearOp<Scalar> &M, ETransp opM_notrans, ETransp opM_trans,
  MultiVector<Scalar> *X, Scalar a,
  const LinearOp<Scalar> *M_tilde_inv, ETransp opM_tilde_inv_notrans,
  ETransp opM_tilde_inv_trans ) const
{
  const Index m = currNumSystems_;
  int j;
  // Apply the preconditioner if one is given, otherwise Z = R
  if( M_tilde_inv )
    M_tilde_inv->apply( opM_tilde_inv_notrans, *R_, &*Z_ );
  else
    assign( &*Z_, *R_ );
  dot( *Z_, *R_, &rho_[0] );
  if( currIteration_ == 1 ) {
    assign( &*P_, *Z_ );
  }
  else {
    for(j=0;j<m;++j) beta_[j] = rho_[j]/rho_old_[j];
    update( *Z_, &beta_[0], 1.0, &*P_ );
  }
  M.apply( opM_notrans, *P_, &*Q_ );
  dot( *P_, *Q_, &gamma_[0] );
  for(j=0;j<m;++j) alpha_[j] = rho_[j]/gamma_[j];
  update( &alpha_[0], +1.0, *P_, X );
  update( &alpha_[0], -1.0, *Q_, &*R_ );
}
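For readers who want to run the iteration, the following is a simplified stand-alone sketch of the same recurrence for a single right-hand side and no preconditioner (so Z = R). It is not the TSFCore solver: `cgSolve`, the `std::function` operator argument, and the stopping test on the relative residual are assumptions made for this example. The names `rho`, `beta`, `gamma`, and `alpha` follow the slide.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

typedef std::vector<double> Vec;

double dotProd(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Solves M*x = b for a symmetric positive-definite M that is known only
// through its action applyM(v, &y), i.e. y = M*v. On entry *x holds the
// initial guess. Returns the number of iterations performed.
int cgSolve(const std::function<void(const Vec&, Vec*)>& applyM,
            const Vec& b, Vec* x, double tol, int maxIter) {
  const std::size_t n = b.size();
  assert(x->size() == n);
  Vec r(n), p(n), q(n);
  applyM(*x, &q);
  for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - q[i];  // r = b - M*x
  const double bnorm = std::sqrt(dotProd(b, b));
  double rhoOld = 0.0;
  for (int iter = 1; iter <= maxIter; ++iter) {
    const double rho = dotProd(r, r);            // rho = z'*r with z = r
    if (std::sqrt(rho) <= tol * bnorm) return iter - 1;
    if (iter == 1) {
      p = r;                                     // p = z
    } else {
      const double beta = rho / rhoOld;
      for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];  // p = z + beta*p
    }
    applyM(p, &q);                               // q = M*p
    const double gamma = dotProd(p, q);
    const double alpha = rho / gamma;
    for (std::size_t i = 0; i < n; ++i) {
      (*x)[i] += alpha * p[i];                   // x += alpha*p
      r[i]    -= alpha * q[i];                   // r -= alpha*q
    }
    rhoOld = rho;
  }
  return maxIter;
}
```

The solver only ever calls `applyM`, which mirrors how the slide's code touches `M` solely through `apply`.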

The TSFCore Trilinos package
· packages/TSFCore
  · src
    · interfaces
      · Core : VectorSpace, Vector, LinearOp etc …
      · Solvers : Iterative linear solver interfaces (unofficial!)
      · Nonlin : Nonlinear problem interfaces (unofficial!)
    · utilities
      · Core : Testing etc …
      · Solvers : Some iterative solvers (CG, BiCG, GMRES)
      · Nonlin : Testing etc …
    · adapters
      · mpi-base : Node classes for MPI-based vector spaces
      · Epetra : EpetraVectorSpace, EpetraVector etc …
  · examples …

TSFCore::Nonlin : Interfaces to Nonlinear Problems
Supported Areas
· NAND optimization
· SAND optimization
· Nonlinear equations
· Multidisciplinary analysis
· Stability analysis / continuation
· SFE
NonlinearProblem
· Function evaluations
NonlinearProblemFirstOrder
· (nonsingular) State Jacobian evaluations
· Auxiliary Jacobian evaluations

TSFCore::Nonlin : Interfaces to Nonlinear Problems
State constraints and response functions
Supported Areas
· SAND
· Nonlinear equations
· Multidisciplinary analysis
· Stability analysis / continuation
· SFE

Summary
SAND Reports
· R. A. Bartlett, M. A. Heroux and K. R. Long. TSFCore : A Package of Light-Weight Object-Oriented Abstractions for the Development of Abstract Numerical Algorithms and Interfacing to Linear Algebra Libraries and Applications, Sandia National Laboratories, SAND , 2003
· R. A. Bartlett, TSFCore::Nonlin : An Extension of TSFCore for the Development of Nonlinear Abstract Numerical Algorithms and Interfacing to Nonlinear Applications, Sandia National Laboratories, SAND , 2003
Location: Trilinos/doc/TSFCore

The End Thank You!

Extra Slides

Examples of Non-Standard Vector Operations
· Examples from OOQP (Gertz, Wright)
· Example from TRICE (Dennis, Heinkenschloss, Vicente)
· Example from IPOPT (Waechter)
Currently in MOOCHO : > 40 vector operations!

Goals for a Vector Interface
· Compute efficiency => Near optimal performance
· Optimization developers add new operations => Independence of linear algebra library developers
· Compute environment independence => Flexible optimization software
· Minimal number of methods => Easy to write adapters

Approaches to Developing Vector Interfaces
(1) Linear algebra library allows direct access to vector elements
(2) Optimizer-specific interfaces
(3) General-purpose primitive vector operations

Vector Reduction/Transformation Operators Defined
Reduction/Transformation Operators (RTOp) Defined
· $z^1_i \ldots z^q_i \leftarrow \mathrm{op}_t( i, v^1_i \ldots v^p_i, z^1_i \ldots z^q_i )$ : element-wise transformation
· $\beta \leftarrow \mathrm{op}_r( i, v^1_i \ldots v^p_i, z^1_i \ldots z^q_i )$ : element-wise reduction
· $\beta_2 \leftarrow \mathrm{op}_{rr}( \beta_1, \beta_2 )$ : reduction of intermediate reduction objects
· $v^1 \ldots v^p \in \mathbb{R}^n$ : p non-mutable input vectors
· $z^1 \ldots z^q \in \mathbb{R}^n$ : q mutable input/output vectors
· $\beta$ : reduction target object (may be non-scalar (e.g. $\{y_k, k\}$), or NULL)
Key to Optimal Performance
· $\mathrm{op}_t(\ldots)$ and $\mathrm{op}_r(\ldots)$ applied to entire sets of subvectors ($i = a \ldots b$) independently: $z^1_{a:b} \ldots z^q_{a:b}, \beta \leftarrow \mathrm{op}( a, b, v^1_{a:b} \ldots v^p_{a:b}, z^1_{a:b} \ldots z^q_{a:b}, \beta )$
· Communication between sets of subvectors only for $\beta \neq \mathrm{NULL}$, $\mathrm{op}_{rr}( \beta_1, \beta_2 ) \rightarrow \beta_2$
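The idea defined above can be illustrated with a toy operator that combines a transformation and a reduction in one pass. This is a hedged sketch, not the RTOp C or C++ interface: `AxpySumSqOp`, `applyOp`, `reduceReductObjs`, and `applyOpInChunks` are hypothetical names, the reduction object (called beta here by assumption) is a plain `double`, and the "processes" are simulated by contiguous chunks of one `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy user-defined operator with one input vector v and one in/out vector z:
//   transformation: z_i <- z_i + alpha*v_i
//   reduction:      beta <- beta + z_i^2   (using the updated z_i)
struct AxpySumSqOp {
  double alpha;

  // Applied to a whole sub-vector [a, b) at once; beta is the local
  // reduction object for that sub-vector.
  void applyOp(std::size_t a, std::size_t b,
               const double* v, double* z, double* beta) const {
    for (std::size_t i = a; i < b; ++i) {
      z[i] += alpha * v[i];
      *beta += z[i] * z[i];
    }
  }

  // Combines two intermediate reduction objects: beta2 <- beta1 (+) beta2.
  void reduceReductObjs(const double& beta1, double* beta2) const {
    *beta2 += beta1;
  }
};

// Driver that knows nothing about what the operator computes. Each chunk
// plays the role of the elements owned by one process.
template<class Op>
double applyOpInChunks(const Op& op, const std::vector<double>& v,
                       std::vector<double>* z, std::size_t numChunks) {
  assert(v.size() == z->size() && numChunks > 0);
  const std::size_t n = v.size();
  std::vector<double> partial(numChunks, 0.0);
  for (std::size_t k = 0; k < numChunks; ++k) {
    const std::size_t a = k * n / numChunks;
    const std::size_t b = (k + 1) * n / numChunks;
    op.applyOp(a, b, v.data(), z->data(), &partial[k]);  // independent per chunk
  }
  double beta = 0.0;
  for (std::size_t k = 0; k < numChunks; ++k)
    op.reduceReductObjs(partial[k], &beta);  // the only cross-chunk step
  return beta;
}
```

In a distributed setting the final loop would be replaced by a global reduction; everything before it touches only locally owned elements, which is why the vector implementation and the operator can be developed independently.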

Object-Oriented Design for User Defined RTOp Operators
Advantages:
· Functionality
  · Linear-algebra implementations can be changed with no impact on optimizer
  · Optimizer developers can unilaterally add new vector operations
· Performance
  · Near optimal performance (large subvectors)
  · Multiple simultaneous global reductions => no sequential bottlenecks
  · No unnecessary temporary vectors or multiple vector read/writes
Disadvantages:
· New concepts, initially harder to understand interfaces?

RTOp vs. Primitives : Communication
Compare
– RTOp (all-at-once reduction (i.e. ISIS++ QMR solver)): the five scalars $\{ (x^T x)^{1/2}, (v^T v)^{1/2}, (w^T w)^{1/2}, w^T v, v^T t \}$ are computed in a single global reduction
– Primitives (5 separate reductions): $(x^T x)^{1/2}$, $(v^T v)^{1/2}$, $(w^T w)^{1/2}$, $w^T v$, $v^T t$ are each computed by their own global reduction
* 128 processors on CPlant®

RTOp vs. Primitives : Multiple Ops and Temporaries
Compare
– RTOp (all-at-once reduction): $\alpha \leftarrow \{ \max \alpha : x + \alpha d \geq \beta \} = \min\{ \max( (\beta - x_i)/d_i, 0 ), \text{ for } i = 1 \ldots n \}$
– Primitives (5 temporaries, 6 vector operations): $-x_i \rightarrow u_i$, $u_i + \beta \rightarrow v_i$, $v_i / d_i \rightarrow w_i$, $0 \rightarrow y_i$, $\max\{w_i, y_i\} \rightarrow z_i$, $\min\{z_i, i = 1 \ldots n\} \rightarrow \alpha$
* 1 processor (gcc 3.1 under Linux)
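The contrast on this slide can be made concrete with a short sketch. It assumes the quantity being computed is the step length alpha = min over i of max((beta - x_i)/d_i, 0); the symbol names alpha and beta, and the function names `maxStepFused` and `maxStepPrimitives`, are assumptions introduced here, and every d_i is assumed nonzero. The first function makes one pass with no temporaries, the second mimics a library that offers only primitive vector operations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<double> Vec;

// Fused version: one loop over the data, no temporary vectors.
double maxStepFused(const Vec& x, const Vec& d, double beta) {
  assert(x.size() == d.size());
  double alpha = std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < x.size(); ++i)
    alpha = std::min(alpha, std::max((beta - x[i]) / d[i], 0.0));
  return alpha;
}

// Primitive-style version: five temporaries (u, v, w, y, z) and six
// separate vector operations, each a full pass over the data.
double maxStepPrimitives(const Vec& x, const Vec& d, double beta) {
  assert(x.size() == d.size());
  const std::size_t n = x.size();
  Vec u(n), v(n), w(n), y(n), z(n);
  for (std::size_t i = 0; i < n; ++i) u[i] = -x[i];                  // 1: u = -x
  for (std::size_t i = 0; i < n; ++i) v[i] = u[i] + beta;            // 2: v = u + beta
  for (std::size_t i = 0; i < n; ++i) w[i] = v[i] / d[i];            // 3: w = v ./ d
  for (std::size_t i = 0; i < n; ++i) y[i] = 0.0;                    // 4: y = 0
  for (std::size_t i = 0; i < n; ++i) z[i] = std::max(w[i], y[i]);   // 5: z = max(w, y)
  double alpha = std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < n; ++i) alpha = std::min(alpha, z[i]); // 6: alpha = min(z)
  return alpha;
}
```

Both functions return the same value; the difference is memory traffic, since the second reads or writes full-length vectors six times and allocates five of them.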

Parallel Scalability of MOOCHO
Question: Does OO C++ allow for good scalability for massively parallel computing (i.e. 100 to processors)?
Scalable example NLP (m = n/2)
· Variable reduction range / null space decomposition
· Diagonal matrices => All vector ops!
Where is the parallel bottleneck? Is it OO C++ or MPI?
Answer => MPI
· Serial overhead of MOOCHO (n=2, N_p=1) ≈ 0.41 milliseconds per rSQP iteration
· Overhead of MPI communication (N_p=4) ≈ 0.42 milliseconds per global reduction
* Red Hat Linux cluster (4 nodes), 2.0 GHz Intel P4 processors, MPICH
Plot axis: n (global dim)