
Material adapted from a tutorial by: PETSc Satish Balay Bill Gropp Lois Curfman McInnes Barry Smith www-fp.mcs.anl.gov/petsc/docs/tutorials/index.htm Mathematics and Computer Science Division Argonne National Laboratory

PETSc Philosophy Writing hand-parallelized (message-passing or distributed shared memory) application codes from scratch is extremely difficult and time-consuming. Scalable parallelizing compilers for real application codes are very far in the future. We can ease the development of parallel application codes by developing general-purpose, parallel numerical PDE libraries.

PETSc Concepts in the Tutorial How to specify the mathematics of the problem –Data objects vectors, matrices How to solve the problem –Solvers linear, nonlinear, and timestepping (ODE) solvers Parallel computing complications –Parallel data layout structured and unstructured meshes

Tutorial Outline Getting started –sample results –programming paradigm Data objects –vectors (e.g., field variables) –matrices (e.g., sparse Jacobians) Viewers –object information –visualization Solvers –linear –nonlinear –timestepping (and ODEs) Data layout and ghost values –structured mesh problems –unstructured mesh problems –partitioning and coloring Putting it all together –a complete example Debugging and error handling Profiling and performance tuning –-log_summary –stages and preloading –user-defined events Extensibility issues

Tutorial Approach From the perspective of an application programmer: Beginner (1) –basic functionality, intended for use by most programmers Intermediate (2) –selecting options, performance evaluation and tuning Advanced (3) –user-defined customization of algorithms and data structures Developer (4) –advanced customizations, intended primarily for use by library developers

Incremental Application Improvement Beginner –Get the application “up and walking” Intermediate –Experiment with options –Determine opportunities for improvement Advanced –Extend algorithms and/or data structures as needed Developer –Consider interface and efficiency issues for integration and interoperability of multiple toolkits

The PETSc Programming Model Goals –Portable, runs everywhere –Performance –Scalable parallelism Approach –Distributed memory, “shared-nothing” Requires only a compiler (single node or processor) Access to data on remote machines through MPI –Can still exploit “compiler discovered” parallelism on each node (e.g., SMP) –Hide within parallel objects the details of the communication –User orchestrates communication at a higher abstract level than message passing tutorial introduction

Collectivity MPI communicators (MPI_Comm) specify collectivity (processors involved in a computation) All PETSc creation routines for solver and data objects are collective with respect to a communicator, e.g., VecCreate(MPI_Comm comm, int m, int M, Vec *x) Some operations are collective, while others are not, e.g., –collective: VecNorm( ) –not collective: VecGetLocalSize() If a sequence of collective routines is used, they must be called in the same order on each processor tutorial introduction

Computation and Communication Kernels MPI, MPI-IO, BLAS, LAPACK Profiling Interface PETSc PDE Application Codes Object-Oriented Matrices, Vectors, Indices Grid Management Linear Solvers Preconditioners + Krylov Methods Nonlinear Solvers, Unconstrained Minimization ODE Integrators Visualization Interface PDE Application Codes tutorial introduction

PETSc Numerical Components
Matrices: Compressed Sparse Row (AIJ), Blocked Compressed Sparse Row (BAIJ), Block Diagonal (BDIAG), Dense, Other
Index Sets: Indices, Block Indices, Stride, Other
Vectors
Nonlinear Solvers: Newton-based Methods (Line Search, Trust Region), Other
Preconditioners: Additive Schwarz, Block Jacobi, ILU, ICC, LU (Sequential only), Others
Time Steppers: Euler, Backward Euler, Pseudo Time Stepping, Other
Krylov Subspace Methods: GMRES, CG, CGS, Bi-CG-STAB, TFQMR, Richardson, Chebychev, Other
tutorial introduction
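The AIJ format above stores, for each row, only the nonzero values and their column indices. As a rough illustration (not PETSc's internal data structure, which adds preallocation and parallel bookkeeping), a minimal compressed-sparse-row layout and matrix-vector product might look like this; all type and function names here are hypothetical:

```c
#include <stddef.h>

/* Minimal CSR (AIJ-style) sparse matrix: row pointers, column
   indices, and one value per stored nonzero. */
typedef struct {
    int n;             /* number of rows                      */
    const int *rowptr; /* length n+1; row i spans [rowptr[i], rowptr[i+1]) */
    const int *col;    /* column index of each nonzero        */
    const double *val; /* value of each nonzero               */
} CSRMatrix;

/* y = A*x: each row accumulates only its stored nonzeros. */
void csr_matvec(const CSRMatrix *A, const double *x, double *y)
{
    for (int i = 0; i < A->n; i++) {
        double sum = 0.0;
        for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; k++)
            sum += A->val[k] * x[A->col[k]];
        y[i] = sum;
    }
}
```

The blocked variants (BAIJ, BDIAG) apply the same idea to small dense blocks rather than scalar entries, which improves cache reuse for multi-component PDEs.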

Flow of Control for PDE Solution Main Routine → Timestepping Solvers (TS) → Nonlinear Solvers (SNES) → Linear Solvers (SLES) → KSP, PC. PETSc code: Main Routine and the solver layers (TS, SNES, SLES, KSP, PC). User code: Application Initialization, Function Evaluation, Jacobian Evaluation, Post-Processing. tutorial introduction

True Portability Tightly coupled systems –Cray T3D/T3E –SGI/Origin –IBM SP –Convex Exemplar Loosely coupled systems, e.g., networks of workstations –Sun OS, Solaris 2.5 –IBM AIX –DEC Alpha –HP –Linux –FreeBSD –Windows NT/95 tutorial introduction

Data Objects Object creation Object assembly Setting options Viewing User-defined customizations Vectors ( Vec ) –focus: field data arising in nonlinear PDEs Matrices ( Mat ) –focus: linear operators arising in nonlinear PDEs (i.e., Jacobians) tutorial outline: data objects beginner intermediate advanced intermediate

Vectors Fundamental objects for storing field solutions, right-hand sides, etc. VecCreateMPI(...,Vec *) –MPI_Comm - processors that share the vector –number of elements local to this processor –total number of elements Each process locally owns a subvector of contiguously numbered global indices data objects: vectors beginner proc 3 proc 2 proc 0 proc 4 proc 1
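When the local size is left to the library (PETSC_DECIDE in the local-size argument), PETSc picks a contiguous block partition. A plain-C sketch of one such convention, assumed here to give the first M % size ranks one extra entry, shows how each process can compute its owned index range without communication (the function name is hypothetical):

```c
/* Contiguous block partition of M global indices over `size`
   processes: on return, [*rstart, *rend) is the range owned by
   `rank`. Assumed convention: the first M % size ranks each get
   one extra entry. */
void ownership_range(int M, int size, int rank, int *rstart, int *rend)
{
    int base = M / size, extra = M % size;
    int local = base + (rank < extra ? 1 : 0);
    /* ranks below `extra` each contributed base+1 entries */
    *rstart = rank * base + (rank < extra ? rank : extra);
    *rend = *rstart + local;
}
```

For M = 20 over 3 processes this yields local sizes 7, 7, 6, matching the picture of contiguously numbered global indices per process.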

Vectors: Example

static char help[] = "Basic vector routines.\n";
#include "vec.h"
int main(int argc,char **argv)
{
  Vec x,y,w;   /* vectors */
  Vec *z;      /* array of vectors */
  double norm,v,v1,v2;
  int n = 20,ierr;
  Scalar one = 1.0,two = 2.0,three = 3.0,dots[3],dot;

  PetscInitialize(&argc,&argv,(char*)0,help);
  ierr = OptionsGetInt(PETSC_NULL,"n",&n,PETSC_NULL);CHKERRA(ierr);
  ierr = VecCreate(PETSC_COMM_WORLD,PETSC_DECIDE,n,&x);CHKERRA(ierr);
  ierr = VecSetFromOptions(x);CHKERRA(ierr);
  ierr = VecDuplicate(x,&y);CHKERRA(ierr);
  ierr = VecDuplicate(x,&w);CHKERRA(ierr);
  ierr = VecDuplicateVecs(x,3,&z);CHKERRA(ierr); /* allocate z before use */
  ierr = VecSet(&one,x);CHKERRA(ierr);
  ierr = VecSet(&two,y);CHKERRA(ierr);
  ierr = VecSet(&one,z[0]);CHKERRA(ierr);
  ierr = VecSet(&two,z[1]);CHKERRA(ierr);
  ierr = VecSet(&three,z[2]);CHKERRA(ierr);
  ierr = VecDot(x,x,&dot);CHKERRA(ierr);
  ierr = VecMDot(3,x,z,dots);CHKERRA(ierr);

Vectors: Example (continued)

  ierr = PetscPrintf(PETSC_COMM_WORLD,"Vector length %d\n",(int)dot);CHKERRA(ierr);
  ierr = VecScale(&two,x);CHKERRA(ierr);
  ierr = VecNorm(x,NORM_2,&norm);CHKERRA(ierr);
  v = norm-2.0*sqrt((double)n);
  if (v > -1.e-10 && v < 1.e-10) v = 0.0;
  ierr = PetscPrintf(PETSC_COMM_WORLD,"VecScale %g\n",v);CHKERRA(ierr);
  ierr = VecDestroy(x);CHKERRA(ierr);
  ierr = VecDestroy(y);CHKERRA(ierr);
  ierr = VecDestroy(w);CHKERRA(ierr);
  ierr = VecDestroyVecs(z,3);CHKERRA(ierr);
  PetscFinalize();
  return 0;
}

Vector Assembly VecSetValues(Vec,…) –number of entries to insert/add –indices of entries –values to add –mode: [INSERT_VALUES,ADD_VALUES] VecAssemblyBegin(Vec) VecAssemblyEnd(Vec) data objects: vectors beginner
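The two insertion modes behave like overwrite versus accumulate on the indexed entries. A toy model of that semantics on a purely local array (names here are hypothetical; the real VecSetValues also buffers off-process entries until VecAssemblyBegin/End) might read:

```c
typedef enum { MODE_INSERT, MODE_ADD } InsertModeSketch;

/* Apply `nidx` (index, value) pairs to a local array:
   MODE_INSERT overwrites each indexed entry, MODE_ADD accumulates. */
void set_values_sketch(double *v, int nidx, const int *idx,
                       const double *vals, InsertModeSketch mode)
{
    for (int i = 0; i < nidx; i++) {
        if (mode == MODE_INSERT) v[idx[i]] = vals[i];
        else                     v[idx[i]] += vals[i];
    }
}
```

Mixing the two modes on the same object between assembly calls is what the assembly-begin/end pairing is designed to sort out.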

Parallel Matrix and Vector Assembly Processors may generate any entries in vectors and matrices Entries need not be generated on the processor on which they ultimately will be stored PETSc automatically moves data during the assembly process if necessary data objects: vectors and matrices beginner
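The mechanism behind "PETSc automatically moves data" can be pictured as a stash: an entry whose global index falls outside the caller's owned range is buffered and communicated during assembly, while owned entries are applied immediately. A minimal local sketch of that decision (all names hypothetical; the real implementation batches and exchanges stashes in VecAssemblyBegin/End and MatAssemblyBegin/End) might be:

```c
/* One deferred off-process contribution. */
typedef struct { int idx; double val; } StashEntry;

/* Add `val` at global index `gidx`. If `gidx` lies in this process's
   owned range [rstart, rend), apply it to the local array; otherwise
   append it to `stash` for later communication. Returns the new
   stash length. */
int stash_or_apply(double *local, int rstart, int rend,
                   int gidx, double val,
                   StashEntry *stash, int nstash)
{
    if (gidx >= rstart && gidx < rend) {
        local[gidx - rstart] += val;   /* owned: apply in place   */
        return nstash;
    }
    stash[nstash].idx = gidx;          /* not owned: defer to assembly */
    stash[nstash].val = val;
    return nstash + 1;
}
```

This is why any process may generate any entry: ownership only determines where the value ends up after assembly, not where it may be computed.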

Selected Vector Operations

Sparse Matrices Fundamental objects for storing linear operators (e.g., Jacobians) MatCreateMPIAIJ(…,Mat *) –MPI_Comm - processors that share the matrix –number of local rows and columns –number of global rows and columns –optional storage pre-allocation information data objects: matrices beginner

Parallel Matrix Distribution MatGetOwnershipRange(Mat A, int *rstart, int *rend) –rstart: first locally owned row of global matrix –rend -1: last locally owned row of global matrix Each process locally owns a submatrix of contiguously numbered global rows. proc 0 } proc 3: locally owned rowsproc 3 proc 2 proc 1 proc 4 data objects: matrices beginner

Matrix Assembly MatSetValues(Mat,…) –number of rows to insert/add –indices of rows and columns –number of columns to insert/add –values to add –mode: [INSERT_VALUES,ADD_VALUES] MatAssemblyBegin(Mat) MatAssemblyEnd(Mat) data objects: matrices beginner

Viewers Printing information about solver and data objects Visualization of field and matrix data Binary output of vector and matrix data tutorial outline: viewers beginner intermediate

Viewer Concepts Information about PETSc objects –runtime choices for solvers, nonzero info for matrices, etc. Data for later use in restarts or external tools –vector fields, matrix contents –various formats (ASCII, binary) Visualization –simple x-window graphics vector fields matrix sparsity structure beginner viewers

Viewing Vector Fields VecView(Vec x,Viewer v); Default viewers –ASCII (sequential): VIEWER_STDOUT_SELF –ASCII (parallel): VIEWER_STDOUT_WORLD –X-windows: VIEWER_DRAW_WORLD Default ASCII formats –VIEWER_FORMAT_ASCII_DEFAULT –VIEWER_FORMAT_ASCII_MATLAB –VIEWER_FORMAT_ASCII_COMMON –VIEWER_FORMAT_ASCII_INFO –etc. viewers beginner Solution components, using runtime option -snes_vecmonitor velocity: u velocity: v temperature: T vorticity: z

Viewing Matrix Data MatView(Mat A, Viewer v); Runtime options available after matrix assembly –-mat_view_info info about matrix assembly –-mat_view_draw sparsity structure –-mat_view data in ASCII –etc. viewers beginner

Linear Solvers Goal: Support the solution of linear systems, Ax=b, particularly for sparse, parallel problems arising within PDE-based models User provides: –Code to evaluate A, b solvers: linear beginner

PETSc Application Initialization Evaluation of A and b Post- Processing Solve Ax = b PCKSP Linear Solvers (SLES) PETSc codeUser code Linear PDE Solution Main Routine solvers: linear beginner

Linear Solvers ( SLES ) Application code interface Choosing the solver Setting algorithmic options Viewing the solver Determining and monitoring convergence Providing a different preconditioner matrix Matrix-free solvers User-defined customizations SLES: Scalable Linear Equations Solvers tutorial outline: solvers: linear beginner intermediate advanced

Linear Solvers in PETSc 2.0 Conjugate Gradient GMRES CG-Squared Bi-CG-stab Transpose-free QMR etc. Block Jacobi Overlapping Additive Schwarz ICC, ILU via BlockSolve95 ILU(k), LU (sequential only) etc. Krylov Methods (KSP) Preconditioners (PC) solvers: linear beginner

Basic Linear Solver Code (C/C++)

SLES sles; /* linear solver context */
Mat A;     /* matrix */
Vec x, b;  /* solution, RHS vectors */
int n, its; /* problem dimension, number of iterations */

MatCreate(MPI_COMM_WORLD,n,n,&A);
/* assemble matrix */
VecCreate(MPI_COMM_WORLD,n,&x);
VecDuplicate(x,&b);
/* assemble RHS vector */
SLESCreate(MPI_COMM_WORLD,&sles);
SLESSetOperators(sles,A,A,DIFFERENT_NONZERO_PATTERN);
SLESSetFromOptions(sles);
SLESSolve(sles,b,x,&its);
solvers: linear beginner

Basic Linear Solver Code (Fortran)

SLES sles
Mat A
Vec x, b
integer n, its, ierr

call MatCreate(MPI_COMM_WORLD,n,n,A,ierr)
call VecCreate(MPI_COMM_WORLD,n,x,ierr)
call VecDuplicate(x,b,ierr)
C then assemble matrix and right-hand-side vector
call SLESCreate(MPI_COMM_WORLD,sles,ierr)
call SLESSetOperators(sles,A,A,DIFFERENT_NONZERO_PATTERN,ierr)
call SLESSetFromOptions(sles,ierr)
call SLESSolve(sles,b,x,its,ierr)
solvers: linear beginner

Customization Options Procedural Interface –Provides a great deal of control on a usage-by-usage basis inside a single code –Gives full flexibility inside an application Command Line Interface –Applies same rule to all queries via a database –Enables the user to have complete control at runtime, with no extra coding solvers: linear intermediate

Setting Solver Options within Code SLESGetKSP(SLES sles,KSP *ksp) –KSPSetType(KSP ksp,KSPType type) –KSPSetTolerances(KSP ksp,double rtol,double atol,double dtol, int maxits) –... SLESGetPC(SLES sles,PC *pc) –PCSetType(PC pc,PCType) –PCASMSetOverlap(PC pc,int overlap) –... solvers: linear intermediate

Setting Solver Options at Runtime -ksp_type [cg,gmres,bcgs,tfqmr,…] -pc_type [lu,ilu,jacobi,sor,asm,…] -ksp_max_it <max_iters> -ksp_gmres_restart <restart> -pc_asm_overlap <overlap> -pc_asm_type [basic,restrict,interpolate,none] etc... solvers: linear beginner 1 intermediate 2 1 2

SLES: Review of Basic Usage SLESCreate( ) - Create SLES context SLESSetOperators( ) - Set linear operators SLESSetFromOptions( ) - Set runtime solver options for [SLES, KSP,PC] SLESSolve( ) - Run linear solver SLESView( ) - View solver options actually used at runtime (alternative: -sles_view ) SLESDestroy( ) - Destroy solver beginner solvers: linear

SLES: Review of Selected Preconditioner Options And many more options... solvers: linear: preconditioners beginner 1 intermediate 2 1 2

SLES: Review of Selected Krylov Method Options solvers: linear: Krylov methods beginner 1 intermediate 2 And many more options

SLES: Runtime Script Example solvers: linear intermediate

Viewing SLES Runtime Options solvers: linear intermediate

SLES: Example Programs ex1.c, ex1f.F - basic uniprocessor codes ex2.c, ex2f.F - basic parallel codes ex11.c - using complex numbers ex4.c - using different linear system and preconditioner matrices ex9.c - repeatedly solving different linear systems ex15.c - setting a user-defined preconditioner And many more examples... Location: solvers: linear 1 2 beginner 1 intermediate 2 advanced 3 3