Introduction to PETSc
VIGRE Seminar, Wednesday, November 8, 2006

Parallel Computing
How (basically) does it work?
- Assign each processor a number.
- The same program goes to all.
- Each uses separate memory.
- They pass information back and forth as necessary.

Parallel Computing
Example 1: Matrix-Vector Product
- The matrix and the vector are inputs to the program.
- The control node (0) reads in the matrix and distributes the rows among the processors:
    0: (a, b, c)   1: (d, e, f)   2: (g, h, i)
- The control node also sends the vector to each processor's memory:
    0: (a, b, c); (j, k, l)   1: (d, e, f); (j, k, l)   2: (g, h, i); (j, k, l)
- Each processor computes its own dot product:
    0: (a, b, c)·(j, k, l) = aj + bk + cl
    1: (d, e, f)·(j, k, l) = dj + ek + fl
    2: (g, h, i)·(j, k, l) = gj + hk + il
- The processors send their results to the control node, which outputs them.

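As a concrete illustration, here is a minimal sketch of Example 1 in plain MPI (not PETSc); it assumes exactly three processes and a hard-coded 3x3 matrix, and the values and program name are made up for illustration:

    /* Example 1 sketch: rank 0 scatters the matrix rows, broadcasts the
       vector, each rank computes one dot product, rank 0 gathers results.
       Run with: mpirun -np 3 ./matvec */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int    rank;
        double A[9] = {1, 2, 3,  4, 5, 6,  7, 8, 9}; /* only used on rank 0 */
        double x[3] = {1, 1, 1};                     /* the vector (j, k, l) */
        double row[3], mine, y[3];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* assign each processor a number */

        /* the control node distributes one row to each processor */
        MPI_Scatter(A, 3, MPI_DOUBLE, row, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        /* the control node sends the vector to each processor's memory */
        MPI_Bcast(x, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* each processor computes its own dot product */
        mine = row[0]*x[0] + row[1]*x[1] + row[2]*x[2];

        /* the results go back to the control node, which outputs */
        MPI_Gather(&mine, 1, MPI_DOUBLE, y, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("y = (%g, %g, %g)\n", y[0], y[1], y[2]);

        MPI_Finalize();
        return 0;
    }
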
Parallel Computing
Example 2: Matrix-Vector Product
- Suppose for memory reasons each processor only has part of the vector:
    0: (a, b, c); j   1: (d, e, f); k   2: (g, h, i); l
- Before the multiply, each processor sends the necessary information elsewhere:
    0: (a, b, c); j; (k from 1); (l from 2)
    1: (d, e, f); (j from 0); k; (l from 2)
    2: (g, h, i); (j from 0); (k from 1); l
- After the multiply, the space is freed again for other uses:
    0: (a, b, c); j   1: (d, e, f); k   2: (g, h, i); l

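In MPI terms, the "sends the necessary information elsewhere" step is an all-gather. A minimal sketch under the same assumptions as above (three processes, made-up values):

    /* Example 2 sketch: each rank owns one row and one vector entry;
       MPI_Allgather assembles the full vector temporarily for the multiply. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double row[3] = {rank + 1.0, rank + 2.0, rank + 3.0}; /* my matrix row */
        double xi     = rank + 1.0;                           /* my vector entry */
        double x[3];                                          /* temporary full copy */

        /* every rank sends its entry to the others and receives theirs */
        MPI_Allgather(&xi, 1, MPI_DOUBLE, x, 1, MPI_DOUBLE, MPI_COMM_WORLD);

        double yi = row[0]*x[0] + row[1]*x[1] + row[2]*x[2];
        printf("rank %d: y_i = %g\n", rank, yi);
        /* x is scratch space; after the multiply it is free for other uses */

        MPI_Finalize();
        return 0;
    }
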
Parallel Computing
Example 3: Matrix-Matrix Product
- The previous case illustrates how to multiply matrices stored across multiple processors:
    0: (a, b, c); (j, k, l)   1: (d, e, f); (m, n, o)   2: (g, h, i); (p, q, r)
- Each column of the second matrix is distributed for processing in turn:
    0: 1) (a, b, c)·(j, m, p) = α   2) (a, b, c)·(k, n, q) = β   3) (a, b, c)·(l, o, r) = γ
    1: 1) (d, e, f)·(j, m, p) = δ   2) (d, e, f)·(k, n, q) = ε   3) (d, e, f)·(l, o, r) = ζ
    2: 1) (g, h, i)·(j, m, p) = η   2) (g, h, i)·(k, n, q) = θ   3) (g, h, i)·(l, o, r) = ι
- The result is a matrix with the same parallel row structure as the first matrix and the same column structure as the second:
    0: (α, β, γ)   1: (δ, ε, ζ)   2: (η, θ, ι)
- The original entries could also have been sub-matrices, as long as their dimensions were compatible.

Parallel Computing
Example 4: Block Diagonal Product
- Suppose the second matrix is block diagonal:
    0: (A, B, C); (J, 0, 0)   1: (D, E, F); (0, K, 0)   2: (G, H, I); (0, 0, L)
- Much less information needs to be passed between the processors (each column has only one nonzero block to communicate):
    0: 1) AJ = α   2) BK = β   3) CL = γ
    1: 1) DJ = δ   2) EK = ε   3) FL = ζ
    2: 1) GJ = η   2) HK = θ   3) IL = ι

Parallel Computing
When is it worth it to parallelize?
- There is a time cost associated with passing messages.
- The amount of message passing depends on the problem and the program (algorithm).
- Therefore, the benefits depend more on the structure of the problem and the program than on the size/speed of the parallel network (diminishing returns).

Parallel Networks
How do I use multiple processors?
- This depends on the network, but most networks use some variation of PBS (a job scheduler) together with mpirun or mpiexec.
- A parallel program needs to be submitted as a batch job.

Parallel Networks
Suppose I have a program myprog, which gets data from data.dat, and which I call as follows when using only one processor:

    ./myprog -f data.dat

I would write a file myprog.pbs that looks like the following:

Parallel Networks

    #PBS -q compute
        (name of the processing queue; not necessary on all networks)
    #PBS -N myprog
        (the name of the job)
    #PBS -l nodes=2:ppn=1,walltime=00:10:00
        (number of nodes, number of processes per node, and the maximum
         time to allow the program to run)
    #PBS -o /home/me/mydir/myprog.out
        (where the output of the program should be written)
    #PBS -e /home/me/mydir/myprog.err
        (where the error stream should be written)

These are the headers that tell the job scheduler how to handle your job.

Parallel Networks
Although what follows depends on the MPI software that the network runs, it should look something like this:

    cd $PBS_O_WORKDIR
        (makes the processors run the program in the directory where
         myprog.pbs is saved)
    mpirun -machinefile $PBS_NODEFILE -np 2 ./myprog -f data.dat
        (tells the MPI software which machines to use and how many processes
         to start; notice that the command-line arguments follow as usual)

Parallel Networks
Once the .pbs file is written, it can be submitted to the job scheduler with qsub:

    qsub myprog.pbs

You can check whether your job is running with the command qstat.

Parallel Networks
Some systems (but not all) will let you simulate running your program in parallel on one processor, which is useful for debugging:

    mpirun -np 3 ./myprog -f data.dat

Parallel Networks
What parallel systems are available?
- RTC: Rice Terascale Cluster: 244 processors.
- ADA: Cray XD1: 632 processors.
- caamster: CAAM department exclusive: 8(?) processors.

PETSc
What do I use PETSc for?
- File I/O with "minimal" understanding of MPI.
- Vector- and matrix-based data management (in particular: sparse).
- Linear algebra routines familiar from the famous serial packages.

PETSc
- At the moment, ada and caamster (and harvey) have PETSc installed.
- You can download and install PETSc on your own machine (requires Cygwin on Windows) for educational and debugging purposes.

PETSc
- PETSc builds on existing software (BLAS and LAPACK); which implementations to use can be specified at configuration.
- It has a (slower) debugging configuration and a (faster, tacit) optimized configuration.

PETSc
- The installation comes with documentation, examples, and manual pages.
- The biggest part of learning how to use PETSc is learning how to use the manual pages.

PETSc
It is extremely useful to have an environment variable PETSC_DIR in your shell of choice, which gives the path to the PETSc installation, e.g.

    PETSC_DIR=/usr/local/src/petsc-p13/
    export PETSC_DIR

PETSc
Makefile
You can pretty much copy/paste/modify the makefiles in the examples, but here is the basic setup:

    # (... other definitions for CFLAGS, etc.)
    LOCDIR = ~/mydir
    include ${PETSC_DIR}/bmake/common/base
    # (this include is why it is useful to have PETSC_DIR saved)

    myprog: myprog.o chkopts
    	-${CLINKER} -o myprog myprog.o ${PETSC_LIB}
    	${RM} myprog.o

PETSc
Headers
- #include "petsc.h" in all files, unless the routines that you use need more specific headers.
- How do you know? Consult the manual pages!

PETSc
Data Types
- PETSc has a slew of its own data types: PetscInt, PetscReal, PetscScalar, etc.
- These are usually aliases of normal data types: PetscInt ~ int, PetscReal ~ double.
- They are safer to use for compatibility.

PETSc
Usage in C/C++
The main program should begin:

    static char help[] = "Your message here.";

    int main(int argc, char **argv)
    {
      /* ... declarations ... */
      PetscInitialize(&argc, &argv, PETSC_NULL, help);

and should end:

      /* ... */
      PetscFinalize();
      return 0;
    }

When first programming, include the following variable:

    PetscErrorCode ierr;

Wherever you would call a PETSc routine,

    Routine(arg);

write instead

    ierr = Routine(arg); CHKERRQ(ierr);

When you run your program, you will then be informed of any problems with incompatible data types/dimensions/etc.

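Putting those pieces together, a complete minimal program might look like the following sketch (the message is made up; the calls follow the 2.3-era API used in these slides, including the PetscSynchronizedPrintf routine discussed under File I/O below):

    /* minimal complete PETSc program (sketch) */
    static char help[] = "Prints a greeting from each process.\n";

    #include "petsc.h"

    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;
      PetscMPIInt    rank;

      ierr = PetscInitialize(&argc, &argv, PETSC_NULL, help);CHKERRQ(ierr);
      ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
      ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
               "Hello from process %d\n", rank);CHKERRQ(ierr);
      ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD);CHKERRQ(ierr);
      ierr = PetscFinalize();CHKERRQ(ierr);
      return 0;
    }
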
PETSc
Data
- Any data type larger than a scalar has a Create and a Destroy routine.
- If you run ./myprog -log_summary, you get the number created and the number destroyed for each data type, which helps find memory leaks.

PETSc
Example: Vec
- Two types: global and local.
- Which to use depends on function: do other processors need to see this data?
- Basic usage:

    Vec X;
    VecCreate([PETSC_COMM_WORLD / PETSC_COMM_SELF], &X);

PETSc
Example: Vec
Advanced usage:

    VecCreateSeq(PETSC_COMM_SELF, n, &X);
    VecCreateSeqWithArray(PETSC_COMM_SELF, n, vals, &X);
    VecLoad(instream, VECSEQ, &X);
    VecCreateMPI(PETSC_COMM_WORLD, n, PETSC_DETERMINE, &X);
    VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, N, &X);
    VecCreateMPIWithArray(PETSC_COMM_WORLD, n, N, vals, &X);
    VecLoad(instream, VECMPI, &X);
    VecDuplicate(Y, &X);
    MatGetVecs(M, &X, PETSC_NULL);
    MatGetVecs(M, PETSC_NULL, &X);

PETSc
Example: Vec
If a Vec is not created with an array or loaded from a file, its values still need to be set:
- To copy the values of another Vec with the same parallel structure, use VecCopy(Y, X).
- To set all values to a single scalar value, use VecSet(X, alpha).

PETSc
Example: Vec
- There are other routines for more complicated ways to set values.
- PETSc guards the block of data where the actual values are stored very closely.
- An assembly routine must be called after these other routines are used.

PETSc
Example: Vec
Other routines (see the sketch below):
- VecSetValue
- VecSetValueLocal (different indexing used)
- VecSetValues
- VecSetValuesLocal
- VecSetValuesBlocked
- VecSetValuesBlockedLocal

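For instance, a sketch of filling a parallel Vec with VecSetValues and then assembling it; the size N, the loop over the locally owned range, and the values are all made up for illustration (error checking omitted):

    /* sketch: set the locally owned entries of a parallel Vec, then assemble */
    Vec         X;
    PetscInt    i, lo, hi, N = 100;
    PetscScalar v;

    VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, N, &X);
    VecGetOwnershipRange(X, &lo, &hi);   /* the rows this process owns */
    for (i = lo; i < hi; i++) {
      v = (PetscScalar)i;                /* made-up value */
      VecSetValues(X, 1, &i, &v, INSERT_VALUES);
    }
    /* the assembly routines must be called before X can be used */
    VecAssemblyBegin(X);
    VecAssemblyEnd(X);
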
PETSc
Example: Vec
- Once a vector is assembled, there are routines for (almost) every function we could want from a vector: AXPY, dot product, absolute value, pointwise multiplication, etc.
- Call VecDestroy(X) to free its array when it isn't needed anymore.

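A few of those routines in action (a sketch: X, Y, and W are assumed to be assembled Vecs with the same layout, and alpha a PetscScalar):

    PetscScalar dot;
    VecAXPY(Y, alpha, X);        /* Y <- Y + alpha*X */
    VecDot(X, Y, &dot);          /* parallel dot product */
    VecAbs(X);                   /* X_i <- |X_i| */
    VecPointwiseMult(W, X, Y);   /* W_i <- X_i * Y_i */
    VecDestroy(X);               /* free the array when done */
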
PETSc
Example: Mat
- Like Vec, a Mat can be global or local (MPI/Seq).
- A Mat can take on a large number of data structures to optimize * and \, even though the same routines are used on all structures:
  - Row compressed
  - Block row compressed
  - Symmetric block row compressed
  - Block diagonal
  - And even dense

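For example, a sketch of building a sparse (row compressed, "MPIAIJ") tridiagonal matrix and applying it; the size, preallocation numbers, and values are made up, and the calls follow the 2.3-era API (error checking omitted):

    /* sketch: assemble a tridiagonal MPIAIJ matrix and compute y = A*x */
    Mat      A;
    Vec      x, y;
    PetscInt i, lo, hi, N = 100;

    MatCreateMPIAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, N, N,
                    3, PETSC_NULL, 1, PETSC_NULL, &A);
    MatGetOwnershipRange(A, &lo, &hi);
    for (i = lo; i < hi; i++) {
      PetscScalar v[3]    = {-1.0, 2.0, -1.0};
      PetscInt    cols[3] = {i - 1, i, i + 1};
      if (i == 0)          MatSetValues(A, 1, &i, 2, &cols[1], &v[1], INSERT_VALUES);
      else if (i == N - 1) MatSetValues(A, 1, &i, 2, cols, v, INSERT_VALUES);
      else                 MatSetValues(A, 1, &i, 3, cols, v, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);   /* Mats need assembly too */
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    MatGetVecs(A, &x, &y);   /* vectors with a compatible parallel layout */
    VecSet(x, 1.0);
    MatMult(A, x, y);        /* the same call works for every Mat format */
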
PETSc
File I/O
- PETSc has routines equivalent to printf, but you must decide whether you want every node to print or just the control node.
- To ensure clarity when multiple nodes print, use PetscSynchronizedPrintf followed by PetscSynchronizedFlush.
- The equivalent of a stream is a "viewer", but a viewer also organizes data across multiple processors.
- A viewer combines an output location (file/stdout/stderr) with a format.
- Most data types have a View routine, such as MatView(M, viewer).

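A sketch of a viewer in use, writing a Vec to a binary file and reading it back (the file name is made up; VecLoad here is the 2.3-era call seen earlier):

    /* sketch: write X to a binary file through a viewer, then load it back */
    PetscViewer viewer;

    PetscViewerBinaryOpen(PETSC_COMM_WORLD, "x.dat", FILE_MODE_WRITE, &viewer);
    VecView(X, viewer);                 /* the Vec's View routine */
    PetscViewerDestroy(viewer);

    PetscViewerBinaryOpen(PETSC_COMM_WORLD, "x.dat", FILE_MODE_READ, &viewer);
    VecLoad(viewer, VECMPI, &Y);        /* creates Y and distributes the data */
    PetscViewerDestroy(viewer);
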
PETSc
File I/O
- On a batch server, ASCII I/O can be horrendously slow.
- PETSc only reads data into a parallel format when it is stored in binary form.
- Lots of output data is likely: binary is more compact than ASCII.

PETSc
I have ASCII input data: what is the solution?
- Write a wrapper program.
- It runs on one processor.
- It creates the data to be used in parallel and "views" it to a binary input file.
- In parallel, the data will then be distributed automatically.

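A sketch of such a wrapper; the ASCII format (a count followed by values), the file names, and the sizes are all made up for illustration:

    /* one-processor wrapper: read ASCII values, "view" them to a binary
       file that a later parallel run can VecLoad */
    static char help[] = "Converts ASCII data to PETSc binary.\n";

    #include "petsc.h"
    #include <stdio.h>

    int main(int argc, char **argv)
    {
      Vec            X;
      PetscViewer    viewer;
      PetscInt       i;
      PetscScalar    v;
      PetscErrorCode ierr;
      int            n;
      double         d;
      FILE          *fp;

      ierr = PetscInitialize(&argc, &argv, PETSC_NULL, help);CHKERRQ(ierr);

      fp = fopen("data.txt", "r");           /* made-up ASCII input */
      fscanf(fp, "%d", &n);                  /* first entry: the length */
      ierr = VecCreateSeq(PETSC_COMM_SELF, n, &X);CHKERRQ(ierr);
      for (i = 0; i < n; i++) {
        fscanf(fp, "%lf", &d);
        v = d;
        ierr = VecSetValues(X, 1, &i, &v, INSERT_VALUES);CHKERRQ(ierr);
      }
      fclose(fp);
      ierr = VecAssemblyBegin(X);CHKERRQ(ierr);
      ierr = VecAssemblyEnd(X);CHKERRQ(ierr);

      /* "view" the data to a binary input file for the parallel run */
      ierr = PetscViewerBinaryOpen(PETSC_COMM_SELF, "data.dat",
                                   FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
      ierr = VecView(X, viewer);CHKERRQ(ierr);
      ierr = PetscViewerDestroy(viewer);CHKERRQ(ierr);
      ierr = VecDestroy(X);CHKERRQ(ierr);
      ierr = PetscFinalize();CHKERRQ(ierr);
      return 0;
    }
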
PETSc
Next time, issues for large dynamical systems:
- Time stepping
- Updating algebraically
- Managing lots of similar equations (scattering/gathering)