
1 Gauss: A Framework for Verifying Scientific Software Robert Palmer Steve Barrus, Yu Yang, Ganesh Gopalakrishnan, Robert M. Kirby University of Utah Supported in part by NSF Award ITR-0219805

2 Motivations "One of the simulations will run for 30 days. A Cray supercomputer built in 1995 would take 60,000 years to perform the same calculations." The machine in question delivers 12,300 GFLOPS, and you need permission or a grant to use it.

3 Motivations Time on a Blue Gene slice (180 GFLOPS) at IBM's Deep Computing Lab costs about $10k/week; the full machine peaks at 136,800 GFLOPS.

4 Motivations About 50% of the development time of parallel scientific codes is spent in debugging [Vetter and de Supinski 2000]. Programmers come from a variety of backgrounds, often not computer science.

5 Overview What Scientific programs look like What challenges are faced by scientific code developers How formal methods can help The Utah Gauss project

6 SPMD Programs Single Program Multiple Data –Same image runs on each node in the grid –Processes do different things based on rank –Possible to impose a virtual topology within the program
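
As a minimal sketch of the SPMD pattern (illustrative, not code from the slides), every process runs the same C/MPI program and the branch on the rank returned by MPI_Comm_rank is what makes different processes do different work; the "coordinator"/"worker" roles here are just placeholders:

/* SPMD sketch: one program image, behavior selected by rank. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv){
    int rank = 0, size = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if(rank == 0){
        printf("rank 0 of %d: acting as coordinator\n", size);
    } else {
        printf("rank %d of %d: acting as worker\n", rank, size);
    }
    MPI_Finalize();
    return 0;
}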

7 MPI A library for communication: MPI is to HPC what PThreads is to systems programming or OpenGL is to graphics. More than 60% of HPC applications use MPI libraries in some form. There are proprietary and open source implementations. MPI-1 provides both communication primitives and virtual topologies.
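
To illustrate the virtual-topology support mentioned above (a sketch under my own assumptions, not code from the talk), MPI-1 can arrange the processes in a periodic one-dimensional ring and report each process's neighbors:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv){
    int size, me, left, right;
    int dims[1] = {0}, periods[1] = {1};      /* one periodic dimension = a ring */
    MPI_Comm ring;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Dims_create(size, 1, dims);           /* let MPI choose the extent       */
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &ring);
    MPI_Comm_rank(ring, &me);
    MPI_Cart_shift(ring, 0, 1, &left, &right);   /* ranks of my ring neighbors   */
    printf("rank %d: left neighbor %d, right neighbor %d\n", me, left, right);
    MPI_Finalize();
    return 0;
}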

8 Concurrency Primitives Point-to-point communications that either do not specify system buffering (though an implementation might provide it) or use user-program-provided buffering (with possibly hard or soft limits), and in each case come in blocking and nonblocking variants. Collective communications that "can (but are not required to) return as soon as their participation in the collective communication is complete." [MPI-1.1 Standard, p. 93, lines 10-11]
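
A short sketch of those point-to-point flavors (illustrative only; the function name send_modes, the variable dest, and the missing matching receives are my assumptions, and MPI is assumed to be already initialized): a standard blocking send, a buffered send backed by user-provided buffer space, and a nonblocking send completed later with MPI_Wait.

#include <mpi.h>
#include <stdlib.h>

void send_modes(int data, int dest){
    MPI_Request req;
    MPI_Status  st;
    int bufsize = (int)(MPI_BSEND_OVERHEAD + sizeof(int));
    void *buf = malloc(bufsize);

    /* Standard send: may block; system buffering is not guaranteed. */
    MPI_Send(&data, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);

    /* Buffered send: completes using user-provided buffer space. */
    MPI_Buffer_attach(buf, bufsize);
    MPI_Bsend(&data, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    MPI_Buffer_detach(&buf, &bufsize);

    /* Nonblocking send: returns immediately; completion checked with MPI_Wait. */
    MPI_Isend(&data, 1, MPI_INT, dest, 0, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, &st);

    free(buf);
}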

9 MPI Tutorial: even-numbered ranks send their rank to both ring neighbors while odd ranks receive; after a barrier the roles swap.

#include <mpi.h>
#define CNT 1
#define TAG 1

int main(int argc, char **argv){
    int mynode = 0, totalnodes = 0, recvdata0 = 0, recvdata1 = 0;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
    MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
    if(mynode%2 == 0){
        /* even ranks send their rank to both ring neighbors */
        MPI_Send(&mynode, CNT, MPI_INT, (mynode+1)%totalnodes, TAG, MPI_COMM_WORLD);
        MPI_Send(&mynode, CNT, MPI_INT, (mynode-1+totalnodes)%totalnodes, TAG, MPI_COMM_WORLD);
    } else {
        /* odd ranks receive from both ring neighbors */
        MPI_Recv(&recvdata0, CNT, MPI_INT, (mynode-1+totalnodes)%totalnodes, TAG, MPI_COMM_WORLD, &status);
        MPI_Recv(&recvdata1, CNT, MPI_INT, (mynode+1)%totalnodes, TAG, MPI_COMM_WORLD, &status);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    if(mynode%2 == 1){
        /* after the barrier the roles swap: odd ranks send, even ranks receive */
        MPI_Send(&mynode, CNT, MPI_INT, (mynode+1)%totalnodes, TAG, MPI_COMM_WORLD);
        MPI_Send(&mynode, CNT, MPI_INT, (mynode-1+totalnodes)%totalnodes, TAG, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&recvdata0, CNT, MPI_INT, (mynode-1+totalnodes)%totalnodes, TAG, MPI_COMM_WORLD, &status);
        MPI_Recv(&recvdata1, CNT, MPI_INT, (mynode+1)%totalnodes, TAG, MPI_COMM_WORLD, &status);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
}

10-20 MPI Tutorial (continued): slides 10 through 20 repeat the same program, stepping through its execution on four processes P0-P3 to animate how the sends and receives match up around the ring and where each barrier synchronizes.

21 Why is parallel scientific programming hard? Portability Scaling Performance

22 Variety of bugs that are common in parallel scientific programs: deadlock, race conditions, misunderstanding the semantics of MPI procedures, resource-related assumptions, and incorrectly matched sends/receives.
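
As a concrete instance of the deadlock category, here is a minimal two-process sketch (my own example, not from the slides): both ranks issue MPI_Send first, so if the implementation provides no system buffering neither send can complete and both processes hang head-to-head.

#include <mpi.h>

int main(int argc, char **argv){
    int rank, other, in = 0;
    MPI_Status st;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;                    /* assumes exactly two processes */
    /* Both processes send first: potential deadlock with zero buffering. */
    MPI_Send(&rank, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
    MPI_Recv(&in,   1, MPI_INT, other, 0, MPI_COMM_WORLD, &st);
    MPI_Finalize();
    return 0;
}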

23 State of the art in debugging: TotalView (a parallel debugger with trace visualization), parallel dbx, gdb, and MPICHECK (does some deadlock checking using trace analysis).

24 Related work Verification of wildcard-free models [Siegel, Avrunin 2005]: deadlock freedom with zero-length buffers implies deadlock freedom with buffers of length greater than zero. SPIN models of MPI programs [Avrunin, Siegel, Siegel 2005] and [Siegel, Mironova, Avrunin, Clarke 2005]: compare serial and parallel versions of numerical computations for numerical equivalence.

25 Automatic Formal Analysis One could prove the code correct by hand in a theorem prover, but scientists do not want to spend time building models; the approach should be completely automatic, since it is intended for use by the scientific community at large.

26 The Big Picture Automatic model extraction; improved static analysis through model checking (better partial-order reduction, parallel state-space enumeration, symmetry, abstraction refinement); integration with existing tools (Visual Studio, TotalView).

27 The Big Picture (tool-flow diagram) The MPI program is fed to the Model Generator, which together with the MPI Library Model, the Environment Model, and the Zing Abstractor produces a Program Model; the Compiler turns the same source into the MPI Binary. The MC Server and its MC Clients explore the model's state space. If an error is found, the Simulator and Result Analyzer relate it back to the MPI binary and may trigger Refinement of the model; otherwise the verdict is OK. (The slide illustrates these boxes with a small MPI source example, a Zing model fragment, library-model stubs for MPI_Send, MPI_Bsend, MPI_Isend and an MPI_Status type, and compiled binary data.)

28 Environment modeling C is very prevalent among HPC developers, and we want to analyze the code as it is written for performance. Zing has (almost) it all: numeric types; pointers, arrays, casting, recursion, and more. It is missing only one thing: the address-of operator "&" (not bitwise AND). We provide a layer that makes this possible and can also track pointer arithmetic and unsafe casts, and we provide a variety of stubs for system calls.

29 Environment Example

class pointer {
    object reference;
    static object addressof(pointer p){
        pointer ret;
        ret = new pointer;
        ret.reference = p;
        return ret;
    }
    …
}

Data encapsulated in Zing objects makes it possible to handle additional C-isms.

30 MPI Library MPI Library modeled carefully by hand from the MPI Specification Preliminary shared memory based implementation –Send, Recv, Barrier, BCast, Init, Rank, Size, Finalize.

31 Library Example

integer MPI_Send(pointer buf, integer count, integer datatype,
                 integer dest, integer tag, integer c){
    …
    comm = getComm(c);
    atomic {
        ret = new integer;
        msg1 = comm.create(buf, count, datatype, _mpi_rank, dest, tag, true);
        msg2 = comm.find_match(msg1);
        if (msg1 != msg2) {
            comm.copy(msg1, msg2);
            comm.remove(msg2);
            msg1.pending = false;
        } else {
            comm.add(msg1);
        }
        ret.value = 0;
    }
    select{ wait(!msg1.pending) -> ; }
}

32 Model Extraction Map C onto Zing (using CIL): the source is first run through the C preprocessor; processes become Zing threads; the file becomes a Zing class, and structs and unions are also extracted to classes; integral data types map onto the environment layer (all numeric types, plus the pointer class).

33 Extraction Example The extracted Zing code below builds each argument through the environment layer:

__cil_tmp45 = integer.addressof(recvdata1);
__cil_tmp46 = integer.create(1);
__cil_tmp47 = integer.create(6);
__cil_tmp48 = integer.create(1);
__cil_tmp49 = integer.add(mynode, __cil_tmp48);
__cil_tmp50 = integer.mod(__cil_tmp49, totalnodes);
__cil_tmp51 = integer.create(1);
__cil_tmp52 = integer.create(91);
__cil_tmp53 = __anonstruct_MPI_Status_1.addressof(status);
MPI_Recv(__cil_tmp45, __cil_tmp46, __cil_tmp47, __cil_tmp50,
         __cil_tmp51, __cil_tmp52, __cil_tmp53);

It corresponds to the original call:

MPI_Recv(&recvdata1, CNT, MPI_INT, (mynode+1)%totalnodes, TAG, MPI_COMM_WORLD, &status);

34 Experimental Results Correct example –2 processes: 12,882 states –4 processes: Does not complete Deadlock example –24 processes: 2,522 states

35 Possible Improvements Atomic regions Constant reuse More formal extraction semantics

36 Looking ahead (in the coming year) A full MPI-1.1 library model in Zing (all point-to-point and collective communication primitives; virtual topologies). An ANSI-C-capable model extractor (dependencies: CIL, GCC, Cygwin). Preliminary tool integration (error visualization and simulation). Textbook and validation-suite examples.

37 Looking ahead (beyond) Better static analysis through partial-order reduction: the MPI library model is intended to leverage transaction-based reduction, which can be improved by determining transaction independence.

38 Looking ahead (beyond) Better Static Analysis through –Abstraction Refinement Control flow determined mostly by rank Nondeterministic over-approximation

39 Looking ahead (beyond) Better Static Analysis through –Distributed computing Grid based Client server based

40 Looking ahead (beyond) More library support: the MPI-2 library (one-sided communication) and the PThread library (mixed MPI/PThread concurrency). More languages: FORTRAN, C++. Additional static analysis techniques.

41 Can we get more performance? Can we phrase a performance bug as a safety property? –There does not exist a communication chain longer than N Is there a way to leverage formal methods to reduce synchronizations? Can formal methods help determine the right balance between MPI and PThreads for concurrency?

42 Questions?

