Building Symbiotic Relationships Between Formal Verification and High Performance Computing

Mike Kirby
School of Computing and Scientific Computing and Imaging Institute
University of Utah, Salt Lake City, UT, USA
Gauss Group

Faculty: Ganesh Gopalakrishnan, Mike Kirby
Post-docs and students: Dr. Igor Melatti (postdoc), Robert Palmer (PhD), Yu Yang (PhD), Salman Pervez (PhD), Steve Barrus (BS/MS), Sonjong Hwang (BS/MS), Jeffrey Sawaya (BS)

Funding acknowledgements: NSF (CSR-SMA: Toward Reliable and Efficient Message Passing Software Through Formal Analysis); Microsoft (Formal Analysis and Code Generation Support for MPI)
Outline

- Motivation
- Connection between formal methods and HPC
- Three applications:
  Example 1: Modeling of the MPI Library
  Example 2: Verifying One-Sided MPI Constructs
  Example 3: Parallel Model Checking
Motivation

$10k per week buys 180 GFLOPS of Blue Gene time at IBM's Deep Computing Lab (the full machine peaks at 136,800 GFLOPS).
Motivation

- About 50% of the development time of parallel scientific codes is spent debugging [Vetter and de Supinski 2000]
- Programmers come from a variety of backgrounds, often not computer science
Needs of an HPC programmer

A typical HPC program development cycle consists of:
- Understand what is being simulated (the physics, biology, etc.)
- Develop a mathematical model of the relevant "features" of interest
- Generate a numerical discretization of the mathematical model
- Solve the numerical problem

Development usually begins as serial code; later, the numerical problem, not the serial code, is parallelized:
- It is often best to develop a numerical model that is amenable to parallelization
- At every step, check consistency (e.g., conservation of energy)
- Tune for load balancing; make the code adaptive; ...
Challenges in producing dependable and fast MPI / threads programs

Threads style:
- Deal with locks, condition variables, re-entrancy, thread cancellation, ...

MPI:
- Deal with the complexity of single-program multiple-data (SPMD) programming
- Performance optimizations to reduce communication costs
- Deal with the complexity of MPI itself (MPI-1 has 130 calls; MPI-2 has 180; various flavors of sends and receives)

Threads and MPI are often used together, and MPI libraries are themselves threaded.
Solved and Unsolved Problems in MPI/Thread Programming

Solved problems (Avrunin and Siegel (MPI), as well as our group):
- Modeling the MPI library in Promela
- Model checking simple MPI programs

Unsolved problems (a rather long list, with some being):
- Model extraction
- Handling mixed-paradigm programs
- Formal methods to find / justify optimizations
- Verifying reactive aspects / computational aspects
Example 1: Modeling of the MPI Library
Variety of bugs that are common in parallel scientific programs:
- Deadlock
- Communication race conditions
- Misunderstanding the semantics of MPI procedures
- Resource-related assumptions
- Incorrectly matched sends/receives
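A minimal sketch (not from the slides) of the first two items combined: both ranks post a blocking send before the matching receive. Whether this hangs depends on internal buffering inside the MPI implementation, which is exactly the kind of resource-related assumption listed above.

    #include <mpi.h>
    #include <stdio.h>

    #define N (1 << 20)  /* large enough to defeat eager buffering on most MPIs */

    int main(int argc, char **argv) {
        static double sendbuf[N], recvbuf[N];
        int rank, peer;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;            /* run with exactly 2 processes */
        /* Both ranks send first: with no system buffering, neither send
           can complete until the other rank posts its receive: deadlock. */
        MPI_Send(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
        MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank %d done\n", rank);
        MPI_Finalize();
        return 0;
    }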
State of the art in debugging:
- TotalView: parallel debugger, trace visualization
- Parallel DBX
- gdb
- MPI-CHECK: does some deadlock checking; uses trace analysis
Related work

- Verification of wildcard-free models [Siegel, Avrunin, 2005]: deadlock-free with length-zero buffers implies deadlock-free with buffers of length greater than zero
- SPIN models of MPI programs [Avrunin, Siegel, Siegel, 2005] and [Siegel, Mironova, Avrunin, Clarke, 2005]: compare serial and parallel versions of numerical computations for numerical equivalence
The Big Picture

Toolchain (from the slide's diagram): an MPI program, together with an environment model, is abstracted (model generator / Zing abstractor) into a program model; the program model plus a model of the MPI library is compiled and explored by a model-checking server (MC Server) driving MC clients alongside the MPI binary; a result analyzer either reports OK or produces an error trace for the simulator, which drives refinement of the model.

MPI program:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char** argv) {
        int myid;
        int numprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        if (myid == 0) {
            int i;
            for (i = 1; i < numprocs; ++i) {
                MPI_Send(&i, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
            printf("%d Value: %d\n", myid, myid);
        } else {
            int val;
            MPI_Status s;
            MPI_Recv(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &s);
            printf("%d Value: %d\n", myid, val);
        }
        MPI_Finalize();
        return 0;
    }

Program model (Promela):

    int y;
    active proctype T1() {
        int x;
        x = 1;
        if
        :: x = 0;
        :: x = 2;
        fi;
        y = x;
    }
    active proctype T2() {
        int x;
        x = 2;
        if
        :: y = x + 1;
        :: y = 0;
        fi;
        assert( y == 0 );
    }

MPI library model (fragment):

    proctype MPI_Send(chan out, int c)  { out!c; }
    proctype MPI_Bsend(chan out, int c) { out!c; }
    proctype MPI_Isend(chan out, int c) { out!c; }
    typedef MPI_Status {
        int MPI_SOURCE;
        int MPI_TAG;
        int MPI_ERROR;
    }
Goal

Verification / transformation of MPI programs:
- "It is nice that you may be able to show my program does not deadlock, but can you make it faster?"
- Verification of safety properties
- Automatic optimization through "verifiably safe" transformations (replacing Send with Isend/Wait, etc.)
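A minimal sketch (not from the slides) of the Send-to-Isend/Wait rewrite the last bullet refers to. The transformation is safe only if the overlapped computation does not touch the send buffer, which is the kind of side condition a verifier could discharge; do_local_work is a hypothetical placeholder.

    #include <mpi.h>
    #include <stdio.h>

    static void do_local_work(void) { /* computation that must not touch val */ }

    int main(int argc, char **argv) {   /* run with at least 2 processes */
        int rank, val = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            MPI_Request req;
            /* original: MPI_Send(&val, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
               transformed: start the send, overlap work, then wait */
            MPI_Isend(&val, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            do_local_work();
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            int recv_val;
            MPI_Recv(&recv_val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("received %d\n", recv_val);
        }
        MPI_Finalize();
        return 0;
    }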
Example 2: Verifying One-Sided MPI Constructs
Byte-range locking using MPI one-sided communication

- One process makes its memory space available for communication (the window)
- Global state is stored in this memory space
- Each process is associated with a flag, a start value, and an end value, stored in an array A
- Pi's flag value is in A[3*i] for all i (its start and end values presumably occupy the two adjacent slots)
Lock Acquire

lock_acquire (start, end) {
1     val[0] = 1; /* flag */ val[1] = start; val[2] = end;
2     while (1) {
3         lock_win
4         place val in win
5         get values of other processes from win
6         unlock_win
7         for all i, if (Pi conflicts with my range)
8             conflict = 1;
9         if (conflict) {
10            val[0] = 0
11            lock_win
12            place val in win
13            unlock_win
14            MPI_Recv(ANY_SOURCE)
15        }
16        else
17            /* lock is acquired */
18            break;
19    }
}
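A hedged C sketch of one acquire attempt using the actual MPI-2 one-sided calls; the window layout and the helper name try_acquire are assumptions, not the authors' code. Line numbers from the pseudocode above appear as comments.

    #include <mpi.h>
    #include <stdlib.h>

    /* Hypothetical translation of one pass of lock_acquire. Assumes rank
       `home` exposes an int array A[3*nprocs] through `win`. Returns 1 if
       the lock was acquired, 0 if the caller must block in MPI_Recv. */
    static int try_acquire(MPI_Win win, int home, int me, int nprocs,
                           int start, int end)
    {
        int val[3] = { 1, start, end };          /* my flag, start, end */
        int *a = malloc(3 * nprocs * sizeof *a);
        int i, conflict = 0;

        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, home, 0, win);          /* line 3 */
        MPI_Put(val, 3, MPI_INT, home, 3 * me, 3, MPI_INT, win); /* line 4 */
        /* line 5: read everyone else's triples (skipping my own slot,
           since MPI forbids overlapping Put/Get in one epoch) */
        if (me > 0)
            MPI_Get(a, 3 * me, MPI_INT, home, 0, 3 * me, MPI_INT, win);
        if (me < nprocs - 1)
            MPI_Get(a + 3 * (me + 1), 3 * (nprocs - me - 1), MPI_INT, home,
                    3 * (me + 1), 3 * (nprocs - me - 1), MPI_INT, win);
        MPI_Win_unlock(home, win);                               /* line 6 */

        for (i = 0; i < nprocs; i++)                             /* line 7 */
            if (i != me && a[3 * i] == 1 &&
                start <= a[3 * i + 2] && end >= a[3 * i + 1])
                conflict = 1;

        if (conflict) {                                       /* lines 9-14 */
            val[0] = 0;                                          /* line 10 */
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, home, 0, win);
            MPI_Put(val, 3, MPI_INT, home, 3 * me, 3, MPI_INT, win);
            MPI_Win_unlock(home, win);
            MPI_Recv(NULL, 0, MPI_BYTE, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);         /* block */
        }
        free(a);
        return !conflict;
    }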
Lock Release

lock_release (start, end) {
    val[0] = 0; /* flag */ val[1] = -1; val[2] = -1;
    lock_win
    place val in win
    get values of other processes from win
    unlock_win
    for all i, if (Pi conflicts with my range)
        MPI_Send(Pi);
}
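The matching release in the same hedged style as the acquire sketch: reset my triple, read the others, and wake every conflicting (hence blocked) process with a 0-byte send.

    /* Hypothetical translation of lock_release; same assumptions as above. */
    static void release(MPI_Win win, int home, int me, int nprocs,
                        int start, int end)
    {
        int val[3] = { 0, -1, -1 };       /* clear my flag and range */
        int *a = malloc(3 * nprocs * sizeof *a);
        int i;

        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, home, 0, win);
        MPI_Put(val, 3, MPI_INT, home, 3 * me, 3, MPI_INT, win);
        if (me > 0)
            MPI_Get(a, 3 * me, MPI_INT, home, 0, 3 * me, MPI_INT, win);
        if (me < nprocs - 1)
            MPI_Get(a + 3 * (me + 1), 3 * (nprocs - me - 1), MPI_INT, home,
                    3 * (me + 1), 3 * (nprocs - me - 1), MPI_INT, win);
        MPI_Win_unlock(home, win);

        /* wake every process whose requested range conflicts with mine
           (blocked processes left their range in the window); the next
           slides show these sends can outnumber the receives */
        for (i = 0; i < nprocs; i++)
            if (i != me && a[3 * i + 1] != -1 &&
                start <= a[3 * i + 2] && end >= a[3 * i + 1])
                MPI_Send(NULL, 0, MPI_BYTE, i, 0, MPI_COMM_WORLD);
        free(a);
    }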
Error Trace
Error Discussion

Problem: too many Sends, not enough Recvs.

Not really a problem:
- The messages are 0 bytes only
- The Sends could be made non-blocking

Maybe a problem:
- Even 0-byte messages can leak memory inside the MPI implementation, which is undesirable
- More importantly, if there are unconsumed Sends in the system, processes that were supposed to be blocked may wake up by consuming them; this ties up processor resources and hurts performance!
Example 3: Parallel Model Checking
The Eddy-Murphi Model Checker
Parallel Model Checking

- Each computation node "owns" a portion of the state space
  - Each node locally stores and analyzes its own states
  - Newly generated states that do not belong to the current node are sent to the owner node
- A standard distributed algorithm can be used to detect termination
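A minimal sketch of the ownership test, assuming states are fixed-size bit vectors; the hash function and helper names are illustrative, not Eddy-Murphi's actual code.

    #include <mpi.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct { unsigned char bits[16]; } State;  /* packed state vector */

    /* FNV-1a hash of the state vector; any hash that spreads states
       uniformly across nodes would do */
    static uint64_t state_hash(const State *s)
    {
        uint64_t h = 14695981039346656037ULL;
        for (size_t i = 0; i < sizeof s->bits; i++) {
            h ^= s->bits[i];
            h *= 1099511628211ULL;
        }
        return h;
    }

    static int owner(const State *s, int nnodes)
    {
        return (int)(state_hash(s) % (uint64_t)nnodes);
    }

    /* hypothetical routing step run on every newly generated state */
    void route_state(State *s, int my_rank, int nnodes)
    {
        if (owner(s, nnodes) == my_rank) {
            /* local state: insert into the hash table and, if new,
               queue it for expansion (helpers not shown) */
        } else {
            MPI_Send(s->bits, (int)sizeof s->bits, MPI_BYTE,
                     owner(s, nnodes), 0, MPI_COMM_WORLD);
        }
    }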
Eddy Algorithm

For each node, two threads are used:
- Worker thread: analyzes, generates, and partitions states; if there are no states to be visited, it sleeps
- Communication thread: repeatedly sends/receives states to/from the other nodes; it also handles termination

Communication between the two threads:
- Via shared memory
- Via mutex and signal primitives

A sketch of this thread structure follows the diagram below.
(Diagram.) Worker thread loop: take a state off the consumption queue; expand the state (get its new set of states); make a decision about that set of states. Communication thread loop: receive and process inbound messages; initiate Isends; check completion of Isends. The two threads share the hash table, the consumption queue, and the communication queue.
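A hedged C/pthreads sketch of the two loops. All queue helpers (pop_consumption_queue, push_consumption_queue, expand_and_partition, take_full_line, search_terminated) are hypothetical stand-ins for Eddy-Murphi's internals, assumed thread-safe; only the communication thread touches MPI.

    #include <mpi.h>
    #include <pthread.h>

    typedef struct { unsigned char bits[16]; } State;

    /* hypothetical, thread-safe helpers (not Eddy-Murphi's real API) */
    extern int  pop_consumption_queue(State *out);        /* 0 if empty */
    extern void push_consumption_queue(State *in, int n);
    extern void expand_and_partition(const State *s);
    extern int  take_full_line(State **buf, int *nstates, int *dest);
    extern int  search_terminated(void);

    static void *worker_thread(void *arg)
    {
        State s;
        (void)arg;
        while (!search_terminated()) {
            if (pop_consumption_queue(&s))
                expand_and_partition(&s); /* local successors -> hash table,
                                             remote ones -> comm queue */
            /* else: the real worker sleeps on a condition variable here */
        }
        return NULL;
    }

    static void *comm_thread(void *arg)
    {
        MPI_Request req = MPI_REQUEST_NULL;
        int pending = 0;
        (void)arg;
        while (!search_terminated()) {
            int flag;
            MPI_Status st;
            State *line;
            int nstates, dest;
            /* receive and process inbound state messages */
            MPI_Iprobe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &flag, &st);
            if (flag) {
                static State inbuf[4096];
                int nbytes;
                MPI_Get_count(&st, MPI_BYTE, &nbytes);
                MPI_Recv(inbuf, nbytes, MPI_BYTE, st.MPI_SOURCE, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                push_consumption_queue(inbuf, nbytes / (int)sizeof(State));
            }
            /* initiate an Isend for one full communication-queue line */
            if (!pending && take_full_line(&line, &nstates, &dest)) {
                MPI_Isend(line, nstates * (int)sizeof(State), MPI_BYTE,
                          dest, 0, MPI_COMM_WORLD, &req);
                pending = 1;
            }
            /* check completion of the outstanding Isend */
            if (pending) {
                int done;
                MPI_Test(&req, &done, MPI_STATUS_IGNORE);
                if (done) pending = 0;   /* line can be recycled */
            }
        }
        return NULL;
    }

    static void run_node(void)
    {
        pthread_t w, c;
        pthread_create(&w, NULL, worker_thread, NULL);
        pthread_create(&c, NULL, comm_thread, NULL);
        pthread_join(w, NULL);
        pthread_join(c, NULL);
    }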
The Communication Queue

- There is one communication queue for each node
- Each communication queue has N lines and M states per line
- State additions are made (by the worker thread) only on one active line
- The other lines may be already full, or empty
The Communication Queue

Summing up, a line's status evolves as: WTBA (waiting to become active) → Active → WTBS (waiting to be sent) → CBS (currently being sent) → back to WTBA.
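The N-by-M layout and the status cycle suggest a structure like the following sketch; the field and constant names are assumptions, not Eddy-Murphi's source.

    #include <mpi.h>

    #define N_LINES   8     /* lines per communication queue (the "N") */
    #define M_STATES  1024  /* states per line               (the "M") */

    typedef struct { unsigned char bits[16]; } State;

    typedef enum {
        WTBA,    /* waiting to become active          */
        ACTIVE,  /* worker is appending states to it  */
        WTBS,    /* full: waiting to be sent          */
        CBS      /* currently being sent (Isend live) */
    } LineStatus;

    typedef struct {
        LineStatus  status;
        int         dest;             /* node that owns these states  */
        int         fill;             /* states currently in the line */
        State       states[M_STATES];
        MPI_Request req;              /* valid while status == CBS    */
    } Line;

    typedef struct {
        Line lines[N_LINES];
        int  active;                  /* index of the worker's active line */
    } CommQueue;

    /* worker-side append: touches only the active line */
    static int comm_queue_add(CommQueue *q, const State *s)
    {
        Line *l = &q->lines[q->active];
        int i;
        if (l->fill == M_STATES)
            return 0;                 /* no free line: the worker stalls */
        l->states[l->fill++] = *s;
        if (l->fill == M_STATES) {
            l->status = WTBS;         /* hand the full line to the comm thread */
            for (i = 0; i < N_LINES; i++)
                if (q->lines[i].status == WTBA) {
                    q->lines[i].status = ACTIVE;
                    q->active = i;
                    break;
                }
            /* if no WTBA line is free, the worker blocks: this is the
               tuning issue raised on the next slide */
        }
        return 1;
    }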
Eddy-Murphi Performance Tuning

Tuning of the communication queuing mechanism:
- A high number of states per line is required: sending many states at a time is much more efficient
- Not too few lines, or the worker will not be able to submit new states
Eddy-Murphi Performance

- Comparison with previous versions of parallel Murphi: when ported to MPI, the old versions perform worse than serial Murphi
- Comparison with serial Murphi: almost linear speedup is expected
Eddy-Murphi Performance
(performance charts)
Summary

- Complex systems can benefit from formal methods
- There is rich interdisciplinary ground for FM and HPC to interact
- Win-win scenarios exist
Possible Fix - Bidding

Main idea: when Pi releases a lock, it bids on the right to wake up Pj.
- If two processes want to wake up the same process, only one will be able to do so
- The bidding array is much like the existing array: whoever writes first wins

Not perfect!
- Performance hit due to the extra synchronization on the bidding array
- It is still possible to have too many Sends
- The bidding array needs to be reset; who resets it? It does not really matter: it is always possible for a process to sneak in at just the right time and pretend it is the highest bidder!

But not bad:
- The number of extra Sends is dramatically reduced: better performance, fewer memory leaks
Better Solution - Picking

Main idea: the process about to block picks who will wake it up, and indicates so by writing to shared memory in lines 11 and 13 of lock_acquire.
- The process that was just picked sees this information when it releases the lock and wakes up the blocked process
- Problem: suppose Pi sees Pj in the critical section and chooses Pj to wake it up, but Pj leaves before Pi can write the information to shared memory
- Solution: Pi will know Pj has left as it writes to shared memory (it can also read!), so instead of blocking it must choose some Pk and retry; if it runs out of processes to choose from, it must retry from the start
Discussion on Picking

Good news:
- This works! Our original problem is solved: no two processes can send messages to the same blocked process.

Bad news:
- Well, it does not quite work!
- Problem: what if Pi chooses Pj, but Pj releases the lock before it finds out? Pi will now choose Pk, but before it can do so, Pj comes back into the critical section, sees its name in shared memory, and assumes it has to wake Pi. An extra Send again!

More good news:
- The number of extra Sends is extremely low now, and this problem can be fixed too. There are two possible fixes; both differentiate between locking cycles. Pi must not only pick Pj but also the exact locking cycle Pj was in:
  1. Simply assign numbers whenever a locking cycle is entered, so in the next locking cycle Pj is not confused.
  2. Force each process to pick a different byte range each time: if you chose (m, n) in one cycle you cannot choose it in the next. This is reasonable, and it gives each locking cycle a unique pair, the byte range.
- So this solution works as far as we know, although no formal proof has been done yet.
High level view of MPI model (formalization)

- All processes have a local context; MPI_COMM_WORLD is the set of all local contexts
- Communications are implemented as operations on the local context
- The MPI communicator does the exchange of messages "invisibly"
High level view of MPI model
(diagram)
Pros & Cons

Elegant: all operations are expressed as sets and operations on sets.

Too large / complicated:
- Can't model check it (this is really bad!)
- All semantics must be explicitly stated as set operations
- How do you implement the big "invisible" communicator actions?
- How do you maintain a handle on a given message?
Another view of MPI

- Communicators (MPI_COMM_WORLD) are shared objects that have two message slots serving as contexts
- Messages have state
- All communication actions are implemented as state transitions on the message slots
Another view of MPI
(diagram)
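A rough sketch of this shared-object view in C. The particular message states, the slot layout, and the mapping of the two contexts to point-to-point and collective traffic are assumptions for illustration; the slides do not spell them out.

    /* assumed message life cycle: posted by a sender, then consumed by a
       matching receiver; the real model's states may differ */
    typedef enum { EMPTY, POSTED, COMPLETED } MsgState;

    typedef struct {
        MsgState state;
        int      src, dst, tag;
        int      payload;
    } Message;

    /* a communicator as a shared object with two message slots serving as
       contexts (here assumed: one point-to-point, one collective) */
    typedef struct {
        Message slot[2];
    } Communicator;

    /* a send is a state transition on a slot rather than a set operation */
    static int post_send(Communicator *c, int ctx, int src, int dst,
                         int tag, int payload)
    {
        Message *m = &c->slot[ctx];
        if (m->state != EMPTY)
            return 0;                 /* slot busy: sender must wait */
        m->state = POSTED;
        m->src = src; m->dst = dst; m->tag = tag; m->payload = payload;
        return 1;
    }

    static int match_recv(Communicator *c, int ctx, int dst, int tag,
                          int *out)
    {
        Message *m = &c->slot[ctx];
        if (m->state != POSTED || m->dst != dst || m->tag != tag)
            return 0;                 /* nothing to match yet */
        *out = m->payload;
        m->state = COMPLETED;         /* transition observed by the sender */
        return 1;
    }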
Pros & Cons

Much simpler:
- We hope to model check this
- Some semantics become implicit:
  - Noninterference of point-to-point and collective communication
  - Non-overtaking of messages from a particular node
Other projects

Sonjong Hwang (BS/MS):
- Translator from TLA+ to NuSMV
- Test a model-checking approach to validate sequences of MPI calls

Geof Sawaya (BS):
- Model checker built on Visual Studio
- VeriSoft-style explicit-state model checker