Message Passing: Formalization, Dynamic Verification
Ganesh Gopalakrishnan, School of Computing, University of Utah, Salt Lake City, UT 84112, USA
Based on research done by students Sarvani Vakkalanka, Anh Vo, Michael DeLisi, Alan Humphrey, Chris Derrick, Sriram Aananthakrishnan, and faculty colleague Mike Kirby
/ formal_verification
Supported by NSF CNS and Microsoft

Correctness Concerns Will Loom Everywhere…
Debugging concurrent systems, providing rigorous guarantees

Need for help / rigor noted by notable practitioners
– "Sequential programming is really hard, and parallel programming is a step beyond that" (Tanenbaum, USENIX 2008 Lifetime Achievement Award talk)
– "Formal methods provide the only truly scalable approach to developing correct code in this complex programming environment." (Rusty Lusk, in his EC invited talk entitled "Slouching Towards Exascale: Programming Models for High Performance Computing")

Must Cover BOTH Types of Concurrency
– Shared memory: enjoys the most attention (esp. from the CS FV community)
– Message passing: formal aspects of message passing are represented by CCS, CSP, …
Many practical message passing libraries exist, but without a rigorous semantics that characterizes their stand-alone behavior and/or their semantics in the context of a standard programming language (e.g. how compiler optimizations work in their presence)
The time is now ripe to make progress with respect to a few important message passing libraries (e.g., MPI, MCAPI, …)

Importance of Formalizing High-performance Message Passing Behavior
– Fundamental to dealing with the Message Passing Interface (MPI) API; MPI is VERY widely used
– Enables reasoning about the reactive behavior of API calls: out-of-order issue and completion is easily explained through Happens-before (HB)
– This HB took us a long time to discover, but it is surprisingly easy to explain! It is made up of MATCHES-BEFORE and COMPLETES-BEFORE, and it depends on available run-time resources
– Can help characterize compiler optimizations formally
– Handles new correctness-critical message-passing libraries: the Multicore Communications API (MCAPI) for embedded-systems use (e.g. cell phones) can be understood using a VERY SIMILAR formalism
– Aids understanding / pedagogy of message-passing program behavior; no need to dismiss this area as "too hairy"
– Enables building formal dynamic verification tools that find bugs and reveal lurking "unexpected behaviors", …

In general, we must get better at verifying concurrent programs written against a growing number of real APIs
– Code written using mature libraries (MPI, OpenMP, PThreads, …)
– API calls made from real programming languages (C, Fortran, C++)
– Runtime semantics determined by realistic compilers and runtimes
Model building and model maintenance have HUGE costs (I would assert: "impossible in practice") and do not ensure confidence!

Importance of MPI Program Analysis / Debugging
(Images: the SiCortex 5832-processor system, courtesy SiCortex; IBM Blue Gene, picture courtesy IBM; LANL's petascale machine "Roadrunner", with AMD Opteron CPUs and IBM PowerXCell)
– Almost the default choice for large-scale parallel simulations
– Huge support base
– Very mature codes exist in MPI and cannot easily be re-implemented
– Performs critical simulations in science and engineering: weather / earthquake prediction, computational chemistry, …, parallel model checking, …

Two Classes of MPI Programs
– Mostly computational: these are sequential programs "pulled apart"; one can see higher-order functions (map, …). While optimizing these programs, reactive behavior creeps in: non-blocking sends overlapped with computation, and probing for computations finishing before initiating new work early
– Highly reactive: user-level libraries written in MPI, e.g. adaptive dynamic load balancing libraries
Bottom line: must employ suitable dynamic verification methods for MPI

Our Work
– We have a formal model for MPI
– This formal model explains succinctly the space of all standard-compliant executions of MPI
– What must a standard-compliant MPI library, together with the support infrastructure (runtime, compilers, …), finally amount to?

Practical Contribution of Our Work
– We have built the only push-button dynamic analysis tool for MPI / C programs, called ISP (work on MPI / Fortran in progress)
– Runs on Mac OS X, Windows, Linux
– Tested against five state-of-the-art MPI libraries: MPICH2, OpenMPI, MSMPI, MVAPICH, IBM MPI (in progress)
– Visual Studio and Eclipse Parallel Tools Platform integration
– Hundreds of large case studies; efficiency is decent (getting better): the 15K-LOC Parmetis hypergraph partitioner was analyzed for deadlocks, resource leaks, and assertion violations for a given test harness in < 5 seconds for 2 MPI processes on a laptop
– Being downloaded by many; contribution to the Eclipse Consortium underway
– ISP can dynamically execute and reveal the space of all standard-compliant executions of MPI, even when running on an arbitrary (standard-compliant) platform; ISP's internal scheduling decisions are taken in a fairly general way

One-page Ad on ISP
(Images: BlueGene/L, courtesy of IBM / LLNL; simulation image courtesy of Steve Parker, U of Utah)
– Verifies MPI user applications, generating only the relevant process interleavings
– Detects all deadlocks, assert violations, MPI object leaks, and default safety properties
– Works by instrumenting MPI calls, computing relevant interleavings, and replaying

This talk
– Explains the core of MPI using four letters: S, R, B, W. S starts a DMA send transfer, R starts a DMA receive transfer, W waits for the transfer to finish, B arranges for efficient global synchronization
– [Hunch] Any attempt to create efficient message passing will result in a similar set of primitives
– We can now explain one-liner MPI programs that can confound even experts!
– This explanation is what ISP's algorithm also uses

Summary of Some MPI Commands
MPI_Isend(destination, msg_buf, request_structure, other args)
– This is a non-blocking call
– It initiates copying of msg_buf into the MPI runtime so that a matching MPI receive invoked from process destination will receive the contents of msg_buf
MPI_Wait(… request_structure …) typically follows MPI_Isend
– When this BLOCKING call returns, the copying is finished

Summary of Some MPI Commands
MPI_Isend(destination, msg_buf, request_structure, other args)
– We will abbreviate this call as Isend(destination, request_structure)
– Example: Isend(2, req)
– And finally as S(2) or S(to:2) or S(to:2, req)

Summary of Some MPI Commands
MPI_Irecv(source, msg_buf, request_structure, other args)
– This is a non-blocking call
– It initiates receipt into msg_buf from the MPI runtime so that a matching MPI send invoked from process source can provide the contents of msg_buf
MPI_Wait(… request_structure …) typically follows MPI_Irecv
– When this BLOCKING call returns, the receipt is finished
– Wait is abbreviated W(req) or W or …

Summary of Some MPI Commands
MPI_Irecv(source, msg_buf, request_structure, other args)
– Abbreviated as Irecv(source, req)
– Example: Irecv(3, req), or even Irecv(*, req) in case any available source would do
– Finally as R(from:3, req), R(from:3), R(3), …

More MPI Commands
– MPI_Barrier(…) is abbreviated as Barrier() or even Barrier
– All processes must invoke Barrier before any process can return from the Barrier invocation
– A useful high-performance global synchronization operation
– Abbreviated as B
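As a minimal sketch, here is what the four primitives look like in real MPI/C. The tag, payload, and printf are my additions, not from the slides, and the program assumes exactly two ranks (run, assumedly, with mpicc srbw.c -o srbw && mpirun -np 2 ./srbw):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, out = 42, in = -1;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)   /* S(to:1, req): non-blocking send */
        MPI_Isend(&out, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    else             /* R(from:*, req): non-blocking wildcard receive */
        MPI_Irecv(&in, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
    MPI_Barrier(MPI_COMM_WORLD);        /* B: global synchronization */
    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* W(req): the transfer is done after this */
    if (rank == 1) printf("rank 1 received %d\n", in);
    MPI_Finalize();
    return 0;
}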

Simple MPI Program: 'lucky.c'
Process P0: R(from:*, r1); R(from:2, r2); S(to:2, r3); R(from:*, r4); all the Ws…
Process P1: Sleep(3); S(to:0, r1); all the Ws…
Process P2: //Sleep(3); S(to:0, r1); R(from:0, r2); S(to:0, r3); all the Ws…
Result: deadlock

Simple MPI Program: 'unlucky.c'
Process P0: R(from:*, r1); R(from:2, r2); S(to:2, r3); R(from:*, r4); all the Ws…
Process P1: //Sleep(3); S(to:0, r1); all the Ws…
Process P2: Sleep(3); S(to:0, r1); R(from:0, r2); S(to:0, r3); all the Ws…
Result: no deadlock

Runs of lucky.c and unlucky.c on mpich using "standard testing" ("lucky" for the tester)

mpicc lucky.c -o lucky.out
mpirun -np 3 ./lucky.out
(0) is alive on ganesh-desktop
(1) is alive on ganesh-desktop
(2) is alive on ganesh-desktop
Rank 0 did Irecv
Rank 2 did Send
Sleep over
Rank 1 did Send
[.. hang ..]

mpicc unlucky.c -o unlucky.out
mpirun -np 3 ./unlucky.out
(0) is alive on ganesh-desktop
(2) is alive on ganesh-desktop
(1) is alive on ganesh-desktop
Rank 0 did Irecv
Rank 1 did Send
Rank 0 got 11
Sleep over
Rank 2 did Send
(2) Finished normally
(1) Finished normally
(0) Finished normally
[.. OK ..]

Runs of lucky.c and unlucky.c using ISP
– ISP will find the deadlock in both cases, unaffected by the "sleep"s
– The tailor-made DPOR that ISP uses, and its execution control based on dynamic instruction rewriting, are discussed elsewhere

How many interleavings in lucky.c?
Process P0: R(from:*, r1); R(from:2, r2); S(to:2, r3); R(from:*, r4); all the Ws…
Process P1: Sleep(3); S(to:0, r1); all the Ws…
Process P2: //Sleep(3); S(to:0, r1); R(from:0, r2); S(to:0, r3); all the Ws…
> 500 interleavings without any reductions

How many relevant interleavings? Just two! One for each Irecv(..) match of the program above.

MPI is tricky… till you see how it really works!

Which send must be allowed to finish first?
P0 --- S(to:1, big-message, h1); … S(to:2, small-message, h2); … W(h2); … W(h1);
P1 --- R(from:0, buf1, h3); … W(h3);
P2 --- R(from:0, buf2, h4); … W(h4);
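A sketch of this scenario in real MPI/C (message sizes, tags, and buffers are my choices). As the HB rules later in the talk show, each send is ordered only before its own wait, so it is legal, and common, for the small message to complete first even though the big one was posted first:

/* out-of-order completion: wait on the small send before the big one.
   Run (assumed): mpirun -np 3 ./a.out */
#include <mpi.h>
#define BIG (1 << 20)

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        static int big[BIG];               /* big-message */
        int small = 1;                     /* small-message */
        MPI_Request h1, h2;
        MPI_Isend(big, BIG, MPI_INT, 1, 0, MPI_COMM_WORLD, &h1);
        MPI_Isend(&small, 1, MPI_INT, 2, 0, MPI_COMM_WORLD, &h2);
        MPI_Wait(&h2, MPI_STATUS_IGNORE);  /* W(h2): may well finish first */
        MPI_Wait(&h1, MPI_STATUS_IGNORE);  /* W(h1) */
    } else if (rank == 1) {
        static int big[BIG];               /* R(from:0, buf1, h3); W(h3) */
        MPI_Recv(big, BIG, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 2) {
        int small;                         /* R(from:0, buf2, h4); W(h4) */
        MPI_Recv(&small, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}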

MPI is tricky… till you see how it really works!
Will this single-process example, called "Auto-send", deadlock?
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);

The "Crooked Barrier" example
P0 --- S1(to:P2); B
P1 --- B; S2(to:P2)
P2 --- R(from:*); B
Can S2(to:P2) match R(from:*)? Is a match across the Barrier possible?

It will be good to explain all these programs without relying upon “bee dances”

MPI HB to the rescue! These pairs WITHIN A PROCESS are in the MPI HB:
– S(to:x); … ; S(to:x)
– R(from:y); … ; R(from:y)
– R(from:*); … ; R(from:any)
– S(to:x, h); … ; W(h)
– R(from:y, h); … ; W(h)
– W(h); … ; any
– B; … ; any

This HB is what makes MPI high-performance!!
– S(to:x); … ; S(to:x) -- order only for non-overtaking
– R(from:y); … ; R(from:y) -- ditto
– R(from:*); … ; R(from:any) -- OK, wildcard trumps ordinary-card
– S(to:x, h); … ; W(h) -- Neat! Resource modeling hidden here! (so neat that in our latest work, this HB explains slack inelasticity!!)
– R(from:y, h); … ; W(h) -- Neat too
– W(h); … ; any -- One place to truly block
– B; … ; any -- Another place to block!

Strictly, we must define HB on inner events:
– Issued: >
– Call returned: <
– Call matched: <>
– Call completed: *
S and R go through all four states; W has no meaningful <> (take it the same as *); B has no meaningful * (take it the same as <>)
For this talk, we define HB with respect to the higher-level instructions themselves (see FM 2009 for details)

HB-based state transition semantics
– Fence = an instruction that orders all later program-ordered instructions via HB (for us, they are B and W)
– "Process at a fence" = the process just issued a fence instruction
– During dynamic verification, each process that is not at a fence is permitted to issue its next instruction, and then extend the HB graph
– Define HB-ancestor, HB-descendent, matched-HB-ancestor
– Match-enabled instruction = one whose HB-ancestors have all matched
– Allow any match-enabled instruction to form a match-set suitably: S goes with a matching R, B goes with the other Bs
– For S(to:1), S(to:2), and R(from:*), dynamically rewrite to the match sets {S(to:1), R(from:1)} and {S(to:2), R(from:2)}; this is called an R* match-set (actually a set of match-sets)
– Fire match sets; an R* match-set is fired only when there are no non-R* match sets and all processes are at a fence
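As a toy illustration of the enabling rule (my own abstraction, not ISP's implementation), the C sketch below hard-codes the Auto-send program's intra-process HB edges and its match-sets, and fires a match-set once every member's HB-ancestors have matched. It elides the issue dynamics and the all-processes-at-a-fence side condition on R* match-sets:

#include <stdbool.h>
#include <stdio.h>

enum { R, B, S, W1, W2, NOPS };  /* P0: R(from:0,h1); B; S(to:0,h2); W(h1); W(h2) */
static const char *name[] = {"R(from:0,h1)", "B", "S(to:0,h2)", "W(h1)", "W(h2)"};

/* hb[i][j] = op i happens-before op j, per the HB pairs above:
   B and W order everything after them; S and R order only their own W.
   Note there is NO edge R -> B: R may match after the barrier. */
static const bool hb[NOPS][NOPS] = {
    /*        R  B  S  W1 W2 */
    /* R  */ {0, 0, 0, 1, 0},
    /* B  */ {0, 0, 1, 1, 1},
    /* S  */ {0, 0, 0, 0, 1},
    /* W1 */ {0, 0, 0, 0, 1},
    /* W2 */ {0, 0, 0, 0, 0},
};

/* match-sets: {B}, {R,S} (the auto-send pairing), {W(h1)}, {W(h2)} */
static const int sets[4][2] = {{B, -1}, {R, S}, {W1, -1}, {W2, -1}};

int main(void) {
    bool matched[NOPS] = {false};
    bool fired[4] = {false};
    for (int round = 0; round < 4; round++) {
        for (int m = 0; m < 4; m++) {   /* find a match-enabled set */
            if (fired[m]) continue;
            bool enabled = true;
            for (int k = 0; k < 2 && sets[m][k] >= 0; k++)
                for (int a = 0; a < NOPS; a++)
                    if (hb[a][sets[m][k]] && !matched[a]) enabled = false;
            if (!enabled) continue;
            printf("fire {");
            for (int k = 0; k < 2 && sets[m][k] >= 0; k++) {
                matched[sets[m][k]] = true;   /* fire: mark members matched */
                printf(" %s", name[sets[m][k]]);
            }
            printf(" }\n");
            fired[m] = true;
            break;
        }
    }
    return 0;
}

Compiled and run, it prints the firing order {B}, {R(from:0,h1) S(to:0,h2)}, {W(h1)}, {W(h2)}: B fires first because nothing happens-before it, while R stays pending across the barrier; this is precisely the walkthrough that follows.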

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);
The HB

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);
Issue R(from:0, h1), because prior to issuing R, P0 is not at a fence

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);
Issue B, because after issuing R, P0 is not at a fence

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);
Form match set; the match-enabled set is {B}

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);
Fire the match-enabled set {B}

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);
Issue S(to:0, h2): since B is gone, P0 is no longer at a fence

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);
Issue W(h1), because after S(to:0, h2), P0 is not at a fence

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);
Can't form a { W(h1) } match set, because it has an unmatched ancestor (namely R(from:0, h1))

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);
Form the { R(from:0, h1), S(to:0, h2) } match set, and fire it

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);
Now form and fire the match set { W(h1) }

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);
Now issue W(h2)

How Example Auto-send works
P0 : R(from:0, h1); B; S(to:0, h2); W(h1); W(h2);
Form match set { W(h2) } and fire it. Done.
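The walkthrough can be replayed for real: below is a minimal runnable sketch of Auto-send (payload and tag are my choices). Run with a single process, e.g. mpirun -np 1 ./a.out, and it terminates without deadlock, exactly as derived above:

/* auto-send: a process that sends to itself across a barrier */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int out = 42, in = 0;
    MPI_Request h1, h2;
    MPI_Init(&argc, &argv);
    MPI_Irecv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &h1);  /* R(from:0, h1) */
    MPI_Barrier(MPI_COMM_WORLD);                            /* B             */
    MPI_Isend(&out, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &h2); /* S(to:0, h2)   */
    MPI_Wait(&h1, MPI_STATUS_IGNORE);                       /* W(h1)         */
    MPI_Wait(&h2, MPI_STATUS_IGNORE);                       /* W(h2)         */
    printf("received %d; no deadlock\n", in);
    MPI_Finalize();
    return 0;
}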

The "Crooked Barrier" example
P0 --- S1(to:P2); B
P1 --- B; S2(to:P2)
P2 --- R(from:*); B
S2(to:P2) can match R(from:*)! Here is how: there is no HB edge from R(from:*) to P2's B, so the wildcard receive may stay unmatched while all three processes complete the Barrier; P1 then issues S2(to:P2), which can match the still-pending R(from:*).

MPI Program that needs this sort of API-aware dynamic verification (we will see how ISP works on this example)
Process P0: Isend(1, req); Barrier; Wait(req);
Process P1: Irecv(*, req); Barrier; Recv(2); Wait(req);
Process P2: Barrier; Isend(1, req); Wait(req);
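Here is a hedged reconstruction of that program in real MPI/C (payloads and the tag are my additions). Whether it deadlocks depends on which send the wildcard Irecv matches: if it matches P2's Isend, the subsequent Recv(2) has no sender left; this is exactly the kind of schedule-dependent bug POE is built to expose (next slides):

/* run (assumed): mpirun -np 3 ./a.out; may or may not hang, depending on matching */
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, out = 7, in1 = 0, in2 = 0;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {          /* P0: Isend(1, req); Barrier; Wait(req) */
        MPI_Isend(&out, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {   /* P1: Irecv(*, req); Barrier; Recv(2); Wait(req) */
        MPI_Irecv(&in1, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Recv(&in2, 1, MPI_INT, 2, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else {                  /* P2: Barrier; Isend(1, req); Wait(req) */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Isend(&out, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}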

Workflow of ISP
(Diagram: the MPI program is compiled into an executable; an interposition layer sits between the processes Proc 1, Proc 2, …, Proc n and the MPI runtime; a scheduler drives repeated runs, generating ALL RELEVANT schedules (Mazurkiewicz traces).)

POE in action on this example (animation frames, summarized):
– Each process's calls are intercepted by the scheduler before reaching the MPI runtime (via sendNext): P0 offers Isend(1) and Barrier, P1 offers Irecv(*) and Barrier, P2 offers its Barrier
– The three Barriers form a match-set and fire
– Past the Barrier, P2's Isend(1) also arrives; the wildcard Irecv(*) now has two potential senders, so POE dynamically rewrites it into the specific match-sets {P0's Isend(1), Irecv(0)} and {P2's Isend(1), Irecv(2)} and explores both
– In the interleaving where Irecv(*) is rewritten to Irecv(2) and matches P2's Isend, P1's Recv(2) and P0's Wait(req) are left with no match-set: Deadlock!

Buffering-Sensitive Deadlock (deadlocks if buffering is not present in the MPI runtime; the same theory works)
Process P0: Send(to:1, tag:10); Send(to:2, tag:9);
Process P1: Recv(from:*, tag:11); Recv(from:*, tag:10);
Process P2: Recv(from:0, tag:9); Send(to:1, tag:11);
If Send(to:1, tag:10) is provided INSUFFICIENT BUFFERING by the runtime, then the execution will deadlock
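A sketch of this program in real MPI/C (payloads are my choices). MPI_Send is allowed, but not required, to buffer a small message; substituting MPI_Ssend, which never buffers, makes the deadlock deterministic:

/* run (assumed): mpirun -np 3 ./a.out */
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, v = 0;
    MPI_Status st;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(&v, 1, MPI_INT, 1, 10, MPI_COMM_WORLD); /* blocks here if unbuffered... */
        MPI_Send(&v, 1, MPI_INT, 2, 9, MPI_COMM_WORLD);  /* ...so this is never issued   */
    } else if (rank == 1) {
        MPI_Recv(&v, 1, MPI_INT, MPI_ANY_SOURCE, 11, MPI_COMM_WORLD, &st); /* waits on P2 */
        MPI_Recv(&v, 1, MPI_INT, MPI_ANY_SOURCE, 10, MPI_COMM_WORLD, &st);
    } else {
        MPI_Recv(&v, 1, MPI_INT, 0, 9, MPI_COMM_WORLD, &st); /* waits on P0's 2nd send */
        MPI_Send(&v, 1, MPI_INT, 1, 11, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}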

Concluding Remarks
Formal verification for concurrency serves many purposes:
– Helps find bugs
– Helps understand programs
– Helps improve efficiency of code, with FV serving as a safety net
Some of the biggest remaining challenges:
– Efficient DEBUGGING
– Safe design practices
– Exploitation of concurrency patterns to reduce verification complexity
– How to formally downscale systems? How to address symmetry? How to achieve parameterized verification? How to DESIGN well-parameterized systems so that downscaling is easier?

Extra Slides

Summary of Some MPI Commands
– Let Send(2) stand for atomic { Isend(2, req); Wait(req) }
– Let Recv(3) stand for atomic { Irecv(3, req); Wait(req) }
– These are actually BLOCKING MPI operations

How Dynamic Verification using Stateless Search Relies on Replays (a recap…)
P0: lock(y) … unlock(y)    P1: lock(x) … unlock(x)    P2: lock(x) … unlock(x)
(Diagram: two replayed executions over the events L0/U0, L1/U1, L2/U2, one for each order in which P1 and P2 can acquire lock x; stateless search re-runs the program from the start to explore each ordering.)