
1 Scheduling Considerations for building Dynamic Verification Tools for MPI Sarvani Vakkalanka, Michael DeLisi, Ganesh Gopalakrishnan, Robert M. Kirby School of Computing, University of Utah, Salt Lake City Supported by Microsoft HPC Institutes, NSF CNS-0509379 http://www.cs.utah.edu/formal_verification

2 Background (BlueGene/L image courtesy of IBM / LLNL; image courtesy of Steve Parker, CSAFE, Utah) The scientific community is increasingly employing expensive supercomputers built on distributed programming libraries… …to program large-scale simulations in all walks of science, engineering, math, economics, etc.

3 Current Programming Realities Code written using mature libraries (MPI, OpenMP, PThreads, …) API calls made from real programming languages (C, Fortran, C++) Runtime semantics determined by realistic compilers and runtimes How best to verify codes that will run on actual platforms?

4 Classical Model Checking Finite State Model of Concurrent Program Check Properties Extraction of finite state models for realistic programs is difficult.

5 Dynamic Verification Actual Concurrent Program Check Properties Avoid model extraction, which can be tedious and imprecise The program serves as its own model Reduce complexity through reduction of interleavings (and other methods)

6 Dynamic Verification Actual Concurrent Program Check Properties One Specific Test Harness  Need a test harness in order to run the code.  Will explore ONLY RELEVANT INTERLEAVINGS (all Mazurkiewicz traces) for the given test harness  Conventional testing tools cannot do this !!  E.g. 5 threads, 5 instructions each  10^10 interleavings !!
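The blow-up quoted on this slide can be checked directly: the interleavings of n sequential threads with k instructions each number (nk)! / (k!)^n, a multinomial coefficient. A quick sketch (an illustration, not part of the original slides):

```python
from math import factorial

def interleavings(threads: int, steps: int) -> int:
    """Count distinct interleavings of `threads` sequential processes,
    each running `steps` instructions: (threads*steps)! / (steps!)**threads."""
    return factorial(threads * steps) // factorial(steps) ** threads

# 5 threads x 5 instructions already exceeds 10**10 schedules.
print(interleavings(5, 5))
```

For 5 threads of 5 instructions this is 25!/(5!)^5, far beyond what conventional testing can cover, which is why reduction to relevant interleavings matters.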

7 Dynamic Verification Actual Concurrent Program Check Properties One Specific Test Harness Need to consider all test harnesses FOR MANY PROGRAMS, this number seems small (e.g. Hypergraph Partitioner)

8 Related Work Dynamic verification tools: – CHESS – Verisoft (POPL ’97) – DPOR (POPL ’05) – JPF ISP is similar to CHESS and DPOR

9 Dynamic Partial Order Reduction (DPOR) P0 P1 P2 lock(x) ………….. unlock(x) lock(x) ………….. unlock(x) lock(x) ………….. unlock(x) The explored schedules branch after L0 U0: one interleaving continues L1 U1 then L2 U2, the other L2 U2 then L1 U1
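The reduction on this slide can be modeled in a few lines (an illustration, not ISP code): every interleaving in which the three critical sections run in the same order belongs to one Mazurkiewicz trace, so only the lock-acquisition orders need distinct runs.

```python
from itertools import permutations

# Each process Pi runs: lock(x); ...; unlock(x).
# All interleavings with the same lock-acquisition order are equivalent,
# so DPOR-style reduction explores one schedule per acquisition order.
procs = ["P0", "P1", "P2"]
acquisition_orders = list(permutations(procs))
print(len(acquisition_orders))  # 6 representative interleavings
```

Six representatives replace the full set of instruction-level interleavings of the three processes.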

10 ISP Executable Proc 1 Proc 2 …… Proc n MPI Program Profiler Scheduler Run MPI Runtime  Manifest only/all relevant interleavings (DPOR)  Manifest ALL relevant interleavings of the MPI Progress Engine: done by DYNAMIC REWRITING of WILDCARD Receives.

11 Using PMPI P0’s call stack: User_Function calls MPI_Send (the profiler’s version), which sends a SendEnvelope (P0: MPI_Send) to the Scheduler over a TCP socket, then calls PMPI_Send in the MPI Runtime.
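The interception on this slide uses MPI's standard profiling (PMPI) layer: ISP's profiler defines MPI_Send itself, reports the call's envelope to the scheduler, and only then invokes the real PMPI_Send. A toy Python model of that control flow (hypothetical names; the actual profiler is C code linked against the MPI library):

```python
def make_profiled_send(real_pmpi_send, scheduler_log):
    """Toy model of a PMPI wrapper: record the send envelope with the
    scheduler, then forward the call to the underlying PMPI_Send."""
    def mpi_send(dest, payload):
        scheduler_log.append(("MPI_Send", dest))  # envelope to scheduler
        return real_pmpi_send(dest, payload)      # actual call into runtime
    return mpi_send

log = []
mpi_send = make_profiled_send(lambda d, p: ("delivered", d, p), log)
result = mpi_send(1, "data")
print(log)     # [('MPI_Send', 1)]
print(result)  # ('delivered', 1, 'data')
```

The key property modeled here is that the scheduler sees, and can delay or reorder, every MPI call before the runtime does.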

12 DPOR and MPI  Implemented an implicit deadlock detection technique from a single program trace.  Issues with the MPI progress engine for wildcard receives could not be resolved.  More details can be found in our CAV 2008 paper: “Dynamic Verification of MPI Programs with Reductions in Presence of Split Operations and Relaxed Orderings”

13 POE P0 P1 P2 Barrier Isend(1, req) Wait(req) MPI Runtime Scheduler Irecv(*, req) Barrier Recv(2) Wait(req) Isend(1, req) Wait(req) Barrier Isend(1) sendNext Barrier

14 POE P0 P1 P2 Barrier Isend(1, req) Wait(req) MPI Runtime Scheduler Irecv(*, req) Barrier Recv(2) Wait(req) Isend(1, req) Wait(req) Barrier Isend(1) sendNext Barrier Irecv(*) Barrier

15 POE P0 P1 P2 Barrier Isend(1, req) Wait(req) MPI Runtime Scheduler Irecv(*, req) Barrier Recv(2) Wait(req) Isend(1, req) Wait(req) Barrier Isend(1) Barrier Irecv(*) Barrier Barrier Barrier

16 POE P0 P1 P2 Barrier Isend(1, req) Wait(req) MPI Runtime Scheduler Irecv(*, req) Barrier Recv(2) Wait(req) Isend(1, req) Wait(req) Barrier Isend(1) Barrier Irecv(*) Barrier Wait(req) Recv(2) Isend(1) sendNext Wait(req) Irecv(2) Isend Wait No Match-Set  Deadlock!
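The step that drives this example is POE's dynamic rewriting, which can be sketched as follows (an illustration of the idea, not ISP's implementation): when a wildcard Irecv(*) reaches its fence point, the scheduler rewrites it into one specific-source receive per eligible sender and replays each case as a separate interleaving.

```python
def rewrite_wildcard(dest, eligible_senders):
    """Rewrite Irecv(ANY_SOURCE) at rank `dest` into one determinized
    receive per sender that could match; each becomes its own replay."""
    return [f"P{dest}: Irecv({src})" for src in eligible_senders]

# P1 posts Irecv(*); two other ranks have an Isend targeting it.
cases = rewrite_wildcard(1, eligible_senders=[0, 2])
print(cases)  # ['P1: Irecv(0)', 'P1: Irecv(2)'] -- two replays to explore
```

Replaying every determinized case is what lets POE reach the schedule in which the later Recv(2) has no matching send and report the deadlock.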

17 MPI_Waitany + POE P0 P1 P2 MPI Runtime Scheduler Barrier Recv(0) Barrier Isend(1, req[0]) Waitany(2, req) Isend(2, req[1]) Barrier Isend(1, req[0]) sendNext Isend(2, req[1]) sendNext Waitany(2, req) Recv(0) Barrier

18 MPI_Waitany + POE P0 P1 P2 MPI Runtime Scheduler Barrier Recv(0) Barrier Isend(1, req[0]) Waitany(2, req) Isend(2, req[1]) Barrier Isend(1, req[0]) Isend(2, req[1]) Waitany(2, req) Recv(0) Barrier Isend(1, req[0]) Recv Barrier Error! req[1] invalid (req[0] valid; req[1] is neither a valid request nor MPI_REQUEST_NULL)
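The error on this slide is about request-array hygiene: every slot passed to MPI_Waitany must hold either a live request or MPI_REQUEST_NULL. A toy model of that check (hypothetical names, not ISP code):

```python
MPI_REQUEST_NULL = "MPI_REQUEST_NULL"

def invalid_slots(request_array, live_requests):
    """Return indices of slots that hold neither a live request nor
    MPI_REQUEST_NULL -- the condition flagged as 'req[1] invalid'."""
    return [i for i, r in enumerate(request_array)
            if r != MPI_REQUEST_NULL and r not in live_requests]

live = {"req0"}                      # only req[0] was created by an Isend
bad = invalid_slots(["req0", "req1"], live)
print(bad)  # [1]
```

Tracking which requests the program actually created is what lets a dynamic verifier flag such misuse before the runtime's behavior becomes undefined.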

19 MPI Progress Engine Issues P0 P1 MPI Runtime Scheduler Isend(0, req) Wait(req) Irecv(1, req) Wait(req) Barrier Irecv(1, req) Barrier Wait Isend(0, req) Barrier sendNext PMPI_Irecv + PMPI_Wait: PMPI_Wait does not return, so the scheduler hangs

20 Experiments  ISP was run on 69 examples of the Umpire test suite.  Detected deadlocks in examples where tools like Marmot cannot.  Produced a far smaller number of interleavings than runs without reduction.  ISP run on Game of Life, ~500 lines of code.  ISP run on ParMETIS, ~14K lines of code, widely used for parallel partitioning of large hypergraphs.  ISP run on MADRE (Memory Aware Data Redistribution Engine by Siegel and Siegel, EuroPVM/MPI ’08): found a previously KNOWN deadlock, but AUTOMATICALLY, within one second!  Results available at: http://www.cs.utah.edu/formal_verification/ISP_Tests

21 Concluding Remarks  Tool available (download and try)  Future work:  Distributed ISP scheduler  Handle MPI + threads  Do a large-scale bug hunt now that ISP can execute large-scale codes.

22 Implicit Deadlock Detection P0 P1 P2 Irecv(*, req) Recv(2) Wait(req) Isend(0, req) Wait(req) Isend(0, req) Wait(req) MPI Runtime Scheduler P0: Irecv(*) P1: Isend(P0) P2: Isend(P0) P0: Recv(P2) P1: Wait(req) P2: Wait(req) P0: Wait(req) No Matching Send  Deadlock!
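The implicit detection sketched on this slide can be modeled in a few lines (a toy reconstruction of the idea, not ISP's actual algorithm): from the single observed trace, check every way the pending sends could have satisfied the pending receives; if some assignment strands a receive, that assignment is a deadlock.

```python
from itertools import permutations

def deadlocking_matches(recvs, sends):
    """recvs: (dst, src) pairs with src an int rank or '*' for a wildcard.
    sends: (src, dst) pairs. Return the send orderings under which some
    receive is left without a matching send."""
    bad = []
    for order in permutations(sends):
        pool, stuck = list(order), []
        for dst, src in recvs:
            m = next((s for s in pool
                      if s[1] == dst and (src == "*" or s[0] == src)), None)
            if m is not None:
                pool.remove(m)
            else:
                stuck.append((dst, src))
        if stuck:
            bad.append((order, stuck))
    return bad

# P0: Irecv(*) then Recv(2); P1 and P2 each Isend to P0.
# If the wildcard consumes P2's send, Recv(2) can never match: deadlock.
result = deadlocking_matches([(0, "*"), (0, 2)], [(1, 0), (2, 0)])
print(len(result))  # 1 deadlocking assignment found
```

One replayed trace thus suffices to expose the deadlock, without ever executing the unlucky schedule on the real MPI runtime.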

