Scheduling Considerations for Building Dynamic Verification Tools for MPI
Sarvani Vakkalanka, Michael DeLisi, Ganesh Gopalakrishnan, Robert M. Kirby
School of Computing, University of Utah, Salt Lake City
Supported by Microsoft HPC Institutes, NSF CNS

Background
The scientific community is increasingly employing expensive supercomputers, programmed with distributed programming libraries, to run large-scale simulations in all walks of science, engineering, math, economics, etc.
(Images: BlueGene/L, courtesy of IBM / LLNL; simulation, courtesy of Steve Parker, CSAFE, Utah)

Current Programming Realities
– Code written using mature libraries (MPI, OpenMP, PThreads, …)
– API calls made from real programming languages (C, Fortran, C++)
– Runtime semantics determined by realistic compilers and runtimes
How best to verify codes that will run on actual platforms?

Classical Model Checking
Finite-state model of the concurrent program → check properties
Extraction of finite-state models for realistic programs is difficult.

Dynamic Verification
Actual concurrent program → check properties
– Avoids model extraction, which can be tedious and imprecise
– The program serves as its own model
– Complexity is reduced through reduction of interleavings (and other methods)

Dynamic Verification
Actual concurrent program + one specific test harness → check properties
– A test harness is needed in order to run the code.
– Only the RELEVANT INTERLEAVINGS (all Mazurkiewicz traces) are explored for the given test harness.
– Conventional testing tools cannot do this!
– E.g., 5 threads with 5 instructions each give more than 10^10 interleavings!
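
The interleaving count quoted above follows from the multinomial formula (nk)!/(k!)^n for n threads executing k instructions each. The short C sketch below (purely illustrative, not part of ISP) evaluates it and shows that 5 threads with 5 instructions each already exceed 10^10 interleavings.

#include <stdio.h>

/* Number of ways to interleave n threads, each executing k instructions,
 * computed as the multinomial coefficient (n*k)! / (k!)^n, evaluated as a
 * product of binomials so intermediate values stay exact in 64 bits. */
static unsigned long long binomial(unsigned long long m, unsigned long long r) {
    unsigned long long result = 1;
    for (unsigned long long i = 1; i <= r; ++i)
        result = result * (m - r + i) / i;   /* stays integral at each step */
    return result;
}

static unsigned long long interleavings(unsigned n, unsigned k) {
    unsigned long long total = 1, remaining = (unsigned long long)n * k;
    for (unsigned t = 0; t < n; ++t) {       /* choose each thread's slots in turn */
        total *= binomial(remaining, k);
        remaining -= k;
    }
    return total;
}

int main(void) {
    /* 5 threads, 5 instructions each: well beyond 10^10 interleavings. */
    printf("%llu\n", interleavings(5, 5));   /* prints 623360743125120 */
    return 0;
}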

Dynamic Verification
Actual concurrent program + one specific test harness → check properties
– In principle, all test harnesses need to be considered.
– FOR MANY PROGRAMS, this number seems small (e.g., the Hypergraph Partitioner).

Related Work
Dynamic verification tools:
– CHESS
– Verisoft (POPL ’97)
– DPOR (POPL ’05)
– JPF
ISP is similar to CHESS and DPOR.

Dynamic Partial Order Reduction (DPOR)
Three processes P0, P1, P2, each executing lock(x) … unlock(x).
DPOR explores only the distinct orders in which the processes acquire the lock, e.g. L0 U0 L1 U1 L2 U2 versus L0 U0 L2 U2 L1 U1, rather than every instruction-level interleaving.
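
A minimal C/pthreads rendering of this example (the thread bodies and mutex name are illustrative, not taken from the slides): because the only interaction between the threads is the lock on x, a DPOR-based checker needs to explore only the different lock-acquisition orders (3! = 6 schedules), not every instruction-level interleaving.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t x = PTHREAD_MUTEX_INITIALIZER;

/* Each of P0, P1, P2 runs a critical section guarded by lock x.  The only
 * conflicting operations are on x, so a DPOR-based checker explores just
 * the different lock-acquisition orders. */
static void *proc(void *arg) {
    long id = (long)arg;
    pthread_mutex_lock(&x);      /* Li */
    printf("P%ld in critical section\n", id);
    pthread_mutex_unlock(&x);    /* Ui */
    return NULL;
}

int main(void) {
    pthread_t t[3];
    for (long i = 0; i < 3; ++i)
        pthread_create(&t[i], NULL, proc, (void *)i);
    for (int i = 0; i < 3; ++i)
        pthread_join(t[i], NULL);
    return 0;
}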

ISP
MPI program → profiler → executable (Proc 1, Proc 2, …, Proc n) ↔ scheduler ↔ MPI runtime
– Manifests only, and all, relevant interleavings (DPOR)
– Manifests ALL relevant interleavings of the MPI progress engine, done by DYNAMIC REWRITING of WILDCARD receives

Using PMPI
P0's call stack: User_Function → MPI_Send (ISP wrapper) → PMPI_Send (in the MPI runtime)
The wrapper sends the envelope "P0: MPI_Send" to the scheduler over a TCP socket; once the scheduler allows the operation, the wrapper issues PMPI_Send, which performs the actual send in the MPI runtime.
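
A minimal sketch of the PMPI interception pattern the slide depicts; the scheduler-communication helpers (send_envelope_to_scheduler, wait_for_go_ahead) are hypothetical placeholders, not ISP's actual interface.

#include <mpi.h>
#include <stdio.h>

/* Hypothetical stand-ins for ISP's scheduler channel (the real tool talks
 * to the scheduler over a TCP socket). */
static void send_envelope_to_scheduler(const char *op, int dest, int tag) {
    printf("envelope: %s dest=%d tag=%d\n", op, dest, tag);
}
static void wait_for_go_ahead(void) { /* block until the scheduler says go */ }

/* Profiler wrapper: the application links against this MPI_Send, which
 * defers to the scheduler and then forwards to the real send through the
 * PMPI_ entry point defined by the MPI profiling interface. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm) {
    send_envelope_to_scheduler("MPI_Send", dest, tag);
    wait_for_go_ahead();
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}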

DPOR and MPI
– Implemented an implicit deadlock detection technique from a single program trace.
– Issues with the MPI progress engine for wildcard receives could not be resolved.
– More details can be found in our CAV 2008 paper: "Dynamic Verification of MPI Programs with Reductions in Presence of Split Operations and Relaxed Orderings".

POE
Example program (each call is reported to the scheduler before being issued into the MPI runtime):
P0: Isend(1, req); Barrier; Wait(req)
P1: Irecv(*, req); Barrier; Recv(2); Wait(req)
P2: Barrier; Isend(1, req); Wait(req)
Step 1: the scheduler collects P0's Isend(1) and, after replying sendNext, its Barrier.

POE (continued)
Step 2: the scheduler also collects P1's Irecv(*) and Barrier; the wildcard receive is deliberately left unmatched for now.

POE (continued)
Step 3: once P2's Barrier is collected as well, the three Barriers form a match-set and are issued into the MPI runtime together; the Isend and the wildcard Irecv remain pending.

POE (continued)
Step 4: after the barrier, the scheduler collects P0's Wait(req), P1's Recv(2) and Wait(req), and P2's Isend(1) and Wait(req). Both potential senders for the wildcard receive are now known, so Irecv(*) is dynamically rewritten to a specific source. In the interleaving where it is rewritten to Irecv(2), P1's Recv(2) is left with no match-set (no matching send). Deadlock!
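
A self-contained C rendering of the example above (tags and payloads are illustrative). Whether it deadlocks depends on which sender the wildcard receive is matched with, which is exactly the nondeterminism POE explores.

#include <mpi.h>

int main(int argc, char **argv) {
    int rank, a = 0, b = 0, payload = 42;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* run with exactly 3 processes */

    if (rank == 0) {                         /* P0: Isend(1); Barrier; Wait */
        MPI_Isend(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {                  /* P1: Irecv(*); Barrier; Recv(2); Wait */
        MPI_Irecv(&a, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
        MPI_Barrier(MPI_COMM_WORLD);
        /* Deadlocks if the wildcard Irecv above was matched with P2's send. */
        MPI_Recv(&b, 1, MPI_INT, 2, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 2) {                  /* P2: Barrier; Isend(1); Wait */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Isend(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}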

MPI_Waitany + POE
P0: Isend(1, req[0]); Isend(2, req[1]); Waitany(2, req); Barrier
P1: Recv(0); Barrier
P2: Recv(0); Barrier
The scheduler collects P0's two Isends (replying sendNext after each) and its Waitany(2, req), together with the Recv(0) and Barrier calls of P1 and P2.

MPI_Waitany + POE (continued)
Suppose the scheduler first forms the match-set {Isend(1, req[0]), P1's Recv(0)} and issues it, and only then lets P0 issue its Waitany. At that point req[0] is a valid request in the MPI runtime, but req[1] is not, because Isend(2, req[1]) has not been issued yet: passing it to PMPI_Waitany is an error ("Error! req[1] invalid"). The not-yet-issued entries have to be handled specially, e.g. passed as MPI_REQUEST_NULL.
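
A possible C rendering of the Waitany example (tags and buffers are illustrative; a trailing Waitall is added here so that both sends eventually complete). The comment marks the point where a PMPI-based tool must be careful about handles whose operations it has not yet issued into the runtime.

#include <mpi.h>

int main(int argc, char **argv) {
    int rank, msg = 7, buf = 0, which;
    MPI_Request req[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* run with exactly 3 processes */

    if (rank == 0) {                         /* P0: two Isends, then Waitany */
        MPI_Isend(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(&msg, 1, MPI_INT, 2, 0, MPI_COMM_WORLD, &req[1]);
        /* Only one of req[0]/req[1] completes here.  A PMPI-based tool that
         * has issued just one of the sends into the runtime cannot hand the
         * other (not-yet-issued) handle to PMPI_Waitany. */
        MPI_Waitany(2, req, &which, MPI_STATUS_IGNORE);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);  /* complete the other send */
        MPI_Barrier(MPI_COMM_WORLD);
    } else {                                 /* P1, P2: Recv(0); Barrier */
        MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}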

MPI Progress Engine Issues (PMPI_Irecv + PMPI_Wait)
P0: Irecv(1, req); Barrier; Wait(req)
P1: Barrier; Isend(0, req); Wait(req)
If the scheduler lets P0 issue PMPI_Wait on its Irecv before P1's matching Isend has been issued into the runtime, PMPI_Wait does not return; P0 is stuck inside the MPI runtime, can no longer talk to the scheduler, and the scheduler hangs.
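
A small illustrative fragment (not ISP code; the notification helper is hypothetical) of why this ordering hangs: once P0's wrapper calls PMPI_Wait on an unmatched receive, it blocks inside the MPI progress engine and can no longer exchange messages with the scheduler.

#include <mpi.h>
#include <stdio.h>

/* Hypothetical stand-in for the wrapper's report back to the scheduler. */
static void notify_scheduler_completed(const char *op) {
    printf("completed: %s\n", op);
}

/* Fragment of P0's wrapper after the scheduler has (prematurely) allowed the
 * Wait on an Irecv from rank 1 that has no matching send issued yet. */
void p0_wrapper_fragment(int *buf) {
    MPI_Request req;
    PMPI_Irecv(buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    /* With no matching send in the runtime, this call blocks inside the MPI
     * progress engine; the notification below is never reached, the scheduler
     * never hears back from P0, and the whole verification run hangs. */
    PMPI_Wait(&req, MPI_STATUS_IGNORE);
    notify_scheduler_completed("MPI_Wait");
}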

Experiments
– ISP was run on 69 examples of the Umpire test suite. It detected deadlocks in examples where tools like Marmot cannot, and produced far fewer interleavings than runs without reduction.
– ISP was run on Game of Life (~500 lines of code).
– ISP was run on ParMETIS (~14K lines of code), widely used for parallel partitioning of large hypergraphs.
– ISP was run on MADRE (the memory-aware data redistribution engine by Siegel and Siegel, EuroPVM/MPI ’08) and found a previously KNOWN deadlock, but AUTOMATICALLY and within one second!
– Results available at:

Concluding Remarks
– Tool available (download and try it).
– Future work:
  – Distributed ISP scheduler
  – Handle MPI + threads
  – Do a large-scale bug hunt, now that ISP can execute large-scale codes.

Implicit Deadlock Detection
P0: Irecv(*, req); Recv(2); Wait(req)
P1: Isend(0, req); Wait(req)
P2: Isend(0, req); Wait(req)
From the single observed trace P0: Irecv(*), P1: Isend(P0), P2: Isend(P0), P0: Recv(P2), P1: Wait(req), P2: Wait(req), P0: Wait(req), the scheduler deduces that if the wildcard Irecv(*) is matched with P2's Isend, then P0's Recv(2) has no matching send. Deadlock!
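
A self-contained C rendering of this last example (tag and payload are illustrative); ISP's implicit deadlock detection flags the bad wildcard match from a single observed trace.

#include <mpi.h>

int main(int argc, char **argv) {
    int rank, a = 0, b = 0, msg = 1;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* run with exactly 3 processes */

    if (rank == 0) {                         /* P0: Irecv(*); Recv(2); Wait */
        MPI_Irecv(&a, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
        /* If the wildcard receive above is matched with P2's message,
         * this Recv from rank 2 can never be satisfied: deadlock. */
        MPI_Recv(&b, 1, MPI_INT, 2, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else {                                 /* P1 and P2: Isend(0); Wait */
        MPI_Isend(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}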