Argonne National Laboratory School of Computing and SCI Institute, University of Utah Formal Verification of Programs That Use MPI One-Sided Communication.

Argonne National Laboratory School of Computing and SCI Institute, University of Utah
Formal Verification of Programs That Use MPI One-Sided Communication
Salman Pervez, Ganesh Gopalakrishnan, Robert M. Kirby, School of Computing, University of Utah
Rajeev Thakur, William Gropp, Mathematics and Computer Science Division, Argonne National Laboratory

Thesis of the Talk
The demand for concurrent software is increasing, yet concurrent algorithms are notoriously hard to design and verify. Formal methods, and in particular finite-state model checking, provide a means of reasoning about concurrent algorithms.
Principal advantages of the model checking approach:
- it provides a formal framework for reasoning
- it allows coverage, i.e. examination of all possible process interleavings
Thesis: If finite-state models are created and exhaustively analyzed for desired formal properties, robust algorithms and implementations will result.

What is Model Checking?
- The Navier-Stokes equations are a mathematical model of fluid flow physics.
- “V&V”: Validation and Verification. “Validate models, verify codes.”
- “Formal models” can be generated either automatically or by a modeler; they translate and abstract algorithms and implementations.

Model Checking: History and Current Practice
History
- Approach invented around 1981 by Clarke and Emerson, and by Queille and Sifakis
- Widely used in hardware verification since the 1990s
- Use in software verification is the current rage
Notable Successes
- Bell Labs: telephone switch software verification
- NASA: concurrent Java program verification
- Microsoft: device driver verification
Applications in HPC by others
- Siegel and Avrunin: MPI two-sided communication programs
- Matlin, Lusk, McCune: verifying parts of MPD

MPI One-Sided Communication
MPI one-sided constructs examined:
- MPI_Win_lock
- MPI_Win_unlock
- MPI_Put
- MPI_Get
The desired atomicity is provided by the MPI_Win_lock / MPI_Win_unlock pair. Once the lock is relinquished, data values read under it can no longer be trusted, because other processes may already have changed them.
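The rule that matters for this algorithm has two parts. First, an MPI_Get issued inside a lock/unlock epoch is only guaranteed to have delivered its data once MPI_Win_unlock returns. Second, whatever it delivered may be stale as soon as the lock is released.

The toy model below illustrates the first part in plain C. toy_win, toy_lock, toy_put, toy_get and toy_unlock are hypothetical names, not MPI functions. The model ignores distribution, lock types and datatypes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy, single-address-space model of a passive-target access epoch.
 * Hypothetical names; this is NOT the MPI API. Put/Get calls are only
 * queued, and complete when the epoch is closed by toy_unlock. */
#define WIN_SIZE 8
#define MAX_OPS 16

typedef enum { OP_PUT, OP_GET } op_kind;

typedef struct {
    op_kind kind;
    int *local;   /* origin buffer: source for a put, destination for a get */
    int disp;     /* displacement into the window */
    int count;
} rma_op;

typedef struct {
    int mem[WIN_SIZE];        /* target memory exposed by the window */
    bool locked;
    rma_op pending[MAX_OPS];  /* operations issued in the current epoch */
    int npending;
} toy_win;

void toy_lock(toy_win *w) {
    assert(!w->locked);
    w->locked = true;
    w->npending = 0;
}

static void enqueue(toy_win *w, op_kind k, int *local, int disp, int count) {
    assert(w->locked && w->npending < MAX_OPS);
    assert(disp >= 0 && count >= 0 && disp + count <= WIN_SIZE);
    w->pending[w->npending++] = (rma_op){k, local, disp, count};
}

void toy_put(toy_win *w, int *src, int disp, int count) { enqueue(w, OP_PUT, src, disp, count); }
void toy_get(toy_win *w, int *dst, int disp, int count) { enqueue(w, OP_GET, dst, disp, count); }

/* Closing the epoch completes every queued operation. Applying them in issue
 * order is a simplification that is only meaningful when, as in the byte-range
 * algorithm, the puts and gets of one epoch touch disjoint window locations. */
void toy_unlock(toy_win *w) {
    assert(w->locked);
    for (int i = 0; i < w->npending; i++) {
        rma_op *op = &w->pending[i];
        if (op->kind == OP_PUT)
            memcpy(&w->mem[op->disp], op->local, op->count * sizeof(int));
        else
            memcpy(op->local, &w->mem[op->disp], op->count * sizeof(int));
    }
    w->npending = 0;
    w->locked = false;
}
```

In a run where a process puts its own (flag, start, end) slot and gets another slot in the same epoch, the local buffer for the get still holds its old contents until toy_unlock runs.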

Test Case: Byte-Range Algorithm
An algorithm implemented using MPI one-sided communication (with passive-target lock-unlock synchronization) to coordinate a collection of parallel processes contending for byte-range locks.
Notes concerning the algorithm:
- To acquire a lock, a process must checkpoint the global state by ‘simultaneously’ indicating its intent and reading the others’ status.
- When the lock owner releases the lock, it wakes up all conflicting ‘sleeping’ processes.
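Deciding whether two byte-range requests conflict is an interval-overlap test. The sketch below makes two assumptions:
- Ranges are inclusive [start, end]. This is what makes (5,6) conflict with both (3,5) and (6,8) in Example 3.
- The (-1, -1) values written by lock_release mean "no range held".

ranges_conflict is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper. Assumes inclusive [start, end] ranges and that a
 * released slot holds (-1, -1), as written by lock_release. */
bool ranges_conflict(int s1, int e1, int s2, int e2) {
    if (s1 < 0 || s2 < 0)          /* (-1, -1) marks "no range held" */
        return false;
    assert(s1 <= e1 && s2 <= e2);
    return s1 <= e2 && s2 <= e1;   /* the two intervals overlap */
}
```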

Lock Acquire

lock_acquire (start, end) {
    val[1] = start;
    val[2] = end;
    while (1) {
        /* Stage 1: announce intent and read the others' state in one epoch */
        val[0] = 1;     /* flag: set on every attempt, since Stage 2 clears it */
        conflict = 0;   /* reset before each scan */
        lock_win
        place val in win
        get values of other processes from win
        unlock_win
        for all i, if (Pi conflicts with my range)
            conflict = 1;
        /* Stage 2: back off and sleep, or take the lock */
        if (conflict) {
            val[0] = 0;
            lock_win
            place val in win
            unlock_win
            MPI_Recv(MPI_ANY_SOURCE)   /* sleep until a releaser signals */
        }
        else {
            /* lock is acquired */
            break;
        }
    } /* end while */
}

Window: one slot per process (P0, P1), each holding flag, start, end
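The Stage 1 test can be sketched in C over a snapshot of the window, with three ints per process (flag, start, end). stage1_conflict is a hypothetical helper. It assumes inclusive byte ranges and (-1, -1) for "no range". It also assumes that only slots whose flag is 1 count as conflicts, since a sleeping process has reset its flag to 0 in Stage 2.

```c
#include <assert.h>
#include <stdbool.h>

#define SLOT 3  /* per-process slot in the window: flag, start, end */

static bool overlaps(int s1, int e1, int s2, int e2) {
    return s1 >= 0 && s2 >= 0 && s1 <= e2 && s2 <= e1;
}

/* Hypothetical Stage 1 decision on a window snapshot read inside the epoch.
 * Assumption: only processes with flag == 1 (announced intent or holding)
 * count as conflicts; a sleeper has flag 0 but keeps its range published. */
bool stage1_conflict(const int *win, int nprocs, int me, int start, int end) {
    assert(me >= 0 && me < nprocs);
    for (int i = 0; i < nprocs; i++) {
        if (i == me)
            continue;
        const int *slot = &win[i * SLOT];
        if (slot[0] == 1 && overlaps(start, end, slot[1], slot[2]))
            return true;
    }
    return false;
}
```

In the two-process layout of the slides, if both slots read as (1, 3, 5), each process reports a conflict. That is the situation that sends both processes to Stage 2 in Example 2.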

Lock Release

lock_release (start, end) {
    val[0] = 0;     /* flag */
    val[1] = -1;
    val[2] = -1;
    lock_win
    place val in win
    get values of other processes from win
    unlock_win
    for all i, if (Pi conflicts with my range)
        MPI_Send(Pi);   /* wake the conflicting process */
}

Window: one slot per process (P0, P1), each holding flag, start, end
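The release step's choice of whom to wake can be sketched the same way. release_targets is a hypothetical helper. It follows the slide in signalling every other process whose published range overlaps the released one, regardless of its flag. This is how a sleeper (flag 0, range still published) gets woken. It is also how a process that merely wants the lock can receive a signal it is not yet waiting for.

```c
#include <assert.h>

#define SLOT 3  /* per-process slot in the window: flag, start, end */

/* Hypothetical helper: fills wake[] with the ranks lock_release would send to
 * and returns how many there are. The flag is deliberately ignored. */
int release_targets(const int *win, int nprocs, int me, int start, int end, int *wake) {
    int n = 0;
    assert(start >= 0 && start <= end);
    for (int i = 0; i < nprocs; i++) {
        if (i == me)
            continue;
        int s = win[i * SLOT + 1], e = win[i * SLOT + 2];
        if (s >= 0 && s <= end && start <= e)   /* inclusive overlap */
            wake[n++] = i;
    }
    return n;
}
```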

Example 1: Demonstration of Lock Acquire/Release Strategy
Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5); lock_release()
[Figure: window slots of P0 and P1 at each step]
1. Process 0 holds the lock on (3,5). Process 1 calls lock_acquire(3,5) and deduces a conflict in Stage 1.
2. Process 1 deduces the conflict in Stage 2 and blocks on receive.
3. Process 0 calls lock_release() and sends a signal to P1.
4. Process 1 receives the signal and retries Stage 1, this time acquiring the lock.
5. Process 1 calls lock_release().

Modeling in Promela
- C-like structure
- Powerful abstractions like channels
Example Promela code (MPI_Win_lock, as called from the lock_release model):

    chan lock_chan = [1] of { byte };   /* assumed declaration: capacity-1 channel acts as the window lock */

    inline MPI_Win_lock(proc_id) {
        /* try sending a message on a channel of size 1;
           this blocks if a message is already in the queue */
        lock_chan!proc_id
    }
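The capacity-1 channel trick can be mirrored in C to show why it behaves as a mutual-exclusion lock. chan1, chan1_try_send and chan1_recv are hypothetical names. Where Promela would block the sender, this sketch returns false instead. It also assumes MPI_Win_unlock is modeled as a receive on the same channel, which the slide does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C analogue of a Promela channel of capacity 1. */
typedef struct {
    bool full;    /* a message (the lock token) is in the channel */
    int holder;   /* proc_id carried by that message */
} chan1;

/* Models MPI_Win_lock: succeeds only if the channel is empty.
 * Returning false stands in for "the sender would block". */
bool chan1_try_send(chan1 *c, int proc_id) {
    if (c->full)
        return false;
    c->full = true;
    c->holder = proc_id;
    return true;
}

/* Assumed model of MPI_Win_unlock: removes the message, freeing the lock. */
int chan1_recv(chan1 *c) {
    assert(c->full);
    c->full = false;
    return c->holder;
}
```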

Example 2: Demonstration of Lock Acquire/Release Limitation
Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5)
[Figure: window slots of P0 and P1 at each step]
Annotations, in the order they appear on the slides:
1. One process deduces a conflict in Stage 1.
2. A signal is sent to P1.
3. One process is still in Stage 1 while the other deduces the conflict in Stage 2 and blocks on receive.
4. The blocked process receives the signal and retries Stage 1; the other process still deduces a conflict in Stage 1.
5. Both processes deduce a conflict in Stage 1.
6. One process deduces the conflict in Stage 2 and blocks on receive.
7. Both processes have deduced the conflict in Stage 2 and are blocked on receive.
8. DEADLOCK

Observations After Model Checking
- P0 releases the lock before it can see that P1 will be blocked.
- There is no way for P0 to determine whether P1 merely wants the lock or is actually blocked.
- Multiple unmatched sends can occur (example to follow).

Example 3: Demonstration of Lock Acquire/Release Limitation
Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(6,8); lock_release()
Process 2: lock_acquire(5,6)
[Figure: window slots of P0, P1 and P2 at each step]
1. Process 2 deduces a conflict in Stage 1.
2. Process 2 deduces the conflict in Stage 2 and blocks on receive.
3. A releasing process sends a signal to P2.
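The unmatched sends in Example 3 can be counted directly. Process 2's range (5,6) overlaps both (3,5) and (6,8), so each releaser signals it. Process 2, however, consumes only one signal per blocking receive. wakeups_sent_to is a hypothetical helper that assumes inclusive ranges and that every overlapping holder eventually releases.

```c
#include <assert.h>

typedef struct { int start, end; } range;   /* inclusive byte range */

/* Hypothetical helper: number of wake-up messages a sleeping waiter is sent,
 * assuming each overlapping holder releases and signals it, as lock_release does. */
int wakeups_sent_to(range waiter, const range *holders, int nholders) {
    int sends = 0;
    for (int i = 0; i < nholders; i++)
        if (holders[i].start <= waiter.end && waiter.start <= holders[i].end)
            sends++;
    return sends;
}
```

With these inputs, two wake-ups are sent. If Process 2 acquires the lock after its first receive, the second message stays unmatched. It can later satisfy an MPI_Recv(MPI_ANY_SOURCE) that it was never meant for.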

Proposed Solution 1
Main idea: distinguish between processes that want the lock and those that are blocked.
Three possible flag values:
- 0 = I do not have the lock
- 1 = I have the lock
- 2 = I am trying for the lock
If a process wants the lock but finds another conflicting process with a flag value of 2, it must wait until this value changes to either 1 or 0.
This adds certainty to the algorithm, at the cost of a possible performance hit and possible livelock.
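The waiting rule of Solution 1 can be sketched as a decision over the flags of the processes whose ranges conflict with the requester's. Conflict detection is assumed to happen elsewhere.

solution1_decide and the decision names are hypothetical. Mapping flag 1 to blocking keeps the original Stage 2 behaviour, and mapping flag 0 to acquiring follows the flag meanings on the slide.

```c
#include <assert.h>

/* Flag values of Proposed Solution 1. */
enum { FLAG_FREE = 0, FLAG_HOLDS = 1, FLAG_TRYING = 2 };

typedef enum { ACQUIRE, BLOCK, WAIT_AND_RECHECK } decision;

/* Hypothetical helper. conflicting_flags holds the flags of the n processes
 * whose ranges conflict with the requester's own range. */
decision solution1_decide(const int *conflicting_flags, int n) {
    decision d = ACQUIRE;
    for (int i = 0; i < n; i++) {
        if (conflicting_flags[i] == FLAG_TRYING)
            return WAIT_AND_RECHECK;  /* outcome unknown: wait until it becomes 1 or 0 */
        if (conflicting_flags[i] == FLAG_HOLDS)
            d = BLOCK;                /* a real holder: sleep until woken */
    }
    return d;                         /* only FLAG_FREE conflicts, or none at all */
}
```

The sketch shows where the livelock risk comes from. Suppose two conflicting requesters both publish 2, and each then waits for the other's flag to change. Neither can make progress until one of them backs off, and if they back off and retry in step, the pattern can repeat.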

Proposed Solution 2
Main idea: a process that is about to block picks the process that will wake it up, and records the choice by writing to shared memory (the window).
Once processes declare their intentions globally, deadlock can be avoided. For there to be a deadlock, a dependency cycle must exist. The last process that would complete this cycle can detect it, and must not block.
Window: one slot per process (P0, P1), each holding flag, start, end, pick
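The check made by the last process in a would-be cycle can be sketched as following the chain of published picks. Here pick[i] is the rank that sleeping process i chose to be woken by, or -1 if it has none. would_close_cycle is a hypothetical helper, not the authors' code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper. Before process `me` publishes pick[me] = target and
 * blocks, it follows the chain of picks starting at target. If the chain leads
 * back to me, blocking would close a dependency cycle (a deadlock), so me must
 * not block and should reset to Stage 1 instead. */
bool would_close_cycle(const int *pick, int nprocs, int me, int target) {
    int cur = target;
    /* A path from target back to me visits at most nprocs distinct ranks. */
    for (int steps = 0; steps < nprocs && cur >= 0; steps++) {
        assert(cur < nprocs);
        if (cur == me)
            return true;
        cur = pick[cur];
    }
    return false;
}
```

For example, suppose P1 is blocked with pick[1] = 0. If P0 then considers blocking with pick[0] = 1, the check finds the cycle P0 to P1 to P0 and tells P0 not to block.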

Example 4: Demonstration of Lock Acquire/Release, Proposed Solution 2
Process 0: lock_acquire(3,5); lock_release()
Process 1: lock_acquire(3,5)
[Figure: window slots of P0 and P1, including the pick field, at each step]
Annotations, in the order they appear on the slides:
1. One process deduces a conflict in Stage 1 (the window contents change over the next frames).
2. One process is in Stage 1 while the other deduces the conflict in Stage 2 and blocks on receive.
3. While that process stays blocked on receive, the other finds no conflict in Stage 1.
4. The other process then deduces a deadlock in Stage 2 and resets to Stage 1 instead of blocking.

Discussion and Future Work
“Execution Checking” vs. “Model Checking”
- In current practice, concrete executions on a few diverse platforms are often used to verify algorithms and codes. Consequence: many feasible executions might never be manifested.
- Model checking forces all executions of a judiciously down-scaled model to be examined.
Current focus of our research: minimizing modeling effort and error.
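The gap between a few test runs and exhaustive checking is easy to underestimate. The sketch below counts the interleavings of two processes with a and b atomic steps each, which is the binomial coefficient C(a+b, a). interleavings is a hypothetical name. Practical model checkers such as SPIN also shrink this space (for example with partial-order reduction) rather than enumerating it naively.

```c
#include <assert.h>

/* Number of distinct interleavings of two processes executing a and b atomic
 * steps: at each point, either the first or the second process moves next. */
unsigned long long interleavings(unsigned a, unsigned b) {
    if (a == 0 || b == 0)
        return 1;   /* only one process left: its steps run in program order */
    return interleavings(a - 1, b) + interleavings(a, b - 1);
}
```

For example, two processes of 10 steps each already have 184,756 interleavings, and a handful of concrete executions samples only a few of them.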

Funding Acknowledgements:
- NSF (CSR–SMA: Toward Reliable and Efficient Message Passing Software Through Formal Analysis)
- Microsoft (Formal Analysis and Code Generation Support for MPI)
- Office of Science, Department of Energy
Summary
- Paradigms such as one-sided MPI and threading create a plethora of execution possibilities, many of which might be algorithmically fatal yet lie dormant at testing time.
- Model checking provides a formal and practical means of reasoning about all possible executions as part of the design, verification and optimization process.
Closing question (“food for thought”): Can one come up with safe usages (i.e. easier to verify yet not overly restrictive) of one-sided communication?