1
VyrdMC: Driving Runtime Refinement Checking Using Model Checkers
Tayfun Elmas, Serdar Tasiran, Koç University, Istanbul, Turkey. Hi all, I'm Tayfun Elmas from Koç University. In this talk I'll present a technique for detecting concurrency errors in which we watch for refinement violations at runtime. This is joint work with my advisor Serdar Tasiran and with Shaz Qadeer from Microsoft Research. PLDI 2005, June 12-15, Chicago, U.S.
2
Verifying Concurrent Data Structures
Motivation: Widely-used software systems are built on concurrent data structures (file systems, databases, internet services, standard Java and C# class libraries) that use intricate synchronization mechanisms to improve performance, which makes them prone to concurrency errors. Concurrency errors can cause data loss or corruption and are difficult to detect and reproduce through testing.
Many widely-used software applications are built on concurrent data structures; examples are file systems, databases, internet services, and some standard Java and C# class libraries. These systems frequently use intricate synchronization mechanisms to get better performance in a concurrent environment, and this makes them prone to concurrency errors. Concurrency errors can have serious consequences, such as data loss or corruption, and unfortunately they are typically hard to detect and reproduce through purely testing-based techniques.
3
Our Approach: Runtime Checking of Refinement
Use refinement as the correctness criterion: it is more thorough than assertions and gives more observability than pure testing. Linearizability and atomicity require that for each execution of Impl there exists an "equivalent" atomic execution of Impl itself. Refinement requires that there exists an "equivalent" atomic execution of a separate, more permissive Spec. Example: the Spec allows methods to terminate exceptionally, to model failure due to resource contention. Runtime verification: check refinement using execution traces, which can handle industrial-scale programs and sits between testing and exhaustive verification.
Linearizability and atomicity are more restrictive; the flexibility in the Spec gives us a more powerful method for proving the correctness of some tricky implementations. In our approach to verifying concurrent data structures we use refinement as the correctness criterion. The benefits of this choice are that refinement is a more thorough condition than method-local assertions and that it provides more observability than pure testing. Correctness conditions like linearizability and atomicity require that for each execution of the Impl in a concurrent environment there exists an equivalent atomic execution of the same Impl. Refinement, however, uses a separate specification: for each execution of the Impl, refinement requires the existence of an equivalent atomic execution of this Spec. The specification we use is more permissive than the Impl. For example, the Spec allows methods to terminate exceptionally to model failure due to resource contention in a concurrent environment, while the Impl would not allow some of those method executions to fail. We check refinement at runtime using execution traces of the implementation, in order to be able to handle industrial-scale programs. Our approach can be regarded as intermediate between testing and exhaustive verification with respect to the coverage of the execution space explored.
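To make the "more permissive Spec" concrete, here is a minimal Java sketch of an atomic specification for a simple multiset (the example used later in the talk). The class and method names are illustrative assumptions, not the actual Vyrd specification, and the multiset is modeled with a set since the example elements are distinct.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// A hypothetical atomic specification of a multiset-like container.
// Every method runs atomically (synchronized on the whole object), and
// Insert may fail nondeterministically, modeling failure under resource
// contention: behavior the permissive Spec allows but the Impl need not show.
class MultisetSpec {
    private final Set<Integer> elements = new HashSet<>();  // example elements are distinct
    private final Random choice = new Random();

    synchronized String insert(int x) {
        if (choice.nextBoolean()) {
            return "fail";            // exceptional termination permitted by the Spec
        }
        elements.add(x);
        return "success";
    }

    synchronized boolean lookUp(int x) {
        return elements.contains(x);
    }

    synchronized String delete(int x) {
        return elements.remove(x) ? "success" : "fail";
    }
}
```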
4
Outline: Motivating example; Refinement (I/O-refinement, View-refinement); The VYRD tool; Experience; Conclusions.
Here is the outline of my talk. First I'll give a motivating data structure example and explain how our technique applies to it. Then I'll talk about two different notions of refinement, called I/O-refinement and view-refinement. Finally, I'll introduce our runtime verification tool Vyrd and the experience we had applying it to industrial-scale software.
5
Multiset I/O-Refinement
[Slide diagram: an Impl trace of interleaved fine-grained actions from four method executions (Call Insert(3), A[0].elt=3, Unlock A[0], Call LookUp(3), Call Insert(4), read A[0], A[1].elt=4, Unlock A[1], Return "true", Return "success", Call Delete(3), A[0].elt=null), shown next to a Spec trace of atomic operations (M = M U {3}, check 3 ∈ M, M = M U {4}, M = M \ {3}) with the multiset state evolving M = Ø, {3}, {3, 4}, {4}. Each Impl method execution has a commit action (Commit Insert(3), Commit LookUp(3), Commit Insert(4), Commit Delete(3)); the order of these commits is the witness ordering that determines where the corresponding Spec operations are executed.]
In this slide we explain how we check I/O-refinement. Again we use the Insert operation instead of InsertPair to simplify the picture.
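The check sketched on this slide can be phrased as replaying the committed operations on the Spec in witness order and comparing observable results. Below is a minimal, hedged Java sketch of that idea; the record and class names are illustrative, and the deterministic stand-in Spec glosses over matching the permissive Spec's nondeterministic outcomes against the Impl's.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A minimal sketch of the I/O-refinement check, assuming the log has already
// been reduced to one record per method execution, ordered by commit point
// (the "witness ordering"). Names are illustrative, not the Vyrd sources.
class IoRefinementChecker {

    static final class CommitRecord {
        final String method;      // "Insert", "LookUp", or "Delete"
        final int argument;       // e.g. 3
        final String implResult;  // return value observed in the Impl trace
        CommitRecord(String m, int a, String r) { method = m; argument = a; implResult = r; }
    }

    // Deterministic stand-in for the Spec; a permissive Spec with
    // nondeterministic failures would require trying each allowed outcome.
    static final class Spec {
        private final Set<Integer> m = new HashSet<>();
        String insert(int x) { m.add(x); return "success"; }
        String delete(int x) { return m.remove(x) ? "success" : "fail"; }
        String lookUp(int x) { return m.contains(x) ? "true" : "false"; }
    }

    // Replays the committed operations on the Spec in witness order and checks
    // that the Spec produces the same observable results as the Impl did.
    static boolean check(List<CommitRecord> witnessOrder) {
        Spec spec = new Spec();
        for (CommitRecord rec : witnessOrder) {
            String specResult;
            switch (rec.method) {
                case "Insert": specResult = spec.insert(rec.argument); break;
                case "Delete": specResult = spec.delete(rec.argument); break;
                case "LookUp": specResult = spec.lookUp(rec.argument); break;
                default: throw new IllegalArgumentException(rec.method);
            }
            if (!specResult.equals(rec.implResult)) {
                return false;   // no equivalent atomic Spec execution found
            }
        }
        return true;
    }
}
```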
6
Experience: The Boxwood Project
[Slide diagram: Boxwood's layered architecture. The BLinkTree module (root pointer node, internal pointer nodes, leaf pointer nodes, data nodes, and a global disk allocator) sits on top of the Cache module (dirty and clean cache entries), which reads from and writes to the Chunk Manager module's replicated disk manager.]
We verified all modules of this system and caught an interesting, tricky error that had gone undetected. We ran Vyrd on the Boxwood project from Microsoft. The goal of the Boxwood project is to build a distributed abstract storage infrastructure for applications with high data storage and retrieval requirements. Here you see a high-level picture of Boxwood. Boxwood has a concurrent B-link tree implementation in the BLinkTree module. The BLinkTree module uses a Cache module to store and retrieve the data at its tree nodes quickly. The Cache module makes its data persistent using a Chunk Manager module, which implements a distributed storage system that abstracts the storage from the upper layers.
7
Experience: Concurrency Bug in Cache
[Slide diagram: a sequence of Cache and Chunk Manager states for one handle. Initially the Cache holds the dirty bytes X.Y while the Chunk Manager holds T.Z. Write(handle, AB) starts and a concurrent Flush() starts; when Write and Flush end, the Cache holds A.B but the Chunk Manager holds A.Y, so the same handle maps to different byte arrays and corrupted data sits in persistent storage.]
In this slide we demonstrate the error Vyrd caught in the Cache module. Think of the dirty cache entry containing the data X.Y while the persistent storage for the same handle contains T.Z. A Write(handle, AB) and a Flush() run concurrently, and at the end corrupted data is written to persistent storage. This breaks the invariant that, for a clean cache entry, the byte arrays in the Cache and the Chunk Manager are the same.
8
Experience: Concurrency Bug in Cache
Concurrent execution of Write and Flush on the same handle: a write to a dirty entry is not locked properly by Write. This is hard to catch through testing: the cache entry is correct and only the permanent version is wrong, so as long as Reads hit in the Cache the return value is correct. It is caught through testing only if the Cache fills, the clean entry is evicted, no Writes touch the entry in the meantime, and the entry is read after eviction, which is very unlikely.
The error shown in the previous slide is caused by a Flush method interleaved with a Write method that does not properly protect a dirty entry it accesses. Thus the operations on the same handle by Flush can be interleaved without the consent of the Write method. This error is hard to catch through testing because the data in the cache entry is correct while the data in persistent storage is not, so all Reads that hit in the Cache return the correct value. It would be caught through testing only if the clean cache entry were evicted, no Writes were performed in the meantime, and a subsequent Read then observed the erroneous value and signaled the refinement violation. This scenario is very unlikely, since the Cache is accessed intensively in a highly concurrent manner.
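The following Java sketch reproduces the flavor of this race under stated assumptions; it is not the Boxwood code. A two-byte cache entry is updated element by element without holding the entry lock across the whole update, so a concurrent flush can persist a half-updated array.

```java
// A hedged sketch of the kind of interleaving behind the bug: Write updates a
// dirty entry's bytes in two steps without locking the entry for the whole
// update, so a concurrent Flush can copy a half-updated array to storage.
class CacheRaceSketch {
    static byte[] cacheEntry   = {'X', 'Y'};   // dirty cache entry for some handle
    static byte[] chunkManager = {'T', 'Z'};   // persistent copy for the same handle

    // Write(handle, "AB"): updates the cached bytes element by element.
    static void write(byte[] newData) {
        for (int i = 0; i < newData.length; i++) {
            cacheEntry[i] = newData[i];        // BUG: no lock held across both writes
        }
    }

    // Flush(): copies the dirty entry to persistent storage and marks it clean.
    static void flush() {
        synchronized (CacheRaceSketch.class) {
            chunkManager = cacheEntry.clone(); // may observe {'A','Y'}: corrupted
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread writer  = new Thread(() -> write(new byte[]{'A', 'B'}));
        Thread flusher = new Thread(CacheRaceSketch::flush);
        writer.start(); flusher.start();
        writer.join();  flusher.join();
        // Depending on the interleaving, chunkManager may now hold {'A','Y'},
        // while later Reads that hit in the cache still return {'A','B'}.
        System.out.println((char) chunkManager[0] + "" + (char) chunkManager[1]);
    }
}
```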
9
View-Refinement: More Observability
I/O-refinement may miss errors. Our solution is view-refinement: I/O-refinement plus a "correspondence" between the states of Impl and Spec at commit points. We add commit actions to the set of observed actions λ = {call, commit, and return actions} and label the commit actions with state information. This catches a state discrepancy right when it happens and gives early warnings of possible I/O-refinement violations. An abstraction function computes the value of the view variable at each commit point; it is written by the user as an extra method of the component.
As we saw in the previous example with two InsertPair calls, I/O-refinement is not that good at finding refinement errors. The problem is that I/O-refinement relies on observer methods, and if the observer methods do not get interleaved in the right place along the trace, I/O-refinement may miss errors. In the extreme case, if there are no observer methods, I/O-refinement trivially passes every execution. In another case, when a refinement violation is detected, the source of the bug may lie far in the past, so the trace may have to be analyzed far back. Our solution is view-refinement. View-refinement augments I/O-refinement with a new condition that requires correspondence between the states of the Impl and the Spec at commit points. To accomplish this we add commit actions to the set λ and label them with state information when they are executed. View-refinement catches a state discrepancy right when it happens; in fact, these state discrepancies are early warnings of future I/O-refinement violations.
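As an illustration of such an abstraction function, here is a minimal Java sketch for a multiset implemented as an array of slots. The slot layout and field names are assumptions chosen to match the earlier trace examples, not the actual implementation.

```java
import java.util.HashSet;
import java.util.Set;

// A multiset implementation skeleton with a user-written abstraction function.
class MultisetImplWithView {
    static final class Slot {
        volatile Integer elt;     // null means the slot is empty
        volatile boolean valid;   // set once the element is fully inserted
    }

    private final Slot[] a;

    MultisetImplWithView(int capacity) {
        a = new Slot[capacity];
        for (int i = 0; i < capacity; i++) a[i] = new Slot();
    }

    // Abstraction function: maps the concrete slot array to the view variable,
    // the set of values currently stored. It is run at each commit point and
    // its result is compared against the Spec's view.
    Set<Integer> view() {
        Set<Integer> v = new HashSet<>();
        for (Slot s : a) {
            if (s.valid && s.elt != null) v.add(s.elt);
        }
        return v;
    }
}
```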
10
Checking View-Refinement
[Slide diagram: the same Impl trace as before, now with each commit action labeled by the view variable computed at that point (viewImpl = {3}, then {3, 4}, then {4}), and the Spec trace labeled with viewSpec = {3}, {3, 4}, {4} as the multiset state evolves from M = Ø. The witness ordering again follows the commit points.]
The views are computed by running the abstraction function at each commit point. The checking procedure is similar to checking I/O-refinement, except that at each commit we additionally compare viewImpl with viewSpec.
11
Experience: Experimental Results
Applied to industrial-scale designs: Boxwood (30K LOC), the Scan file system, and Java libraries, with moderate instrumentation effort. I/O-refinement has low logging and verification overhead. View-refinement is more effective at catching errors but may incur more logging overhead.
Here are the experimental results from applying Vyrd to the BLinkTree and Cache modules. The overall results show that Vyrd can handle industrial-scale designs with modest logging and verification costs. I/O-refinement requires only method call, commit, and return actions to be logged, so its logging overhead is much less than what view-refinement requires; note that the view-refinement logging overhead includes the logging needed for I/O-refinement. View-refinement, however, is more effective at catching bugs: the first table shows a big difference in the time that passes, measured in the number of method calls made, before the error is detected. The overhead of logging the actions needed for view-refinement can grow as the granularity of actions gets finer.
12
Observation, VyrdMC. Weakness of runtime verification: many threads and method calls are needed to trigger a concurrency error. View-refinement improves observability, but it is no better if the bug is not triggered. A small test case (e.g., two threads, one method call each) is sufficient, but it must start from a non-trivial initial state and must pick the right interleaving, and the interleaving that triggers the bug is not known a priori.
13
Observation, VyrdMC. Weakness of runtime verification: poor coverage.
Many threads and method calls are needed to trigger a concurrency error. View-refinement gives more observability, but it is no better if the bug is not triggered. A small test case (e.g., two threads, one method call each) is sufficient, but it must start from a non-trivial initial state and must pick the right interleaving (not known a priori). Idea: lead the design to an interesting initial state first, then run a very small multi-threaded test and explore all distinct thread interleavings of that fixed test with an execution-based model checker. This is computationally feasible and gets good checking out of a small test. Controllability and coverage come from the execution-based model checker; observability comes from view-refinement.
14
Outline: Motivating example; The VYRD tool; Experience; Conclusions.
After introducing the two notions of refinement, we now come to our refinement checking tool, Vyrd.
15
The VYRD Tool: Architecture
[Slide diagram: the test harness drives the Impl; executed actions are written to a shared log; the verification thread reads from the log, replays the actions on Implreplay and the Spec, and the refinement checker compares the resulting λ-traceImpl and λ-traceSpec.]
The Impl is instrumented and the user writes an abstraction function. The testing thread generates an Impl trace and logs the actions executed. The verification thread replays Implreplay and the Spec using the log, computes viewImpl and viewSpec, and checks refinement between λ-traceImpl and λ-traceSpec. Checking can be online or offline. The abstraction function, used for checking view-refinement, is an extra method of the data structure that computes the current state of the view variable from the current data structure state.
Vyrd analyzes execution traces of the Impl generated by test programs, using two separate threads. The testing thread runs a test harness that generates test programs; a test program makes concurrent method calls to the Impl, and during its run the corresponding execution trace is recorded in a shared sequential log. The verification thread reads the execution trace from the log. Since the verification thread follows the testing thread from behind, it cannot access the instantaneous state of the Impl; the replay module therefore re-executes actions from the log on a separate instance of the Impl, called Implreplay, and executes atomic methods on the Spec at commit points. During replay it also computes the view variables when it reaches a commit point and annotates the commit actions along the traces with them. The refinement checker module then checks the resulting λ-traces of the Impl and the Spec for I/O- and view-refinement. The two threads can run online or offline: in online checking both threads run simultaneously, while in offline checking the verification thread runs after the whole test program has finished.
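The testing-thread/verification-thread split can be pictured with the following hedged Java sketch, in which the shared log is modeled as a blocking queue and all class and method names are illustrative placeholders rather than the Vyrd sources.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A minimal sketch of the two-thread architecture. The shared log is modeled
// as a blocking queue of action records; the real tool logs variable reads and
// writes, method calls and returns, and user-annotated commit actions.
class VyrdArchitectureSketch {
    enum Kind { CALL, READ, WRITE, COMMIT, RETURN }

    static final class Action {
        final Kind kind;
        final String detail;   // e.g. "Insert(3)" or "A[0].elt=3"
        Action(Kind k, String d) { kind = k; detail = d; }
    }

    static final BlockingQueue<Action> log = new LinkedBlockingQueue<>();

    // Testing thread: runs the test harness against the Impl and logs actions.
    static void testingThread() throws InterruptedException {
        log.put(new Action(Kind.CALL,   "Insert(3)"));
        log.put(new Action(Kind.WRITE,  "A[0].elt=3"));
        log.put(new Action(Kind.COMMIT, "Insert(3)"));
        log.put(new Action(Kind.RETURN, "Insert(3) -> success"));
    }

    // Verification thread: replays the log on Impl-replay; at each commit it
    // would run the atomic Spec method, compute viewImpl/viewSpec, and check
    // I/O- and view-refinement (termination handling omitted in this sketch).
    static void verificationThread() throws InterruptedException {
        while (true) {
            Action a = log.take();
            if (a.kind == Kind.COMMIT) {
                System.out.println("check refinement at commit of " + a.detail);
            } else {
                System.out.println("replay " + a.kind + " " + a.detail);
            }
        }
    }
}
```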
16
The VYRD Tool: Test Harness
[Slide diagram: Thread 1 calls LookUp(3), Thread 2 calls Insert(3), Thread 3 calls Delete(3).]
Each thread issues a small number (perhaps one) of method calls consecutively, and arguments are picked so that the threads contend over the same part of the data structure.
The goal of the test harness is to generate concurrent execution traces of the Impl. To accomplish this it forks a number of threads, each of which issues a number of method calls consecutively. The arguments to these methods are picked so that the threads contend over the same region of the data structure, in order to reveal concurrency errors. The runtime information for each atomically executed action is recorded in the shared log, so that at the end we have a linear sequence of actions in the log.
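A minimal Java sketch of such a harness is shown below; the inner multiset is a stand-in for the data structure under test, and in Vyrd the calls would additionally be logged for replay.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Three threads each issue one method call, with arguments chosen so that they
// contend over the same element (3).
class TestHarnessSketch {
    static final class Multiset {
        private final Set<Integer> s = Collections.synchronizedSet(new HashSet<>());
        void insert(int x) { s.add(x); }
        void delete(int x) { s.remove(x); }
        boolean lookUp(int x) { return s.contains(x); }
    }

    public static void main(String[] args) throws InterruptedException {
        Multiset impl = new Multiset();
        Thread t1 = new Thread(() -> impl.lookUp(3));   // Thread 1: LookUp(3)
        Thread t2 = new Thread(() -> impl.insert(3));   // Thread 2: Insert(3)
        Thread t3 = new Thread(() -> impl.delete(3));   // Thread 3: Delete(3)
        t1.start(); t2.start(); t3.start();
        t1.join();  t2.join();  t3.join();
    }
}
```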
17
The VYRD Tool: Instrumentation
Source code/bytecode instrumentation is used to log actions in the order they happen; commit actions are annotated by the user.
[Slide figure: example log contents, a linear sequence such as Call Insert(3), A[0].elt=3, Unlock A[0], Call LookUp(3), Call Insert(4), read A[0], Return "true", A[1].elt=4, Unlock A[1], Return "success", Call Delete(3), A[0].elt=null, with one action per method execution marked as its commit action.]
To extract the execution trace during testing, we instrument the source code of the Impl. The instrumented code logs the relevant runtime information for actions in the order they are executed. The source code is also annotated with commit actions so that commit points can be identified at runtime. On the right you see example log contents: a linear order of the actions run by the threads, including variable updates and reads as well as method call and return actions. Notice that for each method execution one action is marked as the commit action (marked with blue arrows on the slide).
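To illustrate, here is a hedged sketch of what an instrumented Insert might look like; the Log helper and its methods are hypothetical placeholders, not the actual Vyrd instrumentation API.

```java
// A sketch of instrumented source: each relevant action is logged as it runs,
// and the user-chosen commit point is logged like any other action.
class InstrumentedMultiset {
    // Hypothetical logger: in Vyrd the records go to the shared log read by the
    // verification thread; here they are simply printed.
    static final class Log {
        static void call(String m, int arg)   { System.out.println("Call "   + m + "(" + arg + ")"); }
        static void read(String location)     { System.out.println("read "   + location); }
        static void write(String update)      { System.out.println(update); }
        static void commit(String m, int arg) { System.out.println("Commit " + m + "(" + arg + ")"); }
        static void ret(String m, String res) { System.out.println("Return " + m + ": " + res); }
    }

    static final class Slot {
        Integer elt;
        final Object lock = new Object();
    }

    private final Slot[] a = { new Slot(), new Slot() };

    String insert(int x) {
        Log.call("Insert", x);                           // instrumentation: call action
        for (int i = 0; i < a.length; i++) {
            synchronized (a[i].lock) {
                Log.read("A[" + i + "].elt");            // instrumentation: read action
                if (a[i].elt == null) {
                    a[i].elt = x;
                    Log.write("A[" + i + "].elt=" + x);  // instrumentation: write action
                    Log.commit("Insert", x);             // user-annotated commit point
                    Log.ret("Insert", "success");        // instrumentation: return action
                    return "success";
                }
            }
        }
        Log.ret("Insert", "fail");
        return "fail";
    }
}
```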
18
Guiding Vyrd with a Model Checker
[Slide diagram: the model checker's search tree for two concurrent calls, InsertP(5,7) and InsertP(6,8). Each path through the tree is one interleaving of their fine-grained actions (reads of A[i].elt, writes of A[i].content, setting A[i].valid=true, with some action sequences marked ATOMIC), ending with both calls returning "success". Vyrd checks refinement along every explored path.]
19
ExplicitMC (like Java PathFinder)
An important side benefit: it automates and streamlines the logging and replay of the implementation, making them program-independent. The logged action is the bytecode executed in the VM. The model checker's VM has hooks for calling the appropriate methods of the refinement checker and the logger after each bytecode instruction. The semantics of replay for the VM's actions (bytecodes) is already defined and implemented in the model checker's VM, so the refinement checker simply uses it. This automates the most labor-intensive part of runtime refinement checking: the programmer must only insert commit-point annotations and write the abstraction function.
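The hook mechanism can be sketched as a listener that the VM calls after each executed instruction. The interface below is a hypothetical illustration of the idea, not Java PathFinder's actual listener API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical VM hook: invoked by the model checker's VM after each executed
// bytecode instruction.
interface TraceListener {
    void instructionExecuted(int threadId, String instruction);
}

// Listener that turns every executed bytecode into a logged, replayable action
// and hands control to the refinement checker at commit points.
class RefinementLoggingListener implements TraceListener {
    private final List<String> log = new ArrayList<>();

    @Override
    public void instructionExecuted(int threadId, String instruction) {
        log.add("T" + threadId + ": " + instruction);
        if (instruction.startsWith("COMMIT")) {
            // commit annotations surface here; invoke the refinement checker
            // (omitted in this sketch)
        }
    }
}
```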
20
Handling Backtracking
Special log entries indicate parent-child relationships in the search tree and jumps back to ancestors. On a backtrack, Vyrd resets to the initial state and replays the actions from the root down to the back-tracked ancestor.
[Slide diagram: the same search tree of interleavings of InsertP(5,7) and InsertP(6,8) as before, with the logged actions arranged along the tree edges.]
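A minimal sketch of this replay-on-backtrack scheme, with an assumed log-entry representation that records parent links in the search tree:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assumed representation: every entry names its node and parent in the model
// checker's search tree; an entry with a null action is a backtrack marker
// telling Vyrd which ancestor the search jumped back to.
class BacktrackReplaySketch {
    static final class LogEntry {
        final int nodeId;
        final int parentId;      // -1 for the root
        final String action;     // null for a backtrack marker
        LogEntry(int nodeId, int parentId, String action) {
            this.nodeId = nodeId; this.parentId = parentId; this.action = action;
        }
    }

    private final Map<Integer, LogEntry> nodes = new HashMap<>();

    void process(LogEntry e) {
        if (e.action == null) {
            // Backtrack: reset Impl-replay/Spec, then replay root -> ancestor.
            resetReplayState();
            for (LogEntry ancestor : pathFromRoot(e.parentId)) replay(ancestor);
        } else {
            nodes.put(e.nodeId, e);
            replay(e);
        }
    }

    // Collect the entries on the path from the root down to the given node.
    private List<LogEntry> pathFromRoot(int nodeId) {
        List<LogEntry> path = new ArrayList<>();
        for (LogEntry n = nodes.get(nodeId); n != null; n = nodes.get(n.parentId)) {
            path.add(0, n);
        }
        return path;
    }

    private void resetReplayState() { /* re-create Impl-replay and the Spec */ }
    private void replay(LogEntry e)  { /* re-execute e.action on Impl-replay, and the Spec at commits */ }
}
```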
21
Stateless MC (like VeriSoft)
No state caching; just store the path from the root to the current node. This terminates because the state-transition diagram is a DAG/tree. Benefits: it is efficient, and since Vyrd is re-started anyway, at re-starts Vyrd resets its state and there is no need to rewind the log. Disadvantages: logging and replay must be implemented manually, and the implementation source or bytecode must be instrumented.
22
Partial order reductions
Pure blocks and annotations: if a pure block terminates normally, there is no net modification of program state. Example: lock a node, check whether the data is there, and release the lock if it isn't. Many industrial-scale programs make extensive use of pure blocks, and enumerating all distinct interleavings of even two concurrent methods is infeasible if purity is not exploited. Solution: use a non-deterministic abstraction of the pure block, which makes a pure execution of the block independent of any other action.
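For concreteness, here is an illustrative pure block in Java along the lines of the example above: when the block exits through the "not found" path it has made no net change to program state, so its interleavings with other actions need not all be enumerated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Lock a node, check whether the key is there, and release the lock if it is
// not: a pure block when it terminates through the "not found" path.
class PureBlockSketch {
    private final Map<Integer, String> node = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    String lookUpIfPresent(int key) {
        lock.lock();                       // begin pure block
        try {
            if (!node.containsKey(key)) {
                return null;               // pure exit: no net state modification
            }
            return node.get(key);
        } finally {
            lock.unlock();                 // end pure block
        }
    }
}
```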
23
Experience: Verifying Cache + Chunk Manager
Both modules see highly concurrent access and provide methods for manipulating (handle, byte-array) pairs. The view for Cache + Chunk Manager is the set of (handle, byte-array) pairs: for each handle, on a Cache hit on a dirty entry, get the byte array from the Cache; on a Cache hit on a clean entry, get the byte array from the Cache, and the Cache and Chunk Manager must hold the same byte array; on a Cache miss, get the byte array from the Chunk Manager.
[Slide diagram: BLinkTree calls Read, Write, Flush, and Revoke on the Cache, which calls Allocate and Deallocate on the Chunk Manager.]
We verified the storage modules of Boxwood, which consist of the Cache and the Chunk Manager. Both modules are accessed in a highly concurrent manner. They provide public methods for manipulating (handle, byte-array) pairs, where a handle is an abstract address for the data encoded in the byte array. We chose as the view variable for these modules the set of (handle, byte-array) pairs they manage. The Cache and the Chunk Manager manage the same set of handles, but the byte array stored for a handle may differ between them, since the Cache may hold the latest version of the byte array while the Chunk Manager does not. Thus, for each handle, we first look in the Cache for an entry with that handle. If there is a dirty cache entry for the handle, we take the byte array from the Cache. If there is a clean entry, we again read the byte array from the Cache, and to make sure the state being abstracted is valid we require that the byte arrays in the Cache and the Chunk Manager are the same. If the Cache has no entry for the handle, we fetch the byte array from the Chunk Manager.
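The abstraction function described in the notes can be sketched as follows, over simplified stand-ins for the Cache and Chunk Manager state; the types and representation are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// View of Cache + Chunk Manager as a map from handle to byte array: dirty
// cache entries win, clean entries must agree with the Chunk Manager, and
// misses fall back to the Chunk Manager.
class StorageViewSketch {
    static final class CacheEntry {
        final byte[] data;
        final boolean dirty;
        CacheEntry(byte[] data, boolean dirty) { this.data = data; this.dirty = dirty; }
    }

    static Map<Long, byte[]> view(Map<Long, CacheEntry> cache,
                                  Map<Long, byte[]> chunkManager) {
        Map<Long, byte[]> view = new HashMap<>();
        for (Map.Entry<Long, byte[]> e : chunkManager.entrySet()) {
            long handle = e.getKey();
            CacheEntry c = cache.get(handle);
            if (c == null) {
                view.put(handle, e.getValue());   // cache miss: use Chunk Manager
            } else if (c.dirty) {
                view.put(handle, c.data);         // dirty hit: cache is authoritative
            } else {
                // clean hit: cache and Chunk Manager must hold the same bytes
                if (!Arrays.equals(c.data, e.getValue())) {
                    throw new IllegalStateException("clean entry differs from Chunk Manager");
                }
                view.put(handle, c.data);
            }
        }
        return view;
    }
}
```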
24
Assume-Guarantee Reasoning for Multi-layered Architectures
Each layer uses only the layer immediately below; layers do not call methods of components above them. One-pass verification: run C1 || C2 || ... || Cn all together at once. When replaying Ci, run it in conjunction with a separate instance of Si+1 || Si+2 || ... || Sn; this models the assumption that the environment of Ci respects its specification for this execution. The assumption is verified when Ci+1, Ci+2, ..., Cn are checked for refinement violations on the same execution.
[Slide diagram: components C1, C2, C3, ..., Cn, each paired with its specification (atomized version) S1, S2, S3, ..., Sn, stacked in layers.]
25
Future Work Implementing VyrdMC around Java PathFinder
This includes automatic program abstraction to exploit "pure" annotations. Stateless exploration in Java PathFinder?