Tayfun Elmas, Serdar Tasiran Koç University, Istanbul, Turkey


Presentation transcript:

VYRD: VerifYing Concurrent Programs by Runtime Refinement-Violation Detection
Tayfun Elmas, Serdar Tasiran, Koç University, Istanbul, Turkey
Shaz Qadeer, Microsoft Research, Redmond, WA
21/05/19 PLDI 2005

Hi all. I'm Tayfun Elmas from Koç University. In this talk I'll present a technique for detecting concurrency errors, in which we watch for refinement violations at runtime. This is joint work with my advisor Serdar Tasiran and with Shaz Qadeer from Microsoft Research.

Verifying Concurrent Data Structures: Motivation
Widely-used software systems are built on concurrent data structures: file systems, databases, internet services, standard Java and C# class libraries.
Intricate synchronization mechanisms improve performance but are prone to concurrency errors.
Concurrency errors cause data loss or corruption and are difficult to detect and reproduce through testing.

Many widely-used software applications are built on concurrent data structures. Examples are file systems, databases, internet services, and some standard Java and C# class libraries. These systems frequently use intricate synchronization mechanisms to get better performance in a concurrent environment, which makes them prone to concurrency errors. Concurrency errors can have serious consequences, such as data loss or corruption. Unfortunately, these errors are typically hard to detect and reproduce through pure testing-based techniques.

Our Approach: Refinement as Correctness Criterion
Refinement: for each execution of the implementation (Impl) there exists an "equivalent" atomic execution of Spec.
Linearizability, atomicity (by reduction): for each execution of Impl there exists an "equivalent" atomic execution of Impl.
Refinement is less restrictive: it rules out fewer implementations.
Example: a more permissive Spec allows exceptional method termination in a way not possible in an atomic execution of Impl.

Keywords: linearizability and atomicity are more restrictive. The flexibility in the Spec gives us a more powerful method for proving the correctness of some tricky implementations. In our approach to verifying concurrent data structures we use refinement as the correctness criterion. Correctness conditions like linearizability and atomicity require that, for each execution of Impl in a concurrent environment, there exists an equivalent atomic execution of the same Impl. Refinement instead uses a separate specification: for each execution of Impl, it requires the existence of an equivalent atomic execution of this Spec. The specification we use is more permissive than Impl. For example, the Spec allows methods to terminate exceptionally, to model failure due to resource contention in a concurrent environment. An atomic execution of Impl, by contrast, would not allow some of these method executions to fail.

Our Approach: Runtime Checking of Refinement
Refinement: for each execution of Impl there exists an "equivalent" atomic execution of Spec.
Use refinement as the correctness criterion: more thorough than assertions, more observability than pure testing.
Runtime verification: check refinement using execution traces.
Can handle industrial-scale programs.
Intermediate between testing and exhaustive verification.

We use refinement as the correctness criterion because it is a more thorough condition than method-local assertions and provides more observability than pure testing. We check refinement at runtime using execution traces of the implementation, so that we can handle industrial-scale programs. In terms of how much of the execution space is explored, our approach is intermediate between testing and exhaustive verification.

Outline
Example
Refinement: I/O-refinement, View-refinement
The VYRD tool
Experience
Conclusions

Here is the outline of my talk. First I'll give a motivating data structure example and explain how our technique applies to it. Then I'll talk about two different notions of refinement called ... I'll introduce our runtime verification tool Vyrd and the experience we had applying Vyrd to industrial-scale software.

Multiset Implementation: LookUp
Multiset data structure: M = { 2, 3, 3, 3, 9, 8, 8, 5 }
Represented by A[1..n]
  content: the element
  valid: is it in the set?
[Figure: array A, each slot with a content field and a valid field.]

LookUp (x)
  for i = 1 to n
    acquire(A[i])
    if (A[i].content==x && A[i].valid)
      release(A[i])
      return true
    else
      release(A[i])
  return false

Our motivating data structure is a multiset. Here is an example of a multiset. Notice that several copies of the same integer can be in the multiset, like 3 and 8 in this example. The implementation represents the multiset by an array A whose elements have two fields. The content field stores the integer element, and the Boolean valid field tells us whether the element is to be included in the multiset or not. For example, one representation of the multiset above could be the one at the bottom. On the right you see the implementation of the LookUp method. LookUp queries whether a given integer x is in the multiset. It traverses the array A linearly, locking elements one by one and checking whether the content is x and the valid field is set.
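As a rough illustration of the LookUp pseudocode above, here is a minimal Python sketch. The class names Slot and Multiset, the use of threading.Lock for acquire/release, and the sample contents are assumptions made for this example and do not come from the talk.

```python
import threading
from typing import List, Optional


class Slot:
    """One array element: a content field, a valid bit, and its own lock."""

    def __init__(self) -> None:
        self.content: Optional[int] = None
        self.valid = False
        self.lock = threading.Lock()


class Multiset:
    def __init__(self, n: int) -> None:
        self.A: List[Slot] = [Slot() for _ in range(n)]

    def lookup(self, x: int) -> bool:
        # Lock slots one at a time; x counts only if its valid bit is set.
        for slot in self.A:
            with slot.lock:
                if slot.content == x and slot.valid:
                    return True
        return False


ms = Multiset(4)
ms.A[0].content, ms.A[0].valid = 3, True
ms.A[1].content, ms.A[1].valid = 5, False  # allocated, but not yet in the multiset
```

With this state, lookup(3) finds the element, while lookup(5) does not, because the slot holding 5 has not been validated.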

Multiset Implementation: FindSlot
FindSlot: helper routine for InsertPair, used for space allocation.
Does not set the valid field, so x is not in the multiset yet.

FindSlot (x)
  for i = 1 to n
    acquire(A[i])
    if (A[i].content==null)
      A[i].content = x
      release(A[i])
      return i
    else
      release(A[i])
  return 0

FindSlot is a helper method for an insertion method I will tell you about on the next slide. Given an integer x, it looks for an empty slot to put x in. If it finds one, it allocates the slot for x by setting its content field to x and returns the index; otherwise it returns 0. Notice that it doesn't set the valid field, so x is not in the multiset yet. A LookUp method that checks this slot will therefore not treat x as being in the set.

Multiset Implementation: InsertPair
InsertPair(x,y): refinement violation if only one of x, y is inserted.
Two separate calls to FindSlot allocate space for x and y.
InsertPair allows exceptional termination.
Example: multiset array of size 2 and 2 concurrent InsertPair's. Both find slots for their x's, both fail to find slots for their y's. This is not possible in an atomic execution.

InsertPair (x, y)
  i = FindSlot (x)
  if (i == 0)
    return failure
  j = FindSlot (y)
  if (j == 0)
    A[i].content = null
    return failure
  acquire(A[i])
  acquire(A[j])
  A[i].valid = true
  A[j].valid = true
  release(A[i])
  release(A[j])
  return success

InsertPair has an interesting exceptional termination that is not possible in the sequential case. By using a separate Spec we do not rule out this exceptional execution. The multiset has an InsertPair method that inserts a pair of integers x, y into the contents. The implementation of InsertPair is given on the right. InsertPair makes the multiset example interesting because it is representative of methods in real concurrent systems that first reserve several resources and then complete their operation on all of those resources atomically. It is considered an error if one of x or y is inserted but not the other. To prevent this error, InsertPair makes two calls to FindSlot to first allocate slots for x and y. If both FindSlot calls succeed, it includes x and y in the multiset atomically, in a protected block, by setting their valid bits to true. Then it returns success. InsertPair returns failure if either of the FindSlot calls fails. This can happen because of resource contention with other concurrent InsertPair calls. For example, imagine we have an empty multiset of size n. n concurrent InsertPair's running on this multiset can all find free slots for their x's, but then they may be unable to find slots for their y's if there are no more empty slots. This causes all the InsertPair's to return failure even though at the beginning there was space for some of them to succeed.
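The failure scenario described above can be replayed deterministically. The sketch below is an illustration under stated assumptions: it uses 0-based indices and None instead of the slide's 1-based indices and 0, the helper free_slot and the method contents are hypothetical, and the "concurrent" run is simulated by calling find_slot in an interleaved order from a single thread.

```python
import threading
from typing import List, Optional


class Slot:
    def __init__(self) -> None:
        self.content: Optional[int] = None
        self.valid = False
        self.lock = threading.Lock()


class Multiset:
    def __init__(self, n: int) -> None:
        self.A: List[Slot] = [Slot() for _ in range(n)]

    def find_slot(self, x: int) -> Optional[int]:
        # Allocate the first empty slot for x; the valid bit stays False.
        for i, slot in enumerate(self.A):
            with slot.lock:
                if slot.content is None:
                    slot.content = x
                    return i
        return None

    def free_slot(self, i: int) -> None:
        # Hypothetical helper: roll back an allocation.
        with self.A[i].lock:
            self.A[i].content = None

    def insert_pair(self, x: int, y: int) -> str:
        i = self.find_slot(x)
        if i is None:
            return "failure"
        j = self.find_slot(y)
        if j is None:
            self.free_slot(i)
            return "failure"
        # Both slots are reserved by this call, so i != j.
        with self.A[i].lock, self.A[j].lock:
            self.A[i].valid = True
            self.A[j].valid = True
        return "success"

    def contents(self) -> List[int]:
        return sorted(s.content for s in self.A if s.valid and s.content is not None)


# Sequential execution on an array of size 2: the pair fits.
seq = Multiset(2)
seq_result = seq.insert_pair(5, 7)

# Interleaved steps of two InsertPair calls on an array of size 2.
ms = Multiset(2)
i1 = ms.find_slot(5)  # T1 reserves slot 0 for its x
i2 = ms.find_slot(6)  # T2 reserves slot 1 for its x
j1 = ms.find_slot(7)  # T1 finds no slot for its y
j2 = ms.find_slot(8)  # T2 finds no slot for its y
ms.free_slot(i1)      # both calls roll back and would return failure
ms.free_slot(i2)
```

In the interleaved run both calls fail although the array had room for one pair, which is the exceptional termination that an atomic execution of Impl cannot produce.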

Multiset Specification
Spec state M: set of integers.
Each method is an atomic, deterministic state update or observation.
Given the current state, the arguments, and the method return value (if one exists), it specifies the new Spec state.

INSERTPAIR (x, y, retval)
  if (retval == success)
    M = M U {x, y}
  return retval

LOOKUP (x)
  if (x ∈ M)
    return true
  else
    return false

DELETE (x)
  M = M \ {x}

NOTE: Coordinate the bullet "Given..." with the InsertPair method. Here we give the specification for the multiset. The state of the Spec is represented by a set M of integers. Each method of the specification specifies an atomic, deterministic update or observation of the specification state. A mutator method, given the current state and the arguments, specifies what the next state will be. Notice that some methods also take a return value that affects the behavior of the method. For example, InsertPair takes two integers and a return value. If the return value is success, it specifies a new state with x and y included. For any other return value it leaves the current state unchanged. We let the return value affect the state transition in order to model the InsertPair calls in Impl that fail due to concurrency. There are also the Delete method, which removes an integer from the multiset, and the LookUp method, which queries the set for a given integer.
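A minimal executable version of this specification might look as follows. This is a sketch only: the slide describes M as a set, whereas the code uses collections.Counter so that duplicate elements are kept (an assumption made here), and the class name MultisetSpec and the view method are hypothetical.

```python
from collections import Counter
from typing import List


class MultisetSpec:
    """Atomic, deterministic Spec methods; no locking is needed because
    the checker calls them one at a time."""

    def __init__(self) -> None:
        self.M: Counter = Counter()

    def insert_pair(self, x: int, y: int, retval: str) -> str:
        # The return value observed in Impl decides the state transition.
        if retval == "success":
            self.M.update([x, y])
        return retval

    def lookup(self, x: int) -> bool:
        return self.M[x] > 0

    def delete(self, x: int) -> None:
        if self.M[x] > 0:
            self.M[x] -= 1

    def view(self) -> List[int]:
        # elements() skips entries whose count has dropped to zero.
        return sorted(self.M.elements())


spec = MultisetSpec()
spec.insert_pair(3, 3, "success")
spec.insert_pair(8, 9, "failure")  # models an InsertPair that failed in Impl
```

Passing retval as an argument is what lets the same Spec accept both the successful and the failed InsertPair executions.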

Outline: Example; Refinement (I/O-refinement, View-refinement); The VYRD tool; Experience; Conclusions

Having introduced the multiset, I'll now explain the first notion of refinement, called I/O-refinement.

Multiset: I/O Refinement
[Figure: an Impl trace, the witness ordering derived from it, and the Spec trace it drives.
 Impl actions shown: Call Insert(3), Call LookUp(3), A[0].elt=3, Unlock A[0], Return "success", Call Insert(4), read A[0], A[1].elt=4, Unlock A[1], Return "true", Call Delete(3), A[0].elt=null.
 Witness ordering: Commit Insert(3), Commit LookUp(3), Commit Insert(4), Commit Delete(3).
 Spec trace: Call Insert(3), M = M U {3}, Return "success"; Call LookUp(3), Check 3 ∈ M, Return "true"; Call Insert(4), M = M U {4}; Call Delete(3), M = M \ {3}.
 Spec state: M = Ø, then {3}, then {3, 4}, then {4}.]

In this slide we'll explain how we check I/O-refinement. Again we use the Insert operation instead of InsertPair to simplify the picture. On the right you see

I/O-refinement: Selecting Commit Actions
Commit points determine the witness ordering and drive the Spec. They are hints to refinement checking tools.
For each method, designate lines in the source code. Multiple lines may be annotated as commit.
For each method execution, only one line should get executed as the commit action.
There is no formal procedure. Intuitively, the commit point is where the new data structure state becomes visible to other threads.

Example: InsertPair

InsertPair (x, y)
  i = FindSlot (x)
  if (i == 0)
    return failure
  j = FindSlot (y)
  if (j == 0)
    A[i].content = null
    return failure
  acquire(A[i])
  acquire(A[j])
  A[i].valid = true
  A[j].valid = true
  release(A[i]) // commit
  release(A[j])
  return success

Put the I/O-refinement slide with commit points beforehand. Commit points are really hints that the user gives to refinement checking tools; they help determine the witness ordering in which the Spec trace is constructed. For each public method of Impl, we designate lines in the source code whose execution corresponds to a commit action. There may be multiple lines annotated as commit. However, for each execution of a method there must be a single action executed as the commit action, and its execution brings the method execution to its commit point. There is no formal procedure for deciding on the commit points. Intuitively, the point where the modified state of the data structure becomes visible to other threads should be the primary candidate for a commit point. For example, the commit point for InsertPair is where the lock of A[i] is released after both x and y have been inserted into the set. Even though InsertPair changes some shared state earlier by calling FindSlot, the elements in the allocated slots are not observed as being in the set by other threads, so the commit point is where methods run by other threads can first see x in the set.
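To make the role of commit actions concrete, the sketch below derives a witness ordering from a log and drives a Spec with it. It is an illustration, not the Vyrd implementation: the log format (tuples tagged "call", "commit", "return"), the assumption that each thread runs one method at a time, and all function names are invented for this example. It uses the single-element Insert from the I/O-refinement slide rather than InsertPair.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def witness_ordering(log: List[Tuple]) -> List[Dict[str, Any]]:
    """Order method executions by the position of their commit action."""
    open_exec: Dict[int, Dict[str, Any]] = {}
    witness: List[Dict[str, Any]] = []
    for event in log:
        kind, tid = event[0], event[1]
        if kind == "call":
            open_exec[tid] = {"method": event[2], "args": event[3], "ret": None}
        elif kind == "commit":
            witness.append(open_exec[tid])
        elif kind == "return":
            open_exec[tid]["ret"] = event[2]
    return witness


def check_io_refinement(log: List[Tuple]) -> List[int]:
    """Run Spec methods in witness order; report positions whose Spec
    return value differs from the one observed in Impl."""
    M: Counter = Counter()
    violations: List[int] = []
    for k, ex in enumerate(witness_ordering(log)):
        method, args, ret = ex["method"], ex["args"], ex["ret"]
        if method == "Insert":
            if ret == "success":
                M[args[0]] += 1
            spec_ret = ret  # Spec accepts whatever Insert returned in Impl
        elif method == "LookUp":
            spec_ret = M[args[0]] > 0
        elif method == "Delete":
            if M[args[0]] > 0:
                M[args[0]] -= 1
            spec_ret = ret
        else:
            raise ValueError(f"unknown method {method}")
        if spec_ret != ret:
            violations.append(k)
    return violations


good_log = [
    ("call", 1, "Insert", (3,)), ("call", 2, "LookUp", (3,)),
    ("commit", 1), ("return", 1, "success"),
    ("commit", 2), ("return", 2, True),
    ("call", 1, "Insert", (4,)), ("commit", 1), ("return", 1, "success"),
    ("call", 2, "Delete", (3,)), ("commit", 2), ("return", 2, None),
]

# Same calls, but LookUp(3) commits before Insert(3) and still returns true.
bad_log = [
    ("call", 1, "Insert", (3,)), ("call", 2, "LookUp", (3,)),
    ("commit", 2), ("return", 2, True),
    ("commit", 1), ("return", 1, "success"),
]
```

In good_log the witness ordering is Insert(3), LookUp(3), Insert(4), Delete(3) and the Spec agrees with every return value. In bad_log the Spec's LookUp(3) returns false at the first position, so that execution is flagged.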

Outline: Example; Refinement (I/O-refinement, View-refinement); The VYRD tool; Experience; Conclusions

Now I'll introduce another notion of refinement, called view-refinement.

View-refinement: Need for More Observability

FINDSLOT (x) // Buggy
  for i = 1 to n
    if (A[i].content == null)
      acquire(A[i])
      A[i].content = x
      release(A[i])
      return i
  return 0

[Figure: interleaving of T1: InsertPair(5,7) and T2: InsertPair(6,8) on array A, shown as elt and valid rows.
 Both threads read A[0].elt = null; T1 then reads A[1].elt = null.
 elt = 5, 7 with valid = T, T: LookUp(5)=true, LookUp(7)=true.
 T2 writes 6 into A[0] (overwrites 5!), reads A[2].elt = null, and writes 8 there.
 elt = 6, 7, 8: LookUp(6)=true, LookUp(8)=true. A LookUp(5) at this point would return false.]

The bug would be caught if a LookUp(5) were interleaved here. I/O-refinement is still not sufficient to catch some errors that do not show up in method calls and return values. Consider the buggy implementation of FindSlot on the left. It does not lock the elements before reading their content fields; it acquires the lock only just before the modification. As a result, as you see on the right, two concurrently executed InsertPair's can be interleaved so that both read the first slot in the array as empty and think that it is available for an insertion, but only one of them succeeds in inserting its x into this slot. In this case, the integer 5 inserted by the first thread is overwritten by the second thread. After its InsertPair finishes, each thread checks it by calling LookUp with the same arguments the InsertPair had. If the LookUp's are scheduled in this way, they all return true, although the final state is inconsistent with the termination status of the methods. The error is there and is observable from the state, but I/O-refinement is unable to detect it in this scenario because the LookUp's are not scheduled in the right places.
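The interleaving in the figure can be replayed step by step. The sketch below simulates it in a single thread by splitting the buggy FindSlot into its unlocked read and its later write; the helper names (read_free, write_slot, validate) and the array size of 4 are assumptions for this illustration.

```python
from typing import Dict, List, Optional

Array = List[Dict[str, object]]


def make_array(n: int) -> Array:
    return [{"content": None, "valid": False} for _ in range(n)]


def read_free(A: Array) -> Optional[int]:
    """Unlocked scan of the buggy FindSlot: first slot seen as empty."""
    for i, slot in enumerate(A):
        if slot["content"] is None:
            return i
    return None


def write_slot(A: Array, i: int, x: int) -> None:
    """The acquire / write / release part of the buggy FindSlot."""
    A[i]["content"] = x


def validate(A: Array, i: int, j: int) -> None:
    A[i]["valid"] = True
    A[j]["valid"] = True


def lookup(A: Array, x: int) -> bool:
    return any(s["content"] == x and s["valid"] for s in A)


A = make_array(4)
i1 = read_free(A)            # T1 reads A[0] as empty
i2 = read_free(A)            # T2 also reads A[0] as empty
write_slot(A, i1, 5)         # T1 puts 5 in A[0]
j1 = read_free(A)            # T1 finds A[1] for 7
write_slot(A, j1, 7)
validate(A, i1, j1)          # T1: InsertPair(5,7) returns success
t1_lookups = (lookup(A, 5), lookup(A, 7))
write_slot(A, i2, 6)         # T2 overwrites 5 in A[0]
j2 = read_free(A)            # T2 finds A[2] for 8
write_slot(A, j2, 8)
validate(A, i2, j2)          # T2: InsertPair(6,8) returns success
t2_lookups = (lookup(A, 6), lookup(A, 8))
final_view = sorted(s["content"] for s in A if s["valid"])
```

All four observer calls return true and both InsertPair calls succeed, yet the final state holds only 6, 7 and 8, so the lost 5 never shows up in any call or return value of this trace.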

View-refinement: I/O-refinement May Miss Errors
I/O-refinement may miss errors if observer methods don't get interleaved in the right place.
The source of the bug may be too far in the past when the I/O-refinement violation happens.
[Figure: the same interleaving of T1: InsertPair(5,7) and T2: InsertPair(6,8). T2 overwrites 5 in A[0]; the final state is elt = 6, 7, 8, with LookUp(5)=true and LookUp(7)=true observed before the overwrite and LookUp(6)=true and LookUp(8)=true after it.]

Do not talk about the first bullet.

View-refinement: More Observability
I/O-refinement may miss errors.
Our solution: view-refinement, which is I/O-refinement plus "correspondence" between the states of Impl and Spec at commit points.
Catches a state discrepancy right when it happens.
Gives early warnings for possible I/O-refinement violations.

As we saw in the previous example with two InsertPair's, I/O-refinement alone can miss refinement errors. The problem is that I/O-refinement relies on observer methods, and if the observer methods are not interleaved in the right places along the trace, I/O-refinement may miss errors. In the extreme case, if there are no observer methods, I/O-refinement trivially passes any execution. Also, when a refinement violation is detected, the source of the bug may be too far in the past, so the trace may have to be analyzed a long way back. Our solution is view-refinement. View-refinement augments I/O-refinement with a new condition that requires correspondence between the states of Impl and Spec at commit points. To accomplish this, we add commit actions to the set lambda and label them with state information when they are executed. View-refinement catches state discrepancies right when they happen. In fact, these state discrepancies are early warnings of future I/O-refinement violations.

View-refinement: View Variables
State correspondence: hypothetical "view" variables must match at commit points.
"view" variable: extracts the abstract data structure state; updated atomically once by each method.
viewImpl: state information for Impl. For A[1..n], extract content if valid=true.
viewSpec: state information for Spec, the elements of the multiset. viewSpec = M (nothing to abstract). Other Specs may have state to be abstracted.
[Figure: array A with content and valid fields; viewImpl = {3, 3, 5, 5, 8, 8, 9}.]

The state correspondence is obtained by matching view variables from Impl and Spec at commit points. A view variable is a hypothetical variable that extracts an abstract state of the data structure. This abstract state is updated or observed atomically by each method. The view variables that carry the state information of Impl and Spec are denoted viewImpl and viewSpec, respectively. The view for the multiset data structure is the set of integers stored in the multiset. The view variable for the multiset Impl extracts the elements in content fields whose corresponding valid fields are true. Thus the view variable for the multiset in the figure does not contain the first 5 or the 6. The view variable for the Spec gets its elements from the set M.

View-refinement: View Variables for Multiset
viewImpl: computed using the abstraction function.
The view is a canonical representation: it canonizes the state, so an exact match is not required.
Abstraction function (for checking view-refinement): an extra method of the data structure. For the current data structure state, it computes the current value of the view variable.

AbstractionFunction (A)
  view = Ø
  for i = 1 to n
    if (A[i].content != null && A[i].valid == true)
      view = view U {A[i].content}
  return view

[Figure: two arrays A with content and valid fields, holding the elements in different orders; viewImpl = {1, 3, 5, 6}.]

There may be state variables to be abstracted away in the Spec; later we will see an example in which the Spec is also a program. The abstraction function is given by the user. The view is a canonical representation of the abstract state. Because view computation canonizes the state, two multiset instances with different internal representations may have identical view variables, and an exact match between the data structure states is not required. For example, the views for the representations in the figure are the same, although the elements are in a different order and there are extra allocated slots. Since the Spec already has an abstract representation, its state has nothing to abstract, so the view variable for a Spec instance is the canonized version of the Spec state. This does not mean that a Spec cannot have details to abstract: our method accepts Specs at different levels of detail, so, as we will see later, the Spec can be a program that requires an abstraction function. For the multiset example, the view variable carries information about which integers are stored in the multiset, and its canonical representation discards the order of elements. For the multiset Spec you do not need any abstraction; the view is the entire Spec state. For the multiset Impl, the view variable must be extracted from the data structure state, using the abstraction function given on the right.
The abstraction function traverses the array A and abstracts away the valid fields of the elements. It includes in the view only the content fields whose valid field is true. For example, if the abstraction function traversed the multiset whose state is given at the bottom right, it would not include the element with content 5 or the element with content 6, because their corresponding valid fields are not set.
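The abstraction function translates almost directly into code. In the sketch below the array is modeled as a list of (content, valid) pairs and the view as a collections.Counter, so that element order is discarded but duplicates are kept; both choices, and the concrete valid bits of the two sample arrays, are assumptions made for this example because the original figure did not survive.

```python
from collections import Counter
from typing import List, Optional, Tuple

Slot = Tuple[Optional[int], bool]  # (content, valid)


def abstraction_function(A: List[Slot]) -> Counter:
    """Compute the view: contents of the slots whose valid bit is set."""
    view: Counter = Counter()
    for content, valid in A:
        if content is not None and valid:
            view[content] += 1
    return view


# Two internal representations of the same abstract multiset {1, 3, 5, 6}.
a1 = [(1, True), (3, True), (7, False), (6, True), (5, True)]
a2 = [(6, True), (1, True), (None, False), (5, True), (3, True)]
```

The two arrays differ in element order and in their unvalidated slots, yet abstraction_function returns equal views for both, which is the sense in which an exact state match is not required.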

View-refinement: Checking Refinement
[Figure: the Impl trace, witness ordering, and Spec trace from the I/O-refinement slide, with commit actions annotated by view variables.
 Witness ordering: Commit Insert(3), Commit LookUp(3), Commit Insert(4), Commit Delete(3).
 After A[0].elt=3: viewImpl = {3}, viewSpec = {3}.
 After A[1].elt=4: viewImpl = {3,4}, viewSpec = {3,4}.
 After A[0].elt=null: viewImpl = {4}, viewSpec = {4}.
 Spec state: M = Ø, then {3}, then {3, 4}, then {4}.]

Say that the views are computed by running the abstraction function at this point. The checking procedure is similar to checking I/O-refinement.

View-refinement: Catching FindSlot Bug
[Figure: the buggy FINDSLOT and the interleaving of T1: InsertPair(5,7) and T2: InsertPair(6,8), repeated. Both threads read A[0].elt = null; T2 overwrites 5 in A[0]; the final state is elt = 6, 7, 8, with LookUp(5)=true and LookUp(7)=true observed before the overwrite and LookUp(6)=true and LookUp(8)=true after it.]

View-refinement: Catching FindSlot Bug
[Figure: Spec and Impl traces for the buggy execution.
 Specification: M = Ø; Call InsertPair(5,7), Return "success"; Call InsertPair(6,8), Return "success"; M becomes {5, 7}, then {5, 6, 7, 8}.
 Witness ordering: Commit InsertPair(5,7), Commit InsertPair(6,8).
 viewSpec: Ø, then {5, 7}, then {5, 6, 7, 8}. viewImpl: Ø, then {5, 7}, then {6, 7, 8}.
 Implementation actions include: Call InsertP(5,7), Call InsertP(6,8), reads of A[0].elt, A[0].content=5, A[1].content=7, A[0].valid=true, A[1].valid=true, InsertP(5,7) returns "success", A[0].content=6, Read A[2].elt, A[2].content=8, A[2].valid=true, InsertP(6,8) returns "success".]

Suppose we are checking the execution trace of the buggy FindSlot implementation. First we drive the Spec according to the witness ordering of the commit points. Then we track the values of the view variables for Impl and Spec at commit points and compare them for equivalence. Here, at the commit point of the second InsertPair, 5 disappears from the view variable of Impl, but it is still there in the view variable of Spec. A refinement error is then signalled, saying that an element was overwritten between the last two commit actions.
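The comparison of view variables at commit points can be sketched as follows. This is a simplified illustration: each entry of the commit list is assumed to carry the method, its arguments, the Impl return value, and a snapshot of viewImpl taken at that commit, and only InsertPair is modeled; these shapes and the function name are inventions for the example.

```python
from collections import Counter
from typing import List, Optional, Tuple

Commit = Tuple[str, Tuple[int, ...], str, List[int]]


def check_view_refinement(commits: List[Commit]) -> Optional[Tuple[int, List[int], List[int]]]:
    """Drive the Spec in witness order and compare views at every commit.
    Returns (position, viewSpec, viewImpl) for the first mismatch, else None."""
    M: Counter = Counter()
    for k, (method, args, ret, view_impl) in enumerate(commits):
        if method == "InsertPair" and ret == "success":
            M.update(args)
        view_spec = +M  # unary plus keeps only positive counts
        if Counter(view_impl) != view_spec:
            return k, sorted(view_spec.elements()), sorted(view_impl)
    return None


# Commit actions of the buggy execution, annotated with viewImpl.
buggy_trace = [
    ("InsertPair", (5, 7), "success", [5, 7]),
    ("InsertPair", (6, 8), "success", [6, 7, 8]),
]

# What a correct FindSlot would have produced for the same calls.
correct_trace = [
    ("InsertPair", (5, 7), "success", [5, 7]),
    ("InsertPair", (6, 8), "success", [5, 6, 7, 8]),
]
```

On buggy_trace the check stops at the second commit, where viewSpec still contains 5 and viewImpl does not, without needing any LookUp call in the trace.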

Outline: Example; Refinement (I/O-refinement, View-refinement); The VYRD tool; Experience; Conclusions

Having introduced the two notions of refinement, I now come to our refinement checking tool, Vyrd.

The VYRD Tool: Architecture
Instrument Impl in order to log actions in the order they happen.
Commit actions are annotated by the user.
The user writes the abstraction function.
The test harness drives Impl, which writes to the log. The log enables online or offline checking.
The replay mechanism reads from the log, executes the logged actions on Impl-replay, and runs Spec methods in the witness ordering.
The refinement checker compares traceImpl and traceSpec.
Abstraction function (for checking view-refinement): an extra method of the data structure. For the current data structure state, it computes the current value of the view variable.
[Figure: test harness and Impl writing to the log; replay mechanism with Impl-replay and Spec; refinement checker. Logged actions shown include Call Insert(3), Call LookUp(3), A[0].elt=3, Unlock A[0], Call Insert(4), read A[0], A[1].elt=4, Unlock A[1], Return "true", Return "success", Call Delete(3), A[0].elt=null.]

Vyrd analyzes execution traces of Impl generated by test programs. Vyrd uses two separate threads for the process. The testing thread runs a test harness that generates test programs. A test program makes concurrent method calls to Impl. During the run of the test program, the corresponding execution trace is recorded in a shared sequential log. The verification thread reads the execution trace from the log. Since the verification thread follows the testing thread from behind, it cannot access the instantaneous state of Impl. The replay module therefore re-executes actions from the log on a separate instance of Impl, called Impl-replay, and executes atomic methods on the Spec at commit points. During replay, the replay mechanism also computes the view variables when it reaches a commit point and annotates the commit actions along the traces with the view variables. The refinement checker module checks the resulting lambda traces of Impl and Spec for I/O-refinement and view-refinement. These threads can run in an online or an offline setting. In online checking both threads run simultaneously, while in offline checking the verification thread runs after the whole test program has finished its work.

The VYRD Tool: Atomized Impl as Spec
Spec: atomized version of Impl

INSERTPAIR (x, y, retval)
    acquire(global_lock)
    if (retval == failure) return failure
    i = FindSlot (x)
    ..........
    j = FindSlot (y)
    acquire(A[i])
    acquire(A[j])
    A[i].valid = true
    A[j].valid = true
    release(A[i])
    release(A[j])
    release(global_lock)
    return success

Fully synchronized methods that use a single global lock. This separates checking for concurrency errors from sequential verification.
Slight modification: the return value from the Impl method is an additional argument to the Spec methods.
More permissive than Impl: can handle failure return values.
An exact state match at commit points is not required; only the view variables must match. This is different from “commit atomicity”.
The global lock serializes the methods, which separates checking for concurrency errors from sequential verification. The approach is easy to use, since there is no need to write a separate spec. Our approach can employ specifications in different forms and at different levels of detail, but it is straightforward to obtain an executable specification from the atomized version of the impl. To do so, we use a single global lock to fully synchronize the method bodies. The slide shows the modified version of the InsertPair method for the spec. A slight modification is needed so that the methods can model nondeterministic failures of the impl due to concurrency: the spec methods accept the return value from the impl trace as an input parameter. When a method is given the return value “failure”, it does nothing, even though without this check it could complete its job and return “success”. Although they use the same method implementation, the impl and the spec can have different states at a commit point. Because we compare canonicalized forms of the view, an exact match between the impl and spec states is not required.
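As a rough illustration of an atomized spec method, the sketch below wraps the whole method body in one global lock and takes the impl's return value as an extra argument. The class name, the `_find_slot` helper, the slot layout, and the sorted-list view are assumptions for illustration. The part elided on the slide is not reconstructed, and the per-slot locks are omitted because the global lock already serializes the methods.

```python
import threading

SUCCESS, FAILURE = "success", "failure"


class AtomizedMultiset:
    """Hypothetical spec obtained by serializing every method with one lock."""

    def __init__(self, size):
        self._global_lock = threading.Lock()
        self.elt = [None] * size
        self.valid = [False] * size

    def _find_slot(self, x):
        """Reserve the first free slot for x; return its index, or None if full."""
        for i, current in enumerate(self.elt):
            if current is None:
                self.elt[i] = x
                return i
        return None

    def insert_pair(self, x, y, retval):
        # `with` releases the global lock on every path, including early returns.
        with self._global_lock:
            if retval == FAILURE:
                # The impl reported a failure, so the spec does nothing.
                return FAILURE
            i = self._find_slot(x)
            j = self._find_slot(y)
            if i is None or j is None:
                # Assumption: the spec cannot reproduce a success the impl reported.
                raise RuntimeError("spec cannot match the impl's return value")
            self.valid[i] = True
            self.valid[j] = True
            return SUCCESS

    def view(self):
        """Canonical view: the sorted valid elements, independent of slot order."""
        return sorted(e for e, ok in zip(self.elt, self.valid) if ok)


spec = AtomizedMultiset(4)
first = spec.insert_pair(5, 7, FAILURE)   # spec state is left unchanged
second = spec.insert_pair(5, 7, SUCCESS)  # both elements become valid
third = spec.insert_pair(9, 1, SUCCESS)
```

Because `view` sorts the valid elements, two instances that placed the same elements in different slots still compare equal, which is why only the view variables, and not the exact states, need to match.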

Outline: Example; Refinement (I/O-refinement, View-refinement); The VYRD tool; Experience; Conclusions. In this part of the talk I’ll tell you about our experience using the Vyrd tool.

Experience: The Boxwood Project
[Figure: Boxwood architecture. Labels: BLINKTREE MODULE (root pointer node; internal pointer nodes at levels n+1, n, ...; leaf pointer nodes at level 0; data nodes), GlobalDiskAllocator, CACHE MODULE (dirty cache entries, clean cache entries), CHUNK MANAGER MODULE, Replicated Disk Manager, with Read and Write arrows between the modules.]
We verified all modules of this system and caught an interesting, tricky error that had gone undetected. We ran Vyrd on the Boxwood project from Microsoft. The goal of the Boxwood project is to build a distributed abstract storage infrastructure for applications with high data storage and retrieval requirements. The figure gives a high-level picture of Boxwood. Boxwood has a concurrent BLinkTree implementation in the BLinkTree module. The BLinkTree module uses a cache module to store and retrieve the data at its tree nodes quickly. The cache module makes its data persistent using a chunk manager module. The chunk manager implements a distributed storage system and abstracts it from the upper layers.

Experience: Experimental Results
Scalable method: caught bugs in industrial-scale designs: Boxwood (30K LOC), Scan Filesystem (Windows NT), and Java libraries with known bugs.
Moderate instrumentation effort: several lines for each method.
I/O-refinement: low logging and verification overhead. BLinkTree: logging 17% over testing, refinement check 27%.
View-refinement: BLinkTree: logging 20% over testing, refinement check 137%. More effective in catching errors.
Cache: view-refinement detected the error after 26 random methods; I/O-refinement detected it after 539 random methods.
These are the experimental results from applying Vyrd to the BLinkTree and cache modules. Overall, they show that Vyrd can handle industrial-scale designs with modest logging and verification costs. I/O-refinement requires only method call, commit, and return actions to be logged, so its logging overhead is much lower than that of view-refinement. Note that the logging overhead for view-refinement includes the logging for I/O-refinement. View-refinement, however, is more effective in catching bugs: the Cache results show a large difference in the number of method calls made before the error is detected. Logging for view-refinement can become expensive as the granularity of the logged actions gets finer.

Experience: Concurrency Bug in Cache
Very similar to a bug found in the Scan file system. It had not been caught by the developers. The current version does not contain the bug.
Bug manifestation: the cache entry is correct, but permanent storage has corrupted data.
Cause of bug: concurrent execution of Write and Flush on the same entry. The Write to a dirty entry is not locked properly, so Flush writes corrupted data to the Chunk Manager and marks the entry clean.
Hard to catch through testing: as long as Reads hit in the Cache, the return value is correct. The bug is caught through testing only if the Cache fills and the clean entry is evicted, there are no Writes to the entry in the meantime, and the entry is read after eviction. This is very unlikely.
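The interleaving behind this bug can be replayed deterministically in a toy model. The sketch below is not Boxwood code: the single handle `"h"`, the four-character buffers, and the assumption that Write copies its data in two halves are invented to make the race visible without real threads.

```python
def simulate_write_flush_race():
    """Replay one Write/Flush interleaving on a single cache entry."""
    storage = {"h": "aaaa"}  # stand-in for the Chunk Manager
    cache = {"h": {"data": list("bbbb"), "dirty": True}}
    entry = cache["h"]
    new_data = "cccc"

    # Write, first half: the dirty entry is updated without proper locking.
    entry["data"][0:2] = new_data[0:2]

    # Flush interleaves: it persists the half-written buffer and marks it clean.
    storage["h"] = "".join(entry["data"])
    entry["dirty"] = False

    # Write, second half: the cache entry ends up with the correct data.
    entry["data"][2:4] = new_data[2:4]

    # A Read that hits in the cache sees the correct value.
    read_hit = "".join(entry["data"])
    # Hypothetical consistency check: a clean entry should equal storage.
    clean_matches_storage = entry["dirty"] or storage["h"] == read_hit

    # The clean entry is evicted without write-back; the next Read goes to storage.
    del cache["h"]
    read_after_eviction = storage["h"]
    return read_hit, clean_matches_storage, read_after_eviction


result = simulate_write_flush_race()
```

In this model the Read that hits in the cache returns the newly written data, so checking return values alone sees nothing wrong until the entry is evicted and read again. A check that also looks at storage for clean entries fails as soon as the Write completes, which is in the spirit of the much earlier detection reported for view-refinement.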

Conclusions
Runtime refinement checking: a powerful technique with reasonable computational cost, effective for complex industrial-scale software.
Key novelty: improves observability of testing.
Future work: improving coverage and controllability, and reducing manual instrumentation by limited use of model checking.
Tayfun Elmas, Serdar Tasiran. VyrdMC: Driving Runtime Refinement Checking with Model Checkers. (To appear in) Fifth Workshop on Runtime Verification (RV'05), The University of Edinburgh, Scotland, UK, July 12, 2005.
In this talk we introduced a runtime checking technique for refinement. It is a powerful verification technique with reasonable computational cost. Although it may not be exhaustive, it can effectively check complex industrial-scale software. As future work, we plan to improve the coverage of testing by using model checkers to explore the interleavings more systematically. Our current work in this direction will appear at the Runtime Verification workshop, RV 2005.

Questions
VYRD: VerifYing Concurrent Programs by Runtime Refinement-Violation Detection. Tayfun Elmas, Serdar Tasiran, College of Engineering, Koç University, Istanbul, Turkey; Shaz Qadeer, Microsoft Research, Redmond, U.S. PLDI 2005