Runtime Checking of Refinement for Concurrent Software Components

Thesis Defense by Tayfun Elmas
Advisor: Assist. Prof. Serdar Taşıran
Koç University, Graduate School of Sciences and Engineering

Hi all. I'm Tayfun Elmas from Koç University. In this talk I'll present a technique for detecting concurrency errors: we watch for refinement violations at runtime. This is joint work with my advisor Serdar Taşıran and Shaz Qadeer from Microsoft Research.

Verifying concurrent software is hard !
Widely-used software systems are highly concurrent
  File systems, databases, internet services
  Java 1.5 and .NET class libraries
Intricate synchronization mechanisms to improve performance
  Prone to concurrency errors
  Data loss or corruption
Concurrency errors
  Example: Wrong synchronization of accesses to a set of variables
  Low- and high-level data races
Testing: Difficult to detect and reproduce errors
Model checking: State space explosion due to thread interleavings
Static analysis: Imprecise for complex software (false negatives)

Many widely-used software applications are built on concurrent data structures. Examples are file systems, databases, internet services and some standard Java and C# class libraries. These systems frequently use intricate synchronization mechanisms to get better performance in a concurrent environment, which makes them prone to concurrency errors. Concurrency errors can have serious consequences, such as data loss or corruption. Unfortunately, these errors are typically hard to detect and reproduce through pure testing-based techniques.

2/15/2019 Tayfun Elmas - Thesis Defense

Concurrent software components
Implements an abstract data structure (ADS)
  Public methods to modify/observe the ADS
  Example: A vector-based class that implements a binary tree
    Methods to look up and manipulate the tree
Building blocks of widely-used software systems
  A correctly implemented component provides error-free services to the rest of the system
  Important for modular reasoning about the entire system

[Figure: a component storing a binary tree in the vector [5, 3, 8, 1, Ø, Ø] (root 5), shown next to the clients' view of the tree. Threads 1, 2 and 3 call AddNode(4), RemoveNode(1) and GetChildren(3).]

Abstract atomicity

Operations on the ADS run as if they update the ADS atomically
Observed by clients (threads)
  Through method calls and return values

[Figure: an interleaved execution of AddNode(4), GetChildren(3) : {1, 4} and RemoveNode(1) on the tree with nodes 5, 3, 8, 1, 4, next to the clients' view of the execution, in which the three methods appear to run one after another.]

Abstract atomicity

Guarantees absence of common concurrency errors
  Conformance to sequential specification
  More extensive checking than assertions
Criteria in literature (linearizability, reducibility)
  Sometimes too restrictive for industrial systems
  Declare Boxwood and the Scan file system as incorrect

Refinement

Criterion: For each execution of Impl there exists an "equivalent" atomic execution of Spec
Spec specifies sequential behaviors of the interface of Impl
Equivalence is on a per-thread basis
  Observed through method calls and return values

[Figure: an interleaved execution of Impl mapped to an atomic execution of Spec: AddNode(4), GetChildren(3) : {1, 4}, RemoveNode(1).]

Keywords: Linearizability and atomicity are more restrictive. The flexibility in the spec gives us a more powerful method to prove the correctness of some tricky implementations.

In our approach to verifying concurrent data structures we use refinement as the correctness criterion. The benefits of this choice are that refinement is a more thorough condition than method-local assertions and that it provides more observability than pure testing. Correctness conditions like linearizability and atomicity require that for each execution of Impl in a concurrent environment there exists an equivalent atomic execution of the same Impl. Refinement instead uses a separate specification: for each execution of Impl it requires an equivalent atomic execution of this Spec. The specification we use is more permissive than an atomic Impl. For example, the Spec allows methods to terminate exceptionally, which models failure due to resource contention in a concurrent environment. An atomic execution of Impl would not allow some of these method executions to fail.

We check refinement at runtime using execution traces of the implementation. We do this in order to be able to handle industrial-scale programs. With respect to how much of the execution space is covered, our approach is intermediate between testing and exhaustive verification.

Sequential specification
Criteria in literature: Spec in terms of atomic execution of Impl
Separate executable specification (Spec)
  Makes refinement less restrictive
  Rules out fewer implementations
  Example: Exceptional method termination due to resource contention between threads, in a way not possible in an atomic execution of Impl
Decompose the verification goal for the component
  Component satisfies abstract atomicity
    Criterion: Implementation refines sequential specification
    Method: Runtime refinement checking
  Sequential specification is correct
    May be as complex as the implementation
    Criterion: Pre- and post-conditions etc.
    Method: Static analysis, model checking etc.

Our contributions

Theory of refinement for concurrent components
Implementing a tool for runtime checking of refinement
Applying the tool on industrial systems

[Figure: The VYRD tool. A test program drives the Implementation, which writes its execution trace to a log (Call Insert(3), Unlock A[0], A[0].elt=3, Call LookUp(3), Return "success", Unlock A[1], A[1].elt=4, read A[0], Return "true", A[0].elt=null, Call Insert(4), Call Delete(3), ...). The Replay Mechanism reads the execution trace from the log and runs Implementation* and the Specification; the Refinement Checker compares the Impl. trace with the Spec. trace.]

Outline

Example: Multiset
Refinement criteria
  I/O refinement
  View refinement
Checking refinement at runtime
  The VYRD tool
Experience
  The Boxwood system
Evaluation
Conclusions

Here is the outline of my talk. First I'll give a motivating data structure example and explain how our technique applies to this example. Then I'll talk about two different notions of refinement called ... I'll introduce our runtime verification tool Vyrd and the experience we had applying Vyrd to industrial-scale software.

Multiset implementation
Multiset data structure
  M = { 2, 3, 3, 5, 8, 8, 9 }
Represented by A[1..n]
  elt: the integer element
  valid: Is the element in the set?

LookUp (x)
  for i = 1 to n
    acquire (A[i])
    if (A[i].elt == x && A[i].valid)
      release (A[i])
      return true
    else
      release (A[i])
  return false

[Figure: array A with elt and valid fields, one possible representation of M.]

Our motivating data structure is a multiset. Here is an example of a multiset. Notice that several copies of the same integer can be in the multiset, like 3 and 8 in this example. The implementation represents the multiset by an array A whose elements have two fields. The elt (content) field stores the integer element and the Boolean valid field tells us whether the element is to be included in the multiset or not. For example, one representation of the multiset above could be the one at the bottom. On the right you see the implementation of the LookUp method. LookUp queries whether a given integer x is in the multiset. It traverses the array A linearly, locking elements one by one and checking whether the content is x and the valid field is set.
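The array representation and LookUp described above can be sketched in Python as follows. This is a minimal illustration, not the code from the talk: the names Slot, Multiset and look_up are assumptions, indices are 0-based, and a `with` block stands in for the acquire/release pairs.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    # elt holds the integer element; valid says whether it counts as a member.
    elt: Optional[int] = None
    valid: bool = False
    lock: threading.Lock = field(default_factory=threading.Lock)


class Multiset:
    def __init__(self, n: int) -> None:
        self.A = [Slot() for _ in range(n)]

    def look_up(self, x: int) -> bool:
        # Lock one slot at a time; the lock is released on every path.
        for slot in self.A:
            with slot.lock:
                if slot.elt == x and slot.valid:
                    return True
        return False
```

A slot whose elt is set but whose valid flag is still false is skipped by look_up, which is what lets a later insertion reserve space before publishing the element.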

Multiset: FindSlot & Insert
FindSlot: Helper routine for InsertPair
  For space allocation
  Does not set the valid field
    x not in multiset yet
Insert
  Adds x into the multiset

FindSlot (x)
  for i = 1 to n
    acquire (A[i])
    if (A[i].elt == null)
      A[i].elt = x
      release (A[i])
      return i
    else
      release (A[i])
  return 0

Insert (x)
  i = FindSlot (x)
  if (i == 0)
    return failure
  A[i].valid = true
  return success

FindSlot is a helper method for an insertion method I will tell you about in the next slide. Given an integer x, it looks for an empty slot to put x in. If it finds one, it allocates the slot for x by setting its content field to x and returns the index; otherwise it returns 0. Notice that it doesn't set the valid field, so x is not in the multiset yet. Thus x will not be treated as in the set by a LookUp method that checks this slot.

Multiset: Delete

Delete: Removes an element from the set

Delete (x)
  for i = 1 to n
    acquire (A[i])
    if (A[i].elt == x && A[i].valid)
      A[i].elt = null
      A[i].valid = false
      release (A[i])
      return success
    release (A[i])
  return failure

The last method, Delete, removes the first occurrence of a given integer x from the multiset. It resets the content field of the element to null and the valid field to false.

Multiset: InsertPair

InsertPair(x, y)
  Refinement violation if only one of x, y is inserted
  Two separate calls to FindSlot
    To allocate space for x and y

InsertPair (x, y)
  i = FindSlot (x)
  if (i == 0)
    return failure
  j = FindSlot (y)
  if (j == 0)
    A[i].elt = null
    return failure
  acquire (A[i])
  acquire (A[j])
  A[i].valid = true
  A[j].valid = true
  release (A[i])
  release (A[j])
  return success

InsertPair has an interesting exceptional termination that is not possible in the sequential case. Using a separate spec, we do not rule out this exceptional execution.

Multiset has an InsertPair method to insert a pair of integers x, y into the contents. The implementation of InsertPair is given on the right. InsertPair makes the multiset example interesting because it is representative of methods in real concurrent systems that first reserve several resources and then complete their operation on all the resources atomically. It is considered an error if one of x or y is inserted but not the other. To prevent this error, InsertPair makes two calls to FindSlot to first allocate slots for x and y. If both FindSlot calls succeed, then in a protected block it includes x and y into the multiset atomically by setting their corresponding valid bits to true. Then it returns success.

InsertPair returns failure if either of the FindSlot calls fails. This can happen because of resource contention with other concurrent InsertPair executions. For example, imagine we have an empty multiset of size n. n concurrent InsertPairs running on this multiset can all find free slots for their x's, but then they may be unable to find slots for their y's because there are no more empty slots. This causes all the InsertPairs to return failure even though at the beginning there is space for some of them to succeed.

Multiset: InsertPair

InsertPair allows exceptional termination
Example: Multiset array of size 2
  2 concurrent InsertPairs
    both find slots for x and z
    both fail to find slots for y and t
  Not possible in an atomic execution
    The first InsertPair must succeed
Linearizability
  Each execution must be equivalent to an atomic, linear one that satisfies the sequential specification
  Execution not buggy, but violates linearizability

  InsertPair(x,y)          InsertPair(z,t)          Array (elt)
                                                    [ , ]
  FindSlot(x) : 1                                   [x, ]
                           FindSlot(z) : 2          [x, z]
  FindSlot(y) : 0
                           FindSlot(t) : 0
  return failure           return failure
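The exceptional termination above can be replayed deterministically. The sketch below is an illustration under assumptions, not the talk's code: the class and method names are hypothetical, indices are 0-based, None plays the role of the slide's return value 0, and the rollback takes the slot lock, which the slide's pseudocode does not show. The interleaving is simulated from a single thread by issuing the FindSlot steps in the order shown in the diagram.

```python
import threading


class Multiset:
    def __init__(self, n):
        self.elt = [None] * n
        self.valid = [False] * n
        self.locks = [threading.Lock() for _ in range(n)]

    def find_slot(self, x):
        # Reserve the first empty slot for x without making x visible.
        for i in range(len(self.elt)):
            with self.locks[i]:
                if self.elt[i] is None:
                    self.elt[i] = x
                    return i
        return None

    def release_slot(self, i):
        # Roll back a reservation (assumption: done under the slot lock).
        with self.locks[i]:
            self.elt[i] = None

    def insert_pair(self, x, y):
        i = self.find_slot(x)
        if i is None:
            return "failure"
        j = self.find_slot(y)
        if j is None:
            self.release_slot(i)
            return "failure"
        # Publish both elements inside one protected block.
        with self.locks[i], self.locks[j]:
            self.valid[i] = True
            self.valid[j] = True
        return "success"


# Replay of the diagram's schedule on an array of size 2.
m = Multiset(2)
i1 = m.find_slot(10)   # T1 reserves a slot for x
i2 = m.find_slot(20)   # T2 reserves a slot for z
j1 = m.find_slot(11)   # T1 finds no slot for y
j2 = m.find_slot(21)   # T2 finds no slot for t
m.release_slot(i1)     # both roll back and would return failure
m.release_slot(i2)
```

Run sequentially on a fresh array of size 2, a single insert_pair call succeeds, which is why no atomic execution of Impl can explain two failures.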

Multiset specification
Spec state
  M: set of integers
Each method
  Atomic deterministic update/observation of state
  Given current state, arguments and method return value (if one exists), specifies the new Spec state
  retval: Return value from Impl

INSERTPAIR (x, y, retval)
  if (retval == success)
    M = M ∪ {x, y}
  return retval

DELETE (x, retval)
  if (x ∈ M)
    M = M \ {x}
    return success
  return failure

LOOKUP (x)
  return (x ∈ M) ? true : false

Here we give the specification for the multiset. The state of the spec is represented by a set M of integers. Each method of the specification specifies an atomic deterministic update or observation of the specification state. A mutator method, given the current state and the arguments, specifies what the next state will be. Notice that some methods also take a return value that affects the behavior of the method. For example, InsertPair takes two integers and a return value. If the return value is success, it specifies a new state with x and y included. If the return value is not success, it leaves the current state unchanged. We let the return value affect the state transition in order to model the InsertPairs in the impl that fail due to concurrency. There are also the Delete method, which removes an integer from the multiset, and the LookUp method, which queries the set for a given integer.
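An executable version of this specification might look like the following sketch. It is an illustration under assumptions: the class name MultisetSpec and the string constants are hypothetical, and collections.Counter is used for M so that duplicate elements are counted, whereas the slide writes M in set notation.

```python
from collections import Counter

SUCCESS, FAILURE = "success", "failure"


class MultisetSpec:
    def __init__(self):
        self.M = Counter()

    def insert_pair(self, x, y, retval):
        # The Impl return value decides whether the state changes.
        if retval == SUCCESS:
            self.M[x] += 1
            self.M[y] += 1
        return retval

    def delete(self, x, retval):
        # As on the slide, retval is accepted but the outcome is
        # determined by the Spec state alone.
        if self.M[x] > 0:
            self.M[x] -= 1
            if self.M[x] == 0:
                del self.M[x]
            return SUCCESS
        return FAILURE

    def look_up(self, x):
        return self.M[x] > 0
```

Passing the failure value to insert_pair leaves M untouched, which is how the Spec admits the contention-induced failures of the implementation.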


Runtime information

Execution trace: Sequence of actions
Action: Atomically executed code fragment
  Operations on program variables
    Reads/writes to shared variables
  Data structure-specific operations
    e.g., incrementing a semaphore
  Method call and return actions
Different choices of actions to be observed
  Different notions of refinement
  I/O refinement: call and return actions

[Figure: an execution trace. Method calls and returns: Call Insert(3), Call LookUp(3), Return "success", Call Insert(4), Return "true", Call Delete(3). Low-level actions: Unlock A[1], A[1].elt=3, Unlock A[2], A[2].elt=4, read A[1], A[1].elt=null.]

Vyrd has two prototype implementations, in Java and C#. It allows the programmer to work at different levels of granularity for actions. She can treat fine-grained operations like a single variable assignment, or coarse-grained application-specific operations, as actions. We use the atomized version of the original data structure implementation as the executable spec, obtained by fully synchronizing the Impl method bodies and adding nondeterministic failure for some methods. Vyrd allows incremental view computation to avoid reading the whole state of the data structure at each commit point. To update the view variables incrementally, for each action executed on the impl and each method performed on the spec, it figures out which parts of the view variable need to be updated. In addition, the view itself can be huge, so the view comparison may have to be done incrementally by comparing only the parts just affected.

Testing?

Common practice:
  Run a long multi-threaded test
  Perform sanity checks on the final state
Try possible sequential executions
  Don't know which happened first
    Insert(3) or Delete(3)?
    Should 3 be in the multiset at the end?
  Must accept both possibilities as correct
  n methods: n! possible interleavings

[Figure: the execution trace with overlapping method executions. Method calls and returns: Call Insert(3), Call LookUp(3), Return "success", Call Insert(4), Return "true", Call Delete(3). Low-level actions: Unlock A[0], A[1].elt=3, Unlock A[2], A[2].elt=4, read A[1], A[1].elt=null, Unlock A[1].]
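The cost of the "try possible sequential executions" approach can be made concrete with a small sketch. The function names below are hypothetical and the multiset is modeled with collections.Counter; the point is only that a final-state check has to accept the outcome of every one of the n! orderings.

```python
from collections import Counter
from itertools import permutations


def run_sequentially(ops):
    # Apply Insert/Delete operations one after another on an empty multiset.
    M = Counter()
    for name, x in ops:
        if name == "Insert":
            M[x] += 1
        elif name == "Delete" and M[x] > 0:
            M[x] -= 1
    return +M  # unary plus drops elements whose count fell to zero


def acceptable_final_states(ops):
    # One candidate final state per sequential ordering: n! orderings in total.
    return {frozenset(run_sequentially(p).items()) for p in permutations(ops)}


states = acceptable_final_states([("Insert", 3), ("Delete", 3)])
```

For the overlapping Insert(3) and Delete(3), the set contains both the empty multiset and the multiset holding a single 3, so a test that only inspects the final state must accept either.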

I/O refinement

[Figure: the Impl trace (Call Insert(3), Call LookUp(3), Return "success", Call Insert(4), Return "true", Call Delete(3), with low-level actions Unlock A[1], A[1].elt=3, Unlock A[2], A[2].elt=4, read A[1], A[1].elt=null) next to the Spec trace built in the witness ordering.]

  Witness ordering      Spec action       Spec state M afterwards
  (initially)                             Ø
  Commit Insert(3)      M = M ∪ {3}       {3}
  Commit LookUp(3)      Check 3 ∈ M       {3}
  Commit Insert(4)      M = M ∪ {4}       {3, 4}
  Commit Delete(3)      M = M \ {3}       {4}

  Spec trace: Call Insert(3), Return "success", Call LookUp(3), Return "true", Call Insert(4), Call Delete(3)

In this slide we'll explain how we check I/O refinement. Again we use the Insert operation instead of InsertPair to simplify the picture. On the right you see
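The check illustrated here can be sketched as an offline pass over a logged trace. This is a simplified illustration, not Vyrd's implementation: the event tuples ("call", tid, method, arg), ("commit", tid) and ("return", tid, value) are an assumed log format, and the spec covers only Insert, Delete and LookUp.

```python
from collections import Counter


def spec_step(M, method, arg, impl_ret):
    """Atomic Spec transition; returns the value the Spec produces."""
    if method == "Insert":
        if impl_ret == "success":
            M[arg] += 1
        return impl_ret
    if method == "Delete":
        if M[arg] > 0:
            M[arg] -= 1
            return "success"
        return "failure"
    if method == "LookUp":
        return M[arg] > 0
    raise ValueError(f"unknown method {method}")


def check_io_refinement(trace):
    pending = {}   # thread id -> method execution in progress
    witness = []   # method executions in commit order
    for event in trace:
        kind, tid = event[0], event[1]
        if kind == "call":
            pending[tid] = {"method": event[2], "arg": event[3], "ret": None}
        elif kind == "commit":
            witness.append(pending[tid])
        elif kind == "return":
            pending[tid]["ret"] = event[2]
    # Drive the Spec in the witness ordering and compare return values.
    M = Counter()
    for ex in witness:
        if spec_step(M, ex["method"], ex["arg"], ex["ret"]) != ex["ret"]:
            return False
    return True


good_trace = [
    ("call", 1, "Insert", 3), ("call", 2, "LookUp", 3),
    ("commit", 1), ("return", 1, "success"),
    ("call", 3, "Insert", 4),
    ("commit", 2), ("return", 2, True),
    ("call", 1, "Delete", 3),
    ("commit", 3), ("return", 3, "success"),
    ("commit", 1), ("return", 1, "success"),
]
# Same schedule, but LookUp(3) wrongly reports that 3 is absent.
bad_trace = [("return", 2, False) if e == ("return", 2, True) else e
             for e in good_trace]
```

The good trace follows the witness ordering from the slide (Insert(3), LookUp(3), Insert(4), Delete(3)). In the bad trace LookUp(3) commits after Insert(3) yet returns false, so no atomic Spec execution in that order matches it.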

Selecting commit points
Hints to refinement checking tools
  Determine the witness ordering
For each method
  Designate lines in source code
  Multiple lines may be annotated as commit
For each method execution
  Only one line gets executed as the commit action
No formal procedure
  Intuitively, where the new data structure state becomes visible to other threads
  Example: InsertPair
I/O refinement violation may indicate a bad choice of commit points

InsertPair (x, y)
  i = FindSlot (x)
  if (i == 0)
    return failure
  j = FindSlot (y)
  if (j == 0)
    A[i].elt = null
    return failure
  acquire (A[i])
  acquire (A[j])
  A[i].valid = true
  A[j].valid = true
  release (A[i]) // commit
  release (A[j])
  return success

Commit points are really hints given by the user to refinement checking tools; they help determine the witness ordering in which the spec trace is constructed. For each public method of the Impl, we designate lines in the source code so that their execution corresponds to commit actions. There may be multiple lines annotated as commit. However, for each execution of a method there must be a single action executed as the commit action, and its execution brings the method execution to its commit point.

There is no formal procedure for deciding on the commit points. Intuitively, the point where the modified state of the data structure becomes visible to other threads should be the primary candidate for a commit point. For example, the commit point for InsertPair is where the lock of A[i] is released after inserting both x and y into the set. Even though InsertPair changes some shared state earlier by calling FindSlot, the elements in the allocated slots are not observed as being in the set by other threads, so the commit point is where methods run by other threads can first see x in the set.

Need for more observability
FINDSLOT (x) // Buggy
  for i = 1 to n
    if (A[i].elt == null)
      acquire (A[i])
      A[i].elt = x
      release (A[i])
      return i
  return 0

[Figure: interleaving of T1: InsertPair(5,7) and T2: InsertPair(6,8) on a three-slot array. The threads read A[1].elt = null, A[2].elt = null and A[3].elt = null; T2's write of 6 into slot 1 overwrites 5 ("Overwrites 5!"). Final array: elt = 6, 7, 8. T1 then observes LookUp(5)=true and LookUp(7)=true; T2 observes LookUp(6)=true and LookUp(8)=true.]

The bug would be caught if a LookUp(5) were interleaved after the overwrite.

I/O refinement is still not sufficient to catch some errors that do not appear in method calls and return values. Consider the buggy implementation of FindSlot on the left. It does not lock the elements before reading their content fields; it acquires the lock just before starting the modification. As a result, as you see on the right, two concurrently executed InsertPairs can get interleaved so that both read the first slot in the array as empty and think that it is available for an insertion, but only one of them succeeds in keeping its x in this slot. In this case, the integer 5 inserted by the first thread is overwritten by the second thread. After its InsertPair finishes, each thread checks it by calling LookUp with the same arguments the InsertPair had. If the LookUps are scheduled in this way, they all return true although the final state is inconsistent with the termination status of the methods. The error is there and is observable from the state, but I/O refinement is unable to detect it in this scenario because the LookUps are not scheduled in the right places.

If observer methods don't get interleaved in the right place
  Source of bug too far in the past when the I/O refinement violation happens
Extreme case: No observer methods at all

[Figure: the same interleaving of T1: InsertPair(5,7) and T2: InsertPair(6,8), with a LookUp(5) after the overwrite returning false.]

View refinement

I/O refinement may miss errors
Our solution: View refinement
  I/O refinement + "correspondence" between states of Impl and Spec at commit points
Catches state discrepancy right when it happens
  Early warnings for possible I/O refinement violations

As we saw in the previous example with two InsertPairs, I/O refinement can miss refinement errors. The problem is that I/O refinement relies on observer methods: if the observer methods do not get interleaved in the right place along the trace, I/O refinement may miss errors. In the extreme case, if there are no observer methods, I/O refinement trivially passes any execution. Also, when a refinement violation is detected, the source of the bug may be too far in the past, so the trace may have to be analyzed a long way back.

Our solution is view refinement. View refinement augments I/O refinement with a new condition that requires correspondence between the states of the impl and the spec at commit points. To accomplish this we add commit actions to the set lambda and label them with state information when they are executed. View refinement catches state discrepancies right when they happen. In fact these state discrepancies are early warnings for future I/O refinement violations.

View variables

State correspondence
  Hypothetical "view" variables must match at commit points
"view" variable:
  Extracts abstract data structure state
  Updated atomically once by each method
viewImpl: state information for Impl
  For A[1..n]
    Extract elt if valid == true
viewSpec: state information for Spec
  Elements of the multiset
  viewSpec = M (nothing to abstract)
  Another Spec may have state to be abstracted

[Figure: array A with elt and valid fields; viewImpl = {3, 3, 5, 5, 8, 8, 9}.]

The state correspondence is obtained by matching view variables from the impl and the spec at commit points. A view variable is a hypothetical variable that extracts an abstract state of the data structure. This abstract state is updated or observed atomically by each method. The view variables that carry state information of the impl and the spec are denoted by viewImpl and viewSpec respectively. The view for the multiset data structure is the set of integers stored in the multiset. The view variable for the multiset impl extracts the elements in content fields whose corresponding valid fields are true. Thus the view variable for the multiset in the figure does not contain the first 5 and the 6. The view variable for the spec gets its elements from the set M.

Abstraction for view

viewImpl: Computed using the abstraction function
  Defined only at clean states: easy to write
    Clean state: No methods in progress
  We extend it to dirty states
View is a canonical representation
  Canonizes state for view: Exact match not required
Exact state match at commit points not required
  Match view variables only
  Different from "commit atomicity"
Abstraction function (for checking view refinement)
  An extra method of the data structure
  For the current data structure state, computes the current state of the view variable

AbstractionFunction (A)
  view = Ø
  for i = 1 to n
    if (A[i].valid == true)
      view = view ∪ {A[i].elt}
  return view

[Figure: array A with elt and valid fields holding 1, 3, 7, 6, 5; viewImpl = {1, 3, 5, 6}.]

The abstraction function is given by the user. The view is a canonical representation of the abstract state. View computation canonizes the state, so even though the internal representations of two multiset instances differ, their view variables may be identical, and an exact match between the data structure states is not required. For example, the views for the representations in the figure are the same although the order of the elements differs and there are extra allocated slots. The canonical representation of the view for the multiset carries the information about which integers are stored in the multiset and discards the order of the elements.

For the multiset spec you do not need to do any abstraction: the view is the entire spec state in canonized form. This does not mean that a spec cannot have details to abstract. Our method accepts specs at different levels of detail, so, as we will see later, the spec can itself be a program that requires an abstraction function.

For the multiset impl the view variable must be extracted from the data structure state. The abstraction function for the multiset impl is given on the right. It traverses the array A and abstracts away the valid fields of the elements: it includes in the view only the content fields whose valid field is true. For example, if the abstraction function traversed the multiset whose state is given at the bottom right, it would not include the element with content 5 and the element with content 6, because their corresponding valid fields are not set.
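The abstraction function for the array representation can be sketched in a few lines. In this illustration the slots are modeled as (elt, valid) tuples and the view as a collections.Counter, both of which are assumptions made for the example; the two arrays are made-up states chosen so that they differ in layout but share a view.

```python
from collections import Counter


def abstraction_function(A):
    # Keep only the elements whose valid flag is set; order is discarded.
    return Counter(elt for elt, valid in A if valid)


# Two different internal states ...
a1 = [(1, True), (3, True), (7, False), (6, True), (5, True)]
a2 = [(6, True), (None, False), (5, True), (3, True), (1, True), (7, False)]

# ... with the same canonical view {1, 3, 5, 6}.
view1 = abstraction_function(a1)
view2 = abstraction_function(a2)
```

Because the comparison is made on the views rather than on the arrays, slot order and reserved-but-invalid slots never cause a mismatch.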

View refinement

[Figure: the Impl trace (Call Insert(3), Unlock A[1], A[1].elt=3, Call LookUp(3), Return "success", Unlock A[2], A[2].elt=4, Call Insert(4), read A[1], Return "true", Call Delete(3), A[1].elt=null) next to the Spec trace built in the witness ordering, with views annotated at commit points.]

  Witness ordering      Spec action       M afterwards    viewImpl     viewSpec
  (initially)                             Ø
  Commit Insert(3)      M = M ∪ {3}       {3}             {3}          {3}
  Commit LookUp(3)      Check 3 ∈ M       {3}
  Commit Insert(4)      M = M ∪ {4}       {3, 4}          {3, 4}       {3, 4}
  Commit Delete(3)      M = M \ {3}       {4}             {4}          {4}

The views are computed by running the abstraction function at each commit point. The checking procedure is similar to checking I/O refinement.

[Figure: the buggy FindSlot interleaving of T1: InsertPair(5,7) and T2: InsertPair(6,8) shown again, with final array elt = 6, 7, 8 and an additional LookUp(5) that returns false.]

Catching FindSlot bug

[Figure: Implementation and Specification traces for the buggy execution. Implementation: Call InsertP(5,7), Call InsertP(6,8), Read A[1].elt, A[1].content=5, A[2].content=7, A[1].valid=true, A[2].valid=true, commit action, InsertP(5,7) returns "success", A[1].content=6, Read A[2].elt, A[3].content=8, A[3].valid=true, commit action, InsertP(6,8) returns "success". Specification: Call InsertPair(5,7), Return "success", Call InsertPair(6,8), with a commit action for each call.]

  viewSpec:  Ø, {5, 7}, {5, 6, 7, 8}
  viewImpl:  Ø, ..., {6, 7, 8}

Suppose we are checking the execution trace produced with the buggy FindSlot implementation. First we drive the spec according to the witness ordering of the commit points. Then we track the valuations of the view variables for the impl and the spec at commit points and compare them for equivalence. Here, at the commit point of the second InsertPair, 5 disappears from the view variable of the impl but it is still in the view variable of the spec. A refinement error is then signalled, saying that an element was overwritten between the last two commit actions.
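The comparison at commit points can be sketched as a replay over logged actions. This is a simplified illustration, not Vyrd's checker: the action tuples ("write", index, field, value) and ("commit", method, args, retval) are an assumed log format, indices are 0-based, and the trace below is a condensed version of the buggy schedule in which T2's stale read lets it overwrite slot 0.

```python
from collections import Counter


def view_impl(A):
    # Abstraction function: elements whose valid flag is set.
    return Counter(s["elt"] for s in A if s["valid"])


def check_view_refinement(n, actions):
    """Replay actions; return the index of the first commit whose views differ."""
    A = [{"elt": None, "valid": False} for _ in range(n)]
    view_spec = Counter()
    for step, act in enumerate(actions):
        if act[0] == "write":
            _, i, fld, val = act
            A[i][fld] = val
        elif act[0] == "commit":
            _, _, (x, y), ret = act
            if ret == "success":        # Spec transition for InsertPair
                view_spec.update([x, y])
            if view_impl(A) != view_spec:
                return step
    return None


buggy = [
    ("write", 0, "elt", 5), ("write", 1, "elt", 7),
    ("write", 0, "valid", True), ("write", 1, "valid", True),
    ("commit", "InsertPair", (5, 7), "success"),
    ("write", 0, "elt", 6),             # overwrites 5
    ("write", 2, "elt", 8),
    ("write", 0, "valid", True), ("write", 2, "valid", True),
    ("commit", "InsertPair", (6, 8), "success"),
]
correct = [
    ("write", 0, "elt", 5), ("write", 1, "elt", 7),
    ("write", 0, "valid", True), ("write", 1, "valid", True),
    ("commit", "InsertPair", (5, 7), "success"),
    ("write", 2, "elt", 6), ("write", 3, "elt", 8),
    ("write", 2, "valid", True), ("write", 3, "valid", True),
    ("commit", "InsertPair", (6, 8), "success"),
]
```

On the buggy trace the mismatch is reported at the second commit, where the implementation view is {6, 7, 8} and the specification view is {5, 6, 7, 8}, without needing any LookUp call.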

The VYRD tool

[Figure: the test harness drives Impl, which writes traceImpl to a log (Call Insert(3), Unlock A[0], A[0].elt=3, Call LookUp(3), Return "success", Unlock A[1], A[1].elt=4, read A[0], Return "true", A[0].elt=null, Call Insert(4), Call Delete(3), ...). VYRD reads traceImpl from the log; the Replay Mechanism runs Implreplay and Spec, and the Refinement Checker compares traceImpl with traceSpec.]

Vyrd analyzes execution traces of the impl generated by test programs. Vyrd uses two separate threads for the process. The testing thread runs a test harness that generates test programs. A test program makes concurrent method calls to the impl. During the run of the test program, the corresponding execution trace is recorded in a shared sequential log. The verification thread reads the execution trace from the log. Since the verification thread follows the testing thread from behind, it cannot access the instantaneous state of the impl. Thus the replaying module re-executes actions from the log on a separate instance of the impl called impl-replay, and executes atomic methods on the spec at commit points. During replaying, the replaying mechanism also computes the view variables when it reaches a commit point and annotates the commit actions along the traces with the view variables. The refinement checker module checks the resulting lambda traces of the impl and the spec for I/O and view refinement. These threads can run in an online or an offline setting. In online checking both threads run simultaneously, while in offline checking the verification thread runs after the whole test program finishes its work.

Logging

Communication between testing and verification
Reduces impact on the concurrency of the program
Worker threads
  Collecting runtime information (write execution trace into log)
Checker thread
  Analyzing the execution (read logged actions)

[Figure: the test harness drives Impl, which writes traceImpl to the log: Call Insert(3), Unlock A[0], A[0].elt=3, Call LookUp(3), Return "success", Unlock A[1], A[1].elt=4, read A[0], Return "true", A[0].elt=null, Call Insert(4), Call Delete(3), ...]

Replaying mechanism

Only actions are logged, not the states
  View refinement checker needs to access snapshots of states at commit points
Re-executes logged actions on Implreplay
  States at commit points are equal for Impl and Implreplay
Executes atomic methods of Spec in the witness ordering
  Methods may be given return values from the log as input

[Figure: the Replay Mechanism reads traceImpl from the log and runs Implreplay and Spec; the Refinement Checker compares traceImpl with traceSpec.]

Atomized Impl as Spec

Different executable specifications for multiset
  Example: Atomic binary tree-based representation
  viewSpec: Integers stored at the tree nodes
Spec: atomized version of Impl
  Slight modification
  Fully synchronized methods
  Return value from Impl method is an additional argument to Spec methods
Makes Spec more permissive than Impl
  Can handle failure return values
Check correctness of sequential executions of Impl separately

[Figure: binary tree with nodes 5, 3, 8, 4, 6, 9, 1; viewSpec = {1, 3, 4, 5, 6, 8, 9}.]

A global lock serializes the methods; this separates sequential verification from checking for concurrency errors. The approach is easy to use: there is no need to write a separate spec.

Our approach can employ specifications in different forms and at different levels of detail. However, it is straightforward to obtain an executable specification from the atomized version of the impl. To accomplish this, we use a single global lock to fully synchronize the method bodies. You can see the modified version of the InsertPair method for the spec. A slight modification is needed so that the methods model nondeterministic failures of the impl due to concurrency. The spec methods accept, as an input parameter, return values from the impl trace. When a method is given the return value "failure", it does nothing, even though without this check it could complete its job with "success" as the return value. Although they use the same method implementations, the impl and the spec can have different states at a commit point. But we use the canonized forms of the views, so an exact match between the impl and the spec states is not required.

Atomized Impl as Spec
class MultisetSpec {
    Multiset multiset;

    // Start the spec from a copy of the current multiset.
    MultisetSpec(Multiset currentset) { multiset = currentset.clone(); }

    // returnValue is the value the Impl method returned in the logged trace.
    synchronized InsertPair(x, y, returnValue) {
        if (returnValue == failure) { return failure; }
        return multiset.InsertPair(x, y);
    }

    synchronized Insert(x, returnValue) {
        if (returnValue == failure) { return failure; }
        return multiset.Insert(x);
    }

    synchronized Delete(x, returnValue) {
        return multiset.Delete(x);
    }

    synchronized LookUp(x) {
        return multiset.LookUp(x);
    }
}

Abstraction function
The abstraction function is an extra method of the data structure. For the current state, it computes the current state of the view variable and returns another data structure pointed to by the view variable. It also checks invariants, that is, properties on the consistency of the internal data structure. Checking them at commit points simplifies the view computation. For Multiset: (A[i].valid) ⇒ (A[i].elt != null).
Set<int> AbstractionFunction() {
    Set<int> M = new Set<int>();
    for (int i = 0; i < n; ++i) {
        if (A[i].valid) {
            Assert(A[i].elt != null);   // invariant check
            M.Insert(A[i].elt);
        }
    }
    return M;
}
The abstraction function is given by the user; it is one more responsibility for the programmer. It is written as a regular method of the data structure. It operates on the current state of the data structure and returns the corresponding state of the view variable. We require that the abstraction function be defined only at clean states, at which no method is in the middle of manipulating the data structure. A programmer who knows her data structure well can easily figure out how to extract view variables from a clean state. For example, the abstraction function for the multiset shown here is defined for clean states. It assumes that no InsertPair, Delete or LookUp method is in the middle of its operation. We also extend the abstraction function to dirty commit points, at which other methods are in the middle of their operation. We refer to our paper for the details.

Experience: The Boxwood Project
Figure: Boxwood architecture. BLinkTree module (root pointer node, internal pointer nodes, leaf pointer nodes at level 0, data nodes); Cache module (dirty and clean cache entries); Chunk Manager module (global disk allocator, replicated disk manager). The modules are connected by Read and Write operations.
We verified all modules of this system and caught an interesting, tricky error that had gone undetected. We ran Vyrd on the Boxwood Project from Microsoft. The goal of the Boxwood project is to build a distributed abstract storage infrastructure for applications with high data storage and retrieval requirements. Here you see a high-level picture of Boxwood. Boxwood has a concurrent BLinkTree implementation in the BLinkTree module. The BLinkTree module uses a cache module to store and retrieve the data at its tree nodes quickly. The cache module makes its data persistent using a chunk manager module. The chunk manager implements a distributed storage system and abstracts the storage from the upper layers.

Verifying storage modules
Each module is accessed in a highly concurrent manner and provides methods for manipulating (handle, byte-array) pairs. View for Cache+ChunkManager: the set of (handle, byte-array) pairs. For each handle: on a Cache hit, get the byte-array from the Cache; on a Cache miss, get the byte-array from the Chunk Manager. Invariants: (i) for a clean cache entry for (handle, byte-array), the same byte-array is associated with the handle in the Chunk Manager; (ii) a cache entry is in either the clean or the dirty list, not both.
Figure: BLinkTree calls Read, Write, Flush and Revoke on the Cache; Allocate and Deallocate go to the Chunk Manager.
We verified the storage modules of Boxwood, which consist of the Cache and the Chunk Manager. Both modules are accessed in a highly concurrent manner. They provide public methods for manipulating (handle, byte-array) pairs, where the handle is an abstract address for the data encoded in the byte array. We chose the view variable for these modules to be the set of (handle, byte-array) pairs managed by the modules. The cache and the chunk manager manage the same set of handles, but the byte arrays they store may differ, since the cache may hold the latest version of a byte array while the chunk manager does not. Thus, for each handle managed by them, we first look at the cache to see if there is an entry with the same handle. If there is a dirty cache entry for the handle, we get the byte array from the cache. If there is a clean entry in the cache, we again read the byte array from the cache and, to make sure that the state being abstracted is valid, we require that the byte arrays in the cache and the chunk manager are the same. If the cache has no entry for the handle, we fetch the byte array from the chunk manager.
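The view computation for Cache+ChunkManager described above might look like the following sketch. It is an illustration only: `StorageViewSketch`, the map-based representation of the dirty list, clean list and chunk manager, and the use of `String` handles are assumptions, not Boxwood's data structures. It relies on the statement that both modules manage the same set of handles, so it iterates over the chunk manager's handles.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of the Cache+ChunkManager abstraction function; all names are hypothetical. */
final class StorageViewSketch {
    /**
     * Builds the view (handle -> byte array) and checks invariants (i) and (ii).
     * Throws IllegalStateException if an invariant is violated.
     */
    static Map<String, byte[]> view(Map<String, byte[]> dirty,
                                    Map<String, byte[]> clean,
                                    Map<String, byte[]> chunkManager) {
        Map<String, byte[]> view = new TreeMap<>();
        for (Map.Entry<String, byte[]> e : chunkManager.entrySet()) {
            String handle = e.getKey();
            boolean isDirty = dirty.containsKey(handle);
            boolean isClean = clean.containsKey(handle);
            if (isDirty && isClean) {
                // Invariant (ii): an entry is in the clean or the dirty list, not both.
                throw new IllegalStateException("invariant (ii) violated for " + handle);
            }
            if (isDirty) {
                view.put(handle, dirty.get(handle));      // cache holds the latest version
            } else if (isClean) {
                // Invariant (i): a clean entry must match the chunk manager.
                if (!Arrays.equals(clean.get(handle), e.getValue())) {
                    throw new IllegalStateException("invariant (i) violated for " + handle);
                }
                view.put(handle, clean.get(handle));
            } else {
                view.put(handle, e.getValue());           // cache miss: read the chunk manager
            }
        }
        return view;
    }
}
```

Checking invariant (i) inside the abstraction function is what lets a checker notice corrupted persistent data even while every read still hits a correct cache entry.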

Concurrency bug in Cache
The bug had not been caught by the developers; the current version does not contain it. Cause: concurrent execution of Write and Flush on the same entry. The Write to a dirty entry is not locked properly, so Flush writes corrupted data to the Chunk Manager and marks the entry clean. Bug manifestation: violation of invariant (i). The cache entry is correct, but permanent storage has corrupted data. Hard to catch through testing: as long as Reads hit in the Cache, the return value is correct. It is caught through testing only if the Cache fills and the clean entry is evicted, there are no Writes to the entry in the meantime, and the entry is read after eviction. This is very unlikely.
Figure: Cache and Chunk Manager contents for one handle under Read/Write, showing the Chunk Manager holding a different value than the Cache.
The error we showed in the previous slide is caused by a Flush method interleaved with a Write method that does not properly protect the dirty entry it accesses. Thus the operations of Flush on the same handle can get interleaved without the consent of the Write method. This error is hard to catch through testing because the data in the cache entry is correct while the data in persistent storage is not. Thus all the reads that hit in the Cache return the correct value. It would be caught through testing only if the clean cache entry were evicted and no writes were performed in the meantime. Then a read would detect the erroneous value and signal the refinement violation. However, this scenario is very unlikely, since the cache is accessed intensively in a highly concurrent manner.

Experimental Results
Scalable method: caught bugs in industrial-scale designs. BLinkTree and Cache in Boxwood (≈ 30K LOC). Java libraries with known bugs (≈ 1K LOC): java.util.Vector, java.util.StringBuffer. Vector- and binary tree-based multiset implementations (≈ 1.5K LOC). Moderate instrumentation effort: several lines for each method. I/O-refinement has low logging and verification overhead; for BLinkTree, logging costs 17% over testing and the refinement check 27%. View-refinement for BLinkTree: logging 20% over testing, refinement check 137%. View-refinement is more effective in catching errors; for the Boxwood Cache, view-refinement needed 26 random methods before the error and I/O-refinement needed 539.
Here are the experimental results from applying Vyrd to the BLinkTree and Cache modules. The overall results show that Vyrd can handle industrial-scale designs with modest logging and verification costs. I/O refinement requires only method call, commit and return actions to be logged, so its logging overhead is much lower than what view refinement requires. Note that the view-refinement logging overhead includes the logging for I/O refinement. But view refinement is more effective in catching bugs: the results show a large difference in the number of method calls made before the error is detected. The overhead of logging the actions for view refinement may grow considerably as the granularity of actions gets finer.

Conclusions
Refinement as a correctness condition: refinement to a sequential specification implies abstract atomicity. Novel correctness criteria: I/O and view refinement. Improved observability over testing; early and precise detection of errors. A less restrictive specification rules out fewer implementations. Runtime checking of refinement criteria: intermediate between testing and exhaustive verification; improves testing with practically efficient checking; a scalable, powerful technique with reasonable computational cost. The VYRD tool [Elmas, Tasiran, Qadeer, PLDI’05]: low overhead, applicable to industrial-scale programs, able to catch intricate errors (e.g. the Boxwood Cache bug).
In this talk we introduced a runtime checking technique for refinement. It is a powerful verification technique with reasonable computational cost. Although it may not be exhaustive, complex industrial-scale software can be effectively checked by our method. As future work we plan to improve the coverage of testing by using model checkers to explore the interleavings more systematically. Our current work on this approach will appear in an upcoming paper at the Runtime Verification workshop (RV 2005).

Future Directions
Verifying refinement during model checking: VyrdMC [Elmas, Tasiran, RV’05]. Use an execution-based model checker, e.g. JPF, to drive VYRD. Improve coverage and reduce the instrumentation burden. More extensive validation from small test programs. Explore all distinct thread interleavings.
Measuring coverage for concurrency: the LP metric [Tasiran, Elmas, Bolukbasi, Keremoglu, FATES’05]. Inspired by atomicity and refinement violations in real examples. “What else should we try to test if we are after concurrency errors?” A code-based, practical metric that captures concurrency errors well.
Static methods for verifying refinement: reason about the atomic update of the view using symbolic execution. Extend existing approaches for refinement [Owicki, Gries, 1976].

Acknowledgements
Assis. Prof. Serdar Taşıran: Koç University, College of Engineering. Shaz Qadeer: Software Productivity Tools Group, Microsoft Research, Redmond, WA. Lidong Zhou, Chandu Thekkath: Storage Infrastructure Group, The Boxwood Project, Microsoft Research, Silicon Valley, CA. Grants: TÜBİTAK, BAYG; Microsoft Research Gift.

Thanks - Questions? Runtime Checking of Refinement for Concurrent Software Components. Tayfun Elmas, KOÇ University Graduate School of Sciences and Engineering

Linearizability, atomicity/reducibility
Criterion: for each execution of the implementation (Impl) there exists an “equivalent” atomic execution of Impl. The criteria in the literature are too restrictive: atomicity is defined on operations over shared program variables, and these criteria declare systems such as Boxwood and the Scan file system incorrect.

The VYRD tool
Figure: the test program drives the implementation, which writes its execution trace to a log (for example: Call Insert(3), Unlock A[0], A[0].elt=3, Call LookUp(3), Return “success”, Unlock A[1], A[1].elt=4, read A[0], Return “true”, A[0].elt=null, Call Insert(4), Call Delete(3), ...). VYRD reads the execution trace from the log; its replay mechanism drives Implementation* and Specification and passes the Impl. trace and Spec. trace to the refinement checker.
Abstraction function (for checking view-refinement): an extra method of the data structure. For the current data structure state, it computes the current state of the view variable.
Vyrd analyzes execution traces of the impl generated by test programs. Vyrd uses two separate threads for the process. The testing thread runs a test harness that generates test programs. A test program makes concurrent method calls to the impl. During the run of the test program, the corresponding execution trace is recorded in a shared sequential log. The verification thread reads the execution trace from the log. Since the verification thread follows the testing thread from behind, it cannot access the instantaneous state of the impl. Thus the replaying module re-executes actions from the log on a separate instance of the impl called impl-replay and executes atomic methods on the spec at commit points. During replaying, the replaying mechanism also computes the view variables when it reaches a commit point and annotates the commit actions along the traces with view variables. The refinement checker module checks the resulting λ-traces of the impl and the spec for I/O and view refinement. These threads can run in an online or offline setting. In online checking both threads run simultaneously, while in offline checking the verification thread runs after the whole test program finishes its work.

Runtime information
Figure: sample log with the actions Call Insert(3), Call LookUp(3), Return “success”, Call Insert(4), Return “true”, Call Delete(3), Unlock A[0], A[0].elt=3, Unlock A[1], A[1].elt=4, read A[0], A[0].elt=null.
Execution trace: a sequence of actions. Action: an atomically executed code fragment, covering operations on program variables and method call and return actions. Analysis is possible at different granularities. Fine-grained analysis: reads and writes to shared variables; example: assignment to a primitive-typed variable. Coarse-grained analysis: data structure-specific operations; example: balancing the children of a red-black tree.
Vyrd has two prototype implementations, in Java and C#. It allows the programmer to work at different levels of granularity for actions. She can treat fine-grained operations like a single variable assignment, or coarse-grained application-specific operations. We use the atomized version of the original data structure impl as the executable spec, by fully synchronizing the impl method bodies and adding nondeterministic failure for some methods. Vyrd allows incremental view computation to avoid reading the whole state of the data structure at each commit point. To update the view variables incrementally, it figures out, for each action executed on the impl and each method performed on the spec, which parts of the view variable need to be updated. In addition, the view itself can be huge, so the view comparison may have to be done incrementally by comparing only the parts just affected.

λ-refinement
A refinement criterion in the form of trace equivalence. λ-trace: project the trace onto a subset λ of actions; this chooses which actions to observe. λ-refinement: for each λ-trace of Impl there exists an equivalent λ-trace of Spec. Actions in λ-traces are comparable. A different choice of λ gives a different notion of refinement. Example: I/O refinement, with λ = {call and return actions}.
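Projecting a trace onto a chosen set of observed actions can be sketched as below. The `Action` class with a string `kind`, and the class name `LambdaTraceSketch`, are hypothetical; the sketch only shows the projection step, not how equivalence between the Impl and Spec traces is decided.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch of trace projection; the action encoding is an assumption. */
final class LambdaTraceSketch {
    /** Hypothetical action: a kind such as "call", "return", "write", plus a label. */
    static final class Action {
        final String kind;
        final String label;

        Action(String kind, String label) {
            this.kind = kind;
            this.label = label;
        }

        @Override
        public String toString() {
            return kind + " " + label;
        }
    }

    /** Keeps only the actions whose kind is in the observed set, preserving their order. */
    static List<String> project(List<Action> trace, Set<String> observedKinds) {
        List<String> projected = new ArrayList<>();
        for (Action a : trace) {
            if (observedKinds.contains(a.kind)) {
                projected.add(a.toString());
            }
        }
        return projected;
    }
}
```

With the observed set {call, return} this yields the trace used for I/O refinement; adding a commit kind to the set would give the actions observed by view refinement.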

View refinement
λ = {call, return and commit actions}. For the Impl trace, each commit action is annotated with viewImpl. For the Spec trace, a commit action annotated with viewSpec is inserted between the call and return actions.

Checking observer methods
A precise commit point is difficult to identify for observers: all reads would have to be recorded, and the commit point depends on other methods. A more efficient check records only call and return actions; the commit action can lie anywhere between call and return. For the execution of an observer method ν′, consider the Spec states s0, s1, s2, ..., sn at commit points. Check: the return value is consistent with at least one of s0, s1, s2, ..., sn.
Figure: timeline from Call(ν′) to Return(ν′), with Commit(1), Commit(2), ..., Commit(n) in between and the Spec states s0, s1, s2, ..., sn around them.
Deciding on the commit points precisely is hard for observer methods. The programmer would have to track the reads performed during the observer method execution. Any of the reads can be the commit action, and which one it is depends on the return value of the method, so you cannot decide on the commit point until the method finishes its work and returns. Tracing many reads and not being able to decide on the commit point before the method has returned makes checking observer methods hard. Therefore we make a more efficient check by recording only the call and return actions of an observer method. The commit action can be anywhere between the call and the return actions. Here you see a depiction of our approach. For an execution of the observer method nu-prime, we track the spec states at the commit points of the mutator methods that lie between the call and the return of nu-prime. Then we check that the return value of nu-prime is consistent with at least one of these spec states.
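The observer check above amounts to an existential test over the spec states collected between the call and the return. A minimal sketch for a `LookUp(x)` observer on a multiset, with spec states modelled as sets of integers, could look like this (the class and method names are hypothetical):

```java
import java.util.List;
import java.util.Set;

/** Sketch of the observer-method check; names and state encoding are assumptions. */
final class ObserverCheckSketch {
    /**
     * specStates holds s0, s1, ..., sn: the spec state at the observer's call,
     * followed by the spec state after each mutator commit before the observer returns.
     * The logged return value is accepted if at least one state justifies it.
     */
    static boolean lookUpConsistent(List<Set<Integer>> specStates, int x, boolean returned) {
        for (Set<Integer> state : specStates) {
            if (state.contains(x) == returned) {
                return true;
            }
        }
        return false;
    }
}
```

For example, if Insert(3) commits while LookUp(3) is running, both “true” and “false” are acceptable return values, because s0 lacks 3 and s1 contains it.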

Commit blocks
InsertPair(x, y)
    i = FindSlot(x)
    if (i == 0) return failure
    j = FindSlot(y)
    if (j == 0) { A[i].elt = null; return failure }
    // commit block
    acquire(A[i])
    acquire(A[j])
    A[i].valid = true
    A[j].valid = true
    release(A[i])
    release(A[j])
    return success
Problem: computing viewImpl when several methods are in progress. Example: some other method commits while InsertPair is between the two (valid = true) assignments, so only one element appears in viewImpl. Solution: designate “commit blocks”. Requirement: they are easily verified to be atomic. Check a re-ordered version of the Impl trace, obtained using commutativity, in which commit blocks do not overlap.
One problem with computing the view of the impl during replaying is the dirty states, at which more than one method is in progress. For example, suppose that another method commits when an InsertPair execution is at this point. If you run the abstraction function at this state, only the integer x appears in the view. However, no method can see this element before the lock of the ith element is released. Our solution for dealing with dirty states is to designate some code blocks as commit blocks. The commit blocks are required to be atomic. We omit checking the atomicity of commit blocks, relying on the fact that their atomicity is easily verified separately. By identifying the actions in a commit block, we reorder the actions in the commit block using commutativity arguments so that no commit action gets interleaved in the commit block. Then we check the reordered version of the trace.

Incremental view computation
Avoid re-traversing the entire data structure state, which may be huge: an entire database or the contents of a disk. At each commit point, determine the “parts” of the view modified since the last commit point, and compute and compare only those parts. This is currently data structure-specific.
Figure: part of a B-tree with pointer nodes X, Y, Z and data nodes A, B, C, D; view = {..., A, ..., B, ..., C, ..., D, ...}.
In some cases the size of the view may get huge, and the abstraction function may have to traverse large sets of variables, like the ones in an entire database or an entire disk. Therefore at each commit point we determine the parts of the data structure that were modified since the last commit point and do not recheck the unchanged parts of the view. The incremental view computation and comparison mostly depends on the representation of the data structure. For example, the figure shows a part of a B-tree. The view is extracted by going from left to right and reading the contents of the leaf nodes. This part has the pointer nodes X, Y and Z. The data nodes contain the data elements A through D. If the granularity of an action is at the pointer-node or data-node level, and the execution of an action changes the pointer node Y or one of the data nodes B through C, then the part of the view from B through C that comes from Y is replaced and the other parts remain the same.
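One way to picture the incremental scheme is to keep the view as per-node segments and to recompute and compare only the segments touched since the last commit point. The sketch below assumes a hypothetical `IncrementalViewSketch` class that keys segments by pointer-node id, and that sorting the ids reproduces the left-to-right leaf order; neither assumption comes from Vyrd itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of an incrementally maintained view; the segment layout is an assumption. */
final class IncrementalViewSketch {
    // View segments keyed by the pointer node they come from (e.g. "X", "Y", "Z").
    private final Map<String, List<String>> segments = new TreeMap<>();

    /** Replaces only the part of the view that comes from one pointer node. */
    void replaceSegment(String pointerNode, List<String> leafContents) {
        segments.put(pointerNode, new ArrayList<>(leafContents));
    }

    /** Returns the segment for a pointer node, or an empty list if there is none. */
    List<String> segment(String pointerNode) {
        List<String> s = segments.get(pointerNode);
        return s == null ? new ArrayList<String>() : s;
    }

    /** Full view: all segments concatenated in key order. */
    List<String> fullView() {
        List<String> view = new ArrayList<>();
        for (List<String> s : segments.values()) {
            view.addAll(s);
        }
        return view;
    }

    /** Compares impl and spec only on the pointer nodes touched since the last commit. */
    static boolean agreeOn(IncrementalViewSketch impl, IncrementalViewSketch spec,
                           Iterable<String> touched) {
        for (String node : touched) {
            if (!impl.segment(node).equals(spec.segment(node))) {
                return false;
            }
        }
        return true;
    }
}
```

The comparison cost then depends on the size of the touched segments rather than on the size of the whole view.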

Verifying the BLinkTree module
Highly concurrent operations with overtaking traversals. Abstraction function for BLinkTree: the indexing structure is abstracted away, and the view is the ordered list of leaf node contents. Effort: a few days to work out the view computation, after becoming familiar with the Boxwood code.
We also verified the BLinkTree module for refinement. BLinkTree has highly concurrent methods designed for high performance. In particular, the traversal of the tree must be efficient enough that methods can overtake each other on their way, to increase throughput. The abstraction function for BLinkTree is tricky. The clients can modify or access only the data stored in the data nodes, using public methods. Thus the view of BLinkTree is extracted as the ordered list of the leaf data node contents. The indexing structure above the leaves is abstracted away, since it is used only for fast access to the data nodes. We computed the view incrementally during verification, since the number of nodes in a tree may get huge. Figuring out the view computation and writing the abstraction function took a few days after getting familiar with the Boxwood implementation.

Concurrency bug in the Cache
Figure: timeline of Write(handle, AB) interleaved with Flush() on one handle. Initially the Chunk Manager holds T Z and the Cache holds X Y. After Write(handle, AB) and Flush() start, the Chunk Manager holds X Z and the Cache holds A Y; then the Chunk Manager holds A Y. When Write(handle, AB) and Flush() end, the Chunk Manager holds A Y and the Cache holds A B.
This slide demonstrates the error that Vyrd caught in the Cache module. Think of a dirty cache entry containing the data X.Y, and let the persistent storage for the same handle contain T.Z. At the end, corrupted data is written to the persistent storage. This breaks invariant (i), because for the clean cache entry the byte arrays in the cache and the chunk manager are not the same.
