1 Perracotta: Mining Temporal API Rules from Imperfect Traces. Jinlin Yang, David Evans, Deepali Bhardwaj, Thirumalesh Bhat, Manuvir Das
2 Agenda: Background, Perracotta, Approximate Inference, Contextual Properties, Chaining, Results, Critique
3 Background Software tasks require specifications: What are the intended behaviors of the program? What outputs are expected, so that it can be tested? Which aspects can be modified during maintenance? The problem: Many programs do not come with precise specifications. Many implementations are not consistent with their specifications. As maintenance continues, specifications become increasingly out of date. So? This has motivated several researchers to study the problem of specification inference.
4 Background (2) Previous work: proposed an approach to dynamically infer temporal properties of programs. “Dynamically”? Specifications are inferred by analysing sample execution traces of a program. “Temporal properties”? They deal with the order in which program events occur. ex) Property: acquiring a lock should eventually be followed by a release of that lock. This paper addresses only inference of Alternating Properties, because “It's the strictest of the template patterns and has proven the most useful in practice.” ex) If events A and B are specified to follow the Alternating Property, the trace “ABABAB” satisfies it, but “ABABBAAB” does not.
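The Alternating Property example above can be sketched as a check that a trace, restricted to the two events of interest, matches the pattern (AB)*. This is a minimal illustration, not the paper's implementation; the function name `satisfies_alternating` and the one-character-per-event trace encoding are assumptions made for the example.

```python
import re


def satisfies_alternating(trace: str, a: str, b: str) -> bool:
    """Return True if the a/b events in `trace` strictly alternate as (ab)*.

    Assumption: each character of `trace` is one event.
    """
    # Project the trace onto the two events; all other events are ignored.
    projected = "".join(e for e in trace if e in (a, b))
    pattern = f"(?:{re.escape(a)}{re.escape(b)})*"
    # Note: an empty projection matches (ab)* trivially.
    return re.fullmatch(pattern, projected) is not None


print(satisfies_alternating("ABABAB", "A", "B"))    # True
print(satisfies_alternating("ABABBAAB", "A", "B"))  # False
```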
5 Background (3) Current limitations: Inference algorithms scale poorly with the size of the program and input trace. Inference only worked for perfect traces; in other words, it assumed that the implementations of the traced programs were correct. Many of the inferred properties are uninteresting; as they pile up, the approach becomes infeasible for large programs.
6 Background (4) Quick summary of background: Specifications tend to be inconsistent with implementations. Researchers have developed techniques to dynamically infer specification properties of programs; however, they suffer from three notable limitations: (1) The techniques only work with small programs. (2) The techniques cannot detect specifications from inconsistent implementations. (3) Many of the inferred properties are uninteresting “noise”. Contributions of this paper/Perracotta: Address the above problems.
7 Perracotta Contributions: Approximate Inference: makes it possible to infer a specification from an implementation that is not always consistent with that specification. Contextual Properties: allows more precise inferences by tracking the context of an event (for example, which lock it applies to) rather than only the event name. Selection Heuristics: filter out the uninteresting properties, greatly reducing the amount of “noise” among the inferred properties.
8 Approximate Inference Imperfect traces: Allocated memory is expected to be freed eventually, to avoid a memory leak. Unfortunately, even skilled programmers fail to follow this temporal property consistently, especially in complex code. A sample execution trace that is not consistent with the property is an example of an “imperfect trace”. Previous algorithms failed to infer properties from imperfect traces, which defeats the point of dynamic inference. Approximate Inference: infers a specification from an implementation, even if the implementation is buggy with respect to that specification.
9 Approximate Inference (2) “STSTSTSTSTSSS” Events (functions) S and T are called multiple times in a trace. They alternate five times, but there is no alternation in the last three S's. Will Perracotta (successfully) infer the Alternating Property from this? Perracotta's approach: Partition the trace: [S,T][S,T][S,T][S,T][S,T][SSS]. Number of partitions that alternate = 5. Number of total partitions = 6. Satisfaction rate of the Alternating Property = 5/6. The property is inferred when the satisfaction rate reaches a predefined threshold, so a higher satisfaction rate or a lower threshold makes inference more likely.
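The satisfaction-rate computation on this slide can be sketched as follows. The partitioning rule used here (start a new partition whenever the first event follows the second) is an assumption that reproduces the partitions shown on the slide; the paper's exact rule may differ. The names `satisfaction_rate` and `is_inferred` and the threshold values are hypothetical.

```python
def satisfaction_rate(trace, first, second):
    """Partition the trace and return (rate, partitions).

    Assumption: a new partition begins each time `first` follows `second`.
    A partition satisfies the Alternating Property only if it is exactly
    [first, second].
    """
    events = [e for e in trace if e in (first, second)]
    partitions, current = [], []
    for e in events:
        if current and e == first and current[-1] == second:
            partitions.append(current)
            current = []
        current.append(e)
    if current:
        partitions.append(current)
    if not partitions:
        return 0.0, partitions
    satisfied = sum(1 for p in partitions if p == [first, second])
    return satisfied / len(partitions), partitions


def is_inferred(trace, first, second, threshold):
    """Infer the property when the rate reaches the (hypothetical) threshold."""
    rate, _ = satisfaction_rate(trace, first, second)
    return rate >= threshold


rate, parts = satisfaction_rate("STSTSTSTSTSSS", "S", "T")
print(len(parts), round(rate, 3))                   # 6 0.833
print(is_inferred("STSTSTSTSTSSS", "S", "T", 0.8))  # True
print(is_inferred("STSTSTSTSTSSS", "S", "T", 0.9))  # False
```

With a strict (perfect-trace) check, the three trailing S events would cause the whole property to be rejected; the threshold is what lets a mostly consistent trace still yield the rule.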
10 Contextual Properties “An acquired lock should eventually be released.” But what if there are multiple locks? Context-neutral: “a lock was acquired”. Context-sensitive: “Lock#1 was acquired”. Without context, the inference tool treats all locks as if they were the same lock, so interleaved use of different locks looks like a violation and nothing useful can be inferred about lock behavior.
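The difference between context-neutral and context-sensitive events can be illustrated by grouping events by lock before checking alternation. This is a sketch under assumed conventions: events are `(name, lock_id)` tuples, and the helper names `alternates` and `per_context_alternation` are made up for the example.

```python
from collections import defaultdict


def alternates(names, first, second):
    """True if `names` is first, second, first, second, ... ending on `second`."""
    expected = first
    for name in names:
        if name != expected:
            return False
        expected = second if expected == first else first
    return expected == first


def per_context_alternation(events, first, second):
    """Check alternation separately for each context (here: each lock id)."""
    by_context = defaultdict(list)
    for name, context in events:
        if name in (first, second):
            by_context[context].append(name)
    return {ctx: alternates(names, first, second)
            for ctx, names in by_context.items()}


# Two locks used in an interleaved but correct way.
events = [("acquire", 1), ("acquire", 2), ("release", 1), ("release", 2)]

# Context-neutral view: all locks look like one lock, so alternation fails.
print(alternates([name for name, _ in events], "acquire", "release"))  # False

# Context-sensitive view: each lock alternates correctly.
print(per_context_alternation(events, "acquire", "release"))  # {1: True, 2: True}
```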
11 Selection Heuristics Goal: reduce the number of uninteresting properties. ex) “There is always a printf() before a readLine() prompt”. This says nothing about API specifications, so it is uninteresting. Reachability: Properties between events that have a call relationship (one function calls the other, directly or indirectly) are less interesting than properties between events that do not.
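A reachability filter of this kind could look like the sketch below. It assumes the call graph is available as a dictionary from caller to callees, and it treats a property as less interesting if either function can reach the other; both the data layout and the two-directional test are assumptions, and the function names in the sample graph are hypothetical.

```python
from collections import deque


def reachable(call_graph, src, dst):
    """Breadth-first search: can `src` call `dst`, directly or transitively?"""
    seen = {src}
    queue = deque([src])
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee == dst:
                return True
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


def keep_unrelated(properties, call_graph):
    """Keep only properties whose two events have no call relationship."""
    return [(a, b) for a, b in properties
            if not reachable(call_graph, a, b)
            and not reachable(call_graph, b, a)]


# Hypothetical call graph: open_wrapper calls open_impl via a helper.
graph = {"open_wrapper": ["helper"], "helper": ["open_impl"]}
candidates = [("open_wrapper", "open_impl"), ("lock", "unlock")]
print(keep_unrelated(candidates, graph))  # [('lock', 'unlock')]
```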
12 Selection Heuristics (2) Name Similarities: Functions with similar names are likely to be associated with interesting inferences. ex) ExAcquireFastMutexUnsafe vs ExReleaseFastMutexUnsafe. Chaining: Suppose A->B, B->C, and A->C; that is, A and B satisfy the Alternating Property, as do B and C, and A and C. There are three inferences. It is correct to chain them into a single inference, A->B->C, reducing the number of inferences from 3 to 1 and thus reducing “noise”.
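The chaining step can be sketched as merging pairwise properties into the longest sequences in which every ordered pair is itself an inferred property. This is an illustrative sketch, not the paper's algorithm; `chain_properties` is a hypothetical name, and the exhaustive extension loop is only practical for small property sets.

```python
def chain_properties(properties):
    """Merge pairwise properties (a, b) into chains such as (a, b, c).

    A chain may be extended with event x only if (e, x) is a known
    property for every event e already in the chain.
    """
    props = set(properties)
    events = sorted({e for p in props for e in p})
    chains = set(props)
    grew = True
    while grew:
        grew = False
        for chain in list(chains):
            for x in events:
                if x not in chain and all((e, x) in props for e in chain):
                    longer = chain + (x,)
                    if longer not in chains:
                        chains.add(longer)
                        grew = True
    # Drop any chain whose events are a strict subset of a longer chain.
    return sorted(c for c in chains
                  if not any(set(c) < set(d) for d in chains))


print(chain_properties([("A", "B"), ("B", "C"), ("A", "C")]))
# [('A', 'B', 'C')]
print(chain_properties([("A", "B"), ("B", "C")]))
# [('A', 'B'), ('B', 'C')]
```

The second call shows why all three pairwise properties are needed: without A->C, the two remaining properties are reported separately instead of being chained.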
13 Results Test Programs: Daisy, JBoss, Windows Kernel APIs
14 Results (2)
15 Critique Likes: The scope of the paper was narrowed down to the Alternating Property. It tackled problems worth solving in the field of Dynamic Inference. Actually detected a major bug in Windows. Dislikes: Definitions of important keywords were all over the place in the paper. It's not very clear how they got the algorithm to work with large programs. The “Approach” section was actually merely the approach for the previous work, not the current one. There was no explicit label to clarify where the overview of Perracotta starts.