Slide 1: Inferring Specifications (A Kind of Review)
Slide 2: The Problem
- Most programs do not have specifications
- Those that do often fail to keep the specification consistent with the implementation
- Specifications are needed for verification, testing, and maintenance
Slide 3: Suggested Solution
- Automatic discovery of specifications
Slide 4: Our Playground
- Purpose: verification, testing, promoting understanding
- Specification representation: contracts, properties, automata, ...
- Inference technique: static, dynamic, or a combination
- Human intervention
Slide 5: Restrictions and Assumptions
- Learning automata from positive traces alone is undecidable [Gold67]
- An executing program is usually "almost" correct
- If a miner can identify the common behavior, it can produce a correct specification even from programs that contain errors
Slide 6: Perracota: Mining Temporal API Rules From Imperfect Traces
- Jinlin Yang, David Evans (Department of Computer Science, University of Virginia)
- Deepali Bhardwai, Thirumalesh Bhat, Manuvir Das (Center for Software Excellence, Microsoft Corp.)
- ICSE '06
Slide 7: Key Contributions
- Addressing the problem of imperfect traces
- Techniques for incorporating contextual information into the inference algorithm
- Heuristics for automatically identifying interesting properties
Slide 8: Perracota
- A dynamic analysis tool for automatically inferring temporal properties
- Takes the program's execution traces as input and outputs the set of temporal properties the program likely has
- Pipeline: Program -> (instrumentation) -> Instrumented Program -> (testing with a Test Suite) -> Execution Traces -> (inference against Property Templates) -> Inferred Properties
Slide 9: Property Templates

Name         QRE           Valid example   Invalid examples
Response     S*(PP*SS*)*   SPPSSS          PPSSP
Alternating  (PS)*         PSPS            PSS, PPS, SPS
MultiEffect  (PSS*)*       PSS             PPS, SPS
MultiCause   (PP*S)*       PPS             PSS, SPS
EffectFirst  S*(PS)*       SPS             PSS, PPS
CauseFirst   (PP*SS*)*     PPSS            SPSS, SPPS
OneCause     S*(PSS*)*     SPSS            PPSS, SPPS
OneEffect    S*(PP*S)*     SPPS            PPSS, SPSS
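Each template is a quantified regular expression (QRE) over the trace projected onto the two events, P (cause) and S (effect); a trace satisfies a template when the projection matches the QRE exactly. A minimal sketch, with the template strings taken from the table above:

```python
import re

# The eight Perracota templates as regular expressions over a trace
# projected onto the cause (P) and effect (S) events.
TEMPLATES = {
    "Response":    r"S*(PP*SS*)*",
    "Alternating": r"(PS)*",
    "MultiEffect": r"(PSS*)*",
    "MultiCause":  r"(PP*S)*",
    "EffectFirst": r"S*(PS)*",
    "CauseFirst":  r"(PP*SS*)*",
    "OneCause":    r"S*(PSS*)*",
    "OneEffect":   r"S*(PP*S)*",
}

def matching_templates(projected_trace):
    """Names of all templates the projected trace satisfies."""
    return {name for name, qre in TEMPLATES.items()
            if re.fullmatch(qre, projected_trace)}
```

Note that `re.fullmatch` is used rather than `re.match`, since the whole projected trace must fit the pattern.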
Slide 10: Initial Approach
- The algorithm infers two-event properties only (for scalability)
- Complexity: O(nL) time and O(n^2) space, where n is the number of distinct events and L is the length of the trace
- Each cell of an n x n matrix holds the current state of a state machine that tracks the alternating pattern for that pair of events
- Requires perfect traces
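The matrix scan can be sketched as follows, assuming the Alternating template: the trace is read once, and each event advances only the O(n) pair machines it participates in, giving O(nL) time over an O(n^2) table of states:

```python
def alternating_pairs(trace, events):
    """One pass over the trace; state[a][b] is the DFA state of the
    Alternating pattern (ab)*: 0 = expecting a, 1 = expecting b, -1 = dead."""
    state = {a: {b: 0 for b in events if b != a} for a in events}
    for e in trace:
        for other in events:
            if other == e:
                continue
            # e acts as the cause in the pair (e, other)
            state[e][other] = {0: 1, 1: -1}.get(state[e][other], -1)
            # e acts as the effect in the pair (other, e)
            state[other][e] = {1: 0, 0: -1}.get(state[other][e], -1)
    # state 0 at the end means a complete (ab)* projection was observed
    return {(a, b) for a in events for b in state[a] if state[a][b] == 0}
```

Pairs whose events never occur are vacuously reported; a production miner would filter those out.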
Slide 11: Approximate Inference
- Partition the trace into sub-traces; for example: PSPSPSPSPSPPPP -> PS|PS|PS|PS|PS|PPPP
- Compute the satisfaction rate of each template: the ratio of partitions satisfying the (e.g. Alternating) property to the total number of partitions
- Accept the property if the rate clears a satisfaction threshold
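A sketch of the idea for the Alternating template, assuming a simple partitioning rule (cut after each completed P..S unit, matching the slide's example); the paper's exact partitioning may differ:

```python
import re

def satisfaction_rate(projected_trace, threshold=0.9):
    """Fraction of partitions satisfying (PS)*, and whether that
    fraction clears the satisfaction threshold."""
    parts, cur = [], ""
    for e in projected_trace:
        cur += e
        if e == "S":          # a completed cause..effect unit ends a partition
            parts.append(cur)
            cur = ""
    if cur:
        parts.append(cur)     # trailing events form a final partition
    good = sum(1 for p in parts if re.fullmatch(r"(PS)*", p))
    rate = good / len(parts)
    return rate, rate >= threshold
```

On the slide's trace, five of the six partitions satisfy the pattern, so the rate is about 0.83 and the property is rejected at a 0.9 threshold.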
Slide 12: Contextual Properties
- Example trace: lock1.acq lock2.acq lock2.rel lock1.rel
- Context-neutral events (ignoring which lock is involved) give the trace acq acq rel rel, so no property is inferred
- Context-sensitive events yield all six pairwise properties, including spurious ones such as lock1.acq->lock2.acq (the full set: lock1.acq->lock2.acq, lock1.acq->lock2.rel, lock1.acq->lock1.rel, lock2.acq->lock2.rel, lock2.acq->lock1.rel, lock2.rel->lock1.rel)
- Slicing the trace per object (lock1: acq rel; lock2: acq rel) recovers the intended acq->rel alternation for each lock
Slide 13: Selecting Interesting Properties: Reachability
- Mark a property P->S as probably uninteresting if S is reachable from P in the call graph
- For example, in A() { ... B(); ... } the call to B is reachable from A, so A->B is obvious; in X() { ... C(); ... D(); ... } the relationship between C and D is not obvious from inspecting either C or D, so C->D remains interesting
Slide 14: Selecting Interesting Properties: Name Similarity
- A property is more interesting if it involves similarly named events
- For example: ExAcquireFastMutexUnsafe -> ExReleaseFastMutexUnsafe
- Compute a word-level similarity score between the two event names
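The slide's similarity formula is not reproduced here; as a hypothetical stand-in (an assumption, not necessarily Perracota's metric), one can tokenize the camel-case names and take the Jaccard overlap of their word sets:

```python
import re

def words(event):
    # split a camel-case identifier into its capitalized words
    return set(re.findall(r"[A-Z][a-z]*", event))

def name_similarity(cause, effect):
    """Jaccard overlap of the word sets of two event names (stand-in metric)."""
    a, b = words(cause), words(effect)
    return len(a & b) / len(a | b)
```

The acquire/release pair above shares four of its six distinct words, scoring about 0.67, while unrelated names score near zero.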
Slide 15: Chaining
- Connect related Alternating properties into chains: A->B, B->C and A->C imply A->B->C
- Provides a way to compose complex state machines out of many small ones
- Identifies complex multi-event properties without incurring a high computational cost
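Chaining can be sketched as follows, assuming the mined Alternating pairs form a single total order: events are sorted by how many other events must precede them, then the full transitive closure is checked:

```python
def chain(alternating):
    """Order events so that A -> B, B -> C and A -> C become A -> B -> C;
    returns None when the pairs do not form one consistent chain."""
    events = {e for pair in alternating for e in pair}
    # an event's position equals the number of events mined as preceding it
    order = sorted(events, key=lambda e: sum(b == e for _, b in alternating))
    # every earlier/later pair, including transitive ones, must be present
    ok = all((order[i], order[j]) in alternating
             for i in range(len(order)) for j in range(i + 1, len(order)))
    return order if ok else None
```

Without A->C, the two-pair set {A->B, B->C} is rejected, mirroring the slide's requirement that the transitive property also be mined.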
Slide 16: SMArTIC: Towards Building an Accurate, Robust and Scalable Specification Miner
- David Lo and Siau-Cheng Khoo (Department of Computer Science, National University of Singapore)
- FSE '06
Slide 17: Hypotheses
Mined specifications will be more accurate when:
- erroneous behavior is removed before learning
- they are obtained by merging the specifications learned from clusters of related traces, rather than by learning from the entire set of traces at once
Slide 18: Structure
- Pipeline: Traces -> (filtering) -> Filtered Traces -> (clustering) -> Clusters of Filtered Traces -> (learning) -> Automatons -> (merging) -> Merged Automaton
Slide 19: Filtering
- How can you tell what's wrong if you don't know what's right?
- Filter out erroneous traces based on common behavior
- Common behavior is represented by "statistically significant" temporal rules
Slide 20: Pre -> Post Rules
- Look for rules of the form a -> bc: when a occurs, b must eventually occur after a, and c must eventually occur after b
- Rules exhibiting high confidence and reasonable support can be treated as "statistical" invariants
- Support: the number of traces exhibiting the property pre->post
- Confidence: the ratio of traces exhibiting the property pre->post to those exhibiting pre
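The support and confidence computation for a rule a -> bc can be sketched over a set of traces (event names below are illustrative):

```python
def rule_stats(traces, pre, post):
    """support: traces containing pre in which every occurrence of pre is
    eventually followed by the post events in order; confidence: support
    divided by the number of traces containing pre at all."""
    def holds(trace):
        for i, e in enumerate(trace):
            if e != pre:
                continue
            j = i + 1
            for p in post:                # each post event must follow, in order
                while j < len(trace) and trace[j] != p:
                    j += 1
                if j == len(trace):
                    return False
                j += 1
        return True
    with_pre = [t for t in traces if pre in t]
    support = sum(holds(t) for t in with_pre)
    confidence = support / len(with_pre) if with_pre else 0.0
    return support, confidence
```

Filtering then keeps only the traces consistent with rules whose support and confidence clear chosen thresholds.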
Slide 21: Clustering
- Convert a set of traces into groups of related traces
- Localizes inaccuracies
- Improves scalability
Slide 22: Clustering Algorithm
- A variant of the k-medoid algorithm
- Computes the distance between a pair of data items (traces) using a similarity metric
- k is the number of clusters to create
- Algorithm: repeatedly increase k until a local maximum is reached
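A minimal k-medoid sketch for a fixed k (the outer loop that grows k until clustering quality peaks is omitted); the distance function stands in for the trace-similarity metric of the next slide:

```python
def k_medoids(items, dist, k, iters=20):
    """Assign each item to its nearest medoid, then move each medoid to
    the cluster member with the smallest total distance to the others."""
    medoids = list(items[:k])                         # naive initialization
    clusters = {}
    for _ in range(iters):
        clusters = {m: [] for m in medoids}
        for x in items:
            clusters[min(medoids, key=lambda m: dist(x, m))].append(x)
        new = [min(c, key=lambda cand: sum(dist(cand, y) for y in c))
               for c in clusters.values() if c]
        if new == medoids:
            break                                     # converged
        medoids = new
    return clusters
```

Unlike k-means, medoids are always real data items, which is what makes the scheme usable with an arbitrary trace-distance function.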
Slide 23: Similarity Metric
- Use a global sequence alignment algorithm; for example, FTFTALILLAVAV aligns as F--TAL-LLA-AV against a second sequence
- Problem: alignment doesn't work well in the presence of loops
- Solution: compare the regular-expression representations instead; for example, ABCBCDABCBCBCD becomes (A(BC)+D)+ while ABCD stays ABCD
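The global-alignment score can be sketched with the standard Needleman-Wunsch dynamic program (the regular-expression conversion that handles loops is omitted; the +1/-1/-1 match/mismatch/gap scores are assumptions):

```python
def alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score between two traces."""
    # dp[i][j]: best score aligning s[:i] against t[:j]
    dp = [[gap * j for j in range(len(t) + 1)]]
    for i in range(1, len(s) + 1):
        row = [gap * i]
        for j in range(1, len(t) + 1):
            sub = dp[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            row.append(max(sub,                 # substitute / match
                           dp[i - 1][j] + gap,  # gap in t
                           row[j - 1] + gap))   # gap in s
        dp.append(row)
    return dp[-1][-1]
```

The score (suitably normalized) serves as the distance input to the clustering step above.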
Slide 24: Learning
- Learn PFSAs from the clusters of filtered traces, one PFSA per cluster
- The learner is a "place holder"; the current experiments use the sk-strings learner
Slide 25: Merging
- Merge the multiple PFSAs into one
- The merged PFSA accepts exactly the union of the sentences accepted by the input PFSAs
- Ensures probability integrity: each transition of the output PFSA carries a probability derived from the input PFSAs
Slide 26: From Uncertainty to Belief: Inferring the Specifications Within
- Ted Kremenek, Paul Twohey, Andrew Y. Ng, Dawson Engler (Computer Science Dept., Stanford University)
- Godmar Back (Computer Science Dept., Virginia Tech)
- OSDI '06
Slide 27: Motivating Example
- Problem: inferring ownership roles
- Ownership idiom: at any time, a resource has exactly one owning pointer
- Infer annotations: ro (returns ownership), co (claims ownership)
- Is fopen an ro? Are fread/fclose co?
  FILE* fp = fopen("myfile.txt", "r");
  fread(buffer, n, 1000, fp);
  fclose(fp);
Slide 28: Basic Ownership Rules
- A pointer returned by an ro function is owned; it may be passed to ¬co functions and must eventually be passed to a co function: fp = ro(); ¬co(fp); co(fp);
- A pointer returned by a ¬ro function is not owned; it may only be passed to ¬co functions: fp = ¬ro(); ¬co(fp); ¬co(fp);
- The checking DFA has states Uninit, Owned, ¬Owned, Claimed, OK, and Bug: ro moves Uninit to Owned and ¬ro to ¬Owned; co claims an Owned pointer; reaching end-of-path with an unclaimed Owned pointer, or any use after a claim, ends in Bug
Slide 29: Goal
Provide a framework that:
1. allows users to easily express every intuition and domain-specific observation they have that is useful for inferring annotations
2. reduces such knowledge, in a sound way, to meaningful probabilities (a "common currency")
Slide 30: Annotation Inference
1. Define the set of possible annotations to infer
2. Model domain-specific knowledge and intuitions in a probabilistic model
3. Compute annotation probabilities
Slide 31: Factors: Modeling Beliefs
- Factors are relations mapping the possible values of one or more annotation variables to non-negative real numbers
- Example: CheckFactor encodes the belief that any random place might have a bug 10% of the time; f = 0.9 if the DFA ends in OK, and f = 0.1 if the DFA ends in Bug
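A brute-force sketch of reducing factors to probabilities: every complete assignment of the annotation variables is scored by the product of its factor values, and the scores are normalized. The CheckFactor values (0.9/0.1) follow the slide; the fread prior (0.7 for not claiming ownership) and the stand-in DFA rule inside `check_factor` are illustrative assumptions:

```python
import math
from itertools import product

def posterior(variables, domains, factors):
    """P(assignment) is proportional to the product of all factor values."""
    combos = [dict(zip(variables, vs))
              for vs in product(*(domains[v] for v in variables))]
    weights = {tuple(sorted(a.items())): math.prod(f(a) for f in factors)
               for a in combos}
    z = sum(weights.values())               # normalization constant
    return {k: w / z for k, w in weights.items()}

def check_factor(a):
    # stand-in DFA oracle: the traces run cleanly only under this annotation
    ok = a["fopen:ret"] == "ro" and a["fclose:1"] == "co"
    return 0.9 if ok else 0.1               # belief: ~10% of places are buggy

def fread_prior(a):
    # assumed prior: fread probably does not claim ownership of its argument
    return 0.7 if a["fread:4"] == "not-co" else 0.3
```

Exact enumeration is exponential in the number of variables; the paper's factor-graph machinery exists precisely to avoid this brute force at scale.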
Slide 32: Factors (continued)
Other factors:
- a bias toward specifications with ro but without co
- biases based on naming conventions
- ...
Slide 33: Annotation Factor Graph
- Annotation variables: fopen:ret, fread:4, fclose:1, fdopen:ret, fwrite:4
- Factors connect the variables to prior beliefs and to behavioral tests
Slide 34: Results

fopen:ret  fread:4  fclose:1  DFA  f    f(fread:4)  P(A)
ro         ¬co      co        +    0.9  0.7         0.483
¬ro        ¬co      co        +    0.9  0.7         0.282
ro         ¬co      co        -    0.1  0.7         0.125
ro         co       ¬co       -    0.1  0.3         0.054
ro         co       ¬co       -    0.1  0.3         0.023
¬ro        ¬co      co        -    0.1  0.7         0.013
¬ro        co       ¬co       -    0.1  0.3         0.013
¬ro        co       ¬co       -    0.1  0.3         0.006
Slide 35: QUARK: Empirical Assessment of Automaton-based Specification Miners
- David Lo and Siau-Cheng Khoo (Department of Computer Science, National University of Singapore)
- WCRE '06
Slide 36: QUARK Framework
- Assesses the quality of specification miners
- Measures performance along multiple dimensions:
  - Accuracy: the extent to which the inferred specification is representative of the actual specification
  - Scalability: the ability to infer large specifications
  - Robustness: sensitivity to errors
Slide 37: Quality Assessment
- In the QUARK framework, a simulator model (PFSA) feeds a trace generator; the generated traces drive a user-defined miner, whose output is compared against the model to produce measurements
Slide 38: Accuracy (Trace Similarity)
- Measured in the absence of injected errors
- Metrics:
  - Recall: how much of the correct information is recollected by the mined model
  - Precision: how much of the information produced by the mined model is correct
  - Co-emission: probability similarity between the models
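Over finite trace sets the first two metrics can be sketched as follows (the real framework samples traces from PFSAs, and the probability-based co-emission metric is omitted here):

```python
def recall_precision(true_traces, mined_traces):
    """recall: fraction of the actual model's traces the mined model accepts;
    precision: fraction of the mined model's traces the actual model accepts."""
    true_set, mined_set = set(true_traces), set(mined_traces)
    common = true_set & mined_set
    return len(common) / len(true_set), len(common) / len(mined_set)
```

A miner that over-generalizes keeps recall high but loses precision; one that over-fits does the opposite, which is why both are reported.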
Slide 39: Robustness
- Sensitivity to errors
- "Inject" error nodes and error transitions into the PFSA model; the slide illustrates a four-state PFSA (start to end) extended with an error node reached by Z-labeled transitions
Slide 40: Scalability
- Use synthetic models
- Build a tree from a pre-determined number of nodes
- Add loops based on "locality of reference"
- Assign equal probabilities to transitions from the same node
- Vary the size of the model (nodes, transitions)