
1 Inferring Specifications A kind of review

2 The Problem Most programs do not have specifications. Those that do often fail to keep the specification and the implementation consistent. Specifications are needed for verification, testing, and maintenance.

3 Suggested Solution Automatic discovery of specifications

4 Our Playground Purpose: verification, testing, promoting understanding. Specification representation: contracts, properties, automata, … Inference technique: static, dynamic, or a combination. Degree of human intervention.

5 Restrictions and Assumptions Learning automata from positive traces alone is not possible in general [Gold67]. An executing program is usually "almost" correct. If a miner can identify the common behavior, it can produce a correct specification even from programs that contain errors.

6 Perracotta: Mining Temporal API Rules From Imperfect Traces Jinlin Yang, David Evans Department of Computer Science, University of Virginia Deepali Bhardwaj, Thirumalesh Bhat, Manuvir Das Center for Software Excellence, Microsoft Corp. ICSE ‘06

7 Key Contributions Addressing the problem of imperfect traces; techniques for incorporating contextual information into the inference algorithm; heuristics for automatically identifying interesting properties.

8 Perracotta A dynamic analysis tool for automatically inferring temporal properties. Takes the program's execution traces as input and outputs the set of temporal properties the program likely has. Pipeline: Program → (instrumentation) → Instrumented Program → (testing, driven by a Test Suite) → Execution Traces → (inference, against the Property Templates) → Inferred Properties.

9 Property Templates

Name         QRE          Valid Example   Invalid Examples
Response     S*(PP*SS*)*  SPPSS           SPPSSP
Alternating  (PS)*        PSPS            PSS, PPS, SPS
MultiEffect  (PSS*)*      PSS             PPS, SPS
MultiCause   (PP*S)*      PPS             PSS, SPS
EffectFirst  S*(PS)*      SPS             PSS, PPS
CauseFirst   (PP*SS*)*    PPSS            SPSS, SPPS
OneCause     S*(PSS*)*    SPSS            PPSS, SPPS
OneEffect    S*(PP*S)*    SPPS            PPSS, SPSS
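The QRE column of each template can be read as an ordinary regular expression over the alphabet {P, S} (P = cause event, S = effect event). A minimal sketch of checking a trace against every template, using Python's `re` module (the dictionary below simply transcribes the table above):

```python
import re

# QRE templates over the alphabet {P, S}, transcribed from the table above.
TEMPLATES = {
    "Response":    r"S*(PP*SS*)*",
    "Alternating": r"(PS)*",
    "MultiEffect": r"(PSS*)*",
    "MultiCause":  r"(PP*S)*",
    "EffectFirst": r"S*(PS)*",
    "CauseFirst":  r"(PP*SS*)*",
    "OneCause":    r"S*(PSS*)*",
    "OneEffect":   r"S*(PP*S)*",
}

def matching_templates(trace):
    """Return the names of all templates whose QRE accepts the whole trace."""
    return [name for name, qre in TEMPLATES.items()
            if re.fullmatch(qre, trace)]
```

For example, `matching_templates("PSPS")` includes "Alternating", while `matching_templates("PSS")` does not, matching the valid/invalid columns of the table.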

10 Initial Approach An algorithm for inferring two-event properties (for scalability). Complexity: O(nL) time and O(n^2) space, where n is the number of distinct events and L is the length of the trace. Each cell in an n×n matrix holds the current state of a state machine that tracks the alternating pattern between that pair of events. Requires perfect traces.
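A hedged sketch of the two-event idea: one small state machine per ordered event pair (a, b), checking whether the trace restricted to {a, b} matches the Alternating template (ab)*. The state encoding (0 = expecting a, 1 = expecting b, -1 = violated) is illustrative, not the paper's exact automaton; note that each incoming event only touches pairs containing it, which is what keeps the time linear in L.

```python
def alternating_pairs(trace):
    """Return the ordered event pairs (a, b) whose restriction of the
    trace satisfies the Alternating template (ab)*."""
    events = sorted(set(trace))
    # one tiny state machine per ordered pair
    state = {(a, b): 0 for a in events for b in events if a != b}

    def step(pair, x):
        a, b = pair
        s = state[pair]
        if s == -1:
            return
        if s == 0:                      # expecting the cause a
            state[pair] = 1 if x == a else -1
        else:                           # expecting the effect b
            state[pair] = 0 if x == b else -1

    for x in trace:
        # only pairs containing x can change state
        for y in events:
            if y != x:
                step((x, y), x)
                step((y, x), x)
    # final state 0 means every cause was matched by its effect
    return sorted(p for p, s in state.items() if s == 0)
```

For a trace "ABAB" this yields the single pair ("A", "B"); "AAB" yields no pair, since the second A violates the alternation.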

11 Approximate Inference Partition the trace into subtraces, for example: PSPSPSPSPSPPPP → PS|PS|PS|PS|PS|PPPP. Compute the satisfaction rate of each template: the ratio between the partitions satisfying the Alternating property and the total number of partitions. Accept a property when its rate exceeds a satisfaction threshold.
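The partitioning and satisfaction rate can be sketched as follows. The partitioning rule used here (each subtrace is a maximal run of causes P followed by its effects S) and the threshold value in the usage note are illustrative assumptions:

```python
import re

def alternating_satisfaction(trace):
    """Fraction of partitions of a P/S trace that satisfy Alternating."""
    # e.g. PSPSPSPSPSPPPP -> ["PS", "PS", "PS", "PS", "PS", "PPPP"]
    parts = re.findall(r"P+S*", trace)
    ok = sum(1 for p in parts if p == "PS")
    return ok / len(parts) if parts else 0.0
```

On the slide's example, 5 of 6 partitions satisfy the property, a rate of about 0.83, which would pass a satisfaction threshold of, say, 0.8.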

12 Contextual Properties Consider the trace lock1.acq lock2.acq lock2.rel lock1.rel. Context-neutral inference yields lock1.acq→lock2.acq, lock1.acq→lock2.rel, lock1.acq→lock1.rel, lock2.acq→lock2.rel, lock2.acq→lock1.rel, lock2.rel→lock1.rel; context-sensitive inference yields no property. Slicing the trace by context (lock1: acq rel; lock2: acq rel) recovers the per-lock properties such as acq→rel.

13 Selecting Interesting Properties Reachability: mark a property P→S as probably uninteresting if S is reachable from P in the call graph. For example, with A() { … B(); … } the ordering A→B is obvious, whereas with X() { … C(); … D(); … } the relationship between C and D is not obvious from inspecting either C or D.
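The reachability heuristic reduces to graph search over the call graph. A minimal sketch, using the slide's example graph (X calls C and D; A calls B) and plain BFS:

```python
from collections import deque

# Illustrative call graph from the slide's example.
CALLS = {"X": ["C", "D"], "A": ["B"], "B": [], "C": [], "D": []}

def reachable(graph, src, dst):
    """BFS: is dst reachable from src by following call edges?"""
    seen, work = {src}, deque([src])
    while work:
        f = work.popleft()
        if f == dst:
            return True
        for g in graph.get(f, []):
            if g not in seen:
                seen.add(g)
                work.append(g)
    return False

def probably_uninteresting(graph, p, s):
    # P -> S is probably uninteresting when S is reachable from P:
    # the ordering is then obvious from the code itself.
    return reachable(graph, p, s)
```

Here A→B is flagged as probably uninteresting, while C→D survives, since neither C nor D calls the other.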

14 Selecting Interesting Properties Name similarity: a property is more interesting if it involves similarly named events, for example ExAcquireFastMutexUnsafe → ExReleaseFastMutexUnsafe. Compute a word-level similarity score between the two event names.
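The slide's exact similarity formula is cut off in the transcript; as a hypothetical stand-in in the spirit of the heuristic, one can split each name into camel-case words and score 2 × common / total:

```python
import re

def words(name):
    # Split a camel-case identifier into its constituent words.
    return re.findall(r"[A-Z][a-z]*|[a-z]+", name)

def name_similarity(a, b):
    """Hypothetical word-level similarity: 2 * common words / total words."""
    wa, wb = words(a), words(b)
    common = len(set(wa) & set(wb))
    return 2 * common / (len(wa) + len(wb)) if wa or wb else 0.0
```

On the slide's example the two names share Ex, Fast, Mutex, and Unsafe (4 of 5 words each), giving a score of 0.8.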

15 Chaining Connect related Alternating properties into chains: A→B, B→C, and A→C together imply the chain A→B→C. This provides a way to compose complex state machines out of many small state machines, identifying complex multi-event properties without incurring a high computational cost.
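A sketch of the chaining step: a chain is extended with a successor b only when every earlier element of the chain also has an inferred property to b (so A→B→C requires A→B, B→C, and the shortcut A→C). The greedy extension below is illustrative, not the paper's exact algorithm:

```python
def chain_properties(pairs):
    """Greedily connect Alternating pairs into chains."""
    pairs = set(pairs)
    # chains start at events that never appear as a target
    starts = sorted({a for a, _ in pairs} - {b for _, b in pairs})
    chains = []
    for s in starts:
        ch = [s]
        while True:
            # candidate successors must be implied by every chain element
            nxt = sorted(b for (a, b) in pairs
                         if a == ch[-1] and all((p, b) in pairs for p in ch))
            if not nxt:
                break
            ch.append(nxt[0])
        chains.append(ch)
    return chains
```

Given the three properties from the slide, this produces the single chain A→B→C.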

16 SMArTIC: Towards Building an Accurate, Robust and Scalable Specification Miner David Lo and Siau-Cheng Khoo Department of Computer Science, National University of Singapore FSE ‘06

17 Hypotheses Mined specifications will be more accurate when: erroneous behavior is removed before learning; and they are obtained by merging the specifications learned from clusters of related traces, rather than learned from the entire set of traces at once.

18 Structure Pipeline: Traces → Filtering → Filtered Traces → Clustering → Clusters of Filtered Traces → Learning → Automatons → Merging → Merged Automaton.

19 Filtering How can you tell what’s wrong if you don’t know what’s right? Filter out erroneous traces based on common behavior Common behavior is represented by “statistically significant” temporal rules

20 Pre → Post Rules Look for rules of the form a → bc: when a occurs, b must eventually occur after a, and c must eventually occur after b. Rules exhibiting high confidence and reasonable support can be treated as "statistical" invariants. Support: the number of traces exhibiting the property pre→post. Confidence: the ratio of traces exhibiting pre→post to those exhibiting pre.
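Scoring one rule a → bc over a set of traces can be sketched directly from the definitions above. For simplicity this sketch checks only the first occurrence of the pre-event in each trace, which is an assumption, not the paper's exact semantics:

```python
def rule_stats(traces, a, b, c):
    """Support and confidence of the rule a -> bc over a list of traces."""
    pre = post = 0
    for t in traces:
        if a in t:
            pre += 1
            i = t.index(a)                      # first occurrence of a
            # find a later b, then a c after that b
            j = next((k for k in range(i + 1, len(t)) if t[k] == b), None)
            if j is not None and c in t[j + 1:]:
                post += 1
    support = post
    confidence = post / pre if pre else 0.0
    return support, confidence
```

With traces ["a","x","b","c"], ["a","b"], ["x","b","c"], the rule a → bc has support 1 and confidence 0.5: two traces contain a, and only the first also exhibits b then c.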

21 Clustering Convert a set of traces into groups of related traces. Benefits: localizing inaccuracies; scalability.

22 Clustering Algorithm A variant of the k-medoid algorithm: compute the distance between each pair of data items (traces) from a similarity metric; k is the number of clusters to create. Algorithm: repeatedly increase k until clustering quality reaches a local maximum.
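The outer loop alone can be sketched as follows; `kmedoids` and `cluster_quality` are hypothetical stand-ins for the real clustering routine and quality score, which the slide does not specify:

```python
def best_k(traces, kmedoids, cluster_quality, k_max):
    """Grow k until the (hypothetical) quality score stops improving."""
    best, best_score = None, float("-inf")
    for k in range(1, k_max + 1):
        clusters = kmedoids(traces, k)
        score = cluster_quality(clusters)
        if score <= best_score:
            break                      # local maximum reached
        best, best_score = clusters, score
    return best
```

For instance, with a toy quality function peaking at k = 2, the loop stops at k = 3 and returns the k = 2 clustering.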

23 Similarity Metric Use a global sequence alignment algorithm, e.g. aligning FTFTALILLAVAV against F--TAL-LLA-AV. Problem: alignment does not work well in the presence of loops, e.g. comparing ABCBCDABCBCBCD with ABCD. Solution: compare the regular expression representations instead; ABCBCDABCBCBCD reduces to (A(BC)+D)+, which is structurally close to ABCD.
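Global sequence alignment is the classic Needleman-Wunsch dynamic program. A hedged sketch computing just the alignment score (the scoring values match/mismatch/gap are illustrative, not the paper's):

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of two sequences."""
    m, n = len(s), len(t)
    # dp[i][j] = best score aligning s[:i] with t[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * gap
    for j in range(1, n + 1):
        dp[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # (mis)match
                           dp[i - 1][j] + gap,       # gap in t
                           dp[i][j - 1] + gap)       # gap in s
    return dp[m][n]
```

Identical sequences score highest; loop-unrolled traces like ABCBCDABCBCBCD vs. ABCD score poorly despite equivalent structure, which is exactly the weakness the regular-expression comparison addresses.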

24 Learning Learn PFSAs from the clusters of filtered traces, one PFSA per cluster. The learner is a pluggable "place holder"; the current experiments use the sk-strings learner.

25 Merging Merge multiple PFSAs into one. The merged PFSA accepts exactly the union of the sentences accepted by the input PFSAs, and ensures probability integrity: each transition in the output PFSA is assigned a probability consistent with the input PFSAs.

26 From Uncertainty to Belief: Inferring the Specifications Within Ted Kremenek, Paul Twohey, Andrew Y. Ng, Dawson Engler Computer Science Dep., Stanford University Godmar Back Computer Science Dep., Virginia Tech OSDI ‘06

27 Motivating Example Problem: inferring ownership roles. Ownership idiom: a resource has, at any time, exactly one owning pointer. Infer the annotations ro (returns ownership) and co (claims ownership). Is fopen an ro? Are fread/fclose co? FILE* fp = fopen("myfile.txt", "r"); fread(buffer, n, 1000, fp); fclose(fp);

28 Basic Ownership Rules Example traces: fp = ro(); ¬co(fp); co(fp) — OK. fp = ¬ro(); ¬co(fp); ¬co(fp) — OK. (The slide shows a checker DFA with states Uninit, Owned, ¬Owned, Claimed/OK, and Bug, with transitions on ro/¬ro, co/¬co, any use, and end-of-path.)

29 Goal Provide a framework that: 1. Allows users to easily express every intuition and domain-specific observation they have that is useful for inferring annotations. 2. Reduces such knowledge in a sound way to meaningful probabilities (a "common currency").

30 Annotation Inference 1. Define the set of possible annotations to infer. 2. Model domain-specific knowledge and intuitions in the probabilistic model. 3. Compute annotation probabilities.

31 Factors – Modeling Beliefs Factors are relations mapping the possible values of one or more annotation variables to non-negative real numbers. For example, CheckFactor encodes the belief that any random place might have a bug 10% of the time: f(DFA) = θ_ok if DFA = OK, θ_bug if DFA = Bug, with θ_ok = 0.9 and θ_bug = 0.1.
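A toy sketch of how factors combine into probabilities: score each candidate assignment by the product of its factor values, then normalize. The variables and numbers are illustrative, not the paper's model:

```python
THETA_OK, THETA_BUG = 0.9, 0.1

def check_factor(dfa_result):
    # CheckFactor belief: a random site is buggy ~10% of the time.
    return THETA_OK if dfa_result == "OK" else THETA_BUG

def posterior(assignments, factors):
    """Score each assignment by the product of its factor values,
    then normalize the scores into probabilities."""
    scores = []
    for a in assignments:
        p = 1.0
        for f in factors:
            p *= f(a)
        scores.append(p)
    z = sum(scores)
    return [s / z for s in scores]
```

With the single CheckFactor and the two DFA outcomes, the posterior over ["OK", "Bug"] is simply [0.9, 0.1]; with several factors multiplied in, the normalization is what turns heterogeneous beliefs into the "common currency" of probabilities.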

32 Factors Other factors: a bias toward specifications with ro but without co; factors based on naming conventions; …

33 Annotation Factor Graph The factor graph connects annotation variables (fopen:ret, fread:4, fclose:1, fdopen:ret, fwrite:4) to factors f encoding prior beliefs and behavioral tests.

34 Results Each row assigns annotations to (fopen:ret, fread:4, fclose:1); the slide also lists the DFA outcome, the factor values f and f(fread:4), and the resulting P(A) for each assignment (the numeric entries are not reproduced here):

fopen:ret  fread:4  fclose:1
ro         ¬co      co
¬ro        ¬co      co
ro         ¬co      co
ro         co       ¬co
ro         co       ¬co
¬ro        ¬co      co
¬ro        co       ¬co
¬ro        co       ¬co

35 QUARK: Empirical Assessment of Automaton-based Specification Miners David Lo and Siau-Cheng Khoo Department of Computer Science, National University of Singapore WCRE ‘06

36 QUARK Framework Assesses the quality of specification miners, measuring performance along multiple dimensions: Accuracy – the extent to which the inferred specification represents the actual specification; Scalability – the ability to infer large specifications; Robustness – sensitivity to errors.

37 Quality Assessment In the QUARK framework, a Model (PFSA) drives a Trace Generator; the generated traces are fed to the User-Defined Miner, and a Simulator compares the mined model against the original to produce the Measurements.

38 Accuracy (Trace Similarity) Measured in the absence of errors. Metrics: Recall – how much of the actual specification's behavior the mined model reproduces; Precision – how much of the mined model's behavior is actually correct; Co-emission – similarity of the probability distributions.

39 Robustness Sensitivity to errors: "inject" error nodes and error transitions into the PFSA model. (The slide shows a four-state PFSA with start and end states and transitions labeled A through H, plus an injected error node reached by error transitions labeled Z.)

40 Scalability Use synthetic models: build a tree from a pre-determined number of nodes; add loops based on "locality of reference"; assign equal probabilities to transitions from the same node. Vary the size of the model (number of nodes and transitions).