Automatic Program Correction Anton Akhi Friday, July 08, 2011.

Plan
Introduction
Survey of existing tools

Why do we need automatic program correction?
Even after a bug has been found, it is still the programmer's job to think of a fix
Automatic correction can fix the bug automatically or provide high-quality fix suggestions

Isn't it magic?
Tools require failing and passing runs
An oracle is needed to check whether a test passes
The buggy program is assumed to be not far from the correct one

Existing tools
Genetic programming:
– Automatic Program Repair with Evolutionary Computation
– A Novel Co-evolutionary Approach to Automatic Software Bug Fixing
Machine learning:
– BugFix
Contract usage:
– Automated Debugging using Path-Based Weakest Preconditions
– AutoFix-E
– AutoFix-E2

Automatic Program Repair with Evolutionary Computation
W. Weimer, S. Forrest, C. Le Goues, T. Nguyen
Usage of GP to fix bugs:
– Individuals
– Genetic operators
– Fitness function

Genetic programming: individuals
An individual is represented as an abstract syntax tree and a weighted path
The weighted path is a list of the statements visited on the negative test case, with a weight for every statement:
– 1 if not visited by positive tests
– 0.1 if visited by positive tests
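
The weighting rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the use of integer statement ids, and the choice to list each statement once are assumptions made here, not details taken from the tool.

```python
def weighted_path(negative_trace, positive_traces):
    """Return [(statement_id, weight)] for statements hit by the failing test.

    Assumption: traces are lists of statement ids, and each statement is
    listed once, in order of first visit.
    """
    visited_by_positive = set()
    for trace in positive_traces:
        visited_by_positive.update(trace)

    path = []
    seen = set()
    for stmt in negative_trace:
        if stmt in seen:
            continue
        seen.add(stmt)
        # 0.1 if some passing test also visits the statement, 1 otherwise
        weight = 0.1 if stmt in visited_by_positive else 1.0
        path.append((stmt, weight))
    return path


example_path = weighted_path([1, 2, 3, 2], [[1, 4]])
```

Statements that only the failing run reaches get the full weight, so they dominate later mutation choices.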

Genetic programming: operators
Mutation:
– a statement on the weighted path is considered for mutation with probability proportional to its weight
– deletion, insertion, swap
Crossover:
– exchange of subtrees chosen at random between two individuals
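
A minimal sketch of the mutation step, assuming the weighted path is a list of (statement, weight) pairs. The `mutation_rate` parameter, the function name, and the uniform choice among the three operators are hypothetical; the slide only states that the selection probability is proportional to the weight.

```python
import random

OPERATORS = ("delete", "insert", "swap")


def pick_mutations(path, mutation_rate, rng):
    """Choose statements to mutate with probability mutation_rate * weight."""
    picks = []
    for stmt, weight in path:
        if rng.random() < mutation_rate * weight:
            # Assumption: the operator is drawn uniformly at random.
            picks.append((stmt, rng.choice(OPERATORS)))
    return picks


rng = random.Random(0)
picks = pick_mutations([("s1", 1.0), ("s2", 0.0)], 1.0, rng)
```

With a weight of 0 a statement is never selected, and with `mutation_rate * weight` equal to 1 it always is, because `rng.random()` returns values in [0, 1).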

Genetic programming: fitness function
Weighted sum of the test cases passed
The weight of a negative test is at least as large as that of a positive test
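
As a sketch, the weighted sum could look as follows. The default weights 1 and 10 are placeholders chosen for this example; the slide only requires the negative-test weight to be no smaller than the positive-test weight.

```python
def fitness(passed_positive, passed_negative, w_pos=1.0, w_neg=10.0):
    """Weighted count of passed tests; higher is better.

    w_pos and w_neg are hypothetical values for illustration.
    """
    if w_neg < w_pos:
        raise ValueError("negative tests must weigh at least as much as positive ones")
    return w_pos * passed_positive + w_neg * passed_negative


score = fitness(passed_positive=3, passed_negative=1)
```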

Minimizing changes
Trim unnecessary edits
A one-minimal subset of changes is a subset from which no single change can be removed without the program ceasing to pass all the tests
Use delta debugging to compute a one-minimal subset of changes
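
The definition of one-minimality can be illustrated with a simple greedy loop. Note that this is not the full delta debugging algorithm (which removes larger chunks first to save test runs); it is a simplified stand-in that reaches a one-minimal result. The `passes_all_tests` callback is a hypothetical oracle, assumed to return True for the full set of changes.

```python
def one_minimal(changes, passes_all_tests):
    """Drop changes one at a time until no single change can be removed."""
    current = list(changes)
    shrunk = True
    while shrunk:
        shrunk = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if passes_all_tests(candidate):
                current = candidate
                shrunk = True
                break  # restart the scan on the smaller set
    return current


# Toy oracle: the repair works exactly when changes "b" and "d" are both kept.
needed = {"b", "d"}
minimal = one_minimal(["a", "b", "c", "d", "e"], lambda cs: needed <= set(cs))
```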

Automatic Program Repair with Evolutionary Computation: Conclusions
Needs:
– negative and positive test cases
– oracle
– fault localization
Uses:
– genetic programming
– delta debugging
The method has attracted a lot of criticism

A Novel Co-evolutionary Approach to Automatic Software Bug Fixing
A. Arcuri and X. Yao
Genetic Programming
Distance Functions
Search Based Software Testing
Co-evolution

Genetic Programming
A genetic program consists of primitives
The program is represented in tree form
The fitness function to be minimized is built from three quantities:
– the number of nodes in the program
– the number of raised exceptions
– a special distance function

Distance Functions Works fine with numbers and boolean expressions Predicates involving and can be handled only in cases small 14
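
To make the idea of a distance function on numeric and boolean predicates concrete, here is one common convention from search-based testing (branch distance). It is not necessarily the function used in the paper; the constant `K` and the exact per-operator formulas are assumptions of this sketch.

```python
K = 1.0  # hypothetical penalty added when a predicate is not satisfied


def distance(op, a, b):
    """How far the numeric predicate `a op b` is from being true (0 = true)."""
    if op == "==":
        return float(abs(a - b))
    if op == "!=":
        return 0.0 if a != b else K
    if op == "<":
        return 0.0 if a < b else (a - b) + K
    if op == "<=":
        return 0.0 if a <= b else float(a - b)
    raise ValueError(f"unsupported operator: {op}")


def bool_distance(value):
    """Distance for a plain boolean expression."""
    return 0.0 if value else K


d_equal = distance("==", 3, 5)
d_less = distance("<", 5, 5)
```

A graded distance of this kind gives the search a direction: a program whose output is off by 2 is rated closer to correct than one that is off by 200.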

Search Based Software Testing Find tests that make evolutionary programs fail Fitness function for test case to be maximized: 15

Co-evolution
Competitive co-evolution
First generation: copies of the buggy program and unit tests
Mutations, crossover, replacement by the original program
Penalty for short programs

A Novel Co-evolutionary Approach to Automatic Software Bug Fixing: Conclusions
Needs:
– starting set of unit tests
– oracle
Uses:
– genetic programming
– co-evolution
– search based software testing
Some bugs are too difficult to solve
A bug that is difficult for a human to fix might be very easy for the program

BugFix: A Learning-Based Tool to Assist Developers in Fixing Bugs
D. Jeffrey, M. Feng, N. Gupta, R. Gupta
Association rule learning
Interesting value mapping pairs (IVMP)
Situation descriptors
Knowledgebase of rules

Association rule learning
Association rule learning is a popular method for discovering relationships between variables in a database
An association rule X ⇒ Y, where X and Y are sets of attributes, means that if the items in set X are present then it is probable that the items in set Y are also present
The confidence of the rule is conf(X ⇒ Y) = supp(X ∪ Y) / supp(X), where supp(X) is the fraction of transactions containing X
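
Support and confidence follow the standard definitions and can be computed directly. The sketch below assumes transactions are given as Python sets of item names; the function names are chosen here for illustration.

```python
def supp(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Confidence of the rule X => Y: supp(X u Y) / supp(X)."""
    support_x = supp(x, transactions)
    if support_x == 0:
        raise ValueError("rule antecedent never occurs")
    return supp(set(x) | set(y), transactions) / support_x


transactions = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"c"}]
conf_a_b = confidence({"a"}, {"b"}, transactions)
```

Here "a" occurs in three of four transactions and "a" together with "b" in two, so the rule a ⇒ b has confidence 2/3.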

Interesting Value Mapping Pairs
An Interesting Value Mapping Pair (IVMP) is a pair of value mappings (original, alternate) associated with a particular statement instance in a failing run, such that:
(1) original is the original value mapping used by the failing run at that instance; and
(2) alternate is an alternate (different) value mapping such that if the values in original are replaced by the values in alternate at that instance during re-execution of the failing run, then the incorrect output of the failing run becomes correct
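
The definition can be exercised on a toy program. In the sketch below, `run` re-executes a failing run and optionally overrides the value produced at one statement instance; the buggy averaging program, the candidate values, and all names are made up for illustration and do not reflect how the tool instruments real programs.

```python
def find_ivmps(run, original, candidates, expected):
    """Return (original, alternate) pairs whose alternate fixes the output."""
    ivmps = []
    for alternate in candidates:
        if alternate != original and run(alternate) == expected:
            ivmps.append((original, alternate))
    return ivmps


def run(override=None):
    """Failing run of a toy 'average' program with an off-by-one bug."""
    xs = [2, 4, 6]
    total = sum(xs)
    # Buggy statement instance: the count should be len(xs).
    count = len(xs) - 1 if override is None else override["count"]
    return total / count


original = {"count": 2}
candidates = [{"count": c} for c in range(1, 5)]
ivmps = find_ivmps(run, original, candidates, expected=4.0)
```

Only the alternate mapping `count = 3` turns the wrong output 6.0 into the expected 4.0, so it forms the single IVMP at this statement instance.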

Situation descriptors
Statement structure situation descriptors
IVMP pattern situation descriptors
Value pattern situation descriptors

Statement structure situation descriptors
Unordered tokens comprising the statement

Statement structure situation descriptors: examples

IVMP pattern situation descriptors
A pattern is considered to occur when corresponding values in the IVMPs compare to each other in the same way across all IVMPs at a statement
Values are compared in terms of less than, greater than, or equal to
Look at the pairs of values:
– within original sets of values
– within alternate sets of values
– between corresponding values in sets
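
One way to read the three kinds of comparison is sketched below. It assumes each IVMP is a pair of equal-length tuples of numbers; the key layout (`"orig"`, `"alt"`, `"between"`) and function names are invented for this example.

```python
from itertools import combinations


def rel(a, b):
    """Classify a pair of values as '<', '>' or '='."""
    return "<" if a < b else ">" if a > b else "="


def signature(original, alternate):
    """All pairwise relations for one IVMP."""
    sig = {}
    for i, j in combinations(range(len(original)), 2):
        sig[("orig", i, j)] = rel(original[i], original[j])
        sig[("alt", i, j)] = rel(alternate[i], alternate[j])
    for i in range(len(original)):
        sig[("between", i, i)] = rel(original[i], alternate[i])
    return sig


def ivmp_patterns(ivmps):
    """Keep only the relations that are identical across every IVMP."""
    sigs = [signature(o, a) for o, a in ivmps]
    if not sigs:
        return {}
    return {k: v for k, v in sigs[0].items() if all(s[k] == v for s in sigs)}


patterns = ivmp_patterns([((1, 5), (2, 5)), ((3, 7), (4, 7))])
```

In this example every IVMP raises the first value and leaves the second unchanged, so both "between" relations survive as patterns.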

IVMP pattern situation descriptors: examples

Value pattern situation descriptors
Similar to IVMP patterns

Knowledgebase of rules
Database of bug-fix scenarios
Initially created through training data of known debugging situations

How it works
Identify rules to consider
– rules in which the debugging situation is a subset of the current debugging situation
Sort rules by confidence values
Report prioritized bug-fix descriptions
Learn from the current debugging situation and the corresponding bug fix
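
The lookup steps above can be sketched as a filter followed by a sort. The rule layout (a dict with `situation`, `fix`, and `confidence` keys) and the descriptor strings are hypothetical, chosen only to make the example self-contained.

```python
def suggest_fixes(rules, current_situation):
    """Return fix descriptions of applicable rules, most confident first."""
    current = set(current_situation)
    # A rule applies when its situation is a subset of the current one.
    applicable = [r for r in rules if set(r["situation"]) <= current]
    applicable.sort(key=lambda r: r["confidence"], reverse=True)
    return [r["fix"] for r in applicable]


rules = [
    {"situation": {"token:<", "ivmp:orig<alt"}, "fix": "change < to <=", "confidence": 0.6},
    {"situation": {"token:<"}, "fix": "swap operands", "confidence": 0.9},
    {"situation": {"token:+"}, "fix": "change + to -", "confidence": 0.8},
]
suggestions = suggest_fixes(rules, {"token:<", "token:i", "ivmp:orig<alt"})
```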

BugFix: A Learning-Based Tool to Assist Developers in Fixing Bugs: Conclusions
BugFix assists in fixing bugs by producing a list of bug-fix suggestions
The tool learns from new situations
It does not cope well with new and logically difficult bugs

Automated Debugging using Path-Based Weakest Preconditions
H. He, N. Gupta
Representation of an error trace
Path-based weakest precondition
Hypothesized program state
Actual program state
Detection of evidence
Location and modification of likely erroneous statement

Representation of an error trace is an instance of an executed statement in an error trace; i is an execution point and j is the line number of statement Branch predicates: – atomic predicate: (expr relop const) – compound predicates are in disjunctive normal form: where and is an atomic predicate Precondition and postcondition for failing run are transformed in DNF: – replace and by disjunction and conjunction 31

Path-based weakest precondition
pwp(T, R) is the set of all states such that an execution of function F that follows execution trace T, begun in any of them, is guaranteed to terminate in a state satisfying R
For an assignment x := a, the precondition is obtained by substituting every occurrence of x in R with a
A separate rule covers the case where the statement is a branch predicate B
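
The backward computation along a trace can be sketched semantically, with predicates as Python functions over a state dictionary. The substitution rule for assignments follows the slide; treating a branch predicate B as the conjunction "B and R" is the usual weakest-precondition treatment of an assumed condition and is an assumption here, since the slide's formula did not survive. The trace encoding is invented for this example.

```python
def pwp(trace, post):
    """Build the path-based precondition of `post` along `trace`.

    trace: list of ("assign", var, expr_fn) or ("branch", pred_fn) steps,
    where expr_fn and pred_fn take a state dict. Returns a predicate on
    the initial state.
    """
    cond = post
    for step in reversed(trace):
        if step[0] == "assign":
            _, var, expr = step
            # R with x replaced by a: evaluate R in the updated state.
            cond = lambda s, var=var, expr=expr, nxt=cond: nxt({**s, var: expr(s)})
        elif step[0] == "branch":
            _, pred = step
            # Assumed rule: the branch predicate must hold, and so must R.
            cond = lambda s, pred=pred, nxt=cond: pred(s) and nxt(s)
        else:
            raise ValueError(f"unknown step kind: {step[0]}")
    return cond


trace = [
    ("assign", "x", lambda s: s["x"] + 1),  # x := x + 1
    ("branch", lambda s: s["x"] > 0),       # branch taken when x > 0
]
pre = pwp(trace, post=lambda s: s["x"] == 5)
```

For this trace the resulting precondition is equivalent to x = 4: `pre({"x": 4})` holds and any other starting value of x fails.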

Hypothesized program state Let where is a trace from point i to the end of trace and R is postcondition defines the set of hypothesized program states at an execution point i and 33

Actual program state
Represented by predicates in DNF which are true for the given input
Consists of forward program states and backward program states

Forward and backward program states Forward program states are defined as: – positive conjunctions in precondition – – and are sets of predicates killed by and derived from statement Backward program states are defined as: – if is an assignment statement – if is a branch predicate – 35

Detection of evidence A is less restrictive than B if is false Evidence at point i is situation when is less restrictive than Two types of evidence: – Explicit Type I Type II – Implicit 36

Types of evidence Explicit – Type I: if at point i in appears negative r in form (0 relop const) then r forms an explicit evidence of Type I – Type II: let an atomic predicate in and a negative atomic predicate in If then q and r form an explicit evidence of Type II Implicit: negative predicate r in that is not present in an explicit evidence 37

Location and modification of likely erroneous statement Use transitivity and equality to deduce new predicates. New states are and Explicit Type I – If = then match r to 0=0. Consider every assignment statement between i and bottom of the trace as a possible candidate for modification – Let from be a corresponding predicate to r, and and – Goal is to make, so – If then problem is solved 38

Location and modification of likely erroneous statement: Explicit Type I Consider e and c as a set of strings of characters Let and be difference between e and c and between c and e correspondingly If appears in rhs than replace it with If appears in lhs than replace it with only if is a single variable If none of the above works than select r as e and try every q from as c 39

Location and modification of likely erroneous statement: Explicit Type II Consider Either q or r could be in error Change the form of r to q at i – Same manner as above Change the form of q to r at i – Change original branch predicate from which q may be derived or some assignment statement – Change of an assignment statement does not change relop in q 40

Location and modification of likely erroneous statement: Implicit If there is a loop in the trace, which contributes some constraints on, and missed constraints have similarity with constraints added by the loop then try to derive the possible missing iterations in the loop Try to match negative r from to some q from 41

Automated Debugging using Path-Based Weakest Preconditions: Conclusions
Uses contracts
Can handle only one error
Cannot handle loops well

Automated Fixing of Programs with Contracts
Yi Wei, Yu Pei, C.A. Furia, L.S. Silva, S. Buchholz, B. Meyer, A. Zeller
Assessing Object State
Fault Analysis
Behavioral Models
Generating Candidate Fixes
Linearly Constrained Assertions
Validating and Ranking Fixes

Assessing Object State
Argument-less Boolean queries
– Absolutely describe state
– Seldom have preconditions
– Widely used in Eiffel contracts
Complex predicates
– Boolean queries are often correlated
– Implication expresses correlation
– Mine the contracts
– Mutate implications

Fault Analysis
Find state invariants: passing and failing states
– sets of predicates that hold during all passing and failing runs respectively
The fault profile contains all predicates that hold in the passing runs but not in the failing runs
Find the strongest predicate that implies the negation of the violated assertion
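
Reading the state invariants as set intersections, the fault profile becomes a set difference. This is a minimal sketch under that reading; the representation of a run as a set of predicate strings and the predicate names are assumptions.

```python
def invariant(runs):
    """Predicates that hold in every run of the given group."""
    sets = [set(r) for r in runs]
    return set.intersection(*sets) if sets else set()


def fault_profile(passing_runs, failing_runs):
    """Predicates in the passing invariant that are not in the failing one."""
    return invariant(passing_runs) - invariant(failing_runs)


passing = [{"not is_empty", "before"}, {"not is_empty"}]
failing = [{"is_empty"}]
profile = fault_profile(passing, failing)
```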

Behavioral models
Finite state automaton
– States are the predicates that hold
– Transitions are routines
The automaton is built from test runs
Determine a sequence of routine calls that changes the object state appropriately
A snippet for a set of predicates is any sequence of routines that drives the object from a state where none of the predicates hold to one where all of them hold
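
Finding a snippet in such an automaton is a graph search. The sketch below uses breadth-first search, which returns a shortest sequence; preferring the shortest one is a choice made here, since the slide accepts any sequence. The automaton encoding and the Eiffel-like routine names are hypothetical.

```python
from collections import deque


def find_snippet(transitions, start, wanted):
    """Return a list of routine calls leading to a state where `wanted` holds.

    transitions: {state: {routine_name: next_state}}, with each state a
    frozenset of the predicates that hold in it. Returns None if no such
    state is reachable.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, calls = queue.popleft()
        if wanted <= state:
            return calls
        for routine, nxt in transitions.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, calls + [routine]))
    return None


s0 = frozenset({"is_empty"})
s1 = frozenset({"has_items", "off"})
s2 = frozenset({"has_items", "on_item"})
transitions = {
    s0: {"extend": s1},
    s1: {"start": s2, "wipe_out": s0},
    s2: {"forth": s1},
}
snippet = find_snippet(transitions, s0, {"has_items", "on_item"})
```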

Generating Candidate Fixes Fix Schemas snippet is a sequence of routine calls old_stmt – some statements in the original program fail: – 47

Linearly Constrained Assertions
Determine what is variable and what is constant
Assign weights:
– Arguments in the precondition receive lower weights
– In an assertion, weight is inversely proportional to the number of occurrences
– Identifiers that the routine can assign receive less weight

Linearly Constrained Assertions: Generating fixes
Select a value for the variable that satisfies the constraint
– Look for extremal values
Plug the value into a fix schema
– if not constraint then new_stmt else old_stmt end

Validating and Ranking Fixes
A candidate is valid if it passes all the tests
Two metrics for ranking:
– Dynamic: estimates the difference in runtime behavior between the fix and the original, based on state distance
– Static, based on:
  OS: 0 for schemas (a) and (b), and the number of statements in old_stmt for (c) and (d)
  SN: the number of statements in snippet
  BF: the number of branches to reach old_stmt from the point of injection of the instantiated fix schema

Automated Fixing of Programs with Contracts: Conclusions
Uses contracts and passing and failing tests to deduce and suggest bug fixes
Successfully proposed fixes for 16 out of 42 bugs found in the EiffelBase library

Evidence-Based Automated Program Fixing
Yu Pei, Yi Wei, C.A. Furia, M. Nordio, B. Meyer
Predicates, Expressions, and States
Static Analysis
Dynamic Analysis
Fixing Actions
Fix Candidate Generation
Validation of Candidates

Predicates, Expressions is a set of all non-constant expressions in routine r is a set of boolean predicates: – Boolean expressions: every – Voidness checks: for every – Integer comparisons: for every and every and in – Complements for every 53

States
State components
– a triple ⟨l, p, v⟩, where v is the value of predicate p at location l for some test case t which reaches l
– comp(T) denotes all the triples defined by the tests in the set T

Static Analysis
sub(e) is the set of all sub-expressions of e
Expression proximity
Expression dependence between a predicate and a contract clause c
Control distance is the length of the shortest directed path between two program locations on the control-flow graph
Control dependence
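
Control distance as defined above is a shortest-path query on a directed graph, which breadth-first search answers directly. The adjacency-list encoding of the control-flow graph and the integer location labels are assumptions of this sketch.

```python
from collections import deque


def control_distance(cfg, src, dst):
    """Length of the shortest directed path from src to dst, or None.

    cfg: {location: [successor locations]}.
    """
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for succ in cfg.get(node, ()):
            if succ not in dist:
                dist[succ] = dist[node] + 1
                queue.append(succ)
    return None


# Toy CFG: a branch at 1 whose two arms rejoin at 4.
cfg = {1: [2, 3], 2: [4], 3: [4], 4: [5]}
d = control_distance(cfg, 1, 5)
```

Because the graph is directed, the distance is not symmetric: location 1 cannot be reached from location 5 in this toy graph.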

Dynamic Analysis is a score for tests for i-th failing test and for i-th passing test; The evidence provided by each test case: 56

Combining Static and Dynamic Analysis Combined evidence score: 57

Fixing Actions
A component with a high evidence score induces a number of possible actions:
– Derived Expressions
– Expression Modification
– Expression Replacement

Fix Candidate Generation
Candidates are generated in a way similar to the previous method
A candidate is valid if it passes all the tests

Evidence-Based Automated Program Fixing: Conclusions
Uses contracts and sets of passing and failing tests
Combines static and dynamic approaches

Conclusion
Automated bug fixing is real!
Some bugs are still too difficult to fix or even localize
All approaches need some kind of oracle; sometimes contracts are the oracle

References
W. Weimer, S. Forrest, C. Le Goues, T. Nguyen, Automatic Program Repair with Evolutionary Computation
A. Arcuri and X. Yao, A Novel Co-evolutionary Approach to Automatic Software Bug Fixing
D. Jeffrey, M. Feng, N. Gupta, R. Gupta, BugFix: A Learning-Based Tool to Assist Developers in Fixing Bugs
H. He, N. Gupta, Automated Debugging using Path-Based Weakest Preconditions
Yi Wei, Yu Pei, C.A. Furia, L.S. Silva, S. Buchholz, B. Meyer, A. Zeller, Automated Fixing of Programs with Contracts
Yu Pei, Yi Wei, C.A. Furia, M. Nordio, B. Meyer, Evidence-Based Automated Program Fixing