Code Learning and Transfer for Automatic Patch Generation


Code Learning and Transfer for Automatic Patch Generation Fan Long MIT EECS & CSAIL

Generate And Validate Patching
A standard way to deal with this situation has emerged, called generate-and-validate patching.

Generate Candidate Patches
Suspicious statements: line 340 at foo.c, line 338 at foo.c, line 337 at foo.c, ...
Apply modifications to the suspicious statements to obtain the search space of candidate patches.

Anatomy of a Modification
Here is one example modification that Prophet applies:
Statement in original unpatched program: if (C) {…} else {…}
Statement in patched program: if (C && E) {…} else {…}
E is a clause of the form “exp == c” or “exp != c”, where exp is a variable or field access and c is a constant.
Possibilities for E in control flow modifications:
1) E is built from a local variable, a global variable, or a structure access path. A local variable must occur in the same function. A global variable must be accessed somewhere in the file. For a structure access path, the path before the last member access must occur in the same basic block (for a.b->c, a.b occurs in the same basic block). Synthesized conditions check E == K or E != K for some K observed for E during the negative test case.
2) For tighten and loosen, also try the clauses CL that appear in C, both CL and !CL: split C on && and || and try each alternative A and !A. For machine learning ranking, these go in one bin with 3) below; a feature records that there is an abstract expression.
3) Also try 0 for tighten and insert guard; for loosen try 1.
Goto L: L is in the same function. Return K: K is a constant that appears in the same function, or 0.

Other Modifications
Here are the remaining modifications that Prophet applies to a statement S:
if (C) {…} else {…} becomes if (C || E) {…} else {…}
S becomes if (E) { S }
S becomes if (E) return c; S
Replace: S becomes S[replace v1 with v2]
Copy & Replace: S becomes Q[replace v1 with v2]; S
Initialize: S becomes memset(&e, 0, sizeof(e)); S
This is an empirical set of modification transforms. Each modification transform is a hypothesis about an error the developer may have made when writing the code, and the goal of the transform is to correct that error. Modifications manipulate the program at the granularity of expressions.
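
The tighten and loosen transforms can be enumerated mechanically once the candidate clauses E are known. The following is a minimal sketch of that enumeration, not Prophet's implementation: the function name `condition_candidates` is hypothetical, and conditions are handled as plain strings for brevity, whereas a real system would work on the program's syntax tree.

```python
from typing import List


def condition_candidates(cond: str, atoms: List[str], constants: List[int]) -> List[str]:
    """Enumerate tightened and loosened variants of a branch condition.

    cond:      the original condition C, as source text
    atoms:     variables or field accesses allowed as `exp` in the clause E
    constants: constants c that E may compare against
    """
    candidates = []
    for exp in atoms:
        for c in constants:
            for op in ("==", "!="):
                clause = f"{exp} {op} {c}"                    # E has the form exp == c or exp != c
                candidates.append(f"({cond}) && ({clause})")  # tighten: C && E
                candidates.append(f"({cond}) || ({clause})")  # loosen:  C || E
    return candidates


# Hypothetical inputs: one condition, one variable, one constant.
cands = condition_candidates("str_len", ["str"], [0])
for cand in cands:
    print(cand)
```

Even this tiny input yields four candidates, which hints at why the search space grows quickly once every suspicious statement, variable, and constant is considered.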

Validate Candidate Patches
Generate a search space of candidate patches, then validate each candidate patch against the test suite.
[Figure: for each version of the statement p->f1 = y;, marks show whether the output on each positive and negative input equals (=) or differs from (≠) the expected output.]
Original statement p->f1 = y;: at least one output differs.
Candidate with y replaced by z: an output on a negative input differs.
Candidate with f1 replaced by f2: an output on a positive input differs.
Candidate that inserts if (p != 0) return; before p->f1 = y;: all outputs are equal.
The system collects all of the candidate patches that validate.
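
The validation step can be sketched as a filter over the candidate patches. This is a toy illustration under stated assumptions: programs are modeled as Python functions from an integer to an integer, and the candidate names and test values are invented for the example. It is not the harness used by any of the systems in this talk.

```python
from typing import Callable, Dict, List, Tuple

Program = Callable[[int], int]
TestCase = Tuple[int, int]  # (input, expected output)


def validate(candidate: Program, tests: List[TestCase]) -> bool:
    """A candidate validates if it produces the expected output on every test."""
    for inp, expected in tests:
        try:
            if candidate(inp) != expected:
                return False
        except Exception:
            return False  # a crash counts as a failed test
    return True


def generate_and_validate(candidates: Dict[str, Program],
                          positive: List[TestCase],
                          negative: List[TestCase]) -> List[str]:
    """Return the names of all candidates that pass both positive and negative tests."""
    return [name for name, prog in candidates.items()
            if validate(prog, positive + negative)]


# Hypothetical buggy program: `lambda x: 10 // x`, which crashes on input 0.
candidates = {
    "guard_zero": lambda x: 0 if x == 0 else 10 // x,
    "always_zero": lambda x: 0,
    "shift_operand": lambda x: 10 // (x + 1),
}
positive = [(1, 10), (2, 5), (5, 2)]  # inputs the original program already handles
negative = [(0, 0)]                   # input that exposes the bug

validated = generate_and_validate(candidates, positive, negative)
print(validated)
```

A candidate that passes here is only validated, not necessarily correct: the result is exactly as trustworthy as the test suite.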

Challenges for Automatic Patch Generation
The only specification is the test suite.
Many patches validate but are incorrect.
How do we prioritize correct patches ahead of the incorrect ones?
We want to be able to patch systems with millions of lines of code. A system may generate patches that validate but have negative effects. The question is how to make patch generation systems work well in the presence of these patches.

A Validated but Incorrect Patch
Candidate that inserts return; before p->f1 = y;: the outputs equal the expected outputs on all positive and negative inputs, because the test suite is incomplete!

Negative Effects of Validated but Incorrect Patches
Validated but incorrect patches can have negative effects:
Remove functionality
Generate an incomplete bug fix
Introduce a vulnerability
Example: CVE-2006-2025, tif_dirread.c
Original code:
    if (dir->tdir_offset + cc > tif->tif_size) goto bad;
Developer patch:
    tsize_t offset = dir->tdir_offset + cc;
    if ((tsize_t)dir->tdir_offset != offset - cc
        || offset > (tsize_t)tif->tif_size) goto bad;
[The next build of this slide shows a validated but incorrect patch next to the developer patch.]

Challenges for Automatic Patch Generation
The only specification is the test suite.
Many patches validate but are incorrect.
How do we prioritize correct patches ahead of the incorrect ones?
Search space explosion
Tradeoff between coverage and tractability
Which mutation transforms should we use?

Learning-based Patch Generation
A training set of past human patches feeds code learning systems, which produce learned human knowledge. Generate-and-validate patch generation takes a program with a bug together with the learned human knowledge and produces result patches.

The Growing Number of Existing Programs
GitHub Open Source Projects
It is not surprising that we now have more software programs than ever before. GitHub hosts 12 million projects and this number is growing rapidly. That is just one of the repository hosting sites; if you count all software programs in the world, the number is much bigger. There is an enormous amount of software in the world, with more coming every day. While this software is very valuable and does great things for us, I’m here to tell you that this software is of variable quality. It often contains bugs that cause the software to give the wrong result or crash, and security vulnerabilities are always a prominent problem.
http://spectrum.ieee.org/transportation/systems/this-car-runs-on-code

How Can Learned Human Knowledge Help?
Better patch prioritization: Prophet [POPL’16, Long and Rinard], HDRepair [SANER’16, Le et al.]
Better search spaces: Genesis [MIT-TR-2016-010, Long et al.]

Prophet Patch Generation System
Input: a test suite (inputs and correct outputs).
Output: a ranked list of patches, all of which pass the test suite. The developer chooses the correct patch.
Key issue: for many bugs there can be tens or even hundreds of patches that pass the test suite but are nevertheless incorrect.
Goal: rank the correct patch first.

Prophet: Key Insights
Correct patches share universal features that hold across applications.
These features capture interactions between the patch and the surrounding code.
Hypothesis: it is possible to learn across applications. Many people do not believe this, so we tested this hypothesis in our experiments.

Example Features
Let's go back to our example.
PHP_METHOD(DatePeriod, __construct) {
  char* str = NULL;
  …
  if (parse(&str, &str_len)==FAIL) return;
  if (str_len || str != 0) {
    initialize(str, str_len, …);
    …
  } else { … }
  …
}
Atomic characteristic (patch): a variable (here str) is checked by a condition.
Atomic characteristics (original code), each with the co-occurrence pair it produces:
The variable is also a call parameter at the current statement: <checked, call para/C>
The variable is also address-taken before in the original code: <checked, addr taken/B>
The variable is also a local variable: <checked, local var>
The variable is also a pointer: <checked, pointer>
The patch and the original code interact in a meaningful way. Prophet extracts all of these co-occurrence pairs of atomic characteristics as features. Each pair indicates an interaction between the patch and the original code.

Example Features
Two candidate patches for the same statement in PHP_METHOD(DatePeriod, __construct):
  if (str_len || str != 0) { initialize(str, str_len, …); … } else { … }
  if (str_len || ht == 1) { initialize(str, str_len, …); … } else { … }
Co-occurrence pairs: <checked, call para/C> <checked, addr taken/B> <checked, local var> <checked, pointer> <checked, global var>
How do we formalize this intuitive idea? Extract the features, then apply the model. By learning from the corpus, Prophet identifies positive features and negative features.
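
The co-occurrence pairs can be formed by crossing what the patch does to a variable with what the original code already does with it. A minimal sketch, assuming the atomic characteristics have already been extracted by program analysis and are handed over as plain sets of labels; the function name and the dictionary layout are hypothetical.

```python
from itertools import product
from typing import Dict, Set, Tuple


def cooccurrence_pairs(patch_chars: Dict[str, Set[str]],
                       original_chars: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Pair each patch characteristic of a variable with each of its
    characteristics in the original code."""
    pairs = set()
    for var, in_patch in patch_chars.items():
        in_original = original_chars.get(var, set())
        pairs.update(product(in_patch, in_original))
    return pairs


# Labels follow the slide: "/C" marks the current statement, "/B" code before it.
patch_chars = {"str": {"checked"}}
original_chars = {"str": {"call para/C", "addr taken/B", "local var", "pointer"}}

features = cooccurrence_pairs(patch_chars, original_chars)
for pair in sorted(features):
    print(pair)
```

Because the pairs mention only abstract characteristics and never application-specific names, the same feature can fire in any application, which is what makes learning across applications possible.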

Using Program Analysis + Machine Learning To Prioritize Correct Patches
Use program analysis to extract features.
Obtain a corpus of patches from open source software development efforts.
Learn a probabilistic model to prioritize correct patches.
Learn properties of successful patches from one set of applications, then use those properties to recognize correct patches for a completely different set of applications: universal properties that characterize correct patches.

Setup for Model
Goal: estimate $P(m, l \mid p, \theta)$, given the program $p$ and the model parameters $\theta$.
A patch is a modification $m$ (for example, S becomes if (E) { S }) applied to a location $l$, where $l$ identifies a statement in program $p$.
The goal of the model is to estimate the probability that the patch is correct. The estimate is used to rank the patches.

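Probabilistic Model
$P(m, l \mid p, \theta)$: the probability that modification $m$ applied at location $l$ in program $p$, given parameters $\theta$, produces a correct patch.
We use a standard log-linear model based on the extracted features, and we encode the error localization information, which gives an error localization rank $r(p, l)$ for every location $l$, using a geometric distribution. We choose this mechanism simply because it works well in practice.
In the notation of the Prophet paper [POPL’16], with feature vector $\phi(p, m, l)$ and geometric parameter $\beta$:
$$P(m, l \mid p, \theta) = \frac{1}{Z} \cdot (1 - \beta)^{r(p, l)} \cdot \frac{\exp(\phi(p, m, l) \cdot \theta)}{\sum_{l'} \sum_{m'} \exp(\phi(p, m', l') \cdot \theta)}$$
The factor $(1 - \beta)^{r(p, l)}$ is the geometric distribution that encodes error localization, the fraction is the log-linear distribution based on the extracted features, and $Z$ is a normalization constant.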
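
To make the ranking concrete, here is a small sketch that combines a log-linear score over features with a geometric penalty on the error localization rank r(p, l). The parameter vector `theta`, the feature vectors, the candidate names, and the value of `beta` are all made-up illustration values, not learned or published parameters.

```python
import math
from typing import List, Sequence, Tuple

Candidate = Tuple[str, int, Sequence[float]]  # (name, localization rank r, feature vector)


def rank_patches(candidates: List[Candidate],
                 theta: Sequence[float],
                 beta: float = 0.02) -> List[Tuple[str, float]]:
    """Score each candidate as (1 - beta)^r * exp(features . theta),
    normalize the scores to probabilities, and sort best first."""
    weights = []
    for name, loc_rank, phi in candidates:
        geometric = (1.0 - beta) ** loc_rank  # favors highly ranked locations
        log_linear = math.exp(sum(f * t for f, t in zip(phi, theta)))
        weights.append((name, geometric * log_linear))
    z = sum(w for _, w in weights)  # normalization constant
    probs = [(name, w / z) for name, w in weights]
    return sorted(probs, key=lambda item: item[1], reverse=True)


# Feature 0 stands for a positive feature, feature 1 for a negative one.
theta = [1.0, -1.0]
candidates = [
    ("check str != 0", 0, [1.0, 0.0]),
    ("check ht == 1", 0, [0.0, 1.0]),
    ("same check, distant location", 50, [1.0, 0.0]),
]

ranked = rank_patches(candidates, theta)
for name, prob in ranked:
    print(f"{prob:.3f}  {name}")
```

The two terms play different roles: the features separate plausible from implausible edits at the same location, while the geometric term demotes otherwise identical edits at statements the error localizer considers less suspicious.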

Application   Lines of Code   Defects
libtiff       77 K            8
lighttpd      62 K            7
php           (missing)       31
gmp           145 K           2
gzip          491 K           4
python        407 K           9
wireshark     2,814 K         6
fbc           97 K            (missing)
GenProg benchmark set by Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer [ICSE 2012]

Prophet Results
[Chart: number of bugs for each system; Angelix is among the systems shown.]

History Driven Program Repair
The idea of using existing bug fixes to better prioritize patches has been applied to Java as well. HDRepair is such a system for Java.
Knowledge base: mined from past bug fix behaviors.
HDRepair mutates the buggy program to create repair candidates and runs them against the test cases.
Selected candidates frequently occur in the knowledge base and pass the negative test cases.

Bug Fix History Mining
Collection of bug fixes: each bug fix has a pre-fix AST and a post-fix AST.
Graph representation: GumTree converts pairs of ASTs into AST diff graphs, giving a collection of graphs.
gSpan closed graph mining: count the frequencies of different AST diff graph patterns, giving a collection of graph patterns.

Selection
Each candidate (E1, E2, ...) is matched against the fix patterns and receives an average score. Candidates with higher scores are selected, down to a single candidate patch.
Prioritize patches with higher pattern frequencies.
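
The frequency-based prioritization can be illustrated with a deliberately simplified stand-in: string labels take the place of mined AST diff graph patterns, and a plain counter replaces graph mining. The labels, the history, and both function names are assumptions made for this sketch and do not reflect HDRepair's actual data structures.

```python
from collections import Counter
from typing import Dict, List


def mine_pattern_frequencies(history: List[List[str]]) -> Dict[str, float]:
    """Fraction of past bug fixes in which each pattern occurs."""
    counts = Counter(p for fix in history for p in set(fix))
    return {pattern: n / len(history) for pattern, n in counts.items()}


def score(candidate_patterns: List[str], freq: Dict[str, float]) -> float:
    """Average frequency of the known patterns that the candidate matches."""
    matched = [freq[p] for p in candidate_patterns if p in freq]
    return sum(matched) / len(matched) if matched else 0.0


# Hypothetical knowledge base: one list of pattern labels per past bug fix.
history = [
    ["add-null-check"],
    ["add-null-check", "change-method-arg"],
    ["change-operator"],
    ["add-null-check"],
]
freq = mine_pattern_frequencies(history)

# Two hypothetical repair candidates and the patterns they match.
candidates = {"E1": ["add-null-check"], "E2": ["change-operator"]}
scores = {name: score(patterns, freq) for name, patterns in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

With this history, a candidate that resembles the most common kind of past fix is tried first, which is the intuition behind prioritizing by pattern frequency.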

HDRepair Results
For 90 selected Defects4J cases, with a perfect localization oracle:
HDRepair: 18 bugs
PAR: 4 bugs
GenProg: 1 bug

All previous systems operate with a set of manually defined transforms
Manual transforms define the search space of candidate patches.
But manually defined transforms leave open questions: how to handle language changes, coding style changes, etc.

Can we learn how developers write patches in the first place?

Genesis: learn how developers write patches in the first place
Genesis takes training human patches, infers transforms from them, and uses the inferred transforms to define the search space of candidate patches.

Genesis Results
[Chart: number of cases with correct patches.]
Genesis outperforms humans

Code Transfer

Motivation
If a bug in a program can be fixed by code logic that already exists in another program, can we extract and transfer that code for patch generation?
Systems: CodePhage [PLDI’15, Sidiroglou-Douskos et al.], QACrashFix [ASE’15, Gao et al.]

Display Cat

Cat Crash

ViewNior Cat

ViewNior Protection

CodePhage Overview
Donor: viewnior 1.4 (stripped binary). Recipient: display 6.5.2 (source code).
1. Locate Check (in the donor; the figure shows machine code bytes such as 8B45FC 4863F0)
2. Extract Check, as an application-independent representation of the check
3. Identify Patch Insertion Point (in the recipient)
4. Translate and Insert (source code patch)
5. Verify Patch
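
One way to picture an application-independent representation of a check is as an expression over input fields rather than over the donor's variables, which can then be rendered in terms of the recipient's own expressions. The sketch below assumes exactly that; the field names, the recipient expressions, the constant, and the `render` helper are all invented for illustration and are not taken from CodePhage.

```python
from typing import Dict, Tuple, Union

# A check is an int constant, an input field name, or (operator, left, right).
Expr = Union[int, str, Tuple[str, "Expr", "Expr"]]


def render(expr: Expr, field_to_recipient: Dict[str, str]) -> str:
    """Translate a check over input fields into recipient source text."""
    if isinstance(expr, int):
        return str(expr)
    if isinstance(expr, str):
        return field_to_recipient[expr]  # input field -> recipient expression
    op, left, right = expr
    return f"({render(left, field_to_recipient)} {op} {render(right, field_to_recipient)})"


# Hypothetical check extracted from a donor: width * height must stay bounded.
check: Expr = ("<=", ("*", "hdr.width", "hdr.height"), 1073741823)

# Hypothetical mapping from input fields to expressions available at the
# chosen insertion point in the recipient.
mapping = {"hdr.width": "img->columns", "hdr.height": "img->rows"}

patch = f"if (!{render(check, mapping)}) exit(-1);"
print(patch)
```

Keeping the check in terms of input fields is what lets it cross from a stripped binary into source code: neither side needs to share variable names or data structures with the other.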

Patch Display

Patched Display protects Cat

QACrashFix ……

QACrashFix Overview
1. Query a search engine with a stack trace snippet to find relevant Q&A pages.
2. Scrape the Q&A pages and extract edit scripts.
3. Search for a variable name mapping and apply the edit scripts accordingly at suspicious locations.
4. Filter out invalid patches.
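
The variable name mapping step can be sketched as a search over assignments from the names used in the Q&A answer to the names in scope at the suspicious location. This sketch makes strong simplifying assumptions: the edit script is reduced to a single code snippet, the search is exhaustive, and the snippet and variable names are invented. Filtering out invalid patches would happen afterwards and is not shown.

```python
import re
from itertools import permutations
from typing import Dict, List


def rename(snippet: str, mapping: Dict[str, str]) -> str:
    """Rename whole identifiers in one pass, so that renamings cannot chain."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], snippet)


def candidate_patches(snippet: str, qa_vars: List[str], local_vars: List[str]) -> List[str]:
    """Try every assignment of local variables to the Q&A snippet's variables."""
    patches = []
    for chosen in permutations(local_vars, len(qa_vars)):
        patches.append(rename(snippet, dict(zip(qa_vars, chosen))))
    return patches


# Hypothetical fix taken from a Q&A answer, and variables in scope locally.
snippet = "if (cursor != null) cursor.close();"
patches = candidate_patches(snippet, ["cursor"], ["c", "db"])
for p in patches:
    print(p)
```

Each rendered snippet is only a candidate; mappings that produce code that does not compile or does not pass the tests are discarded in the filtering step.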

Summary
Code Learning: extract useful human knowledge to enable
  better patch prioritization
  better transforms and search spaces
Code Transfer: transfer program logic of existing code
  from a donor program to a recipient program
  from a Q&A website example to a program
Code transfer techniques make stronger assumptions and have a narrower scope, but offer better precision.

Looking into the Future
Human developers: domain-specific knowledge, software engineering training
Patch generation systems: computation power
Ultimate goal: build systems that combine human developer knowledge and machine computation power.

Looking into the Future
The growing volume of existing programs is not just a challenge but also a great opportunity. Exploiting this opportunity is key to solving future software engineering problems.