
Data Mining: Concepts and Techniques
Chapter 11, Additional Theme: Software Bug Mining
Jiawei Han and Micheline Kamber
Department of Computer Science, University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber. All rights reserved.
Acknowledgement: Chao Liu


Outline
- Motivation
- Related Work
- Classification of Program Executions
- Extract "Backtrace" from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions

Motivation
Software is "full of bugs"
- Windows 2000: 35 million lines of code, 63,000 known bugs at the time of release, about 2 per 1,000 lines
Software failure costs
- Ariane 5 exploded due to "errors in the software of the inertial reference system" (Ariane 5 Flight 501 inquiry board report, http://ravel.esrin.esa.it/docs/esa-x-1819eng.pdf)
- A study by the National Institute of Standards and Technology found that software errors cost the U.S. economy about $59.5 billion annually (http://www.nist.gov/director/prog-ofc/report02-3.pdf)
Testing and debugging are laborious and expensive
- "50% of my company employees are testers, and the rest spends 50% of their time testing!" (Bill Gates, 1995; courtesy of CNN.com)

This work is about how to automatically localize software bugs. The major motivation is that software is full of bugs: one study reported an average error rate of 1 to 4.5 errors per 1,000 lines of code. Windows 2000, with 35 million lines of code, contained 63,000 known bugs at the time of its release, i.e., 2 errors per 1,000 lines. When bugs strike in practice, the costs are tremendous. In 1996, Ariane 5 exploded 40 seconds after launch; the explosion was traced to errors in the software of the inertial reference system. A study by the National Institute of Standards and Technology found that software errors cost the U.S. economy about $59.5 billion annually. Consequently, great effort goes into testing and debugging over the software life cycle; Bill Gates once said that 50% of his company's employees are testers, and the rest spend 50% of their time testing. Because testing and debugging are such laborious tasks, research has been carried out on automatic bug localization.

A Glimpse of Software Bugs
Crashing bugs
- Symptoms: segmentation faults
- Reasons: memory access violations
- Tools: Valgrind, CCured
Noncrashing bugs
- Symptoms: unexpected outputs
- Reasons: logic or semantic errors, e.g., if ((m >= 0)) vs. if ((m >= 0) && (m != lastm)); < vs. <=, > vs. >=; j = i vs. j = i + 1
- Tools: no sound tools

Example of Noncrashing Bugs

Correct version:

void subline(char *lin, char *pat, char *sub)
{
  int i, lastm, m;
  lastm = -1;
  i = 0;
  while (lin[i] != ENDSTR) {
    m = amatch(lin, i, pat, 0);
    if ((m >= 0) && (lastm != m)) {
      putsub(lin, i, m, sub);
      lastm = m;
    }
    if ((m == -1) || (m == i)) {
      fputc(lin[i], stdout);
      i = i + 1;
    } else
      i = m;
  }
}

The buggy variants are identical except for the first if condition: they use if (m >= 0), dropping the (lastm != m) subclause, or if (m > 0).

From the memory access point of view, even incorrect executions are correct.

Debugging Crashes (Crashing Bugs)

Bug Localization via Backtrace
- Can we circle out the backtrace for noncrashing bugs?
- Major challenge: we do not know where the abnormality happens
- Observation: classification depends on discriminative features, which can be regarded as a kind of abnormality
- Can we extract a backtrace from classification results?

Recall that for crashing bugs, memory accesses are obviously where the abnormality happens, so the call stack constitutes the backtrace. Had we known where the abnormality happens, the call stack could likewise serve as the backtrace for noncrashing bugs.


Related Work
Crashing bugs
- Memory access monitoring: Purify [HJ92], Valgrind [SN00], ...
Noncrashing bugs
- Static program analysis
- Traditional model checking
- Model checking source code

Static Program Analysis
Methodology
- Examine source code directly
- Enumerate all possible execution paths without running the program
- Check user-specified properties, e.g., free(p) ... (*p); lock(res) ... unlock(res); receive_ack() ... send_data()
Strengths
- Checks all possible execution paths
Problems
- Shallow semantics: only properties that can be directly mapped to source code structure
Tools
- ESC [DRL+98], LCLint [EGH+94], ESP [DLS02], MC Checker [ECC00], ...

Traditional Model Checking
Methodology
- Formally model the system under check in a particular description language (usually a finite state machine)
- Exhaustively explore the reachable states to check desired or undesired properties
Strengths
- Models deep semantics
- Naturally fits checking of event-driven systems, like protocols
Problems
- Significant manual effort in modeling
- State space explosion
Tools
- SMV [M93], SPIN [H97], Murphi [DDH+92], ...

Model Checking Source Code
Methodology
- Run the real program in a sandbox
- Manipulate event happenings, e.g., message arrivals and the outcomes of memory allocation
Strengths
- Less manual specification needed
Problems
- Application restrictions remain, e.g., (still) event-driven programs; a clear mapping between source code and logical events is required
Tools
- CMC [MPC+02], VeriSoft [G97], Java PathFinder [BHP+00], ...

Summary of Related Work
In common
- Semantic inputs are necessary: a program model and properties to check
- Restricted application scenarios: shallow semantics or event-driven systems
When do these methods not work?


Example Revisited

(The same subline() code and its buggy variants shown earlier.)
- No memory violations
- Not an event-driven program
- No explicit error properties
From the memory access point of view, even incorrect executions are correct, hence hard to model using finite state machines.

Identification of Incorrect Executions
- A two-class classification problem
- How to abstract program executions: program behavior graphs
- Feature selection: edges + closed frequent subgraphs
Program behavior graphs
- Function-level abstraction of program behaviors
- Behavior graph = call graph + transition graph
- One graph from one execution

int main() { ... A(); B(); }
int A() { ... }
int B() { ... C(); ... }
int C() { ... }
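As a concrete illustration, below is a minimal instrumentation sketch for collecting such a graph from one run. It assumes numeric function IDs, fixed-size edge tables, and that a transition edge links consecutive callees of the same caller; none of these details is prescribed by the slides.

/* Sketch only: record one execution's behavior graph as call edges
 * (caller -> callee) and transition edges (previous callee -> next callee
 * under the same caller). IDs and table sizes are illustrative assumptions. */
#include <stdio.h>

#define MAX_FUNCS 64
enum { F_MAIN = 0, F_A = 1, F_B = 2, F_C = 3 };

static int call_edge[MAX_FUNCS][MAX_FUNCS];   /* caller -> callee counts   */
static int trans_edge[MAX_FUNCS][MAX_FUNCS];  /* callee -> next callee     */
static int last_callee[MAX_FUNCS];            /* per caller; -1 means none */

static void record_call(int caller, int callee)
{
    call_edge[caller][callee]++;
    if (last_callee[caller] >= 0)
        trans_edge[last_callee[caller]][callee]++;
    last_callee[caller] = callee;
}

/* toy program from the slide: main calls A and B; B calls C */
static void C_(void) { }
static void B_(void) { record_call(F_B, F_C); C_(); }
static void A_(void) { }

int main(void)
{
    for (int i = 0; i < MAX_FUNCS; i++) last_callee[i] = -1;

    record_call(F_MAIN, F_A); A_();
    record_call(F_MAIN, F_B); B_();

    printf("call main->A: %d  call main->B: %d  trans A->B: %d  call B->C: %d\n",
           call_edge[F_MAIN][F_A], call_edge[F_MAIN][F_B],
           trans_edge[F_A][F_B], call_edge[F_B][F_C]);
    return 0;
}

For the toy program it records the call edges main->A, main->B, B->C and the transition edge A->B, giving one graph for this execution.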

Values of Classification
A graph classification problem
- Every execution gives one behavior graph
- Two sets of instances: correct and incorrect
Values of classification
- Classification itself does not readily work for bug localization: the classifier only labels each run as correct or incorrect as a whole; it does not tell when the abnormality happens
- Successful classification relies on discriminative features
- Can discriminative features be treated as a kind of abnormality? When does the abnormality happen? Incremental classification?


Incremental Classification
- Classification works only when instances of the two classes differ, so classification accuracy can serve as a measure of difference
- Relate classification dynamics to bug-relevant functions

The main idea of incremental classification is to train classifiers at different stages of program execution so that we have a chance to capture when the bug happens, i.e., where the abnormality is. An incorrect execution looks the same as a correct one at the beginning; at some stage it triggers the bug and then diverges from the correct executions. So if we can detect the stage at which the two classes become separable, we have located that divergence.

Illustration: Precision Boost
(Figure: behavior graphs of one correct and one incorrect execution over functions main, A, B, C, D, E, F, G, H.)

Bug Relevance
Precision boost
- For each function F: precision boost = exit precision - entrance precision
Intuition
- Differences take place within the execution of F
- Abnormalities happen while F is on the stack
- The larger the precision boost, the more likely F is part of the backtrace, i.e., a bug-relevant function
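A minimal sketch of this ranking step, assuming the classifier precision at each function's entrance and exit has already been measured; the function names and numbers below are made up for illustration, not results from the case study.

/* Sketch only: compute precision boost = exit precision - entrance precision
 * per function and report the function with the largest boost. */
#include <stdio.h>

#define NFUNCS 4

int main(void)
{
    const char *name[NFUNCS] = { "main", "subline", "amatch", "putsub" };
    double entrance[NFUNCS]  = { 0.50, 0.55, 0.60, 0.88 };  /* made-up values */
    double exitp[NFUNCS]     = { 0.95, 0.93, 0.65, 0.90 };  /* made-up values */

    int best = 0;
    for (int f = 0; f < NFUNCS; f++) {
        double boost = exitp[f] - entrance[f];
        printf("%-8s boost = %.2f\n", name[f], boost);
        if (boost > exitp[best] - entrance[best])
            best = f;
    }
    printf("most bug-relevant candidate: %s\n", name[best]);
    return 0;
}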

Outline
- Related Work
- Classification of Program Executions
- Extract "Backtrace" from Classification Dynamics
- Case Study
- Conclusions

Case Study

(Code: the correct subline() and the buggy variant with the (lastm != m) subclause dropped, as shown earlier.)

Subject program
- replace: performs regular expression matching and substitution
- 563 lines of C code; 17 functions are involved
Execution behaviors
- 130 out of 5,542 test cases fail to give correct outputs
- No incorrect execution incurs a segmentation fault
- A logic bug
Can we circle out the backtrace for this bug?
From the memory access point of view, even incorrect executions are correct.

Precision Pairs
(Figure: entrance and exit precision pairs for each function.)

Precision Boost Analysis
- Objective judgment of bug-relevant functions
- The main function is always bug-relevant
- Stepwise precision boost
- Line-up property

Backtrace for Noncrashing Bugs

Method Summary
- Identify incorrect executions from program runtime behaviors
- Classification dynamics can give away a "backtrace" for noncrashing bugs without any semantic inputs
- Data mining can contribute to software engineering and systems research in general
  - CP-Miner [LLM+04]: detects copy-paste bugs in OS code; uses the CloSpan algorithm
  - C-Miner [LCS+04]: discovers block correlations in storage systems; again uses CloSpan; effectively reduces I/O response time


An Example

void dodash(char delim, char *src, int *i, char *dest, int *j, int maxset)
{
  while (...) {
    ...
    if (isalnum(src[*i+1]) && src[*i-1] <= src[*i+1]) {
      for (k = src[*i-1]+1; k <= src[*i+1]; k++)
        junk = addstr(k, dest, j, maxset);
      *i = *i + 1;
    }
  }
}

Had the function been written correctly, the subclause shown in red on the original slide would have been there; the buggy variant omits it.

- Replace program: 563 lines of C code, 20 functions
- Symptom: 30 out of 5,542 test cases fail to give correct outputs, and there are no crashes
- Goal: localize the bug and prioritize manual examination

Difficulty & Expectation
Difficulty
- Statically, even small programs are complex because of dependencies
- Dynamically, execution paths can vary significantly across the possible inputs
- Logic errors have no apparent symptoms
Expectations
- Unrealistic to take the debugging load off developers entirely
- Localize the buggy region
- Prioritize manual examination

Execution Profiling
How should an execution be represented?
Full execution trace
- Control flow + value tags
- Too expensive to record at runtime; unwieldy to process
Summarized control flow for conditionals (if, while, for)
- Branch evaluation counts
- Lightweight to collect at runtime
- Easy to process, and effective
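A minimal sketch of what such branch-count profiling could look like; the wrapper function and site numbering are assumptions for illustration, not the instrumentation actually used in the paper.

/* Sketch only: wrap each monitored conditional so its true/false evaluation
 * counts are accumulated per branch site during one execution. */
#include <stdio.h>

#define NBRANCH 8
static long n_true[NBRANCH], n_false[NBRANCH];

static int profile(int site, int cond)
{
    if (cond) n_true[site]++; else n_false[site]++;
    return cond;
}

int main(void)
{
    /* branch site 0: a monitored conditional inside a loop */
    for (int x = -3; x <= 3; x++) {
        if (profile(0, x >= 0)) {
            /* ... then-branch of the original program ... */
        }
    }
    printf("site 0: true=%ld false=%ld\n", n_true[0], n_false[0]);
    return 0;
}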

Analysis of the Example

if (isalnum(src[*i+1]) && src[*i-1] <= src[*i+1]) {
  for (k = src[*i-1]+1; k <= src[*i+1]; k++)
    junk = addstr(k, dest, j, maxset);
  *i = *i + 1;
}

Let A = isalnum(src[*i+1]) and B = (src[*i-1] <= src[*i+1]).
- An execution is logically correct until (A ∧ ¬B) is evaluated as true when evaluation reaches this condition
- If we monitor program conditionals like A, their evaluations shed light on the hidden error and can be exploited for error isolation
(Had the function been written correctly, the subclause shown in red on the original slide would have been there.)

Analysis of Branching Actions
Correct vs. incorrect runs in program P: consider the 2x2 contingency table of evaluation counts over A/¬A and B/¬B (nAB, n¬AB, nA¬B, n¬A¬B).
- In a correct run, nA¬B = 0
- In an incorrect run, nA¬B ≥ 1
As we tested through the 5,542 test cases, the true evaluation probability for (A ∧ ¬B) is 0.727 in a correct execution and 0.896 in an incorrect execution on average.
The error location does exhibit detectable abnormal behavior in incorrect executions.

Conditional Tests Work for Nonbranching Errors

void makepat(char *arg, int start, char delim, char *pat)
{
  ...
  if (!junk)
    result = 0;
  else
    result = i + 1;  /* off-by-one error; should be: result = i */
  return result;
}

The off-by-one error can still be detected using the conditional tests.

Ranking Based on Boolean Bias
- Let input di have desired output oi. We execute P; P passes the test iff the actual output oi' is identical to oi
  - Tp = { ti | oi' = P(di) matches oi }
  - Tf = { ti | oi' = P(di) does not match oi }
- Boolean bias: let nt be the number of times a boolean feature B evaluates true, and nf the number of times it evaluates false; then π(B) = (nt - nf) / (nt + nf)
- It encodes the distribution of B's value: 1 if B always evaluates true, -1 if it always evaluates false, and in between for all other mixtures
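A minimal sketch of that formula in code; the zero return for a predicate that is never reached in a run is an assumption of this sketch, not something the slides specify.

/* Sketch only: boolean bias pi(B) = (nt - nf) / (nt + nf) computed from the
 * true/false evaluation counts of a monitored predicate in one execution. */
#include <stdio.h>

static double boolean_bias(long nt, long nf)
{
    if (nt + nf == 0)          /* predicate never reached in this run */
        return 0.0;
    return (double)(nt - nf) / (double)(nt + nf);
}

int main(void)
{
    printf("%.3f\n", boolean_bias(9, 1));   /* mostly true  ->  0.800 */
    printf("%.3f\n", boolean_bias(0, 5));   /* always false -> -1.000 */
    printf("%.3f\n", boolean_bias(4, 4));   /* balanced     ->  0.000 */
    return 0;
}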

Evaluation Abnormality
- The boolean bias of a branch P is the probability of its being evaluated as true within one execution
- Suppose we have n correct and m incorrect executions; for any predicate P we end up with
  - an observation sequence for correct runs: S_p = (X'_1, X'_2, ..., X'_n)
  - an observation sequence for incorrect runs: S_f = (X_1, X_2, ..., X_m)
- Can we infer whether P is suspicious based on S_p and S_f?

Underlying Populations
- Imagine the underlying distributions of boolean bias for correct and incorrect executions are f(X|θp) and f(X|θf)
- S_p and S_f can be viewed as random samples from these underlying populations
- Major heuristic: the larger the divergence between f(X|θp) and f(X|θf), the more relevant the branch P is to the bug
(Figure: probability densities of evaluation bias for the two populations.)

Major Challenges
- No knowledge of the closed forms of either distribution
- Usually we do not have enough incorrect executions to estimate f(X|θf) reliably
- If we knew the distributions, standard measures such as KL-divergence could apply

Our Approach: Hypothesis Testing

Faulty Functions
Motivation
- Bugs are not necessarily on branches
- Higher confidence in function rankings than in branch rankings
Abnormality score for functions
- Calculate the abnormality score for each branch within a function
- Aggregate the branch scores

Two Evaluation Measures
CombineRank
- Combine the branch scores by summation
- Intuition: when a function contains many abnormal branches, it is likely bug-relevant
UpperRank
- Take the largest branch score as the representative
- Intuition: when a function has one extremely abnormal branch, it is likely bug-relevant
(The derivations are shown in the paper.)
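A minimal sketch of the two aggregation rules; the per-branch scores below are made-up numbers, not results from the paper.

/* Sketch only: aggregate per-branch abnormality scores into a per-function
 * score, by summation (CombineRank-style) or by maximum (UpperRank-style). */
#include <stdio.h>

static double combine_rank(const double *s, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += s[i];
    return sum;
}

static double upper_rank(const double *s, int n)
{
    double max = s[0];
    for (int i = 1; i < n; i++) if (s[i] > max) max = s[i];
    return max;
}

int main(void)
{
    double dodash[] = { 0.9, 0.2, 0.1 };   /* one extreme branch   */
    double omatch[] = { 0.4, 0.4, 0.4 };   /* many moderate scores */

    printf("dodash: combine=%.1f upper=%.1f\n",
           combine_rank(dodash, 3), upper_rank(dodash, 3));
    printf("omatch: combine=%.1f upper=%.1f\n",
           combine_rank(omatch, 3), upper_rank(omatch, 3));
    return 0;
}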

dodash vs. omatch: which function is likely buggy, and which measure is more effective?

Bug Benchmark
Siemens Program Suite
- 89 variants of 6 subject programs, each of 200 to 600 LOC
- 89 known bugs in total
- Mainly logic (or semantic) bugs
- Widely used in software engineering research

Results on Program "replace"

Comparison between CombineRank and UpperRank
(Figure: how often the buggy function is ranked within the top k.)

Results on Other Programs

More Questions to Be Answered
- What happens, and how do we handle it, if multiple errors exist in one program?
- How can bugs be detected if only very few failing test cases are available?
- Is it really more effective if we have more execution traces?
- How can program semantics be integrated into this statistics-based testing algorithm?
- How can program semantic analysis be integrated with statistics-based analysis?


Mining Copy-Paste Bugs
Copy-pasting is common
- 12% in the Linux file system [Kasper2003]
- 19% in the X Window system [Baker1995]
Copy-pasted code is error prone
- Among 35 errors in Linux drivers/i2o, 34 are caused by copy-paste [Chou2001]

Simplified example from linux-2.6.6/arch/sparc/prom/memory.c:

void __init prom_meminit(void)
{
  ...
  for (i = 0; i < n; i++) {
    total[i].adr   = list[i].addr;
    total[i].bytes = list[i].size;
    total[i].more  = &total[i+1];
  }
  ...
  for (i = 0; i < n; i++) {
    taken[i].adr   = list[i].addr;
    taken[i].bytes = list[i].size;
    taken[i].more  = &total[i+1];   /* forgot to change! */
  }
  ...
}

An Overview of Copy-Paste Bug Detection
1. Parse source code and build a sequence database
2. Mine for basic copy-pasted segments
3. Compose larger copy-pasted segments
4. Prune false positives

Parsing Source Code
- Purpose: building a sequence database
- Idea: map each statement to a number
  - Tokenize each component: different operators/constants/keywords become different tokens
  - Handle identifier renaming: identifiers of the same type map to the same token
- Example: "old = 3;" and "new = 3;" tokenize to the same sequence (5 61 20), which hashes to the same statement number (16)
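A minimal sketch of that tokenize-and-hash idea; the token codes, the hash function, and the single identifier class are illustrative assumptions, not CP-Miner's actual scheme.

/* Sketch only: map a statement to a token sequence in which all identifiers
 * share one token, so "old = 3;" and "new = 3;" hash to the same number. */
#include <stdio.h>
#include <ctype.h>

#define TOK_ID  5     /* any identifier  */
#define TOK_EQ  61    /* '='             */
#define TOK_NUM 20    /* numeric literal */

static unsigned hash_stmt(const char *s)
{
    unsigned h = 0;
    while (*s) {
        int tok;
        if (isspace((unsigned char)*s)) { s++; continue; }
        if (isalpha((unsigned char)*s)) {          /* identifier */
            while (isalnum((unsigned char)*s) || *s == '_') s++;
            tok = TOK_ID;
        } else if (isdigit((unsigned char)*s)) {   /* number */
            while (isdigit((unsigned char)*s)) s++;
            tok = TOK_NUM;
        } else {                                   /* single-char operator */
            tok = (*s == '=') ? TOK_EQ : (unsigned char)*s;
            s++;
        }
        h = h * 31 + (unsigned)tok;                /* fold tokens into a hash */
    }
    return h % 100;                                /* small statement number */
}

int main(void)
{
    printf("%u %u\n", hash_stmt("old = 3;"), hash_stmt("new = 3;")); /* equal */
    printf("%u\n", hash_stmt("total[i].adr = list[i].addr;"));
    return 0;
}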

Building Sequence Database
- Tokenizing turns the whole program into one long sequence, but the mining algorithm needs a sequence database, so the long sequence is cut into pieces
  - Naive method: fixed length
  - Our method: one sequence per basic block
- Example: the two loops above yield hash values 65 (16, 16, 71) ... 65 (16, 16, 71)
- Final sequence DB: (65) (16, 16, 71) ... (65) (16, 16, 71)

Mining for Basic Copy-Pasted Segments
- Apply a frequent sequence mining algorithm to the sequence database
- Modification: constrain the maximum gap
- Example: (16, 16, 71) and (16, 16, 10, 71), where one statement has been inserted into the copy (gap = 1), share the frequent subsequence (16, 16, 71)
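A minimal sketch of the gap constraint itself: checking whether a pattern occurs in a statement-hash sequence with at most maxgap inserted statements between consecutive matches. The full frequent-sequence miner (CloSpan) is not reproduced here; this helper is an illustrative assumption.

/* Sketch only: does `pat` occur as a subsequence of `seq` with at most
 * `maxgap` skipped statements between consecutive matched elements? */
#include <stdio.h>

static int occurs_with_gap(const int *seq, int n,
                           const int *pat, int m, int maxgap)
{
    for (int start = 0; start < n; start++) {
        int i = start, j = 0;
        while (i < n && j < m) {
            if (seq[i] == pat[j]) { i++; j++; continue; }
            int gap = 0;                 /* skip up to maxgap statements */
            while (i < n && seq[i] != pat[j] && gap <= maxgap) { i++; gap++; }
            if (gap > maxgap) break;
        }
        if (j == m) return 1;
    }
    return 0;
}

int main(void)
{
    int seq[] = { 16, 16, 10, 71 };   /* copy with one inserted statement */
    int pat[] = { 16, 16, 71 };
    printf("gap<=1: %d\n", occurs_with_gap(seq, 4, pat, 3, 1));  /* 1 */
    printf("gap=0 : %d\n", occurs_with_gap(seq, 4, pat, 3, 0));  /* 0 */
    return 0;
}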

Composing Larger Copy-Pasted Segments
- Combine neighboring copy-pasted segments repeatedly
- Example: the basic segment (16, 16, 71) found in both loop bodies is combined with the neighboring segment (65), the for statement, giving the larger copy-pasted segment (65, 16, 16, 71) for both the total[...] loop and the taken[...] loop

Pruning False Positives
- Unmappable segments: identifier names cannot be mapped to corresponding ones
  - Example: f(a1); f(a2); f(a3); vs. f1(b1); f1(b2); f2(b3); mapping f to both f1 and f2 is a conflict
- Tiny segments
For more detail, see Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou, "CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code", in Proc. 6th Symp. Operating Systems Design and Implementation, 2004.

Some Test Results of Copy-Paste Bug Detection

Software     LOC     Verified bugs   Potential bugs (careless programming)   Space (MB)   Time
Linux        4.4 M   28              21                                      527
FreeBSD      3.3 M   23              8                                       459          20 mins
Apache       224 K   5                                                       30           15 secs
PostgreSQL   458 K   2                                                       57           38 secs


Conclusions
- Data mining reaches into software and computer systems
- Incorrect executions can be identified from program runtime behaviors
- Classification dynamics can give away a "backtrace" for noncrashing bugs without any semantic inputs
- A hypothesis-testing-like approach is developed to localize logic bugs in software; no prior knowledge about the program semantics is assumed
- Many other software bug mining methods remain to be explored

References
[DRL+98] David L. Detlefs, K. Rustan M. Leino, Greg Nelson, and James B. Saxe. Extended static checking. 1998.
[EGH+94] David Evans, John Guttag, James Horning, and Yang Meng Tan. LCLint: A tool for using specifications to check code. In Proc. ACM SIGSOFT '94 Symp. Foundations of Software Engineering, pages 87-96, 1994.
[DLS02] Manuvir Das, Sorin Lerner, and Mark Seigle. ESP: Path-sensitive program verification in polynomial time. In Conf. Programming Language Design and Implementation, 2002.
[ECC00] D. R. Engler, B. Chelf, A. Chou, and S. Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In Proc. 4th Symp. Operating Systems Design and Implementation, October 2000.
[M93] Ken McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
[H97] Gerard J. Holzmann. The model checker SPIN. IEEE Trans. Software Engineering, 23(5):279-295, 1997.
[DDH+92] David L. Dill, Andreas J. Drexler, Alan J. Hu, and C. Han Yang. Protocol verification as a hardware design aid. In IEEE Int. Conf. Computer Design: VLSI in Computers and Processors, pages 522-525, 1992.
[MPC+02] M. Musuvathi, D. Y. W. Park, A. Chou, D. R. Engler, and D. L. Dill. CMC: A pragmatic approach to model checking real code. In Proc. 5th Symp. Operating Systems Design and Implementation, 2002.

References (cont'd)
[G97] P. Godefroid. Model checking for programming languages using VeriSoft. In Proc. 24th ACM Symp. Principles of Programming Languages, 1997.
[BHP+00] G. Brat, K. Havelund, S. Park, and W. Visser. Model checking programs. In IEEE Int'l Conf. Automated Software Engineering (ASE), 2000.
[HJ92] R. Hastings and B. Joyce. Purify: Fast detection of memory leaks and access errors. In Proc. Winter 1992 USENIX Conference, pages 125-138, San Francisco, California.
Chao Liu, Xifeng Yan, and Jiawei Han. Mining control flow abnormality for logic error isolation. In Proc. 2006 SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006.
C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff. SOBER: Statistical model-based bug localization. In Proc. 2005 ACM SIGSOFT Symp. Foundations of Software Engineering (FSE 2005), Lisbon, Portugal, Sept. 2005.
C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu. Mining behavior graphs for backtrace of noncrashing bugs. In Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport Beach, CA, April 2005.
[SN00] Julian Seward and Nick Nethercote. Valgrind, an open-source memory debugger for x86-GNU/Linux. http://valgrind.org/
[LLM+04] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A tool for finding copy-paste and related bugs in operating system code. In Proc. 6th Symp. Operating Systems Design and Implementation, 2004.
[LCS+04] Zhenmin Li, Zhifeng Chen, Sudarshan M. Srinivasan, and Yuanyuan Zhou. C-Miner: Mining block correlations in storage systems. In Proc. 3rd USENIX Conf. File and Storage Technologies, 2004.
