1 Tracking Down Bugs Benny Vaksendiser

2 Overview
Motivation
Isolating Cause-Effect Chains from Computer Programs
Visualization of Test Information to Assist Fault Localization
Dynamically Discovering Likely Program Invariants to Support Program Evolution
Automatic Extraction of Object-Oriented Component Interfaces
Comparison
Summary
References

3 Motivation Improve software quality. Reduce the number of delivered faults. Lower maintenance cost.

4 Motivation: Lowering Maintenance Cost Software maintenance represents more than 90% of total software cost. Debugging is a highly time-consuming task.

5 Why Is Debugging So Hard? Complexity of software. Fixing unfamiliar code. Finding the cause of a bug isn't trivial. “Corner places”. Undocumented code. Documentation is often incomplete or wrong. Lack of a guiding tool.

6 Isolating Cause-Effect Chains from Computer Programs Andreas Zeller, professor at Universität des Saarlandes in Saarbrücken, Germany

7 Isolating Cause-Effect Chains from Computer Programs Andreas Zeller Input: a passing run and a failing run. Method: the Delta Debugging algorithm. Result: the isolated cause of the failing run.

8 Failing Run
    double mult(double z[], int n) {
        int i, j;
        i = 0;
        for (j = 0; j < n; j++) {
            i = i + j + 1;
            z[i] = z[i] * (z[0] + 1.0);
        }
        return z[n];
    }
Compiling fail.c, the GNU compiler (GCC) crashes:
    linux$ gcc -O fail.c
    gcc: Internal error: program cc1 got fatal signal 11
What's the error that causes this failure?

9 Cause
What's the cause for the GCC failure?
The cause of any event ("effect") is a preceding event without which the effect would not have occurred. (Microsoft Encarta)
To prove causality, we must show that:
1. The effect occurs when the cause occurs (the failing run).
2. The effect does not occur when the cause does not occur (the passing run).
General technique: experimentation, that is, constructing a theory from a series of experiments (runs).
Can't we automate experimentation?

10 Isolating Failure Causes

11 Isolating Failure Causes

12 Isolating Failure Causes

13 Isolating Failure Causes

14 Isolating Failure Causes +1.0 is the failure cause – after only 19 tests
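
The automated experimentation on the preceding slides can be approximated in a few lines. What follows is a minimal sketch of ddmin, the minimizing variant of Delta Debugging, and not necessarily the exact procedure Zeller's tool uses to isolate the difference between a passing and a failing run. The `crashes` oracle is a hypothetical stand-in for "GCC crashes on this input"; here it only checks for the substring `+1.0`.

```python
def ddmin(failing_input, fails):
    """Shrink a failing list of items until removing any single item makes it pass."""
    assert fails(failing_input), "the starting input must fail"
    n = 2  # granularity: number of chunks the input is split into
    while len(failing_input) >= 2:
        size = max(len(failing_input) // n, 1)
        chunks = [failing_input[i:i + size]
                  for i in range(0, len(failing_input), size)]
        reduced = False
        for i, chunk in enumerate(chunks):
            complement = [item for j, c in enumerate(chunks) if j != i for item in c]
            if fails(chunk):            # the chunk alone still fails: keep only it
                failing_input, n, reduced = chunk, 2, True
                break
            if fails(complement):       # the chunk is irrelevant: drop it
                failing_input, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(failing_input):  # single items already tried: done
                break
            n = min(n * 2, len(failing_input))
    return failing_input


def crashes(chars):
    # Hypothetical oracle standing in for "the compiler crashes on this input".
    return "+1.0" in "".join(chars)


statement = list("z[i]=z[i]*(z[0]+1.0);")
print("".join(ddmin(statement, crashes)))  # prints +1.0
```

The number of oracle calls this sketch needs is not claimed to match the 19 tests reported on the slide.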

15 What's going on in GCC?

16 What's going on in GCC?

17 What's going on in GCC?

18 What's going on in GCC?

19 What's going on in GCC? To fix the failure, we must break this cause-effect chain.

20 Small Cause, Big Effect How do we isolate the relevant state differences?

21 Memory Graphs Vertices are variables. Edges are references.

22 The GCC Memory Graph

23 The Process in a Nutshell

24 The Process in a Nutshell

25 The Process in a Nutshell

26 The GCC Cause-Effect Chain

27 Submit buggy program. Specify invocations. Click on “Debug it”. Diagnosis comes via

28 Visualization of Test Information to Assist Fault Localization Mary Jean Harrold John T. Stasko James A. Jones College of Computing Georgia Institute of Technology 24th International Conference on Software Engineering USA, May 2002

29 Visualization of Test Information to Assist Fault Localization James A. Jones, Mary Jean Harrold, John Stasko Key idea: the more frequently a statement is executed by failing test cases, the higher the probability that the statement contains the fault.

30 Discrete Approach
Input:
  Source code.
  For each test case: its pass/fail status and the statements that it executes.
Display statements in the program according to the test cases that execute them.
Statements executed by:
  Only failed test cases
  Both passed & failed test cases
  Only passed test cases
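
A minimal sketch of the discrete classification, using the mid() example from the slides that follow. The coverage sets are worked out by hand from that listing, and the color names (red, yellow, green) are an assumption taken from the published Tarantula description, since the slide text does not preserve them.

```python
# Test case -> statements it executes (line numbers of the mid() listing).
COVERAGE = {
    "3,3,5": {1, 2, 3, 4, 6, 7, 13},
    "1,2,3": {1, 2, 3, 4, 5, 13},
    "3,2,1": {1, 2, 3, 8, 9, 10, 13},
    "5,5,5": {1, 2, 3, 8, 9, 11, 13},
    "5,3,4": {1, 2, 3, 4, 6, 13},
    "2,1,3": {1, 2, 3, 4, 6, 7, 13},
}
OUTCOME = {"3,3,5": "P", "1,2,3": "P", "3,2,1": "P",
           "5,5,5": "P", "5,3,4": "P", "2,1,3": "F"}


def discrete_color(statement, coverage=COVERAGE, outcome=OUTCOME):
    """Classify a statement by the kinds of test cases that execute it."""
    by_passed = any(statement in stmts and outcome[t] == "P"
                    for t, stmts in coverage.items())
    by_failed = any(statement in stmts and outcome[t] == "F"
                    for t, stmts in coverage.items())
    if by_passed and by_failed:
        return "yellow"   # both passed and failed test cases
    if by_failed:
        return "red"      # only failed test cases
    if by_passed:
        return "green"    # only passed test cases
    return None           # never executed


for line in (1, 5, 7, 12):
    print(line, discrete_color(line))
```

In this data set every statement executed by the failing test case is also executed by a passing one, so nothing is classified as red.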

31 Example
Test cases (x,y,z): 3,3,5  1,2,3  3,2,1  5,5,5  5,3,4  2,1,3
Results:            P      P      P      P      P      F
    mid() {
      int x,y,z,m;
 1:   read(x,y,z);
 2:   m=z;
 3:   if(y<z)
 4:     if(x<y)
 5:       m=y;
 6:     elseif(x<z)
 7:       m=y;
 8:   else
 9:     if(x>y)
10:       m=y;
11:     elseif(x>z)
12:       m=x;
13:   print(m);
    }

32 Example
Statement coverage (● = the test case executes the statement):
                      3,3,5  1,2,3  3,2,1  5,5,5  5,3,4  2,1,3
 1: read(x,y,z);        ●      ●      ●      ●      ●      ●
 2: m=z;                ●      ●      ●      ●      ●      ●
 3: if(y<z)             ●      ●      ●      ●      ●      ●
 4:   if(x<y)           ●      ●                    ●      ●
 5:     m=y;                   ●
 6:   elseif(x<z)       ●                           ●      ●
 7:     m=y;            ●                                  ●
 8: else                              ●      ●
 9:   if(x>y)                         ●      ●
10:     m=y;                          ●
11:   elseif(x>z)                            ●
12:     m=x;
13: print(m);           ●      ●      ●      ●      ●      ●
    Result              P      P      P      P      P      F

33 Example (same listing and coverage as slide 32)

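34 Problem
Not very helpful! The discrete approach does not capture the relative frequency.
Test cases (x,y,z): 2,1,3  3,3,5  1,2,3  3,2,1  5,5,5  5,3,4
Results:            F      P      P      P      P      P
1: read(x,y,z);  executed by all six test cases (1 failed, 5 passed)
7: m=y;          executed by two test cases (1 failed, 1 passed)
Both statements fall into the same category, "both passed & failed test cases".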

35 Continuous Approach
Distribute statements executed by both passed and failed test cases over a color spectrum.
Indicate the relative success rate of each statement by its hue.
Discrete approach: three categories (only failed test cases; both passed & failed test cases; only passed test cases).
Continuous approach: a spectrum running from only failed test cases to only passed test cases.

36 Continuous Approach - Hue Indicate the relative success rate of each statement by its hue.

37 Continuous Approach - Brightness Statements executed more frequently are rendered brighter
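
A minimal sketch of how hue and brightness could be computed per statement. The formulas are an assumption based on the published Tarantula description (hue as the passed ratio relative to passed plus failed ratios, brightness as the larger of the two ratios); the slides themselves do not spell them out. The coverage data is worked out by hand from the mid() example.

```python
COVERAGE = {
    "3,3,5": {1, 2, 3, 4, 6, 7, 13},
    "1,2,3": {1, 2, 3, 4, 5, 13},
    "3,2,1": {1, 2, 3, 8, 9, 10, 13},
    "5,5,5": {1, 2, 3, 8, 9, 11, 13},
    "5,3,4": {1, 2, 3, 4, 6, 13},
    "2,1,3": {1, 2, 3, 4, 6, 7, 13},
}
OUTCOME = {"3,3,5": "P", "1,2,3": "P", "3,2,1": "P",
           "5,5,5": "P", "5,3,4": "P", "2,1,3": "F"}


def hue_and_brightness(statement, coverage=COVERAGE, outcome=OUTCOME):
    """Return (hue, brightness) in [0, 1], or None if the statement never runs.

    hue 0.0 means executed only by failed tests, 1.0 only by passed tests.
    """
    total_p = sum(1 for o in outcome.values() if o == "P")
    total_f = sum(1 for o in outcome.values() if o == "F")
    exec_p = sum(1 for t, s in coverage.items() if statement in s and outcome[t] == "P")
    exec_f = sum(1 for t, s in coverage.items() if statement in s and outcome[t] == "F")
    pct_p = exec_p / total_p if total_p else 0.0
    pct_f = exec_f / total_f if total_f else 0.0
    if pct_p + pct_f == 0:
        return None
    return pct_p / (pct_p + pct_f), max(pct_p, pct_f)


# Rank executed statements from most to least suspicious (lowest hue first).
executed = [s for s in range(1, 14) if hue_and_brightness(s) is not None]
ranking = sorted(executed, key=lambda s: hue_and_brightness(s)[0])
print(ranking[0], hue_and_brightness(ranking[0]))
```

Under these assumptions statement 7 gets the lowest hue in the example, while statement 1, which the discrete approach places in the same category, lands in the middle of the spectrum.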

38 Back To Example (same mid() listing, test cases, and coverage as slide 32)

39 Scalability (figure: the mid() listing from slide 31)

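40 Limitations
    double() {
      int x,d;
1:    read(x);        ●●●
2:    d=abs(x+x);     ●●●
3:    print(d);       ●●●
    }
Results: P P F
Data-related bugs: every statement is executed by every test case, so coverage cannot single out the faulty statement.
Very dependent on the test cases.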

41 Future Work What other views and analyses would be useful? What is the maximum practical number of faults for which this technique works? Visualization on higher-level representations of the code. Using visualization in other places.

42 Tarantula

43 Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael D. Ernst William G. Griswold David Notkin Jake Cockrell Dept. of Computer Science & Engineering University of Washington Dept. of Computer Science & Engineering University of California San Diego IEEE Transactions on Software Engineering 2001

44 Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael D. Ernst, William G. Griswold, Jake Cockrell, David Notkin Problem: Invariants are useful. Programmers (usually) don't write invariants. Solution: Dynamic invariant detection. Automatic tool: “Daikon”.

45 Example Example from “The Science of Programming” by Gries, 1981.

46 Example 100 randomly generated arrays. Length is uniformly distributed from 7 to 13. Elements are uniformly distributed from -100 to 100. Daikon discovers invariants by running the program on this test set and monitoring the values of the variables.

47 Example Same test set as slide 46. Invariants produced by Daikon (figure not preserved in the transcript).

48 Architecture

49 Instrumentation
At program points of interest:
  Function entry points
  Loop heads
  Function exit points
Output values of all 'interesting' variables:
  Scalar values (locals, globals, array subscript expressions, etc.)
  Arrays of scalar values
  Object addresses/ids
More kinds of invariants are checked for numeric types.

50 Types of Invariants
Variables x, y, z; constants a, b, c
Invariants over any variable x:
  Constant value: x = a
  Uninitialized: x = uninit
  Small value set: x ∈ {a, b, c} (the variable takes a small set of values)
Invariants over a single numeric variable:
  Range limits: x ≥ a, x ≤ b, a ≤ x ≤ b
  Nonzero: x ≠ 0
  Modulus: x = a (mod b)
  Nonmodulus: x ≠ a (mod b), reported only if x mod b takes on every value other than a

51 Types of Invariants
Invariants over two numeric variables x, y:
  Linear relationship: y = ax + b
  Ordering comparison: x < y, x ≤ y, x > y, x ≥ y, x = y, x ≠ y
  Functions: y = fn(x) or x = fn(y), where fn is absolute value, negation, or bitwise complement
  Invariants over x+y: invariants over a single numeric variable, with x+y substituted for the variable
  Invariants over x-y

52 Types of Invariants
Invariants over three numeric variables:
  Linear relationship: z = ax + by + c, y = ax + bz + c, x = ay + bz + c
  Functions: z = fn(x,y), where fn is min, max, multiplication, and, or, greatest common divisor, comparison, exponentiation, floating point rounding, division, modulus, left and right shifts
  All permutations of x, y, z are tested (three permutations for symmetric functions, six permutations for asymmetric functions)

53 Types of Invariants
Invariants over a single sequence variable:
  Range: minimum and maximum sequence values, ordered lexicographically
  Element ordering: nondecreasing, nonincreasing, equal
  Invariants over all sequence elements, such as each value in an array being nonnegative

54 Types of Invariants
Invariants over two sequence variables x, y:
  Linear relationship: y = ax + b, elementwise
  Comparison: x < y, x ≤ y, x > y, x ≥ y, x = y, x ≠ y, performed lexicographically
  Subsequence relationship: x is a subsequence of y
  Reversal: x is the reverse of y
Invariants over a sequence x and a numeric variable y:
  Membership: y ∈ x
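
The invariant families listed on these slides can be checked by a simple "propose and falsify" loop over recorded samples. The following is a minimal sketch under stated assumptions: it covers only a handful of the listed invariants (constant, range, nonzero, pairwise ordering), it is not Daikon's implementation, and the function name `likely_invariants` and the sample data are hypothetical.

```python
from itertools import combinations
import operator


def likely_invariants(samples):
    """samples: list of dicts (variable name -> int) recorded at one program point.

    Returns the candidate invariants that no sample falsifies.
    """
    names = sorted(samples[0])
    found = []
    # Invariants over a single variable.
    for v in names:
        values = [s[v] for s in samples]
        if len(set(values)) == 1:
            found.append(f"{v} == {values[0]}")              # constant value
        else:
            found.append(f"{min(values)} <= {v} <= {max(values)}")  # range limits
            if 0 not in values:
                found.append(f"{v} != 0")                     # nonzero
    # Ordering comparisons over pairs, strongest candidate first.
    comparisons = [("==", operator.eq), ("<", operator.lt), ("<=", operator.le),
                   (">", operator.gt), (">=", operator.ge)]
    for x, y in combinations(names, 2):
        for symbol, op in comparisons:
            if all(op(s[x], s[y]) for s in samples):
                found.append(f"{x} {symbol} {y}")
                break
    return found


# Hypothetical samples at a loop head: index i and array length n.
loop_head = [{"i": 0, "n": 7}, {"i": 3, "n": 7}, {"i": 7, "n": 7}]
print(likely_invariants(loop_head))
# prints ['0 <= i <= 7', 'n == 7', 'i <= n']
```

With only three samples these results are not statistically meaningful; the sketch omits the confidence filtering described on the later slides.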

55 Derived Variables
Variables not appearing in the source text:
  Array: length, sum, min, max
  Array and scalar: element at index, subarray
  Number of calls to a procedure
Enable inference of more complex relationships.
Staged derivation and invariant inference:
  Avoid deriving meaningless values
  Avoid computing tautological invariants

56 Invariant Confidence To make the tool useful, invariants must be supported by a statistically significant number of different values. Daikon checks the likelihood that an invariant would occur by chance. Invariants are filtered based on a minimum confidence parameter.

57 Invariant Confidence: showing x ≠ 0
Suppose x takes values in a range of size r that includes 0, and values are uniformly distributed.
The probability that a single sample of x is not 0 is 1 - 1/r.
Given s samples, the probability that x is never 0 is (1 - 1/r)^s.
If this probability is less than a user-defined confidence level, then x ≠ 0 is reported as an invariant.
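
A worked example of the test above, as a minimal sketch. The range of 201 values matches array elements drawn from -100 to 100 as in the earlier example; the threshold of 0.01 and the function names are assumptions chosen for illustration.

```python
def chance_never_zero(r, s):
    """Probability that s uniform samples from a range of size r never hit 0."""
    return (1 - 1 / r) ** s


def report_nonzero(r, s, confidence_level=0.01):
    """Report x != 0 only if seeing no zero by pure chance is unlikely enough."""
    return chance_never_zero(r, s) < confidence_level


r = 201  # values from -100 to 100
for s in (100, 1000):
    print(s, round(chance_never_zero(r, s), 4), report_nonzero(r, s))
```

With 100 samples the chance of never seeing 0 is still roughly 0.6, so x ≠ 0 would not be reported; with 1000 samples it drops below 0.01 and the invariant would be reported.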

58 Efficiency Efficiency of instrumentation Values of tracked variables are output at each instrumentation point Significant program slowdown, large amounts of trace data produced Efficiency of analysis Potentially cubic in number of variables at any program point Influenced more strongly by size of trace data

59 Limitations The instrumentation needs a large amount of disk space. A large test suite is needed. Human intervention is needed.

60 Future Work Combining this dynamic invariant detection with a static one. Extending the types of invariants. Increasing relevance.

61 Automatic Extraction of Object-Oriented Component Interfaces Monica S. Lam, John Whaley, Michael C. Martin Stanford University ISSTA 2002 ACM SIGSOFT Distinguished Paper Award, 2002

62 Automatic Extraction of Object-Oriented Component Interfaces J. Whaley, M. C. Martin, M. S. Lam
Documentation: based on the actual code, so no divergence.
Rules for static or dynamic checkers: find errors in API usage.
Find API bugs: discrepancy between code & intended API.
Dynamic extraction: evaluation of test coverage.

63 Interfaces? Interfaces are constraints on the orderings of method calls. Example, Method m1 can be called only after a call to method m2. Both methods m1 and m2 have to be called before method m3 is called.

64 Specification Use a Finite State Machine (FSM) to express ordering constraints. States correspond to methods. Transitions imply the ordering constraints. M1 → M2: method M2 can be called after method M1 is called.

65 Example: File (FSM diagram with states START, open, read, write, close, END; the transitions are not preserved in the transcript)

66 A Simple OO Component Model Each object follows an FSM model. One state per method, plus START & END states. A method call causes a transition to a new state: m1 ; m2 is legal, new state is m2. (Diagrams: the File FSM with states START, open, read, write, close, END; a transition m1 → m2.)
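
A minimal sketch of this component model: the current state is the last method called, and a call sequence is legal if every step is an allowed transition. The transition set for the File example is hypothetical, because the edges of the slide's diagram are not preserved in the transcript.

```python
START, END = "START", "END"

# Hypothetical transitions for a File component (one state per method).
FILE_MODEL = {
    (START, "open"),
    ("open", "read"), ("open", "write"), ("open", "close"),
    ("read", "read"), ("read", "write"), ("read", "close"),
    ("write", "read"), ("write", "write"), ("write", "close"),
    ("close", END),
}


def is_legal(calls, transitions):
    """Check a complete call sequence on one object against the FSM."""
    state = START
    for method in calls:
        if (state, method) not in transitions:
            return False
        state = method          # the new state is the method just called
    return (state, END) in transitions


print(is_legal(["open", "read", "write", "close"], FILE_MODEL))  # prints True
print(is_legal(["read"], FILE_MODEL))                            # prints False
```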

67 Problem 1 An object has two fields, a and b. Each field must be set before being read. (Diagram: a single FSM with states START, set_a, get_a, set_b, get_b, END; call sequence shown: set_a, get_b.)

68 Problem 1 (same content as slide 67)

69 Solution: Splitting by Fields Separate by fields into different, independent submodels. (Diagram: the combined FSM over set_a, get_a, set_b, get_b is split into one submodel with START, set_a, get_a, END and one submodel with START, set_b, get_b, END.)
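
A minimal sketch of checking a call sequence against independent per-field submodels: each submodel sees only the calls to its own methods. The transition sets are hypothetical, chosen to encode "each field must be set before being read".

```python
START = "START"

# Each submodel: (methods it owns, allowed transitions between them).
SUBMODEL_A = ({"set_a", "get_a"},
              {(START, "set_a"), ("set_a", "set_a"), ("set_a", "get_a"),
               ("get_a", "get_a"), ("get_a", "set_a")})
SUBMODEL_B = ({"set_b", "get_b"},
              {(START, "set_b"), ("set_b", "set_b"), ("set_b", "get_b"),
               ("get_b", "get_b"), ("get_b", "set_b")})


def legal_in_all_submodels(calls, submodels):
    """A sequence is legal if its projection onto every submodel is legal."""
    for methods, transitions in submodels:
        state = START
        for method in calls:
            if method not in methods:
                continue        # this call belongs to another submodel
            if (state, method) not in transitions:
                return False
            state = method
    return True


models = [SUBMODEL_A, SUBMODEL_B]
print(legal_in_all_submodels(["set_a", "set_b", "get_b", "get_a"], models))  # prints True
print(legal_in_all_submodels(["set_a", "get_b"], models))                    # prints False
```

The second call sequence is rejected because get_b is the first call the b submodel sees.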

70 Problem 2 getFileDescriptor is state-preserving. (Diagram: model for Socket with the labels start, START, create, connect, close, getFileDescriptor, END.)

71 Problem 2 getFileDescriptor is state-preserving. (Diagram: the same model for Socket, with the call sequence getFileDescriptor, connect.)

72 Solution: State-Preserving Methods m1 is state-modifying; m2 is state-preserving. m1 ; m2 is legal, new state is m1. (Diagram: model for Socket with the labels start, START, create, connect, close, END, and getFileDescriptor handled as a state-preserving method.)
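
A minimal sketch of the refinement: a state-preserving method is checked against the current state but does not change it. The Socket transitions below are hypothetical; the slide's diagram does not survive in the transcript.

```python
START = "START"

# Hypothetical Socket model.
SOCKET_TRANSITIONS = {
    (START, "create"), ("create", "connect"), ("connect", "close"),
    # getFileDescriptor may be called once the socket exists.
    ("create", "getFileDescriptor"), ("connect", "getFileDescriptor"),
}
STATE_PRESERVING = {"getFileDescriptor"}


def is_legal_with_preserving(calls, transitions, preserving):
    """Only state-modifying methods move the FSM to a new state."""
    state = START
    for method in calls:
        if (state, method) not in transitions:
            return False
        if method not in preserving:
            state = method
    return True


ok = is_legal_with_preserving(
    ["create", "getFileDescriptor", "connect", "close"],
    SOCKET_TRANSITIONS, STATE_PRESERVING)
print(ok)  # prints True
```

After create ; getFileDescriptor the state is still create, so the following connect is accepted without needing an extra getFileDescriptor → connect edge.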

73 Extraction Techniques
Static                                   Dynamic
For all possible program executions      For one particular program execution
Conservative                             Exact (for that execution)
Analyze implementation                   Analyze component usage
Detect illegal transitions               Detect legal transitions
Superset of ideal model (upper bound)    Subset of ideal model (lower bound)

74 Extracting Interface Statically The static algorithm has two main steps: 1. For each method m, identify those fields and predicates that guard whether exceptions can be thrown. 2. Find the methods m' that set those fields to values that can cause the exception. This means that immediate transitions from m' to m are illegal. The complement of the illegal transitions forms a model of the transitions accepted by the static analysis.

75 Detecting Illegal Transitions
Only simple predicates are supported: comparisons with constants, implicit null pointer checks.
Find <source, target> pairs such that:
  Source must execute: field = const;
  Target must execute: if (field == const) throw exception;

76 Static Model Extractor
Defensive programming: the implementation throws exceptions (user or system defined) on illegal input.
    public void connect() {
        connection = new Socket();
    }
    public void read() throws IOException {
        if (connection == null)
            throw new IOException();
    }
(Diagram: START, connect, read.)
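
A minimal sketch of the pairing step, working on hand-written method summaries instead of real Java code. The summary format (`sets`, `throws_if`) is hypothetical, and START is modelled as setting `connection` to null, which is an assumption about the object's initial state.

```python
NON_NULL = "non-null"  # placeholder for "some freshly allocated object"

# Hypothetical summaries a static analysis might produce for the slide's class.
SUMMARIES = {
    "START":   {"sets": {"connection": None},     "throws_if": {}},
    "connect": {"sets": {"connection": NON_NULL}, "throws_if": {}},
    "read":    {"sets": {},                       "throws_if": {"connection": None}},
}


def illegal_transitions(summaries):
    """Pairs (source, target): source sets a field to the value target throws on."""
    illegal = set()
    for source, s in summaries.items():
        for target, t in summaries.items():
            for field, const in t["throws_if"].items():
                if field in s["sets"] and s["sets"][field] == const:
                    illegal.add((source, target))
    return illegal


def accepted_transitions(summaries):
    """Complement of the illegal transitions over all (state, method) pairs."""
    methods = [m for m in summaries if m != "START"]
    all_pairs = {(s, m) for s in summaries for m in methods}
    return all_pairs - illegal_transitions(summaries)


print(illegal_transitions(SUMMARIES))  # prints {('START', 'read')}
```

Because the result is a complement, everything the analysis cannot prove illegal is accepted, which is why the static model is an upper bound on the ideal model.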

77 Dynamic Extractor Goal: find the legal transitions that occur during an execution of the program. Java bytecode instrumentation. For each thread and each instance of a class: track the last state-modifying method for each submodel. The same mechanism is used for dynamic checking: instead of adding to the model, flag an exception.
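
A minimal sketch of the dynamic side, working on a recorded call trace instead of instrumented bytecode. It is deliberately simplified: it ignores threads, submodels, and state-preserving methods, and the trace format (object id, method name) is an assumption.

```python
def extract_model(trace):
    """Collect the (previous method, method) transitions seen per object."""
    last_method = {}
    model = set()
    for obj, method in trace:
        model.add((last_method.get(obj, "START"), method))
        last_method[obj] = method
    return model


def check_trace(trace, model):
    """Same bookkeeping, but flag transitions that are missing from the model."""
    last_method = {}
    violations = []
    for obj, method in trace:
        step = (last_method.get(obj, "START"), method)
        if step not in model:
            violations.append((obj, step))
        last_method[obj] = method
    return violations


training = [("s1", "create"), ("s2", "create"), ("s1", "connect"),
            ("s1", "close"), ("s2", "connect")]
model = extract_model(training)
print(sorted(model))
print(check_trace([("s3", "connect")], model))
# prints [('s3', ('START', 'connect'))]
```

The extracted model contains only transitions that were actually exercised, so a flagged transition may simply reflect a gap in the training runs.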

78 Limitations The model is too simple: it keeps only one state of history.

79 Future Work Interfaces between classes. (Diagram with the labels START, ServerSocket.accept(), Socket.getOutputStream(), Socket.close(), END.)

80 Comparison (columns: Delta Debugging | Tarantula | Daikon | Interfaces Extraction)
Main use: Isolating the cause of a failing run | Visualization of fault localization | Extract invariants | Extract documentation
Test cases: 1 pass case, 1 fail case | Many pass / fail cases | Many pass cases (the transcript gives only three values for this row)
Examine: Program state | Source code | Program state | Program state and source code
Human involvement: Low | High | Medium (the transcript gives only three values for this row)

81 Comparison (columns: Delta Debugging | Tarantula | Daikon | Interfaces Extraction)
Efficiency: Medium | High | Low | High
Detailed result: Medium to High | Low | High | Medium
Tool availability: Available; in the near future also for Java | Not available | Available | Not available

82 Summary Programmers aren't going to be obsolete in the near future. Automatic tools can guide humans in the debugging process.

83 References
Isolating Cause-Effect Chains from Computer Programs
  Presentation of Andreas Zeller.
  Presentation of Jinlin Yang.
Visualization of Test Information to Assist Fault Localization
  Paper presentation.
  Presentation of Jinlin Yang.
  Tarantula homepage.

84 References
Dynamically Discovering Likely Program Invariants to Support Program Evolution
  Daikon homepage.
  Presentation of Tevfik Bultan.
  Presentation of Marcelo D'Amorim.
  Presentation of Joel Winstead.
  Talk slides.
  Presentation of David Hovemeyer.

85 References
Automatic Extraction of Object-Oriented Component Interfaces
  Presentation of John Whaley.
  Presentation of Tevfik Bultan.