1 Tracking Down Bugs Benny Vaksendiser
2 Overview Motivation Isolating Cause-Effect Chains from Computer Programs Visualization of Test Information to Assist Fault Localization Dynamically Discovering Likely Program Invariants to Support Program Evolution Automatic Extraction of Object-Oriented Component Interfaces Comparison Summary References
3 Motivation Improve software quality. Reduce the number of delivered faults. Lower maintenance cost.
4 Motivation – Lowering Maintenance Cost Maintenance accounts for more than 90% of total software cost. Debugging is a highly time-consuming part of it.
5 Why Is Debugging So Hard? Complexity of software. Fixing unfamiliar code. Finding the cause of a bug isn't trivial. "Corner cases". Undocumented code. Documentation often incomplete or wrong. Lack of a guiding tool.
6 Isolating Cause-Effect Chains from Computer Programs Andreas Zeller, professor at Universität des Saarlandes in Saarbrücken, Germany
7 Isolating Cause-Effect Chains from Computer Programs Andreas Zeller A passing run and a failing run Delta Debugging algorithm Isolating the cause of the failing run
8 Failing Run
double mult(double z[], int n) {
  int i, j;
  i = 0;
  for (j = 0; j < n; j++) {
    i = i + j + 1;
    z[i] = z[i] * (z[0] + 1.0);
  }
  return z[n];
}
Compiling fail.c, the GNU compiler (GCC) crashes:
linux$ gcc -O fail.c
gcc: Internal error: program cc1 got fatal signal 11
What's the error that causes this failure?
9 Cause What's the cause for the GCC failure? The cause of any event ("effect") is a preceding event without which the effect would not have occurred. — Microsoft Encarta. To prove causality, we must show that: 1. The effect occurs when the cause occurs – failing run. 2. The effect does not occur when the cause does not occur – passing run. General technique: experimentation – constructing a theory from a series of experiments (runs). Can't we automate experimentation?
10 Isolating Failure Causes
14 Isolating Failure Causes +1.0 is the failure cause – after only 19 tests
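To make the idea concrete, here is a minimal Java sketch of the simplification side of Delta Debugging (the ddmin loop), assuming a test oracle fails that stands in for re-running the compiler on a candidate input. The full cause-isolation algorithm additionally narrows the difference between the passing and failing input, but the chunk-and-retest structure is the same; names here are illustrative, not the paper's implementation.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class DeltaDebug {
    /**
     * Simplified ddmin: repeatedly remove chunks of the failing input and keep
     * any complement on which the test still fails, refining granularity when
     * no chunk can be removed.
     */
    static List<String> ddmin(List<String> input, Predicate<List<String>> fails) {
        int n = 2;  // current number of chunks
        while (input.size() >= 2) {
            int chunkSize = (int) Math.ceil(input.size() / (double) n);
            boolean reduced = false;
            for (int start = 0; start < input.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, input.size());
                // complement = input with input[start..end) removed
                List<String> complement = new ArrayList<>(input.subList(0, start));
                complement.addAll(input.subList(end, input.size()));
                if (fails.test(complement)) {       // failure persists: keep the smaller input
                    input = complement;
                    n = Math.max(n - 1, 2);
                    reduced = true;
                    break;
                }
            }
            if (!reduced) {
                if (n >= input.size()) break;       // cannot split any finer
                n = Math.min(n * 2, input.size());  // refine granularity
            }
        }
        return input;
    }

    public static void main(String[] args) {
        // Toy oracle: the "compiler" fails whenever the token "+1.0" is present.
        Predicate<List<String>> fails = in -> in.contains("+1.0");
        List<String> tokens = new ArrayList<>(
            List.of("z[i]", "=", "z[i]", "*", "(", "z[0]", "+1.0", ")"));
        System.out.println(ddmin(tokens, fails));   // prints [+1.0]
    }
}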
15 What's going on in GCC?
19 What's going on in GCC? To fix the failure, we must break this cause-effect chain.
20 Small Cause, Big Effect How do we isolate the relevant state differences?
21 Memory Graphs Vertices are variables. Edges are references.
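As an illustration only (the paper's tool extracts memory graphs from C programs via the debugger), the following Java sketch shows the structure: vertices are variables and the objects they reach, edges are the references between them.

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class MemoryGraph {
    private final Map<Object, Integer> vertices = new IdentityHashMap<>();
    private final List<String> edges = new ArrayList<>();

    /** Adds the object graph reachable from root; "name" labels the starting variable. */
    public void build(String name, Object root) {
        if (root == null || vertices.containsKey(root)) return;  // already visited (handles cycles)
        vertices.put(root, vertices.size());
        for (Field f : root.getClass().getDeclaredFields()) {
            if (f.getType().isPrimitive()) continue;             // only reference-typed fields become edges
            try {
                f.setAccessible(true);
                Object target = f.get(root);
                if (target != null) {
                    edges.add(name + "." + f.getName() + " -> " + target.getClass().getSimpleName());
                    build(name + "." + f.getName(), target);     // array elements are not followed in this sketch
                }
            } catch (ReflectiveOperationException | RuntimeException e) {
                // skip fields the JVM will not let us inspect (e.g., JDK internals)
            }
        }
    }

    public List<String> edges() { return edges; }
}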
22 The GCC Memory Graph
23 The Process in a Nutshell
26 The GCC Cause-Effect Chain
27 Submit buggy program Specify invocations Click on "Debug it" Diagnosis comes via
28 Visualization of Test Information to Assist Fault Localization Mary Jean Harrold John T. Stasko James A. Jones College of Computing Georgia Institute of Technology 24th International Conference on Software Engineering USA, May 2002
29 Visualization of Test Information to Assist Fault Localization James A. Jones, Mary Jean Harrold, John Stasko Intuition: the more frequently a statement is executed by failing test cases, the higher the probability that the statement contains the fault.
30 Discrete Approach Input: the source code and, for each test case, its pass/fail status and the statements it executes. Display statements in the program according to the test cases that execute them – statements executed by only failed test cases, by both passed & failed test cases, or by only passed test cases.
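A small sketch of this classification, assuming per-test coverage is available as sets of executed statement numbers; the class and method names are illustrative, not the paper's tool.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DiscreteColoring {
    enum Category { ONLY_FAILED, BOTH, ONLY_PASSED }

    /** coverage.get(t) = statements executed by test t; passed.get(t) = did test t pass. */
    static Map<Integer, Category> classify(List<Set<Integer>> coverage, List<Boolean> passed) {
        Set<Integer> byPassed = new HashSet<>(), byFailed = new HashSet<>();
        for (int t = 0; t < coverage.size(); t++) {
            (passed.get(t) ? byPassed : byFailed).addAll(coverage.get(t));
        }
        Map<Integer, Category> result = new HashMap<>();
        Set<Integer> all = new HashSet<>(byPassed);
        all.addAll(byFailed);
        for (int stmt : all) {
            boolean p = byPassed.contains(stmt), f = byFailed.contains(stmt);
            result.put(stmt, p && f ? Category.BOTH
                                    : (f ? Category.ONLY_FAILED : Category.ONLY_PASSED));
        }
        return result;   // statements never executed get no entry
    }
}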
31 Example
mid() {
  int x, y, z, m;
1:  read(x, y, z);
2:  m = z;
3:  if (y < z)
4:    if (x < y)
5:      m = y;
6:    else if (x < z)
7:      m = y;
8:  else
9:    if (x > y)
10:     m = y;
11:   else if (x > z)
12:     m = x;
13: print(m);
}
Test cases: (3,3,5) (1,2,3) (3,2,1) (5,5,5) (5,3,4) (2,1,3) – outcomes: P P P P P F
32 Example Coverage (number of test cases executing each statement): 1, 2, 3, 13: all six; 4, 5, 9, 10: one each; 6, 8: three each; 7: two; 11, 12: none.
34 Problem Not very helpful! Does not capture the relative frequency. With test cases (2,1,3) (3,3,5) (1,2,3) (3,2,1) (5,5,5) (5,3,4) – outcomes F P P P P P – statement 1: read(x,y,z) is executed by all six test cases while statement 7: m=y is executed by only two, yet both fall into the same "passed & failed" category.
35 Continuous Approach Distribute statements executed by both passed and failed test cases over a spectrum. Indicate the relative success rate of each statement by its hue. (Figure: legend contrasting the discrete and continuous approaches – statements executed by only failed test cases, by both passed & failed test cases, and by only passed test cases.)
36 Continuous Approach - Hue Indicate the relative success rate of each statement by its hue.
37 Continuous Approach - Brightness Statements executed more frequently are rendered brighter
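A sketch of the two mappings just described, assuming formulas in the spirit of the paper's: hue from the ratio of passed-test coverage to total coverage, brightness from the larger of the two coverage ratios. The class and method names, and the exact rendering scale, are illustrative.

public class TarantulaColor {
    // passed: passed tests covering the statement; totalPassed: all passed tests; likewise for failed
    static double hue(int passed, int totalPassed, int failed, int totalFailed) {
        double passRatio = totalPassed == 0 ? 0 : (double) passed / totalPassed;
        double failRatio = totalFailed == 0 ? 0 : (double) failed / totalFailed;
        if (passRatio + failRatio == 0) return -1;   // statement never executed
        // 0.0 = most suspicious end of the spectrum, 1.0 = safest end
        return passRatio / (passRatio + failRatio);
    }

    static double brightness(int passed, int totalPassed, int failed, int totalFailed) {
        double passRatio = totalPassed == 0 ? 0 : (double) passed / totalPassed;
        double failRatio = totalFailed == 0 ? 0 : (double) failed / totalFailed;
        return Math.max(passRatio, failRatio);       // more coverage renders brighter
    }

    public static void main(String[] args) {
        // Statement 7 of the mid() example: covered by 1 of 5 passed tests and the 1 failed test
        System.out.println(hue(1, 5, 1, 1));         // ~0.167: near the suspicious end
        System.out.println(brightness(1, 5, 1, 1));  // 1.0: fully bright
    }
}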
38 Back To Example (Figure: the mid() program and coverage from the previous example, now colored with the continuous approach.)
39 Scalability (Figure: the example program shown in a compact, zoomed-out view to illustrate how the display scales to larger programs.)
40 Limitations
double() {
  int x, d;
1: read(x);       ●●●
2: d = abs(x+x);  ●●●
3: print(d);      ●●●
}                 P P F
Every statement is executed by every test case, so the coloring cannot single out the fault. Data-related bugs. Very dependent on the test cases.
41 Future Work What other views and analyses would be useful? What is the maximum practical number of faults for which this technique works? Visualization on higher-level representations of the code. Using visualization in other places.
42 Tarantula
43 Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael D. Ernst William G. Griswold David Notkin Jake Cockrell Dept. of Computer Science & Engineering University of Washington Dept. of Computer Science & Engineering University of California San Diego IEEE Transactions on Software Engineering 2001
44 Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael D. Ernst, William G. Griswold, Jake Cockrell, David Notkin Problem Invariants are useful. Programmers (usually) don’t write invariants. Solution Dynamic Invariant Detection. Automatic tool: “Daikon“.
45 Example Example from "The Science of Programming," by Gries, 1981.
47 Example 100 randomly-generated arrays. Length is uniformly distributed from 7 to 13. Elements are uniformly distributed from –100 to 100. Daikon discovers invariants by running the program on this test set and monitoring the values of the variables. (Figure: the invariants produced by Daikon.)
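For concreteness, a Java sketch in the spirit of the Gries array-summation example, with the flavor of the invariants Daikon reports paraphrased in comments; the tool's actual output format is not reproduced here.

public class ArraySum {
    /** Sums the elements of b; the loop mirrors the Gries-style running example. */
    static int sum(int[] b) {
        int i = 0, s = 0;
        while (i != b.length) {
            s = s + b[i];
            i = i + 1;
        }
        return s;
    }

    public static void main(String[] args) {
        // Over the 100 random arrays described above, Daikon reports invariants
        // of roughly this flavor (paraphrased): at entry, 7 <= b.length <= 13 and
        // all elements lie in [-100, 100]; at exit, the result equals sum(b).
        System.out.println(sum(new int[]{1, 2, 3}));  // prints 6
    }
}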
48 Architecture
49 Instrumentation At program points of interest: Function entry points Loop heads Function exit points Output values of all `interesting' variables Scalar values (locals, globals, array subscript expressions, etc.) Arrays of scalar values Object addresses/ids More kinds of invariants checked for numeric types
50 Types of Invariants Variables x, y, z; constants a, b, c. Invariants over any variable x: constant value: x = a; uninitialized: x = uninit; small value set: x ∈ {a, b, c} (the variable takes a small set of values). Invariants over a single numeric variable: range limits: x ≥ a, x ≤ b, a ≤ x ≤ b; nonzero: x ≠ 0; modulus: x = a (mod b); nonmodulus: x ≠ a (mod b), reported only if x mod b takes on every value other than a.
51 Types of Invariants Invariants over two numeric variables x, y: linear relationship: y = ax + b; ordering comparison: x < y, x ≤ y, x > y, x ≥ y, x = y, x ≠ y; functions: y = fn(x) or x = fn(y), where fn is absolute value, negation, or bitwise complement; invariants over x+y (the single-variable invariants with x+y substituted for the variable); invariants over x−y.
52 Types of Invariants Invariants over three numeric variables: linear relationship: z = ax + by + c, y = ax + bz + c, x = ay + bz + c; functions z = fn(x, y), where fn is min, max, multiplication, and, or, greatest common divisor, comparison, exponentiation, floating point rounding, division, modulus, left and right shifts. All permutations of x, y, z are tested (three permutations for symmetric functions, six for asymmetric functions).
53 Types of Invariants Invariants over a single sequence variable Range: minimum and maximum sequence values, ordered lexicographically Element ordering: nondecreasing, nonincreasing, equal Invariants over all sequence elements: such as each value in an array being nonnegative
54 Types of Invariants Invariants over two sequence variables x, y: linear relationship: y = ax + b, elementwise; comparison: x < y, x ≤ y, x > y, x ≥ y, x = y, x ≠ y, performed lexicographically; subsequence relationship: x is a subsequence of y; reversal: x is the reverse of y. Invariants over a sequence x and a numeric variable y: membership: y ∈ x.
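To give the flavor of how such invariants are checked, here is an illustrative sketch for one of them, the linear relationship y = ax + b over two numeric variables: instantiate a and b from the first two distinct samples, then falsify against the rest. This mirrors Daikon's instantiate-then-check strategy in spirit only; the class is not part of the tool.

public class LinearInvariant {
    /** Returns true if every (xs[i], ys[i]) sample satisfies y = a*x + b for some a, b. */
    static boolean holds(double[] xs, double[] ys) {
        Integer first = null, second = null;
        for (int i = 0; i < xs.length; i++) {
            if (first == null) { first = i; }
            else if (xs[i] != xs[first]) { second = i; break; }
        }
        if (first == null || second == null) return false;  // not enough distinct samples to instantiate
        double a = (ys[second] - ys[first]) / (xs[second] - xs[first]);
        double b = ys[first] - a * xs[first];
        for (int i = 0; i < xs.length; i++) {
            if (Math.abs(ys[i] - (a * xs[i] + b)) > 1e-9) return false;  // invariant falsified
        }
        return true;
    }
}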
55 Derived Variables Variables not appearing in the source text: array length, sum, min, max; array and scalar: element at index, subarray; number of calls to a procedure. Enable inference of more complex relationships. Staged derivation and invariant inference: avoid deriving meaningless values; avoid computing tautological invariants.
56 Invariant Confidence To make the tool useful, invariants must be supported by a statistically significant number of different values. Daikon checks the likelihood that an invariant would occur by chance. Invariants are filtered based on a minimum confidence parameter.
57 Invariant Confidence – showing x ≠ 0 Suppose x ranges over a set of size r. The probability that a single sample of x is not 0 is 1 − 1/r. Given s samples, the probability that x is never 0 by chance is (1 − 1/r)^s. If this probability is less than a user-defined confidence level, then x ≠ 0 is reported as an invariant.
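The same calculation as a small sketch; confidenceLimit stands in for Daikon's user-settable threshold and the method name is illustrative.

public class InvariantConfidence {
    /** r: size of x's value range; s: number of samples observed. */
    static boolean reportNonZero(int r, int s, double confidenceLimit) {
        double chance = Math.pow(1.0 - 1.0 / r, s);  // probability of never seeing 0 purely by chance
        return chance < confidenceLimit;             // report only if coincidence is unlikely
    }

    public static void main(String[] args) {
        System.out.println(reportNonZero(100, 5, 0.01));    // false: 5 samples prove little
        System.out.println(reportNonZero(100, 1000, 0.01)); // true: 1000 nonzero samples are convincing
    }
}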
58 Efficiency Efficiency of instrumentation Values of tracked variables are output at each instrumentation point Significant program slowdown, large amounts of trace data produced Efficiency of analysis Potentially cubic in number of variables at any program point Influenced more strongly by size of trace data
59 Limitations The instrumentation requires a large amount of disk space. A large test suite is needed. Human intervention is needed.
60 Future Work Combining this dynamic invariant detection with a static one. Extending the types of invariants. Increasing relevance.
61 Automatic Extraction of Object-Oriented Component Interfaces John Whaley, Michael C. Martin, Monica S. Lam Stanford University ISSTA 2002 ACM SIGSOFT Distinguished Paper Award, 2002
62 Automatic Extraction of Object-Oriented Component Interfaces J. Whaley, M. C. Martin, M. S. Lam Documentation Based on the actual code, so no divergence Rules for static or dynamic checkers Find errors in API usage Find API bugs Discrepancy between code & intended API Dynamic extraction: Evaluation of test coverage
63 Interfaces? Interfaces are constraints on the orderings of method calls. Examples: Method m1 can be called only after a call to method m2. Both methods m1 and m2 have to be called before method m3 is called.
64 Specification Use a Finite State Machine (FSM) to express ordering constraints. States correspond to methods; transitions imply the ordering constraints. (Figure: an edge M1 → M2 means method M2 can be called after method M1 is called.)
65 Example: File (Figure: FSM with states START, open, read, write, close, END.)
66 A Simple OO Component Model Each object follows an FSM model. One state per method, plus START & END states. A method call causes a transition to a new state: if m1 ; m2 is legal, the new state after m2 is m2. (Figure: the file FSM from the previous slide.)
67 Problem 1 An object has two fields, a and b. Each field must be set before being read. (Figure: a single FSM over set_a, get_a, set_b, get_b – with one state per method, a single model cannot express the two independent ordering constraints.)
69 Solution: Splitting by Fields Separate by fields into different, independent submodels. (Figure: one submodel START → set_a → get_a → END and another START → set_b → get_b → END.)
70 Problem 2 getFileDescriptor is state-preserving. (Figure: model for Socket with create, connect, close, and getFileDescriptor – treating getFileDescriptor as a state of its own obscures the real ordering constraints.)
72 Solution: State-Preserving Methods If m1 is state-modifying and m2 is state-preserving, then m1 ; m2 is legal and the new state remains m1. (Figure: the Socket model with getFileDescriptor factored out as a state-preserving method.)
73 Extraction Techniques
Static: for all possible program executions; conservative; analyzes the implementation; detects illegal transitions; superset of the ideal model (upper bound).
Dynamic: for one particular program execution; exact (for that execution); analyzes component usage; detects legal transitions; subset of the ideal model (lower bound).
74 Extracting the Interface Statically The static algorithm has two main steps: 1. For each method m, identify the fields and predicates that guard whether exceptions can be thrown. 2. Find the methods m′ that set those fields to values that can cause the exception; immediate transitions from m′ to m are therefore illegal. The complement of the illegal transitions forms the model of transitions accepted by the static analysis.
75 Detecting Illegal Transitions Only simple predicates are supported: comparisons with constants and implicit null-pointer checks. Find pairs such that the source must execute field = const; and the target must execute if (field == const) throw exception;
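A toy sketch of this pairing rule over a hand-built summary of each method (which field/constant it assigns, and which field/constant guards a thrown exception). The real analysis derives these facts from Java bytecode, so MethodSummary and its fields are purely illustrative.

import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class IllegalTransitions {
    /** Hand-built stand-in for facts the real analysis extracts from bytecode. */
    record MethodSummary(String name, String assignsField, Object assignsValue,
                         String guardField, Object guardValue) {}

    static List<String> find(List<MethodSummary> methods) {
        List<String> illegal = new ArrayList<>();
        for (MethodSummary source : methods) {
            if (source.assignsField() == null) continue;
            for (MethodSummary target : methods) {
                // target throws if (guardField == guardValue); source establishes exactly that
                if (source.assignsField().equals(target.guardField())
                        && Objects.equals(source.assignsValue(), target.guardValue())) {
                    illegal.add(source.name() + " -> " + target.name());
                }
            }
        }
        return illegal;
    }

    public static void main(String[] args) {
        // e.g., close() sets connection = null; read() throws if (connection == null)
        List<MethodSummary> ms = List.of(
            new MethodSummary("close", "connection", null, null, null),
            new MethodSummary("read", null, null, "connection", null));
        System.out.println(find(ms));   // prints [close -> read]
    }
}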
76 Static Model Extractor Defensive programming: the implementation throws exceptions (user- or system-defined) on illegal input.
public void connect() {
  connection = new Socket();
}
public void read() {
  if (connection == null)
    throw new IOException();
}
(Figure: extracted model START → connect → read.)
77 Dynamic Extractor Goal: find the legal transitions that occur during an execution of the program. Java bytecode instrumentation. For each thread and each instance of a class: track the last state-modifying method for each submodel. The same mechanism works for dynamic checking: instead of adding to the model, flag an exception.
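A minimal sketch of the dynamic extractor's bookkeeping under the simple component model above: per tracked object, remember the last state-modifying method and record each observed transition as legal. The sketch collapses per-thread and per-submodel tracking into one map; class and method names are illustrative, not the paper's instrumentation API.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DynamicModelExtractor {
    private final Map<Object, String> lastState = new HashMap<>();
    private final Set<String> legalTransitions = new HashSet<>();

    /** Called by instrumentation whenever a tracked object's method is invoked. */
    public void onCall(Object receiver, String method, boolean statePreserving) {
        String from = lastState.getOrDefault(receiver, "START");
        legalTransitions.add(from + " -> " + method);
        if (!statePreserving) {
            lastState.put(receiver, method);  // state-preserving calls leave the state unchanged
        }
    }

    /** Called when the object is no longer used. */
    public void onEnd(Object receiver) {
        String from = lastState.getOrDefault(receiver, "START");
        legalTransitions.add(from + " -> END");
        lastState.remove(receiver);
    }

    public Set<String> model() { return legalTransitions; }
}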
78 Limitations The model is too simple – it keeps only one state of history.
79 Future Work Interfaces between classes. (Figure: a cross-class model relating ServerSocket.accept(), Socket.getOutputStream(), and Socket.close() between START and END.)
80 Comparison
Main Use – Delta Debugging: isolating the cause of a failing run; Tarantula: visualization of fault localization; Daikon: extracting invariants; Interfaces Extraction: extracting documentation.
Test Cases – Delta Debugging: 1 passing and 1 failing case; Tarantula: many passing/failing cases; Daikon: many passing cases.
Examines – Delta Debugging: program state; Tarantula: source code; Daikon: program state; Interfaces Extraction: program state and source code.
Human involvement – Delta Debugging: low; Tarantula: high; Daikon: medium.
81 Comparison
Efficiency – Delta Debugging: medium; Tarantula: high; Daikon: low; Interfaces Extraction: high.
Detailed Result – Delta Debugging: medium–high; Tarantula: low; Daikon: high; Interfaces Extraction: medium.
Tool Availability – Delta Debugging: available (in the near future also for Java); Tarantula: not available; Daikon: available; Interfaces Extraction: not available.
82 Summary Programmers aren't going to become obsolete in the near future. Automatic tools can guide humans in the debugging process.
83 References Isolating Cause-Effect Chains from Computer Programs. Presentation of Andreas Zeller. Presentation of Jinlin Yang. Visualization of Test Information to Assist Fault Localization. Paper presentation. Presentation of Jinlin Yang. Tarantula homepage.
84 References Dynamically Discovering Likely Program Invariants to Support Program Evolution. Daikon homepage. Presentation of Tevfik Bultan. Presentation of Marcelo D'Amorim. Presentation of Joel Winstead. Talk slides. Presentation of David Hovemeyer.
85 References Automatic Extraction of Object-Oriented Component Interfaces. Presentation of John Whaley. Presentation of Tevfik Bultan.