50.530: Software Engineering

Sun Jun SUTD

Week 5: Bug Localization

Debugging
Question 1: Bug Localization/Identification. How do we know where the bugs are?
Question 2: Bug Fixing. How do we fix the bugs automatically?
Last class, we studied an approach for minimizing changes/differences in a test input.

Research Question
Assume there is at least one failed test case. Where is the bug?

Where is the bug?
[Figure: control-flow graph of max with nodes 1 to 11, running from the input list through max = list[0], i = 1, the loop test (i < list.length / i >= list.length), the comparison (max < list[i] / max >= list[i]), max = list[i], i++ and previous = max, to return max and the output.]
The input is wrong! The output is wrong!

Where is the bug?
[Figure: execution trace from the input through numbered steps 1 … 80 to the output.]
This might be when it went wrong?

Define Bugs
Where is a bug? Or equivalently, what is a bug?
    public int max(int[] list) {
        int max = list[0];
        for (int i = 1; i < list.length-1; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        return max;
    }
Intuitively, the bug should be wherever you want to fix it?
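To make the "at least one failed test case" premise concrete, here is a minimal sketch that runs the max method from the slide on one passing and one failing input. The class name MaxDemo, the method name buggyMax, and the two inputs are assumptions made for illustration; only the loop body comes from the slide.

```java
class MaxDemo {
    // The max method as written on the slide: the loop bound
    // list.length - 1 means the last element is never compared.
    static int buggyMax(int[] list) {
        int max = list[0];
        for (int i = 1; i < list.length - 1; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Passing test: the maximum is not in the last position.
        System.out.println(buggyMax(new int[] {3, 1, 2})); // prints 3
        // Failing test: the maximum is in the last position.
        System.out.println(buggyMax(new int[] {1, 2, 3})); // prints 2, expected 3
    }
}
```

Both tests execute exactly the same statements, so the failure alone does not say which statement to blame.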

Fix 1:
    public int max(int[] list) {
        int max = list[0];
        for (int i = 1; i < list.length; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        return max;
    }
Fix 2:
    public int max(int[] list) {
        int max = list[list.length-1];
        for (int i = 0; i < list.length-1; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        return max;
    }
Fix 3:
    public int max(int[] list) {
        int max = list[0];
        for (int i = 1; i < list.length-1; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        return Math.max(max, list[list.length-1]);
    }
How could we reasonably assume where the bug is?

Ideally
This is what the programmer wrote:
    public int max(int[] list) {
        int max = list[0];
        for (int i = 1; i < list.length-1; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        return max;
    }
This is what the programmer wants:
    public int max(int[] list) {
        int max = list[0];
        for (int i = 1; i < list.length; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        return max;
    }
Obviously, the bug is at …

Ideally
This is what the programmer wrote, annotated with what the programmer wants at each step:
    public int max(int[] list) {
        // I want max to be the first element in the array here.
        int max = list[0];
        // I want to traverse through all but the first element in the array.
        for (int i = 1; i < list.length-1; i++) {
            // I want to set max to be the bigger one of max and list[i].
            if (max < list[i]) {
                max = list[i];
            }
        }
        // I want to return the maximum number in the array.
        return max;
    }
And so, the bug is at …

Where is the bug?
Where the bug is depends on what the programmer wants at each step. How do we know what the programmer wants (at each step of the program)?

James A. Jones et al., ASE 2005, “Empirical Evaluation of the Tarantula Automatic Fault-Localization Technique”

Where is the bug?
[Figure: execution trace from the input through numbered steps 1 … 80 to the output.]
This might be when it went wrong? We don’t know what the programmer wants, but we do know something has to be fixed along this trace.

Coverage-based Methods
A statement is more likely to be buggy if it is visited more often in failed test cases and less often in passed test cases. Is this justified?

A: Set Union/Intersection
Assume there are one or more passed test cases {p0, p1, p2, …} and one failed test case f.
Proposal 1 (set union): the bug is contained in the set {statements executed by f} – {statements executed by any pi}.
Proposal 2 (set intersection): the bug is contained in the set {statements executed by all pi} – {statements executed by f}.
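The two proposals can be sketched over coverage sets of statement numbers. This is an illustrative sketch, not the evaluated implementation: the class and method names and the coverage data are invented, and Proposal 2 is read as the set-intersection technique from Jones et al., that is, statements executed by every passed test case minus those executed by f. That reading is an assumption.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetBasedLocalization {
    // Proposal 1 (set union): statements executed by f
    // but by none of the passed test cases.
    static Set<Integer> unionProposal(Set<Integer> failed, List<Set<Integer>> passed) {
        Set<Integer> result = new TreeSet<>(failed);
        for (Set<Integer> p : passed) {
            result.removeAll(p);
        }
        return result;
    }

    // Proposal 2 (set intersection, assumed reading): statements executed
    // by every passed test case but not by f.
    static Set<Integer> intersectionProposal(Set<Integer> failed, List<Set<Integer>> passed) {
        if (passed.isEmpty()) {
            return new TreeSet<>();
        }
        Set<Integer> result = new TreeSet<>(passed.get(0));
        for (Set<Integer> p : passed) {
            result.retainAll(p);
        }
        result.removeAll(failed);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical coverage: each set holds executed statement numbers.
        List<Set<Integer>> passed = List.of(Set.of(1, 2, 3, 5, 7), Set.of(1, 2, 4, 5, 7));
        Set<Integer> failed = Set.of(1, 2, 3, 5, 6);
        System.out.println(unionProposal(failed, passed));        // prints [6]
        System.out.println(intersectionProposal(failed, passed)); // prints [7]
    }
}
```

Adding one more passed test case that happens to execute statement 6 would empty the first result, which illustrates how sensitive the technique is to the test suite.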

Exercise 1: Apply A
Why must the bug be at line 7?

Exercise 2: Apply A
A performs poorly in practice: its result is sensitive to the choice of test cases.

B: SDG Ranking Techniques
Identify the initial set of blamed statements; rank them the highest.
Find statements at distance 1 from the initial set in the SDG (system dependence graph), in both forward and backward direction; rank them the second highest.
Find statements at distance 2, and so on.
Is this justified intuitively? B performs poorly in practice.
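The ranking above amounts to a breadth-first search from the blamed statements over dependence edges followed in both directions. A minimal sketch, assuming the dependence graph is already available as a list of directed edges between statement numbers; the class name, the edge encoding and the example graph are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class SdgRanking {
    // Returns, for every reachable statement, its distance from the
    // initially blamed set. Smaller distance means higher rank.
    static Map<Integer, Integer> rank(int[][] edges, Set<Integer> blamed) {
        // Store each directed edge in both directions, because the
        // technique searches forward and backward.
        Map<Integer, List<Integer>> neighbours = new HashMap<>();
        for (int[] e : edges) {
            neighbours.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            neighbours.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        Map<Integer, Integer> distance = new TreeMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        for (int s : blamed) {
            distance.put(s, 0);
            queue.add(s);
        }
        while (!queue.isEmpty()) {
            int current = queue.remove();
            for (int next : neighbours.getOrDefault(current, List.of())) {
                if (!distance.containsKey(next)) {
                    distance.put(next, distance.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        // Hypothetical dependence edges: 1 -> 2 -> 3 -> 4 and 2 -> 5.
        int[][] edges = {{1, 2}, {2, 3}, {3, 4}, {2, 5}};
        System.out.println(rank(edges, Set.of(3)));
    }
}
```

With statement 3 blamed, statements 2 and 4 are at distance 1 and statements 1 and 5 at distance 2.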

C: Nearest Neighbor
Pick a passed test case p which is the nearest to the failed test case f.
The distance between f and p could be defined as the number of statements executed by f which are not executed by p.
Report f - p as the potential bugs.
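A minimal sketch of nearest neighbor with the distance defined above (statements executed by f but not by p). The class name, method names and coverage data are assumptions; ties are broken by taking the first nearest passed run, which the slide does not specify.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class NearestNeighbor {
    // Distance from f to p: number of statements executed by f but not by p.
    static int distance(Set<Integer> failed, Set<Integer> passed) {
        Set<Integer> diff = new TreeSet<>(failed);
        diff.removeAll(passed);
        return diff.size();
    }

    // Picks the passed run nearest to f and reports f - p.
    static Set<Integer> report(Set<Integer> failed, List<Set<Integer>> passedRuns) {
        Set<Integer> nearest = null;
        for (Set<Integer> p : passedRuns) {
            if (nearest == null || distance(failed, p) < distance(failed, nearest)) {
                nearest = p;
            }
        }
        Set<Integer> result = new TreeSet<>(failed);
        if (nearest != null) {
            result.removeAll(nearest);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical coverage sets of executed statement numbers.
        Set<Integer> failed = Set.of(1, 2, 3, 5, 6);
        List<Set<Integer>> passedRuns = List.of(Set.of(1, 2, 4, 7), Set.of(1, 2, 3, 5, 7));
        System.out.println(report(failed, passedRuns)); // prints [6]
    }
}
```

Unlike method A, only one passed run is used, so a single unrelated passed test case cannot erase the report.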

Exercise 3: Apply C

Exercise 4: Apply C

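D: Tarantula
Each statement e in the program is assigned a suspiciousness score
    suspiciousness(e) = (failed(e) / totalfailed) / (passed(e) / totalpassed + failed(e) / totalfailed)
where failed(e) is the number of failed test cases that executed statement e one or more times, passed(e) is the corresponding number of passed test cases, and totalfailed and totalpassed are the total numbers of failed and passed test cases.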
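A small sketch of the score, assuming the standard Tarantula formula from the cited paper: the fraction of failed test cases that execute e, divided by the sum of that fraction and the fraction of passed test cases that execute e. The class name, parameter names and the choice to return 0 for a statement no test executes are assumptions.

```java
class Tarantula {
    // failedE / passedE: failed and passed test cases that executed statement e.
    // totalFailed / totalPassed: sizes of the failed and passed test sets.
    static double suspiciousness(int failedE, int passedE, int totalFailed, int totalPassed) {
        double failRatio = totalFailed == 0 ? 0.0 : (double) failedE / totalFailed;
        double passRatio = totalPassed == 0 ? 0.0 : (double) passedE / totalPassed;
        if (failRatio + passRatio == 0.0) {
            return 0.0; // assumption: a statement no test executes is not suspicious
        }
        return failRatio / (passRatio + failRatio);
    }

    public static void main(String[] args) {
        // One failed and three passed test cases in total.
        System.out.println(suspiciousness(1, 0, 1, 3)); // executed only by the failed test
        System.out.println(suspiciousness(1, 3, 1, 3)); // executed by every test
        System.out.println(suspiciousness(1, 1, 1, 3)); // failed test plus one passed test
    }
}
```

A statement executed only by failed tests scores 1, a statement executed by every test scores 0.5, and statements are then examined in decreasing order of score.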

Exercise 5: Apply D

Research Discussion
Are these methods (A, B, C, D) justified?

Andreas Zeller, FSE 2002, “Isolating Cause-Effect Chains from Computer Programs”

Example
Recall that the following example crashes the GCC compiler. What caused this crash, and how do we debug the GCC compiler?

We know
[Figure: two execution traces drawn as sequences of numbered program states.]

DD# Algorithm
    algorithm DD#(P, F, N) {
        partition F - P into X1, X2, …, XN equally;
        if (test(P union Xi) = FAIL for some Xi) {   // reduce F
            return DD#(P, P union Xi, 2);
        }
        if (test(F - Xi) = PASS for some Xi) {       // grow P
            return DD#(F - Xi, F, 2);
        }
        if (test(P union Xi) = PASS for some Xi) {   // grow P
            return DD#(P union Xi, F, max(N-1, 2));
        }
        if (test(F - Xi) = FAIL for some Xi) {       // reduce F
            return DD#(P, F - Xi, max(N-1, 2));
        }
        if (N < |F - P|) {                           // increase granularity
            return DD#(P, F, min(2N, |F - P|));
        }
        return (P, F);
    }
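The pseudocode can be turned into a short runnable sketch. Everything beyond the five cases is an assumption made for illustration: configurations are sets of integer change identifiers with P a subset of F, the Outcome enum and helper names are invented, and an explicit stop is added when F and P differ by at most one change, since such a difference cannot be split further.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

class DeltaDebugging {
    enum Outcome { PASS, FAIL, UNRESOLVED }

    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    static Set<Integer> minus(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }

    // Returns [P', F']: a passing and a failing configuration whose
    // difference cannot be narrowed further by the five cases.
    static List<Set<Integer>> dd(Set<Integer> p, Set<Integer> f, int n,
                                 Function<Set<Integer>, Outcome> test) {
        List<Integer> delta = new ArrayList<>(minus(f, p)); // sorted: minus returns a TreeSet
        int size = delta.size();
        if (size <= 1) {
            return List.of(p, f); // added stop: nothing left to partition
        }
        n = Math.max(2, Math.min(n, size));
        List<Set<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new TreeSet<>(delta.subList(i * size / n, (i + 1) * size / n)));
        }
        for (Set<Integer> x : parts) { // reduce F
            if (test.apply(union(p, x)) == Outcome.FAIL) return dd(p, union(p, x), 2, test);
        }
        for (Set<Integer> x : parts) { // grow P
            if (test.apply(minus(f, x)) == Outcome.PASS) return dd(minus(f, x), f, 2, test);
        }
        for (Set<Integer> x : parts) { // grow P
            if (test.apply(union(p, x)) == Outcome.PASS) return dd(union(p, x), f, Math.max(n - 1, 2), test);
        }
        for (Set<Integer> x : parts) { // reduce F
            if (test.apply(minus(f, x)) == Outcome.FAIL) return dd(p, minus(f, x), Math.max(n - 1, 2), test);
        }
        if (n < size) { // increase granularity
            return dd(p, f, Math.min(2 * n, size), test);
        }
        return List.of(p, f);
    }

    // Hypothetical scenario: the run fails only when changes 3 and 6 are both applied.
    static List<Set<Integer>> example() {
        Function<Set<Integer>, Outcome> test =
            c -> (c.contains(3) && c.contains(6)) ? Outcome.FAIL : Outcome.PASS;
        return dd(new TreeSet<>(), new TreeSet<>(Set.of(1, 2, 3, 4, 5, 6, 7, 8)), 2, test);
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints [[5, 6, 7, 8], [3, 5, 6, 7, 8]]
    }
}
```

In the example the returned pair differs only in change 3: with 6 already present, adding 3 turns a passing run into a failing one. The UNRESOLVED outcome is declared for completeness; the example test never returns it.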

DD#: Application
Minimum failure-inducing difference

Now We Know
[Figure: two execution traces drawn as sequences of numbered program states.]

Question
How is the information on the minimum failure-inducing difference useful? Why can’t we simply compare the two traces and find where the bug is?

The Idea
[Figure: two execution traces drawn as sequences of numbered program states.]
What is the failure-inducing difference here? Or at any state?

The Idea
We arbitrarily change a program state (in the trace) so as to find the minimum failure-inducing difference.

What Is Required
A debugging tool which allows us to retrieve and alter variables and values.
A successful test case, a failed test case, and a common location L.

Delta Debugging
Your question: how is this useful? So obviously the problem is with 0x81fc4e4. But is this useful?

More Delta Debugging
0x81fc4a0 and 0x81fc4e4 are memory references, so they can’t be compared directly. Rather, compare the memory graphs at these two states and find the exact difference.

The Memory Graph

Comparing Memory Graphs
To use Delta Debugging, we need to define a set of changes on the good memory graph so that it transforms to the bad one.

Changes on Memory Graphs

Changes on Memory Graphs
Compute the largest common subgraph of the two graphs.
For every vertex which is not part of the common subgraph, either insert it or delete it.

A Real Memory Graph

Recap
We now know how to apply Delta Debugging to two program states during two traces.
We still need to solve:
given a sequence of program states, which ones to compare?
how to present the result to the users?

Where to Compare
We compare states which have an identical program counter and the same local variables.
The paper suggests three places:
Shortly after the program starts
In the middle of the program run
Shortly before the failure
Why can’t we compare states with different local variables or program counters? Are the places justified?

GCC: Shortly After Start
Memory graph: vertices and edges The only difference is in argv[2], i.e., the names of the input source files. Useful?

GCC: In the Middle
Memory graph: vertices and edges
The difference is the insertion of a node in the RTL tree containing a PLUS operator. Useful?

GCC: Shortly before Failure
Memory graph: vertices and edges
The only difference is a single pointer adjustment: set variable link->fld[0].rtx->fld[0].rtx = link. Useful?

Report from Delta Debugging
Useful?

Recap
We now know:
how to apply Delta Debugging to multiple pairs of program states during two traces
how the results are presented to users (using the minimum failure-inducing differences)
We still need to solve: where is the bug?

Fix 1:
    public int max(int[] list) {
        int max = list[0];
        for (int i = 1; i < list.length; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        return max;
    }
Fix 2:
    public int max(int[] list) {
        int max = list[list.length-1];
        for (int i = 0; i < list.length-1; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        return max;
    }
Fix 3:
    public int max(int[] list) {
        int max = list[0];
        for (int i = 1; i < list.length-1; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        return Math.max(max, list[list.length-1]);
    }
Which fix the programmer has in mind determines where the bug is! In other words, where the bug is depends on what the programmer wants at each step.

Bug Localization with DD
Iteration 1: The programmer is fine with everything up to the start of GCC. The failure is not what the programmer wants. The bug must be in between.

Iteration 2: This is what the programmer wants. The failure is not what the programmer wants. The bug must be in between.

Iteration 3: This is what the programmer wants. This is not what the programmer wants. The bug must be in between.

The bug is narrowed down to lines 4013 to 4019, which is where the bug truly is.

Quick Summary
Identify the minimum difference between two traces.
Programmers then look at the minimum difference to locate the bug step by step.

Holger Cleve et al., ICSE 2005, “Locating Causes of Program Failures”

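Previous Work
[Figure: an execution trace drawn as a sequence of numbered program states, annotated at two states.]
At one state, the failure-inducing difference is on x’s value. At a later state, the failure-inducing difference is on y’s value.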

Motivation
[Figure: an execution trace drawn as a sequence of numbered program states, annotated at two states.]
At one state, the failure-inducing difference is on x’s value. At a later state, the failure-inducing difference is on y’s value.
y’s value is related to x: the failure cause transfers from x to y. Focusing on the failure transfer might help find the bug. Is this justified?

Motivation: Example
    $ sample 9 8 7
    7 8 9
    $ sample 11 14
    0 11
When i = 1, nothing happens; when i = 2, ….
Where is the bug?

Delta Debugging
Comparing the program states at line 9 reveals that a[2] is the failure cause, i.e., changing its value would make the test case fail or succeed.

At line 28, argc is the failure cause.
At line 9, a[2] is the failure cause.
In between is the part where a[2]’s value is affected by argc. It perhaps contains the bug!
When i = 1, nothing happens; when i = 2, ….

[Figure labels: the search order; the variable whose value is related to the minimum failure-inducing difference; where the transfer happens.]

Case Study: GCC
The actual bug is during this “transfer”.

Assumptions Revisited
For Delta Debugging to work, the following have to be true:
The DD# algorithm finds the correct cause of the failure.
The failure transfer statements are where the programmer would like to fix the bug (and hence where the bug is).

Problematic?
Assume we are given the following program and Delta Debugging run. What do we make of the result?

Empirical Study
Task: find the bug.
5 contenders: Set Union, Set Intersection, Nearest Neighbor, Tarantula and Delta Debugging
2 measures:
Effectiveness: the rank of the buggy statement
Efficiency: the time consumption of each method

Objects of Analysis
122 out of 132 versions are used for the analysis. Is this fair to DD?

Effectiveness Results
NN: Nearest Neighbor (with different distance calculation); CT: DD (with different ranking techniques)

Effectiveness Results

Efficiency Results

Discussion
Is this conclusive?
Can we generalize the results to other programs?

Exercise 6
Take this program and this input as an example. Argue whether the 5 bug localization methods would be able to find the bug.

Where is the bug?
Where the bug is depends on what the programmer wants at each step. How do we know what the programmer wants (at each step of the program)?
