1
Delta Debugging
Xiangyu Zhang (slides adapted from Tevfik Bultan's)
2
Problem
In 1999 Bugzilla, the bug database for the Mozilla browser, listed more than 370 open bugs.
Each bug in the database describes a scenario that caused the software to fail. A scenario includes inputs, machine configurations, etc.
These scenarios are not simplified.
They may contain a lot of irrelevant information.
Many of the bug reports could be equivalent.
Overwhelmed with this work, the Mozilla developers sent out a call for volunteers to process the bug reports by producing simplified bug reports.
Simplifying means turning the bug reports into minimal test cases, where every part of the input is significant in reproducing the failure.
3
An Example Bug Report Printing the following file causes Mozilla to crash: <td align=left valign=top> <SELECT NAME="op sys" MULTIPLE SIZE=7> <OPTION VALUE="All">All<OPTION VALUE="Windows 3.1">Windows 3.1<OPTION VALUE="Windows 95">Windows 95<OPTION VALUE="Windows 98">Windows 98<OPTION VALUE="Windows ME">Windows ME<OPTION VALUE="Windows 2000">Windows 2000<OPTION VALUE="Windows NT">Windows NT<OPTION VALUE="Mac System 7">Mac System 7<OPTION VALUE="Mac System 7.5">Mac System 7.5<OPTION VALUE="Mac System 7.6.1">Mac System 7.6.1<OPTION VALUE="Mac System 8.0">Mac System 8.0<OPTION VALUE="Mac System 8.5">Mac System 8.5<OPTION VALUE="Mac System 8.6">Mac System 8.6<OPTION VALUE="Mac System 9.x">Mac System 9.x<OPTION VALUE="MacOS X">MacOS X<OPTION VALUE="Linux">Linux<OPTION VALUE="BSDI">BSDI<OPTION VALUE="FreeBSD">FreeBSD<OPTION VALUE="NetBSD">NetBSD<OPTION VALUE="OpenBSD">OpenBSD<OPTION VALUE="AIX">AIX<OPTION Continued in the next page
4
VALUE="BeOS">BeOS<OPTION VALUE="HP-UX">HP-UX<OPTION
VALUE="IRIX">IRIX<OPTION VALUE="Neutrino">Neutrino<OPTION VALUE="OpenVMS">OpenVMS<OPTION VALUE="OS/2">OS/2<OPTION VALUE="OSF/1">OSF/1<OPTION VALUE="Solaris">Solaris<OPTION VALUE="SunOS">SunOS<OPTION VALUE="other">other</SELECT></td> <td align=left valign=top> <SELECT NAME="priority" MULTIPLE SIZE=7> <OPTION VALUE="--">--<OPTION VALUE="P1">P1<OPTION VALUE="P2">P2<OPTION VALUE="P3">P3<OPTION VALUE="P4">P4<OPTION VALUE="P5">P5</SELECT> </td> <SELECT NAME="bug severity" MULTIPLE SIZE=7> <OPTION VALUE="blocker">blocker<OPTION VALUE="critical">critical<OPTION VALUE="major">major<OPTION VALUE="normal">normal<OPTION VALUE="minor">minor<OPTION VALUE="trivial">trivial<OPTION VALUE="enhancement">enhancement</SELECT> </tr> </table>
5
Delta-Debugging
It is hard to figure out the real cause of the failure just by staring at that file.
Finding the error would be much easier if we could simplify the input file and still trigger the same failure.
A more desirable bug report looks like this: "Printing an HTML file which consists of <SELECT> causes Mozilla to crash."
The question is: can we automate this?
The technique is due to Andreas Zeller.
6
Overview
Let's use a smaller bug report as a running example. When Mozilla tries to print the following HTML input, it crashes:
<SELECT NAME="priority" MULTIPLE SIZE=7>
How do we go about simplifying this input?
Manually: remove parts of the input and see if it still causes the program to crash.
For the above example, assume that we remove characters from the input file.
Potentially we need $2^N$ tries, where $N$ is the length of the string, because every subset of the characters is a candidate input.
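To get a feel for why exhaustive search is hopeless, the number of candidate inputs for the running example can be worked out directly. This is only a back-of-the-envelope sketch; the variable names are illustrative and not part of the slides.

```python
# The running example from the slide above.
html_input = '<SELECT NAME="priority" MULTIPLE SIZE=7>'

# Each character can independently be kept or removed,
# so every subset of the characters is a potential test.
n = len(html_input)
candidates = 2 ** n

print(f"{n} characters -> {candidates} candidate inputs")
```

Even at one test per millisecond, trying every candidate would take decades, which is why a systematic reduction strategy is needed.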
7
Bold parts remain in the input, the rest is removed
1 <SELECT NAME="priority" MULTIPLE SIZE=7> F
2 <SELECT NAME="priority" MULTIPLE SIZE=7> P
3 <SELECT NAME="priority" MULTIPLE SIZE=7> P
4 <SELECT NAME="priority" MULTIPLE SIZE=7> P
5 <SELECT NAME="priority" MULTIPLE SIZE=7> F
6 <SELECT NAME="priority" MULTIPLE SIZE=7> F
7 <SELECT NAME="priority" MULTIPLE SIZE=7> P
8 <SELECT NAME="priority" MULTIPLE SIZE=7> P
9 <SELECT NAME="priority" MULTIPLE SIZE=7> P
10 <SELECT NAME="priority" MULTIPLE SIZE=7> F
11 <SELECT NAME="priority" MULTIPLE SIZE=7> P
12 <SELECT NAME="priority" MULTIPLE SIZE=7> P
13 <SELECT NAME="priority" MULTIPLE SIZE=7> P
F means the input caused the failure.
P means the input did not cause the failure (the input passed), i.e., the program produced the desired output. For example, if the input is syntactically incorrect, the desired output is a syntax error message.
8
14 <SELECT NAME="priority" MULTIPLE SIZE=7> P
16 <SELECT NAME="priority" MULTIPLE SIZE=7> F
17 <SELECT NAME="priority" MULTIPLE SIZE=7> F
18 <SELECT NAME="priority" MULTIPLE SIZE=7> F
19 <SELECT NAME="priority" MULTIPLE SIZE=7> P
20 <SELECT NAME="priority" MULTIPLE SIZE=7> P
21 <SELECT NAME="priority" MULTIPLE SIZE=7> P
22 <SELECT NAME="priority" MULTIPLE SIZE=7> P
23 <SELECT NAME="priority" MULTIPLE SIZE=7> P
24 <SELECT NAME="priority" MULTIPLE SIZE=7> P
25 <SELECT NAME="priority" MULTIPLE SIZE=7> P
26 <SELECT NAME="priority" MULTIPLE SIZE=7> F
9
Example
After 26 tries we found that the input can be reduced to <SELECT>:
Printing an HTML file which consists of <SELECT> causes Mozilla to crash.
The delta debugging technique automates this approach of repeated trials for reducing the input.
10
Changes That Cause Failures
To describe the algorithm we first need to define the process.
In general, the delta debugging technique deals with changeable circumstances: circumstances whose change may cause a different program behavior.
You can think of circumstances (meaning changeable circumstances) as all the possible behaviors of the program's environment.
Some circumstances may not change program behavior, e.g., changing your machine's hard disk from 250GB to 500GB may not change the program behavior.
In practice, "circumstances" means program input. But Zeller wanted to pitch his technique at a high level of generality: he tried to model all factors that could affect program execution, such as timing, memory contention, environmental setup, etc. He formalized his technique in a very rigid way, although my personal view is that it is in fact a very practical technique.
11
Circumstances and Failures
Let $E$ be the set of possible configurations of circumstances.
Each $r \in E$ determines a specific program run: if the environment is fixed, the program executes deterministically.
$r_P \in E$ corresponds to a run that passes.
$r_F \in E$ corresponds to a run that fails.
12
Changes
We can go from one circumstance to another by changes.
A change is a mapping $\delta: E \to E$ which takes one circumstance and changes it to another circumstance.
$\delta$ changes one program run to another program run by changing its environment (input).
A relevant change is a change $\delta$ such that $\delta(r_P) = r_F$: the change makes the program fail.
13
Decomposing Changes
A change $\delta$ can be decomposed into a number of elementary changes $\delta_1, \delta_2, \ldots, \delta_n$, where $\delta = \delta_1 \circ \delta_2 \circ \cdots \circ \delta_n$ and $(\delta_i \circ \delta_j)(r) = \delta_i(\delta_j(r))$.
For example, deleting a part of the input file can be decomposed into deleting characters one by one from the input file.
Put another way: by composing deletions of single characters we get a change that deletes part of the input file.
14
Testing
Given an $r \in E$ we define a function $rtest: E \to \{P, F, ?\}$.
The function $rtest$ determines whether an $r \in E$ causes the program to fail ($F$) or pass ($P$).
The output $?$ means that the result is indeterminate: for example, the input may have caused a failure other than the one we are trying to capture.
We can compute the function $rtest$ by running the program.
$rtest$ is also called the test oracle. If we cannot classify a run as passing or failing, we put it into the $?$ category. Indeed, this is a big limitation of delta debugging.
15
Test Cases
Given a run $r_P$ that does not cause a failure, $c_F = \{\delta_1, \delta_2, \ldots, \delta_n\}$ denotes a set of changes such that $r_F = (\delta_1 \circ \delta_2 \circ \cdots \circ \delta_n)(r_P)$.
$c_P$ is defined as $c_P = \emptyset$.
Each $c \subseteq c_F$ is called a test case.
To summarize:
We have a run without failure, $r_P$.
We have a set of changes $c_F = \{\delta_1, \delta_2, \ldots, \delta_n\}$ such that $r_F = (\delta_1 \circ \delta_2 \circ \cdots \circ \delta_n)(r_P)$, where $r_F$ is a run with failure.
Each subset $c$ of $c_F$ is a test case.
"Test case" is a slightly odd name: it is really a subsequence of the original failure-inducing input. It is called a test case because the subsequence is tested on the program to see whether it fails or succeeds.
16
Testing Test Cases
Given a test case $c$, we would like to know whether the run generated by applying the changes in $c$ to $r_P$ causes a failure.
We define a function $test: 2^{c_F} \to \{P, F, ?\}$ such that, given $c = \{\delta_1, \delta_2, \ldots, \delta_m\} \subseteq c_F$,
$test(c) = rtest((\delta_1 \circ \delta_2 \circ \cdots \circ \delta_m)(r_P))$
Note that
$test(c_P) = test(\emptyset) = P$
$test(c_F) = test(\{\delta_1, \delta_2, \ldots, \delta_n\}) = F$
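The definitions of $rtest$ and $test$ can be made concrete with a small sketch. It assumes a toy program that fails whenever its input contains "*", an empty passing input $r_P$, and changes of the form "include the i-th character of the failing input"; the names `FAILING_INPUT`, `apply_changes`, `rtest` and `run_test` are illustrative, not from the slides.

```python
PASS, FAIL, UNRESOLVED = "P", "F", "?"

FAILING_INPUT = "ab*defgh"                   # hypothetical r_F
C_F = frozenset(range(len(FAILING_INPUT)))   # c_F: change i includes character i


def apply_changes(changes):
    """Apply a set of changes to r_P (the empty input) and return the new input."""
    return "".join(FAILING_INPUT[i] for i in sorted(changes))


def rtest(run_input):
    """Toy oracle (an assumption): the program fails iff the input contains '*'."""
    return FAIL if "*" in run_input else PASS


def run_test(changes):
    """test(c) = rtest(the run obtained by applying the changes in c to r_P)."""
    return rtest(apply_changes(changes))


print(run_test(frozenset()))   # test(c_P): no changes applied
print(run_test(C_F))           # test(c_F): all changes applied
```

A real oracle would run the program under test and would also need to return `UNRESOLVED` when the outcome is neither the original failure nor a clean pass.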
17
Minimizing Test Cases
Now the question is: can we find the minimal test case $c$ such that $test(c) = F$?
A test case $c \subseteq c_F$ is called the global minimum of $c_F$ if for all $c' \subseteq c_F$, $|c'| < |c| \Rightarrow test(c') \neq F$.
The global minimum is the smallest set of changes that makes the program fail.
Finding the global minimum may require an exponential number of tests.
18
Minimizing Test Cases
A test case $c \subseteq c_F$ is called a local minimum of $c_F$ if for all $c' \subset c$, $test(c') \neq F$.
A test case $c \subseteq c_F$ is $n$-minimal if for all $c' \subset c$, $|c| - |c'| \leq n \Rightarrow test(c') \neq F$.
A test case is 1-minimal if for all $\delta_i \in c$, $test(c - \{\delta_i\}) \neq F$.
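The 1-minimality condition is cheap to check: it needs one test per change. The sketch below assumes that changes are hashable values and that the oracle takes a set of changes and returns "F", "P" or "?"; `is_one_minimal` and `toy_oracle` are hypothetical names.

```python
def is_one_minimal(c, oracle):
    """True if c fails and removing any single change no longer yields F."""
    c = frozenset(c)
    if oracle(c) != "F":
        return False
    return all(oracle(c - {delta}) != "F" for delta in c)


def toy_oracle(c):
    """Assumed failure condition: changes 0 and 7 must both be present."""
    return "F" if {0, 7} <= set(c) else "P"


print(is_one_minimal({0, 7}, toy_oracle))
print(is_one_minimal({0, 3, 7}, toy_oracle))
```

Note that a 1-minimal test case need not be the global minimum: the check only looks at subsets that are one change smaller.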
19
Minimization Algorithm
The delta debugging algorithm finds a 1-minimal test case.
It partitions the set $c_F$ into $\Delta_1, \Delta_2, \ldots, \Delta_n$:
$\Delta_1, \Delta_2, \ldots, \Delta_n$ are pairwise disjoint
$c_F = \Delta_1 \cup \Delta_2 \cup \cdots \cup \Delta_n$
Define the complement of $\Delta_i$ as $\nabla_i = c_F - \Delta_i$ (the complement of a delta is a nabla).
It tests each test case defined by the partition and their complements.
It reduces the test case if a smaller failure-inducing set is found; otherwise it refines the partition.
20
Each Step of the Minimization Algorithm
Test each $\Delta_1, \Delta_2, \ldots, \Delta_n$ and each $\nabla_1, \nabla_2, \ldots, \nabla_n$.
There are four possible outcomes:
1. Some $\Delta_i$ causes failure: partition $\Delta_i$ into two and continue with $\Delta_i$ as the test set.
2. Some $\nabla_i$ causes failure: continue with $\nabla_i$ as the test set, with $n - 1$ subsets.
3. No test causes failure: increase granularity by generating a partition with $2n$ subsets.
4. The granularity can no longer be increased: done, we have found the 1-minimal subset.
In the worst case the algorithm performs $|c_F|^2 + 3|c_F|$ tests. For example, an $n$-character input requires $n^2 + 3n$ tests in the worst case (highly unlikely to occur in practice).
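The four outcomes map fairly directly onto a loop. The following is a minimal sketch rather than a reference implementation: it assumes the changes are given as a list, that the oracle takes a sublist and returns "F", "P" or "?", and it treats "?" like "P". The helper names `split` and `ddmin` are choices made here.

```python
def split(items, n):
    """Split items into n contiguous, non-empty chunks (requires n <= len(items))."""
    chunks, start = [], 0
    for i in range(n):
        end = start + (len(items) - start) // (n - i)
        chunks.append(items[start:end])
        start = end
    return chunks


def ddmin(changes, oracle):
    """Reduce a failing list of changes to a 1-minimal failing sublist."""
    if oracle(changes) != "F":
        raise ValueError("the full change set must cause the failure")
    n = 2
    while len(changes) >= 2:
        subsets = split(changes, n)
        complements = [
            [d for j, s in enumerate(subsets) if j != i for d in s]
            for i in range(n)
        ]
        hit = next((s for s in subsets if oracle(s) == "F"), None)
        if hit is not None:                    # case 1: reduce to a subset
            changes, n = hit, 2
            continue
        hit = next((c for c in complements if oracle(c) == "F"), None)
        if hit is not None:                    # case 2: reduce to a complement
            changes, n = hit, max(n - 1, 2)
            continue
        if n >= len(changes):                  # case 4: cannot refine further
            break
        n = min(2 * n, len(changes))           # case 3: increase granularity
    return changes


# Usage with an assumed failure condition: the input contains '*'.
contains_star = lambda chars: "F" if "*" in chars else "P"
print("".join(ddmin(list("ab*defgh"), contains_star)))
```

With n = 2 the complements coincide with the subsets, so this sketch repeats a few tests; caching oracle results would avoid that at the cost of a little bookkeeping.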
21
Note: Changes vs. Minimizing the Input
Our goal is to find the smallest change $\delta$ that makes the program fail, i.e., $\delta(r_P) = r_F$.
If we pick $r_P$ to be some trivial input (such as the empty string) for which the program does not fail, then finding the smallest failure-inducing change becomes equivalent to finding the smallest input that makes the program fail.
22
Example
Assume that:
The input string "ab*defgh" causes the program to fail: $r_F$ = ab*defgh
The program does not fail on the empty string: $r_P$ is the empty string
If we assume that any string that contains "*" causes the program to fail, how will the delta debugging algorithm behave in this case?
Let $\delta_i$ mean: "include the $i$-th character of the failure-inducing input in the modified input".
Then, for our example, we have: $\delta_1(r_P)$ = a, $\delta_2(r_P)$ = b, $\delta_3(r_P)$ = *, $\delta_4(r_P)$ = d, …
Note that the changes are composable: $\delta_2(\delta_1(r_P))$ = ab, $\delta_3(\delta_2(\delta_1(r_P)))$ = ab*, $\delta_4(\delta_3(\delta_2(\delta_1(r_P))))$ = ab*d, …
23
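Example
Let's run the algorithm.
Initially: $\Delta_1 = \{\delta_1, \delta_2, \delta_3, \delta_4\} = \nabla_2$, $\Delta_2 = \{\delta_5, \delta_6, \delta_7, \delta_8\} = \nabla_1$
$test(\Delta_1) = test(\{\delta_1, \delta_2, \delta_3, \delta_4\}) = rtest(\delta_4(\delta_3(\delta_2(\delta_1(r_P))))) = rtest(\text{ab*d}) = F$
$test(\Delta_2) = test(\{\delta_5, \delta_6, \delta_7, \delta_8\}) = rtest(\delta_8(\delta_7(\delta_6(\delta_5(r_P))))) = rtest(\text{efgh}) = P$
This means that we are in case 1 of the algorithm: some $\Delta_i$ causes failure, so we partition $\Delta_i$ into two and continue with $\Delta_i$ as the test set.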
24
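Example
We partition $\Delta_1$ as $\{\delta_1, \delta_2\}$ and $\{\delta_3, \delta_4\}$.
So now we have: $\Delta_1 = \{\delta_1, \delta_2\} = \nabla_2$, $\Delta_2 = \{\delta_3, \delta_4\} = \nabla_1$
$test(\Delta_1) = test(\{\delta_1, \delta_2\}) = rtest(\delta_2(\delta_1(r_P))) = rtest(\text{ab}) = P$
$test(\Delta_2) = test(\{\delta_3, \delta_4\}) = rtest(\delta_4(\delta_3(r_P))) = rtest(\text{*d}) = F$
This means that we are again in case 1 of the algorithm: some $\Delta_i$ causes failure, so we partition $\Delta_i$ into two and continue with $\Delta_i$ as the test set.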
25
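Example
We partition $\Delta_2 = \{\delta_3, \delta_4\}$ as $\{\delta_3\}$ and $\{\delta_4\}$.
So now we have: $\Delta_1 = \{\delta_3\} = \nabla_2$, $\Delta_2 = \{\delta_4\} = \nabla_1$
$test(\Delta_1) = test(\{\delta_3\}) = rtest(\delta_3(r_P)) = rtest(\text{*}) = F$
$test(\Delta_2) = test(\{\delta_4\}) = rtest(\delta_4(r_P)) = rtest(\text{d}) = P$
$\Delta_1 = \{\delta_3\}$ causes the failure and, being a single change, cannot be partitioned any further, so we are now in case 4 of the algorithm:
4. The granularity can no longer be increased: done, we have found the 1-minimal subset.
The result is $\delta_3(r_P)$ = *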
26
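One student suggested that DD may not be able to handle the following case:
The input string is "*abcdef*", and the program fails only if both *s appear in the input.
The answer is that DD can handle this. Neither half of the input fails on its own, so the algorithm increases the granularity and tests complements, which remove characters from the middle while keeping both *s.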
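The claim can be checked by tracing a compact version of the minimization loop on this input. This is a sketch under the stated assumption that the program fails exactly when the input contains at least two "*" characters; `ddmin_trace` is a hypothetical helper that works on strings and records every successful reduction.

```python
def ddmin_trace(chars, fails):
    """Minimize the string `chars`; return the result and each reduction step."""
    trace, n = [], 2
    while len(chars) >= 2:
        size = len(chars)
        bounds = [size * k // n for k in range(n + 1)]
        subsets = [chars[bounds[k]:bounds[k + 1]] for k in range(n)]
        complements = [chars[:bounds[k]] + chars[bounds[k + 1]:] for k in range(n)]
        # Try the subsets first (case 1), then the complements (case 2).
        for candidates, next_n in ((subsets, 2), (complements, max(n - 1, 2))):
            hit = next((c for c in candidates if fails(c)), None)
            if hit is not None:
                chars, n = hit, next_n
                trace.append(chars)
                break
        else:
            if n >= size:             # case 4: granularity cannot be increased
                break
            n = min(2 * n, size)      # case 3: increase granularity
    return chars, trace


both_stars = lambda s: s.count("*") >= 2   # assumed failure condition
result, steps = ddmin_trace("*abcdef*", both_stars)
print(result, steps)
```

Every reduction in this run comes from a complement test, never from a subset test, which is why the complements are essential for inputs whose failure depends on parts that are far apart.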
27
Case Studies The following C program causes GCC to crash
#define SIZE 20
double mult(double z[], int n)
{
    int i, j;
    i = 0;
    for (j = 0; j < n; j++) {
        i = i + j + 1;
        z[i] = z[i] * (z[0] + 1.0);
    }
    return z[n];
}
(continued on the next slide)
28
void copy(double to[], double from[], int count)
{
    int n = (count + 7) / 8;
    switch (count % 8) do {
        case 0: *to++ = *from++;
        case 7: *to++ = *from++;
        case 6: *to++ = *from++;
        case 5: *to++ = *from++;
        case 4: *to++ = *from++;
        case 3: *to++ = *from++;
        case 2: *to++ = *from++;
        case 1: *to++ = *from++;
    } while (--n > 0);
    return mult(to, 2);
}
int main(int argc, char *argv[])
{
    double x[SIZE], y[SIZE];
    double *px = x;
    while (px < x + SIZE)
        *px++ = (px - x) * (SIZE + 1.0);
    return copy(y, x, SIZE);
}
29
Case Studies
The original input file has 755 characters.
The delta debugging algorithm minimizes the input file to the following file with 77 characters:
t(double z[],int n){int i,j;for(;;){i=i+j+1;z[i]=z[i]*(z[0]+0);}return z[n];}
If a single character is removed from this file, it no longer induces the failure.
30
Isolating Failure Inducing Differences
Instead of minimizing the input that causes the failure, we can also try to isolate the differences that cause the failure.
Minimization means making each part of the simplified test case relevant: removing any part makes the failure go away.
Isolation means finding one relevant part of the test case: removing this particular part makes the failure go away.
For example, changing the input from
<SELECT NAME="priority" MULTIPLE SIZE=7>
to
SELECT NAME="priority" MULTIPLE SIZE=7>
makes the failure go away. This means that inserting the character < is a failure-inducing difference.
The delta debugging algorithm can be modified to look for minimal failure-inducing differences.
Although isolation is not as popular, it is quite useful in some applications, particularly for debugging.
31
Failure Inducing Differences: Example
Two versions of the input program for GCC differ in a single statement; changing the failing input into the passing one removes the failure.
This input does not cause failure:
#define SIZE 20
double mult(double z[], int n)
{
    int i, j;
    i = 0;
    for (j = 0; j < n; j++) {
        i + j + 1;
        z[i] = z[i] * (z[0] + 1.0);
    }
    return z[n];
}
This input causes failure:
#define SIZE 20
double mult(double z[], int n)
{
    int i, j;
    i = 0;
    for (j = 0; j < n; j++) {
        i = i + j + 1;
        z[i] = z[i] * (z[0] + 1.0);
    }
    return z[n];
}
The modified statement is the first statement of the loop body: i + j + 1; versus i = i + j + 1;
32
Discussions
How to compose a failure report for general-purpose applications?
Input? (GUI, environment, …)
Output? (How to express it?)
Textual descriptions often fail to provide direct help. How do we associate symptoms with code?
DD is more for textual inputs. For example, when I was using PowerPoint last night, I found that I could not update a file that I had already closed; I kept getting the warning "the file is being used". The programmers of PowerPoint may know where to look for the error, but such a report is almost useless for an automated system, which expects a report like "at execution point X, a wrong value V is observed".
DD on scheduling decisions: given a thread schedule for which a concurrent program works and another for which the program fails, the delta debugging algorithm can narrow down the differences between the two thread schedules and find the locations where a thread switch causes the program to fail.
Scheduling decision example: T1: if (p) { free(p); } T2: p = 0;
Chipping: given two versions of a program such that one works correctly and the other one fails, the delta debugging algorithm can be used to look for the changes that are responsible for introducing the failure.
Fault localization: apply DD to memory state.
33
Discussions
The rtest function.
A large number of runs is required.
DD + Jockey.
1-minimality is often not desirable (old slides).
34
Statistical Debugging
35
What is statistical debugging
It relies on a large pool of test cases, some failing and the others passing.
Dynamic information from both passing and failing cases is aggregated to localize the possibly faulty statements.
The end outcome is often a ranked list of statements.
Tarantula
Hypothesis: a faulty statement is more likely to be executed in failing runs.
$F(s)$ / $P(s)$: the number of failing / passing cases that execute $s$.
$$\mathit{Suspiciousness}(s) = \frac{F(s) / |\mathit{failing}|}{F(s) / |\mathit{failing}| + P(s) / |\mathit{passing}|}$$
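The suspiciousness formula is easy to evaluate once per-statement counts are available. The sketch below uses made-up coverage counts for three hypothetical statements over 10 failing and 90 passing runs; returning 0 for a statement that no run executes is a convention chosen here, not something the slides specify.

```python
def tarantula(failed_s, passed_s, total_failed, total_passed):
    """Suspiciousness(s) from F(s), P(s), |failing| and |passing|."""
    fail_ratio = failed_s / total_failed
    pass_ratio = passed_s / total_passed
    if fail_ratio + pass_ratio == 0:
        return 0.0  # statement never executed (assumed convention)
    return fail_ratio / (fail_ratio + pass_ratio)


# Hypothetical coverage: statement -> (F(s), P(s)).
coverage = {"s1": (10, 90), "s2": (10, 9), "s3": (0, 45)}
scores = {s: tarantula(f, p, 10, 90) for s, (f, p) in coverage.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

print(ranking, scores)
```

A statement executed by every run (s1) scores 0.5, so only statements that are disproportionately covered by failing runs (s2) rise to the top of the ranking.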
36
Scalable Remote Bug Isolation
Tarantula requires a large pool of test cases, which may not be available.
Idea: rely on deployed systems and end users to provide the needed dynamic information.
Based on predicates:
Branch predicates
Function returns (<0, ==0, <=0, …)
Scalar pairs: for each assignment x=…, from some other variables yi and some constants ci, acquire (x==yi, x<=yi, ..., x==ci)
Collect the evaluations of these predicates.
37
Statistical analysis
$F(p)$ / $S(p)$: the number of test cases in which $p$ is true and the program fails / passes.
$$\mathit{Failure}(p) = \frac{F(p)}{S(p) + F(p)}$$
Example:
f=…; if (f==null) { x=0; …=*f; }
Assume we have 100 test cases with 90 passing and 10 failing.
38
Suspiciousness(p)=Failure(p)-Context(p)
However, there are predicates that are executed only in failing runs but are not faulty.
A context value is computed to adjust the suspiciousness.
$F(p \text{ is observed})$ / $S(p \text{ is observed})$: the number of cases in which $p$ is executed and the program fails / passes.
$$\mathit{Context}(p) = \frac{F(p \text{ is observed})}{S(p \text{ is observed}) + F(p \text{ is observed})}$$
$$\mathit{Suspiciousness}(p) = \mathit{Failure}(p) - \mathit{Context}(p)$$
Example:
x=10; f=…; if (f==null) { if (x>0) x=0; …=*f; }
Assume we have 100 test cases with 90 passing and 10 failing.
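A small calculation shows how the context term separates the two predicates in the example. The slide only fixes the totals (90 passing, 10 failing); the per-predicate counts below are assumptions made for illustration: `f==null` is evaluated in every run and is true exactly in the 10 failing runs, while `x>0` is reached only in those 10 runs and is always true there.

```python
def failure(f_true, s_true):
    """Failure(p) = F(p) / (S(p) + F(p))."""
    return f_true / (s_true + f_true)


def context(f_observed, s_observed):
    """Context(p) = F(p observed) / (S(p observed) + F(p observed))."""
    return f_observed / (s_observed + f_observed)


def suspiciousness(f_true, s_true, f_observed, s_observed):
    return failure(f_true, s_true) - context(f_observed, s_observed)


# Assumed counts: (F(p), S(p), F(p observed), S(p observed)).
null_check = suspiciousness(10, 0, 10, 90)   # predicate f==null
x_positive = suspiciousness(10, 0, 10, 0)    # predicate x>0

print(null_check, x_positive)
```

Both predicates have Failure(p) = 1, but `x>0` is only ever observed in failing runs, so its context cancels its score, while `f==null` keeps a high score.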
39
Scalability
Distribute instrumentation to multiple versions.
Sampling
Create two versions of a function: one is the original, the other is the instrumented one.
Use a counter instead of a % operation to perform sampling.
Assume one sample is collected for every n instances:
counter--;
if (!counter) {
    counter = n;
    call the instrumented version;
} else {
    call the original version;
}
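The countdown scheme above can be sketched as a small wrapper. This is an illustration of the fixed-interval counter on the slide only; the class name `SampledFunction` and the two toy function versions are made up for the example.

```python
class SampledFunction:
    """Call the instrumented version once every n calls, the original otherwise."""

    def __init__(self, original, instrumented, n):
        self.original = original
        self.instrumented = instrumented
        self.n = n
        self.counter = n

    def __call__(self, *args):
        self.counter -= 1
        if self.counter == 0:        # countdown expired: take a sample
            self.counter = self.n
            return self.instrumented(*args)
        return self.original(*args)


observed = []                        # stands in for collected predicate data


def square(x):
    return x * x


def square_instrumented(x):
    observed.append(x)               # record the observation, then do the real work
    return x * x


sampled_square = SampledFunction(square, square_instrumented, n=3)
results = [sampled_square(i) for i in range(9)]
print(results, observed)
```

The fast path costs one decrement and one comparison per call, which is the point of using a counter rather than a modulo operation.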
40
Limitation
The faulty predicate may evaluate to true in both passing and failing cases (e.g., when it is nested in a loop).
SOBER