1
CS590Z Statistical Debugging Xiangyu Zhang (some of the slides are from Chao Liu)
2
A Very Important Principle
- Traditional debugging techniques deal with a single execution (or very few).
- Given a large set of executions, including both passing and failing ones, statistical debugging is often highly effective.
- Such executions can be acquired through:
  - Failure reporting
  - In-house testing
3
Tarantula (ASE 2005, ISSTA 2007)
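The body of this slide did not survive extraction. As a reminder of the technique the title names, here is a minimal sketch of the published Tarantula suspiciousness score: the share of failing tests that execute a statement, divided by that share plus the share of passing tests that execute it. The function name, the `coverage` table and its counts are hypothetical and are not taken from the slides.

```python
def tarantula_suspiciousness(passed_cov, failed_cov, total_passed, total_failed):
    """Suspiciousness of one statement.

    passed_cov / failed_cov: number of passing / failing tests that execute it.
    """
    pass_ratio = passed_cov / total_passed if total_passed else 0.0
    fail_ratio = failed_cov / total_failed if total_failed else 0.0
    if pass_ratio + fail_ratio == 0:
        return 0.0  # statement never executed: no evidence either way
    return fail_ratio / (pass_ratio + fail_ratio)


# Hypothetical coverage: statement -> (passing tests covering it, failing tests covering it)
coverage = {"s1": (10, 2), "s2": (1, 2), "s3": (10, 0)}
TOTAL_PASSED, TOTAL_FAILED = 10, 2

# Most suspicious statement first.
ranking = sorted(
    coverage,
    key=lambda s: tarantula_suspiciousness(*coverage[s], TOTAL_PASSED, TOTAL_FAILED),
    reverse=True,
)
print(ranking)
```

A statement executed by every failing test but by few passing tests (`s2` here) rises to the top, while one executed only by passing tests (`s3`) scores zero.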
4
Scalable Remote Bug Isolation (PLDI 2004, 2005)
- Look at predicates
  - Branches
  - Function returns (<0, <=0, >0, >=0, ==0, !=0)
  - Scalar pairs: for each assignment x = …, find all variables y_i and constants c_j, and compare x with each y_i and c_j (=, <, <=, …)
- Sample the predicate evaluations (Bernoulli sampling)
- Investigate how the probability of a predicate being true relates to the manifestation of the bug.
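The instrumentation idea can be sketched as follows, under stated assumptions: function-return predicates are taken to be the six sign comparisons against zero, and each visit to an instrumentation site is recorded with an independent coin flip. The class, the sampling rate and the return values are hypothetical, and a production system would likely use a cheaper sampling mechanism than one random draw per visit.

```python
import random


def return_predicates(value):
    """The six sign predicates assumed here for a scalar return value."""
    return {
        "< 0": value < 0,
        "<= 0": value <= 0,
        "> 0": value > 0,
        ">= 0": value >= 0,
        "== 0": value == 0,
        "!= 0": value != 0,
    }


class SampledCounters:
    """True/false counters for predicates, updated under Bernoulli sampling."""

    def __init__(self, rate, seed=0):
        self.rate = rate                # probability of recording one site visit
        self.rng = random.Random(seed)  # seeded only to make the sketch repeatable
        self.counts = {}                # predicate name -> [true count, false count]

    def observe(self, predicates):
        # One independent coin flip per visit to the instrumentation site.
        if self.rng.random() >= self.rate:
            return
        for name, outcome in predicates.items():
            pair = self.counts.setdefault(name, [0, 0])
            pair[0 if outcome else 1] += 1


counters = SampledCounters(rate=1.0)    # rate 1.0 records every visit
for ret in (-1, 0, 3):                  # hypothetical return values
    counters.observe(return_predicates(ret))
print(counters.counts[">= 0"])          # [2, 1]
```

With a realistic rate such as 0.01, most visits are skipped, which keeps the overhead on deployed programs low at the cost of needing many more runs.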
5
Bug Isolation
6
How much does P being true increase the probability of failure over simply reaching the line where P is sampled?
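This question is commonly quantified as Increase(P) = Failure(P) - Context(P), where Failure(P) is the fraction of failing runs among the runs in which P was observed to be true, and Context(P) is the fraction of failing runs among the runs in which P was observed at all. The sketch below assumes that formulation; the function name and all counts are hypothetical.

```python
def increase(f_true, s_true, f_obs, s_obs):
    """Increase(P) = Failure(P) - Context(P).

    f_true / s_true: failing / successful runs in which P was observed to be true.
    f_obs  / s_obs : failing / successful runs in which P was observed at all,
                     that is, its line was reached and sampled.
    """
    if f_true + s_true == 0 or f_obs + s_obs == 0:
        return 0.0  # P was never observed: no evidence
    failure = f_true / (f_true + s_true)
    context = f_obs / (f_obs + s_obs)
    return failure - context


# Hypothetical counts: P is true mostly in failing runs.
print(increase(f_true=30, s_true=10, f_obs=50, s_obs=150))   # 0.75 - 0.25

# A predicate that is true whenever its line is reached adds nothing over the context.
print(increase(f_true=50, s_true=150, f_obs=50, s_obs=150))  # 0.25 - 0.25
```

Subtracting the context keeps predicates that merely sit on a failure-prone path from being ranked as highly as predicates whose truth actually matters.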
7
An Example
Symptoms
- 563 lines of C code
- 130 out of 5542 test cases fail to give correct outputs
- No crashes
- The predicates are evaluated to both true and false within one execution, so recording only whether a predicate was ever true is not enough

Faulty version (the condition lastm != m is missing):

    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while ((lin[i] != ENDSTR)) {
            m = amatch(lin, i, pat, 0);
            if (m >= 0) {
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }

Correct version:

    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while ((lin[i] != ENDSTR)) {
            m = amatch(lin, i, pat, 0);
            if ((m >= 0) && (lastm != m)) {
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }
8
    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while ((lin[i] != ENDSTR)) {
            m = amatch(lin, i, pat, 0);
            if ((m >= 0) && (lastm != m)) {
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }

$P_f(A) = \tilde{P}(A \mid A \wedge \neg B)$
$P_t(A) = \tilde{P}(A \mid \neg(A \wedge \neg B))$

Here A and B presumably denote the two conditions (m >= 0) and (lastm != m) in the code above.
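Whatever events A and B stand for, the two conditional probabilities can be simplified using only the rules of probability, which makes the contrast between them explicit. The short derivation below assumes that both conditioning events have nonzero estimated probability.

```latex
\begin{aligned}
P_f(A) &= \tilde{P}(A \mid A \wedge \neg B) = 1, \\
P_t(A) &= \tilde{P}\bigl(A \mid \neg(A \wedge \neg B)\bigr)
        = \frac{\tilde{P}\bigl(A \wedge \neg(A \wedge \neg B)\bigr)}{\tilde{P}\bigl(\neg(A \wedge \neg B)\bigr)}
        = \frac{\tilde{P}(A \wedge B)}{1 - \tilde{P}(A \wedge \neg B)} .
\end{aligned}
```

The last step uses $A \wedge \neg(A \wedge \neg B) = A \wedge (\neg A \vee B) = A \wedge B$. So $P_f(A)$ is always 1, while $P_t(A)$ is below 1 whenever A is sometimes false outside the condition $A \wedge \neg B$.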
9
Program Predicates
- A predicate is a proposition about any program property, e.g., idx 0 …
- Each predicate can be evaluated multiple times during one execution
- Every evaluation gives either true or false
- Therefore, a predicate is simply a boolean random variable, which encodes program executions from a particular aspect.
10
Evaluation Bias of Predicate P
- Evaluation bias
  - Definition: the probability of P being evaluated as true within one execution
  - Maximum likelihood estimate: the number of true evaluations over the total number of evaluations in one run
  - Each run gives one observation of the evaluation bias for predicate P
- Suppose we have n correct and m incorrect executions. For any predicate P, we end up with
  - an observation sequence for correct runs, S_p = (X'_1, X'_2, …, X'_n)
  - an observation sequence for incorrect runs, S_f = (X_1, X_2, …, X_m)
- Can we infer whether P is suspicious based on S_p and S_f?
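The definition above can be sketched in a few lines: each run contributes the fraction of true evaluations of P, and the per-run fractions form the two observation sequences. The traces below are hypothetical, and skipping runs in which P was never evaluated is an assumption of this sketch rather than something the slides state.

```python
def evaluation_bias(evaluations):
    """Maximum likelihood estimate for one run: true evaluations / all evaluations.

    Returns None when the predicate was never evaluated in that run.
    """
    if not evaluations:
        return None
    return sum(1 for outcome in evaluations if outcome) / len(evaluations)


def observation_sequence(runs):
    """One evaluation-bias observation per run; unevaluated runs are skipped."""
    biases = (evaluation_bias(run) for run in runs)
    return [bias for bias in biases if bias is not None]


# Hypothetical traces of one predicate P: each inner list is one execution.
passing_runs = [[True, False, False, False], [False, False], [True, False]]
failing_runs = [[True, True, False, True], [True, True], []]

S_p = observation_sequence(passing_runs)  # [0.25, 0.0, 0.5]
S_f = observation_sequence(failing_runs)  # [0.75, 1.0]
print(S_p, S_f)
```

In this toy data P is true far more often within failing runs than within passing runs, even though it is both true and false in most runs of either kind.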
11
Underlying Populations
- Imagine that the evaluation bias of P follows one underlying distribution over correct executions and another over incorrect executions
- S_p and S_f can then be viewed as random samples from the respective underlying populations
- One major heuristic: the larger the divergence between the two distributions, the more relevant the predicate P is to the bug
[Figure: two plots of probability (Prob) against evaluation bias, each over the range 0 to 1]
12
Major Challenges
- No knowledge of the closed forms of either distribution
- Usually, we do not have enough incorrect executions to estimate the distribution for incorrect executions reliably.
[Figure: two plots of probability (Prob) against evaluation bias, each over the range 0 to 1]
13
Our Approach
14
Algorithm Outputs
- A ranked list of program predicates w.r.t. the bug relevance score s(P)
  - Higher-ranked predicates are regarded as more relevant to the bug
- What's the use?
  - Top-ranked predicates suggest the possible buggy regions
  - Several predicates may point to the same region
  - … …
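The transcript does not preserve the formula for s(P), so the sketch below uses a stand-in score, clearly not the slides' own: the absolute value of a z-style statistic that measures how far the mean evaluation bias of failing runs lies from the mean of passing runs, in units of the passing-run standard error. The predicate names and observation sequences are hypothetical.

```python
import math
from statistics import mean, stdev


def relevance_score(s_p, s_f):
    """Stand-in for s(P): |Z|, with Z standardizing the failing-run mean bias
    against the mean and standard deviation estimated from passing runs."""
    m = len(s_f)
    if m == 0 or len(s_p) < 2:
        return 0.0  # not enough observations to say anything
    mu_p, sigma_p = mean(s_p), stdev(s_p)
    gap = mean(s_f) - mu_p
    if sigma_p == 0:
        return math.inf if gap != 0 else 0.0
    return abs(gap / (sigma_p / math.sqrt(m)))


def rank_predicates(observations):
    """observations: predicate name -> (S_p, S_f). Most relevant first."""
    return sorted(
        observations,
        key=lambda name: relevance_score(*observations[name]),
        reverse=True,
    )


# Hypothetical observation sequences for three predicates.
observations = {
    "P1": ([0.2, 0.3, 0.25, 0.25], [0.8, 0.9]),   # bias differs sharply in failing runs
    "P2": ([0.9, 0.95, 0.9, 0.95], [0.9, 0.95]),  # same behavior in both kinds of run
    "P3": ([0.5, 0.4, 0.6, 0.5], [0.6, 0.6]),     # mild difference
}
print(rank_predicates(observations))
```

The estimate relies only on the passing-run mean and deviation, which fits the situation where passing runs are plentiful and failing runs are scarce.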
15
Outline Program Predicates Predicate Rankings Experimental Results Case Study: bc-1.06 Future Work Conclusions
16
Experimental Results
- Localization quality metric
  - Software bug benchmark
  - Quantitative metric
- Related works
  - Cause Transition (CT) [CZ05]
  - Statistical Debugging [LN+05]
- Performance comparisons
17
Bug Benchmark
- Dream benchmark: a large number of known bugs in large-scale programs with adequate test suites
- Siemens Program Suite
  - 130 variants of 7 subject programs, each of 100-600 LOC
  - 130 known bugs in total, mainly logic (or semantic) bugs
- Advantages
  - Known bugs, thus judgments are objective
  - Large number of bugs, thus a comparative study is statistically significant
- Disadvantages
  - Small-scale subject programs
- State-of-the-art performance claimed in the literature so far: the cause-transition approach [CZ05]
18
Localization Quality Metric [RR03]
19
1st Example [Figure: graph with nodes numbered 1 to 10] T-score = 70%
20
2nd Example [Figure: graph with nodes numbered 1 to 10] T-score = 20%
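The graphs behind these two examples are lost, but the idea of the metric can be sketched under an assumption: the T-score is taken to be the percentage of graph nodes a developer examines when expanding breadth-first, one layer at a time, from the reported nodes until a faulty node is covered. The graph below is a hypothetical 10-node chain, not the one on the slides.

```python
def t_score(graph, blamed, defects):
    """Percentage of nodes examined by layer-wise breadth-first expansion
    from the blamed nodes until some defect node has been examined.

    graph: node -> set of neighbours (edges treated as undirected).
    """
    examined = set(blamed)
    frontier = set(blamed)
    while frontier and not (examined & defects):
        # Add the whole next layer, as a developer would inspect it.
        next_frontier = {
            neighbour
            for node in frontier
            for neighbour in graph[node]
            if neighbour not in examined
        }
        examined |= next_frontier
        frontier = next_frontier
    if not (examined & defects):
        return 100.0  # defect unreachable: the whole program must be examined
    return 100.0 * len(examined) / len(graph)


# Hypothetical graph: a chain 1 - 2 - ... - 10.
chain = {n: {k for k in (n - 1, n + 1) if 1 <= k <= 10} for n in range(1, 11)}

print(t_score(chain, blamed={1}, defects={3}))  # nodes 1, 2, 3 examined
print(t_score(chain, blamed={3}, defects={3}))  # report hits the defect directly
```

Under this reading a lower T-score is better, since less code has to be examined before the defect is found.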
21
Related Works
- Cause Transition (CT) approach [CZ05]
  - A variant of delta debugging [Z02]
  - Previous state-of-the-art performance holder on the Siemens suite
  - Published in ICSE'05, May 15, 2005
  - Cons: it relies on memory abnormality, hence its performance is restricted.
- Statistical Debugging (Liblit05) [LN+05]
  - Predicate ranking based on discriminant analysis
  - Published in PLDI'05, June 12, 2005
  - Cons: it ignores the evaluation patterns of predicates within each execution
22
Localized bugs w.r.t. Examined Code
23
Cumulative Effects w.r.t. Code Examination
24
Top-k Selection
- Regardless of the specific choice of k, both Liblit05 and SOBER are better than CT, the current state-of-the-art holder
- From k=2 to 10, SOBER is consistently better than Liblit05
25
Outline Evaluation Bias of Predicates Predicate Rankings Experimental Results Case Study: bc-1.06 Future Work Conclusions
26
Case Study: bc 1.06
- bc 1.06: 14288 LOC
  - An arbitrary-precision calculator shipped with most distributions of Unix/Linux
- Two bugs were localized
  - One was reported by Liblit in [LN+05]
  - One was not reported previously
- Sheds some light on scalability
27
Outline Evaluation Bias of Predicates Predicate Rankings Experimental Results Case Study: bc-1.06 Future Work Conclusions
28
Future Work
- Further improve the localization quality
- Robustness to sampling
- Stress-test on large-scale programs to confirm scalability with code size
- …
29
Conclusions
- We devised a principled statistical method for bug localization.
- No parameter-setting hassles
- It handles both crashing and noncrashing bugs.
- Best quality so far.
30
Discussion
- Features
  - Easy implementation
  - Difficult experimentation
  - More advanced statistical techniques may not be necessary
  - Go wide, not deep…
- Predicates are treated as independent random variables.
- Can execution indexing help?
- Can statistical principles be combined with slicing or IWIH?