CS590Z Statistical Debugging Xiangyu Zhang (part of the slides are from Chao Liu)

A Very Important Principle  Traditional debugging techniques deal with single (or very few) executions.  With the acquisition of a large set of executions, including passing and failing executions, statistical debugging is often highly effective. Failure reporting In house testing

Tarantula (ASE 2005, ISSTA 2007)

Scalable Remote Bug Isolation (PLDI 2004, 2005)  Look at predicates Branches Function returns ( 0, >=0, ==0, !=0) Scalar pairs  For each assignment x=…, find all variables y_i and constants c_j, each pair of x (=,<,<=…) y_i/c_j  Sample the predicate evaluations (Bernoulli sampling) Investigate the relation of the probability of a predicate being true with the bug manifestion.

Bug Isolation

How much does P being true increase the probability of failure over simply reaching the line P is sampled.

An Example  Symptoms 563 lines of C code 130 out of 5542 test cases fail to give correct outputs No crashes  The predicate are evaluated to both true and false in one execution void subline(char *lin, char *pat, char *sub) { int i, lastm, m; lastm = -1; i = 0; while((lin[i] != ENDSTR)) { m = amatch(lin, i, pat, 0); if (m >= 0){ putsub(lin, i, m, sub); lastm = m; } if ((m == -1) || (m == i)){ fputc(lin[i], stdout); i = i + 1; } else i = m; } void subline(char *lin, char *pat, char *sub) { int i, lastm, m; lastm = -1; i = 0; while((lin[i] != ENDSTR)) { m = amatch(lin, i, pat, 0); if ((m >= 0) && (lastm != m) ){ putsub(lin, i, m, sub); lastm = m; } if ((m == -1) || (m == i)){ fputc(lin[i], stdout); i = i + 1; } else i = m; } not enough

void subline(char *lin, char *pat, char *sub) { int i, lastm, m; lastm = -1; i = 0; while((lin[i] != ENDSTR)) { m = amatch(lin, i, pat, 0); if ((m >= 0) && (lastm != m) ){ putsub(lin, i, m, sub); lastm = m; } if ((m == -1) || (m == i)){ fputc(lin[i], stdout); i = i + 1; } else i = m; } P_f (A) = tilde P (A | A & !B) P_t (A) = tilde P (A | !(A&!B))

Program Predicates  A predicate is a proposition about any program properties e.g., idx 0 … Each can be evaluated multiple times during one execution Every evaluation gives either true or false  Therefore, a predicate is simply a boolean random variable, which encodes program executions from a particular aspect.

Evaluation Bias of Predicate P  Evaluation bias Def’n: the probability of being evaluated as true within one execution Maximum likelihood estimation: Number of true evaluations over the total number of evaluations in one run Each run gives one observation of evaluation bias for predicate P  Suppose we have n correct and m incorrect executions, for any predicate P, we end up with An observation sequence for correct runs  S_p = (X’_1, X’_2, …, X’_n) An observation sequence for incorrect runs  S_f = (X_1, X_2, …, X_m)  Can we infer whether P is suspicious based on S_p and S_f?

Underlying Populations  Imagine the underlying distribution of evaluation bias for correct and incorrect executions are and  S_p and S_f can be viewed as a random sample from the underlying populations respectively  One major heuristic is The larger the divergence between and, the more relevant the predicate P is to the bug 01 Prob Evaluation bias 01 Prob Evaluation bias

Major Challenges  No knowledge of the closed forms of both distributions  Usually, we do not have sufficient incorrect executions to estimate reliably. 01 Prob Evaluation bias 01 Prob Evaluation bias

Our Approach

Algorithm Outputs  A ranked list of program predicates w.r.t. the bug relevance score s(P) Higher-ranked predicates are regarded more relevant to the bug  What’s the use? Top-ranked predicates suggest the possible buggy regions Several predicates may point to the same region … …

Outline  Program Predicates  Predicate Rankings  Experimental Results  Case Study: bc-1.06  Future Work  Conclusions

Experiment Results  Localization quality metric Software bug benchmark Quantitative metric  Related works Cause Transition (CT), [CZ05] Statistical Debugging, [LN+05]  Performance comparisons

Bug Benchmark  Bug benchmark Dreaming benchmark  Large number of known bugs on large-scale programs with adequate test suite Siemens Program Suite  130 variants of 7 subject programs, each of 100-600 LOC  130 known bugs in total  mainly logic (or semantic) bugs Advantages  Known bugs, thus judgments are objective  Large number of bugs, thus comparative study is statistically significant. Disadvantages  Small-scaled subject programs  State-of-the-art performance, so far claimed in literature, Cause-transition approach, [CZ05]

Localization Quality Metric [RR03]

1st Example 1 2 3 5 4 9 6 10 8 7 T-score = 70%

2nd Example 1 2 3 74 9 6 10 5 T-score = 20% 8

Related Works  Cause Transition (CT) approach [CZ05] A variant of delta debugging [Z02]delta debugging Previous state-of-the-art performance holder on Siemens suite Published in ICSE’05, May 15, 2005 Cons: it relies on memory abnormality, hence its performance is restricted.  Statistical Debugging (Liblit05) [LN+05] Predicate ranking based on discriminant analysis Published in PLDI’05, June 12, 2005 Cons: Ignores evaluation patterns of predicates within each execution

Localized bugs w.r.t. Examined Code

Cumulative Effects w.r.t. Code Examination

Top-k Selection  Regardless of specific selection of k, both Liblit05 and SOBER are better than CT, the current state-of-the-art holder  From k=2 to 10, SOBER is better than Liblit05 consistently

Outline  Evaluation Bias of Predicates  Predicate Rankings  Experimental Results  Case Study: bc-1.06  Future Work  Conclusions

Case Study: bc 1.06  bc 1.06 14288 LOC An arbitrary-precision calculator shipped with most distributions of Unix/Linux  Two bugs were localized One was reported by Liblit in [LN+05] One was not reported previously  Some lights on scalability

Outline  Evaluation Bias of Predicates  Predicate Rankings  Experimental Results  Case Study: bc-1.06  Future Work  Conclusions

Future Work  Further leverage the localization quality  Robustness to sampling  Torture on large-scale programs to confirm its scalability to code size  …

Conclusions  We devised a principled statistical method for bug localization.  No parameter setting hassles  It handles both crashing and noncrashing bugs.  Best quality so far.

Discussion  Features Easy implementation Difficult experimentation More advanced statistical technique may not be necessary Go wide, not go deep…  Predicates are treated as independent random variables.  Can execution indexing help?  Can statistical principles be combined with slicing or IWIH ?

CS590Z Statistical Debugging Xiangyu Zhang (part of the slides are from Chao Liu)

Similar presentations

Presentation on theme: "CS590Z Statistical Debugging Xiangyu Zhang (part of the slides are from Chao Liu)"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CS590Z Statistical Debugging Xiangyu Zhang (part of the slides are from Chao Liu)

Similar presentations

Presentation on theme: "CS590Z Statistical Debugging Xiangyu Zhang (part of the slides are from Chao Liu)"— Presentation transcript:

Similar presentations

About project

Feedback