“Isolating Failure Causes through Test Case Generation” by Jeremias Rößler, Gordon Fraser, Andreas Zeller, and Alessandro Orso. Presented by John-Paul Ore.
Motivation: Debugging & Maintenance Are Super Expensive. Cost to develop software worldwide: $1,500,000,000,000 (USD). Debugging and maintenance cost: $350,000,000,000 (USD) (assumes 23% of developer time spent debugging). Sources: Judge Business School of the University of Cambridge, UK (2013); Evans Data Corporation (2012); Payscale (2012); RTI (2002); CVP Surveys (2012).
What is Debugging? Finding the fault responsible for the failure, and applying a change to program P such that P is correct with regard to the specification S concerning the failure. Debugging includes a search problem. We can automate search.
Talk Outline: Problems BugEx seeks to address; Background concepts; Inner workings of the BugEx algorithm; Empirical evaluation; Relation of this work to the 990 class project.
Automated Debugging: still a hard problem. Parnin, Chris, and Alessandro Orso. "Are automated debugging techniques actually helping programmers?" Proceedings of the 2011 International Symposium on Software Testing and Analysis. ACM, 2011.
BugEx: Overview of problems addressed. Problem 1: Automated debugging techniques reveal too many possible code locations. Solution 1: Increase precision through guided test generation. Problem 2: Even if the location is known, the developer might not fully understand the bug. Solution 2: Present ‘facts’ rather than code locations. Problem 3: Other experimental techniques are unsound (Delta Debugging, Predicate Switching). Solution 3: Generate real program executions.
BugEx: Underlying Concepts. 1. Expands on statistical debugging: correlate program facts with failures.
1. BugEx extends Statistical Debugging (Benjamin Liblit et al.). Liblit, B., Aiken, A., Zheng, A. X., & Jordan, M. I. (2003). Bug isolation via remote program sampling. ACM SIGPLAN Notices, 38(5) (and more, identified in the paper). “Statistical debugging works off of the contrast between good and bad runs, so you need to feed it both.” – B. Liblit. Figure labels: passing test case, failing test case.
BugEx: Underlying Concepts. 1. Expands on statistical debugging: correlate program facts with failures. 2. Uses automatic test generation (genetic algorithms) to create a statistically significant number of tests.
2. Test Case Generation: Genetic Algorithms. An individual is a TEST encoded in Java bytecode. Mutation might change branching or variable values. Diagram labels: TEST_a, TEST_b, TEST_b’, TEST_a’. Fitness: branch distance or predicate distance (closer is better).
Test Case Generation: Genetic Algorithms. The shape of the fitness function (its gradient) directs the search. Global optimality is not guaranteed. Image © Mathworks, 2010.
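The search idea on these two slides can be sketched in a few lines. This is a minimal illustration, not the BugEx implementation: an individual here is a single integer input rather than a test encoded in Java bytecode, the target branch `x == 42` is hypothetical, and the search uses only mutation and selection (no crossover). The names `branch_distance`, `mutate`, and `evolve` are made up for this sketch.

```python
import random

TARGET = 42  # hypothetical branch under test: `if x == 42`

def branch_distance(x, target=TARGET):
    """Fitness: how far x is from taking the branch; 0 means the branch is taken."""
    return abs(x - target)

def mutate(x, rng):
    """Mutation: nudge the input value by a small random step."""
    return x + rng.randint(-5, 5)

def evolve(population, generations, rng):
    """Keep the fitter half each generation and refill with mutated survivors."""
    population = list(population)
    size = len(population)
    for _ in range(generations):
        population = sorted(population, key=branch_distance)
        if branch_distance(population[0]) == 0:
            break  # the target branch is covered
        survivors = population[: max(1, size // 2)]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(size - len(survivors))]
        population = survivors + children
    return min(population, key=branch_distance)

if __name__ == "__main__":
    rng = random.Random(0)
    start = [rng.randint(-1000, 1000) for _ in range(20)]
    best = evolve(start, generations=500, rng=rng)
    print("best input:", best, "distance:", branch_distance(best))
```

Because the fitter half always survives, the best individual never gets worse, but nothing guarantees that the search reaches distance 0 within the given number of generations.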
Overview of BugEx (hint: it’s a search): Generate tests to explore the search space (genetic algorithm); find facts that correlate with failure to guide test generation (statistical debugging); show results.
Counterfactual conditional: if not A, then not B. If the cause is present, the failure is present. If the cause is absent, the failure is absent. Diagram labels: Fact i, Failure.
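The two conditions on this slide can be checked mechanically over a set of observed runs. The sketch below is an illustration under assumptions, not something taken from the paper: each run is represented as a hypothetical `(fact_present, failed)` pair, and the function name `is_counterfactual_cause` is invented here.

```python
def is_counterfactual_cause(runs):
    """Return True if the failure occurs exactly when the fact is present.

    runs: iterable of (fact_present, failed) boolean pairs.
    Both kinds of run (fact present, fact absent) must be observed;
    otherwise the counterfactual cannot be judged and we return False.
    """
    runs = list(runs)
    failed_when_present = [failed for present, failed in runs if present]
    failed_when_absent = [failed for present, failed in runs if not present]
    if not failed_when_present or not failed_when_absent:
        return False
    # Cause present -> failure present; cause absent -> failure absent.
    return all(failed_when_present) and not any(failed_when_absent)
```

A single passing run in which the fact holds, or a single failing run in which it does not, is enough to reject the fact under this strict reading.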
BugEx Algorithm: Initialization (Figure 4, p. 312)
BugEx Algorithm: Main Loop (Figure 4, p. 312). Annotations on the figure: “of the best!”; Statistical Debugging; Genetic Algorithm; LOOP; branches or state predicates.
Microreview of BugEx (hint: it’s a search): Generate tests to explore the search space (genetic algorithm); find facts that correlate with failure to guide test generation (statistical debugging); show results.
14. F := getFacts(T_fail) ∪ getFacts(T_pass) ∪ F. 1. A fact must be Boolean: either true or false at runtime. 2. A fact must be observable. Branches: reached or not reached; true or false branch taken? State predicates: (attribute | parameters | inspector) (= | = | !=) (attribute | parameters | inspector | constant), over all available variables, objects, and constants at the beginning of a method. Question: how big is this space (in Big O)?
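To get a feel for the size of the state-predicate space, here is a small sketch that enumerates Boolean facts from a snapshot of values at method entry. It is an assumption-laden illustration: the operator set (`==`, `!=`, `<`), the constant pool, and the name `state_predicates` are chosen for this example and are not taken from BugEx.

```python
import operator
from itertools import combinations

# Assumed operator set for illustration only.
OPERATORS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt}

def state_predicates(state, constants=(0,)):
    """Enumerate Boolean facts comparing values pairwise and against constants.

    state: dict mapping a name (attribute, parameter, inspector) to its value.
    Returns a dict mapping a readable predicate to its truth value.
    """
    facts = {}
    names = sorted(state)
    # Value-versus-value comparisons: one fact per unordered pair and operator.
    for left, right in combinations(names, 2):
        for symbol, op in OPERATORS.items():
            facts[f"{left} {symbol} {right}"] = op(state[left], state[right])
    # Value-versus-constant comparisons.
    for name in names:
        for constant in constants:
            for symbol, op in OPERATORS.items():
                facts[f"{name} {symbol} {constant}"] = op(state[name], constant)
    return facts

if __name__ == "__main__":
    for predicate, value in state_predicates({"a": 1, "b": 2, "c": 0}).items():
        print(predicate, value)
```

Under this encoding, n values, c constants, and k operators yield k·(n(n−1)/2 + n·c) facts per method entry, so the number of facts grows quadratically in the number of available values.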
16. F_correlating := correlateToFailure(F, T_fail, T_pass). Bayes’ Theorem; Bayesian Inference.
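As a rough illustration of the correlation step, Bayes’ theorem gives P(failure | fact) = P(fact | failure) · P(failure) / P(fact), with each probability estimated from counts over the failing and passing runs. The sketch below is a generic example under assumptions, not the scoring BugEx actually uses: each run is modelled as the set of facts that held in it, and the function name is hypothetical.

```python
def failure_probability_given_fact(fact, failing_runs, passing_runs):
    """Estimate P(failure | fact) with Bayes' theorem from observed runs.

    failing_runs, passing_runs: lists of sets; each set holds the facts
    that were true in one run. Returns 0.0 when the fact never occurs
    or when there are no failing runs.
    """
    n_fail = len(failing_runs)
    total = n_fail + len(passing_runs)
    fact_and_fail = sum(fact in run for run in failing_runs)
    fact_total = fact_and_fail + sum(fact in run for run in passing_runs)
    if n_fail == 0 or fact_total == 0:
        return 0.0
    p_fail = n_fail / total                    # P(failure)
    p_fact_given_fail = fact_and_fail / n_fail  # P(fact | failure)
    p_fact = fact_total / total                # P(fact)
    return p_fact_given_fail * p_fail / p_fact

if __name__ == "__main__":
    failing = [{"A", "B"}, {"A"}]
    passing = [{"B"}, {"B"}, set()]
    for fact in ("A", "B"):
        print(fact, failure_probability_given_fact(fact, failing, passing))
```

In this toy data, fact A appears only in failing runs and so scores higher than fact B, which appears in both kinds of run. This is the contrast between good and bad runs that statistical debugging relies on.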
Slides courtesy of Jeremias Rößler (2012)
Empirical Evaluation
Empirical Research Questions RQ1. Is the number of relevant facts identified by BUGEX small enough for a developer to examine?
Number of Branches vs. Time to Converge (plot; axes: branches, seconds)
RQ1: BugEx compared to Statistical Debugging (figure label: BugEx)
Empirical Research Questions: RQ2. Do the facts identified by BUGEX help the developer understand the failure? The authors answered ‘yes’ after comparing their fix with the ‘official fix’. The comparison is challenging because the original developers sometimes refactored the code at a larger scale.
Subsequent User Studies: nope “This study showed how much effort the design and preparation of a user study requires, and how easy error prone it is. This is probably the reason, why there are still so few user studies in the field of automated debugging.” “So there was little time to prepare BUGEX and the underlying infrastructure.” Rößler, Jeremias. "From software failure to explanation." (2013).
Summary: BugEx combines Statistical Debugging and Automated Test Generation (GA) to improve debugging precision. BugEx treats debugging as a search problem and tries to find information that is useful to developers. Usefulness is difficult to evaluate because the prototype tool is very specific.
Relation of BugEx to Project: Guided automatic test generation. Focus on message-passing programs, observed at the component level (ROS, the Robot Operating System). Use program traces to generate test suites for regression testing, based on component properties.