Delta Debugging AAIS 05 Curino, Giusti Delta Debugging Authors: Carlo Curino, Alessandro Giusti Politecnico di Milano An advanced debugging technique
Motivations: Reducing faults accounts for 50%-80% of total cost. Debugging is one of the hardest, yet least systematic, activities of software engineering, and the most time-consuming. Locating faults is the most difficult part.
Overview: Which problems does Delta Debugging solve? Four solutions, one common approach: 1. Simplifying failure-inducing input. 2. Isolating failure-inducing thread schedules. 3. Identifying failure-inducing changes in the code. 4. Isolating cause-effect chains.
Failure-inducing input: This HTML input makes Mozilla crash (segmentation fault). Which portion of it induces the failure?
Thread scheduling: The result of a multithreaded program appears to be nondeterministic. Why does this happen?
Code changes: The old version of GDB works with DDD; the new one doesn't! Lines of code have been modified between the two versions: where is the bug?
Cause-effect chain: Which part of the program state is involved in the failure?
Four solutions, a single approach: The underlying problem is always to find which part of something determines the failure, so a common strategy can be applied: divide et impera (divide and conquer) on the deltas between working and failing inputs, working and failing code versions, working and failing thread schedules, and working and failing program states. This allows an efficient and automatic debugging procedure.
Common terminology: A test case can either fail (the failure shows up), pass (the program runs properly), or be unresolved, also called unspecified (different problems arise). Delta Debugging algorithms iteratively apply changes (to input, code, schedule or state) and run tests.
Common terminology (2): The concept of difference is a very general delta between two test cases. Examples: a difference in the input is a different character (or bit) in the input stream; a difference in the thread schedule is a difference in the time at which a given thread switch is performed; a difference in the code is a different statement in two versions of a program; a difference in the program state is a different value of the program's internal variables.
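
The terminology above can be made concrete with a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the original slides: the names `Outcome`, `input_deltas`, `apply_deltas` and `run_test`, the toy inputs, and the toy failure condition (the "program" fails on `<` and is confused by a `>` without a `<`).

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"              # the program runs properly
    FAIL = "fail"              # the failure shows up
    UNRESOLVED = "unresolved"  # a different problem arises


def input_deltas(passing, failing):
    """Deltas between two equal-length inputs: (position, failing character)."""
    return [(i, f) for i, (p, f) in enumerate(zip(passing, failing)) if p != f]


def apply_deltas(passing, deltas):
    """Build a new test input by applying a subset of deltas to the passing input."""
    chars = list(passing)
    for position, char in deltas:
        chars[position] = char
    return "".join(chars)


def run_test(text):
    """Toy stand-in for running the program under test (hypothetical behaviour)."""
    if ">" in text and "<" not in text:
        return Outcome.UNRESOLVED  # malformed input: neither clear pass nor fail
    return Outcome.FAIL if "<" in text else Outcome.PASS


deltas = input_deltas("abcd", "a<c>")
```

Applying all deltas reproduces the failing input, applying none reproduces the passing one, and every subset in between is a new test case for the algorithm to try.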
Simplifying failure-inducing input
Minimizing vs. isolating: Minimizing (ddmin algorithm) is slower but more human-friendly. Isolating (dd algorithm) is a generalization of the ddmin algorithm; it is faster and well suited to generating the input for cause-effect chain DD.
Minimizing: Mozilla bug. Minimizing takes 57 tests to simplify the 896-line HTML input to the “ ” tag that causes the crash. Each character is relevant (as shown from line 20 to 26). It only removes deltas from the failing test. It returns an n-minimal input that causes the failure (finding the global minimum is an NP problem).
Minimizing: didactic example
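
As a didactic illustration, here is a minimal Python sketch of a ddmin-style minimizer. It is a simplified reading of the published algorithm, not the code used in the presentation; the helper `split`, the outcome constants, and the toy test (the input fails whenever elements 1, 7 and 8 are all present) are assumptions made for the example.

```python
PASS, FAIL, UNRESOLVED = "PASS", "FAIL", "UNRESOLVED"


def split(items, n):
    """Split items into n contiguous chunks of nearly equal size."""
    size, rest = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rest else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def ddmin(items, test):
    """Shrink a failing input until removing any single element stops the failure."""
    assert test(items) == FAIL
    n = 2  # current granularity: number of chunks
    while len(items) >= 2:
        subsets = split(items, n)
        reduced = False
        for subset in subsets:  # reduce to a subset
            if test(subset) == FAIL:
                items, n, reduced = subset, 2, True
                break
        if not reduced:  # reduce to a complement
            for i in range(len(subsets)):
                complement = [x for j, s in enumerate(subsets) if j != i for x in s]
                if test(complement) == FAIL:
                    items, n, reduced = complement, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(items):  # already at single-element granularity
                break
            n = min(len(items), 2 * n)  # increase granularity
    return items


def toy_test(items):
    """Hypothetical failure: triggered only when 1, 7 and 8 are all present."""
    return FAIL if {1, 7, 8} <= set(items) else PASS


minimal = ddmin(list(range(1, 9)), toy_test)
```

In this sketch, PASS and UNRESOLVED outcomes are treated alike: neither allows the input to be reduced, so only the granularity increases.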
Isolating: Mozilla bug. Isolating takes only 7 tests (instead of 26). It removes deltas from the failing test and adds deltas to the passing test. It isolates a single delta, “<”, whose removal makes the failure go away. It returns the two nearest inputs, one failing and the other passing.
General DD Algorithm (figure: the initial failing test, the initial passing test, and the differences between them)
Step 1: What if we remove these differences from the current failing test? The failure disappears: “Move up”.
Step 2: What if we remove these differences? Unresolved test: “Increase granularity”.
Step 3: What if we remove these differences from the current failing test? The test still fails: “Move down”.
Formally: the Algorithm
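
The formal definition shown on this slide is not reproduced in the text. As a stand-in, the following Python sketch gives a simplified, iterative reading of the dd isolation idea: narrow the gap between a passing and a failing configuration until a single delta remains. The representation of configurations as sets of deltas, the names `dd` and `split`, the result cache and the toy test are all assumptions of this sketch.

```python
PASS, FAIL, UNRESOLVED = "PASS", "FAIL", "UNRESOLVED"


def split(items, n):
    """Split items into n contiguous chunks of nearly equal size."""
    size, rest = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rest else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def dd(c_pass, c_fail, test):
    """Return (passing, failing) configurations with a minimal difference."""
    c_pass, c_fail = frozenset(c_pass), frozenset(c_fail)
    cache = {}

    def run(config):  # avoid re-running identical tests
        if config not in cache:
            cache[config] = test(config)
        return cache[config]

    n = 2
    while True:
        delta = sorted(c_fail - c_pass)
        if len(delta) <= 1:
            return c_pass, c_fail
        n = min(n, len(delta))
        parts = [frozenset(p) for p in split(delta, n)]
        step = None
        for p in parts:  # a small part decides the outcome: restart at n = 2
            if run(c_pass | p) == FAIL:
                step = (c_pass, c_pass | p, 2)
                break
            if run(c_fail - p) == PASS:
                step = (c_fail - p, c_fail, 2)
                break
        if step is None:
            for p in parts:  # otherwise move one of the two ends closer
                if run(c_pass | p) == PASS:
                    step = (c_pass | p, c_fail, max(n - 1, 2))
                    break
                if run(c_fail - p) == FAIL:
                    step = (c_pass, c_fail - p, max(n - 1, 2))
                    break
        if step is None:  # only unresolved outcomes: increase granularity
            if n >= len(delta):
                return c_pass, c_fail
            n = min(2 * n, len(delta))
            continue
        c_pass, c_fail, n = step


def toy_test(config):
    """Hypothetical failure: triggered only when deltas 1, 7 and 8 are all applied."""
    return FAIL if {1, 7, 8} <= config else PASS


nearest_pass, nearest_fail = dd(set(), set(range(1, 9)), toy_test)
```

With this toy test the two returned configurations differ in exactly one delta, which mirrors the "two nearest inputs" result described for the Mozilla example.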
Efficiency considerations: The worst case takes |k|^2 + 3|k| tests (|k| = cardinality of the change set k): all test cases are unresolved except the last one, which is very unlikely. The best case takes 2 log2 |k| tests. Try to avoid unresolved test outcomes by exploiting lexical or syntactic knowledge about the input.
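
To give a feel for the gap between the two bounds, here is a small worked calculation. The change-set size of 1024 is an arbitrary example value, and the function names are hypothetical.

```python
import math


def worst_case_tests(k):
    """Worst-case bound from the slide: k^2 + 3k tests."""
    return k * k + 3 * k


def best_case_tests(k):
    """Best-case bound from the slide: 2 * log2(k) tests."""
    return 2 * math.log2(k)


k = 1024  # example change-set size (assumption)
print(worst_case_tests(k), best_case_tests(k))
```

For 1024 deltas the worst case exceeds one million tests while the best case needs about twenty, which is why avoiding unresolved outcomes matters so much in practice.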
DEMO: Eclipse plugin live demo
Thread scheduling: The behavior of a multithreaded program may depend on the schedule.
DD applied to thread scheduling: Debugging is even harder here: thread switches and schedules are nondeterministic, and it is difficult to reproduce and isolate failures. Goal: relate the failure to a small set of relevant differences between passing and failing schedules. Again a “purely experimental approach”: there is no need to understand the program.
Purely experimental: pros and cons. Pros: the program is treated as a black box, so it only needs to be executed; a failure can be any arbitrary behaviour of the program, so we only need to distinguish failure from success. Cons (w.r.t. static analysis): being test-based, it cannot determine properties that hold for all runs of a program, such as the general absence of deadlocks; it requires an observable failure.
Dejavu tool: Dejavu (DEterministic JAVa replay Utility) by IBM deterministically replays schedules and the failures they induce. With Dejavu, the thread schedule becomes an input. We can generate schedules by mixing one passing schedule and one failing schedule.
Differences in thread scheduling: Starting point: a passing run and a failing run. Differences (for thread switch t1): in the passing run t1 occurs at time 254; in the failing run t1 occurs at time 278. ∆1 = |278 − 254| induces a statement interval: the code executed between time 254 and 278.
Differences in thread scheduling: We can build further test cases by mixing the two schedules to isolate the relevant differences.
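
Mixing two schedules can be sketched in a few lines of Python. The representation below is an assumption made for illustration: a schedule is a dictionary from thread-switch name to the time at which the switch occurs, and each unit delta moves one switch by +1 or -1. The switch `t2` and its times are invented; only the `t1` times (254 and 278) come from the slide.

```python
def schedule_deltas(passing, failing):
    """List the unit deltas (switch, +1 or -1) that turn passing into failing."""
    deltas = []
    for switch in sorted(passing):
        diff = failing[switch] - passing[switch]
        step = 1 if diff > 0 else -1
        deltas.extend([(switch, step)] * abs(diff))
    return deltas


def mix(passing, deltas):
    """Build a new schedule by applying a subset of unit deltas to the passing one."""
    mixed = dict(passing)
    for switch, step in deltas:
        mixed[switch] += step
    return mixed


passing = {"t1": 254, "t2": 702}  # t2 is a made-up second thread switch
failing = {"t1": 278, "t2": 690}
deltas = schedule_deltas(passing, failing)
halfway = mix(passing, deltas[:12])  # apply only the first 12 unit deltas
```

Each mixed schedule would then be replayed deterministically (the role Dejavu plays in the slides) and classified as pass, fail or unresolved.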
Real-life test: setting. Test #205 of the SPEC JVM98 Java test suite. The raytracer program was modified into a multithreaded version and a simple race condition was introduced. An automated test checks whether a run fails or passes. Random schedules were generated to find a passing schedule and a failing schedule. Differences between the passing and failing schedules: 3,842,577,240. Each difference moves a thread switch time by +1 or -1.
Real-life test: results. DD isolates one single difference after 50 tests (about 28 min).
Real-life test: pinpointing the failure. The failure occurs if and only if thread switch #33 occurs at yield point 59,772,127 (instead of 59,772,126); a yield point is a safe point, such as a function invocation. At 59,772,127, line 91 is the first yield point after the initialization of OldScenesLoaded. At 59,772,126, line 82 is the yield point just before the initialization of OldScenesLoaded.
Real-life test: conclusion. Delta Debugging is efficient even when applied to very large thread schedules (>3,000,000,000 differences). No analysis is required, as Delta Debugging relies on experiments alone: only the schedule was observed and altered, and the failure-inducing thread switch is easily associated with code. Alternate runs are obtained automatically by generating random schedules: only one initial run (pass or fail) is required.
Code changes: A given revision of a program behaves correctly; the next one does not. Find which of the changes in the code causes the problem. This is inconvenient when the difference amounts to thousands of lines of code.
The manual solution: Binary search through the revision history (regression containment). It does not always work: multiple changes may cause the failure only when combined (interference); a single change can amount to many lines of code (granularity); mixing parallel development branches gives rise to inconsistency problems.
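
The manual solution amounts to a plain binary search over an ordered list of revisions. The sketch below assumes a hypothetical `passes(revision)` callback that builds and tests a revision, and assumes that every revision before the first failing one passes and every later one fails.

```python
def first_failing_revision(revisions, passes):
    """Return the index of the first failing revision.

    Assumes revisions[0] passes, revisions[-1] fails, and the outcome flips
    exactly once along the history.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if passes(revisions[mid]):
            good = mid  # the regression was introduced after mid
        else:
            bad = mid   # the regression was introduced at or before mid
    return bad


revisions = [f"r{i}" for i in range(1, 9)]  # made-up history r1..r8
broken_from = revisions.index("r6")         # toy assumption: r6 broke it
culprit = first_failing_revision(revisions, lambda r: revisions.index(r) < broken_from)
```

This only pins down a single revision boundary. It says nothing when the failure needs several changes combined, or when the culprit revision is itself thousands of lines long, which are exactly the limits listed on the slide.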
Procedure: Developed in 1999, so it differs somewhat from the current general DD algorithms. Consider the differences between the working and failing revisions. Ignore any knowledge about the temporal ordering of the changes. Goal: find a minimal failure-inducing change set.
Inconsistencies: Mixing code changes regardless of their ordering produces many tests with an “Unresolved” outcome: integration failure, construction failure, execution failure. They increase the complexity of the DD algorithm!
Future work: Group related changes (partly done) to get fewer inconsistent trials, by: common change dates/sources; location criteria; lexical criteria; syntactic criteria (common functions/modules); semantic criteria.
Cause-effect background: A bit of background: a program state is represented by variable values and references.
Background (2): While the program runs, the state evolves. We assume the program is deterministic and not interactive: identical states at identical times have identical evolutions.
Idea: apply DD to program states. We need two distinct runs, one failing and one passing. We want the two runs to be (initially) as similar as possible: if we let the two runs evolve in parallel, their initial states will be similar. Isolating failure-inducing input can help. Apply DD to different "slices" of the program evolution (a sort of CT scan for computer routines).
Procedure: Iteratively build a new state by mixing the passing and failing states. Let the program evolve and see if it passes, fails, or does unrelated weird things (unresolved outcome). Isolate the smallest subset of the state relevant to the failure. Nothing new so far. But this happens at a specific moment of the program evolution, and it will be repeated at several moments (e.g. at the entry points of important functions).
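
A toy Python sketch of the state-mixing step follows. It assumes a state is a flat dictionary of variable values with the same variables in both runs; `resume` is a hypothetical stand-in for "let the program evolve from this state", and the variable names and failure condition are invented. For brevity it probes one variable at a time instead of running the full dd algorithm.

```python
PASS, FAIL = "PASS", "FAIL"


def state_deltas(passing, failing):
    """Variables whose values differ between the two states."""
    return sorted(v for v in passing if passing[v] != failing[v])


def mix_states(passing, failing, variables):
    """Passing state with the chosen variables set to their failing values."""
    mixed = dict(passing)
    for variable in variables:
        mixed[variable] = failing[variable]
    return mixed


def resume(state):
    """Hypothetical continuation of the program from the given state."""
    return FAIL if state["size"] > state["capacity"] else PASS


passing = {"size": 3, "capacity": 8, "name": "a"}
failing = {"size": 9, "capacity": 8, "name": "b"}
deltas = state_deltas(passing, failing)
# Naive stand-in for dd: transfer each differing variable on its own.
relevant = [v for v in deltas if resume(mix_states(passing, failing, [v])) == FAIL]
```

Here only `size` is relevant to the failure at this moment of the execution; repeating the same experiment at other moments yields the links of the cause-effect chain.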
The result: A cause-effect chain that leads to a failure.
The cause-effect chain: The initial states are absolutely legitimate (for example, the direct consequence of a specific input that the program should handle): they are intended program states. The final effects are the failure: faulty program states. The error lies somewhere in the middle, where an intended program state evolves into a faulty one.
Fascinating terminology: A defect in the code causes an infection in the state. The infection usually propagates as the program evolves.
Limits: No automatic discrimination between intended and faulty (infected) states! The human user can increase the resolution of the slices and pinpoint the code that turns an INTENDED state into a FAULTY one, then correct the error (== defect in the code) and break the cause-effect chain that leads to the failure.
Cause transitions: Sometimes, when an instruction executes, a given variable ceases to be failure-inducing and others begin to be: the failure-inducing subset of the state changes (cause transition). An algorithm can efficiently find cause transitions in cause-effect chains by means of binary search (again).
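
The binary search for a cause transition can be sketched as follows. The callback `cause_at(t)` is hypothetical: it stands for "run DD on the program states at moment t and return the failure-inducing variables". The toy timeline and variable names are invented, and the sketch locates a single transition only.

```python
def find_cause_transition(cause_at, lo, hi):
    """Find adjacent moments (t, t + 1) between which the failure cause changes.

    Returns None if the cause is the same at both ends of the interval.
    """
    cause_lo = cause_at(lo)
    if cause_lo == cause_at(hi):
        return None
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cause_at(mid) != cause_lo:
            hi = mid  # the cause has already changed at mid
        else:
            lo = mid  # still the original cause at mid
    return lo, hi


def toy_cause_at(t):
    """Made-up timeline: the cause moves from x to y at t = 40, then to z at t = 75."""
    if t < 40:
        return frozenset({"x"})
    if t < 75:
        return frozenset({"y"})
    return frozenset({"z"})


transition = find_cause_transition(toy_cause_at, 0, 100)
```

To collect further transitions, the same search could be repeated on the remaining interval, here from moment 40 to moment 100.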
Cause transitions (2)
Cause transitions (3): Why do we bother looking for cause transitions? The point where a variable begins to cause a failure is a good location for a fix. More importantly: “cause transitions are significantly better locators of defects than any other methods previously known”. Result: valuable help in the search for the defect, since only a handful of cause transitions and nearby code locations need to be analyzed as the source of the infection.
Other approaches to defect localization: coverage; slicing; dynamic invariants (no success with the Siemens test suite); explicit specification (good results, but needs a specification of the desired internal behavior); nearest neighbor, using coverage (best results, albeit quite naive).
Evaluation setup: The Siemens suite: 7 C sample programs (hundreds of lines of code each), 132 variations with one realistic defect each, and a test suite for each program. Apply the different defect locators and compare their performance (only the comparison to nearest neighbor, NN, is presented).
Evaluation results
Clarification: Two small improvements: relevance of code locations (automatic); sources of infection (programmer-driven): unfair!
Zoom on the representation of the state: We said: “A program state is represented by variable values and references”. In general, representing and manipulating the state is not trivial. One of the problems is C pointers: copying their value does not make sense. Solution: memory graphs.
Memory graphs: Systematically unfold all data structures, starting from base variables.
Memory graphs (2): Nodes: all values and all variables of a program. Edges: operations like variable access, pointer dereferencing, struct member access, array element access. Memory graphs abstract from memory addresses and make it possible to compare and alter pointers.
Memory graphs (3): What if the sets of variables differ in the two states we are mixing? Just compute the largest common subgraph. The deltas we apply to a state: change variable values; alter data structures.
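
The unfolding idea can be illustrated with a small Python analogue. This is only a sketch under stated assumptions: Python dictionaries and lists stand in for C structs and arrays, object references stand in for pointers, and the function name `unfold` and the edge labels are invented. Nodes are numbered in discovery order, so the graph does not depend on memory addresses. Computing the largest common subgraph is not shown.

```python
def unfold(base_variables):
    """Unfold a state into (node labels, edges), starting from base variables.

    Edges are (source node, access operation, target node) triples.
    """
    labels = ["<root>"]
    edges = []
    node_of = {}  # id(container) -> node number, so aliases share one node

    def visit(value):
        is_container = isinstance(value, (dict, list))
        if is_container and id(value) in node_of:
            return node_of[id(value)]  # already unfolded: keep the aliasing
        node = len(labels)
        labels.append(type(value).__name__ if is_container else repr(value))
        if is_container:
            node_of[id(value)] = node
            is_dict = isinstance(value, dict)
            items = value.items() if is_dict else enumerate(value)
            for key, child in items:
                operation = f".{key}" if is_dict else f"[{key}]"
                edges.append((node, operation, visit(child)))
        return node

    for name, value in base_variables.items():
        edges.append((0, name, visit(value)))
    return labels, edges


def make_state():
    """Made-up state: a two-element linked list plus an alias to its last node."""
    last = {"value": 14, "next": None}
    return {"head": {"value": 22, "next": last}, "tail": last}


labels, edges = unfold(make_state())
```

Two structurally identical states built at different addresses unfold to the same graph, which is what makes states comparable across runs.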
Implementation considerations: All we need is a way to access and modify the program state. GDB is the solution for C programs, but it has performance problems (5000% overhead). DD applied to states is still a black-box approach (sort of). It is easily extended to other languages as soon as something provides GDB-like functionality.
Conclusions: Delta Debugging is an extremely interesting technique; it works pretty well, at least in theory; there are no usable tools; it can be usefully integrated into various IDEs; the algorithm is now patent-free (expired patent). SO: LET'S MAKE SOME MONEY ON IT!
Acknowledgements: Some slides and images adapted from Dr. Andreas Zeller's presentations and papers.
References:
Andreas Zeller. Yesterday, My Program Worked. Today, It Does Not. Why? FSE 1999.
Holger Cleve, Andreas Zeller. Finding Failure Causes through Automated Testing. 4th International Workshop on Automated Debugging, 2000.
Ralf Hildebrandt, Andreas Zeller. Simplifying Failure-Inducing Input. ISSTA 2000.
Andreas Zeller. Automated Debugging: Are We Close? IEEE Computer, November.
Jong-Deok Choi, Andreas Zeller. Isolating Failure-Inducing Thread Schedules. ISSTA 2002.
Andreas Zeller. Isolating Cause-Effect Chains from Computer Programs. FSE 2002.
Holger Cleve, Andreas Zeller. Locating Causes of Program Failures. ICSE 2005.