Delta: Heuristically Minimize “Interesting” Files delta.tigris.org Daniel S. Wilkerson work with Scott McPeak
This quarter million line file crashes my tool! We had a quarter-million-line (preprocessed) C++ file that crashed our C++ front-end (Elsa). How long would it take you to minimize that by hand? Delta reduced it in a few hours to a page or two of code. While we did something else!
Delta Debugging Algorithm Andreas Zeller’s Delta Debugging Algorithm For file minimization, reduces to this: for each granularity g from 0 to log2 N –partition the file into 2^g parts –for each part, test if the file minus that part is still interesting; if so, permanently throw out that part Result is “one minimal” –removing any one line will make the test fail
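The loop on this slide can be sketched in a few lines of Python. This is only an illustration of the slide's pseudocode, not the delta script itself: the names `minimize` and `is_interesting` are made up here, chunks are cut by original line position, and the chunk length at granularity g is assumed to be ceil(N / 2^g).

```python
import math

def minimize(lines, is_interesting):
    """Greedily drop chunks of `lines` while `is_interesting` stays true."""
    n = len(lines)
    if n == 0:
        return []
    alive = [True] * n  # alive[i] turns False once line i is thrown out

    def surviving(mask):
        return [line for line, keep in zip(lines, mask) if keep]

    g = 0
    while True:
        size = math.ceil(n / 2 ** g)  # assumed chunk length at granularity g
        for start in range(0, n, size):
            chunk_len = len(alive[start:start + size])
            trial = alive[:start] + [False] * chunk_len + alive[start + size:]
            # Skip chunks that are already gone; otherwise test "file minus part".
            if trial != alive and is_interesting(surviving(trial)):
                alive = trial  # permanently throw out that part
        if size == 1:
            break
        g += 1
    return surviving(alive)
```

For instance, `minimize(list("abcdefgh"), lambda f: "b" in f and "e" in f)` walks through granularities 0 to 3 and keeps only the two needed lines.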
Example: both blue lines (b and e) needed a b c d e f g h
both blue needed: g = 0 [a b c d e f g h] can’t delete the box since it contains both b and e
both blue needed: g = 1 [a b c d] [e f g h] can’t delete the first box; contains b. Can’t delete the second box; contains e
both blue needed: g = 2 [a b] [c d] [e f] [g h] can delete [c d] and [g h]
both blue needed: g = 3 [a] [b] [e] [f] can delete [a] and [f]
both blue needed: final b e
You could do this manually... and be much more clever... but delta is often faster. I find it surprising that, when minimizing a file exhibiting a certain behavior, brute force mostly wins over cleverness. “Computers are as dumb as hell but they go like 60” -- Richard Feynman
Do a controlled experiment An experiment does many things –the interesting bit –and the boilerplate just to make it go A control is another experiment –that only does the boilerplate Do both and “subtract” to find the interesting bit. Test: gcc -c $F && oink $F | grep 'error:...' (control: $F passes gcc, but not oink)
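The same control-then-experiment test could be written as a small Python predicate. This is a sketch under assumptions: the function name and parameters are invented for illustration, and it searches both stdout and stderr, whereas the shell pipeline on the slide only greps stdout.

```python
import subprocess

def is_interesting(path, control_cmd, experiment_cmd, pattern):
    """True if `path` passes the control but the experiment reports `pattern`."""
    # Control: only the boilerplate. If this fails, the file is just broken.
    control = subprocess.run(control_cmd + [path],
                             capture_output=True, text=True)
    if control.returncode != 0:
        return False
    # Experiment: the tool under test. Look for the error text in its output.
    experiment = subprocess.run(experiment_cmd + [path],
                                capture_output=True, text=True)
    # Assumption: search stdout and stderr together.
    return pattern in experiment.stdout + experiment.stderr
```

With the commands from the slide, the call would look like `is_interesting(f, ["gcc", "-c"], ["oink"], pattern)`, where `pattern` is whatever error text you are chasing.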
topformflat: “explaining hierarchical structure” To delta, a file is a sequence of lines topformflat “explains” the nesting of C/C++ Simple flex filter that copies input to output –but doesn’t print newlines nested deeper than a nesting-depth argument Strategy: repeatedly minimize with increasing nesting depths
topformflat Example void foo() { for(...) { x -= 5; bar(); } while(...) { j++; } } void bar() { z |= 17; foo(); } void baz() {...}
topformflat Example, level=0
void foo() {for(...){x -= 5;bar();}while(...){j++;}}
void bar() {z |= 17;foo();}
void baz() {...}
topformflat Example, level=1 void foo() { for(...) { x -= 5; bar(); } while(...) { j++; } } void bar() { z |= 17; foo(); } void baz() {...} deleted
topformflat Example, level=2 void foo() { for(...) { x -= 5; bar(); } while(...) { j++; } } void bar() { z |= 17; foo(); } void baz() {...}
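The filtering idea behind these examples can be approximated in Python. The real topformflat is a flex filter; the sketch below is a simplification with an invented function name. It only tracks brace depth and drops newlines nested deeper than `level`, and it ignores quotes, comments, and indentation.

```python
def flatten(source, level):
    """Copy `source`, dropping newlines nested deeper than `level` braces."""
    out = []
    depth = 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)  # tolerate stray closing braces
        if ch == "\n" and depth > level:
            continue  # too deeply nested: keep this text on the current line
        out.append(ch)
    return "".join(out)
```

At level 0 every top-level form ends up on one line, so delta can only delete whole functions. Raising the level lets it delete statements inside them.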
Science: Most bugs exhibitable by small inputs On any input size, the result is almost always small –for C++ input to a compiler, 1-2 pages of code. Seems to be a phenomenon of computation –there actually is Science in Computer Science! but not always –delta worked for a week and still had 50 files –a buffer had to fill up and then flush
The “Configuration File Trick” Delta generalizes to many situations if you –parameterize the process with a file –minimize the file. Simon Goldsmith was instrumenting Java system binaries –“during class-loading JVM would seg-fault; nothing really comprehensible would happen” –wrote a script to read a config file for which instrumented classes to put into the jar file –use delta to minimize the config file
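A sketch of the parameterizing step, assuming a config file with one class name per line. The function name and the "instrumented"/"original" labels are hypothetical; the slide does not describe the actual script.

```python
def select_classes(config_text, all_classes):
    """Map each class to 'instrumented' if the config lists it, else 'original'."""
    wanted = {line.strip() for line in config_text.splitlines() if line.strip()}
    return {
        name: ("instrumented" if name in wanted else "original")
        for name in all_classes
    }
```

Because every line of the config can be deleted independently, delta can shrink the config until only the classes needed to trigger the crash remain instrumented.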
Simulated Annealing –Large, non-convex sub-space –Gradient of goodness –Random local moves likely to find another point in the sub-space –Moves parameterizable by a temperature. Some say the ability to sometimes get worse is essential –I say: locality, randomness, and temperature
Delta as Simulated Annealing space: files that pass your test goodness: smaller file is better local moves: chop out a chunk of file –note that we never “get worse” –so delta is greedy temperature: chunk size –we have an exponential “annealing schedule”, which is not unusual, says Wikipedia anyway.
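The exponential schedule can be made concrete with a tiny sketch. It assumes the chunk size at step g is ceil(N / 2^g); the function name is invented for illustration.

```python
import math

def chunk_schedule(n):
    """Chunk sizes tried for an n-line file: the 'temperature' halves each step."""
    sizes = []
    g = 0
    while True:
        size = math.ceil(n / 2 ** g)
        sizes.append(size)
        if size <= 1:
            break
        g += 1
    return sizes
```

For an 8-line file this gives 8, 4, 2, 1: big moves first, then finer and finer ones.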
Delta surprisingly effective Especially given how ignorant and general it is Most ideas for improvements are how to make the local moves better at staying in the space –These ideas generally require knowing what the file means. Important point: But note how well delta already does knowing nothing! –and topformflat only knows nesting and quotes!
Improvement: use knowledge of dependencies to improve moves decluse If you know the language semantics, reject moves that would violate it, or only make moves that would produce a legal file
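One way to reject illegal moves is to wrap the interestingness test in a cheap legality check. The sketch below uses balanced braces as a stand-in for real language knowledge. Both function names are hypothetical, and a real check would need to understand far more than braces.

```python
def balanced_braces(text):
    """Very rough legality check: every '{' has a matching '}'."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def guarded(is_legal, is_interesting):
    """Build a test that rejects illegal candidates before the costly test runs."""
    def test(lines):
        return is_legal("\n".join(lines)) and is_interesting(lines)
    return test
```

The guarded test never calls the expensive tool on a candidate that fails the legality check, so fewer moves are wasted on files that have left the space.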
Fan Mail From: Flash Sheridan This is just a quick thank-you note for Delta.... it immediately reduced a... bug file from 16K lines to ten (GCC bug 22604). Oddly enough, it initially found a different bug (22603), since I'd only specified "internal compiler error", not "segmentation fault".
Fan Mail, p.2 From: Flash Sheridan Delta has become even more valuable since my initial thank-you note. I'm not sure it's helped with all of the GCC bugs I've been filing... but I couldn't have filed most of them without Delta. Delta has always been able to find a radically smaller file, which I have been able to attach to my bug report.
Fan Mail, p.3 From: Richard Guenther delta is saving a lot of gcc developers life ;) I would guess 1 of 3 bugs submitted to the gcc bugzilla get their testcase reduced using delta.... a little bit more accurate would be to say we're using delta to reduce all testcases from the gcc bugzilla in case they get entered unreduced.
Delta: This simple dumb script is everywhere! One class devoted to it in both Berkeley and Stanford Software Engineering Courses –Berkeley: “We've just assigned a delta-related homework to the students today” –Stanford: “I gave them a homework assignment for CS295 using delta. Feedback was positive but unquantified.” Why did it take so long to think of this simple thing?