Delta: Heuristically Minimize “Interesting” Files
delta.tigris.org
Daniel S. Wilkerson, joint work with Scott McPeak

This quarter-million-line file crashes my tool!
We had a quarter-million-line (preprocessed) C++ file that crashed our C++ front end (Elsa).
How long would it take you to minimize that by hand?
Delta reduced it in a few hours to a page or two of code
–while we did something else!

Delta Debugging Algorithm
Andreas Zeller’s Delta Debugging algorithm, applied to file minimization, reduces to this:
for each granularity $g$ from 0 to $\lceil \log_2 N \rceil$ ($N$ = number of lines in the file)
–partition the file into $2^g$ parts
–for each part, test whether the file minus that part is still interesting; if so, permanently throw out that part
The result is “one-minimal”
–removing any one line will make the test fail
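A minimal Python sketch of the loop above, assuming the file is held as a list of lines and the interestingness test is a Python callable (the real tool runs your test as a separate program). The names `delta_minimize` and `is_interesting` are hypothetical; this is an illustration of the slide's description, not delta's own source.

```python
from typing import Callable, List, Sequence


def delta_minimize(lines: Sequence[str],
                   is_interesting: Callable[[List[str]], bool]) -> List[str]:
    """Greedy delta loop as described on the slide (a sketch, not delta's source)."""
    current = list(lines)
    if not is_interesting(current):
        raise ValueError("the starting file must be interesting")
    if not current:
        return current
    max_g = (len(current) - 1).bit_length()  # ceil(log2 N) for N >= 1
    for g in range(max_g + 1):
        chunk = -(-len(current) // 2 ** g)  # ceil(len / 2**g): size of each part
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]  # file minus this part
            if is_interesting(candidate):
                current = candidate  # permanently throw out the part
                # do not advance i: the next untested part has slid into position i
            else:
                i += chunk  # this part is needed; move on to the next one
    return current
```

On the “both blue needed” example that follows, `delta_minimize(list("abcdefgh"), lambda f: "b" in f and "e" in f)` returns `['b', 'e']`. Not advancing `i` after a successful deletion matters: the part that slides into position `i` has not been tested yet.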

Example: lines b and e (blue on the slide) are both needed
a b c d e f g h

Both blue needed, g = 0: [a b c d e f g h]
–can’t delete the box, since it contains both b and e

Both blue needed, g = 1: [a b c d] [e f g h]
–can’t delete the first part; it contains b
–can’t delete the second part; it contains e

Both blue needed, g = 2: [a b] [c d] [e f] [g h]
–can delete [c d] and [g h]

Both blue needed, g = 3: [a] [b] [e] [f]
–can delete a and f

Both blue needed, final: b e

You could do this manually...
–and be much more clever...
–but delta is often faster
I find it surprising that, when minimizing a file exhibiting a certain behavior, brute force mostly wins over cleverness.
“Computers are as dumb as hell but they go like 60” -- Richard Feynman

Do a controlled experiment
An experiment does many things:
–the interesting bit
–and the boilerplate just to make it go
A control is another experiment
–that only does the boilerplate
Do both and “subtract”; that finds the interesting bit.
Control: $F passes gcc, but not oink:
    gcc -c $F && oink $F | grep 'error:...'
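The same controlled test can be sketched in Python, assuming the control and the tool under test are given as command lists that take the file path as their last argument. The helper names are hypothetical, and a plain substring match stands in for `grep`'s regular expression.

```python
import subprocess
from typing import List


def succeeds(cmd: List[str], path: str) -> bool:
    """Control: does the boilerplate step (e.g. `gcc -c $F`) accept the file?"""
    return subprocess.run(cmd + [path], capture_output=True).returncode == 0


def stdout_contains(cmd: List[str], path: str, needle: str) -> bool:
    """Experiment: does the tool under test still print the error we are chasing?
    Like `oink $F | grep ...`, only stdout is searched."""
    result = subprocess.run(cmd + [path], capture_output=True, text=True)
    return needle in result.stdout


def is_interesting(path: str, control: List[str], tool: List[str], needle: str) -> bool:
    # "Subtract" the control from the experiment: the file must still pass the
    # control and still trigger the tool's error, as in `gcc -c $F && oink $F | grep ...`.
    return succeeds(control, path) and stdout_contains(tool, path, needle)
```

For the slide's case one would pass `["gcc", "-c"]` as the control and `["oink"]` as the tool, with the actual error text as `needle`. Requiring the control to pass is what keeps delta from "minimizing" the file into something that is merely broken C++.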

topformflat: “explaining” hierarchical structure
To delta, a file is a sequence of lines.
topformflat “explains” the nesting of C/C++:
–a simple flex filter that copies input to output
–but doesn’t print newlines nested deeper than a nesting-depth argument
Strategy: repeatedly minimize with increasing nesting depths.

topformflat Example (input)
    void foo() {
      for(...) {
        x -= 5;
        bar();
      }
      while(...) {
        j++;
      }
    }
    void bar() {
      z |= 17;
      foo();
    }
    void baz() {...}

topformflat Example, level=0
    void foo() {for(...){x -= 5;bar();}while(...){j++;}}
    void bar() {z |= 17;foo();}
    void baz() {...}

topformflat Example, level=1
    void foo() {
    for(...) { x -= 5; bar(); }
    while(...) { j++; }
    }
    void bar() {
    z |= 17;
    foo();
    }
    void baz() {...}
(On the slide, one of these lines is marked “deleted”.)

topformflat Example, level=2
    void foo() {
    for(...) {
    x -= 5;
    bar();
    }
    while(...) {
    j++;
    }
    }
    void bar() {
    z |= 17;
    foo();
    }
    void baz() {...}
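The flattening idea can be approximated in a few lines of Python. This is a sketch with a hypothetical `flatten` function, not topformflat itself: the real filter is written in flex and also understands quotes, whereas this version only tracks `{`/`}` depth and ignores strings, character literals, and comments. It joins all input lines and then breaks after `{`, `}`, or `;` whenever the nesting depth after that character is at most `level`.

```python
def flatten(source: str, level: int) -> str:
    """Approximate topformflat: keep line breaks only at nesting depth <= level."""
    # Forget the original layout: join all lines, dropping their indentation.
    text = " ".join(line.strip() for line in source.splitlines())
    lines, current, depth = [], [], 0
    for ch in text:
        current.append(ch)
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        # Break after a brace or semicolon unless we are nested deeper than `level`.
        if ch in "{};" and depth <= level:
            lines.append("".join(current).strip())
            current = []
    tail = "".join(current).strip()
    if tail:
        lines.append(tail)
    return "\n".join(line for line in lines if line)
```

At level 0 every top-level function collapses onto one line, so delta can only delete whole functions; at level 1 each statement of a function body becomes its own line, and so on, which is why minimizing at increasing depths works well.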

Science: most bugs are exhibitable by small inputs
Whatever the input size, the result is almost always small
–for C++ input to a compiler, 1-2 pages of code.
This seems to be a phenomenon of computation
–there actually is Science in Computer Science!
But not always:
–delta worked for a week and was still left with 50 files
–a buffer had to fill up and then flush

The “Configuration File Trick”
Delta generalizes to many situations if you
–parameterize the process with a file
–minimize the file.
Simon Goldsmith was instrumenting Java system binaries
–“during class-loading JVM would seg-fault; nothing really comprehensible would happen”
–he wrote a script that reads a config file saying which instrumented classes to put into the jar file
–and used delta to minimize the config file
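A sketch of the trick in Python, assuming the process under study can be driven by a set of names read from a config file, one per line. `read_config`, `config_is_interesting`, and the `run_process` callback are hypothetical; in the JVM story, `run_process` would build the jar with only the listed classes instrumented and then run it. Delta then minimizes the config file instead of the program.

```python
from pathlib import Path
from typing import Callable, List, Set


def read_config(path: Path) -> List[str]:
    """One entry (e.g. a class name to instrument) per line; blank lines are ignored."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def config_is_interesting(path: Path, run_process: Callable[[Set[str]], bool]) -> bool:
    """Interesting iff the process, parameterized by this config, still fails.

    run_process returns True on a clean run and False when the failure
    (e.g. the JVM seg-faulting during class loading) shows up.
    """
    return not run_process(set(read_config(path)))
```

Because the config file is just a list of lines, delta needs no knowledge of Java at all; the only domain-specific piece is the script that turns the config into a run.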

Simulated Annealing
–a large, non-convex sub-space
–a gradient of goodness
–random local moves likely to find another point in the sub-space
–moves parameterizable by a temperature
Some say the ability to sometimes get worse is essential
–I say: locality, randomness, and temperature

Delta as Simulated Annealing
–space: files that pass your test
–goodness: a smaller file is better
–local moves: chop out a chunk of the file
  –note that we never “get worse”
  –so delta is greedy
–temperature: chunk size
  –we have an exponential “annealing schedule”, which is not unusual (says Wikipedia, anyway)
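The exponential schedule can be written out, reading the algorithm slide literally: round $g$ splits the current file of $N_g$ lines into $2^g$ parts, so the chunk size (the "temperature") in that round is

```latex
c_g = \left\lceil \frac{N_g}{2^g} \right\rceil \le \left\lceil \frac{N}{2^g} \right\rceil,
\qquad g = 0, 1, \dots, \lceil \log_2 N \rceil,
\qquad c_{g+1} \le \left\lceil \frac{c_g}{2} \right\rceil
```

Since $N_g \le N$, the chunk size at least roughly halves each round, and in the last round $2^g \ge N \ge N_g$, so $c_g = 1$ (for a non-empty file). In the “both blue needed” example, $N = 8$ and the chunk sizes are 8, 4, 2, 1.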

Delta is surprisingly effective
–especially given how ignorant and general it is
Most ideas for improvements are about making the local moves better at staying in the space
–these ideas generally require knowing what the file means.
Important point: note how well delta already does while knowing nothing!
–and topformflat only knows nesting and quotes!

Improvement: use knowledge of dependencies (decluse) to improve moves
If you know the language semantics, reject moves that would violate them, or only make moves that would produce a legal file.
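One way to read this idea, sketched in Python under stated assumptions: split the file into chunks annotated with the names each one declares and uses (a hypothetical `Chunk` record; computing these annotations is exactly the part that requires knowing the language), and only offer delta the moves that keep every used name declared. This illustrates the idea and does not describe any existing tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Chunk:
    text: str
    declares: FrozenSet[str] = frozenset()  # names this chunk declares
    uses: FrozenSet[str] = frozenset()      # names this chunk refers to


def is_legal(chunks: List[Chunk]) -> bool:
    """Every name used by a remaining chunk must still be declared somewhere."""
    declared = set().union(*(c.declares for c in chunks))
    return all(c.uses <= declared for c in chunks)


def legal_moves(chunks: List[Chunk]) -> List[int]:
    """Indices whose removal keeps the file legal; only these are worth testing."""
    return [i for i in range(len(chunks)) if is_legal(chunks[:i] + chunks[i + 1:])]
```

Filtering moves this way saves test runs on files that would fail for an uninteresting reason (an undeclared name) rather than the bug being chased.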

Fan Mail
From: Flash Sheridan
This is just a quick thank-you note for Delta.... it immediately reduced a... bug file from 16K lines to ten (GCC bug 22604). Oddly enough, it initially found a different bug (22603), since I'd only specified "internal compiler error", not "segmentation fault".

Fan Mail, p.2
From: Flash Sheridan
Delta has become even more valuable since my initial thank-you note. I'm not sure it's helped with all of the GCC bugs I've been filing... but I couldn't have filed most of them without Delta. Delta has always been able to find a radically smaller file, which I have been able to attach to my bug report.

Fan Mail, p.3
From: Richard Guenther
delta is saving a lot of gcc developers' lives ;) I would guess 1 of 3 bugs submitted to the gcc bugzilla get their testcase reduced using delta.... a little bit more accurate would be to say we're using delta to reduce all testcases from the gcc bugzilla in case they get entered unreduced.

Delta: this simple, dumb script is everywhere!
One class is devoted to it in both the Berkeley and the Stanford software engineering courses
–Berkeley: “We've just assigned a delta-related homework to the students today”
–Stanford: “I gave them a homework assignment for CS295 using delta. Feedback was positive but unquantified.”
Why did it take so long to think of this simple thing?