Statistical Debugging
Ben Liblit, University of Wisconsin–Madison

What's This All About?
- Statistical Debugging & Cooperative Bug Isolation
  - Observe deployed software in the hands of real end users
  - Build statistical models of success & failure
  - Guide programmers to the root causes of bugs
  - Make software suck less
- Lecture plan:
  1. Motivation for post-deployment debugging
  2. Instrumentation and feedback
  3. Statistical modeling and (some) program analysis
  4. Crazy hacks, cool tricks, & practical considerations

Credit Where Credit is Due
- Alex Aiken, David Andrzejewski, Piramanayagam Arumuga Nainar, Ting Chen, Greg Cooksey, Evan Driscoll, Jason Fletchall, Michael Jordan, Anne Mulhern, Garrett Kolpin, Akash Lal, Junghee Lim, Mayur Naik, Jake Rosin, Umair Saeed, Alice Zheng, Xiaojin Zhu
- … and an anonymous cast of thousands! Or maybe just hundreds? I don't really know.

Motivations: Software Quality in the Real World

Bill Gates, quoted in FOCUS Magazine:
"There are no significant bugs in our released software that any significant number of users want fixed."

A Caricature of Software Development
Requirements → Architecture & Design → Implementation → Testing & Verification → Maintenance

A Caricature of Software Development
Requirements → Architecture & Design → Implementation → Testing & Verification → Release! → Maintenance

Software Releases in the Real World [Disclaimer: this may also be a caricature.]

Software Releases in the Real World
1. Coders & testers in tight feedback loop
   - Detailed monitoring, high repeatability
   - Testing approximates reality
2. Testers & management declare "Ship it!"
   - Perfection is not an option
   - Developers don't decide when to ship

Software Releases in the Real World
3. Everyone goes on vacation
   - Congratulate yourselves on a job well done!
   - What could possibly go wrong?
4. Upon return, hide from tech support
   - Much can go wrong, and you know it
   - Users define reality, and it's not pretty
   - Where "not pretty" means "badly approximated by testing"

Testing as Approximation of Reality

Always One More Bug

The Good News: Users Can Help
- Important bugs happen often, to many users
  - User communities are big and growing fast
  - User runs ≫ testing runs
  - Users are networked
- We can do better, with help from users!
  - Users define what bugs matter most
- Common gripe: "Software companies treat their users like beta testers"
  - OK, let's make them better beta testers

Measure Reality and Respond

Bug and Crash Reporting Systems
- Snapshot of Mozilla's Bugzilla bug database
  - Entire history of Mozilla; all products and versions
  - 60,866 open bug reports
  - 109,756 additional reports marked as duplicates
- Snapshot of Mozilla's Talkback crash reporter
  - Firefox for the last ten days
  - 101,812 unique users
  - 183,066 crash reports
  - 6,736,697 hours of user-driven "testing"

Real Engineers Measure Things; Are Software Engineers Real Engineers?

Real Engineering Constraints
- Millions of lines of code
- Loose semantics of buggy programs
- Limited performance overhead
- Limited disk and network bandwidth
- Incomplete & inconsistent information
- Mix of controlled and uncontrolled code
- Threads
- Privacy and security

High-Level Approach
1. Guess "potentially interesting" behaviors
   - Compile-time instrumentation
2. Collect a sparse, fair subset of complete information
   - Generic sampling transformation
   - Feedback profile + outcome label
3. Find behavioral changes between good and bad runs
   - Statistical debugging

Instrumentation Framework

Douglas Adams, Mostly Harmless:
"The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong, it usually turns out to be impossible to get at or repair."

Bug Isolation Architecture
[Pipeline diagram: Program Source → Compiler (injects Predicates and Sampler) → Shipping Application → Counts & ✓/✗ → Statistical Debugging → Top bugs with likely causes]

Our Model of Behavior

Predicate Injection: Guessing What's Interesting
[Same pipeline diagram, with the compiler's predicate-injection stage highlighted]

Branch Predicates Are Interesting

    if (p)
        …
    else
        …

Branch Predicate Counts

    if (p)   // p was true (nonzero)
        …
    else     // p was false (zero)
        …

- Syntax yields instrumentation site
- Site yields predicates on program behavior
- Exactly one predicate true per visit to site
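As a concrete illustration (not the actual CBI runtime; the counter array name is hypothetical), the injected counting code at one branch site might look like this, before any sampling is layered on:

    /* Hypothetical instrumentation at one branch site: one counter per
       predicate, and exactly one counter bumped per visit. */
    unsigned long branch_site_42[2];   /* [0]: p was false, [1]: p was true */

    void example(int p)
    {
        branch_site_42[p ? 1 : 0]++;   /* record which predicate held */
        if (p) {
            /* … then-branch … */
        } else {
            /* … else-branch … */
        }
    }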

Returned Values Are Interesting

    n = fprintf(…);

- Did you know that fprintf() returns a value?
- Do you know what the return value means?
- Do you remember to check it?

Returned Value Predicate Counts

    n = fprintf(…);   // return value < 0? == 0? > 0?

- Syntax yields instrumentation site
- Site yields predicates on program behavior
- Exactly one predicate true per visit to site
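In the same spirit, a hypothetical counting sketch for one returned-value site, with one counter per sign of the result:

    /* Hypothetical instrumentation at one call site: a triple of counters,
       one per sign of the returned value; exactly one bumped per visit. */
    #include <stdio.h>

    unsigned long ret_site_7[3];   /* [0]: n < 0, [1]: n == 0, [2]: n > 0 */

    void example(FILE *f)
    {
        int n = fprintf(f, "hello\n");
        if (n < 0)       ret_site_7[0]++;
        else if (n == 0) ret_site_7[1]++;
        else             ret_site_7[2]++;
        /* … original code continues, whether or not it ever checks n … */
    }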

Pair Relationships Are Interesting

    int i, j, k;
    …
    i = …;

Pair Relationship Predicate Counts

    int i, j, k;
    …
    i = …;   // compare new value of i with…
             //   other vars: j, k, …
             //   old value of i
             //   "important" constants
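A hypothetical counting sketch for one pair-relationship site: after the assignment to i, compare its new value against each candidate and bump one of three counters per pair:

    /* Hypothetical instrumentation after "i = …": for each paired variable,
       a <, ==, > triple of counters; one bumped per pair per visit. */
    unsigned long cmp_i_j[3], cmp_i_k[3];   /* [0]: <, [1]: ==, [2]: > */

    static void compare3(unsigned long c[3], int a, int b)
    {
        if (a < b)       c[0]++;
        else if (a == b) c[1]++;
        else             c[2]++;
    }

    void example(int j, int k)
    {
        int i = j + k;            /* the assignment being instrumented */
        compare3(cmp_i_j, i, j);  /* new i vs. j */
        compare3(cmp_i_k, i, k);  /* new i vs. k */
    }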

Many Other Behaviors of Interest
- Assert statements
  - Perhaps automatically introduced, e.g. by CCured
- Unusual floating point values
  - Did you know there are nine kinds?
- Coverage of modules, functions, basic blocks, …
- Reference counts: negative, zero, positive, invalid
  - I use the GNOME desktop, but it terrifies me!
- Kinds of pointer: stack, heap, null, …
- Temporal relationships: x before/after y
- More ideas? Toss them all into the mix!

Summarization and Reporting
- Observation stream ⇒ observation count
  - How often is each predicate observed true?
  - Removes time dimension, for good or ill
- Bump exactly one counter per observation
  - Infer additional predicates (e.g. ≤, ≠, ≥) offline
- Feedback report is:
  1. Vector of predicate counters
  2. Success/failure outcome label
- Still quite a lot to measure
  - What about performance?
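Conceptually, then, each run's feedback report is just a flat counter vector plus an outcome bit. A hypothetical in-memory shape (sizes are illustrative, borrowed from the ccrypt study later in the talk):

    /* Hypothetical shape of one run's feedback report. */
    #define NUM_PREDICATES 1710   /* e.g. 3 counters x 570 call sites */

    struct feedback_report {
        unsigned long counts[NUM_PREDICATES]; /* times each predicate observed true */
        int failed;                           /* 0 = successful run, 1 = failed run */
    };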

Fair Sampling Transformation
[Same pipeline diagram, with the sampler stage highlighted]

Sampling the Bernoulli Way
- Decide to examine or ignore each site…
  - Randomly
  - Independently
  - Dynamically
- Cannot use clock interrupt: no context
- Cannot be periodic: unfair temporal aliasing
- Cannot toss coin at each site: too slow

Amortized Coin Tossing
- Randomized global countdown
  - Small countdown ⇒ upcoming sample
- Selected from geometric distribution
  - Inter-arrival time for biased coin toss
  - How many tails before next head?
  - Mean sampling rate is tunable parameter
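A minimal sketch of the amortized scheme (the macro and names are hypothetical; next_geometric() is sketched after the next slide): each site decrements a shared countdown and only samples when it expires:

    /* Hypothetical amortized coin tossing: one global countdown to the next
       sample; most sites merely decrement it and move on. */
    extern long countdown;                 /* distance to the next sample */
    extern long next_geometric(double d);  /* fresh countdown; sketched below */

    #define MAYBE_OBSERVE(site_stmt)                      \
        do {                                              \
            if (--countdown <= 0) {                       \
                site_stmt;            /* take the sample */ \
                countdown = next_geometric(100.0);        \
            }                                             \
        } while (0)

With mean 100, roughly one site visit in a hundred is observed, matching the 1/100 sampling rates quoted later in the talk.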

Geometric Distribution
- D = mean of distribution = expected sample density
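For completeness, a hypothetical inverse-CDF draw from that distribution: with mean D, the next countdown K satisfies P(K = k) = (1 − 1/D)^(k−1) · (1/D) for k = 1, 2, …

    #include <math.h>
    #include <stdlib.h>

    /* Hypothetical draw of the next-sample countdown from a geometric
       distribution with mean d (sampling rate 1/d). */
    long next_geometric(double d)
    {
        double u = (rand() + 1.0) / (RAND_MAX + 2.0);   /* uniform in (0,1) */
        return (long)floor(log(u) / log(1.0 - 1.0 / d)) + 1;
    }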

Weighing Acyclic Regions
- Break CFG into acyclic regions
- Each region has:
  - Finite number of paths
  - Finite max number of instrumentation sites
- Compute max weight in bottom-up pass

Weighing Acyclic Regions
- Clone acyclic regions
  - "Fast" variant
  - "Slow" variant
- Choose at run time (region head test: countdown > 4?)
- Retain decrements on fast path for now
  - Stay tuned…
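A sketch of the run-time choice at a region head, assuming this region's maximum instrumentation weight is 4 (matching the "> 4?" test above); the clone bodies are placeholders:

    /* Hypothetical cloned region: if the countdown cannot reach zero inside,
       run the fast clone; otherwise run the fully instrumented slow clone. */
    extern long countdown;
    extern void fast_clone(void), slow_clone(void);

    void region_head(void)
    {
        if (countdown > 4) {
            /* fast clone: no sample can fire, so each of the region's sites
               just decrements the countdown (checks elided entirely) */
            fast_clone();
        } else {
            /* slow clone: every site decrements, tests for zero, and
               samples when the countdown expires */
            slow_clone();
        }
    }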

Optimizations I
- Identify and ignore "weightless" functions
- Identify and ignore "weightless" cycles
- Cache global countdown in local variable
  - Global → local at function entry & after each call
  - Local → global at function exit & before each call
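A sketch of the countdown-caching optimization (hypothetical names), keeping the hot decrement in a register-friendly local:

    /* Hypothetical countdown caching within one instrumented function. */
    extern long countdown;            /* global next-sample countdown */
    extern void some_callee(void);

    void instrumented_function(void)
    {
        long local = countdown;       /* global -> local at entry */
        /* … sites decrement and test `local` instead of the global … */
        countdown = local;            /* local -> global before each call */
        some_callee();
        local = countdown;            /* global -> local after each call */
        /* … more sites … */
        countdown = local;            /* local -> global at exit */
    }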

Optimizations II
- Avoid cloning
  - Instrumentation-free prefix or suffix
  - Weightless or singleton regions
- Static branch prediction at region heads
- Partition sites among several binaries
- Many additional possibilities…

Path Balancing Optimization
- Decrements on fast path are a bummer
- Goal: batch them up
- But some paths are shorter than others
- Idea: add extra "ghost" instrumentation sites
  - Pad out shorter paths
  - All paths now equal

Path Balancing Optimization
- Fast path is faster
  - One bulk counter decrement on entry
  - Instrumentation sites have no code at all
- Slow path is slower
  - More decrements
  - Consume more randomness
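A sketch of the balanced fast clone, assuming every path through the region has been padded to a common weight W (names and the weight are illustrative):

    /* Hypothetical path-balanced fast clone: one bulk decrement replaces
       all per-site decrements, and the sites themselves emit no code. */
    extern long countdown;
    enum { W = 4 };   /* common (padded) weight of every path in the region */

    void balanced_region_fast(void)
    {
        countdown -= W;   /* single decrement covers the whole region */
        /* … region body with no instrumentation code at all … */
    }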

Variations on Next-Sample Countdown
- Fixed reset value
  - Biased, but useful for benchmarking
- Skip sampling transformation entirely
  - Observe every site every time
  - Used for controlled, in-house experiments
  - Can simulate arbitrary sampling rates offline
- Non-uniform sampling
  - Decrement countdown more than once
  - Multiple countdowns at different rates

What Does This Give Us?
- Absolutely certain of what we do see
  - Subset of dynamic behavior
  - Success/failure label for entire run
- Uncertain of what we don't see
- Given enough runs, samples ≈ reality
  - Common events seen most often
  - Rare events seen at proportionate rate

Statistical Debugging Basics

Penn Jillette:
"What is luck? Luck is probability taken personally. It is the excitement of bad math."

Playing the Numbers Game
[Same pipeline diagram, with the statistical debugging stage highlighted]

Find Causes of Bugs

Sharing the Cost of Assertions
- What to sample: assert() statements
- Look for assertions which sometimes fail on bad runs, but always succeed on good runs
- Overhead in assertion-dense CCured code:
  - Unconditional: 55% average, 181% max
  - 1/100 sampling: 17% average, 46% max
  - 1/1000 sampling: 10% average, 26% max
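A hypothetical sketch of a sampled assertion in the counting style used above: the condition is evaluated, and its outcome recorded, only when the countdown elects this site:

    /* Hypothetical sampled assertion: counts how often the condition held
       or failed, but only on sampled visits. */
    extern long countdown;
    extern long next_geometric(double d);
    extern unsigned long assert_counters[][2];  /* [id][0]: failed, [id][1]: held */

    #define SAMPLED_ASSERT(id, cond)                    \
        do {                                            \
            if (--countdown <= 0) {                     \
                assert_counters[id][(cond) ? 1 : 0]++;  \
                countdown = next_geometric(100.0);      \
            }                                           \
        } while (0)

At a 1/100 mean rate this is the regime behind the 55% → 17% overhead drop quoted above.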

Isolating a Deterministic Bug
- Hunt for crashing bug in ccrypt-1.2
- Sample function return values
  - Triple of counters per call site: < 0, == 0, > 0
- Use process of elimination
  - Look for predicates true on some bad runs, but never true on any good run

Winnowing Down the Culprits
- 1710 counters
  - 3 × 570 call sites
- 1569 zero on all runs
  - 141 remain
- 139 nonzero on at least one successful run
- Not much left!
  - file_exists() > 0
  - xreadline() == 0
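The elimination itself is simple to state in code; a hypothetical filter over the feedback-report shape sketched earlier keeps only predicates that are nonzero on some failed run and zero on every successful run:

    #include <stddef.h>

    struct feedback_report { unsigned long counts[1710]; int failed; };

    /* Returns 1 if predicate `pred` survives winnowing across `n` runs. */
    int survives(const struct feedback_report *runs, size_t n, size_t pred)
    {
        int seen_in_failure = 0;
        for (size_t i = 0; i < n; i++) {
            if (runs[i].counts[pred] > 0) {
                if (!runs[i].failed)
                    return 0;          /* true on a good run: eliminated */
                seen_in_failure = 1;
            }
        }
        return seen_in_failure;        /* zero on all runs is also eliminated */
    }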