Cooperative Bug Isolation CS 6340

2 Outline Something different today... Look at monitoring deployed code –Collecting information from actual user runs –Gives a true picture of reality Two themes –How do we get the data? –What do we do with it?

3 Build and Monitor

4 The Goal: Analyze Reality Where is the black box for software? –Historically, some efforts, but spotty –Now it’s really happening: crash reporting systems Actual runs are a vast resource –Number of real runs >> number of testing runs –And the real-world executions are most important This lecture: post-deployment bug hunting

5 Engineering Constraints Big systems –Millions of lines of code –Mix of controlled, uncontrolled code –Threads Remote monitoring –Limited disk & network bandwidth Incomplete information –Limit performance overhead –Privacy and security

6 The Approach 1. Guess “potentially interesting” behaviors –Compile-time instrumentation 2. Collect sparse, fair subset of these behaviors –Generic sampling transformation –Feedback profile + outcome label 3. Find behavioral changes in good/bad runs –Statistical debugging

7 Bug Isolation Architecture [Diagram: Program Source → Compiler + Sampler (seeded with Guesses) → Shipping Application → Profile & outcome label (✓/✗) → Statistical Debugging → Top bugs with likely causes]

8 Our Model of Behavior We assume any interesting behavior is expressible as a predicate P on program state at a particular program point. Observation of behavior = observing P

9 Branches Are Interesting if (p) … else …

10 Branch Predicate Counts ++branch_17[!!p]; if (p) … else … Predicates are folded down into counts C idiom: !!p ensures subscript is 0 or 1

11 Return Values Are Interesting n = fprintf(…);

12 Returned Value Predicate Counts n = fprintf(…); ++call_41[(n==0)+(n>=0)]; Track predicates: n < 0, n == 0, n > 0
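A minimal C sketch of this counting idiom, assuming the call_41 naming from the slide; the surrounding function and output stream are made up for illustration:

    #include <stdio.h>

    /* Hypothetical counter array for call site 41. The index expression
       (n==0)+(n>=0) maps the three predicates to distinct slots:
         n < 0  -> 0,   n > 0  -> 1,   n == 0 -> 2 */
    static unsigned call_41[3];

    static void log_message(FILE *out) {
        int n = fprintf(out, "hello\n");
        ++call_41[(n == 0) + (n >= 0)];
    }

    int main(void) {
        log_message(stdout);
        printf("n<0: %u  n>0: %u  n==0: %u\n",
               call_41[0], call_41[1], call_41[2]);
        return 0;
    }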

13 Scalar Relationships int i, j, k; … i = …; The relationship of i to other integer-valued variables in scope after the assignment is potentially interesting...

14 Pair Relationship Predicate Counts int i, j, k; … i = …; ++pair_6[(i==j)+(i>=j)]; ++pair_7[(i==k)+(i>=k)]; ++pair_8[(i==5)+(i>=5)]; Test i against all other constants & variables in scope. Is i < j, i == j, or i > j?

15 Summarization and Reporting Instrument the program with predicates –We have a variety of instrumentation schemes Feedback report is: –Vector of predicate counters –Success/failure outcome label No time dimension, for good or ill Still quite a lot to measure –What about performance? [Figure: feedback report as a vector of counters P1 P2 P3 P4 P5 …]
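As a rough sketch of the report's shape (the struct and field names are assumptions for illustration, not the actual report format):

    #include <stddef.h>

    /* Hypothetical in-memory form of one feedback report. */
    struct feedback_report {
        size_t    num_predicates;   /* e.g. 298,482 for BC */
        unsigned *true_counts;      /* times each predicate was observed true */
        unsigned *observed_counts;  /* times each predicate was observed at all */
        int       failed;           /* outcome label: 1 = failure, 0 = success */
    };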

16 Sampling Decide to examine or ignore each site… –Randomly –Independently –Dynamically Why? –Fairness –We need accurate picture of rare events

17 Problematic Approaches Sample every kth predicate –Violates independence Use clock interrupt –Not enough context –Not very portable Toss a coin at each instrumentation site –Too slow

18 Amortized Coin Tossing Observation –Samples are rare, say 1/100 –Amortize cost by predicting time until next sample Randomized global countdown –Small countdown ⇒ upcoming sample Selected from geometric distribution –Inter-arrival time for biased coin toss –How many tails before next head?
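A minimal sketch of the countdown mechanism, assuming a sampling density of 1/100; the generator and names are illustrative, not the deck's actual runtime:

    #include <stdlib.h>
    #include <math.h>

    /* Global countdown: instrumentation sites to skip before the next sample. */
    long countdown;

    /* Draw the gap until the next "head" of a coin with bias `density`,
       i.e. a geometrically distributed inter-arrival time. */
    static long next_countdown(double density) {
        double u = (rand() + 1.0) / (RAND_MAX + 2.0);   /* uniform in (0,1) */
        return (long)ceil(log(u) / log(1.0 - density));
    }

    /* Called at each instrumentation site: usually just a decrement,
       occasionally the (more expensive) sample. */
    void maybe_sample(void (*sample)(void)) {
        if (--countdown <= 0) {
            sample();
            countdown = next_countdown(0.01);           /* density = 1/100 */
        }
    }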

19 Geometric Distribution 1 / D = expected sample density

20 Weighing Acyclic Regions Acyclic code region has: –a finite number of paths –a finite maximum number of instrumentation sites executed

21 Weighing Acyclic Regions Clone acyclic regions –“Fast” variant –“Slow” variant Choose at run time based on countdown to next sample [Diagram: at region entry, test countdown > 4, the region's maximum instrumentation weight, to pick the fast or slow clone]
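A hand-written sketch of what the transformation produces for one tiny region (the real rewriting is automatic; countdown and maybe_sample reuse the earlier sketch's names, and the slide's diagram uses a region of weight 4, hence its "> 4?" test). This toy region executes exactly 2 sites on every path, so its fast clone only needs the countdown to exceed 2:

    /* countdown and maybe_sample() as in the earlier sketch. */
    extern long countdown;
    void maybe_sample(void (*sample)(void));
    void record_branch_taken(void);     /* hypothetical sampling actions */
    void record_return_value(void);

    void instrumented_region(int p) {
        if (countdown > 2) {
            /* Fast clone: no sample can come due inside this region,
               so skip the per-site checks and charge the sites in bulk. */
            if (p) { /* ... then-side work ... */ } else { /* ... else-side work ... */ }
            countdown -= 2;
        } else {
            /* Slow clone: check the countdown at every instrumentation site. */
            maybe_sample(record_branch_taken);
            if (p) { /* ... then-side work ... */ } else { /* ... else-side work ... */ }
            maybe_sample(record_return_value);
        }
    }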

22 Summary: Feedback Reports Subset of dynamic behavior –Counts of true/false predicate observations –Sampling gives low overhead Often unmeasurable at 1/100 –Success/failure label for entire run Certain of what we did observe –But may miss some events Given enough runs, samples ≈ reality –Common events seen most often –Rare events seen at proportionate rate

23 Bug Isolation Architecture [Diagram: Program Source → Compiler + Sampler (seeded with Guesses) → Shipping Application → Profile & outcome label (✓/✗) → Statistical Debugging → Top bugs with likely causes]

24 Find Causes of Bugs We gather information about many predicates –298,482 for BC Most of these are not predictive of anything How do we find the useful predicates?

25 Finding Causes of Bugs How likely is failure when P is observed true? F(P) = # failing runs where P observed true S(P) = # successful runs where P observed true Failure(P) = F(P) / (F(P) + S(P))

26 Not Enough... Failure( f == NULL ) = 1.0 Failure( x == 0 )= 1.0 if (f == NULL) { x = 0; *f; } Predicate x == 0 is an innocent bystander –Program is already doomed

27 Context What is the background chance of failure, regardless of P’s value? F(P observed) = # failing runs observing P S(P observed) = # successful runs observing P Context(P) = F(P observed) / (F(P observed) + S(P observed))

28 A Useful Measure Does the predicate being true increase the chance of failure over the background rate? Increase(P) = Failure(P) – Context(P) A form of likelihood ratio testing...
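A minimal sketch of the three scores computed from per-predicate counts (the struct and field names are assumptions, and zero-count guards are omitted):

    /* Per-predicate counts aggregated over all runs. */
    struct pred_counts {
        unsigned f_true;   /* F(P): failing runs where P was observed true    */
        unsigned s_true;   /* S(P): successful runs where P was observed true */
        unsigned f_obs;    /* failing runs where P was observed at all        */
        unsigned s_obs;    /* successful runs where P was observed at all     */
    };

    double failure_score(const struct pred_counts *c) {
        return (double)c->f_true / (c->f_true + c->s_true);
    }

    double context_score(const struct pred_counts *c) {
        return (double)c->f_obs / (c->f_obs + c->s_obs);
    }

    double increase_score(const struct pred_counts *c) {
        return failure_score(c) - context_score(c);
    }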

29 if (f == NULL) { x = 0; *f; } Increase() Works... Increase( f == NULL ) = 1.0 Increase( x == 0 )= 0.0

30 A First Algorithm 1. Discard predicates having Increase(P) ≤ 0 –E.g. dead, invariant, bystander predicates –Exact value is sensitive to small F(P) –Use lower bound of 95% confidence interval 2. Sort remaining predicates by Increase(P) –Again, use 95% lower bound –Likely causes with determinacy metrics

31 Isolating a Single Bug in BC void more_arrays () { … /* Copy the old arrays. */ for (indx = 1; indx < old_count; indx++) arrays[indx] = old_ary[indx]; /* Initialize the new elements. */ for (; indx < v_count; indx++) arrays[indx] = NULL; … } Top-ranked predicates: #1: indx > scale #2: indx > use_math #3: indx > opterr #4: indx > next_func #5: indx > i_base

32 It Works! Well... at least for a program with 1 bug But –Need to deal with multiple, unknown bugs –Redundancy in the predicate list is a major problem

33 Using the Information Multiple predicate metrics are useful –Increase(P), Failure(P), F(P), S(P) [Example annotations from a report: F(P) + S(P) = 349, Context(P) = .25, Increase(P) = .68 ± .05]

34 The Bug Thermometer [Thermometer figure: bar length = log(|runs where P observed true|); bands show Context, Increase, Confidence, and Successful Runs]

35 Sample Report

36 Multiple Bugs: The Goal Isolate the best predictor for each bug, with no prior knowledge of the number of bugs.

37 Multiple Bugs: Some Issues A bug may have many redundant predictors –Only need one –But would like to know correlated predictors Bugs occur on vastly different scales –Predictors for common bugs may dominate, hiding predictors of less common problems

38 An Idea Simulate the way humans fix bugs Find the first (most important) bug Fix it, and repeat

39 An Algorithm Repeat the following: 1. Compute Increase(), Context(), etc. for all preds. 2. Rank the predicates 3. Add the top-ranked predicate P to the result list 4. Remove P & discard all runs where P is true –Simulates fixing the bug corresponding to P –Discard reduces rank of correlated predicates
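A sketch of that loop over an in-memory set of runs (the types, names, and pluggable score function are assumptions for illustration; "observed" is simplified to "observed true"):

    /* One run: per-predicate "observed true" flags, outcome label, and a
       discarded bit set once a chosen predicate "fixes" it. */
    typedef struct {
        const unsigned char *pred_true;   /* one flag per predicate */
        int failed;                       /* 1 = failing run, 0 = successful */
        int discarded;
    } run_t;

    /* Pluggable ranking metric: f_true and s_true are F(P) and S(P)
       counted over the runs that have not been discarded yet. */
    typedef double (*score_fn)(unsigned f_true, unsigned s_true,
                               unsigned total_failing);

    static int top_predicate(const run_t *runs, int n_runs, int n_preds,
                             score_fn score) {
        unsigned total_failing = 0;
        for (int i = 0; i < n_runs; i++)
            if (!runs[i].discarded && runs[i].failed)
                total_failing++;
        int best = -1;
        double best_score = 0.0;
        for (int p = 0; p < n_preds; p++) {
            unsigned f = 0, s = 0;
            for (int i = 0; i < n_runs; i++) {
                if (runs[i].discarded || !runs[i].pred_true[p]) continue;
                if (runs[i].failed) f++; else s++;
            }
            double sc = score(f, s, total_failing);
            if (sc > best_score) { best_score = sc; best = p; }
        }
        return best;                           /* -1 if nothing scores above zero */
    }

    void isolate_bugs(run_t *runs, int n_runs, int n_preds,
                      score_fn score, int *result, int *n_result) {
        *n_result = 0;
        for (;;) {
            int p = top_predicate(runs, n_runs, n_preds, score);
            if (p < 0) break;                  /* nothing predictive left */
            result[(*n_result)++] = p;
            for (int i = 0; i < n_runs; i++)   /* simulate fixing the bug */
                if (!runs[i].discarded && runs[i].pred_true[p])
                    runs[i].discarded = 1;
        }
    }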

40 Bad Idea #1: Ranking by Increase(P) High Increase() but very few failing runs! These are all sub-bug predictors: they cover a special case of a more general problem.

41 Bad Idea #2: Ranking by Fail(P) Many failing runs but low Increase()! Tend to be super-bug predictors: predicates that cover several different bugs rather poorly.

42 A Helpful Analogy In the language of information retrieval –Increase(P) has high precision, low recall –Fail(P) has high recall, low precision Standard solution: –Take the harmonic mean of both –Rewards high scores in both dimensions
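A hedged sketch of such a combined score (the deck does not spell out a formula; log-scaling the failure count into a recall-style term is an assumption modeled on the published work):

    #include <math.h>

    /* Harmonic mean of Increase(P) and a recall-style term measuring how
       much of the failing-run population P covers. Returns 0 for inputs
       where either term would be zero or undefined. */
    double importance(double increase, unsigned f_true, unsigned total_failing) {
        if (increase <= 0.0 || f_true < 2 || total_failing < 2)
            return 0.0;
        double recall = log((double)f_true) / log((double)total_failing);
        return 2.0 / (1.0 / increase + 1.0 / recall);
    }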

43 Sorting by the Harmonic Mean It works!

44 Experimental Results: Exif Three predicates selected from 156,476 Each predicate predicts a distinct crashing bug Found the bugs quickly using these predicates

45 Experimental Results: Rhythmbox 15 predicates from 857,384 Also isolated crashing bugs...

46 Public Deployment

47 Lessons Learned A lot can be learned from actual executions –Users are executing them anyway –We should capture some of that information Crash reporting is a step in the right direction –But doesn’t characterize successful runs –Stack is useful for only about 50% of bugs