Published by Belinda Kelly. Modified over 9 years ago.
WuKong: Automatically Detecting and Localizing Bugs that Manifest at Large System Scales
Bowen Zhou, Jonathan Too, Milind Kulkarni, Saurabh Bagchi
Purdue University
Ever-Changing Behavior of Software
Software has to adapt to different platforms, inputs, and configurations. As a side effect, whether a bug manifests may depend on a particular platform, input, or configuration.
Software Development Process
– Develop a new feature and its unit tests
– Test the new feature on a local machine
– Push the feature into production systems
– Break production systems
– Roll back the feature
Not tested in production systems!
Bugs in Production Runs
Properties
– Remain unnoticed when the application is tested on a developer's workstation
– Break the production system when the application runs on a cluster and/or serves real user requests
Examples
– Configuration errors
– Integer overflow
– Scale-dependent bugs
Modeling Program Behavior for Finding Bugs
Known as statistical debugging [Bronevetsky DSN '10] [Mirgorodskiy SC '06] [Chilimbi ICSE '09] [Liblit PLDI '03]
– Represents program behavior as a set of features that can be measured at runtime
– Builds a model that describes and predicts the features from data collected over many runs
– Flags as abnormal any feature that deviates from the model's prediction by more than a threshold
Limitation: does not account for scale-induced variation in program behavior
Modeling Scale-dependent Behavior
[Figure: number of times a loop executes vs. run number, for training runs and production runs]
Is there a bug in one of the production runs?
Modeling Scale-dependent Behavior
[Figure: number of times the loop executes vs. scale, for training runs and production runs]
Accounting for scale makes the trends clear and errors at large scales obvious.
Modeling Scale-dependent Behavior
Our previous efforts
– Vrisha [HPDC '11]: builds a collective model over all features of a program to detect bugs in any feature
– Abhranta [HotDep '12]: tweaks Vrisha's model to allow per-feature bug detection and localization
They have limitations...
Modeling Scale-dependent Behavior
– Big gap in scale: e.g., training runs on up to 128 nodes, production runs on 1024 nodes
– Noisy features: too many false positives render the model useless
Reconstructing Scale-dependent Behavior: the WuKong way
– Covers a wide range of program features
– Predicts the expected value of each feature in a large-scale run separately
– Prunes unpredictable features to improve localization quality
– Provides a shortlist of suspicious features in its localization roadmap
The Workflow
[Figure: training runs 1 to N each execute the application (APP) under Pin and record (scale, feature) data; these records train a scale-to-feature model, whose prediction at a production run's scale is compared with the feature value observed in production (= ?)]
Feature Collection
Features considered by WuKong (each numbered conditional or loop is a feature)

    void foo(int a) {
        if (a > 0) {            /* feature 1 */
        } else {
        }
        if (a > 100) {          /* feature 2 */
            int i = 0;
            while (i < a) {     /* feature 3 */
                if (i % 2 == 0) {   /* feature 4 */
                }
                ++i;
            }
        }
    }
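A minimal sketch of what collecting these features amounts to, assuming a feature's value is the number of times the body of each numbered conditional or loop is entered during a run. WuKong itself gathers features with Pin (per the workflow slide) rather than by editing source; `feature_count`, `reset_features`, and `feature_value` are hypothetical and only show the counts one run would record.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-run counters, one per numbered feature (index 0 unused). */
#define NUM_FEATURES 5
static unsigned long feature_count[NUM_FEATURES];

void reset_features(void) {
    for (size_t f = 0; f < NUM_FEATURES; ++f)
        feature_count[f] = 0;
}

/* Read one feature's count, with a bounds check on the feature id. */
unsigned long feature_value(int f) {
    assert(f >= 1 && f < NUM_FEATURES);
    return feature_count[f];
}

/* foo from the slide, bumping a counter each time a feature's body is entered. */
void foo(int a) {
    if (a > 0) {                /* feature 1 */
        ++feature_count[1];
    } else {
    }
    if (a > 100) {              /* feature 2 */
        ++feature_count[2];
        int i = 0;
        while (i < a) {         /* feature 3: one count per iteration */
            ++feature_count[3];
            if (i % 2 == 0) {   /* feature 4 */
                ++feature_count[4];
            }
            ++i;
        }
    }
}
```

Calling `foo(200)` records 1, 1, 200, and 100 for features 1 to 4; features 3 and 4 grow with the input, which is the kind of scale-dependent count the model has to predict.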
Modeling
Predict Feature from Scale
– $X$: vector of scale parameters $X_1, \dots, X_N$
– $Y$: number of times a particular feature occurs
The model to predict Y from X:
Compute the prediction error:
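The model and error formulas are not spelled out in this transcript, so the sketch below does not reproduce them. It assumes the simplest case: a single scale parameter X (e.g., process count) and an ordinary least-squares line Y ≈ b0 + b1·X fit over the training runs, then extrapolated to the production scale. The relative error |predicted − observed| / observed is also an assumption, chosen because the pruning step speaks of errors above 100%. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-parameter model: Y is approximated by b0 + b1 * X. */
typedef struct {
    double b0; /* intercept */
    double b1; /* slope: change in feature count per unit of scale */
} scale_model;

/* Ordinary least squares over n training runs; scales must not all be equal. */
scale_model fit_scale_model(const double *scale, const double *count, size_t n) {
    assert(n >= 2);
    double mx = 0.0, my = 0.0;
    for (size_t k = 0; k < n; ++k) {
        mx += scale[k];
        my += count[k];
    }
    mx /= (double)n;
    my /= (double)n;
    double sxy = 0.0, sxx = 0.0;
    for (size_t k = 0; k < n; ++k) {
        sxy += (scale[k] - mx) * (count[k] - my);
        sxx += (scale[k] - mx) * (scale[k] - mx);
    }
    assert(sxx > 0.0);
    scale_model m = { my - (sxy / sxx) * mx, sxy / sxx };
    return m;
}

/* Extrapolate the expected feature count at a (possibly much larger) scale. */
double predict_feature(scale_model m, double scale) {
    return m.b0 + m.b1 * scale;
}

/* Assumed relative error |predicted - observed| / observed (observed > 0). */
double relative_error(double predicted, double observed) {
    assert(observed > 0.0);
    double d = predicted - observed;
    return (d < 0.0 ? -d : d) / observed;
}
```

For example, fitting runs at 8 to 128 processes and calling `predict_feature(m, 1024)` extrapolates to the production scale used in the evaluation; a large `relative_error` between that prediction and the observed count is what marks the feature as suspicious. A real deployment with several scale parameters would need a multi-variable fit instead of this single line.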
Bug Localization
Locate Buggy Features
First, we need to know whether the production run is buggy, which we detect as follows: the run is flagged as buggy if, for some feature $i$,
$$E_i > \eta \cdot M_i$$
where $E_i$ is the error of feature $i$ in the production run, $\eta$ is a constant parameter, and $M_i$ is the maximum error of feature $i$ over all training runs.
If the run is buggy, we examine the prediction error of each feature:
– Rank all features by their prediction error to provide a localization roadmap containing the top N features
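A sketch of the two steps on this slide, assuming the per-feature prediction errors of the production run are already computed and the maximum error of each feature over the training runs is known. The names `run_is_buggy` and `localization_roadmap` are hypothetical, and `eta` stands for the constant parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A feature index paired with its prediction error in the production run. */
typedef struct {
    size_t feature;
    double error;
} feature_error;

/* Detection: the run is buggy if some feature's production error exceeds
   eta times that feature's maximum error over the training runs. */
int run_is_buggy(const double *prod_err, const double *max_train_err,
                 size_t n_features, double eta) {
    assert(eta > 0.0);
    for (size_t i = 0; i < n_features; ++i)
        if (prod_err[i] > eta * max_train_err[i])
            return 1;
    return 0;
}

/* qsort comparator: larger errors sort first. */
static int by_error_desc(const void *a, const void *b) {
    double ea = ((const feature_error *)a)->error;
    double eb = ((const feature_error *)b)->error;
    return (ea < eb) - (ea > eb);
}

/* Localization roadmap: writes the indices of the top_n features with the
   largest production error into out (largest first); returns how many. */
size_t localization_roadmap(const double *prod_err, size_t n_features,
                            size_t top_n, size_t *out) {
    if (n_features == 0)
        return 0;
    feature_error *ranked = malloc(n_features * sizeof *ranked);
    if (ranked == NULL)
        return 0;
    for (size_t i = 0; i < n_features; ++i) {
        ranked[i].feature = i;
        ranked[i].error = prod_err[i];
    }
    qsort(ranked, n_features, sizeof *ranked, by_error_desc);
    size_t k = top_n < n_features ? top_n : n_features;
    for (size_t r = 0; r < k; ++r)
        out[r] = ranked[r].feature;
    free(ranked);
    return k;
}
```

Ranking by raw production error, as the slide describes, keeps the roadmap simple; a developer then inspects the listed features from the top down.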
Improve Localization Quality by Feature Pruning
Noisy Feature Pruning
Some features cannot be predicted effectively by the above model:
– Random
– Not determined by scale
– Discontinuous
The trade-off
– Keeping these features would pollute the diagnosis by pushing real faults down the list
– Removing these features could miss some faults if a fault happens to lie in such a feature
Noisy Feature Pruning
How do we remove them? For each feature:
1. Perform cross-validation over the training runs
2. Remove the feature if it produces a prediction error greater than 100% in more than (100 - x)% of the training runs
The parameter x > 0 tolerates outliers among the training runs.
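A sketch of the pruning rule, assuming leave-one-out cross-validation (the slide does not say which kind) and, as a stand-in model, a least-squares line of feature count against a single scale parameter. Each training run is predicted from a model fit on the other runs; the feature is pruned by the threshold on the slide. `loo_relative_errors` and `should_prune` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Leave-one-out cross-validation with a stand-in line count ~ b0 + b1 * scale:
   for each training run j, fit on the other runs, predict run j, and store
   the relative error |predicted - observed| / observed in err[j]. */
void loo_relative_errors(const double *scale, const double *count, size_t n,
                         double *err) {
    assert(n >= 3);
    for (size_t j = 0; j < n; ++j) {
        double mx = 0.0, my = 0.0;
        for (size_t k = 0; k < n; ++k)
            if (k != j) { mx += scale[k]; my += count[k]; }
        mx /= (double)(n - 1);
        my /= (double)(n - 1);
        double sxy = 0.0, sxx = 0.0;
        for (size_t k = 0; k < n; ++k)
            if (k != j) {
                sxy += (scale[k] - mx) * (count[k] - my);
                sxx += (scale[k] - mx) * (scale[k] - mx);
            }
        assert(sxx > 0.0);      /* remaining runs must span more than one scale */
        assert(count[j] > 0.0); /* relative error needs a nonzero observation */
        double b1 = sxy / sxx;
        double predicted = (my - b1 * mx) + b1 * scale[j];
        double d = predicted - count[j];
        err[j] = (d < 0.0 ? -d : d) / count[j];
    }
}

/* Rule from the slide: prune if the error exceeds 100% (i.e. 1.0)
   in more than (100 - x)% of the training runs. */
int should_prune(const double *err, size_t n, double x_percent) {
    assert(n > 0 && x_percent > 0.0 && x_percent <= 100.0);
    size_t bad = 0;
    for (size_t j = 0; j < n; ++j)
        if (err[j] > 1.0)
            ++bad;
    return 100.0 * (double)bad / (double)n > 100.0 - x_percent;
}
```

A feature whose count follows the scale closely yields near-zero cross-validation errors and is kept; one with errors above 100% in, say, three of four runs is pruned when x = 30 (75% > 70%) but kept when x = 10 (75% is not above 90%).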
Evaluation
Fault injection in Sequoia AMG2006
– Up to 1024 processes
– Randomly selected conditionals are flipped
Two case studies
– Integer overflow in an MPI library
– Deadlock in a P2P file-sharing application
Fault Injection Study
Fault
– Injected at process 0
– A randomly picked feature is flipped
Data
– Training (without fault): 110 runs, 8-128 processes
– Production (with fault): 100 runs, 1024 processes
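A sketch of how "flip a randomly picked feature at process 0" could look at the source level. The slide does not describe the injection mechanism; `configure_injection`, `pick_random_feature`, and `maybe_flip` are hypothetical, and `rand()` is only a stand-in random source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical injection settings: which process is faulty and which feature flips. */
static int faulty_rank = 0;      /* the study injects at process 0 */
static int flipped_feature = -1; /* -1 means no fault injected */

void configure_injection(int rank_to_fault, int feature_id) {
    faulty_rank = rank_to_fault;
    flipped_feature = feature_id;
}

/* Pick a feature id from 1..num_features (rand() % n is slightly biased). */
int pick_random_feature(int num_features) {
    assert(num_features > 0);
    return 1 + rand() % num_features;
}

/* Wrap a conditional: on the faulty process, the chosen feature's outcome is inverted. */
int maybe_flip(int my_rank, int feature_id, int cond) {
    if (my_rank == faulty_rank && feature_id == flipped_feature)
        return !cond;
    return cond != 0;
}
```

Feature 2 of `foo` would then read `if (maybe_flip(rank, 2, a > 100))`, where in an MPI program `rank` would come from `MPI_Comm_rank`; every other process and every other feature behaves normally.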
Fault Injection Study
Result
– Total: 100
– Non-crashing: 57
– Detected: 53
– Located: 49
Successfully localized: 92.5% (49 of the 53 detected runs)
Case Study: A Deadlock in Transmission's DHT Implementation
Features 53 and 66
Conclusion
– Debugging scale-dependent program behavior is a difficult and important problem
– WuKong incorporates the scale of a run into a predictive model for each individual program feature, enabling accurate bug diagnosis
– We demonstrated WuKong's effectiveness through a large-scale fault injection study and two case studies of real bugs
Q&A
bzhou@purdue.edu
Backup
Runtime Overhead
Geometric mean: 11.4%
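The slide gives only the geometric mean, not the benchmarks or how it was computed. Assuming it is taken over per-benchmark slowdown ratios, with native time $T_k$ and instrumented time $T'_k$ for each of $n$ benchmarks, the reported overhead would be:

```latex
\bar{o} = \left( \prod_{k=1}^{n} \frac{T'_k}{T_k} \right)^{1/n} - 1
```

Under that reading, $\bar{o} = 0.114$ corresponds to a typical slowdown factor of 1.114.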