1
02/13/2007
Indexing Noncrashing Failures: A Dynamic Program Slicing-Based Approach
Chao Liu, Xiangyu Zhang, Jiawei Han, Yu Zhang, Bharat K. Bhargava
University of Illinois at Urbana-Champaign, Purdue University
Supported by NSF 0242840, 0219110
2
Overview
- Problem: automatically cluster program failures that are due to the same bug.
- Solution: compare the dynamic slices of program failures and group failures whose slices are similar.
3
Outline
- Motivation
- Failure Indexing in Formulation
- Dynamic Slicing-Based Failure Indexing
- Experiments
- Conclusion
4
Automated Failure Reporting
- End users as beta testers
  - Valuable information about failure occurrences in reality
  - 24.5 million/day in Redmond (if all users send), according to John Dvorak, PC Magazine
- Widely adopted because of its usefulness
  - Microsoft Windows, Linux Gentoo, Mozilla applications, …
  - Any application can implement this functionality
5
Failure Report
- Automatic reports (Windows, Mozilla)
  - Application name and version (e.g., winword.exe)
  - Module name and version (e.g., mso.dll)
  - Offset into module (e.g., 00003cbb)
  - Calling context
- Manual reports (Bugzilla)
  - Textual description of the symptoms
  - Failure-inducing input
6
After Failures Are Collected …
- Failure triage
- Failure prioritization
  - What are the most severe bugs?
  - Worst 1% of bugs = 50% of failures
- Duplicate failure removal
  - The same failure can be reported multiple times
- Patch suggestion
  - Automatically locate the patch by querying the patch database with the reported failure
7
A Solution: Failure Indexing
- Cluster failure reports that may correspond to the same fault.
[Figure: failure reports plotted as points on X-Y axes and grouped into clusters labeled Most Severe, Less Severe, and Least Severe]
8
Current Status of Failure Indexing
- Great success in indexing crashing failures
  - The same crashing venue likely implies the same failure
  - E.g., Microsoft Dr. Watson System, Mozilla Quality Feedback Agent, …
- Elusive: how to index noncrashing failures
  - Noncrashing failures are mainly due to semantic bugs
  - Hard to index because crashing contexts are not available
9
Noncrashing Failures
- Examples
  - Unwanted dialogs
  - Undesired visual outputs, e.g., colors, layouts
  - Periodic loss of focus
  - Periodic loss of connection
  - Abnormal memory consumption
  - Abnormal performance
- Caused by semantic bugs
10
Semantic Bugs Dominate
- Semantic bugs
  - Application specific
  - Only a few are detectable
  - Mostly require annotations or specifications
- Memory-related bugs
  - Many are detectable
- Others
  - Concurrency bugs
- Bug distribution [Li et al., ICSE’07]
  - 264 bugs in Mozilla and 98 bugs in Apache manually checked
  - 29,000 bugs in Bugzilla automatically checked
- Courtesy of Zhenmin Li
11
Existing Approaches to Indexing Noncrashing Failures
- T-Proximity [Podgurski et al., ICSE 2003]
  - Failures exhibiting similar behaviors (e.g., similar branchings) are indexed together
  - The entire execution is considered
- R-Proximity [Liu and Han, FSE 2006]
  - Failures likely due to the same bug are indexed together
  - The bug location for each failure is found automatically by the statistical debugging tool SOBER [Liu et al., FSE 2005]
12
Comments on Existing Approaches
- Ideal solution (possible through manual effort)
  - Index by root cause (i.e., the exact fault location)
  - But finding the root cause of every failure is exactly what failure indexing wants to circumvent
- T-Proximity
  - Indexes based on the entire execution
  - But usually only a small part of an execution is failure-relevant
- R-Proximity
  - Indexes by likely fault location, which is failure-relevant
  - Better quality than T-Proximity, but requires a set of passing executions to find the likely fault location
- Theme of this paper
  - Can we index noncrashing failures as effectively as R-Proximity without any successful executions?
14
Failure Indexing in Formulation
- A failure indexing technique is a function pair
  - Signature function: represents a failing execution in a certain way
  - Distance function: calculates the dissimilarity between two failure signatures
- Indexing result
  - A proximity matrix whose (i, j) cell is the dissimilarity between failures i and j, i.e., the distance between their signatures
  - Failures i and j are indexed together if that dissimilarity is small
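The decomposition into a signature function and a distance function can be sketched in a few lines of Python. This is only an illustration: the name `proximity_matrix`, the representation of a failing execution as a list of statement numbers, and the toy signature and distance functions are all assumptions, not the paper's definitions.

```python
from typing import Callable, List, Sequence, TypeVar

E = TypeVar("E")  # a failing execution (hypothetical representation)
S = TypeVar("S")  # a failure signature


def proximity_matrix(
    failures: Sequence[E],
    signature: Callable[[E], S],
    distance: Callable[[S, S], float],
) -> List[List[float]]:
    """Return the matrix whose (i, j) cell is the dissimilarity
    between the signatures of failures i and j."""
    signatures = [signature(f) for f in failures]
    return [[distance(a, b) for b in signatures] for a in signatures]


# Toy instantiation (an assumption for illustration only):
# an execution is a list of statement numbers, the signature is the
# set of those statements, and the distance is the size of the
# symmetric difference between two sets.
failures = [[10, 30, 35, 40], [10, 30, 35, 40, 50], [20, 37, 60]]
matrix = proximity_matrix(failures, frozenset, lambda a, b: len(a ^ b))
```

Swapping in a different `signature` or `distance` yields a different indexing technique without touching the rest of the pipeline, which is the point of the function-pair formulation.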
15
Metrics for Indexing Effectiveness
- No quantitative metric for indexing effectiveness exists
- Indexing effectiveness
  - Cohesion: to what extent failures due to the same bug are close to each other
  - Separation: to what extent failures due to different bugs are separated from each other
- Silhouette coefficient
  - A measure adapted from data mining
  - Its value ranges from -1 to 1; the higher the better
  - More details in the paper (Section 2.2)
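As a rough illustration of how cohesion and separation combine into one number, the sketch below computes the standard silhouette coefficient from data mining over a proximity matrix and ground-truth bug labels. The paper adapts the measure (Section 2.2), so the adapted version may differ in detail; the function name and the convention of scoring singleton groups as 0 are assumptions.

```python
from statistics import mean
from typing import Hashable, List, Sequence


def silhouette(matrix: Sequence[Sequence[float]],
               labels: Sequence[Hashable]) -> float:
    """Mean silhouette value over all failures.

    matrix[i][j] is the dissimilarity between failures i and j;
    labels[i] is the ground-truth bug of failure i.
    Requires at least two distinct labels.
    """
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    if len(groups) < 2:
        raise ValueError("need failures from at least two bugs")

    scores: List[float] = []
    for i, label in enumerate(labels):
        same = [j for j in groups[label] if j != i]
        if not same:
            scores.append(0.0)  # assumed convention for singleton groups
            continue
        # Cohesion: mean distance to failures of the same bug.
        a = mean(matrix[i][j] for j in same)
        # Separation: mean distance to the nearest other bug's failures.
        b = min(mean(matrix[i][j] for j in members)
                for other, members in groups.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Two tight, well-separated groups score close to 1;
# mislabeling them drives the score below 0.
toy = [[0, 1, 9, 9], [1, 0, 9, 9], [9, 9, 0, 1], [9, 9, 1, 0]]
good = silhouette(toy, ["bug1", "bug1", "bug2", "bug2"])
bad = silhouette(toy, ["bug1", "bug2", "bug1", "bug2"])
```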
17
Dynamic Slicing-Based Failure Indexing
- Dynamic slicing as the failure signature function
18
Dynamic Slicing
- The full dynamic slice (FS) is the set of statements that DID affect the value of a variable at a program point for ONE specific execution [Korel and Laski, 1988].
    …
    10. A =
    …
    20. B =
    …
    30. P =
    31. If (P<0) {
    …
    35.   A = A + 1
    36. }
    37. B = B + 1
    …
    40. Error(A)
- FS(A@40) = {10, 30, 35, 40}
19
Data Slicing
- The full dynamic slice (FS) is the set of statements that DID affect the value of a variable at a program point for ONE specific execution [Korel and Laski, 1988].
- Data slice (DS): only data dependence is considered.
    …
    10. A =
    …
    20. B =
    …
    30. P =
    31. If (P<0) {
    …
    35.   A = A + 1
    36. }
    37. B = B + 1
    …
    40. Error(A)
- DS(A@40) = {10, 35, 40}
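The difference between the two slices can be reproduced with a small backward traversal over dependence edges. The sketch below is a simplification under stated assumptions: the edge tables are written by hand for the run in which P < 0 holds, each statement executes once (real dynamic slicers track statement instances), and the function name `dynamic_slice` is hypothetical.

```python
from typing import Dict, Optional, Set

# Hand-written dependence edges for the slide's example, assuming the
# branch at line 31 is taken. Keys depend on the statements in the value.
DATA_DEPS: Dict[int, Set[int]] = {
    40: {35},  # Error(A) reads the A written at line 35
    35: {10},  # A = A + 1 reads the A written at line 10
    31: {30},  # the predicate reads the P written at line 30
    37: {20},  # B = B + 1 reads the B written at line 20
}
CONTROL_DEPS: Dict[int, Set[int]] = {
    35: {31},  # line 35 runs only because the predicate at 31 held
}


def dynamic_slice(criterion: int,
                  data_deps: Dict[int, Set[int]],
                  control_deps: Optional[Dict[int, Set[int]]] = None) -> Set[int]:
    """Backward closure from the criterion over the given edges.
    Omitting control_deps yields the data slice."""
    control_deps = control_deps or {}
    seen: Set[int] = set()
    stack = [criterion]
    while stack:
        stmt = stack.pop()
        if stmt in seen:
            continue
        seen.add(stmt)
        stack.extend(data_deps.get(stmt, ()))
        stack.extend(control_deps.get(stmt, ()))
    return seen


full_slice = dynamic_slice(40, DATA_DEPS, CONTROL_DEPS)
data_slice = dynamic_slice(40, DATA_DEPS)
```

With these edges the data slice matches the slide's DS(A@40). The full-slice closure reaches line 30 only through the predicate at line 31, so it also contains 31, which the slide's listed FS set does not show. Line 37 appears in neither slice because B never flows into A.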
20
Distance between Dynamic Slices
- For any two non-empty dynamic slices of the same program, the distance between them is
[Equation: not recoverable from the extracted slide text]
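The slide's formula is not preserved in this transcript, so the following is only one plausible instantiation, not the authors' definition: a Jaccard-style set distance, which is 0 for identical slices and 1 for slices that share no statement. The function name `slice_distance` is hypothetical.

```python
from typing import Set


def slice_distance(s1: Set[int], s2: Set[int]) -> float:
    """Jaccard distance between two slices (an assumed choice;
    the original formula is missing from the source)."""
    if not s1 or not s2:
        raise ValueError("both slices must be non-empty")
    return 1.0 - len(s1 & s2) / len(s1 | s2)


# The two slices from the A@40 example share 3 of 4 distinct statements.
d = slice_distance({10, 30, 35, 40}, {10, 35, 40})
```

Any set distance with the same range would slot into the signature-and-distance framework in the same way.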
22
Experiment Result
- Experiment setup
  - Benchmark (gzip 1.2.3) obtained from the Software-artifact Infrastructure Repository (SIR, from Nebraska-Lincoln), together with a test suite
  - 6,184 lines of C code
- Ground-truth determination
[Figure: diagram with labels group 1, group 2, and group 1 & 2]
23
Two Semantic Bugs in Gzip-1.2.3
- Ground truth
  - 217 input test cases (executions) in total
  - 82 cases fail in total across the two faults, none of them by crashing
  - 65 fail due to Fault 1, 17 fail due to Fault 2
[Code figure: deflate.c, with the locations of Fault 1 and Fault 2 marked]
24
Indexing Result
- R-Proximity is the most effective
  - Expected because it uses information from both passing and failing executions
- T-Proximity is the worst
  - Expected because it essentially indexes the entire execution rather than the failure-relevant part
- FS-Proximity and DS-Proximity
  - More effective than T-Proximity because they index on failure-relevant information
  - Less effective than R-Proximity because they have no access to passing executions
- Figure legend
  - Red crosses are failures due to Fault 1
  - Blue circles are failures due to Fault 2
  - Proximity graph (PG): the axes are meaningless; if two objects are distant in the PG, they are distant in their original space
25
Indexing Result: A Closer Look (1)
- Data slices can precisely capture the error propagation mechanism of Fault 2.
- Red crosses are failures due to Fault 1; blue circles are failures due to Fault 2.
26
Indexing Result: A Closer Look (2)
- Data slices can precisely capture the two different error propagation mechanisms of Fault 1.
- Red crosses are failures due to Fault 1; blue circles are failures due to Fault 2.
27
Observations
- Dynamic slicing-based failure proximity is more effective than T-Proximity
- DS-Proximity is more accurate than FS-Proximity
  - DS-Proximity is able to produce more cohesive individual clusters
  - However, clusters that belong to the same bug may be distant from each other because of different error propagations
- Not as good as R-Proximity
  - But does not require passing reports
29
Conclusions
- Indexing noncrashing failures
  - An increasingly important question as crashing failures are handled better and better
  - Not intensively studied yet
- Dynamic slicing-based failure indexing
  - Effective and does not rely on passing executions
- A framework to develop and evaluate more indexing techniques
  - Decomposition of an indexing technique into a signature function and a distance function, which allows many instantiations
  - Quantitative evaluation metrics for scientific study
30
For further discussion, contact chaoliu@uius.edu or xyzhang@cs.purdue.edu