Presentation is loading. Please wait.

Presentation is loading. Please wait.

02/13/20071 Indexing Noncrashing Failures: A Dynamic Program Slicing-Based Approach Chao Liu, Xiangyu Zhang, Jiawei Han, Yu Zhang, Bharat K. Bhargava University.

Similar presentations


Presentation on theme: "02/13/20071 Indexing Noncrashing Failures: A Dynamic Program Slicing-Based Approach Chao Liu, Xiangyu Zhang, Jiawei Han, Yu Zhang, Bharat K. Bhargava University."— Presentation transcript:

1 02/13/20071 Indexing Noncrashing Failures: A Dynamic Program Slicing-Based Approach Chao Liu, Xiangyu Zhang, Jiawei Han, Yu Zhang, Bharat K. Bhargava University of Illinois at Urbana-Champaign Purdue University Supported by NSF 0242840, 0219110

2 2 Overview Problem:  Automatically cluster program failures that are due to the same bug. Solution:  By looking at the similarity between the dynamic slices of program failures.

3 3 Outline Motivation Failure Indexing in Formulation Dynamic Slicing-Based Failure Indexing Experiments Conclusion

4 4 Automated Failure Reporting End-users as Beta testers  Valuable information about failure occurrences in reality  24.5 million/day in Redmond (if all users send) – John Dvorak, PC Magazine Widely adopted because of its usefulness  Microsoft Windows, Linux Gentoo, Mozilla applications …  Any applications can implement this functionality

5 5 Failure Report Automatic reports (windows/mozilla)  Application name, version (e.g., winword.exe).  Module name, version (e.g., mso.dll)  Offset into module (for example, 00003cbb).  Calling context. Manual reports (bugzilla)  Textual description of the symptoms  Failure inducing input

6 6 After Failures Collected … Failure triage  Failure prioritization: What are the most severe bugs? Worst 1% bugs = 50% failures  Duplicate failure removal Same failures can be reported multiple times  Patch suggestion Automatically locating the patch by querying the patch database with the reported failure

7 7 Cluster failure reports that may correspond to the same fault. A Solution: Failure Indexing Most Severe Less Severe Least Severe Failure Report s + + +++ + + + + + + + + + + + + + + + + + X Y 0

8 8 Current Status of Failure Indexing Great success in indexing crashing failures  Same crashing venues likely imply the same failure  E.g., Microsoft Dr. Watson System, Mozilla Quality Feedback Agent … Elusive: How to index noncrashing failures  Noncrashing failures are mainly due to semantic bugs  Hard to index because crashing contexts not available anymore

9 9 Noncrashing Failures Examples.  Unwanted dialogs.  Undesired visual outputs, e.g. colors, layouts.  Periodical loss of focus.  Periodical loss of connection.  Abnormal memory consumption.  Abnormal performance. Caused by semantic bugs.

10 10 Semantic Bugs Dominate Semantic Bugs: Application specific Only few are detectable Mostly require annotations or specifications Memory-related Bugs: Many are detectable Others Concurrency bugs Bug Distribution [Li et al., ICSE’07] 264 bugs in Mozilla and 98 bugs in Apache manually checked 29,000 bugs in Bugzilla automatically checked Courtesy of Zhenmin Li

11 11 Existing Approaches to Indexing Noncrashing Failures T-Proximity [Podgurski et al., ICSE 2003]  Failures exhibiting similar behaviors (e.g., similar branchings) are indexed together  Entire execution is considered R-Proximity [Liu and Han, FSE 2006]  Failures likely due to the same bug are indexed together  Bug location for each failure is automatically found through statistical debugging tool SOBER [Liu et al., FSE 2005]

12 12 Comments on Existing Approaches Ideal Solution (possible through manual effort)  Index by root causes (i.e., the exact fault location)  Finding root causes for every failure is exactly what failure indexing wants to circumvent T-Proximity  Indexing based on the entire execution  But usually only a small part of an execution is failure-relevant R-Proximity  Indexing by likely fault location – failure-relevant  Better quality than T-Proximity, but requires a set of passing executions to find the likely fault location Theme of this paper  Can we index noncrashing failures as effectively as R- Proximity without any successful executions?

13 13 Outline Motivation Failure Indexing in Formulation Dynamic Slicing-Based Failure Indexing Experiments Conclusion

14 14 Failure Indexing in Formulation A failure indexing technique is a function pair  : Signature function that represents a failing execution in certain ways  : Distance function that calculates the dissimilarity between two failure signatures Indexing result  A proximity matrix where the (i, j) cell is the dissimilarity between failure and, i.e.,  Failures and are indexed together if is small

15 15 Metrics for Indexing Effectiveness No quantitative metric for indexing effectiveness exists Indexing effectiveness  Cohesion: To what extent failures due to the same bug are close to each other  Separation: To what extent failures due to different bug are separated from each other Silhouette coefficient  A measure adapted from data mining  A value ranges from -1 to 1, the higher the better  More details in paper (Section 2.2)

16 16 Outline Motivation Failure Indexing in Formulation Dynamic Slicing-Based Failure Indexing Experiments Conclusion

17 17 Dynamic Slicing-Based Failure Indexing Dynamic slicing as the failure signature function

18 18 Dynamic Slicing Full dynamic slice (FS) is the set of statements that DID affect the value of a variable at a program point for ONE specific execution. [Korel and Laski, 1988 ] …… 10. A = …... 20. B = …… 30. P = 31. If (P<0) {...... 35.A = A + 1 36. } 37. B=B+1 …… 40. Error(A) FS (A@40) = {10, 30, 35, 40}

19 19 Data Slicing Full dynamic slice (FS) is the set of statements that DID affect the value of a variable at a program point for ONE specific execution. [Korel and Laski, 1988 ] Data slice (DS): only data dependence is considered. …… 10. A = …... 20. B = …… 30. P = 31. If (P<0) {...... 35.A = A + 1 36. } 37. B=B+1 …… 40. Error(A) DS (A@40) = {10, 35, 40}

20 20 Distance between Dynamic Slices For any two non-empty dynamic slices and of the same program, the distance between them is

21 21 Outline Motivation Failure Indexing in Formulation Dynamic Slicing-Based Failure Indexing Experiments Conclusion

22 22 Experiment Result Experiment setup  Benchmark (gzip 1.2.3) obtained from the Software-artifact Infrastructure Repository (SIR from Nebraska Lincoln), together with a test suite  6,184 lines of C code  Ground-truth determination group 1 group 2 group 1 &2 -

23 23 Two Semantic Bugs in Gzip-1.2.3 Ground Truth:  217 input test cases (executions) in total  82 cases fail due to both faults, no crashes  65 fail due to Fault 1, 17 fail due to Fault 2 deflate.c /*Fault 1*/ /*Fault 2*/

24 24 Indexing Result R-Proximity is the most effective  Expected because it uses information from both passing and failing executions T-Proximity is the worst  Expected because it essentially indexes the entire execution, rather than the failure relevant part FS-Proximity and DS- proximity  More effective than T- Proximity because indexing on failure- relevant information  Less effective than R- Proximity because of no access to passing executions Red crosses are for failures due to Fault 1 Blue circles are for failures due to Fault 2 Proximity Graph(PG): the axes are meaningless, if two objects are distant in the PG, they are distant in their original space

25 25 Indexing Result- A Closer Look (1) Data slices can precisely capture the error propagation mechanism of Fault two. Red crosses are for failures due to Fault 1 Blue circles are for failures due to Fault 2

26 26 Indexing Result- A Closer Look (2) Data slices can precisely capture the two different error propagation mechanisms of Fault 1 Red crosses are for failures due to Fault 1 Blue circles are for failures due to Fault 2

27 27 Observations Dynamic slicing based failure proximity is more effective than T-Proximity DS-Proximity is more accurate than FS- Proximity DS-Proximity is able to produce more cohesive individual clusters.  However, clusters belong to the same bug may be distant due to the different error propagations. Not as good as R-Proximity But does not require passing reports.

28 28 Outline Motivation Failure Indexing in Formulation Dynamic Slicing-Based Failure Indexing Experiments Conclusion

29 29 Conclusions Indexing noncrashing failures  An increasingly important question as crashing failures are tackled more and more nicely  Not intensively studied yet Dynamic slicing-based failure indexing  Effective and does not rely on passing executions A framework to develop and evaluate more indexing techniques  Decomposition of an indexing technique into signature function and distance function – Many instantiations  Quantitative evaluation metrics for scientific study

30 30 Further discussion,contact chaoliu@uius.edu xyzhang@cs.purdue.edu


Download ppt "02/13/20071 Indexing Noncrashing Failures: A Dynamic Program Slicing-Based Approach Chao Liu, Xiangyu Zhang, Jiawei Han, Yu Zhang, Bharat K. Bhargava University."

Similar presentations


Ads by Google