1 The Distribution of Faults in a Large Industrial Software System Thomas Ostrand Elaine Weyuker AT&T Labs -- Research Florham Park, NJ.

2 The $59 Billion* Question: Can we predict where severe bugs are likely to be? Can we identify characteristics of code units, files, or modules that indicate a higher probability of bugs? I.e., can we characterize the fault-proneness of code units? *The Economic Impacts of Inadequate Infrastructure for Software Testing, NIST, May 2002

3 Software Engineering Folklore: A relatively small number of modules have most of the faults. If a relatively small number of modules have most faults, the reason is that they also contain most of the code. Large modules are buggier than small ones. Buggy early → Buggy late (inherently bad modules). New code is {more, less} buggy than old code. Code metrics are {good, bad} predictors of code quality.

4 The Case Study: We studied the fault database of a large inventory tracking system that is currently in use and under continuing development.

5 System Information. System type: Inventory tracking. Lifespan: First release ~1998; subsequent releases every 3 months. Development stages: requirements, design, development, unit testing, integration testing, system testing, beta release, controlled release, and general release. Code: About ¾ of the files in Java, with smaller numbers written in shell script, makefiles, XML, HTML, Perl, C, SQL, and awk. Fault data studied from 13 successive releases.

6 Predicting Fault Proneness: Possible factors. File size: Is the density of faults found in a file related to the file's size? Faults found during early development stages: Does the number of faults found in early stages predict the number that will be found in later stages? Faults found during early releases: Does the number of faults found in early releases predict the number that will be found in later releases? Age of file: Are new files more likely to have faults than files that existed in earlier releases?
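
The file-size question is usually framed in terms of fault density, i.e. faults per thousand lines of code (KLOC), the unit that appears on the later Faults/KLOC slide. A minimal sketch, assuming hypothetical per-file records of (lines of code, fault count); the file names and numbers are invented for illustration and are not taken from the study:

```python
def fault_density(faults: int, loc: int) -> float:
    """Return faults per thousand lines of code (KLOC)."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    return faults * 1000 / loc


# Hypothetical files: name -> (lines of code, faults found). Not study data.
files = {
    "small.java": (200, 2),
    "medium.java": (1000, 4),
    "large.java": (5000, 10),
}

densities = {name: fault_density(faults, loc) for name, (loc, faults) in files.items()}

for name, density in sorted(densities.items(), key=lambda item: -item[1]):
    print(f"{name}: {density:.1f} faults/KLOC")
```

In this made-up sample the largest file has the most faults but the lowest density, which is the distinction between fault count and fault density that the file-size question turns on.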

7 Number of Files [chart: number of files, by release number]

8 Size of System (KLOCs, including comments) [chart: KLOCs, by release number]

9 Number of files and number of faults detected in each release [chart: number of files and faults, by release number]

10 Size of System (KLOCs) and Number of Faults [chart: KLOCs and number of faults, by release number]

11 SW Engineering Folklore: A relatively small number of modules have most of the faults. If a relatively small number of modules have most faults, the reason is that they also contain most of the code. Large modules are buggier than small ones. Buggy early → Buggy late (inherently bad modules). New code is {more, less} buggy than old code. Code metrics are {good, bad} predictors of code quality.

12 Distribution of Faults over Files: Percent of Faulty Files [chart: percent of files that have faults and number of files in release, by release number]

13 Distribution of Faults over Files: Number of Faulty Files [chart: number of files that have faults, by release number]

15 Concentration of Faults in Rel 12

19 Fault density vs. file size (Fenton & Ohlsson, TSE 1997)

20 Fault density vs. module size (Hatton, Software 1997)

22 Does the number of faults found in early development stages predict the number that will be found in later stages? Does the number of faults found in early releases predict the number that will be found in later releases?

23 Faults by Development Stages [chart: number of faults, by release number]

24 Once Faulty, Always Faulty? From Stage to Stage: In every release, ALL of the post-release faults were found in files whose contribution to the integration and system test faults was relatively low (6% - 28%). In other words, 72% - 94% of the late pre-release faults were in files that had NO post-release faults. Fenton & Ohlsson observed similar results.

25 Once Faulty, Always Faulty? From Stage to Stage: However, across all the releases, there were only a total of 128 post-release faults. No release had more than 20 post-release faults, and half had no more than 10 post-release faults. Not enough data to draw meaningful conclusions.

26 Number of Post-Release Faults

Release   In Files with 0 Late Pre-Rel Faults   In Files with ≥ 1 Late Pre-Rel Faults
1         0                                     0
2         1                                     0
3         0                                     0
4         3                                     1
5         9                                     4
6         8                                     5
7         18                                    0
8         5                                     3
9         5                                     5
10        7                                     5
11        10                                    8
12        14                                    6
13        7                                     4
Total     87                                    41
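
The totals on slide 26 can be cross-checked against the 128 post-release faults quoted on slide 25. The sketch below transcribes the per-release counts from the slide, taking the second column to mean files with at least one late pre-release fault, and sums them:

```python
# (release, post-release faults in files with 0 late pre-release faults,
#  post-release faults in files with at least 1 late pre-release fault),
# as read from slide 26.
post_release = [
    (1, 0, 0), (2, 1, 0), (3, 0, 0), (4, 3, 1), (5, 9, 4),
    (6, 8, 5), (7, 18, 0), (8, 5, 3), (9, 5, 5), (10, 7, 5),
    (11, 10, 8), (12, 14, 6), (13, 7, 4),
]

total_zero = sum(zero for _, zero, _ in post_release)
total_some = sum(some for _, _, some in post_release)
per_release = [zero + some for _, zero, some in post_release]

print("faults in files with 0 late pre-release faults:", total_zero)
print("faults in files with >= 1 late pre-release faults:", total_some)
print("all post-release faults:", total_zero + total_some)
print("largest per-release count:", max(per_release))
```

The column sums reproduce the Total row (87 and 41), their sum is 128, and the largest per-release count is 20, consistent with slide 25.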

27 Faults by Stages [chart: number of faults, by release number]

28 Faults by Stages (Sorted by Decreasing Number of Early Pre-release Faults) [chart: number of faults, by release number]

29 Once Faulty, Always Faulty? From Release to Release High-fault files of a release: Top 20% of files ordered by decreasing number of faults. Over all releases, roughly 35% of these files were also high-fault files in the preceding and/or succeeding releases. For Release 12, more than 40% of its high-fault files were also high-fault files in Release 1.
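
The high-fault definition and the release-to-release persistence measure can be sketched as follows. This is an illustration only: the rounding rule for the 20% cutoff, the tie-breaking by file name, and the sample fault counts are all assumptions, since the slides do not specify them.

```python
import math


def high_fault_files(fault_counts: dict[str, int], percent: int = 20) -> set[str]:
    """Top `percent` of files, ordered by decreasing number of faults.

    Assumptions: the cutoff is rounded up, and ties are broken by file name.
    """
    ranked = sorted(fault_counts, key=lambda name: (-fault_counts[name], name))
    cutoff = math.ceil(len(ranked) * percent / 100)
    return set(ranked[:cutoff])


def persistence(current: dict[str, int], neighbor: dict[str, int]) -> float:
    """Share of the current release's high-fault files that are also
    high-fault files in a neighboring (previous or next) release."""
    current_high = high_fault_files(current)
    if not current_high:
        return 0.0
    return len(current_high & high_fault_files(neighbor)) / len(current_high)


# Hypothetical fault counts for ten files in two successive releases.
release_a = {"a": 9, "b": 7, "c": 1, "d": 0, "e": 0, "f": 2, "g": 0, "h": 1, "i": 0, "j": 0}
release_b = {"a": 5, "b": 0, "c": 8, "d": 0, "e": 1, "f": 0, "g": 0, "h": 2, "i": 0, "j": 1}

print(sorted(high_fault_files(release_a)))
print(sorted(high_fault_files(release_b)))
print(persistence(release_a, release_b))
```

Here files a and b are high-fault in the first hypothetical release and a and c in the second, so half of the first release's high-fault files persist.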

30 Persistence of high fault count between releases [chart: percent of high-fault files in this release that are high-fault in the previous/next release, by release]

32 Old Files and New Files. For Release n: an old file is a file that existed in some Release i < n and still exists in Release n; a new file is a file that did not exist in any Release i < n and is in Release n.
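
The old/new definition maps directly onto set operations. A minimal sketch, assuming each release is represented as a set of file names (a hypothetical representation; the file names below are invented):

```python
def classify_files(releases: list[set[str]], n: int) -> tuple[set[str], set[str]]:
    """Split the files of Release n into (old, new).

    releases[i - 1] holds the file names present in Release i (1-based).
    Old: existed in some Release i < n and still exists in Release n.
    New: did not exist in any Release i < n and is in Release n.
    """
    if not 1 <= n <= len(releases):
        raise ValueError("n must be between 1 and the number of releases")
    current = releases[n - 1]
    earlier = set().union(*releases[: n - 1])
    return current & earlier, current - earlier


# Hypothetical file sets for three releases.
releases = [
    {"a.java", "b.java"},
    {"a.java", "c.java"},
    {"b.java", "c.java", "d.java"},
]

old_files, new_files = classify_files(releases, 3)
print(sorted(old_files))
print(sorted(new_files))
```

Note that b.java is absent from the second hypothetical release but still counts as old in the third, because the definition only requires that it existed in some earlier release.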

33 Are new files more likely to have faults than files that existed in earlier releases? Do new files have higher fault density than old files?

34 Old Files, New Files: Percent Containing (any number of) Faults [chart: percent containing faults, by release number]

35 Old Files, New Files: Faults/KLOC over all files of the release [chart: faults/KLOC, by release number]

36 Summary of Fault Proneness Observations: Faults are concentrated in a relatively small number of files, and become more heavily concentrated as the system matures. Large files do not generally have higher fault density than small files; the opposite seems to be true. Files with high fault counts during pre-release do not generally have high fault counts during post-release.

37 Fault Proneness Observations: Files with the largest numbers of faults in an early release seem to be more likely to have large numbers of faults in the next release and later releases. Newly written files are more likely to be faulty than old files, and to have higher fault density than old files.

38 Continuing Work: Study additional systems. Statistical analysis. Study relation between bugs in successive releases (Do persistently high-fault files have related bugs in successive releases?). Do the numbers change if we calculate them for different levels of fault severity? Are code metrics good predictors of faults?
