1
Automated Fault Prediction: The Ins, The Outs, The Ups, The Downs
Elaine Weyuker
June 11, 2015
2
To determine which files of a large software system with multiple releases are likely to contain the largest numbers of bugs in the next release.
3
● Help testers prioritize testing efforts.
● Help developers decide when to do design and code reviews and what to reimplement.
● Help managers allocate resources.
4
Verified that bugs were non-uniformly distributed among files. Identified properties that were likely to affect fault-proneness, and then built a statistical model and ultimately a tool to make predictions.
5
● Size of file (KLOCs)
● Number of changes to the file in the previous 2 releases.
● Number of bugs in the file in the last release.
● Age of file (number of releases in the system)
● Language the file is written in.
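As a rough illustration, these predictors could be gathered into a per-file record like the following Python sketch; the field names are illustrative, not the tool's actual schema.

```python
# Illustrative per-file predictor record (field names are hypothetical).
from dataclasses import dataclass

@dataclass
class FileFeatures:
    kloc: float          # size of the file, in thousands of lines
    changes_prev2: int   # changes during the previous two releases
    faults_prev: int     # bugs found in the file in the last release
    age: int             # number of releases the file has existed
    language: str        # e.g. "c", "java", "sql"
```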
6
● All of the systems we’ve studied to date use a configuration management system which integrates version control and change management functionality, including bug history.
● Data is automatically extracted from the associated data repository and passed to the prediction engine.
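The studied systems live in AT&T's internal configuration management repository, so no public API applies; purely as a stand-in, here is how the same kind of change data might be pulled from a git repository (the function and release tags are hypothetical):

```python
# Hypothetical back-end sketch, using git as a stand-in data source.
import subprocess
from collections import Counter

def changes_per_file(repo, prev_tag, this_tag):
    """Count how many commits touched each file between two release tags."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--pretty=format:",
         f"{prev_tag}..{this_tag}"],
        capture_output=True, text=True, check=True).stdout
    return Counter(line for line in out.splitlines() if line.strip())
```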
7
Used Negative Binomial Regression. Also considered machine learning algorithms, including:
◦ Recursive Partitioning
◦ Random Forests
◦ BART (Bayesian Additive Regression Trees)
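As a concrete illustration, here is a minimal sketch of fitting a negative binomial regression in Python with statsmodels. The slides name the model family but not an implementation, so the library choice, toy data, and two-predictor formula are all assumptions:

```python
# Minimal sketch: negative binomial regression on toy per-file data.
# statsmodels' GLM NegativeBinomial family fixes the dispersion alpha
# (default 1.0); smf.negativebinomial() would estimate alpha as well.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "faults":        [0, 2, 1, 7, 0, 3],   # bugs in the next release
    "kloc":          [0.4, 2.1, 1.0, 5.3, 0.2, 3.8],
    "prior_changes": [1, 4, 2, 9, 0, 5],
})
model = smf.glm("faults ~ np.log(kloc) + prior_changes", data=df,
                family=sm.families.NegativeBinomial()).fit()
print(model.predict(df))  # expected fault counts per file
```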
8
● Consists of two parts.
● The back end extracts data needed to make the predictions.
● The front end makes the predictions and displays them.
9
Extracts necessary data from the repository. Predicts how many bugs will be in each file in the next release of the system. Sorts the files in decreasing order of the number of predicted bugs. Displays results to the user.
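In outline, the predict-sort-display step might look like this sketch, where the extracted features and the fitted model are stood in by caller-supplied arguments (both names are hypothetical):

```python
# Sketch of the predict-sort-display step. `files` is an iterable of
# (path, features) pairs and `predict_fn` maps features to an expected
# fault count -- both stand in for the tool's back end and fitted model.
def show_predictions(files, predict_fn):
    ranked = sorted(((predict_fn(feat), path) for path, feat in files),
                    reverse=True)  # most predicted bugs first
    for n, path in ranked:
        print(f"{n:6.1f}  {path}")

# toy usage: the "features" here are just prior-change counts
show_predictions([("a.c", 9), ("b.c", 1), ("c.c", 4)],
                 predict_fn=lambda changes: 0.5 + 0.3 * changes)
```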
10
Percentage of actual bugs that occurred in the N% of files predicted to have the largest numbers of bugs (N=20). We also considered other measures that are less sensitive to the specific value of N.
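A sketch of that evaluation measure, assuming per-file dicts of predicted and actual fault counts (the data shapes are my assumption):

```python
# Sketch of the measure: share of actual faults landing in the top N%
# of files, ranked by predicted fault count.
def faults_in_top_n(predicted, actual, n_percent=20):
    """predicted, actual: dicts mapping file -> fault count."""
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    top = ranked[:max(1, len(ranked) * n_percent // 100)]
    return 100.0 * sum(actual.get(f, 0) for f in top) / sum(actual.values())

pred = {"a.c": 5.2, "b.c": 0.3, "c.c": 1.1, "d.c": 0.9, "e.c": 0.1}
real = {"a.c": 4, "b.c": 0, "c.c": 2, "d.c": 1, "e.c": 0}
print(faults_in_top_n(pred, real))  # top 20% of 5 files = 1 file -> 57.1
```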
11
System  Years Followed  Releases  LOC     % Faults in Top 20%
NP      4               17        538K    83%
WN      2               9         438K    83%
VT      2.25            9         329K    75%
TS      9+              35        442K    81%
TW      9+              35        384K    93%
TE      7               27        327K    76%
IC      4               18        1520K   91%
AR      4               18        281K    87%
IN      4               18        2116K   93%
14
The Tool
15
[Architecture diagram: a version management / fault database (previous releases) and user-supplied parameters for the release to be predicted feed the statistical-analysis prediction engine, which produces fault-proneness predictions.]
16
User enters system name. User asks for fault predictions for release “Bluestone2008.1”. Available releases are found in the version management database. User chooses the releases to analyze. User selects 4 file types. User specifies that all problems reported in the System Test phase are faults.
17
User confirms the configuration. User enters a filename to save the configuration. User clicks the Save & Run button to start the prediction process.
18
Initial prediction view for Bluestone2008.1: all files are listed in decreasing order of predicted faults.
19
Listing is restricted to eC files
20
Listing is restricted to 10% of eC files
21
Prediction tool is fully operational
◦ 750 lines of Python for the interface
◦ 2150 lines of C (75K bytes compiled) for the prediction engine
The current version’s back end (written in C) is specific to the internal AT&T configuration management system but can be adapted to other configuration management systems. All that is needed is a source of the data required by the prediction model.
22
Variations of the Fault Prediction Model
23
● Developers
◦ Counts
◦ Individuals
● Amount of Code Change
● Calling Structure
24
Overview
1. Standard model
2. Developer counts
3. Individual developers
4. Line-level change metrics
5. Calling structure
25
The Standard Model
● Underlying statistical model
◦ Negative binomial regression
● Output (dependent) variable
◦ Predicted fault count in each file of release n
● Predictor (independent) variables
◦ KLOC (n)
◦ Previous faults (n-1)
◦ Previous changes (n-1, n-2)
◦ File age (number of releases)
◦ File type (C, C++, Java, SQL, make, sh, Perl, ...)
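Spelled out, the mean function of the standard model would take roughly this form. This is a sketch consistent with negative binomial regression (log link); the slides list the predictors but not the exact specification, so the coefficient names and the file-type effect are mine:

```latex
\log E[\mathrm{faults}_{i,n}] = \beta_0
  + \beta_1 \log(\mathrm{KLOC}_{i,n})
  + \beta_2\,\mathrm{faults}_{i,n-1}
  + \beta_3\,\mathrm{changes}_{i,n-1}
  + \beta_4\,\mathrm{changes}_{i,n-2}
  + \beta_5\,\mathrm{age}_{i,n}
  + \gamma_{\mathrm{type}(i)}
```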
26
Developer Counts
● How many different people have worked on the file in the most recent previous release?
● How many different people have worked on the file in all previous releases? This is a cumulative count.
● How many people who changed the file were working on it for the first time?
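A sketch of how these three counts could be derived from per-release change logs; the history representation (release number mapped to a set of developer/file pairs) is my assumption:

```python
# Sketch of the three developer-count attributes. The history format
# (release number -> set of (developer, file) pairs) is hypothetical.
def developer_counts(history, file, release):
    prev = {d for d, f in history.get(release - 1, set()) if f == file}
    cumulative = {d for r in history if r < release
                  for d, f in history[r] if f == file}
    earlier = {d for r in history if r < release - 1
               for d, f in history[r] if f == file}
    new = prev - earlier  # first touched the file in the last release
    return len(prev), len(cumulative), len(new)

history = {1: {("ann", "x.c"), ("bob", "x.c")},
           2: {("bob", "x.c"), ("cat", "x.c")}}
print(developer_counts(history, "x.c", 3))  # (2, 3, 1)
```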
27
Faults per file in releases of System BTS
28
Standard Model
29
Developers Changing File in Previous Release
30
New Developers Changing File in Previous Release
31
Total Developers Changing File in All Previous Releases
32
Total developers touching file in all previous releases
33
Summary
● None of the developer-count attributes uniformly increases prediction accuracy.
● For every developer-count attribute, adding it to the standard model sometimes leads to less accurate predictions than the standard model alone.
● The benefit is never major.
34
Code Change
The standard model includes a count of the number of changes made in the previous two releases, but it does not take into account how much code was changed. We now look at the impact on predictive accuracy of adding fine-grained information about change size to the model.
35
Measures of Code Change
● Number of changes made to a file during a previous release
● Number of lines added
● Number of lines deleted
● Number of lines modified
● Relative size of change (line changes/LOC)
● Changed/not changed
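A sketch of these measures for one file in one release, assuming each change is summarized as an (added, deleted, modified) line-count triple:

```python
# Sketch of the change measures for one file in one release, assuming
# each change is summarized as an (added, deleted, modified) triple.
def change_measures(changes, loc):
    added = sum(a for a, d, m in changes)
    deleted = sum(d for a, d, m in changes)
    modified = sum(m for a, d, m in changes)
    return {"n_changes": len(changes),
            "added": added, "deleted": deleted, "modified": modified,
            "relative_churn": (added + deleted + modified) / loc,
            "changed": bool(changes)}

print(change_measures([(10, 2, 5), (0, 1, 3)], loc=400))
```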
36
Two Subject Systems
IC: large provisioning system
◦ 18 releases over a 5-year lifespan
◦ 6 languages: Java (60%), C, C++, SQL, SQL-C, SQL-C++
◦ 3000+ files, 1.5M LOC
◦ Average of 395 faults/release
AR: utility, data aggregation system
◦ >10 languages: Java (77%), Perl, xml, sh, ...
◦ 800 files, 280K LOC
◦ Average of 90 faults/release
37
Distribution of files, averages over all releases.
38
System IC Faults per File, by Release
39
System AR Faults per File, by Release
40
Prediction Models with Line-Level Change Counts
● Univariate models
● Base model: log(KLOC), file age, file type
● Augmented models:
◦ Previous changes
◦ Previous {adds / deletes / mods}
◦ Previous {adds / deletes / mods} / LOC (relative churn)
◦ Previous developers
41
Fault-percentile averages for univariate predictor models: System IC
42
Base Model and Added Variables: System IC
Base model: KLOC, file age (number of releases), file type (C, C++, Java, SQL, make, sh, Perl, ...)
43
Base Model and Added Variables: System AR
44
Summary
● Change information provides important information for fault prediction.
● {Adds + deletes + mods} improves the accuracy of a model that doesn’t include any change information, BUT a simple count of prior changes slightly outperforms {adds + deletes + mods}.
● A simple binary changed/not-changed flag is nearly as good as either, when added to a model without change information.
● Lines added is the most effective single change predictor; lines deleted is the least effective.
● Relative churn is no better than absolute change counts for predicting total fault count.
45
Individual Developers
How can we measure the effect that a single developer has on the faultiness of a file? If developer d modifies k files in release N:
● How many of those files have bugs in release N+1?
● How many bugs are in those files in release N+1?
46
The Buggyfile Ratio
● If d modifies k files in release N, and b of them have bugs in release N+1, the buggyfile ratio for d is b/k.
● System IC has 107 programmers. Over 15 releases, their buggyfile ratios vary between 0 and 1; the average is about 0.4.
47
Average buggyfile ratio, all programmers
48
Buggyfile ratio for two programmers
49
Buggyfile ratio, more typical cases
50
The Bug Ratio
● If d modifies k files in release N, and there are B bugs in those files in release N+1, the bug ratio for d is B/k.
● The bug ratio can vary between 0 and B. Over 15 releases, we’ve seen a maximum bug ratio of about 8; the average is about 1.5.
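Both per-developer measures in sketch form; the input shapes (the set of files d touched in release N, and a map from file to bug count in release N+1) are my assumption:

```python
# Both per-developer ratios in sketch form. `touched` is the set of
# files d changed in release N; `faults` maps file -> bug count in N+1.
def buggyfile_ratio(touched, faults):
    return sum(1 for f in touched if faults.get(f, 0) > 0) / len(touched)

def bug_ratio(touched, faults):
    return sum(faults.get(f, 0) for f in touched) / len(touched)

touched = {"a.c", "b.c", "c.c", "d.c"}
faults = {"a.c": 3, "c.c": 1}
print(buggyfile_ratio(touched, faults))  # 2 of 4 files buggy -> 0.5
print(bug_ratio(touched, faults))        # 4 bugs / 4 files   -> 1.0
```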
51
[Charts: bug ratio and buggyfile ratio, by release]
52
Problems with These Definitions
● A file can be changed by more than one developer.
● A file may be changed in release N and a fault detected in N+1, but that change may not have caused that fault.
● A programmer might change many files in identical trivial ways (interface, variable name, ...).
● The “best” programmers might be assigned to work on the most difficult files.
● For most programmers, the bug ratios vary widely from release to release.
53
Some Final Thoughts
● Is individual programmer bug-proneness helpful for prediction?
● Is this information useful for helping a project succeed?
● Are there better ways to measure it?
● Is it ethical to measure it?
● Does attempting to measure it lead to poor performance and unhappy programmers?
54
Calling Structure
Are files that have a high rate of interaction with other files more fault-prone?
[Diagram: callers of File Q (Files A and B) invoke its methods; File Q in turn calls Files X, Y, and Z, its callees.]
55
Calling Structure Attributes Investigated
For each file:
● number of callers & callees
● number of new callers & callees
● number of prior new callers & callees
● number of prior changed callers & callees
● number of prior faulty callers & callees
● ratio of internal calls to total calls
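A sketch of the basic caller/callee counts, assuming the call graph has been flattened to file-level edges (the representation is mine; the slides do not say how the graph is extracted):

```python
# Sketch of basic caller/callee attributes from a file-level call
# graph, represented as (caller_file, callee_file) edges (my choice).
def call_counts(edges, file):
    callers = {a for a, b in edges if b == file and a != file}
    callees = {b for a, b in edges if a == file and b != file}
    internal = sum(1 for a, b in edges if a == b == file)
    total = sum(1 for a, b in edges if file in (a, b))
    return {"callers": len(callers), "callees": len(callees),
            "internal_ratio": internal / total if total else 0.0}

edges = {("a.c", "q.c"), ("b.c", "q.c"), ("q.c", "x.c"), ("q.c", "q.c")}
print(call_counts(edges, "q.c"))  # 2 callers, 1 callee, ratio 0.25
```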
56
Fault Prediction by Multi-variable Models
● Code and history attributes, no calling structure
● Code and history attributes, including calling structure
● Code attributes only, including calling structure
57
Fault Prediction by Multi-variable Models
● Models applied to C, C++, and C-SQL files of one of the systems studied.
● First model built from the single best attribute.
● Each succeeding model built by adding the attribute that most improves the prediction.
● Stop when no attribute improves.
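The greedy forward-selection procedure the slide describes, in sketch form; `score` stands in for fitting a model on an attribute subset and evaluating its prediction accuracy:

```python
# Greedy forward selection as the slide describes it. `score(attrs)`
# stands in for fitting a model on that attribute subset and measuring
# its prediction accuracy (e.g. fault-percentile average).
def forward_select(candidates, score):
    chosen, best = [], float("-inf")
    while True:
        gains = [(score(chosen + [a]), a)
                 for a in candidates if a not in chosen]
        if not gains:
            return chosen
        top_score, top_attr = max(gains)
        if top_score <= best:      # no attribute improves: stop
            return chosen
        chosen.append(top_attr)
        best = top_score

# toy score that rewards two specific attributes and penalizes size
toy = lambda attrs: len(set(attrs) & {"prior_faults", "kloc"}) - 0.1 * len(attrs)
print(forward_select(["kloc", "age", "prior_faults"], toy))
```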
58
Code and history attributes, no calling structure
59
Code, history, and calling structure attributes
60
Code and calling structure attributes but not numbers of faults or changes in previous releases.
61
Summary
● Calling structure attributes do not increase the accuracy of predictions.
● History attributes (prior changes, prior faults) increase accuracy, either with or without calling structure.
● We studied these issues for only two of the systems.
62
Overall Summary
● The Standard Model performs very well on all nine industrial systems we have examined.
● The augmented models add very little or no additional accuracy.
● Cumulative developers is the most effective addition to the Standard Model, but it still doesn’t guarantee improved prediction or yield significant improvement.
63
What’s Ahead?
◦ Will our standard model make accurate predictions for open-source systems?
◦ Will our standard model make accurate predictions for agile systems?
◦ Can we predict which files will contain the faults with the highest severities?
◦ Can predictions be made for units smaller than files?
◦ Can run-time attributes be used to make fault predictions (execution time, execution frequency, memory use, ...)?
◦ What is the most meaningful way to assess the effectiveness and accuracy of the predictions?