Automated Fault Prediction: The Ins, The Outs, The Ups, The Downs
Elaine Weyuker
June 11, 2015

To determine which files of a large software system with multiple releases are likely to contain the largest numbers of bugs in the next release.

● Help testers prioritize testing efforts.
● Help developers decide when to do design and code reviews and what to reimplement.
● Help managers allocate resources.

Verified that bugs were non-uniformly distributed among files. Identified properties that were likely to affect fault-proneness, and then built a statistical model and ultimately a tool to make predictions.

● Size of file (KLOCs)
● Number of changes to the file in the previous 2 releases
● Number of bugs in the file in the last release
● Age of file (number of releases in the system)
● Language the file is written in

● All of the systems we’ve studied to date use a configuration management system which integrates version control and change management functionality, including bug history.
● Data is automatically extracted from the associated data repository and passed to the prediction engine.
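
The extraction step itself is repository-specific (the tool's own backend targets AT&T's internal configuration management system), but conceptually it just aggregates change and bug records per file per release. A minimal Python sketch, assuming a hypothetical CSV export with columns file, release, developer, and bug_id:

    import csv
    from collections import defaultdict

    def load_history(path):
        """Aggregate per-file, per-release counts from a hypothetical CSV export
        of change records (columns: file, release, developer, bug_id)."""
        changes = defaultdict(int)       # (file, release) -> number of changes
        bugs = defaultdict(int)          # (file, release) -> number of bug-fixing changes
        developers = defaultdict(set)    # (file, release) -> distinct developers
        with open(path, newline="") as f:
            for rec in csv.DictReader(f):
                key = (rec["file"], int(rec["release"]))
                changes[key] += 1
                developers[key].add(rec["developer"])
                if rec["bug_id"]:        # a non-empty bug id marks a fault-fixing change
                    bugs[key] += 1
        return changes, bugs, developers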

● Used Negative Binomial Regression
● Also considered machine learning algorithms including:
  ◦ Recursive Partitioning
  ◦ Random Forests
  ◦ BART (Bayesian Additive Regression Trees)

● Consists of two parts.
● The back end extracts data needed to make the predictions.
● The front end makes the predictions and displays them.

● Extracts necessary data from the repository.
● Predicts how many bugs will be in each file in the next release of the system.
● Sorts the files in decreasing order of the number of predicted bugs.
● Displays results to the user.

● Percentage of actual bugs that occurred in the N% of the files predicted to have the largest numbers of bugs (N = 20).
● Considered other measures less sensitive to the specific value of N.
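
In other words, files are ranked by predicted fault count, the top N% are selected, and the share of the actual faults that fall in those files is reported. A small sketch of that computation, with hypothetical per-file predicted and actual counts:

    def percent_faults_in_top(predicted, actual, top_fraction=0.20):
        """Percentage of actual faults that fall in the top_fraction of files
        when files are ranked by predicted fault count."""
        order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
        n_top = max(1, round(top_fraction * len(order)))
        faults_in_top = sum(actual[i] for i in order[:n_top])
        total = sum(actual)
        return 100.0 * faults_in_top / total if total else 0.0

    # Four files: the top-ranked quarter of the files holds 6 of the 9 actual faults.
    print(percent_faults_in_top([5.2, 0.1, 3.3, 0.4], [6, 0, 2, 1], top_fraction=0.25))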

System   Years Followed   Releases   LOC     % Faults Top 20%
NP       4                17         538K    83%
WN       2                9          438K    83%
VT                                           75%
TS                                           81%
TW                                           93%
TE       7                27         327K    76%
IC                                           91%
AR       4                18         281K    87%
IN                                           93%

The Tool

Architecture (diagram): the Version Mgmt / Fault Database (previous releases), the release to be predicted, and user-supplied parameters feed the Prediction Engine (Statistical Analysis), which produces the fault-proneness predictions.

User enters the system name. User asks for fault predictions for release “Bluestone”. Available releases are found in the version management database. User chooses the releases to analyze. User selects 4 file types. User specifies that all problems reported in the System Test phase are faults.

User confirms the configuration. User enters a filename to save the configuration. User clicks the Save & Run button to start the prediction process.

Initial prediction view for Bluestone: all files are listed in decreasing order of predicted faults.

Listing is restricted to eC files

Listing is restricted to 10% of eC files

● Prediction tool is fully operational:
  ◦ 750 lines of Python for the interface
  ◦ 2150 lines of C (75K bytes compiled) for the prediction engine
● The current version’s backend (written in C) is specific to the internal AT&T configuration management system but can be adapted to other configuration management systems. All that is needed is a source of the data required by the prediction model.

Variations of the Fault Prediction Model

● Developers
  ◦ Counts
  ◦ Individuals
● Amount of Code Change
● Calling Structure

Overview
1. Standard model
2. Developer counts
3. Individual developers
4. Line-level change metrics
5. Calling structure

The Standard Model
● Underlying statistical model
  ◦ Negative binomial regression
● Output (dependent) variable
  ◦ Predicted fault count in each file of release n
● Predictor (independent) variables
  ◦ KLOC (n)
  ◦ Previous faults (n-1)
  ◦ Previous changes (n-1, n-2)
  ◦ File age (number of releases)
  ◦ File type (C, C++, java, sql, make, sh, perl, ...)
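
The talk does not show the fitting code; as a rough illustration only, here is how a negative binomial regression over the same kind of predictors could be fit in Python with statsmodels. The data below is a synthetic stand-in, not real project data, and this is not the tool's actual implementation:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Synthetic stand-in for a per-file table of one release.
    rng = np.random.default_rng(0)
    n = 200
    files = pd.DataFrame({
        "kloc":         rng.gamma(2.0, 2.0, n),              # file size in KLOC
        "prev_faults":  rng.poisson(1.0, n),                 # faults in release n-1
        "prev_changes": rng.poisson(3.0, n),                 # changes in releases n-1 and n-2
        "age":          rng.integers(1, 12, n),              # releases the file has existed
        "filetype":     rng.choice(["c", "java", "sql"], n),
    })
    # Fabricated fault counts, only so the example is self-contained and runnable.
    rate = np.exp(0.05 * files.kloc + 0.4 * files.prev_faults + 0.1 * files.prev_changes - 1.0)
    files["faults"] = rng.poisson(rate)

    # Negative binomial regression of fault count on the standard predictors.
    model = smf.glm(
        "faults ~ np.log(kloc) + prev_faults + prev_changes + age + C(filetype)",
        data=files,
        family=sm.families.NegativeBinomial(),
    ).fit()

    # Rank files by predicted fault count, highest first.
    files["predicted"] = model.predict(files)
    print(files.sort_values("predicted", ascending=False).head(10))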

Developer counts
● How many different people have worked on the file in the most recent previous release?
● How many different people have worked on the file in all previous releases? (A cumulative count.)
● How many people who changed the file were working on it for the first time?
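
A sketch of how the three counts could be derived for one file from per-release developer sets (a hypothetical data structure, not the tool's code):

    def developer_counts(devs_by_release, release):
        """devs_by_release maps release number -> set of developers who changed
        one particular file in that release; returns the three counts above."""
        prior = devs_by_release.get(release - 1, set())
        cumulative, seen_earlier = set(), set()
        for r, devs in devs_by_release.items():
            if r < release:
                cumulative |= devs
            if r < release - 1:
                seen_earlier |= devs
        return {
            "prior_developers": len(prior),               # worked on it in release n-1
            "cumulative_developers": len(cumulative),     # worked on it in any prior release
            "new_developers": len(prior - seen_earlier),  # first touched it in release n-1
        }

    history = {1: {"alice", "bob"}, 2: {"bob", "carol"}, 3: {"dave"}}
    print(developer_counts(history, release=3))
    # -> {'prior_developers': 2, 'cumulative_developers': 3, 'new_developers': 1}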

Faults per file in releases of System BTS

Standard Model

Developers Changing File in Previous Release

New Developers Changing File in Previous Release

Total Developers Changing File in All Previous Releases

Total developers touching file in all previous releases

Summary
None of the developer count attributes uniformly increases prediction accuracy. For every developer count attribute, adding it to the standard model sometimes leads to less accurate predictions than the standard model alone. The benefit is never major.

Code Change
The standard model includes a count of the number of changes made in the previous two releases. It does not take into account how much code was changed. We will now look at the impact on predictive accuracy of adding fine-grained information about change size to the model.

Measures of Code Change
● Number of changes made to a file during a previous release
● Number of lines added
● Number of lines deleted
● Number of lines modified
● Relative size of change (line changes / LOC)
● Changed / not changed
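
These measures are simple aggregations over the line counts recorded for each change. A sketch, assuming a hypothetical list of (added, deleted, modified) line counts per change for one file:

    def change_metrics(line_changes, loc):
        """line_changes: one (added, deleted, modified) tuple per change made to
        the file during the previous release; loc: file size in lines of code."""
        adds = sum(a for a, d, m in line_changes)
        deletes = sum(d for a, d, m in line_changes)
        mods = sum(m for a, d, m in line_changes)
        return {
            "changes": len(line_changes),                    # simple change count
            "adds": adds,
            "deletes": deletes,
            "mods": mods,
            "relative_churn": (adds + deletes + mods) / loc if loc else 0.0,
            "changed": int(bool(line_changes)),              # binary changed / not changed
        }

    print(change_metrics([(10, 2, 5), (0, 1, 3)], loc=400))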

Two Subject Systems
● IC: large provisioning system
  ◦ 18 releases, 5-year lifespan
  ◦ 6 languages: Java (60%), C, C++, SQL, SQL-C, SQL-C++
  ◦ files
  ◦ 1.5M LOC
  ◦ Average of 395 faults/release
● AR: utility, data aggregation system
  ◦ >10 languages: Java (77%), Perl, xml, sh, ...
  ◦ 800 files
  ◦ 280K LOC
  ◦ Average of 90 faults/release

Distribution of files, averages over all releases.

System IC Faults per File, by Release

System AR Faults per File, by Release

Prediction Models with Line-level Change Counts
● Univariate models
● Base model: log(KLOC), File age, File type
● Augmented models:
  ◦ Previous Changes
  ◦ Previous {Adds / Deletes / Mods}
  ◦ Previous {Adds / Deletes / Mods} / LOC (relative churn)
  ◦ Previous Developers

Fault-percentile averages for univariate predictor models: System IC

Base Model and Added Variables: System IC
Base model: KLOC, File age (number of releases), File type (C, C++, java, sql, make, sh, perl, ...)

Base Model and Added Variables: System AR

Summary
● Change information is important for fault prediction.
● {Adds + Deletes + Mods} improves the accuracy of a model that doesn’t include any change information, BUT
● a simple count of prior changes slightly outperforms {Adds + Deletes + Mods}.
● A simple binary changed/not-changed variable is nearly as good as either, when added to a model without change information.
● Lines added is the most effective single change predictor.
● Lines deleted is the least effective single change predictor.
● Relative change counts are no better than absolute counts for predicting total fault count.

Individual Developers
How can we measure the effect that a single developer has on the faultiness of a file?
If developer d modifies k files in release N:
● how many of those files have bugs in release N+1?
● how many bugs are in those files in release N+1?

The BuggyFile Ratio
If d modifies k files in release N, and b of them have bugs in release N+1, the buggyfile ratio for d is b/k.
System IC has 107 programmers. Over 15 releases, their buggyfile ratios vary between 0 and 1; the average is about 0.4.

Average buggyfile ratio, all programmers

Buggyfile ratio for two programmers

Buggyfile ratio: more typical cases

The Bug Ratio
If d modifies k files in release N, and there are B bugs in those files in release N+1, the bug ratio for d is B/k.
The bug ratio can vary between 0 and B. Over 15 releases, we’ve seen a maximum bug ratio of about 8; the average is about 1.5.
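
Both ratios are easy to compute once a developer's changed files are linked to next-release fault counts; a small sketch with hypothetical inputs:

    def developer_ratios(files_modified, faults_next_release):
        """files_modified: files developer d changed in release N.
        faults_next_release: file -> fault count in release N+1."""
        k = len(files_modified)
        if k == 0:
            return None
        buggy = sum(1 for f in files_modified if faults_next_release.get(f, 0) > 0)
        bugs = sum(faults_next_release.get(f, 0) for f in files_modified)
        return {"buggyfile_ratio": buggy / k, "bug_ratio": bugs / k}

    # d changed three files; two of them contain faults (3 in total) in release N+1.
    print(developer_ratios({"a.c", "b.c", "c.c"}, {"a.c": 2, "b.c": 1}))
    # -> buggyfile_ratio = 2/3, bug_ratio = 1.0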

Charts: Bug Ratio and Buggyfile Ratio

Problems with these definitions
● A file can be changed by more than one developer.
● A file may be changed in release N and a fault detected in N+1, but that change may not have caused that fault.
● A programmer might change many files in identical trivial ways (interface, variable name, ...).
● The “best” programmers might be assigned to work on the most difficult files.
● For most programmers, the bug ratios vary widely from release to release.

Some final thoughts
● Is individual programmer bug-proneness helpful for prediction?
● Is this information useful for helping a project succeed?
● Are there better ways to measure it?
● Is it ethical to measure it?
● Does attempting to measure it lead to poor performance and unhappy programmers?

Calling Structure
Are files that have a high rate of interaction with other files more fault-prone?
Diagram: File Q (Method 1, Method 2) calls Files X, Y, and Z (the callees of File Q); Files A and B call File Q (the callers of File Q).

Calling Structure Attributes Investigated
For each file:
● number of callers & callees
● number of new callers & callees
● number of prior new callers & callees
● number of prior changed callers & callees
● number of prior faulty callers & callees
● ratio of internal calls to total calls
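
A sketch of how a few of these attributes might be computed from a file-level call graph (a hypothetical (caller, callee) edge list; the example mirrors the File Q diagram above):

    def calling_structure_attrs(call_edges, file):
        """call_edges: (caller_file, callee_file) pairs, one per call site,
        summarizing the call graph at file level."""
        callees = {c for f, c in call_edges if f == file and c != file}
        callers = {f for f, c in call_edges if c == file and f != file}
        calls_from_file = [(f, c) for f, c in call_edges if f == file]
        internal = sum(1 for f, c in calls_from_file if c == file)
        return {
            "callees": len(callees),
            "callers": len(callers),
            "internal_call_ratio": internal / len(calls_from_file) if calls_from_file else 0.0,
        }

    # File Q calls X, Y, Z (plus one internal call); A and B call Q.
    edges = [("Q", "X"), ("Q", "Y"), ("Q", "Z"), ("Q", "Q"), ("A", "Q"), ("B", "Q")]
    print(calling_structure_attrs(edges, "Q"))
    # -> {'callees': 3, 'callers': 2, 'internal_call_ratio': 0.25}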

Fault Prediction by Multi-variable Models
● Code and history attributes, no calling structure
● Code and history attributes, including calling structure
● Code attributes only, including calling structure

Fault Prediction by Multi-variable Models
● Models applied to C, C++, and C-SQL files of one of the systems studied.
● First model built from the single best attribute.
● Each succeeding model built by adding the attribute that most improves the prediction.
● Stop when no attribute improves.
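
This is a standard greedy forward-selection loop; a generic sketch, where fit_and_score is a placeholder callable (e.g. fit the model on the selected attributes and evaluate the fault-percentile measure), not the study's actual code:

    def forward_select(candidates, fit_and_score):
        """Greedy forward selection: start from the single best attribute, keep
        adding the attribute that most improves the (higher-is-better) score,
        and stop when no addition helps."""
        selected, best_score = [], float("-inf")
        remaining = list(candidates)
        while remaining:
            score, attr = max((fit_and_score(selected + [a]), a) for a in remaining)
            if score <= best_score:
                break
            selected.append(attr)
            remaining.remove(attr)
            best_score = score
        return selected, best_score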

Code and history attributes, no calling structure

Code, history, and calling structure attributes

Code and calling structure attributes but not numbers of faults or changes in previous releases.

Summary
● Calling structure attributes do not increase the accuracy of predictions.
● History attributes (prior changes, prior faults) increase accuracy, either with or without calling structure.
● We only studied these issues for two of the systems.

Overall Summary
● The Standard Model performs very well (on all nine industrial systems we have examined).
● The augmented models add very little or no additional accuracy.
● Cumulative developers is the most effective addition to the Standard Model, but still doesn’t guarantee improved prediction or yield significant improvement.

What’s Ahead?
● Will our standard model make accurate predictions for open-source systems?
● Will our standard model make accurate predictions for agile systems?
● Can we predict which files will contain the faults with the highest severities?
● Can predictions be made for units smaller than files?
● Can run-time attributes be used to make fault predictions? (execution time, execution frequency, memory use, ...)
● What is the most meaningful way to assess the effectiveness and accuracy of the predictions?