Selecting a defect model for maintenance resource planning and software insurance Paul Li Carnegie Mellon University

Selecting a defect model for maintenance resource planning and software insurance Paul Li Carnegie Mellon University Paul.Li@cs.cmu.edu

Presentation Overview
Creating a predictive model for software defect occurrences is a first step toward dealing with the consequences of software failure in commercial software systems.
The model form and parameterization of the Weibull and Gamma distributions fit the field defect occurrence characteristics of widely-used commercial software systems.
The next step is to use information available prior to release to estimate the fitted model parameters.

The real world problem and the research problem
Real World Problem: Consequences of defects in commercial software systems include costs to consumers, in the form of losses associated with failures, and costs to producers, in the form of maintenance costs associated with repairing the underlying faults.
Research Framework: A set of composable tools to help producers manage and evaluate the risks and uncertainties associated with commercial software systems: a defect prediction model, a defect attribution method, a loss model, and a cost-to-repair model.
Research Problem: A defect prediction model that takes information available before release and estimates the number of field defects at any time after release.

The fault model
Fault duration: permanent (reproducible).
Fault manifestation: deviation from expected behavior as perceived and reported by a user in the field.
Fault source: any mistake at the code level.
Granularity: clearly identified software component.
Fault profile expectation: random, arbitrary, and unforeseen.

The research setting
1. Determine the defect model that best describes the field defect occurrences, and derive the parameters of the best-fitting model for each release.
2. Use information available prior to release, for each release, to predict the best-fitting model parameters.
3. Use data as it becomes available after release to adjust the defect estimates.
4. Identify and incorporate additional predictors to improve predictions.

Previous works
Recall from previous talks that:
- We are looking at the number of user-reported defects from widely-used, multi-release commercial software systems.
- We think that the functional form and parameterization of the Gamma and Weibull models make them better suited to describe the ramp-up characteristics seen in the commercial systems.

Comparing the fit of defect models
There is a set of parameterized defect model classes, each with its own form and parameterization.
- We select commonly accepted classes of models: Exponential, Gamma, Weibull, Power, and Logarithmic.
For each class, we find the model parameters that best fit the actual field defect data for releases of two widely-used, multi-release commercial software systems, and we compare the fits.
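As a rough illustration of what "parameterized model classes" means here, the five classes can be written as functions of time since release. This is a minimal sketch: the functional forms are the ones listed on the "Variance in parameter values" slides, while the function names, the argument order, and the reading of each function as the cumulative expected number of field defects by time t are assumptions made for this example.

```python
import math

# Each function returns the (assumed) cumulative expected number of
# field defects observed by time t after release.

def exponential(t, n, beta):
    # N (1 - exp(-t / beta))
    return n * (1.0 - math.exp(-t / beta))

def weibull(t, n, alpha, beta):
    # N (1 - exp(-(t^alpha) / beta))
    return n * (1.0 - math.exp(-(t ** alpha) / beta))

def gamma(t, alpha, beta):
    # alpha (1 - (1 + t / beta) exp(-t / beta))
    return alpha * (1.0 - (1.0 + t / beta) * math.exp(-t / beta))

def power(t, alpha, beta):
    # alpha t^beta
    return alpha * t ** beta

def logarithmic(t, alpha, beta):
    # ln(t / alpha + 1) beta
    return math.log(t / alpha + 1.0) * beta

# Hypothetical registry so a fitting routine can loop over the classes.
MODEL_CLASSES = {
    "Exponential": exponential,
    "Weibull": weibull,
    "Gamma": gamma,
    "Power": power,
    "Logarithmic": logarithmic,
}
```

Written this way, the Weibull form with alpha = 1 reduces to the Exponential form, which is one reason the extra shape parameter can only help the fit.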

A middleware

An operating system

Difference in estimates
Sum of absolute differences between the best-fit model estimates in each model class and the actual defect occurrences, for the OS and the middleware (MW).
Release  Exponential  Weibull  Gamma  Power  Logarithmic
OS R1    115          69       83     144    127
OS R2    58           44       43     91     73
OS R3    216          151      184    361    263
OS R4    58           57       75     87     70
MW R1    69           52       52     88     79
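The comparison metric on this slide can be sketched as follows. This is an illustration only: the slide does not say whether the differences are taken over per-period counts or cumulative counts, so per-period counts are assumed, and the numbers below are made up for the example rather than taken from the OS or middleware data.

```python
def sum_absolute_difference(estimates, actuals):
    """Sum of |estimate - actual| over matching observation periods."""
    if len(estimates) != len(actuals):
        raise ValueError("estimates and actuals must have the same length")
    return sum(abs(e - a) for e, a in zip(estimates, actuals))

# Hypothetical per-period defect counts and two hypothetical model fits.
actual = [3, 8, 12, 9, 5]
fit_a = [4.0, 7.5, 11.0, 9.5, 6.0]
fit_b = [6.0, 6.0, 8.0, 8.0, 8.0]

scores = {
    "fit_a": sum_absolute_difference(fit_a, actual),
    "fit_b": sum_absolute_difference(fit_b, actual),
}
# A lower score means the fitted curve tracks the observations more closely.
best = min(scores, key=scores.get)
```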

Variance in parameter values
Percentage deviation from the mean of each OS model parameter, by release.
Model        Parameter  OS R1  OS R2  OS R3  OS R4
Exponential  N          36%    51%    121%   34%
Exponential  Beta       39%    9%     16%    13%
Weibull      N          39%    51%    123%   34%
Weibull      Alpha      17%    3%     3%     10%
Weibull      Beta       104%   26%    34%    44%
Gamma        Alpha      38%    51%    123%   34%
Gamma        Beta       29%    3%     13%    13%
Power        Alpha      51%    42%    106%   13%
Power        Beta       8%     8%     8%     8%
Logarithmic  Alpha      104%   28%    33%    44%
Logarithmic  Beta       12%    54%    107%   41%
Model forms:
Exponential: N (1 - exp(-t / beta))
Weibull: N (1 - exp(-(t^alpha) / beta))
Gamma: alpha (1 - (1 + t / beta) exp(-t / beta))
Power: alpha (t^beta)
Logarithmic: ln(t / alpha + 1) beta

Variance in parameter values 2
Percentage deviation from the mean (of the OS average and the middleware) of each model parameter, by system.
Model        Parameter  OS (Average)  MW
Exponential  N          10%           10%
Exponential  Beta       36%           36%
Weibull      N          18%           18%
Weibull      Alpha      2%            2%
Weibull      Beta       30%           30%
Gamma        Alpha      18%           18%
Gamma        Beta       23%           23%
Power        Alpha      42%           42%
Power        Beta       10%           10%
Logarithmic  Alpha      52%           52%
Logarithmic  Beta       12%           12%
Model forms:
Exponential: N (1 - exp(-t / beta))
Weibull: N (1 - exp(-(t^alpha) / beta))
Gamma: alpha (1 - (1 + t / beta) exp(-t / beta))
Power: alpha (t^beta)
Logarithmic: ln(t / alpha + 1) beta
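The slides do not define "percentage deviation from the mean". One plausible reading, assumed here, is the absolute difference between a fitted parameter value and the mean of that parameter across releases (or systems), divided by the mean. The function name and the input values below are hypothetical.

```python
def pct_deviation_from_mean(values):
    """Absolute deviation of each value from the mean, as a percentage of the mean."""
    mean = sum(values) / len(values)
    return [abs(v - mean) / mean * 100.0 for v in values]

# Hypothetical fitted values of one parameter for two systems.
two_systems = pct_deviation_from_mean([2.0, 3.0])

# Hypothetical fitted values of one parameter across four releases.
four_releases = pct_deviation_from_mean([6.0, 5.0, 22.0, 7.0])
```

Under this reading, two values always deviate from their mean by the same percentage, which would explain why the OS (Average) and MW columns are identical. It is also consistent with the per-release OS table, where the one large deviation in a row roughly equals the sum of the other three (for Exponential N, 36% + 51% + 34% = 121%).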

Validity of results
External Validity: Real, widely-used (>1000 users), multi-release commercial software systems. From one software-producing organization.
Internal Validity: Best currently available models. Likelihood maximization using a non-homogeneous Poisson process mathematical fitting procedure. Fitted using a grid search process. For releases in late stages of release life.
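The fitting procedure named above (likelihood maximization for a non-homogeneous Poisson process, searched over a grid) could look roughly like the sketch below. The details are assumptions, not the procedure used in the talk: defects are assumed to be counted per unit-length period, the standard grouped-data Poisson log-likelihood is used, the Weibull form from the parameter slides serves as the mean-value function, and the counts and grid values are invented for illustration.

```python
import itertools
import math

def weibull_mean(t, n, alpha, beta):
    # Expected cumulative defects by time t: N (1 - exp(-(t^alpha) / beta))
    return n * (1.0 - math.exp(-(t ** alpha) / beta))

def nhpp_log_likelihood(counts, mean_fn, params):
    """Grouped-data NHPP log-likelihood; counts[i] covers period (i, i + 1]."""
    ll = 0.0
    for i, k in enumerate(counts, start=1):
        expected = mean_fn(i, *params) - mean_fn(i - 1, *params)
        if expected <= 0.0:
            return float("-inf")
        # Poisson log-probability of k defects when `expected` are expected.
        ll += k * math.log(expected) - expected - math.lgamma(k + 1)
    return ll

def grid_search(counts, mean_fn, grids):
    """Evaluate every parameter combination and keep the most likely one."""
    best_params, best_ll = None, float("-inf")
    for params in itertools.product(*grids):
        ll = nhpp_log_likelihood(counts, mean_fn, params)
        if ll > best_ll:
            best_params, best_ll = params, ll
    return best_params, best_ll

# Hypothetical per-period field defect counts for one release.
counts = [2, 6, 9, 8, 5, 3, 1]
# Hypothetical candidate values for N, alpha, and beta.
grids = [
    [30.0, 35.0, 40.0, 45.0],
    [1.0, 1.5, 2.0, 2.5],
    [5.0, 10.0, 15.0, 20.0],
]
best_params, best_ll = grid_search(counts, weibull_mean, grids)
```

A grid search is simple and cannot diverge, but its answer is only as fine as the grid; in practice the grid would be refined around the best point or followed by a numerical optimizer.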

The next step
We have parameter values for the best-fitting model in each class of models.
We have limited pre-release information for each release.
Determine how well the pre-release information can predict the best-fitting parameter values.

The research problem and the real world
Real World Solution: Software consumers can select the software product that meets their risk profile and buy insurance to hedge risks. Software producers can allocate the appropriate amount of maintenance resources and make informed decisions during development. A policy tool based on insurance rates can influence and encourage engineering and development decisions.
Research Framework: Together with the other research pieces, we can produce products such as software insurance, a maintenance resource planner, and an effect estimator for changes in development.
Research Solution: A defect prediction model that uses information available prior to release to estimate the number of defect occurrences in the field at any time.

The End Thank you. Please send suggestions and email to Paul.Li@cs.cmu.edu
