
1 Empirical Evaluation of Defect Projection Models for Widely-deployed Production Software Systems, FSE 2004. Paul Li, Mary Shaw, Jim Herbsleb (Institute for Software Research Intl., School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213); Bonnie Ray, P. Santhanam (Center for Software Engineering, IBM T.J. Watson Research Center, Hawthorne, NY 10532)

2 Overview one Defect occurrences are costly problems. Methods that deal with their economic consequences require accurate defect occurrence rate projections. Projecting defect occurrence rates for widely-deployed production software systems raises novel problems.

3 Overview two We present two empirical results that can improve defect occurrence rate projections:  Part 1: The Weibull model outperforms other previously published models  Part 2: Naïve parameter extrapolation methods that do not consider changes in system characteristics are inadequate


5 The real world problem Methods to deal with the economic consequences:  Maintenance resource allocation  Service contracts  Software insurance All require accurate defect occurrence rate projections:  Projections, for each distinct release, of the rate of user-reported defect occurrences after the release becomes available

6 The context Widely-deployed production software systems:  Many software and hardware configurations in use  Unknown deployment and usage patterns  Constrained development process  Content that evolves over time across multiple releases

7 The research problem How do you predict the rate of defect occurrences? [Figure: defect occurrence rate curves over months for Releases N−2, N−1, N, and N+1, with the present ("Now") marked]

8 The research questions Is there a model that describes the defect occurrence pattern? Given this model, how can we predict model parameters for the next release? [Figure: defect occurrence rate curve over months, with the portion after "Now" marked with question marks]

9 The research approach In the context of widely-deployed production software:  Perform analysis to develop hypotheses concerning models and methods  Use real-world data to empirically test the hypotheses

10 The data User-reported defects in 22 releases of four widely-deployed production software systems:  8 releases of a commercial operating system  3 releases of a commercial middleware system  8 releases of an open source operating system (OpenBSD)  3 releases of an open source middleware system (Jakarta Tomcat)

11 Relation to prior work Software reliability modeling and software certification:  Assume that software and hardware configurations and deployment and usage patterns are known Prediction of the total number of defects and identification of defect-prone modules:  Produce results that are insufficient for maintenance planning and software insurance No prior work projects defect occurrence rates for open source software systems

12 Part 1: Which model to use? [Figure: defect occurrence rate curve over months, with the region after "Now" marked with a question mark]

13 Previously published models (λ(t) is the projected defect occurrence rate at time t):  Exponential, Goel & Okumoto [1979]: λ(t) = N α e^(−α t)  Weibull, Schick-Wolverton [1978]: λ(t) = N α β t^(α−1) e^(−β t^α)  Gamma, Yamada, Ohba & Osaki [1983]: λ(t) = N β^α t^(α−1) e^(−β t)  Power, Duane [1964]: λ(t) = α β t^(β−1)  Logarithmic, Musa-Okumoto [1975]: λ(t) = α (α β t + 1)^(−1) Here N is the total number of defect occurrences; the increasing component (t^(α−1)) dominates when t is small, and the decreasing exponential component dominates when t is large.
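As a sketch, the rate functions listed above can be written as plain Python functions. Parameter names follow the slide; since the transcript's exponents are damaged, the Weibull, Gamma, and power forms are the standard ones and should be treated as a reconstruction, not the paper's exact notation:

```python
import math

def exponential_rate(t, N, a):
    # Goel & Okumoto: lambda(t) = N * a * exp(-a*t)
    return N * a * math.exp(-a * t)

def weibull_rate(t, N, a, b):
    # Weibull (Schick-Wolverton): lambda(t) = N * a * b * t^(a-1) * exp(-b * t^a)
    return N * a * b * t ** (a - 1) * math.exp(-b * t ** a)

def gamma_rate(t, N, a, b):
    # Gamma (Yamada, Ohba & Osaki): lambda(t) = N * b^a * t^(a-1) * exp(-b*t) / Gamma(a)
    return N * b ** a * t ** (a - 1) * math.exp(-b * t) / math.gamma(a)

def power_rate(t, a, b):
    # Duane power law: lambda(t) = a * b * t^(b-1)
    return a * b * t ** (b - 1)

def logarithmic_rate(t, a, b):
    # Musa-Okumoto: lambda(t) = a / (a*b*t + 1)
    return a / (a * b * t + 1)
```

Each function returns the projected defect occurrence rate at time t; fitting N, α, and β to observed monthly defect counts is what Part 2 of the talk is about.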

14 Model comparison (AIC scores; lower is better): Exponential model: 110 Power model: 113 Logarithmic model: 112 Gamma model: 90 Weibull model: 83 [Figure: fitted model curves against observed defect occurrences per month]

15 Conclusion: Weibull is better It has the best AIC score in 73% of the releases It is within the 95% confidence interval of the best AIC score in 95% of the releases It performs well despite differences in the type of system, style of development, and the kind of data

16 Part 2: How to extrapolate model parameters? Weibull: λ(t) = N α β t^(α−1) e^(−β t^α) [Figure: defect occurrence rate over months, with the next release's parameter values unknown after "Now"]

17 Parameter extrapolation methods Naïve methods take no account of similarities and differences in characteristics between historical releases and the current release. Example: Tomcat 3.3, β = 15.4439; Tomcat 4.0, β = 16.8946. Moving-averages (2 releases) estimate for Tomcat 4.1: β = 16.16925. Exponential-smoothing (2 releases) estimate for Tomcat 4.1: β = 16.29725.
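A minimal sketch of the two naïve extrapolation methods on the Tomcat β values above. The exponential-smoothing weight is an assumption (the slide does not give it), so only the moving-averages figure is checked against the slide:

```python
def moving_average(history):
    # Naive extrapolation: next parameter value = mean of the observed values.
    return sum(history) / len(history)

def exponential_smoothing(history, weight=0.5):
    # Naive extrapolation: exponentially weighted average, later values
    # weighted more heavily. `weight` here is a hypothetical smoothing
    # constant, not the one used in the paper.
    estimate = history[0]
    for value in history[1:]:
        estimate = weight * value + (1 - weight) * estimate
    return estimate

betas = [15.4439, 16.8946]  # fitted beta for Tomcat 3.3 and Tomcat 4.0
tomcat_41_estimate = moving_average(betas)  # matches the slide's 16.16925
```

Neither method looks at what changed between releases; both simply project past parameter values forward, which is exactly the inadequacy Part 2 demonstrates.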

18 Extrapolation process [Figure: fitted parameter pairs from historical releases (α=2.51, β=5.69; α=2.28, β=4.66; α=2.79, β=6.83), with N known and α, β projected for the current release; the projected curve is compared against the actual curve and against an uninformed guess, giving a projection difference and a baseline difference]

19 Defect projection evaluation Theil statistics for forecasting experiments using the moving averages method (columns give the number of historical releases used):

System/Release        1     2     3     4     5     6     7
Open source OS R2.8  1.06  0.70
Open source OS R2.9  1.32  0.93  1.04
Open source OS R3.0  0.87  0.42  0.43  0.44
Open source OS R3.1  0.72  0.70  0.73  0.71  0.73
Open source OS R3.2  0.76  0.91  0.87  0.99  0.97  1.02
Open source OS R3.3  1.56  1.10  0.85  0.86  0.66        0.57

20 Conclusion: Naïve methods are inadequate In 50% of the forecasting experiments, using more historical information did not improve projections In 44% of the forecasting experiments, the Theil statistic is greater than or equal to 1, i.e., no better than an uninformed guess Methods that consider changes in the characteristics of widely-deployed production software systems should be considered

21 Summary Results  The Weibull model is the preferred model: it may allow us to quantify the effects of changes in characteristics by examining changes in parameter values  Naïve parameter extrapolation methods are inadequate: this motivates further work to capture and account for changes in characteristics to improve projections Accurate defect occurrence rate projections may aid better planning and may enable software insurance

22 The end Questions, suggestions, comments Email: Paul.Li@cs.cmu.edu

23 The AIC model selection criterion Compares model fits with different numbers of parameters, accounting for both variance and bias: AIC = n log σ² + 2 |S| where n is the number of observations, σ² is the residual variance (the variance term), and |S| is the number of model parameters (the bias term). Differences in AIC scores approximately follow a χ² (chi-squared) distribution; a difference of about 4 corresponds to a 95% confidence interval.
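A sketch of the AIC computation in the slide's form, AIC = n·log(σ²) + 2|S|, with the residual variance estimated from the fit's residuals. Note this is one common variant of AIC (it drops constants that cancel when comparing models on the same data), so absolute scores only matter relative to each other:

```python
import math

def aic(residuals, num_parameters):
    # AIC = n * log(sigma^2) + 2 * |S|
    # n: number of observations (the variance term uses their residuals)
    # |S|: number of model parameters (the bias / complexity penalty)
    n = len(residuals)
    sigma_sq = sum(r * r for r in residuals) / n  # residual variance
    return n * math.log(sigma_sq) + 2 * num_parameters
```

To compare two fitted models, compute AIC for each and prefer the lower score; per the slide, a difference of roughly 4 corresponds to the 95% confidence interval.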

24 The Theil forecasting statistic Given a parameter extrapolation method, the last historical release has actual value A1, the current release has actual value A2, and the method predicts P2. Let Actual = (A2 − A1) and Predicted = (P2 − A1). The Theil forecasting statistic is: √(Σ(Actual − Predicted)²) / √(Σ(Actual)²) Special cases:  Perfect forecast: P2 = A2, so (Actual − Predicted) = ((A2−A1) − (A2−A1)) = 0, giving a Theil statistic of 0  Uninformed forecast: P2 = A1, so (Actual − Predicted) = ((A2−A1) − (A1−A1)) = Actual, giving a Theil statistic of 1
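The statistic above can be sketched directly, and the two special cases on the slide (perfect forecast gives 0, uninformed no-change forecast gives 1) serve as sanity checks:

```python
import math

def theil_statistic(actual_changes, predicted_changes):
    # U = sqrt(sum((Actual - Predicted)^2)) / sqrt(sum(Actual^2))
    # where each Actual is (A2 - A1) and each Predicted is (P2 - A1).
    numerator = math.sqrt(sum((a - p) ** 2
                              for a, p in zip(actual_changes, predicted_changes)))
    denominator = math.sqrt(sum(a ** 2 for a in actual_changes))
    return numerator / denominator

# Perfect forecast (P2 = A2): predicted change equals actual change -> 0
# Uninformed forecast (P2 = A1): predicted change is 0 -> 1
```

Values below 1 mean the extrapolation method beats the uninformed guess; the table on slide 19 shows this often fails to hold for the naïve methods.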

