
1 Empirical Evaluation of Defect Projection Models for Widely-deployed Production Software Systems, FSE 2004. Paul Li, Mary Shaw, Jim Herbsleb (Institute for Software Research Intl., School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213); Bonnie Ray, P. Santhanam (Center for Software Engineering, IBM T.J. Watson Research Center, Hawthorne, NY 10532)

2 Overview one Defect occurrences are costly problems. Methods that deal with their economic consequences require accurate defect occurrence rate projections. Projecting defect occurrence rates for widely-deployed production software systems raises novel problems.

3 Overview two We present two empirical results that can improve defect occurrence rate projections:  Part 1: The Weibull model outperforms other previously published models  Part 2: Naïve parameter extrapolation methods that do not consider changes in system characteristics are inadequate


5 The real world problem Methods to deal with the economic consequences:  Maintenance resource allocation  Service contracts  Software insurance All require accurate defect occurrence rate projections:  Projections, for each distinct release, of the rate of user-reported defect occurrences after the release becomes available

6 The context Widely-deployed production software systems:  Many software and hardware configurations in use  Unknown deployment and usage patterns  Constrained development process  Content that evolves over time across multiple releases

7 The research problem How do you predict the rate of defect occurrences? [Figure: defect occurrence rate curves over months for Releases N−2, N−1, N, and N+1, with the present ("Now") marked]

8 The research questions Is there a model that describes the defect occurrence pattern? Given this model, how can we predict model parameters for the next release? [Figure: defect occurrence rate curve over months, with the portion after "Now" marked with question marks]

9 The research approach In the context of widely-deployed production software:  Perform analysis to develop hypotheses concerning models and methods  Use real-world data to empirically test the hypotheses

10 The data User-reported defects in 22 releases of four widely-deployed production software systems:  8 releases of a commercial operating system  3 releases of a commercial middleware system  8 releases of an open source operating system (OpenBSD)  3 releases of an open source middleware system (Jakarta Tomcat)

11 Relation to prior work Software reliability modeling and software certification:  Assume that software and hardware configurations and deployment and usage patterns are known Prediction of the total number of defects and identification of defect-prone modules:  Produce results that are insufficient for maintenance planning and software insurance No prior work projects defect occurrence rates for open source software systems

12 Part 1: Which model to use? [Figure: defect occurrence rate curve over months, with the region after "Now" marked with a question mark]

13 Previously published models (λ(t) is the projected defect occurrence rate at time t):  Exponential, Goel & Okumoto [1979]: λ(t) = N α e^(−α t)  Weibull, Schick-Wolverton [1978]: λ(t) = N α β t^(α−1) e^(−β t^α)  Gamma, Yamada, Ohba & Osaki [1983]: λ(t) = N β^α t^(α−1) e^(−β t)  Power, Duane [1964]: λ(t) = α β t^(β−1)  Logarithmic, Musa-Okumoto [1975]: λ(t) = α (α β t + 1)^(−1) Here N is the total number of defect occurrences; the increasing component (t^(α−1)) dominates when t is small, and the decreasing exponential component dominates when t is large.
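As a sketch, the rate functions listed above can be written as plain Python functions. Parameter names follow the slide; since the transcript's exponents are damaged, the Weibull, Gamma, and power forms are the standard ones and should be treated as a reconstruction, not the paper's exact notation:

```python
import math

def exponential_rate(t, N, a):
    # Goel & Okumoto: lambda(t) = N * a * exp(-a*t)
    return N * a * math.exp(-a * t)

def weibull_rate(t, N, a, b):
    # Weibull (Schick-Wolverton): lambda(t) = N * a * b * t^(a-1) * exp(-b * t^a)
    return N * a * b * t ** (a - 1) * math.exp(-b * t ** a)

def gamma_rate(t, N, a, b):
    # Gamma (Yamada, Ohba & Osaki): lambda(t) = N * b^a * t^(a-1) * exp(-b*t) / Gamma(a)
    return N * b ** a * t ** (a - 1) * math.exp(-b * t) / math.gamma(a)

def power_rate(t, a, b):
    # Duane power law: lambda(t) = a * b * t^(b-1)
    return a * b * t ** (b - 1)

def logarithmic_rate(t, a, b):
    # Musa-Okumoto: lambda(t) = a / (a*b*t + 1)
    return a / (a * b * t + 1)
```

Each function returns the projected defect occurrence rate at time t; fitting N, α, and β to observed monthly defect counts is what Part 2 of the talk is about.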

14 Model comparison (AIC scores; lower is better): Exponential model: 110 Power model: 113 Logarithmic model: 112 Gamma model: 90 Weibull model: 83 [Figure: fitted model curves against observed defect occurrences per month]

15 Conclusion: Weibull is better It has the best AIC score in 73% of the releases It is within the 95% confidence interval of the best AIC score in 95% of the releases It performs well despite differences in the type of system, style of development, and the kind of data

16 Part 2: How to extrapolate model parameters? Weibull: λ(t) = N α β t^(α−1) e^(−β t^α) [Figure: defect occurrence rate over months, with the next release's parameter values unknown after "Now"]

17 Parameter extrapolation methods Naïve methods take no account of similarities and differences in characteristics between historical releases and the current release. Example: Tomcat 3.3, β = 15.4439; Tomcat 4.0, β = 16.8946. Moving-averages (2 releases) estimate for Tomcat 4.1: β = 16.16925. Exponential-smoothing (2 releases) estimate for Tomcat 4.1: β = 16.29725.
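A minimal sketch of the two naïve extrapolation methods on the Tomcat β values above. The exponential-smoothing weight is an assumption (the slide does not give it), so only the moving-averages figure is checked against the slide:

```python
def moving_average(history):
    # Naive extrapolation: next parameter value = mean of the observed values.
    return sum(history) / len(history)

def exponential_smoothing(history, weight=0.5):
    # Naive extrapolation: exponentially weighted average, later values
    # weighted more heavily. `weight` here is a hypothetical smoothing
    # constant, not the one used in the paper.
    estimate = history[0]
    for value in history[1:]:
        estimate = weight * value + (1 - weight) * estimate
    return estimate

betas = [15.4439, 16.8946]  # fitted beta for Tomcat 3.3 and Tomcat 4.0
tomcat_41_estimate = moving_average(betas)  # matches the slide's 16.16925
```

Neither method looks at what changed between releases; both simply project past parameter values forward, which is exactly the inadequacy Part 2 demonstrates.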

18 Extrapolation process [Figure: fitted parameter pairs from historical releases (α=2.51, β=5.69; α=2.28, β=4.66; α=2.79, β=6.83), with N known and α, β projected for the current release; the projected curve is compared against the actual curve and against an uninformed guess, giving a projection difference and a baseline difference]

19 Defect projection evaluation Theil statistics for forecasting experiments using the moving averages method (columns give the number of historical releases used):

System/Release        1     2     3     4     5     6     7
Open source OS R2.8  1.06  0.70
Open source OS R2.9  1.32  0.93  1.04
Open source OS R3.0  0.87  0.42  0.43  0.44
Open source OS R3.1  0.72  0.70  0.73  0.71  0.73
Open source OS R3.2  0.76  0.91  0.87  0.99  0.97  1.02
Open source OS R3.3  1.56  1.10  0.85  0.86  0.66        0.57

20 Conclusion: Naïve methods are inadequate In 50% of the forecasting experiments, using more historical information did not improve projections In 44% of the forecasting experiments, the Theil statistic is greater than or equal to 1, i.e., no better than an uninformed guess Methods that consider changes in the characteristics of widely-deployed production software systems should be considered

21 Summary Results  The Weibull model is the preferred model: it may allow us to quantify the effects of changes in characteristics by examining changes in parameter values  Naïve parameter extrapolation methods are inadequate: this motivates further work to capture and account for changes in characteristics to improve projections Accurate defect occurrence rate projections may aid better planning and may enable software insurance

22 The end Questions, suggestions, comments Email: Paul.Li@cs.cmu.edu

23 The AIC model selection criterion Compares model fits with different numbers of parameters, accounting for both variance and bias: AIC = n log σ² + 2 |S| where n is the number of observations, σ² is the residual variance (the variance term), and |S| is the number of model parameters (the bias term). Differences in AIC scores approximately follow a χ² (chi-squared) distribution; a difference of about 4 corresponds to a 95% confidence interval.
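A sketch of the AIC computation in the slide's form, AIC = n·log(σ²) + 2|S|, with the residual variance estimated from the fit's residuals. Note this is one common variant of AIC (it drops constants that cancel when comparing models on the same data), so absolute scores only matter relative to each other:

```python
import math

def aic(residuals, num_parameters):
    # AIC = n * log(sigma^2) + 2 * |S|
    # n: number of observations (the variance term uses their residuals)
    # |S|: number of model parameters (the bias / complexity penalty)
    n = len(residuals)
    sigma_sq = sum(r * r for r in residuals) / n  # residual variance
    return n * math.log(sigma_sq) + 2 * num_parameters
```

To compare two fitted models, compute AIC for each and prefer the lower score; per the slide, a difference of roughly 4 corresponds to the 95% confidence interval.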

24 The Theil forecasting statistic Given a parameter extrapolation method, the last historical release has actual value A1, the current release has actual value A2, and the method predicts P2. Let Actual = (A2 − A1) and Predicted = (P2 − A1). The Theil forecasting statistic is: √(Σ(Actual − Predicted)²) / √(Σ(Actual)²) Special cases:  Perfect forecast: P2 = A2, so (Actual − Predicted) = ((A2−A1) − (A2−A1)) = 0, giving a Theil statistic of 0  Uninformed forecast: P2 = A1, so (Actual − Predicted) = ((A2−A1) − (A1−A1)) = Actual, giving a Theil statistic of 1
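The statistic above can be sketched directly, and the two special cases on the slide (perfect forecast gives 0, uninformed no-change forecast gives 1) serve as sanity checks:

```python
import math

def theil_statistic(actual_changes, predicted_changes):
    # U = sqrt(sum((Actual - Predicted)^2)) / sqrt(sum(Actual^2))
    # where each Actual is (A2 - A1) and each Predicted is (P2 - A1).
    numerator = math.sqrt(sum((a - p) ** 2
                              for a, p in zip(actual_changes, predicted_changes)))
    denominator = math.sqrt(sum(a ** 2 for a in actual_changes))
    return numerator / denominator

# Perfect forecast (P2 = A2): predicted change equals actual change -> 0
# Uninformed forecast (P2 = A1): predicted change is 0 -> 1
```

Values below 1 mean the extrapolation method beats the uninformed guess; the table on slide 19 shows this often fails to hold for the naïve methods.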

