University of Southern California Center for Systems and Software Engineering © 2010, USC-CSSE
A COCOMO Extension for Software Maintenance
25th International Forum on COCOMO and Systems/Software Cost Modeling
Vu Nguyen, Barry Boehm
November 2nd, 2010
Outline
Motivation
Problem and A Solution
COCOMO Extension for SW Maintenance
–Sizing method
–Effort model
Results
–Data collection results
–Calibrations
Conclusions
Software Maintenance
Work of modifying, enhancing, and providing cost-effective support to the existing software
Characteristics of maintenance projects
–Constrained by legacy system
  Quality of the system
  Requirements, architecture and design
  System understandability
  Documentation
Magnitude of Software Maintenance
The majority of software costs are incurred after the first operational release [Boehm 1981]
Software maintenance cost versus total software cost
[Figure: Maintenance vs. Total Software Cost. Chart of % of software cost (Maintenance vs. Others) by study: Zelkowitz et al. (1979), McKee (1984), Moad (1990), Erlikh (2000)]
Importance of Software Estimation in Managing Software Projects
Estimation is a key factor determining success or failure of software projects
–Two of the three most-cited causes of project failure are related to resource estimation, according to a CompTIA survey [Rosencrance 2007]
Cost estimate is key information for investment, project planning and control, etc.
Many software estimation approaches have been proposed and used in industry
–E.g., COCOMO, SEER-SEM, SLIM, PRICE-S, Function Point Analysis
Problem and Solution
These models are built on the assumptions of new development projects
Problem: these assumptions do not always hold in software maintenance because of differences between new development and maintenance
–Low estimation accuracy is achieved
Solution: extend COCOMO II to support estimating maintenance projects
Objective: improve estimation performance
COCOMO II for Maintenance
An extension of COCOMO II
–COCOMO is the most popular non-proprietary model
–COCOMO has attracted many independent validations and extensions
Designed to estimate the effort of a software release
Has two components
–Maintenance Sizing Model
–Effort Model
Supports maintenance types
–Enhancement
–Error corrections
COCOMO II for Maintenance – Extensions
Maintenance Sizing Model
–Uniting Adaptation/Reuse and Maintenance models
–Redefining size parameters DM, CM, and IM
  Using deleted SLOC from modified modules
  Method to determine actual equivalent SLOC from code
Effort Model
–Excluding RUSE and SCED cost drivers from the model
–Revising rating levels for personnel attributes
–Providing a reduced-parameter model
–Providing a new set of rating scales for the cost drivers
Software Maintenance Sizing
Size is a key determinant of effort
Sizing method has to take into account different types of code
[Figure: Types of Code. Preexisting Code: External Modules, Existing System Modules. Delivered Code: Reused Modules, Adapted Modules, New Modules, Automatically Translated Modules. Paths: manually develop and maintain; automatically translate]
Software Maintenance Sizing (cont’d)
Computing Equivalent SLOC:
–New Modules:
–Adapted Modules:
–Reused Modules:
–Total Equivalent KSLOC:
AKSLOC: KSLOC of the adapted modules before changes
RKSLOC: KSLOC of the reused modules
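For reference, the sketch below computes equivalent size with the published COCOMO II.2000 reuse model (AAF and AAM). It is not the maintenance extension's redefined sizing formulas, which are not reproduced here. The function names, the sample inputs, and the way the three module types are summed are illustrative assumptions.

```python
def aaf(dm: float, cm: float, im: float) -> float:
    """Adaptation Adjustment Factor; DM, CM, IM are percentages (0-100)."""
    return 0.4 * dm + 0.3 * cm + 0.3 * im


def aam(aa: float, dm: float, cm: float, im: float, su: float, unfm: float) -> float:
    """Adaptation Adjustment Multiplier in the COCOMO II.2000 form.

    aa and su are percentages; unfm is a fraction between 0 and 1.
    """
    factor = aaf(dm, cm, im)
    if factor <= 50:
        return (aa + factor * (1 + 0.02 * su * unfm)) / 100
    return (aa + factor + su * unfm) / 100


def equivalent_ksloc(new_ksloc: float,
                     adapted_ksloc: float, adapted_aam: float,
                     reused_ksloc: float, reused_aam: float) -> float:
    """Assumed total: new code counts fully, adapted and reused code are scaled by AAM."""
    return new_ksloc + adapted_ksloc * adapted_aam + reused_ksloc * reused_aam


# Hypothetical release: 5 KSLOC new, 20 KSLOC adapted, 10 KSLOC reused.
adapted = aam(aa=4, dm=10, cm=20, im=30, su=30, unfm=0.4)
reused = aam(aa=2, dm=0, cm=0, im=10, su=0, unfm=0.0)
total = equivalent_ksloc(5.0, 20.0, adapted, 10.0, reused)
```

In this form, lightly modified code contributes only a fraction of its size, while heavily modified and poorly understood code can contribute more than its raw size (AAM above 1).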
COCOMO Effort Model for Maintenance
Using the same non-linear COCOMO II form:
$PM = A \times Size^{B + 0.01 \sum_{i=1}^{5} SF_i} \times \prod_{j=1}^{15} EM_j$
Where,
PM – project effort measured in person-months
A – a multiplicative constant, calibrated using the data sample
B – an exponent constant, calibrated using the data sample
Size – software size measured in EKSLOC
EM – 15 effort multipliers, cost drivers that have a multiplicative effect on effort
SF – 5 scale factors, cost drivers that have an exponential effect on effort
Linearizing the model using log-transformation:
$\log(PM) = \beta_0 + \beta_1 \log(Size) + \sum_i \beta_i SF_i \log(Size) + \sum_j \beta_j \log(EM_j)$
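A minimal sketch of the effort model and its log-linear form, assuming the standard COCOMO II equation PM = A × Size^(B + 0.01 × ΣSF) × ΠEM. The function names are hypothetical, and the constants in the usage lines (2.94 and 0.91, the COCOMO II.2000 values) are placeholders rather than the maintenance calibration.

```python
import math
from typing import Sequence


def effort_pm(size_eksloc: float, a: float, b: float,
              scale_factors: Sequence[float],
              effort_multipliers: Sequence[float]) -> float:
    """Effort in person-months from the non-linear COCOMO II form."""
    exponent = b + 0.01 * sum(scale_factors)
    return a * size_eksloc ** exponent * math.prod(effort_multipliers)


def log_effort(size_eksloc: float, a: float, b: float,
               scale_factors: Sequence[float],
               effort_multipliers: Sequence[float]) -> float:
    """The same model after a log transformation: a sum of linear terms."""
    log_size = math.log(size_eksloc)
    return (math.log(a)
            + b * log_size
            + sum(0.01 * sf * log_size for sf in scale_factors)
            + sum(math.log(em) for em in effort_multipliers))


# Hypothetical release of 20 EKSLOC with illustrative driver ratings.
sfs = [3.0, 3.0, 4.0, 3.0, 4.0]   # 5 scale factors
ems = [1.1] * 3 + [1.0] * 12      # 15 effort multipliers
pm = effort_pm(20.0, 2.94, 0.91, sfs, ems)
```

Taking the logarithm turns every product and exponent into an additive term, which is what allows the coefficients to be estimated with regression techniques.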
Data Collection
Delphi survey
–Surveying experts about rating scales of cost drivers
Sample data
–Collecting data of completed maintenance projects from industry
–Following inclusion criteria, e.g.,
  Starting and ending dates are clear
  Include only major releases with Equivalent SLOC no less than 2000 SLOC
  Maintenance type: error corrections, enhancements
[Figure: Timeline of the release period for maintenance project N+1. Labels: Release N, project starts for Release N+1, Release N+1, project starts for Release N+2; Baseline 1, Baseline 2]
Calibration
Process of fitting data to the model to adjust its parameters and constants
Inputs to model calibration:
–Initial rating scales for cost drivers
–Sample data: 80 data points from 3 organizations
–Delphi survey of 8 experts (expert-judgment estimates)
Output of model calibration: new rating scales for cost drivers and constants
Calibration techniques:
–Ordinary Least Squares Regression (OLS)
–Bayesian Analysis [Boehm 2000]
–Constrained Regression [Nguyen 2008]
Data Collection Results
Delphi Survey Results
–8 surveys collected from experts in the field
–Considerable changes seen in the personnel factors
[Figure: Productivity Ranges (PRs). Differences in PRs between COCOMO II.2000 and Delphi Results]
Data Collection Results (cont’d)
Sample data
–86 releases in 24 programs (6 releases are outliers)
[Table: Statistics (Average, Median, Max, Min) for Size (EKSLOC), Effort (PM), and Schedule (Month)]
Distribution of size metrics
–ESLOC Adapted: 60.7%
–ESLOC Added: 31.8%
–ESLOC Reused: 7.5%
Equivalent SLOC differs from SLOC of the delivered program
Releases  Source
64        A large organization member of CSSE Affiliates, USA
14        A CMMI-L5 company, Vietnam
8         A CMMI-L3 company, Thailand
Data Collection Results (cont’d)
Distribution of Size and Effort
[Figure: two panels, PM vs. EKSLOC and Log(PM) vs. Log(EKSLOC)]
Model Calibrations
Full model calibrations
–Applying Bayesian and Constrained Regression
–Using 80 data points (6 outliers eliminated)
Local calibrations
–Calibrating the model to individual organizations and programs
–Using four approaches: productivity index, simple regression, Bayesian, constrained regression
Full Model Calibrations
Bayesian approach
–Productivity ranges indicate that
  APCAP is less influential than it is in COCOMO II.2000
  CPLX is still the most influential
  PCAP is more influential than ACAP
[Figure: Productivity Ranges. Differences in PRs between COCOMO II.2000 and COCOMO II for Maint.]
Full Model Calibrations (cont’d)
Model                                   MMRE  PRED(0.25)  PRED(0.3)
COCOMO II.2000                                31%         38%
COCOMO II for Maintenance: Bayesian     48%   41%         51%
COCOMO II for Maintenance: CMRE         37%   56%         60%
COCOMO II for Maintenance: CMSE         39%   43%         51%
COCOMO II for Maintenance: CMAE         42%   54%         58%
Estimation accuracies
–COCOMO II.2000: use the model to estimate 80 data points
–COCOMO II for Maintenance: calibrated using Bayesian and Constrained Regression approaches
COCOMO II for Maintenance outperforms COCOMO II.2000 by a wide margin
Three Constrained Regression Techniques:
CMRE: Constrained Minimum sum of Relative Errors
CMSE: Constrained Minimum sum of Square Errors
CMAE: Constrained Minimum sum of Absolute Errors
Local Calibration
Local calibration can potentially improve the performance of estimation models [Chulani 1999, Valerdi 2005]
In local calibration, the model’s constants A and B are estimated using local data sets
Local calibration types
–Organization-based
  All data points of each organization are used to calibrate the model
  3 organizations, 80 releases
–Program-based
  All data points (releases) of each program
  Only programs having 5 or more releases
  Total 45 releases in 6 programs
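One way to estimate A and B from local data is sketched below. It assumes the standard COCOMO II form with known cost-driver ratings: the scale-factor and effort-multiplier terms are moved to the left-hand side of the log-transformed equation, and a straight line is fitted by ordinary least squares. The function name and the tuple layout are hypothetical, and this is not necessarily the exact procedure used in the study.

```python
import math
from typing import List, Sequence, Tuple

Release = Tuple[float, float, Sequence[float], Sequence[float]]


def calibrate_a_b(releases: List[Release]) -> Tuple[float, float]:
    """Fit A and B from (size_eksloc, actual_pm, scale_factors, effort_multipliers)."""
    xs, ys = [], []
    for size, pm, sfs, ems in releases:
        log_size = math.log(size)
        xs.append(log_size)
        # log(PM) - 0.01*sum(SF)*log(Size) - sum(log(EM)) = log(A) + B*log(Size)
        ys.append(math.log(pm)
                  - 0.01 * sum(sfs) * log_size
                  - sum(math.log(em) for em in ems))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                          # slope of the fitted line
    a = math.exp(mean_y - b * mean_x)      # intercept, back-transformed
    return a, b


# Synthetic releases generated with A = 3.0 and B = 1.0 to check recovery.
demo = [(s, 3.0 * s ** (1.0 + 0.01 * 10) * 1.2, [2.0] * 5, [1.2] + [1.0] * 14)
        for s in (5.0, 10.0, 40.0, 80.0)]
a_hat, b_hat = calibrate_a_b(demo)
```

At least two releases with different sizes are needed; with the small samples typical of a single program, the fitted constants are sensitive to individual releases.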
Local Calibration (cont’d)
Approaches to be compared
–Productivity index
  Using the productivity of past projects to estimate the effort of the current project given its size
  The simplest but widely used approach
–Simple linear regression
  Building a simple regression model using log(PM) as the response and log(EKSLOC) as the predictor
  Widely used estimation approach
–COCOMO II for Maintenance: Bayesian analysis
–COCOMO II for Maintenance: CMRE
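The two baseline approaches can be sketched as follows. The productivity index here is assumed to be total past size divided by total past effort (other definitions, such as a mean of per-project ratios, are possible), and the regression is an ordinary least squares fit of log(PM) on log(EKSLOC). Function names and data are illustrative.

```python
import math
from typing import List, Tuple

History = List[Tuple[float, float]]  # (size_eksloc, effort_pm) of past releases


def productivity_index_estimate(past: History, new_size: float) -> float:
    """Estimate effort as size divided by the historical productivity."""
    total_size = sum(size for size, _ in past)
    total_effort = sum(effort for _, effort in past)
    productivity = total_size / total_effort   # EKSLOC per person-month
    return new_size / productivity


def log_log_regression_estimate(past: History, new_size: float) -> float:
    """Estimate effort from a least squares fit of log(PM) on log(EKSLOC)."""
    xs = [math.log(size) for size, _ in past]
    ys = [math.log(effort) for _, effort in past]
    n = len(past)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept + slope * math.log(new_size))


# Hypothetical history in which effort is exactly twice the size.
history = [(10.0, 20.0), (20.0, 40.0), (40.0, 80.0)]
pi_estimate = productivity_index_estimate(history, 30.0)
reg_estimate = log_log_regression_estimate(history, 30.0)
```

Both baselines use size as the only predictor, which is the main difference from the COCOMO-based approaches that also account for cost drivers.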
Local Calibration (cont’d)
Organization-based calibration accuracies: 80 data points
Model                                   MMRE  PRED(0.25)  PRED(0.3)
Productivity index                      44%   40%         48%
Simple linear regression                50%   34%         35%
COCOMO II for Maintenance: Bayesian     38%   54%         59%
COCOMO II for Maintenance: CMRE         34%   62%         64%
Program-based calibration accuracies: 45 data points
Model                                   MMRE  PRED(0.25)  PRED(0.3)
Productivity index                      27%   53%         64%
Simple linear regression                25%   64%         69%
COCOMO II for Maintenance: Bayesian     22%   71%         80%
COCOMO II for Maintenance: CMRE         21%   72%         79%
Conclusions
A model for sizing maintenance and reuse is proposed
A set of cost drivers and the levels of their impact on maintenance cost are derived
Deleted SLOC is an important maintenance cost driver
The extension is more favorable than the productivity index and simple linear regression
Organization-based and program-based calibrations improve estimation accuracy
–Best model generates estimates within 30% of the actuals 80% of the time
Threats to Validity
Threats to Internal Validity
–Unrecorded overtime not included in actual effort reported
–Various counting tools used in the US organization
–Reliability of the data reported from the organizations
Threats to External Validity
–Bias in the data set: data from the three organizations may not be relevant to the general software industry
–Bias in the selection of participants for the Delphi survey
Future Work
Calibrate the model with more data points from industry
Build domain-specific, language-specific, or platform-specific models
Survey a more diverse group of experts, not only those who are familiar with COCOMO
Extend the model to other types of maintenance
–reengineering, language and data migration, performance improvement, etc.
Extend the model to support effort estimation of iterations in iterative development
Thank You
References – 1/2
Abran A., Silva I., Primera L. (2002), "Field studies using functional size measurement in building estimation models for software maintenance", Journal of Software Maintenance and Evolution, Vol 14, part 1, pp
Abran A., St-Pierre D., Maya M., Desharnais J.M. (1998), "Full function points for embedded and real-time software", Proceedings of the UKSMA Fall Conference, London, UK, 14.
Albrecht A.J. (1979), "Measuring Application Development Productivity", Proc. IBM Applications Development Symp., SHARE-Guide, pp
Basili V.R., Condon S.E., Emam K.E., Hendrick R.B., Melo W. (1997), "Characterizing and Modeling the Cost of Rework in a Library of Reusable Software Components", Proceedings of the 19th International Conference on Software Engineering, pp
Boehm B.W. (1981), "Software Engineering Economics", Prentice-Hall, Englewood Cliffs, NJ,
Boehm B.W. (1999), "Managing Software Productivity and Reuse", Computer 32, Sept., pp
Boehm B.W., Horowitz E., Madachy R., Reifer D., Clark B.K., Steece B., Brown A.W., Chulani S., and Abts C. (2000), "Software Cost Estimation with COCOMO II", Prentice Hall.
Briand L.C. & Basili V.R. (1992), "A Classification Procedure for an Effective Management of Changes during the Software Maintenance Process", Proc. ICSM ’92, Orlando, FL
Chulani S. (1999), "Bayesian Analysis of Software Cost and Quality Models", PhD Thesis, the University of Southern California.
Port D., Nguyen V., Menzies T. (2009), "Studies of Confidence in Software Cost Estimation Research Based on the Criterions MMRE and PRED", Submitted to Journal of Empirical Software Engineering
De Lucia A., Pompella E., Stefanucci S. (2003), "Assessing the maintenance processes of a software organization: an empirical analysis of a large industrial project", The Journal of Systems and Software 65 (2), 87–103.
Erlikh L. (2000), "Leveraging legacy system dollars for E-business", (IEEE) IT Pro, May/June,
Gerlich R., and Denskat U. (1994), "A Cost Estimation Model for Maintenance and High Reuse", Proceedings, ESCOM 1994, Ivrea, Italy.
IEEE (1998), IEEE Std , Standard for Software Maintenance, IEEE Computer Society Press, Los Alamitos, CA.
References – 2/2
Jorgensen M. (1995), "Experience with the accuracy of software maintenance task effort prediction models", IEEE Transactions on Software Engineering 21 (8) 674–681.
McKee J. (1984), "Maintenance as a function of design", Proceedings of the AFIPS National Computer Conference,
Moad J. (1990), "Maintaining the competitive edge", Datamation 61-62, 64, 66.
Niessink F., van Vliet H. (1998), "Two case studies in measuring maintenance effort", Proceedings of International Conference on Software Maintenance, Bethesda, MD, USA, pp. 76–85.
Ramil J.F. (2003), "Continual Resource Estimation for Evolving Software", PhD Thesis, University of London, Imperial College of Science, Technology and Medicine.
Nguyen V., Deeds-Rubin S., Tan T., Boehm B.W. (2007), "A SLOC Counting Standard", The 22nd International Annual Forum on COCOMO and Systems/Software Cost Modeling.
Nguyen V., Steece B., Boehm B.W. (2008), "A constrained regression technique for COCOMO calibration", Proceedings of the 2nd ACM-IEEE international symposium on Empirical software engineering and measurement (ESEM), pp
Nguyen V., Boehm B.W., Danphitsanuphan P. (2009), "Assessing and Estimating Corrective, Enhancive, and Reductive Maintenance Tasks: A Controlled Experiment", In Proceedings of 16th Asia-Pacific Software Engineering Conference (APSEC 2009), Dec.
Nguyen V., Boehm B.W., Danphitsanuphan P. (2010), "A Controlled Experiment in Assessing and Estimating Software Maintenance Tasks", APSEC Special Issue, Information and Software Technology Journal,
Sneed H.M. (1995), "Estimating the Costs of Software Maintenance Tasks", IEEE International Conference on Software Maintenance, pp
Rosencrance L. (2007), "Survey: Poor communication causes most IT project failures", Computerworld
Selby R. (1988), "Empirically Analyzing Software Reuse in a Production Environment", In Software Reuse: Emerging Technology, W. Tracz (Ed.), IEEE Computer Society Press, pp
Sneed H.M. (2004), "A Cost Model for Software Maintenance & Evolution", IEEE International Conference on Software Maintenance, pp
Symons C.R. (1988), "Function Point Analysis: Difficulties and Improvements", IEEE Transactions on Software Engineering, vol. 14, no. 1, pp
Valerdi R. (2005), "The Constructive Systems Engineering Cost Model (COSYSMO)", PhD Thesis, The University of Southern California.
Zelkowitz M.V., Shaw A.C., Gannon J.D. (1979), "Principles of Software Engineering and Design", Prentice-Hall
Backup Slides
Abbreviations
COCOMO – Constructive Cost Model
COCOMO II – Constructive Cost Model version II
CMMI – Capability Maturity Model Integration
EM – Effort Multiplier
PM – Person Month
OLS – Ordinary Least Squares
MSE – Mean Square Error
MAE – Mean Absolute Error
CMSE – Constrained Minimum Sum of Square Errors
CMAE – Constrained Minimum Sum of Absolute Errors
CMRE – Constrained Minimum Sum of Relative Errors
MMRE – Mean of Magnitude of Relative Errors
MRE – Magnitude of Relative Errors
PRED – Prediction level
ICM – Incremental Commitment Model
PR – Productivity Range
SF – Scale Factor
Model Parameter Abbreviations
AA – Assessment and Assimilation
AAF – Adaptation Adjustment Factor
AAM – Adaptation Adjustment Multiplier
AKSLOC – Kilo Source Lines of Code of the Adapted Modules
CM – Code Modified
DM – Design Modified
EKSLOC – Equivalent Kilo Source Lines of Code
ESLOC – Equivalent Source Lines of Code
IM – Integration Modified
KSLOC – Kilo Source Lines of Code
RKSLOC – Kilo Source Lines of Code of the Reused Modules
SLOC – Source Lines of Code
SU – Software Understanding
UNFM – Programmer Unfamiliarity
ACAP – Analyst Capability
APEX – Applications Experience
CPLX – Product Complexity
DATA – Database Size
DOCU – Documentation Match to Life-Cycle Needs
FLEX – Development Flexibility
LTEX – Language and Tool Experience
PCAP – Programmer Capability
PCON – Personnel Continuity
PERS – Personnel Capability
PLEX – Platform Experience
PMAT – Equivalent Process Maturity Level
PREC – Precedentedness of Application
PREX – Personnel Experience
PVOL – Platform Volatility
RELY – Required Software Reliability
RESL – Risk Resolution
SITE – Multisite Development
STOR – Main Storage Constraint
TEAM – Team Cohesion
TIME – Execution Time Constraint
TOOL – Use of Software Tools
Model Accuracy Measures
Magnitude of relative error (MRE): $MRE_i = \frac{|y_i - \hat{y}_i|}{y_i}$, where $y_i$ is the actual effort and $\hat{y}_i$ is the estimated effort of data point $i$
Mean of MRE (MMRE): $MMRE = \frac{1}{N} \sum_{i=1}^{N} MRE_i$
Prediction Level: PRED(l) = k/N
–k is the number of estimates with MRE ≤ l
–N is the total number of estimates
–Commonly used: PRED(0.30) and PRED(0.25)
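The accuracy measures can be computed as in the sketch below, which assumes the usual definition of MRE as the absolute difference between actual and estimated effort divided by the actual effort. The function names and the sample numbers are illustrative and are not data from the study.

```python
from typing import Sequence


def mre(actual: float, estimate: float) -> float:
    """Magnitude of relative error for one data point."""
    return abs(actual - estimate) / actual


def mmre(actuals: Sequence[float], estimates: Sequence[float]) -> float:
    """Mean of the MRE values over all data points."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)


def pred(actuals: Sequence[float], estimates: Sequence[float], level: float) -> float:
    """PRED(l) = k/N, the fraction of estimates with MRE <= level."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    k = sum(1 for err in errors if err <= level)
    return k / len(errors)


# Hypothetical actual and estimated efforts in person-months.
actual_pm = [100.0, 200.0, 50.0, 80.0]
estimated_pm = [110.0, 150.0, 50.0, 120.0]
summary = (mmre(actual_pm, estimated_pm),
           pred(actual_pm, estimated_pm, 0.25),
           pred(actual_pm, estimated_pm, 0.30))
```

A lower MMRE and a higher PRED(l) both indicate better accuracy; PRED(0.30) = 0.80, for example, means that 80% of the estimates fall within 30% of the actuals.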