CP – Cost Analytics and Parametric Estimation Directorate
UNCLASSIFIED – Approved for Public Release 15-MDA-8479 (10 November 15)

My Dad Is Bigger Than Your Dad: Analysis of Productivity Ranges in Popular Software Cost Estimation Models Using Stratified Data

Dan Strickland, Software SME, Research Division
16 Nov 2015

DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited.
Overview
- Simple Model Calibration With Data
- Aerospace Database
- Putnam Model: Performance Overall; Performance With Boundaries
- SEER/Sage Model: Performance Overall; Performance With Boundaries
- COCOMO II Model: Performance Overall; Performance With Boundaries
- Paired Data Results
- What Does It All Mean?
- Future Research
Software Estimating Models
- In the Feb 2006 CrossTalk article "Software Estimating Models: Three Viewpoints," three popular software cost estimation models (Sage/SEER, Putnam, COCOMO II) are described in their base mathematical forms
- All three models calculate effort using size and productivity
- Two models (Putnam, SEER) also use development time as a factor in calculating effort
- Productivity is expressed as software output over software input, usually in SLOC/hr or SLOC/PM; in calibration, estimators are solving for productivity
- Putnam model on calculating productivity: "From historic projects, we know the size, effort, and schedule…just put in a consistent set of historic numbers and calculate a Process Productivity Parameter."
- Is it really that simple to calculate productivity? Given a database of completed projects and the three default models, can reliable productivity ranges be developed for different development strata?
Aerospace Database
- Published in 2004 by Gayek, Long, et al., the Aerospace "Software Cost and Productivity Model" is an analysis of over 450 completed Aerospace, DoD, and space software programs
- Data was normalized for size, effort, and duration and stratified by Operating Environment, Application Domain, Programming Language, etc.
- When normalized, there are 181 records of Computer Software Configuration Item (CSCI) level software development with effort (in hours) and duration (in months) representing Preliminary Design through Integration and Testing
Aerospace Database Strata

Operating Environments:
- Military Ground: Ground-based, mission-critical software; usually a fixed site
- Military Mobile: In a vehicle such as a truck, trailer, or ship
- Mil-Spec Avionics: Operation of an aircraft or similar vehicle
- Unmanned Space: Similar to Avionics, but may be a satellite or UAV

Application Domains:
- Command & Control: Network monitoring, network control, mission control, command processing
- Database: Collects, stores, and organizes information; database generation and management
- Mission Planning: Supports mission planning, such as aircraft mission planning, scenario generation, and route planning
- Operating System/Executive: Controls basic hardware functions and serves as the platform for other applications to run
- Signal Processing: Enhances, transforms, filters, and converts data signals; complex algorithms for sonar, radar, and flight systems
- Simulation: Environment simulation, system simulation, network simulation, system reliability
- Support: Software used to aid the development, testing, and support of other applications
- Test: Software used specifically for testing and evaluating software and hardware systems
Putnam Model
- Assume ESLOC = SLOC
- Effort (PY) = Labor Effort / 12
- Schedule (yrs) = Develop Schedule / 12
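The calibration step the briefing describes (back-solving Process Productivity from each completed project) can be sketched in a few lines. This is a sketch, not the exact SLIM formulation: it uses the commonly published simplified software equation Size = PP × Effort^(1/3) × td^(4/3), the 152-hours-per-person-month conversion is an assumption (the Aerospace records carry effort in hours), and the sample inputs are illustrative.

```python
# Sketch: back-solving Putnam's Process Productivity (PP) from one completed
# project, using the simplified software equation
#   Size = PP * Effort^(1/3) * td^(4/3)   (Effort in PY, td in years).
# ASSUMPTION: 152 labor hours per person-month; sample inputs are illustrative.

HOURS_PER_PERSON_YEAR = 152 * 12

def putnam_pp(sloc, effort_hours, schedule_months):
    """Process Productivity implied by a completed project's actuals."""
    effort_py = effort_hours / HOURS_PER_PERSON_YEAR
    td_years = schedule_months / 12.0
    return sloc / (effort_py ** (1 / 3) * td_years ** (4 / 3))

pp = putnam_pp(sloc=50_000, effort_hours=20_000, schedule_months=18)
print(round(pp))
```

Running this once per record and taking the mean or median PP within each stratum reproduces the calibration approach the following slides evaluate.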
Putnam Model – Results
- Several PP values fall below the minimum values observed by Putnam
- Example (Mission Planning, Median PP = 4222.6):
  - Effort (PY) = 15 × (td_min)³
  - td_min (years) = 0.68 × (Size / 4222.6)^0.43
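The Mission Planning example can be checked numerically. The constants (median PP = 4222.6, the 15 multiplier, 0.68, 0.43) come from the slide; the 30 KSLOC input size is an illustrative assumption.

```python
# Worked Putnam example (Mission Planning, median PP = 4222.6).
# Constants are from the slide; the size input is an illustrative assumption.

def putnam_estimate(size_sloc, pp=4222.6):
    td_min = 0.68 * (size_sloc / pp) ** 0.43  # minimum schedule in years
    effort_py = 15 * td_min ** 3              # effort in person-years
    return td_min, effort_py

td, py = putnam_estimate(30_000)
print(round(td, 2), round(py, 1))  # roughly 1.6 years, high-50s PY
```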
Putnam Model – Accuracy
- Formulas used to compare estimates vs. actuals from the Aerospace data:
- Mean Magnitude of Relative Error (MMRE) measures the average relative error of all the predictions against their actuals, independent of scale and sign – lower is better
- Prediction Level (PRED) measures the percentage of all the predictions that fall within a defined error bound of the actual; here we used PRED(30), or within 30% – higher is better
- Overall, the Putnam model using either the mean or median productivity values is only accurate in a few strata (Database, Unmanned Space)
- What about those data points whose calculated PP fell outside the published limits for Putnam? What happens when those data points are thrown out?
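Both accuracy measures are straightforward to compute; a minimal sketch follows. The actual/estimate pairs are made up purely to show the mechanics, not drawn from the Aerospace database.

```python
# MMRE (lower is better) and PRED(30) (higher is better), as defined above.
# The actual/estimate pairs are illustrative, not from the Aerospace database.

def mmre(actuals, estimates):
    """Mean magnitude of relative error across all predictions."""
    return sum(abs(a - e) / a for a, e in zip(actuals, estimates)) / len(actuals)

def pred(actuals, estimates, level=0.30):
    """Fraction of predictions within `level` relative error of the actual."""
    hits = sum(1 for a, e in zip(actuals, estimates) if abs(a - e) / a <= level)
    return hits / len(actuals)

actuals = [100, 250, 400, 80]
estimates = [120, 200, 390, 200]
print(round(mmre(actuals, estimates), 3))  # average relative error
print(pred(actuals, estimates))            # 0.75, 3 of 4 within 30%
```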
Putnam Model – In-Bounds Results
- Removed 55 data points with calculated PP values below the threshold, aka "out of bounds"
Putnam Model – In-Bounds Accuracy
- Removing the data points with "out of bounds" productivity calculations has little effect on the accuracy
- Some values go up, some go down
- Database and Unmanned Space continue to perform well
SEER Model
- Basic Equation: Cte = Size / (K^(1/2) × Schedule)
  - Size in ESLOC
  - K is software life-cycle effort in person-years
  - Schedule is the development time in years
  - Cte is the Effective Technology Constant; it ranges from 2.7 to 22,184.1
  - As Cte increases, effort (cost) decreases
  - Software development effort is 0.3945 of the total life-cycle effort (K)
- Using the Aerospace database, solve for Cte and observe stratified results
- Can a productivity value be developed that produces accurate results?
  - Effort (PM) = (Size / (Cte × Schedule))² × 0.3945 × 12
  - ESLOC = ESLOC
  - K (PY) = Labor Effort / (12 × 0.3945)
  - Schedule (yrs) = Develop Schedule / 12
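Back-solving Cte follows directly from the rearranged equations on this slide. A minimal sketch, assuming a 152-hours-per-person-month conversion (the Aerospace records carry effort in hours) and illustrative inputs:

```python
# Back-solving SEER's Effective Technology Constant:
#   Cte = Size / (K^(1/2) * Schedule),  K = dev effort / 0.3945 (person-years).
# ASSUMPTION: 152 labor hours per person-month; sample inputs are illustrative.

HOURS_PER_PERSON_MONTH = 152

def seer_cte(esloc, effort_hours, schedule_months):
    effort_pm = effort_hours / HOURS_PER_PERSON_MONTH  # development effort, PM
    k_py = effort_pm / (12 * 0.3945)                   # life-cycle effort, PY
    td_years = schedule_months / 12.0
    return esloc / (k_py ** 0.5 * td_years)

cte = seer_cte(esloc=50_000, effort_hours=20_000, schedule_months=18)
print(round(cte))
print(2.7 <= cte <= 22_184.1)  # model threshold check
```

The very wide 2.7 to 22,184.1 range explains the next slide's finding: almost any back-solved Cte lands "in-bounds."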
SEER Model – Results
- All calculated Cte values are "in-bounds" with respect to the model thresholds
- Example (Mission Planning, Median Cte = 2695):
  - Effort (PM) = (Size / (2695 × Schedule))² × 0.3945 × 12
SEER Model – Accuracy
- Using the mean or median Cte values, the SEER model performs poorly for most strata
- Performs well in the Mission Planning Application Domain and Java-based developments
- Difficult to remove any data points, as all the calculated Cte values are "in-bounds"
- Is there another value at work in the SEER model that can indicate possible candidates for removal?
SEER Model – Calculating D
- The Staffing Complexity factor, D, represents difficulty in terms of the rate at which staff can be added to a software project
- D ranges from 4 to 28, where higher values equate to very complex software that is difficult to staff (missile algorithms) and lower values equate to simple software that can be broken up and staffed easily (data entry)
- D interacts with the schedule and effort
- Formula: D = K / (Schedule)³
- Calculate D for the database and remove the values that are out of bounds
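The bounds filter described here is a one-liner once D is computed per record. A sketch with synthetic (K, schedule) tuples:

```python
# Staffing Complexity D = K / Schedule^3, with the slide's bounds 4 <= D <= 28.
# Records are synthetic: (life-cycle effort K in PY, schedule td in years).

def staffing_complexity(k_py, td_years):
    return k_py / td_years ** 3

records = [(30.0, 1.5), (5.0, 2.0), (100.0, 1.2)]  # (K, td) pairs
in_bounds = [r for r in records
             if 4 <= staffing_complexity(*r) <= 28]
print(len(in_bounds))  # only (30.0, 1.5) survives: D = 8.89
```

Applied to the Aerospace database, this filter is what removes the 42 data points on the next slide.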
SEER Model – In-Bounds Accuracy
- Staffing Complexity boundaries remove 42 data points
- Removing the data points with "out of bounds" Staffing Complexity values has little effect on the accuracy
- Some values go up, some go down; median predictions trend upward
- Mission Planning continues to perform well, but the Java data points are completely out of bounds
COCOMO II Model
- Basic Equation: Effort (PM) = 2.94 × (EAF) × (Size)^E
  - Size in KESLOC
  - EAF is the Effort Adjustment Factor, used to calculate productivity
  - E is the exponential scaling factor; default value of 1.0997
  - EAF ranges from 0.0569 to 80.8271
  - As EAF increases, effort (cost) increases
- Using the Aerospace database, solve for EAF and observe stratified results
  - Size = ESLOC / 1000
  - Effort = Labor Effort
  - Schedule is not used in the equation
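Solving the basic equation for EAF needs only a record's size and effort, since schedule does not appear. A sketch with illustrative inputs:

```python
# Back-solving the COCOMO II Effort Adjustment Factor from a completed project:
#   EAF = Effort / (2.94 * Size^E),  Size in KESLOC, default E = 1.0997.
# Sample inputs are illustrative.

def cocomo_eaf(esloc, effort_pm, e=1.0997):
    keslc = esloc / 1000.0
    return effort_pm / (2.94 * keslc ** e)

eaf = cocomo_eaf(esloc=50_000, effort_pm=131.6)
print(round(eaf, 3))
print(0.0569 <= eaf <= 80.8271)  # model threshold check
```

As with SEER's Cte, the published EAF range is wide enough that back-solved values essentially always land "in-bounds," which motivates the exponent-based filter two slides ahead.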
COCOMO II – Results
- All calculated EAF values are "in-bounds" with respect to the model thresholds
- Example (Mission Planning, Median EAF = 1.88):
  - Effort = 2.94 × (1.88) × (Size)^1.0997
COCOMO II – Accuracy
- COCOMO II performs average to good for many strata using mean and median EAF values
- Performs well in the Command and Control and Database Application Domains, Java-based developments, and the Military Mobile Operating Environment
- Difficult to remove any data points, as all the calculated EAF values are "in-bounds"
- Is there another value in the COCOMO II model that can indicate possible candidates for removal?
COCOMO II Exponent
- The effort equation's scaling exponent is also used in the COCOMO II schedule equation:
  - Schedule = 3.67 × (Effort)^F
  - F = 0.28 + 0.2 × (E – 0.91)
- Since the COCOMO II effort equation does not use schedule as an input, the data in the Aerospace database can be used to solve for E:
  - E = [ln(Schedule / 3.67) / ln(Effort) – 0.098] / 0.2
- E ranges from 0.91 to 1.2262
- Calculate E from the Schedule and Effort values of the Aerospace database and remove values that are out of bounds
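The back-solve for E follows from inverting the schedule equation above. A sketch with illustrative schedule and effort inputs:

```python
import math

# Back-solving the COCOMO II scaling exponent from schedule and effort:
#   E = [ln(Schedule / 3.67) / ln(Effort) - 0.098] / 0.2
# with the slide's bounds 0.91 <= E <= 1.2262. Inputs are illustrative.

def cocomo_e(schedule_months, effort_pm):
    f = math.log(schedule_months / 3.67) / math.log(effort_pm)
    return (f - 0.098) / 0.2

e = cocomo_e(schedule_months=18, effort_pm=131.6)
print(round(e, 4))
print(0.91 <= e <= 1.2262)  # in-bounds check
```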
COCOMO II Model – In-Bounds Accuracy
- Scaling Exponent boundaries remove 72 data points
- Calculating the Scaling Exponent from Schedule and Effort and removing the data points with "out of bounds" values eliminates almost all accuracy
- Prediction values almost universally drop to unusable levels
Paired Data
- A popular method with this type of database is stratifying by Operating Environment and Application Domain together to analyze data
- As more strata are introduced, fewer values are available
- What is the accuracy, in terms of PRED(30), of the mean/median productivity calculations of paired data for both full and "in-bounds" data sets?
- Key assumption: need at least 5 data points in a paired stratum for it to be applicable
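Pairing the strata and applying the 5-point minimum is a simple grouping step. A sketch with synthetic records:

```python
from collections import Counter

# Sketch: pairing Operating Environment with Application Domain and keeping
# only pairs that meet the briefing's 5-data-point minimum.
# Records are synthetic for illustration.

records = ([("Military Ground", "Command & Control")] * 6
           + [("Unmanned Space", "Signal Processing")] * 3)

counts = Counter(records)
usable = {pair: n for pair, n in counts.items() if n >= 5}
print(sorted(usable))  # only the 6-record pair qualifies
```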
Putnam Paired Data Performance
- Little difference in overall effect between the unbounded and "in-bounds" data sets
- Some pairs are accurate in the unbounded data and some pairs are accurate in the "in-bounds" data
SEER Paired Data Performance
- Unbounded data has a few accurate pairs
- Using "in-bounds" data improves overall accuracy, with more data pairs having higher PRED(30) scores
COCOMO II Paired Data Performance
- Unbounded data has many pairs with high PRED(30) values
- Using "in-bounds" data eliminates accuracy, with no data pairs having high PRED(30) values
What Does It All Mean?
- Putnam Model:
  – Default model has some accuracy in certain areas using mean/median productivity values
  – Bounding the data by productivity thresholds has little effect
  – Paired Operating Environments and Application Domains are limited in accuracy
- SEER Model:
  – Default model has high prediction accuracy in only a few areas
  – Bounding the data by Staffing Complexity has limited effect
  – Paired, bounded data has the highest overall prediction accuracy for the model
- COCOMO II Model:
  – Default model has the best overall prediction accuracy
  – Bounding the data by the schedule-calculated Scaling Exponent produces poor results
  – Paired, unbounded data has some excellent prediction accuracy values
- Caveats:
  – No calibration tools or regression analyses were used
  – No "fine-tuning" of the productivity values was possible
- Calculating the average productivity variable for a model with stratification does not always produce credible results, but in certain cases, quick estimates can be made with limited information
- Pairing strata is very beneficial in some models
- Calculated productivity ranges and the resulting performance can be used to develop and evaluate software cost models
Future Research
- Software Resources Data Report (SRDR) performance in models with/without boundaries
- Performance against tool-calibrated model changes (Calico, SEER calibration, etc.)
- Optimal schedule vs. actual schedule performance
Questions