University of Southern California
Center for Systems and Software Engineering

An Investigation on Domain-Based Effort Distribution
Thomas Tan
26th International Forum on Systems, Software, and COCOMO Cost Modeling
November 2011
Research Overview

Motivation:
– Project risks early in the project lifecycle:
    Lack of project knowledge.
    Known project attributes may change later on.
– These risks contribute to the Cone of Uncertainty effect and can lead to extremely unrealistic cost and schedule estimates.
– However, many estimation methodologies use a one-size-fits-all effort distribution.
– Application domains:
    Easy to define for a project.
    Available early.
    Relatively stable throughout the project lifecycle.

Table 1: COCOMO II Waterfall Effort Distribution Percentages
Phase/Activities        Effort %
Plan and Requirement    7 (2-15)
Product Design          17
Detailed Design         27-23
Code and Unit Test      37-29
Integration and Test    19-31
Transition              12 (0-20)

Figure 1: Cone of Uncertainty for Software Cost and Size Estimation
Research Overview

Goal:
– Study the impacts of application domains on effort distribution.
– Apply the effects of application domains to an effort distribution guideline.

This work is part of a research study to provide better software estimation guidelines:
– The research is ongoing.
– It is based on data analysis of government projects (the SRDR data).
– It is sponsored by the Air Force Cost Analysis Agency.
Research Overview

We are investigating whether application domains can be used as illustrated in the following diagram:

Figure 2: Expected Extension using Application Domains
(Diagram labels: COCOMO II Model + Application Domain Extension for effort distribution; Domain-based Effort Distribution Guideline; Data Support; Application Domain; Size (KSLOC); Personnel Ratings)
Research Overview

Research plan:
– Determine application domains.
    22 application domains selected from the US Air Force Cost Estimation Handbook and the Mil-881 standard.
    Other studies are also referenced.
– Normalize the SRDR data.
– Determine effort distribution patterns by application domain.
    Calculate average effort distribution percentages for each application domain.
    – The expectation is that effort distribution patterns differ between domains.
    Show that the differences between domains, and between each domain and the COCOMO II model, are statistically significant.
– Study how system size and personnel ratings affect the effort distribution patterns for different domains.
– Establish an effort distribution guideline based on the findings.
– Integrate this guideline with the COCOMO II model.
Effort Distribution Definitions

Matching the research data, we investigate only the following activity groups:
– Plan & Requirements
– Architecture & Design
– Code and Unit Testing
– Integration
– Qualification Testing
– Note: we combine Integration and Qualification Testing to match the COCOMO II Waterfall phases.

Effort activity group definitions:
– We use the SRDR standard activity definitions, which are similar to those of the COCOMO II model.
– Adjusted COCOMO II model distribution averages: divided by 1.07 so that the averages sum to 100%.

Table 2: Mapping of SRDR Activities to COCOMO II Phases
COCOMO II Phase                          SRDR Activities
Plan and requirement                     Software requirements analysis
Product design and detail design         Software architecture and detailed design
Coding and unit testing                  Coding, unit testing
Integration and qualification testing    Software integration and system/software integration; Qualification/Acceptance testing

Table 3: COCOMO II Waterfall Effort Distribution Percentages with adjustment
Phase/Activities                         Effort %
Plan and Requirement                     6.5
Product Architecture & Design            39.3
Code and Unit Testing                    30.8
Integration and Qualification Testing    23.4
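The adjustment by 1.07 can be reproduced with a few lines of arithmetic. The sketch below assumes that each range in Table 1 is represented by its midpoint (27-23 becomes 25, and so on) and that Product Design and Detailed Design are merged into one design group; the slides do not state this explicitly, but under these assumptions the results match Table 3.

```python
# Assumption: each range in Table 1 is replaced by its midpoint.
cocomo_averages = {
    "Plan and Requirement": 7.0,
    "Product Design": 17.0,
    "Detailed Design": (27 + 23) / 2,       # 25.0
    "Code and Unit Test": (37 + 29) / 2,    # 33.0
    "Integration and Test": (19 + 31) / 2,  # 25.0
}

# The unadjusted averages sum to 107, hence the division by 1.07.
total = sum(cocomo_averages.values())

adjusted = {
    "Plan and Requirement": cocomo_averages["Plan and Requirement"],
    "Product Architecture & Design": (
        cocomo_averages["Product Design"] + cocomo_averages["Detailed Design"]
    ),
    "Code and Unit Testing": cocomo_averages["Code and Unit Test"],
    "Integration and Qualification Testing": cocomo_averages["Integration and Test"],
}
# Rescale so that the four groups sum to 100%.
adjusted = {name: round(value / total * 100, 1) for name, value in adjusted.items()}

for name, value in adjusted.items():
    print(f"{name}: {value}")
```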
Data Processing

Data Set:
– A collection of project data from DoD and other government agencies.
– Data are extracted from the SRDR Form, which is the final report for each project after the project has been completed.
– Data include effort, size, and other development parameters (such as language, process, staffing information, etc.).
– Data issues:
    Missing data.
    Untrustworthy data patterns.
    Duplicated records.
    Lack of quality measures.
– Further data processing is needed before analysis:
    Eliminate bad records.
    Normalize the data.
    Backfill effort data.
Data Processing

Data Normalization
– Evaluate all data records and eliminate those that cannot be normalized (i.e. records with missing important fields, records with no definitions at all, and duplicated records).
– Eliminate records with implausible patterns that are likely the result of bad reporting, i.e. those with huge effort and little size or vice versa.
– Deal with missing effort fields (see the data backfilling section on the next few slides).
– Transform all data records into the same units of size, effort, and schedule.
    Calculate equivalent size for all records.
    – Use DM, CM, and IM.
    Calculate effort and schedule totals for all records.
– Calculate personnel ratings.
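The slides do not spell out how DM, CM, and IM enter the equivalent size calculation. A minimal sketch, assuming the COCOMO II adaptation adjustment factor (AAF = 0.4 DM + 0.3 CM + 0.3 IM, with DM the fraction of design modified, CM the fraction of code modified, and IM the fraction of integration effort required) and ignoring the other COCOMO II reuse terms, could look like this. The function names and the sample numbers are hypothetical.

```python
def adaptation_adjustment_factor(dm, cm, im):
    """COCOMO II AAF; dm, cm, im are fractions (0.25 means 25%)."""
    return 0.4 * dm + 0.3 * cm + 0.3 * im


def equivalent_sloc(new_sloc, adapted_sloc, dm, cm, im):
    """Simplified equivalent size: new code plus adapted code weighted by AAF.

    Assumption: assessment/assimilation and software-understanding
    terms of the full COCOMO II reuse model are left out.
    """
    return new_sloc + adapted_sloc * adaptation_adjustment_factor(dm, cm, im)


# Made-up record: 10,000 new SLOC plus 20,000 adapted SLOC with
# 10% design modified, 20% code modified, 50% integration required.
size = equivalent_sloc(10_000, 20_000, dm=0.10, cm=0.20, im=0.50)
print(round(size))
```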
Data Processing

Backfilling Data:
– Rationale: backfilling is necessary to increase the number of usable data records.
– Method: Approximate Non-Negative Matrix Factorization:
    Factorize the subject data set (X) into two matrices: X ≈ W × H.
    W and H start as two random matrices whose product approximates data set X.
    Iteratively adjust the values in W and H using a simple approximation algorithm (with α and β as the adjustment factors).
    Exit the iterations when the error margin is smaller than a preset value (usually 0.01 or smaller).
    Also set a maximum number of iterations to stop the process if it runs too long (usually 5,000 to 10,000).
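The backfilling idea can be sketched as follows. This is not the Matlab program used in the study: the update rule (regularized gradient descent with values clipped at zero to keep W and H non-negative), the interpretation of α as a step size and β as a regularization weight, the rank, and all numeric settings are assumptions chosen for illustration, and the sample matrix is made up.

```python
import random


def backfill(X, rank=2, alpha=0.01, beta=0.02, max_iter=5000, tol=0.01, seed=0):
    """Fill None entries of X using an approximate factorization X ~ W x H.

    alpha (step size) and beta (regularization) are hypothetical values.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(rank)]
    known = [(i, j) for i in range(n) for j in range(m) if X[i][j] is not None]

    def predict(i, j):
        return sum(W[i][k] * H[k][j] for k in range(rank))

    for _ in range(max_iter):
        # Adjust W and H using only the cells that were actually reported.
        for i, j in known:
            err = X[i][j] - predict(i, j)
            for k in range(rank):
                w, h = W[i][k], H[k][j]
                W[i][k] = max(0.0, w + alpha * (2 * err * h - beta * w))
                H[k][j] = max(0.0, h + alpha * (2 * err * w - beta * h))
        # Exit once the squared error on the known cells is small enough.
        if sum((X[i][j] - predict(i, j)) ** 2 for i, j in known) < tol:
            break

    return [
        [X[i][j] if X[i][j] is not None else predict(i, j) for j in range(m)]
        for i in range(n)
    ]


# Made-up effort shares for REQ, ARCH, CODE, INT, QT; None marks a missing value.
X = [
    [0.10, 0.20, 0.40, 0.20, 0.10],
    [0.12, 0.18, 0.38, 0.22, 0.10],
    [0.08, 0.22, None, 0.18, 0.12],
]
filled = backfill(X)
print(filled[2])
```

Reported values are returned unchanged; only the missing cells are taken from the product of W and H.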
Data Processing

Backfilling Data Sets:
– Missing 2 Set: missing 1 to 2 values from the 5 activity groups.
– Missing 3 Set: missing 3 values from the 5 activity groups.
– Missing 4 Set: missing 4 values from the 5 activity groups.

Developed a matrix factorization program in Matlab™:
– Exit margin =
– 10,000 iterations.
– α = , β =
– Import and export CSV.

Calculated error margins between backfilled and original data points.
– Most backfilled values are within 10% of the original value (many differences are very small).
– A few show large differences due to significant discrepancies in data patterns (very small effort in one activity and very large effort in another).
Data Processing

Error comparing original and backfilled data points.
– Error = (Backfilled – Original) / Original

Table 4: Backfilled Error from “Missing 2” Set
(columns: Original, Backfilled, and Error, each with REQ, ARCH, CODE, INT, QT; Error columns shown)
Domain              REQ     ARCH     CODE     INT      QT
Mission Planning    %       -0.4%    -1.9%    -1.3%    2.8%
Mission Planning    %       %        0.5%     -2.0%    -0.6%
Mission Planning    %       -1.4%    -0.9%    -4.2%    Inf
Mission Planning    %       1.2%     -4.2%    0.2%     0.8%
Mission Planning    %       -0.7%    -3.9%    -0.4%    -0.2%
Mission Planning    %       -4.4%    -0.6%    -1.6%    Inf
Mission Planning    Inf     -1.8%    -2.6%    0.5%     Inf
Mission Planning    %       5.1%     -3.2%    -3.1%    -2.6%
Mission Planning    %       0.7%     -3.0%    1.2%     937.2%
Mission Planning    %       0.1%     0.0%     0.3%     -5.2%
Mission Planning    %       -1.5%    1.9%     -4.7%    0.3%
Mission Planning    %       -2.5%    -0.7%    2.4%     Inf
Mission Planning    %       -4.7%    -0.5%    2.3%     -0.4%
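The error formula is simple, but the "Inf" entries in Table 4 deserve a note: they are what a division by an original value of zero would produce. A small sketch, assuming that this is indeed the source of the "Inf" entries and treating the 0/0 case as zero error (a choice made here, not taken from the slides):

```python
import math


def backfill_error(original, backfilled):
    """Relative error (backfilled - original) / original."""
    diff = backfilled - original
    if original == 0:
        # Assumption: no error if both are zero, otherwise signed infinity.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / original


# Made-up values, not taken from the SRDR data.
print(f"{backfill_error(200.0, 196.2):.1%}")  # small negative error
print(backfill_error(0.0, 12.5))              # original was zero
```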
Application Domain and Effort Distribution

Records by application domains:

Table 5: Research Data Records Count
Application Domains              Missing 4  Missing 3  Missing 2  Perfect Set
Business                         10         6          6          5
Command & Control
Communications
Controls & Displays              12         7          7          3
Executive                        5          1          1          1
Information Assurance            1          1          1          1
Infrastructure or Middleware     13         8          8          2
Maintenance & Diagnostics        1          1          1          1
Mission Management
Mission Planning
None                             1          1          1          1
Process Control                  9          4          4          0
Scientific Systems               1          1          1          1
Sensor Control and Processing
Simulation & Modeling
Spacecraft Bus                   1          1          1          1
Spacecraft Payload               1          1          1          0
Test & Evaluation                2          2          1          1
Tool & Tool Systems              5          5          5          2
Training                         2          2          1          0
Weapons Delivery and Control
Total
Application Domain and Effort Distribution

Calculate the effort percentages for each record.
Calculate the average percentages for each domain.
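The two steps can be sketched as below. The record layout, the activity group keys, and the effort numbers are hypothetical; Integration and Qualification Testing are assumed to be already combined into one group.

```python
from collections import defaultdict

ACTIVITY_GROUPS = ("REQ", "ARCH", "CODE", "INT_QT")


def to_percentages(effort):
    """Convert one record's effort per activity group into percentages."""
    total = sum(effort[g] for g in ACTIVITY_GROUPS)
    return {g: 100.0 * effort[g] / total for g in ACTIVITY_GROUPS}


def domain_averages(records):
    """Average the per-record percentages within each application domain."""
    by_domain = defaultdict(list)
    for domain, effort in records:
        by_domain[domain].append(to_percentages(effort))
    return {
        domain: {g: sum(p[g] for p in rows) / len(rows) for g in ACTIVITY_GROUPS}
        for domain, rows in by_domain.items()
    }


# Made-up effort hours for two records in one domain.
records = [
    ("Business", {"REQ": 10, "ARCH": 20, "CODE": 50, "INT_QT": 20}),
    ("Business", {"REQ": 40, "ARCH": 40, "CODE": 80, "INT_QT": 40}),
]
averages = domain_averages(records)
print(averages["Business"])
```

Averaging the per-record percentages gives every project the same weight regardless of its size, which differs from dividing the domain's summed hours per activity by its total hours.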
Application Domain and Effort Distribution

From the plots of the calculated averages, we find the following:
– Plan and requirement efforts are similar across all domains (not widely spread).
– Efforts in the coding and the integration & qualification testing activity groups differ notably across domains.
– Efforts in architecture & design activities differ clearly across domains in the “Perfect” set results, and possibly in the other data sets.
– More effort is allocated as a project moves from plan & requirement activities to integration & qualification testing activities.
– Results from the backfilled sets are similar.

Although the resulting plots show clear differences between domains’ averages, we still need statistical evidence that the differences are not the result of noise.
Application Domain and Effort Distribution

Test 1: show whether application domains differ in terms of effort distribution percentages.
Use simple ANOVA to test the following:
– H0: effort distributions are the same across domains.
– Ha: effort distributions are not all the same between domains.
The test input is the list of effort percentages grouped by application domain.
The test uses a 90% confidence level to determine the significance of the results.
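For one activity group, the F statistic of a one-way ANOVA can be computed as in the sketch below. The sample percentages are invented for illustration. Turning F into a P-value needs the F distribution, which is not in the Python standard library, so the sketch stops at F and its degrees of freedom; these would then be compared against the critical value at the 0.10 level from a table or a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per domain."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the domain means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the records around their own domain mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up percentages for one activity group in three domains.
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_value, df_b, df_w)
```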
Application Domain and Effort Distribution

Test 1 Results:
– The following table shows the results from two of the four testing data sets.
– Results from the “Missing 3” and “Missing 4” data sets are essentially the same as those from “Missing 2”.
– The results indicate that the domain effect is not significant in Plan & Requirements but is significant in all other activity groups (based on the consensus that 3 data sets favor this result).
– Based on this result, we can say that domains differ in effort distribution percentages.

Table 6: Test 1 Results
Activity Group                           “Perfect” Data Set              “Missing 2” Data Set
                                         F    P-Value   Results          F    P-Value   Results
Plan & Requirements                                     Can’t Reject                    Reject
Architecture & Design                                   Reject                          Reject
Code & Unit Testing                                     Can’t Reject                    Reject
Integration and Qualification Testing                   Reject                          Reject
Application Domain and Effort Distribution

Test 2: show whether application domain averages differ from the COCOMO II effort distribution averages.
Use a one-sample t-test to test the following:
– H0: the domain average is the same as the COCOMO average.
– Ha: the domain average is not the same as the COCOMO average.
– Tests are run for every domain on every activity group.
Use the following formula to calculate the T value in order to determine the result of the t-test:
    $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
– where $\bar{x}$ is the domain average, s is the standard deviation, n is the sample size, and $\mu_0$ is the COCOMO average we compare against.
This test also uses a 90% confidence level to determine the significance of the results.
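The T value of a one-sample t-test is the difference between the sample mean and the reference value, divided by the standard error s/√n. A minimal sketch with invented sample percentages, compared against the adjusted COCOMO II average of 6.5% for Plan & Requirements:

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """T value for comparing a sample mean against a fixed reference mu0."""
    n = len(sample)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))


# Made-up Plan & Requirements percentages for one domain.
t_value = one_sample_t([10, 12, 14], mu0=6.5)
print(round(t_value, 3))
```

The resulting T value is compared against the critical value of the t distribution with n - 1 degrees of freedom at the chosen significance level.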
Application Domain and Effort Distribution

Test 2 Results:
– Again, the results from the “Missing 3” and “Missing 4” testing data sets are similar to the results of “Missing 2”.
– Three out of four activity groups show at least 50% of the domains differing from the COCOMO II model: tentatively enough for us to move on at this point.

Table 7: Test 2 Results
Plan & Requirements (COCOMO average 6.5%)
    “Perfect” Data Set: All domains reject except Sensor Control and Simulation domains
    “Missing 2” Data Set: All domains reject except Sensor Control
Architecture & Design (COCOMO average 39.3%)
    “Perfect” Data Set: All domains reject except Sensor Control and Simulation domains
    “Missing 2” Data Set: All domains reject except Simulation
Code & Unit Testing (COCOMO average 30.8%)
    “Perfect” Data Set: No domains reject
    “Missing 2” Data Set: Only Mission Planning domain rejects
Integration and Qualification Testing (COCOMO average 23.4%)
    “Perfect” Data Set: Only Mission Management and Weapon Delivery domains reject
    “Missing 2” Data Set: Communications, Mission Management, Sensor Control, and Weapons Delivery domains reject; other four domains do not
Application Domain and Effort Distribution

System size effects on effort distribution:
– The COCOMO II model shows that effort distribution varies a little as system size grows.
– For our study, we tried a simple experiment on system size to observe any effects on effort distribution:
    Tests on the Command & Control and Communications domains.
    Divide projects into three sizing groups:
    – 0 to 32 KSLOC
    – 33 to 128 KSLOC
    – 129+ KSLOC
    Plot the results to observe any patterns or trends that show an effect of system size on effort distribution.
– A similar experiment needs to be run on all domains.
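Assigning projects to the three sizing groups is a simple binning step. In the sketch below the handling of fractional sizes between the stated boundaries (for example 32.5 KSLOC) is an assumption, since the slides only list whole-number ranges, and the project sizes are made up.

```python
def size_group(ksloc):
    """Map a project's size in KSLOC to one of the three sizing groups."""
    # Assumption: sizes below 33 fall in the first group, below 129 in the second.
    if ksloc < 33:
        return "0 to 32 KSLOC"
    if ksloc < 129:
        return "33 to 128 KSLOC"
    return "129+ KSLOC"


# Made-up project sizes in KSLOC.
sizes = [12, 32, 33, 128, 129, 540]
groups = {}
for ksloc in sizes:
    groups.setdefault(size_group(ksloc), []).append(ksloc)
print(groups)
```

Per-group effort percentages can then be averaged and plotted for each activity group.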
Application Domain and Effort Distribution

Command & Control:
– Observed a decreasing trend in Requirements and an increasing trend in Architecture: as project size grows, more effort is allocated to design.
– No clear trend for coding and integration & testing effort.

Communications:
– It is hard to find an obvious trend in any activity group.
– It seems that none of the sizing groups produces the common distribution: growing from requirements to coding and then decreasing from coding to integration & testing.
Next Steps

Go deeper with the system size experiment.
Expand the experiment on system size to all subject domains and groups of domains.
Use a similar approach to study the effect of personnel ratings on effort distribution by application domain.
Run the study using the Productivity Types (from Brad's research) and compare the results with the current results using domains.
Combine the results of the studies into a proposal for the domain-based effort distribution guideline.
Design and integrate the application domain extension to the COCOMO II model.
Questions?

For more information, contact: Thomas Tan