COCOMO II Calibration Status
USC-CSE Annual Research Review, March 2004
A Little History

- Calibration effort started in January 2002
- Confusion
  – Repository in an inconsistent state
  – "Uncharacterized" data from many sources
  – Unclear process for duplicating the 2000 calibration results
  – Schedule compression rating was inconsistent
- Expectation
  – New data had a lot of variation, but...
  – Affiliates (and the user population in general) want an "accurate" and up-to-date model – not just one that explains variation (PRED(.25) versus R²)
Change in Approach

- Removed pre-1990 data from the dataset used in calibration
  – This removed a lot of "converted" data
- Removed "bad" data
  – Incomplete: no duration data, estimated effort, no valid SLOC size
- Still use the Bayesian calibration approach developed by Chulani (see the sketch after this list)
- Changed to a holistic analysis approach: considered effort and duration together
  – Identified data that needed review
  – Schedule compression was automatically set
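For background, Chulani's approach combines an expert-judgment prior on the model coefficients with the regression estimate from the sample data, weighting each by its precision. A sketch of that combination in the notation of the published COCOMO II Bayesian calibration work (X is the data matrix, s² the residual variance, b the sample regression estimate, and b*, H* the prior mean and precision matrix); this is a summary, not the full derivation:

```latex
% Posterior (Bayesian) coefficient estimate: a precision-weighted
% blend of the data-determined estimate b and the expert prior b*.
b^{**} = \left[\frac{1}{s^{2}}X^{\top}X + H^{*}\right]^{-1}
         \left[\frac{1}{s^{2}}X^{\top}X\,b + H^{*}b^{*}\right]
```

When the data are noisy (large s²), the posterior leans toward the expert prior; with plentiful, consistent data it leans toward the regression estimate.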
Effort-Duration Error
(chart)
Effort-Duration Error Interpretation

Effort Estimate | Duration Estimate | Data Validation / Interpretation
----------------|-------------------|---------------------------------
Under-estimated | Under-estimated   | Actual size data is too small due to reuse modeling; actual effort and duration included lifecycle phases not in the model; difficult, low-productivity projects
Under-estimated | Over-estimated    | Schedule compression required
Over-estimated  | Under-estimated   | Fixed staffing levels; project slow-down; schedule stretch-out
Over-estimated  | Over-estimated    | Actual data is too large due to physical SLOC count or reuse modeling; actual effort and duration cover fewer lifecycle phases than estimated; easy, high-productivity projects
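To make the 2×2 structure of the table explicit, here is a minimal illustrative sketch; the function name diagnose and the returned strings are hypothetical, not from the presentation:

```python
def diagnose(effort_error: float, duration_error: float) -> str:
    """Map signed estimation errors to the interpretation table above.

    Errors are (estimate - actual) / actual, so a negative value means
    the model under-estimated and a positive value means it over-estimated.
    """
    under_effort = effort_error < 0
    under_duration = duration_error < 0
    if under_effort and under_duration:
        return ("check for size under-count (reuse modeling), extra lifecycle "
                "phases in the actuals, or a difficult low-productivity project")
    if under_effort and not under_duration:
        return "schedule compression required"
    if not under_effort and under_duration:
        return "fixed staffing level, project slow-down, or schedule stretch-out"
    return ("check for size over-count (physical SLOC, reuse modeling), fewer "
            "lifecycle phases in the actuals, or an easy high-productivity project")
```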
Effort Estimate Error Compared to Size
(chart)
Duration Estimate Error Compared to Size
(chart)
Preliminary Results

- Based on data from 89 projects
- Effort estimation accuracy (see the PRED sketch after this list)
  – PRED(.30) = 92.1% (was 75% for the 2000 calibration)
  – PRED(.25) = 88.8% (was 68%)
  – PRED(.20) = 84.3% (was 63%)
- Duration estimation accuracy
  – PRED(.30) = 82.0% (was 64% for the 2000 calibration)
  – PRED(.25) = 70.8% (was 55%)
  – PRED(.20) = 64.0% (was 50%)
- Data from 65 more projects still to review
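For reference, PRED(L) is the fraction of projects whose magnitude of relative estimation error is within L of the actuals. A minimal sketch of the computation; the function name is illustrative, not from the calibration tooling:

```python
def pred(estimates, actuals, level):
    """PRED(level): fraction of projects whose magnitude of relative
    error |estimate - actual| / actual is at most `level`."""
    within = sum(
        1 for est, act in zip(estimates, actuals)
        if abs(est - act) / act <= level
    )
    return within / len(estimates)

# Example: PRED(.30) for three projects with errors of 10%, 20%, and 33%
# pred([110, 80, 200], [100, 100, 150], 0.30)  ->  2/3
```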
Next Steps in Calibration

- Review and incorporate the remaining outstanding data
- Beta test the new driver values with some of the Affiliates
- Assess the need to elaborate (and possibly expand) the definition of Schedule Compression
- Attempt to develop "pre-sets" for different types of applications
How Should the Model Handle Level-Loaded Staffing?
Are Compression / Stretch-out Handled Adequately?

- Should there be an Extra Low rating?
- Should there be ratings for stretch-out other than 1.0?
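For context, here is where compression and stretch-out enter the model. In the published COCOMO II.2000 equations (constants from that calibration; PM_NS is effort computed without the SCED multiplier), SCED appears both as one of the effort multipliers EM_i and as a percentage applied to the nominal schedule:

```latex
% COCOMO II.2000 effort and schedule equations
PM   = A \cdot \mathit{Size}^{\,E} \cdot \prod_{i} EM_i,
       \quad E = B + 0.01 \sum_{j=1}^{5} SF_j
       \quad (A = 2.94,\ B = 0.91)

TDEV = C \cdot (PM_{NS})^{F} \cdot \frac{\mathit{SCED}\%}{100},
       \quad F = D + 0.2 \cdot 0.01 \sum_{j=1}^{5} SF_j
       \quad (C = 3.67,\ D = 0.28)
```

In COCOMO II.2000, SCED% runs from 75% (Very Low, maximum compression) to 160% (Very High, stretch-out), and the SCED effort multiplier is 1.00 for the stretch-out ratings. An Extra Low rating would extend compression below 75%, and the second question above presumably asks whether stretch-out should carry effort multipliers other than 1.00. Level-loaded staffing (previous slide) corresponds to holding the average staff level PM/TDEV roughly constant.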
COCOMO II Driver Elaboration
Motivation

- Recent experience has shown that engineers have difficulty rating some of the COCOMO II drivers
- Researchers have documented the impact of the COCOMO drivers' subjectivity
- Calibration data appears to show some differences in the interpretation of COCOMO II driver ratings
- Our goal is to address these problems
Examples of Questions

DATA (D/P = DB bytes / Pgm SLOC):
  Low: D/P < 10 | Nominal: 10 ≤ D/P < 100 | High: 100 ≤ D/P < 1000 | Very High: D/P ≥ 1000

RELY (effect of a software failure):
  Very Low: slight inconvenience | Low: low, easily recoverable losses | Nominal: moderate, easily recoverable losses | High: high financial loss | Very High: risk to human life

- Programmer Capability – What is the 75th percentile?
- Required Software Reliability – What if my application is not financial?
- Database Size – Are you mixing two different units of measure?
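As a small illustration of the Database Size scale above (the rating names follow the standard COCOMO II driver levels; the function name is hypothetical):

```python
def data_rating(db_bytes: float, program_sloc: float) -> str:
    """Rate the COCOMO II DATA driver from the D/P ratio
    (testing-database bytes per program SLOC)."""
    dp = db_bytes / program_sloc
    if dp < 10:
        return "Low"
    elif dp < 100:
        return "Nominal"
    elif dp < 1000:
        return "High"
    else:
        return "Very High"

# Example: a 2 MB test database for a 10 KSLOC program
# data_rating(2_000_000, 10_000)  ->  D/P = 200  ->  "High"
```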
Elaborating Cost Driver Workshop Goals

- No new math (elaboration, not re-definition)
- No scare factor (not too many inputs)
  – Something behind the curtain
  – Gradual unfolding
- More comprehensible vocabulary
  – Consider a wider range of application users
  – Applicable to business and military
- Make it easier to use
  – Eye-ball average, optional weighted scoring
- Make it less subjective
  – Crisper definitions
Driver "Subjectivity" Ranking

Each driver was rated (1 to 10) for its level of subjectivity by participants.
Ratings: 1 = Very Objective, 10 = Very Subjective
Status

- Held a workshop in October 2003 on identifying drivers that need further elaboration
- Lack of progress
  – Need an effective collaboration technique that is not moderator-intensive
  – Calibration work has taken precedence
- A possible collaboration solution is the same technology USC Software Engineering classes use to submit their work
- Calibration work is winding down (I think...)
- Expect the Driver Elaboration working group to start up soon
For more information, requests, or questions:

Brad Clark, Software Metrics, Inc.
Ye Yang, USC-CSE