1
University of Southern California
Center for Systems and Software Engineering
© USC-CSSE

A Constrained Regression Technique for COCOMO Calibration
Presented by Vu Nguyen
On behalf of Vu Nguyen, Bert Steece, Barry Boehm
{nguyenvu, berts, boehm}@usc.edu
2
Outline
Introduction
  – Multiple Linear Regression
  – OLS, Stepwise, Lasso, Ridge
Constrained Linear Regression
Validation and Comparison
  – COCOMO overview
  – Cross validation
Conclusions
Limitations
Future Work
3
Introduction
Building software estimation models is a search problem: find the best possible parameters that
  – generate high prediction accuracy
  – satisfy predefined constraints
4
Multiple Linear Regression
Multiple linear regression is presented as
  $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, 2, \dots, n$
where
  – $\beta_0, \beta_1, \dots, \beta_k$ are the coefficients
  – $n$ is the number of observations
  – $k$ is the number of variables
  – $x_{ij}$ is the value of the jth variable for the ith observation
  – $y_i$ is the response of the ith observation
5
Ordinary Least Squares
OLS is the most common method to estimate the coefficients $\beta_0, \beta_1, \dots, \beta_k$
OLS estimates the coefficients by minimizing the sum of squared errors (SSE)
  – Minimize $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
  – $\hat{y}_i$ is the estimate of the ith observation
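As a minimal sketch of what minimizing SSE involves, the following solves the normal equations $(X^{\top}X)\,b = X^{\top}y$ with a small Gaussian-elimination routine. The function names (`solve_linear_system`, `ols_fit`, `sse`) and the toy data are illustrative assumptions, not part of the presentation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fit(xs, ys):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors."""
    rows = [[1.0] + list(x) for x in xs]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def sse(coeffs, xs, ys):
    """Sum of squared errors of the fitted model on (xs, ys)."""
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))
        total += (y - y_hat) ** 2
    return total


# Toy data generated from y = 1 + 2*x1 + 3*x2 with no noise (hypothetical).
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
coeffs = ols_fit(xs, ys)
```

Because the toy data are noise-free, the fit recovers the generating coefficients up to rounding. Nothing in this procedure restricts the sign or size of the estimates.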
6
Some Limitations of OLS
Highly sensitive to outliers
Low bias but high variance (e.g., caused by collinearity or overfitting)
Unable to constrain the estimates of coefficients
  – Estimated coefficients may be counter-intuitive
  – Example: the OLS coefficient estimate for RUSE is negative, i.e., increasing the RUSE rating results in a decrease in estimated effort
[Figure: Develop for Reuse (RUSE), OLS estimates versus expected values]
7
Some Other Approaches
Stepwise (forward selection)
  – Start with no variables and gradually add variables until an "optimal" solution is achieved
Ridge
  – Minimize SSE and impose a penalty on the sum of squared coefficients
Lasso
  – Minimize SSE and impose a penalty on the sum of absolute coefficients
[Figure: bias-variance trade-off, from low bias/high variance to high bias/low variance, for training and testing samples]
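In their usual penalized form, the Ridge and Lasso objectives can be written as follows. The tuning parameter $\lambda \ge 0$ is a symbol assumed here for illustration; the slide does not name it.

```latex
\begin{align*}
\text{Ridge:}\quad & \min_{\beta}\ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{k} \beta_j^2 \\
\text{Lasso:}\quad & \min_{\beta}\ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{k} |\beta_j|
\end{align*}
```

With $\lambda = 0$ both reduce to OLS. Larger $\lambda$ shrinks the coefficients, trading higher bias for lower variance.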
8
Outline
Introduction
  – Multiple Linear Regression
  – OLS, Stepwise, Lasso, Ridge
Constrained Linear Regression
Validation
  – COCOMO overview
  – Cross validation
Conclusions
Limitations
Future Work
9
Constrained Regression
Principles
  – Use an optimization paradigm: optimize an objective function subject to constraints
    Minimize f(y, X) subject to cf(z)
  – Impose constraints on the coefficients and on the relative error
  – Expect to reduce variance by reducing the number of variables (variance and bias trade-off)
10
Constrained Regression (cont)
General form
  – Minimize an objective function subject to constraints on the coefficients and on the relative error
Objective functions
  – Constrained Minimum Sum of Squared Errors (CMSE)
  – Constrained Minimum Sum of Absolute Errors (CMAE)
  – Constrained Minimum Sum of Relative Errors (CMRE)
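One plausible way to write the three variants is sketched below. It assumes the constraints are a cap $c$ on each observation's magnitude of relative error and non-negative coefficient estimates. Both are assumptions inferred from the surrounding slides, not equations taken from this one.

```latex
\begin{align*}
\text{CMSE:}\quad & \min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\text{CMAE:}\quad & \min \sum_{i=1}^{n} |y_i - \hat{y}_i| \\
\text{CMRE:}\quad & \min \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{y_i} \\
\text{subject to}\quad & \frac{|y_i - \hat{y}_i|}{y_i} \le c,\quad i = 1,\dots,n,
\qquad \hat{\beta}_j \ge 0,\quad j = 1,\dots,k
\end{align*}
```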
11
Solve the Equations
Solving the equations is an optimization problem
  – CMSE: quadratic programming
  – CMRE and CMAE: transformed to the form of linear programming
  – We used the lpsolve and quadprog packages in R
Determine parameter c using cross-validation
[Figure: PRED(0.3) versus c values for CMSE, CMAE, and CMRE]
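The absolute-value objectives become linear programs through a standard device: introduce one auxiliary variable $t_i$ per observation that bounds the absolute error. The sketch below shows this for CMAE. It assumes $y_i > 0$, a cap $c$ on the relative error, and non-negative coefficients; the exact formulation used by the authors is not given on the slide.

```latex
\begin{align*}
\min_{\beta,\,t}\quad & \sum_{i=1}^{n} t_i \\
\text{subject to}\quad & -t_i \le y_i - x_i^{\top}\beta \le t_i, && i = 1,\dots,n \\
& -c\,y_i \le y_i - x_i^{\top}\beta \le c\,y_i, && i = 1,\dots,n \\
& \beta_j \ge 0, && j = 1,\dots,k
\end{align*}
```

Under the same assumptions, CMRE only changes the objective to $\sum_{i} t_i / y_i$, which is still linear because each $y_i$ is a known constant.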
12
Outline
Introduction
  – Multiple Linear Regression
  – OLS, Stepwise, Lasso, Ridge
Constrained Linear Regression
Validation and Comparison
  – COCOMO overview
  – Cross validation
Conclusions
Limitations
Future Work
13
Validation and Comparison
Two COCOMO datasets
  – COCOMO II (2000): 161 projects
  – COCOMO 81: 63 projects
Comparing with popular model building approaches
  – OLS
  – Stepwise
  – Lasso
  – Ridge
Cross-validation
  – 10-fold cross validation
14
COCOMO
Constructive Cost Model (COCOMO) first published in 1981
  – Calibrated using 63 projects (COCOMO 81 dataset)
  – Uses SLOC as a size measure and 15 cost drivers
COCOMO II published in 2000
  – Reflects changes in technologies and practices
  – Uses 22 cost drivers plus size measure
  – Introduces 5 scale factors
  – Calibrated using 161 data points (COCOMO II dataset)
15
COCOMO Overview (cont)
The COCOMO effort equation is non-linear
Linearize the model using a log-transformation
  – COCOMO 81
    $\log(PM) = \beta_0 + \beta_1 \log(Size) + \beta_2 \log(EM_1) + \dots + \beta_{16} \log(EM_{15})$
  – COCOMO II
    $\log(PM) = \beta_0 + \beta_1 \log(Size) + \sum_i \beta_i \, SF_i \log(Size) + \sum_j \beta_j \log(EM_j)$
Estimate the coefficients using a linear regression method
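For reference, the log-transformation can be traced from the published COCOMO II effort equation, taken here from the COCOMO II book rather than from this slide. It uses 5 scale factors $SF_i$ and 17 effort multipliers $EM_j$.

```latex
\begin{align*}
PM &= A \cdot Size^{\,B + 0.01 \sum_{i=1}^{5} SF_i} \cdot \prod_{j=1}^{17} EM_j \\
\log(PM) &= \log(A) + B \log(Size) + 0.01 \sum_{i=1}^{5} SF_i \log(Size) + \sum_{j=1}^{17} \log(EM_j)
\end{align*}
```

Matching terms with the linearized model gives $\beta_0 = \log(A)$ and $\beta_1 = B$. Calibration then replaces the fixed weights $0.01$ and $1$ with coefficients estimated from data.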
16
Model Accuracy Measures
Magnitude of relative error (MRE)
  $MRE_i = \frac{|y_i - \hat{y}_i|}{y_i}$
Mean of MRE (MMRE)
  $MMRE = \frac{1}{N} \sum_{i=1}^{N} MRE_i$
Prediction level: PRED(l) = k/N
  – where k is the number of estimates with MRE ≤ l and N is the total number of estimates
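The three measures are simple to compute. The sketch below uses hypothetical function names and made-up effort values purely to illustrate the definitions.

```python
def mre(actual, estimate):
    """Magnitude of relative error for one project."""
    return abs(actual - estimate) / actual


def mmre(actuals, estimates):
    """Mean MRE over all projects."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)


def pred(actuals, estimates, level=0.3):
    """PRED(level): fraction of estimates whose MRE is at most `level`."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    k = sum(1 for err in errors if err <= level)
    return k / len(errors)


# Hypothetical actual and estimated efforts (person-months).
actuals = [100.0, 200.0, 50.0, 400.0]
estimates = [110.0, 150.0, 80.0, 400.0]

mmre_value = mmre(actuals, estimates)       # MREs are 0.10, 0.25, 0.60, 0.00
pred_value = pred(actuals, estimates, 0.3)  # three of four MREs are <= 0.3
```

Lower MMRE and higher PRED(0.3) both indicate better accuracy, so the two measures move in opposite directions as a model improves.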
17
Cross Validation
10-fold cross validation was used
Step 1. Randomly split the dataset into K = 10 subsets
Step 2. For each i = 1, ..., 10
  – Remove the ith subset and build the model on the remaining data
  – Use the ith subset as the testing set to calculate MMRE_i and PRED(l)_i
Step 3. Repeat Steps 1 and 2 r = 15 times
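The three steps can be sketched as below. The `fit` and `predict` callables stand in for any model-building approach, and all names here are hypothetical. Averaging the per-fold MMRE and PRED values across folds and repetitions is an assumption, since the slide does not say how the results are aggregated.

```python
import random


def k_fold_indices(n, k, rng):
    """Step 1: randomly split indices 0..n-1 into k subsets (requires n >= k)."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_cross_validation(xs, ys, fit, predict,
                              k=10, repeats=15, level=0.3, seed=0):
    """Return (mean MMRE, mean PRED(level)) over all folds and repetitions."""
    rng = random.Random(seed)
    mmres, preds = [], []
    for _ in range(repeats):                                  # Step 3
        for test_idx in k_fold_indices(len(ys), k, rng):      # Step 2
            held_out = set(test_idx)
            train = [i for i in range(len(ys)) if i not in held_out]
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            errors = [abs(ys[i] - predict(model, xs[i])) / ys[i]
                      for i in test_idx]
            mmres.append(sum(errors) / len(errors))
            preds.append(sum(1 for e in errors if e <= level) / len(errors))
    return sum(mmres) / len(mmres), sum(preds) / len(preds)


# Placeholder model for illustration only: always predict the training mean.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


# Hypothetical data: 20 projects that all took 10 person-months.
mean_mmre, mean_pred = repeated_cross_validation(
    list(range(20)), [10.0] * 20, fit_mean, predict_mean)
```

Fixing `seed` makes the random splits reproducible, which matters when several techniques are compared on the same folds.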
18
Non-cross validation results

COCOMO II dataset (N = 161)
Max. MRE (c)   CMSE            CMAE            CMRE
               MMRE   PRED*    MMRE   PRED*    MMRE   PRED*
+infinity      0.23   0.78     0.22   0.81     0.21   0.78
1.2            0.23   0.78     0.21   0.81     0.21   0.80
1.0            0.23   0.76     0.21   0.77     0.21   0.77
0.8            0.23   0.72     0.22   0.73     0.21   0.75
0.6            0.24   0.68     0.25   0.65     0.23   0.70

COCOMO 81 dataset (N = 63)
Max. MRE (c)   CMSE            CMAE            CMRE
               MMRE   PRED*    MMRE   PRED*    MMRE   PRED*
+infinity      0.30   0.62     0.29   0.65     0.30   0.68
1.2            0.30   0.65     0.28   0.65     0.28   0.62
1.0            0.30   0.62     0.29   0.62     0.29   0.59
0.8            0.30   0.58     0.29   0.60     0.30   0.59

OLS: Max MRE = 1.23, PRED = 0.78
* PRED(0.3)
19
Cross-validation Results
[Figures: cross-validation results for the COCOMO II dataset and the COCOMO 81 dataset]
20
Statistical Significance
Results of statistical significance tests on MMRE (significance level of 0.05 used)
  – Mann-Whitney U hypothesis test

Hypothesis                                                              COCOMO II.2000   COCOMO 81
There are statistical differences among Lasso, Ridge, OLS, Stepwise    p > 0.11         p > 0.15
CMSE outperforms Ridge, OLS                                             p > 0.10         p > 0.10
CMSE outperforms Lasso, Stepwise                                        [garbled in source: "p 0. 05"]
CMAE outperforms Lasso, Ridge, OLS, Stepwise                            p < 10^-3        p < 0.02
CMRE outperforms Lasso, Ridge, OLS, Stepwise                            p < 10^-4        p < 10^-4
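For readers unfamiliar with the test, the sketch below computes the Mann-Whitney U statistics for two samples, using midranks for ties. It stops at the statistic and does not compute a p-value. The function name and sample values are hypothetical and unrelated to the results above.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two samples, using midranks for tied values."""
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[t] = midrank
        i = j + 1
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical MMRE values from repeated runs of two techniques.
technique_a = [0.21, 0.22, 0.20]
technique_b = [0.30, 0.29, 0.28]
u_a, u_b = mann_whitney_u(technique_a, technique_b)
```

Here every value of `technique_a` is below every value of `technique_b`, so `u_a` is 0 and `u_b` equals the number of pairs, the most extreme outcome the statistic can take.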
21
Comparing With Published Results
Some of the best published results for the COCOMO datasets
  – Bayesian analysis (Boehm et al., 2000)
  – Chen et al., 2005
Best cross-validated mean PRED(.30):

Dataset      Bayesian   Chen et al.   CMRE
COCOMO II    70         NA            75
COCOMO 81    NA         51            56
22
Productivity Range
COCOMO II.2000: A = 2.94, B = 0.91
CMRE: A = 2.27, B = 0.98
23
Outline
Introduction
  – Multiple Linear Regression
  – OLS, Stepwise, Lasso, Ridge
Constrained Linear Regression
Validation and Comparison
  – COCOMO overview
  – Cross validation
Conclusions
Limitations
Future Work
24
Conclusions
The technique imposes constraints on the estimates of coefficients and on the magnitude of the error terms
  – Directly resolves the unexpected coefficient estimates determined by the data
  – Estimation accuracies are favorable
CMRE and CMAE outperform OLS, Stepwise, Ridge, Lasso, and CMSE
  – MRE and MAE are favorable objective functions
The technique can be applied not only to COCOMO-like models but also to other linear models
It is an alternative for researchers and practitioners to build models
25
Limitations
Because the technique relies on optimization, a sub-optimal solution may be returned instead of the globally optimal one
Multiple solutions exist for the estimates of coefficients
Only two datasets were investigated; the technique might not work as well on other datasets
26
Future Work
Validate the technique using other datasets (e.g., NASA datasets)
Compare results from the technique with others such as neural networks and genetic programming
Apply and compare with other objective functions
  – MdMRE (median of MRE)
  – Z measure (z = estimate/actual)
27
References
Boehm et al., 2000. B. Boehm, E. Horowitz, R. Madachy, D. Reifer, B. K. Clark, B. Steece, A. W. Brown, S. Chulani, and C. Abts. Software Cost Estimation with COCOMO II. Prentice Hall, 2000.
Chen et al., 2005. Z. Chen, T. Menzies, D. Port, and B. Boehm. Finding the right data for software cost modeling. IEEE Software, Nov 2005.
28
Thank You
Q&A