Qi Li, Qing Wang, Ye Yang and Mingshu Li


1 Reducing Biases in Individual Software Effort Estimations: A Combining Approach
Qi Li, Qing Wang, Ye Yang and Mingshu Li
Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences
COCOMO Forum, October 28, 2008
7/19/2019 COCOMO Forum 2008

2 Agenda
Introduction
Optimal Linear Combining (OLC) Method with an Experimental Study
Lessons learned from the Experiment
Discussion of Possible Threats to Validity
Conclusions and Future Work


4 Introduction
Software effort estimation techniques abound, each with its own advantages and disadvantages, and none has proved to be the single best answer.
Best practice recommends that project managers use at least two approaches, since many factors affect the estimate and alternative approaches may capture different ones.
Techniques for combining forecasts have developed rapidly and are widely used, with considerable success, in practical fields such as weather forecasting, money markets and macroeconomic analysis.
There is a consensus that combining estimates can help integrate the estimating knowledge acquired by the component methods, reduce errors arising from faulty assumptions, bias or mistakes in the data, and improve estimation accuracy.
The Optimal Linear Combining (OLC) method is the most typical linear combining method.


6 OLC Method with an Experimental Study
Granger and Ramanathan first introduced the OLC method; Hashem and Schmeiser extended the idea of OLCs and discussed how to improve the predictive power of the combined model by reducing collinearity.
The OLC method weights each component according to its performance, so it can make full use of the information each component provides to maximize prediction accuracy.
Overview of the OLC method
Step 1: Preparing data and component methods
  Data from the same organization, preferably of the same project type
  “Different” component methods; error correlation analysis
Step 2: OLC modeling
  OLS; four cases of OLC
Step 3: Further improving OLC’s predictive power
  Collinearity
Step 4: Returning the final estimating model

7 [Diagram: Individual Estimates -> OLC Model -> More Accurate Estimating Result]

8 OLC Method with an Experimental Study
Step 1: Preparing data and component methods
Experiment data source: individual estimates from COCOMO, SLIM and Function Points for 15 projects, taken from C. F. Kemerer’s empirical work
Low correlation among the components’ errors can benefit the combining method
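
The error correlation analysis in Step 1 can be sketched roughly as follows. This is a minimal illustration that assumes NumPy is available; the effort figures are hypothetical placeholders (not Kemerer's data), and the function name `error_correlations` is illustrative only.

```python
import numpy as np

# Hypothetical effort figures (person-months); NOT the Kemerer data set.
actual = np.array([120.0, 45.0, 300.0, 80.0, 210.0, 60.0])
estimates = {
    "COCOMO": np.array([150.0, 70.0, 420.0, 95.0, 260.0, 90.0]),
    "SLIM": np.array([140.0, 30.0, 380.0, 60.0, 250.0, 40.0]),
    "FP": np.array([100.0, 50.0, 280.0, 90.0, 180.0, 70.0]),
}


def error_correlations(actual, estimates):
    """Return component names and the correlation matrix of their errors."""
    names = list(estimates)
    # One row of estimation errors (estimate - actual) per component.
    errors = np.vstack([estimates[name] - actual for name in names])
    return names, np.corrcoef(errors)


names, corr = error_correlations(actual, estimates)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"error correlation {names[i]} vs {names[j]}: {corr[i, j]:.3f}")
```

Pairs of components whose errors are weakly correlated are the ones most likely to contribute complementary information to a combination.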

9 OLC Method with an Experimental Study
Step 2: OLC Modeling
Essence: multiple regression analysis using Ordinary Least Squares (OLS), with the components’ estimates as independent variables and actual effort as the dependent variable
Four extended cases of OLC models
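
As a rough sketch of the regression described in Step 2, the combining weights can be obtained with an ordinary least-squares solve. The code below assumes NumPy; the data are hypothetical placeholders, the helper names `fit_olc` and `predict_olc` are illustrative, and only two variants (with and without a constant term) are shown rather than all four cases.

```python
import numpy as np


def fit_olc(component_estimates, actual, with_constant=True):
    """Fit combining weights by OLS: actual ~ [constant +] component estimates."""
    X = np.asarray(component_estimates, dtype=float)
    y = np.asarray(actual, dtype=float)
    if with_constant:
        X = np.column_stack([np.ones(len(y)), X])
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def predict_olc(weights, component_estimates, with_constant=True):
    """Combine component estimates using previously fitted weights."""
    X = np.asarray(component_estimates, dtype=float)
    if with_constant:
        X = np.column_stack([np.ones(X.shape[0]), X])
    return X @ weights


# Hypothetical data: one row per project, columns = two component estimates.
actual = np.array([120.0, 45.0, 300.0, 80.0, 210.0, 60.0])
slim = np.array([140.0, 30.0, 380.0, 60.0, 250.0, 40.0])
fp = np.array([100.0, 50.0, 280.0, 90.0, 180.0, 70.0])
X = np.column_stack([slim, fp])

w_const = fit_olc(X, actual, with_constant=True)
w_plain = fit_olc(X, actual, with_constant=False)
mse_const = np.mean((actual - predict_olc(w_const, X, True)) ** 2)
mse_plain = np.mean((actual - predict_olc(w_plain, X, False)) ** 2)
print("weights with constant:", w_const, "in-sample MSE:", mse_const)
print("weights without constant:", w_plain, "in-sample MSE:", mse_plain)
```

In-sample, such a combination can never have a larger MSE than any single component it includes, because using one component alone is just a special choice of weights; out-of-sample accuracy has to be checked separately.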

10 OLC Method with an Experimental Study
Step 2: OLC Modeling
In-sample MSE comparison
Accuracy comparison after leave-one-out cross-validation (LOOCV)
OLC’s MSE is still larger than that of F (Function Points), so it needs further improvement
MSRE and MMRE have already been improved
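
The leave-one-out evaluation can be sketched as below. This assumes NumPy and the commonly used definitions of the three criteria (MSE as mean squared error, MSRE as mean squared relative error, MMRE as mean magnitude of relative error); the slides do not spell these formulas out, so treat them as an assumption. The data and the function names are hypothetical.

```python
import numpy as np


def accuracy_metrics(actual, predicted):
    """MSE, MSRE and MMRE under the commonly used definitions (assumed here)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    rel = err / actual
    return {
        "MSE": float(np.mean(err ** 2)),
        "MSRE": float(np.mean(rel ** 2)),
        "MMRE": float(np.mean(np.abs(rel))),
    }


def loocv_predictions(component_estimates, actual):
    """Predict each project from an OLS combination fitted on all the others."""
    X = np.asarray(component_estimates, dtype=float)
    y = np.asarray(actual, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # constant term plus components
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # hold out project i
        w, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ w
    return preds


# Hypothetical data: columns are two component estimates per project.
actual = np.array([120.0, 45.0, 300.0, 80.0, 210.0, 60.0])
X = np.column_stack([
    [140.0, 30.0, 380.0, 60.0, 250.0, 40.0],
    [100.0, 50.0, 280.0, 90.0, 180.0, 70.0],
])
cv_metrics = accuracy_metrics(actual, loocv_predictions(X, actual))
print(cv_metrics)
```

Comparing these cross-validated figures with the same metrics computed for each single component gives the kind of accuracy comparison the slide refers to.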

11 OLC Method with an Experimental Study
Step 3: Further Improving Predictive Power (1/4)
Problem: collinearity among the predictor variables reduces the predictive power of the OLC
Solution: a common and simple way to deal with collinearity is to drop a component involved in the strongest collinearity
Two rules of thumb:
  “High R² (the multiple coefficient of determination [45]) but few significant t ratios”. The variables whose coefficients are not significant are involved in collinearity. This rule helps to detect collinearity and to identify all the variables involved in it.
  “High pair-wise correlations among regressors”. If the pair-wise (zero-order) correlation coefficient between two regressors is high (generally above 0.8), collinearity is a serious problem. This rule helps to find the pair involved in the strongest collinearity.
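
The second rule of thumb, checking pair-wise correlations among the regressors, might be sketched like this. The code assumes NumPy; the component estimates are hypothetical numbers chosen so that two of them are nearly proportional, and `strongest_collinear_pair` is an illustrative helper name, not part of any library.

```python
import numpy as np


def strongest_collinear_pair(X, names, threshold=0.8):
    """Return (name_a, name_b, |r|) for the most correlated pair above threshold.

    Returns None when no pair of regressors exceeds the threshold.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    best = None
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = abs(corr[i, j])
            if r > threshold and (best is None or r > best[2]):
                best = (names[i], names[j], float(r))
    return best


# Hypothetical component estimates; the first two columns are nearly proportional.
names = ["COCOMO", "SLIM", "FP"]
X = np.column_stack([
    [100.0, 200.0, 300.0, 400.0, 500.0, 600.0],
    [200.0, 400.0, 600.0, 800.0, 1000.0, 1300.0],
    [300.0, 100.0, 400.0, 100.0, 500.0, 200.0],
])
pair = strongest_collinear_pair(X, names)
print(pair)
```

The helper only identifies the pair; deciding which member of the pair to drop (for example, the one with the poorer individual accuracy) is a separate judgment.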

12 OLC Method with an Experimental Study
Step 3: Further Improving Predictive Power (2/4)
Accuracy comparison after dropping C
Drop the worse component, C (COCOMO)
OLC’s MSE is now smaller than F’s; accuracy has improved

13 OLC Method with an Experimental Study
Step 3: Further Improving Predictive Power (3/4)
OLC after dropping C
OLC after dropping the constant
Coefficients are all significant
Drop the constant
Accuracy comparison after dropping the constant
MSE decreases further; accuracy is improved further

14 OLC Method with an Experimental Study
Step 3: Further Improving Predictive Power (4/4)
OLC (Ⅰ_C+S+F) -> OLC (Ⅰ_S+F) -> OLC (Ⅲ_S+F), applied in succession to maximize OLC’s predictive power
Decreasing trend of MSE, MSRE and MMRE

15 OLC Method with an Experimental Study
Step 4: Returning the Final Estimating Model
Result, compared with the apparently best component, F:
  Accuracy in terms of MSE, MSRE and MMRE is improved by 66.29%, 3.09 times and 61.48% respectively
  Consistency in terms of SD is improved by 96.91%


17 Lessons learned from the Experiment
The improvement in combining accuracy depends on the following factors:
“Degree of redundancy in the information obtained from the components”
  If every component method captures the same information, there is no benefit from combining
“Superiority of the best component method”
  If one method is far superior to the rest and the other methods have no additional knowledge to contribute, the OLC will tend to favor using the best component by itself
“Adequacy of the combination data”
  A small quantity of data might cause severe ill effects from collinearity
“Outliers at different noise levels”
  MSE is often criticized for its high sensitivity to outliers; one way to reduce OLC’s sensitivity to outliers might be to replace OLS with an algorithm that minimizes a less sensitive criterion
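
One possible "less sensitive criterion" is the sum of absolute errors. The sketch below fits combining weights by least absolute deviations using iteratively reweighted least squares; this specific choice is an assumption made for illustration, not a technique named in the presentation. It assumes NumPy, uses hypothetical data with one deliberately planted outlier, and `fit_lad_irls` is an illustrative name.

```python
import numpy as np


def fit_lad_irls(X, y, n_iter=100, eps=1e-6):
    """Least-absolute-deviations weights via iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from the OLS solution
    for _ in range(n_iter):
        resid = np.abs(y - X @ w)
        # Row scale sqrt(1/|r|): squaring the scaled residual gives |r|-type loss.
        scale = 1.0 / np.sqrt(np.maximum(resid, eps))
        w, *_ = np.linalg.lstsq(X * scale[:, None], y * scale, rcond=None)
    return w


# Hypothetical data: actual effort is an equal blend of two components,
# except for one project (index 3) with a large planted outlier.
X = np.column_stack([
    [100.0, 200.0, 300.0, 400.0, 500.0, 600.0],
    [300.0, 100.0, 400.0, 100.0, 500.0, 200.0],
])
y = 0.5 * X[:, 0] + 0.5 * X[:, 1]
y[3] += 300.0

w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
w_lad = fit_lad_irls(X, y)
print("OLS weights:", w_ols)
print("LAD weights:", w_lad)
```

On this toy data the single outlier pulls the OLS weights away from the underlying equal blend, while the absolute-error fit stays close to it.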


19 Discussion of Possible Threats to Validity
Data Quality
  The components’ estimates are old and cannot be compared with those of up-to-date methods
  Significant accuracy improvements in the component methods will lead to further accuracy improvements in combining methods
  Our focus in this paper is not to evaluate component methods, but to show experimentally that combining methods can improve predictive power
Data Quantity
  There is a lack of public data giving individual estimates for the same data set
  Data from only 15 projects may be too small a sample to demonstrate the OLC method’s effectiveness statistically
Statistical Significance
  Commonly used statistical tests, whether parametric (paired t test) or nonparametric (Wilcoxon matched-pairs test), are not appropriate for evaluating the statistical significance of a combining method: because the combined results depend heavily on the components, a significant improvement over the best component cannot always be ensured
  It is not appropriate to require that combined results be statistically significantly better than the best component
Usability of OLC Model
  The method is complex and costly to apply
  We are currently implementing a tool incorporating the most popular and mature cost estimation techniques to address this


21 Conclusion and Future Work
We introduce the systematic combining idea into the field of software effort estimation, and estimate software effort using the Optimal Linear Combining (OLC) method in an experimental study based on a real-life data set
Combining estimates that are derived from different techniques or tools and drawn from different sources of information should become part of mainstream software effort estimating practice, to improve estimating accuracy
Combining estimates is especially useful when you are uncertain about the situation, uncertain about which method is most accurate, or want to avoid large errors
Providing an OLC estimate of the probability distribution of its possible values
Exploring and validating more, and more effective, combining methods using more data sets

22 Thank you!

23 Q & A

