Qi Li, Qing Wang, Ye Yang and Mingshu Li

Presentation transcript:

Reducing Biases in Individual Software Effort Estimations: A Combining Approach. Qi Li, Qing Wang, Ye Yang and Mingshu Li. Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences. COCOMO Forum, October 28, 2008. 12/6/2018 COCOMO Forum 2008

Agenda: Introduction; Optimal Linear Combining (OLC) Method with an Experimental Study; Lessons Learned from the Experiment; Discussion of Possible Threats to Validity; Conclusions and Future Work.


Introduction: Usually, one estimation tool performs well on some projects but much worse on others. Some empirical studies show that when one technique predicts poorly, other techniques tend to perform significantly better. Effort estimation tools and techniques abound, each with its own advantages and disadvantages, and no tool stands out as the silver bullet. Which technique or tool should I use?

Introduction (Cont.): Best practices recommend that project managers use at least two approaches, since many factors affect an estimate and alternative approaches may capture different ones. Combining forecasting techniques have developed rapidly and are widely used, with considerable success, in practical fields such as weather forecasting, money markets and macroeconomic analysis. There is a consensus that combining estimates may help integrate the estimating knowledge acquired by the component methods, reduce errors arising from faulty assumptions, bias or mistakes in data, and improve estimation accuracy.

Introduction (Cont.): Which technique or tool should I use? Candidates: expert method, parametric method, regression-based method, learning-oriented method, dynamics-based method, … A new way to estimate: combine them to generate a more accurate result.


OLC Method with an Experimental Study: The Optimal Linear Combining (OLC) method is the most typical linear combining method. Granger and Ramanathan first introduced the OLC method; Hashem and Schmeiser extended the idea of OLCs and discussed how to improve the predictive power of the combined model by reducing collinearity. The OLC method gives the components different weights according to their performance, and can make full use of the information each component provides to maximize prediction accuracy.
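In symbols, a linear combination of k component estimates can be sketched as below. The notation is chosen here for illustration and does not appear on the slides: E_j is the actual effort of project j, \hat{E}_{i,j} is component i's estimate for that project, and the weights \alpha are those that minimize the squared error over the n projects (the least-squares criterion named in Step 2).

```latex
\hat{E}^{\mathrm{OLC}}_{j} = \alpha_0 + \sum_{i=1}^{k} \alpha_i \, \hat{E}_{i,j},
\qquad
(\alpha_0, \dots, \alpha_k) = \arg\min_{\alpha}
\sum_{j=1}^{n} \Bigl( E_j - \alpha_0 - \sum_{i=1}^{k} \alpha_i \, \hat{E}_{i,j} \Bigr)^{2}
```

Setting \alpha_0 = 0 gives the variant without a constant term, which is the change made later in Step 3.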

OLC Method with an Experimental Study: Overview of the OLC method. Step 1: Preparing data and component methods (data from the same organization and preferably of the same project type; "different" component methods; error correlation analysis). Step 2: OLC modeling (OLS; four cases of OLC). Step 3: Further improving the OLC's predictive power (collinearity). Step 4: Returning the final estimating model.

Individual estimates -> OLC model -> more accurate estimating result.

OLC Method with an Experimental Study, Step 1: Preparing data and component methods. Experiment data source: individual estimates from COCOMO, SLIM and Function Points for 15 projects, taken from F. Kemerer's empirical work. Low correlation among the components benefits the combining method.

OLC Method with an Experimental Study, Step 2: OLC Modeling (1/2). Essence: multiple regression analysis using Ordinary Least Squares (OLS), with the component estimates as independent variables and the actual effort as the dependent variable. Four extended cases of OLC models.
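The regression described above can be sketched with the standard library alone by solving the normal equations. This is an illustrative implementation, not the authors' tool: the function names, the `with_constant` flag and the data are assumptions, and the solver does not guard against a singular system (which is exactly what severe collinearity produces).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_olc(components, actual, with_constant=True):
    """OLS weights for actual ~ [constant +] weighted sum of component estimates.

    components: one list of per-project estimates for each component method.
    Returns [constant, w1, w2, ...] or [w1, w2, ...] when with_constant is False.
    """
    lead = [1.0] if with_constant else []
    rows = [lead + [c[j] for c in components] for j in range(len(actual))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y for r, y in zip(rows, actual)) for a in range(p)]
    return solve(xtx, xty)


def predict(weights, estimates, with_constant=True):
    """Combined estimate for one project from its component estimates."""
    x = ([1.0] if with_constant else []) + list(estimates)
    return sum(w * v for w, v in zip(weights, x))


# Hypothetical data; NOT the data set used in the talk.
slim = [200.0, 380.0, 190.0, 900.0, 240.0, 140.0]
fp = [90.0, 270.0, 100.0, 350.0, 170.0, 40.0]
actual = [100.0, 250.0, 80.0, 400.0, 150.0, 60.0]

weights = fit_olc([slim, fp], actual)
print("constant and weights:", [round(w, 4) for w in weights])
```

Calling `fit_olc(..., with_constant=False)` fits the variant without a constant term.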

OLC Method with an Experimental Study, Step 2: OLC Modeling (2/2). In-sample MSE comparison. Accuracy comparison after leave-one-out cross-validation (LOOCV): the OLC's MSE is still larger than F's, so it needs further improvement; MSRE and MMRE have already improved.
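A sketch of the three accuracy measures and of leave-one-out cross-validation. The slides do not define the measures, so the usual definitions are assumed here: MSE is the mean squared error, MSRE the mean squared relative error, and MMRE the mean magnitude of relative error. To keep the example short, the model refitted in each fold is a two-component combination without a constant; the function names and data are hypothetical.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def msre(actual, predicted):
    """Mean squared relative error (assumed definition)."""
    return sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mmre(actual, predicted):
    """Mean magnitude of relative error."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def fit_two_weights(x1, x2, y):
    """OLS weights for y ~ w1*x1 + w2*x2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12  # near zero when x1 and x2 are collinear
    return (s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def loocv_predictions(x1, x2, y):
    """Predict each project from a model fitted on all the other projects."""
    predictions = []
    for i in range(len(y)):
        keep = [j for j in range(len(y)) if j != i]
        w1, w2 = fit_two_weights(
            [x1[j] for j in keep], [x2[j] for j in keep], [y[j] for j in keep]
        )
        predictions.append(w1 * x1[i] + w2 * x2[i])
    return predictions


# Hypothetical data; NOT the data set used in the talk.
slim = [200.0, 380.0, 190.0, 900.0, 240.0, 140.0]
fp = [90.0, 270.0, 100.0, 350.0, 170.0, 40.0]
actual = [100.0, 250.0, 80.0, 400.0, 150.0, 60.0]

held_out = loocv_predictions(slim, fp, actual)
print("LOOCV MSE :", round(mse(actual, held_out), 2))
print("LOOCV MSRE:", round(msre(actual, held_out), 4))
print("LOOCV MMRE:", round(mmre(actual, held_out), 4))
```

Comparing these held-out figures with the same measures for the best single component is the kind of comparison the slide reports.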

OLC Method with an Experimental Study, Step 3: Further Improving Predictive Power (1/4). The problem that affects the predictive power of the OLC is collinearity among the predictor variables. Solution: a common and simple way to deal with collinearity is to drop a component involved in the strongest collinearity. Two rules of thumb: (1) "High R² (the multiple coefficient of determination [45]) but few significant t ratios". The variables whose coefficients are not significant are involved in collinearity. This rule helps us detect collinearity and identify all the variables involved in it. (2) "High pair-wise correlations among regressors". If the pair-wise (zero-order) correlation coefficient between two regressors is high (generally higher than 0.8), collinearity is a serious problem. This rule helps us find the pair involved in the strongest collinearity.

OLC Method with an Experimental Study, Step 3: Further Improving Predictive Power (2/4). Drop the worse component, C. Accuracy comparison after dropping C: the OLC's MSE is now smaller than F's, so accuracy has improved.

OLC Method with an Experimental Study, Step 3: Further Improving Predictive Power (3/4). OLC after dropping C; OLC after dropping the constant. Coefficients are all significant. Drop the constant. Accuracy comparison after dropping the constant: MSE decreases further and accuracy improves further.

OLC Method with an Experimental Study, Step 3: Further Improving Predictive Power (4/4). OLC (Ⅰ_C+S+F) -> OLC (Ⅰ_S+F) -> OLC (Ⅲ_S+F), applied in succession to maximize the OLC's predictive power. Decreasing trend of MSE, MSRE and MMRE.

OLC Method with an Experimental Study, Step 4: Returning the Final Estimating Model. Result, compared with the apparently best component F: accuracy in the sense of MSE, MSRE and MMRE is improved by 66.29%, 3.09 times and 61.48% respectively; consistency in the sense of SD is improved by 96.91%.


Lessons Learned from the Experiment: The improvement in combining accuracy depends on the following factors. "Degree of redundancy in the information obtained from the components": if every component method captures the same information, there is no benefit from combining. "Superiority of the best component method": if one method is far superior to the rest and the other methods have no additional knowledge to contribute, the OLC will tend to favor using the best component by itself.

Lessons Learned from the Experiment (Cont.): "Adequacy of the combination data": a small quantity of data might cause severe ill effects of collinearity. "Outliers at different noise levels": MSE is often blamed for its high sensitivity to outliers; one way to reduce the OLC's sensitivity to outliers might be to replace OLS with an algorithm that minimizes a less sensitive criterion, such as MMRE.
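To illustrate what minimizing a criterion other than squared error could look like, here is a deliberately crude sketch: a grid search for the weight w in the combination w*x1 + (1 - w)*x2 that gives the lowest MMRE. Restricting the weights to sum to one, the grid search itself and all names and numbers are simplifying assumptions made here; the slides do not say which algorithm the authors have in mind.

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def best_convex_weight(x1, x2, actual, steps=1000):
    """Grid-search w in [0, 1] minimizing the MMRE of w*x1 + (1 - w)*x2."""
    best_w, best_score = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        combined = [w * a + (1 - w) * b for a, b in zip(x1, x2)]
        score = mmre(actual, combined)
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score


# Hypothetical data; NOT the data set used in the talk.
slim = [200.0, 380.0, 190.0, 900.0, 240.0, 140.0]
fp = [90.0, 270.0, 100.0, 350.0, 170.0, 40.0]
actual = [100.0, 250.0, 80.0, 400.0, 150.0, 60.0]

w, score = best_convex_weight(slim, fp, actual)
print(f"weight on SLIM: {w:.3f}, weight on FP: {1 - w:.3f}, MMRE: {score:.4f}")
```

Because MMRE averages absolute relative errors rather than squared absolute ones, a single very large project pulls the fitted weights around less than it does under OLS.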


Discussion of Possible Threats to Validity: Data quality. The component estimates are old and cannot be compared with up-to-date methods. A significant accuracy improvement in the component methods will result in a further accuracy improvement in the combining methods. Our focus in this paper is not to evaluate the component methods but to show experimentally that combining methods can improve predictive power. Data quantity. Public data containing individual estimates for the same data set are lacking. Data from only 15 projects may be too small a sample to demonstrate the OLC method's effectiveness statistically.

Discussion of Possible Threats to Validity (Cont.): Statistical significance. Commonly used statistical tests, whether parametric (paired t test) or nonparametric (Wilcoxon matched-pairs test), are not appropriate for evaluating a combining method's statistical significance: the combined results depend heavily on the components, so a significant improvement over the best component cannot always be ensured. It is therefore not appropriate to require that the combined results be statistically significantly better than the best component. Usability of the OLC model. The model is complex and costly to apply. We are currently implementing a tool that incorporates the most popular and mature cost estimation techniques with the same inputs to solve this problem.


Conclusion: We introduce the systematic combining idea into the field of software effort estimation and estimate software effort using the Optimal Linear Combining (OLC) method, with an experimental study based on a real-life data set. Combining estimates that are derived from different techniques or tools and drawn from different sources of information should become part of mainstream software effort estimating practice in order to improve estimating accuracy. Combining estimates is especially useful when you are uncertain about the situation, uncertain about which method is the most accurate, or want to avoid large errors.

Future Work: Providing, along with an OLC estimate, the probability distribution of its possible values. Exploring and validating more effective combining methods using more data sets.

Thank you!

Q & A