Qi Li, Qing Wang, Ye Yang and Mingshu Li

Presentation transcript:

Reducing Biases in Individual Software Effort Estimations: A Combining Approach
Qi Li, Qing Wang, Ye Yang and Mingshu Li
Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences
COCOMO Forum, October 28, 2008
7/19/2019 COCOMO Forum 2008

Agenda
Introduction
Optimal Linear Combining (OLC) Method with an Experimental Study
Lessons Learned from the Experiment
Discussion of Possible Threats to Validity
Conclusions and Future Work


Introduction
Software effort estimation techniques abound, each with its own advantages and disadvantages, and none has proved to be the single best answer.
Best practice recommends that project managers use at least two approaches, since many factors affect an estimate and alternative approaches may capture different ones.
Forecast-combining techniques have developed rapidly and are widely used, with considerable success, in practical fields such as weather forecasting, money markets, and macroeconomic analysis.
There is broad agreement that combining estimates can help integrate the estimating knowledge acquired by the component methods, reduce errors arising from faulty assumptions, bias, or mistakes in the data, and improve estimation accuracy.
Optimal Linear Combining (OLC) is the most typical linear combining method.


OLC Method with an Experimental Study
Granger and Ramanathan first introduced the OLC method. Hashem and Schmeiser extended the idea of OLCs and discussed how to improve the predictive power of the combined model by reducing collinearity.
The OLC method weights each component according to its performance, so it can make full use of the information each component provides to maximize prediction accuracy.
Overview of the OLC method:
Step 1: Preparing data and component methods. Data from the same organization and preferably the same project type; "different" component methods; error correlation analysis.
Step 2: OLC modeling. OLS; four cases of OLC.
Step 3: Further improving the OLC's predictive power. Collinearity.
Step 4: Returning the final estimating model.

Individual Estimates -> OLC Model -> More Accurate Estimating Result

OLC Method with an Experimental Study
Step 1: Preparing data and component methods
Experiment data source: individual estimates from COCOMO, SLIM, and Function Points for 15 projects, taken from F. Kemerer's empirical work.
Low correlation between components can benefit the combining method.
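Why low correlation helps can be seen in a small worked case. The derivation below is only an illustrative sketch, not taken from the slides: it assumes two unbiased component estimates whose errors $e_1$ and $e_2$ have variances $\sigma_1^2$ and $\sigma_2^2$ and correlation $\rho$, combined with weights $w$ and $1-w$ that sum to one (an assumption made here for simplicity).

```latex
% Error of the combined estimate: e_c = w e_1 + (1 - w) e_2
\operatorname{Var}(e_c) = w^2 \sigma_1^2 + (1-w)^2 \sigma_2^2 + 2\,w\,(1-w)\,\rho\,\sigma_1 \sigma_2

% Special case: equal variances \sigma_1 = \sigma_2 = \sigma and equal weights w = 1/2
\operatorname{Var}(e_c) = \frac{\sigma^2}{4} + \frac{\sigma^2}{4} + \frac{\rho\,\sigma^2}{2}
                        = \frac{\sigma^2\,(1+\rho)}{2}
```

Under these assumptions, perfectly correlated errors ($\rho = 1$) give no reduction in error variance, while uncorrelated errors ($\rho = 0$) halve it.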

OLC Method with an Experimental Study
Step 2: OLC Modeling
Essence: multiple regression analysis using Ordinary Least Squares (OLS), with the components' estimates as the independent variables and actual effort as the dependent variable.
Four extended cases of OLC models.
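The regression described above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the authors' implementation: the function names `solve_linear_system`, `fit_olc`, and `predict_olc` are hypothetical, the numbers are made up, and only two variants are shown (unconstrained weights with and without a constant term), not all four cases mentioned on the slide.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not mutated.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_olc(estimates, actual, with_constant=True):
    """Fit OLS combining weights via the normal equations.

    estimates: one tuple of component estimates per project.
    actual: actual effort per project.
    If with_constant is True, the first returned weight is the intercept.
    """
    rows = [([1.0] if with_constant else []) + list(e) for e in estimates]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, actual)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_olc(weights, estimate, with_constant=True):
    """Combine one project's component estimates using fitted weights."""
    features = ([1.0] if with_constant else []) + list(estimate)
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical data: two component estimates and the actual effort per project.
component_estimates = [(120.0, 90.0), (300.0, 260.0), (80.0, 95.0), (410.0, 350.0), (150.0, 170.0)]
actual_effort = [100.0, 280.0, 85.0, 390.0, 160.0]
weights = fit_olc(component_estimates, actual_effort, with_constant=True)
print(weights)
```

Solving the normal equations directly keeps the sketch dependency-free; with strongly collinear components a numerically more stable least-squares routine would be a safer choice.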

OLC Method with an Experimental Study
Step 2: OLC Modeling
In-sample MSE comparison.
Accuracy comparison after leave-one-out cross-validation (LOOCV).
The OLC's MSE is still larger than F's, so it needs further improvement.
MSRE and MMRE have already improved.
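The evaluation procedure named above can be sketched as follows. The slides do not define the metrics, so this sketch assumes the usual definitions: MSE as the mean squared error, MSRE as the mean squared relative error, and MMRE as the mean magnitude of relative error. The helper names and the one-component, no-constant model used for the demonstration (`fit_scale`, `predict_scale`) are hypothetical stand-ins for whichever combining model is being validated.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def msre(actual, predicted):
    """Mean squared relative error (assumed definition)."""
    return sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mmre(actual, predicted):
    """Mean magnitude of relative error (assumed definition)."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def leave_one_out(xs, ys, fit, predict):
    """Predict each project from a model fitted on all the other projects."""
    predictions = []
    for i in range(len(ys)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, xs[i]))
    return predictions


def fit_scale(xs, ys):
    """OLS for a single component without a constant: w = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def predict_scale(weight, x):
    return weight * x


# Hypothetical single-component estimates and actual efforts.
estimates = [120.0, 300.0, 80.0, 410.0, 150.0]
actuals = [100.0, 280.0, 85.0, 390.0, 160.0]
held_out = leave_one_out(estimates, actuals, fit_scale, predict_scale)
print(mse(actuals, held_out), msre(actuals, held_out), mmre(actuals, held_out))
```

Because each prediction comes from a model that never saw the held-out project, these figures are generally less optimistic than the in-sample ones, which matters with a sample as small as 15 projects.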

OLC Method with an Experimental Study
Step 3: Further Improving Predictive Power (1/4)
The problem that affects the OLC's predictive power is collinearity among the predictor variables.
Solution: a common and simple way to deal with collinearity is to drop a component involved in the strongest collinearity.
Two rules:
"High R² (the multiple coefficient of determination [45]) but few significant t ratios". The variables whose coefficients are not significant are involved in collinearity. This rule of thumb helps us detect collinearity and identify all the variables involved in it.
"High pair-wise correlations among regressors". If the pair-wise or zero-order correlation coefficient between two regressors is high (generally higher than 0.8), then collinearity is a serious problem. This rule of thumb helps us find the pair involved in the strongest collinearity.
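The second rule of thumb lends itself to a short check. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the data are invented, and the 0.8 threshold is the general guideline quoted above. It computes the Pearson correlation for every pair of component estimates and reports the pair with the largest absolute value; deciding which member of that pair to drop is left to the analyst.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson (zero-order) correlation coefficient between two series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def strongest_pair(columns):
    """Return (name_a, name_b, r) for the pair with the largest |r|.

    columns: dict mapping a component name to its list of estimates.
    """
    best = None
    for name_a, name_b in combinations(columns, 2):
        r = pearson(columns[name_a], columns[name_b])
        if best is None or abs(r) > abs(best[2]):
            best = (name_a, name_b, r)
    return best


# Hypothetical component estimates for five projects.
components = {
    "C": [120.0, 300.0, 80.0, 410.0, 150.0],
    "S": [130.0, 310.0, 70.0, 430.0, 140.0],
    "F": [90.0, 260.0, 95.0, 350.0, 170.0],
}
name_a, name_b, r = strongest_pair(components)
THRESHOLD = 0.8  # guideline value from the rule of thumb
if abs(r) > THRESHOLD:
    print(f"Strongest collinearity: {name_a} and {name_b} (r = {r:.3f})")
```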

OLC Method with an Experimental Study
Step 3: Further Improving Predictive Power (2/4)
Accuracy comparison after dropping C.
Drop the worse component, C.
The OLC's MSE is now smaller than F's; accuracy has improved.

OLC Method with an Experimental Study
Step 3: Further Improving Predictive Power (3/4)
OLC after dropping C.
OLC after dropping the constant.
Coefficients are all significant.
Drop the constant.
Accuracy comparison after dropping the constant: MSE decreases further, so accuracy improves further.

OLC Method with an Experimental Study
Step 3: Further Improving Predictive Power (4/4)
OLC (Ⅰ_C+S+F) -> OLC (Ⅰ_S+F) -> OLC (Ⅲ_S+F), applied in succession to maximize the OLC's predictive power.
Decreasing trend of MSE, MSRE, and MMRE.

OLC Method with an Experimental Study
Step 4: Returning the Final Estimating Model
Result, compared with the apparently best component, F:
Accuracy in the sense of MSE, MSRE, and MMRE is improved by 66.29%, 3.09 times, and 61.48% respectively.
Consistency in the sense of SD is improved by 96.91%.


Lessons Learned from the Experiment
The improvement in combining accuracy depends on the following factors:
"Degree of redundancy in the information obtained from the components": if every component method captures the same information, there is no benefit from combining.
"Superiority of the best component method": if one method performs far better than the rest, and the other methods have no additional knowledge to contribute, the OLC will tend to favor using the best component by itself.
"Adequacy of the combination data": a small quantity of data might cause severe ill effects of collinearity.
"Outliers at different noise levels": MSE is often blamed for its high sensitivity to outliers; one way to reduce the OLC's sensitivity to outliers might be to replace OLS with an algorithm that minimizes a less sensitive criterion.


Discussion of Possible Threats to Validity
Data quality
The components' estimates are old and cannot be compared with up-to-date methods.
Significant accuracy improvement in the component methods will result in further accuracy improvement in the combining methods.
Our focus in this paper is not to evaluate the component methods but to show experimentally that combining methods can improve predictive power.
Data quantity
Public data with individual estimates for the same data set are lacking.
Data from only 15 projects may be too small a sample, statistically, to demonstrate the OLC method's effectiveness.
Statistical significance
Commonly used statistical tests, whether parametric (paired t test) or nonparametric (Wilcoxon matched-pairs test), are not appropriate for evaluating a combining method's statistical significance: because the combined results depend heavily on the components, a significant improvement over the best component cannot always be ensured.
It is therefore not appropriate to require that the combined results be statistically significantly better than the best component.
Usability of the OLC model
The model is complex and costly to apply.
We are currently implementing a tool that incorporates the most popular and mature cost estimation techniques to address this.


Conclusion and Future Work
We introduce the systematic combining idea into the field of software effort estimation and estimate software effort using the Optimal Linear Combining (OLC) method, with an experimental study based on a real-life data set.
Combining estimates that are derived from different techniques or tools and draw on different sources of information should become part of mainstream software effort estimating practice, to improve estimating accuracy.
Combining estimates is especially useful when you are uncertain about the situation, uncertain about which method is most accurate, or want to avoid large errors.
Future work:
Providing an OLC estimate of the probability distribution of its possible values.
Exploring and validating more effective combining methods using more data sets.

Thank you!

Q & A