Understanding the Human Estimator
Gary D. Boetticher, Nazim Lokhandwala, James C. Helm
Univ. of Houston - Clear Lake, Houston, TX, USA
2nd International Predictor Models in Software Engineering (PROMISE) Workshop
Introduction: Chaos Chronicles [Standish03]: 300 billion dollars; 250,000 new projects; 1.2 million dollars per project
Boehm’s 4X
Types of Estimation [Jorgenson04] [chart: Human-Based vs. Algorithmic and Machine Learners, as percentages]
Research Focus: Number of Papers on Software Estimation in IEEE [Jorgenson02]: Human-Based Estimation (17%); Other (83%)
Statement of Problem: How do human demographics affect human-based estimation? Can predictive models be constructed using human demographics?
Investigation Procedure: Collect demographics from participants; Ask participants to estimate software components; Build models (Estimates vs. Actuals); Survey
Which Demographics? Basic Demographics; Academic Background; Work Experience; Domain Experience
The Survey
Competitive Procurement Software [diagram: Buyer Software (Buyer Admin, Buyer 1 ... Buyer n); Distribution Server; Supplier Software (Supplier 1, Supplier 2, ... Supplier n)]
Sample Estimation Screenshots
Survey Results Screenshots
Data Collection: Invitations; Filtered Incomplete Records; 122 Final Records
Participant Educational Background: Most of the participants hold Bachelor's or Master's degrees
[table: columns Mean, Maximum, Standard Deviation; rows Undergrad Courses and Grad Courses for each of Computer Science, Hardware, Management Information Systems, Project Management, Software Engineering]
Participant Work Experience
[table: columns Mean, Maximum, Standard Deviation (Years); rows Years of Experience As (Hardware Project Manager, Software Project Manager) and No. of Projects Estimated (Hardware Projects, Software Projects)]
Participant Domain Experience
[table: columns Mean (Years), Maximum (Years), Standard Deviation; rows Process Industry, Procurement and Billing]
Data Preparation
INPUT: 69% zeros … needs consolidation (Courses, Workshops, Conferences, Programming Exp.); 45 attributes reduced to 14 attributes; Highest Degree Achieved … needs transformation
OUTPUT: MRE = Abs(Total Actual - Total Est.) / (Total Actual)
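The MRE output defined on this slide can be sketched in a few lines of Python. The function name `mre` and the sample totals are assumptions for illustration, not values from the survey.

```python
def mre(total_actual: float, total_estimate: float) -> float:
    """Magnitude of relative error: |actual - estimate| / actual."""
    if total_actual == 0:
        raise ValueError("MRE is undefined when the total actual is zero")
    return abs(total_actual - total_estimate) / total_actual


# Hypothetical totals, for illustration only (not survey data).
print(mre(total_actual=200.0, total_estimate=150.0))  # 0.25
```

An MRE of 0 means the participant's total estimate matched the total actual; larger values mean a larger relative miss in either direction.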
Build Models: Linear Regression (Excel); Non-Linear Regression (DataFit); Genetic Programming (GDB_GP)
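As a rough illustration of the simplest of the three model types, the sketch below fits an ordinary least-squares line and reports R squared, the statistic used on the results slides. It is not the Excel workflow used in the study: the single predictor, the function names `fit_line` and `r_squared`, and the sample numbers are all assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(ys, predictions):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = mean(ys)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    if ss_total == 0:
        raise ValueError("y values must not all be identical")
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    return 1.0 - ss_residual / ss_total


# Hypothetical data: one demographic attribute vs. MRE (not survey data).
years_experience = [1, 3, 5, 8, 10]
observed_mre = [0.62, 0.55, 0.41, 0.37, 0.20]

slope, intercept = fit_line(years_experience, observed_mre)
predicted = [slope * x + intercept for x in years_experience]
print(slope, intercept, r_squared(observed_mre, predicted))
```

The study's models use up to 14 attributes at once; a multi-attribute fit would replace `fit_line` with a least-squares solve over a matrix of attributes, while `r_squared` stays the same.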
GP Configuration: 3 settings, 20 trials each: 1000 chromosomes, 50 generations; 512 chromosomes, 128 generations; 1000 chromosomes, 128 generations
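The three settings can be written down as data, which also gives a rough sense of the compute budget each one implies. This is a sketch only: the class name `GPSetting` is hypothetical, and the evaluation count assumes every chromosome is evaluated once per generation, which the slides do not state for GDB_GP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPSetting:
    chromosomes: int
    generations: int
    trials: int = 20  # "20 trials each" on the slide

    def max_fitness_evaluations(self) -> int:
        # Assumption: one fitness evaluation per chromosome per generation.
        return self.chromosomes * self.generations * self.trials


SETTINGS = [GPSetting(1000, 50), GPSetting(512, 128), GPSetting(1000, 128)]

for setting in SETTINGS:
    print(setting, setting.max_fitness_evaluations())
```

Under that assumption the three settings come to 1,000,000, 1,310,720 and 2,560,000 evaluations respectively across their 20 trials.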
Results: All Demographic Factors
[table: Best Values of R Squared with Min. Std. Error; columns Linear Regression, Non-Linear Regression, Genetic Programming; rows R Squared, Std. Error]
[table: T-Test between Average R Square Values; columns Linear Regression, Non-Linear Regression, Genetic Programming; rows Mean, T-test; T-test entries as extracted: "1.87E", "E-17"]
Results: Educational Factors
[table: Best Values of R Squared with Min. Std. Error; columns Linear Regression, Non-Linear Regression, Genetic Programming; rows R Squared, Std. Error]
[table: T-Test between Average R Square Values; columns Linear Regression, Non-Linear Regression, Genetic Programming; rows Mean, T-test; T-test entry as extracted: "E-13"]
Results: Work Experience
[table: Best Values of R Squared with Min. Std. Error; columns Linear Regression, Non-Linear Regression, Genetic Programming; rows R Squared, Std. Error]
[table: T-Test between Average R Square Values; columns Linear Regression, Non-Linear Regression, Genetic Programming; rows Mean, T-test; T-test entries as extracted: "1.54E", "E-19"]
Results: Domain Experience
[table: Best Values of R Squared with Min. Std. Error; columns Linear Regression, Non-Linear Regression, Genetic Programming; rows R Squared, Std. Error]
[table: T-Test between Average R Square Values; columns Linear Regression, Non-Linear Regression, Genetic Programming; rows Mean, T-test; T-test entries as extracted: "4.55E", "E-23"]
Summary of All Experiments: R Square Values
[table: columns Linear Regression, Non-Linear Regression, Avg. Case Genetic Prog., Best Case Genetic Prog.; rows All Factors, Education Only, Work Experience Only, Domain Experience Only]
Best Equation: All Factors. r^2 = ((Log (TechGradCourses + (TechGradCourses ^ ((Log TotWShops)/(Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (Log (Log (Log SWProjEstExp))))))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (((ProcIndExp + (Log (Sin MgmtGradCourses)))/(Sin SWPMExp)) + (Sin ((Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Sin SWPMExp)))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Cos (TechGradCourses ^ ((Log SWProjEstExp) / (((Log (ProcIndExp + (Log (TechGradCourses ^ ((Log SWProjEstExp) / (Log SWProjEstExp)))))) - 3) / (ProcIndExp + (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos ((((Log SWProjEstExp) / ((ProcIndExp + (Log (TechGradCourses ^ (TechGradCourses ^ (Log SWProjEstExp))))) / (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (Log (Log (Log SWProjEstExp)))))))))))))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / (TechGradCourses ^ (Log SWProjEstExp))))))))))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))) + ((Log SWProjEstExp) / (Log SWProjEstExp)))))) / (Log (Log (Log (TechGradCourses + (Cos (Log (Log (TechGradCourses ^ (Cos (((((Log SWProjEstExp) / (TechGradCourses ^ (Log SWProjEstExp))) / ((ProcIndExp + (Log (Sin MgmtGradCourses))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp))))))))))))))))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Log ((((Log TotLangExp) / (Log SWProjEstExp)) / (Log SWProjEstExp)) / (Sin SWPMExp))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))))))) + (((((ProcIndExp + (Log (TechGradCourses ^ (Log (TechGradCourses + ((TechGradCourses ^ (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos ((((Log SWProjEstExp) / ((ProcIndExp + (Log (TechGradCourses ^ (Log (TechGradCourses + (Cos (Log (Log (TechGradCourses ^ (Cos (((((Log SWProjEstExp) / (TechGradCourses ^ (Log SWProjEstExp))) / ((ProcIndExp + (Log (Sin MgmtGradCourses))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / (TechGradCourses ^ (Log SWProjEstExp))))))) / (Sin SWPMExp))))))) / (TechGradCourses ^ (Log SWProjEstExp))) / (TechGradCourses ^ (Log SWProjEstExp))) / (TechGradCourses ^ (Log SWProjEstExp))) / (Sin SWPMExp)))
Too Much of a Good Thing?
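One way to put a number on "too much" is to measure an evolved expression's nesting depth and how often each function and attribute appears. The sketch below does this for text written in the slide's parenthesized notation. The function name `describe` is hypothetical, and the sample input is a short fragment of the equation above, not the whole expression.

```python
import re
from collections import Counter

# Function names that appear in the evolved equation on the slide.
FUNCTIONS = {"Log", "Sin", "Cos"}


def describe(expression: str) -> dict:
    """Report maximum parenthesis depth plus function and variable counts."""
    depth = max_depth = 0
    for ch in expression:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
    names = re.findall(r"[A-Za-z_]\w*", expression)
    return {
        "max_depth": max_depth,
        "functions": Counter(n for n in names if n in FUNCTIONS),
        "variables": Counter(n for n in names if n not in FUNCTIONS),
    }


fragment = "(TechGradCourses ^ (Cos (Log (Log (Log SWProjEstExp)))))"
print(describe(fragment))
```

Run over the full equation, the variable counts would show which of the 14 attributes the best model actually uses and how heavily it leans on each.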
Conclusions: Viability of a human-based est. model; Model assessment: Non-linear, GP; Impact on Human-Based Estimation: 1) All Factors, 2) Domain Experience, Work Experience, 3) Education
Future Directions: Equation Optimizer for GP; Collect More Data; Further analysis without consolidation; Detailed Effect of Educational Factors; Use other statistical indicators; Build other models: Hybrid (Non-linear and GP), Classifiers; Impact of process on estimation
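To make the "Equation Optimizer for GP" item concrete, here is a minimal sketch of algebraic simplification on an expression tree. It applies only two rewrites, x / x -> 1 and x ^ 1 -> x. The tuple representation and the function name `simplify` are assumptions for illustration, not part of GDB_GP. The x / x rewrite is valid only where x is defined and nonzero, and a GP system with protected division may behave differently.

```python
def simplify(node):
    """Recursively apply two rewrites: x / x -> 1 and x ^ 1 -> x.

    A node is either a leaf (variable name or number) or a tuple
    (operator, operand, ...). Assumes x / x is evaluated where x != 0.
    """
    if not isinstance(node, tuple):
        return node  # leaf: variable name or numeric constant
    op, *args = node
    args = [simplify(a) for a in args]
    if op == "/" and args[0] == args[1]:
        return 1
    if op == "^" and args[1] == 1:
        return args[0]
    return (op, *args)


# Fragment of the evolved equation:
# (ProcIndExp + (Log (TechGradCourses ^ ((Log SWProjEstExp) / (Log SWProjEstExp)))))
log_est = ("Log", "SWProjEstExp")
fragment = ("+", "ProcIndExp", ("Log", ("^", "TechGradCourses", ("/", log_est, log_est))))

print(simplify(fragment))
# ('+', 'ProcIndExp', ('Log', 'TechGradCourses'))
```

The subexpression (Log SWProjEstExp) / (Log SWProjEstExp) occurs several times in the best equation, so even this small rule set would shorten it noticeably.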
Questions?
Thank You!