Understanding the Human Estimator
  Gary D. Boetticher, Boetticher@uhcl.edu, Univ. of Houston - Clear Lake, Houston, TX, USA
  Nazim Lokhandwala, Lokhandwala@uhcl.edu, Univ. of Houston - Clear Lake, Houston, TX, USA
  James C. Helm, Helm@uhcl.edu, Univ. of Houston - Clear Lake, Houston, TX, USA
  http://nas.cl.uh.edu/boetticher/publications.html
  The 2nd International Predictor Models in Software Engineering (PROMISE) Workshop
Introduction
  Chaos Chronicles [Standish03]
  - 300 billion dollars
  - 250,000 new projects
  - 1.2 million dollars per project
Boehm’s 4X
  Boehm’s Software Engineering Economics.
Types of Estimation [Jorgensen04]
  - 63 - 86% Human-Based
  - 7 - 16% Algorithmic and Machine Learners
  Notes: Jorgensen summarized and compared various surveys. McAuley’s survey gave the above numbers. Heemstra: Human-Based 62%, Algorithmic 14%, Other 9%. Wydenbach: Human-Based 86%, Algorithmic 26%, Other 11%.
  [Jorgensen04] Jorgensen M., “A review of studies on Expert Estimation of Software Development Effort”, Journal of Systems and Software, 2004.
Research Focus
  Number of Papers on Software Estimation in IEEE [Jorgensen02]
  - Human-Based Estimation (17%)
  - Other (83%)
  Based on a search of estimation papers in the journals IEEE Transactions on Software Engineering, Journal of Systems and Software, Information and Software Technology, and Empirical Software Engineering.
Statement of Problem
  - How do human demographics affect human-based estimation?
  - Can predictive models be constructed using human demographics?
Investigation Procedure
  - Collect demographics from participants
  - Request participants to estimate software components
  - Build models (Estimates vs. Actuals)
  Survey
Which Demographics?
  - Basic Demographics
  - Academic Background
  - Work Experience
  - Domain Experience
The Survey
  http://nas.cl.uh.edu/boetticher/EffortEstimationSurvey.html
Competitive Procurement
  [Diagram: Software Supplier, Software Buyer, and Software Distribution Server; Supplier1, Supplier2, ..., Suppliern; Buyer Admin; Buyer1, ..., Buyern]
Sample Estimation Screenshots
Survey Results Screenshots
Data Collection
  - Invitations
  - Filtered Incomplete Records
  - 122 Final Records
Participant Educational Background
  Subject                          Level               Mean     Maximum   Standard Deviation
  Computer Science                 Undergrad Courses   8.8525   70        11.6326
  Computer Science                 Grad Courses        2.4262   15        3.2293
  Hardware                         Undergrad Courses   3.5246   64        8.0209
  Hardware                         Grad Courses        0.5000   10        1.3252
  Management Information Systems   Undergrad Courses   0.7705   12        1.5892
  Management Information Systems   Grad Courses        0.4918   9         1.3742
  Project Management               Undergrad Courses   0.2951   4         0.6886
  Project Management               Grad Courses        0.8115   6         1.1806
  Software Engineering             Undergrad Courses   0.9180   7         1.2958
  Software Engineering             Grad Courses        2.1557   21        3.1202
  Most of the participants hold Bachelor's or Master's degrees.
Participant Work Experience
                                   Mean     Maximum   Standard Deviation (Years)
  Years of Experience As
    Hardware Project Manager       0.6557   15        1.9251
    Software Project Manager       1.3443   10        2.0811
  No. of Projects Estimated
    Hardware Projects              0.8279   20        2.6307
    Software Projects              2.9508   28        4.4848
Participant Domain Experience
  Domain Experience         Mean     Maximum (Years)   Standard Deviation
  Process Industry          0.7274   20                2.2512
  Procurement and Billing   0.6209   10                1.3818
Data Preparation
  INPUT=
  - 69% zeros … needs consolidation
  - Courses, Workshops, Conferences, Programming Exp.
  - 45 attributes reduced to 14 attributes
  - Highest Degree Achieved … needs transformation
  OUTPUT=
  - MRE = Abs(Total Actual - Total Est.) / (Total Actual)
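The OUTPUT measure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function names `mre` and `consolidate`, the attribute names, and all numbers are hypothetical, and summing related sparse attributes is only one assumed way to consolidate mostly-zero inputs.

```python
def mre(total_actual, total_estimate):
    """MRE = Abs(Total Actual - Total Est.) / Total Actual."""
    if total_actual == 0:
        raise ValueError("Total Actual must be non-zero")
    return abs(total_actual - total_estimate) / total_actual


def consolidate(record, groups):
    """Sum related attributes into one value per group (assumed strategy)."""
    return {
        name: sum(record.get(attr, 0) for attr in attrs)
        for name, attrs in groups.items()
    }


# Hypothetical participant record; names and values are illustrative only.
record = {"cs_grad": 2, "hw_grad": 0, "se_grad": 3, "mis_grad": 0, "pm_grad": 1}
groups = {
    "tech_grad_courses": ["cs_grad", "hw_grad", "se_grad"],
    "mgmt_grad_courses": ["mis_grad", "pm_grad"],
}

print(consolidate(record, groups))
print(mre(total_actual=100.0, total_estimate=80.0))
```

An MRE of 0 means the total estimate matched the total actual; larger values mean a worse estimate, regardless of whether it was high or low.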
Build Models
  - Linear Regression (Excel)
  - Non-Linear Regression (DataFit)
  - Genetic Programming (GDB_GP)
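For readers who want to see what the R Squared figures on the results slides measure, here is a minimal single-predictor least-squares sketch in plain Python. It is not the Excel, DataFit, or GDB_GP workflow named above; the function names and the sample data (years of experience against MRE) are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """1 - (residual sum of squares / total sum of squares)."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("response values must not all be equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical data: years of experience (x) against MRE (y).
years = [0.0, 1.0, 2.0, 5.0, 10.0]
errors = [4.0, 3.5, 3.7, 2.0, 1.0]

slope, intercept = fit_line(years, errors)
print(slope, intercept, r_squared(years, errors, slope, intercept))
```

An R Squared near 1 means the fitted model explains most of the variation in the response; a value near 0 means it explains almost none.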
GP Configuration
  - 3 Settings
  - 1000 Chromosomes
  - 50 Generations
  - 20 Trials each
Results: All Demographic Factors
  Best Values of R Squared with Min. Std. Error
                 Linear Regression   Genetic Programming   Non-Linear Regression
    R Squared    0.1550              0.9174                0.8847
    Std. Error   4.4580              1.3875                1.6470
  T-Test between Average R Square Values
                 Linear Regression   Genetic Programming   Non-Linear Regression
    Mean         0.1550              0.5592                0.8847
    T-test       3.45E-17, 1.87E-15
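The slides do not say which t-test variant produced these values. As one possible reading, the sketch below computes a one-sample t statistic that compares a set of repeated-trial R Squared values against a single reference value (for example, the R Squared of a deterministic model). The choice of test, the function name, and the numbers are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, reference):
    """Return (t statistic, degrees of freedom) for mean(values) vs. reference."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        raise ValueError("values must not all be equal")
    t = (mean(values) - reference) / (s / sqrt(n))
    return t, n - 1


# Hypothetical R Squared values from repeated trials; not the study's data.
trial_r2 = [0.50, 0.60, 0.70]
t_stat, dof = one_sample_t(trial_r2, reference=0.50)
print(t_stat, dof)
```

Turning the t statistic into a p-value requires the Student t distribution, which the Python standard library does not provide, so a statistics package would be needed for that last step.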
Results: Educational Factors
  Best Values of R Squared with Min. Std. Error
                 Linear Regression   Genetic Programming   Non-Linear Regression
    R Squared    0.0373              0.2784                0.2136
    Std. Error   4.6101              3.9738                4.1667
  T-Test between Average R Square Values
                 Linear Regression   Genetic Programming   Non-Linear Regression
    Mean         0.0373              0.1973                0.2136
    T-test       2.74E-13, 0.0486
Results: Work Experience
  Best Values of R Squared with Min. Std. Error
                 Linear Regression   Genetic Programming   Non-Linear Regression
    R Squared    0.0596              0.7572                0.3698
    Std. Error   4.5169              2.2855                4.0644
  T-Test between Average R Square Values
                 Linear Regression   Genetic Programming   Non-Linear Regression
    Mean         0.0596              0.5564                0.3698
    T-test       2.73E-19, 1.54E-11
Results: Domain Experience
  Best Values of R Squared with Min. Std. Error
                 Linear Regression   Genetic Programming   Non-Linear Regression
    R Squared    0.0243              0.5911                0.3260
    Std. Error   4.5425              2.9283                3.9091
  T-Test between Average R Square Values
                 Linear Regression   Genetic Programming   Non-Linear Regression
    Mean         0.0243              0.5405                0.3260
    T-test       3.27E-23, 4.55E-16
Summary of All Experiments
  R Square Values
                            Linear Regression   Best Case Genetic Prog.   Avg. Case Genetic Prog.   Non-Linear Regression
  All Factors               0.1550              0.9174                    0.5592                    0.8847
  Education Only            0.0373              0.2784                    0.1973                    0.2136
  Work Experience Only      0.0596              0.7572                    0.5564                    0.3698
  Domain Experience Only    0.0243              0.5911                    0.5405                    0.3260
Too Much of a Good Thing?
  Best Equation: All Factors. r^2 = 0.9174
  ((Log (TechGradCourses + (TechGradCourses ^ ((Log TotWShops)/(Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (Log (Log (Log SWProjEstExp))))))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (((ProcIndExp + (Log (Sin MgmtGradCourses)))/(Sin SWPMExp)) + (Sin ((Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Sin SWPMExp)))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Cos (TechGradCourses ^ ((Log SWProjEstExp) / (((Log (ProcIndExp + (Log (TechGradCourses ^ ((Log SWProjEstExp) / (Log SWProjEstExp)))))) - 3) / (ProcIndExp + (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos ((((Log SWProjEstExp) / ((ProcIndExp + (Log (TechGradCourses ^ (TechGradCourses ^ (Log SWProjEstExp))))) / (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (Log (Log (Log SWProjEstExp)))))))))))))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / (TechGradCourses ^ (Log SWProjEstExp))))))))))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))) + ((Log SWProjEstExp) / (Log SWProjEstExp)))))) / (Log (Log (Log (TechGradCourses + (Cos (Log (Log (TechGradCourses ^ (Cos (((((Log SWProjEstExp) / (TechGradCourses ^ (Log SWProjEstExp))) / ((ProcIndExp + (Log (Sin MgmtGradCourses))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp))))))))))))))))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Log ((((Log TotLangExp) / (Log SWProjEstExp)) / (Log SWProjEstExp)) / (Sin SWPMExp))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))))))) + (((((ProcIndExp + (Log (TechGradCourses ^ (Log (TechGradCourses + ((TechGradCourses ^ (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos ((((Log SWProjEstExp) / ((ProcIndExp + (Log (TechGradCourses ^ (Log (TechGradCourses + (Cos (Log (Log (TechGradCourses ^ (Cos (((((Log SWProjEstExp) / (TechGradCourses ^ (Log SWProjEstExp))) / ((ProcIndExp + (Log (Sin MgmtGradCourses))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / (TechGradCourses ^ (Log SWProjEstExp))))))) / (Sin SWPMExp))))))) / (TechGradCourses ^ (Log SWProjEstExp))) / (TechGradCourses ^ (Log SWProjEstExp))) / (TechGradCourses ^ (Log SWProjEstExp))) / (Sin SWPMExp)))
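One way to put a number on how large an evolved equation is would be to parse it into a tree and count nodes and nesting depth. The sketch below assumes the notation seen in the equation above: unary operators written in prefix form, such as `(Log x)`, and binary operators written in infix form, such as `(a + b)`. The grammar, the function names, and the tuple representation are assumptions for illustration, and the example input is a short sub-expression taken from the equation rather than the whole thing.

```python
import re

TOKEN = re.compile(r"\(|\)|[-+*/^]|[A-Za-z_]\w*|\d+(?:\.\d+)?")
UNARY = {"Log", "Sin", "Cos"}
BINARY = {"+", "-", "*", "/", "^"}


def parse(text):
    """Parse '(Op x)' and '(x op y)' forms into nested tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def expr():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # a variable name or numeric constant
        if tokens[pos] in UNARY:
            op = tokens[pos]
            pos += 1
            node = (op, expr())
        else:
            left = expr()
            op = tokens[pos]
            pos += 1
            if op not in BINARY:
                raise ValueError(f"expected a binary operator, got {op!r}")
            node = (op, left, expr())
        if tokens[pos] != ")":
            raise ValueError("expected ')'")
        pos += 1
        return node

    tree = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def size(tree):
    """Total number of operators, variables, and constants."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(size(child) for child in tree[1:])


def depth(tree):
    """Length of the longest path from the root to a leaf."""
    if isinstance(tree, str):
        return 1
    return 1 + max(depth(child) for child in tree[1:])


fragment = "((ProcIndExp + (Log (Sin MgmtGradCourses)))/(Sin SWPMExp))"
tree = parse(fragment)
print(tree)
print(size(tree), depth(tree))
```

Measures like these could feed a size penalty or a simplification pass, which is the kind of work an equation optimizer for GP would need to do.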
Conclusions
  - Viability of a human-based estimation model
  - Model assessment: Non-linear, GP
  - Impact on Human-Based Estimation
      1) All Factors
      2) Domain Experience, Work Experience
      3) Education
Future Directions
  - Equation Optimizer for GP
  - Collect More Data
  - Further analysis without consolidation
  - Detailed Effect of Educational Factors
  - Use other statistical indicators
  - Build other models
      Hybrid (Non-linear and GP)
      Classifiers
  - Impact of process on estimation
Questions?
Thank You!