Understanding the Human Estimator

Gary D. Boetticher, Boetticher@uhcl.edu, Univ. of Houston - Clear Lake, Houston, TX, USA
Nazim Lokhandwala, Lokhandwala@uhcl.edu, Univ. of Houston - Clear Lake, Houston, TX, USA
James C. Helm, Helm@uhcl.edu, Univ. of Houston - Clear Lake, Houston, TX, USA
http://nas.cl.uh.edu/boetticher/publications.html
The 2nd International Predictor Models in Software Engineering (PROMISE) Workshop

Introduction
Chaos Chronicles [Standish03]:
- 300 billion dollars
- 250,000 new projects
- 1.2 million dollars per project (300 billion dollars / 250,000 projects)

Boehm’s 4X
Source: Boehm’s Software Engineering Economics.

Types of Estimation [Jorgensen04]
- Human-Based: 63 - 86%
- Algorithmic and Machine Learners: 7 - 16%

Jorgensen summarized and compared various surveys; McAuley's gave the above numbers. Heemstra: Human-Based 62%, Algorithmic 14%, and Other 9%. Wydenbach: Human-Based 86%, Algorithmic 26%, and Other 11%.

[Jorgensen04] Jorgensen M., “A review of studies on Expert Estimation of Software Development Effort”, Journal of Systems and Software, 2004.

Research Focus
Number of Papers on Software Estimation in IEEE [Jorgensen02]
- Human-Based Estimation: 17%
- Other: 83%

Based on a search of estimation papers in the journals IEEE Transactions on Software Engineering, Journal of Systems and Software, Journal of Information and Software Technology, and Journal of Empirical Software Engineering.

Statement of Problem
- How do human demographics affect human-based estimation?
- Can predictive models be constructed using human demographics?

Investigation Procedure
1. Collect demographics from participants
2. Request participants to estimate software components
3. Build models (Estimates vs. Actuals)

Steps 1 and 2 are carried out through the survey.

Which Demographics?
- Basic Demographics
- Academic Background
- Work Experience
- Domain Experience

The Survey
http://nas.cl.uh.edu/boetticher/EffortEstimationSurvey.html

Competitive Procurement Software
[Figure: Supplier Software (Supplier1, Supplier2, ..., Suppliern), Buyer Software (Buyer Admin, Buyer1, ..., Buyern), and a Distribution Server]

Sample Estimation Screenshots

Survey Results Screenshots

Data Collection
- Invitations
- Filtered Incomplete Records
- 122 Final Records

Participant Educational Background

Subject                          Level               Mean     Maximum   Standard Deviation
Computer Science                 Undergrad Courses   8.8525   70        11.6326
Computer Science                 Grad Courses        2.4262   15        3.2293
Hardware                         Undergrad Courses   3.5246   64        8.0209
Hardware                         Grad Courses        0.5000   10        1.3252
Management Information Systems   Undergrad Courses   0.7705   12        1.5892
Management Information Systems   Grad Courses        0.4918   9         1.3742
Project Management               Undergrad Courses   0.2951   4         0.6886
Project Management               Grad Courses        0.8115   6         1.1806
Software Engineering             Undergrad Courses   0.9180   7         1.2958
Software Engineering             Grad Courses        2.1557   21        3.1202

Most of the participants hold Bachelor's or Master's degrees.

Participant Work Experience

                               Mean     Maximum   Standard Deviation (Years)
Years of Experience As
  Hardware Project Manager     0.6557   15        1.9251
  Software Project Manager     1.3443   10        2.0811
No. of Projects Estimated
  Hardware Projects            0.8279   20        2.6307
  Software Projects            2.9508   28        4.4848

Participant Domain Experience

Domain Experience         Mean     Maximum (Years)   Standard Deviation
Process Industry          0.7274   20                2.2512
Procurement and Billing   0.6209   10                1.3818

Data Preparation

INPUT:
- 69% zeros: needs consolidation (Courses, Workshops, Conferences, Programming Exp.)
- 45 attributes reduced to 14 attributes
- Highest Degree Achieved: needs transformation

OUTPUT:
- MRE = Abs(Total Actual - Total Est.) / (Total Actual)
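The MRE output defined on this slide can be sketched in a few lines. This is a minimal illustration, assuming that "Total Actual" and "Total Est." are sums over the estimated software components; the function names and the sample numbers are hypothetical and do not come from the study.

```python
def mre(total_actual, total_estimate):
    """Magnitude of relative error, following the slide's definition:
    Abs(Total Actual - Total Est.) / Total Actual."""
    if total_actual == 0:
        raise ValueError("MRE is undefined when the total actual is zero")
    return abs(total_actual - total_estimate) / total_actual


def participant_mre(actuals, estimates):
    """MRE for one participant from per-component values.

    Assumption: the totals are plain sums over the components that
    the participant was asked to estimate.
    """
    return mre(sum(actuals), sum(estimates))


# Hypothetical effort values, for illustration only (not study data).
actual_effort = [40.0, 25.0, 35.0]
estimated_effort = [30.0, 20.0, 30.0]
print(participant_mre(actual_effort, estimated_effort))
```

A lower MRE means the participant's total estimate was closer to the total actual; the value is the model output that the demographic inputs are asked to predict.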

Build Models
- Linear Regression (Excel)
- Non-Linear Regression (DataFit)
- Genetic Programming (GDB_GP)
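The models are compared by R Squared and standard error. As a rough sketch of what those two numbers measure, the example below fits a one-variable least-squares line in plain Python. It is a simplification under stated assumptions: the study used Excel with multiple demographic attributes, whereas this uses a single hypothetical predictor, made-up data, and the usual standard error of the estimate, sqrt(SS_res / (n - number of fitted parameters)).

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def std_error(ys, preds, n_params=2):
    """Standard error of the estimate (n_params = slope + intercept)."""
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return math.sqrt(ss_res / (len(ys) - n_params))


# Hypothetical data: x is one demographic attribute, y is the MRE.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 2.0, 3.0]
slope, intercept = fit_line(xs, ys)
preds = [slope * x + intercept for x in xs]
print(slope, intercept, r_squared(ys, preds), std_error(ys, preds))
```

An R Squared near 1 with a small standard error indicates that the model explains most of the variation in MRE; an R Squared near 0 indicates that it explains very little.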

GP Configuration
- 3 Settings
- 1000 Chromosomes
- 50 Generations
- 20 Trials each
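To give a sense of the scale of this configuration, the sketch below records the four numbers from the slide and derives the run counts. The class name `GPRunPlan` is hypothetical, and the evaluation count assumes that every chromosome is evaluated once per generation, which the slide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPRunPlan:
    """Run plan using the numbers on the GP Configuration slide."""
    settings: int = 3
    chromosomes: int = 1000
    generations: int = 50
    trials_per_setting: int = 20

    def total_runs(self) -> int:
        # One run per trial, repeated for each setting.
        return self.settings * self.trials_per_setting

    def evaluations_per_run(self) -> int:
        # Assumption: each chromosome is scored once per generation.
        return self.chromosomes * self.generations

    def total_evaluations(self) -> int:
        return self.total_runs() * self.evaluations_per_run()


plan = GPRunPlan()
print(plan.total_runs(), plan.evaluations_per_run(), plan.total_evaluations())
```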

Results: All Demographic Factors

Best Values of R Squared with Min. Std. Error
             Linear Regression   Genetic Programming   Non-Linear Regression
R Squared    0.1550              0.9174                0.8847
Std. Error   4.4580              1.3875                1.6470

T-Test between Average R Square Values
             Linear Regression   Genetic Programming   Non-Linear Regression
Mean         0.1550              0.5592                0.8847
T-test       3.45E-17            -                     1.87E-15

Results: Educational Factors

Best Values of R Squared with Min. Std. Error
             Linear Regression   Genetic Programming   Non-Linear Regression
R Squared    0.0373              0.2784                0.2136
Std. Error   4.6101              3.9738                4.1667

T-Test between Average R Square Values
             Linear Regression   Genetic Programming   Non-Linear Regression
Mean         0.0373              0.1973                0.2136
T-test       2.74E-13            -                     0.0486

Results: Work Experience

Best Values of R Squared with Min. Std. Error
             Linear Regression   Genetic Programming   Non-Linear Regression
R Squared    0.0596              0.7572                0.3698
Std. Error   4.5169              2.2855                4.0644

T-Test between Average R Square Values
             Linear Regression   Genetic Programming   Non-Linear Regression
Mean         0.0596              0.5564                0.3698
T-test       2.73E-19            -                     1.54E-11

Results: Domain Experience

Best Values of R Squared with Min. Std. Error
             Linear Regression   Genetic Programming   Non-Linear Regression
R Squared    0.0243              0.5911                0.3260
Std. Error   4.5425              2.9283                3.9091

T-Test between Average R Square Values
             Linear Regression   Genetic Programming   Non-Linear Regression
Mean         0.0243              0.5405                0.3260
T-test       3.27E-23            -                     4.55E-16

Summary of All Experiments

R Square Values
                         Linear Regression   Best Case Genetic Prog.   Avg. Case Genetic Prog.   Non-Linear Regression
All Factors              0.1550              0.9174                    0.5592                    0.8847
Education Only           0.0373              0.2784                    0.1973                    0.2136
Work Experience Only     0.0596              0.7572                    0.5564                    0.3698
Domain Experience Only   0.0243              0.5911                    0.5405                    0.3260
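The summary values can be compared programmatically. The sketch below copies the R Square values from this slide into a dictionary and ranks the factor groups for a chosen model type; the variable and function names are illustrative only.

```python
# R Square values copied from the Summary of All Experiments slide.
SUMMARY = {
    "All Factors": {
        "Linear Regression": 0.1550,
        "Best Case Genetic Prog.": 0.9174,
        "Avg. Case Genetic Prog.": 0.5592,
        "Non-Linear Regression": 0.8847,
    },
    "Education Only": {
        "Linear Regression": 0.0373,
        "Best Case Genetic Prog.": 0.2784,
        "Avg. Case Genetic Prog.": 0.1973,
        "Non-Linear Regression": 0.2136,
    },
    "Work Experience Only": {
        "Linear Regression": 0.0596,
        "Best Case Genetic Prog.": 0.7572,
        "Avg. Case Genetic Prog.": 0.5564,
        "Non-Linear Regression": 0.3698,
    },
    "Domain Experience Only": {
        "Linear Regression": 0.0243,
        "Best Case Genetic Prog.": 0.5911,
        "Avg. Case Genetic Prog.": 0.5405,
        "Non-Linear Regression": 0.3260,
    },
}


def rank_factor_groups(table, model):
    """Factor groups ordered from highest to lowest R Square for one model."""
    return sorted(table, key=lambda group: table[group][model], reverse=True)


def best_model(table, group):
    """Model type with the highest R Square for one factor group."""
    return max(table[group], key=table[group].get)


for model in ("Avg. Case Genetic Prog.", "Non-Linear Regression"):
    print(model, rank_factor_groups(SUMMARY, model))
```

For both the average-case GP and the non-linear regression columns, the ordering is All Factors first, Work Experience and Domain Experience close together, and Education last.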

Too Much of a Good Thing?
Best Equation: All Factors. r^2 = 0.9174

((Log (TechGradCourses + (TechGradCourses ^ ((Log TotWShops)/(Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (Log (Log (Log SWProjEstExp))))))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (((ProcIndExp + (Log (Sin MgmtGradCourses)))/(Sin SWPMExp)) + (Sin ((Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Sin SWPMExp)))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Cos (TechGradCourses ^ ((Log SWProjEstExp) / (((Log (ProcIndExp + (Log (TechGradCourses ^ ((Log SWProjEstExp) / (Log SWProjEstExp)))))) - 3) / (ProcIndExp + (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos ((((Log SWProjEstExp) / ((ProcIndExp + (Log (TechGradCourses ^ (TechGradCourses ^ (Log SWProjEstExp))))) / (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (Log (Log (Log SWProjEstExp)))))))))))))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / (TechGradCourses ^ (Log SWProjEstExp))))))))))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))) + ((Log SWProjEstExp) / (Log SWProjEstExp)))))) / (Log (Log (Log (TechGradCourses + (Cos (Log (Log (TechGradCourses ^ (Cos (((((Log SWProjEstExp) / (TechGradCourses ^ (Log SWProjEstExp))) / ((ProcIndExp + (Log (Sin MgmtGradCourses))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp))))))))))))))))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Log 
((((Log TotLangExp) / (Log SWProjEstExp)) / (Log SWProjEstExp)) / (Sin SWPMExp))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))))))) + (((((ProcIndExp + (Log (TechGradCourses ^ (Log (TechGradCourses + ((TechGradCourses ^ (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos ((((Log SWProjEstExp) / ((ProcIndExp + (Log (TechGradCourses ^ (Log (TechGradCourses + (Cos (Log (Log (TechGradCourses ^ (Cos (((((Log SWProjEstExp) / (TechGradCourses ^ (Log SWProjEstExp))) / ((ProcIndExp + (Log (Sin MgmtGradCourses))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / (TechGradCourses ^ (Log SWProjEstExp))))))) / (Sin SWPMExp))))))) / (TechGradCourses ^ (Log SWProjEstExp))) / (TechGradCourses ^ (Log SWProjEstExp))) / (TechGradCourses ^ (Log SWProjEstExp))) / (Sin SWPMExp)))
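One way to quantify how large such an evolved equation is: parse it into a tree and count the nodes. The sketch below handles the notation used in this equation (parenthesized `Log`, `Cos`, `Sin` and the binary operators `+`, `-`, `/`, `^`) and is shown on small fragments taken from it. It is not the GDB_GP implementation. It assumes `Log` is the natural logarithm, and it has no protection against invalid inputs such as the log of a non-positive number, which the original tool may handle differently.

```python
import math
import re

# Assumption: Log is the natural logarithm; the slides do not say.
FUNCS = {"Log": math.log, "Cos": math.cos, "Sin": math.sin}
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "/": lambda a, b: a / b,
    "^": lambda a, b: a ** b,
}


def tokenize(text):
    """Split into parentheses, operators, names, and numbers."""
    return re.findall(r"[()+\-/^]|[A-Za-z_]\w*|\d+(?:\.\d+)?", text)


def parse(tokens):
    """Parse one expression from the token list into a nested tuple."""
    tok = tokens.pop(0)
    if tok != "(":
        return ("leaf", tok)
    if tokens[0] in FUNCS:  # (Func expr)
        name = tokens.pop(0)
        node = ("func", name, parse(tokens))
    else:
        left = parse(tokens)
        if tokens[0] == ")":  # plain grouping: (expr)
            node = left
        else:  # (expr op expr)
            op = tokens.pop(0)
            node = ("op", op, left, parse(tokens))
    if tokens.pop(0) != ")":
        raise ValueError("expected ')'")
    return node


def size(node):
    """Number of nodes (operators, functions, and leaves) in the tree."""
    if node[0] == "leaf":
        return 1
    if node[0] == "func":
        return 1 + size(node[2])
    return 1 + size(node[2]) + size(node[3])


def evaluate(node, env):
    """Evaluate the tree; env maps attribute names to numbers."""
    if node[0] == "leaf":
        return env[node[1]] if node[1] in env else float(node[1])
    if node[0] == "func":
        return FUNCS[node[1]](evaluate(node[2], env))
    return OPS[node[1]](evaluate(node[2], env), evaluate(node[3], env))


fragment = "(TechGradCourses ^ (Log SWProjEstExp))"
tree = parse(tokenize(fragment))
print(size(tree), evaluate(tree, {"TechGradCourses": 2.0, "SWProjEstExp": 5.0}))
```

The fragment `((Log SWProjEstExp) / (Log SWProjEstExp))`, which recurs in the equation, has five nodes but evaluates to 1 wherever it is defined; redundant subtrees of this kind are what an equation optimizer could simplify.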

Conclusions
- Viability of a human-based est. model
- Model assessment: Non-linear ≈ GP
- Impact on Human Based Estimation:
  1) All Factors
  2) Domain Experience ≈ Work Experience
  3) Education

Future Directions
- Equation Optimizer for GP
- Collect More Data
- Further analysis without consolidation
- Detailed Effect of Educational Factors
- Use other statistical indicators
- Build other models
  - Hybrid (Non-linear and GP)
  - Classifiers
- Impact of process on estimation

Questions?

Thank You!