Understanding the Human Estimator Gary D. Boetticher Univ. of Houston - Clear Lake, Houston, TX, USA

Presentation transcript:

Understanding the Human Estimator
Gary D. Boetticher, Univ. of Houston - Clear Lake, Houston, TX, USA
Nazim Lokhandwala, Univ. of Houston - Clear Lake, Houston, TX, USA
James C. Helm, Univ. of Houston - Clear Lake, Houston, TX, USA
2nd International Predictor Models in Software Engineering (PROMISE) Workshop

Introduction
Chaos Chronicles [Standish03]
- 300 billion dollars
- 250,000 new projects
- 1.2 million dollars per project

Boehm’s 4X

Types of Estimation [Jorgenson04]
- Human-Based [percentage not preserved]
- Algorithmic and Machine Learners [percentage not preserved]

Research Focus
Number of Papers on Software Estimation in IEEE [Jorgenson02]
- Human-Based Estimation (17%)
- Other (83%)

Statement of Problem
- How do human demographics affect human-based estimation?
- Can predictive models be constructed using human demographics?

Investigation Procedure
1) Collect demographics from participants
2) Ask participants to estimate software components
3) Build models (Estimates vs. Actuals)
Steps 1 and 2 are gathered through the survey.

Which Demographics?
- Basic Demographics
- Academic Background
- Work Experience
- Domain Experience

The Survey

Competitive Procurement Software
[Diagram: Buyer Software (Buyer Admin, Buyer 1 ... Buyer n), Distribution Server, Supplier Software (Supplier 1, Supplier 2 ... Supplier n)]

Sample Estimation Screenshots

Survey Results Screenshots

Data Collection
Invitations -> Filtered Incomplete Records -> 122 Final Records
[other counts not preserved]

Participant Educational Background
Most of the participants hold Bachelor's or Master's degrees.
Columns: Mean, Maximum, Standard Deviation [values not preserved]
Rows:
- Computer Science: Undergrad Courses, Grad Courses
- Hardware: Undergrad Courses, Grad Courses
- Management Information Systems: Undergrad Courses, Grad Courses
- Project Management: Undergrad Courses, Grad Courses
- Software Engineering: Undergrad Courses, Grad Courses

Participant Work Experience
Columns: Mean, Maximum, Standard Deviation [values not preserved]
Rows:
- Years of Experience As (Years): Hardware Project Manager, Software Project Manager
- No. of Projects Estimated: Hardware Projects, Software Projects

Participant Domain Experience
Columns: Mean (Years), Maximum (Years), Standard Deviation [values not preserved]
Rows (Domain Experience):
- Process Industry
- Procurement and Billing

Data Preparation
INPUT
- 69% zeros: needs consolidation
  - Courses, Workshops, Conferences, Programming Exp.
  - 45 attributes reduced to 14 attributes
- Highest Degree Achieved: needs transformation
OUTPUT
- MRE = Abs(Total Actual - Total Est.) / (Total Actual)
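
The MRE output measure on the Data Preparation slide can be sketched in Python. This is a minimal illustration, assuming per-component actual and estimated efforts are summed into totals before the ratio is taken. The function name `mre` and the sample numbers are hypothetical and are not data from the study.

```python
def mre(actuals, estimates):
    """Magnitude of relative error on totals:
    |sum(actual) - sum(estimate)| / sum(actual)."""
    total_actual = sum(actuals)
    total_est = sum(estimates)
    if total_actual == 0:
        raise ValueError("total actual effort must be non-zero")
    return abs(total_actual - total_est) / total_actual


# Hypothetical efforts for four software components (made-up numbers).
actual_effort = [10.0, 20.0, 30.0, 40.0]
estimated_effort = [12.0, 15.0, 33.0, 30.0]

participant_mre = mre(actual_effort, estimated_effort)
print(participant_mre)
```

A lower MRE means the participant's total estimate was closer to the total actual effort, so each participant contributes one MRE value as the model output.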

Build Models
- Linear Regression (Excel)
- Non-Linear Regression (DataFit)
- Genetic Programming (GDB_GP)
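
The study built its linear models in Excel over several demographic attributes. As a simplified sketch only, the following Python fits a one-variable least-squares line and reports R squared, the statistic the result slides compare. The attribute name, the helper functions `fit_line` and `r_squared`, and the data are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up data: years of software project management experience vs. MRE.
sw_pm_years = [0.0, 1.0, 2.0, 3.0, 4.0]
observed_mre = [0.9, 0.7, 0.6, 0.4, 0.4]

slope, intercept = fit_line(sw_pm_years, observed_mre)
r2 = r_squared(sw_pm_years, observed_mre, slope, intercept)
print(slope, intercept, r2)
```

With more than one attribute the same idea generalizes to multiple regression, which is what a spreadsheet regression tool computes.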

GP Configuration
3 settings, 20 trials each:
- 1000 Chromosomes, 50 Generations
- 512 Chromosomes, 128 Generations
- 1000 Chromosomes, 128 Generations
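
The three GP settings can be written down as plain data. The sketch below is not the GDB_GP configuration interface, which the slides do not show. The class `GPSetting` and the evaluation count are illustrative assumptions, the latter supposing that every chromosome is evaluated once per generation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPSetting:
    chromosomes: int   # population size
    generations: int   # number of generations per trial

    def evaluations_per_trial(self) -> int:
        # Assumption: one fitness evaluation per chromosome per generation.
        return self.chromosomes * self.generations


# The three settings listed on the slide.
SETTINGS = [
    GPSetting(chromosomes=1000, generations=50),
    GPSetting(chromosomes=512, generations=128),
    GPSetting(chromosomes=1000, generations=128),
]
TRIALS_PER_SETTING = 20

total_runs = len(SETTINGS) * TRIALS_PER_SETTING
for setting in SETTINGS:
    print(setting, setting.evaluations_per_trial())
print("total GP runs:", total_runs)
```

Under that assumption the third setting costs roughly 2.5 times as many evaluations per trial as the first, which is worth keeping in mind when comparing their R squared values.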

Results: All Demographic Factors
Best Values of R Squared with Min. Std. Error
  Columns: R Squared, Std. Error
  Rows: Linear Regression, Non-Linear Regression, Genetic Programming
  [values not preserved]
T-Test between Average R Square Values
  Columns: Linear Regression, Non-Linear Regression, Genetic Programming
  Rows: Mean, T-test
  [values not preserved; surviving T-test fragments: 1.87E, E-17]
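
The slides compare average R squared values with a t-test but do not say which variant was used. As a sketch under that caveat, the following Python computes Welch's unequal-variance t statistic and its degrees of freedom for two samples of R squared values. The function name `welch_t` and the sample values are made up for illustration.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    vb = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical R squared values from repeated trials of two model types.
r2_model_a = [0.60, 0.62, 0.64]
r2_model_b = [0.50, 0.52, 0.54]

t_stat, dof = welch_t(r2_model_a, r2_model_b)
print(t_stat, dof)
```

Turning the statistic into a p-value requires the t distribution's CDF, which is not in the Python standard library, so that step is left out here.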

Results: Educational Factors
Best Values of R Squared with Min. Std. Error
  Columns: R Squared, Std. Error
  Rows: Linear Regression, Non-Linear Regression, Genetic Programming
  [values not preserved]
T-Test between Average R Square Values
  Columns: Linear Regression, Non-Linear Regression, Genetic Programming
  Rows: Mean, T-test
  [values not preserved; surviving T-test fragment: E-13]

Results: Work Experience
Best Values of R Squared with Min. Std. Error
  Columns: R Squared, Std. Error
  Rows: Linear Regression, Non-Linear Regression, Genetic Programming
  [values not preserved]
T-Test between Average R Square Values
  Columns: Linear Regression, Non-Linear Regression, Genetic Programming
  Rows: Mean, T-test
  [values not preserved; surviving T-test fragments: 1.54E, E-19]

Results: Domain Experience
Best Values of R Squared with Min. Std. Error
  Columns: R Squared, Std. Error
  Rows: Linear Regression, Non-Linear Regression, Genetic Programming
  [values not preserved]
T-Test between Average R Square Values
  Columns: Linear Regression, Non-Linear Regression, Genetic Programming
  Rows: Mean, T-test
  [values not preserved; surviving T-test fragments: 4.55E, E-23]

Summary of All Experiments
R Square Values [values not preserved]
Columns: Linear Regression, Best Case Genetic Prog., Avg. Case Genetic Prog., Non-Linear Regression
Rows: All Factors, Education Only, Work Experience Only, Domain Experience Only

Best Equation: All Factors
r^2 = [value not preserved]
((Log (TechGradCourses + (TechGradCourses ^ ((Log TotWShops)/(Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (Log (Log (Log SWProjEstExp))))))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (((ProcIndExp + (Log (Sin MgmtGradCourses)))/(Sin SWPMExp)) + (Sin ((Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Sin SWPMExp)))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Cos (TechGradCourses ^ ((Log SWProjEstExp) / (((Log (ProcIndExp + (Log (TechGradCourses ^ ((Log SWProjEstExp) / (Log SWProjEstExp)))))) - 3) / (ProcIndExp + (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos ((((Log SWProjEstExp) / ((ProcIndExp + (Log (TechGradCourses ^ (TechGradCourses ^ (Log SWProjEstExp))))) / (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (Log (Log (Log SWProjEstExp)))))))))))))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / (TechGradCourses ^ (Log SWProjEstExp))))))))))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))) + ((Log SWProjEstExp) / (Log SWProjEstExp)))))) / (Log (Log (Log (TechGradCourses + (Cos (Log (Log (TechGradCourses ^ (Cos (((((Log SWProjEstExp) / (TechGradCourses ^ (Log SWProjEstExp))) / ((ProcIndExp + (Log (Sin MgmtGradCourses))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp))))))))))))))))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Log ((((Log TotLangExp) / (Log SWProjEstExp)) / (Log SWProjEstExp)) / (Sin SWPMExp))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))))))) + (((((ProcIndExp + (Log (TechGradCourses ^ (Log (TechGradCourses + ((TechGradCourses ^ (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos ((((Log SWProjEstExp) / ((ProcIndExp + (Log (TechGradCourses ^ (Log (TechGradCourses + (Cos (Log (Log (TechGradCourses ^ (Cos (((((Log SWProjEstExp) / (TechGradCourses ^ (Log SWProjEstExp))) / ((ProcIndExp + (Log (Sin MgmtGradCourses))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / (TechGradCourses ^ (Log SWProjEstExp))))))) / (Sin SWPMExp))))))) / (TechGradCourses ^ (Log SWProjEstExp))) / (TechGradCourses ^ (Log SWProjEstExp))) / (TechGradCourses ^ (Log SWProjEstExp))) / (Sin SWPMExp)))
Too Much of a Good Thing?

Conclusions
- Viability of a human-based estimation model
- Model assessment
  - Non-linear
  - GP
- Impact on Human-Based Estimation
  1) All Factors
  2) Domain Experience, Work Experience
  3) Education

Future Directions
- Equation Optimizer for GP
- Collect More Data
  - Further analysis without consolidation
  - Detailed Effect of Educational Factors
- Use other statistical indicators
- Build other models
  - Hybrid (Non-linear and GP)
  - Classifiers
- Impact of process on estimation
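
One possible starting point for the "Equation Optimizer for GP" item is algebraic simplification of the evolved expression tree. The best equation contains subexpressions such as (TechGradCourses ^ ((Log SWProjEstExp) / (Log SWProjEstExp))), where a quantity is divided by itself. The sketch below is not part of GDB_GP. It assumes expressions are held as nested tuples, and the helpers `simplify` and `size` are hypothetical.

```python
def simplify(expr):
    """Recursively rewrite (a / a) to 1 in a nested-tuple expression tree.

    Caveat: a / a == 1 only where a is defined and non-zero, so a real
    optimizer would have to respect the GP system's division semantics.
    """
    if not isinstance(expr, tuple):
        return expr  # leaf: variable name or numeric constant
    op, *args = expr
    args = [simplify(a) for a in args]
    if op == "/" and args[0] == args[1]:
        return 1
    return (op, *args)


def size(expr):
    """Count the nodes (operators and leaves) in an expression tree."""
    if not isinstance(expr, tuple):
        return 1
    return 1 + sum(size(a) for a in expr[1:])


# Fragment taken from the best equation, written as a nested tuple.
fragment = (
    "^",
    "TechGradCourses",
    ("/", ("Log", "SWProjEstExp"), ("Log", "SWProjEstExp")),
)

reduced = simplify(fragment)
print(size(fragment), "->", size(reduced))
print(reduced)
```

Applying more rewrite rules of this kind, bottom-up, is one way to shrink an evolved equation without changing the values it produces on valid inputs.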

Questions?

Thank You!