CS 540 – Quantitative Software Engineering

1 CS 540 – Quantitative Software Engineering
Lecture 7: Estimation
Estimate size, then
Estimate effort, schedule, and cost from size
Bound estimates

2 Project Metrics: Why Estimate?
Cost and schedule estimation
Measure progress
Calibrate models for future estimation
Manage project scope
Make bid/no-bid decisions
Make buy/build decisions

Just differentiating metrics and their uses (and most metrics could be used for both).

3 QSE Lambda Protocol
Prospectus
Measurable Operational Value
Prototyping or Modeling
sQFD (simplified Quality Function Deployment)
Schedule, Staffing, Quality Estimates
ICED-T (Intuitive, Consistent, Efficient, Durable, Thoughtful)
Trade-off Analysis

4 Specification for Development Plan
Project Feature List
Development Process
Size Estimates
Staff Estimates
Schedule Estimates
Organization
Gantt Chart

5 Approaches to Cost Estimation
By expert
By analogies
Decomposition
Parkinson's Law: work expands to fill the time available
Price to win / customer willingness-to-pay
Lines of Code
Function Points
Mathematical models: Function Points & COCOMO

6 Heuristics to do Better Estimates
Decompose the Work Breakdown Structure to the lowest possible level and type of software
Review assumptions with all stakeholders
Do your homework: draw on past organizational experience
Retain contact with developers
Update estimates, track new projections, and warn early
Use multiple methods
Reuse makes it easier (and more difficult)
Use a 'current estimate' scheme

Here are some other hints accumulated over time. An obvious practice: once development starts, track your estimates to determine how well they are doing and adjust accordingly. Reuse is great, but as B&Y pointed out, preparing code for reuse requires roughly tripling the effort estimate.

7 Heuristics to Cope with Estimates
Add and train developers early
Use gurus for tough tasks
Provide manufacturing and admin support
Sharpen tools
Eliminate unrelated work and red tape (the 50% issue)
Devote a full-time end user to the project
Increase the level of executive sponsorship when breaking new ground (new tools, techniques, training)
Set a schedule goal date, but commit only after detailed design
Use broad estimation ranges rather than single-point estimates

And what if you are unhappy with the schedule that management hands to you? (Not so uncommon these days!) McConnell lists some strategies that may help you meet these "unpopular" (aka impossible) schedules.

8 Popular Methods for Effort Estimation
Parametric estimation
Wideband Delphi
COCOMO
SLIM (Software Lifecycle Management)
SEER-SEM
Function Point Analysis
PROBE (PROxy-Based Estimation, from the SEI's Personal Software Process)
Planning Game (XP)
Explore-Commit
Program Evaluation and Review Technique (PERT)

9 SEER-SEM: System Evaluation and Estimation of Resources
Sizing: how large is the software project being estimated (lines of code, function points, use cases, etc.)?
Technology: what productivity can the developers achieve (capabilities, tools, practices, etc.)?
Effort and schedule calculation: what amount of effort and time are required to complete the project?
Constrained effort/schedule calculation: how does the expected project outcome change when schedule and staffing constraints are applied?
Activity and labor allocation: how should activities and labor be allocated in the estimate?
Cost calculation: given expected effort, duration, and the labor allocation, how much will the project cost?
Defect calculation: given product type, project duration, and other information, what is the expected, objective quality of the delivered software?
Maintenance effort calculation: how much effort will be required to adequately maintain and upgrade a fielded software system?
Progress: how is the project progressing, where will it end up, and how should it be replanned?
Validity: is this development achievable with the technology involved?

10 Wideband Delphi
Convene a group of experts
Coordinator provides each expert with the specification
Experts make private estimates in interval format: a most likely value plus an upper and lower bound
Coordinator prepares a summary report indicating group and individual estimates
Experts discuss and defend their estimates
Group iterates until consensus is reached

This is a sort of "N heads are better than one" philosophy, and it works fairly well if you have the experts available. It is also useful for newer classes of projects, e.g., web-related systems, since most of the older models, including COCOMO, were calibrated on project data that did not include such systems. If you have experts who are familiar with the application domain, I find this approach preferable, supported by a KLOC or function point estimate: the two together provide a fairly good view of what the work will take. If your experts are truly experts and they know your team, they will most likely be the most accurate, because they bring local knowledge that cannot possibly be factored into the more general models.
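Since each expert supplies an interval estimate, the coordinator needs a way to collapse and summarize the round. Here is a minimal Python sketch of one common convention, PERT-style three-point weighting; the weighting choice and the sample figures are illustrative assumptions, not something the slide prescribes.

```python
# Sketch: aggregating one Wideband Delphi round.
# The (low + 4*likely + high)/6 weighting and the numbers below are
# illustrative assumptions, not part of the lecture.

def pert_expected(low: float, likely: float, high: float) -> float:
    """Collapse a three-point estimate into a single expected value."""
    return (low + 4 * likely + high) / 6

# Hypothetical round-one estimates in staff-months: (low, likely, high)
estimates = {
    "expert_a": (8, 12, 20),
    "expert_b": (10, 14, 25),
    "expert_c": (6, 10, 18),
}

expected = {name: pert_expected(*e) for name, e in estimates.items()}
group_mean = sum(expected.values()) / len(expected)

for name, value in expected.items():
    print(f"{name}: {value:.1f} staff-months")
print(f"group mean: {group_mean:.1f} staff-months")
# The coordinator would circulate these values and the underlying ranges,
# then run further rounds until the spread narrows toward consensus.
```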

11 Minimum Time: PERT and GANTT

12 But, how can I estimate staff months?
Boehm: "A project cannot be done in less than 75% of theoretical time."
T_theoretical = 2.5 × (staff-months)^(1/3)
(Figure: time versus staff-months, showing T_theoretical, the 75% × T_theoretical limit, a region of linear increase, and the "impossible design" region below the limit.)

13 Sizing Software Projects
Effort = (productivity)^-1 × (size)^c
where productivity ≡ KLOC per staff-month and size ≡ KLOC
(Figure: staff-months versus size in lines of code or function points.)

14 Understanding the equations
Consider a transaction project of 38,000 lines of code. What is the shortest time it will take to develop?
Module development runs at about 400 SLOC per staff-month (0.400 KSLOC/SM).
Effort = (productivity)^-1 × (size)^c = (1 / 0.400 KSLOC/SM) × (38 KSLOC)^1.02 = 2.5 × (38)^1.02 ≈ 100 SM
Min time = 0.75 × T_theoretical = (0.75)(2.5)(SM)^(1/3) ≈ 1.875 × (100)^(1/3) ≈ 1.875 × 4.63 ≈ 9 months
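The same calculation in a minimal Python sketch; the productivity figure, the exponent c = 1.02, and the 75% rule are the slide's values, and everything else is arithmetic.

```python
# Sketch of the slide-14 calculation: effort from size, then minimum schedule.
# Constants come from the lecture: productivity 0.400 KSLOC/staff-month,
# size exponent c = 1.02, and Boehm's 75%-of-theoretical-time rule.

def effort_staff_months(size_ksloc: float,
                        productivity_ksloc_per_sm: float = 0.400,
                        c: float = 1.02) -> float:
    """Effort = (productivity)^-1 * (size)^c."""
    return (1 / productivity_ksloc_per_sm) * size_ksloc ** c

def min_time_months(staff_months: float) -> float:
    """T_theoretical = 2.5 * SM^(1/3); minimum feasible time is 75% of that."""
    return 0.75 * 2.5 * staff_months ** (1 / 3)

effort = effort_staff_months(38)                      # ~100 staff-months
print(f"effort   ≈ {effort:.0f} staff-months")
print(f"min time ≈ {min_time_months(effort):.1f} months")  # ~9 months
```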

15 Productivity = f(size)
(Figure: productivity in function points per staff-month plotted against project size in function points, using Bell Laboratories data and Capers Jones data.)

16 Lines of Code
LOC ≡ line of code
KLOC ≡ thousands of LOC
KSLOC ≡ thousands of source LOC
NCSLOC ≡ new or changed source LOC

17 Bernstein's rule of thumb
Productivity per staff-month:
50 NCSLOC for OS code (or real-time systems)
… NCSLOC for intermediary applications (high risk, on-line)
… NCSLOC for normal applications (low risk, on-line)
10,000 – 20,000 NCSLOC for reused code

Reuse note: sometimes, reusing code that does not provide the exact functionality needed can be achieved by reformatting input/output. This decreases performance but dramatically shortens development time.

18 Productivity: Measured in 2000
Classical rates: 130 – 195 NCSLOC per staff-month
Evolutionary approaches: 244 – 325 NCSLOC per staff-month
New embedded flight software: 17 – 105 NCSLOC per staff-month

19 Heuristics for requirements engineering
Move some of the desired functionality into version 2
Deliver the product in stages: 0.2, 0.4, …
Eliminate features
Simplify features
Reduce gold plating
Relax the specific feature specifications

Here are some hints; note the implicit assumption that the estimate exceeds the allotted schedule. I've never seen it go the other way! Speaking of a risky business: onward to risk.

20 Function Point (FP) Analysis
Useful during the requirements phase
Substantial data supports the methodology
Software skills and project characteristics are accounted for in the Adjusted Function Points
FP counting is technology- and process-dependent, so technology changes require recalibration of project models
Approach: convert Unadjusted FPs (UFP) to LOC for a specific language (technology), then use a model such as COCOMO

21 Function Point Calculations
Unadjusted Function Points: UFP = 4I + 5O + 4E + 10L + 7F, where
I ≡ count of input types: user inputs that change data structures
O ≡ count of output types
E ≡ count of inquiry types: inputs controlling execution [think menu selections]
L ≡ count of logical internal files: internal data used by the system [think index files; these are groups of logically related data entirely within the application's boundary, maintained by external inputs]
F ≡ count of interfaces: data output to or shared with another application

Note that the constants in the nominal equation can be calibrated to a specific software product line. Function points have their fudge factors too, and most practitioners do not use the unadjusted function point metric.
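To make the arithmetic concrete, here is a minimal Python sketch of the nominal UFP equation; the weights 4, 5, 4, 10, 7 are the slide's nominal values, and the counts are hypothetical.

```python
# Sketch: nominal Unadjusted Function Point count, UFP = 4I + 5O + 4E + 10L + 7F.
# Weights are the lecture's nominal (average-complexity) values; the counts
# below are made up for illustration.

def unadjusted_fp(inputs: int, outputs: int, inquiries: int,
                  logical_files: int, interfaces: int) -> int:
    return (4 * inputs + 5 * outputs + 4 * inquiries
            + 10 * logical_files + 7 * interfaces)

# Hypothetical system: 6 input types, 4 outputs, 3 inquiries,
# 2 logical internal files, 1 external interface.
print(unadjusted_fp(6, 4, 3, 2, 1))  # 4*6 + 5*4 + 4*3 + 10*2 + 7*1 = 83
```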

22 External Inputs – One updates two files
External Inputs (EI) - when data crosses the boundary from outside to inside.  This data may come from a data input screen or another application.

23 External Interface Table
File Types Referenced (FTRs) are the sum of Internal Logical Files referenced or updated and External Interface Files referenced. For example, an EI that references or updates 2 FTRs and has 7 data elements would be assigned a ranking of average and an associated rating of 4.

24 External Output from 2 Internal Files
External Outputs (EO) – when data passes across the boundary from inside to outside.  

25 External Inquiry drawing from 2 ILFs
External Inquiry (EQ): an elementary process with both input and output components that results in data retrieval from one or more internal logical files and external interface files. The input process does not update an Internal Logical File, and there is no derived data.

26 EO and EQ Table mapped to Values

27 Adjusted Function Points
Adjusted Function Points account for physical system characteristics. Each of the 14 General System Characteristics (GSCs) is rated by the system user from 0 to 5 by "degree of influence" (3 is average):
Data Communications
Distributed Data/Processing
Performance Objectives
Heavily Used Configuration
Transaction Rate
On-Line Data Entry
End-User Efficiency
On-Line Update
Complex Processing
Reusability
Conversion/Installation Ease
Operational Ease
Multiple Site Use
Facilitate Change

AFP = UFP × (0.65 + 0.01 × TDI), where TDI (Total Degree of Influence) is the sum of the 14 GSC ratings; the multiplier (0.65 + 0.01 × TDI) is also called the Value Adjustment Factor (VAF).
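A minimal Python sketch of the adjustment step; the 14 ratings below are hypothetical, and the multiplier is the VAF formula above.

```python
# Sketch: Adjusted Function Points from UFP and the 14 GSC ratings.
# AFP = UFP * (0.65 + 0.01 * TDI), where TDI = sum of the ratings (each 0-5).
# The ratings and the UFP value below are invented for illustration.

gsc_ratings = [3, 2, 4, 3, 3, 5, 3, 2, 1, 0, 2, 3, 1, 4]  # 14 ratings, 0-5
assert len(gsc_ratings) == 14 and all(0 <= r <= 5 for r in gsc_ratings)

tdi = sum(gsc_ratings)            # Total Degree of Influence
vaf = 0.65 + 0.01 * tdi           # Value Adjustment Factor, range 0.65-1.35
ufp = 83                          # e.g., from the earlier UFP sketch
afp = ufp * vaf

print(f"TDI = {tdi}, VAF = {vaf:.2f}, AFP = {afp:.1f}")
```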

28 Complexity Table

TYPE            SIMPLE  AVERAGE  COMPLEX
INPUT (I)         3        4        6
OUTPUT (O)        4        5        7
INQUIRY (E)       3        4        6
LOG INT (L)       7       10       15
INTERFACES (F)    5        7       10

There is also a judgment made on the complexity of each of these domains that were counted.

29 Complexity Factors
1. Problem Domain ___
2. Architecture Complexity ___
3. Logic Design - Data ___
4. Logic Design - Code ___
Total ___
Complexity = Total / 4 = ___

30 Problem Domain: Measure of Complexity (1 is simple and 5 is complex)
1. All algorithms and calculations are simple.
2. Most algorithms and calculations are simple.
3. Most algorithms and calculations are moderately complex.
4. Some algorithms and calculations are difficult.
5. Many algorithms and calculations are difficult.
Score ____

31 Architecture Complexity: Measure of Complexity (1 is simple and 5 is complex)
1. Code ported from one known environment to another. Application does not change more than 5%.
2. Architecture follows an existing pattern. Process design is straightforward. No complex hardware/software interfaces.
3. Architecture created from scratch. Process design is straightforward. No complex hardware/software interfaces.
4. Architecture created from scratch. Process design is complex. Complex hardware/software interfaces exist, but they are well defined and unchanging.
5. Architecture created from scratch. Process design is complex. Complex hardware/software interfaces are ill defined and changing.
Score ____

32 Logic Design - Data
1. Simple, well-defined, and unchanging data structures. Shallow inheritance in class structures. No object classes have inheritance greater than three.
2. Several data element types with straightforward relationships. No object classes have inheritance greater than three.
3. Multiple data files, complex data relationships, many libraries, large object library. No more than ten percent of the object classes have inheritance greater than three. The number of object classes is less than 1% of the function points.
4. Complex data elements, parameter passing module-to-module, complex data relationships, and many object classes with inheritance greater than three. A large but stable number of object classes.
5. Complex data elements, parameter passing module-to-module, complex data relationships, and many object classes with inheritance greater than three. A large and growing number of object classes. No attempt to normalize data between modules.
Score ____

33 Logic Design - Code
1. Nonprocedural code (4GL, generated code, screen skeletons). High cohesion. Programs inspected. Module size constrained between 50 and 500 Source Lines of Code (SLOC).
2. Program skeletons or patterns used. High cohesion. Programs inspected. Module size constrained between 50 and 500 SLOC. Reused modules. Commercial object libraries relied on.
3. High cohesion. Well-structured, small modules with low coupling. Object class methods well focused and generalized. Modules with single entry and exit points. Programs reviewed.
4. Complex but known structure. Randomly sized modules. Some complex object classes. Error paths unknown. High coupling.
5. Code structure unknown, randomly sized modules, complex object classes, and error paths unknown. High coupling.
Score __

34 Computing Function Points
See

35 Adjusted Function Points
Now account for the 14 characteristics on a 6-point scale (0-5)
Total Degree of Influence (DI) is the sum of the scores
DI is converted to a Technical Complexity Factor (TCF): TCF = 0.65 + 0.01 × DI
The Adjusted Function Point count is FP = UFP × TCF
For any language there is a direct mapping from Function Points to LOC
Beware: function point counting is hard and needs special skills

Adjusted function points consider factors similar to those of the advanced COCOMO model (the 14 GSCs listed on slide 27). In training, folks should strive for consistency in counting function points; that is, scorers should agree on the counts of the factors, the complexity of each of the counts, and the scoring of the characteristics. This can be achieved, but it takes a fair amount of training. The measure of scorers agreeing with each other is often referred to as inter-rater reliability.

36 Function Points Qualifiers
Based on counting data structures
Focus is on-line database systems
Less accurate for Web applications
Even less accurate for games, finite-state-machine, and algorithmic software
Not useful for extended-machine software and compilers
An alternative to NCKSLOC, because estimates can be based on requirements and design data

In other software engineering courses at Stevens you will learn to calculate function points. It is a skill that has to be acquired, and it does take a while. Function points are currently one of the more popular ways to estimate effort. B&Y stress them heavily in chapter 6.

37 SLOC Defined
A single statement, not two separated by a semicolon
One line feed
All written statements (including OA&M code)
No comments
Count all instances of calls, subroutines, …
There are no industry standards, and SLOC can be fudged

More on SLOC. The last line is telling: if you demand that developers code to some productivity level measured by lines of code or number of comments, you get what you ask for. See the case of the Bard's Bulge in B&Y.

38 Initial Conversion

Language       Median SLOC/function point
C              104
C++            53
HTML           42
JAVA           59
Perl           60
J2EE           50
Visual Basic

For completeness, and to provide you with a feel for the degree of effort a function point represents, here is a table mapping several computer languages to function points.

39 (figure slide; no recoverable content)

40 (table slide: columns labeled Average, Median, Low, High, Consultant)

41 Function Points = UFP × TCF = 78 × 0.96 = 74.88 ≈ 75 function points
SLOC: 78 UFP × 53 (C++) SLOC/UFP = 4,134 SLOC ≈ 4.2 KSLOC
(SLOC per function point taken from the slide 38 conversion table.)
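Chaining this conversion into the effort and minimum-time equations of the next three slides gives the following minimal Python sketch; the UFP, TCF, and C++ expansion rate are the slide's numbers, while the exponent c = 1.16 and the 400 SLOC/staff-month productivity are carried over from slides 42-43.

```python
# Sketch: function points -> SLOC -> effort -> minimum schedule,
# chaining slides 41-44. Inputs are the lecture's case-study numbers.

ufp, tcf = 78, 0.96
afp = ufp * tcf                       # adjusted function points, ~75
sloc = ufp * 53                       # C++ expansion: 53 SLOC per function point
ksloc = sloc / 1000                   # ~4.2 KSLOC

effort = (1 / 0.400) * ksloc ** 1.16  # staff-months; productivity 400 SLOC/SM
min_time = 0.75 * 2.5 * effort ** (1 / 3)

print(f"AFP ≈ {afp:.0f}, size ≈ {ksloc:.1f} KSLOC")
print(f"effort ≈ {effort:.0f} staff-months, min time ≈ {min_time:.1f} months")
```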

42 Understanding the equations
For 4,200 lines of code, what is the shortest time it will take to develop?
Module development runs at about 400 SLOC per staff-month.
From COCOMO (Barry Boehm): Effort = 2.4 × (size)^c

43 What is '2.4'?
Effort = 2.4 × (size)^c = (1 / 0.416) × (size)^c
Effort = (productivity)^-1 × (size)^c, where productivity = 400 SLOC/SM (0.400 KSLOC/SM) from the statement of the problem
= (1 / 0.400 KSLOC/SM) × (4.2 KSLOC)^1.16 = 2.5 × (4.2)^1.16 ≈ 13 SM

44 Minimum Time
Theoretical time = 2.5 × (staff-months)^(1/3)
Min time = 0.75 × theoretical time = (0.75)(2.5)(SM)^(1/3) ≈ 1.875 × (13)^(1/3) ≈ 1.875 × 2.4 ≈ 4.5 months

45 Function Point pros and cons
Pros:
Language independent
Understandable by the client
Simple modeling
Hard to fudge
Makes feature creep visible

Cons:
Labor intensive
Extensive training required
Inexperience results in inconsistent results
Weighted toward file manipulation and transactions
Systematic error introduced by a single rater; multiple raters advised

Here is a listing of the advantages and disadvantages of function points, from B&Y. Function points are certainly more difficult to fudge than SLOC, since they address aspects of the application. The other emphasis is on data collection: you are only as good as your historical data, and if you use these techniques extensively you should endeavor to continue to collect data and tune the metrics from experience.

46 Easy?
"When performance does not meet the estimate, there are two possible causes: poor performance or poor estimates. In the software world, we have ample evidence that our estimates stink, but virtually no evidence that people in general don't work hard enough or intelligently enough." -- Tom DeMarco

Even with all the historical data, fancy statistical models, and elaborate schemes for assessing difficulty and estimating effort, it is still not easy, and many estimates fall short. I love the DeMarco quote: it puts the issue squarely where it belongs. Developers should not be punished for inaccurate estimates.

47 Capers Jones Expansion Table
The computer language you use also affects the estimate. This table by Capers Jones provides an idea of how lines expand, using a line of assembly language as the basic unit: a line of C is worth 2.5 lines of assembler, and a line in a spreadsheet may represent up to 50 lines of assembler. It provides a metric to help compare apples and oranges, and it also illustrates that the problem of estimation must be examined from many perspectives.

48 Bernstein's Trends in Software Expansion
(Chart: expansion factor versus technology change, growing an order of magnitude every twenty years, from 1 for 1960 machine instructions to the several hundreds by 2000. Labeled technology milestones: 1960 machine instructions, 1965 macro assembler, 1970 high-level language, 1975 database manager, 1980 on-line, 1985 prototyping, 1990 sub-second time sharing, 1995 object-oriented programming, 2000 large-scale reuse; 4GL, small-scale reuse, and regression testing also appear along the curve.)

49 Sizing Software Projects
Effort = (productivity)^-1 × (size)^c
(Figure: staff-months versus size in lines of code or function points.)

50 Regression Models
Effort:
Watson-Felix: Effort = 5.2 × KLOC^0.91
COCOMO: Effort = 2.4 × KLOC^1.05
Halstead: Effort = 0.7 × KLOC^1.50

Schedule:
Watson-Felix: Time = 2.5 × E^0.35
COCOMO: Time = 2.5 × E^0.38
Putnam: Time = 2.4 × E^0.33

Note that for time estimates the exponents of the three models are quite close, as are the multipliers. The effort equations are a bit more variable but still similar (lower exponents pair with higher multipliers).
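To see how closely the models track each other, here is a minimal Python sketch evaluating all six equations; the coefficients are the slide's, and the 20 KLOC project size is an arbitrary example.

```python
# Sketch: the lecture's regression models evaluated at one example size.
# Coefficients are from the slide; the 20 KLOC input is arbitrary.

kloc = 20.0

efforts = {                                   # staff-months
    "Watson-Felix": 5.2 * kloc ** 0.91,
    "COCOMO":       2.4 * kloc ** 1.05,
    "Halstead":     0.7 * kloc ** 1.50,
}
for name, e in efforts.items():
    print(f"{name:>12} effort: {e:6.1f} SM")

# Feed one effort figure into the schedule models; note how close
# the three time estimates land, as the slide's note observes.
e = efforts["COCOMO"]
schedules = {                                 # months
    "Watson-Felix": 2.5 * e ** 0.35,
    "COCOMO":       2.5 * e ** 0.38,
    "Putnam":       2.4 * e ** 0.33,
}
for name, t in schedules.items():
    print(f"{name:>12} time:   {t:6.1f} months")
```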

51 COCOMO: COnstructive COst MOdel
Based on Boehm's analysis of a database of 63 projects; the models are based on regression analysis of these systems
Linked to the classic waterfall model
Size input is Source Lines of Code (SLOC) expressed in thousands of delivered source instructions (NCKSLOC), excluding comments and unmodified software
The original model has 3 versions and considers 3 types of systems:
Organic, e.g., simple business systems
Embedded, e.g., avionics
Semi-detached, e.g., inventory management systems

Key point: these projects are relatively old, although Boehm has done a great job of updating the model in newer versions, which we will see later. It is interesting and instructive to see how COCOMO, one of the classic estimation models, evolved.

52 COCOMO Model
Effort in staff-months = b × NCKSLOC^c

Mode            b     c
organic        2.4   1.05
semi-detached  3.0   1.12
embedded       3.6   1.20

As you may have anticipated, the farther down the table, the larger the values of the parameters, thereby dramatically affecting effort. It is amazing across models how similar the values of the parameters are (see next slide).
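The mode table drives a one-line power law. A minimal Python sketch follows; the (b, c) pairs are the slide's values, and the 30 KSLOC example size is arbitrary.

```python
# Sketch: basic COCOMO effort by mode, Effort = b * NCKSLOC^c.
# (b, c) pairs are from the slide; the example size is arbitrary.

MODES = {
    "organic":       (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (3.6, 1.20),
}

def cocomo_effort(ncksloc: float, mode: str) -> float:
    b, c = MODES[mode]
    return b * ncksloc ** c

for mode in MODES:  # same size, sharply rising effort down the table
    print(f"{mode:>13}: {cocomo_effort(30, mode):6.1f} staff-months")
```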

53 COCOMO System Types

Mode           SIZE    INNOVATION  DEADLINE   CONSTRAINTS
Organic        Small   Little      Not tight  Stable
Semi-detached  Medium  Medium      Medium     Medium
Embedded       Large   Greater     Tight      Complex hdw/customer interfaces

These classifications are computed by mapping the project onto this table. As you can infer, organic is the simplest and embedded the hardest.

54 Proposed System
(Diagram: context and flow for the proposed order-processing system. Users create orders, check status, and receive shipment notices. Internal functions include order creation, order update and display, credit check and completion, inventory assignment, new inventory for held orders, held-order processing, assignment of orders to trucks, dispatch support, problem resolution, catalog, truckload reports, shipping invoices, management reporting, and OA&M, with interfaces to Accounting and Inventory.)

55 Case Study:

56 GSC

57 Applying the equations
418 UFP × 63 (Java) SLOC/FP = 26,334 SLOC ≈ 26 KSLOC
How long will it take to develop?
Module development runs at about 330 SLOC per staff-month.
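A minimal Python sketch of the case-study arithmetic; the slide stops at the question, so the effort and schedule lines below assume the size exponent c = 1.16 and the minimum-time rule from the earlier slides.

```python
# Sketch: slide-57 case study. The 418 UFP, 63 SLOC/FP expansion, and
# 330 SLOC/staff-month productivity come from the slide; the exponent
# c = 1.16 is an assumption carried over from the slide-43 example.

ufp = 418
sloc = ufp * 63                        # Java expansion: 63 SLOC per FP
ksloc = sloc / 1000                    # ~26 KSLOC

productivity_ksloc_per_sm = 0.330      # 330 SLOC per staff-month
effort = (1 / productivity_ksloc_per_sm) * ksloc ** 1.16
min_time = 0.75 * 2.5 * effort ** (1 / 3)

print(f"size ≈ {ksloc:.1f} KSLOC")
print(f"effort ≈ {effort:.0f} staff-months, min time ≈ {min_time:.1f} months")
```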

58 Summary: Popular Methods for Effort Estimation
Parametric estimation
Wideband Delphi
COCOMO
SLIM (Software Lifecycle Management)
SEER-SEM
Function Point Analysis
PROBE (PROxy-Based Estimation, from the SEI's Personal Software Process)
Planning Game (XP)
Explore-Commit
Program Evaluation and Review Technique (PERT)

59 Business Realities
Customer affordability/willingness to pay: design the system to win the business
Conflict of interests: project manager (affordability and profit) vs. development/test (pad budgets)
Estimation under uncertainty: the more you know (the better the work is understood), the higher the estimate
Personality traits: risk aversion, tolerance for ambiguity
Staffing issues: sometimes any business is better than no business
Opportunity cost: winning a bid prevents you from working on other deals
Strategic interests: losing money is sometimes OK

