1 Quiz T/F: TQM is a clearly defined quality management process standard. Define the following: –Defect Rate –FPA –Ratio Scale –OO –Ordinal Scale List at least 5 quality parameters/attributes used to measure software quality (from the customer perspective) Why is LOC a “poor” measure of code size?

2 Project Sample OS X Project Replaced Carbon –and NeXT and Yellow Box and... Developers Customers The Media iCEO

10 Software Quality Engineering CS410 Class 3a Measurement Theory

11 Measurement Theory “It is an undisputed statement that measurement is crucial to the progress of all sciences” (Kan 1995) “Scientific progress is made through observations and generalizations based on data and measurements, the derivation of theories as a result, and in turn the confirmation or refutation of theories via hypothesis testing…” (Kan 1995)

12 Measurement Theory Basic measurement theory steps –Proposition an idea is proposed –Definition components of the idea are defined –Operational Definition operational characteristics of components are identified –Metric definition metrics are identified based on operational definition

13 Measurement Theory –Hypothesis definitions hypotheses are drawn from combination of proposition and definitions –Testing and metric gathering testing is performed and empirical data is collected –Confirmation or refutation of hypothesis hypotheses are confirmed or refuted based on analysis of empirical data

14 Measurement Theory Example: –Proposition - “the more rigorous the front end of the software development process is executed, the better the quality at the back end” –Definitions Front end SW process = design through unit test Back end SW process = integration through system test Rigorous implementation = total adherence to process (assume process designates 100% design and code inspections)

15 Measurement Theory –Operational Definitions Rigorous implementation can be measured by amount of design inspection, and lines of code (LOC) inspection Back end quality means low number of defects found in system test –Metric Definitions Design inspection coverage can be expressed as percentage of designs inspected LOC inspection coverage can be expressed as percentage of LOC inspected Back end quality can be expressed as defects per thousand lines of code (KLOC)

16 Measurement Theory –Hypothesis definition(s) The higher percentage of designs and code inspected, the lower the defect rate will be at system test. –Testing and metric gathering (multiple projects) Track and record inspection coverage Track and record defects found in system testing –Confirmation or refutation of hypothesis Analyze data Hypothesis supported?

17 Measurement Theory The operationalization (definition) process produces metrics and indicators for which data can be collected, and the hypotheses can be tested empirically. In other words - You have to gather, analyze and compare data to prove whether the hypothesis is true or not.

18 Level of Measurement How measurements are classified and compared: –Nominal Scale –Ordinal Scale –Interval Scale –Ratio Scale Scales are hierarchical: each higher-level scale possesses all of the properties of the lower ones. Operationalization should take advantage of the highest-level scale possible (i.e. don’t use low/medium/high if you can use 1…10)

19 Level of Measurement Nominal Scale –Lowest level scale –Classification of items (sort items into categories) –Two requirements Jointly exhaustive (all items can be categorized) Mutually exclusive (only one category applies) –Names of categories and sequence order bear no assumptions about relationships between categories –Example: Categories of SW dev: Waterfall, Spiral, Iterative, OO Does not imply that Waterfall is ‘better/greater’ than Spiral

20 Level of Measurement Ordinal Scale –Like nominal except comparison can be applied –But - we cannot determine magnitude of difference –Example: Categories of SW dev orgs based on CMM levels (1-5) We can state that dev orgs at level 2 are more mature than orgs at level 1, and so on... But we cannot state how much better 2 is than 1, or 3 is than 2, or 3 is than 1, and so on… –Likert rating scale often used with this scale 1 = completely dissatisfied 2 = somewhat dissatisfied 3 = neutral 4 = satisfied 5 = completely satisfied

21 Level of Measurement Interval Scale –Like ordinal scale, except now we can determine exact differences between measurement points –Can use addition/subtraction expressions –Requires establishment of a well-defined, repeatable, unit of measurement –Example of interval scale Temperature in Fahrenheit (vs. cool, warm, hot) Day 1’s high temperature was 80 degrees Day 2’s high temperature was 87 degrees Day 2 was 7 degrees warmer than day 1 (addition) Day 1 was 7 degrees cooler than day 2 (subtraction)

22 Level of Measurement Ratio scale –Interval scale with absolute, non-arbitrary zero point –Highest level scale –Can use multiplication and division –Example MBNQA scores Company A scored 800 in the range of 0...1000 Company B scored 400 in the range of 0…1000 Company A doubled company B’s score (multiplication) Company B scored half as well as Company A (division)

23 Basic Measures Measures are ways of analyzing and comparing data to extract meaningful information. Data vs. Information –Data - raw numbers or facts –Information relevant - related to subject qualified - characteristics specified reliable - dependable, high confidence level Basic measures –Ratio –Proportion –Percentage –Rate

24 Basic Measures Ratio –Result of dividing one quantity by another –Best use is with two distinct groups –Numerator, denominator are mutually exclusive –Example 1: Developers = 10, Testers = 5 Developer to Tester ratio = 10 / 5 x 100% = 200% –Example 2: Developers = 5, Testers = 10 Developer to Tester ratio = 5 / 10 x 100% = 50%

25 Basic Measures Proportion –Best use is with multiple categories within one group –For n categories (C) in the group (G) then –C1/G + C2/G + … + Cn/G = 1 –P of category = desired category / total group size Example –Number of customers surveyed = 50 –Number of satisfied customers = 30 –Proportion of satisfied customers = 30 / 50 or 0.6 –Proportion of unsatisfied customers = 20 / 50 or 0.4 –satisfied (0.6) plus unsatisfied (0.4) = 1

26 Basic Measures Percentage –A proportion expressed in terms of per hundred units –Percentages represent relative frequencies –Total number of cases should always be included –Total number of cases should be sufficiently large –Example 200 bugs found in 8 KLOC 30 requirements bugs (30 / 200 x 100%) = 15% 50 design bugs (50 / 200 x 100%) = 25% 100 code bugs (100 / 200 x 100%) = 50% 20 other bugs (20 / 200 x 100%) = 10%
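The percentage breakdown on the slide above can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `percentage_breakdown` is an assumption made here, and the bug counts are the ones given on the slide. It returns the total alongside the percentages, since the slide notes that the total number of cases should always be reported.

```python
def percentage_breakdown(counts):
    """Return (total, {category: percent}) for a dict of category counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cases to summarize")
    return total, {name: 100 * n / total for name, n in counts.items()}


# Bug counts taken from the slide: 200 bugs found in 8 KLOC.
bugs = {"requirements": 30, "design": 50, "code": 100, "other": 20}
total, shares = percentage_breakdown(bugs)
for name, pct in shares.items():
    print(f"{name}: {pct:.0f}% of {total} bugs")
```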

27 Basic Measures Rate –Associated with dynamic changes of a quantity over time –Changes in y per each unit of x x is usually a quantity of time time unit of x must be expressed –Example Opportunity For Error = 5000 (1. based on 5 KLOC) Number of defects = 200 (2. after one year) Defect rate = 200 / 5000 * 1K = 40 defects per KLOC Notes 1. - extremely hard to determine OFE 2. - hard to know when to measure

28 Basic Measures Rate –Six Sigma –A specific defect rate of 3.4 defective parts per million (ppm) which has become an industry standard for the ultimate quality goal. –Sigma is the Greek symbol for standard deviation –By definition, if the variations in the process are reduced then it’s easier to obtain Six Sigma quality –Some problems arise in SW engineering What are the parts: lines of source code? lines of assembly code?

29 Reliability Reliability - consistency of a number of measurements taken using the same measurement method on the same subject High degree of reliability - repeated measurements are consistent Low degree of reliability - repeated measurements have large variations Operational definitions (specifics of how measurement is taken) are key to achieving high degrees of reliability

30 Validity Validity is whether the measurement really measures what is intended to be measured –Construct Validity - validity of a metric to represent a theory Difficult to validate abstract concepts Example: Concept - Intelligent people attend college Measurement - Sum college enrollment Conclusion - “The sum of the college enrollment is the number of intelligent people” - Not valid

31 Validity –Criterion-related (predictive) Validity - validity of a metric to predict a theory or relationship Example: Concept - Safe driving requires knowledge of the rules and regulations Measurement - Driver’s license test Conclusion - Those who have low scores on driver’s license tests are more likely to have an accident –Content Validity - the degree to which a metric covers the meaning of the concept Example - A general math knowledge test needs to include more than just addition and subtraction.

32 Measurement Errors Two types of measurement errors –Systematic errors - errors associated with validity –Random errors - errors associated with reliability Example: A bathroom scale that is off by 10 pounds. Each time the scale is used, the reading equals actual weight + 10 pounds + variation, i.e. reading = true value + systematic error + random error. The systematic error makes the reading invalid; the random error makes the reading unreliable.
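To make the bathroom-scale decomposition concrete, the sketch below estimates both error components from repeated readings. The readings, the 150-pound true weight, and the function name `estimate_errors` are all hypothetical values chosen for illustration. The mean offset from the true value approximates the systematic error, and the standard deviation of the readings approximates the random error.

```python
from statistics import mean, stdev


def estimate_errors(readings, true_value):
    """Return (systematic_error, random_spread) for repeated readings.

    systematic_error: average offset from the true value (hurts validity).
    random_spread: sample standard deviation (hurts reliability).
    """
    return mean(readings) - true_value, stdev(readings)


# Hypothetical readings from a scale that reads about 10 pounds heavy.
readings = [159.5, 160.5, 160.0, 159.0, 161.0]
bias, spread = estimate_errors(readings, true_value=150.0)
print(f"systematic error ~ {bias:.1f} lb, random spread ~ {spread:.2f} lb")
```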

33 Measurement Errors Ways of assessing Reliability –Test/Retest - one or more retests are performed and results compared to previous tests May expose random errors –Alternative-form - acquire same measurements using alternate testing means May expose systematic errors

34 Correlation Correlation - a statistical method for assessing relationships among observed or empirical data sets If the correlation coefficient between two variables is weak, then there is no linear correlation (but there may be non-linear) Example - negative linear relationship between LOC inspected and defects shipped
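As a sketch of how a linear correlation like the one in the example could be checked, the code below computes the Pearson correlation coefficient from first principles. The project data are invented for illustration, and `pearson_r` is a name assumed here. A value near -1 would be consistent with a negative linear relationship between inspection coverage and defects shipped.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data from five projects.
pct_loc_inspected = [20, 40, 60, 80, 100]
defects_shipped = [9, 8, 5, 4, 1]
r = pearson_r(pct_loc_inspected, defects_shipped)
print(f"r = {r:.3f}")
```

A weak coefficient from this calculation would only rule out a linear relationship; as the slide notes, a non-linear one may still exist.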

35 Causality Identification of cause and effect relationships in experiments Three criteria for cause-effect: 1. Cause must precede effect 2. Two variables are empirically related (relationship can be measured) 3. Empirical relationship is direct (not coincidence, or in error)

36 Summary Operational definitions are valuable in determining levels and types of metrics to use Scales and measures have different characteristics and different intended uses Avoid using the wrong scale or measure Validity and Reliability represent measurement quality Correlation and Causality are goals of measurement (i.e. quest to identify and prove a cause-effect relationship)

37 Follow-up: List at least 5 quality parameters/attributes used to measure software quality from the customer perspective

38 Pop Quiz What is the difference between validity and reliability? Why are software development process models important to the study of software quality? Define Six Sigma Define MTTF T/F Defect density and PUM combined represent a true measure of customer satisfaction. T/F If a hypothesis is refuted, then the wrong metrics were used.

39 Software Quality Engineering CS410 Class 3b Product Quality Metrics Process Quality Metrics Function Point analysis

40 Software Quality Metrics Three kinds of Software Quality Metrics –Product Metrics - describe the characteristics of product size, complexity, design features, performance, and quality level –Process Metrics - used for improving software development/maintenance process effectiveness of defect removal, pattern of testing defect arrival, and response time of fixes –Project Metrics - describe the project characteristics and execution number of developers, cost, schedule, productivity, etc. fairly straightforward

41 Software Quality Metrics Product Metrics –Mean Time to Failure (MTTF) –Defect Density –Problems per User Month (PUM) –Customer Satisfaction Process Metrics –Defect density during machine testing –Defect arrival patterns during machine testing –Phase-based defect removal –Defect removal effectiveness

42 Software Quality Metrics Some terminology: –Error - a human mistake that results in incorrect (or incomplete) software faulty requirement, design flaw, coding error –Fault (a.k.a. defect) - a condition within the system that causes a unit of the system to not function properly GPF, Abend, crash, lock-up, dead-lock, error message, etc. –Failure - required function (i.e. the goal) cannot be performed An error results in a fault which may cause one or more failures.

43 MTTF Mean Time To Failure (MTTF) - measures how long the software can run before it encounters a “crash” Difficult measurement to obtain because it’s tied to the “real” use of the product Easier to define requirements for special purpose software than general use software MTTF is not widely used by commercial software developers for these reasons

44 Defect Density Defect Density (a.k.a. Defect Rate) - is the number of estimated defects Estimated because defects are found throughout the entire life-cycle of the product Important for cost and resource estimates for the maintenance phase of the life cycle

45 Defect Density More specific –Defect Density (rate) = number of defects / opportunities for errors during a specified time –Number of defects can be approximated as equal to the number of unique causes of observed failures –Opportunities for error can be expressed as KLOC –Time frame (life of product or LOP) varies

46 Defect Density Defect Density Example –Product is one year old, and is 10 KLOC Unique causes of observed failures = 50 Current Defect Density = 50 / 10K x 1K = 5 defects per KLOC per year –After second year Unique causes of observed failures = 75 Current Defect Density = 75 / 10K x 1K = 7.5 defects per KLOC per 2 years, or 3.75 per KLOC per year
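The two calculations in this example can be expressed as one small function. This is a minimal sketch using the slide's numbers; the function name and parameters are assumptions introduced here. Dividing by the product's age gives the per-year figure, which is what makes products of different ages comparable.

```python
def defect_density(unique_defects, loc, years=1):
    """Defects per KLOC per year over the observed life of the product."""
    if loc <= 0 or years <= 0:
        raise ValueError("loc and years must be positive")
    return unique_defects * 1000 / loc / years


# Numbers from the slide: a 10 KLOC product observed after one and two years.
year_one = defect_density(50, 10_000, years=1)
year_two = defect_density(75, 10_000, years=2)
print(year_one, year_two)
```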

47 Defect Density Comparison Issues –How LOC is calculated Count only executable lines –Note - what is an executable line?? HLL vs. Assembler Count executable lines, plus data definitions Count executable lines, plus data definitions, plus comments Count executable lines, plus data definitions, plus comments, plus job control language Count physical lines Count logical lines (terminated by ‘;’) Function Point Analysis (FPA) is an alternative measure of program size

48 Defect Density Comparison Issues (cont.) –Timeframes must be the same Cannot compare (current) defect rate for a one year old product to the (current) defect rate of a four year old product IBM considers life of product to be 4 years –Must account for new and modified code in LOC count (otherwise metric is skewed) –LOC counting must remain consistent –Defect rate should be calculated for each release (must use change flags)

49 Defect Density Change Flags Example:
    /* Module A - Prolog */
    /* Release 1.1 modifications 12/01/97 @R11 */
    /* Fix for problem report #1127 03/15/98 @F1127 */
    ...
    Total_Records = 0; /* Init records @R11A */
    ...
    Bad_Records = Total_Records - Good_Records; /* Calculate num bad recs @F1127C */
Flags (a.k.a. Change Control) - CMM level 2+
–A - line added by release/fix
–C - line changed by release/fix
–M - line moved by release/fix
–D - line deleted by release/fix (optional)

50 Defect Density IBM Example: SSI (current release) = SSI (previous release) + CSI - Deleted - Changed SSI - Shipped Source Instructions CSI - Changed (and new) Source Instructions Defect Rate Metrics for Current Release: TVUA/KSSI - all APARs (defects) reported on the total release (inclusive of previous release) TVUA/KCSI - all APARs (defects) reported on the new release code –APAR - Authorized Program Analysis Report (Severity 1-4) –TVUA - Total Valid Unique APARs
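The SSI bookkeeping and the two defect-rate metrics above can be sketched as follows. All input figures here are hypothetical and the function names are assumptions; only the formula itself comes from the slide.

```python
def current_ssi(previous_ssi, csi, deleted, changed):
    """SSI (current) = SSI (previous) + CSI - Deleted - Changed.

    Changed lines are subtracted because CSI already counts them once.
    """
    return previous_ssi + csi - deleted - changed


def defects_per_thousand(tvua, instructions):
    """Valid unique defects per thousand source instructions."""
    return tvua * 1000 / instructions


# Hypothetical release: 50,000 SSI previously, 20,000 new and changed lines,
# 4,000 lines deleted and 2,000 lines changed.
ssi = current_ssi(50_000, 20_000, deleted=4_000, changed=2_000)
tvua_per_kssi = defects_per_thousand(128, ssi)     # defects on the whole release
tvua_per_kcsi = defects_per_thousand(40, 20_000)   # defects on new/changed code
print(ssi, tvua_per_kssi, tvua_per_kcsi)
```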

51 Customer Problem Metrics In addition to valid defects, other issues are viewed as ‘problems’ by customers: –Usability –Unclear documentation/information –Missing documentation/information –Duplicate problems (counted as invalid) –User errors (traps)

52 Customer Problem Metrics From customers’ perspective, the total problem space is the combination of the defect-oriented problems and the non-defect-oriented problems. They all impact the customer, regardless of how the SW company classifies them. Total problems can be expressed as Problems per User Month (PUM) PUM = Total Problems / License-Months License-Months = Total number of licenses x number of months in calculation period

53 Customer Problem Metrics PUM example: Total problems = 75, Licenses = 50, Months = 6 PUM = 75 / (50 x 6) = 0.25 problems/user month PUM is usually calculated for each month after a software release, and averaged for the year. Note - PUM counts a defect multiple times, depending on how pervasive it is (i.e. mainstream function defects are costly) Ways to lower PUM: –Improve the development process to reduce defects –Reduce non-defect-oriented problems (better documentation, usability, etc.) –Increase the number of licenses (?!)
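A short sketch of the PUM calculation, using the figures from the example above. The function name is an assumption made for illustration. It also shows why adding licenses lowers PUM without improving the product: the denominator grows while the problem count stays the same.

```python
def problems_per_user_month(total_problems, licenses, months):
    """PUM = total problems / (licenses x months in the period)."""
    license_months = licenses * months
    if license_months <= 0:
        raise ValueError("license-months must be positive")
    return total_problems / license_months


pum = problems_per_user_month(75, licenses=50, months=6)
pum_more_licenses = problems_per_user_month(75, licenses=100, months=6)
print(pum, pum_more_licenses)
```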

54 Customer Satisfaction PUM and Defect Rate are not true measurements of customer satisfaction, but they do contribute. Timing, availability, company image, services, and (customized) customer solutions also contribute. Customer satisfaction is usually measured on a five-point Likert scale via a customer survey 1. - Very dissatisfied 2. - Dissatisfied 3. - Neutral 4. - Satisfied 5. - Very satisfied

55 Customer Satisfaction Common metrics for Customer Satisfaction: –Percent of very satisfied customers –Percent of satisfied customers (very satisfied and satisfied) –Percent of dissatisfied customers (dissatisfied and very dissatisfied) –Percent of non-satisfied (neutral, dissatisfied, and very dissatisfied) Scope of three quality metrics (defects, customer problems, customer satisfaction). Fig. 4.1 p. 94
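The four satisfaction metrics listed above can all be derived from one tally of survey responses. The sketch below assumes responses are keyed by the five-point scale (1 = very dissatisfied through 5 = very satisfied); the survey counts and the function name are hypothetical.

```python
def satisfaction_metrics(responses):
    """Compute the four common percentages from {likert_level: count}."""
    total = sum(responses.values())
    if total == 0:
        raise ValueError("no survey responses")

    def pct(levels):
        return 100 * sum(responses.get(level, 0) for level in levels) / total

    return {
        "very_satisfied": pct([5]),
        "satisfied": pct([4, 5]),
        "dissatisfied": pct([1, 2]),
        "non_satisfied": pct([1, 2, 3]),
    }


# Hypothetical survey of 100 customers.
survey = {1: 5, 2: 10, 3: 15, 4: 40, 5: 30}
metrics = satisfaction_metrics(survey)
print(metrics)
```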

56 Defect Density During Machine Testing Machine Testing - testing after code is integrated into the system library (i.e. integration testing, function testing, system testing, regression testing) Commonly held beliefs: –There is a positive correlation between defect rates found during testing and the number of defects injected during development. –There is a positive correlation between the defect rates found during testing and the defect rate once product is released. Counter argument: Better testing will uncover more defects (i.e. maybe the code is just being tested better)

57 Defect Density During Machine Testing Release quality:
If defect rate during testing is the same or lower than previous release then:
    If current release testing is worse then: testing needs to be improved (inconclusive about quality)
    Else if release testing is the same (or better): the quality is better than previous release
If defect rate during testing is higher than previous release then:
    If testing process was improved then: the quality is the same or better than previous release
    Else if testing process was not improved then: the quality is worse than previous release (more defects)
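The release-quality decision rules on this slide translate directly into a small function. This is a sketch only: the function name, the string labels, and the encoding of testing effectiveness as "worse", "same", or "better" are assumptions made for illustration.

```python
def assess_release(current_rate, previous_rate, testing):
    """Interpret a testing-phase defect rate relative to the previous release.

    testing: 'worse', 'same', or 'better' compared with the previous release.
    """
    if testing not in ("worse", "same", "better"):
        raise ValueError("testing must be 'worse', 'same', or 'better'")
    if current_rate <= previous_rate:
        if testing == "worse":
            return "inconclusive: improve testing"
        return "quality better than previous release"
    # Defect rate is higher than in the previous release.
    if testing == "better":
        return "quality same or better than previous release"
    return "quality worse than previous release"


print(assess_release(4.0, 5.0, "worse"))
print(assess_release(6.0, 5.0, "better"))
```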

58 Defect Arrival Rate During Machine Testing Defect arrival rate provides more information to supplement the defect density metric This metric is a view of the patterns and time between defects. Different arrival patterns (can) indicate different quality levels in the software. Objective - to see declining and stabilizing arrival rates over time –Supports the idea of “shake-out” testing where you attempt to find all the highest level bugs first so that additional testing is not impacted.

59 Defect Arrival Rate During Machine Testing Three different metrics for arrival rate: –Raw defect (includes duplicates, and invalids) arrivals during testing phase per some time interval (day, week, month, etc). –Valid defect arrivals during testing phase per some time interval –Defect backlog over time. This is a measure of workload which could adversely affect quality.

60 Phase-Based Defect Removal Pattern An extension of defect density metric. Defects are tracked at all (inspection/test) phases of development cycle (design reviews, code reviews, unit test, integration test, function test, and system test). This metric can be correlated to inspection coverage, and test coverage metrics. Helps to identify the overall defect removal ability of the development process. Fig. 4.3 p. 103

61 Defect Removal Effectiveness Defect Removal Effectiveness (DRE): DRE = (Defects removed in the phase / defects latent in product) x 100% Where the defects latent in the product at that phase can be estimated as the defects removed in the phase plus all defects found in later phases and in the field (this is a constantly changing number)

63 Defect Removal Effectiveness Example - Defects per phase:
–HLD (I0) review: I0 = 5 (found = 5, latent = 4, total = 9), DRE = (5 / 9 x 100%) = 56%
–LLD (I1) review: I0 = 3, I1 = 4 (found = 4, latent = 6, total = 10), DRE = (4 / 10 x 100%) = 40%
–Code inspection (I2): I0 = 1, I1 = 1, I2 = 10 (found = 10, latent = 6, total = 16), DRE = (10 / 16 x 100%) = 62.5%
–Unit Test (UT): I0 = 0, I1 = 1, I2 = 5, UT = 3 (found = 3, latent = 1, total = 4), DRE = (3 / 4 x 100%) = 75%
–Component Test (CT): I0 = 0, I1 = 0, I2 = 1, UT = 1, CT = 3 (found = 3, latent = 1, total = 4), DRE = (3 / 4 x 100%) = 75%
–System Test (ST): I0 = 0, I1 = 0, I2 = 0, UT = 0, CT = 1, ST = 2 (found = 2, latent = 1, total = 3), DRE = (2 / 3 x 100%) = 67%
–Field = 2: I0 = 0, I1 = 0, I2 = 0, UT = 0, CT = 0, ST = 1
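The per-phase DRE figures in this example follow from the found and latent counts alone. The sketch below recomputes them, taking the (found, latent) pairs exactly as given on the slide; the function name and the phase labels used as dictionary keys are assumptions.

```python
def defect_removal_effectiveness(found, latent):
    """DRE = found / (found + latent) x 100%, for one phase."""
    total = found + latent
    if total == 0:
        raise ValueError("no defects recorded for this phase")
    return 100 * found / total


# (found, latent) pairs as listed on the slide.
phases = {
    "HLD review": (5, 4),
    "LLD review": (4, 6),
    "Code inspection": (10, 6),
    "Unit test": (3, 1),
    "Component test": (3, 1),
    "System test": (2, 1),
}
dre = {name: defect_removal_effectiveness(f, l) for name, (f, l) in phases.items()}
for name, value in dre.items():
    print(f"{name}: {value:.1f}%")
```

Because the latent count grows as field defects are reported, these values would need to be recomputed whenever new defects are traced back to an earlier phase.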

64 Defect Removal Effectiveness Notes Must account for where a defect was introduced. As number of field bugs increases DRE must be recalculated. Latent - present but not evident (at this phase).

65 Function Point Analysis (FPA) Alternative size measure to LOC Can measure productivity (function points per person), and quality (defects per function point) Idea: The defect rate should be measured against how many functions the software provides Functionality is independent of code size

66 Function Point Analysis (FPA) Function Points is a weighted total of five major components –External inputs x 4 –External outputs x 5 –Logical internal files x 10 –External interface files x 7 –External inquiries x 4

67 Function Point Analysis (FPA) Low and high weighting factors are used to account for complexity –External inputs, low = 3, high = 6 –External outputs, low = 4, high = 7 –Logical internal files, low = 7, high = 15 –External interface files, low = 5, high = 10 –External inquiries, low = 3, high = 6 Function Count (FC) is then calculated –FC = sum of each component

68 Function Point Analysis (FPA) 14 system characteristics are then assessed for impact on a scale of 0 to 5 1. Data communications 2. Distributed functions 3. Performance 4. Heavily used configuration 5. Transaction rate 6. On-line data entry 7. End-user efficiency 8. On-line update 9. Complex processing 10. Reusability 11. Installation ease 12. Operational ease 13. Multiple sites 14. Facilitation of change

69 Function Point Analysis (FPA) Value Adjustment Factor (VAF) is then calculated (a.k.a. Processing Complexity Adjustment) VAF = 0.65 + (0.01 x C) where C is the sum of all the complexity ratings Then Function Points (FP) are calculated FP = FC x VAF The resulting value is the function point rating for the software. This number can also be used to convert to a LOC rating for comparison reasons.
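Putting the preceding FPA slides together, the calculation can be sketched as below. The component weights are the average weights listed earlier (4, 5, 10, 7, 4); the component counts, the uniform rating of 3 for each of the 14 characteristics, and all function and key names are hypothetical choices for illustration. For simplicity the sketch ignores the low/high complexity weights.

```python
# Average weights for the five components, as listed on the earlier slide.
WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "logical_internal_files": 10,
    "external_interface_files": 7,
    "external_inquiries": 4,
}


def function_count(counts):
    """FC = weighted sum of the five component counts."""
    return sum(WEIGHTS[name] * n for name, n in counts.items())


def value_adjustment_factor(ratings):
    """VAF = 0.65 + 0.01 x C, where C sums the 14 ratings (each 0 to 5)."""
    if len(ratings) != 14 or any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("expected 14 ratings, each between 0 and 5")
    return 0.65 + 0.01 * sum(ratings)


def function_points(counts, ratings):
    """FP = FC x VAF."""
    return function_count(counts) * value_adjustment_factor(ratings)


# Hypothetical system.
counts = {
    "external_inputs": 10,
    "external_outputs": 5,
    "logical_internal_files": 4,
    "external_interface_files": 2,
    "external_inquiries": 6,
}
ratings = [3] * 14
fc = function_count(counts)
fp = function_points(counts, ratings)
print(fc, round(fp, 2))
```

Since each rating ranges from 0 to 5, VAF can only vary between 0.65 and 1.35, so the adjustment moves the raw function count by at most 35% in either direction.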

70 Summary Product Quality Metrics - focus on quality aspects of product, both intrinsic and from customer view point –Mean Time To Failure –Defect Density –Problems per User Month –Customer Satisfaction

71 Summary (cont.) Process quality metrics - focus on quality and effectiveness of the process. –Defect density during machine testing –Defect arrival rate during machine testing –Phase-based defect removal –Defect removal effectiveness Function Point analysis –An alternative method to LOC counting
