Why Metrics in Software Testing?
How would you answer questions such as:
– Project-oriented questions
    How long would it take to test?
    How much will it cost to test?
– Product-oriented questions
    How good or bad is the product?
    How many problems still remain in the software?
– Test-activity-oriented questions
    Will testing be completed on time?
    Was the testing effective?
    How much effort went into testing?
All of these questions require some type of measurement and record keeping in order to be answered properly.

Some Basic Concepts on Measurement
What do we need before we can measure something?
– A clear understanding and definition of the attribute/characteristic that we are trying to gauge
– The metric that may be used to gauge that attribute
– The methodology for performing the measurement

1. Clarifying the Attribute to be Measured
Characterizing the attribute of interest
– Size attribute:
    Physical height is a size sub-attribute of many items.
      – Height of a building, person, or tree: not a problem
      – Height of a ball or an ocean? Not comfortable? Why?
    Physical weight is a size sub-attribute of many items.
    What is the size attribute for software? What does it address?
      – The source statements? With screens? With DB tables?
      – The storage space that the object code occupies in memory?
– Quality attribute:
    For a car: how fast it can accelerate? Number of times the car stalled? Number of times the lights don’t work?
    For software: how many times we need to “re-boot”? How good the screen looks? How many times we need to call the help line? Or the # of times it does not meet customer requirements?

2. Metric for Gauging the Attribute
Metric: a unit used for describing or measuring an attribute
– Inches is a metric used for measuring the length attribute (simple metric)
– Miles per hour is a metric for measuring the speed attribute (complex metric; combines 2 metrics, distance and time)
– Lines of code is a metric for measuring the size attribute of software (not a very good one)
– Problems found per thousand lines of source code is a metric for the defect discovery rate attribute of software (complex metric; combines 2 metrics, problems found and lines of code)

3. Conducting the Measurement
Once the attribute and its associated metric are defined, the actual methodology for determining the extent of the attribute using that metric has to be spelled out.
– How do you measure the height of a person using inches?
– How do you measure the distance from the earth to the moon using inches?
– How do you measure the size of a computer program using bytes?
– How do you measure the defects in a program using problems found during program testing? (Note: problems found may be counted in many ways: unique ones, accepted ones, etc.)

Some General Test Measurements
Time is used to measure the length of the period expended for testing
– Time to set up and conduct (run) a test or a set of tests
    Units of measurement: minutes or hours
– Time to design and document test cases
    Units of measurement: minutes or hours
Keeping track of time gives us one parameter to help us plan for future testing, but time must be balanced against the “size” of the test.
– 2 seconds to run a simple query
– 5 seconds to run a complete purchase transaction with confirmation
The “size” of a test is needed to make “time of test” more meaningful. Or, conversely, can the amount of “test time” be used as a metric for the size-of-test attribute?

Size of Test
The test size attribute may use different metrics:
– Amount of time to run the test:
    Small size: less than or equal to 3 seconds
    Medium size: more than 3 seconds and less than 1 minute
    Large size: 1 minute or above
– Number of statements needed to document the test case:
    Small size: less than or equal to 3 statements
    Medium size: between 4 and 7 statements
    Large size: 8 or more statements
Any suggestions?

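The two size scales above can be written down as simple classification rules. The sketch below is only an illustration: the function names are hypothetical, and the handling of the exact boundaries (3 seconds counts as small, 60 seconds counts as large) is one reading of the thresholds on the slide.

```python
def size_by_run_time(seconds: float) -> str:
    """Classify a test by its run time, using the slide's thresholds."""
    if seconds <= 3:
        return "small"
    if seconds < 60:
        return "medium"
    return "large"


def size_by_statements(statements: int) -> str:
    """Classify a test by the number of statements that document it."""
    if statements <= 3:
        return "small"
    if statements <= 7:
        return "medium"
    return "large"


# Run times from the earlier examples: a simple query (2 s) and a
# complete purchase transaction (5 s).
print(size_by_run_time(2))
print(size_by_run_time(5))
print(size_by_statements(8))
```

Note that the two scales can disagree: a test documented in 2 statements may still take 5 minutes to run, which is one reason the slide asks for suggestions.
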
Quality: # of Problems
The attribute Quality is often measured with the metric number of problems found, but the number of problems alone does not tell the whole story. Consider:
– Severity of problems
    High
    Medium
    Low
– Type of problems
    UI
    Database
    Network outage
    Etc.

Quality (cont.)
Both severity and type are important
– # of problems found by severity
– # of problems found by type
– # of problems found by when (phase during development)
– # of problems found by when (months after release)
– # of problems found by where (UI, DB, logic, network, etc.)
Quality information is relevant to both:
– Software providers
– Customers/users
Why is it important to users? What would they do with it?

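Breaking the problem count down by severity, type, and phase only requires that each problem record carries those fields. A minimal sketch follows; the records are made up for illustration and the field layout is an assumption, not something a particular defect tracker prescribes.

```python
from collections import Counter

# Hypothetical problem records: (severity, type, phase in which it was found).
problems = [
    ("high", "UI", "functional test"),
    ("low", "UI", "functional test"),
    ("medium", "Database", "system test"),
    ("high", "Network", "post-release"),
    ("low", "Database", "functional test"),
]

# One tally per dimension of interest.
by_severity = Counter(severity for severity, _, _ in problems)
by_type = Counter(kind for _, kind, _ in problems)
by_phase = Counter(phase for _, _, phase in problems)

print(by_severity)
print(by_type)
print(by_phase)
```
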
Problem Find Rate
[Figure: Problem Find Rate During Functional Test. Vertical axis: problem find rate (# of problems found per hour). Horizontal axis: time (Day 1 to Day 5).]
Does the severity of a problem matter here?

Problem Fix Rate
[Figure: Problem Find Rate During Functional Test and Problem Fix Rate During Functional Test. Vertical axis: problem fix rate (# of problems fixed per hour). Horizontal axis: time (Day 1 to Day 5).]
Would this fix rate present a problem? Would you also want to keep a backlog # by day?

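A daily backlog follows directly from the find and fix counts: it is the cumulative number of problems found minus the cumulative number fixed. The sketch below uses made-up daily counts, since the values in the original charts are not available.

```python
from itertools import accumulate

# Hypothetical daily counts for five days of functional test.
found_per_day = [12, 9, 7, 4, 2]
fixed_per_day = [3, 5, 6, 6, 5]

# Backlog at the end of each day = cumulative found - cumulative fixed.
cumulative_found = list(accumulate(found_per_day))
cumulative_fixed = list(accumulate(fixed_per_day))
backlog = [found - fixed for found, fixed in zip(cumulative_found, cumulative_fixed)]

for day, open_count in enumerate(backlog, start=1):
    print(f"Day {day}: {open_count} problems open")
```

With these numbers the backlog grows while the find rate exceeds the fix rate and only starts to shrink on Day 4, which is the kind of pattern the question about the fix rate is probing.
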
Problem Density
[Figure: Problem Density. Vertical axis: density (# of problems found per KLOC). Horizontal axis: area (Module 1 to Module 4).]
Note: The # of problems found by area alone is not a normalized measurement; we need the # per KLOC.

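To see why normalization matters, compare modules with the same raw problem count but different sizes. The sketch below uses hypothetical module data; `problems_per_kloc` is an illustrative helper name.

```python
def problems_per_kloc(problems_found: int, lines_of_code: int) -> float:
    """Return the number of problems found per thousand lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return problems_found / (lines_of_code / 1000)


# Hypothetical data: module -> (problems found, lines of code).
modules = {
    "Module 1": (10, 2_000),
    "Module 2": (10, 20_000),
    "Module 3": (30, 15_000),
    "Module 4": (4, 1_000),
}

density = {name: problems_per_kloc(found, loc) for name, (found, loc) in modules.items()}

for name, value in density.items():
    print(f"{name}: {value:.1f} problems per KLOC")
```

Modules 1 and 2 have the same raw count (10), yet Module 1 is ten times denser, and Module 3 has the highest raw count but not the highest density.
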
Test Coverage Rate
Not all of the planned test cases are actually run.
– # of test cases executed / # of test cases planned
    By functional area
    By test phase
– # of source statements executed / total # of source statements
    By functional area
    By module

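The test case coverage rate is a plain ratio, computed per functional area and overall. A small sketch, with hypothetical area names and counts:

```python
def coverage_rate(executed: int, planned: int) -> float:
    """Return executed / planned as a fraction."""
    if planned <= 0:
        raise ValueError("planned must be positive")
    return executed / planned


# Hypothetical (executed, planned) test case counts by functional area.
by_area = {"login": (18, 20), "checkout": (30, 40)}

for area, (executed, planned) in by_area.items():
    print(f"{area}: {coverage_rate(executed, planned):.0%}")

# The overall rate is computed from the totals, not by averaging the area rates.
total_executed = sum(executed for executed, _ in by_area.values())
total_planned = sum(planned for _, planned in by_area.values())
overall = coverage_rate(total_executed, total_planned)
print(f"overall: {overall:.0%}")
```

The same function applies to statement coverage by passing the number of statements executed and the total number of statements.
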
Test Activity Effectiveness
Defect discovery and eradication activities occur in all phases of development. To see which is more effective, one may use:
– # of problems found in a phase / total # of problems found
    By development phase (requirements review, design review, functional test, system test, etc.)
– # of problems found / person-days of effort
    By test activity

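Both effectiveness ratios can be computed from one table of problems found and effort spent per phase. The figures below are hypothetical and serve only to show that the two ratios can rank phases differently.

```python
# Hypothetical data: phase -> (problems found, person-days of effort).
phases = {
    "requirements review": (10, 5),
    "design review": (15, 5),
    "functional test": (50, 20),
    "system test": (25, 10),
}

total_found = sum(found for found, _ in phases.values())

# Ratio 1: share of all problems that each phase found.
share_of_total = {phase: found / total_found for phase, (found, _) in phases.items()}

# Ratio 2: problems found per person-day of effort in each phase.
found_per_person_day = {phase: found / effort for phase, (found, effort) in phases.items()}

for phase in phases:
    print(f"{phase}: {share_of_total[phase]:.0%} of problems, "
          f"{found_per_person_day[phase]:.1f} per person-day")
```

In this made-up data functional test finds half of all problems, but design review finds the most problems per person-day.
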
Fix Effectiveness
Not all problem fixes resolve the problems.
– # of fixes that worked the first time / total # of fixes
– # of fixes that required more than 1 attempt / total # of fixes

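One way to obtain both ratios is to record, for each fixed problem, how many fix attempts it took before the problem was resolved. That recording scheme is an assumption made for this sketch, and the data is made up.

```python
# Hypothetical record: number of fix attempts needed for each resolved problem.
attempts_per_problem = [1, 1, 2, 1, 3, 1, 1, 2]

total_fixes = len(attempts_per_problem)
worked_first_time = sum(1 for attempts in attempts_per_problem if attempts == 1)
needed_rework = sum(1 for attempts in attempts_per_problem if attempts > 1)

first_time_rate = worked_first_time / total_fixes
rework_rate = needed_rework / total_fixes

print(f"Worked the first time: {first_time_rate:.1%}")
print(f"Required more than 1 attempt: {rework_rate:.1%}")
```
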
Fix Cost
Fix cost is usually measured by the amount of effort expended.
– # of person-hours expended / fix
    By severity
    By area
    By phase type (including post-release)
If the post-release fix cost is higher than the fix cost in each of the pre-release phases, that is one reason to invest in testing and reviews.

Problem Cost Comparison
The effort expended in discovering a problem plus the effort expended in fixing it is the “test” cost during pre-release.
The effort expended in fixing a problem and releasing the fix to the customer is the support (problem resolution) cost during post-release.
Compare (effort in person-hours):
– Pre-release: effort expended / problem found and fixed
– Post-release: effort expended / problem resolved

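The comparison reduces to two per-problem averages. The effort and problem counts below are hypothetical; real values would come from the project's time and defect records.

```python
# Hypothetical pre-release figures: effort to find and fix problems during test.
pre_release_effort_hours = 400
pre_release_problems_fixed = 100

# Hypothetical post-release figures: effort to fix problems and ship the fixes.
post_release_effort_hours = 300
post_release_problems_resolved = 20

cost_pre = pre_release_effort_hours / pre_release_problems_fixed
cost_post = post_release_effort_hours / post_release_problems_resolved
ratio = cost_post / cost_pre

print(f"Pre-release: {cost_pre:.1f} person-hours per problem found and fixed")
print(f"Post-release: {cost_post:.1f} person-hours per problem resolved")
print(f"Post-release cost is {ratio:.2f} times the pre-release cost")
```
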
How “Big” Is It (Testing w/o Fix)?
1. # of test cases planned, by size
    High: 35
    Medium: 200
    Low: 40
2. Average effort required to plan and test
    High: 1 person-hour
    Medium: 15 person-minutes
    Low: 5 person-minutes
3. How “big” is testing?
    (35 x 60) + (200 x 15) + (40 x 5) = 5,300 person-minutes, or 88.33 person-hours
In this case, how big is testing? It is 275 test cases. It is 88.33 person-hours of effort. How would you answer this?

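The arithmetic can be checked with a few lines of code. The counts and per-test effort are the ones given on the slide; only the variable names are introduced here. The sum works out to 5,300 person-minutes.

```python
# Size category -> (number of test cases, person-minutes per test case).
plan = {
    "High": (35, 60),
    "Medium": (200, 15),
    "Low": (40, 5),
}

total_cases = sum(count for count, _ in plan.values())
total_minutes = sum(count * minutes for count, minutes in plan.values())
total_hours = total_minutes / 60

print(f"{total_cases} test cases")
print(f"{total_minutes} person-minutes")
print(f"{total_hours:.2f} person-hours")
```
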
How Long Would It Take?
Use the same example of 88.33 person-hours of test planning and execution effort. You need to make some assumptions:
– Assume 2 testers of about equal ability
– Split the work effort evenly: 88.33 person-hours / 2 people = 44.17 hours per person
– Further assume that each person works 6 hours a day: 44.17 hours / 6 hours per day = about 7.4 days
So this will take 2 testers, each working 6 hours a day, about 7.4 days.

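The schedule estimate chains the same assumptions: total effort, number of testers, and productive hours per day. A minimal sketch, with the final rounding up to whole working days added as an extra assumption:

```python
import math

total_hours = 5300 / 60   # effort from the sizing example, in person-hours
testers = 2               # assumed to be of about equal ability
hours_per_day = 6         # assumed productive hours per tester per day

hours_per_tester = total_hours / testers
days = hours_per_tester / hours_per_day
whole_days = math.ceil(days)  # extra assumption: schedule in whole working days

print(f"{hours_per_tester:.2f} hours per tester")
print(f"{days:.2f} days")
print(f"{whole_days} working days when rounded up")
```

Changing any one assumption (a third tester, 7 productive hours a day) changes the answer, which is why the assumptions should be stated alongside the estimate.
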