Testing
Xiaojun Qi
Testing
There are two basic types of testing: execution-based testing and non-execution-based testing.
“V & V”: verification determines whether a workflow was completed correctly; validation determines whether the product as a whole satisfies its requirements (specification).
The term “testing” is used here instead of “V & V”.
Software Quality
The quality of software is the extent to which the product satisfies its specifications.
Defect is a generic term for a fault, a failure, or an error. A fault is injected when a human makes a mistake. A failure is the observed incorrect behavior of the software product as a consequence of a fault. An error is the amount by which a result is incorrect.
Every software professional is responsible for ensuring that his or her work is correct.
Quality must be built in from the beginning.
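To make the three terms concrete, here is a small hypothetical Python illustration (the function `average` and its data are invented for this purpose, not taken from the text): the programmer's mistake injects a fault, running the code produces a failure, and the error is the size of the wrong result.

```python
def average(readings: list[float]) -> float:
    """Hypothetical function meant to return the arithmetic mean of readings."""
    # FAULT: the programmer's mistake divides by len + 1 instead of len.
    return sum(readings) / (len(readings) + 1)


expected = 10.0                        # value required by the specification
actual = average([8.0, 10.0, 12.0])    # 30.0 / 4 = 7.5
failure_observed = actual != expected  # failure: observed incorrect behavior
error = abs(actual - expected)         # error: amount by which the result is wrong

print(f"actual={actual}, failure={failure_observed}, error={error}")
# prints: actual=7.5, failure=True, error=2.5
```

The fault is present on every call, but a failure is observed only for some inputs: `average([0.0])` returns 0.0, which happens to be correct.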
Quality Issues: Software Quality Assurance
The members of the SQA group must ensure the quality of the software process and thereby ensure the quality of the product.
They test at the end of each workflow and again when the product is complete.
Quality assurance must also be applied to the process itself: the SQA group develops the standards to which the software must conform and establishes the monitoring procedures for assuring compliance with those standards.
Quality Issues: Managerial Independence
There must be managerial independence between the development group and the SQA group; neither group should have power over the other.
More senior management must decide whether to deliver the faulty product on time, or to test further and deliver the product late.
The decision must take into account the interests of both the client and the development organization.
Non-Execution-Based Testing
Non-execution-based testing means testing software without running test cases.
Reviewing the software (by a walkthrough or an inspection) is a common non-execution-based testing method. Its underlying principles are:
We should not review our own work.
Group synergy: a document is painstakingly checked by a team of software professionals with a broad range of skills.
Non-Execution-Based Testing: Walkthroughs
A walkthrough team consists of four to six members. It includes representatives of the team responsible for the current workflow, the manager responsible for the current workflow, the team responsible for the next workflow, the SQA group, and the client.
Managing Walkthroughs
The walkthrough is preceded by preparation: each member prepares a list of items he or she does not understand and items that appear to be incorrect.
The walkthrough team is chaired by the SQA representative, since the quality of the product is a direct reflection of the professional competence of the SQA group.
Managing Walkthroughs (Cont.)
In a walkthrough, faults are detected and recorded for later correction; they are not corrected during the walkthrough, because:
A correction produced by a committee is likely to be of low quality.
The cost of a committee correction is too high.
Not all items flagged as faults are actually incorrect.
A walkthrough should not last longer than 2 hours, so there is no time to correct faults.
Managing Walkthroughs (Cont.)
There are two ways of conducting a walkthrough: participant-driven and document-driven.
A document-driven walkthrough is likely to be more thorough because verbalization leads to fault finding: the majority of faults are spontaneously detected by the presenter.
A walkthrough should never be used for performance appraisal.
Non-Execution-Based Testing: Inspections
An inspection has five formal steps:
Overview: an overview of the document to be inspected is given.
Preparation: participants study the document, aided by a list of fault types found in recent inspections, ranked by frequency.
Inspection: the team walks through the document to find and document the faults.
Rework: the individual responsible for the document resolves all faults and problems noted in the written report.
Follow-up: every issue must be resolved, and all fixes must be checked to ensure that no new faults have been introduced.
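The preparation step relies on a list of fault types ranked by frequency. A minimal Python sketch of building that list from a log of earlier inspections follows; the log format and the fault-type names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical log from recent inspections: one entry per fault found,
# labelled with its fault type.
recent_faults = [
    "interface", "logic", "interface", "unaddressed spec item",
    "logic", "interface", "documentation",
]

# Rank fault types by frequency (ties keep the order first encountered)
# to form the preparation checklist.
checklist = Counter(recent_faults).most_common()

for rank, (fault_type, count) in enumerate(checklist, start=1):
    print(f"{rank}. {fault_type}: {count}")
```

Reviewers would then look first for the fault types at the top of the list.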
Inspections (Cont.)
An inspection team has four members: a moderator, who is both manager and leader of the inspection team; a member of the team performing the current workflow; a member of the team performing the next workflow; and a member of the SQA group.
The IEEE standard also recommends a reader, who leads the team through the design, and a recorder, who produces a written report of the detected faults.
An Essential Component of an Inspection: Fault Statistics
Faults are recorded by severity: major faults (those that cause premature termination or damage a database) and minor faults.
Faults are also recorded by fault type. Examples of design fault types: specification items that have not been addressed; interface faults, where actual and formal arguments do not correspond; logic faults.
Use of Fault Statistics
We compare current fault rates with those of previous products for a given workflow.
We take action if there is a disproportionate number of faults of a particular type in an artifact: we check other code artifacts for faults of that type and take corrective action. Redesigning the artifact from scratch is a good alternative.
We carry fault statistics forward to the next workflow, because we may not detect all faults of a particular type in the current inspection.
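One way to act on these statistics is sketched below in Python: faults are recorded by severity and type, and each type's rate per page is compared with a historical baseline from earlier products. The `Fault` record, the per-page rates, and the threshold of twice the baseline are all assumptions chosen for illustration; an organization would set its own.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    MAJOR = "major"  # causes premature termination or damages a database
    MINOR = "minor"


@dataclass(frozen=True)
class Fault:
    fault_type: str      # e.g. "interface" or "logic"
    severity: Severity


def disproportionate_types(faults, pages, baseline, factor=2.0):
    """Return fault types whose rate per page exceeds `factor` times the
    historical baseline rate for that type (types with no history are flagged)."""
    counts = {}
    for fault in faults:
        counts[fault.fault_type] = counts.get(fault.fault_type, 0) + 1
    return sorted(
        fault_type
        for fault_type, count in counts.items()
        if count / pages > factor * baseline.get(fault_type, 0.0)
    )


faults = (
    [Fault("interface", Severity.MAJOR) for _ in range(6)]
    + [Fault("logic", Severity.MINOR) for _ in range(2)]
)
baseline = {"interface": 0.2, "logic": 0.3}  # hypothetical faults per page in earlier products
flagged = disproportionate_types(faults, pages=10, baseline=baseline)
```

Here interface faults (0.6 per page against a baseline of 0.2) are flagged, which would prompt a check of other artifacts for interface faults.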
Inspection Success Stories
IBM: inspections found 82 percent of all detected faults (1976), 70 percent (1978), and 93 percent (1986).
Switching system: 90 percent decrease in the cost of detecting faults (1986).
JPL: four major faults and 14 minor faults per 2 hours (1990); savings of $25,000 per inspection; the number of faults decreased exponentially by phase (1992).
However, fault statistics should never be used for performance appraisal.
Comparison of Inspections and Walkthroughs
Walkthrough: a two-step, informal process consisting of preparation and analysis.
Inspection: a five-step, formal process consisting of overview, preparation, inspection, rework, and follow-up.
Strengths and Weaknesses of Reviews (Walkthrough or Inspection)
Strengths: a review is an effective way to detect faults, and faults are detected early in the process.
Weakness: reviews are less effective if the software process is inadequate. Two solutions to this weakness are:
Large-scale software should consist of smaller, largely independent pieces.
The documentation of the previous workflows has to be complete and available online.
Metrics for Inspections
Inspection rate: design pages inspected per hour.
Fault density: faults per page inspected, or faults per 1000 lines of code (KLOC) inspected.
Fault detection rate: the number of faults detected per hour.
Fault detection efficiency: the number of major and minor faults detected per person-hour.
Question: does a 50 percent increase in the fault detection rate mean that quality has decreased, or that the inspection process is more efficient?
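The four metrics are simple ratios. The Python sketch below computes them from one inspection's data; the field names and numbers are hypothetical, and person-hours are assumed to be meeting hours multiplied by the number of participants (i.e., everyone attends the whole meeting).

```python
from dataclasses import dataclass


@dataclass
class Inspection:
    pages: float         # design pages inspected
    hours: float         # length of the inspection meeting
    participants: int    # people attending for the whole meeting (assumed)
    major_faults: int
    minor_faults: int

    @property
    def faults(self) -> int:
        return self.major_faults + self.minor_faults

    def inspection_rate(self) -> float:
        """Design pages inspected per hour."""
        return self.pages / self.hours

    def fault_density(self) -> float:
        """Faults per page inspected (use KLOC instead of pages for code)."""
        return self.faults / self.pages

    def fault_detection_rate(self) -> float:
        """Faults detected per hour."""
        return self.faults / self.hours

    def fault_detection_efficiency(self) -> float:
        """Major and minor faults detected per person-hour."""
        return self.faults / (self.hours * self.participants)


session = Inspection(pages=24, hours=2, participants=4, major_faults=3, minor_faults=9)
```

For this hypothetical session the values are 12 pages per hour, 0.5 faults per page, 6 faults per hour, and 1.5 faults per person-hour. The ratios alone cannot answer the question above; they need to be interpreted alongside other information, such as fault rates from previous products.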
Execution-Based Testing
Organizations spend up to 50 percent of their software budget on testing, yet delivered software is frequently unreliable.
“Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence” (Dijkstra, 1972).
The only information that can be deduced from a particular test is that the product runs correctly on that particular set of test data.
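A small hypothetical Python example shows why a passing test set proves little: the leap-year function below contains a deliberate fault, yet it passes every case in a plausible-looking test set.

```python
def is_leap_year(year: int) -> bool:
    # Deliberate FAULT: ignores the Gregorian century rule
    # (years divisible by 100 are leap years only if divisible by 400).
    return year % 4 == 0


# A small, plausible-looking test set; every case passes.
test_data = {2024: True, 2023: False, 2000: True, 1996: True}
all_passed = all(is_leap_year(year) == expected for year, expected in test_data.items())

# An input outside the test set exposes the fault: 1900 was not a leap year,
# but the function says it was.
wrong_for_1900 = is_leap_year(1900)  # True, which is incorrect
```

All the passing tests show is that the function is correct for those four years.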
What Should Be Tested?
Definition of execution-based testing: “The process of inferring certain behavioral properties of the product based, in part, on the results of executing the product in a known environment with selected inputs.”
This definition has troubling implications:
“Inference”: we have a fault report, the source code, and often nothing else, which makes the inference difficult.
“Known environment”: we can never really know our environment, either the hardware or the software.
“Selected inputs”: sometimes we cannot provide the inputs we want.
Solution: a simulator is needed. A simulator is a working model of the environment in which the product executes.
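The idea of a simulator can be sketched in Python: the product under test reads temperatures through an object with a `read()` method, and during testing that object is a simulated sensor that replays selected readings in a known order. `SimulatedSensor` and `count_overheating` are hypothetical names invented for this illustration; a simulator for a real real-time system would be far more elaborate.

```python
class SimulatedSensor:
    """Hypothetical working model of the environment: replays chosen readings."""

    def __init__(self, readings: list[float]) -> None:
        self._readings = iter(readings)

    def read(self) -> float:
        return next(self._readings)


def count_overheating(sensor, samples: int, limit: float) -> int:
    """Product under test: count how many of `samples` readings exceed `limit`."""
    return sum(1 for _ in range(samples) if sensor.read() > limit)


# Selected inputs, including an extreme value that would be hard to produce
# on demand with real hardware.
sensor = SimulatedSensor([20.0, 95.5, 101.2, 80.0])
alarms = count_overheating(sensor, samples=4, limit=90.0)
```

Because the product only depends on something with a `read()` method, the same code can run against real hardware or the simulator.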
Items to Be Tested: Utility
Utility is the extent to which a user’s needs are met when a correct product is used under conditions permitted by its specifications. It covers ease of use, useful functions, and cost effectiveness.
Items to Be Tested: Reliability
Reliability is a measure of the frequency and criticality of product failure. It is assessed by:
Mean time between failures: how often the product fails.
Mean time to repair: how long it takes, on average, to repair a failure.
The time (and cost) needed to repair the results of a failure.
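Both means can be estimated from a failure log. The Python sketch below assumes a hypothetical log of failure times (hours since deployment, in increasing order) and repair durations; it takes MTBF to be the average gap between consecutive failures, which is one common convention among several.

```python
def mtbf(failure_times: list[float]) -> float:
    """Mean time between failures: average gap between consecutive failure times."""
    if len(failure_times) < 2:
        raise ValueError("need at least two failures to measure a gap")
    gaps = [later - earlier for earlier, later in zip(failure_times, failure_times[1:])]
    return sum(gaps) / len(gaps)


def mttr(repair_durations: list[float]) -> float:
    """Mean time to repair: average time taken to repair a failure."""
    return sum(repair_durations) / len(repair_durations)


failure_times = [100.0, 250.0, 400.0, 700.0]  # hypothetical hours since deployment
repair_durations = [2.0, 4.0, 3.0, 3.0]       # hypothetical hours per repair
```

With this log the gaps are 150, 150, and 300 hours, so the MTBF is 200 hours and the MTTR is 3 hours.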
Items to Be Tested: Robustness
Robustness is a function of the range of operating conditions, the possibility of unacceptable results with valid input, and the effect of invalid input.
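The last two factors, whether valid input gives acceptable results and what invalid input does, can be checked with a short test. The Python sketch below uses a hypothetical `parse_age` function whose permitted range (0 to 150) is an assumption for the example; the robustness check is that valid inputs are accepted and every invalid input is rejected with a `ValueError` rather than producing a wrong result or a different crash.

```python
def parse_age(text: str) -> int:
    """Hypothetical product function: parse an age in the permitted range 0..150."""
    value = int(text)  # raises ValueError for non-integer text
    if not 0 <= value <= 150:
        raise ValueError(f"age out of range: {value}")
    return value


def rejects(text: str) -> bool:
    """True if parse_age refuses the input with a ValueError."""
    try:
        parse_age(text)
    except ValueError:
        return True
    return False


valid_inputs = ["0", "42", "150"]
invalid_inputs = ["-1", "151", "abc", "", "4.5"]

valid_ok = all(parse_age(text) == int(text) for text in valid_inputs)
invalid_rejected = all(rejects(text) for text in invalid_inputs)
```

Any exception other than `ValueError` propagates out of `rejects`, so an unexpected crash also makes the check fail.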
Items to Be Tested: Performance
Performance is the extent to which the product meets its constraints with regard to response time or space requirements.
Real-time software is characterized by hard real-time constraints: if data are lost because the system is too slow, there is no way to recover those data.
Items to Be Tested: Correctness
A product is correct if it satisfies its output specifications, independent of its use of computing resources, when operated under permitted conditions.
Technically, correctness is:
Not necessary. Example: a C++ compiler can be useful even though it is not completely correct.
Not sufficient. Example: trickSort satisfies its specification, but the specification itself is wrong.
Importance of the Correctness of Specifications
Incorrect specification for a sort.
Function trickSort, which satisfies this specification (Figure 6.1).
Importance of the Correctness of Specifications (Cont.)
Corrected specification for the sort.
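The specification figures did not survive extraction, so the Python sketch below assumes the usual form of this example: the incorrect specification requires only that the output array be sorted, and the corrected specification adds that the output be a permutation of the input. `trick_sort` mirrors the slide's trickSort and is assumed here to simply return zeros; the two checker functions are hypothetical.

```python
def trick_sort(p: list[int]) -> list[int]:
    """Returns n zeros: sorted output, but unrelated to the input values."""
    return [0] * len(p)


def satisfies_incorrect_spec(p: list[int], q: list[int]) -> bool:
    # Assumed incorrect specification: q has n elements and q[0] <= q[1] <= ... <= q[n-1].
    return len(q) == len(p) and all(q[i] <= q[i + 1] for i in range(len(q) - 1))


def satisfies_corrected_spec(p: list[int], q: list[int]) -> bool:
    # Assumed corrected specification: additionally, q is a permutation of p.
    return satisfies_incorrect_spec(p, q) and sorted(p) == sorted(q)


p = [3, 1, 2]
q = trick_sort(p)  # [0, 0, 0]
```

`trick_sort` passes the first check but fails the second, whereas a genuine sort such as `sorted(p)` passes both.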
Testing versus Correctness Proofs
A correctness proof is a mathematical technique for showing that a product is correct, in other words, that it satisfies its specification. It is an alternative to execution-based testing.
Example of a Correctness Proof
The code segment to be proven correct.
Its corresponding flowchart.
Add the following: the input specification, the output specification, the loop invariant, and assertions.
Example of a Correctness Proof (Cont.)
Loop invariant: a mathematical expression that holds irrespective of whether the loop has been executed 0, 1, or many times.
An informal proof (using induction) appears in Section 6.5.1. That is, given the input specification, it can be proven that the loop invariant holds. Furthermore, it can be proven that the loop terminates after n iterations and that the output specification is then satisfied.
The code segment is therefore mathematically proven to be correct.
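Because the code segment and flowchart are missing from this text, the derivation below assumes a typical loop of this kind: starting from k = 0 and s = 0, it repeatedly executes s ← s + y[k] and k ← k + 1 while k < n. Under that assumption, the induction argument for the loop invariant and the output specification runs as follows.

```latex
\begin{aligned}
&\textbf{Input specification: } n \ge 1.\\
&\textbf{Output specification: } k = n \;\wedge\; s = \sum_{j=0}^{n-1} y[j].\\
&\textbf{Loop invariant } I:\; k \le n \;\wedge\; s = \sum_{j=0}^{k-1} y[j].\\[4pt]
&\textbf{Base: } \text{before the first test, } k = 0 \le n \text{ and } s = 0 = \sum_{j=0}^{-1} y[j] \text{ (empty sum).}\\
&\textbf{Step: } \text{if } I \text{ holds and } k < n, \text{ the body sets } s' = s + y[k] = \sum_{j=0}^{k} y[j] = \sum_{j=0}^{k'-1} y[j], \text{ with } k' = k + 1 \le n.\\
&\textbf{Termination: } n - k \ge 0 \text{ decreases by 1 per iteration, so the loop exits after exactly } n \text{ iterations.}\\
&\textbf{Exit: } I \wedge \neg(k < n) \;\Rightarrow\; k = n \;\wedge\; s = \sum_{j=0}^{n-1} y[j], \text{ which is the output specification.}
\end{aligned}
```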
Correctness Proof Mini Case Study
Dijkstra (1972): “The programmer should let the program proof and program grow hand in hand.”
The “Naur text-processing problem” (1969):
Episode 1: Naur constructed a 25-line procedure and informally proved its correctness.
Episode 2: a reviewer in Computing Reviews found 1 fault (1970).
Episode 3: London found 3 more faults (1971).
Episode 4: Goodenough and Gerhart found 3 further faults (1975).
Lesson: even if a product has been proved correct, it must STILL be tested.
Three Myths of Correctness Proving
Myth: software engineers do not have enough mathematics for proofs. In fact, most computer science majors either know or can learn the mathematics needed for proofs.
Myth: proving is too expensive to be practical. In fact, economic viability is determined from cost–benefit analysis.
Myth: proving is too hard. In fact, many nontrivial products have been successfully proved, and tools such as theorem provers can assist us.
Difficulties with Correctness Proving
Can we trust a theorem prover?
How do we find input and output specifications, and loop invariants or their equivalents in other logics?
What if the specifications are wrong?
We can never be sure that the specifications or a verification system are correct.
Correctness Proofs and Software Engineering
Correctness proofs are a vital software engineering tool. They are appropriate when human lives are at stake, when indicated by cost–benefit analysis, and when the risk of not proving is too great.
Also, informal proofs can markedly improve the quality of the product; use the assert statement to check their assertions, such as loop invariants, at run time.
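In a language with an assert statement, the assertions of an informal proof can be checked while the code runs. The Python sketch below annotates a simple summing loop (an assumed example, not taken from the text) with its input specification, loop invariant, and output specification as assert statements.

```python
def array_sum(y: list[int]) -> int:
    n = len(y)
    assert n >= 1, "input specification: n >= 1"
    k, s = 0, 0
    while k < n:
        # Loop invariant: k <= n and s == y[0] + ... + y[k-1]
        assert k <= n and s == sum(y[:k]), "loop invariant violated"
        s = s + y[k]
        k = k + 1
    # Output specification: k == n and s == y[0] + ... + y[n-1]
    assert k == n and s == sum(y), "output specification violated"
    return s
```

Recomputing `sum(y[:k])` on every iteration makes the loop quadratic, so these checks are a development aid rather than production code; note also that Python skips assert statements when run with the `-O` option.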
Who Should Perform Execution-Based Testing?
Programmers should not test their own code artifacts, because programming is constructive whereas testing is destructive: a successful test is one that finds a fault.
Solution: the programmer debugs the module and does informal testing; the SQA group then does systematic testing.
Test Cases
All test cases must be planned beforehand, including the expected output, and retained afterwards for regression testing.
Testing stops only when the product has been irrevocably discarded.
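A retained test suite can be as simple as a list of planned (input, expected output) pairs that is rerun after every change. The Python sketch below is hypothetical: `RETAINED_TEST_CASES`, `run_regression`, and `faulty_sort` are illustrative names, and the product under test is assumed to be a function that sorts a list.

```python
# Hypothetical retained test cases, each planned together with its expected output.
RETAINED_TEST_CASES = [
    # (input, expected output)
    ([3, 1, 2], [1, 2, 3]),
    ([], []),
    ([5, 5, 1], [1, 5, 5]),
]


def run_regression(product, cases):
    """Rerun every retained case; return (input, expected, actual) for each mismatch."""
    failures = []
    for test_input, expected in cases:
        actual = product(list(test_input))  # pass a copy so the stored case is never altered
        if actual != expected:
            failures.append((test_input, expected, actual))
    return failures


def faulty_sort(xs):
    """A hypothetical new version that wrongly drops duplicate values."""
    return sorted(set(xs))
```

`run_regression(sorted, RETAINED_TEST_CASES)` reports no mismatches, while the faulty new version is caught by the retained case `[5, 5, 1]`.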