Verification and Validation CSCI 5801: Software Engineering
Verification and Validation
Basic Facts about Errors Typical software contains 30-85 errors per 1000 lines of source code Extensively tested software contains 0.5-3 errors per 1000 lines of source code Error distribution: 60% design, 40% implementation 66% of the design errors are not discovered until the software has become operational.
Faults vs. Failures Fault: a static flaw in a program What we usually think of as “a bug” Failure: an observable incorrect behavior of a program as a result of a fault Not every fault leads to a failure (at least not yet) Good goal: detect failures and correct before shipping Impossible in practice! Better goal: detect faults and correct them before failures occur
Verification vs. Validation Verification: evaluate a product to see whether it satisfies the specifications: Have we built the system right? Validation: evaluate a product to see whether it actually does what the customer wants/needs: Have we built the right system? Key assumption: know desired result of the test! Verification: System passes all test cases Validation: Have correct test cases to begin with!
Verification vs. Validation Verification testing Discover faults in the software where its behavior is incorrect or not in conformance with its specification A successful test makes the system perform incorrectly and so exposes a defect in the system Tests show the presence, not the absence, of defects Validation testing Demonstrate to the developer and the customer that the software meets requirements A successful test shows that the system operates as intended
Stages of Testing Unit testing is the first phase, done by developers of modules Integration testing combines unit-tested modules and tests how they interact System testing tests a whole program to make sure it meets requirements (including most nonfunctional) Acceptance testing by users to see if system meets actual user/customer requirements
Glass Box Testing Use source code (or other structure beyond the input/output specifications) to design test cases Also known as white box, clear box, or structural testing Unit testing: based on structure of individual methods Conditions, loops Data structures Integration testing: based on overall structure of system Which methods call other methods
Black Box Testing Based on external requirements Testing that does not look at source code or internal structure of the system Send a program a stream of inputs, observe the outputs, decide if the system passed or failed the test Unit testing: Do the methods of a class meet the requirements of the API? Integration/System testing: Does the system as a whole meet the requirements of the RSD? Abstracts away the internals – a useful perspective for integration and system testing
Test Suites Key goal: Create test suite of cases most likely to find faults in current code Problems: Cannot exhaustively try all possible test cases Random statistical testing (choosing input values at random) does not work either: Faults generally not uniformly distributed Related problem: How can we evaluate the “quality” of a test suite?
Test Suites Need systematic strategy for creating test suite Tests designed to find specific kinds of faults Best created by multiple types of people: Cases chosen by the development team are effective in testing known vulnerable areas (glass box) Cases chosen by experienced outsiders and clients are effective at checking requirements (black box) Cases chosen by inexperienced users can find other faults (validation, etc.)
Coverage-Based Testing Test suite quality often based on idea of coverage All requirements tested (black box testing) All parts of structure covered (glass box testing) Statement coverage (unit testing) At least one test case should execute each statement Decision coverage (unit testing) At least one test case for each true/false outcome in control structures Path coverage (integration testing) All paths through a program’s control flow graph are taken in the test suite
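A small sketch of the difference between statement and decision coverage, using a hypothetical classify method (all names are illustrative, not from the slides):

```java
public class CoverageExample {
    // Hypothetical method used only to illustrate coverage criteria.
    static String classify(int x) {
        String result = "non-negative";
        if (x < 0) {
            result = "negative";
        }
        return result;
    }
    // classify(-1) alone executes every statement (statement coverage),
    // but decision coverage also needs the condition to evaluate to false,
    // so a second case such as classify(0) is required as well.
}
```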
Fault Modeling Idea: Many programs have similar faults regardless of purpose Example: Without knowing the purpose of a program, what would you try in order to “break” a prompt such as “Enter a number:”? Non-numeric value (“Fred”) Non-integer (0.5) Negative number (-1) Too large (1000000000000000000) Too small (0.00000000000000001) Illegal characters (^C, ^D) Buffer overflow (10000000000 characters)
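Several of these attacks can be written directly as automated tests. A minimal JUnit 4 sketch, assuming a hypothetical InputValidator.parsePositiveInt method that is supposed to reject bad input at the “Enter a number:” prompt:

```java
import org.junit.Test;

public class NumberPromptTest {
    // InputValidator.parsePositiveInt is a hypothetical method under test
    // that should reject anything other than a reasonable positive integer.
    @Test(expected = IllegalArgumentException.class)
    public void rejectsNonNumericInput() {
        InputValidator.parsePositiveInt("Fred");
    }

    @Test(expected = IllegalArgumentException.class)
    public void rejectsNegativeNumber() {
        InputValidator.parsePositiveInt("-1");
    }

    @Test(expected = IllegalArgumentException.class)
    public void rejectsValueTooLargeForInt() {
        InputValidator.parsePositiveInt("1000000000000000000");
    }
}
```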
User Interface Attacks Try all types of unexpected input Test default values Try with no input Delete any default values in fields Illegal combinations Time range from 11:00 to 10:00 (end before start) 1000 rows ok, 1000 columns ok, both together not ok Repeat the same command to force overflow Add 1000000 courses Force screen refresh Is everything still redrawn?
File System Attacks Full storage: Save to a full diskette Timeouts: Drive being used by other processes and not available Invalid filenames: Save to “fred’s file.txt” Access permission: Save to a read-only device Wrong format: Database with missing fields Fields in wrong format Files in the wrong format (XML vs. CSV, etc.)
Operating System Attacks No local memory available: new returns a null pointer System overloaded: Multiple apps running simultaneously, causing timeouts Unable to access external devices: Network Peripheral devices May require fault-injection software to simulate (e.g., intercepting the system’s calls to new so they return null)
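Without dedicated fault-injection tooling, one common way to simulate an unavailable device is to hand the code a dependency that fails on purpose. A minimal sketch, assuming a hypothetical ReportSaver class that writes through a java.io.Writer and is expected to report failure rather than crash:

```java
import static org.junit.Assert.assertFalse;
import java.io.IOException;
import java.io.Writer;
import org.junit.Test;

public class ReportSaverFaultTest {
    /** Writer that fails on every call, simulating a dead device or full disk. */
    static class FailingWriter extends Writer {
        @Override public void write(char[] buf, int off, int len) throws IOException {
            throw new IOException("device not available");
        }
        @Override public void flush() throws IOException {
            throw new IOException("device not available");
        }
        @Override public void close() { }
    }

    @Test
    public void reportsSaveErrorWhenDeviceFails() {
        ReportSaver saver = new ReportSaver();         // hypothetical class under test
        boolean ok = saver.save(new FailingWriter());  // inject the failing "device"
        assertFalse("save should report failure, not crash", ok);
    }
}
```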
Fault Seeding How do we know that we have a “good” set of test cases capable of finding faults? Idea: Deliberately “seed” the code with known faults Run the test set to see if it finds the seeded faults The higher the percentage of seeded faults found, the higher the confidence that the test set finds actual faults
Mutation Testing
procedure insert(a, b, n, x);
begin
  bool found := false;
  for i := 1 to n do
    if a[i] = x then found := true; goto leave endif
  enddo;
leave:
  if found then b[i] := b[i] + 1
  else n := n + 1; a[n] := x; b[n] := 1 endif
end insert;
In each variation (a mutant), one simple change is made, for example replacing the loop bound n with n-1, or the increment + 1 with + 2.
How to use mutants in testing If a test produces different results for one of the mutants, that mutant is said to be dead If a test set leaves us with many live mutants, that test set is of low quality If we have M mutants, and a test set results in D dead mutants, then the mutation adequacy score is D/M A larger mutation adequacy score means a better test set
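The same idea in Java terms: a hypothetical contains method, one mutant with the loop bound changed from n to n-1, and the kind of test case that kills it (all names are illustrative only):

```java
public class MutationExample {
    // Original: linear search over the first n elements of a.
    static boolean contains(int[] a, int n, int x) {
        for (int i = 0; i < n; i++) {
            if (a[i] == x) return true;
        }
        return false;
    }

    // Mutant: the loop bound n is replaced by n - 1.
    static boolean containsMutant(int[] a, int n, int x) {
        for (int i = 0; i < n - 1; i++) {
            if (a[i] == x) return true;
        }
        return false;
    }
    // A test that searches for the last element, e.g.
    //   contains(new int[]{1, 2, 3}, 3, 3),
    // returns true for the original and false for the mutant, killing it.
    // A test set that never probes the last element leaves this mutant alive.
}
```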
Regression Testing Problem: Fixing bugs can introduce other errors 10% to 20% of bug fixes create new bugs Often happens as part of maintenance Developer who changes one module is not familiar with the rest of the system Big problem: What if new bugs appear in code already tested? Example: Module A is tested first, module B second Fixing B creates bugs in A If A is not retested, it will be shipped with bugs!
Regression Testing Regression testing: Retesting with all test cases after any change to code Must do after any change to code Must do before checking modified code back into the repository Must do before releasing code Cycle: maintain a comprehensive list of test cases → find bugs → fix bugs → retest with all test cases
Automated Testing Problem: May be thousands of tests! Too many for interactive debugging Goal: Automate comprehensive testing Run all test cases Notify developer of incorrect results Approaches: Creating testing “script” in driver Read test cases and desired results from file Use testing tools (JUnit)
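A minimal sketch of such a testing “script” driver, assuming a file testcases.txt with one “input;expectedOutput” pair per line and a hypothetical SystemUnderTest.run entry point:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class TestDriver {
    public static void main(String[] args) throws IOException {
        // Assumed file format: one case per line, "input;expectedOutput".
        List<String> cases = Files.readAllLines(Paths.get("testcases.txt"));
        int failures = 0;
        for (String line : cases) {
            String[] parts = line.split(";", 2);
            String actual = SystemUnderTest.run(parts[0]);  // hypothetical entry point
            if (!actual.equals(parts[1])) {
                failures++;
                System.out.println("FAIL: input=" + parts[0]
                        + " expected=" + parts[1] + " actual=" + actual);
            }
        }
        System.out.println(failures == 0 ? "All tests passed" : failures + " failures");
    }
}
```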
JUnit Background Integrated into most Java IDEs (such as NetBeans) Will automatically generate “skeletons” to test each method Based on assertions (similar to C’s assert) http://www.junit.org/apidocs/junit/framework/Assert.html fail(message): Fails the test immediately and reports the message assertEquals(message, value1, value2): Causes fail(message) if the values are not equal
JUnit Testing Create object for testing Call methods to put in expected state Use inspector to get actual state Use assertEquals to compare to desired state
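These four steps map directly onto a JUnit test method. A sketch in JUnit 4 style, where Course, enroll, and getEnrollmentCount are hypothetical names:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class CourseTest {
    @Test
    public void enrollingOneStudentIncreasesEnrollment() {
        Course course = new Course("CSCI 5801");   // 1. create the object under test
        course.enroll("student1");                 // 2. call methods to put it in the expected state
        int actual = course.getEnrollmentCount();  // 3. use an inspector to read the actual state
        assertEquals(1, actual);                   // 4. compare to the desired state
    }
}
```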
JUnit Testing Create method for each test to be performed Constructor state Normal methods Validation …
Test-Driven Development First write the tests, then do the design/implementation Makes sure testing is done as early as possible Perform testing throughout development (regression testing) Based on ideas from agile/XP Write (automated) test cases first Then write the code to satisfy the tests
Test-Driven Development Add a test case that fails, but would succeed with the new feature implemented Run all tests, make sure only the new test fails Write code to implement the new feature Rerun all tests, making sure the new test succeeds (and no others break)
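One iteration of this cycle might look like the following sketch, using a hypothetical Waitlist feature (JUnit 4 style; all names are illustrative):

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class WaitlistTest {
    // Step 1: the test for the new feature is written first;
    // it fails because Waitlist does not exist yet.
    @Test
    public void addedStudentIsCounted() {
        Waitlist waitlist = new Waitlist();
        waitlist.add("student1");
        assertEquals(1, waitlist.size());
    }
}

// Step 3: write just enough code to make the new test pass,
// then (step 4) rerun the whole suite to confirm nothing else broke.
class Waitlist {
    private final java.util.List<String> students = new java.util.ArrayList<>();
    void add(String studentId) { students.add(studentId); }
    int size() { return students.size(); }
}
```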
Test-Driven Development Advantages: Helps ensure all required features are covered Will incrementally develop an extensive regression test suite that covers all required features Since code is written specifically to pass the tests, ensures good test coverage of all code
Test-Driven Development Disadvantages: Management must understand that as much time will be spent on writing tests as on writing code Requires extensive validation of requirements, since any feature for which a test case is missing or incorrect will not be implemented correctly