Topics in Testing We’ve Covered


Topics in Testing We've Covered
- Black box (finite state machine) testing
- Design for testability
- Coverage measures
- Random testing
- Constraint-based testing
- Debugging and test case minimization
- Using model checkers for testing
- Coverage revisited ("small model property")

Topics in Testing We've Covered
Black box (finite state machine) testing:
- There "are no Turing machines"
- Vasilevskii and Chow algorithm for conformance testing, based on spanning trees and distinguishing sets
- Exhaustive testing that cannot miss bugs is often computationally intractable
[FSM diagram omitted]

Topics in Testing We've Covered
Design for testability:
- Controllability and observability
- Simulation and stubbing, assertions, downward scalability, etc.

Topics in Testing We've Covered
Coverage measures:
- Not necessarily correlated with fault detection! Still useful!
- Graph coverage: node and edge (statement and branch coverage)
- Logic coverage
- Input space partitioning
- Syntax-based coverage
[Control flow graph and predicate examples omitted]

Topics in Testing We've Covered
Random testing:
- Generate inputs at random
- Explore very large numbers of executions
- Relies on a good automatic test oracle
- Feedback to bias choices away from redundant and irrelevant inputs is useful
- Good baseline for evaluating other methods, and often very effective
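To make the bullets above concrete, here is a minimal random-testing sketch in C. The function under test (`safe_abs`) and the property used as the oracle are hypothetical examples, not from the slides; the point is the shape: random inputs, an automatic check on every output.

```c
#include <limits.h>
#include <stdlib.h>

/* Function under test (hypothetical): absolute value that avoids
   overflow on INT_MIN by saturating. */
int safe_abs(int x) {
    if (x == INT_MIN) return INT_MAX;
    return x < 0 ? -x : x;
}

/* Random tester: generates inputs at random and checks each result
   against an automatic oracle (a property the output must satisfy).
   Returns 1 if all trials pass, 0 on the first oracle violation. */
int random_test(unsigned seed, int trials) {
    srand(seed);
    for (int i = 0; i < trials; i++) {
        int x = rand() - RAND_MAX / 2;   /* roughly uniform around 0 */
        int r = safe_abs(x);
        if (r < 0) return 0;                              /* oracle 1 */
        if (x != INT_MIN && r != (x < 0 ? -x : x))        /* oracle 2 */
            return 0;
    }
    return 1;
}
```

With a strong oracle like this, running very large numbers of executions is cheap; with a weak oracle ("didn't crash"), the same loop finds far less.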

Topics in Testing We've Covered
Constraint-based testing:
- Addresses weaknesses of random testing, e.g., finding needles in haystacks, such as inputs where hash(x) = y
- Combines concrete and symbolic execution to generate inputs
- Concrete execution helps where symbolic solvers choke

Topics in Testing We've Covered
Debugging and test case minimization:
- Automatic minimization of test cases is very valuable for debugging and for reducing regression suite size
- Debugging can be considered an application of the scientific method
- Various techniques exist for using test cases to localize faults

Topics in Testing We've Covered
Using model checkers for testing:
- Testing based on states, rather than on executions or paths
- Use abstractions to reduce the state space
- Use automatic instrumentation to handle the engineering difficulties

NOW BEGINS THE REVIEW
- Hang onto your hats: it's going to be a fast ride
- Anything in these slides is fair game for the test; anything not mentioned in these slides is not fair game (so I'll mention valgrind right now to let you know it might show up...)
- So ask questions as we go if something is unclear (especially if you think even re-reading the slides isn't going to help)

Basic Definitions: Testing
What is software testing?
- Running a program
- In order to find faults, a.k.a. defects, a.k.a. errors, a.k.a. flaws, a.k.a. BUGS

Testing
What isn't software testing?
- Purely static analysis: examining a program's source code or binary in order to find bugs, but not executing the program
- Good stuff, and very important, but it's not testing; we'll get back to this in a future class
- Fuzzy borderline: what if we only symbolically execute the program?
- For this class, we'll call it testing when the program actually runs (but maybe in a virtual machine)

Why Testing?
- Ideally: we prove code correct, using formal mathematical techniques (with a computer, not chalk)
- Extremely difficult: it has been done only for some trivial (100-line) and a few small (5K-line) programs
- Simply not practical to prove correctness in most cases; often not even for safety- or mission-critical code

Why Testing?
- Nearly ideally: use symbolic or abstract model checking to prove the system correct
- Automatically extracts a mathematical abstraction from a system
- Proves properties over all possible executions
- In practice, can work well for very simple properties ("this program never crashes in this particular way"), but can't handle complex properties ("this is a working file system")
- Doesn't work well for programs with complex data structures (like a file system)

Why Does Testing Matter?
- Ariane 5: an exception-handling bug forced self-destruct on its maiden flight (a 64-bit to 16-bit conversion overflowed; about $370 million lost)
- NIST report, "The Economic Impacts of Inadequate Infrastructure for Software Testing" (2002): inadequate software testing costs the US alone between $22 and $59 billion annually; better approaches could cut this amount in half
- Major failures: Ariane 5 explosion, Mars Polar Lander, Intel's Pentium FDIV bug
- Insufficient testing of safety-critical software can cost lives: THERAC-25 radiation machine, 3 dead
- We want our programs to be reliable; testing is how, in most cases, we find out if they are
[Images: Mars Polar Lander crash site?; THERAC-25 design]

Testing and Monitoring
- In this class, we'll look at which executions of a program to run; I'll call this problem "the" testing problem
- Second problem: how do we know if an execution reveals a bug?
- Key question when monitoring deployed programs to handle faults or send in bug reports from the field
- I'll (mostly) take this for granted: we have a reference model or assertions to check

Example: File System Testing
- How hard would it be to just try "all" the possibilities?
- Consider only the 7 core operations (mkdir, rmdir, creat, open, close, read, write)
- Most of these take either a file name or a numeric argument, or both
- Even for a "reasonable" (but not provably safe) limitation of the parameters, there are 266^10 executions of length 10 to try
- Not a realistic possibility (unless we have 10^12 years to test)

The Testing Problem
- This is a primary topic of this class: what "questions" do we pose to the software? I.e., how do we select a small set of executions out of a very large set of executions?
- The fundamental problem of software testing research and practice
- An open (and essentially unsolvable, in the general case) problem

Terms: Verification and Validation
- These two terms appear a lot, often in vague or sloppy ways, in the literature
- Verification is checking that a program matches a specification
- Validation is making sure it meets the original requirements: satisfies customers, operates ok onboard the spacecraft, etc.
- Verification: "you built it right"
- Validation: "you built the right thing" (our focus, for the most part)

Terms: Unit, Integration, System Testing
Stages of testing:
- Unit testing is the first phase, done by the developers of modules
- Integration testing combines unit-tested modules and tests how they interact
- System testing tests a whole program to make sure it meets requirements
- "Design testing" is testing prototypes or very abstract models before implementation; seldom mentioned, but when possible it can save your bacon, and exhaustive model checking may be possible at this stage

Terms: Functional Testing
- Functional testing is a related term: tests a program from a "user's" perspective; does it do what it should?
- Opposed to unit testing, which often proceeds from the perspective of other parts of the program: module spec/interface, not user interaction
- Sort of a fuzzy line; consider a file system: how different is use by a program from use of UNIX commands at a prompt by a user?
- A building inspector does "unit testing"; you, walking through the house to see if it's livable, perform "functional testing"
- Kick the tires vs. take it for a spin?

Terms: Regression Testing
- Changes can break code, reintroduce old bugs
- Things that used to work may stop working (e.g., because of another "fix"): software regresses
- Usually a set of cases that have failed (and then succeeded) in the past
- Finding small regression suites is an ongoing research area (analyze dependencies)
"... as a consequence of the introduction of new bugs, program maintenance requires far more system testing. ... Theoretically, after each fix one must run the entire batch of test cases previously run against the system, to ensure that it has not been damaged in an obscure way. In practice, such regression testing must indeed approximate this theoretical idea, and it is very costly." - Brooks, The Mythical Man-Month

Terms: The Oracle Problem
(oracle: a magical source of truth, often cryptic, given by the gods)
The oracle problem: how to know if a test fails.
- If the oracle says every execution is good, why bother running the program?
- Some obvious, easily automated approaches: the program probably shouldn't crash; assertions shouldn't be violated
- Automatable, but more difficult to apply: differential testing (McKeeman, etc.); when you have another program, likely correct, that does the same thing, just compare outputs over the same inputs
- Last resort, not automatable: hand inspection of executions
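A minimal differential-testing sketch in C, with two hypothetical implementations of the same trivial spec (halving an integer, rounding toward zero). Agreement over the same inputs is the automatic oracle; the functions and names are illustrative, not from the slides.

```c
/* Implementation A: the obvious one. C's / truncates toward zero. */
int half_a(int x) { return x / 2; }

/* Implementation B: an independent, slow reference implementation
   using repeated subtraction; also rounds toward zero. */
int half_b(int x) {
    int sign = x < 0 ? -1 : 1;
    int m = x * sign, q = 0;
    while (m >= 2) { m -= 2; q++; }
    return q * sign;
}

/* Differential oracle: sweep a range of inputs and return the first
   input where the two implementations disagree, or lo - 1 if none. */
int first_disagreement(int lo, int hi) {
    for (int x = lo; x <= hi; x++)
        if (half_a(x) != half_b(x)) return x;
    return lo - 1;
}
```

The design choice is exactly the one the slide names: neither implementation is trusted alone, but a disagreement is a cheap, fully automatic failure signal.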

Terms: Test (Case) vs. Test Suite
- Test (case): one execution of the program, which may expose a bug
- Test suite: a set of executions of a program, grouped together; a test suite is made of test cases
- Tester: a program that generates tests
- The line gets blurry when testing functions, not programs; especially with persistent state

Terms: Black Box Testing
- Treats a program or system as a black box: testing that does not look at source code or the internal structure of the system
- Send the program a stream of inputs, observe the outputs, decide if the system passed or failed the test
- Abstracts away the internals; a useful perspective for integration and system testing
- Sometimes you don't have access to source code, and can make little use of object code
- True black box? Access only over a network

Terms: White Box Testing
- Opens up the box! (also known as glass box, clear box, or structural testing)
- Use source code (or other structure beyond the input/output spec) to design test cases
- Brings us to the idea of coverage

Terms: Coverage
Coverage measures or metrics: an abstraction of "what a test suite tests" in a structural sense, best explained by giving examples. Common measures:
- Statement coverage (a.k.a. line coverage or basic block coverage): which statements execute in a test suite
- Decision coverage: which boolean expressions in control structures evaluated to both true and false during suite execution
- Path coverage: which paths through a program's control flow graph are taken in the test suite

Terms: Mutation Testing
- A mutation of a program is a version of the program with one or more small random changes
- Mutation testing is another way to measure the quality of a test suite; Ammann and Offutt call it syntax-based coverage
- Idea: generate a large number of mutants and run the test suite on them; if few mutants are detected, the test suite may not be very good
- Difficulties: the cost of testing many versions of a program; how to generate mutants (operators)
- In principle, can subsume many other forms of coverage
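A tiny mutation-testing sketch in C. The original function, the mutant, and the mutation operator (`+` changed to `-`) are hypothetical examples chosen for size; real tools generate many mutants automatically.

```c
/* Original program under test. */
int add_orig(int a, int b) { return a + b; }

/* Mutant: one syntactic change, + became -. */
int add_mut(int a, int b)  { return a - b; }

/* A test input "kills" (detects) the mutant only if the original and
   the mutant disagree on it. */
int kills(int a, int b) { return add_orig(a, b) != add_mut(a, b); }
```

Note that a suite containing only inputs with b = 0 never kills this mutant, which is exactly the slide's warning: a suite that detects few mutants may not be very good.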

Faults, Errors, and Failures
- Fault: a static flaw in a program; what we usually think of as "a bug"
- Error: a bad program state that results from a fault; not every fault always produces an error
- Failure: an observable incorrect behavior of a program as a result of an error; not every error ever becomes visible

To Expose a Fault with a Test
- Reachability: the test must actually reach and execute the location of the fault
- Infection: the fault must actually corrupt the program state (produce an error)
- Propagation: the error must persist and cause an incorrect output, i.e., a failure
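The three conditions can be illustrated with a hypothetical one-line fault (not from the slides). The spec is "return x squared"; the faulty version computes x + x instead.

```c
/* Spec: return x squared. */
int square_spec(int x) { return x * x; }

/* FAULT: x + x instead of x * x.
   Reachability: every call executes the faulty expression.
   Infection:    for x = 3, the state becomes 6 instead of 9 (an error).
   Propagation:  the bad value is returned directly, so the error
                 immediately becomes a visible failure.
   No infection: for x = 0 or x = 2, x + x == x * x, so the faulty
                 statement executes but produces no error, hence the
                 test passes despite reaching the fault. */
int square_buggy(int x) { return x + x; }
```

Inputs 0 and 2 satisfy reachability but not infection; input 3 satisfies all three, which is what a fault-exposing test needs.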

Controllability and Observability
Goals for a test case: reach a fault, produce an error, make the error visible as a failure.
- To make this easy, the program must be controllable and observable
- Controllability: how easy it is to drive the program where we want it to go
- Observability: how easy it is to tell what the program is doing

Design for Testability
- If a program is not designed to be controllable and observable, it generally won't be
- We have to start preparing for testing before we write any code
- Testing as an after-the-fact, ad hoc exercise is often limited by earlier design choices

Test-Driven Development
- One way to design for testability is to write the test cases before the code
- Idea arising from Extreme Programming and agile development: write automated test cases first, then write the code to satisfy the tests
- Helps focus attention on making software well-specified
- Forces observability and controllability: you have to be able to handle the test cases you've already written (before deciding they were impractical)
- Reduces the temptation to tailor tests to idiosyncratic behaviors of the implementation

Controllability: Simulation and Stubbing
A key to controllable code is effective simulation and stubbing.
Simulation of low-level hardware devices through a clean driver interface:
- Real hardware may be slow
- It may be impossible or expensive to induce some hardware failure modes on real hardware
- Real hardware may be a limited resource
Stubbing for other routines and code:
- Other code/modules may not be complete
- They may be slow and irrelevant to the test
- We may need to simulate failure of other modules

Controllability: Downwards Scalability
- Another important aspect of controllability is to make code "downwards scalable"
- Many faults cause an error only in a corner case due to a resource limit
- An effective strategy for finding errors is to reduce the resource limits: test a version of the program with very tight bounds
- Finding corner cases is easier if the corners are close together
- Too many programs hard-code resource limits or make assumptions about resources unconnected to defined limits (e.g., not checking the result of malloc)

Observability: Assertions
- Assertions improve observability by making (some) errors into failures
- Even if the effect of a fault doesn't propagate, it may be visible if an assertion checks the state at the right time
- Assertions also improve observability by making the error, rather than the failure, visible: you know how the state was corrupted directly, not just its eventual effect
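A small illustration in C, using the standard `assert` macro; the function itself is a hypothetical example. The assertions turn a corrupted intermediate state (an error) into an immediate, located failure instead of letting a bad value flow silently into later computation.

```c
#include <assert.h>

/* Returns part as a percentage of whole.
   The assertions document and enforce the state this code relies on;
   a caller that passes a corrupted value aborts here, at the source,
   rather than producing a quietly wrong percentage downstream. */
int percent_of(int part, int whole) {
    assert(whole > 0);                 /* catch the error where it enters */
    assert(part >= 0 && part <= whole);
    return (100 * part) / whole;
}
```

Note the standard caveat: compiling with `NDEBUG` removes `assert` calls, so production builds lose this observability unless a separate checking mechanism is kept.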

Observability: Invariant Checkers
- The idea of assertions can be extended to writing "full" invariant checkers
- Do a crawl of the code's basic data structures, checking various invariants that would be too expensive to check at runtime
- The invariant checker can be written to be easy to use (recursion, memory allocation, etc.); it won't run on the actual system
- But be careful! If your invariant checker has a bug and changes the system state...
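A sketch of such a checker in C for a hypothetical data structure: a singly linked list that is required to be sorted. The crawl is read-only (note the `const`), which guards against the slide's warning about a checker that mutates the state it inspects.

```c
#include <stddef.h>

struct node { int value; struct node *next; };

/* Invariant checker: returns 1 if the list is non-decreasing and
   shorter than max_len (the bound also catches accidental cycles),
   else 0. Too slow to run inside every operation; fine in tests. */
int check_sorted_list(const struct node *head, int max_len) {
    int len = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        if (++len > max_len) return 0;               /* cycle or too long */
        if (p->next != NULL && p->next->value < p->value) return 0;
    }
    return 1;
}

/* Fixed example lists for demonstration. */
static struct node s3 = {5, NULL};
static struct node s2 = {3, &s3};
static struct node sorted_list = {1, &s2};

static struct node u2 = {2, NULL};
static struct node unsorted_list = {9, &u2};
```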

Graph Coverage
Cover all the nodes, edges, or paths of some graph related to the program. Examples:
- Statement coverage
- Branch coverage
- Path coverage
- Data flow (def-use) coverage
- Model-based testing coverage
- Many more; the most common kind of coverage, by far

Statement/Basic Block Coverage

    if (x < y) {
        y = 0;
        x = x + 1;
    } else
        x = y;

- Statement coverage: cover every node of the control flow graph
- Treat y = 0 and x = x + 1 as one node: if one statement executes, the other must also execute (the code is a basic block)
[Control flow graphs omitted]
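A minimal sketch of how a tool records statement (basic block) coverage, using the slide's if-then-else. The instrumentation style (a flag per block) is illustrative; real tools like gcov insert counters automatically.

```c
/* One flag per basic block of the slide's example. */
enum { BLK_THEN, BLK_ELSE, NUM_BLOCKS };
static int covered[NUM_BLOCKS];

void example(int x, int y) {
    if (x < y) {
        covered[BLK_THEN] = 1;   /* instrumentation */
        y = 0;
        x = x + 1;               /* same basic block as y = 0 */
    } else {
        covered[BLK_ELSE] = 1;   /* instrumentation */
        x = y;
    }
    (void)x; (void)y;            /* values unused beyond the example */
}

/* Percentage of basic blocks the suite has executed so far. */
int coverage_percent(void) {
    int hit = 0;
    for (int i = 0; i < NUM_BLOCKS; i++) hit += covered[i];
    return 100 * hit / NUM_BLOCKS;
}
```

One test taking the then-branch reaches 50%; adding a test with x >= y reaches 100%, matching the slide's two-node view of this code.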

Branch Coverage

    if (x < y) {
        y = 0;
        x = x + 1;
    }

- Branch coverage vs. statement coverage: the same for if-then-else
- But consider this if-then structure: for branch coverage we can't just cover all nodes, we must cover all edges; we must reach the node after the if both via the then-block and without executing it!
[Control flow graphs omitted]

Path Coverage
- How many paths through this code are there? We need one test case for each to get path coverage

    if (x < y) {
        y = 0;
        x = x + 1;
    } else
        x = y;

- For the running example (this branch followed by a second one in the graph), statement and branch coverage need only two test cases: paths 1 2 4 5 6 and 1 3 4 6
- Path coverage needs two more: 1 2 4 6 and 1 3 4 5 6
- In general: exponential in the number of conditional branches!
[Control flow graphs omitted]
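The exponential blow-up is easy to see in code. In this hypothetical sketch, two independent branches in sequence yield four distinct paths, each identified by a bit pattern, even though two well-chosen tests already give full statement and branch coverage.

```c
/* Returns an id encoding which branches the execution took:
   bit 0 set if branch 1 was taken, bit 1 set if branch 2 was taken.
   Two sequential branches -> 2^2 = 4 possible paths (ids 0..3);
   k branches in sequence -> 2^k paths. */
int path_id(int w, int z) {
    int id = 0;
    if (w) id |= 1;   /* branch 1 */
    if (z) id |= 2;   /* branch 2 */
    return id;
}
```

The inputs (1, 1) and (0, 0) alone cover every statement and every branch edge, yet they exercise only two of the four paths; path coverage also demands (1, 0) and (0, 1).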

Data Flow Coverage
Annotate the program with the locations where variables are defined (Def) and used (Use); very basic static analysis:

    1: x = 3;          Def(x)
    2: y = 3;          Def(y)
    3: if (w)
    4:     x = y + 2;  Def(x), Use(y)
    5: if (z)
    6:     y = x - 2;  Def(y), Use(x)
    7: n = x + y;      Use(x), Use(y)

- Def-use pair coverage requires executing all possible pairs of nodes where a variable is first defined and then used, without any intervening re-definitions
- E.g., the path 1 2 3 5 6 7 covers the pair where x is defined at 1 and used at 7
- But the path 1 2 3 4 5 6 7 does NOT (x is re-defined at 4)
- There may be many pairs, some not actually executable

Logic Coverage
What if, instead of:

    if (x < y) {
        y = 0;
        x = x + 1;
    }

we have:

    if (((a > b) || G) && (x < y)) {
        y = 0;
        x = x + 1;
    }

- Now branch coverage will guarantee that we cover all the edges, but does not guarantee that we will do so for all the different logical reasons
- We want to test the logic of the guard of the if statement

Active Clause Coverage
Predicate: ((a > b) || G) && (x < y)

    Row   (a > b)   G   (x < y)   predicate
     1       T      F      T          T
     2       F      F      T          F
     3       F      T      T          T
     4       F      F      T          F     (duplicate of row 2)
     5       T      T      T          T
     6       T      T      F          F

- Rows 1 and 2: with these values for G and (x < y), (a > b) determines the value of the predicate
- Rows 3 and 4: with these values for (a > b) and (x < y), G determines the value of the predicate
- Rows 5 and 6: with these values for (a > b) and G, (x < y) determines the value of the predicate
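The "determines" relation is mechanical to check in code. This sketch evaluates the slide's predicate on truth values and tests whether flipping one clause, with the others held fixed, flips the whole predicate; the function names are illustrative.

```c
/* The slide's predicate over boolean clause values (0/1). */
int pred(int a_gt_b, int g, int x_lt_y) {
    return (a_gt_b || g) && x_lt_y;
}

/* Does the (a > b) clause determine the predicate, given fixed values
   for the other two clauses? True exactly when flipping it flips the
   outcome -- the condition for an "active" clause. */
int a_clause_determines(int g, int x_lt_y) {
    return pred(1, g, x_lt_y) != pred(0, g, x_lt_y);
}
```

With G = F and (x < y) = T (rows 1 and 2 of the table), (a > b) is active; with G = T it is masked by the disjunction, so those rows could not serve as an active-clause pair for it.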

Input Domain Partitioning
- A partition scheme q of domain D defines a set of blocks, Bq = { b1, b2, ..., bQ }
- The partition must satisfy two properties:
  - blocks must be pairwise disjoint (no overlap): bi ∩ bj = ∅ for all i ≠ j, with bi, bj ∈ Bq
  - together the blocks cover the domain (complete): the union of all b ∈ Bq equals D
- Coverage then means using at least one input from each of b1, b2, b3, ...
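A concrete (hypothetical) partition scheme for the int domain: three blocks, negatives, zero, and positives. Writing the scheme as a total function makes both required properties hold by construction, since every input maps to exactly one block.

```c
/* Partition of the int domain D into Q = 3 blocks.
   Disjoint: each x maps to exactly one block id.
   Complete: every x maps to some block id. */
int block_of(int x) {
    if (x < 0)  return 0;   /* b1: negative ints */
    if (x == 0) return 1;   /* b2: zero */
    return 2;               /* b3: positive ints */
}
```

Partition coverage for this scheme then means a suite containing at least one input with `block_of(x)` equal to each of 0, 1, and 2, e.g., {-7, 0, 42}.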

Syntax-Based Coverage
- Based on mutation testing (a pet topic of Ammann and Offutt, who are heavily into this research area)
- A bit different kind of creature than the other coverages we've looked at
- Idea: generate many syntactic mutants of the original program
- Coverage: how many mutants does a test suite kill (detect)?

Generation vs. Recognition
- Generation of tests based on coverage means producing a test suite to achieve a certain level of coverage
- As you can imagine, generally very hard: generating a suite for 100% statement coverage easily reaches "solving the halting problem" level, and it is obviously hard for, say, mutant-killing
- Recognition means seeing what level of coverage an existing test suite reaches

Coverage and Subsumption
- Sometimes one coverage approach subsumes another: if you achieve 100% coverage of criterion A, you are guaranteed to satisfy B as well
- For example, consider node and edge coverage (there's a subtlety here, actually; can you spot it?)
- What does this mean? Unfortunately, not a great deal
- If test suite X satisfies "stronger" criterion A and test suite Y satisfies "weaker" criterion B, Y may still reveal bugs that X does not! For example, consider our running example and statement vs. branch coverage
- It means we should take coverage with a grain of salt, for one thing

Levels of Testing (adapted from Beizer, by Ammann and Offutt)
- Level 0: Testing is debugging
- Level 1: Testing is to show the program works
- Level 2: Testing is to show the program doesn't work
- Level 3: Testing is not to prove anything specific, but to reduce the risk of using the program
- Level 4: Testing is a mental discipline that helps develop higher quality software

What's So Good About Coverage?
- Consider a fault that causes failure every time the code containing it is executed
- Don't execute the code: you cannot possibly find the fault!
- That's a pretty good argument for statement coverage

    int findLast (int a[], int n, int x) {
        // Returns index of last element in a
        // equal to x, or -1 if no such element.
        // n is the length of a.
        int i;
        for (i = n - 1; i >= 0; i--) {
            if (a[i] == x)
                return i;
        }
        return 0;   // the fault: should be return -1
    }
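To close the loop on the slide's argument, here is the same findLast with its fault preserved (returning 0 instead of -1 when x is absent; the `==` comparison is assumed, as the original slide's `=` appears to be a transcription artifact). Only a test whose input actually executes the final return can expose the fault.

```c
/* The slide's faulty findLast: spec says return -1 when x is absent,
   but the last statement returns 0. */
int findLast(int a[], int n, int x) {
    for (int i = n - 1; i >= 0; i--)
        if (a[i] == x) return i;
    return 0;   /* FAULT: should be return -1 */
}

/* Fixed input for demonstration. */
static int demo[] = {2, 3, 5};
```

A suite whose every input contains x (e.g., searching for 5 in {2, 3, 5}) passes while leaving the faulty statement uncovered; a statement-coverage-adequate suite must include a search for an absent value, and that test observes 0 where the spec demands -1.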