Basic Definitions: Testing

Presentation transcript:

1 Basic Definitions: Testing
What is software testing?
Running a program in order to find faults, a.k.a. defects, a.k.a. errors, a.k.a. flaws, a.k.a. faults, a.k.a. BUGS.
Hrm... that's a lot of "a.k.a."s.
Let's refine this terminology a bit.

2 Faults, Errors, and Failures
Fault: a static flaw in a program. What we usually think of as "a bug".
Error: a bad program state that results from a fault. Not every fault always produces an error.
Failure: an observable incorrect behavior of a program as a result of an error. Not every error ever becomes visible.

3 To Expose a Fault with a Test
Reachability: the test must actually reach and execute the location of the fault.
Infection: the fault must actually corrupt the program state (produce an error).
Propagation: the error must persist and cause an incorrect output, that is, a failure.

4 An Example
    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return -1;
    }
Find the fault.

5 An Example (same findLast code as on slide 4)
Here's a test case: a = {}, n = 0, x = 2.
Does not even reach the fault: with n = 0 the loop body never runs.

6 An Example (same findLast code as on slide 4)
Here's another: a = {3, 9, 4}, n = 3, x = 2.
Reaches the fault.
Infects state with error: the loop stops at i = 0 without examining a[0].
But no failure: -1 is the correct answer anyway.

7 An Example (same findLast code as on slide 4)
And finally: a = {2, 9, 4}, n = 3, x = 2.
Reaches the fault.
Infects state with error.
And fails: returns -1 instead of 0.
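
The three test cases on slides 5 to 7 can be replayed mechanically. The sketch below is not part of the original slides: it pairs the faulty findLast with a corrected reference version (the names find_last_faulty, find_last_fixed, and exposes_failure are made up for illustration) and reports whether a given input turns the fault into a visible failure.

```c
#include <assert.h>
#include <stddef.h>

/* The faulty version from the slides: the loop stops before index 0. */
int find_last_faulty(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--) {
        if (a[i] == x) return i;
    }
    return -1;
}

/* Reference version with the corrected loop bound (i >= 0). */
int find_last_fixed(const int a[], int n, int x) {
    for (int i = n - 1; i >= 0; i--) {
        if (a[i] == x) return i;
    }
    return -1;
}

/* Returns 1 when the two versions disagree, i.e. when this input
   makes the fault observable as a failure; 0 otherwise. */
int exposes_failure(const int a[], int n, int x) {
    return find_last_faulty(a, n, x) != find_last_fixed(a, n, x);
}
```

Only the last of the three inputs makes exposes_failure return 1; for the empty array (passed as NULL with n = 0) and for {3, 9, 4} the outputs agree, even though the second input does corrupt the internal state.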

8 Controllability and Observability
Goals for a test case: reach a fault, produce an error, and make the error visible as a failure.
In order to make this easy, the program must be controllable and observable.
Controllability: how easy it is to drive the program where we want it to go.
Observability: how easy it is to tell what the program is doing.

9 Design for Testability
If a program is not designed to be controllable and observable, it generally won't be.
We have to start preparing for testing before we write any code.
Testing as an after-the-fact, ad hoc exercise is often limited by earlier design choices.

10 Test-Driven Development
One way to design for testability is to write the test cases before the code.
Idea arising from Extreme Programming and agile development: write automated test cases first, then write the code to satisfy the tests.
Helps focus attention on making software well-specified.
Forces observability and controllability: you have to be able to handle the test cases you've already written (before deciding they were impractical).
Reduces temptation to tailor tests to idiosyncratic behaviors of the implementation.

11 Controllability: Simulation and Stubbing
A key to controllable code is effective simulation and stubbing.
Simulation of low-level hardware devices through a clean driver interface: real hardware may be slow; it may be impossible or expensive to induce some hardware failure modes on real hardware; real hardware may be a limited resource.
Stubbing for other routines and code: other code or modules may not be complete; they may be slow and irrelevant to the test; we may need to simulate failure of other modules.
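
As a rough illustration of "a clean driver interface", the sketch below routes all device access through a small table of function pointers and supplies an in-memory simulated device that can be told to fail a write on demand. Every name here (storage_driver, sim_device, save_record, the page sizes) is hypothetical and chosen for the example; it is not the JPL code.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 64   /* hypothetical sizes for the simulated device */
#define NUM_PAGES 4

/* Driver interface: the code under test only touches the device through this. */
typedef struct storage_driver {
    int (*write_page)(struct storage_driver *self, int page, const unsigned char *data);
    int (*read_page)(struct storage_driver *self, int page, unsigned char *out);
} storage_driver;

/* Simulated device: pages live in RAM and a test can schedule a write failure. */
typedef struct {
    storage_driver base;      /* first member, so a driver pointer can be cast back */
    unsigned char pages[NUM_PAGES][PAGE_SIZE];
    int writes_until_failure; /* successful writes allowed before failing; -1 = never fail */
} sim_device;

static int sim_write(storage_driver *self, int page, const unsigned char *data) {
    sim_device *dev = (sim_device *)self;
    if (page < 0 || page >= NUM_PAGES) return -1;
    if (dev->writes_until_failure == 0) return -1;   /* injected write failure */
    if (dev->writes_until_failure > 0) dev->writes_until_failure--;
    memcpy(dev->pages[page], data, PAGE_SIZE);
    return 0;
}

static int sim_read(storage_driver *self, int page, unsigned char *out) {
    sim_device *dev = (sim_device *)self;
    if (page < 0 || page >= NUM_PAGES) return -1;
    memcpy(out, dev->pages[page], PAGE_SIZE);
    return 0;
}

void sim_init(sim_device *dev, int writes_until_failure) {
    memset(dev, 0, sizeof *dev);
    dev->base.write_page = sim_write;
    dev->base.read_page = sim_read;
    dev->writes_until_failure = writes_until_failure;
}

/* Code under test: stores two copies of a record and reports any write error. */
int save_record(storage_driver *drv, const unsigned char *record) {
    if (drv->write_page(drv, 0, record) != 0) return -1;
    if (drv->write_page(drv, 1, record) != 0) return -1;
    return 0;
}

/* Example test helper: run save_record against a device that fails on demand. */
int save_with_injected_failure(int writes_until_failure) {
    sim_device dev;
    unsigned char record[PAGE_SIZE] = {42};
    sim_init(&dev, writes_until_failure);
    return save_record(&dev.base, record);
}
```

Because save_record never names the simulated device directly, the same code could later be pointed at a real driver, while tests can choose exactly which write fails.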

12 Simulation and Stubbing: JPL Example
When testing JPL flash storage modules we rely on software simulation of flash devices.
Real flash devices are slow: can't do aggressive random testing.
Real flash devices are expensive: JPL only has a few boards, with constant competition to test on these.
Running hundreds of thousands of tests will wear the flash hardware out.
Simulation enables us to introduce rare hardware failures: system resets, spontaneous bad blocks, write failures, etc.

13 Controllability: Downwards Scalability
Another important aspect of controllability is to make code "downwards scalable".
Many faults cause an error only in a corner case due to a resource limit.
An effective strategy for finding errors is to reduce the resource limits: test a version of the program with very tight bounds.
Finding corner cases is easier if the corners are close together.
Too many programs hard-code resource limits or make assumptions about resources unconnected to defined limits, e.g., not checking the result of malloc.
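
One way to picture downwards scalability: make the resource limit a parameter and check every allocation, so a test can shrink the limit until the "full" corner case is only a couple of operations away. The event_log type and its functions below are invented for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *items;
    size_t capacity;   /* set by the caller at init time, never hard-coded */
    size_t count;
} event_log;

/* Returns 0 on success, -1 if the capacity is zero or allocation fails. */
int log_init(event_log *log, size_t capacity) {
    if (capacity == 0) return -1;
    log->items = malloc(capacity * sizeof *log->items);
    if (log->items == NULL) return -1;   /* always check the result of malloc */
    log->capacity = capacity;
    log->count = 0;
    return 0;
}

/* Returns 0 on success, -1 when the log is full. */
int log_append(event_log *log, int event) {
    if (log->count == log->capacity) return -1;
    log->items[log->count++] = event;
    return 0;
}

void log_free(event_log *log) {
    free(log->items);
    log->items = NULL;
    log->capacity = 0;
    log->count = 0;
}

/* Appends until the log reports full; returns the number of successful
   appends, or -1 if the log could not be created. */
long appends_before_full(size_t capacity) {
    event_log log;
    long ok = 0;
    if (log_init(&log, capacity) != 0) return -1;
    while (log_append(&log, (int)ok) == 0) ok++;
    log_free(&log);
    return ok;
}
```

With a capacity of 2 the full-log path is reached after two appends, so even a short random test sequence is likely to exercise it.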

14 Downwards Scalability: JPL Example
Flight flash hardware is usually a 1-4 GB device, e.g., 64 blocks of 32 pages of 8192 bytes.
We primarily test with much smaller "devices" (using software simulation): 6 blocks of 4 pages of 64 bytes.
Forces the flash file system to compact storage more often.
Tests assumptions about how space is used on flash.
Forces more multi-page writes and directory entries spanning multiple pages.

15 Downwards Scalability: JPL Example
Easier to explore various combinations of states of the blocks and pages of the device: used page, free page, dirty page, bad block.

16 Controllability
Other important themes for controllability:
Network/file access: if the program reads from the network or from remote files, this is hard to control. Again, simulation and stubbing are key.
System calls: similarly, reading the time from the operating system can be hard to control. Simulation and stubbing again (an Operating System Abstraction Layer, etc.).
GUI control: allow scripted control of GUI elements so tests can be automated.
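
For the system-call case, a minimal sketch of the idea is to pass the clock in as a function pointer, so that production code uses the real time() while a test substitutes a clock it fully controls. The names clock_fn, session_expired, and fake_clock are assumptions made for this example.

```c
#include <assert.h>
#include <time.h>

/* Abstraction over "what time is it?" so tests can substitute their own clock. */
typedef time_t (*clock_fn)(void);

/* Production clock: asks the operating system. */
time_t system_clock(void) { return time(NULL); }

/* Test clock: returns whatever the test last stored in fake_now. */
static time_t fake_now = 0;
time_t fake_clock(void) { return fake_now; }

/* Code under test: has at least max_age_seconds elapsed since `started`? */
int session_expired(time_t started, double max_age_seconds, clock_fn now) {
    return difftime(now(), started) >= max_age_seconds;
}

/* Example test helper: a session started at t = 40 with a 60 second limit,
   evaluated at an exact instant chosen by the test. */
int expired_when_clock_reads(time_t t) {
    fake_now = t;
    return session_expired((time_t)40, 60.0, fake_clock);
}
```

The boundary between "59 seconds old" and "60 seconds old" can now be tested directly, without sleeping or depending on when the test happens to run.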

17 Observability: Assertions
Assertions improve observability by making (some) errors into failures: even if the effect of a fault doesn't propagate, it may be visible if an assertion checks the state at the right time.
Assertions also improve observability by making the error, rather than the failure, visible: we know how the state was corrupted directly, not just the eventual effect.
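
The running example shows this nicely. For a = {3, 9, 4} and x = 2 the faulty findLast returns the right answer, so the error never propagates, but a check on the loop counter exposes it. In the sketch below, CHECK is a stand-in for assert() that counts violations instead of aborting, purely so the effect can be demonstrated; CHECK, find_last_checked, and violations_for are names made up for this example.

```c
#include <assert.h>

static int violations = 0;   /* number of failed checks so far */

/* Stand-in for assert(): records the violation instead of aborting.
   A real test build would normally just use assert(). */
#define CHECK(cond) do { if (!(cond)) violations++; } while (0)

int find_last_checked(const int a[], int n, int x) {
    int i;
    for (i = n - 1; i > 0; i--) {   /* faulty bound from the slides */
        if (a[i] == x) return i;
    }
    /* On a "not found" exit every index should have been examined,
       so the loop counter should have run down to -1. */
    CHECK(i == -1);
    return -1;
}

/* Runs one call and reports how many checks it violated. */
int violations_for(const int a[], int n, int x) {
    violations = 0;
    (void)find_last_checked(a, n, x);
    return violations;
}
```

The output for {3, 9, 4} is still -1, as it should be, yet the check fires once: the error has been made visible without waiting for a failure.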

18 Observability: Invariant Checkers
Can extend the idea of assertions to writing "full" invariant checkers.
Do a crawl of the code's basic data structures.
Check various invariants that would be too expensive to check at runtime.
The invariant checker can be written in whatever style is easiest (recursion, memory allocation, etc.), since it won't run on the actual system.
But be careful! If your invariant checker has a bug and changes the system state...
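
A small sketch of an invariant checker, using a made-up data structure (a sorted singly linked list that caches its length). The checker takes const pointers throughout, which is one cheap guard against the "checker changes the system state" hazard mentioned above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int value;
    struct node *next;
} node;

typedef struct {
    node *head;
    size_t length;   /* cached number of nodes */
} sorted_list;

/* Read-only crawl of the structure. Returns 1 when all invariants hold:
   values are in ascending order and the cached length matches the
   number of reachable nodes (which also rules out cycles). */
int sorted_list_ok(const sorted_list *list) {
    size_t seen = 0;
    const node *prev = NULL;
    for (const node *cur = list->head; cur != NULL; cur = cur->next) {
        if (++seen > list->length) return 0;                    /* extra nodes or a cycle */
        if (prev != NULL && prev->value > cur->value) return 0; /* out of order */
        prev = cur;
    }
    return seen == list->length;
}

/* Example: build the list 2 -> 4 -> 9, optionally corrupt it, then check. */
int demo_list_check(int corrupt) {
    node c = {9, NULL};
    node b = {4, &c};
    node a = {2, &b};
    sorted_list list = {&a, 3};
    if (corrupt) b.value = 10;   /* simulate an error: ordering is now broken */
    return sorted_list_ok(&list);
}
```

A tester would call sorted_list_ok after every operation under test; that is too slow for production use but fine in a test harness.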

19 Observability
Other important themes for observability:
Logging: especially critical for GUIs, to mirror GUI events in ordered, parseable messages.
Network/file access: if the program writes to the network or to remote files, this is hard to observe.

20 Controllability & Observability: Memory Allocation
More extreme case: embedded code for mission- or safety-critical systems.
May be running without memory protection.
Dynamic allocation is often forbidden.
Design the module to accept a static block allocated elsewhere, and to access only this memory.
Controllability: allows us to introduce memory faults and simulate warm reboots.
Observability: allows us to easily instrument code with low-overhead checks to find memory safety violations during testing.
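
A minimal sketch of the "static block allocated elsewhere" design, under simplifying assumptions: the module hands out bytes from a caller-supplied block with a bump pointer, ignores alignment (so it only suits byte buffers), and reserves the last byte as a guard value that a test can inspect. The arena type and GUARD_BYTE are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GUARD_BYTE 0xA5   /* arbitrary marker value for this sketch */

typedef struct {
    unsigned char *base;   /* block supplied by the caller */
    size_t size;           /* usable bytes, excluding the guard byte */
    size_t used;
} arena;

/* The last byte of the caller's block is reserved as a guard. */
int arena_init(arena *a, unsigned char *block, size_t block_size) {
    if (block == NULL || block_size < 2) return -1;
    a->base = block;
    a->size = block_size - 1;
    a->used = 0;
    block[block_size - 1] = GUARD_BYTE;
    return 0;
}

/* Bump allocation; returns NULL when the block is exhausted.
   Alignment is deliberately ignored in this sketch. */
void *arena_alloc(arena *a, size_t n) {
    if (n > a->size - a->used) return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* Low-overhead check: has anything written past the usable region? */
int arena_guard_intact(const arena *a) {
    return a->base[a->size] == GUARD_BYTE;
}

/* Example: allocate all 16 usable bytes, optionally overrun by one byte.
   Returns 1 if the guard survived, 0 if it was overwritten, -1 on setup error. */
int demo_arena(int overflow) {
    static unsigned char block[17];   /* 16 usable bytes + 1 guard byte */
    arena a;
    if (arena_init(&a, block, sizeof block) != 0) return -1;
    unsigned char *buf = arena_alloc(&a, 16);
    if (buf == NULL || arena_alloc(&a, 1) != NULL) return -1;
    memset(buf, 0, overflow ? 17 : 16);   /* 17 bytes deliberately hits the guard */
    return arena_guard_intact(&a);
}
```

Because the test owns the block, it can also pre-fill it with garbage or keep its contents across a simulated reboot, which is the controllability half of the argument.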

21 Coverage
The literature of software testing is primarily concerned with various notions of coverage.
Ammann and Offutt identify four basic kinds of coverage: graph coverage, logic coverage, input space partitioning, and syntax-based coverage.

22 Graph Coverage
Cover all the nodes, edges, or paths of some graph related to the program.
Examples: statement coverage, branch coverage, path coverage, data flow (def-use) coverage, model-based testing coverage, and many more.
The most common kind of coverage, by far.

23 Graph Coverage
Most FSM testing algorithms can be seen as graph coverage.
Consider VC: computing a spanning tree to nodes is standard graph exploration.
Beizer: "find a graph and cover it".

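24 Statement/Basic Block Coverage
First fragment (if-then-else):
    if (x < y) {
      y = 0;
      x = x + 1;
    } else {
      x = y;
    }
Second fragment (if-then):
    if (x < y) {
      y = 0;
      x = x + 1;
    }
[Figure: control-flow graph of each fragment, with edges labeled x < y and x >= y, and nodes for "y = 0; x = x + 1" and "x = y".]
Statement coverage: cover every node of these graphs.
The statements y = 0 and x = x + 1 are treated as one node, because if one statement executes the other must also execute (the code is a basic block).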

25 Branch Coverage (same two code fragments and control-flow graphs as on slide 24)
Branch coverage vs. statement coverage: the same for the if-then-else.
But consider the if-then structure: for branch coverage we can't just cover all nodes, but must cover all edges, getting to node 3 both after node 2 and without executing node 2!

26 Path Coverage
    if (x < y) {
      y = 0;
      x = x + 1;
    } else {
      x = y;
    }
    if (x < y) {
      y = 0;
      x = x + 1;
    }
[Figure: control-flow graph of the two conditionals in sequence.]
How many paths through this code are there? Need one test case for each to get path coverage.
To get statement and branch coverage, we only need two test cases; path coverage needs two more (the specific test inputs shown on the slide are missing from this transcript).
In general: exponential in the number of conditional branches!
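
To see which inputs drive which path, the sketch below (not from the slides; path_taken and inputs_on_path are illustrative names) runs the two conditionals and records the branches taken. It also hints at a wrinkle that the data flow slide raises for def-use pairs: a path through the graph need not be executable. Here the else branch sets x = y, so the second test x < y cannot succeed after it.

```c
#include <assert.h>

/* Runs the two conditionals from the slide and reports which "then"
   branches executed: bit 0 for the first if, bit 1 for the second. */
int path_taken(int x, int y) {
    int path = 0;
    if (x < y) {
        y = 0;
        x = x + 1;
        path |= 1;
    } else {
        x = y;
    }
    if (x < y) {
        y = 0;
        x = x + 1;
        path |= 2;
    }
    return path;
}

/* Counts the inputs in a small grid of (x, y) values that follow a given path. */
int inputs_on_path(int path) {
    int count = 0;
    for (int x = -10; x <= 10; x++) {
        for (int y = -10; y <= 10; y++) {
            if (path_taken(x, y) == path) count++;
        }
    }
    return count;
}
```

For instance, (x, y) = (-5, 3) takes both then-branches and (3, 1) takes neither, which already gives statement and branch coverage; (1, 2) adds the then-only-first path. Over the sampled grid no input follows the remaining path (else, then second if), consistent with the reasoning above.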

27 Data Flow (Def-Use) Coverage
    x = 3;
    y = 3;
    if (w) {
      x = y + 2;
    }
    if (z) {
      y = x - 2;
    }
    n = x + y;
[Figure: control-flow graph of this code, annotated with Def(x), Def(y), Use(x), and Use(y) at the relevant nodes and with edges labeled w, !w, z, and !z.]
Annotate the program with locations where variables are defined and used (very basic static analysis).
Def-use pair coverage requires executing all possible pairs of nodes where a variable is first defined and then used, without any intervening re-definitions.
E.g., one path covers the pair where x is defined at node 1 and used at node 7, while another does NOT (the example paths shown on the slide are missing from this transcript).
There may be many pairs, some not actually executable.

28 Logic Coverage
What if, instead of:
    if (x < y) {
      y = 0;
      x = x + 1;
    }
we have:
    if (((a > b) || G) && (x < y)) {
      y = 0;
      x = x + 1;
    }
[Figure: control-flow graph with the guard ((a > b) || G) && (x < y) and its negation on the edges.]
Now, branch coverage will guarantee that we cover all the edges, but does not guarantee we will do so for all the different logical reasons.
We want to test the logic of the guard of the if statement.

29 Active Clause Coverage
Predicate: ((a > b) or G) and (x < y)
    Row  (a > b)  G  (x < y)  Predicate
    1    T        F  T        T
    2    F        F  T        F
    3    F        T  T        T
    4    F        F  T        F   (duplicate of row 2)
    5    T        T  T        T
    6    T        T  F        F
Rows 1-2: with these values for G and (x < y), (a > b) determines the value of the predicate.
Rows 3-4: with these values for (a > b) and (x < y), G determines the value of the predicate.
Rows 5-6: with these values for (a > b) and G, (x < y) determines the value of the predicate.
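
"Determines the value of the predicate" can be checked mechanically: a clause determines the predicate when flipping that clause alone flips the result. The sketch below encodes each clause as a boolean; the function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* The predicate from the slide, with each clause passed in as a boolean:
   a_gt_b stands for (a > b), g for G, and x_lt_y for (x < y). */
bool predicate(bool a_gt_b, bool g, bool x_lt_y) {
    return (a_gt_b || g) && x_lt_y;
}

/* A clause determines the predicate when flipping it alone flips the result. */
bool a_gt_b_determines(bool g, bool x_lt_y) {
    return predicate(true, g, x_lt_y) != predicate(false, g, x_lt_y);
}

bool g_determines(bool a_gt_b, bool x_lt_y) {
    return predicate(a_gt_b, true, x_lt_y) != predicate(a_gt_b, false, x_lt_y);
}

bool x_lt_y_determines(bool a_gt_b, bool g) {
    return predicate(a_gt_b, g, true) != predicate(a_gt_b, g, false);
}
```

With G false and (x < y) true, a_gt_b_determines returns true, matching rows 1 and 2 of the table; with G true it returns false, because G alone already satisfies the "or".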

30 Input Domain Partitioning
Partition scheme $q$ of domain $D$.
The partition $q$ defines a set of blocks, $B_q = \{b_1, b_2, \ldots, b_Q\}$.
The partition must satisfy two properties:
1. Blocks must be pairwise disjoint (no overlap): $b_i \cap b_j = \emptyset, \ \forall i \neq j, \ b_i, b_j \in B_q$
2. Together the blocks cover the domain $D$ (complete): $\bigcup_{b \in B_q} b = D$
[Figure: domain $D$ divided into blocks $b_1$, $b_2$, $b_3$.]
Coverage then means using at least one input from each of $b_1, b_2, b_3, \ldots$

31 Input Domain Partitioning
Some subtleties here...
What's wrong with this partition of file contents?
$b_1$: sorted ascending file
$b_2$: sorted descending file
$b_3$: neither sorted ascending nor sorted descending
Recall the two properties:
1. $b_i \cap b_j = \emptyset, \ \forall i \neq j, \ b_i, b_j \in B_q$
2. $\bigcup_{b \in B_q} b = D$
[Figure: domain $D$ divided into blocks $b_1$, $b_2$, $b_3$.]
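
One way to probe the question is to count, for a given input, how many of the three blocks contain it; a valid partition must give exactly one for every input. The sketch below treats the file contents as an int array and "sorted" as non-strict ordering, both of which are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

bool sorted_ascending(const int a[], size_t n) {
    for (size_t i = 1; i < n; i++) {
        if (a[i - 1] > a[i]) return false;
    }
    return true;
}

bool sorted_descending(const int a[], size_t n) {
    for (size_t i = 1; i < n; i++) {
        if (a[i - 1] < a[i]) return false;
    }
    return true;
}

/* Number of blocks (b1, b2, b3) that contain this input.
   A valid partition yields exactly 1 for every input. */
int blocks_containing(const int a[], size_t n) {
    bool asc = sorted_ascending(a, n);
    bool desc = sorted_descending(a, n);
    return (asc ? 1 : 0) + (desc ? 1 : 0) + ((!asc && !desc) ? 1 : 0);
}
```

A single-element input such as {7} lands in two blocks at once, since it is trivially sorted in both directions, so the disjointness property fails for this scheme.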

32 Syntax-Based Coverage
Based on mutation testing (a pet topic of Ammann and Offutt, who are heavily into this research area).
A bit of a different kind of creature than the other coverages we've looked at.
Idea: generate many syntactic mutants of the original program.
Coverage: how many mutants does a test suite kill (detect)?

33 Mutating Our Buggy Program (the findLast code from slide 4, unchanged)

34 Mutant #1 (findLast from slide 4 with one line changed)
    for (i = n; i > 0; i--) {
The loop starts at i = n instead of i = n-1.

35 Mutant #2 (findLast from slide 4 with one line changed)
    return 0;
The final return yields 0 instead of -1.

36 Mutant #3 (findLast from slide 4 with one line changed)
    if (a[i] != x) return i;
The comparison is != instead of ==.

37 Mutant #4 (findLast from slide 4 with one line changed)
    if (a[i] == n) return i;
The element is compared with n instead of x.

38 Mutant #5: Wait, this one's the fix! (findLast from slide 4 with one line changed)
    for (i = n-1; i >= 0; i--) {
The loop condition is i >= 0 instead of i > 0.

39 Syntax-Based Coverage
[Figure: program P surrounded by the mutants of P.]
100% coverage means you kill all the mutants with your test suite.
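
A rough sketch of how a mutation score could be computed for the running example. It treats the program on slide 33 as P, includes mutants #2 to #5, and counts a mutant as killed when some test makes it return something different from P. Mutant #1 is left out because it reads a[n], which is out of bounds in C. The harness, the test_case type, and example_suite are all invented for this illustration.

```c
#include <assert.h>

typedef int (*find_fn)(const int a[], int n, int x);

/* Program P from the slides (still contains the off-by-one fault). */
static int original(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--) if (a[i] == x) return i;
    return -1;
}

/* Mutant #2: final return is 0 instead of -1. */
static int mutant2(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--) if (a[i] == x) return i;
    return 0;
}

/* Mutant #3: == replaced by !=. */
static int mutant3(const int a[], int n, int x) {
    for (int i = n - 1; i > 0; i--) if (a[i] != x) return i;
    return -1;
}

/* Mutant #4: compares against n instead of x. */
static int mutant4(const int a[], int n, int x) {
    (void)x;   /* x is intentionally unused in this mutant */
    for (int i = n - 1; i > 0; i--) if (a[i] == n) return i;
    return -1;
}

/* Mutant #5: i > 0 replaced by i >= 0 (this one happens to be the fix). */
static int mutant5(const int a[], int n, int x) {
    for (int i = n - 1; i >= 0; i--) if (a[i] == x) return i;
    return -1;
}

typedef struct {
    const int *a;
    int n;
    int x;
} test_case;

static const find_fn mutants[] = { mutant2, mutant3, mutant4, mutant5 };
enum { NUM_MUTANTS = sizeof mutants / sizeof mutants[0] };

/* Counts the mutants killed by the first num_tests tests of the suite. */
int mutants_killed(const test_case *suite, int num_tests) {
    int killed = 0;
    for (int m = 0; m < NUM_MUTANTS; m++) {
        for (int t = 0; t < num_tests; t++) {
            const test_case *tc = &suite[t];
            if (mutants[m](tc->a, tc->n, tc->x) != original(tc->a, tc->n, tc->x)) {
                killed++;
                break;   /* one differing test is enough to kill a mutant */
            }
        }
    }
    return killed;
}

static const int data_mid[] = {3, 9, 4};
static const int data_first[] = {2, 9, 4};
const test_case example_suite[] = {
    { data_mid, 3, 9 },     /* x occurs at index 1 */
    { data_first, 3, 2 },   /* x occurs only at index 0 */
};
```

The first test alone kills two of the four mutants; adding the second test, where x sits at index 0, kills all four. Note that the mutant it needs in order to reach 100% is the very one that fixes the original bug.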

40 Generation vs. Recognition
Generation of tests based on coverage means producing a test suite to achieve a certain level of coverage.
As you can imagine, this is generally very hard: generating a suite for 100% statement coverage easily reaches "solving the halting problem" level, and it is obviously hard for, say, mutant-killing.
Recognition means seeing what level of coverage an existing test suite reaches.

41 Coverage and Subsumption
Sometimes one coverage approach subsumes another: if you achieve 100% coverage of criterion A, you are guaranteed to satisfy B as well.
For example, consider node and edge coverage (there's a subtlety here, actually; can you spot it?).
What does this mean? Unfortunately, not a great deal.
If test suite X satisfies the "stronger" criterion A and test suite Y satisfies the "weaker" criterion B, Y may still reveal bugs that X does not!
For example, consider our running example and statement vs. branch coverage.
It means we should take coverage with a grain of salt, for one thing.

42 Testing "for" Coverage
Never seek to improve coverage just for the sake of increasing coverage (well, unless it's a command from on high).
Coverage is not the goal; finding failures that expose faults is the goal.
No amount of coverage will prove that the program cannot fail.
"Program testing can be used to show the presence of bugs, but never to show their absence!" (E. Dijkstra, Notes On Structured Programming)

43 The Purpose of Testing
Dijkstra meant this as a criticism of testing and an argument in favor of more disciplined and total approaches (proving programs correct).
But he also points out what testing is good for: exposing errors.
Coverage is valuable if and only if test sets with higher coverage are more likely to expose failures.
"Program testing can be used to show the presence of bugs, but never to show their absence!" (E. Dijkstra, Notes On Structured Programming)

44 The Purpose of Testing
When we first start "testing," we often want to "see that the program works": try out some scenarios and watch the program "do its stuff".
We are surprised (annoyed) when (if) the program fails.
This is not really testing: testing is not the same as a demonstration.
Aim to break (your) code, if it can be broken.
"Program testing can be used to show the presence of bugs"

45 Levels of Testing
Adapted from Beizer, by Ammann and Offutt.
Level 0: Testing is debugging.
Level 1: Testing is to show the program works.
Level 2: Testing is to show the program doesn't work.
Level 3: Testing is not to prove anything specific, but to reduce the risk of using the program.
Level 4: Testing is a mental discipline that helps develop higher quality software.

46 What's So Good About Coverage?
Consider a fault that causes failure every time the code is executed.
Don't execute the code: cannot possibly find the fault!
That's a pretty good argument for statement coverage.
    int findLast (int a[], int n, int x) {
      // Returns index of last element
      // in a equal to x, or -1 if no
      // such. n is length of a
      int i;
      for (i = n-1; i >= 0; i--) {
        if (a[i] = x)
          return i;
      }
      return 0;
    }

47 What's So Good About Coverage? (same faulty findLast code as on slide 46)
We should have an argument for any kind of coverage: "If I don't cover this, then there is more chance I'll miss a fault like that".
Backed with empirical data, preferably!

48 Return to Our Example (the findLast code from slide 4, with the i > 0 loop bound)
Let's write a tester for this version of the program (back to the first off-by-one bug).
Forget for a moment that we know what the bug is!

49 Return to Our Example (the findLast code from slide 4, with the i > 0 loop bound)
What kind of coverage might we want to think about when testing this code?

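50 Return to Our Example
    #include <assert.h>
    #include <stdio.h>
    #define N 5 // 5 is "big enough"?
    // Helpers assumed to be defined elsewhere (not shown on the slide).
    void random_assign(int a[], int n);
    void print_array(int a[], int n);
    int findLast(int a[], int n, int x);
    int testFind () {
      int a[N];
      int p, i;
      for (p = 0; p < N; p++) {
        random_assign(a, N);
        a[p] = 3;
        // Remove any 3s after position p, so that p is the last index holding 3.
        // (Starting at i = p, as transcribed, would overwrite a[p] itself.)
        for (i = p + 1; i < N; i++) {
          if (a[i] == 3)
            a[i] = a[i] - 1;
        }
        printf("TEST: findLast({");
        print_array(a, N);
        printf("}, %d, 3)\n", N);
        assert(findLast(a, N, 3) == p);
      }
      return 0;
    }
What kind of coverage does this tester exploit?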

51 Coloretto Simplified
Let's start testing with a "simple" program, similar to the project target: a game.
The system works as a simple state transformer.
A C structure represents the current state of the game.
Actions (library calls) change the game state to reflect what the players are doing.

52 Coloretto Simplified
2-player game.
Deck of 45 cards: 15 red + 15 blue + 15 green.
Three rows in which to place cards: Row 0, Row 1, Row 2.

53 Coloretto Simplified
On your turn you can either take a (possibly empty) row, or draw a card and put it in an empty spot.
If all rows are full you MUST take a row.
Once you have taken a row, you are done until both players have taken a row; the other player may get multiple turns.
Once both players have taken a row, clear the rows and start again with the player who took the last row.

54 Coloretto Simplified
Game end is triggered when there are only 5 cards left in the deck; after both players take a row in such a state, the game ends.
Scoring: each player scores the sum of the squares of their two highest color counts, MINUS the square of their lowest count.
Example: Score = 5^2 + 7^2 - 5^2 = 49.
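
The scoring rule is small enough to sketch directly. The function below assumes the rule reads as "squares of the two highest color counts, minus the square of the lowest", consistent with the example on the slide; coloretto_score and its parameter names are made up here and are not the course project's API.

```c
#include <assert.h>

static int square(int v) { return v * v; }

/* Score for one player, given how many cards of each color they hold. */
int coloretto_score(int red, int blue, int green) {
    int lowest = red;
    if (blue < lowest) lowest = blue;
    if (green < lowest) lowest = green;
    /* Sum of all three squares minus twice the lowest square equals
       (two highest squares) minus (lowest square). */
    return square(red) + square(blue) + square(green) - 2 * square(lowest);
}
```

For counts of 5, 7, and 5 this gives 25 + 49 - 25 = 49, matching the slide. Ties in the lowest count are handled naturally, since only one copy of the lowest square is negated.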

55-63 A bit of play (each slide shows the current contents of the rows as a figure)
55: Player 0 draws a card.
56: Player 0 puts it in a row.
57: Player 1 draws a card.
58: Player 1 puts it in a row.
59: Player 0 takes row 2.
60: Player 1 draws a card and places it.
61: Player 1 draws another card and places it.
62: Player 1 MUST now take a row.
63: Discard the leftover card and start again.